
Python crawler: how do I scrape this kind of HTML with re?

[Unresolved question]
            <div style='display:block; width:100%; height:475.066px;'>
            <table width='100%' border='0' align='center' cellpadding='0' cellspacing='0'>
              <tr><td>
              <table width='100%' border='0' cellspacing='0' cellpadding='0'>
                  <tr>
                    <td style ='text-align:center;padding-bottom:6px;'><span class='titleC'><b>采购单【09-04 21:00~09-05 21:00】</b></span></td
                  </tr>
                </table>
              </td></tr>
              <tr>
                <td><table width='100%' border='0' cellspacing='0' cellpadding='0'>
                  <tr>
                    <td width='9'></td>
                    <td>
                        <table width='100%' border='0' align='center' cellpadding='0' cellspacing='1' bgcolor='#000000'>
                          <tr style ='font-size:14px; font-weight:700;'>
                            <td width='20%' height='26' bgcolor='#FFFFFF'><div align='center'>供应商</div></td>
                            <td width='5%' height='26' bgcolor='#FFFFFF'><div align='center'>序号</div></td>
                            <td width='20%' height='26' bgcolor='#FFFFFF'><div align='center'>订单号</div></td>
                            <td width='27%' height='26' bgcolor='#FFFFFF'><div align='center'>产品名称</div></td>
                            <td width='16%' height='26' bgcolor='#FFFFFF'><div align='center'>产品规格</div></td>
                            <td width='5%' height='26' bgcolor='#FFFFFF'><div align='center'>数量</div></td>
                            <td width='7%' height='26' bgcolor='#FFFFFF'><div align='center'>进货价</div></td>
                          </tr>
                          
            <tr>
                <td height='26' bgcolor='#FFFFFF'><div align='center' >供货商一</div></td>
                <td height='26' bgcolor='#FFFFFF'><div align='center' >15</div></td>                
                <td height='26' bgcolor='#FFFFFF'><div align='center' >2300015046608937</div></td>                
                <td height='26' bgcolor='#FFFFFF'><div align='center' >老豆腐 口味独特 料理易入味</div></td>
                <td height='26' bgcolor='#FFFFFF'><div align='center' ></div></td>                
                <td height='26' bgcolor='#FFFFFF'><div align='center' >2</div></td>        
                <td height='26' bgcolor='#FFFFFF'><div align='center' >1.00</div></td>           
            </tr>
            <tr>
                <td height='26' bgcolor='#FFFFFF'><div align='center' >供货商一</div></td>
                <td height='26' bgcolor='#FFFFFF'><div align='center' >2</div></td>                
                <td height='26' bgcolor='#FFFFFF'><div align='center' >8600015045631605</div></td>                
                <td height='26' bgcolor='#FFFFFF'><div align='center' >绢豆腐</div></td>
                <td height='26' bgcolor='#FFFFFF'><div align='center' ></div></td>                
                <td height='26' bgcolor='#FFFFFF'><div align='center' >1</div></td>        
                <td height='26' bgcolor='#FFFFFF'><div align='center' >1.60</div></td>           
            </tr>
            <tr>
                <td height='26' bgcolor='#FFFFFF'><div align='center' >供货商一</div></td>
                <td height='26' bgcolor='#FFFFFF'><div align='center' >3</div></td>                
                <td height='26' bgcolor='#FFFFFF'><div align='center' >8600015045431761</div></td>                
                <td height='26' bgcolor='#FFFFFF'><div align='center' >白豆干 美味干子</div></td>
                <td height='26' bgcolor='#FFFFFF'><div align='center' ></div></td>                
                <td height='26' bgcolor='#FFFFFF'><div align='center' >1</div></td>        
                <td height='26' bgcolor='#FFFFFF'><div align='center' >1.00</div></td>           
            </tr>
            <tr>
                <td height='26' bgcolor='#FFFFFF'><div align='center' >供货商一</div></td>
                <td height='26' bgcolor='#FFFFFF'><div align='center' >6</div></td>                
                <td height='26' bgcolor='#FFFFFF'><div align='center' >2050015045577381</div></td>                
                <td height='26' bgcolor='#FFFFFF'><div align='center' >白豆干 美味干子</div></td>
                <td height='26' bgcolor='#FFFFFF'><div align='center' ></div></td>                
                <td height='26' bgcolor='#FFFFFF'><div align='center' >2</div></td>        
                <td height='26' bgcolor='#FFFFFF'><div align='center' >1.00</div></td>           
            </tr>
            <tr>
                <td height='26' bgcolor='#FFFFFF'><div align='center' >供货商一</div></td>
                <td height='26' bgcolor='#FFFFFF'><div align='center' >15</div></td>                
                <td height='26' bgcolor='#FFFFFF'><div align='center' >2300015046608937</div></td>                
                <td height='26' bgcolor='#FFFFFF'><div align='center' >花干 秘传工艺 香味十足</div></td>
                <td height='26' bgcolor='#FFFFFF'><div align='center' ></div></td>                
                <td height='26' bgcolor='#FFFFFF'><div align='center' >2</div></td>        
                <td height='26' bgcolor='#FFFFFF'><div align='center' >1.20</div></td>           
            </tr>
            <tr>
                <td height='26' bgcolor='#FFFFFF'><div align='center' >供货商一</div></td>
                <td height='26' bgcolor='#FFFFFF'><div align='center' >7</div></td>                
                <td height='26' bgcolor='#FFFFFF'><div align='center' >3277015045579973</div></td>                
                <td height='26' bgcolor='#FFFFFF'><div align='center' >臭大元</div></td>
                <td height='26' bgcolor='#FFFFFF'><div align='center' ></div></td>                
                <td height='26' bgcolor='#FFFFFF'><div align='center' >2</div></td>        
                <td height='26' bgcolor='#FFFFFF'><div align='center' >0.80</div></td>           
            </tr>
            <tr>
                <td height='26' bgcolor='#FFFFFF'><div align='center' >蛋类</div></td>
                <td height='26' bgcolor='#FFFFFF'><div align='center' >3</div></td>                
                <td height='26' bgcolor='#FFFFFF'><div align='center' >8600015045431761</div></td>                
                <td height='26' bgcolor='#FFFFFF'><div align='center' >草鸡蛋</div></td>
                <td height='26' bgcolor='#FFFFFF'><div align='center' ></div></td>                
                <td height='26' bgcolor='#FFFFFF'><div align='center' >8</div></td>        
                <td height='26' bgcolor='#FFFFFF'><div align='center' >0.90</div></td>           
            </tr>
            <tr>
                <td height='26' bgcolor='#FFFFFF'><div align='center' >熟食</div></td>
                <td height='26' bgcolor='#FFFFFF'><div align='center' >13</div></td>                
                <td height='26' bgcolor='#FFFFFF'><div align='center' >4000015046808509</div></td>                
                <td height='26' bgcolor='#FFFFFF'><div align='center' >鱼丸</div></td>
                <td height='26' bgcolor='#FFFFFF'><div align='center' ></div></td>                
                <td height='26' bgcolor='#FFFFFF'><div align='center' >1</div></td>        
                <td height='26' bgcolor='#FFFFFF'><div align='center' >10.00</div></td>           
            </tr>
            <tr>
                <td height='26' bgcolor='#FFFFFF'><div align='center' >熟食</div></td>
                <td height='26' bgcolor='#FFFFFF'><div align='center' >14</div></td>                
                <td height='26' bgcolor='#FFFFFF'><div align='center' >4068015046908594</div></td>                
                <td height='26' bgcolor='#FFFFFF'><div align='center' >牛腱子 小腿瓜子肉</div></td>
                <td height='26' bgcolor='#FFFFFF'><div align='center' ></div></td>                
                <td height='26' bgcolor='#FFFFFF'><div align='center' >1</div></td>        
                <td height='26' bgcolor='#FFFFFF'><div align='center' >52.00</div></td>           
            </tr>
            <tr>
                <td height='26' bgcolor='#FFFFFF'><div align='center' >熟食</div></td>
                <td height='26' bgcolor='#FFFFFF'><div align='center' >12</div></td>                
                <td height='26' bgcolor='#FFFFFF'><div align='center' >3688015046306995</div></td>                
                <td height='26' bgcolor='#FFFFFF'><div align='center' >牛肉 大腿瓜子肉(整块)</div></td>
                <td height='26' bgcolor='#FFFFFF'><div align='center' ></div></td>                
                <td height='26' bgcolor='#FFFFFF'><div align='center' >1</div></td>        
                <td height='26' bgcolor='#FFFFFF'><div align='center' >40.00</div></td>           
            </tr>
            <tr>
                <td height='26' bgcolor='#FFFFFF'><div align='center' >水果</div></td>
                <td height='26' bgcolor='#FFFFFF'><div align='center' >6</div></td>                
                <td height='26' bgcolor='#FFFFFF'><div align='center' >2050015045577381</div></td>                
                <td height='26' bgcolor='#FFFFFF'><div align='center' >精品红富士     艳若胭脂    精挑细选</div></td>
                <td height='26' bgcolor='#FFFFFF'><div align='center' ></div></td>                
                <td height='26' bgcolor='#FFFFFF'><div align='center' >6</div></td>        
                <td height='26' bgcolor='#FFFFFF'><div align='center' >4.00</div></td>           
            </tr>
            <tr>
                <td height='26' bgcolor='#FFFFFF'><div align='center' >水果</div></td>
                <td height='26' bgcolor='#FFFFFF'><div align='center' >16</div></td>                
                <td height='26' bgcolor='#FFFFFF'><div align='center' >1600015046809003</div></td>                
                <td height='26' bgcolor='#FFFFFF'><div align='center' >冬枣</div></td>
                <td height='26' bgcolor='#FFFFFF'><div align='center' ></div></td>                
                <td height='26' bgcolor='#FFFFFF'><div align='center' >1</div></td>        
                <td height='26' bgcolor='#FFFFFF'><div align='center' >14.00</div></td>           
            </tr>  
                        </table>
                    </td>
                    <td width='9' ></td>
                  </tr>
                </table></td>
              </tr>
              <tr><td height='9'></td></tr>  
              <tr>
                <td height='29'>
                <table width='100%' border='0' cellspacing='0' cellpadding='0'>
                  <tr>
                    <td width='15' height='29'></td>
                    <td>
                        第1页/共7页
                    </td>
                    <td width='14'></td>
                  </tr>
                </table></td>
              </tr>
            </table>
            </div>

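How can I scrape this kind of HTML table with Python's re module? The rows all use essentially the same markup, but after many attempts I still cannot get it to work.

Here is what I tried: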

pattern = re.compile('采购单(.*?)</b>.*?<tr>.*?height.*?center(.*?)</div>.*?height=(str).*?center(.*?)</div>', re.S)

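It only ever captures the header row (the one that starts with 供应商); the rows below it are not captured.

I also tried bs4 and could not get that to work either.

Could someone show me how the regular expression should be written?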

大叔写博客 | Rookie, level 2 | Garden beans: 202
Asked: 2017-09-05 20:46
All answers (1)

Use a regular expression:

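import re

text = "<div align='center' >绢豆腐</div>"  # replace with your HTML text

# Capture the text of every centered cell; re.S lets "." span newlines.
t = re.findall(r"align='center'\s*>(.*?)<", text, re.S)
for i in t:
    print(i)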
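
The findall call above returns one flat list of cell texts, header cells included. One possible way to get the rows back is to split the HTML on `<tr>...</tr>` first and then extract the cells inside each row. This is a minimal sketch: the `html` sample is a shortened, assumed copy of the table in the question, and `ROW_RE`, `CELL_RE` and `parse_rows` are illustrative names, not anything from the original post.

```python
import re

# Shortened sample in the same shape as the table in the question (assumed).
html = """
<tr style ='font-size:14px; font-weight:700;'>
  <td width='20%' height='26' bgcolor='#FFFFFF'><div align='center'>供应商</div></td>
  <td width='5%' height='26' bgcolor='#FFFFFF'><div align='center'>序号</div></td>
</tr>
<tr>
  <td height='26' bgcolor='#FFFFFF'><div align='center' >供货商一</div></td>
  <td height='26' bgcolor='#FFFFFF'><div align='center' >15</div></td>
</tr>
<tr>
  <td height='26' bgcolor='#FFFFFF'><div align='center' >蛋类</div></td>
  <td height='26' bgcolor='#FFFFFF'><div align='center' ></div></td>
</tr>
"""

# One <tr ...>...</tr> block at a time; re.S lets "." cross line breaks.
ROW_RE = re.compile(r"<tr[^>]*>(.*?)</tr>", re.S)
# The text of each centered <div>, empty cells included.
CELL_RE = re.compile(r"align='center'\s*>(.*?)</div>", re.S)


def parse_rows(html):
    """Return a list of rows, each row being a list of cell strings."""
    rows = []
    for row_html in ROW_RE.findall(html):
        cells = [cell.strip() for cell in CELL_RE.findall(row_html)]
        if cells:  # skip layout rows that contain no centered cells
            rows.append(cells)
    return rows


rows = parse_rows(html)
header, data = rows[0], rows[1:]
for record in data:
    # Pair each value with its column name from the header row.
    print(dict(zip(header, record)))
```

Keeping empty cells (such as the blank 产品规格 column) matters here, because dropping them would shift the remaining values into the wrong columns.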

Masako | Garden beans: 1631 (Shrimp, level 3) | 2017-09-06 18:11
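
On the pattern in the question, two things probably explain the result. Both are guesses, since the exact script and its output were not posted. First, `(str)` in a regex is a group that matches the literal letters "str", not "any string". Second, a pattern that begins with 采购单 can match at most once, because that text occurs only once on the page, so the lazy `.*?` stops at the first cell it can reach, which is the header. The sketch below shows both effects on a made-up two-cell sample; `sample`, `literal`, `anchored` and `cells` are names invented for this illustration.

```python
import re

# Hypothetical sample, shaped like the table in the question.
sample = (
    "<b>采购单</b>"
    "<td height='26'><div align='center' >绢豆腐</div></td>"
    "<td height='26'><div align='center' >1.60</div></td>"
)

# (str) matches the literal text "str", so this finds nothing here.
literal = re.search(r"height=(str)", sample)

# Anchoring on the title text, which occurs once, caps findall at one match.
anchored = re.findall(r"采购单.*?center'\s*>(.*?)</div>", sample, re.S)

# Without the anchor, findall returns every cell.
cells = re.findall(r"center'\s*>(.*?)</div>", sample, re.S)

print(literal)   # None
print(anchored)  # ['绢豆腐']
print(cells)     # ['绢豆腐', '1.60']
```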