import re

#txt='there are a big apple and a big pear and three big peachs apple'
#reg=r'(big pear|big apple|big peachs)'
txt='<img src="http://www.comicyu.com/Skin/Default/Item/newartick/imager/comicweixin.png" title="\u6f2b\u57df\u7f51\u5b98\u65b9\u5fae\u4fe1" alt="\u6f2b\u57df\u7f51\u5b98\u65b9\u5fae\u4fe1" /><img src="http://www.comicyu.com/Skin/Default/Item/newartick/imager/weixin.jpg">'
reg=r'src="(.+?jpg|.+?png)"'   # lazy quantifiers, jpg alternative first
reg1=r'src="(.+png|.+jpg)"'    # greedy quantifiers, png alternative first
a=re.compile(reg)
a1=re.compile(reg1)
lisg=re.findall(a,txt)
lisg1=re.findall(a1,txt)
print(lisg)
print(lisg1)
The two results are different (lisg first, then lisg1):
['http://www.comicyu.com/Skin/Default/Item/newartick/imager/comicweixin.png" title="\\u6f2b\\u57df\\u7f51\\u5b98\\u65b9\\u5fae\\u4fe1" alt="\\u6f2b\\u57df\\u7f51\\u5b98\\u65b9\\u5fae\\u4fe1" /><img src="http://www.comicyu.com/Skin/Default/Item/newartick/imager/weixin.jpg']
['http://www.comicyu.com/Skin/Default/Item/newartick/imager/comicweixin.png', 'http://www.comicyu.com/Skin/Default/Item/newartick/imager/weixin.jpg']
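For a crawler, I would want the result to always look like the second one, with each image URL as a separate item. How should I write the pattern to get that?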
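
One possible explanation and fix, as a sketch rather than a definitive answer: in both patterns `.` also matches the `"` character, so nothing stops a match from running past the end of the first src value. In the first pattern the alternative `.+?jpg` is tried first at the first `src="`; the lazy `.+?` keeps growing one character at a time until it reaches `jpg"`, which only happens at the second image, so the `.+?png` alternative is never tried. The second pattern appears to work only because this particular text contains a single `png"`; with two .png images the greedy `.+png` would swallow both. Replacing `.` with `[^"]` keeps each match inside one attribute value. The names IMG_SRC and image_urls below are made up for illustration, the title/alt attributes are shortened, and the code is written for Python 3.

```python
import re

# Same two <img> tags as in the question; title/alt shortened for brevity.
txt = ('<img src="http://www.comicyu.com/Skin/Default/Item/newartick/imager/comicweixin.png" '
       'title="x" alt="x" />'
       '<img src="http://www.comicyu.com/Skin/Default/Item/newartick/imager/weixin.jpg">')

# [^"]+ cannot cross a double quote, so a match stays inside one src value.
# \. requires a literal dot before the extension; (?:...) groups without capturing.
IMG_SRC = re.compile(r'src="([^"]+\.(?:jpg|png))"')


def image_urls(html):
    """Return the src values ending in .jpg or .png (hypothetical helper)."""
    return IMG_SRC.findall(html)


urls = image_urls(txt)
print(urls)

# A case where the greedy pattern from the question breaks: two .png images.
txt2 = '<img src="a.png"><img src="b.png">'
greedy = re.findall(r'src="(.+png|.+jpg)"', txt2)
fixed = image_urls(txt2)
print(greedy)  # ['a.png"><img src="b.png']
print(fixed)   # ['a.png', 'b.png']
```

For real pages, where attributes may use single quotes or no quotes at all, an HTML parser is usually a safer choice than a regular expression.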