I want to extract the image URL from the string below. How should I write the regular expression?
<div class="photos-container">
 <div class="photos">
 <a target="_blank" href="/shop/18211174/photos">
 <img style="width: 240px; height: 180px;" class="auto" itemprop="photo" src="http://qcloud.dpfile.com/pc/6rK6z9s3bORn1GxbJMgMiQGmS6TLvf_rBk_HtsgTKreEHNko21uzmPrb649XPQOhCjM_FsO3sW809PHY7spB8g.jpg" title="鲜芋仙的图片" alt="鲜芋仙的图片">
 </a>
Here is what I wrote:
pattern = re.compile('<div class="photos".*?<a target="_blank".*?<img.*?src="(.*?)".*?</a>',re.S)
With this, I only get a single character each time: h
I tested it and it works fine.
>>> pattern.findall(txt)
['http://qcloud.dpfile.com/pc/6rK6z9s3bORn1GxbJMgMiQGmS6TLvf_rBk_HtsgTKreEHNko21uzmPrb649XPQOhCjM_FsO3sW809PHY7spB8g.jpg']
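A self-contained version of the check above, as a minimal sketch. Here `txt` is assumed to hold the HTML exactly as posted in the question. The last few lines show one possible cause of the single `h`. This is only a guess, since the question does not show how the result was read: `findall` returns a list of strings, and indexing the matched string a second time yields one character.

```python
import re

# HTML from the question, kept as posted (the outer divs are not closed).
txt = '''<div class="photos-container">
 <div class="photos">
 <a target="_blank" href="/shop/18211174/photos">
 <img style="width: 240px; height: 180px;" class="auto" itemprop="photo" src="http://qcloud.dpfile.com/pc/6rK6z9s3bORn1GxbJMgMiQGmS6TLvf_rBk_HtsgTKreEHNko21uzmPrb649XPQOhCjM_FsO3sW809PHY7spB8g.jpg" title="鲜芋仙的图片" alt="鲜芋仙的图片">
 </a>'''

# The pattern from the question; re.S lets "." match newlines as well.
pattern = re.compile(
    '<div class="photos".*?<a target="_blank".*?<img.*?src="(.*?)".*?</a>',
    re.S,
)

urls = pattern.findall(txt)   # a list with one captured string per match
first_url = urls[0]           # the whole URL
first_char = first_url[0]     # indexing the string again gives one character

print(urls)
print(first_char)
```

If the single character comes from code like `first_url[0]`, using the list element itself (`urls[0]`) should return the full URL.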
Pass the five lines of HTML from your post in as the string str:
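// Requires: using System.Text.RegularExpressions;
// Capture the quoted http:// URL that is immediately followed by the title attribute.
Regex rgx = new Regex("\"(http://[a-zA-Z0-9./_]+)\"\\stitle");
Match m = rgx.Match(str);
if (m.Success)
{ return m.Groups[1].Value; }
else { return string.Empty; }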
Tested it, no problems.
#!/usr/bin/python
#coding=UTF-8
import re

strHtml = '<div class="photos-container">\
 <div class="photos">\
 <a target="_blank" href="https://www.baidu.com">\
 <img style="width: 240px; height: 180px;" class="auto" itemprop="photo" src="http://qcloud.dpfile.com/pc/6rK6z9s3bORn1GxbJMgMiQGmS6TLvf_rBk_HtsgTKreEHNko21uzmPrb649XPQOhCjM_FsO3sW809PHY7spB8g.jpg" title="鲜芋仙的图片" alt="鲜芋仙的图片">\
 </a>'

# Match http(s) URLs in both src and href attributes.
listMatch_1 = re.findall('(?:src|href)="(https?[:/a-z.0-9_]+)"', strHtml, re.I)
print(listMatch_1)

# Match http(s) URLs in src attributes only.
listMatch_2 = re.findall('src="(https?[:/a-z.0-9_]+)"', strHtml, re.I)
print(listMatch_2)
Result:
['https://www.baidu.com', 'http://qcloud.dpfile.com/pc/6rK6z9s3bORn1GxbJMgMiQGmS6TLvf_rBk_HtsgTKreEHNko21uzmPrb649XPQOhCjM_FsO3sW809PHY7spB8g.jpg']
['http://qcloud.dpfile.com/pc/6rK6z9s3bORn1GxbJMgMiQGmS6TLvf_rBk_HtsgTKreEHNko21uzmPrb649XPQOhCjM_FsO3sW809PHY7spB8g.jpg']
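The character class `[:/a-z.0-9_]` covers the sample URL, but it would miss URLs containing characters such as `-`, `?` or `%`. A looser sketch, assuming the `src` value is always wrapped in double quotes, captures everything up to the closing quote and only looks inside `<img>` tags. The helper name `extract_img_urls` and the sample HTML are made up for illustration and do not come from the thread.

```python
import re

# Hypothetical helper, not from the thread: capture any double-quoted src value
# inside an <img> tag that starts with http:// or https://.
IMG_SRC = re.compile(r'<img\b[^>]*?\bsrc="(https?://[^"]+)"', re.I)

def extract_img_urls(html):
    """Return the src URLs of all <img> tags found in html."""
    return IMG_SRC.findall(html)

# Made-up sample: the URL contains "-" and "?", which the stricter class rejects.
sample = (
    '<a href="https://example.com/page">'
    '<img class="auto" src="http://example.com/img/photo-1.jpg?w=240" title="t">'
    '</a>'
)

print(extract_img_urls(sample))
```

Because the pattern is anchored on `<img`, the `href` of the surrounding link is not returned. For HTML that is not this regular, a real HTML parser is usually the safer choice.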