<tr onmouseover="this.classname='tr3 t_two'" onmouseout="this.classname='tr3 t_one'" class="tr3 t_one" align=center><td><a title=打開新窗口 href="htm_data/2/1502/1388768.html" target=_blank>.::</a></td>
<td style="text-align: left; padding-left: 8px">
<h3><a href="标注1" target=_blank>标注2</a></h3> <i style="color: red">new</i> </td>
<td class="tal y-style"><a class=bl href="profile.php?action=show&uid=85521">标注3</a>
<div class=f10>标注4</div></td>
<td class="tal f10 y-style">2</td>
<td class="tal y-style"><a class=f10 href="read.php?tid=1388768&page=e&fpage=1#a">2015-02-21 00:53 </a><br>by: 一日清欢</td></tr>
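The HTML is formatted as shown above. I want to extract the four strings marked 标注1, 标注2, 标注3, and 标注4 from it. How should I write the regular expression?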
using System.Text.RegularExpressions;

// 标注1 (the href value) and 标注2 (the link text) come from the same anchor.
string str12 = "<a href=\"标注1\" target=_blank>标注2</a>";
string pattern12 = "<a\\shref=\"([^\"]*)\"\\starget=_blank>(.*?)</a>";
var m = Regex.Match(str12, pattern12, RegexOptions.Singleline | RegexOptions.IgnoreCase);
string bs1 = m.Groups[1].Value;
string bs2 = m.Groups[2].Value;

// 标注3 is the text of the anchor with class=bl.
string str3 = "<a class=bl href=\"profile.php?action=show&uid=85521\">标注3</a>";
string pattern3 = "<a\\sclass=bl[^<]*>(.*?)</a>";
m = Regex.Match(str3, pattern3, RegexOptions.Singleline | RegexOptions.IgnoreCase);
string bs3 = m.Groups[1].Value;

// 标注4 is the text of the div with class=f10.
string str4 = "<div class=f10>标注4</div>";
string pattern4 = "<div\\sclass=f10[^<]*>(.*?)</div>";
m = Regex.Match(str4, pattern4, RegexOptions.Singleline | RegexOptions.IgnoreCase);
string bs4 = m.Groups[1].Value;
Those are the concrete cases; you can generalize them yourself.
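One possible way to generalize this, offered only as a sketch: combine the per-field patterns into a single expression with named groups and run it over the whole page with Regex.Matches. The helper name ExtractRows and the group names mark1 to mark4 are made up for illustration, and the sample string is a shortened copy of the row from the question. It assumes every row has the same cell order as that sample.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper (not from the thread): returns one string[4] per matched row,
// in the order 标注1, 标注2, 标注3, 标注4.
static List<string[]> ExtractRows(string html)
{
    const string pattern =
        @"<h3>\s*<a\s+href=""(?<mark1>[^""]*)""[^>]*>(?<mark2>.*?)</a>" + // title link
        @".*?<a\s+class=bl[^>]*>(?<mark3>.*?)</a>" +                      // class=bl link
        @"\s*<div\s+class=f10[^>]*>(?<mark4>.*?)</div>";                  // class=f10 div

    var rows = new List<string[]>();
    var options = RegexOptions.Singleline | RegexOptions.IgnoreCase;
    foreach (Match m in Regex.Matches(html, pattern, options))
    {
        rows.Add(new[]
        {
            m.Groups["mark1"].Value,
            m.Groups["mark2"].Value,
            m.Groups["mark3"].Value,
            m.Groups["mark4"].Value,
        });
    }
    return rows;
}

// Shortened sample row taken from the question.
string html = @"<td style=""text-align: left; padding-left: 8px"">
<h3><a href=""标注1"" target=_blank>标注2</a></h3> <i style=""color: red"">new</i> </td>
<td class=""tal y-style""><a class=bl href=""profile.php?action=show&uid=85521"">标注3</a>
<div class=f10>标注4</div></td>";

foreach (var row in ExtractRows(html))
{
    Console.WriteLine(string.Join(" | ", row));
}
```

Because the lazy `.*?` between cells can run across row boundaries, a row that lacks the class=bl link could be paired with the next row's cells; splitting the page on `<tr` first and matching each piece separately would avoid that.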
\d+.*\>\d+\<
Add the capture groups for the values yourself; I'm on my phone, so this is untested.
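On the web, get the values with JavaScript; in a Windows application, reading them with XmlReader will be more reliable, provided the markup is well-formed XML.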