
Filtering HTML code with a regular expression?

0
Bounty: 20 points [Unresolved question]
<div>sadfasdfasd</div>
<div class="video_1_left">
  <UL>
    <li><a href="/news/12718.html" target="_blank">标题sadfasdfasdfasdf</a></li>
    <li><a href="/news/12710.html" target="_blank">标题asdfasdfasdf</a></li>
    <li><a href="/news/12729.html" target="_blank">v2sdfasdf</a></li>
    <li><a href="/news/12728.html" target="_blank">标题sdfsadf</a></li>
  </UL>
</div>

<div class="video_1_right">
  <UL>
    <li><a href="/news/12705.html" target="_blank">标题xxxfasdfasdfx</a></li>
    <li><a href="/news/12737.html" target="_blank">标题xxxdfasdfasax</a></li>
  </UL>
</div>
<div>sadfasdfasd</div>

I only want the href attribute inside each <li> and the text that the <li> displays.
For example, given: <li><a href="/news/12737.html" target="_blank">标题xxxdfasdfasax</a></li>
I only want "news/12737.html" and 标题xxxdfasdfasax
Additional details: <div>sadfasdfasd</div> <div class="video_1_left"> <UL> <li><a href="/news/12718.html" target="_blank">标题sadfasdfasdfasdf</a></li> <li><a href="/news/12710.html" target="_blank">标题asdfasdfasdf</a></li> <li><a href="/news/12729.html" target="_blank">v2sdfasdf</a></li> <li><a href="/news/12728.html" target="_blank">标题sdfsadf</a></li> </UL> </div> <div class="video_1_right"> <UL> <li><a href="/news/12705.html" target="_blank">标题xxxfasdfasdfx</a></li> <li><a href="/news/12737.html" target="_blank">标题xxxdfasdfasax</a></li> </UL> </div> <div>sadfasdfasd</div>
吊炸天的阿旺 | Beginner Level 1 | Points: 135
Asked: 2011-02-17 16:10
All answers (3)
0

using System;
using System.Text.RegularExpressions;

// Named groups capture the href value ("url") and the link text ("title").
Regex reg = new Regex(@"<li><a href=""(?<url>[^""]*)"" target=""_blank"">(?<title>[^<]*)</a></li>");
string html = "<div>sadfasdfasd</div> <div class=\"video_1_left\"> <UL> <li><a href=\"/news/12718.html\" target=\"_blank\">标题sadfasdfasdfasdf</a></li> <li><a href=\"/news/12710.html\" target=\"_blank\">标题asdfasdfasdf</a></li> <li><a href=\"/news/12729.html\" target=\"_blank\">v2sdfasdf</a></li> <li><a href=\"/news/12728.html\" target=\"_blank\">标题sdfsadf</a></li> </UL> </div> <div class=\"video_1_right\"> <UL> <li><a href=\"/news/12705.html\" target=\"_blank\">标题xxxfasdfasdfx</a></li> <li><a href=\"/news/12737.html\" target=\"_blank\">标题xxxdfasdfasax</a></li> </UL> </div> <div>sadfasdfasd</div> ";
foreach (Match m in reg.Matches(html))
{
    // Print each URL and title, separated by a tab.
    Console.WriteLine(m.Groups["url"].Value + "\t" + m.Groups["title"].Value);
}

artwl | Points: 16526 (Expert Level 6) | 2011-02-17 16:50
0

http://www.cnblogs.com/xingshao/archive/2009/10/27/1590806.html

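This is a similar problem with a general approach to solving it. The post linked above is an example of extracting weather data from a page. Change the regular expression and the same approach can parse other kinds of data.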
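
The "change the regular expression" idea could be sketched roughly as below. This is only an illustration, not code from the linked post: the class and method names (`PatternScraper`, `ExtractAll`) are hypothetical, and the caller is assumed to supply a pattern that uses named groups.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PatternScraper
{
    // Hypothetical helper: runs any pattern over the input and returns,
    // for each match, a dictionary from named-group name to captured text.
    public static List<Dictionary<string, string>> ExtractAll(string input, string pattern)
    {
        var regex = new Regex(pattern, RegexOptions.IgnoreCase);
        var rows = new List<Dictionary<string, string>>();
        foreach (Match m in regex.Matches(input))
        {
            var row = new Dictionary<string, string>();
            foreach (string name in regex.GetGroupNames())
            {
                // Skip numbered groups such as "0" (the whole match).
                int ignored;
                if (!int.TryParse(name, out ignored))
                {
                    row[name] = m.Groups[name].Value;
                }
            }
            rows.Add(row);
        }
        return rows;
    }
}
```

For the question's HTML, the pattern from the first answer could be passed in unchanged; for a different page only the pattern string would need to change.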

邢少 | Points: 10922 (Expert Level 6) | 2011-02-18 10:56
0

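If there is a line break such as \r\n between the <li> and the <a>, the pattern in the first answer will not match and you will get no values.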
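
One possible way to handle that case is sketched below. It assumes the same named groups (`url`, `title`) as the first answer and assumes `href` is the first attribute of the `<a>` tag. The names `LinkExtractor` and `Extract` are made up for illustration. `\s*` between tags absorbs spaces and `\r\n`, and `[^>]*` skips any remaining attributes such as `target`.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LinkExtractor
{
    // Whitespace-tolerant variant of the first answer's pattern.
    // IgnoreCase also accepts upper-case tags such as <LI> or <A>.
    private static readonly Regex LinkPattern = new Regex(
        @"<li>\s*<a\s+href=""(?<url>[^""]*)""[^>]*>\s*(?<title>[^<]*?)\s*</a>\s*</li>",
        RegexOptions.IgnoreCase);

    // Returns (url, title) pairs in document order.
    public static List<KeyValuePair<string, string>> Extract(string html)
    {
        var results = new List<KeyValuePair<string, string>>();
        foreach (Match m in LinkPattern.Matches(html))
        {
            results.Add(new KeyValuePair<string, string>(
                m.Groups["url"].Value, m.Groups["title"].Value));
        }
        return results;
    }
}
```

Calling `LinkExtractor.Extract(html)` on the question's sample should give one pair per `<li>`, whether or not the tags are separated by line breaks. For markup that is less regular than this sample, a regular expression will likely stay fragile.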

wavegui | Points: 80 (Beginner Level 1) | 2012-10-30 14:04