Regex reg = new Regex(@"<li><a href=""(?<url>[^""]*)"" target=""_blank"">(?<title>[^<]*)</a></li>");
string html = "<div>sadfasdfasd</div> <div class=\"video_1_left\"> <UL> <li><a href=\"/news/12718.html\" target=\"_blank\">标题sadfasdfasdfasdf</a></li> <li><a href=\"/news/12710.html\" target=\"_blank\">标题asdfasdfasdf</a></li> <li><a href=\"/news/12729.html\" target=\"_blank\">v2sdfasdf</a></li> <li><a href=\"/news/12728.html\" target=\"_blank\">标题sdfsadf</a></li> </UL> </div> <div class=\"video_1_right\"> <UL> <li><a href=\"/news/12705.html\" target=\"_blank\">标题xxxfasdfasdfx</a></li> <li><a href=\"/news/12737.html\" target=\"_blank\">标题xxxdfasdfasax</a></li> </UL> </div> <div>sadfasdfasd</div> ";
foreach (Match m in reg.Matches(html))
{
    Console.WriteLine(m.Groups["url"].Value + "\t" + m.Groups["title"].Value);
}
http://www.cnblogs.com/xingshao/archive/2009/10/27/1590806.html
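A similar problem, with a general approach to solving it. The linked article above is an example of extracting weather data from a page; change the regex and the same approach can parse other kinds of data.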
If there is a line break such as \r\n between the li and the a tags, this pattern will not match and you will get no values.
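One possible way to handle the line-break case is to allow optional whitespace between the tags with `\s*`, which matches spaces, tabs, `\r` and `\n`. The following is only a minimal sketch: the `LinkExtractor` class and its `Extract` method are hypothetical names introduced for illustration, and it still assumes the attributes appear in the order `href` then `target="_blank"`, as in the sample HTML above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answer.
public static class LinkExtractor
{
    // \s* between tags tolerates spaces, tabs, \r and \n;
    // \s+ between attributes tolerates any run of whitespace.
    private static readonly Regex LinkPattern = new Regex(
        @"<li>\s*<a\s+href=""(?<url>[^""]*)""\s+target=""_blank""\s*>(?<title>[^<]*)</a>\s*</li>",
        RegexOptions.IgnoreCase);

    // Returns (url, title) pairs in document order.
    public static List<KeyValuePair<string, string>> Extract(string html)
    {
        var results = new List<KeyValuePair<string, string>>();
        foreach (Match m in LinkPattern.Matches(html))
        {
            results.Add(new KeyValuePair<string, string>(
                m.Groups["url"].Value,
                m.Groups["title"].Value.Trim()));
        }
        return results;
    }
}
```

Calling `LinkExtractor.Extract(html)` on the sample string should return the same pairs as the original loop, and it should also match when a `\r\n` sits between `<li>` and `<a>`. For markup that varies more than this (reordered attributes, nested tags), a real HTML parser is likely a safer choice than a regex.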