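I came across a site with a demo of main-content extraction, and it works very well: it extracts news articles, blogs, and forum threads (including the floor numbers of each post), and it is very accurate.
Have a look, everyone. If anyone has built something similar, please tell me about it; I'd like to learn from what worked for you.
I have looked at 蛙蛙's main-content extraction from this community, but it is only effective on news pages with large blocks of text. It does not work on forums.
Here are some forum threads; you can try them on that link and see the results.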
http://bbs.pcbeta.com/thread-437830-1-1.html |
http://bbs.iwindows7.com/thread-7741-1-1.html |
http://bbs.vista123.com/thread-161747-1-1.html |
http://softbbs.pconline.com.cn/9960486.html |
http://bbs.realqwh.cn/read-htm-tid-69019.html |
http://bbs.xtbeta.com/read.php?tid=7199 |
http://bbs.itjmz.com/read.php?tid=61261 |
http://bbs.windows7en.com/thread-8828-1-1.html |
http://bbs.kafan.cn/thread-443359-1-1.html |
http://www.in9.cn/read.php?tid=435443 |
http://bbs.zdnet.com.cn/thread-1152490-1-1.html |
http://forum.51nb.com/thread-799684-1-1.html |
http://www.coolaler.com/~coolaler/forum/showthread.php?p=2304154 |
http://softbbs.it168.com/thread-669601-0-1-1.html |
http://www.kpfans.com/bbs/thread-443404-1-1.html |
http://bbs.jujumao.com/dispbbs.php?boardid=119&replyid=2658388&id=259072&skin=0&page=1&star=1 |
http://www.xtzj.com/read-htm-tid-379497.html |
http://www.mydianping.com/bbsinfo100786-53612.html |
http://bbs.bitscn.com/210519 |
http://www.tomatoll.com/thread-45570-1-1.html |
http://bbs.cfan.com.cn/thread-860583-1-1.html |
http://baike.360.cn/3232114/22741520.html |
http://bbs.win7c.com/thread-2179-1-1.html |
http://www.win75.cn/thread-2175-1-1.html |
http://win.chinaunix.net/bbs/thread-26215-1-1.html |
http://bbs.crsky.com/read.php?tid=1590643 |
http://www.cnaxh.com/forum/showtopic.aspx?topicid=719&forumpage=1&onlyauthor=1 |
http://bbs.ws2008.net/showtopic.aspx?forumid=5&topicid=807&go=next |
http://wenda.tianya.cn/wenda/thread?tid=6238094d355823a7 |
http://itbbs.pconline.com.cn/diy/9942640.html |
http://bbs.xtghost.com/read.php?tid=4848 |
http://forum.byr.edu.cn/wForum/disparticle.php?boardName=Windows&ID=89065&start=31&listType=1 |
http://bbs.51vip.net/dispbbs_4_1928_0_1.html?boardid=4&id=1928&move=next |
http://bbs.levelup.cn/showtree.aspx?topicid=768957&postid=15854936 |
http://bbs.ctocio.com.cn/thread-7843895-1-1.html |
http://tianyi.it168.com/thread-555477-1-1.html |
http://www.luobo.cc/read.php?tid=4875266%27 |
http://bbs.cngho.com/viewthread.php?tid=30043 |
http://bbs.mspil.edu.cn/BBS/dispbbs_107_142696_1.htm |
http://51nb.com/forum/thread-797114-1-1.html |
This can be done with WebRequest.
I'm not sure whether the code below meets your needs.
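#region Fetch an image
// Requires: using System; using System.IO; using System.Net;
try
{
    // The text box is expected to hold the address without the "http://" prefix.
    string url = TextBox1.Text;
    WebRequest request = WebRequest.Create("http://" + url);
    // The using blocks release the response and both streams even if a read or write throws.
    using (WebResponse response = request.GetResponse())
    using (Stream reader = response.GetResponseStream())
    // FileMode.Create truncates an existing file; OpenOrCreate would leave stale trailing bytes.
    using (FileStream writer = new FileStream("D:\\logo.gif", FileMode.Create, FileAccess.Write))
    {
        byte[] buff = new byte[512];
        int c = 0; // number of bytes actually read
        while ((c = reader.Read(buff, 0, buff.Length)) > 0)
        {
            writer.Write(buff, 0, c);
        }
    }
    tb.Text = "Saved successfully!";
}
catch (Exception ex)
{
    tb.Text = ex.Message;
}
#endregion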
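#region Fetch text
// Requires: using System; using System.IO; using System.Net; using System.Text;
try
{
    // The text box is expected to hold the address without the "http://" prefix.
    string url = TextBox1.Text;
    WebRequest request = WebRequest.Create("http://" + url);
    using (WebResponse response = request.GetResponse())
    // gb2312 is hard-coded; pages served in another encoding (e.g. UTF-8) will come out garbled.
    using (StreamReader reader = new StreamReader(response.GetResponseStream(), Encoding.GetEncoding("gb2312")))
    {
        tb.Text = reader.ReadToEnd();
    }
}
catch (Exception ex)
{
    tb.Text = ex.Message;
}
#endregion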
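The code above only downloads the page; it does not pick out the main content. As a very rough sketch of that step, and not the method used by any of the tools mentioned in this thread, one naive heuristic is to strip the markup and keep only the lines that contain a reasonable amount of visible text. `NaiveExtractor`, `ExtractTextLines`, and the `minChars` threshold are hypothetical names chosen for illustration. This kind of filter tends to suit news pages with large text blocks; it knows nothing about forum floors, so short replies would be dropped.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

static class NaiveExtractor
{
    // Hypothetical helper, not part of any library: strips markup and keeps
    // only lines with at least minChars characters of visible text.
    public static List<string> ExtractTextLines(string html, int minChars)
    {
        // Drop script/style blocks and HTML comments, including their contents.
        string s = Regex.Replace(html, @"<(script|style)\b[^>]*>.*?</\1\s*>", "",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
        s = Regex.Replace(s, @"<!--.*?-->", "", RegexOptions.Singleline);

        // Turn common block-level tags into line breaks so each block becomes its own line.
        s = Regex.Replace(s, @"</?(p|div|br|li|tr|td|h[1-6]|table)\b[^>]*>", "\n",
            RegexOptions.IgnoreCase);

        // Remove the remaining (inline) tags, then decode entities such as &amp;.
        s = Regex.Replace(s, @"<[^>]+>", "");
        s = WebUtility.HtmlDecode(s);

        var result = new List<string>();
        foreach (string raw in s.Split('\n'))
        {
            // Collapse runs of whitespace and discard short lines (menus, links, footers).
            string line = Regex.Replace(raw, @"\s+", " ").Trim();
            if (line.Length >= minChars)
            {
                result.Add(line);
            }
        }
        return result;
    }
}
```

To try it, pass in the string returned by `reader.ReadToEnd()`, for example `NaiveExtractor.ExtractTextLines(html, 20)`, and join the resulting lines. The threshold of 20 is an arbitrary starting point and would need tuning per site.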
http://www.weixinxi.wang/open/extract.html gives good extraction results.