
Web: extracting the main text from a web page

Bounty: 50 Yuandou [closed question]

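I came across a site with a demo of main-text extraction, and it is very good. It extracts news articles, blog posts and forum threads (keeping every floor, i.e. each numbered reply), and it is very accurate.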

http://61.128.196.27/txt/

Please take a look. Has anyone built something like this? I'd like to hear about it and learn from what worked for you.

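I have already read 蛙蛙's (Wawa's) article on main-text extraction here on cnblogs, but it only works well on news pages with large blocks of text. It does not work on forums.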
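The news-versus-forum difference is easy to see with a line-block text-density heuristic. The C# below is a minimal sketch written for illustration, not 蛙蛙's actual code; the class `TextDensityExtractor`, its `radius`/`minChars` parameters and their default values are hypothetical. After tags are stripped, a news article tends to leave one long run of text-heavy lines, while a forum thread leaves several medium runs separated by short user-info lines. A rule that keeps only the single densest run therefore drops most replies; returning every run above the threshold keeps them.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper illustrating a generic line-block text-density heuristic.
public static class TextDensityExtractor
{
    // Remove script/style/comment bodies and all remaining tags, decode entities,
    // and return the trimmed text of each source line.
    public static string[] ToTextLines(string html)
    {
        string s = Regex.Replace(html,
            @"<script[\s\S]*?</script>|<style[\s\S]*?</style>|<!--[\s\S]*?-->",
            "", RegexOptions.IgnoreCase);
        s = Regex.Replace(s, @"<[^>]+>", "");
        s = WebUtility.HtmlDecode(s);
        return s.Split('\n').Select(line => line.Trim()).ToArray();
    }

    // A non-empty line is kept when the lines within `radius` of it (itself included)
    // hold at least `minChars` non-whitespace characters. A line whose neighbourhood
    // falls below the threshold closes the current block. Every block is returned,
    // not just the largest one, so each forum post can survive as its own block.
    public static List<string> ExtractBlocks(string html, int radius = 1, int minChars = 40)
    {
        string[] lines = ToTextLines(html);
        int[] len = lines.Select(l => Regex.Replace(l, @"\s+", "").Length).ToArray();
        var blocks = new List<string>();
        var current = new List<string>();

        for (int i = 0; i < lines.Length; i++)
        {
            int sum = 0;
            for (int j = Math.Max(0, i - radius); j <= Math.Min(lines.Length - 1, i + radius); j++)
                sum += len[j];

            if (sum >= minChars)
            {
                if (len[i] > 0) current.Add(lines[i]); // blank lines inside a block are skipped
            }
            else if (current.Count > 0)
            {
                blocks.Add(string.Join("\n", current));
                current.Clear();
            }
        }
        if (current.Count > 0) blocks.Add(string.Join("\n", current));
        return blocks;
    }
}
```

Very short replies (a single "顶", for example) still fall below `minChars` and are dropped, a known weakness of density-only rules. The regex tag stripping is also only good enough for quick experiments, not for arbitrary HTML.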

Here are some forum threads. You can paste them into the demo link above and see how well it works:

http://bbs.pcbeta.com/thread-437830-1-1.html
http://bbs.iwindows7.com/thread-7741-1-1.html
http://bbs.vista123.com/thread-161747-1-1.html
http://softbbs.pconline.com.cn/9960486.html
http://bbs.realqwh.cn/read-htm-tid-69019.html
http://bbs.xtbeta.com/read.php?tid=7199
http://bbs.itjmz.com/read.php?tid=61261
http://bbs.windows7en.com/thread-8828-1-1.html
http://bbs.kafan.cn/thread-443359-1-1.html
http://www.in9.cn/read.php?tid=435443
http://bbs.zdnet.com.cn/thread-1152490-1-1.html
http://forum.51nb.com/thread-799684-1-1.html
http://www.coolaler.com/~coolaler/forum/showthread.php?p=2304154
http://softbbs.it168.com/thread-669601-0-1-1.html
http://www.kpfans.com/bbs/thread-443404-1-1.html
http://bbs.jujumao.com/dispbbs.php?boardid=119&replyid=2658388&id=259072&skin=0&page=1&star=1
http://www.xtzj.com/read-htm-tid-379497.html
http://www.mydianping.com/bbsinfo100786-53612.html
http://bbs.bitscn.com/210519
http://www.tomatoll.com/thread-45570-1-1.html
http://bbs.cfan.com.cn/thread-860583-1-1.html
http://baike.360.cn/3232114/22741520.html
http://bbs.win7c.com/thread-2179-1-1.html
http://www.win75.cn/thread-2175-1-1.html
http://win.chinaunix.net/bbs/thread-26215-1-1.html
http://bbs.crsky.com/read.php?tid=1590643
http://www.cnaxh.com/forum/showtopic.aspx?topicid=719&forumpage=1&onlyauthor=1
http://bbs.ws2008.net/showtopic.aspx?forumid=5&topicid=807&go=next
http://wenda.tianya.cn/wenda/thread?tid=6238094d355823a7
http://itbbs.pconline.com.cn/diy/9942640.html
http://bbs.xtghost.com/read.php?tid=4848
http://forum.byr.edu.cn/wForum/disparticle.php?boardName=Windows&ID=89065&start=31&listType=1
http://bbs.51vip.net/dispbbs_4_1928_0_1.html?boardid=4&id=1928&move=next
http://bbs.levelup.cn/showtree.aspx?topicid=768957&postid=15854936
http://bbs.ctocio.com.cn/thread-7843895-1-1.html
http://tianyi.it168.com/thread-555477-1-1.html
http://www.luobo.cc/read.php?tid=4875266%27
http://bbs.cngho.com/viewthread.php?tid=30043
http://bbs.mspil.edu.cn/BBS/dispbbs_107_142696_1.htm
http://51nb.com/forum/thread-797114-1-1.html

微微一记 | Beginner, level 1 | Yuandou: 12
Asked: 2009-04-13 13:58
Other answers (2)

This can be done with WebRequest.

一挣一闭一天过去了 | Yuandou: 205 (Newbie, level 2) | 2009-04-13 15:10

I'm not sure whether the following meets your needs:

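        #region Download an image
        // Requires: using System; using System.IO; using System.Net;
        try
        {
            string url = TextBox1.Text.Trim();
            // Only prepend the scheme if the user did not type one
            if (!url.StartsWith("http://", StringComparison.OrdinalIgnoreCase) &&
                !url.StartsWith("https://", StringComparison.OrdinalIgnoreCase))
            {
                url = "http://" + url;
            }

            WebRequest request = WebRequest.Create(url);
            using (WebResponse response = request.GetResponse())
            using (Stream reader = response.GetResponseStream())
            // FileMode.Create truncates an existing file; OpenOrCreate would leave
            // stale trailing bytes when the old file is larger than the new image
            using (FileStream writer = new FileStream("D:\\logo.gif", FileMode.Create, FileAccess.Write))
            {
                byte[] buff = new byte[512];
                int c; // number of bytes actually read
                while ((c = reader.Read(buff, 0, buff.Length)) > 0)
                {
                    writer.Write(buff, 0, c);
                }
            }

            tb.Text = "Saved successfully!";
        }
        catch (Exception ex)
        {
            tb.Text = ex.Message;
        }
        #endregion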
   

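        #region Download text
        // Requires: using System; using System.IO; using System.Net; using System.Text;
        try
        {
            string url = TextBox1.Text.Trim();
            // Only prepend the scheme if the user did not type one
            if (!url.StartsWith("http://", StringComparison.OrdinalIgnoreCase) &&
                !url.StartsWith("https://", StringComparison.OrdinalIgnoreCase))
            {
                url = "http://" + url;
            }

            WebRequest request = WebRequest.Create(url);
            // Always decodes as GB2312, so pages served as UTF-8 come out garbled.
            // On .NET Core / .NET 5+ this encoding name only works after
            // Encoding.RegisterProvider(CodePagesEncodingProvider.Instance).
            using (WebResponse response = request.GetResponse())
            using (StreamReader reader = new StreamReader(response.GetResponseStream(), Encoding.GetEncoding("gb2312")))
            {
                tb.Text = reader.ReadToEnd();
            }
        }
        catch (Exception ex)
        {
            tb.Text = ex.Message;
        }
        #endregion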
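The text snippet above hard-codes `Encoding.GetEncoding("gb2312")`, so any page served in UTF-8 comes out garbled. A minimal sketch of reading the charset the page declares instead is below; the class `CharsetSniffer`, its methods, the 2048-byte scan window and the `gb2312` fallback are hypothetical choices, not part of the original answer. Because a response stream can only be read once, the idea is to read the whole body into a byte array first and decode it afterwards.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper: use the charset the page declares instead of always assuming gb2312.
public static class CharsetSniffer
{
    static readonly Regex CharsetPattern =
        new Regex(@"charset\s*=\s*[""']?([\w\-]+)", RegexOptions.IgnoreCase);

    // Checks the Content-Type header first (e.g. "text/html; charset=utf-8"), then the
    // first 2048 bytes of the body for <meta charset=...> or http-equiv declarations.
    public static string DetectCharset(string contentTypeHeader, byte[] body, string fallback = "gb2312")
    {
        if (!string.IsNullOrEmpty(contentTypeHeader))
        {
            Match m = CharsetPattern.Match(contentTypeHeader);
            if (m.Success) return m.Groups[1].Value.ToLowerInvariant();
        }
        // Charset declarations are plain ASCII, so an ASCII view of the head is enough
        // to find them even when the rest of the page is GB2312 or UTF-8.
        string head = Encoding.ASCII.GetString(body, 0, Math.Min(body.Length, 2048));
        Match meta = CharsetPattern.Match(head);
        return meta.Success ? meta.Groups[1].Value.ToLowerInvariant() : fallback;
    }

    // Decodes the whole body once its charset is known. Unknown names (or GB names on
    // .NET Core without a registered CodePagesEncodingProvider) fall back to UTF-8.
    public static string Decode(string contentTypeHeader, byte[] body)
    {
        string name = DetectCharset(contentTypeHeader, body);
        Encoding enc;
        try { enc = Encoding.GetEncoding(name); }
        catch (ArgumentException) { enc = Encoding.UTF8; }
        return enc.GetString(body);
    }
}
```

With a `WebResponse`, the body bytes could be collected by copying `response.GetResponseStream()` into a `MemoryStream` (`Stream.CopyTo`, .NET 4+) and passed to `Decode` together with `response.ContentType`.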

随心飘 | Yuandou: 250 (Newbie, level 2) | 2009-04-20 08:46

This one extracts content well: http://www.weixinxi.wang/open/extract.html

豆豆De巴比 | Yuandou: 202 (Newbie, level 2) | 2016-03-22 22:05