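Scraping Baidu Tieba page source with C#

Bounty (yuandou): 5 [Unresolved]

I wrote a C# method that downloads the HTML source of a web page. It works on the few sites I have tried, but it fails to fetch the source of Baidu Tieba pages. Any help would be appreciated!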

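using System;
using System.Net;
using System.Text;
using System.Windows.Forms;

public static string PostAndGetHTML(string targetURL)
{
    // Prepend a scheme only when none is present, so https:// URLs are left untouched.
    if (!targetURL.StartsWith("http://", StringComparison.OrdinalIgnoreCase) &&
        !targetURL.StartsWith("https://", StringComparison.OrdinalIgnoreCase))
        targetURL = "http://" + targetURL;

    string pageHtml = "";
    try
    {
        // Dispose the client when done so its resources are released.
        using (WebClient myWebClient = new WebClient())
        {
            // Network credentials used to authenticate the request to the Internet resource.
            myWebClient.Credentials = CredentialCache.DefaultCredentials;
            // Download the raw bytes from the target site.
            byte[] pageData = myWebClient.DownloadData(targetURL);
            if (targetURL.IndexOf("66") > 0 || targetURL.IndexOf("tieba") > 0 || targetURL.IndexOf("rugao") > 0 || targetURL.IndexOf("xici") > 0 || targetURL.IndexOf("bbs") > 0)
                // These sites are assumed to serve GB2312 pages. Encoding.Default depends on
                // the system locale, so the encoding is named explicitly here.
                pageHtml = Encoding.GetEncoding("GB2312").GetString(pageData);
            else
                // All other sites are assumed to serve UTF-8 pages.
                pageHtml = Encoding.UTF8.GetString(pageData);
        }
    }
    catch (Exception ex)
    {
        MessageBox.Show(ex.Message);
    }
    return pageHtml;
}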
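
The method above picks the encoding from substrings of the URL. As a rough sketch of an alternative, the charset can be read from the Content-Type response header instead. The names `EncodingSketch`, `FromContentType`, and `Download` are hypothetical, and this is not confirmed to be the cause of the Tieba failure; pages that declare their charset only in a meta tag would still need separate handling.

```csharp
using System;
using System.Net;
using System.Text;

public static class EncodingSketch
{
    // Hypothetical helper: choose an encoding from a Content-Type header value,
    // falling back to UTF-8 when no usable charset is present.
    public static Encoding FromContentType(string contentType)
    {
        if (!string.IsNullOrEmpty(contentType))
        {
            foreach (string part in contentType.Split(';'))
            {
                string p = part.Trim();
                if (p.StartsWith("charset=", StringComparison.OrdinalIgnoreCase))
                {
                    string name = p.Substring("charset=".Length).Trim('"', ' ');
                    try
                    {
                        return Encoding.GetEncoding(name);
                    }
                    catch (ArgumentException)
                    {
                        // Charset name not known to this runtime: use the fallback below.
                    }
                }
            }
        }
        return Encoding.UTF8;
    }

    // Downloads the page and decodes it with the charset the server declared.
    public static string Download(string url)
    {
        using (var client = new WebClient())
        {
            byte[] data = client.DownloadData(url);
            string contentType = client.ResponseHeaders[HttpResponseHeader.ContentType];
            return FromContentType(contentType).GetString(data);
        }
    }
}
```

Calling `EncodingSketch.Download(targetURL)` would then replace the URL-substring checks, assuming the server sends an accurate charset.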

Tags: c#, Baidu Tieba, web scraping

code_play | Beginner Level 1 | yuandou: 197
Asked: 2013-03-16 12:02


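All answers (1)

There is an open-source component called HtmlAgilityPack that you can find on CodePlex. It makes it easy to match elements with XPath. Take a look if you are interested.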
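
A minimal sketch of what XPath matching with HtmlAgilityPack might look like, assuming the library is referenced in the project (for example via its NuGet package). The class name `XPathSketch`, the method `ExtractLinks`, and the XPath expression are illustrative only; the real expression depends on the markup of the page being scraped.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // third-party library, assumed to be referenced by the project

public static class XPathSketch
{
    // Hypothetical helper: returns the href value of every <a> element in the HTML.
    public static List<string> ExtractLinks(string html)
    {
        var result = new List<string>();
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//a[@href]");
        if (links == null)
            return result;

        foreach (HtmlNode link in links)
            result.Add(link.GetAttributeValue("href", ""));
        return result;
    }
}
```

The string returned by `PostAndGetHTML` could be passed straight to `XPathSketch.ExtractLinks`.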

LastPc | yuandou: 225 (Novice Level 2) | 2013-03-19 09:54

(*^__^*) Hehe... I'll go try it out first.

code_play | yuandou: 197 (Beginner Level 1) | 2013-03-22 22:06
