首页 新闻 赞助 找找看

请问.net怎么可以在下载页面时,判断编码方式后再下载?如果得不到编码方式的话,网页经常为乱码

0
悬赏园豆:15 [已解决问题] 解决于 2013-01-28 11:45

我已经用过很多种方式来判断了,但效果都不怎么样?在未知网页的编码方式时,请问怎么才可以正常下载到网页,而不是乱码呢?请大家指教下

        #region【获取网页HTML文本】
        /// <summary>
        /// 获取url网页的HTML文本信息
        /// </summary>
        /// <param name="url">网页URL</param>
        /// <param name="codeType">编码方式</param>
        /// <returns>返回HTML文本字符串</returns>
        public static String GetResponseText(string url, string codeType)
        {
            string responseFromServer = null;
            Stream dataStream = null;
            StreamReader reader = null;
            try
            {
                WebRequest request = WebRequest.Create(url);
                request.Credentials = CredentialCache.DefaultCredentials;
                HttpWebResponse response = (HttpWebResponse)request.GetResponse();
                if (response.StatusDescription == "OK")
                {
                    try
                    {
                        dataStream = response.GetResponseStream();
                        reader = new StreamReader(dataStream, GetPageEncoding(url));
                        responseFromServer = reader.ReadToEnd();
                        Regex rex = new Regex(@"(?<=charset\s*=\s*)[^""]*?(?="")", RegexOptions.IgnoreCase);
                        string charset = rex.Match(responseFromServer, 0).Value;
                        if (!charset.Equals("utf-8")) //如果编码方式不是utf-8的话,则重新用默认方式下载网页
                        {
                            reader = new StreamReader(dataStream, Encoding.Default);
                            responseFromServer = reader.ReadToEnd();
                        }
                    }
                    finally
                    {
                        reader.Close();
                        dataStream.Close();
                    }
                }
                response.Close();
                return responseFromServer;
            }
            catch (Exception ex)
            {
                return ex.Message;
            }

        }
        #endregion
诺ヾ誩.的主页 诺ヾ誩. | 初学一级 | 园豆:8
提问于:2012-10-30 20:47
< >
分享
最佳答案
0

public void NewCreateStaticPage(string url, string urlHTML)
    {
        System.Net.WebClient wc = new System.Net.WebClient();
        wc.Credentials = System.Net.CredentialCache.DefaultCredentials;
        byte[] buffer = wc.DownloadData(url);
        string file = System.Text.Encoding.Default.GetString(buffer);
        string pathHTML = HttpContext.Current.Server.MapPath(urlHTML);
        using (FileStream fs = new FileStream(pathHTML, FileMode.Create))
        {
            StreamWriter sw = new StreamWriter(fs, UnicodeEncoding.UTF8);
            sw.Write(file);
            fs.Dispose();
            fs.Close();
        }
        HttpContext.Current.Response.Redirect(urlHTML);
    }

试试这段代码,先变成byte类型,然后再写入HTML文件。

收获园豆:15
假扮天使 | 初学一级 |园豆:30 | 2012-11-01 10:07
其他回答(1)
0
邀月 | 园豆:25475 (高人七级) | 2012-10-30 20:51

不错。

支持(0) 反对(0) chenping2008 | 园豆:9836 (大侠五级) | 2012-10-31 09:50
清除回答草稿
   您需要登录以后才能回答,未注册用户请先注册