I'm using the following code:
HttpWebRequest request = (HttpWebRequest)HttpWebRequest.Create("http://www.joyme.com/news/hotpics/201501/2968957.html");
request.UserAgent = "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)";
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
Stream responseStream = response.GetResponseStream();
StreamReader streamReader = new StreamReader(responseStream, System.Text.Encoding.GetEncoding("utf-8"));
string html = streamReader.ReadToEnd();
The resulting html value comes out garbled. If I switch to another page, the same code works fine. Could anyone take a look? Thanks.
Are you here to take a dig at Jack Ma (马云)?
Just capture the traffic and you'll see what's going on.
Request headers:
Accept: text/html, application/xhtml+xml, */*
Accept-Language: zh-CN
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko
Accept-Encoding: gzip, deflate // whether or not this is set makes no difference to whether the response body is compressed
Host: www.joyme.com
DNT: 1
Connection: Keep-Alive
Pragma: no-cache
Response headers:
Server: nginx/1.6.2
Date: Fri, 30 Jan 2015 02:48:30 GMT
Content-Type: text/html;charset=utf-8 // this says the page is utf-8, and I'm willing to believe the server is telling the truth.
Content-Length: 6339
Connection: keep-alive
Content-Encoding: gzip // this says the content is gzip-compressed
So once you have the ResponseStream, first decompress it with GZipStream, and only then decode it as utf-8 to get the string.
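// Requires: using System; using System.IO; using System.IO.Compression; using System.Text;
string html;
Stream responseStream = response.GetResponseStream();
if (string.Equals(response.ContentEncoding, "gzip", StringComparison.OrdinalIgnoreCase))
{
    // The body is gzip-compressed: wrap the stream so it is decompressed before decoding.
    responseStream = new GZipStream(responseStream, CompressionMode.Decompress);
}
using (StreamReader streamReader = new StreamReader(responseStream, Encoding.UTF8))
{
    html = streamReader.ReadToEnd();
}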
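As an alternative to decompressing by hand, HttpWebRequest can do it for you through its AutomaticDecompression property. Below is a minimal sketch of that approach, not a definitive implementation. The class and method names (PageFetcher, CreateRequest, Fetch) are made up for illustration; the User-Agent string is the one from the question, and the page is assumed to be utf-8 as the response header claims.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

// Hypothetical helper class; the names are illustrative, not from the thread.
public static class PageFetcher
{
    // Builds a request that asks HttpWebRequest to undo gzip/deflate itself.
    public static HttpWebRequest CreateRequest(string url)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        request.UserAgent = "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)";
        // With this set, the response stream yields already-decompressed bytes.
        request.AutomaticDecompression =
            DecompressionMethods.GZip | DecompressionMethods.Deflate;
        return request;
    }

    // Downloads the page and decodes it as utf-8 (assumed, per Content-Type).
    public static string Fetch(string url)
    {
        HttpWebRequest request = CreateRequest(url);
        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        using (Stream body = response.GetResponseStream())
        using (StreamReader reader = new StreamReader(body, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }
}
```

Usage would be something like string html = PageFetcher.Fetch("http://www.joyme.com/news/hotpics/201501/2968957.html"); with this setting there is no need to check Content-Encoding yourself.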
Thanks a lot. I'm a beginner in this area, sorry for the trouble.
Are you sure this page is actually UTF-8 encoded?
Either a wrong character encoding or gzip compression could cause this kind of result.
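To tell those two causes apart, one option is to look at the raw bytes: a gzip stream always begins with the two bytes 0x1F 0x8B, which valid UTF-8 text never does. The sketch below is only an illustration of that check; BodyDecoder, LooksGzipped, and Decode are hypothetical names, and the caller is assumed to have already read the whole response body into a byte array.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical helper; the names are illustrative only.
public static class BodyDecoder
{
    // A gzip stream starts with the magic bytes 0x1F 0x8B (RFC 1952).
    public static bool LooksGzipped(byte[] body)
    {
        return body != null && body.Length >= 2
            && body[0] == 0x1F && body[1] == 0x8B;
    }

    // Decompresses the body if it looks gzipped, then decodes it as text.
    public static string Decode(byte[] body, Encoding encoding)
    {
        Stream stream = new MemoryStream(body);
        if (LooksGzipped(body))
        {
            stream = new GZipStream(stream, CompressionMode.Decompress);
        }
        // Disposing the reader also disposes the wrapped streams.
        using (StreamReader reader = new StreamReader(stream, encoding))
        {
            return reader.ReadToEnd();
        }
    }
}
```

If Decode still returns garbled text for a body that is not gzipped, the remaining suspect is the charset, so trying another encoding (for example Encoding.GetEncoding("gb2312")) would be the next step.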
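Other tools seem to handle this page fine, though, for example 火车头 (LocoySpider).
@tonyhangzhou: Mature tools like that generally account for these cases and do the necessary handling for you.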