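Background for this error: I want to fetch the web page served on port 80 of a particular IP address. The source of that page is as follows: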
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<style>
<!--{ font-family: Arial; font-size: 12pt }
-->
</style>
<title>Login</title>
</head>
<!-- Z01 Copyright 2002 Metavert Corporation www.metavert.com -->
<body bgcolor="#0000FF" text="#FFFFFF" LANGUAGE="javascript"
onload="return window_onload()">
<p align="right"> </p>
<form method="POST" name="thisForm" action="../Setup.htm">
<div align="center"><center><h3>Controller Status</h3>
</center></div><div align="center"><center><table border="1" cellspacing="0" width="397">
<tr>
<td width="215">System time elapsed</td>
<td width="174"><input type="text" readonly="true" name="ctElapse" size="10"></td>
</tr>
<tr>
<td width="215">Firmware release date</td>
<td width="174"><input type="text" readonly="true" name="ctVersion" size="17"></td>
</tr>
<tr>
<td width="215">Serial Number</td>
<td width="174"><input type="text" readonly="true" name="ctMAC" size="18"></td>
</tr>
</table>
</center></div><div align="center"><center><h3>Setup Login</h3>
</center></div><div align="center"><center><table border="0" cellpadding="0"
cellspacing="0" width="259">
<tr>
<td width="75">Password</td>
<td width="184"><input type="password" name="ctPassword" size="16" tabindex="1"> </td>
</tr>
</table>
</center></div><div align="center"><center><p><input type="submit" value="Login"
name="ctLogin"></p>
</center></div>
</form>
</body>
</html>
<script ID="clientEventHandlersJS" LANGUAGE="javascript">
<!--
function set(sField,sValue)
{ document.thisForm[sField].value=sValue;
}
function window_onload() {
document.thisForm.ctPassword.focus();
set("ctElapse","00:56:04");
set("ctVersion","Dec 13 2006 16:56");
set("ctMAC","Z4L-0531-3CD0772C");
}
//-->
</script>
<!-- Memory 1670584 -->
The code I use to fetch the page is as follows:
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Text;
using System.Windows.Forms;
using System.Net;
using System.Text.RegularExpressions;

namespace WindowsApplication2
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
        }

        private void button1_Click(object sender, EventArgs e)
        {
            textBox1.Text = getHtml("http://www.baidu.com", "");
        }

        private string getHtml(string url, string charSet)
        {
            WebClient myWebClient = new WebClient();
            myWebClient.Credentials = CredentialCache.DefaultCredentials;
            byte[] myDataBuffer = myWebClient.DownloadData(url);
            string strWebData = Encoding.Default.GetString(myDataBuffer);
            Match charSetMatch = Regex.Match(strWebData, "<meta([^<]*)charset=([^<]*)\"", RegexOptions.IgnoreCase | RegexOptions.Multiline);
            string webCharSet = charSetMatch.Groups[2].Value;
            if (charSet == null || charSet == "")
                charSet = webCharSet;
            if (charSet != null && charSet != "" && Encoding.GetEncoding(charSet) != Encoding.Default)
                strWebData = Encoding.GetEncoding(charSet).GetString(myDataBuffer);
            return strWebData;
        }
    }
}
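Fetching the source of pages such as Baidu produces no error, but fetching the page on port 80 of the IP address above produces the error. Any advice would be appreciated.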
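The cause is this: .NET 2.0 tightened security, and its HTTP client (HttpWebRequest, which WebClient uses internally) strictly validates the headers of the HTTP response, for example rejecting a header name that contains a space, to help defend against attacks that rely on malformed headers. If the response headers contain anything .NET considers unsafe, the request fails with "The server committed a protocol violation". Fetching Baidu works because Baidu's server returns well-formed headers; the server at your IP address apparently does not.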
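To work around it, add the following to your application's configuration file (app.config for a Windows Forms program like yours, web.config if the code runs inside an ASP.NET site):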
<configuration>
<system.net>
<settings>
<httpWebRequest useUnsafeHeaderParsing="true" />
</settings>
</system.net>
</configuration>
For a detailed explanation, see the following reply from Microsoft online technical support:
http://www.velocityreviews.com/forums/t302174-why-do-i-get-quotthe-server-committed-a-protocol-violationquot.html
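Before relaxing header parsing, it may be worth confirming that the failure really is a protocol violation rather than some other network error. The following is a minimal sketch, not code from the original post: the URL is a placeholder, and `Describe` is a hypothetical helper introduced only for illustration. It relies on `WebException.Status` and the `WebExceptionStatus.ServerProtocolViolation` value from System.Net.

```csharp
using System;
using System.Net;

class ProtocolViolationCheck
{
    // Hypothetical helper (not in the original post): turns a WebException
    // into a short diagnosis based on its Status value.
    static string Describe(WebException ex)
    {
        return ex.Status == WebExceptionStatus.ServerProtocolViolation
            ? "Server sent a malformed HTTP response: " + ex.Message
            : "Other network failure (" + ex.Status + "): " + ex.Message;
    }

    static void Main()
    {
        // Placeholder address; substitute the real IP of the device.
        string url = "http://192.168.0.1/";
        try
        {
            using (WebClient client = new WebClient())
            {
                byte[] data = client.DownloadData(url);
                Console.WriteLine("Downloaded " + data.Length + " bytes");
            }
        }
        catch (WebException ex)
        {
            Console.WriteLine(Describe(ex));
        }
    }
}
```

If the first message is printed, the useUnsafeHeaderParsing setting is the relevant fix; if the second is printed, the problem probably lies elsewhere (connectivity, timeout, and so on).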
It may be that the remote site checks whether your WebClient is a crawler. You could try adding a User-Agent HTTP header:
myWebClient.Headers.Set(System.Net.HttpRequestHeader.UserAgent, "Mozilla/5.0 (Windows; U; Windows NT 5.2; zh-CN; rv:1.9.0.3) Gecko/2008092417 Firefox/3.0.3");
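For context, the header has to be set on the WebClient before the download call is made. Here is a minimal sketch of how that might look in a standalone program; `FetchWithUserAgent` is a hypothetical helper name, the URL is a placeholder, and decoding as iso-8859-1 is an assumption taken from the meta tag of the login page quoted in the question.

```csharp
using System;
using System.Net;
using System.Text;

class UserAgentFetch
{
    // Hypothetical helper (illustrative only): downloads a page while
    // presenting a browser-like User-Agent string.
    static string FetchWithUserAgent(string url, string userAgent)
    {
        using (WebClient client = new WebClient())
        {
            // Set the header before issuing the request.
            client.Headers[HttpRequestHeader.UserAgent] = userAgent;
            byte[] data = client.DownloadData(url);
            // Assumption: the page declares charset=iso-8859-1 in its meta tag.
            return Encoding.GetEncoding("iso-8859-1").GetString(data);
        }
    }

    static void Main()
    {
        // Placeholder address; substitute the real IP of the device.
        string url = "http://192.168.0.1/";
        string ua = "Mozilla/5.0 (Windows; U; Windows NT 5.2; zh-CN; rv:1.9.0.3) Gecko/2008092417 Firefox/3.0.3";
        try
        {
            Console.WriteLine(FetchWithUserAgent(url, ua));
        }
        catch (WebException ex)
        {
            Console.WriteLine("Request failed (" + ex.Status + "): " + ex.Message);
        }
    }
}
```

A fresh WebClient is created per call here so that the header is guaranteed to be present on the request being sent.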