UWP: exception when fetching a web page's XML document

Bounty: 50 Yuandou [Closed question] Closed on 2016-09-03 16:39

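In UWP (SDK 14393), when I use an HTTP GET request and the XmlReader class to fetch a web page's XML document, I get the exception "Cannot open '<!DOCTYPE html>…… (followed by the rest of the document content that had already been read), The Uri parameter must be a file system relative or absolute path".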

Code:

GET request:

public async static Task<string> SentGet(string url)
{
    HttpClient client = new HttpClient();
    Uri uri = new Uri(url);
    HttpResponseMessage msg = await client.GetAsync(uri);
    return await msg.Content.ReadAsStringAsync();
}

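Fetching the XML (the exception is thrown by the XmlReader.Create call):

public async static Task<XmlDocument> GetXML(string url)
{
    string xml = await BaseService.SentGet(url);
    if (xml != null)
    {
        XmlDocument doc = new XmlDocument();
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.DtdProcessing = DtdProcessing.Ignore;
        settings.CheckCharacters = false;
        // The exception is thrown here. This overload is XmlReader.Create(string inputUri, settings),
        // so the downloaded page text is treated as a URI / file path, not as XML content.
        // This call also sends a second GET request; the page text is already in xml.
        XmlReader reader = XmlReader.Create(await BaseService.SentGet(url), settings);
        doc.Load(reader);
        return doc;
    }
    return null;
}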

Here, the url parameter is the web address entered by the user.
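The first exception message matches the overload being called: XmlReader.Create(string) takes a URI (inputUri), so passing the downloaded page text makes the reader try to open that text as a path. A minimal sketch of parsing an in-memory string instead, assuming System.Xml and a console host for testing (the class and method names are hypothetical): wrap the text in a StringReader and use the XmlReader.Create(TextReader, XmlReaderSettings) overload. This only removes the "Cannot open" error; the text must still be well-formed XML, which an HTML page usually is not.

```csharp
using System;
using System.IO;
using System.Xml;

public static class XmlStringParsing
{
    // Parses XML that is already in memory. XmlReader.Create(string) would treat
    // the argument as a URI, so the text is wrapped in a StringReader instead.
    public static XmlDocument ParseXmlString(string xml)
    {
        var settings = new XmlReaderSettings
        {
            DtdProcessing = DtdProcessing.Ignore, // same settings as in the question
            CheckCharacters = false
        };

        var doc = new XmlDocument();
        using (var stringReader = new StringReader(xml))
        using (var reader = XmlReader.Create(stringReader, settings))
        {
            doc.Load(reader);
        }
        return doc;
    }

    public static void Main()
    {
        XmlDocument doc = ParseXmlString("<root><item>1</item></root>");
        Console.WriteLine(doc.DocumentElement.Name); // prints "root"
    }
}
```

In GetXML this would mean passing the xml variable that was already downloaded, rather than calling SentGet a second time.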

uwp xml

Follow-up:

The namespace used above was System.Xml. After replacing it with Windows.Data.Xml.Dom, the XML-fetching code became:

public async static Task<XmlDocument> GetXML(string url)
{
    string xml = await BaseService.SentGetAsync(url);
    if (xml != null)
    {
        XmlDocument doc = new XmlDocument();
        XmlLoadSettings settings = new XmlLoadSettings();
        settings.ProhibitDtd = false;
        doc.LoadXml(xml, settings);
        return doc;
    }
    return null;
}

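This version throws "Exception from HRESULT: 0xC00CE56D". Looking up this code indicates that the XML start and end tags do not match. However, I have tried many websites, and every one of them produces this error.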
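The tag-mismatch error can be reproduced without any network code. This sketch uses System.Xml rather than Windows.Data.Xml.Dom so it runs outside UWP (the helper name is hypothetical); both are XML parsers and reject markup that is not well-formed. Ordinary HTML leaves the void element meta without a closing tag, so when the parser reaches </head> the innermost open element is still meta, which is a start/end tag mismatch.

```csharp
using System;
using System.Xml;

public static class WellFormednessCheck
{
    // Returns true if the text parses as well-formed XML with System.Xml.
    public static bool IsWellFormedXml(string text)
    {
        try
        {
            var doc = new XmlDocument();
            doc.LoadXml(text);
            return true;
        }
        catch (XmlException ex)
        {
            Console.WriteLine("Not well-formed: " + ex.Message);
            return false;
        }
    }

    public static void Main()
    {
        // Typical HTML: <meta> is never closed.
        string html = "<html><head><meta charset=\"utf-8\"><title>t</title></head><body></body></html>";
        // The same markup written as XHTML, with <meta> self-closed.
        string xhtml = "<html><head><meta charset=\"utf-8\"/><title>t</title></head><body></body></html>";

        Console.WriteLine(IsWellFormedXml(html));  // expected: False (meta vs </head> mismatch)
        Console.WriteLine(IsWellFormedXml(xhtml)); // expected: True
    }
}
```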

一只菜鸡 | Beginner Level 1 | Yuandou: 152
Asked: 2016-08-27 19:13


All answers (1)

Is your XML document actually a valid XML document?

顾晓北 | Yuandou: 10900 (Expert Level 6) | 2016-08-29 09:48

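Yes, it is. My preliminary conclusion is that an unclosed <meta> tag triggers the error. But many websites today still use this older style of <meta> without a closing tag.

一只菜鸡 | Yuandou: 152 (Beginner Level 1) | 2016-08-29 21:34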

@一只菜鸡: meta? That sounds like an HTML document.

顾晓北 | Yuandou: 10900 (Expert Level 6) | 2016-08-30 08:30

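@顾晓北: Right, it is an HTML document. For now I am reading it with a third-party namespace for HTML parsing, but I would still like to use the built-in XML parser.

一只菜鸡 | Yuandou: 152 (Beginner Level 1) | 2016-08-31 11:04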

@一只菜鸡: Do you really think HTML is an XML document?

顾晓北 | Yuandou: 10900 (Expert Level 6) | 2016-08-31 11:09

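@顾晓北: Then how should I handle a case like mine? The C# libraries do not seem to include a namespace dedicated to parsing HTML. I have looked at the source code of some website client apps written by other third-party developers: some handle it the way I do, with a third-party namespace, while others simply rent a web service that converts the original page's content into XML.

To clarify: by XML I mean the document format. HTML follows it too, but not strictly, which is why the problem occurs.

Thanks again for the reply!

一只菜鸡 | Yuandou: 152 (Beginner Level 1) | 2016-09-02 17:11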

@一只菜鸡: XML is quite strict. There are certainly class libraries for parsing HTML; have a look around...

顾晓北 | Yuandou: 10900 (Expert Level 6) | 2016-09-02 17:18
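One way to decide between an XML parser and an HTML library is to look at the media type the server declares. A minimal sketch, assuming System.Net.Http.HttpClient (the question does not say which HttpClient class is used) and hypothetical helper names: the body is returned only when the Content-Type is text/xml, application/xml, or a "+xml" type such as application/rss+xml, and null is returned for text/html. The header is only what the server claims, so a "+xml" response can still be malformed, and a text/html page would go to an HTML parsing library as suggested above.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

public static class ContentTypeCheck
{
    // True for media types that declare XML content: text/xml, application/xml,
    // or any structured-syntax "+xml" type (e.g. application/rss+xml).
    public static bool IsXmlMediaType(string mediaType)
    {
        if (string.IsNullOrEmpty(mediaType)) return false;
        string m = mediaType.Trim().ToLowerInvariant();
        return m == "text/xml"
            || m == "application/xml"
            || m.EndsWith("+xml", StringComparison.Ordinal);
    }

    // Hypothetical helper: downloads the body only if the server says it is XML.
    public static async Task<string> GetXmlOrNullAsync(HttpClient client, string url)
    {
        using (HttpResponseMessage msg = await client.GetAsync(new Uri(url)))
        {
            msg.EnsureSuccessStatusCode();
            string mediaType = msg.Content.Headers.ContentType?.MediaType;
            if (!IsXmlMediaType(mediaType)) return null; // e.g. "text/html"
            return await msg.Content.ReadAsStringAsync();
        }
    }

    public static void Main()
    {
        Console.WriteLine(IsXmlMediaType("application/rss+xml")); // True
        Console.WriteLine(IsXmlMediaType("text/html"));           // False
    }
}
```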
