Fetching the contents of a website's sitemap in C# (beginner)

[Resolved] Resolved on 2019-05-23 11:00

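How can I fetch the contents of a website's sitemap in C#? For example, https://pizzeria-latina.nl/sitemap.xml lists https://pizzeria-latina.nl/sitemap1.xml, https://pizzeria-latina.nl/sitemap2.xml, and so on, and I want the contents of those files. Any help from the experts would be appreciated.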

c# crawler

天然白 | Beginner Level 1 | Beans: 1
Asked: 2019-05-22 16:34


Best answer

curl -s https://pizzeria-latina.nl/sitemap.xml | cut -d '>' -f 3 | cut -d '<' -f 1

Reward beans: 5

BUTTERAPPLE | Veteran Level 4 | Beans: 3190 | 2019-05-22 20:28
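The cut pipeline in the best answer depends on how the XML happens to be split into lines and tags. As a rough C# counterpart, the sketch below parses the document and reads every loc element instead. It assumes the file uses the standard sitemaps.org namespace. The names SitemapLocs and Extract and the sample XML are made up for illustration, and downloading the file (for example with HttpClient) is left out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class SitemapLocs
{
    // XML namespace defined by the sitemaps.org protocol (assumed to be used by the file).
    private static readonly XNamespace Ns = "http://www.sitemaps.org/schemas/sitemap/0.9";

    // Returns the trimmed text of every <loc> element in a sitemap or sitemap index.
    public static List<string> Extract(string xml)
    {
        return XDocument.Parse(xml)
            .Descendants(Ns + "loc")
            .Select(loc => loc.Value.Trim())
            .Where(url => url.Length > 0)
            .ToList();
    }

    public static void Main()
    {
        // Hand-written sample in the shape of a sitemap index; not the real file.
        string sample =
            "<sitemapindex xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">" +
            "<sitemap><loc>https://pizzeria-latina.nl/sitemap1.xml</loc></sitemap>" +
            "<sitemap>\n  <loc>\n    https://pizzeria-latina.nl/sitemap2.xml\n  </loc>\n</sitemap>" +
            "</sitemapindex>";

        foreach (string url in Extract(sample))
        {
            Console.WriteLine(url);
        }
    }
}
```

Because the lookup goes by element name rather than by character position, line breaks and indentation inside the file do not change the result.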

Other answers (2)

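You're overthinking it. A crawler mostly works by recursively following the links on each page, and excluding the ones it should skip, to discover the URIs of the "whole" site. A sitemap is not something every site provides.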

花飘水流兮 | Beans: 13775 (Expert Level 6) | 2019-05-22 17:31
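The link-following approach described in this answer can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses an iterative breadth-first queue rather than literal recursion, it treats "exclusion" as skipping already-seen and off-host links, and getLinks is a hypothetical stand-in for downloading a page and extracting its absolute link URLs. The class name LinkCrawler and the in-memory link graph are invented for the example.

```csharp
using System;
using System.Collections.Generic;

public static class LinkCrawler
{
    // Breadth-first crawl from start. getLinks stands in for "download the page and
    // return its absolute link URLs"; a real crawler would fetch and parse HTML there.
    public static List<string> Crawl(string start, Func<string, IEnumerable<string>> getLinks)
    {
        string host = new Uri(start).Host;
        var seen = new HashSet<string> { start };
        var queue = new Queue<string>();
        var found = new List<string>();
        queue.Enqueue(start);

        while (queue.Count > 0)
        {
            string page = queue.Dequeue();
            found.Add(page);

            foreach (string link in getLinks(page))
            {
                Uri uri;
                // Exclude malformed links and links that leave the starting host.
                if (!Uri.TryCreate(link, UriKind.Absolute, out uri) || uri.Host != host)
                {
                    continue;
                }
                // Exclude pages already queued or visited, so link cycles terminate.
                if (seen.Add(link))
                {
                    queue.Enqueue(link);
                }
            }
        }
        return found;
    }

    public static void Main()
    {
        // Made-up link graph standing in for a small site.
        var site = new Dictionary<string, string[]>
        {
            { "https://example.com/", new[] { "https://example.com/menu", "https://other.example/" } },
            { "https://example.com/menu", new[] { "https://example.com/", "https://example.com/contact" } },
            { "https://example.com/contact", new string[0] }
        };

        List<string> pages = Crawl(
            "https://example.com/",
            url => site.ContainsKey(url) ? site[url] : new string[0]);

        foreach (string page in pages)
        {
            Console.WriteLine(page);
        }
    }
}
```

The seen set is what keeps the crawl finite: without it, two pages that link to each other would be visited forever.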

// Requires: using System.IO; using System.Xml;
// Runs inside an ASP.NET Web Forms page; sa and sm are presumably the page's input controls.
XmlDocument index = new XmlDocument();
index.Load(sa.Text); // URL of the sitemap index
string outputPath = Server.MapPath(sm.Text); // where to save the result
// The using block closes the file even if one of the downloads fails.
using (StreamWriter writer = new StreamWriter(outputPath))
{
    // Each <loc> in the index points to a child sitemap.
    foreach (XmlNode sitemapLoc in index.GetElementsByTagName("loc"))
    {
        XmlDocument child = new XmlDocument();
        child.Load(sitemapLoc.InnerText.Trim());
        // Each <loc> in a child sitemap is a page URL; write one per line.
        foreach (XmlNode pageLoc in child.GetElementsByTagName("loc"))
        {
            writer.WriteLine(pageLoc.InnerText.Trim());
        }
    }
}
Response.Write("<script>alert('Success')</script>");

天然白 | Beans: 1 (Beginner Level 1) | 2019-05-23 11:00
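The asker's code assumes exactly two levels: one index whose entries are all child sitemaps. A possible generalisation, sketched below, checks the root element to tell an index from an ordinary sitemap and descends only into indexes. It assumes the standard sitemaps.org namespace. SitemapWalker, CollectPageUrls, and the load delegate (a stand-in for downloading a URL as text) are hypothetical names, and the sample documents are made up.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class SitemapWalker
{
    private static readonly XNamespace Ns = "http://www.sitemaps.org/schemas/sitemap/0.9";

    // load stands in for "download this URL and return its body as a string".
    public static List<string> CollectPageUrls(string sitemapUrl, Func<string, string> load)
    {
        var pages = new List<string>();
        Visit(sitemapUrl, load, new HashSet<string>(), pages);
        return pages;
    }

    private static void Visit(string url, Func<string, string> load,
                              HashSet<string> seen, List<string> pages)
    {
        // Skip sitemaps already processed, so an index that references itself cannot loop.
        if (!seen.Add(url)) return;

        XElement root = XDocument.Parse(load(url)).Root;
        // <loc> elements sit one level below each <sitemap> or <url> entry.
        List<string> locs = root.Elements()
            .Elements(Ns + "loc")
            .Select(e => e.Value.Trim())
            .ToList();

        if (root.Name == Ns + "sitemapindex")
        {
            // An index: every <loc> is another sitemap to descend into.
            foreach (string child in locs) Visit(child, load, seen, pages);
        }
        else
        {
            // An ordinary sitemap: every <loc> is a page URL.
            pages.AddRange(locs);
        }
    }

    public static void Main()
    {
        const string ns = "http://www.sitemaps.org/schemas/sitemap/0.9";
        // Made-up documents keyed by URL, standing in for real downloads.
        var docs = new Dictionary<string, string>
        {
            { "https://example.com/sitemap.xml",
              "<sitemapindex xmlns=\"" + ns + "\"><sitemap><loc>https://example.com/sitemap1.xml</loc></sitemap></sitemapindex>" },
            { "https://example.com/sitemap1.xml",
              "<urlset xmlns=\"" + ns + "\"><url><loc>https://example.com/menu</loc></url></urlset>" }
        };

        foreach (string page in CollectPageUrls("https://example.com/sitemap.xml", u => docs[u]))
        {
            Console.WriteLine(page);
        }
    }
}
```

Passing the downloader in as a delegate keeps the traversal testable offline; in a real program it could wrap an HTTP client call.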
