Processing HTML documents with file operations

Bounty: 50 Yuandou [Unresolved question]

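1. Scan a folder for HTML documents.

2. Locate one layer (a div) inside each HTML document and extract its content.

3. Using keywords stored in a database, scan the extracted content and wrap every piece of text that matches a keyword in an A tag.

I am a beginner and would appreciate any pointers from more experienced developers on how to implement the features above. Thank you.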

.NET Technology C# ASP.NET

向磊 | Beginner Level 1 | Yuandou: 10
Asked: 2012-06-13 12:56


All answers (1)

Treat each file as plain text: read its contents, then search for the keywords and match them.

For reference, see: http://www.cnblogs.com/xiaoyao2011/archive/2011/09/29/2195197.html

http://www.cnblogs.com/xybs/archive/2012/05/12/2497325.html
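The "treat it as text" approach can be sketched roughly as follows. This is only a minimal illustration under stated assumptions, not the method from the linked posts: the class `KeywordLinker`, its methods `ExtractDiv` and `LinkKeywords`, and the div id `content` are hypothetical names, and the dictionary stands in for the keyword rows that would come from the database. It assumes the target div contains no nested div; for arbitrary markup a real HTML parser would be the safer choice.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class KeywordLinker
{
    // Returns the inner HTML of the first <div> with the given id, or null if none is found.
    // Assumption: the div has no nested <div>, so the first </div> closes it.
    public static string ExtractDiv(string html, string divId)
    {
        string pattern = "<div[^>]*\\bid\\s*=\\s*[\"']" + Regex.Escape(divId)
                       + "[\"'][^>]*>(.*?)</div>";
        Match m = Regex.Match(html, pattern, RegexOptions.IgnoreCase | RegexOptions.Singleline);
        return m.Success ? m.Groups[1].Value : null;
    }

    // Wraps keyword occurrences in <a> tags. Only text between tags is touched,
    // and text that already sits inside an <a> element is skipped.
    public static string LinkKeywords(string html, IDictionary<string, string> keywordToUrl)
    {
        if (keywordToUrl.Count == 0) return html;

        // One alternation, longest keyword first, so each position is linked at most once.
        string alternation = string.Join("|",
            keywordToUrl.Keys.OrderByDescending(k => k.Length).Select(k => Regex.Escape(k)));
        Regex keywords = new Regex(alternation);

        // The capture group makes Regex.Split keep the tags in the result array.
        string[] parts = Regex.Split(html, "(<[^>]+>)");
        bool insideAnchor = false;
        for (int i = 0; i < parts.Length; i++)
        {
            if (parts[i].StartsWith("<", StringComparison.Ordinal))
            {
                if (Regex.IsMatch(parts[i], "^<a\\b", RegexOptions.IgnoreCase)) insideAnchor = true;
                else if (Regex.IsMatch(parts[i], "^</a\\b", RegexOptions.IgnoreCase)) insideAnchor = false;
                continue; // a tag, not text
            }
            if (insideAnchor) continue;
            parts[i] = keywords.Replace(parts[i],
                m => "<a href=\"" + keywordToUrl[m.Value] + "\">" + m.Value + "</a>");
        }
        return string.Concat(parts);
    }
}
```

Splitting on tags first means a keyword that happens to appear in an attribute value (for example a `title`) is left alone, which a plain `string.Replace` over the whole document would not guarantee.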

悟行 | Yuandou: 12559 (Expert Level 6) | 2012-06-13 13:34

The change did not take effect:

try
{
    int totalFile = 0;
    if (this.txtHTML.Text.Trim() == "")
    {
        MessageBox.Show("Please enter the path of the HTML folder!");
    }
    else
    {
        string dirPath = this.txtHTML.Text.Trim();
        DirectoryInfo dirInfo = new DirectoryInfo(dirPath);
        FileInfo[] files = dirInfo.GetFiles();

        foreach (FileInfo fileinfo in files)
        {
            // Process every .htm file in the folder (case-insensitive extension check)
            if (fileinfo.Extension.Equals(".htm", StringComparison.OrdinalIgnoreCase))
            {
                totalFile = totalFile + 1;

                // Read the file straight from disk; a WebRequest is not needed for local files
                Encoding encode = Encoding.GetEncoding("gb2312");
                string strhtml = File.ReadAllText(fileinfo.FullName, encode);

                // Wrap every occurrence of the keyword 心包炎 (pericarditis) in an <a> tag
                string str = string.Format("<a href='#'>{0}</a>", "心包炎");
                string strNew = strhtml.Replace("心包炎", str);

                SaveFileDialog sfd = new SaveFileDialog();
                sfd.Title = "Save";
                sfd.Filter = "htm file(*.htm)|*.htm";
                // Skip this file if the user cancels the dialog
                if (sfd.ShowDialog() != DialogResult.OK)
                {
                    continue;
                }

                // FileMode.Create overwrites an existing file, so no explicit delete is needed.
                // Note: the input is read as gb2312 but written as UTF-8, so any charset
                // declared inside the page should match the output encoding.
                using (FileStream fs = new FileStream(sfd.FileName, FileMode.Create, FileAccess.Write))
                using (StreamWriter sw = new StreamWriter(fs, Encoding.UTF8))
                {
                    sw.Write(strNew);
                }
            }
        }
    }
}
catch (Exception ee) { MessageBox.Show("Operation failed: " + ee.Message); }

向磊 | Yuandou: 10 (Beginner Level 1) | 2012-06-13 15:38
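Since the snippet above opens a SaveFileDialog for every file, a batch variant that writes all results into a separate output folder may be easier to check. The following is a rough sketch under assumptions, not a drop-in fix: `HtmlBatch` and `ProcessFolder` are hypothetical names, a single keyword and URL are passed in directly instead of being read from the database, and the caller chooses the encoding.

```csharp
using System;
using System.IO;
using System.Text;

public static class HtmlBatch
{
    // Rewrites every .htm file in inputDir into outputDir, wrapping the keyword in a link.
    // Returns the number of files processed.
    public static int ProcessFolder(string inputDir, string outputDir,
                                    string keyword, string url, Encoding encoding)
    {
        Directory.CreateDirectory(outputDir); // does nothing if the folder already exists
        int processed = 0;
        foreach (string path in Directory.GetFiles(inputDir))
        {
            // Explicit extension check, so .html and other files are ignored
            if (!string.Equals(Path.GetExtension(path), ".htm", StringComparison.OrdinalIgnoreCase))
                continue;

            string html = File.ReadAllText(path, encoding);
            string link = "<a href=\"" + url + "\">" + keyword + "</a>";
            string updated = html.Replace(keyword, link);

            // Same file name, same encoding, different folder
            File.WriteAllText(Path.Combine(outputDir, Path.GetFileName(path)), updated, encoding);
            processed++;
        }
        return processed;
    }
}
```

A call might look like `HtmlBatch.ProcessFolder(dirPath, Path.Combine(dirPath, "out"), "心包炎", "#", Encoding.GetEncoding("gb2312"))`. On .NET Core and later, `gb2312` is only available after `Encoding.RegisterProvider(CodePagesEncodingProvider.Instance)` has been called. If the keyword never matches, checking that the encoding used for reading agrees with the file's real encoding is a reasonable first step.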
