C#: extracting all text content from HTML

[Unresolved]

I only want to extract the text between every <p> and </p> in the HTML. How should the regular expression be written? Thanks!

.NET Technology C#

On the Way | Beginner, Level 1 | Points: 10
Asked: 2010-11-25 11:33

All answers (3)

(?is)<p[^>]*>(?><p[^>]*>(?<o>)|</p>(?<-o>)|(?:(?!</?p\b).)*)*(?(o)(?!))</p>

Test code:


using System;
using System.Text.RegularExpressions;

public class Program
{
    public static void Main(string[] args)
    {
        string text = "<p>sdfasdfsa</p>sxcvxc<Img src=><p>23424</p>";
        // Balancing-group pattern: matches each outermost <p>...</p> pair, nested <p> pairs included.
        string regex = @"(?is)<p[^>]*>(?><p[^>]*>(?<o>)|</p>(?<-o>)|(?:(?!</?p\b).)*)*(?(o)(?!))</p>";
        GetListByHtml(text, regex);
        Console.ReadKey();
    }

    // Prints every match of pat found in text, one per line.
    public static void GetListByHtml(string text, string pat)
    {
        Regex r = new Regex(pat, RegexOptions.IgnoreCase);
        Match m = r.Match(text);
        while (m.Success)
        {
            Console.WriteLine(m.Value);
            m = m.NextMatch();
        }
    }
}

Test output:

<p>sdfasdfsa</p>
<p>23424</p>

邀月 | Points: 25475 (Expert, Level 7) | 2010-11-25 13:30
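The pattern in the answer above relies on .NET balancing groups: each nested opening <p> pushes a capture onto the group `o`, each closing </p> pops one, and `(?(o)(?!))` rejects the match if anything is left on the stack. The following is a minimal sketch of that behaviour, not part of the original thread; the nested input string and the class name `NestedParagraphDemo` are assumptions chosen for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NestedParagraphDemo
{
    public static void Main()
    {
        // Hypothetical input: one <p> nested inside another, followed by a plain <p>.
        string html = "<p>outer <p>inner</p> tail</p><p>x</p>";

        // Same pattern as in the answer above.
        string pattern = @"(?is)<p[^>]*>(?><p[^>]*>(?<o>)|</p>(?<-o>)|(?:(?!</?p\b).)*)*(?(o)(?!))</p>";

        // Each match is one outermost <p>...</p> pair, with any nested pairs kept inside it.
        foreach (Match m in Regex.Matches(html, pattern))
        {
            Console.WriteLine(m.Value);
        }
    }
}
```

With this input the loop should print two lines: `<p>outer <p>inner</p> tail</p>` and `<p>x</p>`. A simple lazy pattern such as `<p>.*?</p>` would instead stop at the first </p> and cut the outer paragraph short.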

That's not right... the result returned is.. This is an example that extracts href: string regexs = "href=[\\\"\\\'](http:\\/\\/|\\.\\/|\\/)?\\w+(\\.\\w+)*(\\/\\w+(\\.\\w+)?)*(\\/|\\?\\w*=\\w*(&\\w*=\\w*)*)?[\\\"\\\']";

Upvote (0) Downvote (0) On the Way | Points: 10 (Beginner, Level 1) | 2010-11-25 14:16

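@On the Way: The expression above extracts every "<p>***</p>" block from the string; strip the opening and closing tags from each match and you have all the text in between.

Upvote (0) Downvote (0) 邀月 | Points: 25475 (Expert, Level 7) | 2010-11-25 14:29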
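Instead of stripping the tags from each match afterwards, the inner text can also be captured directly. The sketch below is not from the thread: it wraps the body of the same pattern in a named group, and the group name `inner` and the class name `InnerTextDemo` are assumptions made for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class InnerTextDemo
{
    public static void Main()
    {
        // Same sample input as in the answer's test code.
        string html = "<p>sdfasdfsa</p>sxcvxc<Img src=><p>23424</p>";

        // The answer's pattern, with everything between the outer tags
        // wrapped in the (assumed) named group "inner".
        string pattern = @"(?is)<p[^>]*>(?<inner>(?><p[^>]*>(?<o>)|</p>(?<-o>)|(?:(?!</?p\b).)*)*)(?(o)(?!))</p>";

        foreach (Match m in Regex.Matches(html, pattern))
        {
            // Groups["inner"] holds the text between the outermost <p> and </p>.
            Console.WriteLine(m.Groups["inner"].Value);
        }
    }
}
```

For the sample input this should print `sdfasdfsa` and `23424`. If a paragraph contains nested <p> elements, the captured value still includes those inner tags, so a further pass would be needed to remove them.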

What comes back is not "***"; there is nothing in between....

Upvote (0) Downvote (0) On the Way | Points: 10 (Beginner, Level 1) | 2010-11-25 14:32

@On the Way: Could you be reading the wrong value? See the sample code added above!

Upvote (0) Downvote (0) 邀月 | Points: 25475 (Expert, Level 7) | 2010-11-25 15:13

Very impressive.

Tester Chen | Points: 1690 (Shrimp, Level 3) | 2010-11-25 14:48

So this can even be done with a regex. Impressive!

喬喬AI | Points: 996 (Shrimp, Level 3) | 2011-10-24 12:30
