Sample string:
.....
<div>等你等等你</div>
<h2>xxx</h2>
<ul class="mlist">
<li>alsjdflasjlfdjlasdfjjsadfjalsfj<span>张三</span><a href="xxxxxx"></a></li>
<li>alsjdflasjlfdjlasdfjjsadfjalsfj<span>张三</span><a href="xxxxxx"></a></li>
<li>alsjdflasjlfdjlasdfjjsadfjalsfj<span>张三</span><a href="xxxxxx"></a></li>
<li>alsjdflasjlfdjlasdfjjsadfjalsfj<span>张三</span><a href="xxxxxx"></a></li>
</ul>
<div class="answercount">
<div class="diggit">
<div class="diggnum unanswered">0</div>
<div class="diggword">回答数</div>
</div>
<div class="clear">
</div>
</div>
The data I want is the set of li elements under ul class="mlist", for example:
<li>alsjdflasjlfdjlasdfjjsadfjalsfj<span>张三</span><a href="xxxxxx"></a></li>
<li>alsjdflasjlfdjlasdfjjsadfjalsfj<span>张三</span><a href="xxxxxx"></a></li>
<li>alsjdflasjlfdjlasdfjjsadfjalsfj<span>张三</span><a href="xxxxxx"></a></li>
<li>alsjdflasjlfdjlasdfjjsadfjalsfj<span>张三</span><a href="xxxxxx"></a></li>
<ul[^>]+>\s+(<li>.+<\/li>\s+)+<\/ul>
I think an open-source component with jQuery-like syntax would be more convenient here.
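To illustrate the parser-based idea, here is a minimal sketch. The thread does not say which language is in use, so Python is an assumption, and the standard-library html.parser stands in for a jQuery-like component (it is event-driven, not selector-based). The class name MlistItems and the sample markup are hypothetical. Note that it collects the text of each li, not the raw li markup.

```python
from html.parser import HTMLParser


class MlistItems(HTMLParser):
    """Collects the text of each <li> inside <ul class="mlist"> (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.in_list = False  # True while inside the target <ul>
        self.in_item = False  # True while inside an <li> of that <ul>
        self.items = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "ul" and "mlist" in classes:
            self.in_list = True
        elif tag == "li" and self.in_list:
            self.in_item = True
            self.items.append("")

    def handle_endtag(self, tag):
        # Assumes the target <ul> contains no nested <ul>.
        if tag == "ul":
            self.in_list = False
        elif tag == "li":
            self.in_item = False

    def handle_data(self, data):
        if self.in_item:
            self.items[-1] += data


# Hypothetical sample modelled on the question's markup.
sample = (
    '<ul class="mlist"><li>abc<span>张三</span><a href="x"></a></li>'
    '<li>def</li></ul><ul><li>other</li></ul>'
)
parser = MlistItems()
parser.feed(sample)
print(parser.items)
```

The li in the second, unclassed ul is skipped because the parser only records items while it is inside the mlist list.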
<ul[^>]*>\s*(?:<li[^>]*>(?:[^<]+<[^li])+li>\s*)+<\/ul>
@并排逗比北边跑: I don't know this stuff at all. Once I've extracted the block this way, how do I then get the set of li elements inside it?
@黑山妖: https://stackoverflow.com/questions/20965477/how-can-i-extract-certain-html-tags-e-g-ul-using-regex-with-preg-match-all-in
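The answer above is correct. To guard against other <li> elements elsewhere on the page, you have to extract the whole <ul> first, and then match <li>.+<\/li> against that result in a second pass.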
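The two-pass approach could be sketched roughly as follows. Python is an assumption (the thread does not name a language), the sample page is hypothetical, and the patterns are slightly adapted: they match on class="mlist" and use non-greedy .*? so that the first pass stops at the first </ul> and the second pass yields one match per li.

```python
import re

# Hypothetical page modelled on the sample string in the question,
# with an extra unrelated <ul> to show why the first pass is needed.
html = """<div>x</div>
<ul class="other">
<li>not wanted</li>
</ul>
<ul class="mlist">
<li>aaa<span>张三</span><a href="xxxxxx"></a></li>
<li>bbb<span>张三</span><a href="xxxxxx"></a></li>
</ul>
<div class="answercount"></div>
"""

# Pass 1: capture the body of <ul class="mlist">.
# re.S lets "." span newlines; ".*?" stops at the first </ul>.
ul_match = re.search(r'<ul[^>]*class="mlist"[^>]*>(.*?)</ul>', html, re.S)

# Pass 2: collect each <li>...</li>, searching only inside that block.
items = re.findall(r'<li[^>]*>.*?</li>', ul_match.group(1), re.S) if ul_match else []

for item in items:
    print(item)
```

This relies on the target list containing no nested <ul> or <li>; for markup that nests, a real HTML parser is the safer choice.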