
Looking for a regular expression that gets the collection of li elements under a ul

0
Bounty: 5 园豆 [unresolved question]

Sample string:

.....

<div>等你等等你</div>

<h2>xxx</h2>

<ul class="mlist">

<li>alsjdflasjlfdjlasdfjjsadfjalsfj<span>张三</span><a href="xxxxxx"></a></li>

<li>alsjdflasjlfdjlasdfjjsadfjalsfj<span>张三</span><a href="xxxxxx"></a></li>

<li>alsjdflasjlfdjlasdfjjsadfjalsfj<span>张三</span><a href="xxxxxx"></a></li>

<li>alsjdflasjlfdjlasdfjjsadfjalsfj<span>张三</span><a href="xxxxxx"></a></li>

</ul>

<div class="answercount">

        <div class="diggit">

            <div class="diggnum unanswered">0</div>

            <div class="diggword">回答数</div>

        </div>

        <div class="clear">

        </div>

    </div>

 

The data I want is the collection of li elements under ul class="mlist", for example:

<li>alsjdflasjlfdjlasdfjjsadfjalsfj<span>张三</span><a href="xxxxxx"></a></li>

<li>alsjdflasjlfdjlasdfjjsadfjalsfj<span>张三</span><a href="xxxxxx"></a></li>

<li>alsjdflasjlfdjlasdfjjsadfjalsfj<span>张三</span><a href="xxxxxx"></a></li>

<li>alsjdflasjlfdjlasdfjjsadfjalsfj<span>张三</span><a href="xxxxxx"></a></li>

黑山妖 | Beginner level 1 | 园豆: 3
Asked: 2017-11-14 17:28
All answers (2)
1

<ul[^>]+>\s+(<li>.+<\/li>\s+)+<\/ul>

并排逗比北边跑 | 园豆: 10 (Beginner level 1) | 2017-11-14 18:02
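For readers who want to try the pattern above, here is a minimal sketch in Python (the thread does not say which language the asker uses, so Python is an assumption). The `single_line` string is a shortened copy of the question's sample, and `multi_line` is a hypothetical variant in which the li content spans several lines. Two properties of the pattern show up: `.` does not match a newline by default, so an `<li>` that is immediately followed by a line break cannot match, and a repeated capture group keeps only its last iteration rather than the whole collection.

```python
import re

# The pattern from the answer above, unchanged.
PATTERN = re.compile(r"<ul[^>]+>\s+(<li>.+<\/li>\s+)+<\/ul>")

# Shortened copy of the question's sample: every <li> sits on one line.
single_line = (
    '<h2>xxx</h2>\n'
    '<ul class="mlist">\n'
    '<li>aaa<span>张三</span><a href="xxxxxx"></a></li>\n'
    '<li>bbb<span>张三</span><a href="xxxxxx"></a></li>\n'
    '</ul>\n'
)

# Hypothetical variant: the <li> content is spread over several lines.
multi_line = (
    '<ul class="mlist">\n'
    '<li>\n<div class="T1">aaa</div>\n</li>\n'
    '</ul>\n'
)

matches_single = PATTERN.search(single_line) is not None
matches_multi = PATTERN.search(multi_line) is not None
print(matches_single, matches_multi)

# The repeated group only remembers its final iteration (the last <li>).
last_item = PATTERN.search(single_line).group(1)
print(last_item)
```

So even when the whole ul matches, a second step is still needed to list the individual li elements.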

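I tried this, but it does not work on my actual page. What I need is the li elements under ul class="mlist" in the HTML below. The pattern matches the example above, but it fails on the HTML below.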


<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="content-type" content="text/html;charset=utf-8"/>
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<script src="/static/js/base64.js" charset="utf-8" ></script>
<script src="/static/js/ad.js" charset="utf-8" ></script>
<link rel="shortcut icon" href="/static/image/favicon.ico"/>
<link href="/static/css/style.css" type="text/css" rel="stylesheet"/>
<title>毛泽东传</title>
<meta name="keywords" content="毛泽东传"/>
<meta name="description" content="毛泽东传" />
<style>
.divcss7 a:link{ color:#F00}/* 链接默认为红色
</style>
</head>
<body id="list" >
<div id="topbar" class="topbox"><ul>
<li class="on"><a href="/"毛泽东传</a></li>
</ul></div>

<div id="wrapper">
<!-- header start -->
<div id="header">

<h1 id="logo"><a href="/" ><img width="135" height="30" /></a></h1>

<div id="sbox">

<form name="btform" onsubmit="return mysubmit(this);" action="/">
<input type="text" baiduSug="2" autocomplete="off" id="input" name="name" placeholder="毛泽东传" class="stbox" value="毛泽东传" />

 

<input type="submit" onmouseout="this.className=''" onmousedown="this.className='mousedown'" onmouseover="this.className='hover'" value="" id="sbutton"/>

</form>
</div>


</div>
<!-- header end -->

<!-- container start -->
<div id="container">

<div class="leftconbox">

<ul class="sidenav1">
</ul>
</div>
<div class="sele">
<a href="/list/XXXXXXXX/1/time_d" class="desc"">创建时间</a>
<a href="/list/XXXXXXXX/1/size_d" ">文件大小</a>
<a href="/list/XXXXXXXX/1/rala_d" ">相关度</a>
</div>
<div class="main">

<div class="rststat">找到约 34 条记录 </div>

<ul class="mlist">
<li>
<div class="T1">
<a name='file_title' target="_blank" href="xxxxxx">三度赤水河</a>
</div>
<dl class="BotInfo">
<dt>大小:<span> 1.3 M</span> &nbsp;&nbsp;&nbsp;&nbsp;
文件数:<span> 3</span> &nbsp;&nbsp;&nbsp;&nbsp;
创建日期:<span> 2年前 </span> &nbsp;&nbsp;&nbsp;&nbsp;
访问热度:<span> 3 </span>
</dt>
</dl>
<div class="dInfo">
<a href="www.baidu.com">[下载地址] </a>
</div>
</li>

<li>
<div class="T1">
<a name='file_title' target="_blank" href="xxxxxx">三度赤水河</a>
</div>
<dl class="BotInfo">
<dt>大小:<span> 1.3 M</span> &nbsp;&nbsp;&nbsp;&nbsp;
文件数:<span> 3</span> &nbsp;&nbsp;&nbsp;&nbsp;
创建日期:<span> 2年前 </span> &nbsp;&nbsp;&nbsp;&nbsp;
访问热度:<span> 3 </span>
</dt>
</dl>
<div class="dInfo">
<a href="www.baidu.com">[下载地址] </a>
</div>
</li>
</ul>
<div id="mpages">
<div class="pg">
<ul class="pg">
<span class="current">1</span>
<a class="flag_pg" href="/ls/毛泽东/2"> 2 </a>
<a class="flag_pg" href="/ls/毛泽东/3"> 3 </a>
<a class="flag_pg" href="/ls/毛泽东/4"> 4 </a>
<a class="flag_pg" href="/ls/毛泽东/2">下一页</a>
</ul>
</div>
</div>
</div>

</div>

</div>

</div>
<div id="footer">
<p>&copy; 2014 diaosisou | <a href=" " >XXXX</a> |</p>
</div>


</body>
<script charset="GBK" src="/js/opensug.js"></script>
<div style="display:none;align=center">

</div>
<script type="text/javascript" src="js/ad.js"></script>
</html>

黑山妖 | 园豆: 3 (Beginner level 1) | 2017-11-15 09:14

I think it would be more convenient to use an open-source component with jQuery-like syntax.

 

<ul[^>]*>\s*(?:<li[^>]*>(?:[^<]+<[^li])+li>\s*)+<\/ul>

 

https://regex101.com/r/Wn94sv/1

并排逗比北边跑 | 园豆: 10 (Beginner level 1) | 2017-11-15 10:58
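As a rough illustration of the parser-based route suggested in this comment, the sketch below uses Python's standard-library `html.parser` instead of a jQuery-like component. That substitution is an assumption made only to keep the example self-contained. The class name `MlistItems` and the shortened `sample` string are hypothetical. It collects the text of each li that sits directly inside ul class="mlist" and ignores li elements in other lists.

```python
from html.parser import HTMLParser


class MlistItems(HTMLParser):
    """Collect the text of each <li> directly inside <ul class="mlist">."""

    def __init__(self):
        super().__init__()
        self.ul_depth = 0   # 1 inside the target <ul>, >1 inside nested <ul>s
        self.in_li = False  # True while inside a top-level <li> of the target
        self.items = []     # one list of text fragments per collected <li>

    def handle_starttag(self, tag, attrs):
        if tag == "ul":
            classes = (dict(attrs).get("class") or "").split()
            if self.ul_depth or "mlist" in classes:
                self.ul_depth += 1
        elif tag == "li" and self.ul_depth == 1:
            self.in_li = True
            self.items.append([])

    def handle_endtag(self, tag):
        if tag == "ul" and self.ul_depth:
            self.ul_depth -= 1
            if self.ul_depth == 0:
                self.in_li = False
        elif tag == "li" and self.ul_depth == 1:
            self.in_li = False

    def handle_data(self, data):
        text = data.strip()
        if self.in_li and text:
            self.items[-1].append(text)


# Hypothetical, shortened version of the page posted above.
sample = """
<ul class="sidenav1">
<li>not wanted</li>
</ul>
<ul class="mlist">
<li>
<div class="T1"><a name='file_title' target="_blank" href="xxxxxx">三度赤水河</a></div>
<div class="dInfo"><a href="www.baidu.com">[下载地址] </a></div>
</li>
<li>
<div class="T1"><a name='file_title' target="_blank" href="xxxxxx">三度赤水河</a></div>
</li>
</ul>
"""

parser = MlistItems()
parser.feed(sample)
parser.close()
items = [" ".join(parts) for parts in parser.items]
print(items)
```

Because the parser tracks tags instead of characters, line breaks and nested tags inside an li do not matter here, which is the main advantage over a single regular expression.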

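@并排逗比北边跑: I really don't know how to do any of this. Once the ul has been matched this way, how do I then get the collection of li elements inside it?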

黑山妖 | 园豆: 3 (Beginner level 1) | 2017-11-15 12:03

@黑山妖: https://stackoverflow.com/questions/20965477/how-can-i-extract-certain-html-tags-e-g-ul-using-regex-with-preg-match-all-in

并排逗比北边跑 | 园豆: 10 (Beginner level 1) | 2017-11-15 12:50
0

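The answer above is correct. To guard against other <li> elements elsewhere on the page, you have to extract the whole <ul> first, and then match the items in a second pass with <li>.+<\/li>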

龙葛格 | 园豆: 569 (Small Shrimp level 3) | 2017-11-14 19:37
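The two-pass idea in this answer could be sketched as follows in Python (the language is an assumption, and the helper name `extract_li_items` and the `page` string are hypothetical). Note one deliberate change from the answer's `<li>.+<\/li>`: the sketch uses a lazy `.*?` together with `re.DOTALL`, because a greedy `.+` would merge several li elements into one match and `.` does not cross line breaks by default. It also assumes that the target ul contains no nested ul or li, which a regular expression cannot handle reliably.

```python
import re


def extract_li_items(html, ul_class="mlist"):
    """Two passes: isolate the target <ul>, then list its <li> blocks."""
    # Pass 1: body of the first <ul> whose class attribute equals ul_class.
    ul_pattern = re.compile(
        r'<ul[^>]*\bclass="' + re.escape(ul_class) + r'"[^>]*>(.*?)</ul>',
        re.DOTALL | re.IGNORECASE,
    )
    ul_match = ul_pattern.search(html)
    if ul_match is None:
        return []
    # Pass 2: every <li>...</li> inside that body, matched lazily.
    return re.findall(
        r"<li\b[^>]*>.*?</li>", ul_match.group(1), re.DOTALL | re.IGNORECASE
    )


# Hypothetical, shortened page with other lists around the target one.
page = (
    '<div id="topbar"><ul>\n<li class="on"><a href="/">home</a></li>\n</ul></div>\n'
    '<ul class="mlist">\n'
    '<li>\n<div class="T1">first</div>\n</li>\n'
    '\n'
    '<li>\n<div class="T1">second</div>\n</li>\n'
    '</ul>\n'
    '<ul class="pg">\n<span class="current">1</span>\n</ul>\n'
)

items = extract_li_items(page)
for item in items:
    print(repr(item))
```

Restricting the second pass to the text captured in the first pass is what keeps li elements from the top bar or other lists out of the result.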