
Why is the deny setting in a Scrapy CrawlSpider Rule not taking effect?

[Solved] Resolved on 2019-06-27 13:39
Rule(
    LinkExtractor(
        allow=rule.get("allow", None),
        restrict_xpaths=rule.get("restrict_xpaths"),
        deny=('guba', 'f10', 'data', r'fund.*?\.eastmoney\.com/\d+\.html',
              'quote', r'.*so\.eastmoney.*', 'life', '/gonggao/'),
    ),
    callback=rule.get("callback"),
    follow=rule.get('follow', True),
)

The Rule is configured as above, with deny set to reject links such as guba and data. But during the actual crawl, these links (URLs containing guba) are still being fetched:

2019-06-27 10:33:24 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://guba.eastmoney.com/list,of166401.html> (referer: http://fund.eastmoney.com/LOF_jzzzl.html)
2019-06-27 10:33:24 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://guba.eastmoney.com/list,of164206.html> (referer: http://fund.eastmoney.com/LOF_jzzzl.html)
2019-06-27 10:33:24 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://guba.eastmoney.com/list,of161823.html> (referer: http://fund.eastmoney.com/LOF_jzzzl.html)

What have I configured incorrectly?

会发光 | Rookie Level 2 | Beans: 266
Asked: 2019-06-27 10:52
Best Answer

The arguments to deny must be regular expressions; otherwise a link is only rejected when the URL matches it exactly.
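The filtering behavior can be checked without running a crawl: LinkExtractor's deny patterns are applied as regular-expression searches against each extracted URL. The sketch below (a minimal approximation, not Scrapy's actual internals; the helper `is_denied` is hypothetical) uses `re.search` with the patterns from the question to show which of the logged URLs they would reject:

```python
import re

# deny patterns taken verbatim from the Rule in the question
deny_patterns = [
    r'guba', r'f10', r'data',
    r'fund.*?\.eastmoney\.com/\d+\.html',
    r'quote', r'.*so\.eastmoney.*',
    r'life', r'/gonggao/',
]

def is_denied(url):
    """Return True if any deny pattern matches somewhere in the URL."""
    return any(re.search(pattern, url) for pattern in deny_patterns)

# URL from the crawl log: 'guba' matches as a substring regex
print(is_denied('http://guba.eastmoney.com/list,of166401.html'))  # True

# The referer page: none of the patterns match
print(is_denied('http://fund.eastmoney.com/LOF_jzzzl.html'))      # False
```

If `is_denied` returns True for a URL that is still being crawled, the request is likely not coming through this Rule's LinkExtractor at all (for example, from another Rule, from start_urls, or from a manually yielded Request), since deny only filters links extracted by the Rule it belongs to.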

会发光 | Rookie Level 2 | Beans: 266 | 2019-06-27 13:39