
Why is the deny setting in a Scrapy CrawlSpider Rule not taking effect?

[Solved] Resolved on 2019-06-27 13:39
Rule(
    LinkExtractor(
        allow=rule.get("allow", None),
        restrict_xpaths=rule.get("restrict_xpaths"),
        deny=('guba', 'f10', 'data', r'fund.*?\.eastmoney\.com/\d+\.html',
              'quote', r'.*so\.eastmoney.*', 'life', '/gonggao/'),
    ),
    callback=rule.get("callback"),
    follow=rule.get('follow', True),
)

The Rule is configured as above, with deny set to reject links such as guba and data. But during the actual crawl, these links (URLs containing guba) are still being fetched:

2019-06-27 10:33:24 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://guba.eastmoney.com/list,of166401.html> (referer: http://fund.eastmoney.com/LOF_jzzzl.html)
2019-06-27 10:33:24 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://guba.eastmoney.com/list,of164206.html> (referer: http://fund.eastmoney.com/LOF_jzzzl.html)
2019-06-27 10:33:24 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://guba.eastmoney.com/list,of161823.html> (referer: http://fund.eastmoney.com/LOF_jzzzl.html)

What have I configured incorrectly?

会发光 | Rookie Level 2 | Beans: 266
Asked: 2019-06-27 10:52
Best Answer

The arguments to deny must be regular expressions; otherwise a link is only rejected when the URL matches it exactly.
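The filtering behavior can be checked without running a crawl: LinkExtractor's deny patterns are applied as regular-expression searches against each extracted URL. The sketch below (a minimal approximation, not Scrapy's actual internals; the helper `is_denied` is hypothetical) uses `re.search` with the patterns from the question to show which of the logged URLs they would reject:

```python
import re

# deny patterns taken verbatim from the Rule in the question
deny_patterns = [
    r'guba', r'f10', r'data',
    r'fund.*?\.eastmoney\.com/\d+\.html',
    r'quote', r'.*so\.eastmoney.*',
    r'life', r'/gonggao/',
]

def is_denied(url):
    """Return True if any deny pattern matches somewhere in the URL."""
    return any(re.search(pattern, url) for pattern in deny_patterns)

# URL from the crawl log: 'guba' matches as a substring regex
print(is_denied('http://guba.eastmoney.com/list,of166401.html'))  # True

# The referer page: none of the patterns match
print(is_denied('http://fund.eastmoney.com/LOF_jzzzl.html'))      # False
```

If `is_denied` returns True for a URL that is still being crawled, the request is likely not coming through this Rule's LinkExtractor at all (for example, from another Rule, from start_urls, or from a manually yielded Request), since deny only filters links extracted by the Rule it belongs to.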

会发光 | Rookie Level 2 | Beans: 266 | 2019-06-27 13:39