This is my Scrapy spider module. When I run the crawler, each yielded item contains all of the page's content at once instead of iterating one article at a time.
What is the reason? Can adding an index count solve this problem?
def parse(self, response):
    for article in response.xpath('//div[@class="post_item"]'):
        item = TextItem()
        item['title'] = article.xpath('//h3/a/text()').extract()
        item['author'] = article.xpath('//div[@class="post_item_foot"]/a/text()').extract()
        item['rindex'] = article.xpath('//span[@class="diggnum"]/text()').extract()
        item['url'] = article.xpath('//h3/a/@href').extract()
        yield item
I'm using Python 3.4, and Scrapy is the latest version.
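The likely cause is not the loop but the XPath expressions inside it: an expression starting with `//` is evaluated from the document root even when called on a sub-selector like `article`, so every item collects the titles, authors, and URLs of *all* articles. Prefixing the inner expressions with `.` (e.g. `article.xpath('.//h3/a/text()')`) restricts them to the current article. Here is a small sketch demonstrating this behavior with lxml (the library Scrapy's selectors are built on); the sample HTML is a made-up stand-in for the cnblogs page structure:

```python
# Demonstrates why a leading '//' in an XPath run on a sub-element
# still searches the WHOLE document, while './/' stays relative.
from lxml import html

# Hypothetical minimal markup mimicking the post_item structure.
doc = html.fromstring("""
<div>
  <div class="post_item"><h3><a href="/a">First</a></h3></div>
  <div class="post_item"><h3><a href="/b">Second</a></h3></div>
</div>
""")

for article in doc.xpath('//div[@class="post_item"]'):
    absolute = article.xpath('//h3/a/text()')   # searches from document root
    relative = article.xpath('.//h3/a/text()')  # searches within this article
    print(absolute, relative)
# → ['First', 'Second'] ['First']
# → ['First', 'Second'] ['Second']
```

Accordingly, changing each field to a relative expression such as `article.xpath('.//h3/a/text()').extract()` should make the spider yield one article per item; no index counter is needed. Using `extract_first()` instead of `extract()` would additionally give you a single string per field rather than a one-element list.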
import scrapy
from text.items import TextItem
import time

header = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.59 Safari/537.36'}

class ConblogSpider(scrapy.Spider):
    name = "conblog"
    allowed_domains = ["cnblogs.com"]
    start_urls = (
        'http://www.cnblogs.com/pick/',
    )