About the Scrapy error "Error while obtaining start requests"

Bounty: 10 points (园豆) [Unresolved]
class SouthwestSpider(scrapy.Spider):
    name = 'southwest'
    # allowed_domains = ['www.xxx.com']
    # start_urls = ['https://www.southwest.com']
    url = 'https://www.southwest.com/api/air-booking/v1/air-booking/page/air/booking/shopping'
    def start_requests(self):
        post_data = {
            "adultPassengersCount": "1",
            "application": "air-booking",
            "departureDate": "2020-10-01",
            "departureTimeOfDay": "ALL_DAY",
            "destinationAirportCode": "BDL",
            "fareType": "USD",
            "int": "HOMEQBOMAIR",
            "originationAirportCode": "LAX",
            "passengerType": "ADULT",
            "reset": "true",
            "returnDate": "2020-11-06",
            "returnTimeOfDay": "ALL_DAY",
            "site": "southwest",
            "tripType": "roundtrip",
        }
        yield scrapy.FormRequest(self.url,formdata=json.dumps(post_data),callback=self.parse)
    def parse(self, response):
        print(response)

Error output

2020-09-30 20:14:37 [scrapy.utils.log] INFO: Scrapy 2.3.0 started (bot: southwestPro)
2020-09-30 20:14:37 [scrapy.utils.log] INFO: Versions: lxml 4.5.2.0, libxml2 2.9.10, cssselect 1.1.0, parsel 1.6.0, w3lib 1.22.0, Twisted 17.9.0, Python 3.6.8 (v3.6.8:3c6b436a57, Dec 24 2018, 02:04:31) - [GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)], pyOpenSSL 19.1.0 (OpenSSL 1.1.1g  21 Apr 2020), cryptography 3.0, Platform Darwin-18.7.0-x86_64-i386-64bit
2020-09-30 20:14:37 [scrapy.utils.log] DEBUG: Using reactor: twisted.internet.selectreactor.SelectReactor
2020-09-30 20:14:37 [scrapy.crawler] INFO: Overridden settings:
{'BOT_NAME': 'southwestPro',
 'NEWSPIDER_MODULE': 'southwestPro.spiders',
 'SPIDER_MODULES': ['southwestPro.spiders'],
 'USER_AGENT': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) '
               'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.121 '
               'Safari/537.36'}
2020-09-30 20:14:37 [scrapy.extensions.telnet] INFO: Telnet Password: 6c139fcac3ae306c
2020-09-30 20:14:37 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.memusage.MemoryUsage',
 'scrapy.extensions.logstats.LogStats']
2020-09-30 20:14:37 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2020-09-30 20:14:37 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2020-09-30 20:14:37 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2020-09-30 20:14:37 [scrapy.core.engine] INFO: Spider opened
2020-09-30 20:14:37 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2020-09-30 20:14:37 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2020-09-30 20:14:37 [scrapy.core.engine] ERROR: Error while obtaining start requests
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/scrapy/core/engine.py", line 129, in _next_request
    request = next(slot.start_requests)
  File "/Users/PycharmProjects/爬虫练手/2.100个简单练手的网站/30.机票/西南航空/southwestPro/southwestPro/spiders/southwest.py", line 26, in start_requests
    yield scrapy.FormRequest(self.url,formdata=json.dumps(post_data),callback=self.parse)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/scrapy/http/request/form.py", line 31, in __init__
    querystr = _urlencode(items, self.encoding)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/scrapy/http/request/form.py", line 72, in _urlencode
    for k, vs in seq
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/scrapy/http/request/form.py", line 72, in <listcomp>
    for k, vs in seq
ValueError: not enough values to unpack (expected 2, got 1)
2020-09-30 20:14:37 [scrapy.core.engine] INFO: Closing spider (finished)
2020-09-30 20:14:37 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'elapsed_time_seconds': 0.012494,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2020, 9, 30, 12, 14, 37, 712716),
 'log_count/ERROR': 1,
 'log_count/INFO': 10,
 'memusage/max': 52596736,
 'memusage/startup': 52596736,
 'start_time': datetime.datetime(2020, 9, 30, 12, 14, 37, 700222)}
2020-09-30 20:14:37 [scrapy.core.engine] INFO: Spider closed (finished)

Could someone tell me where the problem is? Any help is appreciated.

石昊丶 | Beginner, level 1 | points (园豆): 46
Asked: 2020-09-30 20:16
All answers (1)

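Pass post_data directly as the formdata argument; json.dumps is not needed. formdata must be a dict (or an iterable of key/value pairs), not a JSON string.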

A fragment of FormRequest's internal implementation:

if formdata:
    items = formdata.items() if isinstance(formdata, dict) else formdata
    querystr = _urlencode(items, self.encoding)

The _urlencode function joins the dict's key/value pairs into a URL-encoded string, for example: a=1&b=2
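
To make the traceback concrete, here is a minimal sketch that reproduces the same ValueError with the standard library only. The function urlencode_pairs is a hypothetical, simplified stand-in for the fragment quoted above, not Scrapy's actual code, and the two-field post_data is a shortened copy of the question's payload.

```python
import json
from urllib.parse import urlencode


def urlencode_pairs(formdata):
    """Hypothetical, simplified stand-in for the FormRequest fragment."""
    # A dict becomes (key, value) pairs; anything else is iterated as-is.
    items = formdata.items() if isinstance(formdata, dict) else formdata
    # Every element must unpack into exactly two values: key and value.
    pairs = [(k, v) for k, v in items]
    return urlencode(pairs)


post_data = {"originationAirportCode": "LAX", "destinationAirportCode": "BDL"}

# Passing the dict works and yields a query string.
print(urlencode_pairs(post_data))

# json.dumps(post_data) returns a str. Iterating a str yields single
# characters, and a one-character string cannot unpack into (k, v).
try:
    urlencode_pairs(json.dumps(post_data))
except ValueError as exc:
    print("ValueError:", exc)
```

The second call fails on the very first character of the JSON string, which is why the error in the question appears before any request is sent.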

E行者 | points (园豆): 1741 (Little Shrimp, level 3) | 2020-09-30 20:45

Hmm. Before I added json.dumps, the error was "Crawled (400) <POST", and I read online that adding the json module would stop that from happening.

石昊丶 | points (园豆): 46 (Beginner, level 1) | 2020-09-30 20:56

2020-09-30 21:01:00 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2020-09-30 21:01:00 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2020-09-30 21:01:01 [scrapy.core.engine] DEBUG: Crawled (400) <POST https://www.southwest.com/api/air-booking/v1/air-booking/page/air/booking/shopping> (referer: https://www.southwest.com/air/booking/index.html?adultPassengersCount=1&departureDate=2020-09-30&departureTimeOfDay=ALL_DAY&destinationAirportCode=BDL&fareType=USD&int=HOMEQBOMAIR&originationAirportCode=LAX&passengerType=ADULT&reset=true&returnDate=2020-10-03&returnTimeOfDay=ALL_DAY&tripType=roundtrip&validate=true)

This is a site hosted outside China. Could it be an IP or DNS problem? The site does open normally in my browser, though.

石昊丶 | points (园豆): 46 (Beginner, level 1) | 2020-09-30 21:06
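
A note on the 400 in the log above: a 400 status is a reply from the server itself, so the request did reach www.southwest.com, and DNS or basic connectivity is unlikely to be the cause. One possible explanation, which is an assumption and not confirmed anywhere in this thread, is that the endpoint expects a JSON body with Content-Type: application/json, whereas FormRequest sends application/x-www-form-urlencoded. The sketch below uses only the standard library to show how the two bodies differ. The helper names build_form_body and build_json_body are hypothetical, and the three-field post_data is a shortened copy of the question's payload.

```python
import json
from urllib.parse import urlencode

post_data = {
    "originationAirportCode": "LAX",
    "destinationAirportCode": "BDL",
    "tripType": "roundtrip",
}


def build_form_body(data):
    """Hypothetical helper: headers and body in form-encoded style."""
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    return headers, urlencode(data).encode("utf-8")


def build_json_body(data):
    """Hypothetical helper: headers and body in JSON style."""
    headers = {"Content-Type": "application/json"}
    return headers, json.dumps(data).encode("utf-8")


form_headers, form_body = build_form_body(post_data)
json_headers, json_body = build_json_body(post_data)

# Same data, two different wire formats.
print(form_headers["Content-Type"], form_body)
print(json_headers["Content-Type"], json_body)
```

If the JSON assumption holds, the JSON bytes and header could be passed to a plain scrapy.Request with method="POST", body=json_body, headers=json_headers instead of using FormRequest. The site may also require additional headers or cookies, so this alone is not guaranteed to clear the 400.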