
Scrapy spider login with undetected_chromedriver always fails

Bounty: 5 yuandou [Unresolved]

I'm a beginner. Could anyone experienced help me with this? Many thanks.

C:\Users\JJJhr_\AppData\Local\Programs\Python\Python311\python.exe -X pycache_prefix=C:\Users\JJJhr_\AppData\Local\JetBrains\PyCharm2023.3\cpython-cache "D:/PyCharm 2023.3.2/plugins/python/helpers/pydev/pydevd.py" --multiprocess --qt-support=auto --client 127.0.0.1 --port 63728 --file F:\PythonProject\ArticleSpider\main.py
Connected to pydev debugger (build 233.13135.95)
2024-04-28 02:08:29 [scrapy.utils.log] INFO: Scrapy 2.11.1 started (bot: ArticleSpider)
2024-04-28 02:08:29 [scrapy.utils.log] INFO: Versions: lxml 5.2.1.0, libxml2 2.11.7, cssselect 1.2.0, parsel 1.9.1, w3lib 2.1.2, Twisted 24.3.0, Python 3.11.7 (tags/v3.11.7:fa7a6f2, Dec 4 2023, 19:24:49) [MSC v.1937 64 bit (AMD64)], pyOpenSSL 24.1.0 (OpenSSL 3.2.1 30 Jan 2024), cryptography 42.0.5, Platform Windows-10-10.0.19045-SP0
2024-04-28 02:08:30 [scrapy.addons] INFO: Enabled addons:
[]
2024-04-28 02:08:30 [asyncio] DEBUG: Using selector: SelectSelector
2024-04-28 02:08:30 [scrapy.utils.log] DEBUG: Using reactor: twisted.internet.asyncioreactor.AsyncioSelectorReactor
2024-04-28 02:08:30 [scrapy.utils.log] DEBUG: Using asyncio event loop: asyncio.windows_events.WindowsSelectorEventLoop
2024-04-28 02:08:30 [scrapy.extensions.telnet] INFO: Telnet Password: 69dbb14f7af72e05
2024-04-28 02:08:30 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.logstats.LogStats']
2024-04-28 02:08:30 [scrapy.crawler] INFO: Overridden settings:
{'BOT_NAME': 'ArticleSpider',
'FEED_EXPORT_ENCODING': 'utf-8',
'NEWSPIDER_MODULE': 'ArticleSpider.spiders',
'REQUEST_FINGERPRINTER_IMPLEMENTATION': '2.7',
'SPIDER_MODULES': ['ArticleSpider.spiders'],
'TWISTED_REACTOR': 'twisted.internet.asyncioreactor.AsyncioSelectorReactor'}
2024-04-28 02:08:30 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.retry.RetryMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2024-04-28 02:08:30 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2024-04-28 02:08:30 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2024-04-28 02:08:30 [scrapy.core.engine] INFO: Spider opened
2024-04-28 02:08:30 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2024-04-28 02:08:30 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2024-04-28 02:08:30 [undetected_chromedriver.patcher] DEBUG: getting release number from /last-known-good-versions-with-downloads.json
2024-04-28 02:08:31 [undetected_chromedriver.patcher] DEBUG: downloading from https://storage.googleapis.com/chrome-for-testing-public/124.0.6367.91/win32/chromedriver-win32.zip
2024-04-28 02:08:32 [scrapy.core.engine] ERROR: Error while obtaining start requests
Traceback (most recent call last):
  File "C:\Users\JJJhr_\AppData\Local\Programs\Python\Python311\Lib\urllib\request.py", line 1348, in do_open
    h.request(req.get_method(), req.selector, req.data, headers,
  File "C:\Users\JJJhr_\AppData\Local\Programs\Python\Python311\Lib\http\client.py", line 1294, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "C:\Users\JJJhr_\AppData\Local\Programs\Python\Python311\Lib\http\client.py", line 1340, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "C:\Users\JJJhr_\AppData\Local\Programs\Python\Python311\Lib\http\client.py", line 1289, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "C:\Users\JJJhr_\AppData\Local\Programs\Python\Python311\Lib\http\client.py", line 1048, in _send_output
    self.send(msg)
  File "C:\Users\JJJhr_\AppData\Local\Programs\Python\Python311\Lib\http\client.py", line 986, in send
    self.connect()
  File "C:\Users\JJJhr_\AppData\Local\Programs\Python\Python311\Lib\http\client.py", line 1466, in connect
    self.sock = self._context.wrap_socket(self.sock,
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\JJJhr_\AppData\Local\Programs\Python\Python311\Lib\ssl.py", line 517, in wrap_socket
    return self.sslsocket_class._create(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\JJJhr_\AppData\Local\Programs\Python\Python311\Lib\ssl.py", line 1108, in _create
    self.do_handshake()
  File "C:\Users\JJJhr_\AppData\Local\Programs\Python\Python311\Lib\ssl.py", line 1383, in do_handshake
    self._sslobj.do_handshake()
ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\JJJhr_\AppData\Local\Programs\Python\Python311\Lib\site-packages\scrapy\core\engine.py", line 181, in next_request
    request = next(self.slot.start_requests)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "F:\PythonProject\ArticleSpider\ArticleSpider\spiders\jobbole.py", line 24, in start_requests
    browser = uc.Chrome()
              ^^^^^^^^^^^
  File "C:\Users\JJJhr_\AppData\Local\Programs\Python\Python311\Lib\site-packages\undetected_chromedriver\__init__.py", line 258, in __init__
    self.patcher.auto()
  File "C:\Users\JJJhr_\AppData\Local\Programs\Python\Python311\Lib\site-packages\undetected_chromedriver\patcher.py", line 178, in auto
    self.unzip_package(self.fetch_package())
                       ^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\JJJhr_\AppData\Local\Programs\Python\Python311\Lib\site-packages\undetected_chromedriver\patcher.py", line 287, in fetch_package
    return urlretrieve(download_url)[0]
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\JJJhr_\AppData\Local\Programs\Python\Python311\Lib\urllib\request.py", line 241, in urlretrieve
    with contextlib.closing(urlopen(url, data)) as fp:
                            ^^^^^^^^^^^^^^^^^^
  File "C:\Users\JJJhr_\AppData\Local\Programs\Python\Python311\Lib\urllib\request.py", line 216, in urlopen
    return opener.open(url, data, timeout)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\JJJhr_\AppData\Local\Programs\Python\Python311\Lib\urllib\request.py", line 519, in open
    response = self._open(req, data)
               ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\JJJhr_\AppData\Local\Programs\Python\Python311\Lib\urllib\request.py", line 536, in _open
    result = self._call_chain(self.handle_open, protocol, protocol +
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\JJJhr_\AppData\Local\Programs\Python\Python311\Lib\urllib\request.py", line 496, in _call_chain
    result = func(*args)
             ^^^^^^^^^^^
  File "C:\Users\JJJhr_\AppData\Local\Programs\Python\Python311\Lib\urllib\request.py", line 1391, in https_open
    return self.do_open(http.client.HTTPSConnection, req,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\JJJhr_\AppData\Local\Programs\Python\Python311\Lib\urllib\request.py", line 1351, in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error [WinError 10054] An existing connection was forcibly closed by the remote host>
2024-04-28 02:08:32 [scrapy.core.engine] INFO: Closing spider (finished)
2024-04-28 02:08:32 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'elapsed_time_seconds': 1.282641,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2024, 4, 27, 18, 8, 32, 71424, tzinfo=datetime.timezone.utc),
'log_count/DEBUG': 5,
'log_count/ERROR': 1,
'log_count/INFO': 10,
'start_time': datetime.datetime(2024, 4, 27, 18, 8, 30, 788783, tzinfo=datetime.timezone.utc)}
2024-04-28 02:08:32 [scrapy.core.engine] INFO: Spider closed (finished)

Process finished with exit code 0

JJJhr | Beginner, level 1 | Yuandou: 197
Asked: 2024-04-28 02:25
All answers (1)

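Judging from your log, the failure happens while undetected_chromedriver is downloading ChromeDriver, before any login is attempted. The log shows two exceptions, but they describe one and the same failure:

ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host. The connection to storage.googleapis.com was reset during the TLS handshake while ChromeDriver was being downloaded. This can be caused by a network problem or by a problem on the server side. Try running the program again to see whether the error goes away. If it persists, try using a proxy, or check your network settings to make sure the connection works.
urllib.error.URLError: <urlopen error [WinError 10054] An existing connection was forcibly closed by the remote host>. This is the same connection reset, re-raised by urllib as a URLError when the download failed. The fix is the same: run the program again or check your network settings.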
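
If the reset is only occasional, the "run it again" advice can be automated. The following is a minimal sketch, not part of Scrapy or undetected_chromedriver: retry is a hypothetical helper name, and it assumes the failure surfaces as an OSError (both ConnectionResetError and urllib.error.URLError are subclasses of it). If the connection is blocked every time, retrying will not help.

```python
import time


def retry(func, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call func() until it succeeds or the attempts run out.

    Hypothetical helper. Only OSError is retried, which covers
    ConnectionResetError and urllib.error.URLError.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except OSError:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))


# Possible usage in the spider (requires undetected_chromedriver):
#   import undetected_chromedriver as uc
#   browser = retry(lambda: uc.Chrome())
```
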
Also, make sure your ChromeDriver version matches the version of the Chrome browser you are using. A version mismatch can cause compatibility problems.
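
One way to act on this, sketched under assumptions: download a matching ChromeDriver by hand (for example through a browser that can reach the server) and point undetected_chromedriver at it. driver_executable_path and version_main are keyword arguments that, as far as I know, uc.Chrome accepts in the 3.x releases; check them against your installed version. Supplying a local driver is assumed to skip the download in patcher.fetch_package that failed in the log. The driver path is hypothetical, and major_version and make_browser are illustrative names.

```python
def major_version(version: str) -> int:
    """Return the major part of a dotted version string."""
    return int(version.split(".")[0])


def make_browser(driver_path: str, chrome_version: str):
    """Start Chrome with a driver that is already on disk (sketch)."""
    # Imported lazily so the helper above works without the package installed.
    import undetected_chromedriver as uc

    return uc.Chrome(
        driver_executable_path=driver_path,  # hypothetical local path
        version_main=major_version(chrome_version),  # should match the installed Chrome
    )


# Possible usage; replace both values with your own:
#   browser = make_browser(r"F:\drivers\chromedriver.exe", "124.0.6367.91")
```

The version string in the usage comment is the one from the download URL in the log; use the version reported by your own Chrome installation.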

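Finally, if none of the above solves the problem, check your network configuration and firewall settings, or contact your network administrator to resolve the connection problem.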
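
If you do have a proxy available, the download in the traceback goes through urllib.request.urlretrieve, and urllib picks up proxies from environment variables such as https_proxy. A minimal sketch, assuming a proxy at a hypothetical local address; use_proxy is an illustrative name, and the variable must be set before uc.Chrome() is called:

```python
import os
import urllib.request


def use_proxy(proxy_url: str) -> dict:
    """Route urllib's HTTPS requests through proxy_url and return the active proxies."""
    # urllib reads <scheme>_proxy variables; the lowercase name takes precedence.
    os.environ["https_proxy"] = proxy_url
    return urllib.request.getproxies()


# Possible usage; the address is a placeholder for your own proxy:
#   use_proxy("http://127.0.0.1:7890")
#   browser = uc.Chrome()
```

This only affects the driver download made by Python. The proxy that the Chrome browser itself uses is configured separately.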

Technologyforgood | Yuandou: 6468 (Hero, level 5) | 2024-04-28 22:50