
How can I batch-download the file links on a web page?

0
[Unresolved question]

<div class="bookMl"><strong>卷一·上焦篇</strong></div>
<div style=" clear:both; overflow:hidden; height:auto;">

          <span><a href="https://so.gushiwen.cn/guwen/bookv_46653FD803893E4FB26CC490CA53BD64.aspx">序</a></span>
           
          <span><a href="https://so.gushiwen.cn/guwen/bookv_46653FD803893E4F8AD41D06CFD8C8C6.aspx">原病篇</a></span>
           
          <span><a href="https://so.gushiwen.cn/guwen/bookv_46653FD803893E4F8F3A8474F1C283B5.aspx">风温、温热、温疫、温</a></span>
           
          <span><a href="https://so.gushiwen.cn/guwen/bookv_46653FD803893E4F0D406BBBD3795F62.aspx">暑温</a></span>
           
          <span><a href="https://so.gushiwen.cn/guwen/bookv_46653FD803893E4FB806330877EE2459.aspx">伏暑</a></span>
           
          <span><a href="https://so.gushiwen.cn/guwen/bookv_46653FD803893E4FDD4DA0A5ACF2A66F.aspx">湿温、寒湿</a></span>
           
          <span><a href="https://so.gushiwen.cn/guwen/bookv_46653FD803893E4F3E0A97159149E65A.aspx">温疟</a></span>
           
          <span><a href="https://so.gushiwen.cn/guwen/bookv_46653FD803893E4FC46098CBC139BFC1.aspx">秋燥</a></span>
           
          <span><a href="https://so.gushiwen.cn/guwen/bookv_46653FD803893E4F3DA65DC503838A79.aspx">补秋燥胜气论</a></span>
          
    </div>
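
Judging from this snippet, the chapter links all sit in <span><a> pairs inside the div, so they can be matched with a CSS selector. A minimal sketch of pulling out just those links (the 'span > a' selector is an assumption based on the snippet above, not something stated in the question):

    import requests
    from bs4 import BeautifulSoup

    # Entry page, taken from the first link in the snippet above
    url = 'https://so.gushiwen.cn/guwen/bookv_46653FD803893E4FB26CC490CA53BD64.aspx'
    soup = BeautifulSoup(requests.get(url).text, 'html.parser')

    # 'span > a' matches the <span><a> pairs shown in the snippet;
    # the selector is an assumption based on that structure
    for a in soup.select('span > a'):
        print(a.get_text(strip=True), a['href'])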
guanghui2022 | Novice Level 2 | Beans: 204
Asked: 2023-03-19 12:08
All answers (3)
0
ycyzharry | Beans: 25653 (Master Level 7) | 2023-03-19 15:14
0

Use a Python web-scraping tool.
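
For instance, a minimal sketch along those lines, assuming the requests and beautifulsoup4 packages and the page URL from the question (the 'bookv_' substring filter is an assumption based on the chapter URLs shown there):

    import requests
    from bs4 import BeautifulSoup

    url = 'https://so.gushiwen.cn/guwen/bookv_46653FD803893E4FB26CC490CA53BD64.aspx'
    soup = BeautifulSoup(requests.get(url).text, 'html.parser')

    # Collect only the chapter links; the 'bookv_' filter is an
    # assumption based on the URLs in the question's snippet
    links = [a['href'] for a in soup.find_all('a', href=True)
             if 'bookv_' in a['href']]
    print('\n'.join(links))

Install the dependencies first with pip install requests beautifulsoup4.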

快乐的凡人721 | Beans: 3918 (Veteran Level 4) | 2023-03-19 20:53
0

import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

# Page address
url = 'https://so.gushiwen.cn/guwen/bookv_46653FD803893E4FB26CC490CA53BD64.aspx'

# Fetch the page source with the requests library
response = requests.get(url)

# Parse the source with BeautifulSoup and collect the href attribute
# of every <a> tag
soup = BeautifulSoup(response.text, 'html.parser')
link_list = []
for link in soup.find_all('a'):
    href = link.get('href')
    if href:
        link_list.append(href)

# Save the links to a text file, one link per line
with open('links.txt', 'w') as f:
    for link in link_list:
        f.write(link + '\n')

# Batch-download the links
for link in link_list:
    # Resolve relative hrefs against the page URL and strip whitespace
    target = urljoin(url, link.strip())
    if not target.startswith('http'):
        continue  # skip javascript: and other non-HTTP links
    response = requests.get(target)
    file_name = target.split('/')[-1]  # derive a file name from the URL
    with open(file_name, 'wb') as f:
        f.write(response.content)
In this code, the link-scraping part and the batch-download part run in two separate for loops, so the links can first be scraped and saved to a text file, and then downloaded in bulk by the second loop.
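
If the downloaded pages should get readable names, one possible variation is to name each file after its link text and pause between requests; a sketch (the one-second delay, the 'bookv_' filter, and the use of the anchor text as file name are assumptions, not part of the answer above):

    import time
    import requests
    from bs4 import BeautifulSoup

    url = 'https://so.gushiwen.cn/guwen/bookv_46653FD803893E4FB26CC490CA53BD64.aspx'
    soup = BeautifulSoup(requests.get(url).text, 'html.parser')

    for a in soup.find_all('a', href=True):
        href = a['href']
        if 'bookv_' not in href:  # assumed filter for chapter pages
            continue
        page = requests.get(href)
        # Name the file after the chapter title in the link text; this is
        # an assumed convenience, falling back to the last URL segment
        name = a.get_text(strip=True) or href.split('/')[-1]
        with open(name + '.html', 'wb') as f:
            f.write(page.content)
        time.sleep(1)  # assumed polite delay between requests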

没有烟抽的日子 | Beans: 260 (Novice Level 2) | 2023-03-20 15:12