
Request: a crawler for the subtitles (transcripts) of all Khan Academy videos

Bounty: 200 园豆 [Unresolved question]

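Could someone write a crawler and publish it on GitHub? I can donate 30 yuan.
Only two languages are needed: EN and RU.
The crawler should collect the transcripts of all Khan Academy videos.
Categories to crawl: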

EN

  1. MATH: HIGH SCHOOL & COLLEGE https://www.khanacademy.org/math
  2. TEST PREP https://www.khanacademy.org/test-prep
  3. SCIENCE https://www.khanacademy.org/science
  4. COMPUTING https://www.khanacademy.org/computing
  5. ARTS & HUMANITIES https://www.khanacademy.org/humanities
  6. ECONOMICS https://www.khanacademy.org/economics-finance-domain
  7. READING & LANGUAGE ARTS https://www.khanacademy.org/ela
  8. LIFE SKILLS https://www.khanacademy.org/college-careers-more
  9. PARTNER COURSES https://www.khanacademy.org/partner-content

RU

  1. МАТЕМАТИКА https://ru.khanacademy.org/math
  2. ЕСТЕСТВЕННЫЕ НАУКИ https://ru.khanacademy.org/science
  3. ЭКОНОМИКА И ФИНАНСЫ https://ru.khanacademy.org/economics-finance-domain
  4. ИНФОРМАТИКА https://ru.khanacademy.org/computing
  5. ИСКУССТВО И ГУМАНИТАРНЫЕ НАУКИ https://ru.khanacademy.org/humanities
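
The two category lists above can be captured as a small start-URL table for a crawler. This is only a minimal sketch: the names `BASE_URLS`, `START_PATHS` and `start_urls` are illustrative and not part of the original post, and the paths are copied from the lists as written.

```python
# Start URLs taken from the EN and RU category lists in the question.
# BASE_URLS, START_PATHS and start_urls are illustrative names.
BASE_URLS = {
    "EN": "https://www.khanacademy.org",
    "RU": "https://ru.khanacademy.org",
}

START_PATHS = {
    "EN": [
        "math", "test-prep", "science", "computing", "humanities",
        "economics-finance-domain", "ela", "college-careers-more",
        "partner-content",
    ],
    "RU": [
        "math", "science", "economics-finance-domain", "computing",
        "humanities",
    ],
}


def start_urls(language):
    """Return the full category URLs for one language code ("EN" or "RU")."""
    base = BASE_URLS[language]
    return [f"{base}/{path}" for path in START_PATHS[language]]


if __name__ == "__main__":
    for lang in BASE_URLS:
        for url in start_urls(lang):
            print(lang, url)
```

Keeping the host per language separate from the paths reflects that the RU pages live on their own subdomain rather than under a language path.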

Add ScreenShots

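Crawl steps: go to math, then into a second-level directory.

  1. Early math review > open Unit 1 > click the play icon. The "Video transcript" section at the bottom of the page is the subtitle text.
  2. Merge the transcripts of every section under Early math review into one file, e.g. Early math review.txt.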
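
The merge step (one text file per course) can be sketched as below. It assumes the transcripts have already been collected as (video title, transcript text) pairs in course order; how they are fetched is out of scope here. The helper names `safe_filename` and `merge_transcripts` and the `## ` title marker are illustrative choices, not something the post specifies.

```python
import re
from pathlib import Path


def safe_filename(title):
    """Replace characters that are invalid in file names on common systems."""
    return re.sub(r'[\\/:*?"<>|]+', "_", title).strip()


def merge_transcripts(course_title, transcripts, out_dir="."):
    """Write all (video_title, text) pairs of one course into <course_title>.txt.

    `transcripts` is assumed to be in course order already; this function
    only concatenates and writes, it does not download anything.
    """
    out_path = Path(out_dir) / f"{safe_filename(course_title)}.txt"
    with out_path.open("w", encoding="utf-8") as f:
        for video_title, text in transcripts:
            # One block per video: a title line, the transcript, a blank line.
            f.write(f"## {video_title}\n{text.strip()}\n\n")
    return out_path
```

Calling `merge_transcripts("Early math review", pairs)` would produce `Early math review.txt` in the current directory. Writing as UTF-8 matters for the RU transcripts.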





Why I need this crawler

I will use the transcripts to build study lists for AnkiDroid or https://github.com/VaibhavCodeClub/learn.

用爱为教育做贡献 | Beginner level 1 | 园豆: 2
Asked: 2024-08-10 13:12
All answers (1)

import requests
from bs4 import BeautifulSoup

languages = ["EN", "RU"]

for language in languages:
    # Assumption carried over from this answer: the URL pattern and the CSS
    # class names below have not been confirmed against the live site.
    url = f"https://www.khanacademy.org/{language}/video-transcripts"
    try:
        response = requests.get(url, timeout=10)
        response.raise_for_status()  # Check that the request succeeded
        soup = BeautifulSoup(response.content, "html.parser")
        transcripts = soup.find_all("div", class_="video-transcript")
        for transcript in transcripts:
            try:
                title = transcript.find("h3", class_="video-title").text.strip()
                link = transcript.find("a", class_="video-link")["href"]
                print(f"Language: {language}")
                print(f"Video title: {title}")
                print(f"Video link: {link}")
                print("=" * 50)
            except (AttributeError, TypeError, KeyError):
                # find() returned None, or the <a> tag has no href attribute
                print(f"Error reading a title or link on the {language} page")
    except requests.exceptions.RequestException as e:
        print(f"Error fetching the {language} page: {e}")

高级铲屎官 | 园豆: 202 (Novice level 2) | 2024-08-14 09:04