This code was written for Python 2.x; running it raises KeyError: 'content'.
The code:
import json
import requests

def get_job_information(pn, keyword):
    url = 'https://www.lagou.com/jobs/positionAjax.json?needAddtionalResult=false'
    data = {"first": "true", "pn": pn, "kd": keyword}
    req = requests.post(url, data=data)
    jobs_data = req.json()
    jobs_data = jobs_data["content"]["positionResult"]["result"]  # KeyError: 'content' raised here
    for job_data in jobs_data:
        print "公司名称:", job_data["companyFullName"]
        print "学历要求:", job_data["education"]
        print "工作年限:", job_data["workYear"]
        print "工资范围:", job_data["salary"]
        print "福利待遇:", job_data["companyLabelList"]

if __name__ == '__main__':
    get_job_information(pn=1, keyword="Python爬虫")
Your IP has probably been blocked. Check what the response actually contains; it most likely has no "content" field.
Yeah, my IP was blocked.
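To confirm what the server actually returned, you can parse the response defensively instead of indexing blindly. A minimal sketch (the function name `extract_jobs` is mine, not from the thread) that surfaces the server's error message instead of raising KeyError:

```python
import json

def extract_jobs(resp_text):
    """Parse the positionAjax response defensively: when the anti-crawler
    kicks in, the JSON has no 'content' key, only a 'msg' field."""
    data = json.loads(resp_text)
    if "content" not in data:
        # Blocked or rate-limited: return the server message rather than crash
        return None, data.get("msg", "unexpected keys: %r" % sorted(data.keys()))
    return data["content"]["positionResult"]["result"], None
```

Printing the error branch would have shown the "操作太频繁" message immediately instead of a bare KeyError.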
The site has anti-crawler protection in place. You could add a login step and then access this URL with the logged-in session.
Do I need to add cookies? I spoofed a browser but still couldn't scrape anything.
@wgq0_0: you need to log in.
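If a full login is too heavy, a common workaround is to GET the search page first with a `requests.Session` so the cookies it sets are carried into the Ajax POST — the same mechanism a login would rely on. A minimal sketch (URLs copied from the thread; whether this still gets past lagou.com's anti-crawler is not guaranteed):

```python
import requests

SEARCH_URL = "https://www.lagou.com/jobs/list_Python%E7%88%AC%E8%99%AB"
AJAX_URL = "https://www.lagou.com/jobs/positionAjax.json?needAddtionalResult=false"

def make_session():
    # A Session persists cookies across requests; default headers make
    # the traffic look like a normal browser visit.
    session = requests.Session()
    session.headers.update({
        "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
                      "(KHTML, like Gecko) Chrome/58.0.3029.81 Safari/537.36",
        "Referer": SEARCH_URL,
    })
    return session

def fetch_jobs(session, pn, keyword):
    # Visit the HTML search page first so the server sets its cookies,
    # then reuse them on the Ajax endpoint.
    session.get(SEARCH_URL, timeout=10)
    resp = session.post(AJAX_URL,
                        data={"first": "true", "pn": pn, "kd": keyword},
                        timeout=10)
    return resp.json()
```

Usage would be `fetch_jobs(make_session(), 1, "Python爬虫")`; a real login would simply seed the same session's cookies differently.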
Directly accessing the URL you wrote fails too: {"success":false,"msg":"您操作太频繁,请稍后再访问","clientIp":""} (the msg means "you are operating too frequently, please try again later").
Your IP got blocked.
@wgq0_0: Increase the interval between requests.
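Spacing out requests is easy with a randomized pause between pages. A small helper (my own naming, not from the thread) that wraps any per-page fetch function:

```python
import random
import time

def polite_fetch(fetch, pages, min_delay=3.0, max_delay=8.0):
    """Call fetch(pn) for each page number, sleeping a random interval
    between calls so requests don't arrive at a machine-gun rate."""
    results = []
    for i, pn in enumerate(pages):
        if i:  # no need to sleep before the very first request
            time.sleep(random.uniform(min_delay, max_delay))
        results.append(fetch(pn))
    return results
```

For example, `polite_fetch(lambda pn: get_job_information(pn, "Python爬虫"), range(1, 6))` would crawl five pages with a few seconds between each.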
1. No JS rendering
2. You ran into the anti-crawler
You don't even send headers; isn't that openly telling the site you're a crawler?
headers = {
"Accept": "application/json, text/javascript, */*; q=0.01",
"Accept-Encoding": "gzip, deflate, br",
"Accept-Language": "zh-CN,zh;q=0.8",
"Host": "www.lagou.com",
"X-Requested-With":"XMLHttpRequest",
"Origin": "https://www.lagou.com",
"Referer": "https://www.lagou.com/jobs/list_Python%E7%88%AC%E8%99%AB?labelWords=&fromSearch=true&suginput=",
'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
"User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.81 Safari/537.36",
}
session = requests.Session()  # keeps cookies between requests
url = 'https://www.lagou.com/jobs/positionAjax.json?needAddtionalResult=false'
data = {"first": "true", "pn": pn, "kd": keyword}
print session.cookies
req = session.post(url, data=data, headers=headers)
print req.content