Python scraper gets a 403 when fetching Google search results?

Bounty: 10 Yuandou [Unresolved question]

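I've recently been learning web scraping and found Python quite nice. I wrote a crawler that scrapes Baidu search results, and it worked. I then tried to follow the same pattern to scrape Google's search results, but it failed. Here is my scraping code: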

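import urllib
import urllib2

# Fetch Google search results
# param {word: search keyword, filepath: path of the output file}
# return: None
def searchGoogle(word, filepath):
    # Base search URL
    url = "https://www.google.com.hk/search?"
    # Pretend to be a browser
    user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
    headers = {'User-Agent': user_agent}

    # Pass in the keyword: Google's query parameter is q (Baidu's is wd)
    values = {'q': word}
    # URL-encode the query string
    urlf = url + urllib.urlencode(values)
    # print urlf

    # Fetch the results; the headers must be attached to the request,
    # otherwise urllib2 sends its default "Python-urllib" User-Agent
    request = urllib2.Request(urlf, headers=headers)
    response = urllib2.urlopen(request)
    html = response.read()

    # Save to file (binary mode keeps the bytes exactly as received)
    with open(filepath, 'wb') as f:
        f.write(html)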
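A spoofed User-Agent only helps if it actually travels with the request: with urllib's default opener, a bare `urlopen(url)` call goes out with urllib's own `Python-urllib/x.y` agent, which some sites refuse. Below is a minimal Python 3 sketch (urllib2 became `urllib.request` in Python 3) that builds the same Google query and attaches the header through `Request`. The function name `build_google_request` and the browser string in `DEFAULT_UA` are illustrative assumptions, and the request is only built here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical desktop-browser User-Agent string (an assumption; any current browser UA would do).
DEFAULT_UA = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
              "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36")


def build_google_request(word, user_agent=DEFAULT_UA):
    """Build (but do not send) a Google search request with a User-Agent header."""
    # Google's query parameter is 'q' (Baidu uses 'wd'); urlencode percent-encodes UTF-8.
    url = "https://www.google.com.hk/search?" + urlencode({"q": word})
    # Passing headers to Request is what makes the User-Agent part of the request.
    return Request(url, headers={"User-Agent": user_agent})


req = build_google_request("python 爬虫")
print(req.full_url)
print(req.get_header("User-agent"))  # Request stores header names capitalized
```

The prepared `req` can then be passed to `urllib.request.urlopen(req)` exactly where the original code calls `urlopen`.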

It did not retrieve the search results page; instead I got a search page full of garbled characters.
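Garbled output usually means the bytes were decoded or displayed with the wrong character set, and a 403 surfaces in urllib as an `HTTPError` exception rather than as a normal response. Here is a minimal Python 3 sketch, assuming the server declares its charset in the `Content-Type` header and falling back to UTF-8 when it does not; `decode_body` and `fetch_text` are hypothetical helper names. Google may still answer scripted requests with a 403 or a CAPTCHA page even when a browser User-Agent is sent.

```python
import urllib.error
import urllib.request
from email.message import Message


def decode_body(raw, content_type, default="utf-8"):
    """Decode response bytes using the charset declared in Content-Type."""
    msg = Message()
    msg["Content-Type"] = content_type or "text/html"
    # Returns the lowercased charset parameter, or `default` if none is declared.
    charset = msg.get_content_charset(default)
    try:
        # errors="replace" keeps going if a few bytes don't fit the declared charset.
        return raw.decode(charset, errors="replace")
    except LookupError:
        # Unknown charset name in the header: fall back to the default.
        return raw.decode(default, errors="replace")


def fetch_text(req, timeout=10):
    """Send a prepared Request; return (status, text), with text None on HTTP errors."""
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            body = decode_body(resp.read(), resp.headers.get("Content-Type"))
            return resp.status, body
    except urllib.error.HTTPError as err:
        # A 403 (or any other 4xx/5xx) lands here instead of returning a page.
        return err.code, None
```

If the returned status is 200, the decoded text can be saved with `open(filepath, "w", encoding="utf-8")` so the file's encoding is explicit when it is opened later.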

Tags: python, web crawler

程序猿小叶 | Rookie, Level 2 | Yuandou: 222
Asked: 2014-04-25 11:17

All answers (0)
