
Java crawler mini-program error: `CloseableHttpResponse response = httpClient.execute(httpGet);` fails to compile

Bounty: 5 beans [Unresolved question]


import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClientBuilder;
import org.apache.http.util.EntityUtils;

public class crawlHtml {
    public static void main(String[] args) {
    }

    // Crawl the page data
    public static String pickData(String url) {
        try {
            CloseableHttpClient httpClient = HttpClientBuilder.create().build();
            HttpGet httpGet = new HttpGet(url);
            CloseableHttpResponse response = httpClient.execute(httpGet); // This line reports the error; I pasted it at the bottom
            return EntityUtils.toString(response.getEntity(), "UTF-8");
        } catch (Exception e) {
            e.printStackTrace();
        }
        return null; // the method is declared to return String, so all paths must return
    }
}

Error:

Multiple markers at this line
- The type org.apache.http.HttpResponse cannot be resolved. It is indirectly referenced from required .class files
- The type org.apache.http.HttpHost cannot be resolved. It is indirectly referenced from required .class files
- The type org.apache.http.protocol.HttpContext cannot be resolved. It is indirectly referenced from required .class files

I'm on Eclipse with JDK 1.8, and other projects run fine, so it shouldn't be a compatibility problem.

I just want to write a simple crawler and have been stuck on this for a long time. Could someone take a look? Thanks!
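(For reference: this "indirectly referenced from required .class files" error usually means that httpcore, the library httpclient itself depends on, is missing from the build path — `HttpResponse`, `HttpHost`, and `HttpContext` all live in httpcore, not httpclient. A minimal Maven sketch; the version numbers are assumptions and should match the httpclient release actually in use:)

```xml
<dependencies>
  <dependency>
    <groupId>org.apache.httpcomponents</groupId>
    <artifactId>httpclient</artifactId>
    <version>4.5.3</version>
    <!-- Maven pulls in the matching httpcore transitively; when adding
         jars to Eclipse by hand, httpcore-4.4.x.jar must be added to the
         build path alongside httpclient-4.5.x.jar -->
  </dependency>
</dependencies>
```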

丶在雨中漫步 | Beginner (Level 1) | Beans: 197
Asked: 2017-12-25 17:28
All answers (2)

Try

import org.apache.http.HttpResponse;
dudu | Beans: 38805 (Expert, Level 7) | 2017-12-25 22:02

That doesn't work.

丶在雨中漫步 | Beans: 197 (Beginner, Level 1) | 2017-12-29 13:14

Crawlers are better written in Python — you'll get it at a glance. Here's a small demo for you; give me some beans:

from urllib import request
from bs4 import BeautifulSoup  # Beautiful Soup extracts structured data from HTML/XML files

# Build request headers to mimic a browser visit
url = "http://www.jianshu.com"
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36'}
page = request.Request(url, headers=headers)
page_info = request.urlopen(page).read().decode('utf-8')  # open the URL and read the response body

# Parse the fetched content with BeautifulSoup, using html.parser as the parser
soup = BeautifulSoup(page_info, 'html.parser')
# Pretty-print the HTML
print(soup.prettify())

titles = soup.find_all('a', 'title')  # find all <a> tags with class='title'

# Print each matched <a> tag's text and the article link
for title in titles:
    print(title.string)
    print("http://www.jianshu.com" + title.get('href'))

# open() reads/writes files; the with statement closes the file automatically
with open(r"F:\PhyWorkSpeace\aaa.txt", "w") as file:  # open/create a txt file named aaa for writing
    for title in titles:
        file.write(title.string + '\n')
        file.write("http://www.jianshu.com" + title.get('href') + '\n\n')
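(The demo above needs a live network connection and the third-party Beautiful Soup package. The same idea — pulling `class="title"` anchors out of HTML — can be sketched with only the standard library's `html.parser`; the sample HTML string here is made up for illustration:)

```python
from html.parser import HTMLParser

# Collect (text, href) pairs for every <a class="title"> tag,
# mirroring what soup.find_all('a', 'title') returns above.
class TitleLinkParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []    # collected (title, href) pairs
        self._href = None  # href of the <a class="title"> we are currently inside
        self._text = []    # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'a' and 'title' in attrs.get('class', '').split():
            self._href = attrs.get('href', '')
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == 'a' and self._href is not None:
            self.links.append((''.join(self._text).strip(), self._href))
            self._href = None

# Made-up sample page standing in for a real front page
html = '<a class="title" href="/p/abc">First post</a><a href="/other">skip</a>'
parser = TitleLinkParser()
parser.feed(html)
print(parser.links)  # [('First post', '/p/abc')]
```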
DanBrown | Beans: 1496 (Level 3) | 2017-12-27 15:10

I know Python is handier for crawlers, but I'm asking about Java here, man.

丶在雨中漫步 | Beans: 197 (Beginner, Level 1) | 2017-12-29 13:14