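How to use the file_get_contents() function

Bounty: 30 points [Resolved], resolved on 2012-02-22 23:00

I wrote a piece of PHP code that works like a simple web crawler: it collects the second-level page URLs from a first-level page and then searches those second-level pages for a keyword. The code is as follows: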

 1 <?php
 2     $search=$_GET['search'];
 3     echo $search;
 4     $str = file_get_contents("http://www.sina.com");  // fetch the first-level page
 5     $pat = '/<a(.*?)href="(.*?)"(.*?)>(.*?)<\/a>/i';  // pattern that matches anchor tags
 6
 7     preg_match_all($pat, $str, $m);     // run the regex match
 8     $m_unique=array_unique($m);           // remove duplicates
 9
10     foreach ($m as $value){              // walk the match results
11         foreach ($value as $val) {
12             $model='/"http(.*?)"/';
13             preg_match($model, $val,$res);    // extract the hyperlink
14             $res_unique=array_unique($res);    // remove duplicates
15             if ($res_unique[0]!=null){
16                 echo $res_unique[0];
17                 $num=$num+1;
18                 $pg=file_get_contents($res_unique[0]);// read the second-level page
19                 if ($pg==null){
20                     echo 'error';
21                 }else {
22                     echo 'success';
23                 }
24                 echo "<br>";
25             }
26         }
27     }
28 ?>

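The second-level URLs are extracted correctly, but file_get_contents() cannot read any of the second-level pages; every result comes back empty. I have thought about it for a long time and cannot figure out why orz

Could the experts here tell me how to implement this small program and how to solve this problem? Thank you all!

php, crawler page scraping, file_get_contents, function usage

天涯无尘 | Beginner Level 1 | Points: 196
Asked: 2012-02-22 20:45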


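Best answer

Reference article: "php文件下载中jpg文件为空文件的问题" (JPG files come out empty when downloading files in PHP)

Points awarded: 30

dudu | Expert Level 7 | Points: 24802 | 2012-02-22 20:55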

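But this is a problem with downloading an HTML file, not a JPG file, and the fetch on line 4 does work?

天涯无尘 | Points: 196 (Beginner Level 1) | 2012-02-22 21:04

@天涯无尘: Print $res_unique[0] and see what URL you actually get.

dudu | Points: 24802 (Expert Level 7) | 2012-02-22 21:10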

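@天涯无尘: Wow, so the Q&A section supports replies this long. Dude, you're flooding the screen, aren't you :)

LCM | Points: 6876 (Master Level 5) | 2012-02-22 22:03

@LCM: I flooded it by accident T_T, this problem has been bothering me too much...

天涯无尘 | Points: 196 (Beginner Level 1) | 2012-02-22 22:08

@天涯无尘: Why so many? Testing with a single URL is enough.

That reply is too long; please trim it.

dudu | Points: 24802 (Expert Level 7) | 2012-02-22 22:15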

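@dudu: That one was tested with Sina. Judging from the printed output, the argument passed to file_get_contents() looks fine, but it just doesn't return the page content; every result is null. I can't tell whether this is a problem in the code or a restriction on how file_get_contents() can be used.

天涯无尘 | Points: 196 (Beginner Level 1) | 2012-02-22 22:20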
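
On the question of why everything comes back as null: file_get_contents() returns false on failure rather than null, and a loose comparison such as $pg==null treats false, null, and an empty string alike. The following is a minimal sketch, assuming PHP 7 or later and a hypothetical helper named fetch_page (not part of the original code), that separates a failed read from an empty page and surfaces PHP's last error message:

```php
<?php
// Hypothetical helper, not from the original post.
// Returns ['body' => string|null, 'error' => string|null].
function fetch_page(string $url): array
{
    error_clear_last();                 // forget any earlier warning
    $body = @file_get_contents($url);   // "@" hides the warning; it is reported below instead

    if ($body === false) {              // strict check: false means the read failed
        $last = error_get_last();
        return ['body' => null, 'error' => $last['message'] ?? 'unknown error'];
    }
    return ['body' => $body, 'error' => null];
}

// Usage: a string that still carries literal quote characters is not a URL,
// so PHP treats it as a local path and the read fails with a message.
$result = fetch_page('"http://www.cnblogs.com"');
echo $result['error'] ?? 'success', "\n";
```

Printing the error message instead of a bare 'error' usually shows right away whether the failure is a bad argument, a disabled URL wrapper, or a network problem.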

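@天涯无尘: You could test with a different URL, for example: file_get_contents("http://q.cnblogs.com/q/32185/");

dudu | Points: 24802 (Expert Level 7) | 2012-02-22 22:27

@dudu: Here is the test result. The trailing "error" is the result of checking whether the return value of file_get_contents() is empty.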

"http://www.cnblogs.com"error
"http://www.cnblogs.com"error
"http://home.cnblogs.com/"error
"http://home.cnblogs.com/followees/"error
"http://home.cnblogs.com/followers/"error
"http://home.cnblogs.com/feed/all/"error
"http://space.cnblogs.com/msg/recent"error
"http://home.cnblogs.com/ing/"error
"http://home.cnblogs.com/blog/"error
"http://home.cnblogs.com/group/newpost/"error
"http://home.cnblogs.com/group/"error
"http://news.cnblogs.com/n/publish"error
"http://home.cnblogs.com/news/"error
"http://q.cnblogs.com/"error
"http://home.cnblogs.com/wz/"error
"http://home.cnblogs.com/job/myresume/"error
"http://home.cnblogs.com/job/"error
"http://home.cnblogs.com/kb/"error
"http://space.cnblogs.com/forum/public"error
"http://www.cnblogs.com/goodbye305/archive/2011/03/31/2000711.html"error
"http://q.cnblogs.com/q/32185/"error
"http://www.cnblogs.com/AboutUS.aspx"error
"http://www.cnblogs.com/SiteMap.aspx"error
"http://www.cnblogs.com/ContactUs.aspx"error
"http://www.cnblogs.com/ad.aspx"error
"http://www.cnblogs.com"error

天涯无尘 | Points: 196 (Beginner Level 1) | 2012-02-22 22:31
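
One thing stands out in the output above: every URL is printed together with its double quotes. The pattern '/"http(.*?)"/' places the quotes outside the capture group, so $res[0] (the whole match) includes the quote characters, and file_get_contents() is then handed a string that is not a valid URL. The following is a minimal sketch of one way to avoid this, assuming a hypothetical helper extract_http_links (not in the original code) that captures only the URL itself and de-duplicates the flat list of links:

```php
<?php
// Hypothetical helper, not from the original post.
// Returns the unique absolute http/https URLs found in href="..." attributes.
function extract_http_links(string $html): array
{
    // The parentheses sit inside the quotes, so group 1 holds the bare URL.
    preg_match_all('~<a\b[^>]*\bhref="(https?://[^"]+)"~i', $html, $m);

    // $m[1] is a flat list of strings, which array_unique can compare correctly.
    return array_values(array_unique($m[1]));
}

// Usage with a small inline sample instead of a live page.
$sample = '<a href="http://www.cnblogs.com">Home</a> '
        . '<a class="nav" href="http://www.cnblogs.com">Home again</a>';
foreach (extract_http_links($sample) as $url) {
    echo $url, "\n";   // prints http://www.cnblogs.com once, without quotes
}
```

Each returned string can be passed to file_get_contents() directly. A regular expression is only a rough way to read HTML; it is used here because the original code takes the same approach.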

@天涯无尘: See this:

I fixed this issue on my server (running PHP 5.3.3 on Fedora 14) by removing the --with-curlwrapper from the PHP configuration and rebuilding it.

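dudu | Points: 24802 (Expert Level 7) | 2012-02-22 22:36

@dudu: Is it a configuration problem? Do I need to enable php_curl.dll? But it still doesn't work after enabling it... Is there another way to fetch the content of other pages?

天涯无尘 | Points: 196 (Beginner Level 1) | 2012-02-22 22:50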
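
As for other ways to fetch a page: PHP's cURL extension is a common alternative to file_get_contents(). The following is a minimal sketch, assuming the curl extension may or may not be loaded; the helper name fetch_with_curl and the 10-second default timeout are illustrative choices, not something from this thread:

```php
<?php
// Hypothetical helper, not from the original post.
// Returns ['body' => string|null, 'error' => string|null].
function fetch_with_curl(string $url, int $timeoutSeconds = 10): array
{
    if (!function_exists('curl_init')) {
        return ['body' => null, 'error' => 'curl extension is not loaded'];
    }

    $ch = curl_init($url);
    if ($ch === false) {
        return ['body' => null, 'error' => 'curl_init failed'];
    }

    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,             // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,             // follow HTTP redirects
        CURLOPT_CONNECTTIMEOUT => $timeoutSeconds,  // limit on establishing the connection
        CURLOPT_TIMEOUT        => $timeoutSeconds,  // limit on the whole transfer
    ]);

    $body = curl_exec($ch);
    if ($body === false) {                          // strict check, as with file_get_contents()
        return ['body' => null, 'error' => curl_error($ch)];
    }
    return ['body' => $body, 'error' => null];
}

// Usage (commented out because it needs network access):
// $result = fetch_with_curl('http://q.cnblogs.com/q/32185/');
// echo $result['error'] ?? 'success', "\n";
```

The URL passed in must still be a clean string without surrounding quote characters; switching the transport does not help if the argument itself is malformed.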

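@天涯无尘: It is probably a configuration problem. I don't know of any other method either.

dudu | Points: 24802 (Expert Level 7) | 2012-02-22 22:57

@dudu: OK, I'll look into it some more then. Thanks a lot for today ^_^

天涯无尘 | Points: 196 (Beginner Level 1) | 2012-02-22 22:59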
