
Garbled output when scraping data with Python

[Open question]
# -*- coding: utf-8 -*-
import re
import urllib2

def get_onepage(url):
    response=urllib2.urlopen(url)
    html=response.read()
    print html
    return html
  
c=get_onepage('http://maoyan.com/board/4')

results=re.findall('<p.*?title="(.*?)".*?</p>.*?>(.*?)</p>',c,re.S)
print(results)

Output:

[('\xe9\x9c\xb8\xe7\x8e\x8b\xe5\x88\xab\xe5\xa7\xac', '\n                \xe4\xb8\xbb\xe6\xbc\x94\xef\xbc\x9a\xe5\xbc\xa0\xe5\x9b\xbd\xe8\x8d\xa3,\xe5\xbc\xa0\xe4\xb8\xb0\xe6\xaf\x85,\xe5\xb7\xa9\xe4\xbf\x90\n        '), ('\xe8\x82\x96\xe7\x94\xb3\xe5\x85\x8b\xe7\x9a\x84\xe6\x95\x91\xe8\xb5\x8e', '\n                \xe4\xb8\xbb\xe6\xbc\x94\xef\xbc\x9a\xe8\x92\x82\xe5\xa7\x86\xc2\xb7\xe7\xbd\x97\xe5\xae\xbe\xe6\x96\xaf,\xe6\x91\xa9\xe6\xa0\xb9\xc2\xb7\xe5\xbc\x97\xe9\x87\x8c\xe6\x9b\xbc,\xe9\xb2\x8d\xe5\x8b\x83\xc2\xb7\xe5\x86\x88\xe9\xa1\xbf\n        '), ('\xe7\xbd\x97\xe9\xa9\xac\xe5\x81\x87\xe6\x97\xa5', '\n                \xe4\xb8\xbb\xe6\xbc\x94\xef\xbc\x9a\xe6\xa0\xbc\xe5\x88\xa9\xe9\xab\x98\xe5\x88\xa9\xc2\xb7\xe6\xb4\xbe\xe5\x85\x8b,\xe5\xa5\xa5\xe9\xbb\x9b\xe4\xb8\xbd\xc2\xb7\xe8\xb5\xab\xe6\x9c\xac,\xe5\x9f\x83\xe8\xbf\xaa\xc2\xb7\xe8\x89\xbe\xe4\xbc\xaf\xe7\x89\xb9\n        '), ('\xe8\xbf\x99\xe4\xb8\xaa\xe6\x9d\x80\xe6\x89\x8b\xe4\xb8\x8d\xe5\xa4\xaa\xe5\x86\xb7', '\n                \xe4\xb8\xbb\xe6\xbc\x94\xef\xbc\x9a\xe8\xae\xa9\xc2\xb7\xe9\x9b\xb7\xe8\xaf\xba,\xe5\x8a\xa0\xe9\x87\x8c\xc2\xb7\xe5\xa5\xa5\xe5\xbe\xb7\xe6\x9b\xbc,\xe5\xa8\x9c\xe5\xa1\x94\xe8\x8e\x89\xc2\xb7\xe6\xb3\xa2\xe7\x89\xb9\xe6\x9b\xbc\n        '), ('\xe6\x95\x99\xe7\x88\xb6', '\n                \xe4\xb8\xbb\xe6\xbc\x94\xef\xbc\x9a\xe9\xa9\xac\xe9\xbe\x99\xc2\xb7\xe7\x99\xbd\xe5\x85\xb0\xe5\xba\xa6,\xe9\x98\xbf\xe5\xb0\x94\xc2\xb7\xe5\xb8\x95\xe8\xa5\xbf\xe8\xaf\xba,\xe8\xa9\xb9\xe5\xa7\x86\xe6\x96\xaf\xc2\xb7\xe5\x87\xaf\xe6\x81\xa9\n        '), ('\xe6\xb3\xb0\xe5\x9d\xa6\xe5\xb0\xbc\xe5\x85\x8b\xe5\x8f\xb7', '\n                \xe4\xb8\xbb\xe6\xbc\x94\xef\xbc\x9a\xe8\x8e\xb1\xe6\x98\x82\xe7\xba\xb3\xe5\xa4\x9a\xc2\xb7\xe8\xbf\xaa\x

foreverlove~ | Novice (level 2) | Garden beans: 206
Asked: 2018-04-12 17:43
All answers (1)

Just add a decode: convert the raw response bytes to text before printing.
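To illustrate why decoding helps, here is a minimal sketch. It assumes Python 3 (the question's code is Python 2, where `str` is a byte string and printing a list shows each element's escaped repr). The bytes are copied from the first title in the asker's output; the variable names are only illustrative.

```python
# Python 3 sketch: bytes copied from the first tuple of the asker's output.
raw_title = b'\xe9\x9c\xb8\xe7\x8e\x8b\xe5\x88\xab\xe5\xa7\xac'

# Printing a container shows each element's repr, so the bytes appear as \x escapes.
print([raw_title])

# Decoding as UTF-8 turns the bytes into readable text.
title = raw_title.decode('utf-8')
print(title)
```

The data in the question is therefore probably not corrupted; the escapes are just how undecoded bytes are displayed inside a list.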

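import urllib3


def getonepage(url):
    # PoolManager handles connection pooling for the request.
    http = urllib3.PoolManager()

    # HTTP method names are conventionally uppercase.
    r = http.request('GET', url)

    # r.data holds raw bytes; decode them as UTF-8 to get text.
    return r.data.decode('utf-8')


c = getonepage('http://maoyan.com/board/4')
print(c)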
# RESULT
<!DOCTYPE html>

<!--[if IE 8]><html class="ie8"><![endif]-->
<!--[if IE 9]><html class="ie9"><![endif]-->
<!--[if gt IE 9]><!--><html><!--<![endif]-->
<head>
  <title>TOP100榜 - 猫眼电影 - 一网打尽好电影</title>
  
  <link rel="dns-prefetch" href="//p0.meituan.net"  />
  <link rel="dns-prefetch" href="//p1.meituan.net"  />
  <link rel="dns-prefetch" href="//ms0.meituan.net" />
  <link rel="dns-prefetch" href="//ms1.meituan.net" />
  <link rel="dns-prefetch" href="//analytics.meituan.com" />
  <link rel="dns-prefetch" href="//report.meituan.com" />
  <link rel="dns-prefetch" href="//frep.meituan.com" />

  
  <meta charset="utf-8">
  <meta name="keywords" content="猫眼电影,电影排行榜,热映口碑榜,最受期待榜,国内票房榜,北美票房榜,猫眼TOP100">
  <meta name="description" content="猫眼电影热门榜单,包括热映口碑榜,最受期待榜,国内票房榜,北美票房榜,猫眼TOP100,多维度为用户进行选片决策">
  <meta http-equiv="cleartype" content="yes" />
  <meta http-equiv="X-UA-Compatible" content="IE=edge" />
  <meta name="renderer" content="webkit" />

  <meta name="HandheldFriendly" content="true" />
  <meta name="format-detection" content="email=no" />
  <meta name="format-detection" content="telephone=no" />
  <meta name="viewport" content="width=device-width, initial-scale=1">

  
  <script>
  cid = "c_wx6zb55";
  ci = 50;
val = {"subnavId":4};    window.system = {};

  window.openPlatform = '';

  </script>
  <link rel="stylesheet" href="//ms0.meituan.net/mywww/common.4b838ec3.css"/>
<link rel="stylesheet" href="//ms0.meituan.net/mywww/board-index.92a06072.css"/>
  <script src="//ms0.meituan.net/mywww/stat.583e6097.js"></script>
  <script>if(window.devicePixelRatio >= 2) { document.write('<link rel="stylesheet" href="//ms0.meituan.net/mywww/image-2x.8ba7074d.css"/>') }</script>
  <style>
    @font-face {
      font-family: stonefont;
      src: url('//vfile.meituan.net/colorstone/83791eb89c7a55ce2fe5d0aace1d019b3168.eot');
      src: url('//vfile.meituan.net/colorstone/83791eb89c7a55ce2fe5d0aace1d019b3168.eot?#iefix') format('embedded-opentype'),
           url('//vfile.meituan.net/colorstone/f6d5e1bb204dfb957411808f91ef47832072.woff') format('woff');
    }

    .stonefont {
      font-family: stonefont;
    }
  </style>
</head>
<body>


<div class="header">
  <div class="header-inner">
        <a href="/" class="logo" data-act="icon-click"></a>
        <div class="city-container" data-val="{currentcityid:50 }">
            <div class="city-selected">
                <div class="city-name">
                  杭州
                  <span class="caret"></span>
                </div>
            </div>
            <div class="city-list" data-val="{ localcityid: 50 }">
                <div class="city-list-header">定位城市:<a class="js-geo-city">杭州</a></div>
                
            </div>
        </div>


        <div class="nav">
            <ul class="navbar">
                <p><a href="/" data-act="home-click"  >首页</a></p>
                <p><a href="/films" data-act="movies-click" >电影</a></p>
                <p><a href="/cinemas" data-act="cinemas-click" >影院</a></p> 
                
                <p><a href="/board" data-act="board-click"  class="active" >榜单</a></p>
                <p><a href="/news" data-act="hotNews-click" >热点</a></p>
            </ul>
        </div>

        <div class="user-info">
            <div class="user-avatar J-login">
              <img src="http://p0.meituan.net/movie/7dd82a16316ab32c8359debdb04396ef2897.png">
              <span class="caret"></span>
              <ul class="user-menu">
                <p><a href="javascript:void 0">登录</a></p>
              </ul>
            </div>
        </div>

        <form action="/query" target="_blank" class="search-form" data-actform="search-click">
            <input name="kw" class="search" type="search" maxlength="32" placeholder="找影视剧、影人、影院" autocomplete="off">
            <input class="submit" type="submit" value="">
        </form>

        <div class="app-download">
          <a href="/app" target="_blank">
            <span class="iphone-icon"></span>
            <span class="apptext">APP下载</span>
            <span class="caret"></span>
            <div class="download-icon">
                <p class="down-title">扫码下载APP</p>
                <p class='down-content'>选座更优惠</p>
            </div>
          </a>
        </div>
  </div>
</div>
<div class="header-placeholder"></div>

<div class="subnav">
  <ul class="navbar">
    <p>
      <a data-act="subnav-click" data-val="{subnavClick:7}"
          href="/board/7"
      >热映口碑榜</a>
    </p>
    <p>
      <a data-act="subnav-click" data-val="{subnavClick:6}"
          href="/board/6"
      >最受期待榜</a>
    </p>
    <p>
      <a data-act="subnav-click" data-val="{subnavClick:1}"
          href="/board/1"
      >国内票房榜</a>
    </p>
    <p>
      <a data-act="subnav-click" data-val="{subnavClick:2}"
          href="/board/2"
      >北美票房榜</a>
    </p>
    <p>
      <a data-act="subnav-click" data-val="{subnavClick:4}"
          data-state-val="{subnavId:4}"
          class="active" href="javascript:void(0);"
      >TOP100榜</a>
    </p>
  </ul>
</div>


    <div class="container" id="app" class="page-board/index" >

<div class="content">
    <div class="wrapper">
        <div class="main">
            <p class="update-time">2018-04-12<span class="has-fresh-text">已更新</span></p>
            <p class="board-content">榜单规则:将猫眼电影库中的经典影片,按照评分和评分人数从高到低综合排序取前100名,每天上午10点更新。相关数据来源于“猫眼电影库”。</p>
            <dl class="board-wrapper">
                <dd>
                        <i class="board-index board-index-1">1</i>
    <a href="/films/1203" title="霸王别姬" class="image-link" data-act="boarditem-click" data-val="{movieId:1203}">
      <img src="//ms0.meituan.net/mywww/image/loading_2.e3d934bf.png" alt="" class="poster-default" />
      <img data-src="http://p1.meituan.net/movie/20803f59291c47e1e116c11963ce019e68711.jpg@160w_220h_1e_1c" alt="霸王别姬" class="board-img" />
    </a>
    <div class="board-item-main">
      <div class="board-item-content">
              <div class="movie-item-info">
        <p class="name"><a href="/films/1203" title="霸王别姬" data-act="boarditem-click" data-val="{movieId:1203}">霸王别姬</a></p>
        <p class="star">
                主演:张国荣,张丰毅,巩俐
        </p>
<p class="releasetime">上映时间:1993-01-01(中国香港)</p>    </div>
    <div class="movie-item-number score-num">
<p class="score"><i class="integer">9.</i><i class="fraction">6</i></p>        
    </div>

      </div>
    </div>

                </dd>
                <dd>
                        <i class="board-index board-index-2">2</i>
    <a href="/films/1297" title="肖申克的救赎" class="image-link" data-act="boarditem-click" data-val="{movieId:1297}">
      <img src="//ms0.meituan.net/mywww/image/loading_2.e3d934bf.png" alt="" class="poster-default" />
      <img data-src="http://p0.meituan.net/movie/__40191813__4767047.jpg@160w_220h_1e_1c" alt="肖申克的救赎" class="board-img" />
    </a>
    <div class="board-item-main">
      <div class="board-item-content">
              <div class="movie-item-info">
        <p class="name"><a href="/films/1297" title="肖申克的救赎" data-act="boarditem-click" data-val="{movieId:1297}">肖申克的救赎</a></p>
        <p class="star">
                主演:蒂姆·罗宾斯,摩根·弗里曼,鲍勃·冈顿
        </p>
<p class="releasetime">上映时间:1994-10-14(美国)</p>    </div>
    <div class="movie-item-number score-num">
<p class="score"><i class="integer">9.</i><i class="fraction">5</i></p>        
    </div>

      </div>
    </div>

                </dd>
                <dd>
                        <i class="board-index board-index-3">3</i>
    <a href="/films/2641" title="罗马假日" class="image-link" data-act="boarditem-click" data-val="{movieId:2641}">
      <img src="//ms0.meituan.net/mywww/image/loading_2.e3d934bf.png" alt="" class="poster-default" />
      <img data-src="http://p0.meituan.net/movie/23/6009725.jpg@160w_220h_1e_1c" alt="罗马假日" class="board-img" />
    </a>
    <div class="board-item-main">
      <div class="board-item-content">
              <div class="movie-item-info">
        <p class="name"><a href="/films/2641" title="罗马假日" data-act="boarditem-click" data-val="{movieId:2641}">罗马假日</a></p>
        <p class="star">
                主演:格利高利·派克,奥黛丽·赫本,埃迪·艾伯特
        </p>
<p class="releasetime">上映时间:1953-09-02(美国)</p>    </div>
    <div class="movie-item-number score-num">
<p class="score"><i class="integer">9.</i><i class="fraction">1</i></p>        
    </div>

      </div>
    </div>

                </dd>
                <dd>
                        <i class="board-index board-index-4">4</i>
    <a href="/films/4055" title="这个杀手不太冷" class="image-link" data-act="boarditem-click" data-val="{movieId:4055}">
      <img src="//ms0.meituan.net/mywww/image/loading_2.e3d934bf.png" alt="" class="poster-default" />
      <img data-src="http://p0.meituan.net/movie/fc9d78dd2ce84d20e53b6d1ae2eea4fb1515304.jpg@160w_220h_1e_1c" alt="这个杀手不太冷" class="board-img" />
    </a>
    <div class="board-item-main">
      <div class="board-item-content">
              <div class="movie-item-info">
        <p class="name"><a href="/films/4055" title="这个杀手不太冷" data-act="boarditem-click" data-val="{movieId:4055}">这个杀手不太冷</a></p>
        <p class="star">
                主演:让·雷诺,加里·奥德曼,娜塔莉·波特曼
        </p>
<p class="releasetime">上映时间:1994-09-14(法国)</p>    </div>
    <div class="movie-item-number score-num">
<p class="score"><i class="integer">9.</i><i class="fraction">5</i></p>        
    </div>

      </div>
    </div>

                </dd>
                <dd>
                        <i class="board-index board-index-5">5</i>
    <a href="/films/1247" title="教父" class="image-link" data-act="boarditem-click" data-val="{movieId:1247}">
      <img src="//ms0.meituan.net/mywww/image/loading_2.e3d934bf.png" alt="" class="poster-default" />
      <img data-src="http://p0.meituan.net/movie/92/8212889.jpg@160w_220h_1e_1c" alt="教父" class="board-img" />
    </a>
    <div class="board-item-main">
      <div class="board-item-content">
              <div class="movie-item-info">
        <p class="name"><a href="/films/1247" title="教父" data-act="boarditem-click" data-val="{movieId:1247}">教父</a></p>
        <p class="star">
                主演:马龙·白兰度,阿尔·帕西诺,詹姆斯·凯恩
        </p>
<p class="releasetime">上映时间:1972-03-24(美国)</p>    </div>
    <div class="movie-item-number score-num">
<p class="score"><i class="integer">9.</i><i class="fraction">3</i></p>        
    </div>

      </div>
    </div>

                </dd>
                <dd>
                        <i class="board-index board-index-6">6</i>
    <a href="/films/267" title="泰坦尼克号" class="image-link" data-act="boarditem-click" data-val="{movieId:267}">
      <img src="//ms0.meituan.net/mywww/image/loading_2.e3d934bf.png" alt="" class="poster-default" />
      <img data-src="http://p0.meituan.net/movie/11/324629.jpg@160w_220h_1e_1c" alt="泰坦尼克号" class="board-img" />
    </a>
    <div class="board-item-main">
      <div class="board-item-content">
              <div class="movie-item-info">
        <p class="name"><a href="/films/267" title="泰坦尼克号" data-act="boarditem-click" data-val="{movieId:267}">泰坦尼克号</a></p>
        <p class="star">
                主演:莱昂纳多·迪卡普里奥,凯特·温丝莱特,比利·赞恩
        </p>
<p class="releasetime">上映时间:1998-04-03</p>    </div>
    <div class="movie-item-number score-num">
<p class="score"><i class="integer">9.</i><i class="fraction">5</i></p>        
    </div>

      </div>
    </div>

                </dd>
                <dd>
                        <i class="board-index board-index-7">7</i>
    <a href="/films/123" title="龙猫" class="image-link" data-act="boarditem-click" data-val="{movieId:123}">
      <img src="//ms0.meituan.net/mywww/image/loading_2.e3d934bf.png" alt="" class="poster-default" />
      <img data-src="http://p0.meituan.net/movie/c8f224ca9939cd9dd58f709c9c4deb0924422.jpg@160w_220h_1e_1c" alt="龙猫" class="board-img" />
    </a>
    <div class="board-item-main">
      <div class="board-item-content">
              <div class="movie-item-info">
        <p class="name"><a href="/films/123" title="龙猫" data-act="boarditem-click" data-val="{movieId:123}">龙猫</a></p>
        <p class="star">
                主演:日高法子,坂本千夏,糸井重里
        </p>
<p class="releasetime">上映时间:1988-04-16(日本)</p>    </div>
    <div class="movie-item-number score-num">
<p class="score"><i class="integer">9.</i><i class="fraction">2</i></p>        
    </div>

      </div>
    </div>

                </dd>
                <dd>
                        <i class="board-index board-index-8">8</i>
    <a href="/films/837" title="唐伯虎点秋香" class="image-link" data-act="boarditem-click" data-val="{movieId:837}">
      <img src="//ms0.meituan.net/mywww/image/loading_2.e3d934bf.png" alt="" class="poster-default" />
      <img data-src="http://p0.meituan.net/movie/62/109878.jpg@160w_220h_1e_1c" alt="唐伯虎点秋香" class="board-img" />
    </a>
    <div class="board-item-main">
      <div class="board-item-content">
              <div class="movie-item-info">
        <p class="name"><a href="/films/837" title="唐伯虎点秋香" data-act="boarditem-click" data-val="{movieId:837}">唐伯虎点秋香</a></p>
        <p class="star">
                主演:周星驰,巩俐,郑佩佩
        </p>
<p class="releasetime">上映时间:1993-07-01(中国香港)</p>    </div>
    <div class="movie-item-number score-num">
<p class="score"><i class="integer">9.</i><i class="fraction">2</i></p>        
    </div>

      </div>
    </div>

                </dd>
                <dd>
                        <i class="board-index board-index-9">9</i>
    <a href="/films/2760" title="魂断蓝桥" class="image-link" data-act="boarditem-click" data-val="{movieId:2760}">
      <img src="//ms0.meituan.net/mywww/image/loading_2.e3d934bf.png" alt="" class="poster-default" />
      <img data-src="http://p1.meituan.net/movie/94c3a84626fd7650d6891088c4b88e5c27012.jpg@160w_220h_1e_1c" alt="魂断蓝桥" class="board-img" />
    </a>
    <div class="board-item-main">
      <div class="board-item-content">
              <div class="movie-item-info">
        <p class="name"><a href="/films/2760" title="魂断蓝桥" data-act="boarditem-click" data-val="{movieId:2760}">魂断蓝桥</a></p>
        <p class="star">
                主演:费雯·丽,罗伯特·泰勒,露塞尔·沃特森
        </p>
<p class="releasetime">上映时间:1940-05-17(美国)</p>    </div>
    <div class="movie-item-number score-num">
<p class="score"><i class="integer">9.</i><i class="fraction">2</i></p>        
    </div>

      </div>
    </div>

                </dd>
                <dd>
                        <i class="board-index board-index-10">10</i>
    <a href="/films/1212" title="千与千寻" class="image-link" data-act="boarditem-click" data-val="{movieId:1212}">
      <img src="//ms0.meituan.net/mywww/image/loading_2.e3d934bf.png" alt="" class="poster-default" />
      <img data-src="http://p0.meituan.net/movie/9bf7d7b81001a9cf8adbac5a7cf7d766132425.jpg@160w_220h_1e_1c" alt="千与千寻" class="board-img" />
    </a>
    <div class="board-item-main">
      <div class="board-item-content">
              <div class="movie-item-info">
        <p class="name"><a href="/films/1212" title="千与千寻" data-act="boarditem-click" data-val="{movieId:1212}">千与千寻</a></p>
        <p class="star">
                主演:柊瑠美,入野自由,夏木真理
        </p>
<p class="releasetime">上映时间:2001-07-20(日本)</p>    </div>
    <div class="movie-item-number score-num">
<p class="score"><i class="integer">9.</i><i class="fraction">3</i></p>        
    </div>

      </div>
    </div>

                </dd>
            </dl>

        </div>
            <div class="pager-main">
                
  
  <ul class="list-pager">



  
      <li class="active">
    <a class="page_1"
      href="javascript:void(0);" style="cursor: default"
  >1</a>

</li>
  <li >
    <a class="page_2"
      href="?offset=10"
  >2</a>

</li>
  <li >
    <a class="page_3"
      href="?offset=20"
  >3</a>

</li>
  <li >
    <a class="page_4"
      href="?offset=30"
  >4</a>

</li>
  <li >
    <a class="page_5"
      href="?offset=40"
  >5</a>

</li>

    <li class="sep">...</li>
      <li >
    <a class="page_10"
      href="?offset=90"
  >10</a>

</li>

  

<p>  <a class="page_2"
      href="?offset=10"
  >下一页</a>
</p>
</ul>


            </div>
    </div>
</div>

    </div>
<div class="footer">
    <p class="friendly-links">
      违法和不良信息举报电话: 4006018900
      举报邮箱: tousujubao@meituan.com
    </p>
    <p class="friendly-links">
        友情链接 :
        <a href="http://www.meituan.com" data-query="utm_source=wwwmaoyan" target="_blank">美团网</a>
        <span></span>
        <a href="http://i.meituan.com/client" data-query="utm_source=wwwmaoyan" target="_blank">美团下载</a>
    </p>
    <p>
        &copy;2016
        猫眼电影 maoyan.com
        <a href="https://tsm.miit.gov.cn/pages/EnterpriseSearchList_Portal.aspx?type=0&keyword=京ICP证160733号&pageNo=1" target="_blank">京ICP证160733</a>
        <a href="http://www.miibeian.gov.cn" target="_blank">京ICP备16022489-1</a>
        京公网安备 11010502030881<a href="/about/licence" target="_blank">网络文化经营许可证</a>
        <a href="http://www.meituan.com/about/rules" target="_blank">电子公告服务规则</a>
    </p>
    <p>北京猫眼文化传媒有限公司</p>
</div>

    <!--[if IE 8]><script src="//ms0.meituan.net/mywww/es5-shim.bbad933f.js"></script><![endif]-->
    <!--[if IE 8]><script src="//ms0.meituan.net/mywww/es5-sham.d6ea26f4.js"></script><![endif]-->
    <script src="//ms0.meituan.net/mywww/common.dc33ab40.js"></script>
<script src="//ms0.meituan.net/mywww/board-index.4aa00764.js"></script>
</body>
</html>
BUTTERAPPLE | Garden beans: 2950 (Veteran, level 4) | 2018-04-12 18:08
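As a follow-up sketch (not part of the answer above), the same decode step can be combined with the regular expression from the question. This assumes Python 3; `extract_movies` and `SAMPLE` are hypothetical names, and `SAMPLE` is a short fragment of the page output shown above, encoded back to bytes so that no network access is needed.

```python
import re

# The asker's pattern: the title="..." attribute, then the text of the next <p>.
PATTERN = re.compile(r'<p.*?title="(.*?)".*?</p>.*?>(.*?)</p>', re.S)


def extract_movies(raw_bytes, encoding="utf-8"):
    """Decode the raw page bytes first, then run the regex on text."""
    html = raw_bytes.decode(encoding)
    # Strip the surrounding newlines and indentation from the cast line.
    return [(title, stars.strip()) for title, stars in PATTERN.findall(html)]


# Hypothetical stand-in for response bytes, taken from the page output above.
SAMPLE = '''
        <p class="name"><a href="/films/1203" title="霸王别姬" data-act="boarditem-click" data-val="{movieId:1203}">霸王别姬</a></p>
        <p class="star">
                主演:张国荣,张丰毅,巩俐
        </p>
'''.encode("utf-8")

for title, stars in extract_movies(SAMPLE):
    print(title, stars)
```

Matching on decoded text means the tuples contain readable strings, so printing the whole result list also shows the characters rather than byte escapes.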