BYR Achieve · Mirror Forum

Trouble scraping Newsmth (水木), looking for pointers

2015/1/26 · mirror sync · 6 replies

I want to scrape a Newsmth page such as http://www.newsmth.net/nForum/#!article/Love/5967086, but the page content is not returned as ordinary HTML. Instead it comes back in the response of a separate request, as seen in my Firebug capture. My code is as follows:

import re, urllib, urllib2, requests, time, datetime, random
from bs4 import BeautifulSoup

def smthspider():
    headers = {"User-Agent": " Mozilla/5.0 (Windows NT 6.1; WOW64; rv:34.0) Gecko/20100101 Firefox/34.0",
               "Host": "www.newsmth.net",
               "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
               "Accept-Language": "zh-cn,zh;q=0.8,en-us;q=0.5,en;q=0.3",
               "Accept-Encoding": "gzip, deflate",
               "Connection": "keep-alive",
               "Cookie":"Hm_lvt_9c7f4d9b7c00cb5aba2c637c64a41567=1421654832,1421728515,1421827791,1421982989; tma=88525828.4893887.1420332716520.1421901287602.1421982989863.14; tmd=91.88525828.4893887.1420332716520.; nforum-left=00100; left-index=00000000000; main[UTMPUSERID]=batulu12; main[UTMPKEY]=41480197; main[UTMPNUM]=16797; Hm_lpvt_9c7f4d9b7c00cb5aba2c637c64a41567=1422001897; main[PASSWORD]=o%257E%257Fi%252C%250D%2528%257C%257D%2504U%255C%2540uKLB%251E%2529%251D%250A%2523%2509%2508; main[XWJOKE]=hoho; bfd_session_id=bfd_g=b56c782bcb75035d00006ef20011174a54a88f0d; tmc=1.88525828.74332260.1421986459313.1421986459313.1421986459313",
               "Referer":"http://www.newsmth.net/nForum/",
               "X-Requested-With":"XMLHttpRequest"
               }
    #page_url = 'http://movie.douban.com/j/search_subjects?type=movie&tag=%E7%83%AD%E9%97%A8&sort=recommend&page_limit=20&page_start=0'
    sms_url = 'http://www.newsmth.net/nForum/#!article/Love/5967086?ajax'
    #r = requests.get(page_url)
    r = requests.get(sms_url,headers=headers)
    print r.text

smthspider()

However, running it does not print the article data. In a case like this, how can I get the page content that comes back in the response?
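A likely cause, offered as a sketch rather than a confirmed answer: everything after `#` in a URL is a fragment, which the client keeps to itself and never sends to the server. So `requests.get(sms_url)` actually requests only `http://www.newsmth.net/nForum/`, and the page's JavaScript is what turns `#!article/Love/5967086` into a separate AJAX request. The sketch below assumes that AJAX request goes to `http://www.newsmth.net/nForum/article/Love/5967086?ajax` (the `#!` route appended to the `/nForum/` path); check the exact URL in Firebug's Net panel before relying on it. `hashbang_to_ajax_url` and `build_request` are hypothetical helpers, and the code uses Python 3's standard `urllib` instead of `requests` so it needs no third-party packages.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.request import Request


def hashbang_to_ajax_url(url):
    """Rewrite a '#!' nForum URL into an ASSUMED AJAX endpoint URL.

    Assumption (unverified): '/nForum/#!article/Love/5967086' is served
    from '/nForum/article/Love/5967086?ajax'. Confirm in Firebug first.
    """
    parts = urlsplit(url)
    # urlsplit puts everything after '#' into .fragment, including any '?ajax'
    if not parts.fragment.startswith("!"):
        return url  # not a hash-bang URL; leave it unchanged
    route = parts.fragment[1:]  # e.g. 'article/Love/5967086?ajax'
    route_path, _, query = route.partition("?")
    # Make sure the 'ajax' flag is present in the query string
    if "ajax" not in query.split("&"):
        query = query + "&ajax" if query else "ajax"
    path = parts.path.rstrip("/") + "/" + route_path.lstrip("/")
    # Rebuild the URL with an empty fragment, so nothing is lost client-side
    return urlunsplit((parts.scheme, parts.netloc, path, query, ""))


def build_request(page_url, cookie=None):
    """Build (but do not send) a request for the assumed AJAX endpoint."""
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:34.0) Gecko/20100101 Firefox/34.0",
        "Referer": "http://www.newsmth.net/nForum/",
        # Mimic the browser's XHR, as the original headers already do
        "X-Requested-With": "XMLHttpRequest",
    }
    if cookie:
        headers["Cookie"] = cookie  # login cookies copied from the browser, if the board needs them
    return Request(hashbang_to_ajax_url(page_url), headers=headers)


if __name__ == "__main__":
    original = "http://www.newsmth.net/nForum/#!article/Love/5967086?ajax"
    # Path actually sent for the original URL: the fragment is dropped
    print(Request(original).selector)  # /nForum/
    req = build_request(original)
    print(req.full_url)  # http://www.newsmth.net/nForum/article/Love/5967086?ajax
    # To actually fetch it: urllib.request.urlopen(req).read()
```

If the assumed endpoint is right, the body should be an HTML fragment that `BeautifulSoup` (already imported in the question) can parse; which elements hold the post text still has to be checked against the real response.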
