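The teacher hasn't covered dynamic scraping in this course yet. The pager buttons on Nuomi (糯米网) are loaded dynamically, so either switch to a different site or work it out yourself.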
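This is a mirrored post. Source: BYR Forum (北邮人论坛) / python / #8917, synced on 2015/10/6
This mirror source has not been updated for more than 30 days; the post may have been deleted on the source site.
Python · Posted by bot
Re: Pagination fails with XPath Helper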
lyc7175
2015/10/6 · Mirror sync · 1 reply
1 reply
The problem is in next_button_xpath.
I've fixed it, give it a try:
from time import sleep

from lxml import html

# XPaths found by inspecting the page with XPath Helper.
# next_button_xpath is kept for reference only: the loop below builds
# page URLs from a page number instead of following the "next" link.
next_button_xpath = "//a[@class='ui-pager-next ']/@href"
headline_xpath = "//h3[@class='cib-name']/a/text()"

# Fill the list of titles by starting at the landing page and then
# requesting numbered pages until we have 200 titles or hit an empty page.
titles = []
n = 2  # number of the next page to request
base_url = "http://www.nuomi.com/cinema/0-0/subd/cb0-d10000-s0-o-b1-f0?pn={}"
next_page = "http://www.nuomi.com/cinema/"
while len(titles) < 200 and next_page:
    dom = html.parse(next_page)
    headlines = dom.xpath(headline_xpath)
    if len(headlines):
        print("Retrieved {} titles from url: {}".format(len(headlines), next_page))
        titles += headlines
        next_page = base_url.format(n)
        n += 1
    else:
        # An empty page means there is nothing left to fetch.
        print("No next button found")
        next_page = None
    # Pause between requests so we don't hit the server too hard.
    sleep(3)
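
The fix above swaps "follow the next button" for "fill a page number into a URL template". Below is a minimal sketch of that same idea pulled into a function so the loop can be tried offline. The names collect_titles, fetch_titles and fake_fetch are made up for illustration and are not part of the original post; fake_fetch just returns canned titles instead of hitting nuomi.com.

```python
from time import sleep

# Same URL template as in the post; {} is filled with the page number.
BASE_URL = "http://www.nuomi.com/cinema/0-0/subd/cb0-d10000-s0-o-b1-f0?pn={}"
FIRST_URL = "http://www.nuomi.com/cinema/"


def collect_titles(fetch_titles, first_url, limit=200, delay=0):
    """Collect titles page by page until `limit` is reached or a page is empty.

    fetch_titles is a hypothetical callable: it takes a URL and returns
    a list of title strings found on that page.
    """
    titles = []
    url = first_url
    page = 2  # the first templated URL is page 2, as in the post
    while len(titles) < limit and url:
        found = fetch_titles(url)
        if not found:
            break  # an empty page means there is nothing left to fetch
        titles.extend(found)
        url = BASE_URL.format(page)
        page += 1
        sleep(delay)  # pause between requests
    return titles


def fake_fetch(url):
    """Offline stand-in (assumption): pretend pages 1 to 3 hold two titles each."""
    pages = {
        FIRST_URL: ["A", "B"],
        BASE_URL.format(2): ["C", "D"],
        BASE_URL.format(3): ["E", "F"],
    }
    return pages.get(url, [])


print(collect_titles(fake_fetch, FIRST_URL))
```

To run it against the real site, fetch_titles could presumably be something like lambda url: html.parse(url).xpath(headline_xpath) with delay=3, matching the code in the post.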