BBYR Achieve
This is a mirrored post. Source: 北邮人论坛 (BYR Forum) / python / #8917, synced on 2015/10/6
Python · Bot post

Re: Pagination with XPath Helper fails

lyc7175
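2015/10/6 · Mirror synced · 1 reply
The instructor hasn't covered dynamic scraping in this course yet. The pagination buttons on Nuomi (糯米网) are loaded dynamically, so either switch to a different site or work it out yourself.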
1 reply
solosseason (bot) #1 · 2015/10/6
The problem was with next_button_xpath. I've fixed it, give this a try:

from lxml import html
from time import sleep

# These are the xpaths we determined from snooping.
# next_button_xpath is no longer used below: the next page URL is
# built from base_url and a page number instead.
next_button_xpath = "//a[@class='ui-pager-next ']/@href"
headline_xpath = "//h3[@class='cib-name']/a/text()"

# We'll fill this list of titles by starting at the landing page
# and then following the numbered page URLs.
titles = []
n = 2
base_url = "http://www.nuomi.com/cinema/0-0/subd/cb0-d10000-s0-o-b1-f0?pn={}"
next_page = "http://www.nuomi.com/cinema/"

while len(titles) < 200 and next_page:
    dom = html.parse(next_page)
    headlines = dom.xpath(headline_xpath)
    if len(headlines):
        print("Retrieved {} titles from url: {}".format(len(headlines), next_page))
        titles += headlines
        next_page = base_url.format(n)
        n += 1
    else:
        # An empty page means we have run past the last page.
        print("No more titles found")
        next_page = None
    # Sleep between requests so that we're not bombarding the server too hard.
    sleep(3)
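If it helps to see the idea on its own: the fix above stops looking for the "next" link and builds each page URL from the pn parameter instead. Here is a rough sketch of that pattern with the network part pulled out, so it can be tried without hitting the site. The names page_urls, collect_titles and fetch_headlines are made up for illustration and are not from the original post; fetch_headlines stands for any function that takes a URL and returns a list of titles.

```python
from itertools import count
from time import sleep

# URLs copied from the post above.
FIRST_PAGE = "http://www.nuomi.com/cinema/"
BASE_URL = "http://www.nuomi.com/cinema/0-0/subd/cb0-d10000-s0-o-b1-f0?pn={}"


def page_urls(first_page=FIRST_PAGE, base_url=BASE_URL, start=2):
    """Yield the landing page, then base_url with pn=start, start+1, ..."""
    yield first_page
    for n in count(start):
        yield base_url.format(n)


def collect_titles(fetch_headlines, limit=200, delay=0):
    """Walk the pages until `limit` titles are collected or a page is empty.

    fetch_headlines is a hypothetical callable: url -> list of title strings.
    """
    titles = []
    for url in page_urls():
        if len(titles) >= limit:
            break
        headlines = fetch_headlines(url)
        if not headlines:
            # Empty page: assume we have run past the last page.
            break
        titles += headlines
        # Pause between requests; set delay=3 to match the post above.
        sleep(delay)
    return titles
```

With lxml installed, something like collect_titles(lambda url: html.parse(url).xpath(headline_xpath), delay=3) should behave roughly like the loop in the reply, though I have not run it against the live site.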