Re: Pagination with XPath Helper fails

lyc7175

2015/10/6 · Mirror sync · 1 reply

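The instructor hasn't covered dynamic scraping in this course yet. The pagination buttons on Nuomi (nuomi.com) are loaded dynamically, so either switch to a different site or work it out on your own.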

1 reply

solosseason (bot) #1 · 2015/10/6

The problem was with next_button_xpath. I've fixed it, give this a try:

from time import sleep

from lxml import html

# XPaths determined by inspecting the page.
# next_button_xpath is kept for reference; the loop below no longer uses it.
next_button_xpath = "//a[@class='ui-pager-next ']/@href"
headline_xpath = "//h3[@class='cib-name']/a/text()"

# Fill this list of titles by starting at the landing page
# and then requesting the numbered pages (pn=2, pn=3, ...).
titles = []
n = 2
base_url = "http://www.nuomi.com/cinema/0-0/subd/cb0-d10000-s0-o-b1-f0?pn={}"
next_page = "http://www.nuomi.com/cinema/"

while len(titles) < 200 and next_page:
    dom = html.parse(next_page)
    headlines = dom.xpath(headline_xpath)
    if headlines:
        print("Retrieved {} titles from url: {}".format(len(headlines), next_page))
        titles += headlines
        next_page = base_url.format(n)
        n += 1
    else:
        # An empty page means we ran past the last one.
        print("No next button found")
        next_page = None
    # Pause between requests so we don't hit the server too hard.
    sleep(3)
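The reply's code no longer follows the next button; it requests numbered pages until one comes back with no titles. Here is a minimal sketch of just that stopping logic, separated from the network so it can be tried offline. The names `page_urls`, `collect_titles`, and `fake_fetch`, and the example.com URLs, are hypothetical and only for illustration; they are not part of the original reply.

```python
from itertools import count


def page_urls(first_page, template, start=2):
    """Yield the landing page, then template.format(2), template.format(3), ..."""
    yield first_page
    for n in count(start):
        yield template.format(n)


def collect_titles(fetch_titles, urls, limit=200):
    """Gather titles page by page; stop at the first empty page or at `limit`."""
    titles = []
    for url in urls:
        found = fetch_titles(url)
        if not found:
            break  # an empty page is treated as the end of the listing
        titles.extend(found)
        if len(titles) >= limit:
            break
    return titles


# Hypothetical stand-in for html.parse(url).xpath(headline_xpath):
# it pretends the site has three pages with two titles each.
def fake_fetch(url):
    pages = {
        "http://example.com/list": ["a1", "a2"],
        "http://example.com/list?pn=2": ["b1", "b2"],
        "http://example.com/list?pn=3": ["c1", "c2"],
    }
    return pages.get(url, [])


urls = page_urls("http://example.com/list", "http://example.com/list?pn={}")
titles = collect_titles(fake_fetch, urls)
print(titles)
```

Against the real site, `fetch_titles` would presumably wrap the `html.parse(...)` and `xpath(...)` calls together with the `sleep(3)` pause.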