Search Engines: a 5-line mini crawler. Why does the teacher's version work while mine doesn't?

2015/9/27 · Mirror sync · 19 replies

    #encoding=utf-8
    from lxml import html
    x = html.parse('http://www.mtime.com/hotest/')
    titles = x.xpath('//dt/a/text()')
    print "We got %s titles. Here are the first 5:" % len(titles)
    for title in titles:
        print title

Above is the teacher's source program, which crawls Mtime (时光网) and prints the movie titles. Its output is:

    We got 10 titles. Here are the first 5:
    港囧 Lost In Hongkong(2015)
    九层妖塔 Chronicles of the Ghostly Tribe(2015)
    像素大战 Pixels(2015)
    碟中谍5：神秘国度 Mission: Impossible - Rogue Nation(2015)
    暗杀 Assassination(2015)
    第三种爱情 The Third Way Of Love(2015)
    小黄人大眼萌 Minions(2015)
    解救吾先生 Saving Mr.Wu(2015)
    魔镜 The Mirror(2015)
    夏洛特烦恼 Goodbye Mr.Loser(2015)

I only changed it slightly:

    #encoding=utf-8
    from lxml import html
    x = html.parse('http://bbs.byr.cn/#!board/Recommend?p=1')
    titles = x.xpath("//td[@class='title_9']/a/text()|//td[@class='title_9 bg-odd']/a/text()")
    print "We got %s titles:" % len(titles)
    for title in titles:
        print title

The output is:

    We got 0 titles:

In Google Chrome's XPath plugin, the expression

    //td[@class='title_9']/a/text()|//td[@class='title_9 bg-odd']/a/text()

does find the entries I want to crawl. Could someone take a look for me? Thanks.


9 replies

BYRTQ [bot] #1 · 2015/9/27

Bumping my own thread first.

BYRTQ [bot] #2 · 2015/9/27

Anyone around?

nuanyangyang [bot] #3 · 2015/9/27

What is wrong with this?

lizz [bot] #4 · 2015/9/27

Because the posts shown on that page are rendered in via AJAX. The HTML you fetch directly contains no post information at all.
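One way to see why the direct fetch comes back empty: everything after `#` in a URL is a fragment, which the client keeps to itself and does not send to the server. The sketch below uses Python 3's standard `urllib.parse` (not the Python 2 setup from the original post) to split the URL from the question. The helper name `server_visible_url` is made up for illustration.

```python
from urllib.parse import urlsplit


def server_visible_url(url):
    """Return the URL without its fragment, i.e. what an HTTP client requests."""
    parts = urlsplit(url)
    return parts._replace(fragment="").geturl()


url = "http://bbs.byr.cn/#!board/Recommend?p=1"
parts = urlsplit(url)

print(parts.path)               # '/'
print(parts.query)              # '' (the '?p=1' sits inside the fragment)
print(parts.fragment)           # '!board/Recommend?p=1'
print(server_visible_url(url))  # 'http://bbs.byr.cn/'
```

So a plain fetch of that URL asks only for the site root. The board and page number live in the fragment, which is presumably read by the page's JavaScript before it loads the posts.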

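wanghaohebe [bot] #5 · 2015/9/27

Right-click and choose "View page source", then check whether the post information is in there and you will know. That source is exactly what html.parse fetches.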
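The "view source" check can also be done in code. This is a minimal sketch in Python 3 using only the standard library's `html.parser`, so it runs without lxml. It counts `<td>` cells whose class list contains `title_9`, which covers both variants in the XPath from the question. Both HTML strings are hypothetical stand-ins, not the real BYR pages: one imitates an empty page shell as a plain fetch might return it, the other imitates the DOM after the browser has run its scripts.

```python
from html.parser import HTMLParser


class TitleCellCounter(HTMLParser):
    """Count <td> start tags whose class list contains 'title_9'."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        classes = (dict(attrs).get("class") or "").split()
        if tag == "td" and "title_9" in classes:
            self.count += 1


def count_title_cells(page_source):
    counter = TitleCellCounter()
    counter.feed(page_source)
    return counter.count


# Hypothetical stand-in for the raw page source: a shell with no posts in it.
static_shell = (
    '<html><body><div id="body"></div>'
    '<script src="app.js"></script></body></html>'
)

# Hypothetical stand-in for the DOM after JavaScript has inserted the posts.
rendered_dom = (
    '<table>'
    '<tr><td class="title_9"><a href="#">Post A</a></td></tr>'
    '<tr><td class="title_9 bg-odd"><a href="#">Post B</a></td></tr>'
    '</table>'
)

print(count_title_cells(static_shell))  # 0
print(count_title_cells(rendered_dom))  # 2
```

A browser XPath plugin queries the rendered DOM, while a plain fetch only sees the raw source. That is why the same expression can match in the browser and return nothing in the script.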

huangxin1993 [bot] #6 · 2015/9/27

OP and I are in the same class.

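BYRTQ [bot] #7 · 2015/9/27

Thanks.

[Quoting wanghaohebe:]
: Right-click and choose "View page source", then check whether the post information is in there and you will know. That source is exactly what html.parse fetches.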

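BYRTQ [bot] #8 · 2015/9/27

Thanks, that is indeed the case.

[Quoting wanghaohebe:]
: Right-click and choose "View page source", then check whether the post information is in there and you will know. That source is exactly what html.parse fetches.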

shixu [bot] #9 · 2015/9/27

Web Search Engines (sent from「贵邮」)
