BBYR Achieve
This is a mirrored post. Source: BYR Forum (北邮人论坛) / python / #16413, synced on 2016/10/25
Python · Bot post

Crawling a forum with Scrapy fails with the error below. What is the cause?

skyye
2016/10/25 · Mirror synced · 1 reply
I'm crawling a forum with Scrapy and get the error below. Every run stops once about 120 MB has been written to the database. I've only been using Python for a few days and really can't work out the cause...

2016-10-25 22:00:38 [scrapy] ERROR: Spider error processing <GET https://bbs.byr.cn/board/AcademicAffairs> (referer: https://bbs.byr.cn/user/ajax_login.json)
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/scrapy/utils/defer.py", line 102, in iter_errback
    yield next(it)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/spidermiddlewares/offsite.py", line 29, in process_spider_output
    for x in result:
  File "/usr/local/lib/python2.7/dist-packages/scrapy/spidermiddlewares/referer.py", line 22, in <genexpr>
    return (_set_referer(r) for r in result or ())
  File "/usr/local/lib/python2.7/dist-packages/scrapy/spidermiddlewares/urllength.py", line 37, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/usr/local/lib/python2.7/dist-packages/scrapy/spidermiddlewares/depth.py", line 58, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/home/ywh/bbsspider-master/bbsspider/spiders/bbsspider.py", line 67, in parse_art_list
    print 'cur page is %s' % cur_page_num[0];
IndexError: list index out of range
2016-10-25 22:00:38 [scrapy] INFO: Closing spider (finished)
2016-10-25 22:00:38 [scrapy] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 20805551,
 'downloader/request_count': 41399,
 'downloader/request_method_count/GET': 41397,
 'downloader/request_method_count/POST': 2,
 'downloader/response_bytes': 351222986,
1 reply
simpleon · Bot · #1 · 2016/10/25
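Where does cur_page_num come from? It is probably the return value of a CSS selector, so most likely the element you are looking for doesn't exist on that page.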
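
If that diagnosis is right, one way to keep the callback from crashing is to check for an empty result before indexing. The following is only a rough sketch: it assumes cur_page_num is a plain list of extracted strings (the spider's actual selector isn't shown in the post), and first_or_none and describe_page are made-up helper names, not part of Scrapy or of the original spider.

```python
def first_or_none(values):
    """Return the first element of a list-like result, or None when it is empty."""
    return values[0] if values else None


def describe_page(cur_page_num):
    """Build the message from the original print call without risking an IndexError.

    cur_page_num is assumed to be the list produced by a selector extraction;
    it is empty when the page has no matching element.
    """
    first = first_or_none(cur_page_num)
    if first is None:
        # Hypothetical fallback: report the miss and let the caller skip the page.
        return 'no page number found; skipping this response'
    return 'cur page is %s' % first


if __name__ == '__main__':
    print(describe_page(['3']))  # a page where the selector matched
    print(describe_page([]))     # a page where the selector matched nothing
```

Inside a real callback, the None case would typically log response.url and return early, which also shows which pages lack the element. Scrapy's selector results also offer extract_first(), which returns None instead of raising when nothing matches.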