Need help with a web crawler for my thesis!! Happy to pay or buy you a meal

2016/1/19 · Mirror sync · 12 replies

I want to scrape user data from Paipaidai (拍拍贷) for an empirical analysis. The start page looks like the one in the screenshot. From there I need the users' information, as in the screenshot below. I tried to do this with the Scrapy framework, which I picked up online, but I'm stuck on the URL and domain matching and still can't get pagination to work. My code is really rough, so I'm hoping an expert can point me in the right direction. Below is the spider file I wrote. Also, this is for my thesis, so this can be a paid request: as long as I get the data, I will thank you properly!

    # -*- coding: utf-8 -*-
    from scrapy.selector import Selector
    from scrapy.contrib.spiders import CrawlSpider,Rule
    from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
    from paipaidai.items import PaipaidaiItem

    class PaipaidaiSpider(CrawlSpider):
        name="paipaidai"
        allowed_domains=["www.ppdai.com"]
        start_urls=["http://invest.ppdai.com/loan/list"]
        rules=[
            Rule(SgmlLinkExtractor(allow=(r'http://invest/ppdai/com/loan/list_safe_s0_p\d+?Rate=0',))),
            Rule(SgmlLinkExtractor(allow=('http://www/ppdai/com/user/'),restrict_xpaths=('//p[@class="userInfo clearfix"]')),callback="parse_item",follow=True)
        ]

        def parse_item(self,response):
            sel=Selector(response)
            item=PaipaidaiItem()
            item['name']=sel.xpath('//*[@class="user-name"]/a/text()').extract()
            return item
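One thing that may be worth checking in the rules above is whether the `allow` patterns can match the site's URLs at all. The sketch below uses only Python's `re` module and assumes the link extractor treats each pattern as an ordinary regular expression searched against the full URL. The sample URL follows the page pattern quoted later in this thread, and the `escaped` pattern and the `matches` helper are hypothetical names introduced here for illustration, not part of the original spider.

```python
import re

# A listing URL in the shape quoted later in this thread (page 2 chosen arbitrarily).
url = "http://invest.ppdai.com/loan/list_safe_s0_p2?Rate=0"

# Pattern exactly as written in the first Rule of the spider:
# slashes where the host name has dots, and an unescaped "?".
original = r'http://invest/ppdai/com/loan/list_safe_s0_p\d+?Rate=0'

# Hypothetical alternative: dots restored and escaped, "?" escaped so it is
# a literal question mark instead of a lazy quantifier on \d+.
escaped = r'http://invest\.ppdai\.com/loan/list_safe_s0_p\d+\?Rate=0'


def matches(pattern, text):
    """Return True if the regular expression is found anywhere in text."""
    return re.search(pattern, text) is not None


print(matches(original, url))  # False
print(matches(escaped, url))   # True
```

If this reading is right, the same slash-for-dot issue would apply to the `http://www/ppdai/com/user/` pattern in the second rule.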

9 replies

icybee [bot] #1 · 2016/1/19

*pats head* I don't know any of this myself, so I'll just sit back and see what the experts say.

sdlslx [bot] #2 · 2016/1/19

How much data do you need?

liuxinxin [bot] #3 · 2016/1/19

Around 3,000 records should be enough, I think. The way I'm doing it now, I only get 10 per run.
[sdlslx wrote:]
: How much data do you need?

sdlslx [bot] #4 · 2016/1/19

Let's talk over PM.
[liuxinxin (liuxinxin) wrote:]
: Around 3,000 records should be enough, I think. The way I'm doing it now, I only get 10 per run.

liuxinxin [bot] #5 · 2016/1/19

I've added you on QQ as well.
[sdlslx wrote:]
: Let's talk over PM.

ztinpn [bot] #6 · 2016/1/19

So what does the fee end up being?

wanghaohebe [bot] #7 · 2016/1/19

http://invest.ppdai.com/loan/list_safe_s0_p<page number>?Rate=0
Just build requests from this URL and hand them to the framework to crawl. Wouldn't that take care of the pagination?

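liuxinxin [bot] #8 · 2016/1/19

It's that simple? Could you go into more detail?
[wanghaohebe wrote:]
: http://invest.ppdai.com/loan/list_safe_s0_p<page number>?Rate=0
: Just build requests from this URL and hand them to the framework to crawl. Wouldn't that take care of the pagination?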

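wanghaohebe [bot] #9 · 2016/1/19

    from scrapy import Request

    def parse(self, response):
        pagesize = 10  # number of listing pages to request
        for i in range(pagesize):
            # Fill the page number into the listing URL.
            href = 'http://invest.ppdai.com/loan/list_safe_s0_p%d?Rate=0' % i
            # Hand the page to the framework; parse_item receives the response.
            request = Request(href, self.parse_item)
            yield request

Give it a try.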
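A possible catch when combining this suggestion with the spider from the first post: the generated URLs are on `invest.ppdai.com`, while the spider's `allowed_domains` lists only `www.ppdai.com`. The sketch below is a rough stdlib-only approximation of an offsite check (a host passes if it equals an allowed domain or is a subdomain of it); it is not Scrapy's own code. `page_urls` and `is_allowed` are hypothetical helper names, and numbering pages from 1 is an assumption about the site.

```python
from urllib.parse import urlparse

# allowed_domains as written in the spider from the first post.
ALLOWED_DOMAINS = ["www.ppdai.com"]

# Listing URL template quoted in this thread.
LIST_URL = "http://invest.ppdai.com/loan/list_safe_s0_p%d?Rate=0"


def page_urls(first, last):
    """Build listing URLs for pages first..last inclusive (1-based is assumed)."""
    return [LIST_URL % n for n in range(first, last + 1)]


def is_allowed(url, allowed_domains):
    """Approximate offsite check: host equals an allowed domain or is a subdomain of it."""
    host = urlparse(url).hostname or ""
    return any(host == d or host.endswith("." + d) for d in allowed_domains)


urls = page_urls(1, 3)
print(urls[0])
print(is_allowed(urls[0], ALLOWED_DOMAINS))  # False: invest.ppdai.com is not under www.ppdai.com
print(is_allowed(urls[0], ["ppdai.com"]))    # True: the bare domain covers both hosts
```

If offsite filtering is active, listing `ppdai.com` instead of `www.ppdai.com` may be enough to let these requests through. Separately, Scrapy's documentation warns against overriding `parse` in a `CrawlSpider`, because the rules rely on that method, so the function above probably fits better in a plain `Spider` or as `start_requests`.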