BBYR Achieve
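This is a mirrored post. Source: 北邮人论坛 (BYR Forum) / ml-dm / #14286 · synced on 2014/9/24
This mirror source has not been updated for more than 30 days; the post may have been deleted from the origin site.
ML_DM · posted by bot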

[Question] Spark problem of finding all the counterparts in the

Hemingway
2014/9/24 · mirror synced · 6 replies
Hi all,

Has anyone observed Spark stalling during a flatMap operation with the following messages?

    INFO MapOutputTrackerMasterActor: Asked to send map output locations for shuffle 0 to [worker host]
    INFO MapOutputTrackerMasterActor: Asked to send map output locations for shuffle 0 to [worker host]
    INFO MapOutputTrackerMasterActor: Asked to send map output locations for shuffle 0 to [worker host]

The Spark version is 1.0.2.

What we want to do is find similar elements in an RDD. In detail, the dataset consists of two parts, flagged 1 and 2 respectively. For each element flagged 1, we aim to find all counterparts flagged 2. The RDD, named inRDD, is created from a file in HDFS. The code is as follows:

    import scala.collection.mutable.ArrayBuffer

    inRDD.flatMap { /* do something */ }
      .flatMap { /* do something */ }
      .map { /* do some transform */ }
      .groupByKey()
      .flatMap{}
      .map( /* do some transform */ )
      .groupByKey()
      .flatMap { v =>
        val citation = new ArrayBuffer[String]() // all elements flagged as 1
        val journal = new ArrayBuffer[String]()  // all elements flagged as 2
        val iter = v._2.iterator
        while (iter.hasNext) {
          val tokens = iter.next().split("#")
          if (tokens(2).equals("1")) citation += tokens(4) + "#" + tokens(1)
          else if (tokens(2).equals("2")) journal += tokens(4) + "#" + tokens(1)
        }
        val bufferString = new StringBuffer()
        for (i <- 0 until citation.length) {
          val citaTokens = citation(i).split("#")
          for (j <- 0 until journal.length) {
            val qikanTokens = journal(j).split("#")
            if (distance(citaTokens(0), qikanTokens(0))) // user-defined similarity function
              bufferString.append(citaTokens(1) + "#" + qikanTokens(1) + "@") // StringBuffer has append, not concat
          }
        }
        // Drop the trailing "@"; the guard avoids an exception when nothing matched
        if (bufferString.length() > 0) bufferString.deleteCharAt(bufferString.length() - 1)
        bufferString.toString.split("@")
      }
      .saveAsTextFile(args(1))

Through a series of flatMap and groupByKey operations, we mark and then group the data in order to reduce the computation space of the last flatMap operation. In that last flatMap, we transform the data into successfully matched tuples.

My guess is that the flatMap operations are what cause Spark to stall. I would like to know where the problem comes from and how to solve it.

(Forgive my laziness: I copied the English text straight from what I posted to the mailing list.) Any pointers from the experts here would be greatly appreciated.
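The final flatMap in the post joins every match into one "@"-delimited buffer and then splits it again. As a minimal sketch, and not a diagnosis of the stall (whose cause the thread does not establish), the same pairing could be written so that it yields matches lazily and emits nothing for a group without matches. The names `CounterpartMatching`, `Item`, and `matchPairs` are hypothetical. The record layout (field 1 = id, field 2 = flag, field 4 = text) is inferred from the posted code, and exact equality stands in for the poster's `distance` function, which is not shown.

```scala
object CounterpartMatching {
  // Hypothetical record layout, inferred from the posted code:
  // field 1 = id, field 2 = flag ("1" or "2"), field 4 = text to compare.
  final case class Item(id: String, text: String)

  // Splits one group's records into flag-1 and flag-2 items, then lazily
  // yields "id1#id2" for every pair that the similarity predicate accepts.
  def matchPairs(records: Iterable[String],
                 similar: (String, String) => Boolean): Iterator[String] = {
    // Records with fewer than five fields are skipped instead of throwing.
    val parsed = records.iterator.map(_.split("#")).filter(_.length > 4).toVector
    val citations = parsed.collect { case t if t(2) == "1" => Item(t(1), t(4)) }
    val journals  = parsed.collect { case t if t(2) == "2" => Item(t(1), t(4)) }
    for {
      c <- citations.iterator
      j <- journals.iterator
      if similar(c.text, j.text)
    } yield s"${c.id}#${j.id}"
  }

  def main(args: Array[String]): Unit = {
    val group = Seq("x#c1#1#x#spark", "x#j1#2#x#spark", "x#j2#2#x#hadoop")
    // Exact equality is only a stand-in for the real similarity function.
    matchPairs(group, _ == _).foreach(println) // prints "c1#j1"
  }
}
```

Inside the pipeline this would presumably be called as `.flatMap { v => CounterpartMatching.matchPairs(v._2, distance) }`, since an Iterator is an acceptable flatMap result. The pairwise comparison is still quadratic within each group, so a very large group remains expensive either way.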
6 replies
Hemingway [bot] #1 · 2014/9/24
@Ron Do you know any Spark experts? Bring them in so we can discuss.
Ron [bot] #2 · 2014/9/24
I honestly don't know much about it. larrylee1212, do you?
[Quoting Hemingway (枫の哀觞):]
: @Ron Do you know any Spark experts? Bring them in so we can discuss.
Posted via 『我邮2.0』
Ron [bot] #3 · 2014/9/24
@larrylee1212
[Quoting Ron (努力修炼中||阿泰要开心||专做9楼):]
: I honestly don't know much about it. larrylee1212, do you?
: Posted via 『我邮2.0』
Posted via 『我邮2.0』
Hemingway [bot] #4 · 2014/9/24
Why didn't the @ mention reach you?
[Quoting Ron:]
: I honestly don't know much about it. larrylee1212, do you?
:
: Posted via 『我邮2.0』
Ron [bot] #5 · 2014/9/24
Add a space after it.
[Quoting Hemingway (枫の哀觞):]
: Why didn't the @ mention reach you?
Posted via 『我邮2.0』
larrylee1212 [bot] #6 · 2014/9/24
I've never used Spark, so I have no idea.
[Quoting Ron:]
: Add a space after it.
:
: Posted via 『我邮2.0』