Hi all,
Has anyone seen Spark stall during a flatMap operation with the following messages:
INFO MapOutputTrackerMasterActor: Asked to send map output locations for shuffle 0 to [worker host]
INFO MapOutputTrackerMasterActor: Asked to send map output locations for shuffle 0 to [worker host]
INFO MapOutputTrackerMasterActor: Asked to send map output locations for shuffle 0 to [worker host]
We are running Spark 1.0.2. We want to find similar elements in an RDD. Specifically, the dataset has two parts, flagged 1 and 2 respectively. For each element flagged 1, we want to find all of its counterparts flagged 2. The RDD, named inRDD, is created from a file in HDFS. The code is as follows:
import scala.collection.mutable.ArrayBuffer

inRDD.flatMap{/*do something*/}.flatMap{/*do something*/}.map{/*do some transform*/}.groupByKey()
  .flatMap{}.map(/*do some transform*/).groupByKey().flatMap{v=>{
    val citation = new ArrayBuffer[String]() // save all elements flagged as 1
    val journal = new ArrayBuffer[String]() // save all elements flagged as 2
    val iter = v._2.iterator
    while (iter.hasNext)
    {
      val tokens = iter.next().split("#")
      if(tokens(2).equals("1"))
        citation += tokens(4)+"#"+tokens(1)
      else if(tokens(2).equals("2"))
        journal += tokens(4)+"#"+tokens(1)
    }
    val bufferString = new StringBuffer() // matched pairs, joined by "@"
    for(i<-0 until citation.length){
      val citaTokens = citation(i).split("#")
      for(j<-0 until journal.length){
        val qikanTokens = journal(j).split("#")
        if(distance(citaTokens(0),qikanTokens(0))) // defined similarity function
          bufferString.append(citaTokens(1)+"#"+qikanTokens(1)+"@") // StringBuffer has append, not concat
      }
    }
    if(bufferString.length() == 0)
      Array.empty[String] // no match in this group; deleteCharAt(-1) would throw
    else {
      bufferString.deleteCharAt(bufferString.length()-1) // drop the trailing "@"
      bufferString.toString.split("@")
    }
  }
  }.saveAsTextFile(args(1))
Through a series of flatMap and groupByKey operations, we tag and then group the data in order to shrink the search space of the last flatMap operation. The last flatMap then turns each group into the pairs that matched successfully. I suspect the flatMap operations are what cause Spark to stall. I would like to know where the problem comes from and how to fix it.
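To make the last step easier to discuss, here is a minimal sketch of the per-group matching written as a plain Scala function, with no Spark dependency. It is not a diagnosis of the stall. The names GroupMatch, parse, matchGroup and similar are hypothetical. The record layout (id in field 1, flag in field 2, text in field 4, separated by "#") is taken from the code above, and similar stands in for the distance function, which is not shown here. It returns a lazy Iterator, so the matches of a large group are produced one at a time instead of being joined into one big string first.

```scala
object GroupMatch {
  // Hypothetical helper: split one record on "#" and keep (flag, text, id).
  // Records with fewer than five fields are skipped instead of throwing.
  private def parse(record: String): Option[(String, String, String)] = {
    val tokens = record.split("#")
    if (tokens.length > 4) Some((tokens(2), tokens(4), tokens(1))) else None
  }

  // Pairs every element flagged "1" with every similar element flagged "2"
  // inside one group, and yields "id1#id2" for each matching pair.
  def matchGroup(records: Iterable[String],
                 similar: (String, String) => Boolean): Iterator[String] = {
    val parsed = records.flatMap(r => parse(r).toList).toVector
    val citation = parsed.collect { case ("1", text, id) => (text, id) }
    val journal = parsed.collect { case ("2", text, id) => (text, id) }
    // Lazy nested loop: still O(|citation| * |journal|) comparisons,
    // but no intermediate string is built.
    for {
      (cText, cId) <- citation.iterator
      (jText, jId) <- journal.iterator
      if similar(cText, jText)
    } yield cId + "#" + jId
  }
}
```

If this fits, the body of the last flatMap could become something like `v => GroupMatch.matchGroup(v._2, distance)`, assuming distance takes two strings and returns a Boolean. The number of comparisons per group is unchanged, so a very large group would still be slow.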
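(Forgive me for being lazy: I just pasted the English version I posted to the mailing list.) Any pointers or discussion from the experts would be much appreciated.
This is a mirrored post. Source: 北邮人论坛 / ml-dm / #14286. Synced on 2014/9/24.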
Posted by the ML_DM bot
[Question] Spark problem of finding all the counterparts in the
Hemingway
2014/9/24, mirror synced, 6 replies
6 replies
I honestly don't know this area. larrylee1212, do you?
[Quoting Hemingway (枫の哀觞):]
: @Ron, do you know any Spark experts? Bring them in to discuss.
Posted via 『我邮2.0』
@larrylee1212
[Quoting Ron (努力修炼中||阿泰要开心||专做9楼):]
: I honestly don't know this area. larrylee1212, do you?
: Posted via 『我邮2.0』
Posted via 『我邮2.0』