[SPARK-4672][GraphX] Non-transient PartitionsRDDs will lead to StackOverflow error

The related JIRA is https://issues.apache.org/jira/browse/SPARK-4672

In a nutshell, if `val partitionsRDD` in EdgeRDDImpl and VertexRDDImpl is not marked `@transient`, each of these RDDs serializes its `partitionsRDD` along with itself. In iterative algorithms the resulting serialization chain can grow very long and eventually cause a StackOverflowError. More details and explanation can be found in the JIRA.

Author: JerryLead <JerryLead@163.com>
Author: Lijie Xu <csxulijie@gmail.com>

Closes #3544 from JerryLead/my_graphX and squashes the following commits:

628f33c [JerryLead] set PartitionsRDD to be transient in EdgeRDDImpl and VertexRDDImpl
c0169da [JerryLead] Merge branch 'master' of https://github.com/apache/spark
52799e3 [Lijie Xu] Merge pull request #1 from apache/master
JerryLead 2014-12-02 17:14:11 -08:00, committed by Ankur Dave
Parent fc0a1475ef
Commit 17c162f668
2 changed files with 2 additions and 2 deletions


@@ -26,7 +26,7 @@ import org.apache.spark.storage.StorageLevel
 import org.apache.spark.graphx._
 class EdgeRDDImpl[ED: ClassTag, VD: ClassTag] private[graphx] (
-    override val partitionsRDD: RDD[(PartitionID, EdgePartition[ED, VD])],
+    @transient override val partitionsRDD: RDD[(PartitionID, EdgePartition[ED, VD])],
     val targetStorageLevel: StorageLevel = StorageLevel.MEMORY_ONLY)
   extends EdgeRDD[ED](partitionsRDD.context, List(new OneToOneDependency(partitionsRDD))) {


@@ -27,7 +27,7 @@ import org.apache.spark.storage.StorageLevel
 import org.apache.spark.graphx._
 class VertexRDDImpl[VD] private[graphx] (
-    val partitionsRDD: RDD[ShippableVertexPartition[VD]],
+    @transient val partitionsRDD: RDD[ShippableVertexPartition[VD]],
     val targetStorageLevel: StorageLevel = StorageLevel.MEMORY_ONLY)
     (implicit override protected val vdTag: ClassTag[VD])
   extends VertexRDD[VD](partitionsRDD.context, List(new OneToOneDependency(partitionsRDD))) {
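
The effect described in the commit message can be sketched outside Spark with plain Java serialization. This is a minimal illustration, not GraphX code: `EagerNode`, `LazyNode`, `TransientDemo`, `serializedSize`, and `chain` are hypothetical names introduced here. Each node stands in for an RDD wrapper that holds a reference to the one it was derived from, the way an iterative algorithm derives a new graph from the previous one on every step.

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}

// Hypothetical stand-ins for an RDD wrapper; these are not GraphX classes.
// EagerNode serializes its parent reference; LazyNode marks it @transient,
// mirroring the one-word change in the diff.
class EagerNode(val parent: AnyRef) extends Serializable
class LazyNode(@transient val parent: AnyRef) extends Serializable

object TransientDemo {
  // Serialize an object with default Java serialization and return the byte count.
  def serializedSize(obj: AnyRef): Int = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(obj)
    out.close()
    bytes.size()
  }

  // Build a chain of depth + 1 nodes, each wrapping the previous one,
  // the way each iteration wraps the result of the one before it.
  def chain(depth: Int, wrap: AnyRef => AnyRef): AnyRef =
    (1 to depth).foldLeft(wrap(null))((prev, _) => wrap(prev))

  def main(args: Array[String]): Unit = {
    for (depth <- Seq(1, 10, 50)) {
      val eager = serializedSize(chain(depth, p => new EagerNode(p)))
      val lazySize = serializedSize(chain(depth, p => new LazyNode(p)))
      println(s"depth=$depth eager=$eager bytes, transient=$lazySize bytes")
    }
  }
}
```

With the non-transient field, writing the newest node walks the whole chain, so the serialized size grows with the depth. `ObjectOutputStream` follows these references recursively, which is why a long enough chain can exhaust the stack. With `@transient`, only the newest node is written and the size stays constant regardless of depth.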