pyspark.RDD.checkpoint

RDD.checkpoint() → None

Mark this RDD for checkpointing. The RDD will be saved to a file inside the checkpoint directory set with SparkContext.setCheckpointDir(), and all references to its parent RDDs will be removed. This function must be called before any job has been executed on this RDD. The file is written only once the first action on the RDD runs, so it is strongly recommended that the RDD be persisted in memory; otherwise, saving it to a file will require recomputation.

New in version 0.7.0.

Examples

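>>> # Assumes `sc` is an active SparkContext with a checkpoint directory already set.
>>> rdd = sc.range(5)
>>> rdd.is_checkpointed
False
>>> rdd.getCheckpointFile() is None
True
>>> rdd.checkpoint()  # marks the RDD; nothing is written yet
>>> rdd.is_checkpointed
True
>>> rdd.getCheckpointFile() is None
True
>>> rdd.count()  # the first action triggers the checkpoint write
5
>>> rdd.is_checkpointed
True
>>> rdd.getCheckpointFile() is None
False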