pyspark.RDD.cache

RDD.cache() → pyspark.rdd.RDD[T]

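Persist this RDD with the default storage level (MEMORY_ONLY). Persistence is lazy: partitions are stored the first time an action computes them, and later actions reuse any cached partitions instead of recomputing them.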

New in version 0.7.0.

Returns
RDD

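The same RDD, with its storage level set to MEMORY_ONLY. Because PySpark always stores data in serialized form, this level is reported as 'Memory Serialized 1x Replicated'.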

Examples

>>> rdd = sc.range(5)
>>> rdd2 = rdd.cache()
>>> rdd2 is rdd
True
>>> str(rdd.getStorageLevel())
'Memory Serialized 1x Replicated'
>>> _ = rdd.unpersist()