pyspark.RDD.persist

RDD.persist(storageLevel: pyspark.storagelevel.StorageLevel = StorageLevel(False, True, False, False, 1)) → pyspark.rdd.RDD[T]
Set this RDD’s storage level so that its values persist across operations after the first time it is computed. A new storage level can only be assigned if the RDD does not already have one set. If no storage level is specified, it defaults to MEMORY_ONLY.

New in version 0.9.1.

Parameters
storageLevel : StorageLevel, default MEMORY_ONLY
    the target storage level

Returns
RDD
    this RDD, with its storage level set to storageLevel
Examples

>>> rdd = sc.parallelize(["b", "a", "c"])
>>> rdd.persist().is_cached
True
>>> str(rdd.getStorageLevel())
'Memory Serialized 1x Replicated'
>>> _ = rdd.unpersist()
>>> rdd.is_cached
False

>>> from pyspark import StorageLevel
>>> rdd2 = sc.range(5)
>>> _ = rdd2.persist(StorageLevel.MEMORY_AND_DISK)
>>> rdd2.is_cached
True
>>> str(rdd2.getStorageLevel())
'Disk Memory Serialized 1x Replicated'

An existing storage level cannot be overridden:

>>> _ = rdd2.persist(StorageLevel.MEMORY_ONLY_2)
Traceback (most recent call last):
    ...
py4j.protocol.Py4JJavaError: ...

Another storage level can be assigned after unpersist:

>>> _ = rdd2.unpersist()
>>> rdd2.is_cached
False
>>> _ = rdd2.persist(StorageLevel.MEMORY_ONLY_2)
>>> str(rdd2.getStorageLevel())
'Memory Serialized 2x Replicated'
>>> rdd2.is_cached
True
>>> _ = rdd2.unpersist()