pyspark.resource.ResourceProfile

class pyspark.resource.ResourceProfile(_java_resource_profile: Optional[py4j.java_gateway.JavaObject] = None, _exec_req: Optional[Dict[str, pyspark.resource.requests.ExecutorResourceRequest]] = None, _task_req: Optional[Dict[str, pyspark.resource.requests.TaskResourceRequest]] = None)

Resource profile to associate with an RDD. A pyspark.resource.ResourceProfile allows the user to specify executor and task requirements for an RDD that will get applied during a stage. This allows the user to change the resource requirements between stages. A ResourceProfile is meant to be immutable, so it cannot be changed after it is built.

New in version 3.1.0.

Notes

This API is evolving.

Examples
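
The snippets below assume the names used on this page are already in scope. A minimal setup sketch follows; the local master URL and application name are illustrative assumptions, not part of the original example:

>>> from pyspark import SparkContext
>>> from pyspark.resource import (
...     ExecutorResourceRequests,
...     TaskResourceRequests,
...     ResourceProfileBuilder,
... )
>>> sc = SparkContext("local", "resource-profile-demo")  # assumed local context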

Create Executor resource requests.

>>> executor_requests = (
...     ExecutorResourceRequests()
...     .cores(2)
...     .memory("6g")
...     .memoryOverhead("1g")
...     .pysparkMemory("2g")
...     .offheapMemory("3g")
...     .resource("gpu", 2, "testGpus", "nvidia.com")
... )

Create task resource requests.

>>> task_requests = TaskResourceRequests().cpus(2).resource("gpu", 2)

Create a resource profile.

>>> builder = ResourceProfileBuilder()
>>> resource_profile = builder.require(executor_requests).require(task_requests).build

Create an RDD with the resource profile.

>>> rdd = sc.parallelize(range(10)).withResources(resource_profile)
>>> rdd.getResourceProfile()
<pyspark.resource.profile.ResourceProfile object ...>
>>> rdd.getResourceProfile().taskResources
{'cpus': <...TaskResourceRequest...>, 'gpu': <...TaskResourceRequest...>}
>>> rdd.getResourceProfile().executorResources
{'gpu': <...ExecutorResourceRequest...>,
 'cores': <...ExecutorResourceRequest...>,
 'offHeap': <...ExecutorResourceRequest...>,
 'memoryOverhead': <...ExecutorResourceRequest...>,
 'pyspark.memory': <...ExecutorResourceRequest...>,
 'memory': <...ExecutorResourceRequest...>}

Attributes

executorResources
    Returns a dictionary of resources to ExecutorResourceRequest.

id
    Returns a unique id of this ResourceProfile.

taskResources
    Returns a dictionary of resources to TaskResourceRequest.
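
For example, the id of the profile built above can be read back. The value is assigned at build time and depends on how many profiles the application has already created, so the output shown here is illustrative:

>>> resource_profile.id
1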