pyspark.SparkContext.addJobTag

SparkContext.addJobTag(tag: str) → None

Add a tag to be assigned to all the jobs started by this thread. Tagged jobs can later be cancelled as a group with SparkContext.cancelJobsWithTag.

New in version 3.5.0.

Parameters
tag : str

The tag to be added. Cannot contain the ',' (comma) character.

Examples

>>> import threading
>>> from time import sleep
>>> from pyspark import InheritableThread
>>> sc.setInterruptOnCancel(interruptOnCancel=True)
>>> result = "Not Set"
>>> lock = threading.Lock()
>>> def map_func(x):
...     sleep(100)
...     raise RuntimeError("Task should have been cancelled")
...
>>> def start_job(x):
...     global result
...     try:
...         sc.addJobTag("job_to_cancel")
...         result = sc.parallelize(range(x)).map(map_func).collect()
...     except Exception as e:
...         result = "Cancelled"
...     lock.release()
...
>>> def stop_job():
...     sleep(5)
...     sc.cancelJobsWithTag("job_to_cancel")
...
>>> suppress = lock.acquire()
>>> suppress = InheritableThread(target=start_job, args=(10,)).start()
>>> suppress = InheritableThread(target=stop_job).start()
>>> suppress = lock.acquire()
>>> print(result)
Cancelled
>>> sc.clearJobTags()