pyspark.RDD.pipe

RDD.pipe(command: str, env: Optional[Dict[str, str]] = None, checkCode: bool = False) → pyspark.rdd.RDD[str][source]

Return an RDD created by piping elements to a forked external process.

New in version 0.7.0.

Parameters
commandstr

command to run.

envdict, optional

environment variables to set.

checkCodebool, optional

whether to check the return value of the shell command.

Returns
RDD

a new RDD of strings

Examples

>>> sc.parallelize(['1', '2', '', '3']).pipe('cat').collect()
['1', '2', '', '3']