pyspark.sql.functions.xxhash64(*cols)
Calculates the hash code of given columns using the 64-bit variant of the xxHash algorithm, and returns the result as a long column. The hash computation uses an initial seed of 42.
New in version 3.0.0.
Changed in version 3.4.0: Supports Spark Connect.
Parameters
cols : Column or str
one or more columns to compute on.
Returns
Column
hash value as a long column.
Examples
>>> from pyspark.sql.functions import xxhash64
>>> df = spark.createDataFrame([('ABC', 'DEF')], ['c1', 'c2'])
Hash for one column
>>> df.select(xxhash64('c1').alias('hash')).show()
+-------------------+
|               hash|
+-------------------+
|4105715581806190027|
+-------------------+
Two or more columns
>>> df.select(xxhash64('c1', 'c2').alias('hash')).show()
+-------------------+
|               hash|
+-------------------+
|3233247871021311208|
+-------------------+
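To make the description concrete (64-bit xxHash, seed 42, result returned as a signed long), here is a minimal pure-Python sketch. It is not Spark's implementation. `xxh64` follows the standard XXH64 algorithm over raw bytes. The helper `chained_xxhash64` is a hypothetical name, and it assumes that multiple columns are folded left to right, with each column's hash becoming the seed for the next and null values skipped. The sketch only accepts byte strings. Spark encodes integers, doubles, and other types into bytes in its own type-specific way, so the values printed here are not guaranteed to match Spark's output.

```python
from typing import Iterable, Optional

MASK64 = (1 << 64) - 1
P1 = 0x9E3779B185EBCA87
P2 = 0xC2B2AE3D27D4EB4F
P3 = 0x165667B19E3779F9
P4 = 0x85EBCA77C2B2AE63
P5 = 0x27D4EB2F165667C5


def _rotl(x: int, r: int) -> int:
    # 64-bit rotate left.
    return ((x << r) | (x >> (64 - r))) & MASK64


def _round(acc: int, lane: int) -> int:
    # One accumulator round: add lane * P2, rotate by 31, multiply by P1.
    acc = (acc + lane * P2) & MASK64
    return (_rotl(acc, 31) * P1) & MASK64


def _merge_round(acc: int, val: int) -> int:
    # Fold one of the four lane accumulators into the running hash.
    acc ^= _round(0, val)
    return (acc * P1 + P4) & MASK64


def xxh64(data: bytes, seed: int = 0) -> int:
    """Standard XXH64 of `data`, returned as an unsigned 64-bit int."""
    seed &= MASK64
    n, i = len(data), 0
    if n >= 32:
        # Four parallel accumulators consume 32-byte stripes.
        v1 = (seed + P1 + P2) & MASK64
        v2 = (seed + P2) & MASK64
        v3 = seed
        v4 = (seed - P1) & MASK64
        while i + 32 <= n:
            v1 = _round(v1, int.from_bytes(data[i:i + 8], "little"))
            v2 = _round(v2, int.from_bytes(data[i + 8:i + 16], "little"))
            v3 = _round(v3, int.from_bytes(data[i + 16:i + 24], "little"))
            v4 = _round(v4, int.from_bytes(data[i + 24:i + 32], "little"))
            i += 32
        h = (_rotl(v1, 1) + _rotl(v2, 7) + _rotl(v3, 12) + _rotl(v4, 18)) & MASK64
        for v in (v1, v2, v3, v4):
            h = _merge_round(h, v)
    else:
        h = (seed + P5) & MASK64
    h = (h + n) & MASK64
    # Consume the remaining tail: 8-byte, then 4-byte, then 1-byte chunks.
    while i + 8 <= n:
        h ^= _round(0, int.from_bytes(data[i:i + 8], "little"))
        h = (_rotl(h, 27) * P1 + P4) & MASK64
        i += 8
    if i + 4 <= n:
        h ^= (int.from_bytes(data[i:i + 4], "little") * P1) & MASK64
        h = (_rotl(h, 23) * P2 + P3) & MASK64
        i += 4
    while i < n:
        h ^= (data[i] * P5) & MASK64
        h = (_rotl(h, 11) * P1) & MASK64
        i += 1
    # Final avalanche mixes all bits.
    h ^= h >> 33
    h = (h * P2) & MASK64
    h ^= h >> 29
    h = (h * P3) & MASK64
    h ^= h >> 32
    return h


def to_signed64(u: int) -> int:
    """Reinterpret an unsigned 64-bit value as a signed long."""
    return u - (1 << 64) if u >= (1 << 63) else u


def chained_xxhash64(values: Iterable[Optional[bytes]], seed: int = 42) -> int:
    """Hypothetical helper: each non-null value is hashed with the previous
    hash as its seed (assumed column-combining scheme); nulls are skipped."""
    h = seed
    for v in values:
        if v is not None:
            h = xxh64(v, h)
    return to_signed64(h)


print(chained_xxhash64(["ABC".encode("utf-8"), "DEF".encode("utf-8")]))
```

Reinterpreting the unsigned digest as signed reflects that a long column holds signed 64-bit values, which is why hash values can come out negative.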