pyspark.sql.functions.tuple_sketch_estimate_integer

pyspark.sql.functions.tuple_sketch_estimate_integer(col)

Returns the estimated number of distinct keys in a DataSketches TupleSketch with integer summaries.

New in version 4.2.0.

Parameters
col : Column or column name

The column containing a binary TupleSketch representation.

Returns
Column

The estimated cardinality, that is, the estimated number of distinct keys in the sketch.

Examples

>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([(1, 10), (2, 20), (2, 30)], ["key", "value"])
>>> df.agg(sf.tuple_sketch_estimate_integer(
...     sf.tuple_sketch_agg_integer("key", "value"))).show()
+----------------------------------------------------------------------------+
|tuple_sketch_estimate_integer(tuple_sketch_agg_integer(key, value, 12, sum))|
+----------------------------------------------------------------------------+
|                                                                         2.0|
+----------------------------------------------------------------------------+
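As a cross-check on the result above, the following plain-Python sketch computes the exact quantities that the sketch approximates for the same three rows. It does not use Spark or DataSketches. It assumes that the `sum` shown in the result column name means summaries of duplicate keys are added together, and the helper name `exact_key_summaries` is hypothetical. For an input this small the estimate is expected to match the exact distinct-key count.

```python
from collections import defaultdict

# Same rows as the DataFrame in the example above: (key, value) pairs.
rows = [(1, 10), (2, 20), (2, 30)]


def exact_key_summaries(rows):
    """Hypothetical helper: exact per-key integer summaries.

    Assumption: summaries for a repeated key are combined by addition,
    mirroring the "sum" mode shown in the result column name.
    """
    summaries = defaultdict(int)
    for key, value in rows:
        summaries[key] += value
    return dict(summaries)


summaries = exact_key_summaries(rows)

# The quantity the estimate approximates: the number of distinct keys.
distinct_keys = float(len(summaries))

print(summaries)
print(distinct_keys)
```

Key 2 appears twice, so it contributes one distinct key with a combined summary, which is why the estimate counts two keys rather than three rows.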