pyspark.sql.functions.tuple_union_agg_integer
- pyspark.sql.functions.tuple_union_agg_integer(col, lgNomEntries=None, mode=None)
Aggregate function: returns the compact binary representation of the DataSketches TupleSketch that is the union of the integer-summary TupleSketch objects in the input column.
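To make the union semantics concrete, here is a minimal exact model in plain Python. It is not the Spark or DataSketches implementation: it assumes a hypothetical representation in which each sketch is a dict mapping key to integer summary, and it combines summaries of matching keys by summing them, matching the `sum` mode that appears in the example output. Real TupleSketches hash their keys and switch to estimation once the number of entries grows past a bound set by `lgNomEntries`; this model never does.

```python
from typing import Dict, Iterable

# Hypothetical exact model: a "sketch" is a dict mapping key -> integer summary.
# Unlike a real TupleSketch, nothing is hashed or sampled, so results are exact.
IntSketch = Dict[object, int]


def union_exact(sketches: Iterable[IntSketch]) -> IntSketch:
    """Merge sketches, summing the summaries of keys present in more than one."""
    merged: IntSketch = {}
    for sketch in sketches:
        for key, value in sketch.items():
            merged[key] = merged.get(key, 0) + value
    return merged


def estimate_exact(sketch: IntSketch) -> float:
    """Distinct-key count; exact here because the model never samples."""
    return float(len(sketch))


# Mirrors the example data: keys 1-2 in one sketch, keys 3-4 in another.
sketch_a = {1: 10, 2: 20}
sketch_b = {3: 30, 4: 40}
merged = union_exact([sketch_a, sketch_b])
print(estimate_exact(merged))  # 4.0

# Overlapping keys are counted once and their summaries are summed.
print(union_exact([{1: 10}, {1: 5, 2: 7}]))  # {1: 15, 2: 7}
```

Because the inputs in the example are far below the sketch's nominal capacity, a real sketch would also report an exact distinct count of 4.0 there.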
New in version 4.2.0.
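- Parameters
col : Column or column name
The column containing the binary integer TupleSketch representations to merge.
lgNomEntries : int, optional
The log-base-2 of the nominal number of entries for the merged sketch. Defaults to 12 when omitted, as the column name in the example output shows.
mode : str, optional
How the integer summaries of matching keys are combined. Defaults to sum when omitted, as the column name in the example output shows.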
- Returns
Column
The binary representation of the merged TupleSketch.
Examples
>>> from pyspark.sql import functions as sf
>>> df1 = spark.createDataFrame([(1, 10), (2, 20)], ["key", "value"])
>>> df1 = df1.agg(sf.tuple_sketch_agg_integer("key", "value").alias("sketch"))
>>> df2 = spark.createDataFrame([(3, 30), (4, 40)], ["key", "value"])
>>> df2 = df2.agg(sf.tuple_sketch_agg_integer("key", "value").alias("sketch"))
>>> df3 = df1.union(df2)
>>> df3.agg(sf.tuple_sketch_estimate_integer(sf.tuple_union_agg_integer("sketch"))).show()
+-----------------------------------------------------------------------+
|tuple_sketch_estimate_integer(tuple_union_agg_integer(sketch, 12, sum))|
+-----------------------------------------------------------------------+
|                                                                    4.0|
+-----------------------------------------------------------------------+