
count_min_sketch


count_min_sketch(col, eps, confidence, seed) – Returns a count-min sketch of a column with the given eps (relative error), confidence and seed. The result is an array of bytes, which must be deserialized to a `CountMinSketch` before use. A count-min sketch is a probabilistic data structure that estimates item frequencies (approximately how many times each value occurs) using sub-linear space.
Platforms: WhereOS, Spark, Hive
Class: org.apache.spark.sql.catalyst.expressions.aggregate.CountMinSketchAgg
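To illustrate what the sketch stores and how `eps` and `confidence` shape it, here is a minimal pure-Python sketch of the data structure. It is not the Spark implementation, and its byte layout does not match what `count_min_sketch` returns. The class name `SimpleCountMinSketch`, the BLAKE2b-based hashing keyed with the seed and row number, and the sizing rule (width ⌈2/eps⌉, depth ⌈log2(1/(1 - confidence))⌉) are all assumptions chosen to match the standard count-min bounds.

```python
import hashlib
import math


class SimpleCountMinSketch:
    """Hypothetical minimal count-min sketch, sized from eps and confidence."""

    def __init__(self, eps: float, confidence: float, seed: int = 0) -> None:
        if not (0 < eps < 1 and 0 < confidence < 1):
            raise ValueError("eps and confidence must both be in (0, 1)")
        # Assumed sizing: more columns -> smaller error, more rows -> higher confidence.
        self.width = math.ceil(2 / eps)
        self.depth = math.ceil(-math.log(1 - confidence) / math.log(2))
        self.seed = seed
        self.table = [[0] * self.width for _ in range(self.depth)]
        self.total = 0  # sum of all counts added so far

    def _index(self, item, row: int) -> int:
        # One hash function per row, derived from the seed and the row number.
        key = f"{self.seed}:{row}:{item}".encode()
        digest = hashlib.blake2b(key, digest_size=8).digest()
        return int.from_bytes(digest, "big") % self.width

    def add(self, item, count: int = 1) -> None:
        # Increment one counter in every row.
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count
        self.total += count

    def estimate(self, item) -> int:
        # Collisions only inflate counters, so the smallest one is the best estimate.
        return min(self.table[row][self._index(item, row)] for row in range(self.depth))


cms = SimpleCountMinSketch(eps=0.01, confidence=0.95, seed=42)
for word in ["a"] * 50 + ["b"] * 30 + ["c"] * 5:
    cms.add(word)
print(cms.width, cms.depth)  # 200 5
print(cms.estimate("a"))     # never below the true count of 50
```

An estimate never undercounts. Under the usual independence assumptions about the row hash functions, this sizing means an estimate exceeds the true count by more than eps times the total count with probability at most 1 - confidence, while memory depends only on eps and confidence, not on the number of distinct values.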

More functions can be added to WhereOS via Python or R bindings or as Java and Scala UDF (user-defined function), UDAF (user-defined aggregation function) and UDTF (user-defined table generating function) extensions. Custom libraries can be added via the Settings page or installed from the WhereOS Store.
