
percentile_approx

Function percentile_approx(column, percentage, accuracy=10000) returns the approximate percentile value of the specified numeric column at the given percentage.
| Parameter | Description |
| --- | --- |
| column | Numeric column |
| percentage | Percentile to be calculated |
| accuracy | Optional. Used by the approximation algorithm; a higher value gives better accuracy |
The value of `percentage` must be between 0.0 and 1.0. The `accuracy` parameter (default: 10000) is a positive numeric literal that controls approximation accuracy at the cost of memory. A higher value of `accuracy` yields a more accurate result; `1.0/accuracy` is the relative error of the approximation, so the default of 10000 corresponds to a relative error of 0.0001. When `percentage` is an array, each value in the array must be between 0.0 and 1.0, and the function returns an array of approximate percentiles of `column`, one for each given percentage.
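-- 10th, 50th and 90th percentiles as separate columns
select
    percentile_approx(value, 0.1) as 10th,
    percentile_approx(value, 0.5) as 50th,
    percentile_approx(value, 0.9) as 90th
from temperature_stream;

-- The same three percentiles returned as a single array column
select
    percentile_approx(value, array(0.1,0.5,0.9)) as quantiles
from temperature_stream;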
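
The optional third argument can also be passed explicitly to trade accuracy for memory. The following is a minimal sketch that reuses the `temperature_stream` table and `value` column from the examples above; the accuracy values 100 and 100000 and the alias names `median_coarse` and `median_fine` are arbitrary choices for illustration, not recommendations.

```sql
-- Median computed twice with different accuracy settings (illustrative values).
-- accuracy = 100     -> less memory, relative error 1.0/100
-- accuracy = 100000  -> more memory, relative error 1.0/100000
select
    percentile_approx(value, 0.5, 100) as median_coarse,
    percentile_approx(value, 0.5, 100000) as median_fine
from temperature_stream;
```

On small inputs the two results will often coincide; any difference is more likely to show up on large inputs.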
Platforms: WhereOS, Spark, Hive
Class: org.apache.spark.sql.catalyst.expressions.aggregate.ApproximatePercentile
More functions can be added to WhereOS via Python or R bindings, or as Java and Scala extensions: UDFs (user-defined functions), UDAFs (user-defined aggregation functions) and UDTFs (user-defined table-generating functions). Custom libraries can be added via the Settings page or installed from the WhereOS Store.
