percentile_approx

The aggregate function `percentile_approx(column, percentage, accuracy=10000)` returns the approximate percentile value of the numeric `column` at the given `percentage`.

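Parameter	Description
column	Numeric column to compute the percentile over
percentage	Percentile to compute, as a fraction between 0.0 and 1.0, or an array of such fractions
accuracy	OPTIONAL. Positive numeric literal that tunes the approximation (default 10000); higher values give better accuracy at the cost of memory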

The value of `percentage` must be between 0.0 and 1.0. The optional `accuracy` parameter (default 10000) is a positive numeric literal that controls approximation accuracy at the cost of memory: a higher value yields better accuracy, and `1.0/accuracy` is the relative error of the approximation. `percentage` may also be an array, in which case every element must be between 0.0 and 1.0 and the function returns an array of approximate percentiles of `column`, one for each element.

select
    percentile_approx(value, 0.1) as `10th`,
    percentile_approx(value, 0.5) as `50th`,
    percentile_approx(value, 0.9) as `90th`
from temperature_stream

select 
    percentile_approx(value, array(0.1,0.5,0.9)) as quantiles 
from temperature_stream
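To make the link between `accuracy` and error concrete, here is a small Python sketch. It is illustrative only and does not reproduce Spark's approximation algorithm; `exact_percentile` and `rank_window` are hypothetical helper names. It rests on two assumptions not stated on this page: the nearest-rank definition of an exact percentile, and the rank guarantee documented for Spark's `DataFrame.approxQuantile`, namely floor((p - err) * n) <= rank(x) <= ceil((p + err) * n) with err = 1.0/accuracy.

```python
import math


def exact_percentile(values, p):
    """Exact percentile by the nearest-rank method (an ASSUMED definition,
    used only as a reference point): the value at 1-based rank ceil(p * n)."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0.0 <= p <= 1.0:
        raise ValueError("percentage must be between 0.0 and 1.0")
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered)))  # p = 0.0 maps to the minimum
    return ordered[rank - 1]


def rank_window(n, p, accuracy=10000):
    """Return (low, high), the 1-based ranks an approximate answer may occupy,
    ASSUMING the bound floor((p - err) * n) <= rank <= ceil((p + err) * n)
    with err = 1.0 / accuracy, clamped to the valid range 1..n."""
    if n < 1:
        raise ValueError("n must be at least 1")
    if not 0.0 <= p <= 1.0:
        raise ValueError("percentage must be between 0.0 and 1.0")
    if accuracy <= 0:
        raise ValueError("accuracy must be positive")
    slack = n / accuracy  # err * n: how many ranks the answer may drift
    low = max(1, math.floor(p * n - slack))
    high = min(n, math.ceil(p * n + slack))
    return low, high


if __name__ == "__main__":
    data = list(range(1, 1001))  # 1000 distinct values, 1..1000
    print(exact_percentile(data, 0.5))                # 500
    print(rank_window(len(data), 0.5, accuracy=100))  # (490, 510)
    print(rank_window(1_000_000, 0.5))                # (499900, 500100)
```

Under this assumption the error is measured in ranks, not in the units of `column`: with the default accuracy of 10000 and one million rows, the returned median may come from any rank between 499,900 and 500,100, and raising `accuracy` narrows that window proportionally.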

Platforms: WhereOS, Spark, Hive

Class: org.apache.spark.sql.catalyst.expressions.aggregate.ApproximatePercentile

More functions can be added to WhereOS through Python or R bindings, or as Java and Scala UDF (user-defined function), UDAF (user-defined aggregation function), and UDTF (user-defined table-generating function) extensions. Custom libraries can be added via the Settings page or installed from the WhereOS Store.
