feature_binning

feature_binning [WhereOS, SQL, Spark, Hive]

Class hivemall.ftvec.binning.FeatureBinningUDF
Usage feature_binning(array<string> features, map<string, array<number>> quantiles_map) - returns a binned feature vector as an array<string>
feature_binning(number weight, array<number> quantiles) - returns the bin ID as an int

WITH extracted as (
  -- split each "name:value" feature into its index (name) and numeric value
  select
    extract_feature(feature) as index,
    extract_weight(feature) as value
  from
    input l
    LATERAL VIEW explode(features) r as feature
),
mapping as (
  -- compute quantile boundaries per feature
  select
    index,
    build_bins(value, 5, true) as quantiles -- 5 bins with auto bin shrinking
  from
    extracted
  group by
    index
),
bins as (
  -- collect all per-feature boundaries into a single map
  select
    to_map(index, quantiles) as quantiles
  from
    mapping
)
select
  l.features as original,
  feature_binning(l.features, r.quantiles) as features
from
  input l
  cross join bins r

> ["name#Jacob","gender#Male","age:20.0"] ["name#Jacob","gender#Male","age:2"]
> ["name#Isabella","gender#Female","age:20.0"] ["name#Isabella","gender#Female","age:2"]
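
The second form listed under Usage takes a single numeric value and an array of quantile boundaries, and returns the bin ID. A minimal sketch, assuming the `input` table also has a numeric column named `age` (a hypothetical column, not part of the example above), might look like this:

```sql
-- Sketch only: `age` is an assumed numeric column on `input`.
WITH bins as (
  -- one set of quantile boundaries computed over the whole column
  select
    build_bins(age, 5, true) as quantiles -- 5 bins with auto bin shrinking
  from
    input
)
select
  l.age as original,
  feature_binning(l.age, r.quantiles) as bin_id -- scalar form: returns an int
from
  input l
  cross join bins r
```

Because `bins` holds a single row here, the cross join simply attaches the same boundary array to every input row, as in the array example above.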

More functions can be added to WhereOS via Python or R bindings, or as Java and Scala extensions: UDFs (user-defined functions), UDAFs (user-defined aggregation functions), and UDTFs (user-defined table-generating functions). Custom libraries can be added via the Settings page or installed from the WhereOS Store.
