feature_binning


feature_binning(array<features::string> features, map<string, array<number>> quantiles_map) - returns a binned feature vector as an array<features::string>
feature_binning(number weight, array<number> quantiles) - returns bin ID as int

WITH extracted as (
  select
    extract_feature(feature) as index,
    extract_weight(feature) as value
  from
    input l
    LATERAL VIEW explode(features) r as feature
),
mapping as (
  select
    index,
    build_bins(value, 5, true) as quantiles -- 5 bins with auto bin shrinking
  from
    extracted
  group by
    index
),
bins as (
  select
    to_map(index, quantiles) as quantiles
  from
    mapping
)
select
  l.features as original,
  feature_binning(l.features, r.quantiles) as features
from
  input l
  cross join bins r

> ["name#Jacob","gender#Male","age:20.0"] ["name#Jacob","gender#Male","age:2"]
> ["name#Isabella","gender#Female","age:20.0"] ["name#Isabella","gender#Female","age:2"]
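
The query above uses the array form of feature_binning. The scalar form, which takes a single number and an array of quantiles and returns a bin ID, can be used in much the same way. The following is a minimal sketch, not a tested recipe: the table users and its numeric column age are hypothetical, and the resulting bin IDs depend on the data, so no output is shown.

```sql
-- Hypothetical table: users(age), where age is a numeric column.
WITH bins as (
  select
    build_bins(age, 5, true) as quantiles -- 5 bins with auto bin shrinking
  from
    users
)
select
  l.age as original,
  feature_binning(l.age, r.quantiles) as bin_id -- scalar form: returns bin ID as int
from
  users l
  cross join bins r
```

Because build_bins is called here without a group by, it produces a single quantiles array for the whole table, so the cross join attaches that one array to every row.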

Platforms: WhereOS, Spark, Hive
Class: hivemall.ftvec.binning.FeatureBinningUDF

More functions can be added to WhereOS via Python or R bindings, or as Java and Scala UDF (user-defined function), UDAF (user-defined aggregation function), and UDTF (user-defined table-generating function) extensions. Custom libraries can be added via the Settings page or installed from the WhereOS Store.
