onehot_encoding


onehot_encoding(PRIMITIVE feature, …) – Aggregate function that computes a one-hot encoded label (an integer index) for each distinct value of the given feature columns

WITH mapping AS (
  SELECT
    m.f1, m.f2
  FROM (
    SELECT onehot_encoding(species, category) m
    FROM test
  ) tmp
)
SELECT
  array(m.f1[t.species], m.f2[t.category], feature('count', count)) AS sparse_features
FROM
  test t
  CROSS JOIN mapping m;

["2","8","count:9"]
["5","8","count:10"]
["1","6","count:101"]
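
To make the query above easier to follow, here is a minimal Python sketch of the idea, not the Hivemall implementation. It assumes that indices start at 1 and keep counting across columns (which the sample output suggests, since the category indices are higher than the species indices) and that values within a column are numbered in sorted order (an assumption made only for this illustration). The helper names `onehot_mapping` and `sparse_features` and the sample rows are hypothetical.

```python
def onehot_mapping(rows, columns):
    """Build one {value: index} dict per column.

    Assumption: indices start at 1 and continue across columns,
    and values inside a column are numbered in sorted order.
    """
    mappings = []
    next_index = 1
    for col in columns:
        mapping = {}
        for value in sorted({row[col] for row in rows}):
            mapping[value] = next_index
            next_index += 1
        mappings.append(mapping)
    return mappings


def sparse_features(row, columns, mappings, count_col="count"):
    """Mimic array(m.f1[species], m.f2[category], feature('count', count))."""
    feats = [str(m[row[c]]) for c, m in zip(columns, mappings)]
    feats.append(f"{count_col}:{row[count_col]}")
    return feats


# Hypothetical stand-in for the `test` table.
rows = [
    {"species": "cat", "category": "mammal", "count": 9},
    {"species": "owl", "category": "bird", "count": 10},
    {"species": "dog", "category": "mammal", "count": 101},
]
columns = ["species", "category"]

mappings = onehot_mapping(rows, columns)  # plays the role of m.f1, m.f2
encoded = [sparse_features(r, columns, mappings) for r in rows]

for features in encoded:
    print(features)
```

The two-step shape mirrors the SQL: the mapping is built once over the whole table (the aggregate step), then every row is looked up against it (the CROSS JOIN step).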

Platforms: WhereOS, Spark, Hive
Class: hivemall.ftvec.trans.OnehotEncodingUDAF

More functions can be added to WhereOS via Python or R bindings, or as Java and Scala extensions: UDFs (user-defined functions), UDAFs (user-defined aggregation functions) and UDTFs (user-defined table-generating functions). Custom libraries can be added via the Settings page or installed from the WhereOS Store.
