
angular_distance


angular_distance(ftvec1, ftvec2) – Returns the angular distance between the two given feature vectors.
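The example output that follows is consistent with the definition below, which matches what Hivemall's AngularDistanceUDF appears to do. It computes the cosine similarity of the two feature vectors, where a feature missing from one vector counts as 0, then converts it to an angle normalized by π. distance2similarity maps a distance d to 1/(1 + d). Treat this as a reading of the results, not as the library's specification.

```latex
\cos\theta = \frac{\sum_i x_i y_i}{\sqrt{\sum_i x_i^2}\,\sqrt{\sum_i y_i^2}}, \qquad
\mathrm{angular\_distance}(x, y) = \frac{\arccos(\cos\theta)}{\pi} \in [0, 1], \qquad
\mathrm{distance2similarity}(d) = \frac{1}{1 + d}
```

As a check, take documents 1 and 2 from the example, with features ordered as (apple, orange, banana, kuwi): x = (1, 2, 1, 0) and y = (1, 0, 2, 1). The dot product is 3 and both norms are √6, so cos θ = 0.5 and θ = π/3. This gives a distance of 1/3 ≈ 0.3333 and a similarity of 1/(1 + 1/3) = 0.75, which matches the "1 2" row of the output.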

WITH docs as (
  select 1 as docid, array('apple:1.0', 'orange:2.0', 'banana:1.0', 'kuwi:0') as features
  union all
  select 2 as docid, array('apple:1.0', 'orange:0', 'banana:2.0', 'kuwi:1.0') as features
  union all
  select 3 as docid, array('apple:2.0', 'orange:0', 'banana:2.0', 'kuwi:1.0') as features
)
select
  l.docid as doc1,
  r.docid as doc2,
  angular_distance(l.features, r.features) as distance,
  distance2similarity(angular_distance(l.features, r.features)) as similarity
from
  docs l
  CROSS JOIN docs r
where
  l.docid != r.docid
order by
  doc1 asc,
  distance asc;

doc1  doc2  distance    similarity
1     3     0.31678355  0.75942624
1     2     0.33333337  0.75
2     3     0.09841931  0.91039914
2     1     0.33333337  0.75
3     2     0.09841931  0.91039914
3     1     0.31678355  0.75942624
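The query above can be reproduced outside Hive with a minimal Python sketch. Assume that features are "name:value" strings, that features absent from one vector count as 0, and that the distance is arccos(cosine similarity) / π. The helper parse_features and the Python functions angular_distance and distance2similarity are hypothetical re-implementations for illustration only. They are not Hivemall's API. Returning a cosine of 0 for an all-zero vector is also an assumption.

```python
import math


def parse_features(ftvec):
    # Hypothetical helper: turn ["apple:1.0", ...] into {"apple": 1.0, ...}.
    # Splits on the last ':' and expects every entry to be "name:value".
    vec = {}
    for ft in ftvec:
        name, _, value = ft.rpartition(":")
        vec[name] = float(value)
    return vec


def angular_distance(ftvec1, ftvec2):
    # Sketch: angle between the vectors (via cosine similarity), divided by pi.
    v1, v2 = parse_features(ftvec1), parse_features(ftvec2)
    dot = sum(w * v2.get(name, 0.0) for name, w in v1.items())  # missing features count as 0
    norm1 = math.sqrt(sum(w * w for w in v1.values()))
    norm2 = math.sqrt(sum(w * w for w in v2.values()))
    if norm1 == 0.0 or norm2 == 0.0:
        cos = 0.0  # assumption: treat a zero vector as orthogonal
    else:
        cos = dot / (norm1 * norm2)
    cos = max(-1.0, min(1.0, cos))  # clamp rounding error before acos
    return math.acos(cos) / math.pi


def distance2similarity(d):
    # Maps a distance d >= 0 to a similarity in (0, 1].
    return 1.0 / (1.0 + d)


docs = {
    1: ["apple:1.0", "orange:2.0", "banana:1.0", "kuwi:0"],
    2: ["apple:1.0", "orange:0", "banana:2.0", "kuwi:1.0"],
    3: ["apple:2.0", "orange:0", "banana:2.0", "kuwi:1.0"],
}

# Equivalent of the CROSS JOIN with l.docid != r.docid.
rows = []
for left in docs:
    for right in docs:
        if left != right:
            d = angular_distance(docs[left], docs[right])
            rows.append((left, right, d, distance2similarity(d)))

# Equivalent of ORDER BY doc1 ASC, distance ASC.
rows.sort(key=lambda row: (row[0], row[2]))

for doc1, doc2, distance, similarity in rows:
    print(doc1, doc2, round(distance, 8), round(similarity, 8))
```

Run as-is, the script lists the same six (doc1, doc2) pairs in the same order as the table above. The values should agree to about six decimal places. The last digits may differ (for example 0.33333337 versus an exact 1/3), likely because Hivemall computes in single-precision floats while Python uses doubles.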

Platforms: WhereOS, Spark, Hive
Class: hivemall.knn.distance.AngularDistanceUDF

More functions can be added to WhereOS via Python or R bindings, or as Java and Scala extensions: UDFs (user-defined functions), UDAFs (user-defined aggregation functions) and UDTFs (user-defined table-generating functions). Custom libraries can be added via the Settings page or installed from the WhereOS Store.

