HOW TO: WhereOS SQL Function Documentation – Comprehensive Guide to SparkSQL & Hive Functions

Introduction

This is a list of built-in functions of WhereOS, based on Spark & Hive functions and 3rd party libraries. More functions can be added to WhereOS via Python or R bindings, or as Java & Scala UDF (user-defined function), UDAF (user-defined aggregation function) and UDTF (user-defined table generating function) extensions. Custom libraries can be added via the Settings page or installed from the WhereOS Store.

Function: !

Class org.apache.spark.sql.catalyst.expressions.Not
Usage ! expr - Logical not.

Function: %

Class org.apache.spark.sql.catalyst.expressions.Remainder
Usage expr1 % expr2 - Returns the remainder after `expr1`/`expr2`.

Function: &

Class org.apache.spark.sql.catalyst.expressions.BitwiseAnd
Usage expr1 & expr2 - Returns the result of bitwise AND of `expr1` and `expr2`.

Function: *

Class org.apache.spark.sql.catalyst.expressions.Multiply
Usage expr1 * expr2 - Returns `expr1`*`expr2`.

Function: +

Class org.apache.spark.sql.catalyst.expressions.Add
Usage expr1 + expr2 - Returns `expr1`+`expr2`.

Function: -

Class org.apache.spark.sql.catalyst.expressions.Subtract
Usage expr1 - expr2 - Returns `expr1`-`expr2`.

Function: /

Class org.apache.spark.sql.catalyst.expressions.Divide
Usage expr1 / expr2 - Returns `expr1`/`expr2`. It always performs floating point division.

Function: <

Class org.apache.spark.sql.catalyst.expressions.LessThan
Usage expr1 < expr2 - Returns true if `expr1` is less than `expr2`.

Function: <=

Class org.apache.spark.sql.catalyst.expressions.LessThanOrEqual
Usage expr1 <= expr2 - Returns true if `expr1` is less than or equal to `expr2`.

Function: <=>

Class org.apache.spark.sql.catalyst.expressions.EqualNullSafe
Usage expr1 <=> expr2 - Returns same result as the EQUAL(=) operator for non-null operands, but returns true if both are null, false if one of them is null.
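
For illustration (expected output per standard Spark SQL semantics):

SELECT 2 <=> 2, NULL <=> NULL, 2 <=> NULL;
true true false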

Function: =

Class org.apache.spark.sql.catalyst.expressions.EqualTo
Usage expr1 = expr2 - Returns true if `expr1` equals `expr2`, or false otherwise.

Function: ==

Class org.apache.spark.sql.catalyst.expressions.EqualTo
Usage expr1 == expr2 - Returns true if `expr1` equals `expr2`, or false otherwise.

Function: >

Class org.apache.spark.sql.catalyst.expressions.GreaterThan
Usage expr1 > expr2 - Returns true if `expr1` is greater than `expr2`.

Function: >=

Class org.apache.spark.sql.catalyst.expressions.GreaterThanOrEqual
Usage expr1 >= expr2 - Returns true if `expr1` is greater than or equal to `expr2`.

Function: ^

Class org.apache.spark.sql.catalyst.expressions.BitwiseXor
Usage expr1 ^ expr2 - Returns the result of bitwise exclusive OR of `expr1` and `expr2`.

Function: abs

Class org.apache.spark.sql.catalyst.expressions.Abs
Usage abs(expr) - Returns the absolute value of the numeric value.

Function: acos

Class org.apache.spark.sql.catalyst.expressions.Acos
Usage acos(expr) - Returns the inverse cosine (a.k.a. arc cosine) of `expr`, as if computed by `java.lang.Math.acos`.

Function: add_bias

Class hivemall.ftvec.AddBiasUDF
Usage add_bias(feature_vector in array) - Returns features with a bias in array

Function: add_days

Class brickhouse.udf.date.AddDaysUDF
Usage

Function: add_feature_index

Class hivemall.ftvec.AddFeatureIndexUDF
Usage add_feature_index(ARRAY[DOUBLE]: dense feature vector) - Returns a feature vector with feature indices

Function: add_field_indices

Class hivemall.ftvec.trans.AddFieldIndicesUDF
Usage add_field_indices(array features) - Returns an array of strings with field indices (:)* augmented

Function: add_field_indicies

Class hivemall.ftvec.trans.AddFieldIndicesUDF
Usage add_field_indicies(array features) - Returns an array of strings with field indices (:)* augmented

Function: add_months

Class org.apache.spark.sql.catalyst.expressions.AddMonths
Usage add_months(start_date, num_months) - Returns the date that is `num_months` after `start_date`.

Function: aggregate

Class org.apache.spark.sql.catalyst.expressions.ArrayAggregate
Usage aggregate(expr, start, merge, finish) - Applies a binary operator to an initial state and all elements in the array, and reduces this to a single state. The final state is converted into the final result by applying a finish function.
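
For example (a minimal sketch; expected output per Spark SQL semantics):

SELECT aggregate(array(1, 2, 3), 0, (acc, x) -> acc + x);
6

SELECT aggregate(array(1, 2, 3), 0, (acc, x) -> acc + x, acc -> acc * 10);
60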

Function: amplify

Class hivemall.ftvec.amplify.AmplifierUDTF
Usage amplify(const int xtimes, *) - Amplifies the input records x times

Function: and

Class org.apache.spark.sql.catalyst.expressions.And
Usage expr1 and expr2 - Logical AND.

Function: angular_distance

Class hivemall.knn.distance.AngularDistanceUDF
Usage angular_distance(ftvec1, ftvec2) - Returns an angular distance of the given two vectors

WITH docs as (
select 1 as docid, array('apple:1.0', 'orange:2.0', 'banana:1.0', 'kuwi:0') as features
union all
select 2 as docid, array('apple:1.0', 'orange:0', 'banana:2.0', 'kuwi:1.0') as features
union all
select 3 as docid, array('apple:2.0', 'orange:0', 'banana:2.0', 'kuwi:1.0') as features
)
select
l.docid as doc1,
r.docid as doc2,
angular_distance(l.features, r.features) as distance,
distance2similarity(angular_distance(l.features, r.features)) as similarity
from
docs l
CROSS JOIN docs r
where
l.docid != r.docid
order by
doc1 asc,
distance asc;

doc1 doc2 distance similarity
1 3 0.31678355 0.75942624
1 2 0.33333337 0.75
2 3 0.09841931 0.91039914
2 1 0.33333337 0.75
3 2 0.09841931 0.91039914
3 1 0.31678355 0.75942624

Function: angular_similarity

Class hivemall.knn.similarity.AngularSimilarityUDF
Usage angular_similarity(ftvec1, ftvec2) - Returns an angular similarity of the given two vectors

WITH docs as (
select 1 as docid, array('apple:1.0', 'orange:2.0', 'banana:1.0', 'kuwi:0') as features
union all
select 2 as docid, array('apple:1.0', 'orange:0', 'banana:2.0', 'kuwi:1.0') as features
union all
select 3 as docid, array('apple:2.0', 'orange:0', 'banana:2.0', 'kuwi:1.0') as features
)
select
l.docid as doc1,
r.docid as doc2,
angular_similarity(l.features, r.features) as similarity
from
docs l
CROSS JOIN docs r
where
l.docid != r.docid
order by
doc1 asc,
similarity desc;

doc1 doc2 similarity
1 3 0.68321645
1 2 0.6666666
2 3 0.9015807
2 1 0.6666666
3 2 0.9015807
3 1 0.68321645

Function: append_array

Class brickhouse.udf.collect.AppendArrayUDF
Usage

Function: approx_count_distinct

Class org.apache.spark.sql.catalyst.expressions.aggregate.HyperLogLogPlusPlus
Usage approx_count_distinct(expr[, relativeSD]) - Returns the estimated cardinality by HyperLogLog++. `relativeSD` defines the maximum estimation error allowed.

Function: approx_percentile

Class org.apache.spark.sql.catalyst.expressions.aggregate.ApproximatePercentile
Usage approx_percentile(col, percentage [, accuracy]) - Returns the approximate percentile value of numeric column `col` at the given percentage. The value of percentage must be between 0.0 and 1.0. The `accuracy` parameter (default: 10000) is a positive numeric literal which controls approximation accuracy at the cost of memory. Higher value of `accuracy` yields better accuracy, `1.0/accuracy` is the relative error of the approximation. When `percentage` is an array, each value of the percentage array must be between 0.0 and 1.0. In this case, returns the approximate percentile array of column `col` at the given percentage array.
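
For example (mirroring the stock Spark SQL reference example):

SELECT approx_percentile(col, array(0.5, 0.4, 0.1), 100) FROM VALUES (0), (1), (2), (10) AS tab(col);
[1,1,0]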

Function: argmin_kld

Class hivemall.ensemble.ArgminKLDistanceUDAF
Usage argmin_kld(float mean, float covar) - Returns mean or covar that minimize a KL-distance among distributions

The returned value is (1.0 / (sum(1.0 / covar))) * (sum(mean / covar))

Function: array

Class org.apache.spark.sql.catalyst.expressions.CreateArray
Usage array(expr, ...) - Returns an array with the given elements.

Function: array_append

Class hivemall.tools.array.ArrayAppendUDF
Usage array_append(array arr, T elem) - Append an element to the end of an array

SELECT array_append(array(1,2),3);
1,2,3

SELECT array_append(array('a','b'),'c');
"a","b","c"

Function: array_avg

Class hivemall.tools.array.ArrayAvgGenericUDAF
Usage array_avg(array) - Returns an array in which each element is the mean of a set of numbers

WITH input as (
select array(1.0, 2.0, 3.0) as nums
UNION ALL
select array(2.0, 3.0, 4.0) as nums
)
select
array_avg(nums)
from
input;

["1.5","2.5","3.5"]

Function: array_concat

Class hivemall.tools.array.ArrayConcatUDF
Usage array_concat(array x1, array x2, ..) - Returns a concatenated array

SELECT array_concat(array(1),array(2,3));
[1,2,3]

Function: array_contains

Class org.apache.spark.sql.catalyst.expressions.ArrayContains
Usage array_contains(array, value) - Returns true if the array contains the value.

Function: array_distinct

Class org.apache.spark.sql.catalyst.expressions.ArrayDistinct
Usage array_distinct(array) - Removes duplicate values from the array.

Function: array_except

Class org.apache.spark.sql.catalyst.expressions.ArrayExcept
Usage array_except(array1, array2) - Returns an array of the elements in array1 but not in array2, without duplicates.

Function: array_flatten

Class hivemall.tools.array.ArrayFlattenUDF
Usage array_flatten(array<array<ANY>>) - Returns an array with the elements flattened.

SELECT array_flatten(array(array(1,2,3),array(4,5),array(6,7,8)));
[1,2,3,4,5,6,7,8]

Function: array_hash_values

Class hivemall.ftvec.hashing.ArrayHashValuesUDF
Usage array_hash_values(array values, [string prefix [, int numFeatures], boolean useIndexAsPrefix]) returns hash values in array

Function: array_index

Class brickhouse.udf.collect.ArrayIndexUDF
Usage

Function: array_intersect

Class org.apache.spark.sql.catalyst.expressions.ArrayIntersect
Usage array_intersect(array1, array2) - Returns an array of the elements in the intersection of array1 and array2, without duplicates.

Function: array_join

Class org.apache.spark.sql.catalyst.expressions.ArrayJoin
Usage array_join(array, delimiter[, nullReplacement]) - Concatenates the elements of the given array using the delimiter and an optional string to replace nulls. If no value is set for nullReplacement, any null value is filtered.

Function: array_max

Class org.apache.spark.sql.catalyst.expressions.ArrayMax
Usage array_max(array) - Returns the maximum value in the array. NULL elements are skipped.

Function: array_min

Class org.apache.spark.sql.catalyst.expressions.ArrayMin
Usage array_min(array) - Returns the minimum value in the array. NULL elements are skipped.

Function: array_position

Class org.apache.spark.sql.catalyst.expressions.ArrayPosition
Usage array_position(array, element) - Returns the (1-based) index of the first element of the array as long.

Function: array_remove

Class org.apache.spark.sql.catalyst.expressions.ArrayRemove
Usage array_remove(array, element) - Remove all elements that equal to element from array.

Function: array_repeat

Class org.apache.spark.sql.catalyst.expressions.ArrayRepeat
Usage array_repeat(element, count) - Returns the array containing element count times.

Function: array_slice

Class hivemall.tools.array.ArraySliceUDF
Usage array_slice(array values, int offset [, int length]) - Slices the given array by the given offset and length parameters.

SELECT
array_slice(array(1,2,3,4,5,6),2,4),
array_slice(
array("zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine", "ten"),
0, -- offset
2 -- length
),
array_slice(
array("zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine", "ten"),
6, -- offset
3 -- length
),
array_slice(
array("zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine", "ten"),
6, -- offset
10 -- length
),
array_slice(
array("zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine", "ten"),
6 -- offset
),
array_slice(
array("zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine", "ten"),
-3 -- offset
),
array_slice(
array("zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine", "ten"),
-3, -- offset
2 -- length
);

[3,4]
["zero","one"]
["six","seven","eight"]
["six","seven","eight","nine","ten"]
["six","seven","eight","nine","ten"]
["eight","nine","ten"]
["eight","nine"]

Function: array_sort

Class org.apache.spark.sql.catalyst.expressions.ArraySort
Usage array_sort(array) - Sorts the input array in ascending order. The elements of the input array must be orderable. Null elements will be placed at the end of the returned array.

Function: array_sum

Class hivemall.tools.array.ArraySumUDAF
Usage array_sum(array) - Returns an array in which each element is summed up

WITH input as (
select array(1.0, 2.0, 3.0) as nums
UNION ALL
select array(2.0, 3.0, 4.0) as nums
)
select
array_sum(nums)
from
input;

["3.0","5.0","7.0"]

Function: array_to_str

Class hivemall.tools.array.ArrayToStrUDF
Usage array_to_str(array arr [, string sep=',']) - Converts the array to a string using the given separator

SELECT array_to_str(array(1,2,3),'-');
1-2-3

Function: array_union

Class org.apache.spark.sql.catalyst.expressions.ArrayUnion
Usage array_union(array1, array2) - Returns an array of the elements in the union of array1 and array2, without duplicates.

Function: arrays_overlap

Class org.apache.spark.sql.catalyst.expressions.ArraysOverlap
Usage arrays_overlap(a1, a2) - Returns true if a1 contains at least a non-null element present also in a2. If the arrays have no common element and they are both non-empty and either of them contains a null element null is returned, false otherwise.

Function: arrays_zip

Class org.apache.spark.sql.catalyst.expressions.ArraysZip
Usage arrays_zip(a1, a2, ...) - Returns a merged array of structs in which the N-th struct contains all N-th values of input arrays.

Function: ascii

Class org.apache.spark.sql.catalyst.expressions.Ascii
Usage ascii(str) - Returns the numeric value of the first character of `str`.

Function: asin

Class org.apache.spark.sql.catalyst.expressions.Asin
Usage asin(expr) - Returns the inverse sine (a.k.a. arc sine) of `expr`, as if computed by `java.lang.Math.asin`.

Function: assert

Class brickhouse.udf.sanity.AssertUDF
Usage Asserts in case the boolean input is false. Optionally asserts with a message if an input string is provided. assert(boolean) assert(boolean, string)

Function: assert_equals

Class brickhouse.udf.sanity.AssertEqualsUDF
Usage

Function: assert_less_than

Class brickhouse.udf.sanity.AssertLessThanUDF
Usage

Function: assert_true

Class org.apache.spark.sql.catalyst.expressions.AssertTrue
Usage assert_true(expr) - Throws an exception if `expr` is not true.

Function: atan

Class org.apache.spark.sql.catalyst.expressions.Atan
Usage atan(expr) - Returns the inverse tangent (a.k.a. arc tangent) of `expr`, as if computed by `java.lang.Math.atan`

Function: atan2

Class org.apache.spark.sql.catalyst.expressions.Atan2
Usage atan2(exprY, exprX) - Returns the angle in radians between the positive x-axis of a plane and the point given by the coordinates (`exprX`, `exprY`), as if computed by `java.lang.Math.atan2`.

Function: auc

Class hivemall.evaluation.AUCUDAF
Usage auc(array rankItems | double score, array correctItems | int label [, const int recommendSize = rankItems.size ]) - Returns AUC

Function: average_precision

Class hivemall.evaluation.MAPUDAF
Usage average_precision(array rankItems, array correctItems [, const int recommendSize = rankItems.size]) - Returns MAP

Function: avg

Class org.apache.spark.sql.catalyst.expressions.aggregate.Average
Usage avg(expr) - Returns the mean calculated from values of a group.

Function: base64

Class org.apache.spark.sql.catalyst.expressions.Base64
Usage base64(bin) - Converts the argument from a binary `bin` to a base 64 string.

Function: base91

Class hivemall.tools.text.Base91UDF
Usage base91(BINARY bin) - Convert the argument from binary to a BASE91 string

SELECT base91(deflate('aaaaaaaaaaaaaaaabbbbccc'));
AA+=kaIM|WTt!+wbGAA

Function: bbit_minhash

Class hivemall.knn.lsh.bBitMinHashUDF
Usage bbit_minhash(array<> features [, int numHashes]) - Returns a b-bits minhash value

Function: bigint

Class org.apache.spark.sql.catalyst.expressions.Cast
Usage bigint(expr) - Casts the value `expr` to the target data type `bigint`.

Function: bin

Class org.apache.spark.sql.catalyst.expressions.Bin
Usage bin(expr) - Returns the string representation of the long value `expr` represented in binary.

Function: binarize_label

Class hivemall.ftvec.trans.BinarizeLabelUDTF
Usage binarize_label(int/long positive, int/long negative, ...) - Returns positive/negative records that are represented as (..., int label) where label is 0 or 1

Function: binary

Class org.apache.spark.sql.catalyst.expressions.Cast
Usage binary(expr) - Casts the value `expr` to the target data type `binary`.

Function: bit_length

Class org.apache.spark.sql.catalyst.expressions.BitLength
Usage bit_length(expr) - Returns the bit length of string data or number of bits of binary data.

Function: bits_collect

Class hivemall.tools.bits.BitsCollectUDAF
Usage bits_collect(int|long x) - Returns a bitset in array

Function: bits_or

Class hivemall.tools.bits.BitsORUDF
Usage bits_or(array b1, array b2, ..) - Returns the logical OR of the given bitsets

SELECT unbits(bits_or(to_bits(array(1,4)),to_bits(array(2,3))));
[1,2,3,4]

Function: bloom

Class brickhouse.udf.bloom.BloomUDAF
Usage Constructs a BloomFilter by aggregating a set of keys bloom(string key)

Function: bloom_and

Class brickhouse.udf.bloom.BloomAndUDF
Usage Returns the logical AND of two bloom filters; representing the intersection of values in both bloom1 AND bloom2 bloom_and(string bloom1, string bloom2)

Function: bloom_contains

Class brickhouse.udf.bloom.BloomContainsUDF
Usage Returns true if the referenced bloom filter contains the key. bloom_contains(string key, string bloomfilter)

Function: bloom_contains_any

Class hivemall.sketch.bloom.BloomContainsAnyUDF
Usage bloom_contains_any(string bloom, string key) or bloom_contains_any(string bloom, array keys) - Returns true if the bloom filter contains any of the given keys

WITH data1 as (
SELECT explode(array(1,2,3,4,5)) as id
),
data2 as (
SELECT explode(array(1,3,5,6,8)) as id
),
bloom as (
SELECT bloom(id) as bf
FROM data1
)
SELECT
l.*
FROM
data2 l
CROSS JOIN bloom r
WHERE
bloom_contains_any(r.bf, array(l.id))

Function: bloom_not

Class brickhouse.udf.bloom.BloomNotUDF
Usage Returns the logical NOT of a bloom filter, representing the set of values NOT in bloom1. bloom_not(string bloom)

Function: bloom_or

Class brickhouse.udf.bloom.BloomOrUDF
Usage Returns the logical OR of two bloom filters, representing the union of values in either bloom1 OR bloom2. bloom_or(string bloom1, string bloom2)

Function: boolean

Class org.apache.spark.sql.catalyst.expressions.Cast
Usage boolean(expr) - Casts the value `expr` to the target data type `boolean`.

Function: bpr_sampling

Class hivemall.ftvec.ranking.BprSamplingUDTF
Usage bpr_sampling(int userId, List posItems [, const string options]) - Returns a relation consisting of

Function: bround

Class org.apache.spark.sql.catalyst.expressions.BRound
Usage bround(expr, d) - Returns `expr` rounded to `d` decimal places using HALF_EVEN rounding mode.

Function: build_bins

Class hivemall.ftvec.binning.BuildBinsUDAF
Usage build_bins(number weight, const int num_of_bins [, const boolean auto_shrink = false]) - Returns quantiles representing bins: array<double>

Function: cardinality

Class org.apache.spark.sql.catalyst.expressions.Size
Usage cardinality(expr) - Returns the size of an array or a map. The function returns -1 if its input is null and spark.sql.legacy.sizeOfNull is set to true. If spark.sql.legacy.sizeOfNull is set to false, the function returns null for null input. By default, the spark.sql.legacy.sizeOfNull parameter is set to true.
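
For example (expected output per Spark SQL semantics):

SELECT cardinality(array('b', 'd', 'c', 'a')), cardinality(map('a', 1, 'b', 2));
4 2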

Function: cast

Class org.apache.spark.sql.catalyst.expressions.Cast
Usage cast(expr AS type) - Casts the value `expr` to the target data type `type`.

Function: cast_array

Class brickhouse.udf.collect.CastArrayUDF
Usage

Function: cast_map

Class brickhouse.udf.collect.CastMapUDF
Usage

Function: categorical_features

Class hivemall.ftvec.trans.CategoricalFeaturesUDF
Usage categorical_features(array featureNames, feature1, feature2, .. [, const string options]) - Returns a feature vector array

Function: cbrt

Class org.apache.spark.sql.catalyst.expressions.Cbrt
Usage cbrt(expr) - Returns the cube root of `expr`.

Function: ceil

Class org.apache.spark.sql.catalyst.expressions.Ceil
Usage ceil(expr) - Returns the smallest integer not smaller than `expr`.

Function: ceiling

Class org.apache.spark.sql.catalyst.expressions.Ceil
Usage ceiling(expr) - Returns the smallest integer not smaller than `expr`.

Function: changefinder

Class hivemall.anomaly.ChangeFinderUDF
Usage changefinder(double|array x [, const string options]) - Returns outlier/change-point scores and decisions using ChangeFinder. It will return a tuple

Function: char

Class org.apache.spark.sql.catalyst.expressions.Chr
Usage char(expr) - Returns the ASCII character having the binary equivalent to `expr`. If n is larger than 256 the result is equivalent to chr(n % 256)

Function: char_length

Class org.apache.spark.sql.catalyst.expressions.Length
Usage char_length(expr) - Returns the character length of string data or number of bytes of binary data. The length of string data includes the trailing spaces. The length of binary data includes binary zeros.

Function: character_length

Class org.apache.spark.sql.catalyst.expressions.Length
Usage character_length(expr) - Returns the character length of string data or number of bytes of binary data. The length of string data includes the trailing spaces. The length of binary data includes binary zeros.

Function: chi2

Class hivemall.ftvec.selection.ChiSquareUDF
Usage chi2(array<array<double>> observed, array<array<double>> expected) - Returns chi2_val and p_val of each column as <array<double>, array<double>>

Function: chr

Class org.apache.spark.sql.catalyst.expressions.Chr
Usage chr(expr) - Returns the ASCII character having the binary equivalent to `expr`. If n is larger than 256 the result is equivalent to chr(n % 256)

Function: coalesce

Class org.apache.spark.sql.catalyst.expressions.Coalesce
Usage coalesce(expr1, expr2, ...) - Returns the first non-null argument if exists. Otherwise, null.
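
A quick illustration:

SELECT coalesce(NULL, 1, NULL);
1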

Function: collect

Class brickhouse.udf.collect.CollectUDAF
Usage collect(x) - Returns an array of all the elements in the aggregation group

Function: collect_list

Class org.apache.spark.sql.catalyst.expressions.aggregate.CollectList
Usage collect_list(expr) - Collects and returns a list of non-unique elements.

Function: collect_max

Class brickhouse.udf.collect.CollectMaxUDAF
Usage collect_max(x, val, n) - Returns a map of the max N numeric values in the aggregation group

Function: collect_set

Class org.apache.spark.sql.catalyst.expressions.aggregate.CollectSet
Usage collect_set(expr) - Collects and returns a set of unique elements.

Function: combine

Class brickhouse.udf.collect.CombineUDF
Usage combine(a,b) - Returns a combined list of two lists, or a combined map of two maps

Function: combine_hyperloglog

Class brickhouse.udf.hll.CombineHyperLogLogUDF
Usage combine_hyperloglog(x) - Combines two HyperLogLog++ binary blobs.

Function: combine_previous_sketch

Class brickhouse.udf.sketch.CombinePreviousSketchUDF
Usage combine_previous_sketch(grouping, map) - Returns a map of the combined keys of previous calls to this

Function: combine_sketch

Class brickhouse.udf.sketch.CombineSketchUDF
Usage combine_sketch(x) - Combine two sketch sets.

Function: combine_unique

Class brickhouse.udf.collect.CombineUniqueUDAF
Usage combine_unique(x) - Returns an array of all distinct elements of all lists in the aggregation group

Function: concat

Class org.apache.spark.sql.catalyst.expressions.Concat
Usage concat(col1, col2, ..., colN) - Returns the concatenation of col1, col2, ..., colN.

Function: concat_array

Class hivemall.tools.array.ArrayConcatUDF
Usage concat_array(array x1, array x2, ..) - Returns a concatenated array

SELECT array_concat(array(1),array(2,3));
[1,2,3]

Function: concat_ws

Class org.apache.spark.sql.catalyst.expressions.ConcatWs
Usage concat_ws(sep, [str | array(str)]+) - Returns the concatenation of the strings separated by `sep`.

Function: conditional_emit

Class brickhouse.udf.collect.ConditionalEmit
Usage conditional_emit(a,b) - Emit features of a row according to various conditions

Function: conv

Class org.apache.spark.sql.catalyst.expressions.Conv
Usage conv(num, from_base, to_base) - Convert `num` from `from_base` to `to_base`.
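
For example (mirroring Spark SQL's reference examples):

SELECT conv('100', 2, 10);
4

SELECT conv(-10, 16, -10);
-16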

Function: conv2dense

Class hivemall.ftvec.conv.ConvertToDenseModelUDAF
Usage conv2dense(int feature, float weight, int nDims) - Return a dense model in array

Function: convert_label

Class hivemall.tools.ConvertLabelUDF
Usage convert_label(const int|const float) - Convert from -1|1 to 0.0f|1.0f, or from 0.0f|1.0f to -1|1

Function: convert_to_sketch

Class brickhouse.udf.sketch.ConvertToSketchUDF
Usage convert_to_sketch(x) - Truncate a large array of strings, and return a list of strings representing a sketch of those items

Function: corr

Class org.apache.spark.sql.catalyst.expressions.aggregate.Corr
Usage corr(expr1, expr2) - Returns Pearson coefficient of correlation between a set of number pairs.

Function: cos

Class org.apache.spark.sql.catalyst.expressions.Cos
Usage cos(expr) - Returns the cosine of `expr`, as if computed by `java.lang.Math.cos`.

Function: cosh

Class org.apache.spark.sql.catalyst.expressions.Cosh
Usage cosh(expr) - Returns the hyperbolic cosine of `expr`, as if computed by `java.lang.Math.cosh`.

Function: cosine_distance

Class hivemall.knn.distance.CosineDistanceUDF
Usage cosine_distance(ftvec1, ftvec2) - Returns a cosine distance of the given two vectors

WITH docs as (
select 1 as docid, array('apple:1.0', 'orange:2.0', 'banana:1.0', 'kuwi:0') as features
union all
select 2 as docid, array('apple:1.0', 'orange:0', 'banana:2.0', 'kuwi:1.0') as features
union all
select 3 as docid, array('apple:2.0', 'orange:0', 'banana:2.0', 'kuwi:1.0') as features
)
select
l.docid as doc1,
r.docid as doc2,
cosine_distance(l.features, r.features) as distance,
distance2similarity(cosine_distance(l.features, r.features)) as similarity
from
docs l
CROSS JOIN docs r
where
l.docid != r.docid
order by
doc1 asc,
distance asc;

doc1 doc2 distance similarity
1 3 0.45566893 0.6869694
1 2 0.5 0.6666667
2 3 0.04742068 0.95472616
2 1 0.5 0.6666667
3 2 0.04742068 0.95472616
3 1 0.45566893 0.6869694

Function: cosine_similarity

Class hivemall.knn.similarity.CosineSimilarityUDF
Usage cosine_similarity(ftvec1, ftvec2) - Returns a cosine similarity of the given two vectors

WITH docs as (
select 1 as docid, array('apple:1.0', 'orange:2.0', 'banana:1.0', 'kuwi:0') as features
union all
select 2 as docid, array('apple:1.0', 'orange:0', 'banana:2.0', 'kuwi:1.0') as features
union all
select 3 as docid, array('apple:2.0', 'orange:0', 'banana:2.0', 'kuwi:1.0') as features
)
select
l.docid as doc1,
r.docid as doc2,
cosine_similarity(l.features, r.features) as similarity
from
docs l
CROSS JOIN docs r
where
l.docid != r.docid
order by
doc1 asc,
similarity desc;

doc1 doc2 similarity
1 3 0.5443311
1 2 0.5
2 3 0.9525793
2 1 0.5
3 2 0.9525793
3 1 0.5443311

Function: cot

Class org.apache.spark.sql.catalyst.expressions.Cot
Usage cot(expr) - Returns the cotangent of `expr`, as if computed by `1/java.lang.Math.tan`.

Function: count

Class org.apache.spark.sql.catalyst.expressions.aggregate.Count
Usage count(*) - Returns the total number of retrieved rows, including rows containing null. count(expr[, expr...]) - Returns the number of rows for which the supplied expression(s) are all non-null. count(DISTINCT expr[, expr...]) - Returns the number of rows for which the supplied expression(s) are unique and non-null.

Function: count_min_sketch

Class org.apache.spark.sql.catalyst.expressions.aggregate.CountMinSketchAgg
Usage count_min_sketch(col, eps, confidence, seed) - Returns a count-min sketch of a column with the given eps, confidence and seed. The result is an array of bytes, which can be deserialized to a `CountMinSketch` before usage. Count-min sketch is a probabilistic data structure used for cardinality estimation using sub-linear space.

Function: covar_pop

Class org.apache.spark.sql.catalyst.expressions.aggregate.CovPopulation
Usage covar_pop(expr1, expr2) - Returns the population covariance of a set of number pairs.

Function: covar_samp

Class org.apache.spark.sql.catalyst.expressions.aggregate.CovSample
Usage covar_samp(expr1, expr2) - Returns the sample covariance of a set of number pairs.

Function: crc32

Class org.apache.spark.sql.catalyst.expressions.Crc32
Usage crc32(expr) - Returns a cyclic redundancy check value of the `expr` as a bigint.

Function: cube

Class org.apache.spark.sql.catalyst.expressions.Cube
Usage cube([col1[, col2 ..]]) - create a multi-dimensional cube using the specified columns so that we can run aggregation on them.

Function: cume_dist

Class org.apache.spark.sql.catalyst.expressions.CumeDist
Usage cume_dist() - Computes the position of a value relative to all values in the partition.

Function: current_database

Class org.apache.spark.sql.catalyst.expressions.CurrentDatabase
Usage current_database() - Returns the current database.

Function: current_date

Class org.apache.spark.sql.catalyst.expressions.CurrentDate
Usage current_date() - Returns the current date at the start of query evaluation.

Function: current_timestamp

Class org.apache.spark.sql.catalyst.expressions.CurrentTimestamp
Usage current_timestamp() - Returns the current timestamp at the start of query evaluation.

Function: date

Class org.apache.spark.sql.catalyst.expressions.Cast
Usage date(expr) - Casts the value `expr` to the target data type `date`.

Function: date_add

Class org.apache.spark.sql.catalyst.expressions.DateAdd
Usage date_add(start_date, num_days) - Returns the date that is `num_days` after `start_date`.

Function: date_format

Class org.apache.spark.sql.catalyst.expressions.DateFormatClass
Usage date_format(timestamp, fmt) - Converts `timestamp` to a value of string in the format specified by the date format `fmt`.

Function: date_range

Class brickhouse.udf.date.DateRangeUDTF
Usage date_range(a,b,c) - Generates a range of dates from a to b, incremented by c

Function: date_sub

Class org.apache.spark.sql.catalyst.expressions.DateSub
Usage date_sub(start_date, num_days) - Returns the date that is `num_days` before `start_date`.

Function: date_trunc

Class org.apache.spark.sql.catalyst.expressions.TruncTimestamp
Usage date_trunc(fmt, ts) - Returns timestamp `ts` truncated to the unit specified by the format model `fmt`.`fmt` should be one of ["YEAR", "YYYY", "YY", "MON", "MONTH", "MM", "DAY", "DD", "HOUR", "MINUTE", "SECOND", "WEEK", "QUARTER"]

Function: datediff

Class org.apache.spark.sql.catalyst.expressions.DateDiff
Usage datediff(endDate, startDate) - Returns the number of days from `startDate` to `endDate`.
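
For instance (expected output per Spark SQL semantics):

SELECT datediff('2009-07-31', '2009-07-30');
1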

Function: dateseries

Class com.whereos.udf.DateSeriesUDF
Usage

Function: day

Class org.apache.spark.sql.catalyst.expressions.DayOfMonth
Usage day(date) - Returns the day of month of the date/timestamp.

Function: dayofmonth

Class org.apache.spark.sql.catalyst.expressions.DayOfMonth
Usage dayofmonth(date) - Returns the day of month of the date/timestamp.

Function: dayofweek

Class org.apache.spark.sql.catalyst.expressions.DayOfWeek
Usage dayofweek(date) - Returns the day of the week for date/timestamp (1 = Sunday, 2 = Monday, ..., 7 = Saturday).

Function: dayofyear

Class org.apache.spark.sql.catalyst.expressions.DayOfYear
Usage dayofyear(date) - Returns the day of year of the date/timestamp.

Function: decimal

Class org.apache.spark.sql.catalyst.expressions.Cast
Usage decimal(expr) - Casts the value `expr` to the target data type `decimal`.

Function: decode

Class org.apache.spark.sql.catalyst.expressions.Decode
Usage decode(bin, charset) - Decodes the first argument using the second argument character set.

Function: deflate

Class hivemall.tools.compress.DeflateUDF
Usage deflate(TEXT data [, const int compressionLevel]) - Returns a compressed BINARY object by using Deflater. The compression level must be in range [-1,9]

SELECT base91(deflate('aaaaaaaaaaaaaaaabbbbccc'));
AA+=kaIM|WTt!+wbGAA

Function: degrees

Class org.apache.spark.sql.catalyst.expressions.ToDegrees
Usage degrees(expr) - Converts radians to degrees.

Function: dense_rank

Class org.apache.spark.sql.catalyst.expressions.DenseRank
Usage dense_rank() - Computes the rank of a value in a group of values. The result is one plus the previously assigned rank value. Unlike the function rank, dense_rank will not produce gaps in the ranking sequence.

Function: dimsum_mapper

Class hivemall.knn.similarity.DIMSUMMapperUDTF
Usage dimsum_mapper(array row, map colNorms [, const string options]) - Returns column-wise partial similarities

Function: distance2similarity

Class hivemall.knn.similarity.Distance2SimilarityUDF
Usage distance2similarity(float d) - Returns 1.0 / (1.0 + d)

Function: distcache_gets

Class hivemall.tools.mapred.DistributedCacheLookupUDF
Usage distcache_gets(filepath, key, default_value [, parseKey]) - Returns map|value_type

Function: distributed_bloom

Class brickhouse.udf.bloom.DistributedBloomUDF
Usage Loads a bloomfilter from a file in distributed cache, and makes available as a named bloom. distributed_bloom(string filename) distributed_bloom(string filename, boolean returnEncoded)

Function: distributed_map

Class brickhouse.udf.dcache.DistributedMapUDF
Usage

Function: double

Class org.apache.spark.sql.catalyst.expressions.Cast
Usage double(expr) - Casts the value `expr` to the target data type `double`.

Function: e

Class org.apache.spark.sql.catalyst.expressions.EulerNumber
Usage e() - Returns Euler's number, e.

Function: each_top_k

Class hivemall.tools.EachTopKUDTF
Usage each_top_k(int K, Object group, double cmpKey, *) - Returns top-K values (or tail-K values when k is less than 0)

Function: element_at

Class org.apache.spark.sql.catalyst.expressions.ElementAt
Usage element_at(array, index) - Returns element of array at given (1-based) index. If index < 0, accesses elements from the last to the first. Returns NULL if the index exceeds the length of the array. element_at(map, key) - Returns value for given key, or NULL if the key is not contained in the map
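
For example (expected output per Spark SQL semantics):

SELECT element_at(array(1, 2, 3), 2), element_at(map(1, 'a', 2, 'b'), 2);
2 b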

Function: elt

Class org.apache.spark.sql.catalyst.expressions.Elt
Usage elt(n, input1, input2, ...) - Returns the `n`-th input, e.g., returns `input2` when `n` is 2.

Function: encode

Class org.apache.spark.sql.catalyst.expressions.Encode
Usage encode(str, charset) - Encodes the first argument using the second argument character set.

Function: estimated_reach

Class brickhouse.udf.sketch.EstimatedReachUDF
Usage estimated_reach(x) - Estimate reach from a sketch set of Strings.

Function: euclid_distance

Class hivemall.knn.distance.EuclidDistanceUDF
Usage euclid_distance(ftvec1, ftvec2) - Returns the square root of the sum of the squared differences: sqrt(sum((x - y)^2))

WITH docs as (
select 1 as docid, array('apple:1.0', 'orange:2.0', 'banana:1.0', 'kuwi:0') as features
union all
select 2 as docid, array('apple:1.0', 'orange:0', 'banana:2.0', 'kuwi:1.0') as features
union all
select 3 as docid, array('apple:2.0', 'orange:0', 'banana:2.0', 'kuwi:1.0') as features
)
select
l.docid as doc1,
r.docid as doc2,
euclid_distance(l.features, r.features) as distance,
distance2similarity(euclid_distance(l.features, r.features)) as similarity
from
docs l
CROSS JOIN docs r
where
l.docid != r.docid
order by
doc1 asc,
distance asc;

doc1 doc2 distance similarity
1 2 2.4494898 0.28989795
1 3 2.6457512 0.2742919
2 3 1.0 0.5
2 1 2.4494898 0.28989795
3 2 1.0 0.5
3 1 2.6457512 0.2742919

Function: euclid_similarity

Class hivemall.knn.similarity.EuclidSimilarity
Usage euclid_similarity(ftvec1, ftvec2) - Returns a euclid distance based similarity, which is `1.0 / (1.0 + distance)`, of the given two vectors

WITH docs as (
select 1 as docid, array('apple:1.0', 'orange:2.0', 'banana:1.0', 'kuwi:0') as features
union all
select 2 as docid, array('apple:1.0', 'orange:0', 'banana:2.0', 'kuwi:1.0') as features
union all
select 3 as docid, array('apple:2.0', 'orange:0', 'banana:2.0', 'kuwi:1.0') as features
)
select
l.docid as doc1,
r.docid as doc2,
euclid_similarity(l.features, r.features) as similarity
from
docs l
CROSS JOIN docs r
where
l.docid != r.docid
order by
doc1 asc,
similarity desc;

doc1 doc2 similarity
1 2 0.28989795
1 3 0.2742919
2 3 0.5
2 1 0.28989795
3 2 0.5
3 1 0.2742919

Function: exists

Class org.apache.spark.sql.catalyst.expressions.ArrayExists
Usage exists(expr, pred) - Tests whether a predicate holds for one or more elements in the array.

Function: exp

Class org.apache.spark.sql.catalyst.expressions.Exp
Usage exp(expr) - Returns e to the power of `expr`.

Function: explode

Class org.apache.spark.sql.catalyst.expressions.Explode
Usage explode(expr) - Separates the elements of array `expr` into multiple rows, or the elements of map `expr` into multiple rows and columns.
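
For example:

SELECT explode(array(10, 20));
10
20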

Function: explode_outer

Class org.apache.spark.sql.catalyst.expressions.Explode
Usage explode_outer(expr) - Separates the elements of array `expr` into multiple rows, or the elements of map `expr` into multiple rows and columns.

Function: explodegeometry

Class com.whereos.udf.ExplodeGeometryUDTF
Usage

Function: explodemultipolygon

Class com.whereos.udf.ExplodeMultiPolygonUDTF
Usage

Function: expm1

Class org.apache.spark.sql.catalyst.expressions.Expm1
Usage expm1(expr) - Returns exp(`expr`) - 1.

Function: extract_feature

Class hivemall.ftvec.ExtractFeatureUDF
Usage extract_feature(feature_vector in array) - Returns features in array

Function: extract_weight

Class hivemall.ftvec.ExtractWeightUDF
Usage extract_weight(feature_vector in array) - Returns the weights of features in array

Function: extractframes

Class com.whereos.udf.ExtractFramesUDTF
Usage

Function: extracttilepixels

Class com.whereos.udf.ExtractPixelsUDTF
Usage

Function: f1score

Class hivemall.evaluation.F1ScoreUDAF
Usage f1score(array[int], array[int]) - Returns an F1 score

Function: factorial

Class org.apache.spark.sql.catalyst.expressions.Factorial
Usage factorial(expr) - Returns the factorial of `expr`. `expr` is [0..20]. Otherwise, null.

Function: feature

Class hivemall.ftvec.FeatureUDF
Usage feature(feature, value) - Returns a feature string

Function: feature_binning

Class hivemall.ftvec.binning.FeatureBinningUDF
Usage feature_binning(array features, map<string, array<number>> quantiles_map) - returns a binned feature vector as an array; feature_binning(number weight, array<number> quantiles) - returns bin ID as int

WITH extracted as (
select
extract_feature(feature) as index,
extract_weight(feature) as value
from
input l
LATERAL VIEW explode(features) r as feature
),
mapping as (
select
index,
build_bins(value, 5, true) as quantiles -- 5 bins with auto bin shrinking
from
extracted
group by
index
),
bins as (
select
to_map(index, quantiles) as quantiles
from
mapping
)
select
l.features as original,
feature_binning(l.features, r.quantiles) as features
from
input l
cross join bins r

> ["name#Jacob","gender#Male","age:20.0"] ["name#Jacob","gender#Male","age:2"]
> ["name#Isabella","gender#Female","age:20.0"] ["name#Isabella","gender#Female","age:2"]

Function: feature_hashing

Class hivemall.ftvec.hashing.FeatureHashingUDF
Usage feature_hashing(array features [, const string options]) - returns a hashed feature vector in array

select feature_hashing(array('aaa:1.0','aaa','bbb:2.0'), '-libsvm');
["4063537:1.0","4063537:1","8459207:2.0"]

select feature_hashing(array('aaa:1.0','aaa','bbb:2.0'), '-features 10');
["7:1.0","7","1:2.0"]

select feature_hashing(array('aaa:1.0','aaa','bbb:2.0'), '-features 10 -libsvm');
["1:2.0","7:1.0","7:1"]

Function: feature_index

Class hivemall.ftvec.FeatureIndexUDF
Usage feature_index(feature_vector in array) - Returns feature indices in array

Function: feature_pairs

Class hivemall.ftvec.pairing.FeaturePairsUDTF
Usage feature_pairs(feature_vector in array [, const string options]) - Returns a relation

Function: ffm_features

Class hivemall.ftvec.trans.FFMFeaturesUDF
Usage ffm_features(const array featureNames, feature1, feature2, .. [, const string options]) - Takes categorical variables and returns a feature vector array in the libffm format <field>:<index>:<value>

Function: filter

Class org.apache.spark.sql.catalyst.expressions.ArrayFilter
Usage filter(expr, func) - Filters the input array using the given predicate.

Function: find_in_set

Class org.apache.spark.sql.catalyst.expressions.FindInSet
Usage find_in_set(str, str_array) - Returns the index (1-based) of the given string (`str`) in the comma-delimited list (`str_array`). Returns 0, if the string was not found or if the given string (`str`) contains a comma.
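
For example (mirroring the stock Spark SQL example):

SELECT find_in_set('ab', 'abc,b,ab,c,def');
3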

Function: first

Class org.apache.spark.sql.catalyst.expressions.aggregate.First
Usage first(expr[, isIgnoreNull]) - Returns the first value of `expr` for a group of rows. If `isIgnoreNull` is true, returns only non-null values.

Function: first_element

Class hivemall.tools.array.FirstElementUDF
Usage first_element(x) - Returns the first element in an array

SELECT first_element(array('a','b','c'));
a

SELECT first_element(array());
NULL

Function: first_index

Class brickhouse.udf.collect.FirstIndexUDF
Usage first_index(x) - First value in an array

Function: first_value

Class org.apache.spark.sql.catalyst.expressions.aggregate.First
Usage first_value(expr[, isIgnoreNull]) - Returns the first value of `expr` for a group of rows. If `isIgnoreNull` is true, returns only non-null values.

Function: flatten

Class org.apache.spark.sql.catalyst.expressions.Flatten
Usage flatten(arrayOfArrays) - Transforms an array of arrays into a single array.
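
For example:

SELECT flatten(array(array(1, 2), array(3, 4)));
[1,2,3,4]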

Function: float

Class org.apache.spark.sql.catalyst.expressions.Cast
Usage float(expr) - Casts the value `expr` to the target data type `float`.

Function: float_array

Class hivemall.tools.array.AllocFloatArrayUDF
Usage float_array(nDims) - Returns an array of nDims elements

Function: floor

Class org.apache.spark.sql.catalyst.expressions.Floor
Usage floor(expr) - Returns the largest integer not greater than `expr`.

Function: fmeasure

Class hivemall.evaluation.FMeasureUDAF
Usage fmeasure(array|int|boolean actual, array|int|boolean predicted [, const string options]) - Returns the F-measure (f1score is the special case with beta=1.0)

Function: format_number

Class org.apache.spark.sql.catalyst.expressions.FormatNumber
Usage format_number(expr1, expr2) - Formats the number `expr1` like '#,###,###.##', rounded to `expr2` decimal places. If `expr2` is 0, the result has no decimal point or fractional part. `expr2` also accepts a user specified format. This is supposed to function like MySQL's FORMAT.

Function: format_string

Class org.apache.spark.sql.catalyst.expressions.FormatString
Usage format_string(strfmt, obj, ...) - Returns a formatted string from printf-style format strings.

Function: from_camel_case

Class brickhouse.udf.json.ConvertFromCamelCaseUDF
Usage from_camel_case(a) - Converts a string in CamelCase to one containing underscores.

Function: from_json

Class org.apache.spark.sql.catalyst.expressions.JsonToStructs
Usage from_json(jsonStr, schema[, options]) - Returns a struct value with the given `jsonStr` and `schema`.

Function: from_unixtime

Class org.apache.spark.sql.catalyst.expressions.FromUnixTime
Usage from_unixtime(unix_time, format) - Returns `unix_time` in the specified `format`.

Function: from_utc_timestamp

Class org.apache.spark.sql.catalyst.expressions.FromUTCTimestamp
Usage from_utc_timestamp(timestamp, timezone) - Given a timestamp like '2017-07-14 02:40:00.0', interprets it as a time in UTC, and renders that time as a timestamp in the given time zone. For example, 'GMT+1' would yield '2017-07-14 03:40:00.0'.
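
For example (mirroring the stock Spark SQL example):

SELECT from_utc_timestamp('2016-08-31', 'Asia/Seoul');
2016-08-31 09:00:00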

Function: generate_series

Class hivemall.tools.GenerateSeriesUDTF
Usage generate_series(const int|bigint start, const int|bigint end) - Generate a series of values, from start to end. A similar function to PostgreSQL's [generate_series](https://www.postgresql.org/docs/current/static/functions-srf.html)

SELECT generate_series(2,4);

2
3
4

SELECT generate_series(5,1,-2);

5
3
1

SELECT generate_series(4,3);

(no return)

SELECT date_add(current_date(),value),value from (SELECT generate_series(1,3)) t;

2018-04-21 1
2018-04-22 2
2018-04-23 3

WITH input as (
SELECT 1 as c1, 10 as c2, 3 as step
UNION ALL
SELECT 10, 2, -3
)
SELECT generate_series(c1, c2, step) as series
FROM input;

1
4
7
10
10
7
4

Function: generateheatmap

Class com.whereos.udf.HeatmapGenerateUDTF
Usage

Function: geocode

Class com.whereos.udf.GeocodingUDTF
Usage

Function: geokeyradius

Class com.whereos.udf.GeoKeyRadiusUDTF
Usage

Function: geokeys

Class com.whereos.udf.GeoKeysUDTF
Usage

Function: get_json_object

Class org.apache.spark.sql.catalyst.expressions.GetJsonObject
Usage get_json_object(json_txt, path) - Extracts a json object from `path`.
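
For example:

SELECT get_json_object('{"a":"b"}', '$.a');
b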

Function: greatest

Class org.apache.spark.sql.catalyst.expressions.Greatest
Usage greatest(expr, ...) - Returns the greatest value of all parameters, skipping null values.

Function: group_count

Class brickhouse.udf.collect.GroupCountUDF
Usage A sequence id for all rows with the same value for a specific grouping

Function: grouping

Class org.apache.spark.sql.catalyst.expressions.Grouping
Usage grouping(col) - indicates whether a specified column in a GROUP BY is aggregated or not, returns 1 for aggregated or 0 for not aggregated in the result set.

Function: grouping_id

Class org.apache.spark.sql.catalyst.expressions.GroupingID
Usage grouping_id([col1[, col2 ..]]) - returns the level of grouping, equals to `(grouping(c1) << (n-1)) + (grouping(c2) << (n-2)) + ... + grouping(cn)`

Function: guess_attribute_types

Class hivemall.smile.tools.GuessAttributesUDF
Usage guess_attribute_types(ANY, ...) - Returns attribute types

select guess_attribute_types(*) from train limit 1;
Q,Q,C,C,C,C,Q,C,C,C,Q,C,Q,Q,Q,Q,C,Q

Function: hamming_distance

Class hivemall.knn.distance.HammingDistanceUDF
Usage hamming_distance(integer A, integer B) - Returns Hamming distance between A and B

select
hamming_distance(0,3) as c1,
hamming_distance("0","3") as c2 -- 0=0x00, 3=0x11
;

c1 c2
2 2

Function: hash

Class org.apache.spark.sql.catalyst.expressions.Murmur3Hash
Usage hash(expr1, expr2, ...) - Returns a hash value of the arguments.

Function: hash_md5

Class brickhouse.udf.sketch.HashMD5UDF
Usage hash_md5(x) - Hash MD5.

Function: haversine_distance

Class hivemall.geospatial.HaversineDistanceUDF
Usage haversine_distance(double lat1, double lon1, double lat2, double lon2, [const boolean mile=false])::double - Returns the distance between two locations in km [or miles] using the haversine formula

Usage: select haversine_distance(lat1, lon1, lat2, lon2) from ...

Function: hbase_balanced_key

Class brickhouse.hbase.GenerateBalancedKeyUDF
Usage hbase_balanced_key(keyStr,numRegions) - Returns an HBase key balanced evenly across regions

Function: hbase_batch_get

Class brickhouse.hbase.BatchGetUDF
Usage hbase_batch_get(table,key,family) - Do a single HBase Get on a table

Function: hbase_batch_put

Class brickhouse.hbase.BatchPutUDAF
Usage hbase_batch_put(config_map, key, value) - Perform batch HBase updates of a table

Function: hbase_cached_get

Class brickhouse.hbase.CachedGetUDF
Usage hbase_cached_get(configMap,key,template) - Returns a cached object, given an HBase config, a key, and a template object used to interpret JSON

Function: hbase_get

Class brickhouse.hbase.GetUDF
Usage hbase_get(table,key,family) - Do a single HBase Get on a table

Function: hbase_put

Class brickhouse.hbase.PutUDF
Usage string hbase_put(config, map key_value) - string hbase_put(config, key, value) - Do a HBase Put on a table. Config must contain zookeeper quorum, table name, column, and qualifier. Example of usage: hbase_put(map('hbase.zookeeper.quorum', 'hb-zoo1,hb-zoo2', 'table_name', 'metrics', 'family', 'c', 'qualifier', 'q'), 'test.prod.visits.total', '123456')

Function: hex

Class org.apache.spark.sql.catalyst.expressions.Hex
Usage hex(expr) - Converts `expr` to hexadecimal.

Function: hitrate

Class hivemall.evaluation.HitRateUDAF
Usage hitrate(array rankItems, array correctItems [, const int recommendSize = rankItems.size]) - Returns HitRate

Function: hivemall_version

Class hivemall.HivemallVersionUDF
Usage hivemall_version() - Returns the version of Hivemall

SELECT hivemall_version();

Function: hll_est_cardinality

Class brickhouse.udf.hll.EstimateCardinalityUDF
Usage hll_est_cardinality(x) - Estimate reach from a HyperLogLog++.

Function: hour

Class org.apache.spark.sql.catalyst.expressions.Hour
Usage hour(timestamp) - Returns the hour component of the string/timestamp.

Function: hyperloglog

Class brickhouse.udf.hll.HyperLogLogUDAF
Usage hyperloglog(x, [b]) - Constructs a HyperLogLog++ estimator to estimate reach for large values, with optional bit parameter for specifying precision (b must be in [4,16]). Default is b = 6. Returns a binary value that represents the HyperLogLog++ data structure.

Function: hypot

Class org.apache.spark.sql.catalyst.expressions.Hypot
Usage hypot(expr1, expr2) - Returns sqrt(`expr1`**2 + `expr2`**2).

Function: if

Class org.apache.spark.sql.catalyst.expressions.If
Usage if(expr1, expr2, expr3) - If `expr1` evaluates to true, then returns `expr2`; otherwise returns `expr3`.

Function: ifnull

Class org.apache.spark.sql.catalyst.expressions.IfNull
Usage ifnull(expr1, expr2) - Returns `expr2` if `expr1` is null, or `expr1` otherwise.

Function: in

Class org.apache.spark.sql.catalyst.expressions.In
Usage expr1 in(expr2, expr3, ...) - Returns true if `expr` equals to any valN.

Function: indexed_features

Class hivemall.ftvec.trans.IndexedFeatures
Usage indexed_features(double v1, double v2, ...) - Returns a list of features as array: [1:v1, 2:v2, ..]

Function: infinity

Class hivemall.tools.math.InfinityUDF
Usage infinity() - Returns the constant representing positive infinity.

Function: inflate

Class hivemall.tools.compress.InflateUDF
Usage inflate(BINARY compressedData) - Returns a decompressed STRING by using Inflater

SELECT inflate(unbase91(base91(deflate('aaaaaaaaaaaaaaaabbbbccc'))));
aaaaaaaaaaaaaaaabbbbccc

Function: initcap

Class org.apache.spark.sql.catalyst.expressions.InitCap
Usage initcap(str) - Returns `str` with the first letter of each word in uppercase. All other letters are in lowercase. Words are delimited by white space.

Function: inline

Class org.apache.spark.sql.catalyst.expressions.Inline
Usage inline(expr) - Explodes an array of structs into a table.

Function: inline_outer

Class org.apache.spark.sql.catalyst.expressions.Inline
Usage inline_outer(expr) - Explodes an array of structs into a table.

Function: input_file_block_length

Class org.apache.spark.sql.catalyst.expressions.InputFileBlockLength
Usage input_file_block_length() - Returns the length of the block being read, or -1 if not available.

Function: input_file_block_start

Class org.apache.spark.sql.catalyst.expressions.InputFileBlockStart
Usage input_file_block_start() - Returns the start offset of the block being read, or -1 if not available.

Function: input_file_name

Class org.apache.spark.sql.catalyst.expressions.InputFileName
Usage input_file_name() - Returns the name of the file being read, or empty string if not available.

Function: instr

Class org.apache.spark.sql.catalyst.expressions.StringInstr
Usage instr(str, substr) - Returns the (1-based) index of the first occurrence of `substr` in `str`.
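
For example:

SELECT instr('SparkSQL', 'SQL');
6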

Function: int

Class org.apache.spark.sql.catalyst.expressions.Cast
Usage int(expr) - Casts the value `expr` to the target data type `int`.

Function: intersect_array

Class brickhouse.udf.collect.ArrayIntersectUDF
Usage intersect_array(array1, array2, ...) - Returns the intersection of a set of arrays

Function: is_finite

Class hivemall.tools.math.IsFiniteUDF
Usage is_finite(x) - Determine if x is finite.

SELECT is_finite(333), is_finite(infinity());
true false

Function: is_infinite

Class hivemall.tools.math.IsInfiniteUDF
Usage is_infinite(x) - Determine if x is infinite.

Function: is_nan

Class hivemall.tools.math.IsNanUDF
Usage is_nan(x) - Determine if x is not-a-number.

Function: is_stopword

Class hivemall.tools.text.StopwordUDF
Usage is_stopword(string word) - Returns whether the given word is an English stopword or not

Function: isnotnull

Class org.apache.spark.sql.catalyst.expressions.IsNotNull
Usage isnotnull(expr) - Returns true if `expr` is not null, or false otherwise.

Function: isnull

Class org.apache.spark.sql.catalyst.expressions.IsNull
Usage isnull(expr) - Returns true if `expr` is null, or false otherwise.

Function: isochronedistanceedges

Class com.whereos.udf.IsochroneDistanceEdgesUDTF
Usage

Function: isochronedistancepolygons

Class com.whereos.udf.IsochroneDistancePolygonsUDTF
Usage

Function: isochronedurationedges

Class com.whereos.udf.IsochroneDurationEdgesUDTF
Usage

Function: isochronedurationpolygons

Class com.whereos.udf.IsochroneDistancePolygonsUDTF
Usage

Function: item_pairs_sampling

Class hivemall.ftvec.ranking.ItemPairsSamplingUDTF
Usage item_pairs_sampling(array pos_items, const int max_item_id [, const string options]) - Returns a relation consisting of

Function: jaccard_distance

Class hivemall.knn.distance.JaccardDistanceUDF
Usage jaccard_distance(integer A, integer B [,int k=128]) - Returns Jaccard distance between A and B

select
jaccard_distance(0,3) as c1,
jaccard_distance("0","3") as c2, -- 0=0x00, 0=0x11
jaccard_distance(0,4) as c3
;

c1 c2 c3
0.03125 0.03125 0.015625

Function: jaccard_similarity

Class hivemall.knn.similarity.JaccardIndexUDF
Usage jaccard_similarity(A, B [,int k]) - Returns Jaccard similarity coefficient of A and B

WITH docs as (
select 1 as docid, array('apple:1.0', 'orange:2.0', 'banana:1.0', 'kuwi:0') as features
union all
select 2 as docid, array('apple:1.0', 'orange:0', 'banana:2.0', 'kuwi:1.0') as features
union all
select 3 as docid, array('apple:2.0', 'orange:0', 'banana:2.0', 'kuwi:1.0') as features
)
select
l.docid as doc1,
r.docid as doc2,
jaccard_similarity(l.features, r.features) as similarity
from
docs l
CROSS JOIN docs r
where
l.docid != r.docid
order by
doc1 asc,
similarity desc;

doc1 doc2 similarity
1 2 0.14285715
1 3 0.0
2 3 0.6
2 1 0.14285715
3 2 0.6
3 1 0.0

Function: java_method

Class org.apache.spark.sql.catalyst.expressions.CallMethodViaReflection
Usage java_method(class, method[, arg1[, arg2 ..]]) - Calls a method with reflection.

Function: jobconf_gets

Class hivemall.tools.mapred.JobConfGetsUDF
Usage jobconf_gets() - Returns the value from JobConf

Function: jobid

Class hivemall.tools.mapred.JobIdUDF
Usage jobid() - Returns the value of mapred.job.id

Function: join_array

Class brickhouse.udf.collect.JoinArrayUDF
Usage

Function: json_map

Class brickhouse.udf.json.JsonMapUDF
Usage json_map(json) - Returns a map of key-value pairs from a JSON string

Function: json_split

Class brickhouse.udf.json.JsonSplitUDF
Usage json_split(json) - Returns an array of JSON strings from a JSON array

Function: json_tuple

Class org.apache.spark.sql.catalyst.expressions.JsonTuple
Usage json_tuple(jsonStr, p1, p2, ..., pn) - Returns a tuple like the function get_json_object, but it takes multiple names. All the input parameters and output column types are string.
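
For example (expected output per Spark SQL semantics):

SELECT json_tuple('{"a":1, "b":2}', 'a', 'b');
1 2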

Function: kld

Class hivemall.knn.distance.KLDivergenceUDF
Usage kld(double mu1, double sigma1, double mu2, double sigma2) - Returns KL divergence between two distributions

Function: kpa_predict

Class hivemall.classifier.KPAPredictUDAF
Usage kpa_predict(@Nonnull double xh, @Nonnull double xk, @Nullable float w0, @Nonnull float w1, @Nonnull float w2, @Nullable float w3) - Returns a prediction value in Double

Function: kurtosis

Class org.apache.spark.sql.catalyst.expressions.aggregate.Kurtosis
Usage kurtosis(expr) - Returns the kurtosis value calculated from values of a group.

Function: l1_normalize

Class hivemall.ftvec.scaling.L1NormalizationUDF
Usage l1_normalize(ftvec string) - Returns an L1-normalized value

Function: l2_norm

Class hivemall.tools.math.L2NormUDAF
Usage l2_norm(double x) - Returns the L2 norm of the given input x.

WITH input as (
select generate_series(1,3) as v
)
select l2_norm(v) as l2norm
from input;
3.7416573867739413 = sqrt(1^2+2^2+3^2)

Function: l2_normalize

Class hivemall.ftvec.scaling.L2NormalizationUDF
Usage l2_normalize(ftvec string) - Returns an L2-normalized value

Function: lag

Class org.apache.spark.sql.catalyst.expressions.Lag
Usage lag(input[, offset[, default]]) - Returns the value of `input` at the `offset`th row before the current row in the window. The default value of `offset` is 1 and the default value of `default` is null. If the value of `input` at the `offset`th row is null, null is returned. If there is no such offset row (e.g., when the offset is 1, the first row of the window does not have any previous row), `default` is returned.
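
A minimal sketch over a hypothetical `sales` table (assumed columns: `shop`, `day`, `amount`):

select
shop,
day,
amount,
lag(amount, 1) over (partition by shop order by day) as prev_amount
from
sales;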

Function: last

Class org.apache.spark.sql.catalyst.expressions.aggregate.Last
Usage last(expr[, isIgnoreNull]) - Returns the last value of `expr` for a group of rows. If `isIgnoreNull` is true, returns only non-null values.

Function: last_day

Class org.apache.spark.sql.catalyst.expressions.LastDay
Usage last_day(date) - Returns the last day of the month which the date belongs to.

Function: last_element

Class hivemall.tools.array.LastElementUDF
Usage last_element(x) - Returns the last element in an array

SELECT last_element(array('a','b','c'));
c

Function: last_index

Class brickhouse.udf.collect.LastIndexUDF
Usage last_index(x) - Returns the last value in an array

Function: last_value

Class org.apache.spark.sql.catalyst.expressions.aggregate.Last
Usage last_value(expr[, isIgnoreNull]) - Returns the last value of `expr` for a group of rows. If `isIgnoreNull` is true, returns only non-null values.

Function: lat2tiley

Class hivemall.geospatial.Lat2TileYUDF
Usage lat2tiley(double lat, int zoom)::int - Returns the tile number of the given latitude and zoom level

Function: lcase

Class org.apache.spark.sql.catalyst.expressions.Lower
Usage lcase(str) - Returns `str` with all characters changed to lowercase.

Function: lda_predict

Class hivemall.topicmodel.LDAPredictUDAF
Usage lda_predict(string word, float value, int label, float lambda[, const string options]) - Returns a list consisting of <int label, float prob>

Function: lead

Class org.apache.spark.sql.catalyst.expressions.Lead
Usage lead(input[, offset[, default]]) - Returns the value of `input` at the `offset`th row after the current row in the window. The default value of `offset` is 1 and the default value of `default` is null. If the value of `input` at the `offset`th row is null, null is returned. If there is no such an offset row (e.g., when the offset is 1, the last row of the window does not have any subsequent row), `default` is returned.

Function: least

Class org.apache.spark.sql.catalyst.expressions.Least
Usage least(expr, ...) - Returns the least value of all parameters, skipping null values.

Function: left

Class org.apache.spark.sql.catalyst.expressions.Left
Usage left(str, len) - Returns the leftmost `len` (`len` can be string type) characters from the string `str`. If `len` is less than or equal to 0, the result is an empty string.

Function: length

Class org.apache.spark.sql.catalyst.expressions.Length
Usage length(expr) - Returns the character length of string data or number of bytes of binary data. The length of string data includes the trailing spaces. The length of binary data includes binary zeros.

Function: levenshtein

Class org.apache.spark.sql.catalyst.expressions.Levenshtein
Usage levenshtein(str1, str2) - Returns the Levenshtein distance between the two given strings.
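
For example:

SELECT levenshtein('kitten', 'sitting');
3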

Function: like

Class org.apache.spark.sql.catalyst.expressions.Like
Usage str like pattern - Returns true if str matches pattern, null if any arguments are null, false otherwise.

Function: ln

Class org.apache.spark.sql.catalyst.expressions.Log
Usage ln(expr) - Returns the natural logarithm (base e) of `expr`.

Function: locate

Class org.apache.spark.sql.catalyst.expressions.StringLocate
Usage locate(substr, str[, pos]) - Returns the position of the first occurrence of `substr` in `str` after position `pos`. The given `pos` and return value are 1-based.
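
For example:

SELECT locate('bar', 'foobarbar');
4
SELECT locate('bar', 'foobarbar', 5);
7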

Function: log

Class org.apache.spark.sql.catalyst.expressions.Logarithm
Usage log(base, expr) - Returns the logarithm of `expr` with `base`.
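
For example:

SELECT log(10, 100);
2.0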

Function: log10

Class org.apache.spark.sql.catalyst.expressions.Log10
Usage log10(expr) - Returns the logarithm of `expr` with base 10.

Function: log1p

Class org.apache.spark.sql.catalyst.expressions.Log1p
Usage log1p(expr) - Returns log(1 + `expr`).

Function: log2

Class org.apache.spark.sql.catalyst.expressions.Log2
Usage log2(expr) - Returns the logarithm of `expr` with base 2.

Function: logloss

Class hivemall.evaluation.LogarithmicLossUDAF
Usage logloss(double predicted, double actual) - Returns the logarithmic loss

Function: logress

Class hivemall.regression.LogressUDTF
Usage logress(array features, float target [, constant string options]) - Returns a relation consisting of <{int|bigint|string} feature, float weight>

Function: lon2tilex

Class hivemall.geospatial.Lon2TileXUDF
Usage lon2tilex(double lon, int zoom)::int - Returns the tile number of the given longitude and zoom level

Function: lower

Class org.apache.spark.sql.catalyst.expressions.Lower
Usage lower(str) - Returns `str` with all characters changed to lowercase.

Function: lpad

Class org.apache.spark.sql.catalyst.expressions.StringLPad
Usage lpad(str, len, pad) - Returns `str`, left-padded with `pad` to a length of `len`. If `str` is longer than `len`, the return value is shortened to `len` characters.
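
For example:

SELECT lpad('hi', 5, '??');
???hi
SELECT lpad('hi', 1, '??');
h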

Function: lr_datagen

Class hivemall.dataset.LogisticRegressionDataGeneratorUDTF
Usage lr_datagen(options string) - Generates a logistic regression dataset

WITH dual AS (SELECT 1) SELECT lr_datagen('-n_examples 1k -n_features 10') FROM dual;

Function: ltrim

Class org.apache.spark.sql.catalyst.expressions.StringTrimLeft
Usage ltrim(str) - Removes the leading space characters from `str`. ltrim(trimStr, str) - Removes the leading string which contains the characters from the trim string

Function: mae

Class hivemall.evaluation.MeanAbsoluteErrorUDAF
Usage mae(double predicted, double actual) - Returns the Mean Absolute Error

Function: manhattan_distance

Class hivemall.knn.distance.ManhattanDistanceUDF
Usage manhattan_distance(list x, list y) - Returns sum(|x - y|)

WITH docs as (
select 1 as docid, array('apple:1.0', 'orange:2.0', 'banana:1.0', 'kuwi:0') as features
union all
select 2 as docid, array('apple:1.0', 'orange:0', 'banana:2.0', 'kuwi:1.0') as features
union all
select 3 as docid, array('apple:2.0', 'orange:0', 'banana:2.0', 'kuwi:1.0') as features
)
select
l.docid as doc1,
r.docid as doc2,
manhattan_distance(l.features, r.features) as distance,
distance2similarity(angular_distance(l.features, r.features)) as similarity
from
docs l
CROSS JOIN docs r
where
l.docid != r.docid
order by
doc1 asc,
distance asc;

doc1 doc2 distance similarity
1 2 4.0 0.75
1 3 5.0 0.75942624
2 3 1.0 0.91039914
2 1 4.0 0.75
3 2 1.0 0.91039914
3 1 5.0 0.75942624

Function: map

Class org.apache.spark.sql.catalyst.expressions.CreateMap
Usage map(key0, value0, key1, value1, ...) - Creates a map with the given key/value pairs.

Function: map_concat

Class org.apache.spark.sql.catalyst.expressions.MapConcat
Usage map_concat(map, ...) - Returns the union of all the given maps

Function: map_exclude_keys

Class hivemall.tools.map.MapExcludeKeysUDF
Usage map_exclude_keys(Map map, array filteringKeys) - Returns the filtered entries of a map not having specified keys

SELECT map_exclude_keys(map(1,'one',2,'two',3,'three'),array(2,3));
{1:"one"}

Function: map_filter_keys

Class brickhouse.udf.collect.MapFilterKeysUDF
Usage map_filter_keys(map, key_array) - Returns the filtered entries of a map corresponding to a given set of keys

Function: map_from_arrays

Class org.apache.spark.sql.catalyst.expressions.MapFromArrays
Usage map_from_arrays(keys, values) - Creates a map with a pair of the given key/value arrays. All elements in keys should not be null
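
For example:

SELECT map_from_arrays(array(1.0, 3.0), array('2', '4'));
{1.0:"2",3.0:"4"}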

Function: map_from_entries

Class org.apache.spark.sql.catalyst.expressions.MapFromEntries
Usage map_from_entries(arrayOfEntries) - Returns a map created from the given array of entries.

Function: map_get_sum

Class hivemall.tools.map.MapGetSumUDF
Usage map_get_sum(map src, array keys) - Returns sum of values that are retrieved by keys

Function: map_include_keys

Class hivemall.tools.map.MapIncludeKeysUDF
Usage map_include_keys(Map map, array filteringKeys) - Returns the filtered entries of a map having specified keys

SELECT map_include_keys(map(1,'one',2,'two',3,'three'),array(2,3));
{2:"two",3:"three"}

Function: map_index

Class brickhouse.udf.collect.MapIndexUDF
Usage

Function: map_key_values

Class brickhouse.udf.collect.MapKeyValuesUDF
Usage map_key_values(map) - Returns an array of key-value pairs contained in a map

Function: map_keys

Class org.apache.spark.sql.catalyst.expressions.MapKeys
Usage map_keys(map) - Returns an unordered array containing the keys of the map.

Function: map_tail_n

Class hivemall.tools.map.MapTailNUDF
Usage map_tail_n(map SRC, int N) - Returns the last N elements from a sorted array of SRC

Function: map_url

Class hivemall.geospatial.MapURLUDF
Usage map_url(double lat, double lon, int zoom [, const string option]) - Returns a URL string

OpenStreetMap: http://tile.openstreetmap.org/${zoom}/${xtile}/${ytile}.png
Google Maps: https://www.google.com/maps/@${lat},${lon},${zoom}z

Function: map_values

Class org.apache.spark.sql.catalyst.expressions.MapValues
Usage map_values(map) - Returns an unordered array containing the values of the map.

Function: max

Class org.apache.spark.sql.catalyst.expressions.aggregate.Max
Usage max(expr) - Returns the maximum value of `expr`.

Function: max_label

Class hivemall.ensemble.MaxValueLabelUDAF
Usage max_label(double value, string label) - Returns a label that has the maximum value

Function: maxrow

Class hivemall.ensemble.MaxRowUDAF
Usage maxrow(ANY compare, ...) - Returns a row that has maximum value in the 1st argument

Function: md5

Class org.apache.spark.sql.catalyst.expressions.Md5
Usage md5(expr) - Returns an MD5 128-bit checksum as a hex string of `expr`.

Function: mean

Class org.apache.spark.sql.catalyst.expressions.aggregate.Average
Usage mean(expr) - Returns the mean calculated from values of a group.

Function: mhash

Class hivemall.ftvec.hashing.MurmurHash3UDF
Usage mhash(string word) - Returns a MurmurHash3 INT value starting from 1

Function: min

Class org.apache.spark.sql.catalyst.expressions.aggregate.Min
Usage min(expr) - Returns the minimum value of `expr`.

Function: minhash

Class hivemall.knn.lsh.MinHashUDTF
Usage minhash(ANY item, array features [, constant string options]) - Returns n different k-depth signatures (i.e., clusterid) for each item

Function: minhashes

Class hivemall.knn.lsh.MinHashesUDF
Usage minhashes(array<> features [, int numHashes, int keyGroup [, boolean noWeight]]) - Returns minhash values

Function: minkowski_distance

Class hivemall.knn.distance.MinkowskiDistanceUDF
Usage minkowski_distance(list x, list y, double p) - Returns sum(|x - y|^p)^(1/p)

WITH docs as (
select 1 as docid, array('apple:1.0', 'orange:2.0', 'banana:1.0', 'kuwi:0') as features
union all
select 2 as docid, array('apple:1.0', 'orange:0', 'banana:2.0', 'kuwi:1.0') as features
union all
select 3 as docid, array('apple:2.0', 'orange:0', 'banana:2.0', 'kuwi:1.0') as features
)
select
l.docid as doc1,
r.docid as doc2,
minkowski_distance(l.features, r.features, 1) as distance1, -- p=1 (manhattan_distance)
minkowski_distance(l.features, r.features, 2) as distance2, -- p=2 (euclid_distance)
minkowski_distance(l.features, r.features, 3) as distance3, -- p=3
manhattan_distance(l.features, r.features) as manhattan_distance,
euclid_distance(l.features, r.features) as euclid_distance
from
docs l
CROSS JOIN docs r
where
l.docid != r.docid
order by
doc1 asc,
distance1 asc;

doc1 doc2 distance1 distance2 distance3 manhattan_distance euclid_distance
1 2 4.0 2.4494898 2.1544347 4.0 2.4494898
1 3 5.0 2.6457512 2.2239802 5.0 2.6457512
2 3 1.0 1.0 1.0 1.0 1.0
2 1 4.0 2.4494898 2.1544347 4.0 2.4494898
3 2 1.0 1.0 1.0 1.0 1.0
3 1 5.0 2.6457512 2.2239802 5.0 2.6457512

Function: minute

Class org.apache.spark.sql.catalyst.expressions.Minute
Usage minute(timestamp) - Returns the minute component of the string/timestamp.

Function: mod

Class org.apache.spark.sql.catalyst.expressions.Remainder
Usage expr1 mod expr2 - Returns the remainder after `expr1`/`expr2`.

Function: monotonically_increasing_id

Class org.apache.spark.sql.catalyst.expressions.MonotonicallyIncreasingID
Usage monotonically_increasing_id() - Returns monotonically increasing 64-bit integers. The generated ID is guaranteed to be monotonically increasing and unique, but not consecutive. The current implementation puts the partition ID in the upper 31 bits, and the lower 33 bits represent the record number within each partition. The assumption is that the data frame has less than 1 billion partitions, and each partition has less than 8 billion records. The function is non-deterministic because its result depends on partition IDs.

Function: month

Class org.apache.spark.sql.catalyst.expressions.Month
Usage month(date) - Returns the month component of the date/timestamp.

Function: months_between

Class org.apache.spark.sql.catalyst.expressions.MonthsBetween
Usage months_between(timestamp1, timestamp2[, roundOff]) - If `timestamp1` is later than `timestamp2`, then the result is positive. If `timestamp1` and `timestamp2` are on the same day of month, or both are the last day of month, time of day will be ignored. Otherwise, the difference is calculated based on 31 days per month, and rounded to 8 digits unless roundOff=false.
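
For example:

SELECT months_between('1997-02-28 10:30:00', '1996-10-30');
3.94959677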

Function: moving_avg

Class brickhouse.udf.timeseries.MovingAvgUDF
Usage Returns the moving average of a time series for a given time window

Function: mrr

Class hivemall.evaluation.MRRUDAF
Usage mrr(array rankItems, array correctItems [, const int recommendSize = rankItems.size]) - Returns MRR

Function: mse

Class hivemall.evaluation.MeanSquaredErrorUDAF
Usage mse(double predicted, double actual) - Returns the Mean Squared Error

Function: multiday_count

Class brickhouse.udf.sketch.MultiDaySketcherUDAF
Usage multiday_count(x) - Returns a count of events over several different periods.

Function: named_struct

Class org.apache.spark.sql.catalyst.expressions.CreateNamedStruct
Usage named_struct(name1, val1, name2, val2, ...) - Creates a struct with the given field names and values.

Function: nan

Class hivemall.tools.math.NanUDF
Usage nan() - Returns the constant representing not-a-number.

SELECT nan(), is_nan(nan());
NaN true

Function: nanvl

Class org.apache.spark.sql.catalyst.expressions.NaNvl
Usage nanvl(expr1, expr2) - Returns `expr1` if it's not NaN, or `expr2` otherwise.

Function: ndcg

Class hivemall.evaluation.NDCGUDAF
Usage ndcg(array rankItems, array correctItems [, const int recommendSize = rankItems.size]) - Returns nDCG

Function: negative

Class org.apache.spark.sql.catalyst.expressions.UnaryMinus
Usage negative(expr) - Returns the negated value of `expr`.

Function: next_day

Class org.apache.spark.sql.catalyst.expressions.NextDay
Usage next_day(start_date, day_of_week) - Returns the first date which is later than `start_date` and named as indicated.

Function: normalize_unicode

Class hivemall.tools.text.NormalizeUnicodeUDF
Usage normalize_unicode(string str [, string form]) - Transforms `str` with the specified normalization form. The `form` takes one of NFC (default), NFD, NFKC, or NFKD

SELECT normalize_unicode('ﾊﾝｶｸｶﾅ','NFKC');
ハンカクカナ

SELECT normalize_unicode('㈱㌧㌦Ⅲ','NFKC');
(株)トンドルIII

Function: not

Class org.apache.spark.sql.catalyst.expressions.Not
Usage not expr - Logical not.

Function: now

Class org.apache.spark.sql.catalyst.expressions.CurrentTimestamp
Usage now() - Returns the current timestamp at the start of query evaluation.

Function: ntile

Class org.apache.spark.sql.catalyst.expressions.NTile
Usage ntile(n) - Divides the rows for each window partition into `n` buckets ranging from 1 to at most `n`.

Function: nullif

Class org.apache.spark.sql.catalyst.expressions.NullIf
Usage nullif(expr1, expr2) - Returns null if `expr1` equals to `expr2`, or `expr1` otherwise.
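
For example:

SELECT nullif(2, 2), nullif(2, 3);
NULL 2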

Function: numeric_range

Class brickhouse.udf.collect.NumericRange
Usage numeric_range(a,b,c) - Generates a range of integers from a to b, incremented by c

Function: nvl

Class org.apache.spark.sql.catalyst.expressions.Nvl
Usage nvl(expr1, expr2) - Returns `expr2` if `expr1` is null, or `expr1` otherwise.

Function: nvl2

Class org.apache.spark.sql.catalyst.expressions.Nvl2
Usage nvl2(expr1, expr2, expr3) - Returns `expr2` if `expr1` is not null, or `expr3` otherwise.
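
For example:

SELECT nvl2(NULL, 2, 1), nvl2('a', 2, 1);
1 2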

Function: octet_length

Class org.apache.spark.sql.catalyst.expressions.OctetLength
Usage octet_length(expr) - Returns the byte length of string data or number of bytes of binary data.

Function: onehot_encoding

Class hivemall.ftvec.trans.OnehotEncodingUDAF
Usage onehot_encoding(PRIMITIVE feature, ...) - Computes a one-hot encoded label for each feature

WITH mapping as (
select
m.f1, m.f2
from (
select onehot_encoding(species, category) m
from test
) tmp
)
select
array(m.f1[t.species],m.f2[t.category],feature('count',count)) as sparse_features
from
test t
CROSS JOIN mapping m;

["2","8","count:9"]
["5","8","count:10"]
["1","6","count:101"]

Function: or

Class org.apache.spark.sql.catalyst.expressions.Or
Usage expr1 or expr2 - Logical OR.

Function: parse_url

Class org.apache.spark.sql.catalyst.expressions.ParseUrl
Usage parse_url(url, partToExtract[, key]) - Extracts a part from a URL.
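
For example:

SELECT parse_url('http://spark.apache.org/path?query=1', 'HOST');
spark.apache.org
SELECT parse_url('http://spark.apache.org/path?query=1', 'QUERY', 'query');
1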

Function: percent_rank

Class org.apache.spark.sql.catalyst.expressions.PercentRank
Usage percent_rank() - Computes the percentage ranking of a value in a group of values.

Function: percentile

Class org.apache.spark.sql.catalyst.expressions.aggregate.Percentile
Usage percentile(col, percentage [, frequency]) - Returns the exact percentile value of numeric column `col` at the given percentage. The value of percentage must be between 0.0 and 1.0, and the value of frequency should be a positive integer. percentile(col, array(percentage1 [, percentage2]...) [, frequency]) - Returns the exact percentile value array of numeric column `col` at the given percentage(s). Each value of the percentage array must be between 0.0 and 1.0, and the value of frequency should be a positive integer.

Function: percentile_approx

Class org.apache.spark.sql.catalyst.expressions.aggregate.ApproximatePercentile
Usage percentile_approx(col, percentage [, accuracy]) - Returns the approximate percentile value of numeric column `col` at the given percentage. The value of percentage must be between 0.0 and 1.0. The `accuracy` parameter (default: 10000) is a positive numeric literal which controls approximation accuracy at the cost of memory. Higher value of `accuracy` yields better accuracy, `1.0/accuracy` is the relative error of the approximation. When `percentage` is an array, each value of the percentage array must be between 0.0 and 1.0. In this case, returns the approximate percentile array of column `col` at the given percentage array.

Function: permutations

Class com.whereos.udf.PermutationUDTF
Usage

Function: pi

Class org.apache.spark.sql.catalyst.expressions.Pi
Usage pi() - Returns pi.

Function: plsa_predict

Class hivemall.topicmodel.PLSAPredictUDAF
Usage plsa_predict(string word, float value, int label, float prob[, const string options]) - Returns a list which consists of

Function: pmod

Class org.apache.spark.sql.catalyst.expressions.Pmod
Usage pmod(expr1, expr2) - Returns the positive value of `expr1` mod `expr2`.
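
For example (unlike %, the result is always non-negative):

SELECT pmod(10, 3), pmod(-7, 3);
1 2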

Function: polynomial_features

Class hivemall.ftvec.pairing.PolynomialFeaturesUDF
Usage polynomial_features(feature_vector in array) - Returns a feature vector having a polynomial feature space

Function: popcnt

Class hivemall.knn.distance.PopcountUDF
Usage popcnt(a [, b]) - Returns a popcount value

select
popcnt(3),
popcnt("3"), -- 3=0x11
popcnt(array(1,3));

2 2 3

Function: populate_not_in

Class hivemall.ftvec.ranking.PopulateNotInUDTF
Usage populate_not_in(list items, const int max_item_id [, const string options]) - Returns a relation consisting of items that do not exist in the given items

Function: posexplode

Class org.apache.spark.sql.catalyst.expressions.PosExplode
Usage posexplode(expr) - Separates the elements of array `expr` into multiple rows with positions, or the elements of map `expr` into multiple rows and columns with positions.
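
For example:

SELECT posexplode(array(10, 20));
0 10
1 20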

Function: posexplode_outer

Class org.apache.spark.sql.catalyst.expressions.PosExplode
Usage posexplode_outer(expr) - Separates the elements of array `expr` into multiple rows with positions, or the elements of map `expr` into multiple rows and columns with positions.

Function: posexplodepairs

Class com.whereos.udf.PosExplodePairsUDTF
Usage

Function: position

Class org.apache.spark.sql.catalyst.expressions.StringLocate
Usage position(substr, str[, pos]) - Returns the position of the first occurrence of `substr` in `str` after position `pos`. The given `pos` and return value are 1-based.

Function: positive

Class org.apache.spark.sql.catalyst.expressions.UnaryPositive
Usage positive(expr) - Returns the value of `expr`.

Function: pow

Class org.apache.spark.sql.catalyst.expressions.Pow
Usage pow(expr1, expr2) - Raises `expr1` to the power of `expr2`.

Function: power

Class org.apache.spark.sql.catalyst.expressions.Pow
Usage power(expr1, expr2) - Raises `expr1` to the power of `expr2`.

Function: powered_features

Class hivemall.ftvec.pairing.PoweredFeaturesUDF
Usage powered_features(feature_vector in array, int degree [, boolean truncate]) - Returns a feature vector having a powered feature space

Function: precision_at

Class hivemall.evaluation.PrecisionUDAF
Usage precision_at(array rankItems, array correctItems [, const int recommendSize = rankItems.size]) - Returns Precision

Function: prefixed_hash_values

Class hivemall.ftvec.hashing.ArrayPrefixedHashValuesUDF
Usage prefixed_hash_values(array values, string prefix [, boolean useIndexAsPrefix]) - Returns an array in which each element has the specified prefix

Function: printf

Class org.apache.spark.sql.catalyst.expressions.FormatString
Usage printf(strfmt, obj, ...) - Returns a formatted string from printf-style format strings.

Function: quantified_features

Class hivemall.ftvec.trans.QuantifiedFeaturesUDTF
Usage quantified_features(boolean output, col1, col2, ...) - Returns identified features in a dense array

Function: quantify

Class hivemall.ftvec.conv.QuantifyColumnsUDTF
Usage quantify(boolean output, col1, col2, ...) - Returns identified features

Function: quantitative_features

Class hivemall.ftvec.trans.QuantitativeFeaturesUDF
Usage quantitative_features(array featureNames, feature1, feature2, .. [, const string options]) - Returns a feature vector array

Function: quarter

Class org.apache.spark.sql.catalyst.expressions.Quarter
Usage quarter(date) - Returns the quarter of the year for date, in the range 1 to 4.

Function: r2

Class hivemall.evaluation.R2UDAF
Usage r2(double predicted, double actual) - Returns R-squared (the coefficient of determination)

Function: radians

Class org.apache.spark.sql.catalyst.expressions.ToRadians
Usage radians(expr) - Converts degrees to radians.

Function: raise_error

Class hivemall.tools.sanity.RaiseErrorUDF
Usage raise_error() or raise_error(string msg) - Throws an error

SELECT product_id, price, raise_error('Found an invalid record') FROM xxx WHERE price < 0.0

Function: rand

Class org.apache.spark.sql.catalyst.expressions.Rand
Usage rand([seed]) - Returns a random value with independent and identically distributed (i.i.d.) uniformly distributed values in [0, 1).

Function: rand_amplify

Class hivemall.ftvec.amplify.RandomAmplifierUDTF
Usage rand_amplify(const int xtimes [, const string options], *) - Amplifies the input records x-times on the map side

Function: randn

Class org.apache.spark.sql.catalyst.expressions.Randn
Usage randn([seed]) - Returns a random value with independent and identically distributed (i.i.d.) values drawn from the standard normal distribution.

Function: rank

Class org.apache.spark.sql.catalyst.expressions.Rank
Usage rank() - Computes the rank of a value in a group of values. The result is one plus the number of rows preceding or equal to the current row in the ordering of the partition. The values will produce gaps in the sequence.

Function: readjsongeometry

Class com.whereos.udf.ReadJSONGeometryUDF
Usage

Function: recall_at

Class hivemall.evaluation.RecallUDAF
Usage recall_at(array rankItems, array correctItems [, const int recommendSize = rankItems.size]) - Returns Recall

Function: reflect

Class org.apache.spark.sql.catalyst.expressions.CallMethodViaReflection
Usage reflect(class, method[, arg1[, arg2 ..]]) - Calls a method with reflection.

Function: regexp_extract

Class org.apache.spark.sql.catalyst.expressions.RegExpExtract
Usage regexp_extract(str, regexp[, idx]) - Extracts a group that matches `regexp`.
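
For example:

SELECT regexp_extract('100-200', '(\\d+)-(\\d+)', 1);
100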

Function: regexp_replace

Class org.apache.spark.sql.catalyst.expressions.RegExpReplace
Usage regexp_replace(str, regexp, rep) - Replaces all substrings of `str` that match `regexp` with `rep`.
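
For example:

SELECT regexp_replace('100-200', '(\\d+)', 'num');
num-num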

Function: rendergeometries

Class com.whereos.udf.CollectAndRenderGeometryUDF
Usage

Function: renderheatmap

Class com.whereos.udf.HeatmapRenderUDF
Usage

Function: rendertile

Class com.whereos.udf.TileRenderUDF
Usage

Function: repeat

Class org.apache.spark.sql.catalyst.expressions.StringRepeat
Usage repeat(str, n) - Returns the string which repeats the given string value n times.
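
For example:

SELECT repeat('123', 2);
123123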

Function: replace

Class org.apache.spark.sql.catalyst.expressions.StringReplace
Usage replace(str, search[, replace]) - Replaces all occurrences of `search` with `replace`.

Function: rescale

Class hivemall.ftvec.scaling.RescaleUDF
Usage rescale(value, min, max) - Returns the value rescaled by min-max normalization

Function: reverse

Class org.apache.spark.sql.catalyst.expressions.Reverse
Usage reverse(array) - Returns a reversed string or an array with reverse order of elements.

Function: rf_ensemble

Class hivemall.smile.tools.RandomForestEnsembleUDAF
Usage rf_ensemble(int yhat [, array proba [, double model_weight=1.0]]) - Returns ensembled prediction results in <int label, double probability, array<double> probabilities>

Function: right

Class org.apache.spark.sql.catalyst.expressions.Right
Usage right(str, len) - Returns the rightmost `len` (`len` can be string type) characters from the string `str`. If `len` is less than or equal to 0, the result is an empty string.

Function: rint

Class org.apache.spark.sql.catalyst.expressions.Rint
Usage rint(expr) - Returns the double value that is closest in value to the argument and is equal to a mathematical integer.

Function: rlike

Class org.apache.spark.sql.catalyst.expressions.RLike
Usage str rlike regexp - Returns true if `str` matches `regexp`, or false otherwise.

Function: rmse

Class hivemall.evaluation.RootMeanSquaredErrorUDAF
Usage rmse(double predicted, double actual) - Returns the Root Mean Squared Error

Function: rollup

Class org.apache.spark.sql.catalyst.expressions.Rollup
Usage rollup([col1[, col2 ..]]) - Creates a multi-dimensional rollup using the specified columns so that aggregations can be run on them.

Function: round

Class org.apache.spark.sql.catalyst.expressions.Round
Usage round(expr, d) - Returns `expr` rounded to `d` decimal places using HALF_UP rounding mode.

Function: row_number

Class org.apache.spark.sql.catalyst.expressions.RowNumber
Usage row_number() - Assigns a unique, sequential number to each row, starting with one, according to the ordering of rows within the window partition.

Function: rowid

Class hivemall.tools.mapred.RowIdUDF
Usage rowid() - Returns a generated row id of a form {TASK_ID}-{SEQUENCE_NUMBER}

Function: rownum

Class hivemall.tools.mapred.RowNumberUDF
Usage rownum() - Returns a generated row number as a long, computed as sprintf(`%d%04d`, sequence, taskId)

SELECT rownum() as rownum, xxx from ...

Function: rpad

Class org.apache.spark.sql.catalyst.expressions.StringRPad
Usage rpad(str, len, pad) - Returns `str`, right-padded with `pad` to a length of `len`. If `str` is longer than `len`, the return value is shortened to `len` characters.

Function: rtrim

Class org.apache.spark.sql.catalyst.expressions.StringTrimRight
Usage rtrim(str) - Removes the trailing space characters from `str`. rtrim(trimStr, str) - Removes the trailing string which contains the characters from the trim string from `str`

Function: salted_bigint

Class brickhouse.hbase.SaltedBigIntUDF
Usage

Function: salted_bigint_key

Class brickhouse.hbase.SaltedBigIntUDF
Usage

Function: schema_of_json

Class org.apache.spark.sql.catalyst.expressions.SchemaOfJson
Usage schema_of_json(json[, options]) - Returns schema in the DDL format of JSON string.

Function: second

Class org.apache.spark.sql.catalyst.expressions.Second
Usage second(timestamp) - Returns the second component of the string/timestamp.

Function: select_k_best

Class hivemall.tools.array.SelectKBestUDF
Usage select_k_best(array array, const array importance, const int k) - Returns selected top-k elements as array

Function: sentences

Class org.apache.spark.sql.catalyst.expressions.Sentences
Usage sentences(str[, lang, country]) - Splits `str` into an array of arrays of words.

Function: sequence

Class org.apache.spark.sql.catalyst.expressions.Sequence
Usage sequence(start, stop, step) - Generates an array of elements from start to stop (inclusive), incrementing by step. The type of the returned elements is the same as the type of argument expressions. Supported types are: byte, short, integer, long, date, timestamp. The start and stop expressions must resolve to the same type. If start and stop expressions resolve to the 'date' or 'timestamp' type then the step expression must resolve to the 'interval' type, otherwise to the same type as the start and stop expressions.
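
For example:

SELECT sequence(1, 5);
[1,2,3,4,5]
SELECT sequence(to_date('2018-01-01'), to_date('2018-03-01'), interval 1 month);
[2018-01-01,2018-02-01,2018-03-01]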

Function: sessionize

Class hivemall.tools.datetime.SessionizeUDF
Usage sessionize(long timeInSec, long thresholdInSec [, String subject]) - Returns a UUID string of a session.

SELECT
sessionize(time, 3600, ip_addr) as session_id,
time, ip_addr
FROM (
SELECT time, ip_addr
FROM weblog
DISTRIBUTE BY ip_addr, time SORT BY ip_addr, time DESC
) t1

Function: set_difference

Class brickhouse.udf.collect.SetDifferenceUDF
Usage set_difference(a,b) - Returns a list of those items in a, but not in b

Function: set_similarity

Class brickhouse.udf.sketch.SetSimilarityUDF
Usage set_similarity(a,b) - Compute the Jaccard set similarity of two sketch sets.

Function: sha

Class org.apache.spark.sql.catalyst.expressions.Sha1
Usage sha(expr) - Returns a sha1 hash value as a hex string of the `expr`.

Function: sha1

Class org.apache.spark.sql.catalyst.expressions.Sha1
Usage sha1(expr) - Returns a sha1 hash value as a hex string of the `expr`.

Function: sha2

Class org.apache.spark.sql.catalyst.expressions.Sha2
Usage sha2(expr, bitLength) - Returns a checksum of SHA-2 family as a hex string of `expr`. SHA-224, SHA-256, SHA-384, and SHA-512 are supported. Bit length of 0 is equivalent to 256.
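
For example:

SELECT sha2('Spark', 256);
529bc3b07127ecb7e53a4dcf1991d9152c24537d919178022b2c42657f79a26b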

Function: shiftleft

Class org.apache.spark.sql.catalyst.expressions.ShiftLeft
Usage shiftleft(base, expr) - Bitwise left shift.

Function: shiftright

Class org.apache.spark.sql.catalyst.expressions.ShiftRight
Usage shiftright(base, expr) - Bitwise (signed) right shift.

Function: shiftrightunsigned

Class org.apache.spark.sql.catalyst.expressions.ShiftRightUnsigned
Usage shiftrightunsigned(base, expr) - Bitwise unsigned right shift.

Function: shuffle

Class org.apache.spark.sql.catalyst.expressions.Shuffle
Usage shuffle(array) - Returns a random permutation of the given array.

Function: sigmoid

Class hivemall.tools.math.SigmoidGenericUDF
Usage sigmoid(x) - Returns 1.0 / (1.0 + exp(-x))

WITH input as (
SELECT 3.0 as x
UNION ALL
SELECT -3.0 as x
)
select
1.0 / (1.0 + exp(-x)),
sigmoid(x)
from
input;
0.04742587317756678 0.04742587357759476
0.9525741268224334 0.9525741338729858

Function: sign

Class org.apache.spark.sql.catalyst.expressions.Signum
Usage sign(expr) - Returns -1.0, 0.0 or 1.0 as `expr` is negative, 0 or positive.

Function: signum

Class org.apache.spark.sql.catalyst.expressions.Signum
Usage signum(expr) - Returns -1.0, 0.0 or 1.0 as `expr` is negative, 0 or positive.

Function: simple_r

Class com.whereos.udf.RenjinUDF
Usage

Function: sin

Class org.apache.spark.sql.catalyst.expressions.Sin
Usage sin(expr) - Returns the sine of `expr`, as if computed by `java.lang.Math.sin`.

Function: singularize

Class hivemall.tools.text.SingularizeUDF
Usage singularize(string word) - Returns singular form of a given English word

SELECT singularize(lower("Apples"));

"apple"

Function: sinh

Class org.apache.spark.sql.catalyst.expressions.Sinh
Usage sinh(expr) - Returns hyperbolic sine of `expr`, as if computed by `java.lang.Math.sinh`.

Function: size

Class org.apache.spark.sql.catalyst.expressions.Size
Usage size(expr) - Returns the size of an array or a map. The function returns -1 if its input is null and spark.sql.legacy.sizeOfNull is set to true. If spark.sql.legacy.sizeOfNull is set to false, the function returns null for null input. By default, the spark.sql.legacy.sizeOfNull parameter is set to true.

Function: sketch_hashes

Class brickhouse.udf.sketch.SketchHashesUDF
Usage sketch_hashes(x) - Returns the MD5 hashes associated with a KMV sketch set of strings

Function: sketch_set

Class brickhouse.udf.sketch.SketchSetUDAF
Usage sketch_set(x) - Constructs a sketch set to estimate reach for large values

Function: skewness

Class org.apache.spark.sql.catalyst.expressions.aggregate.Skewness
Usage skewness(expr) - Returns the skewness value calculated from values of a group.

Function: slice

Class org.apache.spark.sql.catalyst.expressions.Slice
Usage slice(x, start, length) - Subsets array x starting from index start (or starting from the end if start is negative) with the specified length.

Function: smallint

Class org.apache.spark.sql.catalyst.expressions.Cast
Usage smallint(expr) - Casts the value `expr` to the target data type `smallint`.

Function: snr

Class hivemall.ftvec.selection.SignalNoiseRatioUDAF
Usage snr(array features, array one-hot class label) - Returns the Signal-to-Noise Ratio for each feature as an array

Function: sort_and_uniq_array

Class hivemall.tools.array.SortAndUniqArrayUDF
Usage sort_and_uniq_array(array) - Takes an array and returns a sorted array with duplicate elements eliminated

SELECT sort_and_uniq_array(array(3,1,1,-2,10));
[-2,1,3,10]

Function: sort_array

Class org.apache.spark.sql.catalyst.expressions.SortArray
Usage sort_array(array[, ascendingOrder]) - Sorts the input array in ascending or descending order according to the natural ordering of the array elements. Null elements will be placed at the beginning of the returned array in ascending order or at the end of the returned array in descending order.

Function: sort_by_feature

Class hivemall.ftvec.SortByFeatureUDF
Usage sort_by_feature(map in map) - Returns a sorted map

Function: soundex

Class org.apache.spark.sql.catalyst.expressions.SoundEx
Usage soundex(str) - Returns Soundex code of the string.

Function: space

Class org.apache.spark.sql.catalyst.expressions.StringSpace
Usage space(n) - Returns a string consisting of `n` spaces.

Function: spark_partition_id

Class org.apache.spark.sql.catalyst.expressions.SparkPartitionID
Usage spark_partition_id() - Returns the current partition id.

Function: split

Class org.apache.spark.sql.catalyst.expressions.StringSplit
Usage split(str, regex) - Splits `str` around occurrences that match `regex`.
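
For example (note the trailing empty string where the input ends with a delimiter):

SELECT split('oneAtwoBthreeC', '[ABC]');
["one","two","three",""]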

Function: split_words

Class hivemall.tools.text.SplitWordsUDF
Usage split_words(string query [, string regex]) - Returns an array containing the split strings

Function: splitlinestring

Class com.whereos.udf.LineSplitterUDTF
Usage

Function: sqrt

Class org.apache.spark.sql.catalyst.expressions.Sqrt
Usage sqrt(expr) - Returns the square root of `expr`.

Function: sst

Class hivemall.anomaly.SingularSpectrumTransformUDF
Usage sst(double|array x [, const string options]) - Returns change-point scores and decisions using Singular Spectrum Transformation (SST). It will return a tuple <double changepoint_score [, boolean is_changepoint]>

Function: stack

Class org.apache.spark.sql.catalyst.expressions.Stack
Usage stack(n, expr1, ..., exprk) - Separates `expr1`, ..., `exprk` into `n` rows.
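
For example (missing values are padded with NULL):

SELECT stack(2, 1, 2, 3);
1 2
3 NULL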

Function: std

Class org.apache.spark.sql.catalyst.expressions.aggregate.StddevSamp
Usage std(expr) - Returns the sample standard deviation calculated from values of a group.

Function: stddev

Class org.apache.spark.sql.catalyst.expressions.aggregate.StddevSamp
Usage stddev(expr) - Returns the sample standard deviation calculated from values of a group.

Function: stddev_pop

Class org.apache.spark.sql.catalyst.expressions.aggregate.StddevPop
Usage stddev_pop(expr) - Returns the population standard deviation calculated from values of a group.

Function: stddev_samp

Class org.apache.spark.sql.catalyst.expressions.aggregate.StddevSamp
Usage stddev_samp(expr) - Returns the sample standard deviation calculated from values of a group.

Function: str_to_map

Class org.apache.spark.sql.catalyst.expressions.StringToMap
Usage str_to_map(text[, pairDelim[, keyValueDelim]]) - Creates a map after splitting the text into key/value pairs using delimiters. Default delimiters are ',' for `pairDelim` and ':' for `keyValueDelim`.
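
For example:

SELECT str_to_map('a:1,b:2,c:3', ',', ':');
{"a":"1","b":"2","c":"3"}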

Function: string

Class org.apache.spark.sql.catalyst.expressions.Cast
Usage string(expr) - Casts the value `expr` to the target data type `string`.

Function: struct

Class org.apache.spark.sql.catalyst.expressions.NamedStruct
Usage struct(col1, col2, col3, ...) - Creates a struct with the given field values.

Function: subarray

Class hivemall.tools.array.ArraySliceUDF
Usage subarray(array values, int offset [, int length]) - Slices the given array by the given offset and length parameters.

SELECT
array_slice(array(1,2,3,4,5,6),2,4),
array_slice(
array("zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine", "ten"),
0, -- offset
2 -- length
),
array_slice(
array("zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine", "ten"),
6, -- offset
3 -- length
),
array_slice(
array("zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine", "ten"),
6, -- offset
10 -- length
),
array_slice(
array("zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine", "ten"),
6 -- offset
),
array_slice(
array("zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine", "ten"),
-3 -- offset
),
array_slice(
array("zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine", "ten"),
-3, -- offset
2 -- length
);

[3,4]
["zero","one"]
["six","seven","eight"]
["six","seven","eight","nine","ten"]
["six","seven","eight","nine","ten"]
["eight","nine","ten"]
["eight","nine"]

Function: subarray_endwith

Class hivemall.tools.array.SubarrayEndWithUDF
Usage subarray_endwith(array original, int|text key) - Returns an array that ends with the specified key

SELECT subarray_endwith(array(1,2,3,4), 3);
[1,2,3]

Function: subarray_startwith

Class hivemall.tools.array.SubarrayStartWithUDF
Usage subarray_startwith(array original, int|text key) - Returns an array that starts with the specified key

SELECT subarray_startwith(array(1,2,3,4), 2);
[2,3,4]

Function: substr

Class org.apache.spark.sql.catalyst.expressions.Substring
Usage substr(str, pos[, len]) - Returns the substring of `str` that starts at `pos` and is of length `len`, or the slice of byte array that starts at `pos` and is of length `len`.

Function: substring

Class org.apache.spark.sql.catalyst.expressions.Substring
Usage substring(str, pos[, len]) - Returns the substring of `str` that starts at `pos` and is of length `len`, or the slice of byte array that starts at `pos` and is of length `len`.

Function: substring_index

Class org.apache.spark.sql.catalyst.expressions.SubstringIndex
Usage substring_index(str, delim, count) - Returns the substring from `str` before `count` occurrences of the delimiter `delim`. If `count` is positive, everything to the left of the final delimiter (counting from the left) is returned. If `count` is negative, everything to the right of the final delimiter (counting from the right) is returned. The function substring_index performs a case-sensitive match when searching for `delim`.
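
For example:

SELECT substring_index('www.apache.org', '.', 2);
www.apache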

Function: sum

Class org.apache.spark.sql.catalyst.expressions.aggregate.Sum
Usage sum(expr) - Returns the sum calculated from values of a group.

Function: sum_array

Class brickhouse.udf.timeseries.SumArrayUDF
Usage Sums an array of doubles

Function: tan

Class org.apache.spark.sql.catalyst.expressions.Tan
Usage tan(expr) - Returns the tangent of `expr`, as if computed by `java.lang.Math.tan`.

Function: tanh

Class org.apache.spark.sql.catalyst.expressions.Tanh
Usage tanh(expr) - Returns the hyperbolic tangent of `expr`, as if computed by `java.lang.Math.tanh`.

Function: taskid

Class hivemall.tools.mapred.TaskIdUDF
Usage taskid() - Returns the value of mapred.task.partition

Function: tf

Class hivemall.ftvec.text.TermFrequencyUDAF
Usage tf(string text) - Returns a term frequency in <string, float>

Function: throw_error

Class brickhouse.udf.sanity.ThrowErrorUDF
Usage

Function: tile

Class hivemall.geospatial.TileUDF
Usage tile(double lat, double lon, int zoom)::bigint - Returns a tile number 2^2n where n is zoom level. tile(lat,lon,zoom) = xtile(lon,zoom) + ytile(lat,zoom) * 2^zoom

Refer to https://wiki.openstreetmap.org/wiki/Slippy_map_tilenames for details.
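
A minimal sketch tying the tile functions together; the coordinates are illustrative and the resulting tile numbers depend on the zoom level:

SELECT
lon2tilex(139.76, 15) as xtile,
lat2tiley(35.68, 15) as ytile,
tile(35.68, 139.76, 15) as tileid;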

Function: tilex2lon

Class hivemall.geospatial.TileX2LonUDF
Usage tilex2lon(int x, int zoom)::double - Returns longitude of the given tile x and zoom level

Function: tiley2lat

Class hivemall.geospatial.TileY2LatUDF
Usage tiley2lat(int y, int zoom)::double - Returns latitude of the given tile y and zoom level

Function: timestamp

Class org.apache.spark.sql.catalyst.expressions.Cast
Usage timestamp(expr) - Casts the value `expr` to the target data type `timestamp`.

Function: tinyint

Class org.apache.spark.sql.catalyst.expressions.Cast
Usage tinyint(expr) - Casts the value `expr` to the target data type `tinyint`.

Function: to_bits

Class hivemall.tools.bits.ToBitsUDF
Usage to_bits(int[] indexes) - Returns a bitset representation of the given indexes in long[]

SELECT to_bits(array(1,2,3,128));
[14,-9223372036854775808]

Function: to_camel_case

Class brickhouse.udf.json.ConvertToCamelCaseUDF
Usage to_camel_case(a) - Converts a string containing underscores to CamelCase

Function: to_date

Class org.apache.spark.sql.catalyst.expressions.ParseToDate
Usage to_date(date_str[, fmt]) - Parses the `date_str` expression with the `fmt` expression to a date. Returns null with invalid input. By default, it follows casting rules to a date if the `fmt` is omitted.
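
For example:

SELECT to_date('2009-07-30 04:17:52');
2009-07-30
SELECT to_date('2016-12-31', 'yyyy-MM-dd');
2016-12-31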

Function: to_dense

Class hivemall.ftvec.conv.ToDenseFeaturesUDF
Usage to_dense(array feature_vector, int dimensions) - Returns a dense feature in array

Function: to_dense_features

Class hivemall.ftvec.conv.ToDenseFeaturesUDF
Usage to_dense_features(array feature_vector, int dimensions) - Returns a dense feature in array

Function: to_json

Class org.apache.spark.sql.catalyst.expressions.StructsToJson
Usage to_json(expr[, options]) - Returns a JSON string with a given struct value

Function: to_map

Class hivemall.tools.map.UDAFToMap
Usage to_map(key, value) - Convert two aggregated columns into a key-value map

WITH input as (
select 'aaa' as key, 111 as value
UNION all
select 'bbb' as key, 222 as value
)
select to_map(key, value)
from input;

> {"bbb":222,"aaa":111}

Function: to_ordered_list

Class hivemall.tools.list.UDAFToOrderedList
Usage to_ordered_list(PRIMITIVE value [, PRIMITIVE key, const string options]) - Returns a list of values sorted by the value itself or by a specific key

WITH t as (
SELECT 5 as key, 'apple' as value
UNION ALL
SELECT 3 as key, 'banana' as value
UNION ALL
SELECT 4 as key, 'candy' as value
UNION ALL
SELECT 2 as key, 'donut' as value
UNION ALL
SELECT 3 as key, 'egg' as value
)
SELECT -- expected output
to_ordered_list(value, key, '-reverse'), -- [apple, candy, (banana, egg | egg, banana), donut] (reverse order)
to_ordered_list(value, key, '-k 2'), -- [apple, candy] (top-k)
to_ordered_list(value, key, '-k 100'), -- [apple, candy, (banana, egg | egg, banana), donut]
to_ordered_list(value, key, '-k 2 -reverse'), -- [donut, (banana | egg)] (reverse top-k = tail-k)
to_ordered_list(value, key), -- [donut, (banana, egg | egg, banana), candy, apple] (natural order)
to_ordered_list(value, key, '-k -2'), -- [donut, (banana | egg)] (tail-k)
to_ordered_list(value, key, '-k -100'), -- [donut, (banana, egg | egg, banana), candy, apple]
to_ordered_list(value, key, '-k -2 -reverse'), -- [apple, candy] (reverse tail-k = top-k)
to_ordered_list(value, '-k 2'), -- [egg, donut] (alphabetically)
to_ordered_list(key, '-k -2 -reverse'), -- [5, 4] (top-2 keys)
to_ordered_list(key), -- [2, 3, 3, 4, 5] (natural ordered keys)
to_ordered_list(value, key, '-k 2 -kv_map'), -- {4:"candy",5:"apple"}
to_ordered_list(value, key, '-k 2 -vk_map') -- {"candy":4,"apple":5}
FROM
t

Function: to_ordered_map

Class hivemall.tools.map.UDAFToOrderedMap
Usage to_ordered_map(key, value [, const int k|const boolean reverseOrder=false]) - Convert two aggregated columns into an ordered key-value map

with t as (
select 10 as key, 'apple' as value
union all
select 3 as key, 'banana' as value
union all
select 4 as key, 'candy' as value
)
select
to_ordered_map(key, value, true), -- {10:"apple",4:"candy",3:"banana"} (reverse)
to_ordered_map(key, value, 1), -- {10:"apple"} (top-1)
to_ordered_map(key, value, 2), -- {10:"apple",4:"candy"} (top-2)
to_ordered_map(key, value, 3), -- {10:"apple",4:"candy",3:"banana"} (top-3)
to_ordered_map(key, value, 100), -- {10:"apple",4:"candy",3:"banana"} (top-100)
to_ordered_map(key, value), -- {3:"banana",4:"candy",10:"apple"} (natural)
to_ordered_map(key, value, -1), -- {3:"banana"} (tail-1)
to_ordered_map(key, value, -2), -- {3:"banana",4:"candy"} (tail-2)
to_ordered_map(key, value, -3), -- {3:"banana",4:"candy",10:"apple"} (tail-3)
to_ordered_map(key, value, -100) -- {3:"banana",4:"candy",10:"apple"} (tail-100)
from t

Function: to_sparse

Class hivemall.ftvec.conv.ToSparseFeaturesUDF
Usage to_sparse(array feature_vector) - Returns a sparse feature in array

Function: to_sparse_features

Class hivemall.ftvec.conv.ToSparseFeaturesUDF
Usage to_sparse_features(array feature_vector) - Returns a sparse feature in array

Function: to_string_array

Class hivemall.tools.array.ToStringArrayUDF
Usage to_string_array(array) - Returns an array of strings

select to_string_array(array(1.0,2.0,3.0));

["1.0","2.0","3.0"]

Function: to_timestamp

Class org.apache.spark.sql.catalyst.expressions.ParseToTimestamp
Usage to_timestamp(timestamp[, fmt]) - Parses the `timestamp` expression with the `fmt` expression to a timestamp. Returns null with invalid input. By default, it follows casting rules to a timestamp if the `fmt` is omitted.

Function: to_unix_timestamp

Class org.apache.spark.sql.catalyst.expressions.ToUnixTimestamp
Usage to_unix_timestamp(expr[, pattern]) - Returns the UNIX timestamp of the given time.

Function: to_utc_timestamp

Class org.apache.spark.sql.catalyst.expressions.ToUTCTimestamp
Usage to_utc_timestamp(timestamp, timezone) - Given a timestamp like '2017-07-14 02:40:00.0', interprets it as a time in the given time zone, and renders that time as a timestamp in UTC. For example, 'GMT+1' would yield '2017-07-14 01:40:00.0'.

Function: tokenize

Class hivemall.tools.text.TokenizeUDF
Usage tokenize(string englishText [, boolean toLowerCase]) - Returns tokenized words in array

Function: train_adadelta_regr

Class hivemall.regression.AdaDeltaUDTF
Usage train_adadelta_regr(array features, float target [, constant string options]) - Returns a relation consisting of <{int|bigint|string} feature, float weight>

Function: train_adagrad_rda

Class hivemall.classifier.AdaGradRDAUDTF
Usage train_adagrad_rda(list features, int label [, const string options]) - Returns a relation consisting of

Build a prediction model by Adagrad+RDA regularization binary classifier

Function: train_adagrad_regr

Class hivemall.regression.AdaGradUDTF
Usage train_adagrad_regr(array features, float target [, constant string options]) - Returns a relation consisting of <{int|bigint|string} feature, float weight>

Function: train_arow

Class hivemall.classifier.AROWClassifierUDTF
Usage train_arow(list features, int label [, const string options]) - Returns a relation consisting of

Build a prediction model by Adaptive Regularization of Weight Vectors (AROW) binary classifier

Function: train_arow_regr

Class hivemall.regression.AROWRegressionUDTF
Usage train_arow_regr(array features, float target [, constant string options]) - a standard AROW (Adaptive Regularization of Weight Vectors) regressor that uses `y - w^Tx` for the loss function.

SELECT
feature,
argmin_kld(weight, covar) as weight
FROM (
SELECT
train_arow_regr(features,label) as (feature,weight,covar)
FROM
training_data
) t
GROUP BY feature

Function: train_arowe2_regr

Class hivemall.regression.AROWRegressionUDTF$AROWe2
Usage train_arowe2_regr(array features, float target [, constant string options]) - a refined version of the AROW (Adaptive Regularization of Weight Vectors) regressor that uses the adaptive epsilon-insensitive hinge loss `|w^t - y| - epsilon * stddev` for the loss function

SELECT
feature,
argmin_kld(weight, covar) as weight
FROM (
SELECT
train_arowe2_regr(features,label) as (feature,weight,covar)
FROM
training_data
) t
GROUP BY feature

Function: train_arowe_regr

Class hivemall.regression.AROWRegressionUDTF$AROWe
Usage train_arowe_regr(array features, float target [, constant string options]) - a refined version of the AROW (Adaptive Regularization of Weight Vectors) regressor that uses the epsilon-insensitive hinge loss `|w^t - y| - epsilon` for the loss function

SELECT
feature,
argmin_kld(weight, covar) as weight
FROM (
SELECT
train_arowe_regr(features,label) as (feature,weight,covar)
FROM
training_data
) t
GROUP BY feature

Function: train_arowh

Class hivemall.classifier.AROWClassifierUDTF$AROWh
Usage train_arowh(list features, int label [, const string options]) - Returns a relation consisting of

Build a prediction model by AROW binary classifier using hinge loss

Function: train_classifier

Class hivemall.classifier.GeneralClassifierUDTF
Usage train_classifier(list features, int label [, const string options]) - Returns a relation consisting of

Build a prediction model by a generic classifier

Function: train_cw

Class hivemall.classifier.ConfidenceWeightedUDTF
Usage train_cw(list features, int label [, const string options]) - Returns a relation consisting of

Build a prediction model by Confidence-Weighted (CW) binary classifier

Function: train_kpa

Class hivemall.classifier.KernelExpansionPassiveAggressiveUDTF
Usage train_kpa(array features, int label [, const string options]) - Returns a relation

Function: train_lda

Class hivemall.topicmodel.LDAUDTF
Usage train_lda(array words[, const string options]) - Returns a relation consisting of

Function: train_logistic_regr

Class hivemall.regression.LogressUDTF
Usage train_logistic_regr(array features, float target [, constant string options]) - Returns a relation consisting of <{int|bigint|string} feature, float weight>

Function: train_logregr

Class hivemall.regression.LogressUDTF
Usage train_logregr(array features, float target [, constant string options]) - Returns a relation consisting of <{int|bigint|string} feature, float weight>

Function: train_multiclass_arow

Class hivemall.classifier.multiclass.MulticlassAROWClassifierUDTF
Usage train_multiclass_arow(list features, {int|string} label [, const string options]) - Returns a relation consisting of <{int|string} label, {string|int|bigint} feature, float weight, float covar>

Build a prediction model by Adaptive Regularization of Weight Vectors (AROW) multiclass classifier

Function: train_multiclass_arowh

Class hivemall.classifier.multiclass.MulticlassAROWClassifierUDTF$AROWh
Usage train_multiclass_arowh(list features, int|string label [, const string options]) - Returns a relation consisting of

Build a prediction model by Adaptive Regularization of Weight Vectors (AROW) multiclass classifier using hinge loss

Function: train_multiclass_cw

Class hivemall.classifier.multiclass.MulticlassConfidenceWeightedUDTF
Usage train_multiclass_cw(list features, {int|string} label [, const string options]) - Returns a relation consisting of <{int|string} label, {string|int|bigint} feature, float weight, float covar>

Build a prediction model by Confidence-Weighted (CW) multiclass classifier

Function: train_multiclass_pa

Class hivemall.classifier.multiclass.MulticlassPassiveAggressiveUDTF
Usage train_multiclass_pa(list features, {int|string} label [, const string options]) - Returns a relation consisting of <{int|string} label, {string|int|bigint} feature, float weight>

Build a prediction model by Passive-Aggressive (PA) multiclass classifier

Function: train_multiclass_pa1

Class hivemall.classifier.multiclass.MulticlassPassiveAggressiveUDTF$PA1
Usage train_multiclass_pa1(list features, {int|string} label [, const string options]) - Returns a relation consisting of <{int|string} label, {string|int|bigint} feature, float weight>

Build a prediction model by Passive-Aggressive 1 (PA-1) multiclass classifier

Function: train_multiclass_pa2

Class hivemall.classifier.multiclass.MulticlassPassiveAggressiveUDTF$PA2
Usage train_multiclass_pa2(list features, {int|string} label [, const string options]) - Returns a relation consisting of <{int|string} label, {string|int|bigint} feature, float weight>

Build a prediction model by Passive-Aggressive 2 (PA-2) multiclass classifier

Function: train_multiclass_perceptron

Class hivemall.classifier.multiclass.MulticlassPerceptronUDTF
Usage train_multiclass_perceptron(list features, {int|string} label [, const string options]) - Returns a relation consisting of <{int|string} label, {string|int|bigint} feature, float weight>

Build a prediction model by Perceptron multiclass classifier

Function: train_multiclass_scw

Class hivemall.classifier.multiclass.MulticlassSoftConfidenceWeightedUDTF$SCW1
Usage train_multiclass_scw(list features, {int|string} label [, const string options]) - Returns a relation consisting of <{int|string} label, {string|int|bigint} feature, float weight, float covar>

Build a prediction model by Soft Confidence-Weighted (SCW-1) multiclass classifier

Function: train_multiclass_scw2

Class hivemall.classifier.multiclass.MulticlassSoftConfidenceWeightedUDTF$SCW2
Usage train_multiclass_scw2(list features, {int|string} label [, const string options]) - Returns a relation consisting of <{int|string} label, {string|int|bigint} feature, float weight, float covar>

Build a prediction model by Soft Confidence-Weighted 2 (SCW-2) multiclass classifier

Function: train_pa

Class hivemall.classifier.PassiveAggressiveUDTF
Usage train_pa(list features, int label [, const string options]) - Returns a relation consisting of

Build a prediction model by Passive-Aggressive (PA) binary classifier

Function: train_pa1

Class hivemall.classifier.PassiveAggressiveUDTF$PA1
Usage train_pa1(list features, int label [, const string options]) - Returns a relation consisting of

Build a prediction model by Passive-Aggressive 1 (PA-1) binary classifier

Function: train_pa1_regr

Class hivemall.regression.PassiveAggressiveRegressionUDTF
Usage train_pa1_regr(array features, float target [, constant string options]) - PA-1 regressor that returns a relation consisting of `(int|bigint|string) feature, float weight`.

SELECT
feature,
avg(weight) as weight
FROM
(SELECT
train_pa1_regr(features,label) as (feature,weight)
FROM
training_data
) t
GROUP BY feature

Function: train_pa1a_regr

Class hivemall.regression.PassiveAggressiveRegressionUDTF$PA1a
Usage train_pa1a_regr(array features, float target [, constant string options]) - Returns a relation consisting of `(int|bigint|string) feature, float weight`.

Function: train_pa2

Class hivemall.classifier.PassiveAggressiveUDTF$PA2
Usage train_pa2(list features, int label [, const string options]) - Returns a relation consisting of

Build a prediction model by Passive-Aggressive 2 (PA-2) binary classifier

Function: train_pa2_regr

Class hivemall.regression.PassiveAggressiveRegressionUDTF$PA2
Usage train_pa2_regr(array features, float target [, constant string options]) - Returns a relation consisting of `(int|bigint|string) feature, float weight`.

Function: train_pa2a_regr

Class hivemall.regression.PassiveAggressiveRegressionUDTF$PA2a
Usage train_pa2a_regr(array features, float target [, constant string options]) - Returns a relation consisting of `(int|bigint|string) feature, float weight`.

Function: train_perceptron

Class hivemall.classifier.PerceptronUDTF
Usage train_perceptron(list features, int label [, const string options]) - Returns a relation consisting of

Build a prediction model by Perceptron binary classifier

Function: train_plsa

Class hivemall.topicmodel.PLSAUDTF
Usage train_plsa(array words[, const string options]) - Returns a relation consisting of

Function: train_randomforest_classifier

Class hivemall.smile.classification.RandomForestClassifierUDTF
Usage train_randomforest_classifier(array features, int label [, const string options, const array classWeights]) - Returns a relation consisting of var_importance, int oob_errors, int oob_tests>

Function: train_randomforest_regr

Class hivemall.smile.regression.RandomForestRegressionUDTF
Usage train_randomforest_regr(array features, double target [, string options]) - Returns a relation consisting of <string model_id, double model_weight, string model, array<double> var_importance, double oob_errors, int oob_tests>

Function: train_randomforest_regressor

Class hivemall.smile.regression.RandomForestRegressionUDTF
Usage train_randomforest_regressor(array features, double target [, string options]) - Returns a relation consisting of <string model_id, double model_weight, string model, array<double> var_importance, double oob_errors, int oob_tests>

Function: train_regressor

Class hivemall.regression.GeneralRegressorUDTF
Usage train_regressor(list features, double label [, const string options]) - Returns a relation consisting of <{int|bigint|string} feature, float weight>

Build a prediction model by a generic regressor
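
A minimal sketch with default options, assuming a training_data table with (features, label) columns; the loss function and optimizer would be selected through the options string:

SELECT
  feature,
  avg(weight) as weight
FROM (
  SELECT
    train_regressor(features, label) as (feature, weight)
  FROM
    training_data
) t
GROUP BY feature;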

Function: train_scw

Class hivemall.classifier.SoftConfideceWeightedUDTF$SCW1
Usage train_scw(list features, int label [, const string options]) - Returns a relation consisting of <{string|int|bigint} feature, float weight, float covar>

Build a prediction model by Soft Confidence-Weighted (SCW-1) binary classifier

Function: train_scw2

Class hivemall.classifier.SoftConfideceWeightedUDTF$SCW2
Usage train_scw2(list features, int label [, const string options]) - Returns a relation consisting of <{string|int|bigint} feature, float weight, float covar>

Build a prediction model by Soft Confidence-Weighted 2 (SCW-2) binary classifier

Function: train_slim

Class hivemall.recommend.SlimUDTF
Usage train_slim(int i, map<int, double> r_i, map<int, map<int, double>> topKRatesOfI, int j, map<int, double> r_j [, constant string options]) - Returns row index, column index and non-zero weight value of the prediction model

Function: transform

Class org.apache.spark.sql.catalyst.expressions.ArrayTransform
Usage transform(expr, func) - Transforms elements in an array using the function.
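
For example, incrementing every element of an array:

SELECT transform(array(1, 2, 3), x -> x + 1);
[2,3,4]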

Function: translate

Class org.apache.spark.sql.catalyst.expressions.StringTranslate
Usage translate(input, from, to) - Translates the `input` string by replacing the characters present in the `from` string with the corresponding characters in the `to` string.
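
For example, each character of `from` is replaced by the character at the same position in `to`:

SELECT translate('AaBbCc', 'abc', '123');
A1B2C3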

Function: transpose_and_dot

Class hivemall.tools.matrix.TransposeAndDotUDAF
Usage transpose_and_dot(array<number> X, array<number> Y) - Returns dot(X.T, Y) as array<array<double>>, shape = (X.#cols, Y.#cols)

WITH input as (
  select array(1.0, 2.0, 3.0, 4.0) as x, array(1, 2) as y
  UNION ALL
  select array(2.0, 3.0, 4.0, 5.0) as x, array(1, 2) as y
)
select
  transpose_and_dot(x, y) as xy,
  transpose_and_dot(y, x) as yx
from
  input;

[["3.0","6.0"],["5.0","10.0"],["7.0","14.0"],["9.0","18.0"]] [["3.0","5.0","7.0","9.0"],["6.0","10.0","14.0","18.0"]]

Function: tree_export

Class hivemall.smile.tools.TreeExportUDF
Usage tree_export(string model, const string options, optional array<string> featureNames=null, optional array<string> classNames=null) - Exports a decision tree model as javascript/dot

Function: tree_predict

Class hivemall.smile.tools.TreePredictUDF
Usage tree_predict(string modelId, string model, array features [, const string options | const boolean classification=false]) - Returns a prediction result of a random forest in <int value, array<double> a posteriori> for classification and <double> for regression
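
A minimal scoring sketch, assuming a model table produced by train_randomforest_classifier and a test table t with a features column; the '-classification' option requests class prediction:

SELECT
  tree_predict(m.model_id, m.model, t.features, '-classification')
FROM
  model m CROSS JOIN test t;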

Function: tree_predict_v1

Class hivemall.smile.tools.TreePredictUDFv1
Usage tree_predict_v1(string modelId, int modelType, string script, array features [, const boolean classification]) - Returns a prediction result of a random forest

Function: trim

Class org.apache.spark.sql.catalyst.expressions.StringTrim
Usage trim(str) - Removes the leading and trailing space characters from `str`.
trim(BOTH trimStr FROM str) - Removes the leading and trailing `trimStr` characters from `str`.
trim(LEADING trimStr FROM str) - Removes the leading `trimStr` characters from `str`.
trim(TRAILING trimStr FROM str) - Removes the trailing `trimStr` characters from `str`.
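
For example:

SELECT trim('    SparkSQL   ');
SparkSQL
SELECT trim(BOTH 'SL' FROM 'SSparkSQLS');
parkSQ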

Function: trunc

Class org.apache.spark.sql.catalyst.expressions.TruncDate
Usage trunc(date, fmt) - Returns `date` with the time portion of the day truncated to the unit specified by the format model `fmt`. `fmt` should be one of ["year", "yyyy", "yy", "mon", "month", "mm"]
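
For example:

SELECT trunc('2009-02-12', 'MM');
2009-02-01
SELECT trunc('2015-10-27', 'YEAR');
2015-01-01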

Function: truncate_array

Class brickhouse.udf.collect.TruncateArrayUDF
Usage

Function: try_cast

Class hivemall.tools.TryCastUDF
Usage try_cast(ANY src, const string typeName) - Explicitly cast a value as a type. Returns null if cast fails.

SELECT try_cast(array(1.0,2.0,3.0), 'array<int>');
SELECT try_cast(map('A',10,'B',20,'C',30), 'map<string,double>');

Function: ucase

Class org.apache.spark.sql.catalyst.expressions.Upper
Usage ucase(str) - Returns `str` with all characters changed to uppercase.

Function: udfarrayconcat

Class com.whereos.udf.UDFArrayConcat
Usage udfarrayconcat(values) - Concatenates the array arguments

Function: unbase64

Class org.apache.spark.sql.catalyst.expressions.UnBase64
Usage unbase64(str) - Converts the argument from a base 64 string `str` to a binary.

Function: unbase91

Class hivemall.tools.text.Unbase91UDF
Usage unbase91(string) - Converts a BASE91 string to a binary

SELECT inflate(unbase91(base91(deflate('aaaaaaaaaaaaaaaabbbbccc'))));
aaaaaaaaaaaaaaaabbbbccc

Function: unbits

Class hivemall.tools.bits.UnBitsUDF
Usage unbits(long[] bitset) - Returns a long array of the given bitset representation

SELECT unbits(to_bits(array(1,4,2,3)));
[1,2,3,4]

Function: unhex

Class org.apache.spark.sql.catalyst.expressions.Unhex
Usage unhex(expr) - Converts hexadecimal `expr` to binary.

Function: union_hyperloglog

Class brickhouse.udf.hll.UnionHyperLogLogUDAF
Usage union_hyperloglog(x) - Merges multiple hyperloglogs together.

Function: union_map

Class brickhouse.udf.collect.UnionUDAF
Usage union_map(x) - Returns a map containing the union of all maps in the aggregation group

Function: union_max

Class brickhouse.udf.collect.UnionMaxUDAF
Usage union_max(x, n) - Returns a map of the union of maps, retaining the max N elements in the aggregation group

Function: union_sketch

Class brickhouse.udf.sketch.UnionSketchSetUDAF
Usage union_sketch(x) - Constructs a sketch set to estimate reach for large values by collecting multiple sketches

Function: union_vector_sum

Class brickhouse.udf.timeseries.VectorUnionSumUDAF
Usage union_vector_sum(x) - Aggregates vectors by adding them together

Function: unix_timestamp

Class org.apache.spark.sql.catalyst.expressions.UnixTimestamp
Usage unix_timestamp([expr[, pattern]]) - Returns the UNIX timestamp of current or specified time.
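
For example (the result depends on the session time zone; the value below assumes UTC):

SELECT unix_timestamp('2016-04-08', 'yyyy-MM-dd');
1460073600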

Function: upper

Class org.apache.spark.sql.catalyst.expressions.Upper
Usage upper(str) - Returns `str` with all characters changed to uppercase.

Function: uuid

Class org.apache.spark.sql.catalyst.expressions.Uuid
Usage uuid() - Returns a universally unique identifier (UUID) string. The value is returned as a canonical UUID 36-character string.

Function: var_pop

Class org.apache.spark.sql.catalyst.expressions.aggregate.VariancePop
Usage var_pop(expr) - Returns the population variance calculated from values of a group.

Function: var_samp

Class org.apache.spark.sql.catalyst.expressions.aggregate.VarianceSamp
Usage var_samp(expr) - Returns the sample variance calculated from values of a group.

Function: variance

Class org.apache.spark.sql.catalyst.expressions.aggregate.VarianceSamp
Usage variance(expr) - Returns the sample variance calculated from values of a group.

Function: vector_add

Class brickhouse.udf.timeseries.VectorAddUDF
Usage Adds two vectors together

Function: vector_cross_product

Class brickhouse.udf.timeseries.VectorCrossProductUDF
Usage Multiplies a vector by another vector

Function: vector_dot

Class hivemall.tools.vector.VectorDotUDF
Usage vector_dot(array x, array y) - Performs vector dot product.

SELECT vector_dot(array(1.0,2.0,3.0),array(2.0,3.0,4.0));
20

SELECT vector_dot(array(1.0,2.0,3.0),2);
[2.0,4.0,6.0]

Function: vector_dot_product

Class brickhouse.udf.timeseries.VectorDotProductUDF
Usage Returns the dot product of two vectors

Function: vector_magnitude

Class brickhouse.udf.timeseries.VectorMagnitudeUDF
Usage Returns the magnitude of a vector

Function: vector_scalar_mult

Class brickhouse.udf.timeseries.VectorMultUDF
Usage Multiplies a vector by a scalar

Function: vectorize_features

Class hivemall.ftvec.trans.VectorizeFeaturesUDF
Usage vectorize_features(array<string> featureNames, feature1, feature2, .. [, const string options]) - Returns a feature vector array<string>
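
A minimal sketch, pairing each feature name with the value at the same position:

SELECT vectorize_features(array('a','b'), 0.1, 0.2);
["a:0.1","b:0.2"]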

Function: voted_avg

Class hivemall.ensemble.bagging.VotedAvgUDAF
Usage voted_avg(double value) - Returns an averaged value by bagging for classification

Function: weekday

Class org.apache.spark.sql.catalyst.expressions.WeekDay
Usage weekday(date) - Returns the day of the week for date/timestamp (0 = Monday, 1 = Tuesday, ..., 6 = Sunday).

Function: weekofyear

Class org.apache.spark.sql.catalyst.expressions.WeekOfYear
Usage weekofyear(date) - Returns the week of the year of the given date. A week is considered to start on a Monday and week 1 is the first week with >3 days.

Function: weight_voted_avg

Class hivemall.ensemble.bagging.WeightVotedAvgUDAF
Usage weight_voted_avg(expr) - Returns an averaged value by considering sum of positive/negative weights

Function: when

Class org.apache.spark.sql.catalyst.expressions.CaseWhen
Usage CASE WHEN expr1 THEN expr2 [WHEN expr3 THEN expr4]* [ELSE expr5] END - When `expr1` = true, returns `expr2`; else when `expr3` = true, returns `expr4`; else returns `expr5`.
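
For example:

SELECT CASE WHEN 1 > 0 THEN 'positive' ELSE 'non-positive' END;
positive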

Function: window

Class org.apache.spark.sql.catalyst.expressions.TimeWindow
Usage

Function: word_ngrams

Class hivemall.tools.text.WordNgramsUDF
Usage word_ngrams(array<string> words, int minSize, int maxSize) - Returns list of n-grams for given words, where `minSize <= n <= maxSize`

SELECT word_ngrams(tokenize('Machine learning is fun!', true), 1, 2);
["machine","machine learning","learning","learning is","is","is fun","fun"]

Function: write_to_graphite

Class brickhouse.udf.sanity.WriteToGraphiteUDF
Usage Writes a metric or a collection of metrics to graphite.

write_to_graphite(String hostname, int port, Map<String, Double> nameToValue, Long timestampInSeconds)
write_to_graphite(String hostname, int port, Map<String, Double> nameToValue)
write_to_graphite(String hostname, int port, String metricName, Double metricValue, Long timestampInSeconds)
write_to_graphite(String hostname, int port, String metricName, Double metricValue)

Function: write_to_tsdb

Class brickhouse.udf.sanity.WriteToTSDBUDF
Usage Writes metrics to TSDB. Metric names should look like proc.loadavg.1min or http.hits, while the tags string is a space-separated collection of tags. On failure returns 'WRITE_FAILED', otherwise 'WRITE_OK'.

write_to_tsdb(String hostname, int port, Map<String, Double> nameToValue, String tags, Long timestampInSeconds)
write_to_tsdb(String hostname, int port, Map<String, Double> nameToValue, String tags)
write_to_tsdb(String hostname, int port, Map<String, Double> nameToValue)
write_to_tsdb(String hostname, int port, String metricName, Double metricValue, String tags, Long timestampInSeconds)
write_to_tsdb(String hostname, int port, String metricName, Double metricValue, String tags)
write_to_tsdb(String hostname, int port, String metricName, Double metricValue)

Function: x_rank

Class hivemall.tools.RankSequenceUDF
Usage x_rank(KEY) - Generates a pseudo sequence number starting from 1 for each key

Function: xpath

Class org.apache.spark.sql.catalyst.expressions.xml.XPathList
Usage xpath(xml, xpath) - Returns a string array of values within the nodes of xml that match the XPath expression.
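
For example:

SELECT xpath('<a><b>b1</b><b>b2</b></a>', 'a/b/text()');
["b1","b2"]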

Function: xpath_boolean

Class org.apache.spark.sql.catalyst.expressions.xml.XPathBoolean
Usage xpath_boolean(xml, xpath) - Returns true if the XPath expression evaluates to true, or if a matching node is found.

Function: xpath_double

Class org.apache.spark.sql.catalyst.expressions.xml.XPathDouble
Usage xpath_double(xml, xpath) - Returns a double value, the value zero if no match is found, or NaN if a match is found but the value is non-numeric.

Function: xpath_float

Class org.apache.spark.sql.catalyst.expressions.xml.XPathFloat
Usage xpath_float(xml, xpath) - Returns a float value, the value zero if no match is found, or NaN if a match is found but the value is non-numeric.

Function: xpath_int

Class org.apache.spark.sql.catalyst.expressions.xml.XPathInt
Usage xpath_int(xml, xpath) - Returns an integer value, or the value zero if no match is found, or a match is found but the value is non-numeric.

Function: xpath_long

Class org.apache.spark.sql.catalyst.expressions.xml.XPathLong
Usage xpath_long(xml, xpath) - Returns a long integer value, or the value zero if no match is found, or a match is found but the value is non-numeric.

Function: xpath_number

Class org.apache.spark.sql.catalyst.expressions.xml.XPathDouble
Usage xpath_number(xml, xpath) - Returns a double value, the value zero if no match is found, or NaN if a match is found but the value is non-numeric.

Function: xpath_short

Class org.apache.spark.sql.catalyst.expressions.xml.XPathShort
Usage xpath_short(xml, xpath) - Returns a short integer value, or the value zero if no match is found, or a match is found but the value is non-numeric.

Function: xpath_string

Class org.apache.spark.sql.catalyst.expressions.xml.XPathString
Usage xpath_string(xml, xpath) - Returns the text contents of the first xml node that matches the XPath expression.

Function: year

Class org.apache.spark.sql.catalyst.expressions.Year
Usage year(date) - Returns the year component of the date/timestamp.

Function: zip_with

Class org.apache.spark.sql.catalyst.expressions.ZipWith
Usage zip_with(left, right, func) - Merges the two given arrays, element-wise, into a single array using function. If one array is shorter, nulls are appended at the end to match the length of the longer array, before applying function.
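
For example, adding two arrays element-wise:

SELECT zip_with(array(1, 2), array(3, 4), (x, y) -> x + y);
[4,6]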

Function: zscore

Class hivemall.ftvec.scaling.ZScoreUDF
Usage zscore(value, mean, stddev) - Returns a standard score (zscore)
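
The score is computed as (value - mean) / stddev; for example:

SELECT zscore(5.0, 3.0, 2.0);
1.0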

Function: |

Class org.apache.spark.sql.catalyst.expressions.BitwiseOr
Usage expr1 | expr2 - Returns the result of bitwise OR of `expr1` and `expr2`.

Function: ~

Class org.apache.spark.sql.catalyst.expressions.BitwiseNot
Usage ~ expr - Returns the result of bitwise NOT of `expr`.