Mathematical functions, or arithmetic functions, perform numeric calculations that are typically more complex than basic addition, subtraction, multiplication, and division. For example, these functions include trigonometric, logarithmic, and base conversion operations.
pow()
function rather than an
exponentiation operator such as **
.
Related information:
The mathematical functions operate mainly on these data types: INT Data Type, BIGINT Data Type, SMALLINT Data Type, TINYINT Data Type, DOUBLE Data Type, FLOAT Data Type, and DECIMAL Data Type (Impala 3.0 or higher only). For the operators that perform the standard operations such as addition, subtraction, multiplication, and division, see Arithmetic Operators.
Functions that perform bitwise operations are explained in Impala Bit Functions.
Function reference:
Impala supports the following mathematical functions:
Return type: Same as the input value
Usage notes: Use this function to ensure all return values are positive. This
is different than the POSITIVE()
function, which returns its
argument unchanged (even if the argument was negative).
Return type: DOUBLE
Return type: DOUBLE
Return type: DOUBLE
Return type: DOUBLE
Return type: STRING
Return type: Same as the input value
from_base
to to_base
. The first argument can be
specified as a number or a string. For example, CONV(100, 2, 10)
and
CONV('100', 2, 10)
both return '4'
.
Return type: STRING
Usage notes:
If to_base
is negative, the first argument is treated as signed,
and otherwise, it is treated as unsigned. For example:
conv(-17, 10, -2)
returns '-10001'
,
-17
in base 2.
conv(-17, 10, 10)
returns
'18446744073709551599'
. -17
is interpreted as an
unsigned, 2^64-17, and then the value is returned in base 10.
The function returns NULL
when the following illegal arguments are
specified:
NULL
.
from_base
or to_base
is below
-36
or above 36
.
from_base
or to_base
is -1
,
0
, or 1
.
from_base
is
a negative number.
If the first argument represents a negative number and from_base
is
a negative number, the function returns 0
.
If the first argument represents a number larger than the maximum
bigint
, the function returns:
to_base
if
to_base
is negative.
to_base
if to_base
is positive.
If the first argument does not represent a valid number in
from_base
, e.g. 3 in base 2 or '1a23' in base 10, the digits in the
first argument are evaluated from left-to-right and used if a valid digit in
from_base
. The invalid digit and the digits to the right are
ignored.
conv(445, 5, 10)
is converted to conv(44, 5,
10)
and returns '24'
.
conv('1a23', 10, 16)
is converted to conv('1', 10 ,
16)
and returns '1'
.
Return type: DOUBLE
Return type: DOUBLE
Return type: DOUBLE
Added in: Impala 2.3.0
Return type: DOUBLE
Return type: DOUBLE
Return type: DOUBLE
Added in: Impala 2.3.0
Usage notes: You can use either the factorial()
function or
the !
operator. The factorial of 0 is 1. Likewise, the
factorial()
function returns 1 for any negative value. The maximum
positive value for the input argument is 20; a value of 21 or greater overflows the
range for a BIGINT
and causes an error.
Return type: BIGINT
Added in: Impala 2.3.0
select factorial(5);
+--------------+
| factorial(5) |
+--------------+
| 120 |
+--------------+
select 5!;
+-----+
| 5! |
+-----+
| 120 |
+-----+
select factorial(0);
+--------------+
| factorial(0) |
+--------------+
| 1 |
+--------------+
select factorial(-100);
+-----------------+
| factorial(-100) |
+-----------------+
| 1 |
+-----------------+
Return type: Same as the input type
Return type: FLOAT
or DOUBLE
, depending on
type of arguments
Added in: Impala 1.1.1
Usage notes:
Because this function operates on DOUBLE
or FLOAT
values, it is subject to potential rounding errors for values that cannot be
represented precisely. Prefer to use whole numbers, or values that you know can be
represented precisely by the DOUBLE
or FLOAT
types.
Examples:
The following examples show equivalent operations with the fmod()
function and the %
arithmetic operator, for values not subject to
any rounding error.
select fmod(10,3);
+-------------+
| fmod(10, 3) |
+-------------+
| 1 |
+-------------+
select fmod(5.5,2);
+--------------+
| fmod(5.5, 2) |
+--------------+
| 1.5 |
+--------------+
select 10 % 3;
+--------+
| 10 % 3 |
+--------+
| 1 |
+--------+
select 5.5 % 2;
+---------+
| 5.5 % 2 |
+---------+
| 1.5 |
+---------+
The following examples show operations with the fmod()
function for
values that cannot be represented precisely by the DOUBLE
or
FLOAT
types, and thus are subject to rounding error.
fmod(9.9,3.0)
returns a value slightly different than the expected
0.9 because of rounding. fmod(9.9,3.3)
returns a value quite
different from the expected value of 0 because of rounding error during intermediate
calculations.
select fmod(9.9,3.0);
+--------------------+
| fmod(9.9, 3.0) |
+--------------------+
| 0.8999996185302734 |
+--------------------+
select fmod(9.9,3.3);
+-------------------+
| fmod(9.9, 3.3) |
+-------------------+
| 3.299999713897705 |
+-------------------+
Return type: BIGINT
Usage notes:
You might use the return value in an application where you perform load balancing, bucketing, or some other technique to divide processing or storage.
Because the result can be any 64-bit value, to restrict the value to a particular
range, you can use an expression that includes the ABS()
function
and the %
(modulo) operator. For example, to produce a hash value
in the range 0-9, you could use the expression ABS(FNV_HASH(x)) %
10
.
This function implements the same algorithm that Impala uses internally for hashing, on systems where the CRC32 instructions are not available.
This function implements the Fowler–Noll–Vo hash function, in particular the FNV-1a variation. This is not a perfect hash function: some combinations of values could produce the same result value. It is not suitable for cryptographic use.
Similar input values of different types could produce different hash values, for
example the same numeric value represented as SMALLINT
or
BIGINT
, FLOAT
or DOUBLE
, or
DECIMAL(5,2)
or DECIMAL(20,5)
.
Examples:
[localhost:21000] > create table h (x int, s string);
[localhost:21000] > insert into h values (0, 'hello'), (1,'world'), (1234567890,'antidisestablishmentarianism');
[localhost:21000] > select x, fnv_hash(x) from h;
+------------+----------------------+
| x | fnv_hash(x) |
+------------+----------------------+
| 0 | -2611523532599129963 |
| 1 | 4307505193096137732 |
| 1234567890 | 3614724209955230832 |
+------------+----------------------+
[localhost:21000] > select s, fnv_hash(s) from h;
+------------------------------+---------------------+
| s | fnv_hash(s) |
+------------------------------+---------------------+
| hello | 6414202926103426347 |
| world | 6535280128821139475 |
| antidisestablishmentarianism | -209330013948433970 |
+------------------------------+---------------------+
[localhost:21000] > select s, abs(fnv_hash(s)) % 10 from h;
+------------------------------+-------------------------+
| s | abs(fnv_hash(s)) % 10.0 |
+------------------------------+-------------------------+
| hello | 8 |
| world | 6 |
| antidisestablishmentarianism | 4 |
+------------------------------+-------------------------+
For short argument values, the high-order bits of the result have relatively low entropy:
[localhost:21000] > create table b (x boolean);
[localhost:21000] > insert into b values (true), (true), (false), (false);
[localhost:21000] > select x, fnv_hash(x) from b;
+-------+---------------------+
| x | fnv_hash(x) |
+-------+---------------------+
| true | 2062020650953872396 |
| true | 2062020650953872396 |
| false | 2062021750465500607 |
| false | 2062021750465500607 |
+-------+---------------------+
Added in: Impala 1.2.2
Return type: same as the initial argument value, except that integer values are
promoted to BIGINT
and floating-point values are promoted to
DOUBLE
; use CAST()
when inserting into a smaller
numeric column
Return type: STRING
Return type: BOOLEAN
Usage notes:
Infinity and NaN can be specified in text data files as inf
and
nan
respectively, and Impala interprets them as these special values.
They can also be produced by certain arithmetic expressions; for example,
1/0
returns Infinity
and pow(-1, 0.5)
returns NaN
. Or you can cast the literal values, such as
CAST('nan' AS DOUBLE)
or CAST('inf' AS DOUBLE)
.
Return type: BOOLEAN
Usage notes:
Infinity and NaN can be specified in text data files as inf
and
nan
respectively, and Impala interprets them as these special values.
They can also be produced by certain arithmetic expressions; for example,
1/0
returns Infinity
and pow(-1, 0.5)
returns NaN
. Or you can cast the literal values, such as
CAST('nan' AS DOUBLE)
or CAST('inf' AS DOUBLE)
.
Return type: same as the initial argument value, except that integer values are
promoted to BIGINT
and floating-point values are promoted to
DOUBLE
; use CAST()
when inserting into a smaller
numeric column
Return type: DOUBLE
Return type: DOUBLE
Return type: DOUBLE
Return type: DOUBLE
Return type: The same as the integral type being checked.
Usage notes: Use the corresponding min_
and
max_
functions to check if all values in a column are within the
allowed range, before copying data or altering column definitions. If not, switch to
the next higher integral type or to a DECIMAL
with sufficient
precision.
Return type: The same as the integral type being checked.
Usage notes: Use the corresponding min_
and
max_
functions to check if all values in a column are within the
allowed range, before copying data or altering column definitions. If not, switch to
the next higher integral type or to a DECIMAL
with sufficient
precision.
%
arithmetic operator. Works with any size integer type, any size floating-point type,
and DECIMAL
with any precision and scale.
Return type: Same as the input value
Added in: Impala 2.2.0
Usage notes:
Because this function works with DECIMAL
values, prefer it over
fmod()
when working with fractional values. It is not subject to
the rounding errors that make fmod()
problematic with
floating-point numbers.
Query plans shows the MOD()
function as the %
operator.
Examples:
The following examples show how the mod()
function works for whole
numbers and fractional values, and how the %
operator works the
same way. In the case of mod(9.9,3)
, the type conversion for the
second argument results in the first argument being interpreted as
DOUBLE
, so to produce an accurate DECIMAL
result
requires casting the second argument or writing it as a DECIMAL
literal, 3.0.
select mod(10,3);
+-------------+
| mod(10, 3) |
+-------------+
| 1 |
+-------------+
select mod(5.5,2);
+--------------+
| mod(5.5, 2) |
+--------------+
| 1.5 |
+--------------+
select 10 % 3;
+--------+
| 10 % 3 |
+--------+
| 1 |
+--------+
select 5.5 % 2;
+---------+
| 5.5 % 2 |
+---------+
| 1.5 |
+---------+
select mod(9.9,3.3);
+---------------+
| mod(9.9, 3.3) |
+---------------+
| 0.0 |
+---------------+
select mod(9.9,3);
+--------------------+
| mod(9.9, 3) |
+--------------------+
| 0.8999996185302734 |
+--------------------+
select mod(9.9, cast(3 as decimal(2,1)));
+-----------------------------------+
| mod(9.9, cast(3 as decimal(2,1))) |
+-----------------------------------+
| 0.9 |
+-----------------------------------+
select mod(9.9,3.0);
+---------------+
| mod(9.9, 3.0) |
+---------------+
| 0.9 |
+---------------+
Return type: BIGINT
Usage notes:
You might use the return value in an application where you perform load balancing, bucketing, or some other technique to divide processing or storage. This function provides a good performance for all kinds of keys such as number, ascii string and UTF-8. It can be recommended as general-purpose hashing function.
Regarding comparison of murmur_hash with fnv_hash, murmur_hash is based on Murmur2 hash algorithm and fnv_hash function is based on FNV-1a hash algorithm. Murmur2 and FNV-1a can show very good randomness and performance compared with well known other hash algorithms, but Murmur2 slightly show better randomness and performance than FNV-1a. See [1][2][3] for details.
Similar input values of different types could produce different hash values, for
example the same numeric value represented as SMALLINT
or
BIGINT
, FLOAT
or DOUBLE
, or
DECIMAL(5,2)
or DECIMAL(20,5)
.
Examples:
[localhost:21000] > create table h (x int, s string);
[localhost:21000] > insert into h values (0, 'hello'), (1,'world'), (1234567890,'antidisestablishmentarianism');
[localhost:21000] > select x, murmur_hash(x) from h;
+------------+----------------------+
| x | murmur_hash(x) |
+------------+----------------------+
| 0 | 6960269033020761575 |
| 1 | -780611581681153783 |
| 1234567890 | -5754914572385924334 |
+------------+----------------------+
[localhost:21000] > select s, murmur_hash(s) from h;
+------------------------------+----------------------+
| s | murmur_hash(s) |
+------------------------------+----------------------+
| hello | 2191231550387646743 |
| world | 5568329560871645431 |
| antidisestablishmentarianism | -2261804666958489663 |
+------------------------------+----------------------+
For short argument values, the high-order bits of the result have relatively higher entropy than fnv_hash:
[localhost:21000] > create table b (x boolean);
[localhost:21000] > insert into b values (true), (true), (false), (false);
[localhost:21000] > select x, murmur_hash(x) from b;
+-------+----------------------+
| x | murmur_hash(x) |
+-------+---------------------++
| true | -5720937396023583481 |
| true | -5720937396023583481 |
| false | 6351753276682545529 |
| false | 6351753276682545529 |
+-------+--------------------+-+
Added in: Impala 2.12.0
Return type: Same as the input value
Usage notes: Use -ABS(a)
instead if you need to ensure all
return values are negative.
Return type: DOUBLE
Return type: INT
or DOUBLE
, depending on
type of arguments
Examples:
The following examples show how the FMOD()
function sometimes
returns a negative value depending on the sign of its arguments, and the
PMOD()
function returns the same value as FMOD()
,
but sometimes with the sign flipped.
select fmod(-5,2);
+-------------+
| fmod(-5, 2) |
+-------------+
| -1 |
+-------------+
select pmod(-5,2);
+-------------+
| pmod(-5, 2) |
+-------------+
| 1 |
+-------------+
select fmod(-5,-2);
+--------------+
| fmod(-5, -2) |
+--------------+
| -1 |
+--------------+
select pmod(-5,-2);
+--------------+
| pmod(-5, -2) |
+--------------+
| -1 |
+--------------+
select fmod(5,-2);
+-------------+
| fmod(5, -2) |
+-------------+
| 1 |
+-------------+
select pmod(5,-2);
+-------------+
| pmod(5, -2) |
+-------------+
| -1 |
+-------------+
Return type: Same as the input value
Usage notes: Use ABS()
instead if you need to ensure all
return values are positive.
Return type: DOUBLE
DECIMAL
value.
Usage notes:
Typically used in combination with the SCALE()
function, to
determine the appropriate
DECIMAL(precision,scale)
type
to declare in a CREATE TABLE
statement or CAST()
function.
Return type: INT
Examples:
TINYINT
value, which ranges from -128
to 127, therefore 3 decimal digits are needed to represent the entire range, and because
it is an integer value there are no fractional digits. 1.333 is interpreted as a
DECIMAL
value, with 4 digits total and 3 digits after the decimal
point.
[localhost:21000] > select precision(5), scale(5);
+--------------+----------+
| precision(5) | scale(5) |
+--------------+----------+
| 3 | 0 |
+--------------+----------+
[localhost:21000] > select precision(1.333), scale(1.333);
+------------------+--------------+
| precision(1.333) | scale(1.333) |
+------------------+--------------+
| 4 | 3 |
+------------------+--------------+
[localhost:21000] > with t1 as
( select cast(12.34 as decimal(20,2)) x union select cast(1 as decimal(8,6)) x )
select precision(x), scale(x) from t1 limit 1;
+--------------+----------+
| precision(x) | scale(x) |
+--------------+----------+
| 24 | 6 |
+--------------+----------+
DOUBLE
as
happens with the /
SQL operator. Also includes
an overload that accepts DOUBLE
arguments, discards the fractional
part of each argument value before dividing, and again returns
BIGINT
. With integer arguments, this function works the same as the
DIV
operator.
Return type: BIGINT
Return type: DOUBLE
RAND()
is called with a seed argument, it produces a consistent random sequence based on the
seed value.
Return type: DOUBLE
Usage notes: Currently, the random sequence is reset after each query, and
multiple calls to RAND()
within the same query return the same
value each time. For different number sequences that are different for each query,
pass a unique seed value to each call to RAND()
. For example,
SELECT RAND(UNIX_TIMESTAMP()) FROM ...
Examples:
The following examples show how rand()
can produce sequences of
varying predictability, so that you can reproduce query results involving random
values or generate unique sequences of random values for each query. When
rand()
is called with no argument, it generates the same sequence
of values each time, regardless of the ordering of the result set. When
rand()
is called with a constant integer, it generates a different
sequence of values, but still always the same sequence for the same seed value. If
you pass in a seed value that changes, such as the return value of the expression
unix_timestamp(now())
, each query will use a different sequence of
random values, potentially more useful in probability calculations although more
difficult to reproduce at a later time. Therefore, the final two examples with an
unpredictable seed value also include the seed in the result set, to make it
possible to reproduce the same random sequence later.
select x, rand() from three_rows;
+---+-----------------------+
| x | rand() |
+---+-----------------------+
| 1 | 0.0004714746030380365 |
| 2 | 0.5895895192351144 |
| 3 | 0.4431900859080209 |
+---+-----------------------+
select x, rand() from three_rows order by x desc;
+---+-----------------------+
| x | rand() |
+---+-----------------------+
| 3 | 0.0004714746030380365 |
| 2 | 0.5895895192351144 |
| 1 | 0.4431900859080209 |
+---+-----------------------+
select x, rand(1234) from three_rows order by x;
+---+----------------------+
| x | rand(1234) |
+---+----------------------+
| 1 | 0.7377511392057646 |
| 2 | 0.009428468537250751 |
| 3 | 0.208117277924026 |
+---+----------------------+
select x, rand(1234) from three_rows order by x desc;
+---+----------------------+
| x | rand(1234) |
+---+----------------------+
| 3 | 0.7377511392057646 |
| 2 | 0.009428468537250751 |
| 1 | 0.208117277924026 |
+---+----------------------+
select x, unix_timestamp(now()), rand(unix_timestamp(now()))
from three_rows order by x;
+---+-----------------------+-----------------------------+
| x | unix_timestamp(now()) | rand(unix_timestamp(now())) |
+---+-----------------------+-----------------------------+
| 1 | 1440777752 | 0.002051228658320023 |
| 2 | 1440777752 | 0.5098743483004506 |
| 3 | 1440777752 | 0.9517714925817081 |
+---+-----------------------+-----------------------------+
select x, unix_timestamp(now()), rand(unix_timestamp(now()))
from three_rows order by x desc;
+---+-----------------------+-----------------------------+
| x | unix_timestamp(now()) | rand(unix_timestamp(now())) |
+---+-----------------------+-----------------------------+
| 3 | 1440777761 | 0.9985985015512437 |
| 2 | 1440777761 | 0.3251255333074953 |
| 1 | 1440777761 | 0.02422675025846192 |
+---+-----------------------+-----------------------------+
Return type: Same as the input type
DECIMAL
value.
Usage notes:
Typically used in combination with the PRECISION()
function, to
determine the appropriate
DECIMAL(precision,scale)
type
to declare in a CREATE TABLE
statement or CAST()
function.
Return type: INT
Examples:
TINYINT
value, which ranges from -128
to 127, therefore 3 decimal digits are needed to represent the entire range, and because
it is an integer value there are no fractional digits. 1.333 is interpreted as a
DECIMAL
value, with 4 digits total and 3 digits after the decimal
point.
[localhost:21000] > select precision(5), scale(5);
+--------------+----------+
| precision(5) | scale(5) |
+--------------+----------+
| 3 | 0 |
+--------------+----------+
[localhost:21000] > select precision(1.333), scale(1.333);
+------------------+--------------+
| precision(1.333) | scale(1.333) |
+------------------+--------------+
| 4 | 3 |
+------------------+--------------+
[localhost:21000] > with t1 as
( select cast(12.34 as decimal(20,2)) x union select cast(1 as decimal(8,6)) x )
select precision(x), scale(x) from t1 limit 1;
+--------------+----------+
| precision(x) | scale(x) |
+--------------+----------+
| 24 | 6 |
+--------------+----------+
Return type: INT
Return type: DOUBLE
Return type: DOUBLE
Return type: DOUBLE
Return type: DOUBLE
Return type: DOUBLE
Arguments: With a single floating-point argument, removes all fractional
digits, leaving an integer value. The optional second argument specifies the number
of fractional digits to include in the return value, and only applies when the
argument type is DECIMAL
. A second argument of 0 truncates to a
whole integer value. A second argument of negative N sets N digits to 0 on the left
side of the decimal
Scale argument: The scale argument applies only when truncating
DECIMAL
values. It is an integer specifying how many significant
digits to leave to the right of the decimal point. A scale argument of 0 truncates
to a whole integer value. A scale argument of negative N sets N digits to 0 on the
left side of the decimal point.
TRUNCATE()
, DTRUNC()
,
and TRUNC()
are
aliases for the same function.
Return type: Same as the input type
Added in: The TRUNC()
alias was added in
Impala 2.10.
Usage notes:
You can also pass a DOUBLE
argument, or DECIMAL
argument with optional scale, to the DTRUNC()
or
TRUNCATE
functions. Using the TRUNC()
function for
numeric values is common with other industry-standard database systems, so you might
find such TRUNC()
calls in code that you are porting to Impala.
The TRUNC()
function also has a signature that applies to
TIMESTAMP
values. See
Impala Date and Time Functions for details.
Examples:
The following examples demonstrate the TRUNCATE()
and
DTRUNC()
signatures for this function:
select truncate(3.45);
+----------------+
| truncate(3.45) |
+----------------+
| 3 |
+----------------+
select truncate(-3.45);
+-----------------+
| truncate(-3.45) |
+-----------------+
| -3 |
+-----------------+
select truncate(3.456,1);
+--------------------+
| truncate(3.456, 1) |
+--------------------+
| 3.4 |
+--------------------+
select dtrunc(3.456,1);
+------------------+
| dtrunc(3.456, 1) |
+------------------+
| 3.4 |
+------------------+
select truncate(3.456,2);
+--------------------+
| truncate(3.456, 2) |
+--------------------+
| 3.45 |
+--------------------+
select truncate(3.456,7);
+--------------------+
| truncate(3.456, 7) |
+--------------------+
| 3.4560000 |
+--------------------+
The following examples demonstrate using trunc()
with
DECIMAL
or DOUBLE
values, and with an optional
scale argument for DECIMAL
values. (The behavior is the same for
the truncate()
and dtrunc()
aliases also.)
create table t1 (d decimal(20,7));
-- By default, no digits to the right of the decimal point.
insert into t1 values (1.1), (2.22), (3.333), (4.4444), (5.55555);
select trunc(d) from t1 order by d;
+----------+
| trunc(d) |
+----------+
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
+----------+
-- 1 digit to the right of the decimal point.
select trunc(d,1) from t1 order by d;
+-------------+
| trunc(d, 1) |
+-------------+
| 1.1 |
| 2.2 |
| 3.3 |
| 4.4 |
| 5.5 |
+-------------+
-- 2 digits to the right of the decimal point,
-- including trailing zeroes if needed.
select trunc(d,2) from t1 order by d;
+-------------+
| trunc(d, 2) |
+-------------+
| 1.10 |
| 2.22 |
| 3.33 |
| 4.44 |
| 5.55 |
+-------------+
insert into t1 values (9999.9999), (8888.8888);
-- Negative scale truncates digits to the left
-- of the decimal point.
select trunc(d,-2) from t1 where d > 100 order by d;
+--------------+
| trunc(d, -2) |
+--------------+
| 8800 |
| 9900 |
+--------------+
-- The scale of the result is adjusted to match the
-- scale argument.
select trunc(d,2),
precision(trunc(d,2)) as p,
scale(trunc(d,2)) as s
from t1 order by d;
+-------------+----+---+
| trunc(d, 2) | p | s |
+-------------+----+---+
| 1.10 | 15 | 2 |
| 2.22 | 15 | 2 |
| 3.33 | 15 | 2 |
| 4.44 | 15 | 2 |
| 5.55 | 15 | 2 |
| 8888.88 | 15 | 2 |
| 9999.99 | 15 | 2 |
+-------------+----+---+
create table dbl (d double);
insert into dbl values
(1.1), (2.22), (3.333), (4.4444), (5.55555),
(8888.8888), (9999.9999);
-- With double values, there is no optional scale argument.
select trunc(d) from dbl order by d;
+----------+
| trunc(d) |
+----------+
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
| 8888 |
| 9999 |
+----------+
Return type: STRING
expr
value
would fall in the histogram where its range between min_value
and
max_value
is divided into num_buckets
buckets of
identical sizes.