DECIMAL Data Type (Impala 1.4 or higher only)
A numeric data type with fixed scale and precision, used in CREATE TABLE
and ALTER
TABLE
statements. Suitable for financial and other arithmetic calculations where the imprecise
representation and rounding behavior of FLOAT
and DOUBLE
make those types
impractical.
Syntax:
In the column definition of a CREATE TABLE
statement:
column_name DECIMAL[(precision[,scale])]
DECIMAL
with no precision or scale values is equivalent to DECIMAL(9,0)
.
Precision and Scale:
precision represents the total number of digits that can be represented by the column, regardless of the location of the decimal point. This value must be between 1 and 38. For example, representing integer values up to 9999, and floatingpoint values up to 99.99, both require a precision of 4. You can also represent corresponding negative values, without any change in the precision. For example, the range 9999 to 9999 still only requires a precision of 4.
scale represents the number of fractional digits. This value must be less than or equal to precision. A scale of 0 produces integral values, with no fractional part. If precision and scale are equal, all the digits come after the decimal point, making all the values between 0 and 0.999... or 0 and 0.999...
When precision and scale are omitted, a DECIMAL
value
is treated as DECIMAL(9,0)
, that is, an integer value ranging from
999,999,999
to 999,999,999
. This is the largest DECIMAL
value that can still be represented in 4 bytes. If precision is specified but scale is omitted, Impala uses a
value of zero for the scale.
Both precision and scale must be specified as integer literals, not any other kind of constant expressions.
To check the precision or scale for arbitrary values, you can call the
precision()
and
scale()
builtin functions. For example, you might use these values to figure out how
many characters are required for various fields in a report, or to understand the rounding characteristics of
a formula as applied to a particular DECIMAL
column.
Range:
The maximum precision value is 38. Thus, the largest integral value is represented by
DECIMAL(38,0)
(999... with 9 repeated 38 times). The most precise fractional value (between
0 and 1, or 0 and 1) is represented by DECIMAL(38,38)
, with 38 digits to the right of the
decimal point. The value closest to 0 would be .0000...1 (37 zeros and the final 1). The value closest to 1
would be .999... (9 repeated 38 times).
For a given precision and scale, the range of DECIMAL
values is the same in the positive and
negative directions. For example, DECIMAL(4,2)
can represent from 99.99 to 99.99. This is
different from other integral numeric types where the positive and negative bounds differ slightly.
When you use DECIMAL
values in arithmetic expressions, the precision and scale of the result
value are determined as follows:

For addition and subtraction, the precision and scale are based on the maximum possible result, that is, if all the digits of the input values were 9s and the absolute values were added together.

For multiplication, the precision is the sum of the precisions of the input values. The scale is the sum of the scales of the input values.

For division, Impala sets the precision and scale to values large enough to represent the whole and fractional parts of the result.

For
UNION
, the scale is the larger of the scales of the input values, and the precision is increased if necessary to accommodate any additional fractional digits. If the same input value has the largest precision and the largest scale, the result value has the same precision and scale. If one value has a larger precision but smaller scale, the scale of the result value is increased. For example,DECIMAL(20,2) UNION DECIMAL(8,6)
produces a result of typeDECIMAL(24,6)
. The extra 4 fractional digits of scale (62) are accommodated by extending the precision by the same amount (20+4). 
To doublecheck, you can always call the
PRECISION()
andSCALE()
functions on the results of an arithmetic expression to see the relevant values, or use aCREATE TABLE AS SELECT
statement to define a column based on the return type of the expression.
Compatibility:

Using the
DECIMAL
type is only supported under Impala 1.4 and higher. 
Use the
DECIMAL
data type in Impala for applications where you used theNUMBER
data type in Oracle. The ImpalaDECIMAL
type does not support the Oracle idioms of*
for scale or negative values for precision.
Conversions and casting:
Casting an integer or floatingpoint value N
to
TIMESTAMP
produces a value that is N
seconds past the start of the epoch
date (January 1, 1970). By default, the result value represents a date and time in the UTC time zone.
If the setting use_local_tz_for_unix_timestamp_conversions=true
is in effect,
the resulting TIMESTAMP
represents a date and time in the local time zone.
Impala automatically converts between DECIMAL
and other numeric types where possible. A
DECIMAL
with zero scale is converted to or from the smallest appropriate integral type. A
DECIMAL
with a fractional part is automatically converted to or from the smallest
appropriate floatingpoint type. If the destination type does not have sufficient precision or scale to hold
all possible values of the source type, Impala raises an error and does not convert the value.
For example, these statements show how expressions of DECIMAL
and other types are reconciled
to the same type in the context of UNION
queries and INSERT
statements:
[localhost:21000] > select cast(1 as int) as x union select cast(1.5 as decimal(9,4)) as x;
++
 x 
++
 1.5000 
 1.0000 
++
[localhost:21000] > create table int_vs_decimal as select cast(1 as int) as x union select cast(1.5 as decimal(9,4)) as x;
++
 summary 
++
 Inserted 2 row(s) 
++
[localhost:21000] > desc int_vs_decimal;
++++
 name  type  comment 
++++
 x  decimal(14,4)  
++++
To avoid potential conversion errors, you can use CAST()
to convert DECIMAL
values to FLOAT
, TINYINT
, SMALLINT
, INT
,
BIGINT
, STRING
, TIMESTAMP
, or BOOLEAN
.
You can use exponential notation in DECIMAL
literals or when casting from
STRING
, for example 1.0e6
to represent one million.
If you cast a value with more fractional digits than the scale of the destination type, any extra fractional
digits are truncated (not rounded). Casting a value to a target type with not enough precision produces a
result of NULL
and displays a runtime warning.
[localhost:21000] > select cast(1.239 as decimal(3,2));
++
 cast(1.239 as decimal(3,2)) 
++
 1.23 
++
[localhost:21000] > select cast(1234 as decimal(3));
++
 cast(1234 as decimal(3,0)) 
++
 NULL 
++
WARNINGS: Expression overflowed, returning NULL
When you specify integer literals, for example in INSERT ... VALUES
statements or arithmetic
expressions, those numbers are interpreted as the smallest applicable integer type. You must use
CAST()
calls for some combinations of integer literals and DECIMAL
precision. For example, INT
has a maximum value that is 10 digits long,
TINYINT
has a maximum value that is 3 digits long, and so on. If you specify a value such as
123456 to go into a DECIMAL
column, Impala checks if the column has enough precision to
represent the largest value of that integer type, and raises an error if not. Therefore, use an expression
like CAST(123456 TO DECIMAL(9,0))
for DECIMAL
columns with precision 9 or
less, CAST(50 TO DECIMAL(2,0))
for DECIMAL
columns with precision 2 or
less, and so on. For DECIMAL
columns with precision 10 or greater, Impala automatically
interprets the value as the correct DECIMAL
type; however, because
DECIMAL(10)
requires 8 bytes of storage while DECIMAL(9)
requires only 4
bytes, only use precision of 10 or higher when actually needed.
[localhost:21000] > create table decimals_9_0 (x decimal);
[localhost:21000] > insert into decimals_9_0 values (1), (2), (4), (8), (16), (1024), (32768), (65536), (1000000);
ERROR: AnalysisException: Possible loss of precision for target table 'decimal_testing.decimals_9_0'.
Expression '1' (type: INT) would need to be cast to DECIMAL(9,0) for column 'x'
[localhost:21000] > insert into decimals_9_0 values (cast(1 as decimal)), (cast(2 as decimal)), (cast(4 as decimal)), (cast(8 as decimal)), (cast(16 as decimal)), (cast(1024 as decimal)), (cast(32768 as decimal)), (cast(65536 as decimal)), (cast(1000000 as decimal));
[localhost:21000] > create table decimals_10_0 (x decimal(10,0));
[localhost:21000] > insert into decimals_10_0 values (1), (2), (4), (8), (16), (1024), (32768), (65536), (1000000);
Be aware that in memory and for binary file formats such as Parquet or Avro, DECIMAL(10)
or
higher consumes 8 bytes while DECIMAL(9)
(the default for DECIMAL
) or lower
consumes 4 bytes. Therefore, to conserve space in large tables, use the smallestprecision
DECIMAL
type that is appropriate and CAST()
literal values where necessary,
rather than declaring DECIMAL
columns with high precision for convenience.
To represent a very large or precise DECIMAL
value as a literal, for example one that
contains more digits than can be represented by a BIGINT
literal, use a quoted string or a
floatingpoint value for the number, and CAST()
to the desired DECIMAL
type:
insert into decimals_38_5 values (1), (2), (4), (8), (16), (1024), (32768), (65536), (1000000),
(cast("999999999999999999999999999999" as decimal(38,5))),
(cast(999999999999999999999999999999. as decimal(38,5)));

The result of the
SUM()
aggregate function onDECIMAL
values is promoted to a precision of 38, with the same precision as the underlying column. Thus, the result can represent the largest possible value at that particular precision. 
STRING
columns, literals, or expressions can be converted toDECIMAL
as long as the overall number of digits and digits to the right of the decimal point fit within the specified precision and scale for the declaredDECIMAL
type. By default, aDECIMAL
value with no specified scale or precision can hold a maximum of 9 digits of an integer value. If there are more digits in the string value than are allowed by theDECIMAL
scale and precision, the result isNULL
.The following examples demonstrate how
STRING
values with integer and fractional parts are represented when converted toDECIMAL
. If the scale is 0, the number is treated as an integer value with a maximum of precision digits. If the precision is greater than 0, the scale must be increased to account for the digits both to the left and right of the decimal point. As the precision increases, output values are printed with additional trailing zeros after the decimal point if needed. Any trailing zeros after the decimal point in theSTRING
value must fit within the number of digits specified by the precision.[localhost:21000] > select cast('100' as decimal);  Small integer value fits within 9 digits of scale. ++  cast('100' as decimal(9,0))  ++  100  ++ [localhost:21000] > select cast('100' as decimal(3,0));  Small integer value fits within 3 digits of scale. ++  cast('100' as decimal(3,0))  ++  100  ++ [localhost:21000] > select cast('100' as decimal(2,0));  2 digits of scale is not enough! ++  cast('100' as decimal(2,0))  ++  NULL  ++ [localhost:21000] > select cast('100' as decimal(3,1));  (3,1) = 2 digits left of the decimal point, 1 to the right. Not enough. ++  cast('100' as decimal(3,1))  ++  NULL  ++ [localhost:21000] > select cast('100' as decimal(4,1));  4 digits total, 1 to the right of the decimal point. ++  cast('100' as decimal(4,1))  ++  100.0  ++ [localhost:21000] > select cast('98.6' as decimal(3,1));  (3,1) can hold a 3 digit number with 1 fractional digit. ++  cast('98.6' as decimal(3,1))  ++  98.6  ++ [localhost:21000] > select cast('98.6' as decimal(15,1));  Larger scale allows bigger numbers but still only 1 fractional digit. ++  cast('98.6' as decimal(15,1))  ++  98.6  ++ [localhost:21000] > select cast('98.6' as decimal(15,5));  Larger precision allows more fractional digits, outputs trailing zeros. ++  cast('98.6' as decimal(15,5))  ++  98.60000  ++ [localhost:21000] > select cast('98.60000' as decimal(15,1));  Trailing zeros in the string must fit within 'scale' digits (1 in this case). ++  cast('98.60000' as decimal(15,1))  ++  NULL  ++

Most builtin arithmetic functions such as
SIN()
andCOS()
continue to accept onlyDOUBLE
values because they are so commonly used in scientific context for calculations of IEEE 954compliant values. The builtin functions that accept and returnDECIMAL
are:
ABS()

CEIL()

COALESCE()

FLOOR()

FNV_HASH()

GREATEST()

IF()

ISNULL()

LEAST()

NEGATIVE()

NULLIF()

POSITIVE()

PRECISION()

ROUND()

SCALE()

TRUNCATE()

ZEROIFNULL()


BIGINT
,INT
,SMALLINT
, andTINYINT
values can all be cast toDECIMAL
. The number of digits to the left of the decimal point in theDECIMAL
type must be sufficient to hold the largest value of the corresponding integer type. Note that integer literals are treated as the smallest appropriate integer type, meaning there is sometimes a range of values that require one more digit ofDECIMAL
scale than you might expect. For integer values, the precision of theDECIMAL
type can be zero; if the precision is greater than zero, remember to increase the scale value by an equivalent amount to hold the required number of digits to the left of the decimal point.The following examples show how different integer types are converted to
DECIMAL
.[localhost:21000] > select cast(1 as decimal(1,0)); ++  cast(1 as decimal(1,0))  ++  1  ++ [localhost:21000] > select cast(9 as decimal(1,0)); ++  cast(9 as decimal(1,0))  ++  9  ++ [localhost:21000] > select cast(10 as decimal(1,0)); ++  cast(10 as decimal(1,0))  ++  10  ++ [localhost:21000] > select cast(10 as decimal(1,1)); ++  cast(10 as decimal(1,1))  ++  10.0  ++ [localhost:21000] > select cast(100 as decimal(1,1)); ++  cast(100 as decimal(1,1))  ++  100.0  ++ [localhost:21000] > select cast(1000 as decimal(1,1)); ++  cast(1000 as decimal(1,1))  ++  1000.0  ++

When a
DECIMAL
value is converted to any of the integer types, any fractional part is truncated (that is, rounded towards zero):[localhost:21000] > create table num_dec_days (x decimal(4,1)); [localhost:21000] > insert into num_dec_days values (1), (2), (cast(4.5 as decimal(4,1))); [localhost:21000] > insert into num_dec_days values (cast(0.1 as decimal(4,1))), (cast(.9 as decimal(4,1))), (cast(9.1 as decimal(4,1))), (cast(9.9 as decimal(4,1))); [localhost:21000] > select cast(x as int) from num_dec_days; ++  cast(x as int)  ++  1   2   4   0   0   9   9  ++

You cannot directly cast
TIMESTAMP
orBOOLEAN
values to or fromDECIMAL
values. You can turn aDECIMAL
value into a timerelated representation using a twostep process, by converting it to an integer value and then using that result in a call to a date and time function such asfrom_unixtime()
.[localhost:21000] > select from_unixtime(cast(cast(1000.0 as decimal) as bigint)); ++  from_unixtime(cast(cast(1000.0 as decimal(9,0)) as bigint))  ++  19700101 00:16:40  ++ [localhost:21000] > select now() + interval cast(x as int) days from num_dec_days;  x is a DECIMAL column. [localhost:21000] > create table num_dec_days (x decimal(4,1)); [localhost:21000] > insert into num_dec_days values (1), (2), (cast(4.5 as decimal(4,1))); [localhost:21000] > select now() + interval cast(x as int) days from num_dec_days;  The 4.5 value is truncated to 4 and becomes '4 days'. ++  now() + interval cast(x as int) days  ++  20140513 23:11:55.163284000   20140514 23:11:55.163284000   20140516 23:11:55.163284000  ++

Because values in
INSERT
statements are checked rigorously for type compatibility, be prepared to useCAST()
function calls around literals, column references, or other expressions that you are inserting into aDECIMAL
column.
NULL considerations: Casting any nonnumeric value to this type produces a NULL
value.
DECIMAL differences from integer and floatingpoint types:
With the DECIMAL
type, you are concerned with the number of overall digits of a number
rather than powers of 2 (as in TINYINT
, SMALLINT
, and so on). Therefore,
the limits with integral values of DECIMAL
types fall around 99, 999, 9999, and so on rather
than 32767, 65535, 2
^{32}
1, and so on. For fractional values, you do not need to account for imprecise representation of the
fractional part according to the IEEE954 standard (as in FLOAT
and
DOUBLE
). Therefore, when you insert a fractional value into a DECIMAL
column, you can compare, sum, query, GROUP BY
, and so on that column and get back the
original values rather than some "close but not identical" value.
FLOAT
and DOUBLE
can cause problems or unexpected behavior due to inability
to precisely represent certain fractional values, for example dollar and cents values for currency. You might
find output values slightly different than you inserted, equality tests that do not match precisely, or
unexpected values for GROUP BY
columns. DECIMAL
can help reduce unexpected
behavior and rounding errors, at the expense of some performance overhead for assignments and comparisons.

When you use an integer literal such as
1
or999
in a SQL statement, depending on the context, Impala will treat it as either the smallest appropriateDECIMAL
type, or the smallest integer type (TINYINT
,SMALLINT
,INT
, orBIGINT
). To minimize memory usage, Impala prefers to treat the literal as the smallest appropriate integer type. 
When you use a floatingpoint literal such as
1.1
or999.44
in a SQL statement, depending on the context, Impala will treat it as either the smallest appropriateDECIMAL
type, or the smallest floatingpoint type (FLOAT
orDOUBLE
). To avoid loss of accuracy, Impala prefers to treat the literal as aDECIMAL
.
Storage considerations:

Only the precision determines the storage size for
DECIMAL
values; the scale setting has no effect on the storage size. 
Text, RCFile, and SequenceFile tables all use ASCIIbased formats. In these textbased file formats,
leading zeros are not stored, but trailing zeros are stored. In these tables, each
DECIMAL
value takes up as many bytes as there are digits in the value, plus an extra byte if the decimal point is present and an extra byte for negative values. Once the values are loaded into memory, they are represented in 4, 8, or 16 bytes as described in the following list items. The ondisk representation varies depending on the file format of the table. 
Parquet and Avro tables use binary formats, In these tables, Impala stores each value in as few bytes as
possible
depending on the precision specified for the
DECIMAL
column.
In memory,
DECIMAL
values with precision of 9 or less are stored in 4 bytes. 
In memory,
DECIMAL
values with precision of 10 through 18 are stored in 8 bytes. 
In memory,
DECIMAL
values with precision greater than 18 are stored in 16 bytes.

In memory,
File format considerations:

The
DECIMAL
data type can be stored in any of the file formats supported by Impala, as described in How Impala Works with Hadoop File Formats. Impala only writes to tables that use the Parquet and text formats, so those formats are the focus for file format compatibility. 
Impala can query Avro, RCFile, or SequenceFile tables containing
DECIMAL
columns, created by other Hadoop components. 
You can use
DECIMAL
columns in Impala tables that are mapped to HBase tables. Impala can query and insert into such tables. 
Text, RCFile, and SequenceFile tables all use ASCIIbased formats. In these tables, each
DECIMAL
value takes up as many bytes as there are digits in the value, plus an extra byte if the decimal point is present. The binary format of Parquet or Avro files offers more compact storage forDECIMAL
columns. 
Parquet and Avro tables use binary formats, In these tables, Impala stores each value in 4, 8, or 16 bytes
depending on the precision specified for the
DECIMAL
column.
UDF considerations: When writing a C++ UDF, use the DecimalVal
data type defined in
/usr/include/impala_udf/udf.h.
Partitioning:
You can use a DECIMAL
column as a partition key. Doing so provides a better match between
the partition key values and the HDFS directory names than using a DOUBLE
or
FLOAT
partitioning column:
Schema evolution considerations:

For textbased formats (text, RCFile, and SequenceFile tables), you can issue an
ALTER TABLE ... REPLACE COLUMNS
statement to change the precision and scale of an existingDECIMAL
column. As long as the values in the column fit within the new precision and scale, they are returned correctly by a query. Any values that do not fit within the new precision and scale are returned asNULL
, and Impala reports the conversion error. Leading zeros do not count against the precision value, but trailing zeros after the decimal point do.[localhost:21000] > create table text_decimals (x string); [localhost:21000] > insert into text_decimals values ("1"), ("2"), ("99.99"), ("1.234"), ("000001"), ("1.000000000"); [localhost:21000] > select * from text_decimals; ++  x  ++  1   2   99.99   1.234   000001   1.000000000  ++ [localhost:21000] > alter table text_decimals replace columns (x decimal(4,2)); [localhost:21000] > select * from text_decimals; ++  x  ++  1.00   2.00   99.99   NULL   1.00   NULL  ++ ERRORS: Backend 0:Error converting column: 0 TO DECIMAL(4, 2) (Data is: 1.234) file: hdfs://127.0.0.1:8020/user/hive/warehouse/decimal_testing.db/text_decimals/634d4bd3aa0 e8420b4b13bab7f1be787_56794587_data.0 record: 1.234 Error converting column: 0 TO DECIMAL(4, 2) (Data is: 1.000000000) file: hdfs://127.0.0.1:8020/user/hive/warehouse/decimal_testing.db/text_decimals/cd40dc68e20 c565acc4bd86c724c96ba_311873428_data.0 record: 1.000000000

For binary formats (Parquet and Avro tables), although an
ALTER TABLE ... REPLACE COLUMNS
statement that changes the precision or scale of aDECIMAL
column succeeds, any subsequent attempt to query the changed column results in a fatal error. (The other columns can still be queried successfully.) This is because the metadata about the columns is stored in the data files themselves, andALTER TABLE
does not actually make any updates to the data files. If the metadata in the data files disagrees with the metadata in the metastore database, Impala cancels the query.
Examples:
CREATE TABLE t1 (x DECIMAL, y DECIMAL(5,2), z DECIMAL(25,0));
INSERT INTO t1 VALUES (5, 99.44, 123456), (300, 6.7, 999999999);
SELECT x+y, ROUND(y,1), z/98.6 FROM t1;
SELECT CAST(1000.5 AS DECIMAL);
HBase considerations: This data type is fully compatible with HBase tables.
Parquet considerations: This type is fully compatible with Parquet tables.
Text table considerations: Values of this type are potentially larger in text tables than in tables using Parquet or other binary formats.
Column statistics considerations: Because this type has a fixed size, the maximum and average size
fields are always filled in for column statistics, even before you run the COMPUTE STATS
statement.
Kudu considerations:
Currently, the data types CHAR
, VARCHAR
,
ARRAY
, MAP
, and STRUCT
cannot be used with Kudu tables.
Related information:
Numeric Literals, TINYINT Data Type,
SMALLINT Data Type, INT Data Type,
BIGINT Data Type, DECIMAL Data Type (Impala 1.4 or higher only),
Impala Mathematical Functions (especially PRECISION()
and
SCALE()
)