Impala's SQL syntax follows the SQL-92 standard, and includes many industry extensions in areas such as built-in functions. See Porting SQL from Other Database Systems to Impala for a general discussion of adapting SQL code from a variety of database systems to Impala.
Because Impala and Hive share the same metastore database and their tables are often used interchangeably, the following section covers differences between Impala and Hive in detail.
The current release of Impala does not support the following SQL features that you might be familiar with from HiveQL:
TRANSFORM
, custom file formats, or custom SerDes.
DATE
data type.
covar_pop
, covar_samp
,
corr
, percentile
, percentile_approx
,
histogram_numeric
, collect_set
; Impala supports the set of aggregate
functions listed in Impala Aggregate Functions and analytic
functions listed in Impala Analytic Functions.
STRUCT
, ARRAY
, or MAP
), using join notation
rather than the EXPLODE()
keyword.
See Complex Types (Impala 2.3 or higher only) for details about Impala support for complex types.
Impala supports high-performance UDFs written in C++, as well as reusing some Java-based Hive UDFs.
Impala supports scalar UDFs and user-defined aggregate functions (UDAFs). Impala does not currently support user-defined table generating functions (UDTFs).
Only Impala-supported column types are supported in Java-based UDFs.
The Hive current_user()
function cannot be called from a Java UDF
through Impala.
Impala does not currently support these HiveQL statements:
ANALYZE TABLE
(the Impala equivalent is COMPUTE STATS
)
DESCRIBE COLUMN
DESCRIBE DATABASE
EXPORT TABLE
IMPORT TABLE
SHOW TABLE EXTENDED
SHOW TBLPROPERTIES
SHOW INDEXES
SHOW COLUMNS
INSERT OVERWRITE DIRECTORY
; use INSERT OVERWRITE table_name
or CREATE TABLE AS SELECT
to materialize query results into the HDFS directory associated
with an Impala table.
Impala respects the serialization.null.format
table
property only for TEXT tables and ignores the property for Parquet and
other formats. Hive respects the serialization.null.format
property for Parquet and other formats and converts matching values
to NULL during the scan. See Using Text Data Files with Impala Tables for
using the table property in Impala.
This section covers instances where Impala and Hive have similar functionality, sometimes including the same syntax, but there are differences in the runtime semantics of those features.
Security:
Impala utilizes the Apache Ranger authorization framework, which provides fine-grained role-based access control to protect data against unauthorized access or tampering.
The Hive component now includes Ranger-enabled GRANT
,
REVOKE
, and CREATE/DROP ROLE
statements. Earlier Hive releases had a
privilege system with GRANT
and REVOKE
statements that were primarily
intended to prevent accidental deletion of data, rather than a security mechanism to protect against
malicious users.
Impala can make use of privileges set up through Hive GRANT
and REVOKE
statements.
Impala has its own GRANT
and REVOKE
statements in Impala 2.0 and higher.
See Impala Authorization for the details of authorization in Impala.
SQL statements and clauses:
The semantics of Impala SQL statements varies from HiveQL in some cases where they use similar SQL statement and clause names:
[SHUFFLE]
and
[NOSHUFFLE]
rather than MapJoin
or StreamJoin
. See
Joins in Impala SELECT Statements for the Impala details.
SORT BY
, DISTRIBUTE
BY
, or CLUSTER BY
.
FROM
clause.
Data types:
CAST()
for these
conversions. SMALLINT
to a BIGINT
or
FLOAT
, but to convert from
DOUBLE
to FLOAT
or
INT
to TINYINT
requires a call
to CAST()
in the query. TIMESTAMP
data type and the
from_unixtime()
format string; see TIMESTAMP Data Type for details. See the topics under Data Types for full details on
implicit and explicit casting for each data type, and Impala Type Conversion Functions for
details about the CAST()
function.
TIMESTAMP
data or when converting between
the local time zone and UTC.
TIMESTAMP
type can represent dates ranging from 1400-01-01 to 9999-12-31.
This is different from the Hive date range, which is 0000-01-01 to 9999-12-31.
Impala does not return column overflows as NULL
, so that customers can
distinguish between NULL
data and overflow conditions similar to how
they do so with traditional database systems. Impala returns the largest or smallest
value in the range for the type. For example, valid values for a
tinyint
range from -128 to 127. In Impala, a tinyint
with a value of -200 returns -128 rather than NULL
. A
tinyint
with a value of 200 returns 127.
Miscellaneous features: