SQL Differences Between Impala and Hive
Impala's SQL syntax follows the SQL-92 standard, and includes many industry extensions in areas such as built-in functions. See Porting SQL from Other Database Systems to Impala for a general discussion of adapting SQL code from a variety of database systems to Impala.
Because Impala and Hive share the same metastore database and their tables are often used interchangeably, the following section covers differences between Impala and Hive in detail.
HiveQL Features not Available in Impala
The current release of Impala does not support the following SQL features that you might be familiar with from HiveQL:
-
Extensibility mechanisms such as
TRANSFORM
, custom file formats, or custom SerDes. -
The
DATE
data type. - XML functions.
-
Certain aggregate functions from HiveQL:
covar_pop
,covar_samp
,corr
,percentile
,percentile_approx
,histogram_numeric
,collect_set
; Impala supports the set of aggregate functions listed in Impala Aggregate Functions and analytic functions listed in Impala Analytic Functions. - Sampling.
-
Lateral views. In Impala 2.3 and higher, Impala supports queries on complex types
(
STRUCT
,ARRAY
, orMAP
), using join notation rather than theEXPLODE()
keyword. See Complex Types (Impala 2.3 or higher only) for details about Impala support for complex types.
-
Impala supports high-performance UDFs written in C++, as well as reusing some Java-based Hive UDFs.
-
Impala supports scalar UDFs and user-defined aggregate functions (UDAFs). Impala does not currently support user-defined table generating functions (UDTFs).
-
Only Impala-supported column types are supported in Java-based UDFs.
-
The Hive
current_user()
function cannot be called from a Java UDF through Impala.
Impala does not currently support these HiveQL statements:
-
ANALYZE TABLE
(the Impala equivalent isCOMPUTE STATS
) -
DESCRIBE COLUMN
-
DESCRIBE DATABASE
-
EXPORT TABLE
-
IMPORT TABLE
-
SHOW TABLE EXTENDED
-
SHOW TBLPROPERTIES
-
SHOW INDEXES
-
SHOW COLUMNS
-
INSERT OVERWRITE DIRECTORY
; useINSERT OVERWRITE table_name
orCREATE TABLE AS SELECT
to materialize query results into the HDFS directory associated with an Impala table.
Impala respects the serialization.null.format
table
property only for TEXT tables and ignores the property for Parquet and
other formats. Hive respects the serialization.null.format
property for Parquet and other formats and converts matching values
to NULL during the scan. See Using Text Data Files with Impala Tables for
using the table property in Impala.
Semantic Differences Between Impala and HiveQL Features
This section covers instances where Impala and Hive have similar functionality, sometimes including the same syntax, but there are differences in the runtime semantics of those features.
Security:
Impala utilizes the Apache Ranger authorization framework, which provides fine-grained role-based access control to protect data against unauthorized access or tampering.
The Hive component now includes Ranger-enabled GRANT
,
REVOKE
, and CREATE/DROP ROLE
statements. Earlier Hive releases had a
privilege system with GRANT
and REVOKE
statements that were primarily
intended to prevent accidental deletion of data, rather than a security mechanism to protect against
malicious users.
Impala can make use of privileges set up through Hive GRANT
and REVOKE
statements.
Impala has its own GRANT
and REVOKE
statements in Impala 2.0 and higher.
See Impala Authorization for the details of authorization in Impala.
SQL statements and clauses:
The semantics of Impala SQL statements varies from HiveQL in some cases where they use similar SQL statement and clause names:
-
Impala uses different syntax and names for query hints,
[SHUFFLE]
and[NOSHUFFLE]
rather thanMapJoin
orStreamJoin
. See Joins in Impala SELECT Statements for the Impala details. -
Impala does not expose MapReduce specific features of
SORT BY
,DISTRIBUTE BY
, orCLUSTER BY
. -
Impala does not require queries to include a
FROM
clause.
Data types:
- Impala supports a limited set of implicit casts. This can help
avoid undesired results from unexpected casting behavior.
- Impala does not implicitly cast between string and numeric or
Boolean types. Always use
CAST()
for these conversions. - Impala does perform implicit casts among the numeric types,
when going from a smaller or less precise type to a larger or more
precise one. For example, Impala will implicitly convert a
SMALLINT
to aBIGINT
orFLOAT
, but to convert fromDOUBLE
toFLOAT
orINT
toTINYINT
requires a call toCAST()
in the query. - Impala does perform implicit casts from string to timestamp.
Impala has a restricted set of literal formats for the
TIMESTAMP
data type and thefrom_unixtime()
format string; see TIMESTAMP Data Type for details.
See the topics under Data Types for full details on implicit and explicit casting for each data type, and Impala Type Conversion Functions for details about the
CAST()
function. - Impala does not implicitly cast between string and numeric or
Boolean types. Always use
-
Impala does not store or interpret timestamps using the local timezone, to avoid undesired results from
unexpected time zone issues. Timestamps are stored and interpreted relative to UTC. This difference can
produce different results for some calls to similarly named date/time functions between Impala and Hive.
See Impala Date and Time Functions for details about the Impala
functions. See TIMESTAMP Data Type for a discussion of how Impala handles
time zones, and configuration options you can use to make Impala match the Hive behavior more closely
when dealing with Parquet-encoded
TIMESTAMP
data or when converting between the local time zone and UTC. -
The Impala
TIMESTAMP
type can represent dates ranging from 1400-01-01 to 9999-12-31. This is different from the Hive date range, which is 0000-01-01 to 9999-12-31. -
Impala does not return column overflows as
NULL
, so that customers can distinguish betweenNULL
data and overflow conditions similar to how they do so with traditional database systems. Impala returns the largest or smallest value in the range for the type. For example, valid values for atinyint
range from -128 to 127. In Impala, atinyint
with a value of -200 returns -128 rather thanNULL
. Atinyint
with a value of 200 returns 127.
Miscellaneous features:
- Impala does not provide virtual columns.
- Impala does not expose locking.
- Impala does not expose some configuration properties.