SQL Differences Between Impala and Hive

Impala's SQL syntax follows the SQL-92 standard, and includes many industry extensions in areas such as built-in functions. See Porting SQL from Other Database Systems to Impala for a general discussion of adapting SQL code from a variety of database systems to Impala.

Because Impala and Hive share the same metastore database and their tables are often used interchangeably, the following section covers differences between Impala and Hive in detail.

HiveQL Features not Available in Impala

The current release of Impala does not support the following SQL features that you might be familiar with from HiveQL:

User-defined functions (UDFs) are supported starting in Impala 1.2. See Impala User-Defined Functions (UDFs) for full details on Impala UDFs.
  • Impala supports high-performance UDFs written in C++, as well as reusing some Java-based Hive UDFs.

  • Impala supports scalar UDFs and user-defined aggregate functions (UDAFs). Impala does not currently support user-defined table generating functions (UDTFs).

  • Only Impala-supported column types are supported in Java-based UDFs.

  • The Hive current_user() function cannot be called from a Java UDF through Impala.

Impala does not currently support these HiveQL statements:

Impala respects the serialization.null.format table property only for TEXT tables and ignores the property for Parquet and other formats. Hive respects the serialization.null.format property for Parquet and other formats and converts matching values to NULL during the scan. See Using Text Data Files with Impala Tables for using the table property in Impala.

Semantic Differences Between Impala and HiveQL Features

This section covers instances where Impala and Hive have similar functionality, sometimes including the same syntax, but there are differences in the runtime semantics of those features.

Security:

Impala utilizes the Apache Sentry authorization framework, which provides fine-grained role-based access control to protect data against unauthorized access or tampering.

The Hive component now includes Sentry-enabled GRANT, REVOKE, and CREATE/DROP ROLE statements. Earlier Hive releases had a privilege system with GRANT and REVOKE statements that were primarily intended to prevent accidental deletion of data, rather than a security mechanism to protect against malicious users.

Impala can make use of privileges set up through Hive GRANT and REVOKE statements. Impala has its own GRANT and REVOKE statements in Impala 2.0 and higher. See Enabling Sentry Authorization for Impala for the details of authorization in Impala, including how to switch from the original policy file-based privilege model to the Sentry service using privileges stored in the metastore database.

SQL statements and clauses:

The semantics of Impala SQL statements varies from HiveQL in some cases where they use similar SQL statement and clause names:

Data types:

Miscellaneous features: