Incompatible Changes and Limitations in Apache Impala

The Impala version covered by this documentation library contains the following incompatible changes. These include file format changes, removed features, and changes to implementation, default configuration, dependencies, or prerequisites that could cause issues during or after an Impala upgrade.

Even newly added SQL statements or clauses can produce incompatibilities if you have databases, tables, or columns whose names conflict with the new keywords. See Impala Reserved Words for the set of reserved words for the current release, and the quoting techniques to avoid name conflicts.

Incompatible Changes Introduced in Impala 3.1.x

For the full list of issues closed in this release, including any that introduce behavior changes or incompatibilities, see the changelog for Impala 3.1.

Incompatible Changes Introduced in Impala 3.0.x

For the full list of issues closed in this release, including any that introduce behavior changes or incompatibilities, see the changelog for Impala 3.0.

Incompatible Changes Introduced in Impala 2.12.x

For the full list of issues closed in this release, including any that introduce behavior changes or incompatibilities, see the changelog for Impala 2.12.

Incompatible Changes Introduced in Impala 2.11.x

For the full list of issues closed in this release, including any that introduce behavior changes or incompatibilities, see the changelog for Impala 2.11.

Incompatible Changes Introduced in Impala 2.10.x

For the full list of issues closed in this release, including any that introduce behavior changes or incompatibilities, see the changelog for Impala 2.10.

Incompatible Changes Introduced in Impala 2.9.x

For the full list of issues closed in this release, including any that introduce behavior changes or incompatibilities, see the changelog for Impala 2.9.

Incompatible Changes Introduced in Impala 2.8.x

Incompatible Changes Introduced in Impala 2.7.x

Incompatible Changes Introduced in Impala 2.6.x

Certain features are turned off by default, to avoid regressions or unexpected behavior following an upgrade. Consider turning on these features after suitable testing:

Incompatible Changes Introduced in Impala 2.5.x

Incompatible Changes Introduced in Impala 2.4.x

Other than support for DSSD storage, the feature set of Impala 2.4 is the same as that of Impala 2.3. Therefore, Impala 2.4 introduces no incompatible changes.

Incompatible Changes Introduced in Impala 2.3.x

Note:

The use of the Llama component for integrated resource management within YARN is no longer supported with Impala 2.3 and higher. The Llama support code is removed entirely in Impala 2.8 and higher.

For clusters running Impala alongside other data management components, you define static service pools to allocate resources to Impala and the other components. Then, within the area allocated for Impala, you can create dynamic service pools, each with its own settings for the Impala admission control feature.

Incompatible Changes Introduced in Impala 2.2.x

Changes to File Handling

Impala queries ignore files with extensions commonly used for temporary work files by Hadoop tools. Any files with extensions .tmp or .copying are not considered part of the Impala table. The suffix matching is case-insensitive, so for example Impala ignores both .copying and .COPYING suffixes.
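The suffix rule above can be sketched in a few lines of Python, for example to predict which files in a table directory a query would skip. This is an illustration of the rule as described here, not Impala's own code; the function names and the sample file names are hypothetical.

```python
# Suffixes named in the text above. Matching is case-insensitive.
IGNORED_SUFFIXES = (".tmp", ".copying")


def is_ignored_by_impala(filename: str) -> bool:
    """Return True if the file name ends with a suffix that Impala skips."""
    return filename.lower().endswith(IGNORED_SUFFIXES)


def visible_data_files(filenames):
    """Keep only the file names that would count as part of the table."""
    return [name for name in filenames if not is_ignored_by_impala(name)]


# Hypothetical directory listing.
listing = ["part-0001.parq", "part-0002.parq.tmp", "load.csv.COPYING", "tmp_report.csv"]
print(visible_data_files(listing))
```

Only the suffix matters in this sketch: a name such as tmp_report.csv, which merely contains "tmp", is kept.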

The log rotation feature in Impala 2.2.0 and higher means that older log files are now removed by default. The default is to preserve the latest 10 log files for each severity level, for each Impala-related daemon. If you have set up your own log rotation processes that expect older files to be present, either adjust your procedures or change the Impala -max_log_files setting. See Rotating Impala Logs for details.

Changes to Prerequisites

The prerequisite for CPU architecture has been relaxed in Impala 2.2.0 and higher. From this release onward, Impala works on CPUs that have the SSSE3 instruction set. The SSE4 instruction set is no longer required. This relaxed requirement simplifies the upgrade planning from Impala 1.x releases, which also worked on SSSE3-enabled processors.

Incompatible Changes Introduced in Impala 2.1.x

Changes to Prerequisites

Currently, Impala 2.1.x does not function on CPUs without the SSE4.1 instruction set. This minimum CPU requirement is higher than in previous versions, which relied on the older SSSE3 instruction set. Check the CPU level of the hosts in your cluster before upgrading to Impala 2.1.
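One way to perform the CPU check recommended above is to inspect the flags line that Linux exposes in /proc/cpuinfo. The following Python sketch assumes the usual Linux flag spellings ssse3 and sse4_1; the function names and the sample text are hypothetical, so confirm the result against your own hosts.

```python
def cpu_flags(cpuinfo_text: str) -> set:
    """Collect flag names from the first 'flags' line of /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def supports(cpuinfo_text: str, flag: str) -> bool:
    """Return True if the named CPU flag is listed (assumed Linux spelling)."""
    return flag in cpu_flags(cpuinfo_text)


# Hypothetical sample. On a real host, read the text from /proc/cpuinfo instead.
sample = "processor\t: 0\nflags\t\t: fpu sse sse2 ssse3 sse4_1 sse4_2\n"
print("SSSE3:", supports(sample, "ssse3"))
print("SSE4.1:", supports(sample, "sse4_1"))
```

On a Linux host, passing open("/proc/cpuinfo").read() as the text reports on the first CPU listed.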

Changes to Output Format

The "small query" optimization feature introduces some new information in the EXPLAIN plan, which you might need to account for if you parse the text of the plan output.

New Reserved Words

New SQL syntax introduces additional reserved words: FOR, GRANT, REVOKE, ROLE, ROLES, INCREMENTAL. As always, see Impala Reserved Words for the set of reserved words for the current release, and the quoting techniques to avoid name conflicts.

Incompatible Changes Introduced in Impala 2.0.5

No incompatible changes.

Incompatible Changes Introduced in Impala 2.0.4

No incompatible changes.

Incompatible Changes Introduced in Impala 2.0.3

Incompatible Changes Introduced in Impala 2.0.2

No incompatible changes.

Incompatible Changes Introduced in Impala 2.0.1

Incompatible Changes Introduced in Impala 2.0.0

Changes to Prerequisites

Currently, Impala 2.0.x does not function on CPUs without the SSE4.1 instruction set. This minimum CPU requirement is higher than in previous versions, which relied on the older SSSE3 instruction set. Check the CPU level of the hosts in your cluster before upgrading to Impala 2.0.

Changes to Query Syntax

The new syntax that allows query hints inside comments changes the way the impala-shell interpreter parses comments. Previously, you could end a -- comment line with a semicolon and impala-shell would treat that as a no-op statement. Now, a comment line ending with a semicolon is passed as an empty statement to the Impala daemon, where it is flagged as an error.

Impala 2.0 and later uses a different support library for regular expression parsing than in earlier Impala versions. Now, Impala uses the Google RE2 library rather than Boost for evaluating regular expressions. This implementation change causes some differences in the allowed regular expression syntax, and in the way certain regex operators are interpreted. The following are some of the major differences (not necessarily a complete list):

  • .*? notation for non-greedy matches is now supported, where it was not in earlier Impala releases.

  • By default, ^ and $ now match only begin/end of buffer, not begin/end of each line. This behavior can be overridden in the regex itself using the m flag.

  • By default, . does not match newline. This behavior can be overridden in the regex itself using the s flag.

  • \Z is not supported.

  • \< and \> for start of word and end of word are not supported.

  • Lookahead and lookbehind are not supported.

  • Shorthand notation for character classes, such as \d for digit, is not recognized. (This restriction is lifted in Impala 2.0.1, which restores the shorthand notation.)
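Some of the defaults in the list above can be tried out with Python's re module, which happens to share them: ^ anchors only at the start of the buffer unless the m flag is set, . skips newlines unless the s flag is set, and .*? is non-greedy. This is an analogy only. Python's re is a different engine from RE2 (it accepts lookahead, for instance), so test real patterns in Impala itself.

```python
import re

text = "first line\nsecond line"

# By default ^ matches only at the start of the whole buffer.
default_hits = re.findall(r"^\w+", text)          # ['first']
# The m flag makes ^ match at the start of every line.
multiline_hits = re.findall(r"(?m)^\w+", text)    # ['first', 'second']

# By default . does not match a newline, so this pattern cannot span lines.
dot_default = re.search(r"first.*second", text)   # None
# The s flag lets . match a newline as well.
dot_all = re.search(r"(?s)first.*second", text)   # a match object

# .*? stops at the first possible end, unlike the greedy .*
greedy = re.search(r"<.*>", "<a><b>").group(0)    # '<a><b>'
lazy = re.search(r"<.*?>", "<a><b>").group(0)     # '<a>'

print(default_hits, multiline_hits, dot_default is None, dot_all is not None, greedy, lazy)
```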

Changes to Output Format

In Impala 2.0 and later, user() returns the full Kerberos principal string, such as user@example.com, in a Kerberized environment.

The changed format for the user name in secure environments is also reflected where the user name is displayed in the output of the PROFILE command.

In the output from SHOW FUNCTIONS, SHOW AGGREGATE FUNCTIONS, and SHOW ANALYTIC FUNCTIONS, arguments and return types of arbitrary DECIMAL scale and precision are represented as DECIMAL(*,*). Formerly, these items were displayed as DECIMAL(-1,-1).

Changes to Query Options

The PARQUET_COMPRESSION_CODEC query option has been replaced by the COMPRESSION_CODEC query option. See COMPRESSION_CODEC Query Option (Impala 2.0 or higher only) for details.

Changes to Configuration Options

The meaning of the --idle_query_timeout configuration option has changed to accommodate the new QUERY_TIMEOUT_S query option. Rather than setting an absolute timeout period that applies to all queries, it now sets a maximum timeout period, which individual queries can adjust downward by specifying a value for the QUERY_TIMEOUT_S query option. In sessions where no QUERY_TIMEOUT_S query option is specified, the --idle_query_timeout period applies, as in earlier versions.
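The interaction between the two settings can be summarized as a small Python function. This is a sketch of the rule as described here, assuming both values are positive numbers of seconds; it does not model any special values the options might accept, and the function name is hypothetical.

```python
from typing import Optional


def effective_idle_timeout(idle_query_timeout: int,
                           query_timeout_s: Optional[int] = None) -> int:
    """Return the timeout (seconds) that applies to a query in a session.

    idle_query_timeout: the --idle_query_timeout startup value (the ceiling).
    query_timeout_s: the session's QUERY_TIMEOUT_S value, or None if unset.
    """
    if query_timeout_s is None:
        # No per-session override: the startup value applies as before.
        return idle_query_timeout
    # The query option can lower the timeout but not raise it past the ceiling.
    return min(idle_query_timeout, query_timeout_s)


print(effective_idle_timeout(600))        # no session override
print(effective_idle_timeout(600, 120))   # adjusted downward
print(effective_idle_timeout(600, 3600))  # capped at the maximum
```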

The --strict_unicode option of impala-shell was removed. To avoid problems with Unicode values in impala-shell, define the following locale setting before running impala-shell:

export LC_CTYPE=en_US.UTF-8

New Reserved Words

Some new SQL syntax requires the addition of new reserved words: ANTI, ANALYTIC, OVER, PRECEDING, UNBOUNDED, FOLLOWING, CURRENT, ROWS, RANGE, CHAR, VARCHAR. As always, see Impala Reserved Words for the set of reserved words for the current release, and the quoting techniques to avoid name conflicts.
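Before upgrading, it can help to scan existing table and column names for collisions with the words listed above. The Python sketch below uses only that list; the function names are hypothetical, and the backtick quoting it applies is an assumption to confirm against Impala Reserved Words.

```python
# Reserved words added in Impala 2.0, as listed in the text above.
NEW_RESERVED_WORDS_2_0 = frozenset({
    "ANTI", "ANALYTIC", "OVER", "PRECEDING", "UNBOUNDED", "FOLLOWING",
    "CURRENT", "ROWS", "RANGE", "CHAR", "VARCHAR",
})


def conflicting_names(identifiers):
    """Return the identifiers that collide with a new reserved word (case-insensitive)."""
    return [name for name in identifiers if name.upper() in NEW_RESERVED_WORDS_2_0]


def quote_if_reserved(name: str) -> str:
    """Wrap a colliding identifier in backticks (assumed quoting style); leave others as-is."""
    return f"`{name}`" if name.upper() in NEW_RESERVED_WORDS_2_0 else name


# Hypothetical column names from an existing schema.
columns = ["id", "range", "Rows", "amount"]
print(conflicting_names(columns))
print([quote_if_reserved(c) for c in columns])
```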

Changes to Data Files

The default Parquet block size for Impala is changed from 1 GB to 256 MB. This change could have implications for the sizes of Parquet files produced by INSERT and CREATE TABLE AS SELECT statements.

Although older Impala releases typically produced files that were smaller than the old default size of 1 GB, now the file size matches more closely whatever value is specified for the PARQUET_FILE_SIZE query option. Thus, if you use a non-default value for this setting, the output files could be larger than before. They still might be somewhat smaller than the specified value, because Impala makes conservative estimates about the space needed to represent each column as it encodes the data.

When you do not specify an explicit value for the PARQUET_FILE_SIZE query option, Impala tries to keep the file size within the 256 MB default size, but Impala might adjust the file size to be somewhat larger if needed to accommodate the layout for wide tables, that is, tables with hundreds or thousands of columns.

This change is unlikely to affect memory usage while writing Parquet files, because Impala does not pre-allocate the memory needed to hold the entire Parquet block.

Incompatible Changes Introduced in Impala 1.4.4

No incompatible changes.

Incompatible Changes Introduced in Impala 1.4.3

No incompatible changes. The TLS/SSL security fix does not require any change in the way you interact with Impala.

Incompatible Changes Introduced in Impala 1.4.2

None. Impala 1.4.2 is purely a bug-fix release. It does not include any incompatible changes.

Incompatible Changes Introduced in Impala 1.4.1

None. Impala 1.4.1 is purely a bug-fix release. It does not include any incompatible changes.

Incompatible Changes Introduced in Impala 1.4.0

Incompatible Changes Introduced in Impala 1.3.3

No incompatible changes. The TLS/SSL security fix does not require any change in the way you interact with Impala.

Incompatible Changes Introduced in Impala 1.3.2

With the fix for IMPALA-1019, you can use HDFS caching for files that are accessed by Impala.

Incompatible Changes Introduced in Impala 1.3.1

Incompatible Changes Introduced in Impala 1.3.0

Incompatible Changes Introduced in Impala 1.2.4

There are no incompatible changes introduced in Impala 1.2.4.

Previously, after creating a table in Hive, you had to issue the INVALIDATE METADATA statement with no table name, a potentially expensive operation on clusters with many databases, tables, and partitions. Starting in Impala 1.2.4, you can issue the statement INVALIDATE METADATA table_name for a table newly created through Hive. Loading the metadata for only this one table is faster and involves less network overhead. Therefore, consider revising your setup DDL scripts to add the table name to INVALIDATE METADATA statements in cases where you create and populate the tables through Hive before querying them through Impala.

Incompatible Changes Introduced in Impala 1.2.3

Because the feature set of Impala 1.2.3 is identical to Impala 1.2.2, there are no new incompatible changes. See Incompatible Changes Introduced in Impala 1.2.2 if you are upgrading from Impala 1.2.1 or 1.1.x.

Incompatible Changes Introduced in Impala 1.2.2

The following changes to SQL syntax and semantics in Impala 1.2.2 could require updates to your SQL code, or schema objects such as tables or views:

Because many users are likely to upgrade straight from Impala 1.x to Impala 1.2.2, also read Incompatible Changes Introduced in Impala 1.2.1 for things to note about upgrading to Impala 1.2.x in general.

Incompatible Changes Introduced in Impala 1.2.1

The following changes to SQL syntax and semantics in Impala 1.2.1 could require updates to your SQL code, or schema objects such as tables or views:

The new catalogd service might require changes to any user-written scripts that stop, start, or restart Impala services, install or upgrade Impala packages, or issue REFRESH or INVALIDATE METADATA statements:

Incompatible Changes Introduced in Impala 1.2.0 (Beta)

There are no incompatible changes to SQL syntax in Impala 1.2.0 (beta).

The new catalogd service might require changes to any user-written scripts that stop, start, or restart Impala services, install or upgrade Impala packages, or issue REFRESH or INVALIDATE METADATA statements:

The new resource management feature interacts with both YARN and Llama services. See Resource Management for Impala for usage information for Impala resource management.

Incompatible Changes Introduced in Impala 1.1.1

There are no incompatible changes in Impala 1.1.1.

Previously, it was not possible to create Parquet data through Impala and reuse that table within Hive. Now that Parquet support is available for Hive 10, reusing existing Impala Parquet data files in Hive requires updating the table metadata. Use the following command if you are already running Impala 1.1.1:

ALTER TABLE table_name SET FILEFORMAT PARQUETFILE;

If you are running a release of Impala older than 1.1.1, do the metadata update through Hive:

ALTER TABLE table_name SET SERDE 'parquet.hive.serde.ParquetHiveSerDe';
ALTER TABLE table_name SET FILEFORMAT
  INPUTFORMAT "parquet.hive.DeprecatedParquetInputFormat"
  OUTPUTFORMAT "parquet.hive.DeprecatedParquetOutputFormat";

Impala 1.1.1 and higher can reuse Parquet data files created by Hive, without any action required.

As usual, make sure to upgrade the Impala LZO package to the latest level at the same time as you upgrade the Impala server.

Incompatible Changes Introduced in Impala 1.1

Incompatible Changes Introduced in Impala 1.0