Known Issues and Workarounds in Impala

The following sections describe known issues and workarounds in Impala, as of the current production release. This page summarizes the most serious or frequently encountered issues in the current release, to help you make planning decisions about installing and upgrading. Any workarounds are listed here. The bug links take you to the Impala issues site, where you can see the diagnosis and whether a fix is in the pipeline.

Note: The online issue tracking system for Impala contains comprehensive information and is updated in real time. To verify whether an issue you are experiencing has already been reported, or which release an issue is fixed in, search on the JIRA tracker.

For issues fixed in various Impala releases, see Fixed Issues in Apache Impala.

Impala Known Issues: Startup

These issues can prevent one or more Impala-related daemons from starting properly.

Problem retrieving FQDN causes startup problem on kerberized clusters

The method Impala uses to retrieve the host name while constructing the Kerberos principal is the gethostname() system call. This function might not always return the fully qualified domain name, depending on the network configuration. If the daemons cannot determine the FQDN, Impala does not start on a kerberized cluster.

This problem might occur immediately after an upgrade of a CDH cluster, due to changes in Cloudera Manager, which supplies the --hostname flag automatically to the Impala-related daemons. (See the issue "hostname parameter is not passed to Impala catalog role" at the Cloudera Manager Known Issues page.)

Bugs: IMPALA-4978, IMPALA-5253

Severity: High

Resolution: The issue is expected to occur less frequently on systems with fixes for IMPALA-4978, IMPALA-5253, or both. Even on systems with fixes for both of these issues, the workaround might still be required in some cases.

Workaround: Test if a host is affected by checking whether the output of the hostname command includes the FQDN. On hosts where hostname only returns the short name, pass the command-line flag --hostname=fully_qualified_domain_name in the startup options of all Impala-related daemons.
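The check described in the workaround can be sketched in Python. This is only an illustration: the helper names are hypothetical, and the "contains a dot" test is a heuristic assumption about what a fully qualified name looks like, not the logic Impala itself uses.

```python
import socket


def looks_fully_qualified(name: str) -> bool:
    """Heuristic: a fully qualified name has at least two non-empty, dot-separated labels."""
    labels = name.rstrip(".").split(".")
    return len(labels) >= 2 and all(labels)


def hostname_flag(fqdn: str) -> str:
    """Build the startup option named in the workaround."""
    return f"--hostname={fqdn}"


def check_local_host() -> str:
    """Report whether the local host name looks fully qualified."""
    name = socket.gethostname()  # wraps the gethostname() system call
    if looks_fully_qualified(name):
        return f"{name}: looks fully qualified, no flag needed"
    # socket.getfqdn() is a best-effort lookup and may still return a short name,
    # so verify the value before putting it into the daemon startup options.
    return f"{name}: short name only, consider {hostname_flag(socket.getfqdn())}"
```

Calling check_local_host() on each host gives a quick indication of whether the --hostname flag is likely to be needed there.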

Impala Known Issues: Crashes and Hangs

These issues can cause Impala to quit or become unresponsive.

Unable to view large catalog objects in catalogd Web UI

In catalogd Web UI, you can list metadata objects and view their details. These details are accessed via a link and printed to a string formatted using thrift's DebugProtocol. Printing large objects (> 1 GB) in Web UI can crash catalogd.

Bug: IMPALA-6841

Crash when querying tables with "\0" as a row delimiter

When querying a textfile-based Impala table that uses \0 as a new line separator, Impala crashes.

The following sequence causes impalad to crash:

create table tab_separated(id bigint, s string, n int, t timestamp, b boolean)
  row format delimited
  fields terminated by '\t' escaped by '\\' lines terminated by '\000'
  stored as textfile;
select * from tab_separated; -- Done. 0 results.
insert into tab_separated (id, s) values (100, ''); -- Success.
select * from tab_separated; -- 20 second delay before getting "Cancelled due to unreachable impalad(s): xxxx:22000"

Bug: IMPALA-6389

Workaround: Use an alternative delimiter, e.g. \001.

Altering Kudu table schema outside of Impala may result in crash on read

Creating a table in Impala, changing the column schema outside of Impala, and then reading again in Impala may result in a crash. Neither Impala nor the Kudu client validates the schema immediately before reading, so Impala may attempt to dereference pointers that aren't there. This happens if a string column is dropped and then a new, non-string column is added with the old string column's name.

Bug: IMPALA-4828

Severity: High

Resolution: Fixed in Impala 2.9.0.

Workaround: Run the statement REFRESH table_name after any occasion when the table structure, such as the number, names, and data types of columns, is modified outside of Impala using the Kudu API.

Queries that take a long time to plan can cause webserver to block other queries

Trying to get the details of a query through the debug web page while the query is planning will block new queries that had not started when the web page was requested. The web UI becomes unresponsive until the planning phase is finished.

Bug: IMPALA-1972

Severity: High

Resolution: Fixed in Impala 2.9.0.

Linking IR UDF module to main module crashes Impala

A UDF compiled as an LLVM module (.ll) could cause a crash when executed.

Bug: IMPALA-4595

Severity: High

Resolution: Fixed in Impala 2.8 and higher.

Workaround: Compile the external UDFs to a .so library instead of a .ll IR module.

Setting BATCH_SIZE query option too large can cause a crash

Using a value in the millions for the BATCH_SIZE query option, together with wide rows or large string values in columns, could cause a memory allocation of more than 2 GB resulting in a crash.
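As a rough back-of-the-envelope check, the size of one row batch can be approximated as the batch size multiplied by the average row width. The Python sketch below assumes exactly that simplification; the function names are hypothetical, and the real allocation inside Impala depends on internals not described here.

```python
TWO_GIB = 2 * 1024**3  # the 2 GB threshold mentioned above, taken here as 2 GiB


def estimated_batch_bytes(batch_size: int, avg_row_bytes: int) -> int:
    """Approximate memory for one row batch (assumption: rows * average row width)."""
    return batch_size * avg_row_bytes


def exceeds_two_gib(batch_size: int, avg_row_bytes: int) -> bool:
    """True if the estimated batch would be larger than 2 GiB."""
    return estimated_batch_bytes(batch_size, avg_row_bytes) > TWO_GIB
```

For example, a BATCH_SIZE of 5,000,000 with rows averaging 1 KB gives an estimate of about 5 GB, well past the threshold, while a batch of 1024 such rows is about 1 MB.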

Bug: IMPALA-3069

Severity: High

Resolution: Fixed in Impala 2.7.0.

Impala should not crash for invalid avro serialized data

Malformed Avro data, such as out-of-bounds integers or values in the wrong format, could cause a crash when queried.

Bug: IMPALA-3441

Severity: High

Resolution: Fixed in Impala 2.7.0 and Impala 2.6.2.

Queries may hang on server-to-server exchange errors

The DataStreamSender::Channel::CloseInternal() function does not close the channel on an error. As a result, the node on the other side of the channel waits indefinitely, causing a hang.

Bug: IMPALA-2592

Resolution: Fixed in Impala 2.5.0.

Impalad is crashing if udf jar is not available in hdfs location for first time

If the JAR file corresponding to a Java UDF is removed from HDFS after the Impala CREATE FUNCTION statement is issued, the impalad daemon crashes.

Bug: IMPALA-2365

Resolution: Fixed in Impala 2.5.0.

Impala Known Issues: Performance

These issues involve the performance of operations such as queries or DDL statements.

Metadata operations block read-only operations on unrelated tables

Metadata operations that change the state of a table, like COMPUTE STATS or ALTER RECOVER PARTITIONS, may delay metadata propagation of unrelated unloaded tables triggered by statements like DESCRIBE or SELECT queries.

Bug: IMPALA-6671

Profile timers not updated during long-running sort

If a query plan includes a long-running sort operation (for example, one that takes minutes), the profile timers are not updated to reflect the time spent in the sort until the sort starts returning rows.

Bug: IMPALA-5200

Workaround: Slow sorts can be identified by looking at "Peak Mem" in the summary or "PeakMemoryUsage" in the profile. If a sort is consuming multiple GB of memory per host, it will likely spend a significant amount of time sorting the data.

Slow queries for Parquet tables with convert_legacy_hive_parquet_utc_timestamps=true

The configuration setting convert_legacy_hive_parquet_utc_timestamps=true uses an underlying function that can be a bottleneck on high volume, highly concurrent queries due to the use of a global lock while loading time zone information. This bottleneck can cause slowness when querying Parquet tables, up to 30x for scan-heavy queries. The amount of slowdown depends on factors such as the number of cores and number of threads involved in the query.


The slowdown only occurs when accessing TIMESTAMP columns within Parquet files that were generated by Hive, and therefore require the on-the-fly timezone conversion processing.

Bug: IMPALA-3316

Severity: High

Workaround: If the TIMESTAMP values stored in the table represent dates only, with no time portion, consider storing them as strings in yyyy-MM-dd format. Impala implicitly converts such string values to TIMESTAMP in calls to date/time functions.

Slow DDL statements for tables with large number of partitions

DDL statements for tables with a large number of partitions might be slow.

Bug: IMPALA-1480

Workaround: Run the DDL statement in Hive if the slowness is an issue.

Resolution: Fixed in Impala 2.5.0.

Interaction of File Handle Cache with HDFS Appends and Short-Circuit Reads

If a data file used by Impala is being continuously appended or overwritten in place by an HDFS mechanism, such as hdfs dfs -appendToFile, interaction with the file handle caching feature in Impala 2.10 and higher could cause short-circuit reads to sometimes be disabled on some DataNodes. When a mismatch is detected between the cached file handle and a data block that was rewritten because of an append, short-circuit reads are turned off on the affected host for a 10-minute period.

The possibility of encountering such an issue is the reason why the file handle caching feature is currently turned off by default. See Scalability Considerations for Impala for information about this feature and how to enable it.

Bug: HDFS-12528

Severity: High

Workaround: Verify whether your ETL process is susceptible to this issue before enabling the file handle caching feature. You can set the impalad configuration option unused_file_handle_timeout_sec to a time period that is shorter than the HDFS setting. (Keep in mind that the HDFS setting is in milliseconds while the Impala setting is in seconds.)
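Because the two settings use different units, comparing them directly is easy to get wrong. A minimal Python sketch of the comparison follows; the function name and the hdfs_setting_ms parameter are hypothetical placeholders for whatever values your cluster uses.

```python
def handle_timeout_is_shorter(unused_file_handle_timeout_sec: int,
                              hdfs_setting_ms: int) -> bool:
    """Compare the Impala timeout (seconds) with the HDFS setting (milliseconds)."""
    impala_timeout_ms = unused_file_handle_timeout_sec * 1000  # seconds -> milliseconds
    return impala_timeout_ms < hdfs_setting_ms
```

With an HDFS value of 300000 ms (5 minutes), an Impala timeout of 270 seconds passes the check, while 300 seconds or more does not.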

Resolution: Fixed in HDFS 2.10 and higher. Use the new HDFS parameter dfs.domain.socket.disable.interval.seconds to specify the amount of time that short circuit reads are disabled on encountering an error. The default value is 10 minutes (600 seconds). It is recommended that you set dfs.domain.socket.disable.interval.seconds to a small value, such as 1 second, when using the file handle cache. Setting dfs.domain.socket.disable.interval.seconds to 0 is not recommended as a non-zero interval protects the system if there is a persistent problem with short circuit reads.

Impala Known Issues: Usability

These issues affect the convenience of interacting directly with Impala, typically through the Impala shell or Hue.

Impala shell tarball is not usable on systems with setuptools versions where '0.7' is a substring of the full version string

For example, this issue could occur on a system using setuptools version 20.7.0.

Bug: IMPALA-4570

Severity: High

Resolution: Fixed in Impala 2.8 and higher.

Workaround: Change to a setuptools version that does not have 0.7 as a substring.

Unexpected privileges in show output

Due to a timing condition in updating cached policy data from Sentry, the SHOW statements for Sentry roles could sometimes display out-of-date role settings. Because Impala rechecks authorization for each SQL statement, this discrepancy does not represent a security issue for other statements.

Bug: IMPALA-3133

Severity: High

Resolution: Fixed in Impala 2.6.0 and Impala 2.5.1.

Less than 100% progress on completed simple SELECT queries

Simple SELECT queries show less than 100% progress even though they are already completed.

Bug: IMPALA-1776

Unexpected column overflow behavior with INT datatypes

Impala does not return column overflows as NULL, so that customers can distinguish between NULL data and overflow conditions similar to how they do so with traditional database systems. Impala returns the largest or smallest value in the range for the type. For example, valid values for a tinyint range from -128 to 127. In Impala, a tinyint with a value of -200 returns -128 rather than NULL. A tinyint with a value of 200 returns 127.
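The saturating behavior described above can be modeled in a few lines of Python. This is an illustrative sketch of the described result, not Impala code; the table of ranges and the function name are assumptions made for the example.

```python
# Value ranges of the signed integer types (two's complement).
INT_RANGES = {
    "tinyint": (-2**7, 2**7 - 1),
    "smallint": (-2**15, 2**15 - 1),
    "int": (-2**31, 2**31 - 1),
    "bigint": (-2**63, 2**63 - 1),
}


def clamp_to_type(value: int, type_name: str) -> int:
    """Return the nearest representable value instead of NULL on overflow."""
    low, high = INT_RANGES[type_name]
    return max(low, min(high, value))
```

Here clamp_to_type(-200, "tinyint") gives -128 and clamp_to_type(200, "tinyint") gives 127, matching the tinyint example in the text.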

Bug: IMPALA-3123

Resolution: Fixed in Impala 2.6.0.

Impala Known Issues: JDBC and ODBC Drivers

These issues affect applications that use the JDBC or ODBC APIs, such as business intelligence tools or custom-written applications in languages such as Java or C++.

ImpalaODBC: Can not get the value in the SQLGetData(m-x th column) after the SQLBindCol(m th column)

If the ODBC SQLGetData is called on a series of columns, the function calls must follow the same order as the columns. For example, if data is fetched from column 2 then column 1, the SQLGetData call for column 1 returns NULL.

Bug: IMPALA-1792

Workaround: Fetch columns in the same order they are defined in the table.

Impala Known Issues: Security

These issues relate to security features, such as Kerberos authentication, Sentry authorization, encryption, auditing, and redaction.

Transient kerberos authentication error during table loading

A transient Kerberos error can cause a table to get into a bad state with an error: Failed to load metadata for table.

Bug: IMPALA-4712

Severity: High

Workaround: Resolve the Kerberos authentication problem and run INVALIDATE METADATA on the affected table.

Malicious user can gain unauthorized access to Kudu table data via Impala

A malicious user with ALTER permissions on an Impala table can access any other Kudu table data by altering the table properties to make it "external" and then changing the underlying table mapping to point to other Kudu tables. This violates and works around the authorization requirement that creating a Kudu external table via Impala requires an ALL privilege at the server scope. This privilege requirement for CREATE commands is enforced to precisely avoid this scenario where a malicious user can change the underlying Kudu table mapping. The fix is to enforce the same privilege requirement for ALTER commands that would make existing non-external Kudu tables external.

Bug: IMPALA-5638

Severity: High

Workaround: A temporary workaround is to revoke ALTER permissions on Impala tables.

Resolution: Fixed in Impala 2.10.0.

Kerberos tickets must be renewable

In a Kerberos environment, the impalad daemon might not start if Kerberos tickets are not renewable.

Workaround: Configure your KDC to allow tickets to be renewed, and configure krb5.conf to request renewable tickets.

Catalog server's kerberos ticket gets deleted after 'ticket_lifetime' on SLES11

On SLES11, after 'ticket_lifetime', the kerberos ticket gets deleted by the Java krb5 library.


Severity: High

Workaround: On Impala 2.11.0, set --use_kudu_kinit=false in the Impala startup flags.

On Impala 2.12.0, set --use_kudu_kinit=false and --use_krpc=false in Impala startup flags.

Impala Known Issues: Resources

These issues involve memory or disk usage, including out-of-memory conditions, the spill-to-disk feature, and resource management features.

Configuration to prevent crashes caused by thread resource limits

Impala could encounter a serious error due to resource usage under very high concurrency. The error message is similar to:

F0629 08:20:02.956413 29088] LLVM hit fatal error: Unable to allocate section memory!
terminate called after throwing an instance of 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::thread_resource_error> >'

Bug: IMPALA-5605

Severity: High

Workaround: To prevent such errors, configure each host running an impalad daemon with the following settings:

echo 2000000 > /proc/sys/kernel/threads-max
echo 2000000 > /proc/sys/kernel/pid_max
echo 8000000 > /proc/sys/vm/max_map_count

Add the following lines in /etc/security/limits.conf:

impala soft nproc 262144
impala hard nproc 262144

Memory usage when compact_catalog_topic flag enabled

The efficiency improvement from IMPALA-4029 can cause an increase in size of the updates to Impala catalog metadata that are broadcast to the impalad daemons by the statestored daemon. The increase in catalog update topic size results in higher CPU and network utilization. By default, the increase in topic size is about 5-7%. If the compact_catalog_topic flag is used, the size increase is more substantial, with a topic size approximately twice as large as in previous versions.

Bug: IMPALA-5500

Severity: Medium

Workaround: Consider setting the compact_catalog_topic configuration setting to false until this issue is resolved.

Resolution: Fixed in Impala 2.10.

Kerberos initialization errors due to high memory usage

On a kerberized cluster with high memory utilization, kinit commands executed after every 'kerberos_reinit_interval' may cause out-of-memory errors, because executing the command involves a fork of the Impala process. The error looks similar to the following:

Failed to obtain Kerberos ticket for principal: <varname>principal_details</varname>
Failed to execute shell cmd: 'kinit -k -t <varname>keytab_details</varname>',
error was: Error(12): Cannot allocate memory

Bug: IMPALA-2294

Severity: High

Resolution: Fixed in Impala 2.11.


Workaround: The following command changes the vm.overcommit_memory setting immediately on a running host. However, this setting is reset when the host is restarted.

echo 1 > /proc/sys/vm/overcommit_memory

To change the setting in a persistent way, add the following line to the /etc/sysctl.conf file:

vm.overcommit_memory=1

Then run sysctl -p. No reboot is needed.

DROP TABLE PURGE on S3A table may not delete externally written files

A DROP TABLE PURGE statement against an S3 table could leave the data files behind, if the table directory and the data files were created with a combination of hadoop fs and aws s3 commands.

Bug: IMPALA-3558

Severity: High

Resolution: The underlying issue with the S3A connector depends on the resolution of HADOOP-13230.

Impala catalogd heap issues when upgrading to Impala 2.5

The default heap size for Impala catalogd has changed in Impala 2.5 and higher:

  • Previously, by default catalogd was using the JVM's default heap size, which is the smaller of 1/4th of the physical memory or 32 GB.

  • Starting with Impala 2.5.0, the default catalogd heap size is 4 GB.

For example, on a host with 128 GB of physical memory, the catalogd heap decreases from 32 GB to 4 GB. This can result in out-of-memory errors in catalogd, leading to query failures.
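The size of the change on a given host can be estimated from the rule stated above. The following Python sketch simply encodes that rule; the names are hypothetical, and actual JVM defaults can vary with JVM version and flags.

```python
IMPALA_25_DEFAULT_HEAP_GB = 4  # default catalogd heap starting with Impala 2.5.0


def previous_default_heap_gb(physical_memory_gb: float) -> float:
    """JVM default described above: the smaller of 1/4 of physical memory or 32 GB."""
    return min(physical_memory_gb / 4, 32)


def heap_reduction_gb(physical_memory_gb: float) -> float:
    """How much smaller the default heap becomes after the upgrade (never negative)."""
    return max(previous_default_heap_gb(physical_memory_gb) - IMPALA_25_DEFAULT_HEAP_GB, 0)
```

On a 128 GB host the previous default works out to 32 GB, so the default heap shrinks by 28 GB. On a 16 GB host the previous default was already 4 GB, so nothing changes.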

Severity: High

Workaround: Increase the catalogd memory limit as follows.

For schemas with large numbers of tables, partitions, and data files, the catalogd daemon might encounter an out-of-memory error. To increase the memory limit for the catalogd daemon:
  1. Check current memory usage for the catalogd daemon by running the following commands on the host where that daemon runs on your cluster:

      jcmd catalogd_pid VM.flags
      jmap -heap catalogd_pid
  2. Decide on a large enough value for the catalogd heap. You express it as an environment variable value as follows:

  3. On systems not using cluster management software, put this environment variable setting into the startup script for the catalogd daemon, then restart the catalogd daemon.

  4. Use the same jcmd and jmap commands as earlier to verify that the new settings are in effect.

Breakpad minidumps can be very large when the thread count is high

The size of the breakpad minidump files grows linearly with the number of threads. By default, each thread adds 8 KB to the minidump size. Minidump files could consume significant disk space when the daemons have a high number of threads.

Bug: IMPALA-3509

Severity: High

Workaround: Add --minidump_size_limit_hint_kb=size to set a soft upper limit on the size of each minidump file. If the minidump file would exceed that limit, Impala reduces the amount of information for each thread from 8 KB to 2 KB. (Full thread information is captured for the first 20 threads, then 2 KB per thread after that.) The minidump file can still grow larger than the "hinted" size. For example, if you have 10,000 threads, the minidump file can be more than 20 MB.
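The arithmetic behind the 10,000-thread example can be sketched as follows. The function is a hypothetical estimator that counts only per-thread data, using the figures quoted above (8 KB for the first 20 threads, 2 KB for each thread after that); a real minidump also contains other data.

```python
def reduced_thread_data_kb(thread_count: int,
                           full_threads: int = 20,
                           full_kb: int = 8,
                           reduced_kb: int = 2) -> int:
    """Per-thread data in KB once the size-limit hint has triggered the reduction."""
    full = min(thread_count, full_threads)          # threads captured in full
    reduced = max(thread_count - full_threads, 0)   # threads captured at reduced size
    return full * full_kb + reduced * reduced_kb


def unreduced_thread_data_kb(thread_count: int, full_kb: int = 8) -> int:
    """Per-thread data in KB when every thread is captured in full."""
    return thread_count * full_kb
```

For 10,000 threads, the reduced estimate is 20 * 8 + 9,980 * 2 = 20,120 KB, compared with 80,000 KB without the reduction.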

Parquet scanner memory increase after IMPALA-2736

The initial release of Impala 2.6 sometimes has a higher peak memory usage than in previous releases while reading Parquet files.

Impala 2.6 addresses the issue IMPALA-2736, which improves the efficiency of Parquet scans by up to 2x. The faster scans may result in a higher peak memory consumption compared to earlier versions of Impala due to the new column-wise row materialization strategy. You are likely to experience higher memory consumption in any of the following scenarios:
  • Very wide rows due to projecting many columns in a scan.

  • Very large rows due to big column values, for example, long strings or nested collections with many items.

  • Producer/consumer speed imbalances, leading to more rows being buffered between a scan (producer) and downstream (consumer) plan nodes.

Bug: IMPALA-3662

Severity: High

Resolution: Fixed in Impala 2.8.0.

Workaround: The following query options might help to reduce memory consumption in the Parquet scanner:
  • Reduce the number of scanner threads, for example: set num_scanner_threads=30
  • Reduce the batch size, for example: set batch_size=512
  • Increase the memory limit, for example: set mem_limit=64g

Process mem limit does not account for the JVM's memory usage

Some memory allocated by the JVM used internally by Impala is not counted against the memory limit for the impalad daemon.

Bug: IMPALA-691

Workaround: To monitor overall memory usage, use the top command, or add the memory figures in the Impala web UI /memz tab to JVM memory usage shown on the /metrics tab.

Fix issues with the legacy join and agg nodes using --enable_partitioned_hash_join=false and --enable_partitioned_aggregation=false

Bug: IMPALA-2375

Workaround: Transition away from the "old-style" join and aggregation mechanism if practical.

Resolution: Fixed in Impala 2.5.0.

Impala Known Issues: Correctness

These issues can cause incorrect or unexpected results from queries. They typically only arise in very specific circumstances.

Parquet scanner memory bug: I/O buffer is attached to output batch while scratch batch rows still reference it

Impala queries may return incorrect results when scanning plain-encoded string columns in uncompressed Parquet files. I/O buffers holding the string data are prematurely freed, leading to invalid memory reads and possibly non-deterministic results. This does not affect Parquet files that use a compression codec such as Snappy. Snappy is both strongly recommended generally and the default choice for Impala-written Parquet files.

How to determine whether a query might be affected:

  • The query must reference STRING columns from a Parquet table.
  • A selective filter on the Parquet table makes this issue more likely.
  • Identify any uncompressed Parquet files processed by the query. Examine the HDFS_SCAN_NODE portion of a query profile that scans the suspected table. Use a query that performs a full table scan, and materializes the column values. (For example, SELECT MIN(colname) FROM tablename.) Look for "File Formats". A value containing PARQUET/NONE means uncompressed Parquet.
  • Identify any plain-encoded string columns in the associated table. Pay special attention to tables containing Parquet files generated through Hive, Spark, or other mechanisms outside of Impala, because Impala uses Snappy compression by default for Parquet files. Use parquet-tools to dump the file metadata. Note that a column could have several encodings within the same file (the column data is stored in several column chunks). Look for VLE:PLAIN in the output of parquet-tools, which means the values are plain encoded.

Bug: IMPALA-4539

Severity: High

Resolution: Fixed in Impala 2.8.0.

Workaround: Use Snappy or another compression codec for Parquet files.

ABS(n) where n is the lowest bound for the int types returns negative values

If the abs() function evaluates a number that is right at the lower bound for an integer data type, the positive result cannot be represented in the same type, and the result is returned as a negative number. For example, abs(-128) returns -128 because the argument is interpreted as a TINYINT and the return value is also a TINYINT.

Bug: IMPALA-4513

Severity: High

Workaround: Cast the integer value to a larger type. For example, rewrite abs(tinyint_col) as abs(cast(tinyint_col as smallint)).
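The wraparound and the effect of the cast can be modeled outside Impala. The following Python sketch is illustrative only: the helper names are hypothetical, and it assumes fixed-width two's-complement integers.

```python
def wrap_to_bits(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a signed two's-complement integer."""
    value &= (1 << bits) - 1
    if value >= 1 << (bits - 1):
        value -= 1 << bits
    return value


def abs_fixed_width(n: int, bits: int) -> int:
    """abs() whose result must fit in the same fixed-width signed type."""
    return wrap_to_bits(abs(n), bits)
```

With 8 bits (TINYINT), abs_fixed_width(-128, 8) yields -128, because +128 is not representable. With 16 bits (SMALLINT), abs_fixed_width(-128, 16) yields 128, which is why widening the type avoids the problem.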

Java udf expression returning string in group by can give incorrect results.

If the GROUP BY clause included a call to a Java UDF that returned a string value, the UDF could return an incorrect result.

Bug: IMPALA-4266

Severity: High

Resolution: Fixed in Impala 2.8 and higher.

Workaround: Rewrite the expression to concatenate the results of the Java UDF with an empty string call. For example, rewrite my_hive_udf() as concat(my_hive_udf(), '').

Incorrect assignment of NULL checking predicate through an outer join of a nested collection.

A query could return wrong results (too many or too few NULL values) if it referenced an outer-joined nested collection and also contained a null-checking predicate (IS NULL, IS NOT NULL, or the <=> operator) in the WHERE clause.

Bug: IMPALA-3084

Severity: High

Resolution: Fixed in Impala 2.7.0.

Incorrect result due to constant evaluation in query with outer join

An OUTER JOIN query could omit some expected result rows due to a constant such as FALSE in another join clause. For example:

explain SELECT 1 FROM alltypestiny a1
  INNER JOIN alltypesagg a2 ON a1.smallint_col = a2.year AND false
  RIGHT JOIN alltypes a3 ON a1.year = a1.bigint_col;
| Explain String                                          |
| Estimated Per-Host Requirements: Memory=1.00KB VCores=1 |
|                                                         |
| 00:EMPTYSET                                             |

Bug: IMPALA-3094

Severity: High


Incorrect assignment of an inner join On-clause predicate through an outer join.

Impala may return incorrect results for queries that have the following properties:

  • There is an INNER JOIN following a series of OUTER JOINs.

  • The INNER JOIN has an On-clause with a predicate that references at least two tables that are on the nullable side of the preceding OUTER JOINs.

The following query demonstrates the issue:

select 1 from functional.alltypes a left outer join
  functional.alltypes b on = left outer join
  functional.alltypes c on = right outer join
  functional.alltypes d on = inner join functional.alltypes e
on b.int_col = c.int_col;

The following listing shows the incorrect EXPLAIN plan:

| Explain String                                            |
| Estimated Per-Host Requirements: Memory=480.04MB VCores=4 |
|                                                           |
| 14:EXCHANGE [UNPARTITIONED]                               |
| |                                                         |
| |                                                         |
| |--13:EXCHANGE [BROADCAST]                                |
| |  |                                                      |
| |  04:SCAN HDFS [functional.alltypes e]                   |
| |     partitions=24/24 files=24 size=478.45KB             |
| |                                                         |
| |  hash predicates: =                           |
| |  runtime filters: RF000 <-                         |
| |                                                         |
| |--12:EXCHANGE [HASH(]                               |
| |  |                                                      |
| |  03:SCAN HDFS [functional.alltypes d]                   |
| |     partitions=24/24 files=24 size=478.45KB             |
| |                                                         |
| |  hash predicates: =                           |
| |  other predicates: b.int_col = c.int_col     <--- incorrect placement; should be at node 07 or 08
| |  runtime filters: RF001 <- c.int_col                    |
| |                                                         |
| |--11:EXCHANGE [HASH(]                               |
| |  |                                                      |
| |  02:SCAN HDFS [functional.alltypes c]                   |
| |     partitions=24/24 files=24 size=478.45KB             |
| |     runtime filters: RF000 ->                      |
| |                                                         |
| |  hash predicates: =                           |
| |  runtime filters: RF002 <-                         |
| |                                                         |
| |--10:EXCHANGE [HASH(]                               |
| |  |                                                      |
| |  00:SCAN HDFS [functional.alltypes a]                   |
| |     partitions=24/24 files=24 size=478.45KB             |
| |                                                         |
| 09:EXCHANGE [HASH(]                                  |
| |                                                         |
| 01:SCAN HDFS [functional.alltypes b]                      |
|    partitions=24/24 files=24 size=478.45KB                |
|    runtime filters: RF001 -> b.int_col, RF002 ->     |

Bug: IMPALA-3126

Severity: High

Resolution: Fixed in Impala 2.8.0.

Workaround: For some queries, this problem can be worked around by placing the problematic ON clause predicate in the WHERE clause instead, or changing the preceding OUTER JOINs to INNER JOINs (if the ON clause predicate would discard NULLs). For example, to fix the problematic query above:

select 1 from functional.alltypes a
  left outer join functional.alltypes b
    on =
  left outer join functional.alltypes c
    on =
  right outer join functional.alltypes d
    on =
  inner join functional.alltypes e
where b.int_col = c.int_col

| Explain String                                            |
| Estimated Per-Host Requirements: Memory=480.04MB VCores=4 |
|                                                           |
| 14:EXCHANGE [UNPARTITIONED]                               |
| |                                                         |
| |                                                         |
| |--13:EXCHANGE [BROADCAST]                                |
| |  |                                                      |
| |  04:SCAN HDFS [functional.alltypes e]                   |
| |     partitions=24/24 files=24 size=478.45KB             |
| |                                                         |
| |  hash predicates: =                           |
| |  other predicates: b.int_col = c.int_col          <-- correct assignment
| |  runtime filters: RF000 <-                         |
| |                                                         |
| |--12:EXCHANGE [HASH(]                               |
| |  |                                                      |
| |  03:SCAN HDFS [functional.alltypes d]                   |
| |     partitions=24/24 files=24 size=478.45KB             |
| |                                                         |
| |  hash predicates: =                           |
| |                                                         |
| |--11:EXCHANGE [HASH(]                               |
| |  |                                                      |
| |  02:SCAN HDFS [functional.alltypes c]                   |
| |     partitions=24/24 files=24 size=478.45KB             |
| |     runtime filters: RF000 ->                      |
| |                                                         |
| |  hash predicates: =                           |
| |  runtime filters: RF001 <-                         |
| |                                                         |
| |--10:EXCHANGE [HASH(]                               |
| |  |                                                      |
| |  00:SCAN HDFS [functional.alltypes a]                   |
| |     partitions=24/24 files=24 size=478.45KB             |
| |                                                         |
| 09:EXCHANGE [HASH(]                                  |
| |                                                         |
| 01:SCAN HDFS [functional.alltypes b]                      |
|    partitions=24/24 files=24 size=478.45KB                |
|    runtime filters: RF001 ->                         |

Impala may use incorrect bit order with BIT_PACKED encoding

Parquet BIT_PACKED encoding as implemented by Impala is LSB first. The parquet standard says it is MSB first.
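The difference between the two bit orders is easiest to see by packing the same values both ways. The following Python sketch is a simplified illustration with hypothetical function names; it is not the Impala or Parquet implementation, and it assumes every value fits in bit_width bits.

```python
def pack_lsb_first(values, bit_width: int) -> bytes:
    """Pack values starting at the least significant bit of each byte."""
    acc = 0
    for i, v in enumerate(values):
        acc |= v << (i * bit_width)        # value i starts at bit i * bit_width
    nbytes = (len(values) * bit_width + 7) // 8
    return acc.to_bytes(nbytes, "little")


def pack_msb_first(values, bit_width: int) -> bytes:
    """Pack values starting at the most significant bit of each byte."""
    acc = 0
    for v in values:
        acc = (acc << bit_width) | v       # append each value after the previous one
    total_bits = len(values) * bit_width
    pad = (-total_bits) % 8                # zero-pad the final byte on the right
    return (acc << pad).to_bytes((total_bits + pad) // 8, "big")
```

Packing the values 0 through 7 with a bit width of 3 produces the bytes 0x88 0xC6 0xFA in LSB-first order but 0x05 0x39 0x77 in MSB-first order, so a reader that assumes the other order decodes different values.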

Bug: IMPALA-3006

Severity: High, but rare in practice because BIT_PACKED is infrequently used, is not written by Impala, and is deprecated in Parquet 2.0.

BST between 1972 and 1995

The calculation of start and end times for the BST (British Summer Time) time zone could be incorrect between 1972 and 1995. Between 1972 and 1995, BST began and ended at 02:00 GMT on the third Sunday in March (or second Sunday when Easter fell on the third) and fourth Sunday in October. For example, both function calls should return 13, but actually return 12, in a query such as:

  select
    extract(from_utc_timestamp(cast('1970-01-01 12:00:00' as timestamp), 'Europe/London'), "hour") summer70start,
    extract(from_utc_timestamp(cast('1970-12-31 12:00:00' as timestamp), 'Europe/London'), "hour") summer70end;

Bug: IMPALA-3082

Severity: High

parse_url() returns incorrect result if @ character in URL

If a URL contains an @ character, the parse_url() function could return an incorrect value for the hostname field.


Resolution: Fixed in Impala 2.5.0 and Impala 2.3.4.

% escaping does not work correctly when it occurs at the end in a LIKE clause

If the final character in the RHS argument of a LIKE operator is an escaped \% character, it does not match a final % character in the LHS argument.

Bug: IMPALA-2422

ORDER BY rand() does not work.

Because the value for rand() is computed early in a query, using an ORDER BY expression involving a call to rand() does not actually randomize the results.

Bug: IMPALA-397

Duplicated column in inline view causes dropping null slots during scan

If the same column is queried twice within a view, NULL values for that column are omitted. For example, the result of COUNT(*) on the view could be less than expected.

Bug: IMPALA-2643

Workaround: Avoid selecting the same column twice within an inline view.

Resolution: Fixed in Impala 2.5.0, Impala 2.3.2, and Impala 2.2.10.

Incorrect assignment of predicates through an outer join in an inline view.

A query involving an OUTER JOIN clause where one of the table references is an inline view might apply predicates from the ON clause incorrectly.

Bug: IMPALA-1459

Resolution: Fixed in Impala 2.5.0, Impala 2.3.2, and Impala 2.2.9.

Crash: impala::Coordinator::ValidateCollectionSlots

A query could encounter a serious error if it includes multiple nested levels of INNER JOIN clauses involving subqueries.

Bug: IMPALA-2603

Incorrect assignment of On-clause predicate inside inline view with an outer join.

A query might return incorrect results due to wrong predicate assignment in the following scenario:

  1. There is an inline view that contains an outer join
  2. That inline view is joined with another table in the enclosing query block
  3. That join has an On-clause containing a predicate that only references columns originating from the outer-joined tables inside the inline view

Bug: IMPALA-2665

Resolution: Fixed in Impala 2.5.0, Impala 2.3.2, and Impala 2.2.9.

Wrong assignment of having clause predicate across outer join

In an OUTER JOIN query with a HAVING clause, the comparison from the HAVING clause might be applied at the wrong stage of query processing, leading to incorrect results.

Bug: IMPALA-2144

Resolution: Fixed in Impala 2.5.0.

Wrong plan of NOT IN aggregate subquery when a constant is used in subquery predicate

A NOT IN operator with a subquery that calls an aggregate function, such as NOT IN (SELECT SUM(...)), could return incorrect results.

Bug: IMPALA-2093

Resolution: Fixed in Impala 2.5.0 and Impala 2.3.4.

Impala Known Issues: Metadata

These issues affect how Impala interacts with metadata. They cover areas such as the metastore database, the COMPUTE STATS statement, and the Impala catalogd daemon.

Catalogd may crash when loading metadata for tables with many partitions, many columns and with incremental stats

Incremental stats use about 400 bytes per partition for each column. For example, for a table with 20K partitions and 100 columns, the memory overhead from incremental statistics is about 800 MB. When serialized for transmission across the network, this metadata can exceed the 2 GB Java array size limit, which leads to a catalogd crash.

Bugs: IMPALA-2647, IMPALA-2648, IMPALA-2649

Workaround: If feasible, compute full stats periodically and avoid computing incremental stats for that table. The scalability of incremental stats computation is a continuing work item.
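
The estimate above can be reproduced with a few lines of Python. This is a rough sketch under the assumptions stated in the text (about 400 bytes per partition per column); the function name is hypothetical, and the serialized size that catalogd actually sends can differ from this raw figure.

```python
BYTES_PER_PARTITION_COLUMN = 400      # approximate figure quoted in the text
JAVA_ARRAY_LIMIT = 2**31 - 1          # maximum length of a Java byte array


def incremental_stats_bytes(num_partitions, num_columns,
                            bytes_per_cell=BYTES_PER_PARTITION_COLUMN):
    """Rough in-memory overhead of incremental stats for one table."""
    return num_partitions * num_columns * bytes_per_cell


estimate = incremental_stats_bytes(20_000, 100)
print(estimate / 1e6, "MB")                       # 800.0 MB
print("over raw array limit:", estimate > JAVA_ARRAY_LIMIT)
```

Running the same calculation for the largest partitioned tables gives a quick indication of which ones are candidates for full rather than incremental stats.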

Can't update stats manually via alter table after upgrading to Impala 2.0

Bug: IMPALA-1420

Workaround: On Impala 2.0, when adjusting table statistics manually by setting the numRows property, you must also enable the Boolean property STATS_GENERATED_VIA_STATS_TASK. For example, use a statement like the following to set both properties with a single ALTER TABLE statement:

ALTER TABLE table_name SET TBLPROPERTIES('numRows'='new_value', 'STATS_GENERATED_VIA_STATS_TASK' = 'true');

Resolution: The underlying cause is the issue HIVE-8648 that affects the metastore in Hive 0.13.

Impala Known Issues: Interoperability

These issues affect the ability to interchange data between Impala and other database systems. They cover areas such as data types and file formats.

DESCRIBE FORMATTED gives error on Avro table

This issue can occur either on old Avro tables (created prior to Hive 1.1) or when changing the Avro schema file by adding or removing columns. Columns added to the schema file will not show up in the output of the DESCRIBE FORMATTED command. Removing columns from the schema file will trigger a NullPointerException.

As a workaround, you can use the output of SHOW CREATE TABLE to drop and recreate the table. This will populate the Hive metastore database with the correct column definitions.

Use this approach only for external tables; otherwise, Impala removes the data files when the table is dropped. For an internal table, set it to external first:

ALTER TABLE table_name SET TBLPROPERTIES('EXTERNAL'='TRUE');

(The part in parentheses is case sensitive.) Make sure to pick the right choice between internal and external when recreating the table. See Overview of Impala Tables for the differences between internal and external tables.

Severity: High

Deviation from Hive behavior: Impala does not do implicit casts between string and numeric or boolean types.

Anticipated Resolution: None

Workaround: Use explicit casts.

Deviation from Hive behavior: Out of range float/double values are returned as maximum allowed value of type (Hive returns NULL)

Impala behavior differs from Hive with respect to out of range float/double values. Impala returns out of range values as the maximum allowed value of the type, whereas Hive returns NULL.

Workaround: None

Configuration needed for Flume to be compatible with Impala

For compatibility with Impala, the value for the Flume HDFS Sink hdfs.writeFormat must be set to Text, rather than its default value of Writable. The hdfs.writeFormat setting must be changed to Text before creating data files with Flume; otherwise, those files cannot be read by either Impala or Hive.

Resolution: This information has been requested to be added to the upstream Flume documentation.

Avro Scanner fails to parse some schemas

Querying certain Avro tables could cause a crash or return no rows, even though Impala could DESCRIBE the table.

Bug: IMPALA-635

Workaround: Swap the order of the fields in the schema specification. For example, ["null", "string"] instead of ["string", "null"].

Resolution: Not allowing this syntax agrees with the Avro specification, so it may still cause an error even when the crashing issue is resolved.
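
For schemas with many nullable fields, the reordering described in the workaround can be scripted. The following Python sketch is one possible approach, not a tool shipped with Impala: the function name null_first is hypothetical, and it only touches unions that appear as the value of a "type" key.

```python
import json


def null_first(node):
    """Return a copy of an Avro schema (parsed JSON) in which every union
    found under a "type" key lists "null" before its other branches."""
    if isinstance(node, dict):
        out = {key: null_first(value) for key, value in node.items()}
        branches = out.get("type")
        if isinstance(branches, list):
            # Stable sort: "null" moves to the front, other branches keep their order.
            out["type"] = sorted(branches, key=lambda b: b != "null")
        return out
    if isinstance(node, list):
        return [null_first(item) for item in node]
    return node


schema = json.loads("""
{"type": "record", "name": "example",
 "fields": [{"name": "s", "type": ["string", "null"]}]}
""")
print(json.dumps(null_first(schema)["fields"][0]["type"]))  # ["null", "string"]
```

Avro requires a field's default value to match the first branch of its union, so review any field that declares a non-null default before applying a reordered schema.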

Impala BE cannot parse Avro schema that contains a trailing semi-colon

If an Avro table has a schema definition with a trailing semicolon, Impala encounters an error when the table is queried.

Bug: IMPALA-1024

Workaround: Remove the trailing semicolon from the Avro schema.
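
Removing the trailing semicolon can be done by hand or with a small script. The Python sketch below is a hypothetical helper, not part of Impala: it strips one trailing semicolon from the schema text and then checks that the remainder parses as JSON.

```python
import json


def strip_trailing_semicolon(schema_text):
    """Remove a single trailing semicolon from Avro schema text and
    verify that what remains is valid JSON."""
    cleaned = schema_text.rstrip()
    if cleaned.endswith(";"):
        cleaned = cleaned[:-1].rstrip()
    json.loads(cleaned)   # raises ValueError if the schema is still not valid JSON
    return cleaned


print(strip_trailing_semicolon('{"type": "string"};\n'))  # {"type": "string"}
```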

Fix decompressor to allow parsing gzips with multiple streams

Currently, Impala can only read gzipped files containing a single stream. If a gzipped file contains multiple concatenated streams, the Impala query only processes the data from the first stream.

Bug: IMPALA-2154

Workaround: Use a different gzip tool to compress the file into a single-stream file.

Resolution: Fixed in Impala 2.5.0.
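
The following Python sketch illustrates both the symptom and one way to produce a single-stream file, using only the standard gzip and zlib modules. The function names are hypothetical, and the sketch holds the whole file in memory, so treat it as an illustration rather than a tool for large files.

```python
import gzip
import zlib


def first_member_only(data):
    """Decompress only the first gzip stream, mimicking a reader that
    stops at the end of the first stream."""
    decompressor = zlib.decompressobj(wbits=zlib.MAX_WBITS | 16)
    return decompressor.decompress(data)


def to_single_stream(data):
    """Recompress possibly multi-stream gzip data into one stream.
    gzip.decompress() reads all concatenated streams."""
    return gzip.compress(gzip.decompress(data))


multi = gzip.compress(b"row1\n") + gzip.compress(b"row2\n")
print(first_member_only(multi))                    # b'row1\n'
print(first_member_only(to_single_stream(multi)))  # b'row1\nrow2\n'
```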

Impala incorrectly handles text data when the new line character \n\r is split between different HDFS blocks

If a carriage return / newline pair of characters in a text table is split between HDFS data blocks, Impala incorrectly processes the row following the \n\r pair twice.

Bug: IMPALA-1578

Workaround: Use the Parquet format for large volumes of data where practical.

Resolution: Fixed in Impala 2.6.0.

Invalid bool value not reported as a scanner error

In some cases, an invalid BOOLEAN value read from a table does not produce a warning message about the bad value. The result is still NULL as expected. Therefore, this is not a query correctness issue, but it could lead to overlooking the presence of invalid data.

Bug: IMPALA-1862

Resolution: Fixed in Impala 2.8.0.

Incorrect results with basic predicate on CHAR typed column.

When comparing a CHAR column value to a string literal, the literal value is not blank-padded and so the comparison might fail when it should match.

Bug: IMPALA-1652

Workaround: Use the RPAD() function to blank-pad literals compared with CHAR columns to the expected length.

Impala Known Issues: Limitations

These issues are current limitations of Impala that require evaluation as you plan how to integrate Impala into your data management workflow.

Set limits on size of expression trees

Very deeply nested expressions within queries can exceed internal Impala limits, leading to excessive memory usage.

Bug: IMPALA-4551

Severity: High


Workaround: Avoid queries with extremely large expression trees. Setting the query option disable_codegen=true may reduce the impact, at a cost of longer query runtime.

Impala does not support running on clusters with federated namespaces

Impala does not support running on clusters with federated namespaces. The impalad process will not start on a node running such a filesystem based on the org.apache.hadoop.fs.viewfs.ViewFs class.

Bug: IMPALA-77

Resolution: Fixed in Impala 1.0.

Workaround: Use standard HDFS on all Impala nodes.

Impala Known Issues: Miscellaneous / Older Issues

These issues do not fall into one of the above categories or have not been categorized yet.

A failed CTAS does not drop the table if the insert fails.

If a CREATE TABLE AS SELECT operation successfully creates the target table but an error occurs while querying the source table or copying the data, the new table is left behind rather than being dropped.

Bug: IMPALA-2005

Workaround: Drop the new table manually after a failed CREATE TABLE AS SELECT.

Casting scenarios with invalid/inconsistent results

Using a CAST() function to convert large literal values to smaller types, or to convert special values such as NaN or Inf, produces values not consistent with other database systems. This could lead to unexpected results from queries.

Bug: IMPALA-1821

Support individual memory allocations larger than 1 GB

The largest single block of memory that Impala can allocate during a query is 1 GiB. Therefore, a query could fail or Impala could crash if a compressed text file resulted in more than 1 GiB of data in uncompressed form, or if a string function such as group_concat() returned a value greater than 1 GiB.

Bug: IMPALA-1619

Resolution: Fixed in Impala 2.7.0 and Impala 2.6.3.

Impala Parser issue when using fully qualified table names that start with a number.

A fully qualified table name starting with a number could cause a parsing error. In a name such as db.571_market, the decimal point followed by digits is interpreted as a floating-point number.

Bug: IMPALA-941

Workaround: Surround each part of the fully qualified name with backticks (``).
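
When SQL statements are generated by a script, the quoting can be applied automatically. The Python helper below is a hypothetical sketch: it assumes the name is a simple dot-separated identifier whose parts contain no dots or backticks of their own.

```python
def quote_qualified_name(name):
    """Wrap each dot-separated part of a table name in backticks."""
    return ".".join(f"`{part}`" for part in name.split("."))


print(quote_qualified_name("db.571_market"))  # `db`.`571_market`
```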

Impala should tolerate bad locale settings

If the LC_* environment variables specify an unsupported locale, Impala does not start.

Bug: IMPALA-532

Workaround: Add LC_ALL="C" to the environment settings for both the Impala daemon and the Statestore daemon. See Modifying Impala Startup Options for details about modifying these environment settings.

Resolution: Fixing this issue would require an upgrade to Boost 1.47 in the Impala distribution.

Log Level 3 Not Recommended for Impala

The extensive logging produced by log level 3 can cause serious performance overhead and capacity issues.

Workaround: Reduce the log level to its default value of 1, that is, GLOG_v=1. See Setting Logging Levels for details about the effects of setting different logging levels.