Fixed Issues in Apache Impala

Issues Fixed in Impala 4.0

For the full list of issues closed in this release, including bug fixes, see the changelog for Impala 4.0.

Issues Fixed in Impala 3.4

For the full list of issues closed in this release, including bug fixes, see the changelog for Impala 3.4.

Issues Fixed in Impala 3.3

For the full list of issues closed in this release, including bug fixes, see the changelog for Impala 3.3.

Issues Fixed in Impala 3.2

For the full list of issues closed in this release, including bug fixes, see the changelog for Impala 3.2.

The following is a list of noteworthy issues fixed in Impala 3.2:

IMPALA-341 - Remote profiles are no longer ignored by the coordinator for the queries with the LIMIT clause.
IMPALA-941 - Impala supports fully qualified table names that start with a number.
IMPALA-1048 - The query execution summary now includes the total time taken and memory consumed by the data sink at the root of each query fragment.
IMPALA-3323 - Fixed the issue where valid impala-shell options, such as --ldap_password_cmd, were unrecognized when the --config_file option was specified.
IMPALA-5397 - If a query has a dedicated coordinator, its end time is now set when the query releases its admission control resources. With no dedicated coordinator, the end time is set on un-registration.
IMPALA-5474 - Fixed an issue where adding a trivial subquery to a query with an error turned the error into a warning.
IMPALA-6521 - When set, experimental flags are now shown in /varz in web UI and log files.
IMPALA-6900 - INVALIDATE METADATA operation is no longer ignored when HMS is empty.
IMPALA-7446 - Impala enables buffer pool garbage collection when near the process memory limit to prevent queries from spilling to disk earlier than necessary.
IMPALA-7659 - In COMPUTE STATS, Impala counts the number of NULL values in a table.
IMPALA-7857 - Impala now logs more information about StateStore failure detection.
IMPALA-7928 - To increase the efficiency of the HDFS file handle cache, remote reads for a particular file are scheduled to a consistent set of executor nodes.
IMPALA-7929 - Impala queries on tables created via Hive and mapped to HBase failed with an internal exception because the qualifier of the HBase key column is null in the mapped table. Impala now relaxes the requirement and allows a NULL qualifier.
IMPALA-7960 - Impala now returns a correct result when comparing TIMESTAMP to a string literal in a binary predicate where the TIMESTAMP is cast to a VARCHAR of smaller length.
IMPALA-7961 - Fixed an issue where queries running with the SYNC_DDL query option can fail when the Catalog Server is under a heavy load with concurrent catalog operations of long-running DDLs.
IMPALA-8026 - Impala query profile now reports correct row counts for all nested loop join modes.
IMPALA-8061 - Impala correctly initializes S3_ACCESS_VALIDATED variable to zero when TARGET_FILESYSTEM=s3.
IMPALA-8154 - Disabled the Kerberos auth_to_local setting to prevent connection issues between impalads.
IMPALA-8188 - Impala now correctly detects an NVME device name and handles it.
IMPALA-8245 - Added hostname to the timeout error message to enable the user to easily identify the host which has reached a bad connection state with the HDFS NameNode.
IMPALA-8254 - Fixed an issue where COMPUTE STATS failed if COMPRESSION_CODEC was set.

Issues Fixed in Impala 3.1

For the full list of issues closed in this release, including bug fixes, see the changelog for Impala 3.1.

Issues Fixed in Impala 3.0

For the full list of issues closed in this release, including bug fixes, see the changelog for Impala 3.0.

Issues Fixed in Impala 2.12

For the full list of issues closed in this release, including bug fixes, see the changelog for Impala 2.12.

Issues Fixed in Impala 2.11

For the full list of issues closed in this release, including bug fixes, see the changelog for Impala 2.11.

Issues Fixed in Impala 2.10

For the full list of issues closed in this release, including bug fixes, see the changelog for Impala 2.10.

Issues Fixed in Impala 2.9.0

For the full list of issues closed in this release, including bug fixes, see the changelog for Impala 2.9.

Issues Fixed in Impala 2.8.0

For the full list of Impala fixed issues in Impala 2.8, see this report in the Impala JIRA tracker.

Issues Fixed in Impala 2.7.0

For the full list of Impala fixed issues in Impala 2.7.0, see this report in the Impala JIRA tracker.

Issues Fixed in Impala 2.6.3

Issues Fixed in Impala 2.6.2

Issues Fixed in Impala 2.6.0

The following list contains the most critical fixed issues (priority='Blocker') from the JIRA system. For the full list of fixed issues in Impala 2.6.0, see this report in the Impala JIRA tracker.

RuntimeState::error_log_ crashes

A crash could occur, with stack trace pointing to impala::RuntimeState::ErrorLog.

Bug: IMPALA-3385

Severity: High

HiveUdfCall::Open() produces unsynchronized access to JniUtil::global_refs_ vector

A crash could occur because of contention between multiple calls to Java UDFs.

Bug: IMPALA-3378

Severity: High

HBaseTableWriter::CreatePutList() produces unsynchronized access to JniUtil::global_refs_ vector

A crash could occur because of contention between multiple concurrent statements writing to HBase.

Bug: IMPALA-3379

Severity: High

Stress test failure: sorter.cc:745] Check failed: i == 0 (1 vs. 0)

A crash or wrong results could occur if the spill-to-disk mechanism encountered a zero-length string at the very end of a data block.

Bug: IMPALA-3317

Severity: High

String data coming out of agg can be corrupted by blocking operators

If a query plan contains an aggregation node producing string values anywhere within a subplan (that is, if the aggregate function in the SQL statement appears within an inline view over a collection column), the results of the aggregation may be incorrect.

Bug: IMPALA-3311

Severity: High

CTAS with subquery throws AuthzException

A CREATE TABLE AS SELECT operation could fail with an authorization error, due to a slight difference in the privilege checking for the CTAS operation.

Bug: IMPALA-3269

Severity: High

Crash on inserting into table with binary and parquet

Impala incorrectly allowed BINARY to be specified as a column type, resulting in a crash during a write to a Parquet table with a column of that type.

Bug: IMPALA-3237

Severity: High

RowBatch::MaxTupleBufferSize() calculation incorrect, may lead to memory corruption

A crash could occur while querying tables with very large rows, for example wide tables with many columns or very large string values. This problem was identified in Impala 2.3, but had low reproducibility in subsequent releases. The fix ensures the memory allocation size is correct.

Bug: IMPALA-3105

Severity: High

Thrift buffer overflows when serialize more than 3355443200 bytes in impala

A very large memory allocation within the catalogd daemon could exceed an internal Thrift limit, causing a crash.

Bug: IMPALA-3494

Severity: High

Altering table partition's storage format is not working and crashing the daemon

If a partitioned table used a file format other than Avro, and the file format of an individual partition was changed to Avro, subsequent queries could encounter a crash.

Bug: IMPALA-3314

Severity: High

Race condition may cause scanners to spin with runtime filters on Avro or Sequence files

A timing problem during runtime filter processing could cause queries against Avro or SequenceFile tables to hang.

Bug: IMPALA-3798

Severity: High

Issues Fixed in Impala 2.5.4

Issues Fixed in Impala 2.5.2

Issues Fixed in Impala 2.5.1

Issues Fixed in Impala 2.5.0

The following list contains the most critical issues (priority='Blocker') from the JIRA system. For the full list of fixed issues in Impala 2.5, see this report in the Impala JIRA tracker.

Stress test hit assert in LLVM: external function could not be resolved

Bug: IMPALA-2683

The stress test was running a build with the TPC-H, TPC-DS, and TPC-H nested queries with scale factor 3.

Impalad is crashing if udf jar is not available in hdfs location for first time

Bug: IMPALA-2365

If a UDF JAR was not available in the HDFS location specified in the CREATE FUNCTION statement, the impalad daemon could crash.

PAGG hits mem_limit when switching to I/O buffers

Bug: IMPALA-2535

A join query could fail with an out-of-memory error despite the apparent presence of sufficient memory. The cause was the internal ordering of operations that could cause a later phase of the query to allocate memory required by an earlier phase of the query. The workaround was to either increase or decrease the MEM_LIMIT query option, because the issue would only occur for a specific combination of memory limit and data volume.

Prevent migrating incorrectly inferred identity predicates into inline views

Bug: IMPALA-2643

Referring to the same column twice in a view definition could cause the view to omit rows where that column contained a NULL value. This could cause incorrect results due to an inaccurate COUNT(*) value or rows missing from the result set.
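
The effect on NULL rows can be illustrated outside Impala. The following is a minimal sketch, assuming the inferred predicate had the form c = c; the text above does not give the exact predicate, and the helper name sql_equals is hypothetical. It models SQL comparison semantics in Python, with None standing in for NULL, to show why such a predicate is not a no-op.

```python
def sql_equals(a, b):
    # SQL comparison: if either side is NULL (None here), the result is NULL.
    if a is None or b is None:
        return None
    return a == b

rows = [1, 2, None, 4]  # values of one column; None stands for SQL NULL

# A WHERE-style filter keeps a row only when the predicate is exactly TRUE.
kept = [c for c in rows if sql_equals(c, c) is True]

print(len(rows), len(kept))  # 4 3: the NULL row is dropped by "c = c"
```

Because NULL = NULL evaluates to NULL rather than TRUE, pushing such a predicate into a view removes the NULL rows, which is consistent with the lower COUNT(*) values described above.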

Fix migration/assignment of On-clause predicates inside inline views

Bug: IMPALA-1459

Some combinations of ON clauses in join queries could result in comparisons being applied at the wrong stage of query processing, leading to incorrect results. Wrong predicate assignment could happen under the following conditions:

The query includes an inline view that contains an outer join.
That inline view is joined with another table in the enclosing query block.
That join has an ON clause containing a predicate that only references columns originating from the outer-joined tables inside the inline view.

Wrong plan of NOT IN aggregate subquery when a constant is used in subquery predicate

Bug: IMPALA-2093

IN subqueries might return wrong results if the left-hand side of the IN is a constant. For example:


select * from alltypestiny t1
  where 10 not in (select sum(int_col) from alltypestiny);

Parquet DictDecoders accumulate throughout query

Bug: IMPALA-2940

Parquet dictionary decoders can accumulate throughout query execution, leading to excessive memory usage. One decoder is created per-column per-split.

Planner doesn't set the has_local_target field correctly

Bug: IMPALA-3056

MemPool allocation growth behavior

Bug: IMPALA-2742

Previously, the MemPool always doubled the size of the last allocation. This could lead to bad behavior if the MemPool had transferred ownership of all its data except the last chunk: the next allocated chunk would be double the size of that large chunk, which can be undesirable.
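
The growth pattern described above can be reproduced with a toy model. This is only a sketch under stated assumptions: ToyPool, its methods, and the 4096-byte initial size are hypothetical and are not Impala's MemPool implementation. The model grows by doubling the capacity of the most recent chunk, which is the behavior the issue describes.

```python
class ToyPool:
    """Toy model of a chunked memory pool; NOT Impala's MemPool."""

    def __init__(self, initial_chunk_size=4096):
        self.initial_chunk_size = initial_chunk_size
        self.chunks = []  # each chunk is [capacity, used]

    def allocate(self, size):
        # Reuse the last chunk if the request still fits in it.
        if self.chunks and self.chunks[-1][0] - self.chunks[-1][1] >= size:
            self.chunks[-1][1] += size
            return
        # Otherwise grow: double the capacity of the most recent chunk.
        last = self.chunks[-1][0] if self.chunks else self.initial_chunk_size // 2
        capacity = max(size, 2 * last)
        self.chunks.append([capacity, size])

    def transfer_all_but_last(self):
        # Hand every chunk except the most recent one to another owner.
        self.chunks = self.chunks[-1:]


pool = ToyPool()
pool.allocate(1 << 20)        # one large request fills a 1 MiB chunk
pool.transfer_all_but_last()  # only the large, full chunk remains
pool.allocate(16)             # a tiny request does not fit anywhere

print([capacity for capacity, _ in pool.chunks])  # [1048576, 2097152]
```

In this model a 16-byte request triggers a 2 MiB chunk, because the growth rule looks only at the size of the last chunk and not at how much memory the pool still needs.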

Drop partition operations don't follow the catalog's locking protocol

Bug: IMPALA-3035

The CatalogOpExecutor.alterTableDropPartition() function violates the locking protocol used in the catalog that requires catalogLock_ to be acquired before any table-level lock. That may cause deadlocks when ALTER TABLE DROP PARTITION is executed concurrently with other DDL operations.
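
The locking rule can be sketched generically. The example below is an illustration of consistent lock ordering using Python threads, not Impala's catalog code; catalog_lock, table_lock, and ddl_operation are hypothetical stand-ins, and the sketch simply holds both locks for the duration of the operation.

```python
import threading

catalog_lock = threading.Lock()  # stands in for the catalog-wide lock
table_lock = threading.Lock()    # stands in for one table's lock
log = []

def ddl_operation(name):
    # Every operation takes the catalog-wide lock first, then the table lock.
    # Because all threads use the same order, no thread can hold the table
    # lock while waiting for the catalog lock, so no lock-order deadlock forms.
    with catalog_lock:
        with table_lock:
            log.append(name)

threads = [threading.Thread(target=ddl_operation, args=(n,))
           for n in ("drop_partition", "other_ddl")]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(log))  # ['drop_partition', 'other_ddl']
```

A deadlock becomes possible as soon as one code path acquires the two locks in the opposite order, which is the kind of protocol violation this issue describes.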

HAVING clause without aggregation not applied properly

Bug: IMPALA-2215

A query with a HAVING clause but no GROUP BY clause was not being rejected, despite being invalid syntax. For example:


select case when 1=1 then 'didit' end as c1 from (select 1 as one) a having 1!=1;

Hit DCHECK Check failed: HasDateOrTime()

Bug: IMPALA-2914

TimestampValue::ToTimestampVal() requires a valid TimestampValue as input. This requirement was not enforced in some places, leading to serious errors.

Aggregation spill loop gives up too early leading to mem limit exceeded errors

Bug: IMPALA-2986

An aggregation query could fail with an out-of-memory error, despite sufficient memory being reported as available.

DataStreamSender::Channel::CloseInternal() does not close the channel on an error.

Bug: IMPALA-2592

Some queries do not close an internal communication channel on an error. This will cause the node on the other side of the channel to wait indefinitely, causing the query to hang. For example, this issue could happen on a Kerberos-enabled system if the credential cache was outdated. Although the affected query hangs, the impalad daemons continue processing other queries.

Codegen does not catch exceptions in FROM_UNIXTIME()

Bug: IMPALA-2184

Querying for the min or max value of a timestamp cast from a bigint via from_unixtime() fails silently and crashes instances of impalad when the input includes a value outside of the valid range.

Workaround: Disable native code generation with:


SET disable_codegen=true;

Impala returns wrong result for function 'conv(bigint, from_base, to_base)'

Bug: IMPALA-2788

Impala returns a wrong result for the conv() function. The function conv(bigint, from_base, to_base) returns an incorrect result, while conv(string, from_base, to_base) returns the correct value. For example:



select 2061013007, conv(2061013007, 16, 10), conv('2061013007', 16, 10);
+------------+--------------------------+----------------------------+
| 2061013007 | conv(2061013007, 16, 10) | conv('2061013007', 16, 10) |
+------------+--------------------------+----------------------------+
| 2061013007 | 1627467783               | 139066421255               |
+------------+--------------------------+----------------------------+
Fetched 1 row(s) in 0.65s

select 2061013007, conv(cast(2061013007 as bigint), 16, 10), conv('2061013007', 16, 10);
+------------+------------------------------------------+----------------------------+
| 2061013007 | conv(cast(2061013007 as bigint), 16, 10) | conv('2061013007', 16, 10) |
+------------+------------------------------------------+----------------------------+
| 2061013007 | 1627467783                               | 139066421255               |
+------------+------------------------------------------+----------------------------+

select 2061013007, conv(cast(2061013007 as string), 16, 10), conv('2061013007', 16, 10);
+------------+------------------------------------------+----------------------------+
| 2061013007 | conv(cast(2061013007 as string), 16, 10) | conv('2061013007', 16, 10) |
+------------+------------------------------------------+----------------------------+
| 2061013007 | 139066421255                             | 139066421255               |
+------------+------------------------------------------+----------------------------+

select 2061013007, conv(cast(cast(2061013007 as decimal(20,0)) as bigint), 16, 10), conv('2061013007', 16, 10);
+------------+-----------------------------------------------------------------+----------------------------+
| 2061013007 | conv(cast(cast(2061013007 as decimal(20,0)) as bigint), 16, 10) | conv('2061013007', 16, 10) |
+------------+-----------------------------------------------------------------+----------------------------+
| 2061013007 | 1627467783                                                      | 139066421255               |
+------------+-----------------------------------------------------------------+----------------------------+

Workaround: Cast the value to string and use conv(string, from_base, to_base) for conversion.
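
As a cross-check on which column is correct, the arithmetic can be reproduced outside Impala. The sketch below uses Python's built-in int() only as an independent calculator; it interprets the digits 2061013007 as a base-16 number, which is the conversion requested by conv('2061013007', 16, 10).

```python
# Interpret the digits "2061013007" as a base-16 number and print it in base 10.
digits = "2061013007"
as_decimal = int(digits, 16)
print(as_decimal)  # 139066421255, matching the conv(string, 16, 10) column above

# The same result, expanded digit by digit: sum of digit * 16**position.
expanded = sum(int(d, 16) * 16 ** i for i, d in enumerate(reversed(digits)))
assert expanded == as_decimal
```

The result agrees with the string-argument form of conv() in the output above, and differs from the value 1627467783 returned for the bigint-argument form.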

Issues Fixed in Impala 2.4.1

Issues Fixed in Impala 2.4.0

The set of fixes in Impala 2.4.0 is the same as in Impala 2.3.2.

Issues Fixed in Impala 2.3.4

Issues Fixed in Impala 2.3.2

This section lists the most serious or frequently encountered customer issues fixed in Impala 2.3.2.

SEGV in AnalyticEvalNode touching NULL input_stream_

A query involving an analytic function could encounter a serious error. This issue was encountered infrequently, depending upon specific combinations of queries and data.

Bug: IMPALA-2829

Free local allocations per row batch in non-partitioned AGG and HJ

An outer join query could fail unexpectedly with an out-of-memory error when the "spill to disk" mechanism was turned off.

Bug: IMPALA-2722

Free local allocations once for every row batch when building hash tables

A join query could encounter a serious error due to an internal failure to allocate memory, which resulted in dereferencing a NULL pointer.

Bug: IMPALA-2612

Prevent migrating incorrectly inferred identity predicates into inline views

Referring to the same column twice in a view definition could cause the view to omit rows where that column contained a NULL value. This could cause incorrect results due to an inaccurate COUNT(*) value or rows missing from the result set.

Bug: IMPALA-2643

Fix GRANTs on URIs with uppercase letters

A GRANT statement for a URI could be ineffective if the URI contained uppercase letters, for example in an uppercase directory name. Subsequent statements, such as CREATE EXTERNAL TABLE with a LOCATION clause, could fail with an authorization exception.

Bug: IMPALA-2695

Avoid sending large partition stats objects over thrift

The catalogd daemon could encounter a serious error when loading the incremental statistics metadata for tables with large numbers of partitions and columns. The problem occurred when the internal representation of metadata for the table exceeded 2 GB, for example in a table with 20K partitions and 77 columns. The fix causes a COMPUTE INCREMENTAL STATS operation to fail if it would produce metadata that exceeded the maximum size.

Bug: IMPALA-2664, IMPALA-2648

Throw AnalysisError if table properties are too large (for the Hive metastore)

CREATE TABLE or ALTER TABLE statements could fail with metastore database errors due to length limits on the SERDEPROPERTIES and TBLPROPERTIES clauses. (The limit on key size is 256, while the limit on value size is 4000.) The fix makes Impala handle these error conditions more cleanly, by detecting too-long values rather than passing them to the metastore database.

Bug: IMPALA-2226
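
The kind of up-front check described for this fix can be sketched as follows. This is a hypothetical illustration, not Impala's analyzer code: find_oversized_properties is an invented name, and treating the limits as inclusive character counts is an assumption, since the text above gives only the numbers 256 and 4000.

```python
MAX_KEY_LENGTH = 256     # limit on key size quoted in the text above
MAX_VALUE_LENGTH = 4000  # limit on value size quoted in the text above

def find_oversized_properties(properties):
    """Return a list of human-readable problems; empty if all entries fit."""
    problems = []
    for key, value in properties.items():
        if len(key) > MAX_KEY_LENGTH:
            problems.append(
                f"key too long ({len(key)} > {MAX_KEY_LENGTH}): {key[:20]}...")
        if len(value) > MAX_VALUE_LENGTH:
            problems.append(
                f"value for '{key[:20]}' too long "
                f"({len(value)} > {MAX_VALUE_LENGTH})")
    return problems

props = {"comment": "x" * 4001, "owner": "etl"}
print(find_oversized_properties(props))
```

Reporting the oversized entry before any metastore call is made is what turns a low-level database error into a clear analysis-time message.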

Make MAX_PAGE_HEADER_SIZE configurable

Impala could fail to access Parquet data files with page headers larger than 8 MB, which could occur, for example, if the minimum or maximum values for a column were long strings. The fix adds a configuration setting --max_page_header_size, which you can use to increase the Impala size limit to a value higher than 8 MB.

Bug: IMPALA-2273

reduce scanner memory usage

Queries on Parquet tables could consume excessive memory (potentially multiple gigabytes) due to producing large intermediate data values while evaluating groups of rows. The workaround was to reduce the size of the NUM_SCANNER_THREADS query option, the BATCH_SIZE query option, or both.

Bug: IMPALA-2473

Handle error when distinct and aggregates are used with a having clause

A query that included a DISTINCT operator and a HAVING clause, but no aggregate functions or GROUP BY, would fail with an uninformative error message.

Bug: IMPALA-2113

Handle error when star based select item and aggregate are incorrectly used

A query that included * in the SELECT list, in addition to an aggregate function call, would fail with an uninformative message if the query had no GROUP BY clause.

Bug: IMPALA-2225

Refactor MemPool usage in HBase scan node

Queries involving HBase tables used substantially more memory than in earlier Impala versions. The problem occurred starting in Impala 2.2.8, as a result of the changes for IMPALA-2284. The fix for this issue involves removing a separate memory work area for HBase queries and reusing other memory that was already allocated.

Bug: IMPALA-2731

Fix migration/assignment of On-clause predicates inside inline views

Some combinations of ON clauses in join queries could result in comparisons being applied at the wrong stage of query processing, leading to incorrect results. Wrong predicate assignment could happen under the following conditions:

The query includes an inline view that contains an outer join.
That inline view is joined with another table in the enclosing query block.
That join has an ON clause containing a predicate that only references columns originating from the outer-joined tables inside the inline view.

Bug: IMPALA-1459

DCHECK in parquet scanner after block read error

A debug build of Impala could encounter a serious error after encountering some kinds of I/O errors for Parquet files. This issue only occurred in debug builds, not release builds.

Bug: IMPALA-2558

PAGG hits mem_limit when switching to I/O buffers

A join query could fail with an out-of-memory error despite the apparent presence of sufficient memory. The cause was the internal ordering of operations that could cause a later phase of the query to allocate memory required by an earlier phase of the query. The workaround was to either increase or decrease the MEM_LIMIT query option, because the issue would only occur for a specific combination of memory limit and data volume.

Bug: IMPALA-2535

Fix check failed: sorter_runs_.back()->is_pinned_

A query could fail with an internal error while calculating the memory limit. This was an infrequent condition uncovered during stress testing.

Bug: IMPALA-2559

Don't ignore Status returned by DataStreamRecvr::CreateMerger()

A query could fail with an internal error while calculating the memory limit. This was an infrequent condition uncovered during stress testing.

Bug: IMPALA-2614, IMPALA-2559

DataStreamSender::Send() does not return an error status if SendBatch() failed

Bug: IMPALA-2591

Re-enable SSL and Kerberos on server-server

These fixes lift the restriction on using SSL encryption and Kerberos authentication together for internal communication between Impala components.

Bug: IMPALA-2598, IMPALA-2747

Issues Fixed in Impala 2.3.1

Impala 2.3.1 is identical to Impala 2.3.0. There are no new bug fixes, new features, or incompatible changes.

Issues Fixed in Impala 2.3.0

This section lists the most serious or frequently encountered customer issues fixed in Impala 2.3. Any issues already fixed in Impala 2.2 maintenance releases (up through Impala 2.2.8) are also included. Those issues are listed under the respective Impala 2.2 sections and are not repeated here.

Fixes for Serious Errors

A number of issues were resolved that could result in serious errors when encountered. The most critical or commonly encountered are listed here.

Bugs: IMPALA-2168, IMPALA-2378, IMPALA-2369, IMPALA-2357, IMPALA-2319, IMPALA-2314, IMPALA-2016

Fixes for Correctness Errors

A number of issues were resolved that could result in wrong results when encountered. The most critical or commonly encountered are listed here.

Bugs: IMPALA-2192, IMPALA-2440, IMPALA-2090, IMPALA-2086, IMPALA-1947, IMPALA-1917

Issues Fixed in Impala 2.2.10

Issues Fixed in Impala 2.2.9

This section lists the most frequently encountered customer issues fixed in Impala 2.2.9.

Query return empty result if it contains NullLiteral in inlineview

If an inline view in a FROM clause contained a NULL literal, the result set was empty.

Bug: IMPALA-1917

HBase scan node uses 2-4x memory after upgrade to Impala 2.2.8

Queries involving HBase tables used substantially more memory than in earlier Impala versions. The problem occurred starting in Impala 2.2.8, as a result of the changes for IMPALA-2284. The fix for this issue involves removing a separate memory work area for HBase queries and reusing other memory that was already allocated.

Bug: IMPALA-2731

Fix migration/assignment of On-clause predicates inside inline views

Some combinations of ON clauses in join queries could result in comparisons being applied at the wrong stage of query processing, leading to incorrect results. Wrong predicate assignment could happen under the following conditions:

The query includes an inline view that contains an outer join.
That inline view is joined with another table in the enclosing query block.
That join has an ON clause containing a predicate that only references columns originating from the outer-joined tables inside the inline view.

Bug: IMPALA-1459

Fix wrong predicate assignment in outer joins

The join predicate for an OUTER JOIN clause could be applied at the wrong stage of query processing, leading to incorrect results.

Bug: IMPALA-2446

Avoid sending large partition stats objects over thrift

The catalogd daemon could encounter a serious error when loading the incremental statistics metadata for tables with large numbers of partitions and columns. The problem occurred when the internal representation of metadata for the table exceeded 2 GB, for example in a table with 20K partitions and 77 columns. The fix causes a COMPUTE INCREMENTAL STATS operation to fail if it would produce metadata that exceeded the maximum size.

Bug: IMPALA-2648, IMPALA-2664

Avoid overflow when adding large intervals to TIMESTAMPs

Adding or subtracting a large INTERVAL value to a TIMESTAMP value could produce an incorrect result, with the value wrapping instead of returning an out-of-range error.

Bug: IMPALA-1675

Analysis exception when a binary operator contains an IN operator with values

An IN operator with literal values could cause a statement to fail if used as the argument to a binary operator, such as an equality test for a BOOLEAN value.

Bug: IMPALA-1949

Make MAX_PAGE_HEADER_SIZE configurable

Impala could fail to access Parquet data files with page headers larger than 8 MB, which could occur, for example, if the minimum or maximum values for a column were long strings. The fix adds a configuration setting --max_page_header_size, which you can use to increase the Impala size limit to a value higher than 8 MB.

Bug: IMPALA-2273

Fix spilling sorts with var-len slots that are NULL or empty.

A query that activated the spill-to-disk mechanism could fail if it contained a sort expression involving certain combinations of fixed-length or variable-length types.

Bug: IMPALA-2357

Work-around IMPALA-2344: Fail query with OOM in case block->Pin() fails

Some queries that activated the spill-to-disk mechanism could produce a serious error if there was insufficient memory to set up internal work areas. Now those queries produce normal out-of-memory errors instead.

Bug: IMPALA-2344

Crash (likely race) tearing down BufferedBlockMgr on query failure

A serious error could occur under rare circumstances, due to a race condition while freeing memory during heavily concurrent workloads.

Bug: IMPALA-2252

QueryExecState doesn't check for query cancellation or errors

A call to SetError() in a user-defined function (UDF) would not cause the query to fail as expected.

Bug: IMPALA-1746

Impala throws IllegalStateException when inserting data into a partition while select subquery group by partition columns

An INSERT ... SELECT operation into a partitioned table could fail if the SELECT query included a GROUP BY clause referring to the partition key columns.

Bug: IMPALA-2533

Issues Fixed in Impala 2.2.8

This section lists the most frequently encountered customer issues fixed in Impala 2.2.8.

Impala is unable to read hive tables created with the "STORED AS AVRO" clause

Impala could not read Avro tables created in Hive with the STORED AS AVRO clause.

Bug: IMPALA-1136, IMPALA-2161

make Parquet scanner fail query if the file size metadata is stale

If a Parquet file in HDFS was overwritten by a smaller file, Impala could encounter a serious error. Issuing an INVALIDATE METADATA statement before a subsequent query would avoid the error. The fix allows Impala to handle such inconsistencies in Parquet file length cleanly regardless of whether the table metadata is up-to-date.

Bug: IMPALA-2213

Avoid allocating StringBuffer > 1GB in ScannerContext::Stream::GetBytesInternal()

Impala could encounter a serious error when reading compressed text files larger than 1 GB. The fix causes Impala to issue an error message instead in this case.

Bug: IMPALA-2249

Disallow long (1<<30) strings in group_concat()

A query using the group_concat() function could encounter a serious error if the returned string value was larger than 1 GB. Now the query fails with an error message in this case.

Bug: IMPALA-2284

avoid FnvHash64to32 with empty inputs

An edge case in the algorithm used to distribute data among nodes could result in uneven distribution of work for some queries, with all data sent to the same node.

Bug: IMPALA-2270
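
One way an empty input can collapse a folded hash to a single value is sketched below. This is an illustration under explicit assumptions, not a statement of Impala's code: it assumes the 64-bit state is seeded by duplicating a 32-bit seed into both halves and that the result is xor-folded back to 32 bits. The function names are hypothetical.

```python
FNV64_PRIME = 1099511628211  # standard 64-bit FNV prime
MASK64 = (1 << 64) - 1

def fnv64(data: bytes, state: int) -> int:
    # FNV-style mixing: xor in each byte, then multiply by the prime.
    for byte in data:
        state = ((state ^ byte) * FNV64_PRIME) & MASK64
    return state

def fnv64_folded_to_32(data: bytes, seed32: int) -> int:
    # Assumption: the 64-bit state starts as the 32-bit seed duplicated
    # into both halves, and the result is xor-folded back to 32 bits.
    state = seed32 | (seed32 << 32)
    state = fnv64(data, state)
    return (state >> 32) ^ (state & 0xFFFFFFFF)

# With no bytes to mix in, the two halves are still identical, so the
# fold cancels them out and the result does not depend on the seed.
for seed in (1, 12345, 0x811C9DC5):
    print(seed, fnv64_folded_to_32(b"", seed))
```

Under these assumptions every empty input hashes to 0 for any seed, so rows keyed on empty values would all be routed to the same destination, which matches the uneven distribution described for this issue.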

The catalog does not close the connection to HMS during table invalidation

A communication error could occur between Impala and the Hive metastore database, causing Impala operations that update table metadata to fail.

Bug: IMPALA-2348

Wrong DCHECK in PHJ::ProcessProbeBatch

Certain queries could encounter a serious error if the spill-to-disk mechanism was activated.

Bug: IMPALA-2364

Avoid cardinality 0 in scan nodes of small tables and low selectivity

Impala could generate a suboptimal query plan for some queries involving small tables.

Bug: IMPALA-2165

Issues Fixed in Impala 2.2.7

This section lists the most frequently encountered customer issues fixed in Impala 2.2.7.

Warn if table stats are potentially corrupt.

Impala warns if it detects a discrepancy in table statistics: a table considered to have zero rows even though there are data files present. In this case, Impala also skips query optimizations that are normally applied to very small tables.

Bug: IMPALA-1983

Pass correct child node in 2nd phase merge aggregation.

A query could encounter a serious error if it included a particular combination of aggregate functions and inline views.

Bug: IMPALA-2266

Set the output smap of an EmptySetNode produced from an empty inline view.

A query could encounter a serious error if it included an inline view whose subquery had no FROM clause.

Bug: IMPALA-2216

Set an InsertStmt's result exprs from the source statement's result exprs.

A CREATE TABLE AS SELECT or INSERT ... SELECT statement could produce different results than a SELECT statement, for queries including a FULL JOIN clause and including literal values in the select list.

Bug: IMPALA-2203

Fix planning of empty union operands with analytics.

A query could return incorrect results if it contained a UNION clause, calls to analytic functions, and a constant expression that evaluated to FALSE.

Bug: IMPALA-2088

Retain eq predicates bound by grouping slots with complex grouping exprs.

A query containing an INNER JOIN clause could return undesired rows. Some predicate specified in the ON clause could be omitted from the filtering operation.

Bug: IMPALA-2089

Row count not set for empty partition when spec is used with compute incremental stats

A COMPUTE INCREMENTAL STATS statement could leave the row count for an empty partition as -1, rather than initializing the row count to 0. The missing statistic value could result in reduced query performance.

Bug: IMPALA-2199

Explicit aliases + ordinals analysis bug

A query could encounter a serious error if it included column aliases with the same names as table columns, and used ordinal numbers in an ORDER BY or GROUP BY clause.

Bug: IMPALA-1898

Fix TupleIsNullPredicate to return false if no tuples are nullable.

A query could return incorrect results if it included an outer join clause, inline views, and calls to functions such as coalesce() that can generate NULL values.

Bug: IMPALA-1987

fix Expr::ComputeResultsLayout() logic

A query could return incorrect results if the table contained multiple CHAR columns with length of 2 or less, and the query included a GROUP BY clause that referred to multiple such columns.

Bug: IMPALA-2178

Substitute an InsertStmt's partition key exprs with the root node's smap.

An INSERT statement could encounter a serious error if the SELECT portion called an analytic function.

Bug: IMPALA-1737

Issues Fixed in Impala 2.2.5

This section lists the most frequently encountered customer issues fixed in Impala 2.2.5.

Impala DML/DDL operations corrupt table metadata leading to Hive query failures

When the Impala COMPUTE STATS statement was run on a partitioned Parquet table that was created in Hive, the table subsequently became inaccessible in Hive. The table was still accessible to Impala. Regaining access in Hive required a workaround of creating a new table. The error displayed in Hive was:

Error: Error while compiling statement: FAILED: SemanticException Class not found: org.apache.impala.hive.serde.ParquetInputFormat (state=42000,code=40000)

Bug: IMPALA-2048

Avoiding a DCHECK of NULL hash table in spilled right joins

A query could encounter a serious error if it contained a RIGHT OUTER, RIGHT ANTI, or FULL OUTER join clause and approached the memory limit on a host so that the "spill to disk" mechanism was activated.

Bug: IMPALA-1929

Bug in PrintTColumnValue caused wrong stats for TINYINT partition cols

Declaring a partition key column as a TINYINT caused problems with the COMPUTE STATS statement. The associated partitions would always have zero estimated rows, leading to potentially inefficient query plans.

Bug: IMPALA-2136

Where clause does not propagate to joins inside nested views

A query that referred to a view, whose own query referred to another view containing a join, could return incorrect results. WHERE clauses for the outermost query were not always applied, causing the result set to include additional rows that should have been filtered out.

Bug: IMPALA-2018

Add effective_user() builtin

The user() function returned the name of the logged-in user, which might not be the same as the user name being checked for authorization if, for example, delegation was enabled.

Bug: IMPALA-2064

Resolution: Rather than change the behavior of the user() function, the fix introduces an additional function effective_user() that returns the user name that is checked during authorization.

Make UTC to local TimestampValue conversion faster.

Query performance was improved substantially for Parquet files containing TIMESTAMP data written by Hive, when the -convert_legacy_hive_parquet_utc_timestamps=true setting is in effect.

Bug: IMPALA-2125

Workaround IMPALA-1619 in BufferedBlockMgr::ConsumeMemory()

A join query could encounter a serious error if the query approached the memory limit on a host so that the "spill to disk" mechanism was activated, and data volume in the join was large enough that an internal memory buffer exceeded 1 GB in size on a particular host. (Exceeding this limit would only happen for huge join queries, because Impala could split this intermediate data into 16 parts during the join query, and the buffer only contains compact bookkeeping data rather than the actual join column data.)

Bug: IMPALA-2065

Issues Fixed in Impala 2.2.3

This section lists the most frequently encountered customer issues fixed in Impala 2.2.3.

Enable using Isilon as the underlying filesystem.

Enabling Impala to work with the Isilon filesystem involves a number of fixes to performance and flexibility for dealing with I/O using remote reads. See Using Impala with Isilon Storage for details on using Impala and Isilon together.

Bug: IMPALA-1968, IMPALA-1730

Expand set of supported timezones.

The set of timezones recognized by Impala was expanded. You can always find the latest list of supported timezones in the Impala source code, in the file timezone_db.cc.

Bug: IMPALA-1381

Impala Timestamp ISO-8601 Support.

Impala can now process TIMESTAMP literals including a trailing z, signifying "Zulu" time, a synonym for UTC.

Bug: IMPALA-1963
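
As a rough illustration of what accepting a trailing "Zulu" suffix means, the following Python sketch treats Z or z as a zero UTC offset. It is not Impala code; the function name parse_timestamp_utc and the sample timestamp are assumptions made for this example.

```python
from datetime import datetime

def parse_timestamp_utc(text: str) -> datetime:
    """Parse an ISO-8601 timestamp, treating a trailing Z or z as UTC.

    Hypothetical helper for illustration; not part of Impala.
    """
    if text.endswith(("Z", "z")):
        # Rewrite the Zulu suffix as an explicit zero offset.
        text = text[:-1] + "+00:00"
    return datetime.fromisoformat(text)

zulu = parse_timestamp_utc("2015-04-09T12:30:00Z")
explicit = parse_timestamp_utc("2015-04-09T12:30:00+00:00")
print(zulu == explicit)
```

Both spellings denote the same instant, which is the sense in which "Zulu" time is a synonym for UTC.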

Fix wrong warning when insert overwrite to empty table

An INSERT OVERWRITE operation would encounter an error if the SELECT portion of the statement returned zero rows, such as with a LIMIT 0 clause.

Bug: IMPALA-2008

Expand parsing of decimals to include scientific notation

DECIMAL literals can now include e scientific notation. For example, now CAST(1e3 AS DECIMAL(5,3)) is a valid expression. Formerly it returned NULL. Some scientific expressions might have worked before in DECIMAL context, but only when the scale was 0.

Bug: IMPALA-1952
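
The idea of parsing scientific notation into an exact decimal value can be sketched with Python's decimal module. This only illustrates the concept and says nothing about how Impala implements it; the helper name to_decimal and the sample inputs are assumptions.

```python
from decimal import Decimal

def to_decimal(text: str, scale: int) -> Decimal:
    """Parse plain or scientific notation and round to `scale` fractional digits.

    Hypothetical helper for illustration; not part of Impala.
    """
    value = Decimal(text)                # accepts "1.5e2" as well as "150"
    quantum = Decimal(1).scaleb(-scale)  # scale=3 gives a quantum of 0.001
    return value.quantize(quantum)

print(to_decimal("1.5e2", 3))
print(to_decimal("25e-1", 1))
```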

Issues Fixed in Impala 2.2.1

This section lists the most frequently encountered customer issues fixed in Impala 2.2.1.

Issues Fixed in Impala 2.2.0

This section lists the most frequently encountered customer issues fixed in Impala 2.2.0.

For the full list of fixed issues in Impala 2.2.0, including over 40 critical issues, see this report in the Impala JIRA tracker.

Altering a column's type causes column stats to stop sticking for that column

When the type of a column was changed in either Hive or Impala through ALTER TABLE CHANGE COLUMN, the metastore database did not correctly propagate that change to the table that contains the column statistics. The statistics (particularly the NDV) for that column were permanently reset and could not be changed by Impala's COMPUTE STATS command. The underlying cause is a Hive bug (HIVE-9866).

Bug: IMPALA-1607

Resolution: Resolved by incorporating the fix for HIVE-9866.

Workaround: On systems without the corresponding Hive fix, change the column back to its original type. The stats reappear and you can recompute or drop them.

Impala may leak or use too many file descriptors

If a file was truncated in HDFS without a corresponding REFRESH in Impala, Impala could allocate memory for file descriptors and not free that memory.

Bug: IMPALA-1854

Spurious stale block locality messages

Impala could issue messages stating the block locality metadata was stale, when the metadata was actually fine. The internal "remote bytes read" counter was not being reset properly. This issue did not cause an actual slowdown in query execution, but the spurious error could result in unnecessary debugging work and unnecessary use of the INVALIDATE METADATA statement.

Bug: IMPALA-1712

DROP TABLE fails after COMPUTE STATS and ALTER TABLE RENAME to a different database.

When a table was moved from one database to another, the column statistics were not pointed to the new database. This could result in lower performance for queries due to unavailable statistics, and also an inability to drop the table.

Bug: IMPALA-1711

IMPALA-1556 causes memory leak with secure connections

impalad daemons could experience a memory leak on clusters using Kerberos authentication, with memory usage growing as more data is transferred across the secure channel, either to the client program or between Impala nodes. The same issue affected LDAP-secured clusters to a lesser degree, because the LDAP security only covers data transferred back to client programs.

Bug: IMPALA-1674

unix_timestamp() does not return correct time

The unix_timestamp() function could return an incorrect value (a constant value of 1).

Bug: IMPALA-1623

Impala incorrectly handles text data missing a newline on the last line

Some queries did not recognize the final line of a text data file if the line did not end with a newline character. This could lead to inconsistent results, such as a different number of rows for SELECT COUNT(*) as opposed to SELECT *.

Bug: IMPALA-1476
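
The kind of discrepancy described above can be reproduced in miniature: counting newline characters misses a final line that lacks a terminator, while splitting into lines does not. The Python sketch below is an analogy only; the function names and sample data are assumptions, and it does not reflect Impala's text scanner.

```python
def count_rows_by_newlines(data: bytes) -> int:
    """Naive count: assumes every row ends with a newline character."""
    return data.count(b"\n")

def count_rows_by_lines(data: bytes) -> int:
    """Counts the final row even when its newline is missing."""
    return len(data.splitlines())

data = b"1,a\n2,b\n3,c"  # last line has no trailing newline
print(count_rows_by_newlines(data), count_rows_by_lines(data))
```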

Impala's ACLs check do not consider all group ACLs, only checked first one.

If the HDFS user ID associated with the impalad process had read or write access in HDFS based on group membership, Impala statements could still fail with HDFS permission errors if that group was not the first listed group for that user ID.

Bug: IMPALA-1805
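
A minimal sketch of the difference between checking only the first group and checking every group a user belongs to. The group names and function names are hypothetical, and real HDFS ACL evaluation involves more rules than this membership test.

```python
def allowed_first_group_only(user_groups, permitted_groups):
    """Flawed check: consults only the first group listed for the user."""
    return bool(user_groups) and user_groups[0] in permitted_groups

def allowed_any_group(user_groups, permitted_groups):
    """Corrected check: access is granted if any of the user's groups matches."""
    return any(group in permitted_groups for group in user_groups)

user_groups = ["staff", "analysts"]  # hypothetical group membership
permitted = {"analysts"}             # hypothetical group ACL entry
print(allowed_first_group_only(user_groups, permitted))
print(allowed_any_group(user_groups, permitted))
```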

Fix infinite loop opening or closing file with invalid metadata

Truncating a file in HDFS, after Impala had cached the file metadata, could produce a hang when Impala queried a table containing that file.

Bug: IMPALA-1794

Cannot write Parquet files when values are larger than 64KB

Impala could sometimes fail to INSERT into a Parquet table if a column value such as a STRING was larger than 64 KB.

Bug: IMPALA-1705

Impala Will Not Run on Certain Intel CPUs

This fix relaxes the CPU requirement for Impala. Now only the SSSE3 instruction set is required. Formerly, SSE4.1 instructions were generated, making Impala refuse to start on some older CPUs.

Bug: IMPALA-1646

Issues Fixed in Impala 2.1.10

Issues Fixed in Impala 2.1.7

This section lists the most significant Impala issues fixed in Impala 2.1.7.

Query return empty result if it contains NullLiteral in inlineview

If an inline view in a FROM clause contained a NULL literal, the result set was empty.

Bug: IMPALA-1917

Fix edge cases for decimal/integer cast

A value of type DECIMAL(3,0) could be incorrectly cast to TINYINT. The resulting out-of-range value could be incorrect. After the fix, the smallest type that is allowed for this cast is INT, and attempting to use DECIMAL(3,0) in a TINYINT context produces a "loss of precision" error.

Bug: IMPALA-2264
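
To see why DECIMAL(3,0) cannot be safely narrowed to TINYINT, compare the value ranges. The sketch below relies only on the standard definitions of the two types (three decimal digits versus a signed 8-bit integer); the helper names are assumptions for this example.

```python
TINYINT_MIN, TINYINT_MAX = -128, 127  # signed 8-bit range

def decimal_range(precision: int):
    """Return (min, max) for DECIMAL(precision, 0)."""
    limit = 10 ** precision - 1
    return -limit, limit

def fits_in_tinyint(precision: int) -> bool:
    """True if every DECIMAL(precision, 0) value is representable as TINYINT."""
    low, high = decimal_range(precision)
    return TINYINT_MIN <= low and high <= TINYINT_MAX

print(decimal_range(3), fits_in_tinyint(3))
```

A value such as 999 is valid for DECIMAL(3,0) but lies outside the TINYINT range, which is the out-of-range case the entry describes.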

Constant filter expressions are not checked for errors and state cleanup on exception / DCHECK on destroying an ExprContext

An invalid constant expression in a WHERE clause (for example, an invalid regular expression pattern) could fail to clean up internal state after raising a query error. Therefore, certain combinations of invalid expressions in a query could cause a crash, or cause a query to continue when it should halt with an error.

Bug: IMPALA-1756, IMPALA-2514

QueryExecState does not check for query cancellation or errors

A call to SetError() in a user-defined function (UDF) would not cause the query to fail as expected.

Bug: IMPALA-1746, IMPALA-2141

Issues Fixed in Impala 2.1.6

This section lists the most significant Impala issues fixed in Impala 2.1.6.

Wrong DCHECK in PHJ::ProcessProbeBatch

Certain queries could encounter a serious error if the spill-to-disk mechanism was activated.

Bug: IMPALA-2364

LargestSpilledPartition was not checking if partition is closed

Certain queries could encounter a serious error if the spill-to-disk mechanism was activated.

Bug: IMPALA-2314

Avoid cardinality 0 in scan nodes of small tables and low selectivity

Impala could generate a suboptimal query plan for some queries involving small tables.

Bug: IMPALA-2165

fix Expr::ComputeResultsLayout() logic

Queries using the GROUP BY operator on multiple CHAR columns with length less than or equal to 2 characters could return incorrect results for some columns.

Bug: IMPALA-2178

Properly unescape string value for HBase filters

Queries against HBase tables could return incomplete results if the WHERE clause included string comparisons using literals containing escaped quotation marks.

Bug: IMPALA-2133

Avoiding a DCHECK of NULL hash table in spilled right joins

A query could encounter a serious error if it contained a RIGHT OUTER, RIGHT ANTI, or FULL OUTER join clause and approached the memory limit on a host so that the "spill to disk" mechanism was activated.

Bug: IMPALA-1929

Issues Fixed in Impala 2.1.5

This section lists the most significant Impala issues fixed in Impala 2.1.5.

Avoid calling ProcessBatch with out_batch->AtCapacity in right joins

Queries including RIGHT OUTER JOIN, RIGHT ANTI JOIN, or FULL OUTER JOIN clauses and involving a high volume of data could encounter a serious error.

Bug: IMPALA-1919

Issues Fixed in Impala 2.1.4

This section lists the most significant Impala issues fixed in Impala 2.1.4.

Crash: impala::TupleIsNullPredicate::Prepare

When expressions that tested for NULL were used in combination with analytic functions, an error could occur. (The original crash issue was fixed by an earlier patch.)

Bug: IMPALA-1519

Expand parsing of decimals to include scientific notation

DECIMAL literals can now include e scientific notation. For example, now CAST(1e3 AS DECIMAL(5,3)) is a valid expression. Formerly it returned NULL. Some scientific expressions might have worked before in DECIMAL context, but only when the scale was 0.

Bug: IMPALA-1952

INSERT/CTAS evaluates and applies constant predicates.

An INSERT OVERWRITE statement would write new data, even if a constant clause such as WHERE 1 = 0 should have prevented it from writing any rows.

Bug: IMPALA-1860

Assign predicates below analytic functions with a compatible partition by clause

If the PARTITION BY clause in an analytic function refers to partition key columns in a partitioned table, now Impala can perform partition pruning during the analytic query.

Bug: IMPALA-1900

FIRST_VALUE may produce incorrect results with preceding windows

A query using the FIRST_VALUE analytic function and a window defined with the PRECEDING keyword could produce wrong results.

Bug: IMPALA-1888

FIRST_VALUE rewrite fn type might not match slot type

A query referencing a DECIMAL column with the FIRST_VALUE analytic function could encounter an error.

Bug: IMPALA-1559

AnalyticEvalNode cannot handle partition/order by exprs with NaN

A query using an analytic function could encounter an error if the evaluation of an analytic ORDER BY or PARTITION expression resulted in a NaN value, for example if the ORDER BY or PARTITION contained a division operation where both operands were zero.

Bug: IMPALA-1808
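
NaN values are awkward for ordering and partitioning because they are unordered under IEEE 754 comparison rules. The Python sketch below shows that property and one way to make a sort well defined. Placing NaN last is an arbitrary choice for this example and is not a statement about how Impala orders NaN.

```python
import math

# Python raises ZeroDivisionError for 0.0 / 0.0, so build the NaN directly.
nan = float("nan")

# NaN is unordered: every comparison with it is False, even equality with itself.
comparisons = [nan < 1.0, nan > 1.0, nan == nan]

def nan_last_key(value: float):
    """Sort key that places NaN after all ordinary values (illustrative choice)."""
    return (1, 0.0) if math.isnan(value) else (0, value)

ordered = sorted([2.0, nan, 1.0], key=nan_last_key)
print(comparisons, ordered)
```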

AnalyticEvalNode not properly handling nullable tuples

An analytic function containing only an OVER clause could encounter an error if another part of the query (specifically an outer join) produced all-NULL tuples.

Bug: IMPALA-1562

Issues Fixed in Impala 2.1.3

This section lists the most significant issues fixed in Impala 2.1.3.

Add compatibility flag for Hive-Parquet-Timestamps

When Hive writes TIMESTAMP values, it represents them in the local time zone of the server. Impala expects TIMESTAMP values to always be in the UTC time zone, possibly leading to inconsistent results depending on which component created the data files. This patch introduces a new startup flag, -convert_legacy_hive_parquet_utc_timestamps, for the impalad daemon. Specify -convert_legacy_hive_parquet_utc_timestamps=true to make Impala recognize Parquet data files written by Hive and automatically adjust TIMESTAMP values read from those files into the UTC time zone for compatibility with other Impala TIMESTAMP processing. Although this setting is currently turned off by default, consider enabling it if practical in your environment, for maximum interoperability with Hive-created Parquet files.

Bug: IMPALA-1658

Use snprintf() instead of lexical_cast() in float-to-string casts

Converting a floating-point value to a STRING could be slower than necessary.

Bug: IMPALA-1738

Fix partition spilling cleanup when new stream OOMs

Certain calls to aggregate functions with STRING arguments could encounter a serious error when the system ran low on memory and attempted to activate the spill-to-disk mechanism. The error message referenced the function impala::AggregateFunctions::StringValGetValue.

Bug: IMPALA-1865

Impala's ACLs check do not consider all group ACLs, only checked first one.

If the HDFS user ID associated with the impalad process had read or write access in HDFS based on group membership, Impala statements could still fail with HDFS permission errors if that group was not the first listed group for that user ID.

Bug: IMPALA-1805

Fix infinite loop opening or closing file with invalid metadata

Truncating a file in HDFS, after Impala had cached the file metadata, could produce a hang when Impala queried a table containing that file.

Bug: IMPALA-1794

external-data-source-executor leaking global jni refs

Successive calls to the data source API could result in excessive memory consumption, with memory allocated but never freed.

Bug: IMPALA-1801

Spurious stale block locality messages

Impala could issue messages stating the block locality metadata was stale, when the metadata was actually fine. The internal "remote bytes read" counter was not being reset properly. This issue did not cause an actual slowdown in query execution, but the spurious error could result in unnecessary debugging work and unnecessary use of the INVALIDATE METADATA statement.

Bug: IMPALA-1712

Issues Fixed in Impala 2.1.2

This section lists the most significant issues fixed in Impala 2.1.2.

For the full list of fixed issues in Impala 2.1.2, see this report in the Impala JIRA tracker.

Impala incorrectly handles double numbers with more than 19 significant decimal digits

When a floating-point value was read from a text file and interpreted as a FLOAT or DOUBLE value, it could be incorrectly interpreted if it included more than 19 significant digits.

Bug: IMPALA-1622

unix_timestamp() does not return correct time

The unix_timestamp() function could return an incorrect value (a constant value of 1).

Bug: IMPALA-1623

Row Count Mismatch: Partition pruning with NULL

A query against a partitioned table could return incorrect results if the WHERE clause compared the partition key to NULL using operators such as = or !=.

Bug: IMPALA-1535
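
Under standard SQL three-valued logic, comparing any value to NULL with = or != yields NULL (unknown), and a WHERE clause keeps only rows where the predicate is TRUE. The Python sketch below models that rule with None standing in for NULL. The function names and partition key values are assumptions, and the code does not represent Impala's partition pruning logic.

```python
def sql_eq(left, right):
    """SQL '=': the result is None (unknown) if either operand is NULL."""
    if left is None or right is None:
        return None
    return left == right

def sql_ne(left, right):
    """SQL '!=': also unknown whenever either operand is NULL."""
    result = sql_eq(left, right)
    return None if result is None else not result

def where(values, predicate):
    """Keep only values whose predicate is TRUE; unknown is filtered out."""
    return [v for v in values if predicate(v) is True]

partition_keys = [2014, 2015, None]  # hypothetical partition key values
eq_null = where(partition_keys, lambda k: sql_eq(k, None))
ne_null = where(partition_keys, lambda k: sql_ne(k, None))
is_null = [k for k in partition_keys if k is None]  # analogue of IS NULL
print(eq_null, ne_null, is_null)
```

Both comparisons select no partitions, whereas the IS NULL analogue selects the NULL partition.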

Fetch column stats in bulk using new (Hive .13) HMS APIs

The performance of the COMPUTE STATS statement and queries was improved, particularly for wide tables.

Bug: IMPALA-1120

Issues Fixed in Impala 2.1.1

This section lists the most significant issues fixed in Impala 2.1.1.

For the full list of fixed issues in Impala 2.1.1, see this report in the Impala JIRA tracker.

IMPALA-1556 causes memory leak with secure connections

impalad daemons could experience a memory leak on clusters using Kerberos authentication, with memory usage growing as more data is transferred across the secure channel, either to the client program or between Impala nodes. The same issue affected LDAP-secured clusters to a lesser degree, because the LDAP security only covers data transferred back to client programs.

Bug: IMPALA-1674

TSaslServerTransport::Factory::getTransport() leaks transport map entries

impalad daemons in clusters secured by Kerberos or LDAP could experience a slight memory leak on each connection. The accumulation of unreleased memory could cause problems on long-running clusters.

Bug: IMPALA-1668

Issues Fixed in Impala 2.1.0

This section lists the most significant issues fixed in Impala 2.1.0.

For the full list of fixed issues in Impala 2.1.0, see this report in the Impala JIRA tracker.

Kerberos fetches 3x slower

Transferring large result sets back to the client application on Kerberos-enabled clusters could be substantially slower than on clusters without Kerberos.

Bug: IMPALA-1455

Compressed file needs to be hold on entirely in Memory

Queries on gzipped text files required holding the entire data file and its uncompressed representation in memory at the same time. SELECT and COMPUTE STATS statements could fail or perform inefficiently as a result. The fix enables streaming reads for gzipped text, so that the data is uncompressed as it is read.

Bug: IMPALA-1556
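
The general idea of a streaming read, where data is decompressed incrementally as it is consumed rather than expanded in full first, can be sketched with Python's gzip module. This is an analogy under stated assumptions (the function name and sample data are made up) and is not Impala's scanner code.

```python
import gzip
import io

def count_rows_streaming(compressed) -> int:
    """Count text rows in a gzip stream without materializing the whole file."""
    rows = 0
    with gzip.open(compressed, mode="rt", encoding="utf-8") as reader:
        for _ in reader:  # decompresses incrementally as lines are consumed
            rows += 1
    return rows

# Hypothetical in-memory sample standing in for a gzipped text data file.
sample = io.BytesIO(gzip.compress(b"1,a\n2,b\n3,c\n"))
total = count_rows_streaming(sample)
print(total)
```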

Cannot read hbase metadata with NullPointerException: null

Impala might not be able to access HBase tables, depending on the associated levels of Impala and HBase on the system.

Bug: IMPALA-1611

Serious errors / crashes

Improved code coverage in Impala testing uncovered a number of potentially serious errors that could occur with specific query syntax. These errors are resolved in Impala 2.1.

Bug: IMPALA-1553, IMPALA-1528, IMPALA-1526, IMPALA-1524, IMPALA-1508, IMPALA-1493, IMPALA-1501, IMPALA-1483

Issues Fixed in Impala 2.0.5

For the full list of fixed issues in Impala 2.0.5, see this report in the Impala JIRA tracker.

Issues Fixed in Impala 2.0.4

This section lists the most significant issues fixed in Impala 2.0.4.

For the full list of fixed issues in Impala 2.0.4, see this report in the Impala JIRA tracker.

Add compatibility flag for Hive-Parquet-Timestamps

When Hive writes TIMESTAMP values, it represents them in the local time zone of the server. Impala expects TIMESTAMP values to always be in the UTC time zone, possibly leading to inconsistent results depending on which component created the data files. This patch introduces a new startup flag, -convert_legacy_hive_parquet_utc_timestamps, for the impalad daemon. Specify -convert_legacy_hive_parquet_utc_timestamps=true to make Impala recognize Parquet data files written by Hive and automatically adjust TIMESTAMP values read from those files into the UTC time zone for compatibility with other Impala TIMESTAMP processing. Although this setting is currently turned off by default, consider enabling it if practical in your environment, for maximum interoperability with Hive-created Parquet files.

Bug: IMPALA-1658

IoMgr infinite loop opening/closing file when shorter than cached metadata size

If a table data file was replaced by a shorter file outside of Impala, such as with INSERT OVERWRITE in Hive producing an empty output file, subsequent Impala queries could hang.

Bug: IMPALA-1794

Issues Fixed in Impala 2.0.3

This section lists the most significant issues fixed in Impala 2.0.3.

For the full list of fixed issues in Impala 2.0.3, see this report in the Impala JIRA tracker.

Anti join could produce incorrect results when spilling

An anti-join query (or a NOT EXISTS operation that was rewritten internally into an anti-join) could produce incorrect results if Impala reached its memory limit, causing the query to write temporary results to disk.

Bug: IMPALA-1471

Row Count Mismatch: Partition pruning with NULL

A query against a partitioned table could return incorrect results if the WHERE clause compared the partition key to NULL using operators such as = or !=.

Bug: IMPALA-1535

Fetch column stats in bulk using new (Hive .13) HMS APIs

The performance of the COMPUTE STATS statement and queries was improved, particularly for wide tables.

Bug: IMPALA-1120

Issues Fixed in Impala 2.0.2

This section lists the most significant issues fixed in Impala 2.0.2.

For the full list of fixed issues in Impala 2.0.2, see this report in the Impala JIRA tracker.

GROUP BY on STRING column produces inconsistent results

Some operations in queries submitted through Hue or other HiveServer2 clients could produce inconsistent results.

Bug: IMPALA-1453

Fix leaked file descriptor and excessive file descriptor use

Impala could encounter an error from running out of file descriptors. The fix reduces the amount of time file descriptors are kept open, and avoids leaking file descriptors when read operations encounter errors.

unix_timestamp() does not return correct time

The unix_timestamp() function could return a constant value 1 instead of a representation of the time.

Bug: IMPALA-1623

Impala should randomly select cached replica

To avoid putting too heavy a load on any one node, Impala now randomizes which scan node processes each HDFS data block rather than choosing the first cached block replica.

Bug: IMPALA-1586
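
The effect of randomizing replica choice, compared with always taking the first cached replica, can be shown with a small simulation. The host names, function names, and block count below are hypothetical, and the seed exists only to make the demonstration repeatable. Impala's actual scheduler is more involved.

```python
import random
from collections import Counter

def first_replica(cached_hosts):
    """Old behavior in this sketch: always pick the first cached replica."""
    return cached_hosts[0]

def random_replica(cached_hosts, rng):
    """Randomized choice: spreads reads across all hosts holding a cached replica."""
    return rng.choice(cached_hosts)

hosts = ["host-a", "host-b", "host-c"]  # hypothetical hosts with cached replicas
rng = random.Random(0)                  # seeded only for a repeatable demo
first_load = Counter(first_replica(hosts) for _ in range(300))
random_load = Counter(random_replica(hosts, rng) for _ in range(300))
print(first_load, random_load)
```

With the first-replica policy every block lands on one host, while the randomized policy distributes the 300 simulated blocks across all three.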

Impala does not always give short name to Llama.

In clusters secured by Kerberos or LDAP, a discrepancy in internal transmission of user names could cause a communication error with Llama.

Bug: IMPALA-1606

accept unmangled native UDF symbols

The CREATE FUNCTION statement could report that it could not find a function entry point within the .so file for a UDF written in C++, even if the corresponding function was present.

Bug: IMPALA-1475

Issues Fixed in Impala 2.0.1

This section lists the most significant issues fixed in Impala 2.0.1.

For the full list of fixed issues in Impala 2.0.1, see this report in the Impala JIRA tracker.

Queries fail with metastore exception after upgrade and compute stats

After running the COMPUTE STATS statement on an Impala table, subsequent queries on that table could fail with the exception message Failed to load metadata for table: default.stats_test.

Bug: IMPALA-1416

Workaround: Upgrading to a release that includes the fix for HIVE-8627 prevents the problem from affecting future COMPUTE STATS statements. On affected releases, or for Impala tables that have become inaccessible, the workaround is to disable the hive.metastore.try.direct.sql setting in the Hive metastore hive-site.xml file and issue the INVALIDATE METADATA statement for the affected table. You do not need to rerun the COMPUTE STATS statement for the table.

Issues Fixed in Impala 2.0.0

This section lists the most significant issues fixed in Impala 2.0.0.

For the full list of fixed issues in Impala 2.0.0, see this report in the Impala JIRA tracker.

Join Hint is dropped when used inside a view

Hints specified within a view query did not take effect when the view was queried, leading to slow performance. As part of this fix, Impala now supports hints embedded within comments.

Bug: IMPALA-995