Fixed Issues in Apache Impala
The following sections describe the major issues fixed in each Impala release.
For known issues that are currently unresolved, see Known Issues and Workarounds in Impala.
Issues Fixed in Impala 4.0
For the full list of issues closed in this release, including bug fixes, see the changelog for Impala 4.0.
Issues Fixed in Impala 3.4
For the full list of issues closed in this release, including bug fixes, see the changelog for Impala 3.4.
Issues Fixed in Impala 3.3
For the full list of issues closed in this release, including bug fixes, see the changelog for Impala 3.3.
Issues Fixed in Impala 3.2
For the full list of issues closed in this release, including bug fixes, see the changelog for Impala 3.2.
The following is a list of noteworthy issues fixed in Impala 3.2:
- IMPALA-341 - Remote profiles are no longer ignored by the coordinator for queries with a LIMIT clause.
- IMPALA-941 - Impala supports fully qualified table names that start with a number.
- IMPALA-1048 - The query execution summary now includes the total time taken and memory consumed by the data sink at the root of each query fragment.
- IMPALA-3323 - Fixed an issue where valid impala-shell options, such as --ldap_password_cmd, were unrecognized when the --config_file option was specified.
- IMPALA-5397 - If a query has a dedicated coordinator, its end time is now set when the query releases its admission control resources. With no dedicated coordinator, the end time is set on un-registration.
- IMPALA-5474 - Fixed an issue where adding a trivial subquery to a query with an error turns the error into a warning.
- IMPALA-6521 - When set, experimental flags are now shown in /varz in web UI and log files.
- IMPALA-6900 - The INVALIDATE METADATA operation is no longer ignored when HMS is empty.
- IMPALA-7446 - Impala enables buffer pool garbage collection when near process memory limit to prevent queries from spilling to disk earlier than necessary.
- IMPALA-7659 - In COMPUTE STATS, Impala counts the number of NULL values in a table.
- IMPALA-7857 - Logs more information about StateStore failure detection.
- IMPALA-7928 - To increase the efficiency of the HDFS file handle cache, remote reads for a particular file are scheduled to a consistent set of executor nodes.
- IMPALA-7929 - Impala queries on tables created via Hive and mapped to HBase failed with an internal exception because the qualifier of the HBase key column is null in the mapped table. Impala now relaxes the requirement and allows a NULL qualifier.
- IMPALA-7960 - Impala now returns a correct result when comparing TIMESTAMP to a string literal in a binary predicate where the TIMESTAMP is cast to VARCHAR of smaller length.
- IMPALA-7961 - Fixed an issue where queries running with the SYNC_DDL query option can fail when the Catalog Server is under a heavy load with concurrent catalog operations of long-running DDLs.
- IMPALA-8026 - Impala query profile now reports correct row counts for all nested loop join modes.
- IMPALA-8061 - Impala correctly initializes the S3_ACCESS_VALIDATED variable to zero when TARGET_FILESYSTEM=s3.
- IMPALA-8154 - Disabled the Kerberos auth_to_local setting to prevent connection issues between impalads.
- IMPALA-8188 - Impala now correctly detects an NVME device name and handles it.
- IMPALA-8245 - Added hostname to the timeout error message to enable the user to easily identify the host which has reached a bad connection state with the HDFS NameNode.
- IMPALA-8254 - Fixed an issue where COMPUTE STATS failed if COMPRESSION_CODEC was set.
Issues Fixed in Impala 3.1
For the full list of issues closed in this release, including bug fixes, see the changelog for Impala 3.1.
Issues Fixed in Impala 3.0
For the full list of issues closed in this release, including bug fixes, see the changelog for Impala 3.0.
Issues Fixed in Impala 2.12
For the full list of issues closed in this release, including bug fixes, see the changelog for Impala 2.12.
Issues Fixed in Impala 2.11
For the full list of issues closed in this release, including bug fixes, see the changelog for Impala 2.11.
Issues Fixed in Impala 2.10
For the full list of issues closed in this release, including bug fixes, see the changelog for Impala 2.10.
Issues Fixed in Impala 2.9.0
For the full list of issues closed in this release, including bug fixes, see the changelog for Impala 2.9.
Issues Fixed in Impala 2.8.0
For the full list of Impala fixed issues in Impala 2.8, see this report in the Impala JIRA tracker.
Issues Fixed in Impala 2.7.0
For the full list of Impala fixed issues in Impala 2.7.0, see this report in the Impala JIRA tracker.
Issues Fixed in Impala 2.6.3
Issues Fixed in Impala 2.6.2
Issues Fixed in Impala 2.6.0
The following list contains the most critical fixed issues (priority='Blocker') from the JIRA system. For the full list of fixed issues in Impala 2.6.0, see this report in the Impala JIRA tracker.
RuntimeState::error_log_ crashes
A crash could occur, with the stack trace pointing to impala::RuntimeState::ErrorLog.
Bug: IMPALA-3385
Severity: High
HiveUdfCall::Open() produces unsynchronized access to JniUtil::global_refs_ vector
A crash could occur because of contention between multiple calls to Java UDFs.
Bug: IMPALA-3378
Severity: High
HBaseTableWriter::CreatePutList() produces unsynchronized access to JniUtil::global_refs_ vector
A crash could occur because of contention between multiple concurrent statements writing to HBase.
Bug: IMPALA-3379
Severity: High
Stress test failure: sorter.cc:745] Check failed: i == 0 (1 vs. 0)
A crash or wrong results could occur if the spill-to-disk mechanism encountered a zero-length string at the very end of a data block.
Bug: IMPALA-3317
Severity: High
String data coming out of agg can be corrupted by blocking operators
If a query plan contains an aggregation node producing string values anywhere within a subplan (that is, if the aggregate function appears within an inline view over a collection column in the SQL statement), the results of the aggregation may be incorrect.
Bug: IMPALA-3311
Severity: High
CTAS with subquery throws AuthzException
A CREATE TABLE AS SELECT operation could fail with an authorization error, due to a slight difference in the privilege checking for the CTAS operation.
Bug: IMPALA-3269
Severity: High
Crash on inserting into table with binary and parquet
Impala incorrectly allowed BINARY to be specified as a column type, resulting in a crash during a write to a Parquet table with a column of that type.
Bug: IMPALA-3237
Severity: High
RowBatch::MaxTupleBufferSize() calculation incorrect, may lead to memory corruption
A crash could occur while querying tables with very large rows, for example wide tables with many columns or very large string values. This problem was identified in Impala 2.3, but had low reproducibility in subsequent releases. The fix ensures the memory allocation size is correct.
Bug: IMPALA-3105
Severity: High
Thrift buffer overflows when serialize more than 3355443200 bytes in impala
A very large memory allocation within the catalogd daemon could exceed an internal Thrift limit, causing a crash.
Bug: IMPALA-3494
Severity: High
Altering table partition's storage format is not working and crashing the daemon
If a partitioned table used a file format other than Avro, and the file format of an individual partition was changed to Avro, subsequent queries could encounter a crash.
Bug: IMPALA-3314
Severity: High
Race condition may cause scanners to spin with runtime filters on Avro or Sequence files
A timing problem during runtime filter processing could cause queries against Avro or SequenceFile tables to hang.
Bug: IMPALA-3798
Severity: High
Issues Fixed in Impala 2.5.4
Issues Fixed in Impala 2.5.2
Issues Fixed in Impala 2.5.1
Issues Fixed in Impala 2.5.0
The following list contains the most critical issues (priority='Blocker') from the JIRA system. For the full list of fixed issues in Impala 2.5, see this report in the Impala JIRA tracker.
Stress test hit assert in LLVM: external function could not be resolved
Bug: IMPALA-2683
The stress test was running a build with the TPC-H, TPC-DS, and TPC-H nested queries with scale factor 3.
Impalad is crashing if udf jar is not available in hdfs location for first time
Bug: IMPALA-2365
If a UDF JAR was not available in the HDFS location specified in the CREATE FUNCTION statement, the impalad daemon could crash.
PAGG hits mem_limit when switching to I/O buffers
Bug: IMPALA-2535
A join query could fail with an out-of-memory error despite the apparent presence of sufficient memory. The cause was the internal ordering of operations that could cause a later phase of the query to allocate memory required by an earlier phase of the query. The workaround was to either increase or decrease the MEM_LIMIT query option, because the issue would only occur for a specific combination of memory limit and data volume.
Prevent migrating incorrectly inferred identity predicates into inline views
Bug: IMPALA-2643
Referring to the same column twice in a view definition could cause the view to omit rows where that column contained a NULL value. This could cause incorrect results due to an inaccurate COUNT(*) value or rows missing from the result set.
Fix migration/assignment of On-clause predicates inside inline views
Bug: IMPALA-1459
Some combinations of ON clauses in join queries could result in comparisons being applied at the wrong stage of query processing, leading to incorrect results.
Wrong predicate assignment could happen under the following conditions:
- The query includes an inline view that contains an outer join.
- That inline view is joined with another table in the enclosing query block.
- That join has an ON clause containing a predicate that only references columns originating from the outer-joined tables inside the inline view.
Wrong plan of NOT IN aggregate subquery when a constant is used in subquery predicate
Bug: IMPALA-2093
IN subqueries might return wrong results if the left-hand side of the IN is a constant.
For example:
select * from alltypestiny t1
where 10 not in (select sum(int_col) from alltypestiny);
Parquet DictDecoders accumulate throughout query
Bug: IMPALA-2940
Parquet dictionary decoders can accumulate throughout query execution, leading to excessive memory usage. One decoder is created per-column per-split.
Planner doesn't set the has_local_target field correctly
Bug: IMPALA-3056
MemPool allocation growth behavior
Bug: IMPALA-2742
Currently, the MemPool would always double the size of the last allocation. This can lead to bad behavior if the MemPool transferred the ownership of all its data except the last chunk. In the next allocation, the next allocated chunk would double the size of this large chunk, which can be undesirable.
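The growth pattern described above can be illustrated with a toy model. This is a minimal sketch and not Impala's MemPool: the class name ToyPool, its methods, and the 1 KB starting size are hypothetical, chosen only to show how "double the last chunk" behaves once a pool has handed off everything except one large chunk.

```python
class ToyPool:
    """Toy model of a pool whose next chunk is always double the last chunk."""

    def __init__(self, initial_chunk_size=1024):
        # Hypothetical starting size; not taken from Impala.
        self.initial_chunk_size = initial_chunk_size
        self.chunks = []  # sizes of the chunks the pool currently owns

    def allocate_chunk(self):
        # Doubling rule from the description: next size = 2 * last chunk size.
        size = self.chunks[-1] * 2 if self.chunks else self.initial_chunk_size
        self.chunks.append(size)
        return size

    def transfer_all_but_last(self):
        # Hand every chunk except the last one to another owner.
        transferred = self.chunks[:-1]
        self.chunks = self.chunks[-1:]
        return transferred


pool = ToyPool()
for _ in range(10):
    pool.allocate_chunk()

moved = pool.transfer_all_but_last()
retained = sum(pool.chunks)        # the pool now owns one large chunk
next_size = pool.allocate_chunk()  # the next chunk is twice that large chunk
print(retained, next_size)
```

In this model the pool owns a single chunk after the transfer, yet its next allocation is sized from that chunk rather than from what the pool actually needs.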
Drop partition operations don't follow the catalog's locking protocol
Bug: IMPALA-3035
The CatalogOpExecutor.alterTableDropPartition() function violates the locking protocol used in the catalog that requires catalogLock_ to be acquired before any table-level lock. That may cause deadlocks when ALTER TABLE DROP PARTITION is executed concurrently with other DDL operations.
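The lock-ordering rule can be sketched in a few lines. This is an illustration under stated assumptions, not the catalog's Java code: catalog_lock, table_lock, and run_ddl are hypothetical Python stand-ins, and the sketch simplifies by holding both locks for the whole operation.

```python
import threading

catalog_lock = threading.Lock()  # stand-in for the catalog-wide lock
table_lock = threading.Lock()    # stand-in for one table-level lock

acquisition_order = []


def run_ddl(name):
    # Every operation takes the catalog lock first, then the table lock.
    # With one consistent order, two threads can never each hold one lock
    # while waiting for the other, which is the deadlock pattern.
    with catalog_lock:
        acquisition_order.append((name, "catalog"))
        with table_lock:
            acquisition_order.append((name, "table"))


threads = [
    threading.Thread(target=run_ddl, args=(name,))
    for name in ("drop_partition", "other_ddl")
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(acquisition_order)
```

A deadlock becomes possible as soon as one code path takes the table lock first and then asks for the catalog lock while another path follows the order above.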
HAVING clause without aggregation not applied properly
Bug: IMPALA-2215
A query with a HAVING clause but no GROUP BY clause was not being rejected, despite being invalid syntax. For example:
select case when 1=1 then 'didit' end as c1 from (select 1 as one) a having 1!=1;
Hit DCHECK Check failed: HasDateOrTime()
Bug: IMPALA-2914
TimestampValue::ToTimestampVal() requires a valid TimestampValue as input. This requirement was not enforced in some places, leading to serious errors.
Aggregation spill loop gives up too early leading to mem limit exceeded errors
Bug: IMPALA-2986
An aggregation query could fail with an out-of-memory error, despite sufficient memory being reported as available.
DataStreamSender::Channel::CloseInternal() does not close the channel on an error.
Bug: IMPALA-2592
Some queries do not close an internal communication channel on an error. This will cause the node on the other side of the channel to wait indefinitely, causing the query to hang. For example, this issue could happen on a Kerberos-enabled system if the credential cache was outdated. Although the affected query hangs, the impalad daemons continue processing other queries.
Codegen does not catch exceptions in FROM_UNIXTIME()
Bug: IMPALA-2184
Querying for the min or max value of a timestamp cast from a bigint via from_unixtime() fails silently and crashes instances of impalad when the input includes a value outside of the valid range.
Workaround: Disable native code generation with:
SET disable_codegen=true;
Impala returns wrong result for function 'conv(bigint, from_base, to_base)'
Bug: IMPALA-2788
Impala returns a wrong result for the function conv(). The function conv(bigint, from_base, to_base) returns an incorrect result, while conv(string, from_base, to_base) returns the correct value.
For example:
select 2061013007, conv(2061013007, 16, 10), conv('2061013007', 16, 10);
+------------+--------------------------+----------------------------+
| 2061013007 | conv(2061013007, 16, 10) | conv('2061013007', 16, 10) |
+------------+--------------------------+----------------------------+
| 2061013007 | 1627467783               | 139066421255               |
+------------+--------------------------+----------------------------+
Fetched 1 row(s) in 0.65s
select 2061013007, conv(cast(2061013007 as bigint), 16, 10), conv('2061013007', 16, 10);
+------------+------------------------------------------+----------------------------+
| 2061013007 | conv(cast(2061013007 as bigint), 16, 10) | conv('2061013007', 16, 10) |
+------------+------------------------------------------+----------------------------+
| 2061013007 | 1627467783                               | 139066421255               |
+------------+------------------------------------------+----------------------------+
select 2061013007, conv(cast(2061013007 as string), 16, 10), conv('2061013007', 16, 10);
+------------+------------------------------------------+----------------------------+
| 2061013007 | conv(cast(2061013007 as string), 16, 10) | conv('2061013007', 16, 10) |
+------------+------------------------------------------+----------------------------+
| 2061013007 | 139066421255                             | 139066421255               |
+------------+------------------------------------------+----------------------------+
select 2061013007, conv(cast(cast(2061013007 as decimal(20,0)) as bigint), 16, 10), conv('2061013007', 16, 10);
+------------+-----------------------------------------------------------------+----------------------------+
| 2061013007 | conv(cast(cast(2061013007 as decimal(20,0)) as bigint), 16, 10) | conv('2061013007', 16, 10) |
+------------+-----------------------------------------------------------------+----------------------------+
| 2061013007 | 1627467783                                                      | 139066421255               |
+------------+-----------------------------------------------------------------+----------------------------+
Workaround: Cast the value to string and use conv(string, from_base, to_base) for conversion.
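The expected value in the output above can be checked independently. The following is a small Python re-implementation of base conversion written only for this check, not Impala's conv(); it assumes non-negative inputs and bases from 2 to 36.

```python
def conv(value, from_base, to_base):
    """Read the digits of `value` in from_base and render them in to_base."""
    digits = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    number = int(str(value), from_base)
    if number == 0:
        return "0"
    out = []
    while number > 0:
        number, remainder = divmod(number, to_base)
        out.append(digits[remainder])
    return "".join(reversed(out))


expected = conv("2061013007", 16, 10)
print(expected)               # 139066421255
print(int(expected) % 2**32)  # 1627467783
```

The second line printed equals the incorrect value returned for the bigint argument. That is consistent with the result having been truncated to 32 bits, although this arithmetic alone does not establish the root cause.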
Issues Fixed in Impala 2.4.1
Issues Fixed in Impala 2.4.0
The set of fixes for Impala in Impala 2.4.0 is the same as in Impala 2.3.2.
Issues Fixed in Impala 2.3.4
Issues Fixed in Impala 2.3.2
This section lists the most serious or frequently encountered customer issues fixed in Impala 2.3.2.
SEGV in AnalyticEvalNode touching NULL input_stream_
A query involving an analytic function could encounter a serious error. This issue was encountered infrequently, depending upon specific combinations of queries and data.
Bug: IMPALA-2829
Free local allocations per row batch in non-partitioned AGG and HJ
An outer join query could fail unexpectedly with an out-of-memory error when the "spill to disk" mechanism was turned off.
Bug: IMPALA-2722
Free local allocations once for every row batch when building hash tables
A join query could encounter a serious error due to an internal failure to allocate memory, which resulted in dereferencing a NULL pointer.
Bug: IMPALA-2612
Prevent migrating incorrectly inferred identity predicates into inline views
Referring to the same column twice in a view definition could cause the view to omit rows where that column contained a NULL value. This could cause incorrect results due to an inaccurate COUNT(*) value or rows missing from the result set.
Bug: IMPALA-2643
Fix GRANTs on URIs with uppercase letters
A GRANT statement for a URI could be ineffective if the URI contained uppercase letters, for example in an uppercase directory name. Subsequent statements, such as CREATE EXTERNAL TABLE with a LOCATION clause, could fail with an authorization exception.
Bug: IMPALA-2695
Avoid sending large partition stats objects over thrift
The catalogd daemon could encounter a serious error when loading the incremental statistics metadata for tables with large numbers of partitions and columns. The problem occurred when the internal representation of metadata for the table exceeded 2 GB, for example in a table with 20K partitions and 77 columns. The fix causes a COMPUTE INCREMENTAL STATS operation to fail if it would produce metadata that exceeded the maximum size.
Bug: IMPALA-2664, IMPALA-2648
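As a rough back-of-the-envelope check on the example figures above, the sketch below computes how much metadata per partition and column pair would add up to 2 GB. It assumes 2 GB means 2 * 1024^3 bytes and that the size is spread evenly across pairs; neither assumption comes from the issue itself.

```python
partitions = 20_000
columns = 77
limit_bytes = 2 * 1024**3  # 2 GB, taken here as 2 GiB (an assumption)

pairs = partitions * columns
avg_bytes_per_pair = limit_bytes / pairs

print(pairs)                      # 1540000
print(round(avg_bytes_per_pair))  # 1394
```

Under those assumptions, an average of roughly 1.4 KB of metadata per partition and column pair is enough to reach the limit for a table of this shape.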
Throw AnalysisError if table properties are too large (for the Hive metastore)
CREATE TABLE or ALTER TABLE statements could fail with metastore database errors due to length limits on the SERDEPROPERTIES and TBLPROPERTIES clauses. (The limit on key size is 256, while the limit on value size is 4000.) The fix makes Impala handle these error conditions more cleanly, by detecting too-long values rather than passing them to the metastore database.
Bug: IMPALA-2226
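The kind of up-front check described above can be sketched as follows. This is an illustration only, not Impala's analyzer code: the function name find_oversized_properties is hypothetical, and the sketch assumes the limits are counted in characters, which the description does not specify.

```python
MAX_KEY_LENGTH = 256     # limit on key size quoted above
MAX_VALUE_LENGTH = 4000  # limit on value size quoted above


def find_oversized_properties(properties):
    """Return (key, reason) pairs for entries that exceed either limit."""
    problems = []
    for key, value in properties.items():
        if len(key) > MAX_KEY_LENGTH:
            problems.append((key, "key too long"))
        if len(value) > MAX_VALUE_LENGTH:
            problems.append((key, "value too long"))
    return problems


props = {"comment": "x" * 4001, "owner": "etl"}
print(find_oversized_properties(props))
```

Rejecting the statement when this list is non-empty gives a clear analysis-time error instead of a failure inside the metastore database.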
Make MAX_PAGE_HEADER_SIZE configurable
Impala could fail to access Parquet data files with page headers larger than 8 MB, which could occur, for example, if the minimum or maximum values for a column were long strings. The fix adds a configuration setting --max_page_header_size, which you can use to increase the Impala size limit to a value higher than 8 MB.
Bug: IMPALA-2273
reduce scanner memory usage
Queries on Parquet tables could consume excessive memory (potentially multiple gigabytes) due to producing large intermediate data values while evaluating groups of rows. The workaround was to reduce the value of the NUM_SCANNER_THREADS query option, the BATCH_SIZE query option, or both.
Bug: IMPALA-2473
Handle error when distinct and aggregates are used with a having clause
A query that included a DISTINCT operator and a HAVING clause, but no aggregate functions or GROUP BY, would fail with an uninformative error message.
Bug: IMPALA-2113
Handle error when star based select item and aggregate are incorrectly used
A query that included * in the SELECT list, in addition to an aggregate function call, would fail with an uninformative message if the query had no GROUP BY clause.
Bug: IMPALA-2225
Refactor MemPool usage in HBase scan node
Queries involving HBase tables used substantially more memory than in earlier Impala versions. The problem occurred starting in Impala 2.2.8, as a result of the changes for IMPALA-2284. The fix for this issue involves removing a separate memory work area for HBase queries and reusing other memory that was already allocated.
Bug: IMPALA-2731
Fix migration/assignment of On-clause predicates inside inline views
Some combinations of ON clauses in join queries could result in comparisons being applied at the wrong stage of query processing, leading to incorrect results.
Wrong predicate assignment could happen under the following conditions:
- The query includes an inline view that contains an outer join.
- That inline view is joined with another table in the enclosing query block.
- That join has an ON clause containing a predicate that only references columns originating from the outer-joined tables inside the inline view.
Bug: IMPALA-1459
DCHECK in parquet scanner after block read error
A debug build of Impala could encounter a serious error after encountering some kinds of I/O errors for Parquet files. This issue only occurred in debug builds, not release builds.
Bug: IMPALA-2558
PAGG hits mem_limit when switching to I/O buffers
A join query could fail with an out-of-memory error despite the apparent presence of sufficient memory. The cause was the internal ordering of operations that could cause a later phase of the query to allocate memory required by an earlier phase of the query. The workaround was to either increase or decrease the MEM_LIMIT query option, because the issue would only occur for a specific combination of memory limit and data volume.
Bug: IMPALA-2535
Fix check failed: sorter_runs_.back()->is_pinned_
A query could fail with an internal error while calculating the memory limit. This was an infrequent condition uncovered during stress testing.
Bug: IMPALA-2559
Don't ignore Status returned by DataStreamRecvr::CreateMerger()
A query could fail with an internal error while calculating the memory limit. This was an infrequent condition uncovered during stress testing.
Bug: IMPALA-2614, IMPALA-2559
DataStreamSender::Send() does not return an error status if SendBatch() failed
Bug: IMPALA-2591
Re-enable SSL and Kerberos on server-server
These fixes lift the restriction on using SSL encryption and Kerberos authentication together for internal communication between Impala components.
Bug: IMPALA-2598, IMPALA-2747
Issues Fixed in Impala 2.3.1
Impala 2.3.1 is identical to Impala 2.3.0. There are no new bug fixes, new features, or incompatible changes.
Issues Fixed in Impala 2.3.0
This section lists the most serious or frequently encountered customer issues fixed in Impala 2.3. Any issues already fixed in Impala 2.2 maintenance releases (up through Impala 2.2.8) are also included. Those issues are listed under the respective Impala 2.2 sections and are not repeated here.
Fixes for Serious Errors
A number of issues were resolved that could result in serious errors when encountered. The most critical or commonly encountered are listed here.
Bugs: IMPALA-2168, IMPALA-2378, IMPALA-2369, IMPALA-2357, IMPALA-2319, IMPALA-2314, IMPALA-2016
Fixes for Correctness Errors
A number of issues were resolved that could result in wrong results when encountered. The most critical or commonly encountered are listed here.
Bugs: IMPALA-2192, IMPALA-2440, IMPALA-2090, IMPALA-2086, IMPALA-1947, IMPALA-1917
Issues Fixed in Impala 2.2.10
Issues Fixed in Impala 2.2.9
This section lists the most frequently encountered customer issues fixed in Impala 2.2.9.
Query return empty result if it contains NullLiteral in inlineview
If an inline view in a FROM clause contained a NULL literal, the result set was empty.
Bug: IMPALA-1917
HBase scan node uses 2-4x memory after upgrade to Impala 2.2.8
Queries involving HBase tables used substantially more memory than in earlier Impala versions. The problem occurred starting in Impala 2.2.8, as a result of the changes for IMPALA-2284. The fix for this issue involves removing a separate memory work area for HBase queries and reusing other memory that was already allocated.
Bug: IMPALA-2731
Fix migration/assignment of On-clause predicates inside inline views
Some combinations of ON clauses in join queries could result in comparisons being applied at the wrong stage of query processing, leading to incorrect results.
Wrong predicate assignment could happen under the following conditions:
- The query includes an inline view that contains an outer join.
- That inline view is joined with another table in the enclosing query block.
- That join has an ON clause containing a predicate that only references columns originating from the outer-joined tables inside the inline view.
Bug: IMPALA-1459
Fix wrong predicate assignment in outer joins
The join predicate for an OUTER JOIN clause could be applied at the wrong stage of query processing, leading to incorrect results.
Bug: IMPALA-2446
Avoid sending large partition stats objects over thrift
The catalogd daemon could encounter a serious error when loading the incremental statistics metadata for tables with large numbers of partitions and columns. The problem occurred when the internal representation of metadata for the table exceeded 2 GB, for example in a table with 20K partitions and 77 columns. The fix causes a COMPUTE INCREMENTAL STATS operation to fail if it would produce metadata that exceeded the maximum size.
Bug: IMPALA-2648, IMPALA-2664
Avoid overflow when adding large intervals to TIMESTAMPs
Adding a large INTERVAL value to a TIMESTAMP value, or subtracting one from it, could produce an incorrect result, with the value wrapping instead of returning an out-of-range error.
Bug: IMPALA-1675
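The difference between wrapping and reporting an out-of-range result can be shown by analogy in Python. This sketch is not Impala code: add_interval_checked is a hypothetical helper, and the valid range here is that of Python's datetime type, which is not Impala's TIMESTAMP range.

```python
from datetime import datetime, timedelta


def add_interval_checked(ts, interval):
    """Add interval to ts, reporting out-of-range results instead of wrapping."""
    try:
        return ts + interval
    except OverflowError as exc:
        # Python's datetime raises OverflowError when the result leaves its range.
        raise ValueError(f"timestamp out of range: {ts} + {interval}") from exc


print(add_interval_checked(datetime(2015, 1, 1), timedelta(days=30)))

try:
    add_interval_checked(datetime(2015, 1, 1), timedelta(days=10_000_000))
except ValueError as err:
    print(err)
```

The first call returns a normal result, and the second reports an error rather than silently producing a date that has wrapped around.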
Analysis exception when a binary operator contains an IN operator with values
An IN operator with literal values could cause a statement to fail if used as the argument to a binary operator, such as an equality test for a BOOLEAN value.
Bug: IMPALA-1949
Make MAX_PAGE_HEADER_SIZE configurable
Impala could fail to access Parquet data files with page headers larger than 8 MB, which could occur, for example, if the minimum or maximum values for a column were long strings. The fix adds a configuration setting --max_page_header_size, which you can use to increase the Impala size limit to a value higher than 8 MB.
Bug: IMPALA-2273
Fix spilling sorts with var-len slots that are NULL or empty.
A query that activated the spill-to-disk mechanism could fail if it contained a sort expression involving certain combinations of fixed-length or variable-length types.
Bug: IMPALA-2357
Work-around IMPALA-2344: Fail query with OOM in case block->Pin() fails
Some queries that activated the spill-to-disk mechanism could produce a serious error if there was insufficient memory to set up internal work areas. Now those queries produce normal out-of-memory errors instead.
Bug: IMPALA-2344
Crash (likely race) tearing down BufferedBlockMgr on query failure
A serious error could occur under rare circumstances, due to a race condition while freeing memory during heavily concurrent workloads.
Bug: IMPALA-2252
QueryExecState doesn't check for query cancellation or errors
A call to SetError() in a user-defined function (UDF) would not cause the query to fail as expected.
Bug: IMPALA-1746
Impala throws IllegalStateException when inserting data into a partition while select subquery group by partition columns
An INSERT ... SELECT operation into a partitioned table could fail if the SELECT query included a GROUP BY clause referring to the partition key columns.
Bug: IMPALA-2533
Issues Fixed in Impala 2.2.8
This section lists the most frequently encountered customer issues fixed in Impala 2.2.8.
Impala is unable to read hive tables created with the "STORED AS AVRO" clause
Impala could not read Avro tables created in Hive with the STORED AS AVRO clause.
Bug: IMPALA-1136, IMPALA-2161
make Parquet scanner fail query if the file size metadata is stale
If a Parquet file in HDFS was overwritten by a smaller file, Impala could encounter a serious error. Issuing an INVALIDATE METADATA statement before a subsequent query would avoid the error. The fix allows Impala to handle such inconsistencies in Parquet file length cleanly regardless of whether the table metadata is up-to-date.
Bug: IMPALA-2213
Avoid allocating StringBuffer > 1GB in ScannerContext::Stream::GetBytesInternal()
Impala could encounter a serious error when reading compressed text files larger than 1 GB. The fix causes Impala to issue an error message instead in this case.
Bug: IMPALA-2249
Disallow long (1<<30) strings in group_concat()
A query using the group_concat() function could encounter a serious error if the returned string value was larger than 1 GB. Now the query fails with an error message in this case.
Bug: IMPALA-2284
avoid FnvHash64to32 with empty inputs
An edge case in the algorithm used to distribute data among nodes could result in uneven distribution of work for some queries, with all data sent to the same node.
Bug: IMPALA-2270
The catalog does not close the connection to HMS during table invalidation
A communication error could occur between Impala and the Hive metastore database, causing Impala operations that update table metadata to fail.
Bug: IMPALA-2348
Wrong DCHECK in PHJ::ProcessProbeBatch
Certain queries could encounter a serious error if the spill-to-disk mechanism was activated.
Bug: IMPALA-2364
Avoid cardinality 0 in scan nodes of small tables and low selectivity
Impala could generate a suboptimal query plan for some queries involving small tables.
Bug: IMPALA-2165
Issues Fixed in Impala 2.2.7
This section lists the most frequently encountered customer issues fixed in Impala 2.2.7.
Warn if table stats are potentially corrupt.
Impala warns if it detects a discrepancy in table statistics: a table considered to have zero rows even though there are data files present. In this case, Impala also skips query optimizations that are normally applied to very small tables.
Bug: IMPALA-1983
Pass correct child node in 2nd phase merge aggregation.
A query could encounter a serious error if it included a particular combination of aggregate functions and inline views.
Bug: IMPALA-2266
Set the output smap of an EmptySetNode produced from an empty inline view.
A query could encounter a serious error if it included an inline view whose subquery had no FROM clause.
Bug: IMPALA-2216
Set an InsertStmt's result exprs from the source statement's result exprs.
A CREATE TABLE AS SELECT or INSERT ... SELECT statement could produce different results than a SELECT statement, for queries including a FULL JOIN clause and including literal values in the select list.
Bug: IMPALA-2203
Fix planning of empty union operands with analytics.
A query could return incorrect results if it contained a UNION clause, calls to analytic functions, and a constant expression that evaluated to FALSE.
Bug: IMPALA-2088
Retain eq predicates bound by grouping slots with complex grouping exprs.
A query containing an INNER JOIN clause could return undesired rows. Some predicates specified in the ON clause could be omitted from the filtering operation.
Bug: IMPALA-2089
Row count not set for empty partition when spec is used with compute incremental stats
A COMPUTE INCREMENTAL STATS statement could leave the row count for an empty partition as -1, rather than initializing the row count to 0. The missing statistic value could result in reduced query performance.
Bug: IMPALA-2199
Explicit aliases + ordinals analysis bug
A query could encounter a serious error if it included column aliases with the same names as table columns, and used ordinal numbers in an ORDER BY or GROUP BY clause.
Bug: IMPALA-1898
Fix TupleIsNullPredicate to return false if no tuples are nullable.
A query could return incorrect results if it included an outer join clause, inline views, and calls to functions such as coalesce() that can generate NULL values.
Bug: IMPALA-1987
fix Expr::ComputeResultsLayout() logic
A query could return incorrect results if the table contained multiple CHAR columns with length of 2 or less, and the query included a GROUP BY clause that referred to multiple such columns.
Bug: IMPALA-2178
Substitute an InsertStmt's partition key exprs with the root node's smap.
An INSERT statement could encounter a serious error if the SELECT portion called an analytic function.
Bug: IMPALA-1737
Issues Fixed in Impala 2.2.5
This section lists the most frequently encountered customer issues fixed in Impala 2.2.5.
Impala DML/DDL operations corrupt table metadata leading to Hive query failures
When the Impala COMPUTE STATS statement was run on a partitioned Parquet table that was created in Hive, the table subsequently became inaccessible in Hive. The table was still accessible to Impala. Regaining access in Hive required a workaround of creating a new table. The error displayed in Hive was:
Error: Error while compiling statement: FAILED: SemanticException Class not found: org.apache.impala.hive.serde.ParquetInputFormat (state=42000,code=40000)
Bug: IMPALA-2048
Avoiding a DCHECK of NULL hash table in spilled right joins
A query could encounter a serious error if it contained a RIGHT OUTER
, RIGHT ANTI
, or FULL OUTER
join clause
and approached the memory limit on a host so that the "spill to disk" mechanism was activated.
Bug: IMPALA-1929
Bug in PrintTColumnValue caused wrong stats for TINYINT partition cols
Declaring a partition key column as a TINYINT
caused problems with the COMPUTE STATS
statement.
The associated partitions would always have zero estimated rows, leading to potentially inefficient query plans.
Bug: IMPALA-2136
Where clause does not propagate to joins inside nested views
A query that referred to a view whose query referred to another view containing a join could return incorrect results.
WHERE
clauses for the outermost query were not always applied, causing the result
set to include additional rows that should have been filtered out.
Bug: IMPALA-2018
Add effective_user() builtin
The user()
function returned the name of the logged-in user, which might not be the
same as the user name being checked for authorization if, for example, delegation was enabled.
Bug: IMPALA-2064
Resolution: Rather than change the behavior of the user()
function,
the fix introduces an additional function effective_user()
that returns the user name that is checked during authorization.
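A minimal sketch of how the two functions can be compared in one query. The column aliases are illustrative only.

```sql
-- user(): the name of the logged-in user.
-- effective_user(): the user name that is checked during authorization.
-- The two values can differ when, for example, delegation is enabled.
SELECT user() AS connected_user,
       effective_user() AS authorized_user;
```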
Make UTC to local TimestampValue conversion faster.
Query performance was improved substantially for Parquet files containing TIMESTAMP
data written by Hive, when the -convert_legacy_hive_parquet_utc_timestamps=true
setting
is in effect.
Bug: IMPALA-2125
Workaround IMPALA-1619 in BufferedBlockMgr::ConsumeMemory()
A join query could encounter a serious error if the query approached the memory limit on a host so that the "spill to disk" mechanism was activated, and data volume in the join was large enough that an internal memory buffer exceeded 1 GB in size on a particular host. (Exceeding this limit would only happen for huge join queries, because Impala could split this intermediate data into 16 parts during the join query, and the buffer only contains compact bookkeeping data rather than the actual join column data.)
Bug: IMPALA-2065
Issues Fixed in Impala 2.2.3
This section lists the most frequently encountered customer issues fixed in Impala 2.2.3.
Enable using Isilon as the underlying filesystem.
Enabling Impala to work with the Isilon filesystem involves a number of fixes to performance and flexibility for dealing with I/O using remote reads. See Using Impala with Isilon Storage for details on using Impala and Isilon together.
Bug: IMPALA-1968, IMPALA-1730
Expand set of supported timezones.
The set of timezones recognized by Impala was expanded. You can always find the latest list of supported timezones in the Impala source code, in the file timezone_db.cc.
Bug: IMPALA-1381
Impala Timestamp ISO-8601 Support.
Impala can now process TIMESTAMP
literals including a trailing z
,
signifying "Zulu" time, a synonym for UTC.
Bug: IMPALA-1963
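For illustration, a literal of the kind described above might be written as follows. The specific date and time are made up, and the T separator between date and time is an assumption about the accepted ISO-8601 form, not something stated in this note.

```sql
-- Illustrative literal with a trailing Z ("Zulu" time, a synonym for UTC).
SELECT CAST('2015-04-09T15:30:00Z' AS TIMESTAMP) AS ts_utc;
```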
Fix wrong warning when insert overwrite to empty table
An INSERT OVERWRITE
operation would encounter an error
if the SELECT
portion of the statement returned zero
rows, such as with a LIMIT 0
clause.
Bug: IMPALA-2008
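A sketch of the statement shape described above, assuming two hypothetical tables, staging and staging_copy, with identical column definitions.

```sql
-- The SELECT portion returns zero rows because of LIMIT 0.
INSERT OVERWRITE TABLE staging_copy
SELECT * FROM staging LIMIT 0;
```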
Expand parsing of decimals to include scientific notation
DECIMAL
literals can now include e
scientific notation.
For example, now CAST(1e3 AS DECIMAL(5,3))
is a valid expression.
Formerly it returned NULL
.
Some scientific expressions might have worked before in DECIMAL
context, but only when the scale was 0.
Bug: IMPALA-1952
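The following sketch shows the expression quoted above alongside two additional literals. The additional literals are illustrative and are not taken from the issue report.

```sql
-- Expression quoted in the release note.
SELECT CAST(1e3 AS DECIMAL(5,3));

-- Additional illustrative literals using e notation with a nonzero scale.
SELECT CAST(1.5e2 AS DECIMAL(6,2)),
       CAST(25e-1 AS DECIMAL(4,2));
```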
Issues Fixed in Impala 2.2.1
This section lists the most frequently encountered customer issues fixed in Impala 2.2.1.
Issues Fixed in Impala 2.2.0
This section lists the most frequently encountered customer issues fixed in Impala 2.2.0.
For the full list of fixed issues in Impala 2.2.0, including over 40 critical issues, see this report in the Impala JIRA tracker.
Altering a column's type causes column stats to stop sticking for that column
When the type of a column was changed in either Hive or Impala through ALTER TABLE CHANGE COLUMN
, the metastore database did not correctly propagate
that change to the table that contains the column statistics. The statistics (particularly the NDV
) for that column were permanently reset
and could not be changed by Impala's COMPUTE STATS
command. The underlying cause is a Hive bug (HIVE-9866).
Bug: IMPALA-1607
Resolution: Resolved by incorporating the fix for HIVE-9866.
Workaround: On systems without the corresponding Hive fix, change the column back to its original type. The stats reappear and you can recompute or drop them.
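The sequence that exposed the problem can be sketched as follows, assuming a hypothetical table t1 with an INT column c1.

```sql
-- Change the column type, then recompute statistics.
ALTER TABLE t1 CHANGE c1 c1 BIGINT;
COMPUTE STATS t1;

-- Check whether the number of distinct values (NDV) for c1 is populated.
SHOW COLUMN STATS t1;
```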
Impala may leak or use too many file descriptors
If a file was truncated in HDFS without a corresponding REFRESH
in Impala, Impala could allocate memory for file descriptors and not free that memory.
Bug: IMPALA-1854
Spurious stale block locality messages
Impala could issue messages stating the block locality metadata was stale,
when the metadata was actually fine.
The internal "remote bytes read" counter was not being reset properly.
This issue did not cause an actual slowdown in query execution,
but the spurious error could result in unnecessary debugging work
and unnecessary use of the INVALIDATE METADATA
statement.
Bug: IMPALA-1712
DROP TABLE fails after COMPUTE STATS and ALTER TABLE RENAME to a different database.
When a table was moved from one database to another, the column statistics were not pointed to the new database. This could result in lower performance for queries due to unavailable statistics, and also an inability to drop the table.
Bug: IMPALA-1711
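The sequence named in the issue title can be sketched as follows. The database names db1 and db2 and the table name t1 are hypothetical.

```sql
-- Gather statistics, move the table to another database, then drop it.
COMPUTE STATS db1.t1;
ALTER TABLE db1.t1 RENAME TO db2.t1;
DROP TABLE db2.t1;
```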
IMPALA-1556 causes memory leak with secure connections
impalad daemons could experience a memory leak on clusters using Kerberos authentication, with memory usage growing as more data is transferred across the secure channel, either to the client program or between Impala nodes. The same issue affected LDAP-secured clusters to a lesser degree, because the LDAP security only covers data transferred back to client programs.
Bug: IMPALA-1674
unix_timestamp() does not return correct time
The unix_timestamp()
function could return an incorrect value (a constant value of 1).
Bug: IMPALA-1623
Impala incorrectly handles text data missing a newline on the last line
Some queries did not recognize the final line of a text data file if the line did not end with a newline character.
This could lead to inconsistent results, such as a different number of rows for SELECT COUNT(*)
as opposed to SELECT *
.
Bug: IMPALA-1476
Impala's ACLs check do not consider all group ACLs, only checked first one.
If the HDFS user ID associated with the impalad process had read or write access in HDFS based on group membership, Impala statements could still fail with HDFS permission errors if that group was not the first listed group for that user ID.
Bug: IMPALA-1805
Fix infinite loop opening or closing file with invalid metadata
Truncating a file in HDFS, after Impala had cached the file metadata, could produce a hang when Impala queried a table containing that file.
Bug: IMPALA-1794
Cannot write Parquet files when values are larger than 64KB
Impala could sometimes fail to INSERT
into a Parquet table if a column value such as a STRING
was larger than 64 KB.
Bug: IMPALA-1705
Impala Will Not Run on Certain Intel CPUs
This fix relaxes the CPU requirement for Impala. Now only the SSSE3 instruction set is required. Formerly, SSE4.1 instructions were generated, making Impala refuse to start on some older CPUs.
Bug: IMPALA-1646
Issues Fixed in Impala 2.1.10
Issues Fixed in Impala 2.1.7
This section lists the most significant Impala issues fixed in Impala 2.1.7.
Query return empty result if it contains NullLiteral in inlineview
If an inline view in a FROM
clause contained a NULL
literal,
the result set was empty.
Bug: IMPALA-1917
Fix edge cases for decimal/integer cast
A value of type DECIMAL(3,0)
could be incorrectly cast to TINYINT
.
The resulting out-of-range value could be incorrect. After the fix, the smallest type that is allowed
for this cast is INT
, and attempting to use DECIMAL(3,0)
in a
TINYINT
context produces a "loss of precision" error.
Bug: IMPALA-2264
Constant filter expressions are not checked for errors and state cleanup on exception / DCHECK on destroying an ExprContext
An invalid constant expression in a WHERE
clause (for example, an invalid
regular expression pattern) could fail to clean up internal state after raising a query error.
Therefore, certain combinations of invalid expressions in a query could cause a crash, or cause a query to continue
when it should halt with an error.
Bug: IMPALA-1756, IMPALA-2514
QueryExecState does not check for query cancellation or errors
A call to SetError()
in a user-defined function (UDF) would not cause the query to fail as expected.
Bug: IMPALA-1746, IMPALA-2141
Issues Fixed in Impala 2.1.6
This section lists the most significant Impala issues fixed in Impala 2.1.6.
Wrong DCHECK in PHJ::ProcessProbeBatch
Certain queries could encounter a serious error if the spill-to-disk mechanism was activated.
Bug: IMPALA-2364
LargestSpilledPartition was not checking if partition is closed
Certain queries could encounter a serious error if the spill-to-disk mechanism was activated.
Bug: IMPALA-2314
Avoid cardinality 0 in scan nodes of small tables and low selectivity
Impala could generate a suboptimal query plan for some queries involving small tables.
Bug: IMPALA-2165
fix Expr::ComputeResultsLayout() logic
Queries using the GROUP BY
operator on multiple CHAR
columns with length less than or equal to 2 characters
could return incorrect results for some columns.
Bug: IMPALA-2178
Properly unescape string value for HBase filters
Queries against HBase tables could return incomplete results if the WHERE
clause included string comparisons using literals
containing escaped quotation marks.
Bug: IMPALA-2133
Avoiding a DCHECK of NULL hash table in spilled right joins
A query could encounter a serious error if it contained a RIGHT OUTER
, RIGHT ANTI
, or FULL OUTER
join clause
and approached the memory limit on a host so that the "spill to disk" mechanism was activated.
Bug: IMPALA-1929
Issues Fixed in Impala 2.1.5
This section lists the most significant Impala issues fixed in Impala 2.1.5.
Avoid calling ProcessBatch with out_batch->AtCapacity in right joins
Queries including RIGHT OUTER JOIN
, RIGHT ANTI JOIN
, or FULL OUTER JOIN
clauses and involving a high volume of data could encounter a serious error.
Bug: IMPALA-1919
Issues Fixed in Impala 2.1.4
This section lists the most significant Impala issues fixed in Impala 2.1.4.
Crash: impala::TupleIsNullPredicate::Prepare
When expressions that tested for NULL
were used in combination with analytic functions, an error could occur.
(The original crash issue was fixed by an earlier patch.)
Bug: IMPALA-1519
Expand parsing of decimals to include scientific notation
DECIMAL
literals can now include e
scientific notation.
For example, now CAST(1e3 AS DECIMAL(5,3))
is a valid expression.
Formerly it returned NULL
.
Some scientific expressions might have worked before in DECIMAL
context, but only when the scale was 0.
Bug: IMPALA-1952
INSERT/CTAS evaluates and applies constant predicates.
An INSERT OVERWRITE
statement would write new data, even if
a constant clause such as WHERE 1 = 0
should have
prevented it from writing any rows.
Bug: IMPALA-1860
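A sketch of the kind of statement described above, assuming hypothetical tables src and dest with identical column definitions.

```sql
-- The constant predicate is always false, so the SELECT produces no rows.
INSERT OVERWRITE TABLE dest
SELECT * FROM src WHERE 1 = 0;
```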
Assign predicates below analytic functions with a compatible partition by clause
If the PARTITION BY
clause in an analytic function refers to partition key columns in a partitioned table,
now Impala can perform partition pruning during the analytic query.
Bug: IMPALA-1900
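As an illustration, the following query filters on a partition key column that also appears in the PARTITION BY clause of the analytic function. The table sales, assumed to be partitioned by year, and its columns are hypothetical.

```sql
-- Use EXPLAIN to see how many partitions the scan node reads.
EXPLAIN
SELECT * FROM (
  SELECT year, amount,
         row_number() OVER (PARTITION BY year ORDER BY amount DESC) AS rn
  FROM sales
) v
WHERE year = 2015;
```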
FIRST_VALUE may produce incorrect results with preceding windows
A query using the FIRST_VALUE
analytic function
and a window defined with the PRECEDING
keyword
could produce wrong results.
Bug: IMPALA-1888
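A query of the general shape described above might look like this. The table readings and its columns are hypothetical, and the source does not say which specific window bounds triggered the problem.

```sql
-- Hypothetical table readings(id INT, val DOUBLE).
SELECT id,
       val,
       first_value(val) OVER (ORDER BY id
                              ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS first_in_window
FROM readings;
```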
FIRST_VALUE rewrite fn type might not match slot type
A query referencing a DECIMAL
column with the FIRST_VALUE
analytic function
could encounter an error.
Bug: IMPALA-1559
AnalyticEvalNode cannot handle partition/order by exprs with NaN
A query using an analytic function
could encounter an error if the
evaluation of an analytic ORDER BY
or PARTITION
expression
resulted in a NaN value, for example if the ORDER BY
or PARTITION
contained a division operation where both operands were zero.
Bug: IMPALA-1808
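The division case mentioned above can be sketched as follows, assuming a hypothetical table ratios with DOUBLE columns numerator and denominator.

```sql
-- Rows where both operands are zero make the ORDER BY expression evaluate to NaN.
SELECT id,
       row_number() OVER (ORDER BY numerator / denominator) AS rn
FROM ratios;
```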
AnalyticEvalNode not properly handling nullable tuples
An analytic function containing only an OVER
clause could
encounter an error if another part of the query (specifically an outer join)
produced all-NULL
tuples.
Bug: IMPALA-1562
Issues Fixed in Impala 2.1.3
This section lists the most significant issues fixed in Impala 2.1.3.
Add compatibility flag for Hive-Parquet-Timestamps
When Hive writes TIMESTAMP
values, it represents them
in the local time zone of the server. Impala expects TIMESTAMP
values to always be in the UTC time zone, possibly leading to inconsistent
results depending on which component created the data files.
This patch introduces a new startup flag,
-convert_legacy_hive_parquet_utc_timestamps
for the impalad daemon.
Specify -convert_legacy_hive_parquet_utc_timestamps=true
to make Impala recognize Parquet data files written by Hive
and automatically adjust TIMESTAMP
values read from those files into the UTC time zone for
compatibility with other Impala TIMESTAMP
processing.
Although this setting is currently turned off by default,
consider enabling it if practical in your environment,
for maximum interoperability with Hive-created Parquet files.
Bug: IMPALA-1658
Use snprintf() instead of lexical_cast() in float-to-string casts
Converting a floating-point value to a STRING
could be slower than necessary.
Bug: IMPALA-1738
Fix partition spilling cleanup when new stream OOMs
Certain calls to aggregate functions with STRING
arguments could encounter a serious error
when the system ran low on memory and attempted to activate the spill-to-disk mechanism.
The error message referenced the function impala::AggregateFunctions::StringValGetValue
.
Bug: IMPALA-1865
Impala's ACLs check do not consider all group ACLs, only checked first one.
If the HDFS user ID associated with the impalad process had read or write access in HDFS based on group membership, Impala statements could still fail with HDFS permission errors if that group was not the first listed group for that user ID.
Bug: IMPALA-1805
Fix infinite loop opening or closing file with invalid metadata
Truncating a file in HDFS, after Impala had cached the file metadata, could produce a hang when Impala queried a table containing that file.
Bug: IMPALA-1794
external-data-source-executor leaking global jni refs
Successive calls to the data source API could result in excessive memory consumption, with memory allocated but never freed.
Bug: IMPALA-1801
Spurious stale block locality messages
Impala could issue messages stating the block locality metadata was stale,
when the metadata was actually fine.
The internal "remote bytes read" counter was not being reset properly.
This issue did not cause an actual slowdown in query execution,
but the spurious error could result in unnecessary debugging work
and unnecessary use of the INVALIDATE METADATA
statement.
Bug: IMPALA-1712
Issues Fixed in Impala 2.1.2
This section lists the most significant issues fixed in Impala 2.1.2.
For the full list of fixed issues in Impala 2.1.2, see this report in the Impala JIRA tracker.
Impala incorrectly handles double numbers with more than 19 significant decimal digits
When a floating-point value was read from a text file and interpreted as a FLOAT
or DOUBLE
value, it could be incorrectly interpreted if it included more than
19 significant digits.
Bug: IMPALA-1622
unix_timestamp() does not return correct time
The unix_timestamp()
function could return an incorrect value (a constant value of 1).
Bug: IMPALA-1623
Row Count Mismatch: Partition pruning with NULL
A query against a partitioned table could return incorrect results if the WHERE
clause
compared the partition key to NULL
using operators such as =
or !=
.
Bug: IMPALA-1535
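For illustration, queries of the affected kind look like the following, assuming a hypothetical table sales partitioned by year.

```sql
-- A comparison with NULL evaluates to NULL, so neither predicate is true for any row.
SELECT count(*) FROM sales WHERE year = NULL;
SELECT count(*) FROM sales WHERE year != NULL;

-- To test for NULL partition key values, use IS NULL instead.
SELECT count(*) FROM sales WHERE year IS NULL;
```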
Fetch column stats in bulk using new (Hive .13) HMS APIs
The performance of the COMPUTE STATS
statement and queries was improved, particularly for wide tables.
Bug: IMPALA-1120
Issues Fixed in Impala 2.1.1
This section lists the most significant issues fixed in Impala 2.1.1.
For the full list of fixed issues in Impala 2.1.1, see this report in the Impala JIRA tracker.
IMPALA-1556 causes memory leak with secure connections
impalad daemons could experience a memory leak on clusters using Kerberos authentication, with memory usage growing as more data is transferred across the secure channel, either to the client program or between Impala nodes. The same issue affected LDAP-secured clusters to a lesser degree, because the LDAP security only covers data transferred back to client programs.
Bug: IMPALA-1674
TSaslServerTransport::Factory::getTransport() leaks transport map entries
impalad daemons in clusters secured by Kerberos or LDAP could experience a slight memory leak on each connection. The accumulation of unreleased memory could cause problems on long-running clusters.
Bug: IMPALA-1668
Issues Fixed in Impala 2.1.0
This section lists the most significant issues fixed in Impala 2.1.0.
For the full list of fixed issues in Impala 2.1.0, see this report in the Impala JIRA tracker.
Kerberos fetches 3x slower
Transferring large result sets back to the client application on Kerberos-secured clusters was slower than necessary.
Bug: IMPALA-1455
Compressed file needs to be hold on entirely in Memory
Queries on gzipped text files required holding the entire data file and its uncompressed representation
in memory at the same time. SELECT
and COMPUTE STATS
statements could
fail or perform inefficiently as a result. The fix enables streaming reads for gzipped text, so that the
data is uncompressed as it is read.
Bug: IMPALA-1556
Cannot read hbase metadata with NullPointerException: null
Impala might not be able to access HBase tables, depending on the associated levels of Impala and HBase on the system.
Bug: IMPALA-1611
Serious errors / crashes
Improved code coverage in Impala testing uncovered a number of potentially serious errors that could occur with specific query syntax. These errors are resolved in Impala 2.1.
Bug: IMPALA-1553, IMPALA-1528, IMPALA-1526, IMPALA-1524, IMPALA-1508, IMPALA-1493, IMPALA-1501, IMPALA-1483
Issues Fixed in Impala 2.0.5
For the full list of fixed issues in Impala 2.0.5, see this report in the Impala JIRA tracker.
Issues Fixed in Impala 2.0.4
This section lists the most significant issues fixed in Impala 2.0.4.
For the full list of fixed issues in Impala 2.0.4, see this report in the Impala JIRA tracker.
Add compatibility flag for Hive-Parquet-Timestamps
When Hive writes TIMESTAMP
values, it represents them
in the local time zone of the server. Impala expects TIMESTAMP
values to always be in the UTC time zone, possibly leading to inconsistent
results depending on which component created the data files.
This patch introduces a new startup flag,
-convert_legacy_hive_parquet_utc_timestamps
for the impalad daemon.
Specify -convert_legacy_hive_parquet_utc_timestamps=true
to make Impala recognize Parquet data files written by Hive
and automatically adjust TIMESTAMP
values read from those files into the UTC time zone for
compatibility with other Impala TIMESTAMP
processing.
Although this setting is currently turned off by default,
consider enabling it if practical in your environment,
for maximum interoperability with Hive-created Parquet files.
Bug: IMPALA-1658
IoMgr infinite loop opening/closing file when shorter than cached metadata size
If a table data file was replaced by a shorter file outside of Impala,
such as with INSERT OVERWRITE
in Hive producing an empty
output file, subsequent Impala queries could hang.
Bug: IMPALA-1794
Issues Fixed in Impala 2.0.3
This section lists the most significant issues fixed in Impala 2.0.3.
For the full list of fixed issues in Impala 2.0.3, see this report in the Impala JIRA tracker.
Anti join could produce incorrect results when spilling
An anti-join query (or a NOT EXISTS
operation that was rewritten internally into an anti-join) could produce incorrect results
if Impala reached its memory limit, causing the query to write temporary results to disk.
Bug: IMPALA-1471
Row Count Mismatch: Partition pruning with NULL
A query against a partitioned table could return incorrect results if the WHERE
clause
compared the partition key to NULL
using operators such as =
or !=
.
Bug: IMPALA-1535
Fetch column stats in bulk using new (Hive .13) HMS APIs
The performance of the COMPUTE STATS
statement and queries was improved, particularly for wide tables.
Bug: IMPALA-1120
Issues Fixed in Impala 2.0.2
This section lists the most significant issues fixed in Impala 2.0.2.
For the full list of fixed issues in Impala 2.0.2, see this report in the Impala JIRA tracker.
GROUP BY on STRING column produces inconsistent results
Some operations in queries submitted through Hue or other HiveServer2 clients could produce inconsistent results.
Bug: IMPALA-1453
Fix leaked file descriptor and excessive file descriptor use
Impala could encounter an error from running out of file descriptors. The fix reduces the amount of time file descriptors are kept open, and avoids leaking file descriptors when read operations encounter errors.
unix_timestamp() does not return correct time
The unix_timestamp()
function could return a constant value 1
instead
of a representation of the time.
Bug: IMPALA-1623
Impala should randomly select cached replica
To avoid putting too heavy a load on any one node, Impala now randomizes which scan node processes each HDFS data block rather than choosing the first cached block replica.
Bug: IMPALA-1586
Impala does not always give short name to Llama.
In clusters secured by Kerberos or LDAP, a discrepancy in internal transmission of user names could cause a communication error with Llama.
Bug: IMPALA-1606
accept unmangled native UDF symbols
The CREATE FUNCTION
statement could report that it could not find a function entry point
within the .so
file for a UDF written in C++, even if the corresponding function was
present.
Bug: IMPALA-1475
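A sketch of a CREATE FUNCTION statement that names the entry point by its unmangled symbol. The function name, library path, and symbol name are all hypothetical.

```sql
-- Hypothetical native UDF; SYMBOL gives the unmangled C++ function name.
CREATE FUNCTION my_lower(STRING) RETURNS STRING
LOCATION '/user/impala/udfs/libudfsample.so'
SYMBOL='MyLower';
```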
Issues Fixed in Impala 2.0.1
This section lists the most significant issues fixed in Impala 2.0.1.
For the full list of fixed issues in Impala 2.0.1, see this report in the Impala JIRA tracker.
Queries fail with metastore exception after upgrade and compute stats
After running the COMPUTE STATS
statement on an Impala table, subsequent queries on that
table could fail with the exception message Failed to load metadata for table:
default.stats_test
.
Bug: IMPALA-1416
Workaround: Upgrading to a release that includes the fix for HIVE-8627
prevents the problem from affecting future COMPUTE STATS
statements. On affected
releases, or for Impala tables that have become inaccessible, the workaround is to disable the
hive.metastore.try.direct.sql
setting in the Hive metastore
hive-site.xml file and issue the INVALIDATE METADATA
statement for
the affected table. You do not need to rerun the COMPUTE STATS
statement for the table.
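The Impala side of the workaround can be sketched as follows, using the table name from the error message above. Run it only after the Hive metastore setting has been changed.

```sql
-- Reload the metadata for the affected table.
INVALIDATE METADATA default.stats_test;
```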
Issues Fixed in Impala 2.0.0
This section lists the most significant issues fixed in Impala 2.0.0.
For the full list of fixed issues in Impala 2.0.0, see this report in the Impala JIRA tracker.
Join Hint is dropped when used inside a view
Hints specified within a view query did not take effect when the view was queried, leading to slow performance. As part of this fix, Impala now supports hints embedded within comments.
Bug: IMPALA-995
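A minimal sketch of a view definition with a join hint embedded in a comment. The view, table, and column names are hypothetical.

```sql
-- The hint is written as a comment immediately after the JOIN keyword.
CREATE VIEW v_orders AS
SELECT o.id, c.name
FROM orders o JOIN /* +BROADCAST */ customers c
  ON o.customer_id = c.id;
```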
WHERE condition ignored in simple query with RIGHT JOIN
Potential wrong results for some types of queries.
Bug: IMPALA-1101
Query with self joined table may produce incorrect results
Potential wrong results for some types of queries.
Bug: IMPALA-1102
Incorrect plan after reordering predicates (inner join following outer join)
Potential wrong results for some types of queries.
Bug: IMPALA-1118
Combining fragments with compatible data partitions can lead to incorrect results due to type incompatibilities (missing casts).
Potential wrong results for some types of queries.
Bug: IMPALA-1123
Predicate dropped: Inline view + DISTINCT aggregate in outer query
Potential wrong results for some types of queries.
Bug: IMPALA-1165
Reuse of a column in JOIN predicate may lead to incorrect results
Potential wrong results for some types of queries.
Bug: IMPALA-1353
Usage of TRUNC with string timestamp reliably crashes node
Serious error for certain combinations of function calls and data types.
Bug: IMPALA-1105
Timestamp Cast Returns invalid TIMESTAMP
Serious error for certain combinations of function calls and data types.
Bug: IMPALA-1109
IllegalStateException upon JOIN of DECIMAL columns with different precision
DECIMAL
columns with different precision could not be compared in join predicates.
Bug: IMPALA-1121
Allow creating Avro tables without column definitions. Allow COMPUTE STATS to always work on Impala-created Avro tables.
Hive-created Avro tables with columns specified by a JSON file or literal could produce errors when
queried in Impala, and could not be used with the COMPUTE STATS
statement. Now you can
create such tables in Impala to avoid such errors.
Bug: IMPALA-1104
Ensure all webserver output is escaped
The Impala debug web UI did not properly encode all output.
Bug: IMPALA-1133
Queries with union in inline view have empty resource requests
Certain queries could run without obeying the limits imposed by resource management.
Bug: IMPALA-1236
Impala does not employ ACLs when checking path permissions for LOAD and INSERT
Certain INSERT
and LOAD DATA
statements could fail unnecessarily, if
the target directories in HDFS had restrictive HDFS permissions, but those permissions were overridden by
HDFS extended ACLs.
Bug: IMPALA-1279
Impala does not map principals to lowercase, affecting Sentry authorisation
In a Kerberos environment, the principal name was not mapped to lowercase, causing issues when a user logged in with an uppercase principal name and Sentry authorization was enabled.
Bug: IMPALA-1334
Issues Fixed in Impala 1.4.4
Issues Fixed in Impala 1.4.3
Impala 1.4.3 includes fixes to address what is known as the POODLE vulnerability in SSLv3. SSLv3 access is disabled in the Impala debug web UI.
Issues Fixed in Impala 1.4.2
This section lists the most significant issues fixed in Impala 1.4.2.
For the full list of fixed issues in Impala 1.4.2, see this report in the Impala JIRA tracker.
Issues Fixed in Impala 1.4.1
This section lists the most significant issues fixed in Impala 1.4.1.
For the full list of fixed issues in Impala 1.4.1, see this report in the Impala JIRA tracker.
impalad terminating with Boost exception
Occasionally, a non-trivial query run through Llama could encounter a serious error. The detailed error in the log was:
boost::exception_detail::clone_impl
<boost::exception_detail::error_info_injector<boost::lock_error> >
Severity: High
Impalad uses wrong string format when writing logs
Impala log files could contain internal error messages due to a problem formatting certain strings. The messages consisted of a Java call stack starting with:
jni-util.cc:177] java.util.MissingFormatArgumentException: Format specifier 's'
Update HS2 client API.
A downlevel version of the HiveServer2 API could cause difficulty retrieving the precision and scale of a
DECIMAL
value.
Bug: IMPALA-1107
Impalad catalog updates can fail with error: "IllegalArgumentException: fromKey out of range"
The error in the title could occur following a DDL statement. This issue was discovered during internal testing and has not been reported in customer environments.
Bug: IMPALA-1093
"Total" time counter does not capture all the network transmit time
The time for some network operations was not counted in the report of total time for a query, making it difficult to diagnose network-related performance issues.
Bug: IMPALA-1131
Impala will crash when reading certain Avro files containing bytes data
Certain Avro fields for byte data could cause Impala to be unable to read an Avro data file, even if the field was not part of the Impala table definition. With this fix, Impala can now read these Avro data files, although Impala queries cannot refer to the "bytes" fields.
Bug: IMPALA-1149
Support specifying a custom AuthorizationProvider in Impala
The --authorization_policy_provider_class
option for impalad was
added back. This option specifies a custom AuthorizationProvider
class rather than the
default HadoopGroupAuthorizationProvider
. It had been used for internal testing, then
removed in Impala 1.4.0, but it was considered useful by some customers.
Bug: IMPALA-1142
Issues Fixed in Impala 1.4.0
This section lists the most significant issues fixed in Impala 1.4.0.
For the full list of fixed issues in Impala 1.4.0, see this report in the Impala JIRA tracker.
Failed DCHECK in disk-io-mgr-reader-context.cc:174
The serious error in the title could occur, with the supplemental message:
num_used_buffers_ < 0: #used=-1 during cancellation HDFS cached data
The issue was due to the use of HDFS caching with data files accessed by Impala. Support for HDFS caching in Impala was introduced in Impala 1.4.0. The fix for this issue was backported to Impala 1.3.x, and is the only change in Impala 1.3.2.
Bug: IMPALA-1019
Resolution: This issue is fixed in Impala 1.3.2. The addition of HDFS caching support in Impala 1.4 means that this issue does not apply to any new level of Impala.
impala-shell only works with ASCII characters
The impala-shell interpreter could encounter errors processing SQL statements containing non-ASCII characters.
Bug: IMPALA-489
The extended view definition SQL text in Views created by Impala should always have fully-qualified table names
When a view was accessed while inside a different database, references to tables were not resolved unless the names were fully qualified when the view was created.
Bug: IMPALA-962
Impala forgets about partitions with non-existent locations
If an ALTER TABLE
specified a non-existent HDFS location for a partition, afterwards
Impala would not be able to access the partition at all.
Bug: IMPALA-741
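The kind of statement involved can be sketched as follows. The table name, partition key, and HDFS path are hypothetical, and the path is assumed not to exist yet.

```sql
-- Point a partition at an HDFS directory that has not been created yet.
ALTER TABLE logs PARTITION (year=2014)
SET LOCATION '/user/impala/logs/year_2014_pending';
```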
CREATE TABLE LIKE fails if source is a view
The CREATE TABLE LIKE
clause was enhanced to be able to create a table with the same
column definitions as a view. The resulting table is a text table unless the STORED AS
clause is specified, because a view does not have an associated file format to inherit.
Bug: IMPALA-834
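A short sketch of both forms, assuming a hypothetical view named v_orders.

```sql
-- Without STORED AS, the new table is a text table.
CREATE TABLE orders_text LIKE v_orders;

-- With STORED AS, the file format is set explicitly.
CREATE TABLE orders_parquet LIKE v_orders STORED AS PARQUET;
```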
Improve partition pruning time
Operations on tables with many partitions could be slow due to the time to evaluate which partitions were affected. The partition pruning code was sped up substantially.
Bug: IMPALA-887
Improve compute stats performance
The performance of the COMPUTE STATS
statement was improved substantially. The
efficiency of its internal operations was improved, and some statistics are no longer gathered because
they are not currently used for planning Impala queries.
Bug: IMPALA-1003
When I run CREATE TABLE new_table LIKE avro_table, the schema does not get mapped properly from an avro schema to a hive schema
After a CREATE TABLE LIKE
statement using an Avro table as the source, the new table
could have incorrect metadata and be inaccessible, depending on how the original Avro table was created.
Bug: IMPALA-185
Race condition in IoMgr. Blocked ranges enqueued after cancel.
Impala could encounter a serious error after a query was cancelled.
Bug: IMPALA-1046
Deadlock in scan node
A deadlock condition could make all impalad daemons hang, making the cluster unresponsive for Impala queries.
Bug: IMPALA-1083
Issues Fixed in Impala 1.3.3
Impala 1.3.3 includes fixes to address what is known as the POODLE vulnerability in SSLv3. SSLv3 access is disabled in the Impala debug web UI.
Issues Fixed in Impala 1.3.2
This backported bug fix is the only change between Impala 1.3.1 and Impala 1.3.2.
Failed DCHECK in disk-io-mgr-reader-context.cc:174
The serious error in the title could occur, with the supplemental message:
num_used_buffers_ < 0: #used=-1 during cancellation HDFS cached data
The issue was due to the use of HDFS caching with data files accessed by Impala. Support for HDFS caching in Impala was introduced in Impala 1.4.0. The fix for this issue was backported to Impala 1.3.x, and is the only change in Impala 1.3.2.
Bug: IMPALA-1019
Resolution: This issue is fixed in Impala 1.3.2. HDFS caching support was introduced in Impala 1.4, which includes this fix, so the issue does not apply to any later Impala release.
Issues Fixed in Impala 1.3.1
This section lists the most significant issues fixed in Impala 1.3.1.
For the full list of fixed issues in Impala 1.3.1, see this report in the Impala JIRA tracker.
Impalad crashes when left joining inline view that has aggregate using distinct
Impala could encounter a severe error in a query combining a left outer join with an inline view containing a COUNT(DISTINCT) operation.
Bug: IMPALA-904
Incorrect result with group by query with null value in group by data
If the result of a GROUP BY operation is NULL, the resulting row might be omitted from the result set. This issue depends on the data values and data types in the table.
Bug: IMPALA-901
Drop Function does not clear local library cache
When a UDF is dropped through the DROP FUNCTION statement, and then the UDF is re-created with a new .so library or JAR file, the original version of the UDF is still used when the UDF is called from queries.
Bug: IMPALA-786
Workaround: Restart the impalad daemon on all nodes.
Compute stats doesn't propagate underlying error correctly
If a COMPUTE STATS statement encountered an error, the error message was "Query aborted" with no further detail. Common reasons why a COMPUTE STATS statement might fail include network errors causing the coordinator node to lose contact with other impalad instances, and column names that match Impala reserved words. (Currently, if a column name is an Impala reserved word, COMPUTE STATS always returns an error.)
Bug: IMPALA-762
Inserts should respect changes in partition location
After an ALTER TABLE statement that changes the LOCATION property of a partition, a subsequent INSERT statement would always use a path derived from the base data directory for the table.
Bug: IMPALA-624
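The sequence that triggered this issue can be sketched roughly as follows. The table, partition, and path names are hypothetical and serve only as an illustration.

```sql
-- Hypothetical table, partition key, and HDFS path.
ALTER TABLE sales PARTITION (year=2013)
  SET LOCATION '/data/archive/sales/2013';

-- With the fix, the new rows are written under the partition's
-- changed location rather than under the table's base directory.
INSERT INTO sales PARTITION (year=2013)
  SELECT id, amount FROM sales_staging;
```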
Text data with carriage returns generates wrong results for count(*)
A COUNT(*) operation could return the wrong result for text tables using nul characters (ASCII value 0) as delimiters.
Bug: IMPALA-13
Workaround: Impala adds support for ASCII 0 characters as delimiters through the clause FIELDS TERMINATED BY '\0'.
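A minimal sketch of a text table that uses the nul character as its field delimiter is shown below. The table and column names are hypothetical.

```sql
-- Hypothetical table whose text data files use ASCII 0 as the field delimiter.
CREATE TABLE nul_delimited (id INT, name STRING)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY '\0'
  STORED AS TEXTFILE;

-- The kind of query that could previously return a wrong count.
SELECT COUNT(*) FROM nul_delimited;
```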
IO Mgr should take instance memory limit into account when creating io buffers
Impala could allocate more memory than necessary during certain operations.
Bug: IMPALA-488
Workaround: Before issuing a COMPUTE STATS statement for a Parquet table, reduce the number of threads used in that operation by issuing SET NUM_SCANNER_THREADS=2 in impala-shell. Then issue UNSET NUM_SCANNER_THREADS before continuing with queries.
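In an impala-shell session, the workaround above might look like the following sketch. The table name wide_parquet_table is hypothetical.

```sql
-- Lower the scanner thread count for the statistics-gathering step only.
SET NUM_SCANNER_THREADS=2;

-- wide_parquet_table is a hypothetical Parquet table.
COMPUTE STATS wide_parquet_table;

-- Restore the default before running further queries.
UNSET NUM_SCANNER_THREADS;
```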
Impala should provide an option for new sub directories to automatically inherit the permissions of the parent directory
Previously, when an INSERT statement created new subdirectories underneath a partitioned table, the new subdirectories always used the default HDFS permissions for the impala user, which might not be suitable for directories that other components also read and write.
Bug: IMPALA-827
Resolution: In Impala 1.3.1 and higher, you can specify the --insert_inherit_permissions configuration option when starting the impalad daemon.
Illegal state exception (or crash) in query with UNION in inline view
Impala could encounter a severe error in a query where the FROM list contains an inline view that includes a UNION. The exact type of the error varies.
Bug: IMPALA-888
INSERT column reordering doesn't work with SELECT clause
The ability to specify a subset of columns in an INSERT statement, in a different order than in the target table, was not working as intended.
Bug: IMPALA-945
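For illustration, an INSERT with a reordered column subset might look like the sketch below. The tables t1 and t2 and their columns are hypothetical.

```sql
-- Hypothetical tables: t1 has columns (c1, c2, c3), listed here
-- as a subset and in a different order than in the table definition.
INSERT INTO t1 (c3, c1)
  SELECT x, y FROM t2;

-- The column not mentioned in the list (c2) is set to NULL.
```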
Issues Fixed in Impala 1.3.0
This section lists the most significant issues fixed in Impala 1.3.0, primarily issues that could cause wrong results, or cause problems running the COMPUTE STATS statement, which is very important for performance and scalability.
For the full list of fixed issues, see this report in the Impala JIRA tracker.
Inner join after right join may produce wrong results
The automatic join reordering optimization could incorrectly reorder queries with an outer join or semi join followed by an inner join, producing incorrect results.
Bug: IMPALA-860
Workaround: Including the STRAIGHT_JOIN keyword in the query prevented the issue from occurring.
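A sketch of the workaround follows, with hypothetical tables a, b, and c. The STRAIGHT_JOIN keyword goes immediately after SELECT and makes Impala join the tables in the order written.

```sql
-- Hypothetical tables; the shape is an outer join followed by an inner join.
SELECT STRAIGHT_JOIN a.id, c.val
FROM a
  RIGHT OUTER JOIN b ON a.id = b.id
  JOIN c ON b.id = c.id;
```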
Incorrect results with codegen on multi-column group by with NULLs.
A query with a GROUP BY clause referencing multiple columns could introduce incorrect NULL values in some columns of the result set. The incorrect NULL values could appear in rows where a different GROUP BY column actually did return NULL.
Bug: IMPALA-850
Using distinct inside aggregate function may cause incorrect result when using having clause
A query could return incorrect results if it combined an aggregate function call, a DISTINCT operator, and a HAVING clause, without a GROUP BY clause.
Bug: IMPALA-845
Aggregation on union inside (inline) view not distributed properly.
An aggregation query or a query with ORDER BY and LIMIT could be executed on a single node in some cases, rather than distributed across the cluster. This issue affected queries whose FROM clause referenced an inline view containing a UNION.
Bug: IMPALA-831
Wrong expression may be used in aggregate query if there are multiple similar expressions
If a GROUP BY query referenced the same columns multiple times using different operators, result rows could contain multiple copies of the same expression.
Bug: IMPALA-817
Incorrect results when changing the order of aggregates in the select list with codegen enabled
Referencing the same columns in both a COUNT() and a SUM() call in the same query, or some other combinations of aggregate function calls, could incorrectly return a result of 0 from one of the aggregate functions. This issue affected references to TINYINT and SMALLINT columns, but not INT or BIGINT columns.
Bug: IMPALA-765
Workaround: Setting the query option DISABLE_CODEGEN=TRUE prevented the incorrect results. Switching the order of the function calls could also prevent the issue from occurring.
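The workaround can be sketched as below. The table metrics and its SMALLINT column small_col are hypothetical.

```sql
-- Disable native code generation for the session.
SET DISABLE_CODEGEN=TRUE;

-- Hypothetical table with a SMALLINT column referenced by two aggregates.
SELECT COUNT(small_col), SUM(small_col) FROM metrics;
```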
Union queries give Wrong result in a UNION followed by SIGSEGV in another union
A UNION query could produce a wrong result, followed by a serious error for a subsequent UNION query.
Bug: IMPALA-723
String data in MR-produced parquet files may be read incorrectly
Impala could return incorrect string results when reading uncompressed Parquet data files containing multiple row groups. This issue only affected Parquet data files produced by MapReduce jobs.
Bug: IMPALA-729
Compute stats need to use quotes with identifiers that are Impala keywords
Using a column or table name that conflicted with Impala keywords could prevent running the COMPUTE STATS statement for the table.
Bug: IMPALA-777
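As an illustration of the affected case, the sketch below uses a column whose name is an Impala keyword and therefore must be quoted with backticks. The table name events is hypothetical.

```sql
-- LOCATION is an Impala keyword, so the column name is backtick-quoted.
CREATE TABLE events (id INT, `location` STRING);

-- With the fix, the statistics queries quote such identifiers internally.
COMPUTE STATS events;

SELECT `location`, COUNT(*) FROM events GROUP BY `location`;
```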
COMPUTE STATS child queries do not inherit parent query options.
The COMPUTE STATS statement did not use the setting of the MEM_LIMIT query option in impala-shell, potentially causing problems gathering statistics for wide Parquet tables.
Bug: IMPALA-903
COMPUTE STATS should update partitions in batches
The COMPUTE STATS statement could be slow or encounter a timeout while analyzing a table with many partitions.
Bug: IMPALA-880
Fail early (in analysis) when COMPUTE STATS is run against Avro table with no columns
If the columns for an Avro table were all defined in the TBLPROPERTIES or SERDEPROPERTIES clauses, the COMPUTE STATS statement would fail after completely analyzing the table, potentially causing a long delay. Although the COMPUTE STATS statement still does not work for such tables, now the problem is detected and reported immediately.
Bug: IMPALA-867
Workaround: Re-create the Avro table with columns defined in SQL style, using the output of SHOW CREATE TABLE. (See the JIRA page for detailed steps.)
Issues Fixed in the 1.2.4 Release
This section lists the most significant issues fixed in Impala 1.2.4. For the full list of fixed issues, see this report in the Impala JIRA tracker.
The Catalog Server exits with an OOM error after a certain number of CREATE statements
A large number of concurrent CREATE TABLE statements can cause the catalogd process to consume excessive memory, and potentially be killed due to an out-of-memory condition.
Bug: IMPALA-818
Workaround: Restart the catalogd service and re-try the DDL operations that failed.
Catalog Server consumes excessive cpu cycle
A large number of tables and partitions could result in unnecessary CPU overhead during Impala idle time and background operations.
Bug: IMPALA-821
Resolution: Catalog server processing was optimized in several ways.
Query against Avro table crashes Impala with codegen enabled
A query against a TIMESTAMP column in an Avro table could encounter a serious issue.
Bug: IMPALA-828
Workaround: Set the query option DISABLE_CODEGEN=TRUE
Statestore seems to send concurrent heartbeats to the same subscriber leading to repeated "Subscriber 'hostname' is registering with statestore, ignoring update" messages
Impala nodes could produce repeated error messages after recovering from a communication error with the statestore service.
Bug: IMPALA-809
Join predicate incorrectly ignored
A join query could produce wrong results if multiple equality comparisons between the same tables referred to the same column.
Bug: IMPALA-805
Query result differing between Impala and Hive
Certain outer join queries could return wrong results. If one of the tables involved in the join was an inline view, some tests from the WHERE clauses could be applied to the wrong phase of the query.
ArrayIndexOutOfBoundsException / Invalid query handle when reading large HBase cell
An HBase cell could contain a value larger than 32 KB, leading to a serious error when Impala queries that table. The error could occur even if the applicable row is not part of the result set.
Bug: IMPALA-715
Workaround: Use smaller values in the HBase table, or exclude the column containing the large value from the result set.
select with distinct and full outer join, impalad coredump
A query involving a DISTINCT operator combined with a FULL OUTER JOIN could encounter a serious error.
Bug: IMPALA-735
Workaround: Set the query option DISABLE_CODEGEN=TRUE
Impala cannot load tables with more than Short.MAX_VALUE number of partitions
If a table had more than 32,767 partitions, Impala would not recognize the partitions above the 32K limit and query results could be incomplete.
Bug: IMPALA-749
Various issues with HBase row key specification
Queries against HBase tables could fail with an error if the row key was compared to a function return value rather than a string constant. Also, queries against HBase tables could fail if the WHERE clause contained combinations of comparisons that could not possibly match any row key.
Resolution: Queries now return appropriate results when function calls are used in the row key comparison. For queries involving non-existent row keys, such as WHERE row_key IS NULL or where the lower bound is greater than the upper bound, the query succeeds and returns an empty result set.
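The three query shapes described above might look like the following sketch. The table hbase_users and its row_key column are hypothetical names for an HBase-backed table.

```sql
-- Row key compared to a function return value rather than a string constant.
SELECT * FROM hbase_users WHERE row_key = CONCAT('user_', '42');

-- Comparisons that cannot match any row key; per the resolution above,
-- each of these succeeds and returns an empty result set.
SELECT * FROM hbase_users WHERE row_key IS NULL;
SELECT * FROM hbase_users WHERE row_key > 'z' AND row_key < 'a';
```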
Issues Fixed in the 1.2.3 Release
This release is a fix release that supersedes Impala 1.2.2, with the same features and fixes as 1.2.2 plus one additional fix for compatibility with Parquet files generated outside of Impala by components such as Hive, Pig, or MapReduce.
Impala cannot read Parquet files with multiple row groups
An early version of the parquet-mr library writes files that are not readable by Impala, due to the presence of multiple row groups. Queries involving these data files might result in a crash or a failure with an error such as "Column chunk should not contain two dictionary pages". This issue does not occur for Parquet files produced by Impala INSERT statements, because Impala only produces files with a single row group.
Bug: IMPALA-720
Issues Fixed in the 1.2.2 Release
This section lists the most significant issues fixed in Impala 1.2.2. For the full list of fixed issues, see this report in the Impala JIRA tracker.
Order of table references in FROM clause is critical for optimal performance
Impala does not currently optimize the join order of queries; instead, it joins tables in the order in which they are listed in the FROM clause. Queries that contain one or more large tables on the right hand side of joins (either an explicit join expressed as a JOIN statement or a join implicit in the list of table references in the FROM clause) may run slowly or crash Impala due to out-of-memory errors. For example:
SELECT ... FROM small_table JOIN large_table
Anticipated Resolution: Fixed in Impala 1.2.2.
Workaround: In Impala 1.2.2 and higher, use the COMPUTE STATS statement to gather statistics for each table involved in the join query, after data is loaded. Prior to Impala 1.2.2, modify the query, if possible, to join the largest table first. For example:
SELECT ... FROM small_table JOIN large_table
should be modified to:
SELECT ... FROM large_table JOIN small_table
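For Impala 1.2.2 and higher, the statistics-based approach from the workaround might be sketched as follows. The table names reuse the placeholders from the example above, and the id join column is hypothetical.

```sql
-- Gather statistics after loading data so the planner can choose a join order.
COMPUTE STATS small_table;
COMPUTE STATS large_table;

-- The tables can then be listed in either order in the FROM clause.
SELECT COUNT(*)
FROM small_table JOIN large_table
  ON small_table.id = large_table.id;
```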
Parquet in CDH4.5 writes data files that are sometimes unreadable by Impala
Some Parquet files could be generated by other components that Impala could not read.
Bug: IMPALA-694
Resolution: The underlying issue is being addressed by a fix in the Parquet libraries. Impala 1.2.2 works around the problem and reads the existing data files.
Deadlock in statestore when unregistering a subscriber and building a topic update
The statestore service could experience an internal error leading to a hang.
Bug: IMPALA-699
IllegalStateException when doing a union involving a group by
A UNION query where one side involved a GROUP BY operation could cause a serious error.
Bug: IMPALA-687
Impala Parquet Writer hit DCHECK in RleEncoder
A serious error could occur when doing an INSERT into a Parquet table.
Bug: IMPALA-689
Hive UDF jars cannot be loaded by the FE
If the JAR file for a Java-based Hive UDF was not in the CLASSPATH, the UDF could not be called during a query.
Bug: IMPALA-695
Issues Fixed in the 1.2.1 Release
This section lists the most significant issues fixed in Impala 1.2.1. For the full list of fixed issues, see this report in the Impala JIRA tracker.
Scanners use too much memory when reading past scan range
While querying a table with long column values, Impala could over-allocate memory leading to an out-of-memory error. This problem was observed most frequently with tables using uncompressed RCFile or text data files.
Bug: IMPALA-525
Resolution: Fixed in 1.2.1
Join node consumes memory way beyond mem-limit
A join query could allocate a temporary work area that was larger than needed, leading to an out-of-memory error. The fix makes Impala return unused memory to the system when the memory limit is reached, avoiding unnecessary memory errors.
Bug: IMPALA-657
Resolution: Fixed in 1.2.1
Excessive memory consumption when query tables with 1k columns (Parquet file)
Impala could encounter an out-of-memory condition setting up work areas for Parquet tables with many columns. The fix reduces the size of the allocated memory when not actually needed to hold table data.
Bug: IMPALA-652
Resolution: Fixed in 1.2.1
Issues Fixed in the 1.2.0 Beta Release
This section lists the most significant issues fixed in Impala 1.2 (beta). For the full list of fixed issues, see this report in the Impala JIRA tracker.
Issues Fixed in the 1.1.1 Release
This section lists the most significant issues fixed in Impala 1.1.1. For the full list of fixed issues, see this report in the Impala JIRA tracker.
Unexpected LLVM Crash When Querying Doubles on CentOS 5.x
Certain queries involving DOUBLE columns could fail with a serious error. The fix improves the generation of native machine instructions for certain chipsets.
Bug: IMPALA-477
"block size is too big" error with Snappy-compressed RCFile containing null
Queries could fail with a "block size is too big" error, due to NULL values in RCFile tables using Snappy compression.
Bug: IMPALA-482
Cannot query RC file for table that has more columns than the data file
Queries could fail if an Impala RCFile table was defined with more columns than in the corresponding RCFile data files.
Bug: IMPALA-510
Views Sometimes Not Utilizing Partition Pruning
Certain combinations of clauses in a view definition for a partitioned table could result in inefficient performance and incorrect results.
Bug: IMPALA-495
Update the serde name we write into the metastore for Parquet tables
The SerDes class string written into Parquet data files created by Impala was updated for compatibility with Parquet support in Hive. See Incompatible Changes Introduced in Impala 1.1.1 for the steps to update older Parquet data files for Hive compatibility.
Bug: IMPALA-485
Selective queries over large tables produce unnecessary memory consumption
A query returning a small result set from a large table could tie up memory unnecessarily for the duration of the query.
Bug: IMPALA-534
Impala stopped to query AVRO tables
Queries against Avro tables could fail depending on whether the Avro schema URL was specified in the TBLPROPERTIES or SERDEPROPERTIES field. The fix causes Impala to check both fields for the schema URL.
Bug: IMPALA-538
Impala continues to allocate more memory even though it has exceed its mem-limit
Queries could allocate substantially more memory than specified in the impalad -mem_limit startup option. The fix causes more frequent checking of the limit during query execution.
Bug: IMPALA-520
Issues Fixed in the 1.1.0 Release
This section lists the most significant issues fixed in Impala 1.1. For the full list of fixed issues, see this report in the Impala JIRA tracker.
10-20% perf regression for most queries across all table formats
This issue is due to a performance tradeoff between systems running many queries concurrently, and systems running a single query. Systems running only a single query could experience lower performance than in early beta releases. Systems running many queries simultaneously should experience higher performance than in the beta releases.
planner fails with "Join requires at least one equality predicate between the two tables" when "from" table order does not match "where" join order
A query could fail if it involved 3 or more tables and the last join table was specified as a subquery.
Bug: IMPALA-85
Parquet writer uses excessive memory with partitions
INSERT statements against partitioned tables using the Parquet format could use excessive amounts of memory as the number of partitions grew large.
Bug: IMPALA-257
Comments in impala-shell in interactive mode are not handled properly causing syntax errors or wrong results
The impala-shell interpreter did not accept comments entered at the command line, making it problematic to copy and paste from scripts or other code examples.
Bug: IMPALA-192
Cancelled queries sometimes aren't removed from the inflight query list
The Impala web UI would sometimes display a query as if it were still running, after the query was cancelled.
Bug: IMPALA-364
Impala's 1.0.1 Shell Broke Python 2.4 Compatibility (AttributeError: 'module' object has no attribute 'field_size_limit)
The impala-shell command in Impala 1.0.1 does not work with Python 2.4, which is the default on Red Hat 5.
For the impala-shell command in Impala 1.0, the -o option (pipe output to a file) does not work with Python 2.4.
Bug: IMPALA-396
Issues Fixed in the 1.0.1 Release
This section lists the most significant issues fixed in Impala 1.0.1. For the full list of fixed issues, see this report in the Impala JIRA tracker.
Impala parquet scanner cannot read all data files generated by other frameworks
Impala might issue an erroneous error message when processing a Parquet data file produced by a non-Impala Hadoop component.
Bug: IMPALA-333
Resolution: Fixed
Impala is unable to query RCFile tables which describe fewer columns than the file's header.
If an RCFile table definition had fewer columns than the fields actually in the data files, queries would fail.
Bug: IMPALA-293
Resolution: Fixed
Impala does not correctly substitute _HOST with hostname in --principal
The _HOST placeholder in the --principal startup option was not substituted with the correct hostname, potentially leading to a startup error in setups using Kerberos authentication.
Bug: IMPALA-351
Resolution: Fixed
HBase query missed the last region
Hbase region changes are not handled correctly
After a region in an HBase table was split or moved, an Impala query might return incomplete or out-of-date results.
Bug: IMPALA-300
Resolution: Fixed
Query state for successful create table is EXCEPTION
After a successful CREATE TABLE statement, the corresponding query state would be incorrectly reported as EXCEPTION.
Bug: IMPALA-349
Resolution: Fixed
Double check release of JNI-allocated byte-strings
Operations involving calls to the Java JNI subsystem (for example, queries on HBase tables) could allocate memory but not release it.
Bug: IMPALA-358
Resolution: Fixed
Impala returns 0 for bad time values in UNIX_TIMESTAMP, Hive returns NULL
Impala returns 0 for bad time values in UNIX_TIMESTAMP, Hive returns NULL.
Impala:
impala> select UNIX_TIMESTAMP('10:02:01') ;
impala> 0
Hive:
hive> select UNIX_TIMESTAMP('10:02:01') FROM tmp;
hive> NULL
Bug: IMPALA-16
Anticipated Resolution: Fixed
INSERT INTO TABLE SELECT <constant> does not work.
INSERT INTO TABLE SELECT <constant> will not insert any data and may return an error.
Anticipated Resolution: Fixed
Issues Fixed in the 1.0 GA Release
Here are the major user-visible issues fixed in Impala 1.0. For a full list of fixed issues, see this report in the Impala JIRA tracker.
Undeterministically receive "ERROR: unknown row bach destination..." and "ERROR: Invalid query handle" from impala shell when running union query
A query containing both UNION and LIMIT clauses could intermittently cause the impalad process to halt with a segmentation fault.
Bug: IMPALA-183
Resolution: Fixed
Insert with NULL partition keys results in SIGSEGV.
An INSERT statement specifying a NULL value for one of the partitioning columns could cause the impalad process to halt with a segmentation fault.
Bug: IMPALA-190
Resolution: Fixed
INSERT queries don't show completed profiles on the debug webpage
In the Impala web user interface, the profile page for an INSERT statement showed obsolete information for the statement once it was complete.
Bug: IMPALA-217
Resolution: Fixed
Impala HBase scan is very slow
Queries involving an HBase table could be slower than expected, due to excessive memory usage on the Impala nodes.
Bug: IMPALA-231
Resolution: Fixed
Add some library version validation logic to impalad when loading impala-lzo shared library
No validation was done to check that the impala-lzo shared library was compatible with the version of Impala, possibly leading to a crash when using LZO-compressed text files.
Bug: IMPALA-234
Resolution: Fixed
Workaround: Always upgrade the impala-lzo library at the same time as you upgrade Impala itself.
Problems inserting into tables with TIMESTAMP partition columns leading table metadata loading failures and failed dchecks
INSERT statements for tables partitioned on columns involving datetime types could appear to succeed, but cause errors for subsequent queries on those tables. The problem was especially serious if an improperly formatted timestamp value was specified for the partition key.
Bug: IMPALA-238
Resolution: Fixed
Ctrl-C sometimes interrupts shell in system call, rather than cancelling query
Pressing Ctrl-C in the impala-shell interpreter could sometimes display an error and return control to the shell, making it impossible to cancel the query.
Bug: IMPALA-243
Resolution: Fixed
Empty string partition value causes metastore update failure
Specifying an empty string or NULL for a partition key in an INSERT statement would fail.
Bug: IMPALA-252
Resolution: Fixed. The behavior for empty partition keys was made more compatible with the corresponding Hive behavior.
Round() does not output the right precision
The round() function did not always return the correct number of significant digits.
Bug: IMPALA-266
Resolution: Fixed
Cannot cast string literal to string
Casting from a string literal back to the same type would cause an "invalid type cast" error rather than leaving the original value unchanged.
Bug: IMPALA-267
Resolution: Fixed
Excessive mem usage for certain queries which are very selective
Some queries that returned very few rows experienced unnecessary memory usage.
Bug: IMPALA-288
Resolution: Fixed
HdfsScanNode crashes in UpdateCounters
A serious error could occur for relatively small and inexpensive queries.
Bug: IMPALA-289
Resolution: Fixed
Parquet performance issues on large dataset
Certain aggregation queries against Parquet tables were inefficient due to lower than required thread utilization.
Bug: IMPALA-292
Resolution: Fixed
impala not populating hive metadata correctly for create table
The Impala CREATE TABLE command did not fill in the owner and tbl_type columns in the Hive metastore database.
Bug: IMPALA-295
Resolution: Fixed. The metadata was made more Hive-compatible.
impala daemons die if statestore goes down
The impalad instances in a cluster could halt when the statestored process became unavailable.
Bug: IMPALA-312
Resolution: Fixed
Constant SELECT clauses do not work in subqueries
A subquery would fail if the SELECT statement inside it returned a constant value rather than querying a table.
Bug: IMPALA-67
Resolution: Fixed
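A minimal sketch of the affected query shape is shown below. The aliases t and x are hypothetical.

```sql
-- The inline view selects a constant instead of reading from a table.
SELECT t.x
FROM (SELECT 1 AS x) t;
```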
Right outer Join includes NULLs as well and hence wrong result count
The result set from a right outer join query could include erroneous rows containing NULL values.
Bug: IMPALA-90
Resolution: Fixed
Parquet scanner hangs for some queries
The Parquet scanner non-deterministically hangs when executing some queries.
Bug: IMPALA-204
Resolution: Fixed
Issues Fixed in Version 0.7 of the Beta Release
Impala does not gracefully handle unsupported Hive table types (INDEX and VIEW tables)
When attempting to load metadata from an unsupported Hive table type (INDEX and VIEW tables), Impala fails with an unclear error message.
Bug: IMPALA-167
Resolution: Fixed in 0.7
DDL statements (CREATE/ALTER/DROP TABLE) are not supported in the Impala Beta Release
Resolution: Fixed in 0.7
Avro is not supported in the Impala Beta Release
Resolution: Fixed in 0.7
Workaround: None
Impala does not currently allow limiting the memory consumption of a single query
It is currently not possible to limit the memory consumption of a single query. All tables on the right hand side of JOIN statements need to be able to fit in memory. If they do not, Impala may crash due to out of memory errors.
Resolution: Fixed in 0.7
Aggregate of a subquery result set returns wrong results if the subquery contains a 'limit' and data is distributed across multiple nodes
Aggregate of a subquery result set returns wrong results if the subquery contains a 'limit' clause and data is distributed across multiple nodes. From the query plan, it looks like we are just summing the results from each worker node.
Bug: IMPALA-20
Resolution: Fixed in 0.7
Partition pruning for arbitrary predicates that are fully bound by a particular partition column
We currently cannot utilize a predicate like "country_code in ('DE', 'FR', 'US')" to do partition pruning, because that requires an equality predicate or a binary comparison.
We should create a superclass of planner.ValueRange, ValueSet, that can be constructed with an arbitrary predicate, and whose isInRange(analyzer, valueExpr) constructs a literal predicate by substitution of the valueExpr into the predicate.
Bug: IMPALA-144
Resolution: Fixed in 0.7
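A query of the kind that benefits from this fix might look like the following sketch, assuming a hypothetical table sales_by_country partitioned by country_code. The EXPLAIN output can be used to check how many partitions the scan reads.

```sql
-- Hypothetical table partitioned by country_code.
-- The IN predicate is fully bound by the partition column,
-- so only the matching partitions need to be scanned.
EXPLAIN
SELECT COUNT(*)
FROM sales_by_country
WHERE country_code IN ('DE', 'FR', 'US');
```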
Issues Fixed in Version 0.6 of the Beta Release
Impala reads the NameNode address and port as command line parameters
Impala reads the NameNode address and port as command line parameters rather than reading them from core-site.xml. Updating the NameNode address in the core-site.xml file does not propagate to Impala.
Severity: Low
Resolution: Fixed in 0.6 - Impala reads the NameNode location and port from the Hadoop configuration files, though setting -nn and -nn_port overrides this. Users are advised not to set -nn or -nn_port.
Queries may fail on secure environment due to impalad Kerberos ticket expiration
Queries may fail in a secure environment due to impalad Kerberos tickets expiring. This can happen if the Impala -kerberos_reinit_interval flag is set to a value of ten minutes or less. This may lead to an impalad requesting a ticket with a lifetime that is less than the time to the next ticket renewal.
Bug: IMPALA-64
Resolution: Fixed in 0.6
Concurrent queries may fail when Impala uses Thrift to communicate with the Hive Metastore
Concurrent queries may fail when Impala is using Thrift to communicate with part of the Hive Metastore such as the Hive Metastore Service. In such a case, the error "get_fields failed: out of sequence response" may occur because Impala shared a single Hive Metastore Client connection across threads. With Impala 0.6, a separate connection is used for each metadata request.
Bug: IMPALA-48
Resolution: Fixed in 0.6
impalad fails to start if unable to connect to the Hive Metastore
Impala fails to start if it is unable to establish a connection with the Hive Metastore. This behavior was fixed, allowing Impala to start, even when no Metastore is available.
Bug: IMPALA-58
Resolution: Fixed in 0.6
Impala treats database names as case-sensitive in some contexts
In some queries (including "USE database" statements), database names are treated as case-sensitive. This may lead queries to fail with an IllegalStateException.
Bug: IMPALA-44
Resolution: Fixed in 0.6
Impala does not ignore hidden HDFS files
Impala does not ignore hidden HDFS files, meaning those files prefixed with a period '.' or underscore '_'. This diverges from Hive/MapReduce, which skips these files.
Bug: IMPALA-18
Resolution: Fixed in 0.6
Issues Fixed in Version 0.5 of the Beta Release
Impala may have reduced performance on tables that contain a large number of partitions
Impala may have reduced performance on tables that contain a large number of partitions. This is due to extra overhead reading/parsing the partition metadata.
Resolution: Fixed in 0.5
Backend client connections not getting cached causes an observable latency in secure clusters
Backend impalads do not cache connections to the coordinator. On a secure cluster, this introduces a latency proportional to the number of backend clients involved in query execution, as the cost of establishing a secure connection is much higher than in the non-secure case.
Bug: IMPALA-38
Resolution: Fixed in 0.5
Concurrent queries may fail with error: "Table object has not been been initialised : `PARTITIONS`"
Concurrent queries may fail with error: "Table object has not been been initialised : `PARTITIONS`". This was due to a lack of locking in the Impala table/database metadata cache.
Bug: IMPALA-30
Resolution: Fixed in 0.5
UNIX_TIMESTAMP format behaviour deviates from Hive when format matches a prefix of the time value
The Impala UNIX_TIMESTAMP(val, format) operation compares the length of format and val and returns NULL if they do not match. Hive instead effectively truncates val to the length of the format parameter.
Bug: IMPALA-15
Resolution: Fixed in 0.5
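The case described above, where the format string matches only a prefix of the time value, can be sketched as follows. The literal values are illustrative only.

```sql
-- The format covers only the date portion of the value.
-- Before the fix, the length mismatch made Impala return NULL;
-- Hive effectively truncates the value to the length of the format.
SELECT UNIX_TIMESTAMP('2013-01-15 10:02:01', 'yyyy-MM-dd');
```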
Issues Fixed in Version 0.4 of the Beta Release
Impala fails to refresh the Hive metastore if a Hive temporary configuration file is removed
Impala is impacted by Hive bug HIVE-3596, which may cause metastore refreshes to fail if a Hive temporary configuration file is deleted (normally located at /tmp/hive-<user>-<tmp_number>.xml). Additionally, the impala-shell will incorrectly report that the failed metadata refresh completed successfully.
Anticipated Resolution: To be fixed in a future release
Workaround: Restart the impalad service. Use the impalad log to check for metadata refresh errors.
lpad/rpad builtin functions return incorrect results
The lpad/rpad builtin functions generate the wrong results.
Resolution: Fixed in 0.4
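For reference, the padding semantics generally expected of lpad and rpad (pad with a repeating pad string up to the target length, or truncate on the right when the input is already longer) can be sketched as follows. This is a Python model for illustration only, not the Impala implementation. The handling of a negative length and of an empty pad string is an assumption made for this sketch.

```python
def lpad(s: str, length: int, pad: str):
    """Left-pad s with repetitions of pad up to length characters."""
    if length < 0:
        return None  # assumption: treat a negative length as NULL
    if length <= len(s):
        return s[:length]  # input too long: truncate on the right
    if not pad:
        return s  # assumption: nothing to pad with, return the input unchanged
    missing = length - len(s)
    # Repeat the pad string enough times, then cut it to the exact size needed.
    return (pad * missing)[:missing] + s


def rpad(s: str, length: int, pad: str):
    """Right-pad s with repetitions of pad up to length characters."""
    if length < 0:
        return None  # assumption: treat a negative length as NULL
    if length <= len(s):
        return s[:length]  # input too long: truncate on the right
    if not pad:
        return s  # assumption: nothing to pad with, return the input unchanged
    missing = length - len(s)
    return s + (pad * missing)[:missing]


if __name__ == "__main__":
    print(lpad("abc", 6, "xy"))
    print(rpad("abc", 6, "xy"))
    print(lpad("abcdef", 3, "x"))
```

A model like this can serve as an oracle when spot-checking query results against expected values.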
Files with .gz extension reported as 'not supported'
Compressed files with a .gz extension incorrectly generate a 'not supported' exception.
Bug: IMPALA-14
Resolution: Fixed in 0.4
Queries with large limits would hang.
Some queries with large limits were hanging.
Resolution: Fixed in 0.4
Order by on a string column produces incorrect results if there are empty strings
Resolution: Fixed in 0.4
Issues Fixed in Version 0.3 of the Beta Release
All table loading errors show as unknown table
If Impala is unable to load the metadata for a table for any reason, a subsequent query referring to that table will return an unknown table error message, even if the table is known.
Resolution: Fixed in 0.3
A table that cannot be loaded will disappear from SHOW TABLES
After failing to load metadata for a table, Impala removes that table from the list of known tables returned in SHOW TABLES. Subsequent attempts to query the table return 'unknown table', even if the metadata for that table is fixed.
Resolution: Fixed in 0.3
Impala cannot read from HBase tables that are not created as external tables in the Hive metastore.
Attempting to select from these tables fails.
Resolution: Fixed in 0.3
Certain queries that contain OUTER JOINs may return incorrect results
Queries that contain OUTER JOINs may not return the correct results if there are predicates referencing any of the joined tables in the WHERE clause.
Resolution: Fixed in 0.3
Issues Fixed in Version 0.2 of the Beta Release
Subqueries which contain aggregates cannot be joined with other tables or Impala may crash
Subqueries that contain an aggregate cannot be joined with another table or Impala may crash. For example:
SELECT * FROM (SELECT sum(col1) FROM some_table GROUP BY col1) t1 JOIN other_table ON (...);
Resolution: Fixed in 0.2
An insert with a limit that runs as more than one query fragment inserts more rows than the limit.
For example:
INSERT OVERWRITE TABLE test SELECT * FROM test2 LIMIT 1;
Resolution: Fixed in 0.2
Query with limit clause might fail.
For example:
SELECT * FROM test2 LIMIT 1;
Resolution: Fixed in 0.2
Files in unsupported compression formats are read as plain text.
Attempting to read such files does not generate a diagnostic.
Resolution: Fixed in 0.2
Impala server raises a null pointer exception when running an HBase query.
When querying an HBase table whose row-key is string type, the Impala server may raise a null pointer exception.
Resolution: Fixed in 0.2