The following sections describe the major issues fixed in each Impala release.
For known issues that are currently unresolved, see Known Issues and Workarounds in Impala.
For the full list of issues closed in this release, including bug fixes, see the changelog for Impala 4.0.
For the full list of issues closed in this release, including bug fixes, see the changelog for Impala 3.4.
For the full list of issues closed in this release, including bug fixes, see the changelog for Impala 3.3.
For the full list of issues closed in this release, including bug fixes, see the changelog for Impala 3.2.
The following is a list of noteworthy issues fixed in Impala 3.2:
LIMIT clause.
--ldap_password_cmd, were unrecognized when the --config_file option was specified.
INVALIDATE METADATA operation is no longer ignored when HMS is empty.
COMPUTE STATS, Impala counts the number of NULL values in a table
NULL qualifier.
TIMESTAMP to a string literal in a binary predicate where the TIMESTAMP is cast to VARCHAR of smaller length.
SYNC_DDL query option can fail when the Catalog Server is under a heavy load with concurrent catalog operations of long-running DDLs.
S3_ACCESS_VALIDATED variable to zero when TARGET_FILESYSTEM=s3.
auth_to_local setting to prevent connection issues between impalads.
COMPUTE STATS failed if COMPRESSION_CODEC is set.
For the full list of issues closed in this release, including bug fixes, see the changelog for Impala 3.1.
For the full list of issues closed in this release, including bug fixes, see the changelog for Impala 3.0.
For the full list of issues closed in this release, including bug fixes, see the changelog for Impala 2.12.
For the full list of issues closed in this release, including bug fixes, see the changelog for Impala 2.11.
For the full list of issues closed in this release, including bug fixes, see the changelog for Impala 2.10.
For the full list of issues closed in this release, including bug fixes, see the changelog for Impala 2.9.
For the full list of Impala fixed issues in Impala 2.8, see this report in the Impala JIRA tracker.
For the full list of Impala fixed issues in Impala 2.7.0, see this report in the Impala JIRA tracker.
The following list contains the most critical fixed issues (priority='Blocker') from the JIRA system.
For the full list of fixed issues in Impala 2.6.0, see
this report in the Impala JIRA tracker.
A crash could occur, with a stack trace pointing to impala::RuntimeState::ErrorLog.
Bug: IMPALA-3385
Severity: High
A crash could occur because of contention between multiple calls to Java UDFs.
Bug: IMPALA-3378
Severity: High
A crash could occur because of contention between multiple concurrent statements writing to HBase.
Bug: IMPALA-3379
Severity: High
A crash or wrong results could occur if the spill-to-disk mechanism encountered a zero-length string at the very end of a data block.
Bug: IMPALA-3317
Severity: High
If a query plan contains an aggregation node producing string values anywhere within a subplan (that is, if in the SQL statement the aggregate function appears within an inline view over a collection column), the results of the aggregation may be incorrect.
Bug: IMPALA-3311
Severity: High
A CREATE TABLE AS SELECT operation could fail with an authorization error,
due to a slight difference in the privilege checking for the CTAS operation.
Bug: IMPALA-3269
Severity: High
Impala incorrectly allowed BINARY to be specified as a column type,
resulting in a crash during a write to a Parquet table with a column of that type.
Bug: IMPALA-3237
Severity: High
A crash could occur while querying tables with very large rows, for example wide tables with many columns or very large string values. This problem was identified in Impala 2.3, but had low reproducibility in subsequent releases. The fix ensures the memory allocation size is correct.
Bug: IMPALA-3105
Severity: High
A very large memory allocation within the catalogd daemon could exceed an internal Thrift limit, causing a crash.
Bug: IMPALA-3494
Severity: High
If a partitioned table used a file format other than Avro, and the file format of an individual partition was changed to Avro, subsequent queries could encounter a crash.
Bug: IMPALA-3314
Severity: High
A timing problem during runtime filter processing could cause queries against Avro or SequenceFile tables to hang.
Bug: IMPALA-3798
Severity: High
The following list contains the most critical issues (priority='Blocker') from the JIRA system.
For the full list of fixed issues in Impala 2.5, see
this report in the Impala JIRA tracker.
Bug: IMPALA-2683
The stress test was running a build with the TPC-H, TPC-DS, and TPC-H nested queries with scale factor 3.
Bug: IMPALA-2365
If a UDF JAR was not available in the HDFS location specified in the CREATE FUNCTION statement,
the impalad daemon could crash.
Bug: IMPALA-2535
A join query could fail with an out-of-memory error despite the apparent presence of sufficient memory.
The cause was the internal ordering of operations that could cause a later phase of the query to
allocate memory required by an earlier phase of the query. The workaround was to either increase
or decrease the MEM_LIMIT query option, because the issue would only occur for a specific
combination of memory limit and data volume.
Bug: IMPALA-2643
Referring to the same column twice in a view definition could cause the view to omit
rows where that column contained a NULL value. This could cause
incorrect results due to an inaccurate COUNT(*) value or rows missing
from the result set.
Bug: IMPALA-1459
Some combinations of ON clauses in join queries could result in comparisons
being applied at the wrong stage of query processing, leading to incorrect results.
Wrong predicate assignment could happen under the following conditions:
ON clause containing a predicate that
only references columns originating from the outer-joined tables inside the inline view.
Bug: IMPALA-2093
IN subqueries might return wrong results if the left-hand side of the IN is a constant.
For example:
select * from alltypestiny t1
where 10 not in (select sum(int_col) from alltypestiny);
Bug: IMPALA-2940
Parquet dictionary decoders can accumulate throughout query execution, leading to excessive memory usage. One decoder is created per-column per-split.
Bug: IMPALA-3056
Bug: IMPALA-2742
Currently, the MemPool would always double the size of the last allocation. This can lead to bad behavior if the MemPool transferred the ownership of all its data except the last chunk. In the next allocation, the next allocated chunk would double the size of this large chunk, which can be undesirable.
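The growth pattern described above can be modeled in a few lines. This is a toy Python model with hypothetical names (`DoublingPool`, `transfer_all_but_last`), not Impala's MemPool; it only illustrates why doubling from the last chunk becomes wasteful once the earlier chunks have been handed off.

```python
class DoublingPool:
    """Toy model of a pool whose next chunk is always twice the last chunk."""

    def __init__(self, initial_chunk: int = 4096):
        self.chunks = []                  # sizes of chunks currently owned
        self.initial_chunk = initial_chunk

    def next_chunk_size(self, request: int) -> int:
        # Policy described in the text: double the size of the last allocation.
        base = 2 * self.chunks[-1] if self.chunks else self.initial_chunk
        return max(base, request)

    def allocate(self, request: int) -> int:
        size = self.next_chunk_size(request)
        self.chunks.append(size)
        return size

    def transfer_all_but_last(self) -> None:
        # Hand every chunk except the most recent one to another owner.
        self.chunks = self.chunks[-1:]


pool = DoublingPool()
for _ in range(10):
    pool.allocate(1)            # chunk sizes grow from 4 KB up to 2 MB
pool.transfer_all_but_last()    # the pool now owns only the 2 MB chunk
print(pool.allocate(1))         # a 1-byte request still produces a 4 MB chunk
```

In this model the pool ends up holding 6 MB to serve a single byte, which is the undesirable behavior the entry describes.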
Bug: IMPALA-3035
The CatalogOpExecutor.alterTableDropPartition() function violates
the locking protocol used in the catalog that requires catalogLock_ to be acquired before any table-level lock.
That may cause deadlocks when ALTER TABLE DROP PARTITION is executed concurrently with other DDL operations.
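The ordering rule in this entry (catalog-wide lock first, then the table-level lock) can be sketched as follows. This is an illustrative Python sketch, not the catalog's actual code; `catalog_lock`, `table_lock`, and `catalog_then_table` are hypothetical stand-ins. Because every operation goes through the same helper, no two threads can hold the locks in opposite orders.

```python
import threading
from contextlib import contextmanager

catalog_lock = threading.Lock()   # stands in for catalogLock_
table_lock = threading.Lock()     # stands in for one table-level lock


@contextmanager
def catalog_then_table():
    """Acquire both locks in the order the protocol requires."""
    with catalog_lock:
        with table_lock:
            yield


completed = []


def ddl_operation(name: str) -> None:
    # Every DDL path uses the same helper, so the lock order is always the same.
    with catalog_then_table():
        completed.append(name)


threads = [threading.Thread(target=ddl_operation, args=(n,))
           for n in ("drop_partition", "other_ddl")]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(completed))
```

A deadlock of the kind described becomes possible as soon as one code path takes the table-level lock first and then waits for the catalog lock while another path holds them the other way around.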
Bug: IMPALA-2215
A query with a HAVING clause but no GROUP BY clause was not being rejected,
despite being invalid syntax. For example:
select case when 1=1 then 'didit' end as c1 from (select 1 as one) a having 1!=1;
Bug: IMPALA-2914
TimestampValue::ToTimestampVal() requires a valid TimestampValue as input.
This requirement was not enforced in some places, leading to serious errors.
Bug: IMPALA-2986
An aggregation query could fail with an out-of-memory error, despite sufficient memory being reported as available.
Bug: IMPALA-2592
Some queries do not close an internal communication channel on an error. This will cause the node on the other side of the channel to wait indefinitely, causing the query to hang. For example, this issue could happen on a Kerberos-enabled system if the credential cache was outdated. Although the affected query hangs, the impalad daemons continue processing other queries.
Bug: IMPALA-2184
Querying for the min or max value of a timestamp cast from a bigint via from_unixtime() fails silently and crashes instances of impalad when the input includes a value outside of the valid range.
Workaround: Disable native code generation with:
SET disable_codegen=true;
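Independently of the workaround, a client can check that an epoch value maps into the representable range before handing it to a conversion. The sketch below is a hypothetical Python helper, not part of Impala. The bounds 1400-01-01 and 9999-12-31 are an assumption about the valid TIMESTAMP range and should be confirmed against the documentation for your release.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)
MIN_TS = datetime(1400, 1, 1)                  # assumed lower bound
MAX_TS = datetime(9999, 12, 31, 23, 59, 59)    # assumed upper bound


def in_timestamp_range(seconds: int) -> bool:
    """Return True if `seconds` since the epoch falls inside the assumed range."""
    try:
        ts = EPOCH + timedelta(seconds=seconds)
    except OverflowError:
        # The value is too large or too small for Python's own datetime type.
        return False
    return MIN_TS <= ts <= MAX_TS


print(in_timestamp_range(0))
print(in_timestamp_range(10**18))
```

Values that fail the check can be filtered out or replaced with NULL before the min or max is computed.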
Bug: IMPALA-2788
Impala returns the wrong result for the conv() function.
Function conv(bigint, from_base, to_base) returns an incorrect result,
while conv(string, from_base, to_base) returns the correct value.
For example:
select 2061013007, conv(2061013007, 16, 10), conv('2061013007', 16, 10);
+------------+--------------------------+----------------------------+
| 2061013007 | conv(2061013007, 16, 10) | conv('2061013007', 16, 10) |
+------------+--------------------------+----------------------------+
| 2061013007 | 1627467783               | 139066421255               |
+------------+--------------------------+----------------------------+
Fetched 1 row(s) in 0.65s
select 2061013007, conv(cast(2061013007 as bigint), 16, 10), conv('2061013007', 16, 10);
+------------+------------------------------------------+----------------------------+
| 2061013007 | conv(cast(2061013007 as bigint), 16, 10) | conv('2061013007', 16, 10) |
+------------+------------------------------------------+----------------------------+
| 2061013007 | 1627467783                               | 139066421255               |
+------------+------------------------------------------+----------------------------+
select 2061013007, conv(cast(2061013007 as string), 16, 10), conv('2061013007', 16, 10);
+------------+------------------------------------------+----------------------------+
| 2061013007 | conv(cast(2061013007 as string), 16, 10) | conv('2061013007', 16, 10) |
+------------+------------------------------------------+----------------------------+
| 2061013007 | 139066421255                             | 139066421255               |
+------------+------------------------------------------+----------------------------+
select 2061013007, conv(cast(cast(2061013007 as decimal(20,0)) as bigint), 16, 10), conv('2061013007', 16, 10);
+------------+-----------------------------------------------------------------+----------------------------+
| 2061013007 | conv(cast(cast(2061013007 as decimal(20,0)) as bigint), 16, 10) | conv('2061013007', 16, 10) |
+------------+-----------------------------------------------------------------+----------------------------+
| 2061013007 | 1627467783                                                      | 139066421255               |
+------------+-----------------------------------------------------------------+----------------------------+
Workaround:
Cast the value to string and use conv(string, from_base, to_base) for conversion.
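The figures in the example output can be checked outside Impala. The following Python sketch uses a hypothetical helper, `conv_sketch`, which is not Impala's implementation. It reproduces the value returned by the string form, and also shows that the smaller value in the output equals that result modulo 2^32. That second point is only an arithmetic observation about the numbers shown, not a statement about the root cause.

```python
def conv_sketch(digits: str, from_base: int, to_base: int) -> str:
    """Hypothetical stand-in for conv(): re-express `digits` in another base."""
    value = int(digits, from_base)          # parse the digits in the source base
    if value == 0:
        return "0"
    alphabet = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    out = []
    while value > 0:
        value, rem = divmod(value, to_base)
        out.append(alphabet[rem])
    return "".join(reversed(out))


expected = conv_sketch("2061013007", 16, 10)
print(expected)                   # 139066421255, matching the string form above
print(int(expected) % 2**32)      # 1627467783, matching the bigint form above
```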
The set of fixes in Impala 2.4.0 is the same as in Impala 2.3.2.
This section lists the most serious or frequently encountered customer issues fixed in Impala 2.3.2.
A query involving an analytic function could encounter a serious error. This issue was encountered infrequently, depending upon specific combinations of queries and data.
Bug: IMPALA-2829
An outer join query could fail unexpectedly with an out-of-memory error when the "spill to disk" mechanism was turned off.
Bug: IMPALA-2722
A join query could encounter a serious error due to an internal failure to allocate memory, which
resulted in dereferencing a NULL pointer.
Bug: IMPALA-2612
Referring to the same column twice in a view definition could cause the view to omit
rows where that column contained a NULL value. This could cause
incorrect results due to an inaccurate COUNT(*) value or rows missing
from the result set.
Bug: IMPALA-2643
A GRANT statement for a URI could be ineffective if the URI
contained uppercase letters, for example in an uppercase directory name.
Subsequent statements, such as CREATE EXTERNAL TABLE with a LOCATION clause, could fail with an authorization exception.
Bug: IMPALA-2695
The catalogd daemon could encounter a serious error
when loading the incremental statistics metadata for tables with large
numbers of partitions and columns. The problem occurred when the
internal representation of metadata for the table exceeded 2 GB,
for example in a table with 20K partitions and 77 columns. The fix causes a
COMPUTE INCREMENTAL STATS operation to fail if it
would produce metadata that exceeded the maximum size.
Bug: IMPALA-2664, IMPALA-2648
CREATE TABLE or ALTER TABLE statements could fail with
metastore database errors due to length limits on the SERDEPROPERTIES and TBLPROPERTIES clauses.
(The limit on key size is 256, while the limit on value size is 4000.) The fix makes Impala handle these error conditions
more cleanly, by detecting too-long values rather than passing them to the metastore database.
Bug: IMPALA-2226
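The kind of up-front check described in this entry can be sketched as follows. This is a hypothetical Python helper, not Impala's code. The limits 256 and 4000 come from the text above; measuring length in characters rather than bytes is an assumption.

```python
MAX_KEY_LEN = 256      # limit on key size stated in the text
MAX_VALUE_LEN = 4000   # limit on value size stated in the text


def check_properties(props: dict) -> list:
    """Return a list of problems; an empty list means every entry fits."""
    problems = []
    for key, value in props.items():
        if len(key) > MAX_KEY_LEN:
            problems.append(f"key too long ({len(key)} > {MAX_KEY_LEN})")
        if len(value) > MAX_VALUE_LEN:
            problems.append(
                f"value for {key[:20]!r} too long ({len(value)} > {MAX_VALUE_LEN})")
    return problems


# 'example.property' is a made-up property name used only for illustration.
print(check_properties({"example.property": "x" * 5000}))
```

Rejecting the statement at this point gives a clear message instead of a database error raised later by the metastore.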
Impala could fail to access Parquet data files with page headers larger than 8 MB, which could
occur, for example, if the minimum or maximum values for a column were long strings. The
fix adds a configuration setting --max_page_header_size, which you can use to
increase the Impala size limit to a value higher than 8 MB.
Bug: IMPALA-2273
Queries on Parquet tables could consume excessive memory (potentially multiple gigabytes) due to producing
large intermediate data values while evaluating groups of rows. The workaround was to reduce the value of
the NUM_SCANNER_THREADS query option, the BATCH_SIZE query option, or both.
Bug: IMPALA-2473
A query that included a DISTINCT operator and a HAVING clause, but no
aggregate functions or GROUP BY, would fail with an uninformative error message.
Bug: IMPALA-2113
A query that included * in the SELECT list, in addition to an
aggregate function call, would fail with an uninformative message if the query had no
GROUP BY clause.
Bug: IMPALA-2225
Queries involving HBase tables used substantially more memory than in earlier Impala versions. The problem occurred starting in Impala 2.2.8, as a result of the changes for IMPALA-2284. The fix for this issue involves removing a separate memory work area for HBase queries and reusing other memory that was already allocated.
Bug: IMPALA-2731
Some combinations of ON clauses in join queries could result in comparisons
being applied at the wrong stage of query processing, leading to incorrect results.
Wrong predicate assignment could happen under the following conditions:
ON clause containing a predicate that
only references columns originating from the outer-joined tables inside the inline view.
Bug: IMPALA-1459
A debug build of Impala could encounter a serious error after encountering some kinds of I/O errors for Parquet files. This issue only occurred in debug builds, not release builds.
Bug: IMPALA-2558
A join query could fail with an out-of-memory error despite the apparent presence of sufficient memory.
The cause was the internal ordering of operations that could cause a later phase of the query to
allocate memory required by an earlier phase of the query. The workaround was to either increase
or decrease the MEM_LIMIT query option, because the issue would only occur for a specific
combination of memory limit and data volume.
Bug: IMPALA-2535
A query could fail with an internal error while calculating the memory limit. This was an infrequent condition uncovered during stress testing.
Bug: IMPALA-2559
A query could fail with an internal error while calculating the memory limit. This was an infrequent condition uncovered during stress testing.
Bug: IMPALA-2614, IMPALA-2559
Bug: IMPALA-2591
These fixes lift the restriction on using SSL encryption and Kerberos authentication together for internal communication between Impala components.
Bug: IMPALA-2598, IMPALA-2747
Impala 2.3.1 is identical to Impala 2.3.0. There are no new bug fixes, new features, or incompatible changes.
This section lists the most serious or frequently encountered customer issues fixed in Impala 2.3. Any issues already fixed in Impala 2.2 maintenance releases (up through Impala 2.2.8) are also included. Those issues are listed under the respective Impala 2.2 sections and are not repeated here.
A number of issues were resolved that could result in serious errors when encountered. The most critical or commonly encountered are listed here.
Bugs: IMPALA-2168, IMPALA-2378, IMPALA-2369, IMPALA-2357, IMPALA-2319, IMPALA-2314, IMPALA-2016
A number of issues were resolved that could result in wrong results when encountered. The most critical or commonly encountered are listed here.
Bugs: IMPALA-2192, IMPALA-2440, IMPALA-2090, IMPALA-2086, IMPALA-1947, IMPALA-1917
This section lists the most frequently encountered customer issues fixed in Impala 2.2.9.
If an inline view in a FROM clause contained a NULL literal,
the result set was empty.
Bug: IMPALA-1917
Queries involving HBase tables used substantially more memory than in earlier Impala versions. The problem occurred starting in Impala 2.2.8, as a result of the changes for IMPALA-2284. The fix for this issue involves removing a separate memory work area for HBase queries and reusing other memory that was already allocated.
Bug: IMPALA-2731
Some combinations of ON clauses in join queries could result in comparisons
being applied at the wrong stage of query processing, leading to incorrect results.
Wrong predicate assignment could happen under the following conditions:
ON clause containing a predicate that
only references columns originating from the outer-joined tables inside the inline view.
Bug: IMPALA-1459
The join predicate for an OUTER JOIN clause could be applied at the wrong stage
of query processing, leading to incorrect results.
Bug: IMPALA-2446
The catalogd daemon could encounter a serious error when loading the
incremental statistics metadata for tables with large numbers of partitions and columns.
The problem occurred when the internal representation of metadata for the table exceeded 2 GB,
for example in a table with 20K partitions and 77 columns. The fix causes a
COMPUTE INCREMENTAL STATS operation to fail if it would produce
metadata that exceeded the maximum size.
Bug: IMPALA-2648, IMPALA-2664
Adding a large INTERVAL value to a TIMESTAMP value, or subtracting one from it,
could produce an incorrect result, with the value
wrapping instead of returning an out-of-range error.
Bug: IMPALA-1675
An IN operator with literal values could cause a statement to fail if used
as the argument to a binary operator, such as an equality test for a BOOLEAN value.
Bug: IMPALA-1949
Impala could fail to access Parquet data files with page headers larger than 8 MB, which
could occur, for example, if the minimum or maximum values for a column were long strings.
The fix adds a configuration setting --max_page_header_size, which you
can use to increase the Impala size limit to a value higher than 8 MB.
Bug: IMPALA-2273
A query that activated the spill-to-disk mechanism could fail if it contained a sort expression involving certain combinations of fixed-length or variable-length types.
Bug: IMPALA-2357
Some queries that activated the spill-to-disk mechanism could produce a serious error if there was insufficient memory to set up internal work areas. Now those queries produce normal out-of-memory errors instead.
Bug: IMPALA-2344
A serious error could occur under rare circumstances, due to a race condition while freeing memory during heavily concurrent workloads.
Bug: IMPALA-2252
A call to SetError() in a user-defined function (UDF) would not cause the query to fail as expected.
Bug: IMPALA-1746
An INSERT ... SELECT operation into a partitioned table could fail if the SELECT query
included a GROUP BY clause referring to the partition key columns.
Bug: IMPALA-2533
This section lists the most frequently encountered customer issues fixed in Impala 2.2.8.
Impala could not read Avro tables created in Hive with the STORED AS AVRO clause.
Bug: IMPALA-1136, IMPALA-2161
If a Parquet file in HDFS was overwritten by a smaller file, Impala could encounter a serious error.
Issuing an INVALIDATE METADATA statement before a subsequent query would avoid the error.
The fix allows Impala to handle such inconsistencies in Parquet file length cleanly regardless of whether the
table metadata is up-to-date.
Bug: IMPALA-2213
Impala could encounter a serious error when reading compressed text files larger than 1 GB. The fix causes Impala to issue an error message instead in this case.
Bug: IMPALA-2249
A query using the group_concat() function could encounter a serious error if the returned string value was larger than 1 GB.
Now the query fails with an error message in this case.
Bug: IMPALA-2284
An edge case in the algorithm used to distribute data among nodes could result in uneven distribution of work for some queries, with all data sent to the same node.
Bug: IMPALA-2270
A communication error could occur between Impala and the Hive metastore database, causing Impala operations that update table metadata to fail.
Bug: IMPALA-2348
Certain queries could encounter a serious error if the spill-to-disk mechanism was activated.
Bug: IMPALA-2364
Impala could generate a suboptimal query plan for some queries involving small tables.
Bug: IMPALA-2165
This section lists the most frequently encountered customer issues fixed in Impala 2.2.7.
Impala warns if it detects a discrepancy in table statistics: a table considered to have zero rows even though there are data files present. In this case, Impala also skips query optimizations that are normally applied to very small tables.
Bug: IMPALA-1983
A query could encounter a serious error if it included a particular combination of aggregate functions and inline views.
Bug: IMPALA-2266
A query could encounter a serious error if it included an inline view whose subquery had no FROM clause.
Bug: IMPALA-2216
A CREATE TABLE AS SELECT or INSERT ... SELECT statement could produce
different results than a SELECT statement, for queries including a FULL JOIN clause
and including literal values in the select list.
Bug: IMPALA-2203
A query could return incorrect results if it contained a UNION clause,
calls to analytic functions, and a constant expression that evaluated to FALSE.
Bug: IMPALA-2088
A query containing an INNER JOIN clause could return undesired rows.
Some predicates specified in the ON clause could be omitted from the filtering operation.
Bug: IMPALA-2089
A COMPUTE INCREMENTAL STATS statement could leave the row count for an empty partition as -1,
rather than initializing the row count to 0. The missing statistic value could result in reduced query performance.
Bug: IMPALA-2199
A query could encounter a serious error if it included column aliases with the same names as table columns, and used
ordinal numbers in an ORDER BY or GROUP BY clause.
Bug: IMPALA-1898
A query could return incorrect results if it included an outer join clause, inline views, and calls to functions such as coalesce() that can generate NULL values.
Bug: IMPALA-1987
A query could return incorrect results if the table contained multiple CHAR columns with length of 2 or less,
and the query included a GROUP BY clause that referred to multiple such columns.
Bug: IMPALA-2178
An INSERT statement could encounter a serious error if the SELECT portion called an analytic function.
Bug: IMPALA-1737
This section lists the most frequently encountered customer issues fixed in Impala 2.2.5.
When the Impala COMPUTE STATS statement was run on a partitioned Parquet table that was created in Hive, the table subsequently became inaccessible in Hive.
The table was still accessible to Impala. Regaining access in Hive required a workaround of creating a new table. The error displayed in Hive was:
Error: Error while compiling statement: FAILED: SemanticException Class not found: org.apache.impala.hive.serde.ParquetInputFormat (state=42000,code=40000)
Bug: IMPALA-2048
A query could encounter a serious error if it contained a RIGHT OUTER, RIGHT ANTI, or FULL OUTER join clause
and approached the memory limit on a host so that the "spill to disk" mechanism was activated.
Bug: IMPALA-1929
Declaring a partition key column as a TINYINT caused problems with the COMPUTE STATS statement.
The associated partitions would always have zero estimated rows, leading to potentially inefficient query plans.
Bug: IMPALA-2136
A query that referred to a view whose query referred to another view containing a join could return incorrect results.
WHERE clauses for the outermost query were not always applied, causing the result
set to include additional rows that should have been filtered out.
Bug: IMPALA-2018
The user() function returned the name of the logged-in user, which might not be the
same as the user name being checked for authorization if, for example, delegation was enabled.
Bug: IMPALA-2064
Resolution: Rather than change the behavior of the user() function,
the fix introduces an additional function effective_user() that returns the user name that is checked during authorization.
Query performance was improved substantially for Parquet files containing TIMESTAMP
data written by Hive, when the -convert_legacy_hive_parquet_utc_timestamps=true setting is in effect.
Bug: IMPALA-2125
A join query could encounter a serious error if the query approached the memory limit on a host so that the "spill to disk" mechanism was activated, and data volume in the join was large enough that an internal memory buffer exceeded 1 GB in size on a particular host. (Exceeding this limit would only happen for huge join queries, because Impala could split this intermediate data into 16 parts during the join query, and the buffer only contains compact bookkeeping data rather than the actual join column data.)
Bug: IMPALA-2065
This section lists the most frequently encountered customer issues fixed in Impala 2.2.3.
Enabling Impala to work with the Isilon filesystem involves a number of fixes to performance and flexibility for dealing with I/O using remote reads. See Using Impala with Isilon Storage for details on using Impala and Isilon together.
Bug: IMPALA-1968, IMPALA-1730
The set of timezones recognized by Impala was expanded. You can always find the latest list of supported timezones in the Impala source code, in the file timezone_db.cc.
Bug: IMPALA-1381
Impala can now process TIMESTAMP literals including a trailing z,
signifying "Zulu" time, a synonym for UTC.
Bug: IMPALA-1963
An INSERT OVERWRITE operation would encounter an error
if the SELECT portion of the statement returned zero
rows, such as with a LIMIT 0 clause.
Bug: IMPALA-2008
DECIMAL literals can now include e scientific notation.
For example, now CAST(1e3 AS DECIMAL(5,3)) is a valid expression.
Formerly it returned NULL.
Some scientific expressions might have worked before in DECIMAL context, but only when the scale was 0.
Bug: IMPALA-1952
This section lists the most frequently encountered customer issues fixed in Impala 2.2.1.
This section lists the most frequently encountered customer issues fixed in Impala 2.2.0.
For the full list of fixed issues in Impala 2.2.0, including over 40 critical issues, see this report in the Impala JIRA tracker.
When the type of a column was changed in either Hive or Impala through ALTER TABLE CHANGE COLUMN, the metastore database did not correctly propagate
that change to the table that contains the column statistics. The statistics (particularly the NDV) for that column were permanently reset
and could not be changed by Impala's COMPUTE STATS command. The underlying cause is a Hive bug (HIVE-9866).
Bug: IMPALA-1607
Resolution: Resolved by incorporating the fix for HIVE-9866.
Workaround: On systems without the corresponding Hive fix, change the column back to its original type. The stats reappear and you can recompute or drop them.
If a file was truncated in HDFS without a corresponding REFRESH in Impala, Impala could allocate memory for file descriptors and not free that memory.
Bug: IMPALA-1854
Impala could issue messages stating the block locality metadata was stale,
when the metadata was actually fine.
The internal "remote bytes read" counter was not being reset properly.
This issue did not cause an actual slowdown in query execution,
but the spurious error could result in unnecessary debugging work
and unnecessary use of the INVALIDATE METADATA statement.
Bug: IMPALA-1712
When a table was moved from one database to another, the column statistics were not pointed to the new database. This could result in lower performance for queries due to unavailable statistics, and also an inability to drop the table.
Bug: IMPALA-1711
impalad daemons could experience a memory leak on clusters using Kerberos authentication, with memory usage growing as more data is transferred across the secure channel, either to the client program or between Impala nodes. The same issue affected LDAP-secured clusters to a lesser degree, because the LDAP security only covers data transferred back to client programs.
Bug: IMPALA-1674
The unix_timestamp() function could return an incorrect value (a constant value of 1).
Bug: IMPALA-1623
Some queries did not recognize the final line of a text data file if the line did not end with a newline character.
This could lead to inconsistent results, such as a different number of rows for SELECT COUNT(*) as opposed to SELECT *.
Bug: IMPALA-1476
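The class of bug in this entry is easy to reproduce in miniature. The Python sketch below is not Impala's text scanner; both function names are hypothetical. It contrasts a row count that only counts newline terminators with one that also counts an unterminated final line.

```python
def count_rows_by_terminator(data: bytes) -> int:
    # Counts newline terminators only, so an unterminated last line is missed.
    return data.count(b"\n")


def count_rows(data: bytes) -> int:
    # Also counts a trailing line that has content but no terminator.
    rows = data.count(b"\n")
    if data and not data.endswith(b"\n"):
        rows += 1
    return rows


sample = b"1,a\n2,b\n3,c"   # the third row has no trailing newline
print(count_rows_by_terminator(sample), count_rows(sample))
```

Two code paths that disagree in this way would report different row counts for the same file, which is the inconsistency the entry describes.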
If the HDFS user ID associated with the impalad process had read or write access in HDFS based on group membership, Impala statements could still fail with HDFS permission errors if that group was not the first listed group for that user ID.
Bug: IMPALA-1805
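The symptom can be pictured with a toy membership check. This Python sketch does not reproduce HDFS permission logic; the function names and the group list are hypothetical, and it only contrasts consulting the first listed group with consulting every group.

```python
def allowed_first_group_only(user_groups, required_group):
    """Symptom described in the text: only the first listed group is consulted."""
    return bool(user_groups) and user_groups[0] == required_group


def allowed_any_group(user_groups, required_group):
    """Expected behavior: membership in any listed group is sufficient."""
    return required_group in user_groups


groups = ["impala", "hive", "analysts"]   # hypothetical groups for the impalad user
print(allowed_first_group_only(groups, "hive"), allowed_any_group(groups, "hive"))
```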
Truncating a file in HDFS, after Impala had cached the file metadata, could produce a hang when Impala queried a table containing that file.
Bug: IMPALA-1794
Impala could sometimes fail to INSERT into a Parquet table if a column value such as a STRING was larger than 64 KB.
Bug: IMPALA-1705
This fix relaxes the CPU requirement for Impala. Now only the SSSE3 instruction set is required. Formerly, SSE4.1 instructions were generated, making Impala refuse to start on some older CPUs.
Bug: IMPALA-1646
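To see whether a Linux host advertises a given instruction set, the flags line of /proc/cpuinfo can be inspected. The helper below is a hypothetical Python sketch, not something shipped with Impala, and the sample text is made up for illustration.

```python
def has_cpu_flag(cpuinfo_text: str, flag: str) -> bool:
    """Return True if `flag` appears on any 'flags' line of cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        name, _, value = line.partition(":")
        # Split into whole words so that 'sse3' does not match 'ssse3'.
        if name.strip() == "flags" and flag in value.split():
            return True
    return False


sample = "processor\t: 0\nflags\t\t: fpu sse sse2 ssse3 sse4_1\n"
print(has_cpu_flag(sample, "ssse3"))
```

On a real host, pass the contents of /proc/cpuinfo instead of `sample`.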
This section lists the most significant Impala issues fixed in Impala 2.1.7.
If an inline view in a FROM clause contained a NULL literal,
the result set was empty.
Bug: IMPALA-1917
A value of type DECIMAL(3,0) could be incorrectly cast to TINYINT.
The resulting out-of-range value could be incorrect. After the fix, the smallest type that is allowed
for this cast is INT, and attempting to use DECIMAL(3,0) in a
TINYINT context produces a "loss of precision" error.
Bug: IMPALA-2264
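The range mismatch behind this entry is simple arithmetic. The Python sketch below uses hypothetical helper names and shows that a three-digit DECIMAL with scale 0 can hold values an 8-bit signed integer cannot. It does not model which integer type Impala chooses after the fix.

```python
TINYINT_MIN, TINYINT_MAX = -128, 127   # range of an 8-bit signed integer


def decimal_max(precision: int) -> int:
    """Largest value representable by DECIMAL(precision, 0)."""
    return 10 ** precision - 1


def fits_tinyint(precision: int) -> bool:
    """True if every DECIMAL(precision, 0) value fits in the TINYINT range."""
    m = decimal_max(precision)
    return TINYINT_MIN <= -m and m <= TINYINT_MAX


print(decimal_max(3), fits_tinyint(3), fits_tinyint(2))
```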
An invalid constant expression in a WHERE clause (for example, an invalid
regular expression pattern) could fail to clean up internal state after raising a query error.
Therefore, certain combinations of invalid expressions in a query could cause a crash, or cause a query to continue
when it should halt with an error.
Bug: IMPALA-1756, IMPALA-2514
A call to SetError() in a user-defined function (UDF) would not cause the query to fail as expected.
Bug: IMPALA-1746, IMPALA-2141
This section lists the most significant Impala issues fixed in Impala 2.1.6.
Certain queries could encounter a serious error if the spill-to-disk mechanism was activated.
Bug: IMPALA-2364
Certain queries could encounter a serious error if the spill-to-disk mechanism was activated.
Bug: IMPALA-2314
Impala could generate a suboptimal query plan for some queries involving small tables.
Bug: IMPALA-2165
Queries using the GROUP BY operator on multiple CHAR columns with length less than or equal to 2 characters
could return incorrect results for some columns.
Bug: IMPALA-2178
Queries against HBase tables could return incomplete results if the WHERE clause included string comparisons using literals
containing escaped quotation marks.
Bug: IMPALA-2133
A query could encounter a serious error if it contained a RIGHT OUTER, RIGHT ANTI, or FULL OUTER join clause
and approached the memory limit on a host so that the "spill to disk" mechanism was activated.
Bug: IMPALA-1929
This section lists the most significant Impala issues fixed in Impala 2.1.5.
Queries including RIGHT OUTER JOIN, RIGHT ANTI JOIN, or FULL OUTER JOIN clauses and involving a high volume of data could encounter a serious error.
Bug: IMPALA-1919
This section lists the most significant Impala issues fixed in Impala 2.1.4.
When expressions that tested for NULL were used in combination with analytic functions, an error could occur.
(The original crash issue was fixed by an earlier patch.)
Bug: IMPALA-1519
DECIMAL
literals could include e
scientific notation.
For example, now CAST(1e3 AS DECIMAL(5,3))
is a valid expression.
Formerly it returned NULL
.
Some scientific expressions might have worked before in DECIMAL
context, but only when the scale was 0.
Bug: IMPALA-1952
An INSERT OVERWRITE
statement would write new data, even if
a constant clause such as WHERE 1 = 0
should have
prevented it from writing any rows.
Bug: IMPALA-1860
If the PARTITION BY
clause in an analytic function refers to partition key columns in a partitioned table,
now Impala can perform partition pruning during the analytic query.
Bug: IMPALA-1900
A query using the FIRST_VALUE
analytic function
and a window defined with the PRECEDING
keyword
could produce wrong results.
Bug: IMPALA-1888
A query referencing a DECIMAL
column with the FIRST_VALUE
analytic function
could encounter an error.
Bug: IMPALA-1559
A query using an analytic function
could encounter an error if the
evaluation of an analytic ORDER BY
or PARTITION
expression
resulted in a NaN value, for example if the ORDER BY
or PARTITION
contained a division operation where both operands were zero.
Bug: IMPALA-1808
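The following generic Python sketch illustrates why a NaN value is awkward for any ordering or partitioning step: it compares as neither less than, greater than, nor equal to anything, including itself. This is not Impala code; the `sort_key` helper is hypothetical and only shows one way to give NaN a defined position.

```python
import math

nan = float("nan")  # stands in for the result of a SQL 0/0 division

# Every ordered comparison involving NaN is False, and NaN is not equal to itself.
comparisons = {
    "nan < 1.0": nan < 1.0,
    "nan > 1.0": nan > 1.0,
    "nan == nan": nan == nan,
}

def sort_key(value):
    """Hypothetical helper: place NaN last so the sort has a total order."""
    is_nan = math.isnan(value)
    return (is_nan, 0.0 if is_nan else value)

ordered = sorted([2.0, nan, 1.0], key=sort_key)
```

Without an explicit rule such as `sort_key`, code that assumes every pair of values can be ordered may behave unpredictably when a NaN appears.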
An analytic function containing only an OVER
clause could
encounter an error if another part of the query (specifically an outer join)
produced all-NULL
tuples.
Bug: IMPALA-1562
This section lists the most significant issues fixed in Impala 2.1.3.
When Hive writes TIMESTAMP
values, it represents them
in the local time zone of the server. Impala expects TIMESTAMP
values to always be in the UTC time zone, possibly leading to inconsistent
results depending on which component created the data files.
This patch introduces a new startup flag,
-convert_legacy_hive_parquet_utc_timestamps
for the impalad daemon.
Specify -convert_legacy_hive_parquet_utc_timestamps=true
to make Impala recognize Parquet data files written by Hive
and automatically adjust TIMESTAMP
values read from those files into the UTC time zone for
compatibility with other Impala TIMESTAMP
processing.
Although this setting is currently turned off by default,
consider enabling it if practical in your environment,
for maximum interoperability with Hive-created Parquet files.
Bug: IMPALA-1658
Converting a floating-point value to a STRING
could be slower than necessary.
Bug: IMPALA-1738
Certain calls to aggregate functions with STRING
arguments could encounter a serious error
when the system ran low on memory and attempted to activate the spill-to-disk mechanism.
The error message referenced the function impala::AggregateFunctions::StringValGetValue
.
Bug: IMPALA-1865
If the HDFS user ID associated with the impalad process had read or write access in HDFS based on group membership, Impala statements could still fail with HDFS permission errors if that group was not the first listed group for that user ID.
Bug: IMPALA-1805
Truncating a file in HDFS, after Impala had cached the file metadata, could produce a hang when Impala queried a table containing that file.
Bug: IMPALA-1794
Successive calls to the data source API could result in excessive memory consumption, with memory allocated but never freed.
Bug: IMPALA-1801
Impala could issue messages stating the block locality metadata was stale,
when the metadata was actually fine.
The internal "remote bytes read" counter was not being reset properly.
This issue did not cause an actual slowdown in query execution,
but the spurious error could result in unnecessary debugging work
and unnecessary use of the INVALIDATE METADATA
statement.
Bug: IMPALA-1712
This section lists the most significant issues fixed in Impala 2.1.2.
For the full list of fixed issues in Impala 2.1.2, see this report in the Impala JIRA tracker.
When a floating-point value was read from a text file and interpreted as a FLOAT
or DOUBLE
value, it could be incorrectly interpreted if it included more than
19 significant digits.
Bug: IMPALA-1622
The unix_timestamp()
function could return an incorrect value (a constant value of 1).
Bug: IMPALA-1623
A query against a partitioned table could return incorrect results if the WHERE
clause
compared the partition key to NULL
using operators such as =
or !=
.
Bug: IMPALA-1535
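As background for the comparison behavior involved, here is a minimal Python sketch of standard SQL three-valued logic, with `None` standing in for NULL. The functions `sql_eq`, `sql_ne`, and `where` are hypothetical names for illustration and do not reflect how Impala prunes partitions internally.

```python
def sql_eq(a, b):
    """SQL-style equality: comparing anything with NULL (None) yields NULL."""
    if a is None or b is None:
        return None
    return a == b

def sql_ne(a, b):
    """SQL-style inequality: the negation of NULL is still NULL."""
    result = sql_eq(a, b)
    return None if result is None else not result

def where(rows, predicate):
    """Keep only rows whose predicate is exactly True, as a WHERE clause does."""
    return [row for row in rows if predicate(row) is True]

# Hypothetical partition key values, one of them NULL.
partitions = [{"year": 2014}, {"year": None}, {"year": 2015}]

eq_null = where(partitions, lambda r: sql_eq(r["year"], None))
ne_null = where(partitions, lambda r: sql_ne(r["year"], None))
is_null = where(partitions, lambda r: r["year"] is None)
```

Under these semantics, both `= NULL` and `!= NULL` select no rows, while an `IS NULL` test selects the partition whose key is NULL.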
The performance of the COMPUTE STATS
statement and queries was improved, particularly for wide tables.
Bug: IMPALA-1120
This section lists the most significant issues fixed in Impala 2.1.1.
For the full list of fixed issues in Impala 2.1.1, see this report in the Impala JIRA tracker.
impalad daemons could experience a memory leak on clusters using Kerberos authentication, with memory usage growing as more data is transferred across the secure channel, either to the client program or between Impala nodes. The same issue affected LDAP-secured clusters to a lesser degree, because the LDAP security only covers data transferred back to client programs.
Bug: IMPALA-1674
impalad daemons in clusters secured by Kerberos or LDAP could experience a slight memory leak on each connection. The accumulation of unreleased memory could cause problems on long-running clusters.
Bug: IMPALA-1668
This section lists the most significant issues fixed in Impala 2.1.0.
For the full list of fixed issues in Impala 2.1.0, see this report in the Impala JIRA tracker.
Transferring large result sets back to the client application on Kerberos
Bug: IMPALA-1455
Queries on gzipped text files required holding the entire data file and its uncompressed representation
in memory at the same time. SELECT
and COMPUTE STATS
statements could
fail or perform inefficiently as a result. The fix enables streaming reads for gzipped text, so that the
data is uncompressed as it is read.
Bug: IMPALA-1556
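The idea of a streaming read can be sketched in Python as follows. This is an illustration of the general technique only, not Impala's scanner code; the function name `stream_gunzip_lines` and the chunk sizes are assumptions, and the sketch handles a single-member gzip stream.

```python
import gzip
import io
import zlib

def stream_gunzip_lines(fileobj, chunk_size=64 * 1024):
    """Yield decompressed lines one compressed chunk at a time,
    without loading the whole file or its full uncompressed form."""
    # wbits = 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    decompressor = zlib.decompressobj(16 + zlib.MAX_WBITS)
    pending = b""
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        pending += decompressor.decompress(chunk)
        # Everything before the last newline is complete; keep the remainder.
        *lines, pending = pending.split(b"\n")
        yield from lines
    pending += decompressor.flush()
    if pending:
        yield pending

# Build a small gzipped "text file" in memory and count its rows by streaming.
raw = b"".join(b"row-%d,value\n" % i for i in range(1000))
compressed = io.BytesIO(gzip.compress(raw))
row_count = sum(1 for _ in stream_gunzip_lines(compressed, chunk_size=256))
```

Peak memory in this sketch depends on the chunk size and the longest line rather than on the size of the file.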
Impala might not be able to access HBase tables, depending on the associated levels of Impala and HBase on the system.
Bug: IMPALA-1611
Improved code coverage in Impala testing uncovered a number of potentially serious errors that could occur with specific query syntax. These errors are resolved in Impala 2.1.
Bug: IMPALA-1553, IMPALA-1528, IMPALA-1526, IMPALA-1524, IMPALA-1508, IMPALA-1493, IMPALA-1501, IMPALA-1483
For the full list of fixed issues in Impala 2.0.5, see this report in the Impala JIRA tracker.
This section lists the most significant issues fixed in Impala 2.0.4.
For the full list of fixed issues in Impala 2.0.4, see this report in the Impala JIRA tracker.
When Hive writes TIMESTAMP
values, it represents them
in the local time zone of the server. Impala expects TIMESTAMP
values to always be in the UTC time zone, possibly leading to inconsistent
results depending on which component created the data files.
This patch introduces a new startup flag,
-convert_legacy_hive_parquet_utc_timestamps
for the impalad daemon.
Specify -convert_legacy_hive_parquet_utc_timestamps=true
to make Impala recognize Parquet data files written by Hive
and automatically adjust TIMESTAMP
values read from those files into the UTC time zone for
compatibility with other Impala TIMESTAMP
processing.
Although this setting is currently turned off by default,
consider enabling it if practical in your environment,
for maximum interoperability with Hive-created Parquet files.
Bug: IMPALA-1658
If a table data file was replaced by a shorter file outside of Impala,
such as with INSERT OVERWRITE
in Hive producing an empty
output file, subsequent Impala queries could hang.
Bug: IMPALA-1794
This section lists the most significant issues fixed in Impala 2.0.3.
For the full list of fixed issues in Impala 2.0.3, see this report in the Impala JIRA tracker.
An anti-join query (or a NOT EXISTS
operation that was rewritten internally into an anti-join) could produce incorrect results
if Impala reached its memory limit, causing the query to write temporary results to disk.
Bug: IMPALA-1471
A query against a partitioned table could return incorrect results if the WHERE
clause
compared the partition key to NULL
using operators such as =
or !=
.
Bug: IMPALA-1535
The performance of the COMPUTE STATS
statement and queries was improved, particularly for wide tables.
Bug: IMPALA-1120
This section lists the most significant issues fixed in Impala 2.0.2.
For the full list of fixed issues in Impala 2.0.2, see this report in the Impala JIRA tracker.
Some operations in queries submitted through Hue or other HiveServer2 clients could produce inconsistent results.
Bug: IMPALA-1453
Impala could encounter an error from running out of file descriptors. The fix reduces the amount of time file descriptors are kept open, and avoids leaking file descriptors when read operations encounter errors.
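The second part of the fix, releasing a descriptor even when a read fails, follows a pattern that can be sketched in Python. This is a generic illustration, not Impala code; `read_header` and `opened_handles` are hypothetical names, and the short file simulates a read error.

```python
import os
import tempfile

opened_handles = []  # kept only so the example can inspect handle state afterwards

def read_header(path, size=4):
    """Read the first bytes of a file; the handle is closed on every exit path."""
    with open(path, "rb") as handle:
        opened_handles.append(handle)
        data = handle.read(size)
        if len(data) < size:
            # Simulated read error: the with-block still closes the file.
            raise IOError("short read from %s" % path)
        return data

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"ab")  # shorter than the 4 bytes read_header expects
    short_path = tmp.name

try:
    read_header(short_path)
    failed = False
except IOError:
    failed = True
finally:
    os.unlink(short_path)
```

Tying the descriptor's lifetime to a scope, rather than to a success path, is what prevents the leak when an error interrupts the read.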
The unix_timestamp()
function could return a constant value 1
instead
of a representation of the time.
Bug: IMPALA-1623
To avoid putting too heavy a load on any one node, Impala now randomizes which scan node processes each HDFS data block rather than choosing the first cached block replica.
Bug: IMPALA-1586
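The effect of randomizing the choice can be sketched with a small Python simulation. The host names, block count, and function names below are hypothetical, and the sketch ignores everything else a real scheduler considers.

```python
import random
from collections import Counter

def assign_first_replica(blocks):
    """Always pick the first listed replica host for each block."""
    return Counter(replicas[0] for replicas in blocks)

def assign_random_replica(blocks, rng):
    """Pick a replica host uniformly at random for each block."""
    return Counter(rng.choice(replicas) for replicas in blocks)

# Hypothetical layout: 300 blocks, each replicated on the same three hosts.
blocks = [["host-a", "host-b", "host-c"] for _ in range(300)]

first = assign_first_replica(blocks)
randomized = assign_random_replica(blocks, random.Random(0))
```

With the first-replica rule, every block in this layout lands on `host-a`; with random selection, the work is spread across all three hosts.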
In clusters secured by Kerberos or LDAP, a discrepancy in internal transmission of user names could cause a communication error with Llama.
Bug: IMPALA-1606
The CREATE FUNCTION
statement could report that it could not find a function entry point
within the .so
file for a UDF written in C++, even if the corresponding function was
present.
Bug: IMPALA-1475
This section lists the most significant issues fixed in Impala 2.0.1.
For the full list of fixed issues in Impala 2.0.1, see this report in the Impala JIRA tracker.
After running the COMPUTE STATS
statement on an Impala table, subsequent queries on that
table could fail with the exception message Failed to load metadata for table: default.stats_test.
Bug: IMPALA-1416
Workaround: Upgrading to a release that includes the fix for HIVE-8627
prevents the problem from affecting future COMPUTE STATS
statements. On affected releases,
or for Impala tables that have become inaccessible, the workaround is to disable the
hive.metastore.try.direct.sql
setting in the Hive metastore
hive-site.xml file and issue the INVALIDATE METADATA
statement for
the affected table. You do not need to rerun the COMPUTE STATS
statement for the table.
This section lists the most significant issues fixed in Impala 2.0.0.
For the full list of fixed issues in Impala 2.0.0, see this report in the Impala JIRA tracker.
Hints specified within a view query did not take effect when the view was queried, leading to slow performance. As part of this fix, Impala now supports hints embedded within comments.
Bug: IMPALA-995
Potential wrong results for some types of queries.
Bug: IMPALA-1101
Potential wrong results for some types of queries.
Bug: IMPALA-1102
Potential wrong results for some types of queries.
Bug: IMPALA-1118
Potential wrong results for some types of queries.
Bug: IMPALA-1123
Potential wrong results for some types of queries.
Bug: IMPALA-1165
Potential wrong results for some types of queries.
Bug: IMPALA-1353
Serious error for certain combinations of function calls and data types.
Bug: IMPALA-1105
Serious error for certain combinations of function calls and data types.
Bug: IMPALA-1109
DECIMAL
columns with different precision could not be compared in join predicates.
Bug: IMPALA-1121
Hive-created Avro tables with columns specified by a JSON file or literal could produce errors when
queried in Impala, and could not be used with the COMPUTE STATS
statement. Now you can
create such tables in Impala to avoid such errors.
Bug: IMPALA-1104
The Impala debug web UI did not properly encode all output.
Bug: IMPALA-1133
Certain queries could run without obeying the limits imposed by resource management.
Bug: IMPALA-1236
Certain INSERT
and LOAD DATA
statements could fail unnecessarily, if
the target directories in HDFS had restrictive HDFS permissions, but those permissions were overridden by
HDFS extended ACLs.
Bug: IMPALA-1279
In a Kerberos environment, the principal name was not mapped to lowercase, causing issues when a user logged in with an uppercase principal name and Sentry authorization was enabled.
Bug: IMPALA-1334
Impala 1.4.3 includes fixes to address what is known as the POODLE vulnerability in SSLv3. SSLv3 access is disabled in the Impala debug web UI.
This section lists the most significant issues fixed in Impala 1.4.2.
For the full list of fixed issues in Impala 1.4.2, see this report in the Impala JIRA tracker.
This section lists the most significant issues fixed in Impala 1.4.1.
For the full list of fixed issues in Impala 1.4.1, see this report in the Impala JIRA tracker.
Occasionally, a non-trivial query run through Llama could encounter a serious error. The detailed error in the log was:
boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::lock_error> >
Severity: High
Impala log files could contain internal error messages due to a problem formatting certain strings. The messages consisted of a Java call stack starting with:
jni-util.cc:177] java.util.MissingFormatArgumentException: Format specifier 's'
A downlevel version of the HiveServer2 API could cause difficulty retrieving the precision and scale of a
DECIMAL
value.
Bug: IMPALA-1107
The error in the title could occur following a DDL statement. This issue was discovered during internal testing and has not been reported in customer environments.
Bug: IMPALA-1093
The time for some network operations was not counted in the report of total time for a query, making it difficult to diagnose network-related performance issues.
Bug: IMPALA-1131
Certain Avro fields for byte data could cause Impala to be unable to read an Avro data file, even if the field was not part of the Impala table definition. With this fix, Impala can now read these Avro data files, although Impala queries cannot refer to the "bytes" fields.
Bug: IMPALA-1149
The --authorization_policy_provider_class
option for impalad was
added back. This option specifies a custom AuthorizationProvider
class rather than the
default HadoopGroupAuthorizationProvider
. It had been used for internal testing, then
removed in Impala 1.4.0, but it was considered useful by some customers.
Bug: IMPALA-1142
This section lists the most significant issues fixed in Impala 1.4.0.
For the full list of fixed issues in Impala 1.4.0, see this report in the Impala JIRA tracker.
The serious error in the title could occur, with the supplemental message:
num_used_buffers_ < 0: #used=-1 during cancellation HDFS cached data
The issue was due to the use of HDFS caching with data files accessed by Impala. Support for HDFS caching in Impala was introduced in Impala 1.4.0. The fix for this issue was backported to Impala 1.3.x, and is the only change in Impala 1.3.2.
Bug: IMPALA-1019
Resolution: This issue is fixed in Impala 1.3.2. The addition of HDFS caching support in Impala 1.4 means that this issue does not apply to any new level of Impala.
The impala-shell interpreter could encounter errors processing SQL statements containing non-ASCII characters.
Bug: IMPALA-489
When a view was accessed while inside a different database, references to tables were not resolved unless the names were fully qualified when the view was created.
Bug: IMPALA-962
If an ALTER TABLE
specified a non-existent HDFS location for a partition, afterwards
Impala would not be able to access the partition at all.
Bug: IMPALA-741
The CREATE TABLE LIKE
clause was enhanced to be able to create a table with the same
column definitions as a view. The resulting table is a text table unless the STORED AS
clause is specified, because a view does not have an associated file format to inherit.
Bug: IMPALA-834
Operations on tables with many partitions could be slow due to the time to evaluate which partitions were affected. The partition pruning code is now substantially faster.
Bug: IMPALA-887
The performance of the COMPUTE STATS
statement was improved substantially. The
efficiency of its internal operations was improved, and some statistics are no longer gathered because
they are not currently used for planning Impala queries.
Bug: IMPALA-1003
After a CREATE TABLE LIKE
statement using an Avro table as the source, the new table
could have incorrect metadata and be inaccessible, depending on how the original Avro table was created.
Bug: IMPALA-185
Impala could encounter a serious error after a query was cancelled.
Bug: IMPALA-1046
A deadlock condition could make all impalad daemons hang, making the cluster unresponsive for Impala queries.
Bug: IMPALA-1083
Impala 1.3.3 includes fixes to address what is known as the POODLE vulnerability in SSLv3. SSLv3 access is disabled in the Impala debug web UI.
This backported bug fix is the only change between Impala 1.3.1 and Impala 1.3.2.
The serious error in the title could occur, with the supplemental message:
num_used_buffers_ < 0: #used=-1 during cancellation HDFS cached data
The issue was due to the use of HDFS caching with data files accessed by Impala. Support for HDFS caching in Impala was introduced in Impala 1.4.0. The fix for this issue was backported to Impala 1.3.x, and is the only change in Impala 1.3.2.
Bug: IMPALA-1019
Resolution: This issue is fixed in Impala 1.3.2. The addition of HDFS caching support in Impala 1.4 means that this issue does not apply to any new level of Impala.
This section lists the most significant issues fixed in Impala 1.3.1.
For the full list of fixed issues in Impala 1.3.1, see this report in the Impala JIRA tracker.
Impala could encounter a severe error in a query combining a left outer join with an inline view
containing a COUNT(DISTINCT)
operation.
Bug: IMPALA-904
If the result of a GROUP BY
operation is NULL
, the resulting row might
be omitted from the result set. This issue depends on the data values and data types in the table.
Bug: IMPALA-901
When a UDF is dropped through the DROP FUNCTION
statement, and then the UDF is
re-created with a new .so
library or JAR file, the original version of the UDF is still
used when the UDF is called from queries.
Bug: IMPALA-786
Workaround: Restart the impalad daemon on all nodes.
If a COMPUTE STATS
statement encountered an error, the error message was "Query aborted" with no further detail. Common reasons why a COMPUTE STATS
statement might
fail include network errors causing the coordinator node to lose contact with other
impalad instances, and column names that match Impala
reserved words. (Currently, if a column name
is an Impala reserved word, COMPUTE STATS
always returns an error.)
Bug: IMPALA-762
After an ALTER TABLE
statement that changes the LOCATION
property of a
partition, a subsequent INSERT
statement would always use a path derived from the base
data directory for the table.
Bug: IMPALA-624
A COUNT(*)
operation could return the wrong result for text tables using nul characters
(ASCII value 0) as delimiters.
Bug: IMPALA-13
Workaround: Impala adds support for ASCII 0 characters as delimiters through the clause
FIELDS TERMINATED BY '\0'
.
Impala could allocate more memory than necessary during certain operations.
Bug: IMPALA-488
Workaround: Before issuing a COMPUTE STATS
statement for a Parquet table, reduce
the number of threads used in that operation by issuing SET NUM_SCANNER_THREADS=2
in
impala-shell. Then issue UNSET NUM_SCANNER_THREADS
before continuing
with queries.
When new subdirectories are created underneath a partitioned table by an INSERT
statement, previously the new subdirectories always used the default HDFS permissions for the
impala
user, which might not be suitable for directories intended to be read and written
by other components also.
Bug: IMPALA-827
Resolution: In Impala 1.3.1 and higher, you can specify the
--insert_inherit_permissions
configuration when starting the impalad
daemon.
Impala could encounter a severe error in a query where the FROM
list contains an inline
view that includes a UNION
. The exact type of the error varies.
Bug: IMPALA-888
The ability to specify a subset of columns in an INSERT
statement, with order different
than in the target table, was not working as intended.
Bug: IMPALA-945
This section lists the most significant issues fixed in Impala 1.3.0, primarily issues that could cause
wrong results, or cause problems running the COMPUTE STATS
statement, which is very
important for performance and scalability.
For the full list of fixed issues, see this report in the Impala JIRA tracker.
The automatic join reordering optimization could incorrectly reorder queries with an outer join or semi join followed by an inner join, producing incorrect results.
Bug: IMPALA-860
Workaround: Including the STRAIGHT_JOIN
keyword in the query prevented the issue
from occurring.
A query with a GROUP BY
clause referencing multiple columns could introduce incorrect
NULL
values in some columns of the result set. The incorrect NULL
values could appear in rows where a different GROUP BY
column actually did return
NULL
.
Bug: IMPALA-850
A query could return incorrect results if it combined an aggregate function call, a
DISTINCT
operator, and a HAVING
clause, without a GROUP BY
clause.
Bug: IMPALA-845
An aggregation query or a query with ORDER BY
and LIMIT
could be
executed on a single node in some cases, rather than distributed across the cluster. This issue affected
queries whose FROM
clause referenced an inline view containing a UNION
.
Bug: IMPALA-831
If a GROUP BY
query referenced the same columns multiple times using different
operators, result rows could contain multiple copies of the same expression.
Bug: IMPALA-817
Referencing the same columns in both a COUNT()
and a SUM()
call in the
same query, or some other combinations of aggregate function calls, could incorrectly return a result of
0 from one of the aggregate functions. This issue affected references to TINYINT
and
SMALLINT
columns, but not INT
or BIGINT
columns.
Bug: IMPALA-765
Workaround: Setting the query option DISABLE_CODEGEN=TRUE
prevented the incorrect
results. Switching the order of the function calls could also prevent the issue from occurring.
A UNION
query could produce a wrong result, followed by a serious error for a subsequent
UNION
query.
Bug: IMPALA-723
Impala could return incorrect string results when reading uncompressed Parquet data files containing multiple row groups. This issue only affected Parquet data files produced by MapReduce jobs.
Bug: IMPALA-729
Using a column or table name that conflicted with Impala keywords could prevent running the
COMPUTE STATS
statement for the table.
Bug: IMPALA-777
The COMPUTE STATS
statement did not use the setting of the MEM_LIMIT
query option in impala-shell, potentially causing problems gathering statistics for
wide Parquet tables.
Bug: IMPALA-903
The COMPUTE STATS
statement could be slow or encounter a timeout while analyzing a table
with many partitions.
Bug: IMPALA-880
If the columns for an Avro table were all defined in the TBLPROPERTIES
or
SERDEPROPERTIES
clauses, the COMPUTE STATS
statement would fail after
completely analyzing the table, potentially causing a long delay. Although the COMPUTE STATS
statement still does not work for such tables, now the problem is detected and reported
immediately.
Bug: IMPALA-867
Workaround: Re-create the Avro table with columns defined in SQL style, using the output of
SHOW CREATE TABLE
. (See the JIRA page for detailed steps.)
This section lists the most significant issues fixed in Impala 1.2.4. For the full list of fixed issues, see this report in the Impala JIRA tracker.
A large number of concurrent CREATE TABLE
statements can cause the
catalogd process to consume excessive memory, and potentially be killed due to an
out-of-memory condition.
Bug: IMPALA-818
Workaround: Restart the catalogd service and re-try the DDL operations that failed.
A large number of tables and partitions could result in unnecessary CPU overhead during Impala idle time and background operations.
Bug: IMPALA-821
Resolution: Catalog server processing was optimized in several ways.
A query against a TIMESTAMP
column in an Avro table could encounter a serious issue.
Bug: IMPALA-828
Workaround: Set the query option DISABLE_CODEGEN=TRUE
Impala nodes could produce repeated error messages after recovering from a communication error with the statestore service.
Bug: IMPALA-809
A join query could produce wrong results if multiple equality comparisons between the same tables referred to the same column.
Bug: IMPALA-805
Certain outer join queries could return wrong results. If one of the tables involved in the join was an
inline view, some tests from the WHERE
clauses could be applied to the wrong phase of
the query.
An HBase cell could contain a value larger than 32 KB, leading to a serious error when Impala queries that table. The error could occur even if the applicable row is not part of the result set.
Bug: IMPALA-715
Workaround: Use smaller values in the HBase table, or exclude the column containing the large value from the result set.
A query involving a DISTINCT
operator combined with a FULL OUTER JOIN
could encounter a serious error.
Bug: IMPALA-735
Workaround: Set the query option DISABLE_CODEGEN=TRUE
If a table had more than 32,767 partitions, Impala would not recognize the partitions above the 32K limit and query results could be incomplete.
Bug: IMPALA-749
Queries against HBase tables could fail with an error if the row key was compared to a function return
value rather than a string constant. Also, queries against HBase tables could fail if the
WHERE
clause contained combinations of comparisons that could not possibly match any row
key.
Resolution: Queries now return appropriate results when function calls are used in the row key
comparison. For queries involving non-existent row keys, such as WHERE row_key IS NULL
or where the lower bound is greater than the upper bound, the query succeeds and returns
an empty result set.
This release is a fix release that supersedes Impala 1.2.2, with the same features and fixes as 1.2.2 plus one additional fix for compatibility with Parquet files generated outside of Impala by components such as Hive, Pig, or MapReduce.
An early version of the parquet-mr
library writes files that are not readable by
Impala, due to the presence of multiple row groups. Queries involving these data files might result in a
crash or a failure with an error such as "Column chunk should not contain two dictionary pages".
This issue does not occur for Parquet files produced by Impala INSERT
statements,
because Impala only produces files with a single row group.
Bug: IMPALA-720
This section lists the most significant issues fixed in Impala 1.2.2. For the full list of fixed issues, see this report in the Impala JIRA tracker.
Impala does not currently optimize the join order of queries; instead, it joins tables in the order in which they are listed in the FROM clause. Queries that contain one or more large tables on the right-hand side of joins (either an explicit join expressed with the JOIN keyword or a join implicit in the list of table references in the FROM clause) may run slowly or crash Impala due to out-of-memory errors. For example:
SELECT ... FROM small_table JOIN large_table
Anticipated Resolution: Fixed in Impala 1.2.2.
Workaround: In Impala 1.2.2 and higher, use the COMPUTE STATS
statement to gather
statistics for each table involved in the join query, after data is loaded. Prior to Impala 1.2.2, modify
the query, if possible, to join the largest table first. For example:
SELECT ... FROM small_table JOIN large_table
should be modified to:
SELECT ... FROM large_table JOIN small_table
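To illustrate why the position of the large table matters, here is a minimal Python sketch of a hash join. It assumes that the right-hand input is the one held in memory, which is consistent with the out-of-memory behavior described above; the function `hash_join` and the sample tables are hypothetical and greatly simplified.

```python
def hash_join(left_rows, right_rows, key):
    """Join two lists of dicts, building the in-memory hash table on the right input."""
    build = {}
    for row in right_rows:
        build.setdefault(row[key], []).append(row)
    joined = []
    for row in left_rows:
        # Probe the hash table with each left-hand row.
        for match in build.get(row[key], []):
            joined.append({**match, **row})
    build_rows = sum(len(bucket) for bucket in build.values())
    return joined, build_rows

small_table = [{"id": i, "name": "n%d" % i} for i in range(10)]
large_table = [{"id": i % 10, "amount": i} for i in range(10_000)]

rows_a, build_a = hash_join(small_table, large_table, "id")  # large table on the right
rows_b, build_b = hash_join(large_table, small_table, "id")  # small table on the right
```

Both orders produce the same number of joined rows, but the second holds only the small table in the hash table, which is the effect the manual reordering aims for.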
Some Parquet files could be generated by other components that Impala could not read.
Bug: IMPALA-694
Resolution: The underlying issue is being addressed by a fix in the Parquet libraries. Impala 1.2.2 works around the problem and reads the existing data files.
The statestore service could experience an internal error leading to a hang.
Bug: IMPALA-699
A UNION
query where one side involved a GROUP BY
operation could cause
a serious error.
Bug: IMPALA-687
A serious error could occur when doing an INSERT
into a Parquet table.
Bug: IMPALA-689
If the JAR file for a Java-based Hive UDF was not in the CLASSPATH
, the UDF could not be
called during a query.
Bug: IMPALA-695
This section lists the most significant issues fixed in Impala 1.2.1. For the full list of fixed issues, see this report in the Impala JIRA tracker.
While querying a table with long column values, Impala could over-allocate memory leading to an out-of-memory error. This problem was observed most frequently with tables using uncompressed RCFile or text data files.
Bug: IMPALA-525
Resolution: Fixed in 1.2.1
A join query could allocate a temporary work area that was larger than needed, leading to an out-of-memory error. The fix makes Impala return unused memory to the system when the memory limit is reached, avoiding unnecessary memory errors.
Bug: IMPALA-657
Resolution: Fixed in 1.2.1
Impala could encounter an out-of-memory condition setting up work areas for Parquet tables with many columns. The fix reduces the size of the allocated memory when not actually needed to hold table data.
Bug: IMPALA-652
Resolution: Fixed in 1.2.1
This section lists the most significant issues fixed in Impala 1.2 (beta). For the full list of fixed issues, see this report in the Impala JIRA tracker.
This section lists the most significant issues fixed in Impala 1.1.1. For the full list of fixed issues, see this report in the Impala JIRA tracker.
Certain queries involving DOUBLE
columns could fail with a serious error. The fix
improves the generation of native machine instructions for certain chipsets.
Bug: IMPALA-477
Queries could fail with a "block size is too big" error, due to NULL
values in
RCFile tables using Snappy compression.
Bug: IMPALA-482
Queries could fail if an Impala RCFile table was defined with more columns than in the corresponding RCFile data files.
Bug: IMPALA-510
Certain combinations of clauses in a view definition for a partitioned table could result in inefficient performance and incorrect results.
Bug: IMPALA-495
The SerDes class string written into Parquet data files created by Impala was updated for compatibility with Parquet support in Hive. See Incompatible Changes Introduced in Impala 1.1.1 for the steps to update older Parquet data files for Hive compatibility.
Bug: IMPALA-485
A query returning a small result set from a large table could tie up memory unnecessarily for the duration of the query.
Bug: IMPALA-534
Queries against Avro tables could fail depending on whether the Avro schema URL was specified in the
TBLPROPERTIES
or SERDEPROPERTIES
field. The fix causes Impala to check
both fields for the schema URL.
Bug: IMPALA-538
Queries could allocate substantially more memory than specified in the impalad
-mem_limit
startup option. The fix causes more frequent checking of the limit during
query execution.
Bug: IMPALA-520
This section lists the most significant issues fixed in Impala 1.1. For the full list of fixed issues, see this report in the Impala JIRA tracker.
This issue is due to a performance tradeoff between systems running many queries concurrently, and systems running a single query. Systems running only a single query could experience lower performance than in early beta releases. Systems running many queries simultaneously should experience higher performance than in the beta releases.
A query could fail if it involved 3 or more tables and the last join table was specified as a subquery.
Bug: IMPALA-85
INSERT
statements against partitioned tables using the Parquet format could use
excessive amounts of memory as the number of partitions grew large.
Bug: IMPALA-257
The impala-shell interpreter did not accept comments entered at the command line, making it problematic to copy and paste from scripts or other code examples.
Bug: IMPALA-192
The Impala web UI would sometimes display a query as if it were still running, after the query was cancelled.
Bug: IMPALA-364
The impala-shell command in Impala 1.0.1 does not work with Python 2.4, which is the default on Red Hat 5.
For the impala-shell command in Impala 1.0, the -o option (pipe output to a file) does not work with Python 2.4.
Bug: IMPALA-396
This section lists the most significant issues fixed in Impala 1.0.1. For the full list of fixed issues, see this report in the Impala JIRA tracker.
Impala might issue an erroneous error message when processing a Parquet data file produced by a non-Impala Hadoop component.
Bug: IMPALA-333
Resolution: Fixed
If an RCFile table definition had fewer columns than the fields actually in the data files, queries would fail.
Bug: IMPALA-293
Resolution: Fixed
The _HOST placeholder in the --principal startup option was not substituted with the correct hostname, potentially leading to a startup error in setups using Kerberos authentication.
Bug: IMPALA-351
Resolution: Fixed
After a region in an HBase table was split or moved, an Impala query might return incomplete or out-of-date results.
Bug: IMPALA-300
Resolution: Fixed
After a successful CREATE TABLE statement, the corresponding query state would be incorrectly reported as EXCEPTION.
Bug: IMPALA-349
Resolution: Fixed
Operations involving calls to the Java JNI subsystem (for example, queries on HBase tables) could allocate memory but not release it.
Bug: IMPALA-358
Resolution: Fixed
Impala returns 0 for bad time values in UNIX_TIMESTAMP, whereas Hive returns NULL.
Impala:
impala> select UNIX_TIMESTAMP('10:02:01') ;
impala> 0
Hive:
hive> select UNIX_TIMESTAMP('10:02:01') FROM tmp;
hive> NULL
Bug: IMPALA-16
Anticipated Resolution: Fixed
INSERT INTO TABLE SELECT <constant> will not insert any data and may return an error.
Anticipated Resolution: Fixed
Here are the major user-visible issues fixed in Impala 1.0. For a full list of fixed issues, see this report in the Impala JIRA tracker.
A query containing both UNION and LIMIT clauses could intermittently cause the impalad process to halt with a segmentation fault.
Bug: IMPALA-183
Resolution: Fixed
An INSERT statement specifying a NULL value for one of the partitioning columns could cause the impalad process to halt with a segmentation fault.
Bug: IMPALA-190
Resolution: Fixed
In the Impala web user interface, the profile page for an INSERT statement showed obsolete information for the statement once it was complete.
Bug: IMPALA-217
Resolution: Fixed
Queries involving an HBase table could be slower than expected, due to excessive memory usage on the Impala nodes.
Bug: IMPALA-231
Resolution: Fixed
No validation was done to check that the impala-lzo shared library was compatible with the version of Impala, possibly leading to a crash when using LZO-compressed text files.
Bug: IMPALA-234
Resolution: Fixed
Workaround: Always upgrade the impala-lzo library at the same time as you upgrade Impala itself.
INSERT statements for tables partitioned on columns involving datetime types could appear to succeed, but cause errors for subsequent queries on those tables. The problem was especially serious if an improperly formatted timestamp value was specified for the partition key.
Bug: IMPALA-238
Resolution: Fixed
Pressing Ctrl-C in the impala-shell interpreter could sometimes display an error and return control to the shell, making it impossible to cancel the query.
Bug: IMPALA-243
Resolution: Fixed
Specifying an empty string or NULL for a partition key in an INSERT statement would fail.
Bug: IMPALA-252
Resolution: Fixed. The behavior for empty partition keys was made more compatible with the corresponding Hive behavior.
The round() function did not always return the correct number of significant digits.
Bug: IMPALA-266
Resolution: Fixed
Casting from a string literal back to the same type would cause an "invalid type cast" error rather than leaving the original value unchanged.
Bug: IMPALA-267
Resolution: Fixed
Some queries that returned very few rows experienced unnecessary memory usage.
Bug: IMPALA-288
Resolution: Fixed
A serious error could occur for relatively small and inexpensive queries.
Bug: IMPALA-289
Resolution: Fixed
Certain aggregation queries against Parquet tables were inefficient due to lower than required thread utilization.
Bug: IMPALA-292
Resolution: Fixed
The Impala CREATE TABLE command did not fill in the owner and tbl_type columns in the Hive metastore database.
Bug: IMPALA-295
Resolution: Fixed. The metadata was made more Hive-compatible.
The impalad instances in a cluster could halt when the statestored process became unavailable.
Bug: IMPALA-312
Resolution: Fixed
A subquery would fail if the SELECT statement inside it returned a constant value rather than querying a table.
Bug: IMPALA-67
Resolution: Fixed
The result set from a right outer join query could include erroneous rows containing NULL values.
Bug: IMPALA-90
Resolution: Fixed
The Parquet scanner non-deterministically hangs when executing some queries.
Bug: IMPALA-204
Resolution: Fixed
When attempting to load metadata from an unsupported Hive table type (INDEX and VIEW tables), Impala fails with an unclear error message.
Bug: IMPALA-167
Resolution: Fixed in 0.7
Resolution: Fixed in 0.7
Resolution: Fixed in 0.7
Workaround: None
It is currently not possible to limit the memory consumption of a single query. All tables on the right-hand side of JOIN statements need to be able to fit in memory. If they do not, Impala may crash due to out-of-memory errors.
Resolution: Fixed in 0.7
Aggregate of a subquery result set returns wrong results if the subquery contains a 'limit' clause and data is distributed across multiple nodes. From the query plan, it looks like we are just summing the results from each worker node.
Bug: IMPALA-20
Resolution: Fixed in 0.7
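The symptom reported for IMPALA-20 can be illustrated with a small Python model. This is only a sketch of the arithmetic, not a reconstruction of Impala's planner: the function names and the sample data are hypothetical, and COUNT is used as the aggregate for simplicity.

```python
from typing import List


def count_with_per_node_limit(rows_per_node: List[List[int]], limit: int) -> int:
    """Buggy shape: each node applies LIMIT and counts; the counts are summed."""
    return sum(len(rows[:limit]) for rows in rows_per_node)


def count_with_global_limit(rows_per_node: List[List[int]], limit: int) -> int:
    """Expected shape: merge rows from all nodes, apply LIMIT once, then count."""
    merged = [row for rows in rows_per_node for row in rows]
    return len(merged[:limit])


# Three worker nodes holding 4, 3, and 2 rows of the subquery's input.
nodes = [[1, 2, 3, 4], [5, 6, 7], [8, 9]]
print(count_with_per_node_limit(nodes, 3))  # 8, because 3 + 3 + 2
print(count_with_global_limit(nodes, 3))    # 3
```

The per-node version exceeds the limit as soon as more than one node holds matching rows, which is consistent with the wrong results described above.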
We currently cannot utilize a predicate like "country_code in ('DE', 'FR', 'US')" to do partition pruning, because that requires an equality predicate or a binary comparison.
We should create a superclass of planner.ValueRange, ValueSet, that can be constructed with an arbitrary predicate, and whose isInRange(analyzer, valueExpr) constructs a literal predicate by substitution of the valueExpr into the predicate.
Bug: IMPALA-144
Resolution: Fixed in 0.7
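The ValueSet idea proposed for IMPALA-144 can be sketched in Python. This is a hedged illustration of the concept only: the real planner classes are Java and take an analyzer and a value expression, whereas this sketch wraps a plain callable. The prune_partitions helper and the partition paths are hypothetical.

```python
from typing import Callable, Dict, List


class ValueSet:
    """Wraps an arbitrary predicate over a single partition-key value."""

    def __init__(self, predicate: Callable[[str], bool]) -> None:
        self._predicate = predicate

    def is_in_range(self, value: str) -> bool:
        # Substitute the literal partition value into the predicate and evaluate it.
        return self._predicate(value)


def prune_partitions(partitions: Dict[str, str], value_set: ValueSet) -> List[str]:
    """Keep only partitions whose key value satisfies the predicate.

    `partitions` maps a partition-key value to a (hypothetical) directory.
    """
    return [path for value, path in partitions.items() if value_set.is_in_range(value)]


partitions = {
    "DE": "/warehouse/t/country_code=DE",
    "FR": "/warehouse/t/country_code=FR",
    "JP": "/warehouse/t/country_code=JP",
    "US": "/warehouse/t/country_code=US",
}
# Equivalent of: country_code in ('DE', 'FR', 'US')
in_list = ValueSet(lambda code: code in ("DE", "FR", "US"))
print(prune_partitions(partitions, in_list))
```

Because the predicate is evaluated per partition value, an IN list, an equality test, or a range comparison can all drive pruning through the same interface.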
Impala reads the NameNode address and port as command line parameters rather than reading them from core-site.xml. Updating the NameNode address in the core-site.xml file does not propagate to Impala.
Severity: Low
Resolution: Fixed in 0.6 - Impala reads the namenode location and port from the Hadoop configuration files, though setting -nn and -nn_port overrides this.
Users are advised not to set -nn or -nn_port.
Queries may fail in a secure environment due to impalad Kerberos tickets expiring. This can happen if the Impala -kerberos_reinit_interval flag is set to a value of ten minutes or less. This may lead to an impalad requesting a ticket with a lifetime that is less than the time to the next ticket renewal.
Bug: IMPALA-64
Resolution: Fixed in 0.6
Concurrent queries may fail when Impala is using Thrift to communicate with part of the Hive Metastore such as the Hive Metastore Service. In such a case, the error "get_fields failed: out of sequence response" may occur because Impala shared a single Hive Metastore Client connection across threads. With Impala 0.6, a separate connection is used for each metadata request.
Bug: IMPALA-48
Resolution: Fixed in 0.6
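The approach described for IMPALA-48, one connection per metadata request rather than one connection shared across threads, can be sketched in Python. Everything here is a stand-in: FakeMetastoreClient is a hypothetical class that only hands out connection ids, and no real Thrift or Hive Metastore API is used.

```python
import threading
from itertools import count
from typing import List, Optional

_connection_ids = count(1)
_id_lock = threading.Lock()


class FakeMetastoreClient:
    """Hypothetical stand-in for a client that must not be shared across threads."""

    def __init__(self) -> None:
        with _id_lock:
            self.connection_id = next(_connection_ids)

    def get_fields(self, table: str) -> str:
        return f"{table}:conn{self.connection_id}"


def get_fields_with_own_connection(table: str) -> str:
    # A separate connection is opened for each metadata request.
    client = FakeMetastoreClient()
    return client.get_fields(table)


def run_concurrently(tables: List[str]) -> List[Optional[str]]:
    results: List[Optional[str]] = [None] * len(tables)

    def worker(index: int, table: str) -> None:
        results[index] = get_fields_with_own_connection(table)

    threads = [threading.Thread(target=worker, args=(i, t)) for i, t in enumerate(tables)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

Since no two requests write to the same connection, responses cannot interleave on the wire, which is the condition behind an out-of-sequence response.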
Impala fails to start if it is unable to establish a connection with the Hive Metastore. This behavior was fixed, allowing Impala to start, even when no Metastore is available.
Bug: IMPALA-58
Resolution: Fixed in 0.6
In some queries (including "USE database" statements), database names are treated as case-sensitive. This may lead queries to fail with an IllegalStateException.
Bug: IMPALA-44
Resolution: Fixed in 0.6
Impala does not ignore hidden HDFS files, meaning those files prefixed with a period '.' or underscore '_'. This diverges from Hive/MapReduce, which skips these files.
Bug: IMPALA-18
Resolution: Fixed in 0.6
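The hidden-file rule stated for IMPALA-18 can be expressed as a short Python filter. This is a sketch under the assumption that only the file's base name is inspected. The function names and sample paths are hypothetical.

```python
import posixpath
from typing import Iterable, List


def is_hidden(path: str) -> bool:
    """A file counts as hidden if its base name starts with '.' or '_'."""
    return posixpath.basename(path).startswith((".", "_"))


def visible_data_files(paths: Iterable[str]) -> List[str]:
    """Drop hidden files so that only real data files are scanned."""
    return [p for p in paths if not is_hidden(p)]


listing = [
    "/warehouse/t/part-00000",
    "/warehouse/t/_SUCCESS",
    "/warehouse/t/.part-00000.crc",
    "/warehouse/t/_logs",
]
print(visible_data_files(listing))  # ['/warehouse/t/part-00000']
```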
Impala may have reduced performance on tables that contain a large number of partitions. This is due to extra overhead reading/parsing the partition metadata.
Resolution: Fixed in 0.5
Backend impalads do not cache connections to the coordinator. On a secure cluster, this introduces a latency proportional to the number of backend clients involved in query execution, as the cost of establishing a secure connection is much higher than in the non-secure case.
Bug: IMPALA-38
Resolution: Fixed in 0.5
Concurrent queries may fail with error: "Table object has not been been initialised : `PARTITIONS`". This was due to a lack of locking in the Impala table/database metadata cache.
Bug: IMPALA-30
Resolution: Fixed in 0.5
The Impala UNIX_TIMESTAMP(val, format) operation compares the length of format and val and returns NULL if they do not match. Hive instead effectively truncates val to the length of the format parameter.
Bug: IMPALA-15
Resolution: Fixed in 0.5
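The difference described for IMPALA-15 comes down to how the input string is prepared before parsing. The Python sketch below models only that step, following the description above. It does not parse dates, the function names are hypothetical, and what Hive does when val is shorter than the format is not covered.

```python
from typing import Optional


def input_for_parsing_strict(val: str, fmt: str) -> Optional[str]:
    """Earlier Impala behavior as described: mismatched lengths yield NULL (None)."""
    return val if len(val) == len(fmt) else None


def input_for_parsing_truncating(val: str, fmt: str) -> str:
    """Hive-like behavior as described: val is cut to the length of the format."""
    return val[: len(fmt)]


val = "2013-05-01 10:02:01"
fmt = "yyyy-MM-dd"
print(input_for_parsing_strict(val, fmt))      # None
print(input_for_parsing_truncating(val, fmt))  # 2013-05-01
```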
Impala is impacted by Hive bug HIVE-3596, which may cause metastore refreshes to fail if a Hive temporary configuration file is deleted (normally located at /tmp/hive-<user>-<tmp_number>.xml). Additionally, the impala-shell will incorrectly report that the failed metadata refresh completed successfully.
Anticipated Resolution: To be fixed in a future release
Workaround: Restart the impalad service. Use the impalad log to check for metadata refresh errors.
The lpad/rpad builtin functions generate the wrong results.
Resolution: Fixed in 0.4
Compressed files with extensions incorrectly generate an exception.
Bug: IMPALA-14
Resolution: Fixed in 0.4
Some queries with large limits were hanging.
Resolution: Fixed in 0.4
Resolution: Fixed in 0.4
If Impala is unable to load the metadata for a table for any reason, a subsequent query referring to that table will return an unknown table error message, even if the table is known.
Resolution: Fixed in 0.3
After failing to load metadata for a table, Impala removes that table from the list of known tables returned in SHOW TABLES. Subsequent attempts to query the table return 'unknown table', even if the metadata for that table is fixed.
Resolution: Fixed in 0.3
Attempting to select from these tables fails.
Resolution: Fixed in 0.3
Queries that contain OUTER JOINs may not return the correct results if there are predicates referencing any of the joined tables in the WHERE clause.
Resolution: Fixed in 0.3.
Subqueries that contain an aggregate cannot be joined with another table or Impala may crash. For example:
SELECT * FROM (SELECT sum(col1) FROM some_table GROUP BY col1) t1 JOIN other_table ON (...);
Resolution: Fixed in 0.2
For example:
INSERT OVERWRITE TABLE test SELECT * FROM test2 LIMIT 1;
Resolution: Fixed in 0.2
For example:
SELECT * FROM test2 LIMIT 1;
Resolution: Fixed in 0.2
Attempting to read such files does not generate a diagnostic.
Resolution: Fixed in 0.2
When querying an HBase table whose row-key is string type, the Impala server may raise a null pointer exception.
Resolution: Fixed in 0.2