Impala 4.2.0 Change Log
Release Notes - IMPALA - Version Impala 4.2.0
New Feature
Improvement
- [IMPALA-886] - Always display HBase cols in same order as CREATE TABLE statement
- [IMPALA-8011] - Allow filtering on virtual column for file name
- [IMPALA-8373] - impala-shell should be tested with Python 3
- [IMPALA-8592] - Add support for insert events for 'LOAD DATA..' statements from Impala.
- [IMPALA-9615] - Make re2's max_mem option configurable via an Impala startup flag.
- [IMPALA-9670] - Fix unloaded views being shown as tables for GET_TABLES requests
- [IMPALA-9718] - Remove pkg_resources.py from Impala/shell
- [IMPALA-10199] - Need to update bootstrap_toolchain.py with the 20.04 version of toolchain file when available
- [IMPALA-10436] - Investigate the need for granting ALL privilege on server when creating an external Kudu table
- [IMPALA-10453] - Support file/partition pruning via runtime filters on Iceberg
- [IMPALA-10465] - Improve Kudu DML error logging in Impala
- [IMPALA-10545] - Tune data_cache_write_concurrency based on the type of IO device
- [IMPALA-10610] - Support multiple file formats in a single Iceberg Table
- [IMPALA-10791] - Add Support of Batch Reading for Spilling to Remote FS
- [IMPALA-10794] - Improve observability of execDdl code paths
- [IMPALA-10927] - TestFetchAndSpooling.test_rows_sent_counters is flaky in core-s3 based test
- [IMPALA-11207] - Use hadoop-cloud-storage as the cloud store connector dependency
- [IMPALA-11240] - Revisit the default value for ssl_cipher_list to eliminate insecure ciphers
- [IMPALA-11243] - Improve predicate pushdown to Iceberg
- [IMPALA-11275] - Dump thread debug information when crashing/generating a minidump
- [IMPALA-11279] - Optimize count(*) queries for Iceberg tables
- [IMPALA-11286] - Make Impala write 'values_counts' to Iceberg metadata
- [IMPALA-11294] - Remove unnecessary workarounds from dictionary runtime filter tests
- [IMPALA-11314] - Add automated tests for PyPi impala-shell to precommit
- [IMPALA-11369] - Use separate Thrift compilers for different components
- [IMPALA-11370] - Run make with --load-average for large toolchain components
- [IMPALA-11382] - Produce audit log entries corresponding to unauthorized SELECT operation on non-existing tables
- [IMPALA-11384] - Upgrade CPP thrift components to thrift-0.16.0
- [IMPALA-11385] - Upgrade JAVA thrift components to thrift-0.16.0
- [IMPALA-11389] - Produce Python 3-compatible eggs in shell tarball
- [IMPALA-11398] - Upgrade flake8 for indent-size support
- [IMPALA-11401] - Catalogd should log the table names causing OOM on array limit
- [IMPALA-11426] - Unpaired quotes in doc 'impala_alter_table'
- [IMPALA-11430] - Support kudu custom hash partitions at the range level
- [IMPALA-11439] - Add an environment variable to control whether bootstrap_system.sh prepopulates the m2 directory
- [IMPALA-11440] - Remove the unnecessary installation of Apache Ant in bootstrap_system.sh
- [IMPALA-11450] - Support building on Centos 8 alternatives
- [IMPALA-11454] - Reduce the kudu package's size in toolchain
- [IMPALA-11458] - Update to newer zlib/zstd
- [IMPALA-11469] - Ignore _spark_metadata folder in table location
- [IMPALA-11474] - Codegen Tuple size in Sorter
- [IMPALA-11490] - More metrics to debug event processing lagging behind
- [IMPALA-11504] - Speed up Decimal16Value division by specializing DecimalUtil::GetScaleMultiplier<int256_t>()
- [IMPALA-11511] - Provide an option to build with compressed debug info
- [IMPALA-11540] - Add warning logs for slow ALTER_TABLE event processing
- [IMPALA-11543] - JniUtil should print exception message even if throwable_to_stack_trace_id fails
- [IMPALA-11569] - Run finalize.sh in bin/jenkins/all-tests.sh even if dataload fails
- [IMPALA-11570] - bin/jenkins/finalize.sh should tolerate errors in subcommands like dmesg
- [IMPALA-11591] - Avoid calling planFiles() on Iceberg tables when there are no predicates
- [IMPALA-11621] - testdata/bin/kill-hive-server.sh should remove the HiveServer2 pid file
- [IMPALA-11628] - Investigate replacing log4j with reload4j
- [IMPALA-11634] - Add ability to produce Docker images with Java 11
- [IMPALA-11695] - Exclude some useless warnings from the Clang Tidy build
- [IMPALA-11702] - Expose more Kudu Scanner metrics for KuduScanNode
Bug
- [IMPALA-5845] - Impala should de-duplicate row parsing error
- [IMPALA-7864] - TestLocalCatalogRetries::test_replan_limit is flaky
- [IMPALA-9726] - Update boilerplate in the PyPI sidebar for impala-shell supported versions
- [IMPALA-9823] - use_local_catalog and related flags shouldn't be hidden
- [IMPALA-10057] - TransactionKeepalive NoClassDefFoundError floods logs during JDBC_TEST/FE_TEST
- [IMPALA-10069] - Cipher-specific BE tests fail on Ubuntu 18.04
- [IMPALA-10148] - test_query_event_hooks.py's TestHooksStartupFail generates a core dump
- [IMPALA-10267] - Impala crashes in HdfsScanner::WriteTemplateTuples() with negative num_tuples
- [IMPALA-10356] - Analyzed query in explain plan is not quite right for insert with values clause
- [IMPALA-10375] - Lock down which filesystem types use the file handle cache
- [IMPALA-10660] - Impala shell prints DOUBLEs with less precision in HS2 than beeswax
- [IMPALA-10699] - Apply patch to libev to support compiling with C++17
- [IMPALA-10715] - test_decimal_min_max_filters failed in exhaustive run
- [IMPALA-10865] - Rewrite error for grouping exprs when a SELECT alias is the same as a column name
- [IMPALA-10895] - TestQueryRetries.test_retrying_query_cancel is flaky
- [IMPALA-10962] - Toolchain Python loses readline functionality on Ubuntu 20.04
- [IMPALA-11034] - Resolve schema of old data files in migrated Iceberg tables
- [IMPALA-11160] - TestAcid.test_acid_compute_stats failed in ubuntu-16.04-dockerised-tests
- [IMPALA-11183] - run-all-tests.sh fails to run tests in multiple iterations
- [IMPALA-11208] - CollectionItemsRead profile counter might be wrong in ORC scanner
- [IMPALA-11234] - impalad keeps reporting ShortCircuitCache slot release failures in heavy workload
- [IMPALA-11244] - Docker-based test failure for DiskIoMgrTest BE test
- [IMPALA-11249] - Precommits are not running strict_hs2_protocol=True configurations for shell tests
- [IMPALA-11250] - Webserver.TestWithSpnego fails on Ubuntu 20
- [IMPALA-11251] - TestImpalaShellInteractive.test_unicode_input fails on Ubuntu 20
- [IMPALA-11268] - Allow STORED BY and STORED AS as well
- [IMPALA-11274] - CNF rewrite causes a regression in join node performance
- [IMPALA-11280] - Zipping unnest hits DCHECK when querying from a view that has an IN operator
- [IMPALA-11281] - Consider loading the table metadata for a ResetMetadataStmt
- [IMPALA-11287] - Implement CREATE TABLE LIKE for Iceberg tables
- [IMPALA-11291] - minidump-test is flaky
- [IMPALA-11295] - TestParquet.test_multiple_blocks_mt_dop fails due to unexpected ranges_complete_list
- [IMPALA-11301] - Extreme cardinality estimations if NDV=1
- [IMPALA-11302] - Improve error message for CREATE EXTERNAL TABLE iceberg command
- [IMPALA-11303] - Exception is not raised for Iceberg DDL that misses LOCATION clause
- [IMPALA-11305] - TypeError in impala-shell summary progress
- [IMPALA-11306] - single_node_perf_run.py fails to load dataset if scale factor is 1
- [IMPALA-11310] - [Doc] The code tag isn't closed in impala_admission_config.xml
- [IMPALA-11311] - debug_noopt build option uses be/build/release
- [IMPALA-11313] - impala-shell's PyPi form factor still suffers from IMPALA-10299
- [IMPALA-11315] - TestImpalaShellInteractive.test_multiline_queries_in_history fails with python3
- [IMPALA-11316] - TestImpalaShell.test_http_socket_timeout fails on Python3 with different message
- [IMPALA-11317] - TestImpalaShellInteractive.test_http_interactions_extra fails on Python 3
- [IMPALA-11320] - SHOW PARTITIONS on Iceberg table doesn't list the partitions
- [IMPALA-11323] - Invalid inferred predicates based on casted null values being equivalent
- [IMPALA-11324] - TestRPCTimeout::test_reportexecstatus_retries broken
- [IMPALA-11325] - Impala-shell hits UnicodeDecodeError when outputting Unicode via --output_file
- [IMPALA-11332] - impala-shell strips trailing whitespace from csv output
- [IMPALA-11334] - Include partition transform information in DESCRIBE FORMATTED for Iceberg tables
- [IMPALA-11335] - WriteId must be requested before taking locks during inserts
- [IMPALA-11337] - impala-shell can output the "Fetched X row(s)" line before the actual results
- [IMPALA-11342] - Hive UDFs are unable to use classes from the same Jar in catalogd
- [IMPALA-11343] - impala-shell --ssl fails in PyPI install
- [IMPALA-11344] - Selecting only the missing fields of ORC files should return NULLs
- [IMPALA-11345] - Query failed when creating equal conjunction map for Parquet bloom filter
- [IMPALA-11346] - Migrated partitioned Iceberg tables might return ERROR when WHERE condition is used on partition column
- [IMPALA-11354] - load_data.py can't force reload views
- [IMPALA-11358] - Kudu tables missing comment field if HMS-Kudu integration is enabled
- [IMPALA-11365] - Dereferencing null pointer in TopNNode
- [IMPALA-11366] - Impala-shell fails to build if the build date contains non-ASCII characters
- [IMPALA-11367] - There should be LINE_DELIM above '# Partition Transform Information'
- [IMPALA-11368] - Iceberg time-travel error message shows timestamp in UTC
- [IMPALA-11379] - perf-AB-test Jenkins job fails with "Working copy is dirty"
- [IMPALA-11380] - impala-shell's VerticalOutputFormatter may incorrectly strip trailing whitespace
- [IMPALA-11391] - TestKuduHMSIntegration.test_drop_managed_kudu_table fails sometimes due to race condition
- [IMPALA-11403] - CMake warning triggered by a sed regexp in common/thrift
- [IMPALA-11406] - Incorrect duration logged in "Authorization check took n ms"
- [IMPALA-11407] - Update google-oauth-client to address CVE-2021-22573
- [IMPALA-11408] - ERROR IllegalStateException when INSERT INTO partitioned Iceberg table
- [IMPALA-11412] - CodegenFnPtr<FuncType>::store() has a compile time error when instantiated
- [IMPALA-11414] - Off-by-one error in Parquet late materialization
- [IMPALA-11415] - Add run-step-wait-all after loading Kudu data
- [IMPALA-11416] - SlotRef::tuple_is_nullable_ uninitialised for struct children
- [IMPALA-11419] - Incremental build is broken
- [IMPALA-11429] - Newly created Iceberg tables are owned by the user of the CatalogD process
- [IMPALA-11433] - Remove misleading bucketing info from DESCRIBE FORMATTED output for Iceberg tables
- [IMPALA-11434] - More than one 2D array in the select list causes an analysis error
- [IMPALA-11441] - test_kudu_upsert failed with Kudu Illegal State error
- [IMPALA-11443] - Possible overflow in SortNode.java
- [IMPALA-11445] - Fix the issue for partitions in different file systems
- [IMPALA-11447] - Fetching arrays/structs crashes Impala if result caching is enabled
- [IMPALA-11451] - TestKrpcSocket fails if netstat is not installed
- [IMPALA-11457] - Ozone parallelism reduced when backends are co-located
- [IMPALA-11462] - shiftleft problem
- [IMPALA-11464] - hasNext() throws FileNotFoundException on staging files and breaks file metadata loading
- [IMPALA-11489] - Async IO cannot handle >2GB ORC files
- [IMPALA-11492] - ExprTest.Utf8MaskTest fails when en_US.UTF-8 is not present
- [IMPALA-11493] - run-all-tests.sh needs to start Impala for first run of custom cluster tests
- [IMPALA-11494] - Ranger audit log entries generated for authorized query against non-existing tables
- [IMPALA-11500] - Impalad crashed in the ParquetBoolDecoder::SkipValues when num_values is 0
- [IMPALA-11505] - TestKuduTransaction.test_kudu_txn_abort_partition_lock fails as exception message is unexpected
- [IMPALA-11507] - Impala cannot read Iceberg tables where DataFile is not under 'table location'
- [IMPALA-11508] - Iceberg test test_expire_snapshots is flaky
- [IMPALA-11514] - Workaround s3 timeout waiting for connection from pool (HADOOP-18410)
- [IMPALA-11515] - Adding --load-average to native-toolchain's Kudu compilation broke ARM jobs
- [IMPALA-11523] - TestImpalaShell.test_http_socket_timeout fails for docker-based test runs
- [IMPALA-11526] - Install en_US.UTF-8 locale in docker builds
- [IMPALA-11528] - hive-exec.pom doesn't include UDAF class
- [IMPALA-11539] - Mitigate intra-node skew of HDFS scans with MT_DOP
- [IMPALA-11548] - TTransportException can be thrown for RPC connection that is open for long time
- [IMPALA-11551] - Bump CDP_BUILD_NUMBER to use Iceberg 0.14
- [IMPALA-11557] - Memory leak in BlockingRowBatchQueue
- [IMPALA-11558] - Possible memory leak when selecting from a Kudu table concurrently
- [IMPALA-11567] - Error in left outer join if the right side is a subquery with complex types
- [IMPALA-11576] - query_test.test_iceberg.test_multiple_storage_locations fails on S3
- [IMPALA-11578] - test_scheduler_locality.test_local_assignment times out on S3 and EC builds
- [IMPALA-11580] - Memory leak in legacy catalog mode when applying incremental partition updates
- [IMPALA-11582] - Implement table sampling for Iceberg tables
- [IMPALA-11585] - Docker quickstart client fails to build on Ubuntu 20.04
- [IMPALA-11594] - TestIcebergTable.test_create_table_like_parquet fails in non-HDFS build
- [IMPALA-11599] - GCC 10 toolchain's gdb won't run on older distributions
- [IMPALA-11605] - IMPALA-9999's upgrade of flatbuffers to 1.12 causes a conflict with Hive
- [IMPALA-11608] - Impala SHOW TABLE STATS shows wrong number of files for Iceberg tables
- [IMPALA-11610] - Dockerized tests don't respect Jenkins parameters / environment variables
- [IMPALA-11611] - fe/pom.xml still has reference to THRIFT_HOME
- [IMPALA-11614] - TestValidateMetrics.test_metrics_are_zero fails with num-missing-volume-id for Ozone
- [IMPALA-11630] - ERROR: NoSuchMethodError when creating a table on Tencent COS FileSystem
- [IMPALA-11631] - Impala crashes in impala::TopNNode::Heap::Close()
- [IMPALA-11640] - Build fails on Ubuntu 18/20 when using shared libraries
- [IMPALA-11644] - updateLatestEventId should handle cases of empty events
- [IMPALA-11646] - IMPALA-11562 seems to break test_unsupported_text_compression in s3 builds
- [IMPALA-11648] - validate-java-pom-versions.sh should skip pom.xml in toolchain
- [IMPALA-11657] - build-all-flag-combinations.sh should tolerate git-reset failures
- [IMPALA-11671] - run-all-tests.sh with Ozone fails listing DFS files
- [IMPALA-11674] - Fix IsPeekTimeoutTException and IsReadTimeoutTException for thrift-0.16.0
- [IMPALA-11699] - Some FE tests fail with NullPointerException in FileSystemUtil
- [IMPALA-11703] - Make sure the /var/tmp directory has the sticky bit set for Docker builds
- [IMPALA-11704] - Remote Ozone scans are slow even after data cache warmup
- [IMPALA-11706] - Precommit jobs stop after 10 pytest failures
- [IMPALA-11707] - Wrong results when global runtime IN-list filters are applied
- [IMPALA-11711] - Wrong results when querying an Iceberg V2 table whose first column is a BOOLEAN partition column
- [IMPALA-11714] - resolve_minidumps.py doesn't work on Ubuntu 16
- [IMPALA-11716] - bin/coverage_helper.sh doesn't work
- [IMPALA-11720] - FileMetadataLoaderTest is flaky due to IOException: Filesystem closed
- [IMPALA-11721] - Impala queries keep being retried on frequently updated Iceberg tables
- [IMPALA-11722] - Wrong error message when unsupported complex type comes from * expression
- [IMPALA-11724] - Ozone tests fail after CDP build number update
- [IMPALA-11737] - impala-shell does not work with Python 3.10
- [IMPALA-11740] - Incorrect results for partitioned Iceberg V2 tables when runtime filters are applied
- [IMPALA-11741] - Impala docker builds should verify that 'hostname' is installed
Task
- [IMPALA-11257] - Fix CMake warnings for module names and cmake_minimum_required
- [IMPALA-11269] - Consolidate Ranger audit logs for the same table
- [IMPALA-11338] - Update Impala version to 4.2.0-SNAPSHOT
- [IMPALA-11341] - Print error log files when data-loading fails
- [IMPALA-11394] - Upgrade jackson-databind to version 2.12.6.1 or above
- [IMPALA-11456] - Refactor SkipIf logic for filesystems
- [IMPALA-11465] - Bump CDP_BUILD_NUMBER to 30010248
- [IMPALA-11468] - Port "Block Bloom filter false positive rate correction" test fix from Kudu
- [IMPALA-11471] - Track and limit disk space used by build-all-flag-combinations.sh
- [IMPALA-11472] - custom_cluster/test_client_ssl.py takes over an hour due to many combinations
- [IMPALA-11480] - Retain all the YARN container log files in a Jenkins run
- [IMPALA-11513] - Upgrade postgresql package to 42.4.1 or higher
- [IMPALA-11524] - Remove workaround when HADOOP-18410 is fixed
- [IMPALA-11554] - Bump IMPALA_ORC_JAVA_VERSION to 1.7.6
- [IMPALA-11639] - Upgrade Spring Framework to 5.3.20 due to multiple CVEs
- [IMPALA-11667] - Clean up POMs using dependencyManagement
- [IMPALA-11669] - Make Thrift max message size configurable
- [IMPALA-11670] - Upgrade components for CVEs, make it easier to override versions
- [IMPALA-11673] - Exclude spring-jcl from the classpath
Sub-task
- [IMPALA-3119] - DDL support for bucketed tables
- [IMPALA-6684] - Large payload going through exchange results in lots of untracked memory in the form of a string
- [IMPALA-7092] - Re-enable EC tests broken by HDFS-13539
- [IMPALA-7098] - Re-enable blocksize-related tests under EC
- [IMPALA-9442] - Add Ozone to minicluster
- [IMPALA-9448] - Test coverage for Ozone Transparent Data Encryption
- [IMPALA-9488] - Add EC metrics to impala-server
- [IMPALA-10213] - Handle block location for Ozone
- [IMPALA-10214] - Ozone support for file handle cache
- [IMPALA-11283] - Push down IS_NULL predicate to Iceberg
- [IMPALA-11289] - Push down compound predicates to Iceberg
- [IMPALA-11377] - Handle concurrent Iceberg INSERT OVERWRITEs
- [IMPALA-11378] - Allow INSERT OVERWRITE into bucket partition transform in some cases
- [IMPALA-11446] - Push down NOT_IN predicate to Iceberg
- [IMPALA-11562] - Remove o3fs default filesystem support
- [IMPALA-11697] - Enable SkipIf tests specific to HDFS for Ozone
- [IMPALA-11709] - Bump Arrow version to 9.0.0-p2 for decimal support
Test
- [IMPALA-11371] - Increase password complexity of users to be added in Ranger
- [IMPALA-11438] - Add tests for CREATE TABLE LIKE PARQUET STORED AS ICEBERG
- [IMPALA-11680] - test_krpc_datastream_sender_shuffle failed in erasure-coding environment
Documentation