Impala 4.1 Release Notes
New Features
- Documentation on Iceberg support has been created, and Apache Impala has been added to the list of query engines that support Apache Iceberg.
Apache Iceberg support includes:
- Reading/writing Iceberg V1 tables:
  - Write support is only available for Iceberg tables with Parquet data files.
  - V2 tables are also readable if they don’t contain delete delta files.
  - Avro and mixed file format tables are not yet supported.
- Support for all partition transforms, with syntax unified with Hive
- Partition evolution
- Schema evolution
- Time travel is available for Iceberg tables through the FOR SYSTEM_TIME AS OF and FOR SYSTEM_VERSION AS OF clauses. FOR SYSTEM_TIME AS OF conforms to the SQL:2011 standard (IMPALA-10840).
- Hive-compatible UTF-8 support in string functions, enabled by setting the query option UTF8_MODE=true (IMPALA-2019).
- Complex types enhancements:
- Support ALTER TABLE UNSET TBLPROPERTIES/SERDEPROPERTIES (IMPALA-5569).
- Support reading decimals from Parquet files with different precision/scale (IMPALA-7087, IMPALA-8131)
- Support table definition over a single file (IMPALA-10934, HIVE-25569)
- impala-shell can connect directly to HiveServer2 (IMPALA-10778)
- Support spilling to HDFS (IMPALA-10429)
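
The UTF8_MODE item above concerns whether string functions count bytes or characters. The Python sketch below is not Impala code. It only illustrates that distinction, under the assumption that string functions work on raw bytes when UTF8_MODE is off and on UTF-8 characters when it is on. The helper names (`length_bytes`, `length_chars`, `substr_bytes`, `substr_chars`) are hypothetical and exist only for this example.

```python
def length_bytes(s: str) -> int:
    """Byte-oriented length: counts the UTF-8 encoded bytes of s."""
    return len(s.encode("utf-8"))


def length_chars(s: str) -> int:
    """Character-oriented length: counts the Unicode code points of s."""
    return len(s)


def substr_bytes(s: str, start: int, count: int) -> bytes:
    """1-based substring over raw bytes; it can cut a multi-byte character apart."""
    raw = s.encode("utf-8")
    return raw[start - 1:start - 1 + count]


def substr_chars(s: str, start: int, count: int) -> str:
    """1-based substring over characters; multi-byte characters stay whole."""
    return s[start - 1:start - 1 + count]


sample = "你好"  # two characters, each encoded as three bytes in UTF-8
print(length_bytes(sample), length_chars(sample))
print(substr_bytes(sample, 1, 2))  # an incomplete byte sequence
print(substr_chars(sample, 1, 1))  # the first whole character
```

In the byte-oriented view, a two-byte substring of this sample does not form a valid character, which is the kind of mismatch with Hive that character-aware string functions avoid.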
Improvements
- Several improvements in Parquet support:
- Several improvements in ORC support:
For more details, see the epic IMPALA-9040 (log in to see all Jira issues linked to it).
- Catalog improvements:
  - Batching of consecutive partition events (IMPALA-9857)
  - Fine-grained table refreshing in catalogd at partition level for transactional tables (IMPALA-10923)
  - Improve metadata consistency and self-event detection in catalogd (IMPALA-10926)
  - Skip file metadata reloading when processing AlterPartition events in EventProcessor in catalogd (IMPALA-11050)
- Planner improvements:
  - Improve inner join cardinality estimates (IMPALA-10681)
  - Set selectivity of not-equal predicates (IMPALA-7560)
  - Better selectivity for = and NOT DISTINCT predicates (IMPALA-10766)
  - Improve analysis with inline views and thousands of columns (IMPALA-10799)
  - Improve SingleNodePlan creation when hundreds of inline views are joined (IMPALA-10806)
- Reduce HashTable size by packing its buckets efficiently (IMPALA-7635).
- Improve TimestampValue to String casting (IMPALA-10984).
- ACID lock timeouts are now configurable (IMPALA-11153).
- Implement adaptive 3-way quicksort in the sorter, which improves quicksort performance when there is a large number of duplicates (IMPALA-10961).
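
To show why 3-way partitioning helps with duplicates, here is a minimal Python sketch of a generic 3-way (Dutch national flag) quicksort. It is not Impala's sorter and does not model the adaptive part of IMPALA-10961. The function name `quicksort_3way` and the random pivot choice are assumptions made for this example.

```python
import random
from typing import List


def quicksort_3way(items: List[int]) -> None:
    """Sort items in place using quicksort with 3-way partitioning."""

    def sort(lo: int, hi: int) -> None:
        while lo < hi:
            pivot = items[random.randint(lo, hi)]
            lt, i, gt = lo, lo, hi
            # Invariant: items[lo:lt] < pivot, items[lt:i] == pivot,
            # items[gt + 1:hi + 1] > pivot, items[i:gt + 1] not yet examined.
            while i <= gt:
                if items[i] < pivot:
                    items[lt], items[i] = items[i], items[lt]
                    lt += 1
                    i += 1
                elif items[i] > pivot:
                    items[i], items[gt] = items[gt], items[i]
                    gt -= 1
                else:
                    i += 1
            # Elements equal to the pivot are now in their final place.
            # Recurse into the smaller side and loop on the larger one
            # to keep the recursion depth logarithmic.
            if lt - lo < hi - gt:
                sort(lo, lt - 1)
                lo = gt + 1
            else:
                sort(gt + 1, hi)
                hi = lt - 1

    sort(0, len(items) - 1)


data = [3, 1, 3, 3, 2, 1, 3]
quicksort_3way(data)
print(data)
```

Because every element equal to the pivot is excluded from both recursive calls, an input dominated by a few distinct values shrinks quickly, whereas a plain 2-way partition keeps re-sorting runs of equal keys.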
Fixed Issues
See change log