Impala 4.0 Release Notes
Breaking Changes
- Remove support for Hive 2.x
- Remove support for Impala-lzo
- Impala-lzo provides code that allows Impala to read LZO-compressed tables. LZO is GPL licensed, which is why this support was never included directly. The Impala-lzo code interacts with internal Impala code at a level that is intricate and error prone. Given the low adoption of LZO and the other compression options available, Impala removes Impala-lzo support along with the low-level interface it used.
- Remove support for Sentry
- Starting with 4.0, Impala supports only Ranger for authorization.
- Set minimum CPU requirement to AVX for x86_64 (IMPALA-9690)
- Before 4.0, the minimum CPU requirement was SSSE3. It is now AVX. On machines that support AVX but not AVX2, launch Impala with the --enable_legacy_avx_support flag. The minimum CPU requirement may be raised to AVX2 in the near future.
- Drop support for dateless timestamps (IMPALA-9531)
- Add support for the string concatenation operator || (IMPALA-452)
- Previously, "||" always meant logical OR. Now, if the type of the left operand is STRING, "||" means string concatenation.
- Disallow ordinals in the HAVING clause by default (IMPALA-7844)
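
The "||" rule described above can be sketched in a few lines of Python. This is only an illustration of the rule as stated in these notes, not Impala's analyzer code: the function names resolve_double_pipe and evaluate_double_pipe and the type-name strings are hypothetical, and Python values stand in for SQL values.

```python
def resolve_double_pipe(left_type: str) -> str:
    """Return how "||" is interpreted for a given left operand type.

    Hypothetical helper: it models only the rule stated in the release
    notes, where the type of the left operand decides the meaning.
    """
    # STRING on the left selects string concatenation; any other type
    # keeps the pre-4.0 meaning of logical OR.
    return "concat" if left_type.upper() == "STRING" else "or"


def evaluate_double_pipe(left, right):
    """Toy evaluator that uses Python types as stand-ins for SQL types."""
    left_type = "STRING" if isinstance(left, str) else "BOOLEAN"
    if resolve_double_pipe(left_type) == "concat":
        # Assumes the right operand is also a string in this sketch.
        return left + right
    return bool(left) or bool(right)
```

Under these assumptions, evaluate_double_pipe("a", "b") concatenates, while evaluate_double_pipe(False, True) falls back to logical OR.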
New Features
- Multi-threading (MT_DOP) support in all operators (IMPALA-3902)
- The MT_DOP query option sets the degree of parallelism used for certain operations that can benefit from multithreaded execution.
- Previously, Impala supported MT_DOP only in queries consisting solely of scans and aggregates. MT_DOP can now be set for all kinds of queries.
- Denser (aggregated) runtime profile, i.e. profile-v2 (experimental, IMPALA-9378)
- Impala can produce a denser runtime profile that substantially reduces CPU and memory consumption, especially for large clusters or queries with MT_DOP > 1.
- Enabled by setting --gen_experimental_profile=true.
- Support all 99 TPC-DS queries without manual rewrites
- Transparent Query Retry (IMPALA-9124)
- Queries that fail due to cluster membership changes can be transparently retried by the coordinator. Enabled by setting the retry_failed_queries query option to true.
- Support sorting by Z-Order (IMPALA-8755)
- Support Async Codegen (IMPALA-5444)
- Read support for Hive full-ACID ORC tables (IMPALA-9042)
- Builtin functions with Apache DataSketches (IMPALA-9593, IMPALA-10281, IMPALA-10439)
- Iceberg support (experimental, some syntax might change, IMPALA-10149)
- Support spilling to S3 (IMPALA-9867)
- Impala quickstart cluster with docker-compose (IMPALA-9793)
- aarch64 (ARM) support (IMPALA-9376)
- Support more storage engines: Ozone, GCS (Google Cloud Storage).
- Authentication & Authorization
- Support integration with Apache Knox
- Support SAML authentication
- FIPS Compliance
- More LDAP features
- Support Ranger row-filtering policies (IMPALA-9234)
- Support basic role-related statements with Ranger (IMPALA-10211)
- Support Kudu table ownership (IMPALA-9990)
- Extend 'compute incremental stats' to support a list of columns (IMPALA-10435)
- More ndv() features
- Extend the NDV function to accept a precision (IMPALA-2658)
- Adjust NDV's scale with query option DEFAULT_NDV_SCALE (IMPALA-10445)
- … (See more in change log)
Improvements
- Planner & Optimizer & Performance improvements
- Many improvements in metadata management
- Skip locked tables from topic updates (IMPALA-6671)
- Previously, long-running operations on a locked table (refresh, recover partitions, compute stats) could block the topic-update thread, causing unrelated queries that were waiting on metadata updates to block unnecessarily. The topic-update thread now skips any table that has been locked for more than a configurable interval.
- Partition-level metadata propagation (IMPALA-3127)
- In the legacy catalog mode, table updates were previously propagated as a new snapshot of the table metadata. They are now propagated as partition-level deltas. This helps avoid OOM errors caused by hitting the JVM array limit (2GB) when sending catalog updates. Note that DDL/DML responses still contain the whole snapshot of table metadata. That remaining work is tracked in IMPALA-9937.
- In the LocalCatalog mode, updating a table previously invalidated all of its partitions in the coordinator’s local cache. Now only the updated partitions are invalidated.
- Slim down metastore Partition objects in LocalCatalog cache (IMPALA-7501)
- This helps avoid OOM errors caused by hitting the JVM array limit (2GB) in LocalCatalog mode. It also improves cache efficiency.
- Kudu integration
- Push bloom filters to Kudu scanners (IMPALA-3741)
- Support Kudu Timestamp and Date Bloom Filter (IMPALA-9691)
- Support DATE for min-max runtime filters (IMPALA-9294)
- Queries with analytic functions no longer need to materialize predicates bound to Kudu (IMPALA-10406)
- Spilling improvements
- Codegen improvements
- ORC scanner improvements
- Data cache improvements
- Node blacklisting: the coordinator blacklists unhealthy nodes
- Observability
- Expose current DDL metrics (grouped by type) in the Catalog web UI (IMPALA-6663)
- Expose JSON catalog objects in catalogd's debug page (IMPALA-10168)
- … (See more in change log)
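
The "skip locked tables" behavior (IMPALA-6671) listed above can be illustrated with a bounded lock wait. The following Python sketch is not catalogd code: the function collect_topic_updates, its max_wait_seconds parameter, and the dict-of-locks structure are all hypothetical, and it shows only the general idea of giving up on a table whose lock cannot be acquired within a configured interval.

```python
import threading


def collect_topic_updates(tables, max_wait_seconds):
    """Split tables into those included in this update and those skipped.

    `tables` maps a table name to its lock (a threading.Lock). This is a
    hypothetical structure used only for illustration.
    """
    included, skipped = [], []
    for name, lock in tables.items():
        # Wait for the table lock, but give up after the configured
        # interval instead of blocking the whole update.
        if lock.acquire(timeout=max_wait_seconds):
            try:
                # Real code would serialize the table's metadata here.
                included.append(name)
            finally:
                lock.release()
        else:
            # The table stayed locked too long: skip it for this round.
            skipped.append(name)
    return included, skipped
```

In this sketch, a table whose lock is held by a long-running operation ends up in the skipped list, so the other tables are still processed without waiting on it.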
Fixed Issues
See change log