RUNTIME_BLOOM_FILTER_SIZE Query Option (Impala 2.5 or higher only)

Size (in bytes) of the Bloom filter data structure used by the runtime filtering feature.

Important:

In Impala 2.6 and higher, this query option only applies as a fallback, when statistics are not available. By default, Impala estimates the optimal size of the Bloom filter structure regardless of the setting for this option. (This is a change from the original behavior in Impala 2.5.)

In Impala 2.6 and higher, when the value of this query option is used for query planning, it is constrained by the minimum and maximum sizes specified by the RUNTIME_FILTER_MIN_SIZE and RUNTIME_FILTER_MAX_SIZE query options. The filter size is adjusted upward or downward if necessary to fit within the minimum/maximum range.
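The adjustment described above amounts to clamping a value into a range. The following is a minimal sketch of that arithmetic only, not Impala's planner code; the function name clamp_filter_size and the 1 MB / 16 MB bounds used in the calls are illustrative assumptions, and any further adjustment Impala might apply to the size is not modeled.

```python
def clamp_filter_size(requested_bytes: int, min_bytes: int, max_bytes: int) -> int:
    """Return requested_bytes adjusted to lie within [min_bytes, max_bytes].

    Hypothetical helper: it mirrors the documented behavior (adjust upward or
    downward to fit the minimum/maximum range) and nothing more.
    """
    if min_bytes > max_bytes:
        raise ValueError("min_bytes must not exceed max_bytes")
    return max(min_bytes, min(requested_bytes, max_bytes))


MB = 1024 * 1024

# Illustrative bounds only; check the values in effect for your session.
print(clamp_filter_size(512 * 1024, 1 * MB, 16 * MB))  # below the range: raised to the minimum
print(clamp_filter_size(4 * MB, 1 * MB, 16 * MB))      # inside the range: unchanged
print(clamp_filter_size(64 * MB, 1 * MB, 16 * MB))     # above the range: lowered to the maximum
```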

Type: integer

Default: 1048576 (1 MB)

Maximum: 16 MB

Added in: Impala 2.5.0

Usage notes:

This setting affects optimizations for large and complex queries, such as dynamic partition pruning for partitioned tables, and join optimization for queries that join large tables. Larger filters are more effective at handling higher-cardinality input sets, but consume more memory per filter.

If your query filters on high-cardinality columns (for example, millions of different values) and you do not get the expected speedup from the runtime filtering mechanism, consider running benchmarks with a higher value for RUNTIME_BLOOM_FILTER_SIZE. The extra memory devoted to the Bloom filter data structures can make the filtering more accurate, because a larger filter produces fewer false positives for the same number of distinct values.
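To get a feel for why filter size matters with high-cardinality columns, the sketch below applies the textbook false-positive approximation for a classic Bloom filter, p ≈ (1 - e^(-kn/m))^k, where m is the number of bits, n the number of distinct values, and k the number of hash functions. This is an assumption-laden estimate, not a description of Impala's internal filter implementation; the function name and the choice of k are hypothetical, and real results should come from benchmarks.

```python
import math


def estimated_false_positive_rate(filter_bytes: int, distinct_values: int) -> float:
    """Textbook estimate for a classic Bloom filter (not Impala's exact design).

    Assumes k, the number of hash functions, is chosen near the theoretical
    optimum (m / n) * ln 2, with at least one hash function.
    """
    if filter_bytes <= 0 or distinct_values <= 0:
        raise ValueError("filter_bytes and distinct_values must be positive")
    bits = filter_bytes * 8
    k = max(1, round(bits / distinct_values * math.log(2)))
    return (1.0 - math.exp(-k * distinct_values / bits)) ** k


MB = 1024 * 1024

# Compare a few candidate sizes for a column with 10 million distinct values.
for size_mb in (1, 4, 16):
    rate = estimated_false_positive_rate(size_mb * MB, 10_000_000)
    print(f"{size_mb:>2} MB filter: estimated false-positive rate {rate:.4f}")
```

Under these assumptions, the estimated false-positive rate falls as the filter grows, which is the effect to look for when benchmarking a higher RUNTIME_BLOOM_FILTER_SIZE value.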

Because the runtime filtering feature applies mainly to resource-intensive and long-running queries, only adjust this query option when tuning long-running queries involving some combination of large partitioned tables and joins involving large tables.

Because the effectiveness of this setting depends so much on query characteristics and data distribution, you typically only use it for specific queries that need some extra tuning, and the ideal value depends on the query. Consider setting this query option immediately before the expensive query and unsetting it immediately afterward.
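One way to keep the setting scoped to a single statement is to generate the surrounding SET statements together with the query. The sketch below is a hypothetical helper, not part of any Impala client library: it restores the option by setting it back to the documented default of 1048576 rather than using a client-specific unset command, and the table names in the example query are invented for illustration.

```python
DEFAULT_FILTER_SIZE = 1048576           # documented default (1 MB)
MAX_FILTER_SIZE = 16 * 1024 * 1024      # documented maximum (16 MB)


def wrap_with_filter_size(query: str, filter_bytes: int,
                          restore_bytes: int = DEFAULT_FILTER_SIZE) -> list:
    """Return the statements to run, in order, around one expensive query.

    Hypothetical helper: it only builds strings; running them is left to
    whatever client (for example, impala-shell) you already use.
    """
    if not 0 < filter_bytes <= MAX_FILTER_SIZE:
        raise ValueError("filter_bytes must be between 1 and 16 MB")
    body = query.strip().rstrip(";")
    return [
        f"SET RUNTIME_BLOOM_FILTER_SIZE={filter_bytes};",   # raise just before the query
        f"{body};",                                         # the expensive query itself
        f"SET RUNTIME_BLOOM_FILTER_SIZE={restore_bytes};",  # put the previous value back
    ]


# Example with made-up table names.
statements = wrap_with_filter_size(
    "SELECT COUNT(*) FROM sales s JOIN customers c ON s.customer_id = c.id",
    8 * 1024 * 1024,
)
print("\n".join(statements))
```

If the session was using a non-default value before the query, pass that value as restore_bytes so the helper restores it instead of the default.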

Kudu considerations:

This query option affects only Bloom filters, not the min/max filters that are applied to Kudu tables. Therefore, it does not affect the performance of queries against Kudu tables.

Related information:

Runtime Filtering for Impala Queries (Impala 2.5 or higher only), RUNTIME_FILTER_MODE Query Option (Impala 2.5 or higher only), RUNTIME_FILTER_MIN_SIZE Query Option (Impala 2.6 or higher only), RUNTIME_FILTER_MAX_SIZE Query Option (Impala 2.6 or higher only)