RUNTIME_BLOOM_FILTER_SIZE Query Option (Impala 2.5 or higher only)
Size (in bytes) of the Bloom filter data structure used by the runtime filtering feature.
In Impala 2.6 and higher, this query option only applies as a fallback, when statistics are not available. By default, Impala estimates the optimal size of the Bloom filter structure regardless of the setting for this option. (This is a change from the original behavior in Impala 2.5.)
In Impala 2.6 and higher, when the value of this query option is used for query planning,
it is constrained by the minimum and maximum sizes specified by the
RUNTIME_FILTER_MIN_SIZE and RUNTIME_FILTER_MAX_SIZE query options.
The filter size is adjusted upward or downward if necessary to fit within the minimum/maximum range.
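The adjustment described above amounts to clamping the requested value into the allowed range. The following is a minimal sketch of that rule only, not Impala's planner code. The function name and the bounds used in the usage lines are illustrative assumptions, and the sketch ignores any other sizing logic Impala may apply.

```python
def effective_filter_size(requested: int, min_size: int, max_size: int) -> int:
    """Clamp a requested Bloom filter size (in bytes) into [min_size, max_size]."""
    if min_size > max_size:
        raise ValueError("min_size must not exceed max_size")
    # Raise values below the minimum, lower values above the maximum.
    return max(min_size, min(requested, max_size))


MB = 1024 * 1024

# Hypothetical bounds, chosen only to show both directions of adjustment.
print(effective_filter_size(1 * MB, 2 * MB, 16 * MB))   # adjusted upward
print(effective_filter_size(32 * MB, 2 * MB, 16 * MB))  # adjusted downward
print(effective_filter_size(4 * MB, 2 * MB, 16 * MB))   # already in range
```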
Default: 1048576 (1 MB)
Maximum: 16 MB
Added in: Impala 2.5.0
This setting affects optimizations for large and complex queries, such as dynamic partition pruning for partitioned tables, and join optimization for queries that join large tables. Larger filters are more effective at handling higher cardinality input sets, but consume more memory per filter.
If your query filters on high-cardinality columns (for example, millions of different values)
and you do not get the expected speedup from the runtime filtering mechanism, consider
doing some benchmarks with a higher value for RUNTIME_BLOOM_FILTER_SIZE.
The extra memory devoted to the Bloom filter data structures can help make the filtering
more accurate.
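To get a rough feel for why filter size matters on high-cardinality input, the textbook false-positive estimate for a classic Bloom filter can be evaluated at a few sizes. This is a sketch under stated assumptions: it uses the standard formula for a classic Bloom filter with a near-optimal number of hash functions, which is not necessarily how Impala implements or sizes its filters, so treat the numbers as indicative of the trend only. The function name and the 10 million distinct values are hypothetical.

```python
import math


def classic_bloom_fpr(size_bytes: int, distinct_values: int) -> float:
    """Textbook false-positive estimate for a classic Bloom filter.

    Assumption: a plain Bloom filter with a near-optimal hash count.
    Impala's internal filter layout may differ.
    """
    if size_bytes <= 0 or distinct_values <= 0:
        raise ValueError("size_bytes and distinct_values must be positive")
    bits = size_bytes * 8
    # Near-optimal number of hash functions: k = (m / n) * ln 2, at least 1.
    k = max(1, round(bits / distinct_values * math.log(2)))
    # Standard approximation: p = (1 - e^(-k * n / m)) ^ k
    return (1.0 - math.exp(-k * distinct_values / bits)) ** k


MB = 1024 * 1024
for size in (1 * MB, 4 * MB, 16 * MB):
    rate = classic_bloom_fpr(size, 10_000_000)
    print(f"{size // MB:>2} MB filter, 10M distinct values: ~{rate:.4f} false-positive rate")
```

Under this model, the default 1 MB filter lets most non-matching rows through at that cardinality, while a 16 MB filter rejects nearly all of them. Benchmarks on the real query remain the only reliable guide.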
Because the runtime filtering feature applies mainly to resource-intensive and long-running queries, only adjust this query option when tuning long-running queries involving some combination of large partitioned tables and joins involving large tables.
Because the effectiveness of this setting depends so much on query characteristics and data distribution, you typically only use it for specific queries that need some extra tuning, and the ideal value depends on the query. Consider setting this query option immediately before the expensive query and unsetting it immediately afterward.
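The set-then-unset pattern suggested above can be sketched as a small helper that brackets one expensive query with the two SET statements. This is an illustration, not part of Impala: the helper name, the table names, and the query are hypothetical, and the sketch assumes that your Impala version accepts the SET option="" form to restore the default, so check the SET statement documentation for your release.

```python
from typing import List

MAX_BLOOM_FILTER_SIZE = 16 * 1024 * 1024  # documented maximum: 16 MB


def bracket_query_with_filter_size(query: str, size_bytes: int) -> List[str]:
    """Return the statements to run, in order, for one tuned query."""
    if not 0 < size_bytes <= MAX_BLOOM_FILTER_SIZE:
        raise ValueError("size_bytes must be between 1 and 16 MB")
    return [
        f"SET RUNTIME_BLOOM_FILTER_SIZE={size_bytes};",
        query.rstrip().rstrip(";") + ";",
        # Assumed form for resetting the option to its default.
        'SET RUNTIME_BLOOM_FILTER_SIZE="";',
    ]


# Hypothetical join between two large tables.
expensive_query = (
    "SELECT COUNT(*) FROM big_fact f JOIN big_dim d ON f.dim_id = d.id "
    "WHERE d.region = 'EMEA'"
)
for statement in bracket_query_with_filter_size(expensive_query, 8 * 1024 * 1024):
    print(statement)
```

Running the statements in a single session keeps the larger filter size scoped to the one query that needs it.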
This query option affects only Bloom filters, not the min/max filters that are applied to Kudu tables. Therefore, it does not affect the performance of queries against Kudu tables.
Related information: Runtime Filtering for Impala Queries (Impala 2.5 or higher only), RUNTIME_FILTER_MODE Query Option (Impala 2.5 or higher only), RUNTIME_FILTER_MIN_SIZE Query Option (Impala 2.6 or higher only), RUNTIME_FILTER_MAX_SIZE Query Option (Impala 2.6 or higher only)