DISABLE_STREAMING_PREAGGREGATIONS Query Option (Impala 2.5 or higher only)

Turns off the "streaming preaggregation" optimization that is available in Impala 2.5 and higher. This optimization reduces unnecessary work performed by queries that perform aggregation operations on columns with few or no duplicate values, for example DISTINCT id_column or GROUP BY unique_column. If the optimization causes regressions in existing queries that use aggregation functions, you can turn it off as needed by setting this query option.

Type: Boolean; recognized values are 1 and 0, or true and false; any other value interpreted as false

Default: false (shown as 0 in output of SET statement)

Note: In Impala 2.5.0, only the value 1 enables the option, and the value true is not recognized. This limitation is tracked by the issue IMPALA-3334, which shows the releases where the problem is fixed.

Usage notes:

Typically, queries that would require enabling this option involve very large numbers of aggregated values, such as a billion or more distinct keys being processed on each worker node.

Added in: Impala 2.5.0