AGG_MEM_CORRELATION_FACTOR Query Option (Impala 4.4 or higher only)

Specifies the default correlation factor between two or more grouping columns in an aggregation node. When a query groups by multiple columns, the query planner uses this value as its assumption about how correlated those columns are. A value close to 1.0 means the columns are highly correlated, while 0.0 means they are not correlated at all. In many popular RDBMSs, the correlation between two columns can be measured with the CORR aggregate function.
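To make "correlation" concrete, the sketch below computes the Pearson correlation coefficient, which is the statistic that the standard SQL CORR(x, y) aggregate returns. It is illustrative only: the column values are hypothetical numeric encodings, and CORR returns a value between -1.0 and 1.0, whereas this option takes a value between 0.0 and 1.0.

```python
import math
from typing import Sequence


def pearson_corr(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient, the value SQL CORR(x, y) computes."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance numerator and the two variance numerators (the 1/n terms cancel).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical numeric encodings of two grouping columns that move together.
region_code = [1, 1, 2, 2, 3, 3]
store_code = [10, 11, 20, 21, 30, 31]
print(round(pearson_corr(region_code, store_code), 3))  # prints 0.998
```

A result near 1.0, as here, describes the kind of highly correlated grouping columns for which a high AGG_MEM_CORRELATION_FACTOR is appropriate.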

If both AGG_MEM_CORRELATION_FACTOR and LARGE_AGG_MEM_THRESHOLD are set to values greater than 0, the planner switches the memory estimation for the aggregation node from the NDV multiplication-based algorithm to a correlation-based estimation, which should yield a lower estimate. A higher AGG_MEM_CORRELATION_FACTOR produces a lower memory estimate, but never less than LARGE_AGG_MEM_THRESHOLD. A lower value produces a higher memory estimate, but never more than the default NDV multiplication-based estimate.
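The documented rules (when the correlation-based path is used, and the lower and upper bounds on its result) can be modeled with a small function. This is a hypothetical sketch, not Impala's planner code: the function name and parameters are invented, the linear scaling by (1 - factor) is a placeholder for Impala's actual correlation formula, which this page does not describe, and the sketch assumes the correlation-based path only lowers estimates that already exceed the threshold.

```python
def estimate_agg_mem_mb(ndv_estimate_mb: float,
                        correlation_factor: float,
                        large_agg_mem_threshold_mb: float) -> float:
    """Hypothetical model of the documented AGG_MEM_CORRELATION_FACTOR behavior."""
    # Documented: the correlation-based path is used only when both options are > 0.
    if correlation_factor <= 0.0 or large_agg_mem_threshold_mb <= 0.0:
        return ndv_estimate_mb
    # Assumption: estimates at or below the threshold are left unchanged.
    if ndv_estimate_mb <= large_agg_mem_threshold_mb:
        return ndv_estimate_mb
    # HYPOTHETICAL placeholder: a higher factor scales the estimate down further.
    correlated_estimate_mb = ndv_estimate_mb * (1.0 - correlation_factor)
    # Documented bounds: never above the NDV-based estimate, never below the threshold.
    return max(large_agg_mem_threshold_mb,
               min(ndv_estimate_mb, correlated_estimate_mb))


print(estimate_agg_mem_mb(8192.0, 0.0, 512.0))   # 8192.0: factor 0.0 keeps NDV estimate
print(estimate_agg_mem_mb(8192.0, 0.5, 512.0))   # 4096.0
print(estimate_agg_mem_mb(8192.0, 0.99, 512.0))  # 512.0: clamped to the threshold
```

Whatever formula Impala actually uses, the two documented bounds correspond to the final `max(...)`/`min(...)` clamp.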

Set this option to 0.0 to keep the planner on the default NDV multiplication-based estimation.

Type: double

Default: 0.5

Added in: Impala 4.4

Related information:

LARGE_AGG_MEM_THRESHOLD Query Option (Impala 4.4 or higher only)