AGG_MEM_CORRELATION_FACTOR Query Option (Impala 4.4 or higher only)
Specifies the default correlation factor that the planner assumes between two or more
grouping columns in an aggregation node. When a query groups over multiple columns, the
query planner uses this value to reason about how correlated those columns are. A value
close to 1.0 means the columns are highly correlated, while 0.0 means no correlation. In
many popular RDBMSs, the correlation between columns can usually be measured with the
CORR function.
If both AGG_MEM_CORRELATION_FACTOR and LARGE_AGG_MEM_THRESHOLD are set to values greater
than 0, the planner switches the memory estimation for the aggregation node from the NDV
multiplication-based algorithm to a correlation-based estimation, which should yield a
lower estimate. Setting a high AGG_MEM_CORRELATION_FACTOR results in a lower memory
estimate, but never lower than LARGE_AGG_MEM_THRESHOLD. Setting a low value results in a
higher memory estimate, but never higher than the default NDV multiplication-based
estimate.
Set this option to 0.0 to keep the planner on the default NDV multiplication-based estimation.
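As a rough illustration of the behavior described above, the following impala-shell
statements sketch how the two options might be set together and then reverted. The table
sales and its columns are hypothetical, and the threshold shown is only an illustrative
byte value. The estimate that appears in the plan depends on the table's column statistics.

```sql
-- Both options must be greater than 0 for correlation-based estimation to apply.
SET AGG_MEM_CORRELATION_FACTOR=0.8;
-- Illustrative lower bound for the estimate, in bytes (512 MB); not a recommendation.
SET LARGE_AGG_MEM_THRESHOLD=536870912;

-- Hypothetical table and columns: check the aggregation node's memory estimate in the plan.
EXPLAIN SELECT region, city, COUNT(*) FROM sales GROUP BY region, city;

-- Revert to the default NDV multiplication-based estimation.
SET AGG_MEM_CORRELATION_FACTOR=0.0;
```

Comparing the EXPLAIN output before and after changing the factor is one way to see
whether the option affects a particular query.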
Type: double
Default: 0.5
Added in: Impala 4.4
Related information:
LARGE_AGG_MEM_THRESHOLD Query Option (Impala 4.4 or higher only)