This setting controls the cutoff point (in terms of the number of rows processed per Impala daemon) below which Impala disables native code generation for the whole query. Native code generation is very beneficial for queries that process many rows because it reduces the time taken to process each row. However, generating the native code adds latency to query startup. Therefore, automatically disabling codegen for queries that process relatively small amounts of data can improve query response time.
Syntax:
SET DISABLE_CODEGEN_ROWS_THRESHOLD=number_of_rows
Type: numeric
Default: 50000
Usage notes: Typically, you increase the value above the default to make this optimization apply to more queries.
If incorrect or corrupted table and column statistics cause Impala to apply this optimization to queries that actually involve substantial work, those queries might run more slowly because codegen is disabled. In that case, recompute statistics with the COMPUTE STATS or COMPUTE INCREMENTAL STATS statement. If there is a problem collecting accurate statistics, you can turn this feature off by setting the value to 0.
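As a rough sketch of the adjustments described above, the following impala-shell statements raise the threshold, refresh statistics, and then turn the optimization off. The table name sales_small and the value 100000 are hypothetical, chosen only for illustration; in practice you would normally apply only the step that matches your situation.

```sql
-- Raise the cutoff so that more small queries skip codegen.
-- (100000 is an arbitrary illustrative value, not a recommendation.)
SET DISABLE_CODEGEN_ROWS_THRESHOLD=100000;

-- If codegen is being disabled for queries that do substantial work,
-- refresh the statistics for the tables involved.
-- (sales_small is a hypothetical table name.)
COMPUTE STATS sales_small;

-- If accurate statistics cannot be collected, turn the optimization off.
SET DISABLE_CODEGEN_ROWS_THRESHOLD=0;
```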
Internal details:
This setting applies to queries where the number of rows processed can be accurately determined, either through table and column statistics, or by the presence of a LIMIT clause. If Impala cannot accurately estimate the number of rows, then this setting does not apply.
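For illustration, assuming a hypothetical table named web_logs, the statements below show one way to check whether statistics are available and a query whose LIMIT clause bounds the number of rows. Whether codegen is actually disabled for a given query depends on the planner's estimates, so treat this as a sketch rather than a guaranteed outcome.

```sql
-- Check whether row counts are known for the table.
-- A #Rows value of -1 generally means statistics have not been computed.
SHOW TABLE STATS web_logs;
SHOW COLUMN STATS web_logs;

-- Use the default cutoff for this session.
SET DISABLE_CODEGEN_ROWS_THRESHOLD=50000;

-- The LIMIT clause gives the planner an upper bound on the rows returned,
-- even if table statistics are missing.
SELECT * FROM web_logs LIMIT 10;

-- Inspect the plan to review the planner's row estimates for the query.
EXPLAIN SELECT * FROM web_logs LIMIT 10;
```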
If a query uses the complex data types STRUCT, ARRAY, or MAP, then codegen is never automatically disabled regardless of the DISABLE_CODEGEN_ROWS_THRESHOLD setting.
Added in: Impala 2.10.0