Limit the number of nodes that process a query, typically during debugging.
Type: numeric
Allowed values: Only accepts the values 0 (meaning all nodes) or 1 (meaning all work is done on the coordinator node).
Default: 0
Usage notes:
Setting NUM_NODES
to 1 disables multithreading, i.e. if
MT_DOP
is greater than 1, it is effectively reduced to 1.
If you are diagnosing a problem that you suspect is due to a timing issue due to
distributed query processing, you can set NUM_NODES=1
to verify
if the problem still occurs when all the work is done on a single node.
You might set the NUM_NODES
option to 1 briefly, during
INSERT
or CREATE TABLE AS SELECT
statements. Normally,
those statements produce one or more data files per data node. If the write operation
involves small amounts of data, a Parquet table, and/or a partitioned table, the default
behavior could produce many small files when intuitively you might expect only a single
output file. SET NUM_NODES=1
turns off the "distributed" aspect of
the write operation, making it more likely to produce only one or a few data files.
Because this option results in increased resource utilization on a single host, it could cause problems due to contention with other Impala statements or high resource usage. Symptoms could include queries running slowly, exceeding the memory limit, or appearing to hang. Use it only in a single-user development/test environment; do not use it in a production environment or in a cluster with a high-concurrency or high-volume or performance-critical workload.