NUM_NODES Query Option
Limit the number of nodes that process a query, typically during debugging.
Allowed values: 0 (use all nodes) or 1 (do all work on the coordinator node). No other values are accepted.
If you are diagnosing a problem that you suspect is caused by a timing issue in distributed query processing, you can set NUM_NODES=1 to check whether the problem still occurs when all the work is done on a single node.
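The diagnostic workflow described above might look like the following in an impala-shell session. This is a minimal sketch: the table sales and the column region are hypothetical placeholders for whatever query you are investigating.

```sql
-- Do all the work on the coordinator node.
SET NUM_NODES=1;

-- Re-run the suspect query (hypothetical table and column names).
SELECT region, COUNT(*) FROM sales GROUP BY region;

-- Return to distributed execution (0 means all nodes).
SET NUM_NODES=0;
```

If the problem disappears with NUM_NODES=1 and returns with NUM_NODES=0, that points toward the distributed part of query processing.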
You might set the NUM_NODES option to 1 briefly, during CREATE TABLE AS SELECT statements. Normally, those statements produce one or more data files per data node. If the write operation involves a small amount of data, a Parquet table, a partitioned table, or some combination of these, the default behavior could produce many small files when intuitively you might expect only a single output file. SET NUM_NODES=1 turns off the "distributed" aspect of the write operation, making it more likely to produce only one or a few data files.
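As a sketch of that usage, the statements below wrap a single CREATE TABLE AS SELECT in the setting and then turn it off again. The table names sales and sales_summary and the columns region and amount are hypothetical; the number of files actually written still depends on the data and on any partitioning.

```sql
-- Write from the coordinator only, to reduce the number of output files.
SET NUM_NODES=1;

-- Hypothetical source and destination tables.
CREATE TABLE sales_summary STORED AS PARQUET AS
  SELECT region, SUM(amount) AS total_amount
  FROM sales
  GROUP BY region;

-- Return to distributed execution for subsequent statements.
SET NUM_NODES=0;
```

Resetting the option immediately after the write keeps later queries in the same session from running on a single node unintentionally.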
Because this option concentrates the work on a single host, it can cause problems through contention with other Impala statements or high resource usage on that host. Symptoms could include queries running slowly, exceeding the memory limit, or appearing to hang. Use it only in a single-user development or test environment. Do not use it in a production environment or in a cluster with a high-concurrency, high-volume, or performance-critical workload.