The MEM_LIMIT query option defines the maximum amount of memory a query can allocate on
each node. The total memory that can be used by a query is the MEM_LIMIT
times the number of nodes.
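As a quick worked example of that arithmetic, the sketch below computes the cluster-wide ceiling for a single query. The function name and the 10-node, 2 GB figures are illustrative assumptions, and binary units (1 GB = 1024^3 bytes) are assumed.

```python
from typing import Optional

GIB = 1024 ** 3  # assumption: binary units


def max_total_query_memory(mem_limit_bytes: int, num_nodes: int) -> Optional[int]:
    """Upper bound on the memory one query can use across the cluster.

    MEM_LIMIT is enforced per node, so the ceiling is the per-node limit
    times the number of nodes. A limit of 0 means "unlimited", so there is
    no ceiling to report and None is returned.
    """
    if mem_limit_bytes < 0 or num_nodes < 0:
        raise ValueError("arguments must be non-negative")
    if mem_limit_bytes == 0:
        return None
    return mem_limit_bytes * num_nodes


# Hypothetical cluster: a 2 GB per-node limit on 10 nodes.
total = max_total_query_memory(2 * GIB, 10)
print(total // GIB)  # 20
```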
There are two levels of memory limit for Impala. The --mem_limit startup option sets an
overall limit for the impalad process (which handles multiple queries concurrently). That
process memory limit can be expressed either as a percentage of the RAM available to the
process, such as --mem_limit=70%, or as a fixed amount of memory, such as 100gb. The memory
available to the process is based on the host's physical memory and, since Impala 3.2,
memory limits from Linux Control Groups. For example, if an impalad process is running in
a Docker container on a host with 100GB of memory, the memory available is 100GB or the
Docker container's memory limit, whichever is less.
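The rule for the process-level limit can be sketched as follows. This is only a model of the behaviour described here, not Impala's code; the function names and the 64 GB container cap are hypothetical, and binary units are assumed.

```python
from typing import Optional

GIB = 1024 ** 3  # assumption: binary units


def available_memory(host_bytes: int, cgroup_limit_bytes: Optional[int]) -> int:
    """Memory visible to the process: the host's physical memory or the
    container (cgroup) limit, whichever is less. None means no cgroup limit."""
    if cgroup_limit_bytes is None:
        return host_bytes
    return min(host_bytes, cgroup_limit_bytes)


def process_mem_limit(available_bytes: int, percent: float) -> int:
    """Resolve a percentage-style limit such as 70% to a byte count."""
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    return int(available_bytes * percent / 100)


# Host with 100 GB of RAM, container capped at 64 GB (hypothetical cap).
avail = available_memory(100 * GIB, 64 * GIB)
limit = process_mem_limit(avail, 70)
print(avail // GIB)           # 64
print(round(limit / GIB, 1))  # 44.8
```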
The MEM_LIMIT query option, which you set through impala-shell or the SET statement in a
JDBC or ODBC application, applies to each individual query. The MEM_LIMIT query option is
usually expressed as a fixed size such as 10gb, and must always be less than the impalad
memory limit.
If query processing approaches the applicable memory limit on any node, whether that is the per-query limit or the impalad limit, the SQL operations start to reduce their memory consumption, for example by writing temporary data to disk (known as spilling to disk). The result is a query that completes successfully, rather than failing with an out-of-memory error. The tradeoff is decreased performance due to the extra disk I/O to write the temporary data and read it back in. The slowdown could potentially be significant. Thus, while this feature improves reliability, you should optimize your queries, system parameters, and hardware configuration to make spilling a rare occurrence.
Type: numeric
Units: A numeric argument represents memory size in bytes; you can also use a suffix of m
or mb for megabytes, or more commonly g or gb for gigabytes. If you specify a value in an
unrecognized format, subsequent queries fail with an error.
Default: 0 (unlimited)
Usage notes:
The MEM_LIMIT setting is primarily useful for production workloads.
Impala's Admission Controller can be configured to automatically assign memory limits to
queries and limit memory consumption of resource pools. See Concurrent Queries and
Admission Control and Memory Limits and Admission Control for more information on
configuring the resource usage through admission control.
Use the output of the SUMMARY command in impala-shell to get a report of memory used for
each phase of your most heavyweight queries on each node, and then set a MEM_LIMIT
somewhat higher than that. See Using the SUMMARY Report for Performance Tuning for usage
information about the SUMMARY command.
Examples:
The following examples show how to set the MEM_LIMIT query option using a fixed number of
bytes, or suffixes representing gigabytes or megabytes.
[localhost:21000] > set mem_limit=3000000000;
MEM_LIMIT set to 3000000000
[localhost:21000] > select 5;
Query: select 5
+---+
| 5 |
+---+
| 5 |
+---+
[localhost:21000] > set mem_limit=3g;
MEM_LIMIT set to 3g
[localhost:21000] > select 5;
Query: select 5
+---+
| 5 |
+---+
| 5 |
+---+
[localhost:21000] > set mem_limit=3gb;
MEM_LIMIT set to 3gb
[localhost:21000] > select 5;
+---+
| 5 |
+---+
| 5 |
+---+
[localhost:21000] > set mem_limit=3m;
MEM_LIMIT set to 3m
[localhost:21000] > select 5;
+---+
| 5 |
+---+
| 5 |
+---+
[localhost:21000] > set mem_limit=3mb;
MEM_LIMIT set to 3mb
[localhost:21000] > select 5;
+---+
| 5 |
+---+
| 5 |
+---+
The following examples show how unrecognized MEM_LIMIT values lead to errors for
subsequent queries.
[localhost:21000] > set mem_limit=3pb;
MEM_LIMIT set to 3pb
[localhost:21000] > select 5;
ERROR: Failed to parse query memory limit from '3pb'.
[localhost:21000] > set mem_limit=xyz;
MEM_LIMIT set to xyz
[localhost:21000] > select 5;
Query: select 5
ERROR: Failed to parse query memory limit from 'xyz'.
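The suffix handling shown in the sessions above can be approximated with a small parser. This is a sketch, not Impala's parser: it accepts only whole numbers and the suffixes named in this topic, and it assumes binary units and case-insensitive suffixes. The function name is hypothetical; the error text is copied from the sessions above.

```python
import re

# Multipliers for the suffixes named in the text; binary units are assumed.
_UNITS = {"": 1, "m": 1024 ** 2, "mb": 1024 ** 2, "g": 1024 ** 3, "gb": 1024 ** 3}
_PATTERN = re.compile(r"^(\d+)([a-z]*)$")


def parse_mem_limit(value: str) -> int:
    """Convert a MEM_LIMIT-style string to bytes, or raise ValueError."""
    match = _PATTERN.match(value.strip().lower())
    if match is None or match.group(2) not in _UNITS:
        raise ValueError(f"Failed to parse query memory limit from '{value}'.")
    return int(match.group(1)) * _UNITS[match.group(2)]


# The same values used in the impala-shell sessions above.
for text in ("3000000000", "3g", "3gb", "3m", "3mb", "3pb", "xyz"):
    try:
        print(text, "->", parse_mem_limit(text))
    except ValueError as err:
        print(text, "->", err)
```

Unlike this sketch, impala-shell accepts the SET statement itself and reports the parse error only when the next query runs, as the sessions show.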
The following example shows what happens when the MEM_LIMIT value is too low for a query
on any host involved in the Impala query. First it runs a successful query and checks the
largest amount of memory used on any node for any stage of the query. Then it sets an
artificially low MEM_LIMIT setting so that the same query cannot run; in this session the
query is rejected before it starts.
[localhost:21000] > select count(*) from customer;
Query: select count(*) from customer
+----------+
| count(*) |
+----------+
| 150000 |
+----------+
[localhost:21000] > select count(distinct c_name) from customer;
Query: select count(distinct c_name) from customer
+------------------------+
| count(distinct c_name) |
+------------------------+
| 150000 |
+------------------------+
[localhost:21000] > summary;
+--------------+--------+--------+----------+----------+---------+------------+----------+---------------+---------------+
| Operator | #Hosts | #Inst | Avg Time | Max Time | #Rows | Est. #Rows | Peak Mem | Est. Peak Mem | Detail |
+--------------+--------+--------+----------+----------+---------+------------+----------+---------------+---------------+
| 06:AGGREGATE | 1 | 1 | 230.00ms | 230.00ms | 1 | 1 | 16.00 KB | -1 B | FINALIZE |
| 05:EXCHANGE | 1 | 1 | 43.44us | 43.44us | 1 | 1 | 0 B | -1 B | UNPARTITIONED |
| 02:AGGREGATE | 1 | 1 | 227.14ms | 227.14ms | 1 | 1 | 12.00 KB | 10.00 MB | |
| 04:AGGREGATE | 1 | 1 | 126.27ms | 126.27ms | 150.00K | 150.00K | 15.17 MB | 10.00 MB | |
| 03:EXCHANGE | 1 | 1 | 44.07ms | 44.07ms | 150.00K | 150.00K | 0 B | 0 B | HASH(c_name) |
| 01:AGGREGATE | 1 | 1 | 361.94ms | 361.94ms | 150.00K | 150.00K | 23.04 MB | 10.00 MB | |
| 00:SCAN HDFS | 1 | 1 | 43.64ms | 43.64ms | 150.00K | 150.00K | 24.19 MB | 64.00 MB | tpch.customer |
+--------------+--------+--------+----------+----------+---------+------------+----------+---------------+---------------+
[localhost:21000] > set mem_limit=15mb;
MEM_LIMIT set to 15mb
[localhost:21000] > select count(distinct c_name) from customer;
Query: select count(distinct c_name) from customer
ERROR:
Rejected query from pool default-pool: minimum memory reservation is greater than memory available to the query
for buffer reservations. Memory reservation needed given the current plan: 38.00 MB. Adjust either the mem_limit
or the pool config (max-query-mem-limit, min-query-mem-limit) for the query to allow the query memory limit to be
at least 70.00 MB. Note that changing the mem_limit may also change the plan. See the query profile for more
information about the per-node memory requirements.
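To make the "largest amount of memory used for any stage" step concrete, the sketch below scans the Peak Mem column from the SUMMARY output above. The helper names are hypothetical, and the unit table assumes that the KB and MB labels are binary multiples.

```python
# Peak Mem column copied from the SUMMARY output above.
PEAK_MEM = {
    "06:AGGREGATE": "16.00 KB",
    "05:EXCHANGE": "0 B",
    "02:AGGREGATE": "12.00 KB",
    "04:AGGREGATE": "15.17 MB",
    "03:EXCHANGE": "0 B",
    "01:AGGREGATE": "23.04 MB",
    "00:SCAN HDFS": "24.19 MB",
}

# Assumption: SUMMARY sizes use binary multiples.
_UNIT_BYTES = {"B": 1, "KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}


def to_bytes(text: str) -> float:
    """Convert a SUMMARY-style size such as '24.19 MB' to bytes."""
    number, unit = text.split()
    return float(number) * _UNIT_BYTES[unit]


def largest_peak(peaks: dict) -> tuple:
    """Return (operator, bytes) for the operator with the highest peak."""
    name = max(peaks, key=lambda op: to_bytes(peaks[op]))
    return name, to_bytes(peaks[name])


operator, peak = largest_peak(PEAK_MEM)
print(operator, f"{peak / 1024 ** 2:.2f} MB")  # 00:SCAN HDFS 24.19 MB
```

Here the largest per-operator peak (24.19 MB) is well below the 70.00 MB that the rejection message asks for, so the per-operator figures are a starting point for sizing MEM_LIMIT rather than a sufficient value on their own.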
The existing MEM_LIMIT query option applies to all Impala coordinators and executors. This
means that the same amount of memory is reserved on both, even though coordinators
typically only coordinate the query and do not necessarily need all of the estimated
memory. Memory reserved on coordinators in this way is unavailable to other queries.
The new MEM_LIMIT_EXECUTORS query option functions similarly to the MEM_LIMIT option but
sets the query memory limit only on executors. This new option addresses the MEM_LIMIT
issue described above and is recommended in scenarios where the query needs much more
memory on executors than on coordinators.
Note that the MEM_LIMIT_EXECUTORS option cannot be combined with MEM_LIMIT. If you set
both, only MEM_LIMIT applies.
The new MEM_LIMIT_COORDINATORS query option functions similarly to the MEM_LIMIT option
but sets the query memory limit only on coordinators. This new option addresses the
MEM_LIMIT issue described above and is recommended in scenarios where the query needs more
or less memory on coordinators than the planner estimates.
Note that the MEM_LIMIT_COORDINATORS option cannot be combined with MEM_LIMIT. If you set
both, only MEM_LIMIT applies.
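The precedence between the three options can be summarized in a few lines. This models only the rule stated above (MEM_LIMIT wins whenever it is set) and is not Impala's implementation. The function name is hypothetical, and None stands for "option not set".

```python
from typing import Dict, Optional


def effective_limits(mem_limit: Optional[int] = None,
                     mem_limit_executors: Optional[int] = None,
                     mem_limit_coordinators: Optional[int] = None
                     ) -> Dict[str, Optional[int]]:
    """Return the limit (in bytes) that applies to each role.

    A None value in the result means none of these options sets a limit
    for that role.
    """
    if mem_limit is not None:
        # MEM_LIMIT applies to coordinators and executors alike, and the
        # role-specific options are ignored.
        return {"coordinator": mem_limit, "executor": mem_limit}
    return {"coordinator": mem_limit_coordinators,
            "executor": mem_limit_executors}


GIB = 1024 ** 3  # assumption: binary units

# Only the executor-specific option is set.
print(effective_limits(mem_limit_executors=8 * GIB))

# Both are set, so only MEM_LIMIT applies.
print(effective_limits(mem_limit=2 * GIB, mem_limit_executors=8 * GIB))
```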