Modifying Impala Startup Options
The configuration options for the Impala-related daemons let you choose which hosts and ports to use for the services that run on a single host, specify directories for logging, control resource usage and security, and specify other aspects of the Impala software.
Configuring Impala Startup Options through the Command Line
The Impala server, statestore, and catalog services start up using values provided in a defaults file, /etc/default/impala.
This file includes information about many resources used by Impala. Most of the defaults in this file work as-is. For example, you typically would not change the definition of the CLASSPATH variable, but you would always set the address used by the statestore server. Settings you might modify include:
IMPALA_STATE_STORE_HOST=127.0.0.1
IMPALA_STATE_STORE_PORT=24000
IMPALA_BACKEND_PORT=22000
IMPALA_LOG_DIR=/var/log/impala
IMPALA_CATALOG_SERVICE_HOST=...
IMPALA_STATE_STORE_HOST=...
export IMPALA_STATE_STORE_ARGS=${IMPALA_STATE_STORE_ARGS:- \
-log_dir=${IMPALA_LOG_DIR} -state_store_port=${IMPALA_STATE_STORE_PORT}}
IMPALA_SERVER_ARGS=" \
-log_dir=${IMPALA_LOG_DIR} \
-catalog_service_host=${IMPALA_CATALOG_SERVICE_HOST} \
-state_store_port=${IMPALA_STATE_STORE_PORT} \
-state_store_host=${IMPALA_STATE_STORE_HOST} \
-be_port=${IMPALA_BACKEND_PORT}"
export ENABLE_CORE_DUMPS=${ENABLE_COREDUMPS:-false}
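The IMPALA_STATE_STORE_ARGS line above uses the shell's ${VAR:-default} expansion. The default flag string applies only when the variable is unset or empty, so a value already present in the environment wins. The following minimal sketch shows that behavior in isolation. The override values (/tmp/impala-logs, port 24001) are made up for illustration and are not Impala defaults.

```shell
#!/bin/sh
# Minimal sketch of how the ${VAR:-default} pattern in /etc/default/impala
# resolves. Values mirror the defaults listing; the override is hypothetical.
IMPALA_LOG_DIR=/var/log/impala
IMPALA_STATE_STORE_PORT=24000

state_store_args() {
  # Use IMPALA_STATE_STORE_ARGS if it is set and non-empty;
  # otherwise fall back to the default flag string.
  printf '%s\n' "${IMPALA_STATE_STORE_ARGS:--log_dir=${IMPALA_LOG_DIR} -state_store_port=${IMPALA_STATE_STORE_PORT}}"
}

# Case 1: variable unset -> the default flags are used.
unset IMPALA_STATE_STORE_ARGS
default_args=$(state_store_args)
echo "unset:    $default_args"

# Case 2: variable already set (for example, by the caller) -> it wins.
IMPALA_STATE_STORE_ARGS="-log_dir=/tmp/impala-logs -state_store_port=24001"
override_args=$(state_store_args)
echo "override: $override_args"

# Case 3: set but empty -> ":-" treats empty the same as unset.
IMPALA_STATE_STORE_ARGS=""
empty_args=$(state_store_args)
echo "empty:    $empty_args"
```

Note that IMPALA_SERVER_ARGS in the listing is assigned directly rather than through ${...:-...}, so sourcing the file replaces any value already in the environment.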
To use alternate values, edit the defaults file, then restart all the Impala-related services so that the changes take effect. Restart the Impala server using the following command:
$ sudo service impala-server restart
Stopping Impala Server: [ OK ]
Starting Impala Server: [ OK ]
Restart the Impala statestore using the following command:
$ sudo service impala-state-store restart
Stopping Impala State Store Server: [ OK ]
Starting Impala State Store Server: [ OK ]
Restart the Impala catalog service using the following command:
$ sudo service impala-catalog restart
Stopping Impala Catalog Server: [ OK ]
Starting Impala Catalog Server: [ OK ]
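The three restarts above can be wrapped in one helper. This is a minimal sketch: the service names come from the commands above, while the order (statestore, then catalog, then impalad) and the DRY_RUN switch are our own choices, not part of Impala.

```shell
#!/bin/sh
# Sketch: restart the three Impala services that read /etc/default/impala.
# Ordering and the DRY_RUN variable are assumptions for illustration.
restart_impala_services() {
  for svc in impala-state-store impala-catalog impala-server; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      # Preview mode: print the command instead of running it.
      echo "sudo service $svc restart"
    else
      # Stop at the first failure so a broken config is noticed early.
      sudo service "$svc" restart || return 1
    fi
  done
}
```

For example, `DRY_RUN=1 restart_impala_services` previews the commands and `restart_impala_services` runs them. The service command acts only on the local host, so run it on each host where the corresponding daemon runs.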
Some common settings to change include:
-
Statestore address. Where practical, put the statestore on a separate host not running the impalad daemon. In that recommended configuration, the impalad daemon cannot refer to the statestore server using the loopback address. If the statestore is hosted on a machine with an IP address of 192.168.0.27, change:
IMPALA_STATE_STORE_HOST=127.0.0.1
to:
IMPALA_STATE_STORE_HOST=192.168.0.27
-
Catalog server address (including both the hostname and the port number). Update the value of the IMPALA_CATALOG_SERVICE_HOST variable. Where practical, run the catalog server on the same host as the statestore. In that recommended configuration, the impalad daemon cannot refer to the catalog server using the loopback address. If the catalog service is hosted on a machine with an IP address of 192.168.0.27, add the following line:
IMPALA_CATALOG_SERVICE_HOST=192.168.0.27:26000
The /etc/default/impala defaults file currently does not define an IMPALA_CATALOG_ARGS environment variable, but if you add one, the service startup/shutdown script recognizes it. Add a definition for this variable to /etc/default/impala and include the option -catalog_service_host=hostname. If the port is different from the default 26000, also add the option -catalog_service_port=port.
-
Memory limits. You can limit the amount of memory available to Impala. For example, to allow Impala to use no more than 70% of system memory, change:
export IMPALA_SERVER_ARGS=${IMPALA_SERVER_ARGS:- \
-log_dir=${IMPALA_LOG_DIR} \
-state_store_port=${IMPALA_STATE_STORE_PORT} \
-state_store_host=${IMPALA_STATE_STORE_HOST} \
-be_port=${IMPALA_BACKEND_PORT}}
to:
export IMPALA_SERVER_ARGS=${IMPALA_SERVER_ARGS:- \
-log_dir=${IMPALA_LOG_DIR} \
-state_store_port=${IMPALA_STATE_STORE_PORT} \
-state_store_host=${IMPALA_STATE_STORE_HOST} \
-be_port=${IMPALA_BACKEND_PORT} -mem_limit=70%}
You can specify the memory limit using absolute notation such as 500m or 2G, or as a percentage of physical memory such as 60%.
Note: Queries that exceed the specified memory limit are aborted. Percentage limits are based on the physical memory of the machine and do not consider cgroups.
-
Core dump enablement. To enable core dumps, change:
export ENABLE_CORE_DUMPS=${ENABLE_COREDUMPS:-false}
to:
export ENABLE_CORE_DUMPS=${ENABLE_COREDUMPS:-true}
Note:
-
The location of core dump files may vary according to your operating system configuration.
-
Other security settings may prevent Impala from writing core dumps even when this option is enabled.
-
Authorization using the open source Sentry plugin. Specify the -server_name and -authorization_policy_file options as part of the IMPALA_SERVER_ARGS and IMPALA_STATE_STORE_ARGS settings to enable the core Impala support for authorization. See Starting the impalad Daemon with Sentry Authorization Enabled for details.
-
Auditing for successful or blocked Impala queries, another aspect of security. Specify the -audit_event_log_dir=directory_path option and optionally the -max_audit_event_log_file_size=number_of_queries and -abort_on_failed_audit_event options as part of the IMPALA_SERVER_ARGS settings, for each Impala node, to enable and customize auditing. See Auditing Impala Operations for details.
-
Password protection for the Impala web UI, which listens on port 25000 by default. This feature involves adding some or all of the --webserver_password_file, --webserver_authentication_domain, and --webserver_certificate_file options to the IMPALA_SERVER_ARGS and IMPALA_STATE_STORE_ARGS settings. See Security Guidelines for Impala for details.
-
Another setting you might add to IMPALA_SERVER_ARGS is a comma-separated list of query options and values, in the form -default_query_options='option=value,option=value,...'. These options control the behavior of queries performed by this impalad instance. The option values you specify here override the default values for Impala query options, as shown by the SET statement in impala-shell.
-
During troubleshooting, the appropriate support channel might direct you to change other values, particularly for IMPALA_SERVER_ARGS, to work around issues or gather debugging information.
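The statestore, catalog, and memory-limit changes in the list above can be scripted against a copy of the defaults file before touching the real one. This is a minimal sketch under stated assumptions: the sample file contents (including the 127.0.0.1:26000 starting value), the 192.168.0.27 address, and the 70% limit are illustrative, and the set_var helper is hypothetical, not part of Impala.

```shell
#!/bin/sh
# Sketch: apply the statestore, catalog, and memory-limit changes to a
# temporary COPY shaped like /etc/default/impala. All values are illustrative.
set -e

# set_var NAME VALUE FILE (hypothetical helper): rewrite every "NAME=..." line.
set_var() {
  sed "s|^$1=.*|$1=$2|" "$3" > "$3.tmp" && mv "$3.tmp" "$3"
}

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cfg="$work/impala"

# Hypothetical sample; in practice start from a copy of /etc/default/impala.
cat > "$cfg" <<'EOF'
IMPALA_STATE_STORE_HOST=127.0.0.1
IMPALA_STATE_STORE_PORT=24000
IMPALA_BACKEND_PORT=22000
IMPALA_LOG_DIR=/var/log/impala
IMPALA_CATALOG_SERVICE_HOST=127.0.0.1:26000
IMPALA_SERVER_ARGS=" \
-log_dir=${IMPALA_LOG_DIR} \
-catalog_service_host=${IMPALA_CATALOG_SERVICE_HOST} \
-state_store_port=${IMPALA_STATE_STORE_PORT} \
-state_store_host=${IMPALA_STATE_STORE_HOST} \
-be_port=${IMPALA_BACKEND_PORT}"
EOF

# Statestore on a separate host: impalad cannot use the loopback address.
set_var IMPALA_STATE_STORE_HOST 192.168.0.27 "$cfg"
# Catalog address, host and port. Edited in place rather than appended,
# because IMPALA_SERVER_ARGS expands it at the point where it is assigned.
set_var IMPALA_CATALOG_SERVICE_HOST 192.168.0.27:26000 "$cfg"
# New IMPALA_CATALOG_ARGS; port is the default 26000, so no -catalog_service_port.
echo 'IMPALA_CATALOG_ARGS="-catalog_service_host=192.168.0.27"' >> "$cfg"
# Memory limit appended to whatever impalad flags are already defined.
echo 'IMPALA_SERVER_ARGS="${IMPALA_SERVER_ARGS} -mem_limit=70%"' >> "$cfg"

# Source the edited copy in subshells to see what the daemons would receive.
server_args=$(. "$cfg"; printf '%s' "$IMPALA_SERVER_ARGS")
catalog_args=$(. "$cfg"; printf '%s' "$IMPALA_CATALOG_ARGS")
echo "impalad:  $server_args"
echo "catalogd: $catalog_args"
```

Appending the -mem_limit line, rather than editing the multi-line assignment, works with either form of IMPALA_SERVER_ARGS shown in this document.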
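Because percentage memory limits are computed against the machine's physical memory rather than any cgroup allowance, it can help to see what a percentage works out to on a given host. The following sketch assumes Linux and reads MemTotal from /proc/meminfo. Impala computes the actual limit itself, so treat the result as an estimate. The pct_of_kb helper is hypothetical.

```shell
#!/bin/sh
# Sketch: estimate how many bytes a percentage -mem_limit corresponds to.
# Linux-only (/proc/meminfo); an approximation, not impalad's exact figure.

# pct_of_kb TOTAL_KB PERCENT (hypothetical helper): PERCENT% of TOTAL_KB KiB, in bytes.
pct_of_kb() {
  echo $(( $1 * 1024 * $2 / 100 ))
}

if [ -r /proc/meminfo ]; then
  total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
  limit_bytes=$(pct_of_kb "$total_kb" 70)
  echo "MemTotal ${total_kb} kB -> -mem_limit=70% is about $(( limit_bytes / 1048576 )) MiB"
else
  echo "/proc/meminfo not readable; cannot estimate the limit on this system"
fi
```

If impalad runs inside a cgroup with a smaller memory allowance, an absolute value such as 2G avoids sizing the limit from the whole machine's memory.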
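Since the location of core dump files and whether they are written at all depend on operating system settings, a quick check of those settings can explain a missing core file. This sketch assumes Linux paths; the limit it reports applies to the shell running it, and the service startup script may set a different limit for the daemon.

```shell
#!/bin/sh
# Sketch: report OS settings that decide whether and where core dumps go.
# Linux-specific; other systems use different mechanisms.
report_core_settings() {
  # Core file size limit for this shell ("0" means no core files).
  echo "core file size limit: $(ulimit -c)"
  if [ -r /proc/sys/kernel/core_pattern ]; then
    # File name pattern, or a "|" pipe to a handler program, for core files.
    echo "core_pattern: $(cat /proc/sys/kernel/core_pattern)"
  else
    echo "core_pattern: not available on this system"
  fi
}

report_core_settings
```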
These startup options for the impalad daemon are different from the command-line options for the impala-shell command. For the impala-shell options, see impala-shell Configuration Options.
Checking the Values of Impala Configuration Options
You can check the current runtime value of all these settings through the Impala web interface, available by default at http://impala_hostname:25000/varz for the impalad daemon, http://impala_hostname:25010/varz for the statestored daemon, or http://impala_hostname:25020/varz for the catalogd daemon.
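The per-daemon URLs above follow a fixed pattern, so a small helper can build them and feed a command-line check. This is a sketch: the varz_url helper is hypothetical, the ports are the defaults listed above, and impala_hostname is a placeholder. Because the page layout can vary between versions, the grep in the usage comment is only a rough filter.

```shell
#!/bin/sh
# Sketch: build the default /varz URL for an Impala daemon.
# varz_url DAEMON HOST (hypothetical helper); ports are the documented defaults.
varz_url() {
  case "$1" in
    impalad)     port=25000 ;;
    statestored) port=25010 ;;
    catalogd)    port=25020 ;;
    *) echo "unknown daemon: $1" >&2; return 1 ;;
  esac
  echo "http://$2:$port/varz"
}

# Example (not run here): look for the effective mem_limit on one impalad.
#   curl -s "$(varz_url impalad impala_hostname)" | grep mem_limit
```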
Startup Options for impalad Daemon
The impalad daemon implements the main Impala service, which performs query processing and reads and writes the data files.
Startup Options for statestored Daemon
The statestored daemon implements the Impala statestore service, which monitors the availability of Impala services across the cluster, and handles situations such as nodes becoming unavailable or becoming available again.
Startup Options for catalogd Daemon
The catalogd daemon implements the Impala catalog service, which broadcasts metadata changes to all the Impala nodes when Impala creates a table, inserts data, or performs other kinds of DDL and DML operations.
Use the --load_catalog_in_background option to control when the metadata of a table is loaded.
-
If set to false, the metadata of a table is loaded when it is referenced for the first time. This means that the first run of a particular query can be slower than subsequent runs. Starting in Impala 2.2, the default for load_catalog_in_background is false.
-
If set to true, the catalog service attempts to load metadata for a table even if no query needed that metadata, so the metadata might already be loaded when the first query that needs it runs. However, we recommend not setting the option to true, for the following reasons:
- Background loading can interfere with query-specific metadata loading. This can happen on startup or after invalidating metadata, for a duration that depends on the amount of metadata, and can lead to seemingly random long-running queries that are difficult to diagnose.
- Impala may load metadata for tables that are never used, potentially increasing the catalog size and consequently the memory usage of both the catalog service and the Impala daemon.
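Following that recommendation, the option can be pinned in the IMPALA_CATALOG_ARGS variable that the startup script recognizes. This is a sketch of a fragment for /etc/default/impala. Because false is already the default starting in Impala 2.2, the line mainly makes the choice explicit. Appending, rather than assigning outright, is our choice so that any flags already placed in IMPALA_CATALOG_ARGS are kept.

```shell
#!/bin/sh
# Sketch: fragment for /etc/default/impala that starts catalogd with
# background metadata loading explicitly disabled. Appending preserves
# any flags already in IMPALA_CATALOG_ARGS.
IMPALA_CATALOG_ARGS="${IMPALA_CATALOG_ARGS} --load_catalog_in_background=false"
export IMPALA_CATALOG_ARGS
```

After restarting the catalog service, the catalogd /varz page on port 25020 shows whether the value took effect.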