The configuration options for the Impala daemons let you choose which hosts and ports to use for the services that run on a single host, specify directories for logging, control resource usage and security, and specify other aspects of the Impala software.
The Impala server, statestore, and catalog services start up using
values provided in a defaults file, /etc/default/impala.
This file includes information about many resources used by Impala. Most of the defaults
included in this file should be effective in most cases. For example, typically you
would not change the definition of the CLASSPATH variable, but you
would always set the address used by the statestore server. Some of the
content you might modify includes:
IMPALA_STATE_STORE_HOST=127.0.0.1
IMPALA_STATE_STORE_PORT=24000
IMPALA_BACKEND_PORT=22000
IMPALA_LOG_DIR=/var/log/impala
IMPALA_CATALOG_SERVICE_HOST=...
IMPALA_STATE_STORE_HOST=...
export IMPALA_STATE_STORE_ARGS=${IMPALA_STATE_STORE_ARGS:- \
-log_dir=${IMPALA_LOG_DIR} -state_store_port=${IMPALA_STATE_STORE_PORT}}
IMPALA_SERVER_ARGS=" \
-log_dir=${IMPALA_LOG_DIR} \
-catalog_service_host=${IMPALA_CATALOG_SERVICE_HOST} \
-state_store_port=${IMPALA_STATE_STORE_PORT} \
-state_store_host=${IMPALA_STATE_STORE_HOST} \
-be_port=${IMPALA_BACKEND_PORT}"
export ENABLE_CORE_DUMPS=${ENABLE_COREDUMPS:-false}
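Several definitions above use the shell's ${NAME:-default} expansion, which keeps a value that is already set and falls back to the text after :- otherwise. The sketch below illustrates that behavior in isolation. DEMO_ARGS and DEMO_LOG_DIR are hypothetical names used only for illustration; they are not part of the Impala defaults file.

```shell
# Illustration only: DEMO_ARGS and DEMO_LOG_DIR are hypothetical names.
DEMO_LOG_DIR=/var/log/impala

# Case 1: DEMO_ARGS is unset, so the text after ":-" is used.
unset DEMO_ARGS
first=${DEMO_ARGS:--log_dir=${DEMO_LOG_DIR}}
printf '%s\n' "$first"

# Case 2: DEMO_ARGS already has a value, so the default is ignored.
DEMO_ARGS="-log_dir=/tmp/impala-logs"
second=${DEMO_ARGS:--log_dir=${DEMO_LOG_DIR}}
printf '%s\n' "$second"
```

One consequence is that a value already present in the environment when the file is read takes precedence over the default written in the file.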
To use alternate values, edit the defaults file, then restart all the Impala-related services so that the changes take effect. Restart the Impala server using the following commands:
$ sudo service impala-server restart
Stopping Impala Server: [ OK ]
Starting Impala Server: [ OK ]
Restart the Impala StateStore using the following commands:
$ sudo service impala-state-store restart
Stopping Impala State Store Server: [ OK ]
Starting Impala State Store Server: [ OK ]
Restart the Impala Catalog Service using the following commands:
$ sudo service impala-catalog restart
Stopping Impala Catalog Server: [ OK ]
Starting Impala Catalog Server: [ OK ]
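If you prefer to restart all three services in one step, a small helper along these lines might work. It is a sketch, not part of the Impala packages: the function name restart_impala_services is hypothetical, and the ordering (statestore, then catalog, then server) is an assumption rather than something this section prescribes. Passing echo as the first argument prints the commands instead of running them.

```shell
restart_impala_services() {
  # First argument is the command prefix; it defaults to sudo.
  runner=${1:-sudo}
  for svc in impala-state-store impala-catalog impala-server; do
    # Stop at the first service that fails to restart.
    $runner service "$svc" restart || return 1
  done
}

# Dry run: print the three restart commands without executing them.
restart_impala_services echo
```

Calling restart_impala_services with no argument would run the same three service commands through sudo.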
Some common settings to change include:
StateStore address. Where practical, put the statestored on a
separate host not running the impalad daemon. In that recommended
configuration, the impalad daemon cannot refer to the
statestored server using the loopback address. If the
statestored is hosted on a machine with an IP address of
192.168.0.27, change:
IMPALA_STATE_STORE_HOST=127.0.0.1
to:
IMPALA_STATE_STORE_HOST=192.168.0.27
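The same change can be scripted. The following is a minimal sketch that works on a temporary copy containing two sample lines, so it is safe to run anywhere; the variable names defaults_file and new_host are hypothetical. To apply it for real you would point it at /etc/default/impala with suitable privileges and keep a backup.

```shell
# Scratch copy standing in for /etc/default/impala.
defaults_file=$(mktemp)
printf '%s\n' \
  'IMPALA_STATE_STORE_HOST=127.0.0.1' \
  'IMPALA_STATE_STORE_PORT=24000' > "$defaults_file"

new_host=192.168.0.27

# Rewrite only the line that assigns IMPALA_STATE_STORE_HOST.
sed "s/^IMPALA_STATE_STORE_HOST=.*/IMPALA_STATE_STORE_HOST=${new_host}/" \
  "$defaults_file" > "${defaults_file}.new"
mv "${defaults_file}.new" "$defaults_file"

# Read back both settings to confirm only the host changed.
updated=$(grep '^IMPALA_STATE_STORE_HOST=' "$defaults_file")
port_line=$(grep '^IMPALA_STATE_STORE_PORT=' "$defaults_file")
printf '%s\n%s\n' "$updated" "$port_line"

rm -f "$defaults_file"
```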
Catalog server address (including both the hostname and the port number). Update the
value of the IMPALA_CATALOG_SERVICE_HOST variable. Where practical,
run the catalog server on the same host as the statestore. In that
recommended configuration, the impalad daemon cannot refer to the
catalog server using the loopback address. If the catalog service is hosted on a
machine with an IP address of 192.168.0.27, add the following line:
IMPALA_CATALOG_SERVICE_HOST=192.168.0.27:26000
The /etc/default/impala defaults file currently does not define
an IMPALA_CATALOG_ARGS environment variable, but if you add one it
will be recognized by the service startup/shutdown script. Add a definition for this
variable to /etc/default/impala and add the option
--catalog_service_host=hostname. If
the port is different than the default 26000, also add the option
--catalog_service_port=port.
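One possible form of such a definition, modeled on the IMPALA_STATE_STORE_ARGS line in the defaults file, is sketched below. The address and port are the example values used in this section. CATALOG_HOST and CATALOG_PORT are hypothetical helper variables, and the options are written in the single-dash style that the defaults file uses.

```shell
# Sketch of a possible addition to /etc/default/impala.
# CATALOG_HOST and CATALOG_PORT are hypothetical helper variables.
CATALOG_HOST=192.168.0.27
CATALOG_PORT=26000

# Keep any value already set; otherwise build the option string.
IMPALA_CATALOG_ARGS=${IMPALA_CATALOG_ARGS:- \
    -catalog_service_host=${CATALOG_HOST} \
    -catalog_service_port=${CATALOG_PORT}}
export IMPALA_CATALOG_ARGS

printf '%s\n' "$IMPALA_CATALOG_ARGS"
```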
Memory limits. You can limit the amount of memory available to Impala. For example, to allow Impala to use no more than 70% of system memory, change:
export IMPALA_SERVER_ARGS=${IMPALA_SERVER_ARGS:- \
-log_dir=${IMPALA_LOG_DIR} \
-state_store_port=${IMPALA_STATE_STORE_PORT} \
-state_store_host=${IMPALA_STATE_STORE_HOST} \
-be_port=${IMPALA_BACKEND_PORT}}
to:
export IMPALA_SERVER_ARGS=${IMPALA_SERVER_ARGS:- \
-log_dir=${IMPALA_LOG_DIR} -state_store_port=${IMPALA_STATE_STORE_PORT} \
-state_store_host=${IMPALA_STATE_STORE_HOST} \
-be_port=${IMPALA_BACKEND_PORT} -mem_limit=70%}
You can specify the memory limit using absolute notation such as
500m or 2G, or as a percentage of physical memory such as 60%.
Core dump enablement. To enable core dumps, change:
export ENABLE_CORE_DUMPS=${ENABLE_COREDUMPS:-false}
to:
export ENABLE_CORE_DUMPS=${ENABLE_COREDUMPS:-true}
The location of core dump files may vary according to your operating system configuration.
Other security settings may prevent Impala from writing core dumps even when this option is enabled.
Authorization. Specify the --server_name option as part of the
IMPALA_SERVER_ARGS and IMPALA_CATALOG_ARGS settings to enable the core
Impala support for authorization. See impala_authorization.html#secure_startup for details.
Auditing for successful or blocked Impala queries, another aspect of security.
Specify the --audit_event_log_dir=directory_path option and optionally the
--max_audit_event_log_file_size=number_of_queries
and --abort_on_failed_audit_event options as part of
the IMPALA_SERVER_ARGS settings, for each Impala node, to enable
and customize auditing. See Auditing Impala Operations for details.
Password protection for the Impala web UI, which listens on port 25000 by default.
This feature involves adding some or all of the
--webserver_password_file, --webserver_authentication_domain, and
--webserver_certificate_file options to the
IMPALA_SERVER_ARGS and IMPALA_STATE_STORE_ARGS settings. See
Security Guidelines for Impala for details.
Default query options. Another setting you might add to IMPALA_SERVER_ARGS is a
comma-separated list of query options and values:
--default_query_options='option=value,option=value,...'
These options control the behavior of queries performed by this
impalad instance. The option values you specify here override the
default values for Impala query options, as shown by the SET
statement in impala-shell.
During troubleshooting, the appropriate support channel might direct you to change
other values, particularly for IMPALA_SERVER_ARGS, to work around
issues or gather debugging information.
These startup options for the impalad daemon are different from the command-line options for the impala-shell command. For the impala-shell options, see impala-shell Configuration Options.
You can check the current runtime value of all these settings through the Impala web
interface, available by default at
http://impala_hostname:25000/varz for the impalad daemon,
http://impala_hostname:25010/varz for the statestored daemon, or
http://impala_hostname:25020/varz for the catalogd daemon.
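When checking several hosts, it can help to build these URLs from the daemon name. The helper below is a sketch that encodes only the default ports listed above; the function name varz_url is hypothetical, and a deployment that overrides the web UI ports would need different values.

```shell
# Print the default /varz URL for a given host and daemon name.
varz_url() {
  host=$1
  daemon=$2
  case "$daemon" in
    impalad)     port=25000 ;;
    statestored) port=25010 ;;
    catalogd)    port=25020 ;;
    *) echo "unknown daemon: $daemon" >&2; return 1 ;;
  esac
  printf 'http://%s:%s/varz\n' "$host" "$port"
}

# Prints http://impala_hostname:25020/varz
varz_url impala_hostname catalogd
```

Assuming curl is installed and the web UI is reachable, a command such as curl -s "$(varz_url impala_hostname impalad)" would then fetch the page for the impalad daemon.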
The catalogd daemon implements the Impala Catalog service, which broadcasts metadata changes to all the Impala nodes when Impala creates a table, inserts data, or performs other kinds of DDL and DML operations.
Use the --load_catalog_in_background option to control when the
metadata of a table is loaded.
If set to false, the metadata of a table is loaded when it is
referenced for the first time. This means that the first run of a particular query
can be slower than subsequent runs. Starting in Impala 2.2, the default for
--load_catalog_in_background is false.
If set to true, the catalog service attempts to load metadata for a
table even if no query needed that metadata, so the metadata might already be
loaded when the first query that needs it is run. However, for the following
reasons, we recommend not setting the option to true.