Test to ensure that Impala is configured for optimal performance. If you have installed Impala with cluster management software, complete the procedures described in this topic to verify that Impala is set up correctly.
You can inspect Impala configuration values by connecting to your Impala server using a browser.
To check Impala configuration values:
1. Use a browser to connect to one of the hosts running impalad in your environment. Connect using an address of the form http://hostname:port/varz, replacing hostname and port with the name and port of your Impala server. The default port is 25000.
2. Review the configured values. For example, to check that your system is configured to use block locality tracking information, check that the value for dfs.datanode.hdfs-blocks-metadata.enabled is true.
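The browser check above could also be scripted. The following is a minimal sketch, not an official tool: the function names are hypothetical, and it assumes only that the setting name appears in the page text shortly before its true/false value. The exact layout of the /varz page is not described in this topic, so adjust the search if your page differs.

```python
import re
import urllib.request


def varz_url(hostname, port=25000):
    """Build the configuration page address; 25000 is the default port."""
    return f"http://{hostname}:{port}/varz"


def find_boolean_setting(page_text, name):
    """Return True or False for the first true/false token found shortly
    after `name` in the page text, or None if it cannot be determined.
    Assumption: the value follows the name within about 200 characters."""
    idx = page_text.find(name)
    if idx == -1:
        return None
    start = idx + len(name)
    match = re.search(r"\b(true|false)\b", page_text[start:start + 200],
                      re.IGNORECASE)
    if match is None:
        return None
    return match.group(1).lower() == "true"


def check_setting(hostname, name, port=25000, timeout=10):
    """Fetch the page from a running impalad and look up one setting."""
    with urllib.request.urlopen(varz_url(hostname, port),
                                timeout=timeout) as resp:
        text = resp.read().decode("utf-8", errors="replace")
    return find_boolean_setting(text, name)
```

For example, check_setting("impalad-host", "dfs.datanode.hdfs-blocks-metadata.enabled") returns None when the setting is not found on the page, which is worth treating as a prompt to inspect the page by hand.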
To check data locality:
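1. Execute a query on a dataset that is available across multiple nodes. For example, for a table named MyTable that has a reasonable chance of being spread across multiple DataNodes:
   [impalad-host:21000] > SELECT COUNT(*) FROM MyTable;
2. After the query completes, review the contents of the Impala logs. You should find a recent message similar to the following:
   Total remote scan volume = 0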
The presence of remote scans may indicate that impalad is not running on the correct nodes. This can be because some DataNodes do not have impalad running, or because the impalad instance that is starting the query is unable to contact one or more of the other impalad instances.
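Scanning the log for this message can be automated. The sketch below is illustrative only: the function names are hypothetical, and it assumes the message has the form shown above, with a numeric value after the equals sign. Any value it cannot read as a number is reported as a possible remote scan so that it gets a manual look.

```python
import re

# Matches the message quoted in this topic and captures the value after "=".
REMOTE_SCAN = re.compile(r"Total remote scan volume = (\S+)")
LEADING_NUMBER = re.compile(r"\d+(?:\.\d+)?")


def remote_scan_values(log_lines):
    """Return the raw value from each remote scan volume message."""
    values = []
    for line in log_lines:
        match = REMOTE_SCAN.search(line)
        if match:
            values.append(match.group(1))
    return values


def has_remote_scans(log_lines):
    """True if any message reports a nonzero (or unreadable) volume."""
    for value in remote_scan_values(log_lines):
        number = LEADING_NUMBER.match(value)
        if number is None or float(number.group(0)) != 0:
            return True
    return False
```

To apply it to a log file, pass the open file object, for example has_remote_scans(open(path)), where path is the location of your impalad log.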
To understand the causes of this issue:
1. Connect to the debugging web server. By default, this server runs on port 25000. This page lists all impalad instances running in your cluster. If there are fewer instances than you expect, this often indicates some DataNodes are not running impalad. Ensure impalad is started on all DataNodes.
2. If you are using multi-homed hosts, ensure that the Impala daemon's hostname resolves to the interface on which impalad is running. The hostname Impala is using is displayed when impalad starts. To explicitly set the hostname, use the --hostname flag.
3. Check that statestored is running as expected. Review the contents of the state store log to ensure all instances of impalad are listed as having connected to the state store.
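Comparing the list of impalad instances against your DataNodes can be done with a simple set difference. This is a small sketch under stated assumptions: the function name is hypothetical, and both host lists are gathered by you (for example, copied from the debugging web page and from your cluster inventory). Nothing here queries Impala or HDFS.

```python
def _normalize(host):
    """Lowercase a host entry and drop any trailing :port."""
    return host.strip().lower().split(":")[0]


def missing_impalad_hosts(datanode_hosts, impalad_hosts):
    """Return the DataNode hosts that have no impalad instance listed."""
    running = {_normalize(h) for h in impalad_hosts}
    expected = {_normalize(h) for h in datanode_hosts}
    return sorted(expected - running)
```

An empty result means every DataNode in your list has a matching impalad entry. Any host returned is a candidate for a missing or unreachable impalad, or for a hostname mismatch of the kind the --hostname flag addresses.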
You can review the contents of the Impala logs for signs that short-circuit reads or block location tracking are not functioning. Before checking logs, execute a simple query against a small HDFS dataset. Completing a query task generates log messages using current settings. Information on starting Impala and executing queries can be found in Starting Impala and Using the Impala Shell (impala-shell Command). Information on logging can be found in Using Impala Logging. Log messages and their interpretations are as follows:
Log Message | Interpretation |
---|---|
Unknown disk id. This will negatively affect performance. Check your hdfs settings to enable block location metadata | Tracking block locality is not enabled. |
Unable to load native-hadoop library for your platform... using builtin-java classes where applicable | Native checksumming is not enabled. |
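
The table above can be turned into a small log scanner. This is a minimal sketch with hypothetical names. It matches on a leading fragment of each message rather than the full text, on the assumption that surrounding log prefixes and the elided portion of the second message may vary.

```python
# Message fragments from the table above, mapped to their interpretations.
KNOWN_MESSAGES = {
    "Unknown disk id. This will negatively affect performance.":
        "Tracking block locality is not enabled.",
    "Unable to load native-hadoop library for your platform":
        "Native checksumming is not enabled.",
}


def interpret_log(log_lines):
    """Return each matching interpretation once, in order of first match."""
    findings = []
    for line in log_lines:
        for fragment, meaning in KNOWN_MESSAGES.items():
            if fragment in line and meaning not in findings:
                findings.append(meaning)
    return findings
```

Run it against the log lines produced after the simple query described earlier. An empty list means neither message was found in the lines you supplied.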