This section describes the mandatory and recommended configuration settings for Impala. If Impala is installed using cluster management software, some of these configurations might be completed automatically; you must still configure short-circuit reads manually. If you want to customize your environment, consider making the changes described in this topic.
Enabling short-circuit reads allows Impala to read local data directly
from the file system. This removes the need to communicate through the
DataNodes, improving performance. This setting also minimizes the number
of additional copies of data. Short-circuit reads require libhadoop.so
(the Hadoop Native Library) to be accessible to both the server and the
client. libhadoop.so is not available if you have installed from a
tarball. You must install from an .rpm, .deb, or parcel to use
short-circuit local reads.
To configure DataNodes for short-circuit reads:
Copy the core-site.xml and hdfs-site.xml
configuration files from the Hadoop configuration directory to the
Impala configuration directory. The default Impala configuration
location is /etc/impala/conf.
In hdfs-site.xml, set the following properties as shown:
<property>
<name>dfs.client.read.shortcircuit</name>
<value>true</value>
</property>
<property>
<name>dfs.domain.socket.path</name>
<value>/var/run/hdfs-sockets/dn</value>
</property>
<property>
<name>dfs.client.file-block-storage-locations.timeout.millis</name>
<value>10000</value>
</property>
If /var/run/hadoop-hdfs/ is group-writable, make sure its group is root.
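The file-level steps above (copying the two configuration files, setting the three properties, and checking the directory's group) can be sanity-checked with a short script. The following is a minimal sketch, not part of Impala: the function names are hypothetical, the Hadoop configuration directory is passed in by the caller because its location is not stated here, and the group check assumes that the root group has GID 0.

```python
import os
import shutil
import stat
import xml.etree.ElementTree as ET
from pathlib import Path

# Expected values, taken from the properties listed above.
EXPECTED = {
    "dfs.client.read.shortcircuit": "true",
    "dfs.domain.socket.path": "/var/run/hdfs-sockets/dn",
    "dfs.client.file-block-storage-locations.timeout.millis": "10000",
}


def copy_client_configs(hadoop_conf_dir, impala_conf_dir="/etc/impala/conf"):
    """Copy core-site.xml and hdfs-site.xml into the Impala configuration directory."""
    copied = []
    for name in ("core-site.xml", "hdfs-site.xml"):
        target = Path(impala_conf_dir) / name
        shutil.copy2(Path(hadoop_conf_dir) / name, target)
        copied.append(target)
    return copied


def read_properties(hdfs_site_path):
    """Return {name: value} for every <property> element in a Hadoop-style XML file."""
    root = ET.parse(hdfs_site_path).getroot()
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def short_circuit_problems(hdfs_site_path, expected=EXPECTED):
    """List mismatches between the file and the expected short-circuit settings."""
    actual = read_properties(hdfs_site_path)
    problems = []
    for name, want in expected.items():
        got = actual.get(name)
        if got is None:
            problems.append(f"{name} is not set")
        elif got != want:
            problems.append(f"{name} is {got!r}, expected {want!r}")
    return problems


def group_writable_with_non_root_group(path):
    """True if path is group-writable and its group is not GID 0 (assumed to be root)."""
    info = os.stat(path)
    return bool(info.st_mode & stat.S_IWGRP) and info.st_gid != 0
```

An empty list from short_circuit_problems("/etc/impala/conf/hdfs-site.xml") means all three properties match the values shown above. The script only reads and copies files; it does not restart any service.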
Enabling block location metadata lets Impala determine which disk each data block is located on, which allows better utilization of the underlying disks. Impala will not start unless this setting is enabled.
To enable block location tracking:
Add the following to the hdfs-site.xml file:
<property>
<name>dfs.datanode.hdfs-blocks-metadata.enabled</name>
<value>true</value>
</property>
Copy the core-site.xml and hdfs-site.xml
configuration files from the Hadoop configuration directory to the
Impala configuration directory. The default Impala configuration
location is /etc/impala/conf.
Enabling native checksumming causes Impala to use an optimized native library for computing checksums, if that library is available.
To enable native checksumming:
If you installed from packages, the native checksumming library is installed and set up correctly. In
such a case, no additional steps are required. Conversely, if you installed by other means, such as with
tarballs, native checksumming may not be available due to missing shared objects. Finding the message
"Unable to load native-hadoop library for your platform... using builtin-java classes where applicable"
in the Impala logs indicates native checksumming may be unavailable. To enable native
checksumming, you must build and install libhadoop.so (the Hadoop Native Library).
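Searching the Impala logs for the message quoted above can be scripted. The following is a minimal sketch: the function name is hypothetical, and the caller supplies the log lines because the log file location depends on the installation. It matches only the fixed leading part of the message.

```python
# Leading part of the warning quoted in the text above.
NATIVE_WARNING = "Unable to load native-hadoop library for your platform"


def find_native_library_warnings(lines):
    """Return (line_number, text) pairs for log lines that contain the warning."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if NATIVE_WARNING in line
    ]
```

Because an open file object yields lines, the function can be called as find_native_library_warnings(log_file) inside a with open(...) block. An empty result means only that the message was not found in that file; it does not prove that native checksumming is active.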