Troubleshooting for Impala requires being able to diagnose and debug problems with performance, network connectivity, out-of-memory conditions, disk space usage, and crash or hang conditions in any of the Impala-related daemons.
The following sections describe the general troubleshooting procedures to diagnose different kinds of problems:
In general, if queries issued against Impala fail, you can try running these same queries against Hive to see whether the problem lies with Impala itself or elsewhere in your environment.
Under very high concurrency, Impala can encounter serious errors when it exhausts operating system resources such as threads or memory map areas. Errors similar to the following may indicate operating system resource exhaustion:
F0629 08:20:02.956413 29088 llvm-codegen.cc:111] LLVM hit fatal error: Unable to allocate section memory!
terminate called after throwing an instance of 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::thread_resource_error> >'
The KRPC implementation in Impala 2.12 / 3.0 greatly reduces thread counts and the chances of hitting a resource limit.
If you still get an error similar to the above in Impala 3.0 and higher, try increasing the max_map_count OS virtual memory parameter. max_map_count defines the maximum number of memory map areas that a process can use. Configure each host running an impalad daemon with the following command to increase max_map_count to 8,000,000:
echo 8000000 > /proc/sys/vm/max_map_count
To make the new setting persist across reboots, add the following line to /etc/sysctl.conf:
vm.max_map_count=8000000
and then apply it with:
sysctl -p
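To verify the change and to see how close an impalad process is to the limit, you can compare the current setting with the number of memory map areas the process actually has in use. The following commands are a minimal sketch; they assume a single impalad process is running on the host.
$ sysctl vm.max_map_count                     # current limit on memory map areas
$ sudo wc -l /proc/$(pgrep impalad)/maps      # approximate number of map areas in use by impalad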
Impala queries are typically I/O-intensive. If there is an I/O problem with storage devices, or with HDFS itself, Impala queries could show slow response times with no obvious cause on the Impala side. Slow I/O on even a single Impala daemon could result in an overall slowdown, because queries involving clauses such as ORDER BY, GROUP BY, or JOIN do not start returning results until all executor Impala daemons have finished their work.
To test whether the Linux I/O system itself is performing as expected, run Linux commands like the following on each host where an Impala daemon is running:
$ sudo sysctl -w vm.drop_caches=3 vm.drop_caches=0
vm.drop_caches = 3
vm.drop_caches = 0
$ sudo dd if=/dev/sda bs=1M of=/dev/null count=1k
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 5.60373 s, 192 MB/s
$ sudo dd if=/dev/sdb bs=1M of=/dev/null count=1k
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 5.51145 s, 195 MB/s
$ sudo dd if=/dev/sdc bs=1M of=/dev/null count=1k
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 5.58096 s, 192 MB/s
$ sudo dd if=/dev/sdd bs=1M of=/dev/null count=1k
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 5.43924 s, 197 MB/s
On modern hardware, a throughput rate of less than 100 MB/s typically indicates a performance issue with the storage device. Correct the hardware problem before continuing with Impala tuning or benchmarking.
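To run the same read test across several data disks in one pass, a simple loop like the one below can be used. This is only a sketch: the device names are placeholders that must be replaced with the actual data disks on each host, and because the test reads directly from the devices, it is best run while the host is otherwise idle.
$ for dev in /dev/sda /dev/sdb /dev/sdc /dev/sdd; do
>   echo "Testing $dev"
>   sudo dd if=$dev bs=1M of=/dev/null count=1k
> done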
The following table lists common problems and potential solutions.
Symptom | Explanation | Recommendation |
---|---|---|
Impala takes a long time to start. | Impala instances with large numbers of tables, partitions, or data files take longer to start because the metadata for these objects is broadcast to all impalad nodes and cached. | Adjust timeout and synchronicity settings. |
Joins fail to complete. | There may be insufficient memory. During a join, data from the second, third, and subsequent sets to be joined is loaded into memory. If Impala chooses an inefficient join order or join mechanism, the query could exceed the total memory available. | Start by gathering statistics with the COMPUTE STATS statement for each table involved in the join (see the COMPUTE STATS sketch after this table). |
Queries return incorrect results. | Impala metadata may be outdated after changes are performed in Hive. | Where possible, use the appropriate Impala statement (such as INSERT, LOAD DATA, or CREATE TABLE) rather than switching back and forth between Impala and Hive. After making changes through Hive, run REFRESH or INVALIDATE METADATA in Impala so that it picks up the new metadata (see the REFRESH / INVALIDATE METADATA sketch after this table). |
Queries are slow to return results. | Some impalad instances may not have started. Connect to the web UI on the host running the Impala statestore. Note: Replace hostname and port with the hostname and web server port of your Impala statestore host machine. The default port is 25010. The number of impalad instances listed should match the expected number of impalad instances installed in the cluster. There should also be one impalad instance installed on each DataNode. | Ensure Impala is installed on all DataNodes. Start any impalad instances that are not running (to check which instances are registered, see the statestore metrics sketch after this table). |
Queries are slow to return results. | Impala may not be configured to use native checksumming. Native checksumming uses machine-specific instructions to compute checksums over HDFS data very quickly. Review the Impala logs: messages indicating that the native Hadoop libraries could not be loaded suggest that native checksumming is not enabled. | Ensure Impala is configured to use native checksumming as described in Post-Installation Configuration for Impala. |
Queries are slow to return results. | Impala may not be configured to use data locality tracking. | Test Impala for data locality tracking and make configuration changes as necessary. Information on this process can be found in Post-Installation Configuration for Impala. |
Attempts to complete Impala tasks such as executing INSERT-SELECT actions fail. The Impala logs include notes that files could not be opened due to permission denied. | This can be the result of permissions issues. For example, you could use the Hive shell as the hive user to create a table. After creating this table, you could attempt to complete some action, such as an INSERT-SELECT on the table. Because the table was created using one user and the INSERT-SELECT is attempted by another, this action may fail due to permissions issues. | In general, ensure the Impala user has sufficient permissions. In the preceding example, ensure the Impala user has sufficient permissions to the table that the Hive user created (see the HDFS permissions sketch after this table). |
Impala fails to start up, with the impalad logs referring to errors connecting to the statestore service and attempts to re-register. | A large number of databases, tables, partitions, and so on can require metadata synchronization, particularly on startup, that takes longer than the default timeout for the statestore service. | Configure the statestore timeout value and possibly other settings related to the frequency of statestore updates and metadata loading. See Increasing the Statestore Timeout and Scalability Considerations for the Impala Statestore (see the statestore timeout sketch after this table). |
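For the join-related memory issues above, statistics can be gathered from the command line with impala-shell. A minimal sketch, assuming an impalad is reachable at the placeholder host impala-host and that the hypothetical tables sales and customers are the ones being joined:
$ impala-shell -i impala-host -q "COMPUTE STATS sales"
$ impala-shell -i impala-host -q "COMPUTE STATS customers"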
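When changes are made through Hive and Impala returns stale results, the metadata can be brought up to date from impala-shell. A sketch, again using placeholder host and table names: REFRESH picks up new data files for an existing table, while INVALIDATE METADATA is needed after a table is created or its structure is changed in Hive.
$ impala-shell -i impala-host -q "REFRESH db1.t1"
$ impala-shell -i impala-host -q "INVALIDATE METADATA db1.t1"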
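To check how many impalad instances have registered with the statestore without opening a browser, the statestore web UI can also be queried with curl. A sketch, assuming the web UI is listening on the default port 25010 and that statestore-host is a placeholder hostname; inspect the output for the count of registered impalad subscribers:
$ curl -s http://statestore-host:25010/metrics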
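For the permission-denied failures above, start by inspecting the ownership and permissions of the table's directory in HDFS. The path below uses the common default warehouse location with a hypothetical database and table; the correct path, owner, and group depend on your installation, so treat the chown line as an illustration rather than a prescription.
$ hdfs dfs -ls /user/hive/warehouse/mydb.db/mytable
$ sudo -u hdfs hdfs dfs -chown -R impala:impala /user/hive/warehouse/mydb.db/mytable   # example only; choose owner/group to suit your setup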
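For the statestore re-registration errors above, the relevant setting is the -statestore_subscriber_timeout_seconds startup flag described under Increasing the Statestore Timeout. How the flag is passed depends on how your cluster is managed; the line below is a sketch for package-based installs that read daemon flags from /etc/default/impala, and the value 100 is only an illustrative increase, not a recommendation.
IMPALA_SERVER_ARGS="${IMPALA_SERVER_ARGS} -statestore_subscriber_timeout_seconds=100"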