Configuring Impala for High Availability

The Impala StateStore checks the health of all Impala daemons in a cluster and continuously relays its findings to each daemon. The Catalog daemon (CatalogD) stores metadata for databases, tables, and partitions, along with resource usage information, configuration settings, and other objects managed by Impala. If the StateStore and CatalogD each run as a single instance in an Impala cluster, each one is a single point of failure. Although Impala coordinators and executors continue to execute queries while the StateStore is down, they no longer receive state updates, which degrades admission control and cluster membership updates. To mitigate this, you can deploy a pair of StateStore instances and a pair of CatalogD instances in an Impala cluster so that the cluster can survive the failure of a StateStore or CatalogD instance.

Prerequisite:

To enable high availability for Impala CatalogD and StateStore, you must configure at least two CatalogD instances and two StateStore instances, with the two instances of each daemon running on different nodes.
Note: CatalogD HA and StateStore HA are independent features. You can enable CatalogD HA, StateStore HA, or both.
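The prerequisite above can be expressed as a simple pre-deployment check. The following Python sketch is illustrative only: the list of (host, role) pairs, the role names "catalogd" and "statestored", and the functions ha_ready and check_ha are hypothetical and are not part of Impala or any Impala tooling. The sketch counts distinct hosts per daemon, so two instances on the same node do not satisfy the requirement, and it checks CatalogD HA and StateStore HA separately because the two features are independent.

```python
# Hypothetical role names used only for this sketch; they are not Impala identifiers.
CATALOGD = "catalogd"
STATESTORED = "statestored"


def ha_ready(assignments, role):
    """Return True if `role` runs on at least two different hosts.

    `assignments` is a hypothetical list of (host, role) pairs describing
    where each daemon instance is deployed.
    """
    hosts = {host for host, r in assignments if r == role}
    return len(hosts) >= 2


def check_ha(assignments, want_catalogd_ha, want_statestore_ha):
    """Check each HA feature independently and return a list of problems."""
    problems = []
    if want_catalogd_ha and not ha_ready(assignments, CATALOGD):
        problems.append("CatalogD HA needs two CatalogD instances on different hosts")
    if want_statestore_ha and not ha_ready(assignments, STATESTORED):
        problems.append("StateStore HA needs two StateStore instances on different hosts")
    return problems


# Example: CatalogD instances are on two nodes, but both StateStore
# instances were placed on the same node.
cluster = [
    ("node-a", CATALOGD),
    ("node-b", CATALOGD),
    ("node-a", STATESTORED),
    ("node-a", STATESTORED),
]

print(check_ha(cluster, want_catalogd_ha=True, want_statestore_ha=True))
# prints ['StateStore HA needs two StateStore instances on different hosts']
```

Because the features are independent, calling check_ha(cluster, want_catalogd_ha=True, want_statestore_ha=False) for the same cluster reports no problems.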