Configuring Catalog for High Availability

For each new query, the Impala coordinator sends metadata requests to the Catalog service. It also sends metadata updates to the Catalog service, which in turn propagates them to the Hive Metastore. The active CatalogD acts as the source of metadata and provides the Catalog service for the Impala cluster. With a primary/standby pair of Catalog instances, the standby instance is promoted to primary when the primary instance goes down, so the Catalog service remains available to the cluster. This high availability (HA) mode reduces the outage duration of the Impala cluster when the primary Catalog instance fails. To support Catalog HA, you can now add two CatalogD instances to an Impala cluster as an Active-Passive high availability pair.

Enabling Catalog High Availability

To enable Catalog high availability in an Impala cluster, do the following:
  • Set the startup flag enable_catalogd_ha to true for both CatalogD instances and for the StateStore.

The active StateStore assigns roles to the CatalogD instances, designating one as the active CatalogD and the other as the standby CatalogD. The active CatalogD acts as the source of metadata and provides the Catalog service for the Impala cluster.
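
As an illustration of the flag setting described above, the following minimal Python sketch builds the command lines for the three daemons. Only the flag name enable_catalogd_ha comes from this document. The binary names catalogd and statestored, the --flag=value syntax, and the helper ha_command are assumptions for illustration; in a managed deployment the flag would normally be set through the cluster manager's configuration instead.

```python
import shlex


def ha_command(binary, extra_flags=()):
    """Return an argv list that starts `binary` with Catalog HA enabled.

    `binary` and the --flag=value form are assumptions about the deployment.
    """
    return [binary, "--enable_catalogd_ha=true", *extra_flags]


# Both CatalogD instances and the StateStore receive the same flag.
commands = {
    "catalogd-1": ha_command("catalogd"),
    "catalogd-2": ha_command("catalogd"),
    "statestore": ha_command("statestored"),
}

for role, argv in commands.items():
    # shlex.join quotes arguments so the line can be pasted into a shell.
    print(f"{role}: {shlex.join(argv)}")
```

The sketch only prints the commands; it does not start any process.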

Disabling Catalog High Availability

To disable Catalog high availability in an Impala cluster, follow these steps:
  1. Remove one CatalogD instance from the Impala cluster.
  2. Restart the remaining CatalogD instance without the startup flag enable_catalogd_ha.
  3. Restart the StateStore without the startup flag enable_catalogd_ha.

Monitoring Catalog HA Status in the StateStore Web Page

A new web page /catalog_ha_info has been added to the StateStore debug web server. This page displays the Catalog HA status, including:

  • Active Catalog Node
  • Standby Catalog Node
  • Notified Subscribers Table

To access this information, navigate to the /catalog_ha_info page on the StateStore debug web server.
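
For scripted checks, the page can also be fetched over HTTP. The sketch below is a minimal example under stated assumptions: the host name is a placeholder, port 25010 is assumed to be the StateStore debug web server port in your deployment, and plain HTTP without authentication is assumed. Only the /catalog_ha_info path comes from this document.

```python
from urllib.request import urlopen


def catalog_ha_info_url(host, port=25010):
    """Build the URL of the Catalog HA status page.

    The default port is an assumption; use the port your StateStore
    debug web server actually listens on.
    """
    return f"http://{host}:{port}/catalog_ha_info"


def fetch_catalog_ha_info(host, port=25010, timeout=5.0):
    """Return the raw page body as text (requires a reachable StateStore)."""
    with urlopen(catalog_ha_info_url(host, port), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


# Placeholder host name, for illustration only.
print(catalog_ha_info_url("statestore-host.example.com"))
```

Calling fetch_catalog_ha_info against a live StateStore returns the page content, which can then be inspected for the active and standby Catalog nodes.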

Catalog Failure Detection

The StateStore continuously sends heartbeats to its registered clients, including the primary and standby Catalog instances, to track the Impala daemons in the cluster and determine whether each one is healthy. If the StateStore finds that the primary Catalog instance is unhealthy but the standby Catalog instance is healthy, it promotes the standby instance to primary and notifies all coordinators of the change. The coordinators then switch over to the new primary Catalog instance.
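
The promotion rule described above can be modeled in a few lines. This is a simplified sketch of the decision only, not the StateStore implementation: the names CatalogInstance, choose_active, and notify_coordinators are hypothetical, and health is reduced to a single boolean rather than derived from heartbeats.

```python
from dataclasses import dataclass


@dataclass
class CatalogInstance:
    address: str
    healthy: bool


def choose_active(active, standby):
    """Return (new_active, new_standby, failed_over).

    The standby is promoted only when the active instance is unhealthy
    and the standby itself is healthy; otherwise roles are unchanged.
    """
    if not active.healthy and standby.healthy:
        return standby, active, True
    return active, standby, False


def notify_coordinators(coordinators, new_active):
    """Model the notification: each coordinator now points at the new active."""
    return {name: new_active.address for name in coordinators}


# Hypothetical addresses, for illustration only.
primary = CatalogInstance("catalogd-1.example.com", healthy=False)
standby = CatalogInstance("catalogd-2.example.com", healthy=True)

new_active, new_standby, failed_over = choose_active(primary, standby)
routing = notify_coordinators(["coordinator-1", "coordinator-2"], new_active)
print(failed_over, routing)
```

Note that when both instances are unhealthy the sketch leaves the roles unchanged, which matches the condition stated in the text.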

When the system detects that the active CatalogD is unhealthy, it initiates a failover to the standby CatalogD. During this brief transition, some nodes might not immediately recognize the new active CatalogD, causing currently running queries to fail due to lack of access to metadata. These failed queries need to be rerun after the failover is complete and the new active CatalogD is operational.
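
One way a client application might handle such failures is to rerun the query after a short wait. The sketch below assumes a hypothetical run_query callable that submits SQL and raises an exception on failure; it is not part of any Impala client API. For brevity it retries on any exception, whereas a real client should retry only on errors attributable to the failover.

```python
import time


def run_with_retry(run_query, sql, attempts=3, delay_seconds=5.0, sleep=time.sleep):
    """Run `sql` via `run_query`, rerunning it if it fails.

    `run_query` is a hypothetical callable supplied by the caller.
    The last error is re-raised once all attempts are used up.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return run_query(sql)
        except Exception as exc:  # simplification: retries on any error
            last_error = exc
            if attempt < attempts:
                sleep(delay_seconds)  # give the failover time to complete
    raise last_error


# Simulated query function that fails once, as if a failover were in progress.
calls = {"n": 0}


def flaky_query(sql):
    calls["n"] += 1
    if calls["n"] == 1:
        raise RuntimeError("simulated failure during CatalogD failover")
    return f"ok: {sql}"


result = run_with_retry(flaky_query, "SELECT 1", sleep=lambda seconds: None)
print(result)
```

The delay and attempt count are placeholders; suitable values depend on how long failover takes in a given cluster.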