Impala
Impalaistheopensource,nativeanalyticdatabaseforApacheHadoop.
|
#include <catalog-server.h>
Public Member Functions | |
CatalogServer (MetricGroup *metrics) | |
Status | Start () |
void | RegisterWebpages (Webserver *webserver) |
const boost::shared_ptr < CatalogServiceIf > & | thrift_iface () const |
Returns the Thrift API interface that proxies requests onto the local CatalogService. More... | |
Catalog * | catalog () const |
Static Public Attributes | |
static std::string | IMPALA_CATALOG_TOPIC = "catalog-update" |
Private Member Functions | |
void | UpdateCatalogTopicCallback (const StatestoreSubscriber::TopicDeltaMap &incoming_topic_deltas, std::vector< TTopicDelta > *subscriber_topic_updates) |
void | GatherCatalogUpdatesThread () |
void | BuildTopicUpdates (const std::vector< TCatalogObject > &catalog_objects) |
void | CatalogUrlCallback (const Webserver::ArgumentMap &args, rapidjson::Document *document) |
void | CatalogObjectsUrlCallback (const Webserver::ArgumentMap &args, rapidjson::Document *document) |
Private Attributes | |
boost::shared_ptr < CatalogServiceIf > | thrift_iface_ |
Thrift API implementation which proxies requests onto this CatalogService. More... | |
ThriftSerializer | thrift_serializer_ |
MetricGroup * | metrics_ |
boost::scoped_ptr< Catalog > | catalog_ |
boost::scoped_ptr < StatestoreSubscriber > | statestore_subscriber_ |
StatsMetric< double > * | topic_processing_time_metric_ |
Metric that tracks the amount of time taken preparing a catalog update. More... | |
boost::scoped_ptr< Thread > | catalog_update_gathering_thread_ |
Thread that polls the catalog for any updates. More... | |
boost::unordered_set< std::string > | catalog_topic_entry_keys_ |
boost::mutex | catalog_lock_ |
boost::condition_variable | catalog_update_cv_ |
std::vector< TTopicItem > | pending_topic_updates_ |
bool | topic_updates_ready_ |
int64_t | last_sent_catalog_version_ |
int64_t | catalog_objects_min_version_ |
int64_t | catalog_objects_max_version_ |
The Impala CatalogServer manages the caching and persistence of cluster-wide metadata. The CatalogServer aggregates the metadata from the Hive Metastore, the NameNode, and potentially additional sources in the future. The CatalogServer uses the Statestore to broadcast metadata updates across the cluster. The CatalogService directly handles executing metadata update requests (DDL requests) from clients via a Thrift interface. The CatalogServer has two main components - a C++ daemon that has the Statestore integration code, Thrift service implementiation, and exporting of the debug webpage/metrics. The other main component is written in Java and manages caching and updating of all metadata. For each Statestore heartbeat, the C++ Server queries the Java metadata cache over JNI to get the current state of the catalog. Any updates are broadcast to the rest of the cluster using the Statestore over the IMPALA_CATALOG_TOPIC. The CatalogServer must be the only writer to the IMPALA_CATALOG_TOPIC, meaning there cannot be multiple CatalogServers running at the same time, as the correctness of delta updates relies upon this assumption. TODO: In the future the CatalogServer could go into a "standby" mode if it detects updates from another writer on the topic. This is a bit tricky because it requires some basic form of leader election.
Definition at line 57 of file catalog-server.h.
CatalogServer::CatalogServer | ( | MetricGroup * | metrics | ) |
Definition at line 148 of file catalog-server.cc.
References CATALOG_SERVER_TOPIC_PROCESSING_TIMES, metrics_, impala::MetricGroup::RegisterMetric(), and topic_processing_time_metric_.
|
private |
This function determines what items have been added/removed from the catalog since the last heartbeat and builds the next topic update to send. To do this, it enumerates the given catalog objects returned looking for the objects that have a catalog version that is > the catalog version sent with the last heartbeat. To determine items that have been deleted, it saves the set of topic entry keys sent with the last update and looks at the difference between it and the current set of topic entry keys. The key for each entry is a string composed of: "TCatalogObjectType:<unique object name>". So for table foo.bar, the key would be "TABLE:foo.bar". Encoding the object type information in the key ensures the keys are unique, as well as helps to determine what object type was removed in a state store delta update (since the state store only sends key names for deleted items). Must hold catalog_lock_ when calling this function.
Definition at line 295 of file catalog-server.cc.
References catalog_topic_entry_keys_, impala::Status::GetDetail(), last_sent_catalog_version_, impala::Status::ok(), pending_topic_updates_, impala::ThriftSerializer::Serialize(), impala::TCatalogObjectToEntryKey(), and thrift_serializer_.
Referenced by GatherCatalogUpdatesThread().
|
inline |
Definition at line 72 of file catalog-server.h.
References catalog_.
|
private |
Debug webpage handler that is used to dump the internal state of catalog objects. The caller specifies a "object_type" and "object_name" parameters and this function will get the matching TCatalogObject struct, if one exists. For example, to dump table "bar" in database "foo": <host>:25020/catalog_objects?object_type=TABLE&object_name=foo.bar
Definition at line 380 of file catalog-server.cc.
References catalog_, impala::Status::GetDetail(), impala::Status::ok(), impala::TCatalogObjectFromObjectName(), and impala::TCatalogObjectTypeFromName().
Referenced by RegisterWebpages().
|
private |
Example output: "databases": [ { "name": "_impala_builtins", "num_tables": 0, "tables": [] }, { "name": "default", "num_tables": 1, "tables": [ { "fqtn": "default.test_table", "name": "test_table" } ] } ]
Definition at line 340 of file catalog-server.cc.
References catalog_, impala::Status::GetDetail(), and impala::Status::ok().
Referenced by RegisterWebpages().
|
private |
Executed by the catalog_update_gathering_thread_. Calls into JniCatalog to get the latest set of catalog objects that exist, along with some metadata on each object. The results are stored in the shared catalog_objects_ data structure. Also, explicitly releases free memory back to the OS after each complete iteration.
Definition at line 252 of file catalog-server.cc.
References BuildTopicUpdates(), catalog_, catalog_lock_, catalog_objects_max_version_, catalog_objects_min_version_, catalog_update_cv_, impala::MonotonicStopWatch::ElapsedTime(), impala::Status::GetDetail(), last_sent_catalog_version_, impala::Status::ok(), pending_topic_updates_, impala::MonotonicStopWatch::Start(), topic_processing_time_metric_, topic_updates_ready_, and impala::StatsMetric< T >::Update().
Referenced by Start().
void CatalogServer::RegisterWebpages | ( | Webserver * | webserver | ) |
Definition at line 194 of file catalog-server.cc.
References CATALOG_OBJECT_TEMPLATE, CATALOG_OBJECT_WEB_PAGE, CATALOG_TEMPLATE, CATALOG_WEB_PAGE, CatalogObjectsUrlCallback(), CatalogUrlCallback(), and impala::Webserver::RegisterUrlCallback().
Status CatalogServer::Start | ( | ) |
Starts this CatalogService instance. Returns OK unless some error occurred in which case the status is returned.
Definition at line 159 of file catalog-server.cc.
References catalog_, catalog_lock_, catalog_update_cv_, catalog_update_gathering_thread_, GatherCatalogUpdatesThread(), IMPALA_CATALOG_TOPIC, impala::MakeNetworkAddress(), metrics_, impala::Status::OK, RETURN_IF_ERROR, statestore_subscriber_, impala::TNetworkAddressToString(), and UpdateCatalogTopicCallback().
|
inline |
Returns the Thrift API interface that proxies requests onto the local CatalogService.
Definition at line 69 of file catalog-server.h.
References thrift_iface_.
|
private |
Called during each Statestore heartbeat and is responsible for updating the current set of catalog objects in the IMPALA_CATALOG_TOPIC. Responds to each heartbeat with a delta update containing the set of changes since the last heartbeat. This function finds all catalog objects that have a catalog version greater than the last update sent by calling into the JniCatalog. The topic is updated with any catalog objects that are new or have been modified since the last heartbeat (by comparing the catalog version of the object with last_sent_catalog_version_). Also determines any deletions of catalog objects by looking at the difference of the last set of topic entry keys that were sent and the current set of topic entry keys. At the end of execution it notifies the catalog_update_gathering_thread_ to fetch the next set of updates from the JniCatalog. All updates are added to the subscriber_topic_updates list and sent back to the Statestore.
Definition at line 206 of file catalog-server.cc.
References catalog_lock_, catalog_objects_max_version_, catalog_objects_min_version_, catalog_topic_entry_keys_, catalog_update_cv_, IMPALA_CATALOG_TOPIC, last_sent_catalog_version_, pending_topic_updates_, and topic_updates_ready_.
Referenced by Start().
|
private |
Definition at line 79 of file catalog-server.h.
Referenced by catalog(), CatalogObjectsUrlCallback(), CatalogUrlCallback(), GatherCatalogUpdatesThread(), and Start().
|
private |
Protects catalog_update_cv_, pending_topic_updates_, catalog_objects_to/from_version_, and last_sent_catalog_version.
Definition at line 96 of file catalog-server.h.
Referenced by GatherCatalogUpdatesThread(), Start(), and UpdateCatalogTopicCallback().
|
private |
The max catalog version in pending_topic_updates_. Set by the catalog_update_gathering_thread_ and protected by catalog_lock_.
Definition at line 127 of file catalog-server.h.
Referenced by GatherCatalogUpdatesThread(), and UpdateCatalogTopicCallback().
|
private |
The minimum catalog object version in pending_topic_updates_. All items in pending_topic_updates_ will be greater than this version. Set by the catalog_update_gathering_thread_ and protected by catalog_lock_.
Definition at line 123 of file catalog-server.h.
Referenced by GatherCatalogUpdatesThread(), and UpdateCatalogTopicCallback().
|
private |
Tracks the set of catalog objects that exist via their topic entry key. During each IMPALA_CATALOG_TOPIC heartbeat, stores the set of known catalog objects that exist by their topic entry key. Used to track objects that have been removed since the last heartbeat.
Definition at line 92 of file catalog-server.h.
Referenced by BuildTopicUpdates(), and UpdateCatalogTopicCallback().
|
private |
Condition variable used to signal when the catalog_update_gathering_thread_ should fetch its next set of updates from the JniCatalog. At the end of each statestore heartbeat, this CV is signaled and the catalog_update_gathering_thread_ starts querying the JniCatalog for catalog objects. Protected by the catalog_lock_.
Definition at line 102 of file catalog-server.h.
Referenced by GatherCatalogUpdatesThread(), Start(), and UpdateCatalogTopicCallback().
|
private |
Thread that polls the catalog for any updates.
Definition at line 86 of file catalog-server.h.
Referenced by Start().
|
static |
Definition at line 59 of file catalog-server.h.
Referenced by impala::ImpalaServer::CatalogUpdateCallback(), impala::ImpalaServer::ImpalaServer(), Start(), and UpdateCatalogTopicCallback().
|
private |
The last version of the catalog that was sent over a statestore heartbeat. Set in UpdateCatalogTopicCallback() and protected by the catalog_lock_.
Definition at line 118 of file catalog-server.h.
Referenced by BuildTopicUpdates(), GatherCatalogUpdatesThread(), and UpdateCatalogTopicCallback().
|
private |
Definition at line 78 of file catalog-server.h.
Referenced by CatalogServer(), and Start().
|
private |
The latest available set of catalog topic updates (additions/modifications, and deletions). Set by the catalog_update_gathering_thread_ and protected by catalog_lock_.
Definition at line 107 of file catalog-server.h.
Referenced by BuildTopicUpdates(), GatherCatalogUpdatesThread(), and UpdateCatalogTopicCallback().
|
private |
Definition at line 80 of file catalog-server.h.
Referenced by Start().
|
private |
Thrift API implementation which proxies requests onto this CatalogService.
Definition at line 76 of file catalog-server.h.
Referenced by thrift_iface().
|
private |
Definition at line 77 of file catalog-server.h.
Referenced by BuildTopicUpdates().
|
private |
Metric that tracks the amount of time taken preparing a catalog update.
Definition at line 83 of file catalog-server.h.
Referenced by CatalogServer(), and GatherCatalogUpdatesThread().
|
private |
Flag used to indicate when new topic updates are ready for processing by the heartbeat thread. Set to false at the end of each heartbeat, before signaling the catalog_update_gathering_thread_. Set to true by the catalog_update_gathering_thread_ when it is done building the latest set of pending_topic_updates_.
Definition at line 114 of file catalog-server.h.
Referenced by GatherCatalogUpdatesThread(), and UpdateCatalogTopicCallback().