Impala
Impalaistheopensource,nativeanalyticdatabaseforApacheHadoop.
 All Classes Namespaces Files Functions Variables Typedefs Enumerations Enumerator Friends Macros
impala::CatalogServer Class Reference

#include <catalog-server.h>

Collaboration diagram for impala::CatalogServer:

Public Member Functions

 CatalogServer (MetricGroup *metrics)
 
Status Start ()
 
void RegisterWebpages (Webserver *webserver)
 
const boost::shared_ptr
< CatalogServiceIf > & 
thrift_iface () const
 Returns the Thrift API interface that proxies requests onto the local CatalogService. More...
 
Catalogcatalog () const
 

Static Public Attributes

static std::string IMPALA_CATALOG_TOPIC = "catalog-update"
 

Private Member Functions

void UpdateCatalogTopicCallback (const StatestoreSubscriber::TopicDeltaMap &incoming_topic_deltas, std::vector< TTopicDelta > *subscriber_topic_updates)
 
void GatherCatalogUpdatesThread ()
 
void BuildTopicUpdates (const std::vector< TCatalogObject > &catalog_objects)
 
void CatalogUrlCallback (const Webserver::ArgumentMap &args, rapidjson::Document *document)
 
void CatalogObjectsUrlCallback (const Webserver::ArgumentMap &args, rapidjson::Document *document)
 

Private Attributes

boost::shared_ptr
< CatalogServiceIf
thrift_iface_
 Thrift API implementation which proxies requests onto this CatalogService. More...
 
ThriftSerializer thrift_serializer_
 
MetricGroupmetrics_
 
boost::scoped_ptr< Catalogcatalog_
 
boost::scoped_ptr
< StatestoreSubscriber
statestore_subscriber_
 
StatsMetric< double > * topic_processing_time_metric_
 Metric that tracks the amount of time taken preparing a catalog update. More...
 
boost::scoped_ptr< Threadcatalog_update_gathering_thread_
 Thread that polls the catalog for any updates. More...
 
boost::unordered_set< std::string > catalog_topic_entry_keys_
 
boost::mutex catalog_lock_
 
boost::condition_variable catalog_update_cv_
 
std::vector< TTopicItem > pending_topic_updates_
 
bool topic_updates_ready_
 
int64_t last_sent_catalog_version_
 
int64_t catalog_objects_min_version_
 
int64_t catalog_objects_max_version_
 

Detailed Description

The Impala CatalogServer manages the caching and persistence of cluster-wide metadata. The CatalogServer aggregates the metadata from the Hive Metastore, the NameNode, and potentially additional sources in the future. The CatalogServer uses the Statestore to broadcast metadata updates across the cluster. The CatalogService directly handles executing metadata update requests (DDL requests) from clients via a Thrift interface. The CatalogServer has two main components - a C++ daemon that has the Statestore integration code, Thrift service implementiation, and exporting of the debug webpage/metrics. The other main component is written in Java and manages caching and updating of all metadata. For each Statestore heartbeat, the C++ Server queries the Java metadata cache over JNI to get the current state of the catalog. Any updates are broadcast to the rest of the cluster using the Statestore over the IMPALA_CATALOG_TOPIC. The CatalogServer must be the only writer to the IMPALA_CATALOG_TOPIC, meaning there cannot be multiple CatalogServers running at the same time, as the correctness of delta updates relies upon this assumption. TODO: In the future the CatalogServer could go into a "standby" mode if it detects updates from another writer on the topic. This is a bit tricky because it requires some basic form of leader election.

Definition at line 57 of file catalog-server.h.

Constructor & Destructor Documentation

CatalogServer::CatalogServer ( MetricGroup metrics)

Member Function Documentation

void CatalogServer::BuildTopicUpdates ( const std::vector< TCatalogObject > &  catalog_objects)
private

This function determines what items have been added/removed from the catalog since the last heartbeat and builds the next topic update to send. To do this, it enumerates the given catalog objects returned looking for the objects that have a catalog version that is > the catalog version sent with the last heartbeat. To determine items that have been deleted, it saves the set of topic entry keys sent with the last update and looks at the difference between it and the current set of topic entry keys. The key for each entry is a string composed of: "TCatalogObjectType:<unique object name>". So for table foo.bar, the key would be "TABLE:foo.bar". Encoding the object type information in the key ensures the keys are unique, as well as helps to determine what object type was removed in a state store delta update (since the state store only sends key names for deleted items). Must hold catalog_lock_ when calling this function.

Definition at line 295 of file catalog-server.cc.

References catalog_topic_entry_keys_, impala::Status::GetDetail(), last_sent_catalog_version_, impala::Status::ok(), pending_topic_updates_, impala::ThriftSerializer::Serialize(), impala::TCatalogObjectToEntryKey(), and thrift_serializer_.

Referenced by GatherCatalogUpdatesThread().

Catalog* impala::CatalogServer::catalog ( ) const
inline

Definition at line 72 of file catalog-server.h.

References catalog_.

void CatalogServer::CatalogObjectsUrlCallback ( const Webserver::ArgumentMap args,
rapidjson::Document *  document 
)
private

Debug webpage handler that is used to dump the internal state of catalog objects. The caller specifies a "object_type" and "object_name" parameters and this function will get the matching TCatalogObject struct, if one exists. For example, to dump table "bar" in database "foo": <host>:25020/catalog_objects?object_type=TABLE&object_name=foo.bar

Definition at line 380 of file catalog-server.cc.

References catalog_, impala::Status::GetDetail(), impala::Status::ok(), impala::TCatalogObjectFromObjectName(), and impala::TCatalogObjectTypeFromName().

Referenced by RegisterWebpages().

void CatalogServer::CatalogUrlCallback ( const Webserver::ArgumentMap args,
rapidjson::Document *  document 
)
private

Example output: "databases": [ { "name": "_impala_builtins", "num_tables": 0, "tables": [] }, { "name": "default", "num_tables": 1, "tables": [ { "fqtn": "default.test_table", "name": "test_table" } ] } ]

Definition at line 340 of file catalog-server.cc.

References catalog_, impala::Status::GetDetail(), and impala::Status::ok().

Referenced by RegisterWebpages().

void CatalogServer::GatherCatalogUpdatesThread ( )
private

Executed by the catalog_update_gathering_thread_. Calls into JniCatalog to get the latest set of catalog objects that exist, along with some metadata on each object. The results are stored in the shared catalog_objects_ data structure. Also, explicitly releases free memory back to the OS after each complete iteration.

Definition at line 252 of file catalog-server.cc.

References BuildTopicUpdates(), catalog_, catalog_lock_, catalog_objects_max_version_, catalog_objects_min_version_, catalog_update_cv_, impala::MonotonicStopWatch::ElapsedTime(), impala::Status::GetDetail(), last_sent_catalog_version_, impala::Status::ok(), pending_topic_updates_, impala::MonotonicStopWatch::Start(), topic_processing_time_metric_, topic_updates_ready_, and impala::StatsMetric< T >::Update().

Referenced by Start().

Status CatalogServer::Start ( )
const boost::shared_ptr<CatalogServiceIf>& impala::CatalogServer::thrift_iface ( ) const
inline

Returns the Thrift API interface that proxies requests onto the local CatalogService.

Definition at line 69 of file catalog-server.h.

References thrift_iface_.

void CatalogServer::UpdateCatalogTopicCallback ( const StatestoreSubscriber::TopicDeltaMap incoming_topic_deltas,
std::vector< TTopicDelta > *  subscriber_topic_updates 
)
private

Called during each Statestore heartbeat and is responsible for updating the current set of catalog objects in the IMPALA_CATALOG_TOPIC. Responds to each heartbeat with a delta update containing the set of changes since the last heartbeat. This function finds all catalog objects that have a catalog version greater than the last update sent by calling into the JniCatalog. The topic is updated with any catalog objects that are new or have been modified since the last heartbeat (by comparing the catalog version of the object with last_sent_catalog_version_). Also determines any deletions of catalog objects by looking at the difference of the last set of topic entry keys that were sent and the current set of topic entry keys. At the end of execution it notifies the catalog_update_gathering_thread_ to fetch the next set of updates from the JniCatalog. All updates are added to the subscriber_topic_updates list and sent back to the Statestore.

Definition at line 206 of file catalog-server.cc.

References catalog_lock_, catalog_objects_max_version_, catalog_objects_min_version_, catalog_topic_entry_keys_, catalog_update_cv_, IMPALA_CATALOG_TOPIC, last_sent_catalog_version_, pending_topic_updates_, and topic_updates_ready_.

Referenced by Start().

Member Data Documentation

boost::scoped_ptr<Catalog> impala::CatalogServer::catalog_
private
boost::mutex impala::CatalogServer::catalog_lock_
private

Protects catalog_update_cv_, pending_topic_updates_, catalog_objects_to/from_version_, and last_sent_catalog_version.

Definition at line 96 of file catalog-server.h.

Referenced by GatherCatalogUpdatesThread(), Start(), and UpdateCatalogTopicCallback().

int64_t impala::CatalogServer::catalog_objects_max_version_
private

The max catalog version in pending_topic_updates_. Set by the catalog_update_gathering_thread_ and protected by catalog_lock_.

Definition at line 127 of file catalog-server.h.

Referenced by GatherCatalogUpdatesThread(), and UpdateCatalogTopicCallback().

int64_t impala::CatalogServer::catalog_objects_min_version_
private

The minimum catalog object version in pending_topic_updates_. All items in pending_topic_updates_ will be greater than this version. Set by the catalog_update_gathering_thread_ and protected by catalog_lock_.

Definition at line 123 of file catalog-server.h.

Referenced by GatherCatalogUpdatesThread(), and UpdateCatalogTopicCallback().

boost::unordered_set<std::string> impala::CatalogServer::catalog_topic_entry_keys_
private

Tracks the set of catalog objects that exist via their topic entry key. During each IMPALA_CATALOG_TOPIC heartbeat, stores the set of known catalog objects that exist by their topic entry key. Used to track objects that have been removed since the last heartbeat.

Definition at line 92 of file catalog-server.h.

Referenced by BuildTopicUpdates(), and UpdateCatalogTopicCallback().

boost::condition_variable impala::CatalogServer::catalog_update_cv_
private

Condition variable used to signal when the catalog_update_gathering_thread_ should fetch its next set of updates from the JniCatalog. At the end of each statestore heartbeat, this CV is signaled and the catalog_update_gathering_thread_ starts querying the JniCatalog for catalog objects. Protected by the catalog_lock_.

Definition at line 102 of file catalog-server.h.

Referenced by GatherCatalogUpdatesThread(), Start(), and UpdateCatalogTopicCallback().

boost::scoped_ptr<Thread> impala::CatalogServer::catalog_update_gathering_thread_
private

Thread that polls the catalog for any updates.

Definition at line 86 of file catalog-server.h.

Referenced by Start().

string CatalogServer::IMPALA_CATALOG_TOPIC = "catalog-update"
static
int64_t impala::CatalogServer::last_sent_catalog_version_
private

The last version of the catalog that was sent over a statestore heartbeat. Set in UpdateCatalogTopicCallback() and protected by the catalog_lock_.

Definition at line 118 of file catalog-server.h.

Referenced by BuildTopicUpdates(), GatherCatalogUpdatesThread(), and UpdateCatalogTopicCallback().

MetricGroup* impala::CatalogServer::metrics_
private

Definition at line 78 of file catalog-server.h.

Referenced by CatalogServer(), and Start().

std::vector<TTopicItem> impala::CatalogServer::pending_topic_updates_
private

The latest available set of catalog topic updates (additions/modifications, and deletions). Set by the catalog_update_gathering_thread_ and protected by catalog_lock_.

Definition at line 107 of file catalog-server.h.

Referenced by BuildTopicUpdates(), GatherCatalogUpdatesThread(), and UpdateCatalogTopicCallback().

boost::scoped_ptr<StatestoreSubscriber> impala::CatalogServer::statestore_subscriber_
private

Definition at line 80 of file catalog-server.h.

Referenced by Start().

boost::shared_ptr<CatalogServiceIf> impala::CatalogServer::thrift_iface_
private

Thrift API implementation which proxies requests onto this CatalogService.

Definition at line 76 of file catalog-server.h.

Referenced by thrift_iface().

ThriftSerializer impala::CatalogServer::thrift_serializer_
private

Definition at line 77 of file catalog-server.h.

Referenced by BuildTopicUpdates().

StatsMetric<double>* impala::CatalogServer::topic_processing_time_metric_
private

Metric that tracks the amount of time taken preparing a catalog update.

Definition at line 83 of file catalog-server.h.

Referenced by CatalogServer(), and GatherCatalogUpdatesThread().

bool impala::CatalogServer::topic_updates_ready_
private

Flag used to indicate when new topic updates are ready for processing by the heartbeat thread. Set to false at the end of each heartbeat, before signaling the catalog_update_gathering_thread_. Set to true by the catalog_update_gathering_thread_ when it is done building the latest set of pending_topic_updates_.

Definition at line 114 of file catalog-server.h.

Referenced by GatherCatalogUpdatesThread(), and UpdateCatalogTopicCallback().


The documentation for this class was generated from the following files: