Impala
Impala is the open source, native analytic database for Apache Hadoop.
impala::LibCache Class Reference

#include <lib-cache.h>

Classes

struct  LibCacheEntry
 

Public Types

enum  LibType { TYPE_SO, TYPE_IR, TYPE_JAR }
 

Public Member Functions

 ~LibCache ()
 Calls dlclose on all cached handles.
 
Status GetLocalLibPath (const std::string &hdfs_lib_file, LibType type, std::string *local_path)
 
Status CheckSymbolExists (const std::string &hdfs_lib_file, LibType type, const std::string &symbol, bool quiet=false)
 
Status GetSoFunctionPtr (const std::string &hdfs_lib_file, const std::string &symbol, void **fn_ptr, LibCacheEntry **entry, bool quiet=false)
 If 'quiet' is true, returned error statuses will not be logged.
 
void SetNeedsRefresh (const std::string &hdfs_lib_file)
 
void DecrementUseCount (LibCacheEntry *entry)
 See comment in GetSoFunctionPtr().
 
void RemoveEntry (const std::string &hdfs_lib_file)
 Removes the cache entry for 'hdfs_lib_file'.
 
void DropCache ()
 Removes all cached entries.
 

Static Public Member Functions

static LibCache * instance ()
 
static Status Init ()
 Initializes the libcache. Must be called before any other APIs.
 

Private Types

typedef boost::unordered_map
< std::string, LibCacheEntry * > 
LibMap
 

Private Member Functions

 LibCache ()
 
 LibCache (LibCache const &l)
 
LibCache & operator= (LibCache const &l)
 
Status InitInternal ()
 
Status GetCacheEntry (const std::string &hdfs_lib_file, LibType type, boost::unique_lock< boost::mutex > *entry_lock, LibCacheEntry **entry)
 
Status GetCacheEntryInternal (const std::string &hdfs_lib_file, LibType type, boost::unique_lock< boost::mutex > *entry_lock, LibCacheEntry **entry)
 
std::string MakeLocalPath (const std::string &hdfs_path, const std::string &local_dir)
 
void RemoveEntryInternal (const std::string &hdfs_lib_file, const LibMap::iterator &entry_iterator)
 

Private Attributes

void * current_process_handle_
 dlopen() handle for the current process (i.e. impalad).
 
AtomicInt< int64_t > num_libs_copied_
 
boost::mutex lock_
 
LibMap lib_cache_
 

Static Private Attributes

static boost::scoped_ptr
< LibCache > 
instance_
 Singleton instance. Instantiated in Init().
 

Detailed Description

Process-wide cache of dynamically-linked libraries loaded from HDFS. These libraries can be shared objects, LLVM modules, or jars. For shared objects, we dlopen() the library when it is loaded and keep it in our process. For modules, we store the symbols in the module to service symbol lookups. We can't cache the module itself since it (i.e. the external module) is consumed when it is linked with the query codegen module.

Locking strategy: We don't want to hold one big lock across all operations, since one of the operations is copying a file from HDFS. Holding a single lock during that copy would prevent any UDFs from running on the system. Instead, we have a global lock that is taken when doing the cache lookup but is not held during any blocking calls. During the blocking calls, we take the per-lib lock.

Entry lifetime management: We cannot delete the entry while a query is using the library. When the caller requests a ptr into the library, they are given the entry handle and must decrement the ref count when they are done.

TODO:

  • refresh libraries
  • better cached module management.

Definition at line 53 of file lib-cache.h.
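
The locking strategy described above can be sketched in isolation. The following is a minimal illustration, not the LibCache implementation: TwoLevelCache, SketchEntry and SlowCopy are hypothetical names, std::mutex stands in for boost::mutex, and the slow HDFS copy is simulated. The global lock covers only the map lookup, and the per-entry lock is held during the blocking step. The sketch never acquires the global lock while an entry lock is held, which is consistent with the documented lock ordering.

```cpp
#include <atomic>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a cache entry: one mutex per library.
struct SketchEntry {
  std::mutex lock;
  bool loaded = false;
  std::string local_path;  // empty until the slow copy has happened
};

class TwoLevelCache {
 public:
  // Returns the local path for 'remote_path', copying it on first use.
  std::string GetLocalPath(const std::string& remote_path) {
    std::shared_ptr<SketchEntry> entry;
    {
      // Global lock: held only for the map lookup or insert.
      std::lock_guard<std::mutex> global(lock_);
      std::shared_ptr<SketchEntry>& slot = map_[remote_path];
      if (!slot) slot = std::make_shared<SketchEntry>();
      entry = slot;
    }
    // Global lock released. Only callers of this one library wait here.
    std::lock_guard<std::mutex> entry_lock(entry->lock);
    if (!entry->loaded) {
      entry->local_path = SlowCopy(remote_path);
      entry->loaded = true;
    }
    return entry->local_path;
  }

  int copies() const { return copies_.load(); }

 private:
  // Simulates the blocking copy from remote storage (assumption: the real
  // work would be an HDFS read followed by a local write).
  std::string SlowCopy(const std::string& remote_path) {
    ++copies_;
    return "/tmp/sketch-cache" + remote_path;
  }

  std::mutex lock_;  // protects map_
  std::unordered_map<std::string, std::shared_ptr<SketchEntry>> map_;
  std::atomic<int> copies_{0};
};
```

With this shape, a second request for the same library waits on that library's lock and then reuses the copied path, while requests for other libraries proceed after the short map lookup.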

Member Typedef Documentation

typedef boost::unordered_map<std::string, LibCacheEntry*> impala::LibCache::LibMap
private

Maps HDFS library path => cache entry. Entries in the cache need to be explicitly deleted.

Definition at line 128 of file lib-cache.h.

Member Enumeration Documentation

enum impala::LibCache::LibType

Enumerator
TYPE_SO 
TYPE_IR 
TYPE_JAR 

Definition at line 57 of file lib-cache.h.

Constructor & Destructor Documentation

LibCache::~LibCache ( )

Calls dlclose on all cached handles.

Definition at line 95 of file lib-cache.cc.

References current_process_handle_, DropCache(), and impala::DynamicClose().

LibCache::LibCache ( )
private

Definition at line 92 of file lib-cache.cc.

Referenced by Init().

impala::LibCache::LibCache ( LibCache const &  l)
private

Member Function Documentation

Status LibCache::CheckSymbolExists ( const std::string &  hdfs_lib_file,
LibType  type,
const std::string &  symbol,
bool  quiet = false 
)

Returns status.ok() if the symbol exists in 'hdfs_lib_file', non-ok otherwise. If 'quiet' is true, the error status for symbols not found in non-Java libraries will not be logged.

Definition at line 192 of file lib-cache.cc.

References impala::LibCache::LibCacheEntry::local_path, impala::OK, RETURN_IF_ERROR, impala::LibCache::LibCacheEntry::symbols, and impala::LibCache::LibCacheEntry::type.

Referenced by ResolveSymbolLookup().

void LibCache::DropCache ( )

Removes all cached entries.

Definition at line 262 of file lib-cache.cc.

References lock_.

Referenced by impala::ImpalaServer::CatalogUpdateCallback(), and ~LibCache().

Status LibCache::GetCacheEntry ( const std::string &  hdfs_lib_file,
LibType  type,
boost::unique_lock< boost::mutex > *  entry_lock,
LibCacheEntry **  entry 
)
private

Returns the cache entry for 'hdfs_lib_file'. If this library has not been copied locally, it will copy it and add a new LibCacheEntry to 'lib_cache_'. Result is returned in *entry. No locks should be taken before calling this. On return, the entry's lock is taken and returned in *entry_lock. If an error is returned, there will be no entry in lib_cache_ and *entry is NULL.

Definition at line 279 of file lib-cache.cc.

References impala::Status::ok().

Status LibCache::GetCacheEntryInternal ( const std::string &  hdfs_lib_file,
LibType  type,
boost::unique_lock< boost::mutex > *  entry_lock,
LibCacheEntry **  entry 
)
private

Implementation to get the cache entry for 'hdfs_lib_file'. Errors are returned without evicting the cache entry if the status is not OK and *entry is not NULL.

Definition at line 304 of file lib-cache.cc.

References impala::CopyHdfsFile(), impala::DynamicOpen(), impala::GetLastModificationTime(), lock_, impala::OK, impala::Status::ok(), path(), pool, and RETURN_IF_ERROR.

Status LibCache::GetLocalLibPath ( const std::string &  hdfs_lib_file,
LibType  type,
std::string *  local_path 
)

Gets the local file system path for the library at 'hdfs_lib_file'. If this file is not already on the local fs, it copies it and caches the result. Returns an error if 'hdfs_lib_file' cannot be copied to the local fs.

Definition at line 181 of file lib-cache.cc.

References impala::LibCache::LibCacheEntry::local_path, impala::OK, RETURN_IF_ERROR, and impala::LibCache::LibCacheEntry::type.

Referenced by Java_com_cloudera_impala_service_FeSupport_NativeCacheJar(), and ResolveSymbolLookup().

Status LibCache::GetSoFunctionPtr ( const std::string &  hdfs_lib_file,
const std::string &  symbol,
void **  fn_ptr,
LibCacheEntry **  entry,
bool  quiet = false 
)

Returns a pointer to the function for the given library and symbol. If 'hdfs_lib_file' is empty, the symbol is looked up in the impalad process. Otherwise, 'hdfs_lib_file' should be the HDFS path to a shared library (.so) file. dlopen handles and symbols are cached. Only usable if 'hdfs_lib_file' refers to a shared object. If entry is non-null and *entry is null, *entry will be set to the cached entry. If entry is non-null and *entry is non-null, *entry will be reused (i.e., the use count is not increased). The caller must call DecrementUseCount(*entry) when it is done using fn_ptr; after that call it is no longer valid to use fn_ptr. If 'quiet' is true, returned error statuses will not be logged.

Definition at line 130 of file lib-cache.cc.

References impala::DynamicLookup(), impala::LibCache::LibCacheEntry::lock, impala::OK, RETURN_IF_ERROR, impala::LibCache::LibCacheEntry::shared_object_handle, impala::LibCache::LibCacheEntry::symbol_cache, impala::LibCache::LibCacheEntry::type, and impala::LibCache::LibCacheEntry::use_count.

Referenced by impala::ScalarFnCall::GetFunction(), impala::ScalarFnCall::GetUdf(), and impala::ScalarFnCall::Prepare().
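
The contract for the 'entry' out-parameter can be illustrated with a small model. This is a sketch of the documented use-count rules only. RefEntry, AcquireEntry and ReleaseEntry are hypothetical names, and no library is actually loaded.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-in for a cache entry with a use count.
struct RefEntry {
  std::mutex lock;
  int use_count = 0;
};

// Models the three documented cases for the 'entry' out-parameter:
//  - entry == nullptr:   the caller does not track the entry
//  - *entry == nullptr:  *entry is set and the use count is increased
//  - *entry != nullptr:  the handle is reused, the count is unchanged
void AcquireEntry(RefEntry* cached, RefEntry** entry) {
  if (entry == nullptr) return;
  if (*entry != nullptr) {
    assert(*entry == cached);  // a reused handle must be the same entry
    return;
  }
  std::lock_guard<std::mutex> l(cached->lock);
  ++cached->use_count;
  *entry = cached;
}

// Counterpart of DecrementUseCount(): called once per acquired handle.
void ReleaseEntry(RefEntry* entry) {
  std::lock_guard<std::mutex> l(entry->lock);
  assert(entry->use_count > 0);
  --entry->use_count;
}
```

A caller that resolves several symbols from the same library can pass the same handle each time and release it once when it is done with all of the function pointers.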

Status LibCache::Init ( )
static

Initializes the libcache. Must be called before any other APIs.

Definition at line 100 of file lib-cache.cc.

References instance_, and LibCache().

Referenced by impala::InitCommonRuntime().
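
The Init()/instance() pairing follows a common explicit-initialization singleton shape. The sketch below is a generic illustration under assumptions: SketchCache is a hypothetical class, std::unique_ptr stands in for boost::scoped_ptr, and a bool stands in for Status.

```cpp
#include <memory>

class SketchCache {
 public:
  // Creates the singleton. Must be called before instance() is used.
  static bool Init() {
    instance_.reset(new SketchCache());
    return instance_->InitInternal();
  }

  // Returns nullptr until Init() has been called.
  static SketchCache* instance() { return instance_.get(); }

 private:
  SketchCache() = default;
  // Copying is disallowed so there is only ever one cache.
  SketchCache(const SketchCache&) = delete;
  SketchCache& operator=(const SketchCache&) = delete;

  // Placeholder for setup that can fail (assumption for this sketch).
  bool InitInternal() { return true; }

  static std::unique_ptr<SketchCache> instance_;
};

std::unique_ptr<SketchCache> SketchCache::instance_;
```

Separating Init() from instance() lets startup code report an initialization failure once, instead of every caller having to handle it.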

string LibCache::MakeLocalPath ( const std::string &  hdfs_path,
const std::string &  local_dir 
)
private

Utility function for generating a filename unique to this process and 'hdfs_path'. This is to prevent multiple impalad processes or different library files with the same name from clobbering each other. 'hdfs_path' should be the full path (including the filename) of the file we're going to copy to the local FS, and 'local_dir' is the local directory prefix of the returned path.

Definition at line 414 of file lib-cache.cc.

References path().
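
One way to build such a name is to combine the file name with the process id and a per-process counter. The sketch below assumes a "<stem>.<pid>.<counter><extension>" layout, which is an illustration and not necessarily the format LibCache produces. MakeUniqueLocalPath and sketch_num_copied are hypothetical names, and getpid() assumes a POSIX system.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unistd.h>  // getpid (POSIX)

// Per-process counter, playing the role of num_libs_copied_.
std::atomic<int64_t> sketch_num_copied{0};

std::string MakeUniqueLocalPath(const std::string& hdfs_path,
                                const std::string& local_dir) {
  // Keep only the file name from the remote path.
  std::size_t slash = hdfs_path.find_last_of('/');
  std::string name =
      slash == std::string::npos ? hdfs_path : hdfs_path.substr(slash + 1);

  // Split off the extension so it stays at the end of the result.
  std::size_t dot = name.find_last_of('.');
  std::string stem = name.substr(0, dot);
  std::string ext = dot == std::string::npos ? "" : name.substr(dot);

  // The pid separates processes; the counter separates files that share
  // a name within this process.
  return local_dir + "/" + stem + "." + std::to_string(getpid()) + "." +
         std::to_string(sketch_num_copied.fetch_add(1)) + ext;
}
```

Two libraries with the same file name in different HDFS directories then map to different local paths, and two processes sharing 'local_dir' do not overwrite each other's copies.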

LibCache& impala::LibCache::operator= ( LibCache const &  l)
private

void LibCache::RemoveEntry ( const std::string &  hdfs_lib_file)

Removes the cache entry for 'hdfs_lib_file'.

Definition at line 232 of file lib-cache.cc.

References lock_.

Referenced by impala::ImpalaServer::CatalogUpdateCallback(), impala::CatalogOpExecutor::HandleDropDataSource(), and impala::CatalogOpExecutor::HandleDropFunction().

void LibCache::RemoveEntryInternal ( const std::string &  hdfs_lib_file,
const LibMap::iterator &  entry_iterator 
)
private

Implementation to remove an entry from the cache. lock_ must be held. The entry's lock should not be held.

Definition at line 239 of file lib-cache.cc.

References impala::LibCache::LibCacheEntry::local_path, impala::LibCache::LibCacheEntry::lock, impala::LibCache::LibCacheEntry::should_remove, and impala::LibCache::LibCacheEntry::use_count.
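
The referenced members use_count and should_remove suggest a deferred-removal scheme: an entry that is still in use is unlinked from the map right away and freed by its last user. The sketch below shows that general technique under assumptions. DeferredRemovalCache and RemovableEntry are hypothetical names, and the code is not taken from lib-cache.cc.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <unordered_map>

struct RemovableEntry {
  static int live;  // number of live entries, for demonstration only
  std::mutex lock;
  int use_count = 0;
  bool should_remove = false;
  RemovableEntry() { ++live; }
  ~RemovableEntry() { --live; }
};
int RemovableEntry::live = 0;

class DeferredRemovalCache {
 public:
  ~DeferredRemovalCache() {
    for (auto& kv : map_) delete kv.second;
  }

  // Adds an entry that already has one user.
  RemovableEntry* AddInUse(const std::string& key) {
    std::lock_guard<std::mutex> global(lock_);
    assert(map_.count(key) == 0);
    RemovableEntry* e = new RemovableEntry();
    e->use_count = 1;
    map_[key] = e;
    return e;
  }

  // Unlinks the entry. Frees it only if nobody is using it.
  void Remove(const std::string& key) {
    std::lock_guard<std::mutex> global(lock_);  // global lock first
    auto it = map_.find(key);
    if (it == map_.end()) return;
    RemovableEntry* e = it->second;
    map_.erase(it);
    bool can_delete;
    {
      std::lock_guard<std::mutex> l(e->lock);  // then the entry lock
      e->should_remove = true;
      can_delete = e->use_count == 0;
    }
    if (can_delete) delete e;
  }

  // The last user of an unlinked entry frees it.
  void DecrementUseCount(RemovableEntry* e) {
    bool can_delete;
    {
      std::lock_guard<std::mutex> l(e->lock);
      assert(e->use_count > 0);
      --e->use_count;
      can_delete = e->use_count == 0 && e->should_remove;
    }
    if (can_delete) delete e;
  }

  std::size_t size() {
    std::lock_guard<std::mutex> global(lock_);
    return map_.size();
  }

 private:
  std::mutex lock_;  // protects map_
  std::unordered_map<std::string, RemovableEntry*> map_;
};
```

New lookups stop seeing the entry as soon as Remove() returns, while a running query that still holds the handle keeps a valid entry until it calls DecrementUseCount().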

void LibCache::SetNeedsRefresh ( const std::string &  hdfs_lib_file)

Marks the entry for 'hdfs_lib_file' as needing to be refreshed if the file in HDFS is newer than the locally cached copy. The refresh will occur the next time the entry is accessed.

Definition at line 221 of file lib-cache.cc.

References impala::LibCache::LibCacheEntry::check_needs_refresh, impala::LibCache::LibCacheEntry::lock, and lock_.

Referenced by impala::ImpalaServer::CatalogUpdateCallback(), and ResolveSymbolLookup().
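
The mark-now, check-on-access behavior can be sketched as a flag plus a modification-time comparison. This is an illustration under assumptions: StaleEntry, copied_mtime, MarkNeedsRefresh and RefreshIfStale are hypothetical names, and the remote modification time is passed in directly instead of being read from HDFS.

```cpp
#include <cstdint>

struct StaleEntry {
  bool check_needs_refresh = false;
  int64_t copied_mtime = 0;  // remote mtime when the local copy was made
  int reloads = 0;           // how many times the copy was replaced
};

// Cheap: only sets a flag, no remote I/O happens here.
void MarkNeedsRefresh(StaleEntry* e) { e->check_needs_refresh = true; }

// Called on the next access. Returns true if the local copy was replaced.
bool RefreshIfStale(StaleEntry* e, int64_t remote_mtime) {
  if (!e->check_needs_refresh) return false;
  e->check_needs_refresh = false;
  if (remote_mtime <= e->copied_mtime) return false;  // copy is current
  e->copied_mtime = remote_mtime;  // stands in for re-copying the file
  ++e->reloads;
  return true;
}
```

Deferring the comparison to the next access keeps the marking step cheap, so it can be called from update callbacks without blocking on remote storage.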

Member Data Documentation

void* impala::LibCache::current_process_handle_
private

dlopen() handle for the current process (i.e. impalad).

Definition at line 116 of file lib-cache.h.

Referenced by InitInternal(), and ~LibCache().

scoped_ptr< LibCache > LibCache::instance_
static private

Singleton instance. Instantiated in Init().

Definition at line 113 of file lib-cache.h.

Referenced by Init(), and instance().

LibMap impala::LibCache::lib_cache_
private

Definition at line 129 of file lib-cache.h.

boost::mutex impala::LibCache::lock_
private

Protects lib_cache_. For lock ordering, this lock must always be taken before the per entry lock.

Definition at line 124 of file lib-cache.h.

AtomicInt<int64_t> impala::LibCache::num_libs_copied_
private

The number of libs that have been copied from HDFS to the local FS. This is appended to the local FS path to avoid collisions.

Definition at line 120 of file lib-cache.h.


The documentation for this class was generated from the following files: