Impala
Impala is the open source, native analytic database for Apache Hadoop.
impala::LlvmCodeGen Class Reference

LLVM code generator. This is the top level object to generate jitted code.

#include <llvm-codegen.h>

Classes

class  FnPrototype
 
struct  NamedVariable
 Utility struct that wraps a variable name and llvm type.
 

Public Types

typedef llvm::IRBuilder LlvmBuilder
 Typedef builder in case we want to change the template arguments later.
 

Public Member Functions

 ~LlvmCodeGen ()
 Removes all jit compiled dynamically linked functions from the process.
 
RuntimeProfile * runtime_profile ()
 
RuntimeProfile::Counter * codegen_timer ()
 
void EnableOptimizations (bool enable)
 Turns on/off optimization passes.
 
std::string GetIR (bool full_module) const
 
llvm::PointerType * GetPtrType (llvm::Type *type)
 Return a pointer type to 'type'.
 
llvm::Type * GetType (const ColumnType &type)
 Returns llvm type for the column type.
 
llvm::PointerType * GetPtrType (const ColumnType &type)
 Return a pointer type to 'type' (e.g. int16_t*)
 
llvm::Type * GetType (const std::string &name)
 
llvm::PointerType * GetPtrType (const std::string &name)
 Returns the pointer type of the type returned by GetType(name)
 
llvm::LLVMContext & context ()
 
llvm::ExecutionEngine * execution_engine ()
 Returns execution engine interface.
 
llvm::Module * module ()
 Returns the underlying llvm module.
 
void RegisterExprFn (int64_t id, llvm::Function *function)
 
llvm::Function * GetRegisteredExprFn (int64_t id)
 Returns a registered expr function for id or NULL if it does not exist.
 
Status FinalizeModule ()
 
llvm::Function * ReplaceCallSites (llvm::Function *caller, bool update_in_place, llvm::Function *new_fn, const std::string &target_name, int *num_replaced)
 
llvm::Function * CloneFunction (llvm::Function *fn)
 Returns a copy of fn. The copy is added to the module.
 
void ReplaceInstWithValue (llvm::Instruction *from, llvm::Value *to)
 
llvm::Argument * GetArgument (llvm::Function *fn, int i)
 Returns the i-th argument of fn.
 
llvm::Function * FinalizeFunction (llvm::Function *function)
 
int InlineCallSites (llvm::Function *fn, bool skip_registered_fns)
 
llvm::Function * OptimizeFunctionWithExprs (llvm::Function *fn)
 
void AddFunctionToJit (llvm::Function *fn, void **fn_ptr)
 
bool VerifyFunction (llvm::Function *function)
 
void CodegenDebugTrace (LlvmBuilder *builder, const char *message)
 
llvm::Function * GetLibCFunction (FnPrototype *prototype)
 Returns the libc function, adding it to the module if it has not already been.
 
llvm::Function * GetFunction (IRFunction::Type)
 
llvm::Function * GetHashFunction (int num_bytes=-1)
 
llvm::Function * GetFnvHashFunction (int num_bytes=-1)
 
llvm::Function * GetMurmurHashFunction (int num_bytes=-1)
 
llvm::AllocaInst * CreateEntryBlockAlloca (llvm::Function *f, const NamedVariable &var)
 
llvm::AllocaInst * CreateEntryBlockAlloca (const LlvmBuilder &builder, llvm::Type *type, const char *name="")
 
void CreateIfElseBlocks (llvm::Function *fn, const std::string &if_name, const std::string &else_name, llvm::BasicBlock **if_block, llvm::BasicBlock **else_block, llvm::BasicBlock *insert_before=NULL)
 
llvm::Value * CastPtrToLlvmPtr (llvm::Type *type, const void *ptr)
 
llvm::Value * GetIntConstant (PrimitiveType type, int64_t val)
 Returns the constant 'val' of 'type'.
 
llvm::Value * true_value ()
 Returns true/false constants (bool type)
 
llvm::Value * false_value ()
 
llvm::Value * null_ptr_value ()
 
llvm::Type * boolean_type ()
 Simple wrappers to reduce code verbosity.
 
llvm::Type * tinyint_type ()
 
llvm::Type * smallint_type ()
 
llvm::Type * int_type ()
 
llvm::Type * bigint_type ()
 
llvm::Type * float_type ()
 
llvm::Type * double_type ()
 
llvm::Type * string_val_type ()
 
llvm::PointerType * ptr_type ()
 
llvm::Type * void_type ()
 
llvm::Type * i128_type ()
 
void GetFunctions (std::vector< llvm::Function * > *functions)
 
void GetSymbols (boost::unordered_set< std::string > *symbols)
 Fills in 'symbols' with all the symbols in the module.
 
llvm::Function * CodegenMinMax (const ColumnType &type, bool min)
 Generates function to return min/max(v1, v2)
 
void CodegenMemcpy (LlvmBuilder *, llvm::Value *dst, llvm::Value *src, int size)
 
Status LinkModule (const std::string &file)
 

Static Public Member Functions

static void InitializeLlvm (bool load_backend=false)
 
static Status LoadImpalaIR (ObjectPool *, const std::string &id, boost::scoped_ptr< LlvmCodeGen > *codegen)
 
static Status LoadFromFile (ObjectPool *, const std::string &file, const std::string &id, boost::scoped_ptr< LlvmCodeGen > *codegen)
 
template<typename T >
static std::string Print (T *value_or_type)
 Returns the string representation of a llvm::Value* or llvm::Type*.
 
static Status LoadModule (LlvmCodeGen *codegen, const std::string &file, llvm::Module **module)
 

Private Member Functions

 LlvmCodeGen (ObjectPool *pool, const std::string &module_id)
 Top level codegen object. 'module_id' is used for debugging when outputting the IR.
 
Status Init ()
 Initializes the jitter and execution engine.
 
Status LoadIntrinsics ()
 
void * JitFunction (llvm::Function *function)
 
void OptimizeModule ()
 Optimizes the module. This includes pruning the module of any unused functions.
 
void ClearHashFns ()
 Clears generated hash fns. This is only used for testing.
 

Private Attributes

std::string id_
 ID used for debugging (can be e.g. the fragment instance ID)
 
RuntimeProfile profile_
 Codegen counters.
 
RuntimeProfile::Counter * load_module_timer_
 Time spent reading the .ir file from the file system.
 
RuntimeProfile::Counter * prepare_module_timer_
 Time spent constructing the in-memory module from the .ir file.
 
RuntimeProfile::Counter * codegen_timer_
 Time spent doing codegen (adding IR to the module)
 
RuntimeProfile::Counter * optimization_timer_
 Time spent optimizing the module.
 
RuntimeProfile::Counter * compile_timer_
 Time spent compiling the module.
 
RuntimeProfile::Counter * module_file_size_
 
bool optimizations_enabled_
 Whether or not optimizations are enabled.
 
bool is_corrupt_
 
bool is_compiled_
 
std::string error_string_
 Error string that llvm will write to.
 
boost::scoped_ptr< llvm::LLVMContext > context_
 
llvm::Module * module_
 
boost::scoped_ptr< llvm::ExecutionEngine > execution_engine_
 Execution/Jitting engine.
 
std::map< llvm::Function *, bool > jitted_functions_
 
boost::mutex jitted_functions_lock_
 Lock protecting jitted_functions_.
 
std::map< std::string, llvm::Function * > external_functions_
 
std::vector< llvm::Function * > loaded_functions_
 Functions parsed from pre-compiled module. Indexed by ImpalaIR::Function enum.
 
std::vector< llvm::Function * > codegend_functions_
 
std::map< int64_t, llvm::Function * > registered_exprs_map_
 A mapping of unique id to registered expr functions.
 
std::set< llvm::Function * > registered_exprs_
 A set of all the functions in 'registered_exprs_map_' for quick lookup.
 
std::map< llvm::Intrinsic::ID, llvm::Function * > llvm_intrinsics_
 A cache of loaded llvm intrinsics.
 
std::map< int, llvm::Function * > hash_fns_
 
std::set< std::string > linked_modules_
 
std::vector< std::pair< llvm::Function *, void ** > > fns_to_jit_compile_
 The vector of functions to automatically JIT compile after FinalizeModule().
 
llvm::Function * debug_trace_fn_
 
std::vector< std::string > debug_strings_
 
llvm::PointerType * ptr_type_
 llvm representation of a few common types. Owned by context.
 
llvm::Type * void_type_
 
llvm::Type * string_val_type_
 
llvm::Type * timestamp_val_type_
 
llvm::Value * true_value_
 llvm constants to help with code gen verbosity
 
llvm::Value * false_value_
 

Friends

class LlvmCodeGenTest
 
class SubExprElimination
 

Detailed Description

LLVM code generator. This is the top level object to generate jitted code.

LLVM provides a C++ IR builder interface so IR does not need to be written manually. The interface is very low level, so each line of IR that needs to be output maps 1:1 to a call to the interface. The LLVM documentation is not fantastic and a lot of this was figured out by experimenting. Thankfully, the API is pretty well designed, so it's possible to get by without great documentation. The LLVM tutorial at http://llvm.org/docs/tutorial/LangImpl1.html is very helpful; it goes over how to JIT an AST for a toy language. It is also helpful to use the online app at http://llvm.org/demo/index.cgi, which lets you compile C/C++ to IR.

This class provides two interfaces, one for testing and one for the query engine. The interface for the query engine will load the cross-compiled IR module (output during the build) and extract all of the functions that will be called directly. The test interface can be used to load any precompiled module or none at all (but this class will not validate the module).

This class is mostly not threadsafe. During the Prepare() phase of the fragment execution, nodes should codegen functions and register those functions with AddFunctionToJit(). Afterward, FinalizeModule() should be called, at which point all codegened functions are optimized. After FinalizeModule() returns, all function pointers registered with AddFunctionToJit() will be pointing to the appropriate JIT'd function.

Currently, each query will create and initialize one of these objects. This requires loading and parsing the cross-compiled modules. TODO: we should be able to do this once per process and let llvm compile functions from across modules.

LLVM has a nontrivial memory management scheme and objects will take ownership of others. The documentation is pretty good about being explicit with this but it is not very intuitive.

TODO: look into diagnostic output and debuggability
TODO: confirm that the multi-threaded usage is correct

Definition at line 107 of file llvm-codegen.h.

Member Typedef Documentation

Typedef builder in case we want to change the template arguments later.

Definition at line 146 of file llvm-codegen.h.

Constructor & Destructor Documentation

impala::LlvmCodeGen::~LlvmCodeGen ( )

Removes all jit compiled dynamically linked functions from the process.

Definition at line 288 of file llvm-codegen.cc.

References execution_engine_, and jitted_functions_.

impala::LlvmCodeGen::LlvmCodeGen ( ObjectPool *  pool,
const std::string &  module_id 
)
private

Top level codegen object. 'module_id' is used for debugging when outputting the IR.

Definition at line 99 of file llvm-codegen.cc.

References ADD_COUNTER, ADD_TIMER, codegen_timer_, compile_timer_, impala::llvm_initialized, load_module_timer_, loaded_functions_, module_file_size_, optimization_timer_, prepare_module_timer_, and profile_.

Referenced by LoadFromFile().

Member Function Documentation

void impala::LlvmCodeGen::AddFunctionToJit ( llvm::Function *  fn,
void **  fn_ptr 
)

Adds the function to be automatically jit compiled after the module is optimized. That is, after FinalizeModule(), this will do *fn_ptr = JitFunction(fn). This is useful since it is not valid to call JitFunction() before every part of the query has finished adding its IR, and it's convenient to not have to rewalk the objects. This provides the same behavior as walking each of those objects and calling JitFunction().

In addition, any functions not registered with AddFunctionToJit() are marked as internal in FinalizeModule() and may be removed as part of optimization.

This will also wrap functions returning DecimalVals in an ABI-compliant wrapper (see the comment in the .cc file for details). This is so we don't accidentally try to call non-compliant code from native code.
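
The deferred registration described above can be illustrated with a small stand-alone model. This is only a sketch: ToyFunction, ToyCodeGen, AddOne and RunDeferredJitDemo are hypothetical names, no LLVM is involved, and "compilation" is simulated by handing back the address of an ordinary native function. It shows only the ordering contract, namely that the registered slot is untouched until FinalizeModule() runs.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical stand-in for llvm::Function: it only carries the address that
// "compilation" would produce.
struct ToyFunction {
  void* compiled_address;
};

// Hypothetical model of the registration workflow; not the real class.
class ToyCodeGen {
 public:
  // Remembers the function and the slot to patch. Nothing is compiled yet.
  void AddFunctionToJit(ToyFunction* fn, void** fn_ptr) {
    fns_to_jit_compile_.push_back(std::make_pair(fn, fn_ptr));
  }

  // Stand-in for "optimize and compile": fills every registered slot.
  void FinalizeModule() {
    for (auto& entry : fns_to_jit_compile_) {
      *entry.second = JitFunction(entry.first);
    }
    fns_to_jit_compile_.clear();
  }

 private:
  void* JitFunction(ToyFunction* fn) { return fn->compiled_address; }

  std::vector<std::pair<ToyFunction*, void**>> fns_to_jit_compile_;
};

// A native function we pretend was produced by the JIT.
inline int AddOne(int x) { return x + 1; }

typedef int (*AddOneFn)(int);

// Returns -1 if the slot was patched too early, otherwise the result of
// calling through the slot after FinalizeModule().
inline int RunDeferredJitDemo() {
  ToyFunction fn = {reinterpret_cast<void*>(&AddOne)};
  void* slot = nullptr;
  ToyCodeGen codegen;
  codegen.AddFunctionToJit(&fn, &slot);
  if (slot != nullptr) return -1;
  codegen.FinalizeModule();
  return reinterpret_cast<AddOneFn>(slot)(41);
}
```

In this model a caller keeps the void* slot as a member and casts it to the expected function pointer type only after finalization.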

Definition at line 714 of file llvm-codegen.cc.

References impala::LlvmCodeGen::FnPrototype::AddArgument(), context(), FinalizeFunction(), fns_to_jit_compile_, GetType(), impala::CodegenAnyVal::LLVM_DECIMALVAL_NAME, impala::name, and void_type_.

Referenced by impala::PartitionedHashJoinNode::CodegenProcessBuildBatch(), impala::PartitionedHashJoinNode::CodegenProcessProbeBatch(), impala::ScalarFnCall::GetFunction(), impala::HashJoinNode::Prepare(), impala::AggregationNode::Prepare(), impala::ScalarFnCall::Prepare(), impala::PartitionedAggregationNode::Prepare(), and impala::HdfsScanNode::Prepare().

llvm::Type* impala::LlvmCodeGen::bigint_type ( )
inline
llvm::Type* impala::LlvmCodeGen::boolean_type ( )
inline
void impala::LlvmCodeGen::ClearHashFns ( )
private

Clears generated hash fns. This is only used for testing.

Definition at line 958 of file llvm-codegen.cc.

References hash_fns_.

Referenced by impala::LlvmCodeGenTest::ClearHashFns().

Function * impala::LlvmCodeGen::CloneFunction ( llvm::Function *  fn)

Returns a copy of fn. The copy is added to the module.

Definition at line 529 of file llvm-codegen.cc.

References module_.

Referenced by impala::PartitionedHashJoinNode::CodegenProcessProbeBatch(), impala::GetLenOptimizedHashFn(), and ReplaceCallSites().

void impala::LlvmCodeGen::CodegenDebugTrace ( LlvmBuilder *  builder,
const char *  message 
)

This will generate a printf call instruction to output 'message' at the builder's insert point. Only for debugging.

Definition at line 765 of file llvm-codegen.cc.

References CastPtrToLlvmPtr(), debug_strings_, debug_trace_fn_, impala::DebugTrace(), execution_engine_, module_, ptr_type_, and void_type_.

Referenced by impala::CodegenInnerLoop().

void impala::LlvmCodeGen::CodegenMemcpy ( LlvmBuilder * ,
llvm::Value *  dst,
llvm::Value *  src,
int  size 
)

Codegen to call the llvm memcpy intrinsic at the current builder location. dst and src must be pointer types. size is the number of bytes to copy. No-op if size is zero.

Definition at line 933 of file llvm-codegen.cc.

References false_value(), GetIntConstant(), llvm_intrinsics_, ptr_type(), and impala::TYPE_INT.

Referenced by impala::HashJoinNode::CodegenCreateOutputRow(), impala::PartitionedHashJoinNode::CodegenCreateOutputRow(), and impala::HdfsScanner::CodegenWriteCompleteTuple().

llvm::LLVMContext& impala::LlvmCodeGen::context ( )
inline

Returns reference to llvm context object. Each LlvmCodeGen has its own context to allow multiple threads to be calling into llvm at the same time.

Definition at line 214 of file llvm-codegen.h.

References context_.

Referenced by AddFunctionToJit(), CastPtrToLlvmPtr(), CodegenAssignNullValue(), impala::CompoundPredicate::CodegenComputeFn(), CodegenCrcHash(), impala::HashJoinNode::CodegenCreateOutputRow(), impala::PartitionedHashJoinNode::CodegenCreateOutputRow(), impala::HashTableCtx::CodegenEquals(), impala::OldHashTable::CodegenEquals(), impala::ExecNode::CodegenEvalConjuncts(), impala::HashTableCtx::CodegenEvalRow(), impala::OldHashTable::CodegenEvalTupleRow(), impala::HashTableCtx::CodegenHashCurrentRow(), impala::OldHashTable::CodegenHashCurrentRow(), impala::CodegenInnerLoop(), impala::SlotDescriptor::CodegenIsNull(), impala::HdfsAvroScanner::CodegenMaterializeTuple(), CodegenMinMax(), impala::CodegenStringTest(), impala::SlotDescriptor::CodegenUpdateNull(), impala::AggregationNode::CodegenUpdateTuple(), impala::PartitionedAggregationNode::CodegenUpdateTuple(), impala::HdfsScanner::CodegenWriteCompleteTuple(), impala::TextConverter::CodegenWriteSlot(), CreateIfElseBlocks(), impala::TupleDescriptor::GenerateLlvmStruct(), impala::NullLiteral::GetCodegendComputeFn(), impala::CaseExpr::GetCodegendComputeFn(), impala::SlotRef::GetCodegendComputeFn(), impala::Literal::GetCodegendComputeFn(), impala::ScalarFnCall::GetCodegendComputeFn(), impala::Expr::GetCodegendComputeFnWrapper(), GetHashFunction(), impala::CodegenAnyVal::GetHighBits(), GetIntConstant(), GetType(), i128_type(), Init(), LoadModule(), and impala::CodegenAnyVal::SetHighBits().

llvm::AllocaInst* impala::LlvmCodeGen::CreateEntryBlockAlloca ( llvm::Function *  f,
const NamedVariable &  var 
)

Allocate stack storage for local variables. This is similar to traditional C, where all the variables must be declared at the top of the function. This helper can be called from anywhere and will add a stack allocation for 'var' at the beginning of the function. This would be used, for example, if a function needed a temporary struct allocated. The allocated variable is scoped to the function. This should always be used instead of calling LlvmBuilder::CreateAlloca directly. LLVM doesn't optimize allocas occurring in the middle of functions very well (e.g., an alloca may end up in a loop, potentially blowing the stack).

Referenced by impala::TextConverter::CodegenWriteSlot(), impala::CodegenAnyVal::CreateCall(), impala::ScalarFnCall::GetCodegendComputeFn(), and impala::CodegenAnyVal::GetUnloweredPtr().

llvm::AllocaInst* impala::LlvmCodeGen::CreateEntryBlockAlloca ( const LlvmBuilder &  builder,
llvm::Type *  type,
const char *  name = "" 
)
void impala::LlvmCodeGen::CreateIfElseBlocks ( llvm::Function *  fn,
const std::string &  if_name,
const std::string &  else_name,
llvm::BasicBlock **  if_block,
llvm::BasicBlock **  else_block,
llvm::BasicBlock *  insert_before = NULL 
)

Utility to create two blocks in 'fn' for if/else codegen. if_block and else_block are return parameters. insert_before is optional; if set, the two blocks will be inserted before that block, otherwise they will be inserted at the end of 'fn'. Being able to place blocks is useful for debugging so the IR has better-looking control flow.

Definition at line 405 of file llvm-codegen.cc.

References context().

Referenced by CodegenMinMax(), and impala::TextConverter::CodegenWriteSlot().

llvm::Type* impala::LlvmCodeGen::double_type ( )
inline

Definition at line 391 of file llvm-codegen.h.

References GetType(), and impala::TYPE_DOUBLE.

Referenced by impala::CodegenAnyVal::GetLoweredType().

void impala::LlvmCodeGen::EnableOptimizations ( bool  enable)

Turns on/off optimization passes.

Definition at line 295 of file llvm-codegen.cc.

References optimizations_enabled_.

Referenced by Java_com_cloudera_impala_service_FeSupport_NativeEvalConstExprs().

llvm::ExecutionEngine* impala::LlvmCodeGen::execution_engine ( )
inline

Returns execution engine interface.

Definition at line 217 of file llvm-codegen.h.

References execution_engine_.

Referenced by impala::TupleDescriptor::GenerateLlvmStruct(), impala::ScalarFnCall::GetUdf(), and LoadImpalaIR().

Function * impala::LlvmCodeGen::FinalizeFunction ( llvm::Function *  function)

Verify and optimize function. This should be called at the end for each codegen'd function. If the function does not verify, it will delete the function and return NULL, otherwise, it will optimize and return the function object.

Definition at line 596 of file llvm-codegen.cc.

References VerifyFunction().

Referenced by AddFunctionToJit(), impala::CompoundPredicate::CodegenComputeFn(), CodegenCrcHash(), impala::HashJoinNode::CodegenCreateOutputRow(), impala::PartitionedHashJoinNode::CodegenCreateOutputRow(), impala::HashTableCtx::CodegenEquals(), impala::OldHashTable::CodegenEquals(), impala::ExecNode::CodegenEvalConjuncts(), impala::HashTableCtx::CodegenEvalRow(), impala::OldHashTable::CodegenEvalTupleRow(), impala::HashTableCtx::CodegenHashCurrentRow(), impala::OldHashTable::CodegenHashCurrentRow(), impala::SlotDescriptor::CodegenIsNull(), impala::HdfsAvroScanner::CodegenMaterializeTuple(), impala::SlotDescriptor::CodegenUpdateNull(), impala::AggregationNode::CodegenUpdateTuple(), impala::PartitionedAggregationNode::CodegenUpdateTuple(), impala::HdfsScanner::CodegenWriteAlignedTuples(), impala::HdfsScanner::CodegenWriteCompleteTuple(), impala::TextConverter::CodegenWriteSlot(), impala::NullLiteral::GetCodegendComputeFn(), impala::CaseExpr::GetCodegendComputeFn(), impala::SlotRef::GetCodegendComputeFn(), impala::Literal::GetCodegendComputeFn(), impala::ScalarFnCall::GetCodegendComputeFn(), impala::Expr::GetCodegendComputeFnWrapper(), GetHashFunction(), impala::GetLenOptimizedHashFn(), and OptimizeFunctionWithExprs().

Status impala::LlvmCodeGen::FinalizeModule ( )

Optimize and compile the module. This should be called after all functions to JIT have been added to the module via AddFunctionToJit(). If optimizations_enabled_ is false, the module will not be optimized before compilation.

Definition at line 607 of file llvm-codegen.cc.

References compile_timer_, fns_to_jit_compile_, GetIR(), id_, is_compiled_, is_corrupt_, JitFunction(), impala::Status::OK, optimizations_enabled_, OptimizeModule(), path(), profile_, SCOPED_TIMER, and impala::RuntimeProfile::total_time_counter().

Referenced by Java_com_cloudera_impala_service_FeSupport_NativeEvalConstExprs(), and impala::PlanFragmentExecutor::OptimizeLlvmModule().

llvm::Type* impala::LlvmCodeGen::float_type ( )
inline

Definition at line 390 of file llvm-codegen.h.

References GetType(), and impala::TYPE_FLOAT.

Referenced by impala::CodegenAnyVal::GetVal().

Argument * impala::LlvmCodeGen::GetArgument ( llvm::Function *  fn,
int  i 
)

Returns the i-th argument of fn.

Definition at line 1105 of file llvm-codegen.cc.

Referenced by impala::PartitionedHashJoinNode::CodegenProcessProbeBatch(), and impala::GetLenOptimizedHashFn().

Function * impala::LlvmCodeGen::GetFnvHashFunction ( int  num_bytes = -1)

Definition at line 1092 of file llvm-codegen.cc.

References impala::GetLenOptimizedHashFn().

void impala::LlvmCodeGen::GetFunctions ( std::vector< llvm::Function * > *  functions)

Fills 'functions' with all the functions that are defined in the module. Note: this does not include functions that are just declared.

Definition at line 800 of file llvm-codegen.cc.

References module_.

Referenced by LoadImpalaIR().

Function * impala::LlvmCodeGen::GetHashFunction ( int  num_bytes = -1)

Returns the hash function with signature: int32_t Hash(int8_t* data, int len, int32_t seed); If num_bytes is given (that is, not the default of -1), the returned function will be codegen'd to only work for that number of bytes. It is invalid to call that function with a different 'len'.
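
To make the signature and the fixed-length contract concrete, here is a minimal sketch. It uses plain 32-bit FNV-1a as a stand-in, which is an assumption for illustration and not necessarily the algorithm the generated function uses. HashFixed is a hypothetical name; the template parameter plays the role of a 'num_bytes' known ahead of time, which lets the compiler unroll the loop.

```cpp
#include <cassert>
#include <cstdint>

// 32-bit FNV prime. The algorithm choice is illustrative only.
static const uint32_t kFnvPrime = 16777619u;

// Generic version: 'len' is only known at run time.
inline int32_t Hash(const int8_t* data, int len, int32_t seed) {
  uint32_t h = static_cast<uint32_t>(seed);
  for (int i = 0; i < len; ++i) {
    h ^= static_cast<uint8_t>(data[i]);
    h *= kFnvPrime;
  }
  return static_cast<int32_t>(h);
}

// Length-specialized version: NumBytes is a compile-time constant. 'len' is
// kept so the signature matches, but it must equal NumBytes.
template <int NumBytes>
inline int32_t HashFixed(const int8_t* data, int len, int32_t seed) {
  assert(len == NumBytes);
  (void)len;  // unused when assertions are compiled out
  uint32_t h = static_cast<uint32_t>(seed);
  for (int i = 0; i < NumBytes; ++i) {
    h ^= static_cast<uint8_t>(data[i]);
    h *= kFnvPrime;
  }
  return static_cast<int32_t>(h);
}
```

For a matching length the two versions agree; calling HashFixed<4> with any other 'len' violates the contract, which the sketch checks with an assertion.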

Definition at line 985 of file llvm-codegen.cc.

References impala::LlvmCodeGen::FnPrototype::AddArgument(), context(), FinalizeFunction(), GetFunction(), GetIntConstant(), GetMurmurHashFunction(), GetPtrType(), GetType(), hash_fns_, impala::CpuInfo::IsSupported(), llvm_intrinsics_, ptr_type(), impala::CpuInfo::SSE4_2, impala::TYPE_BIGINT, impala::TYPE_INT, and impala::TYPE_SMALLINT.

Referenced by CodegenCrcHash(), impala::HashTableCtx::CodegenHashCurrentRow(), and impala::OldHashTable::CodegenHashCurrentRow().

string impala::LlvmCodeGen::GetIR ( bool  full_module) const

For debugging. Returns the IR that was generated. If full_module, the entire module is dumped, including what was loaded from precompiled IR. If false, only output IR for functions which were generated.

Definition at line 299 of file llvm-codegen.cc.

References codegend_functions_, and module_.

Referenced by FinalizeModule().

Function * impala::LlvmCodeGen::GetLibCFunction ( FnPrototype *  prototype)

Returns the libc function, adding it to the module if it has not already been.

Definition at line 412 of file llvm-codegen.cc.

References external_functions_, impala::LlvmCodeGen::FnPrototype::GeneratePrototype(), and impala::LlvmCodeGen::FnPrototype::name().

Function * impala::LlvmCodeGen::GetMurmurHashFunction ( int  num_bytes = -1)
PointerType * impala::LlvmCodeGen::GetPtrType ( const ColumnType &  type)

Return a pointer type to 'type' (e.g. int16_t*)

Definition at line 344 of file llvm-codegen.cc.

References GetType().

llvm::PointerType* impala::LlvmCodeGen::GetPtrType ( const std::string &  name)

Returns the pointer type of the type returned by GetType(name)

llvm::Function* impala::LlvmCodeGen::GetRegisteredExprFn ( int64_t  id)
inline

Returns a registered expr function for id or NULL if it does not exist.

Definition at line 231 of file llvm-codegen.h.

References registered_exprs_map_.

Referenced by impala::SlotRef::GetCodegendComputeFn().

void impala::LlvmCodeGen::GetSymbols ( boost::unordered_set< std::string > *  symbols)

Fills in 'symbols' with all the symbols in the module.

Definition at line 808 of file llvm-codegen.cc.

References module_.

Type * impala::LlvmCodeGen::GetType ( const ColumnType &  type)

Returns llvm type for the column type.

Definition at line 312 of file llvm-codegen.cc.

References context(), impala::ColumnType::GetByteSize(), string_val_type_, timestamp_val_type_, impala::ColumnType::type, impala::TYPE_BIGINT, impala::TYPE_BOOLEAN, impala::TYPE_CHAR, impala::TYPE_DECIMAL, impala::TYPE_DOUBLE, impala::TYPE_FLOAT, impala::TYPE_INT, impala::TYPE_NULL, impala::TYPE_SMALLINT, impala::TYPE_STRING, impala::TYPE_TIMESTAMP, impala::TYPE_TINYINT, and impala::TYPE_VARCHAR.

Referenced by AddFunctionToJit(), bigint_type(), boolean_type(), impala::CompoundPredicate::CodegenComputeFn(), CodegenCrcHash(), impala::HashJoinNode::CodegenCreateOutputRow(), impala::PartitionedHashJoinNode::CodegenCreateOutputRow(), impala::HashTableCtx::CodegenEquals(), impala::OldHashTable::CodegenEquals(), impala::ExecNode::CodegenEvalConjuncts(), impala::HashTableCtx::CodegenEvalRow(), impala::OldHashTable::CodegenEvalTupleRow(), impala::HashTableCtx::CodegenHashCurrentRow(), impala::OldHashTable::CodegenHashCurrentRow(), impala::SlotDescriptor::CodegenIsNull(), impala::HdfsAvroScanner::CodegenMaterializeTuple(), CodegenMinMax(), impala::CodegenStringTest(), impala::AggregationNode::CodegenUpdateTuple(), impala::PartitionedAggregationNode::CodegenUpdateTuple(), impala::HdfsScanner::CodegenWriteCompleteTuple(), impala::TextConverter::CodegenWriteSlot(), impala::CodegenAnyVal::CreateCall(), double_type(), float_type(), impala::TupleDescriptor::GenerateLlvmStruct(), impala::SlotRef::GetCodegendComputeFn(), GetHashFunction(), impala::CodegenAnyVal::GetLoweredType(), GetPtrType(), impala::ScalarFnCall::GetUdf(), impala::CodegenAnyVal::GetUnloweredType(), impala::CodegenAnyVal::GetVal(), Init(), int_type(), LoadImpalaIR(), LoadIntrinsics(), impala::CodegenAnyVal::SetFromRawValue(), smallint_type(), tinyint_type(), and impala::CodegenAnyVal::ToNativeValue().

llvm::Type* impala::LlvmCodeGen::GetType ( const std::string &  name)

Returns the type with 'name'. This is used to pull types from clang compiled IR. The types we generate at runtime are unnamed. The name is generated by the clang compiler in this form: <class/struct>.<namespace>::<class name>. For example: "class.impala::AggregationNode"
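
The naming form above is simple enough to build by hand. The helper below is a hypothetical convenience, not part of this class; it just concatenates the three parts in the stated order.

```cpp
#include <cassert>
#include <string>

// Builds a name of the form "<class/struct>.<namespace>::<class name>".
// 'kind' is expected to be "class" or "struct"; this is not validated.
inline std::string ClangTypeName(const std::string& kind,
                                 const std::string& ns,
                                 const std::string& class_name) {
  return kind + "." + ns + "::" + class_name;
}
```

The resulting string is what would be passed as 'name' to GetType().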

llvm::Type* impala::LlvmCodeGen::i128_type ( )
inline

Definition at line 395 of file llvm-codegen.h.

References context().

Referenced by impala::CodegenAnyVal::SetVal().

void impala::LlvmCodeGen::InitializeLlvm ( bool  load_backend = false)
static

This function must be called once per process before any llvm API calls are made. LLVM needs to allocate data structures for multi-threading support and to enable dynamic linking of jitted code. If 'load_backend' is true, load the backend static object for llvm. This is needed when libbackend.so is loaded from Java: llvm will by default only look in the current object and will not be able to find the backend symbols. TODO: this can probably be removed after the impalad refactor where the Java side is not loading the backend explicitly anymore.
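
A common way to enforce a once-per-process initialization is a lock plus a flag, which is what the referenced names llvm_initialization_lock and llvm_initialized suggest. The sketch below shows that pattern in isolation. All names in the toy namespace are hypothetical, and whether the real function tolerates repeated calls is not stated here.

```cpp
#include <cassert>
#include <mutex>

namespace toy {

std::mutex initialization_lock;  // guards the two variables below
bool initialized = false;
int init_count = 0;              // counts how often the real work ran

// Runs the one-time setup on the first call and is a no-op afterwards.
void InitializeOnce() {
  std::lock_guard<std::mutex> guard(initialization_lock);
  if (initialized) return;
  ++init_count;  // the real function would set up LLVM here
  initialized = true;
}

}  // namespace toy
```

With this guard, concurrent or repeated callers all observe a fully initialized state and the setup body runs exactly once.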

Definition at line 78 of file llvm-codegen.cc.

References impala::llvm_initialization_lock, impala::llvm_initialized, and path().

Referenced by Java_com_cloudera_impala_service_FeSupport_NativeFeTestInit(), and main().

int impala::LlvmCodeGen::InlineCallSites ( llvm::Function *  fn,
bool  skip_registered_fns 
)

Inlines all function calls for 'fn' that are marked as always inline. (We can't inline all call sites since pulling in boost/other libs could have recursion. Instead, we just inline our functions and rely on the llvm inliner to pick the rest.) 'fn' is modified in place. Returns the number of functions inlined. This is not called recursively (i.e. second level function calls are not inlined). This can be called again to inline those until this returns 0.
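
The "call again until this returns 0" usage can be sketched with a toy call graph. ToyFn and InlineToFixedPoint are hypothetical, and the model assumes there is no recursion among always-inline functions, since otherwise the loop would never terminate. Each pass replaces direct calls to always-inline callees with the callee's own call sites, one level at a time.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of a function: a flag plus its direct call sites.
struct ToyFn {
  bool always_inline;
  std::vector<const ToyFn*> calls;
};

// Inlines one level only. Returns the number of call sites inlined.
inline int InlineCallSites(ToyFn* fn) {
  std::vector<const ToyFn*> remaining;
  int num_inlined = 0;
  for (const ToyFn* callee : fn->calls) {
    if (callee->always_inline) {
      // The callee's body is pasted in, so its call sites become ours.
      remaining.insert(remaining.end(), callee->calls.begin(),
                       callee->calls.end());
      ++num_inlined;
    } else {
      remaining.push_back(callee);
    }
  }
  fn->calls.swap(remaining);
  return num_inlined;
}

// Repeats single-level inlining until a pass inlines nothing.
// Returns the number of passes that made progress.
inline int InlineToFixedPoint(ToyFn* fn) {
  int passes = 0;
  while (InlineCallSites(fn) > 0) ++passes;
  return passes;
}
```

A chain of two nested always-inline callees therefore needs two productive passes, after which only the call to the non-inlined function remains.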

Definition at line 541 of file llvm-codegen.cc.

References registered_exprs_.

Referenced by OptimizeFunctionWithExprs(), and impala::SubExprElimination::Run().

llvm::Type* impala::LlvmCodeGen::int_type ( )
inline
void * impala::LlvmCodeGen::JitFunction ( llvm::Function *  function)
private

Get the function pointer to the JIT'd version of function. The result is a function pointer that is dynamically linked into the process. Returns NULL if the function is invalid. Note that this will compile, but not optimize, function if necessary. This function shouldn't be called after calling FinalizeModule(). Instead use AddFunctionToJit() to register a function pointer. This is because FinalizeModule() may remove any functions not registered in AddFunctionToJit(). As such, this function is mostly useful for tests that do not call FinalizeModule() at all.

Definition at line 748 of file llvm-codegen.cc.

References execution_engine_, is_corrupt_, jitted_functions_, and jitted_functions_lock_.

Referenced by FinalizeModule(), and impala::LlvmCodeGenTest::JitFunction().

Status impala::LlvmCodeGen::LinkModule ( const std::string &  file)

Loads a module at 'file' and links it to the module associated with this LlvmCodeGen object. The module must be on the local filesystem.

Definition at line 162 of file llvm-codegen.cc.

References linked_modules_, LoadModule(), module_, impala::Status::OK, profile_, RETURN_IF_ERROR, SCOPED_TIMER, and impala::RuntimeProfile::total_time_counter().

Referenced by impala::ScalarFnCall::Prepare().

Status impala::LlvmCodeGen::LoadFromFile ( ObjectPool * ,
const std::string &  file,
const std::string &  id,
boost::scoped_ptr< LlvmCodeGen > *  codegen 
)
static

Load a pre-compiled IR module from 'file'. This creates a top level codegen object. codegen will contain the created object on success.

Definition at line 122 of file llvm-codegen.cc.

References LlvmCodeGen(), LoadModule(), RETURN_IF_ERROR, and SCOPED_TIMER.

Referenced by LoadImpalaIR().

Status impala::LlvmCodeGen::LoadImpalaIR ( ObjectPool * ,
const std::string &  id,
boost::scoped_ptr< LlvmCodeGen > *  codegen 
)
static
Status impala::LlvmCodeGen::LoadIntrinsics ( )
private

Load the intrinsics impala needs. This is a one time initialization. Values are stored in 'llvm_intrinsics_'

Definition at line 895 of file llvm-codegen.cc.

References GetType(), llvm_intrinsics_, module(), impala::Status::OK, ptr_type(), and impala::TYPE_INT.

Referenced by Init().

Status impala::LlvmCodeGen::LoadModule ( LlvmCodeGen *  codegen,
const std::string &  file,
llvm::Module **  module 
)
static

Loads an LLVM module. 'file' should be the local path to the LLVM bitcode (.ll) file. If 'file_size' is not NULL, it will be set to the size of 'file'. The caller is responsible for cleaning up module.

Definition at line 134 of file llvm-codegen.cc.

References context(), COUNTER_ADD, load_module_timer_, module_file_size_, impala::Status::OK, prepare_module_timer_, and SCOPED_TIMER.

Referenced by LinkModule(), and LoadFromFile().

llvm::Module* impala::LlvmCodeGen::module ( )
inline

Returns the underlying llvm module.

Definition at line 220 of file llvm-codegen.h.

References module_.

Referenced by impala::ScalarFnCall::GetFunction(), impala::ScalarFnCall::GetUdf(), and LoadIntrinsics().

llvm::Value* impala::LlvmCodeGen::null_ptr_value ( )
inline
Function * impala::LlvmCodeGen::OptimizeFunctionWithExprs ( llvm::Function *  fn)

Optimizes the function in place. This uses a combination of llvm optimization passes as well as some custom heuristics. This should be called for all functions which call Exprs. The exprs will be inlined as much as possible, and basic sub expression elimination will be performed. This should be called before FinalizeModule for functions that want to remove redundant exprs. This should be called at the highest level possible to maximize the number of redundant exprs that can be found.

TODO: we need to spend more time to output better IR. Asking llvm to remove redundant codeblocks on its own is too difficult for it.
TODO: this should implement the llvm FunctionPass interface and be integrated with the llvm optimization passes.

Definition at line 583 of file llvm-codegen.cc.

References FinalizeFunction(), and InlineCallSites().

Referenced by impala::HdfsAvroScanner::CodegenDecodeAvroData(), impala::PartitionedAggregationNode::CodegenProcessBatch(), impala::HashJoinNode::CodegenProcessBuildBatch(), impala::PartitionedHashJoinNode::CodegenProcessBuildBatch(), impala::HashJoinNode::CodegenProcessProbeBatch(), impala::PartitionedHashJoinNode::CodegenProcessProbeBatch(), impala::AggregationNode::CodegenProcessRowBatch(), and impala::HdfsScanner::CodegenWriteCompleteTuple().

void impala::LlvmCodeGen::OptimizeModule ( )
private

Optimizes the module. This includes pruning the module of any unused functions.

Definition at line 652 of file llvm-codegen.cc.

References fns_to_jit_compile_, module_, optimization_timer_, impala::InstructionCounter::PrintCounters(), SCOPED_TIMER, and impala::InstructionCounter::visit().

Referenced by FinalizeModule().

template<typename T >
static std::string impala::LlvmCodeGen::Print ( T *  value_or_type)
inlinestatic

Returns the string representation of a llvm::Value* or llvm::Type*.

Definition at line 326 of file llvm-codegen.h.

Referenced by impala::PartitionedHashJoinNode::CodegenProcessProbeBatch(), impala::CodegenAnyVal::SetFromRawValue(), and VerifyFunction().

void impala::LlvmCodeGen::RegisterExprFn ( int64_t  id,
llvm::Function *  function 
)
inline

Registers an expr function with a unique id. It can subsequently be retrieved via GetRegisteredExprFn() with that id.

Definition at line 224 of file llvm-codegen.h.

References registered_exprs_, and registered_exprs_map_.

Referenced by impala::SlotRef::GetCodegendComputeFn().
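The registration scheme described above (an id-to-function map plus a set for quick membership checks, mirroring registered_exprs_map_ and registered_exprs_) can be sketched without LLVM. This is only an illustration: ToyFunction, ExprFnRegistry and IsRegistered() are hypothetical names, and the uniqueness assertion is an assumption about how duplicate ids are treated.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>

// Hypothetical stand-in for llvm::Function; not part of Impala or LLVM.
struct ToyFunction {};

// Sketch of an expr-function registry keyed by a unique id.
class ExprFnRegistry {
 public:
  // Records 'fn' under 'id'. Assumes each id is registered at most once.
  void RegisterExprFn(int64_t id, ToyFunction* fn) {
    assert(by_id_.find(id) == by_id_.end());
    by_id_[id] = fn;
    fns_.insert(fn);
  }

  // Returns the function registered for 'id', or nullptr if there is none.
  ToyFunction* GetRegisteredExprFn(int64_t id) const {
    auto it = by_id_.find(id);
    return it == by_id_.end() ? nullptr : it->second;
  }

  // Membership test that does not need the id (the role of the set).
  bool IsRegistered(ToyFunction* fn) const { return fns_.count(fn) > 0; }

 private:
  std::map<int64_t, ToyFunction*> by_id_;  // id -> function
  std::set<ToyFunction*> fns_;             // all registered functions
};
```

Keeping a separate set lets a pass ask "is this callee a registered expr?" without knowing which id it was registered under.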

Function * impala::LlvmCodeGen::ReplaceCallSites ( llvm::Function *  caller,
bool  update_in_place,
llvm::Function *  new_fn,
const std::string &  target_name,
int *  num_replaced 
)

Replaces all instructions that call 'target_name' with a call instruction to 'new_fn'. Returns the modified function.

  • 'target_name' is the unmangled name of the function to replace. Because the name is unmangled, every call site whose callee name contains 'target_name' as a substring is replaced. 'target_name' is case-sensitive. TODO: be more strict than substring? Work out the mangling rules?
  • If 'update_in_place' is true, the caller function is modified in place. Otherwise, the caller function is cloned and the original function is left unmodified. If 'update_in_place' is false and the function has already been dynamically linked, the existing function is unlinked. Note that this is not thread-safe: if there are threads in the function being unlinked, bad things will happen.
  • 'num_replaced' returns the number of call sites updated.

Most of our use cases will likely not be in place. We will have one 'template' version of the function loaded for each type of Node (e.g. AggregationNode). Each instance of the node will clone the function, replacing the inner loop body with the codegened version. The codegened bodies differ from instance to instance since they are specific to the node's tuple desc.
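The substring matching and the clone-versus-in-place choice can be modelled with plain strings. The following is a minimal sketch, not the LLVM-based implementation: ToyFunction, ReplaceCallSitesSketch and the mangled symbol used in the usage note are hypothetical, and a "call site" is reduced to the callee's symbol name.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for llvm::Function: a name plus the symbol names
// of the functions it calls (one entry per call site).
struct ToyFunction {
  std::string name;
  std::vector<std::string> callees;
};

// Replaces every call site whose (possibly mangled) callee name contains
// 'target_name' with 'new_fn_name'. When 'update_in_place' is false, 'caller'
// is left untouched and the edits are applied to a copy. Returns the modified
// function and stores the number of replaced call sites in '*num_replaced'.
ToyFunction ReplaceCallSitesSketch(ToyFunction& caller, bool update_in_place,
                                   const std::string& new_fn_name,
                                   const std::string& target_name,
                                   int* num_replaced) {
  assert(num_replaced != nullptr);
  *num_replaced = 0;
  ToyFunction clone = caller;
  ToyFunction& fn = update_in_place ? caller : clone;
  for (std::string& callee : fn.callees) {
    // Substring match: the unmangled name appears inside the mangled symbol.
    if (callee.find(target_name) != std::string::npos) {
      callee = new_fn_name;
      ++*num_replaced;
    }
  }
  return fn;
}
```

For example, with a template function whose callees are "_ZN6impala7EvalRowEPv", "memcpy" and "_ZN6impala7EvalRowEPv", replacing target "EvalRow" with update_in_place set to false yields a clone with two replaced call sites while the template keeps its original callees. The same substring rule also explains the TODO: a target such as "Eval" would match unrelated symbols that merely contain it.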

Definition at line 489 of file llvm-codegen.cc.

References CloneFunction(), execution_engine_, jitted_functions_, and module_.

Referenced by impala::HdfsAvroScanner::CodegenDecodeAvroData(), impala::PartitionedAggregationNode::CodegenProcessBatch(), impala::HashJoinNode::CodegenProcessBuildBatch(), impala::PartitionedHashJoinNode::CodegenProcessBuildBatch(), impala::HashJoinNode::CodegenProcessProbeBatch(), impala::PartitionedHashJoinNode::CodegenProcessProbeBatch(), impala::AggregationNode::CodegenProcessRowBatch(), and impala::HdfsScanner::CodegenWriteAlignedTuples().

void impala::LlvmCodeGen::ReplaceInstWithValue ( llvm::Instruction *  from,
llvm::Value *  to 
)

Replace all uses of the instruction 'from' with the value 'to', and delete 'from'. This is a wrapper around llvm::ReplaceInstWithValue().

Definition at line 1100 of file llvm-codegen.cc.

RuntimeProfile* impala::LlvmCodeGen::runtime_profile ( )
inline

Definition at line 134 of file llvm-codegen.h.

References profile_.

llvm::Type* impala::LlvmCodeGen::smallint_type ( )
inline

Definition at line 387 of file llvm-codegen.h.

References GetType(), and impala::TYPE_SMALLINT.

Referenced by impala::CodegenAnyVal::GetLoweredType().

llvm::Type* impala::LlvmCodeGen::string_val_type ( )
inline

Definition at line 392 of file llvm-codegen.h.

References string_val_type_.

llvm::Type* impala::LlvmCodeGen::tinyint_type ( )
inline

bool impala::LlvmCodeGen::VerifyFunction ( llvm::Function *  function)

Verifies the function if the verifier is enabled. Returns false if the function is invalid.

Definition at line 431 of file llvm-codegen.cc.

References gen_ir_descriptions::fn_name, is_corrupt_, and Print().

Referenced by CodegenMinMax(), and FinalizeFunction().

Friends And Related Function Documentation

friend class LlvmCodeGenTest
friend

Definition at line 423 of file llvm-codegen.h.

friend class SubExprElimination
friend

Definition at line 424 of file llvm-codegen.h.

Member Data Documentation

RuntimeProfile::Counter* impala::LlvmCodeGen::codegen_timer_
private

Time spent doing codegen (adding IR to the module)

Definition at line 466 of file llvm-codegen.h.

Referenced by codegen_timer(), and LlvmCodeGen().

std::vector<llvm::Function*> impala::LlvmCodeGen::codegend_functions_
private

Stores functions codegen'd by Impala. This does not contain cross-compiled functions, only functions that were generated at runtime. Does not overlap with loaded_functions_.

Definition at line 522 of file llvm-codegen.h.

Referenced by GetIR().

RuntimeProfile::Counter* impala::LlvmCodeGen::compile_timer_
private

Time spent compiling the module.

Definition at line 472 of file llvm-codegen.h.

Referenced by FinalizeModule(), and LlvmCodeGen().

boost::scoped_ptr<llvm::LLVMContext> impala::LlvmCodeGen::context_
private

Top-level LLVM object. Objects from different contexts do not share anything. We can have multiple instances of the LlvmCodeGen object in different threads.

Definition at line 494 of file llvm-codegen.h.

Referenced by context().

std::vector<std::string> impala::LlvmCodeGen::debug_strings_
private

Debug strings that will be output by jitted code. This is a copy of all strings passed to CodegenDebugTrace().

Definition at line 551 of file llvm-codegen.h.

Referenced by CodegenDebugTrace().

llvm::Function* impala::LlvmCodeGen::debug_trace_fn_
private

Debug utility that will insert a printf-like function into the generated IR. Useful for debugging the IR. This is lazily created.

Definition at line 547 of file llvm-codegen.h.

Referenced by CodegenDebugTrace().

std::string impala::LlvmCodeGen::error_string_
private

Error string that llvm will write to.

Definition at line 490 of file llvm-codegen.h.

Referenced by Init().

boost::scoped_ptr<llvm::ExecutionEngine> impala::LlvmCodeGen::execution_engine_
private

Execution/Jitting engine.

Definition at line 501 of file llvm-codegen.h.

Referenced by CodegenDebugTrace(), execution_engine(), Init(), JitFunction(), ReplaceCallSites(), and ~LlvmCodeGen().

std::map<std::string, llvm::Function*> impala::LlvmCodeGen::external_functions_
private

Keeps track of the external functions that have been included in this module, e.g. libc functions or non-jitted Impala functions. TODO: this should probably be a FnPrototype->Functions mapping.

Definition at line 514 of file llvm-codegen.h.

Referenced by GetLibCFunction().

llvm::Value* impala::LlvmCodeGen::false_value_
private

Definition at line 561 of file llvm-codegen.h.

Referenced by false_value(), and Init().

std::vector<std::pair<llvm::Function*, void**> > impala::LlvmCodeGen::fns_to_jit_compile_
private

The vector of functions to automatically JIT compile after FinalizeModule().

Definition at line 543 of file llvm-codegen.h.

Referenced by AddFunctionToJit(), FinalizeModule(), and OptimizeModule().
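The element type, a pair of a function and a void**, suggests a deferred hand-off: callers queue a function together with the location where its compiled address should be written, and the addresses are filled in once the module is finalized. The sketch below models that idea only. It assumes the void** is an output slot, and ToyFunction, JitQueue and native_code are hypothetical names rather than Impala or LLVM types.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical stand-in for llvm::Function. 'native_code' pretends to be the
// address the JIT would produce for this function.
struct ToyFunction {
  void* native_code;
};

// Sketch of a queue of functions to compile when the module is finalized.
class JitQueue {
 public:
  // Queues 'fn'; '*fn_ptr' is filled in later by FinalizeModule().
  void AddFunctionToJit(ToyFunction* fn, void** fn_ptr) {
    assert(!is_compiled_);  // nothing may be added after finalization
    assert(fn != nullptr && fn_ptr != nullptr);
    fns_to_jit_compile_.push_back(std::make_pair(fn, fn_ptr));
  }

  // "Compiles" every queued function and publishes its address to the slot.
  void FinalizeModule() {
    for (auto& entry : fns_to_jit_compile_) {
      *entry.second = entry.first->native_code;
    }
    fns_to_jit_compile_.clear();
    is_compiled_ = true;
  }

 private:
  std::vector<std::pair<ToyFunction*, void**> > fns_to_jit_compile_;
  bool is_compiled_ = false;
};
```

Deferring the write means a caller can register its function pointer member early and only start using it after finalization, when every queued slot has been populated in one pass.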

std::map<int, llvm::Function*> impala::LlvmCodeGen::hash_fns_
private

This is a cache of generated hash functions, keyed by byte size. It is common for the caller to know the number of bytes to hash (e.g. the tuple width), so we can codegen a loop-unrolled hash function for that size.

Definition at line 536 of file llvm-codegen.h.

Referenced by ClearHashFns(), and GetHashFunction().
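A cache keyed by byte size can be sketched as a map from the size to a function specialized for it. In this illustration the "codegen" step is replaced by building a closure over num_bytes, and the hash itself is a simple FNV-1a-style loop chosen for brevity; HashFn, HashFnCache and num_generated() are hypothetical and do not reflect the hash Impala actually generates.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <utility>

// Hypothetical signature: hashes a fixed number of bytes starting at 'data'.
using HashFn = std::function<uint32_t(const uint8_t* data, uint32_t seed)>;

class HashFnCache {
 public:
  // Returns the hash function specialized for 'num_bytes', building it on
  // first use and reusing the cached one afterwards.
  const HashFn& GetHashFunction(int num_bytes) {
    assert(num_bytes >= 0);
    auto it = hash_fns_.find(num_bytes);
    if (it != hash_fns_.end()) return it->second;
    ++num_generated_;
    // Stand-in for code generation: the length is baked into the closure.
    HashFn fn = [num_bytes](const uint8_t* data, uint32_t seed) {
      uint32_t h = seed;
      for (int i = 0; i < num_bytes; ++i) h = (h ^ data[i]) * 16777619u;
      return h;
    };
    return hash_fns_.emplace(num_bytes, std::move(fn)).first->second;
  }

  // Drops all cached functions.
  void ClearHashFns() { hash_fns_.clear(); }

  // Number of functions built so far (for illustration only).
  int num_generated() const { return num_generated_; }

 private:
  std::map<int, HashFn> hash_fns_;
  int num_generated_ = 0;
};
```

Two requests for the same width return the same cached object, so the cost of generating the specialized function is paid once per distinct size.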

std::string impala::LlvmCodeGen::id_
private

ID used for debugging (can be e.g. the fragment instance ID)

Definition at line 454 of file llvm-codegen.h.

Referenced by FinalizeModule(), and Init().

bool impala::LlvmCodeGen::is_compiled_
private

If true, the module has been compiled. It is not valid to add additional functions after this point.

Definition at line 487 of file llvm-codegen.h.

Referenced by FinalizeModule(), and impala::LlvmCodeGen::FnPrototype::FnPrototype().

bool impala::LlvmCodeGen::is_corrupt_
private

If true, the module is corrupt and we cannot codegen this query. TODO: we could consider just removing the offending function and attempting to codegen the rest of the query. This requires more testing though to make sure that the error is recoverable.

Definition at line 483 of file llvm-codegen.h.

Referenced by FinalizeModule(), JitFunction(), and VerifyFunction().

std::map<llvm::Function*, bool> impala::LlvmCodeGen::jitted_functions_
private

Keeps track of all the functions that have been jit compiled and linked into the process. Special care needs to be taken if we need to modify these functions. The bool value is unused.

Definition at line 506 of file llvm-codegen.h.

Referenced by JitFunction(), ReplaceCallSites(), and ~LlvmCodeGen().

boost::mutex impala::LlvmCodeGen::jitted_functions_lock_
private

Lock protecting jitted_functions_.

Definition at line 509 of file llvm-codegen.h.

Referenced by JitFunction().

std::set<std::string> impala::LlvmCodeGen::linked_modules_
private

The locations of modules that have been linked. Used to avoid linking the same module twice, which causes symbol collision errors.

Definition at line 540 of file llvm-codegen.h.

Referenced by LinkModule().
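The guard against double linking amounts to a set of already-seen locations. A minimal sketch, assuming a hypothetical ModuleLinker class whose LinkModule() returns whether any work was done (the real method returns a Status and performs the actual link):

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>

class ModuleLinker {
 public:
  // Returns true if 'file' was linked by this call, false if it had already
  // been linked and the call was skipped.
  bool LinkModule(const std::string& file) {
    assert(!file.empty());
    // insert() reports whether the element was newly added.
    if (!linked_modules_.insert(file).second) return false;
    // The actual load-and-link step would go here.
    return true;
  }

  std::size_t num_linked() const { return linked_modules_.size(); }

 private:
  std::set<std::string> linked_modules_;  // locations linked so far
};
```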

std::map<llvm::Intrinsic::ID, llvm::Function*> impala::LlvmCodeGen::llvm_intrinsics_
private

A cache of loaded llvm intrinsics.

Definition at line 531 of file llvm-codegen.h.

Referenced by CodegenMemcpy(), GetHashFunction(), and LoadIntrinsics().

RuntimeProfile::Counter* impala::LlvmCodeGen::load_module_timer_
private

Time spent reading the .ir file from the file system.

Definition at line 460 of file llvm-codegen.h.

Referenced by LlvmCodeGen(), and LoadModule().

std::vector<llvm::Function*> impala::LlvmCodeGen::loaded_functions_
private

Functions parsed from pre-compiled module. Indexed by ImpalaIR::Function enum.

Definition at line 517 of file llvm-codegen.h.

Referenced by GetFunction(), LlvmCodeGen(), and LoadImpalaIR().

llvm::Module* impala::LlvmCodeGen::module_
private

Top level codegen object. Contains everything to jit one 'unit' of code. Owned by the execution_engine_.

Definition at line 498 of file llvm-codegen.h.

Referenced by CloneFunction(), CodegenDebugTrace(), GetFunctions(), GetIR(), GetSymbols(), Init(), LinkModule(), module(), OptimizeModule(), and ReplaceCallSites().

RuntimeProfile::Counter* impala::LlvmCodeGen::module_file_size_
private

Definition at line 474 of file llvm-codegen.h.

Referenced by LlvmCodeGen(), and LoadModule().

RuntimeProfile::Counter* impala::LlvmCodeGen::optimization_timer_
private

Time spent optimizing the module.

Definition at line 469 of file llvm-codegen.h.

Referenced by LlvmCodeGen(), and OptimizeModule().

bool impala::LlvmCodeGen::optimizations_enabled_
private

Whether or not optimizations are enabled.

Definition at line 477 of file llvm-codegen.h.

Referenced by EnableOptimizations(), and FinalizeModule().

RuntimeProfile::Counter* impala::LlvmCodeGen::prepare_module_timer_
private

Time spent constructing the in-memory module from the .ir file.

Definition at line 463 of file llvm-codegen.h.

Referenced by LlvmCodeGen(), LoadImpalaIR(), and LoadModule().

RuntimeProfile impala::LlvmCodeGen::profile_
private

Codegen counters.

Definition at line 457 of file llvm-codegen.h.

Referenced by FinalizeModule(), LinkModule(), LlvmCodeGen(), LoadImpalaIR(), and runtime_profile().

llvm::PointerType* impala::LlvmCodeGen::ptr_type_
private

llvm representation of a few common types. Owned by context.

Definition at line 554 of file llvm-codegen.h.

Referenced by CodegenDebugTrace(), Init(), and ptr_type().

std::set<llvm::Function*> impala::LlvmCodeGen::registered_exprs_
private

A set of all the functions in 'registered_exprs_map_' for quick lookup.

Definition at line 528 of file llvm-codegen.h.

Referenced by InlineCallSites(), RegisterExprFn(), and impala::SubExprElimination::Run().

std::map<int64_t, llvm::Function*> impala::LlvmCodeGen::registered_exprs_map_
private

A mapping of unique id to registered expr functions.

Definition at line 525 of file llvm-codegen.h.

Referenced by GetRegisteredExprFn(), and RegisterExprFn().

llvm::Type* impala::LlvmCodeGen::string_val_type_
private

Definition at line 556 of file llvm-codegen.h.

Referenced by GetType(), LoadImpalaIR(), and string_val_type().

llvm::Type* impala::LlvmCodeGen::timestamp_val_type_
private

Definition at line 557 of file llvm-codegen.h.

Referenced by GetType(), and LoadImpalaIR().

llvm::Value* impala::LlvmCodeGen::true_value_
private

LLVM constants that help reduce codegen verbosity.

Definition at line 560 of file llvm-codegen.h.

Referenced by Init(), and true_value().

llvm::Type* impala::LlvmCodeGen::void_type_
private

Definition at line 555 of file llvm-codegen.h.

Referenced by AddFunctionToJit(), CodegenDebugTrace(), Init(), and void_type().


The documentation for this class was generated from the following files: