Impala
Impala is the open source, native analytic database for Apache Hadoop.
Classes
class | UdaTestHarnessUtil |
class | UdaTestHarnessBase |
class | UdaTestHarness |
class | UdaTestHarness2 |
class | UdaTestHarness3 |
class | UdaTestHarness4 |
class | UdfTestHarness |
Utility class to help test UDFs.
class | FunctionContext |
struct | AnyVal |
struct | BooleanVal |
struct | TinyIntVal |
struct | SmallIntVal |
struct | IntVal |
struct | BigIntVal |
struct | FloatVal |
struct | DoubleVal |
struct | TimestampVal |
This object has a compatible storage format with boost::ptime.
struct | StringVal |
struct | DecimalVal |
Typedefs
typedef void(* | UdfPrepare )(FunctionContext *context, FunctionContext::FunctionStateScope scope) |
typedef void(* | UdfClose )(FunctionContext *context, FunctionContext::FunctionStateScope scope) |
typedef AnyVal | InputType |
typedef AnyVal | InputType2 |
typedef AnyVal | ResultType |
typedef AnyVal | IntermediateType |
typedef void(* | UdaInit )(FunctionContext *context, IntermediateType *result) |
typedef void(* | UdaUpdate )(FunctionContext *context, const InputType &input, IntermediateType *result) |
typedef void(* | UdaUpdate2 )(FunctionContext *context, const InputType &input, const InputType2 &input2, IntermediateType *result) |
typedef void(* | UdaMerge )(FunctionContext *context, const IntermediateType &src, IntermediateType *dst) |
Merge an intermediate result 'src' into 'dst'.
typedef const IntermediateType(* | UdaSerialize )(FunctionContext *context, const IntermediateType &type) |
typedef ResultType(* | UdaFinalize )(FunctionContext *context, const IntermediateType &v) |
typedef uint8_t * | BufferVal |
Enumerations
enum | UdaExecutionMode { ALL = 0, SINGLE_NODE = 1, ONE_LEVEL = 2, TWO_LEVEL = 3 } |
Functions
template<typename T> std::string | DebugString (const T &val) |
template<> std::string | DebugString (const StringVal &val) |
typedef uint8_t* impala_udf::BufferVal |
typedef AnyVal impala_udf::InputType |
The UDA execution is broken up into a few steps. The general calling pattern is one of these:
1) Init(), Update() (repeatedly), Serialize()
2) Init(), Update() (repeatedly), Finalize()
3) Init(), Merge() (repeatedly), Serialize()
4) Init(), Merge() (repeatedly), Finalize()
The UDA is registered with three types: the result type, the input type, and the intermediate type. If the UDA needs a fixed-byte-width intermediate buffer, the intermediate type should be TYPE_FIXED_BUFFER and Impala will allocate the buffer. If the UDA needs a buffer of unknown size, it should use TYPE_STRING and allocate the buffer from the FunctionContext manually. For UDAs that need a complex data structure as the intermediate state, the intermediate type should be string, and the UDA can cast the StringVal's ptr to the structure it is using.
Memory Management: For allocations that are not returned to Impala, the UDA should use the FunctionContext::Allocate()/Free() methods. In general, Allocate() is called in Init(), and then Free() must be called in both Serialize() and Finalize(), since either of these functions may be called to clean up the state. For StringVal allocations returned to Impala (e.g. returned by UdaSerialize()), the UDA should allocate the result via the StringVal(FunctionContext*, int) constructor, and Impala will automatically handle freeing it.
For clarity in documenting the UDA interface, the various types are typedef'd here. The actual execution resolves all the types at runtime, and none of these types should actually be used.
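The calling patterns above can be illustrated with a toy SUM aggregate. This is a minimal sketch, not Impala code: `FunctionContext` and `BigIntVal` are simplified stand-ins for the udf.h types, and every function name (`SumInit`, `UpdateThenSerialize`, and so on) is hypothetical. Each node runs pattern 1 over its own rows, and a merging node then runs pattern 4 over the serialized partial results.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

namespace sketch {

// Simplified stand-ins for the impala_udf types (assumption: not the real udf.h).
struct FunctionContext {};

struct BigIntVal {
  bool is_null;
  int64_t val;
  BigIntVal(int64_t v = 0) : is_null(false), val(v) {}
  static BigIntVal null() {
    BigIntVal r;
    r.is_null = true;
    return r;
  }
};

// Init(): start from an empty (NULL) running sum.
void SumInit(FunctionContext*, BigIntVal* dst) {
  assert(dst != nullptr);
  *dst = BigIntVal::null();
}

// Update(): fold one input row into the intermediate value; NULL rows are skipped.
void SumUpdate(FunctionContext*, const BigIntVal& input, BigIntVal* dst) {
  if (input.is_null) return;
  dst->is_null = false;
  dst->val += input.val;
}

// Merge(): fold another node's intermediate value into dst.
void SumMerge(FunctionContext*, const BigIntVal& src, BigIntVal* dst) {
  if (src.is_null) return;
  dst->is_null = false;
  dst->val += src.val;
}

// Serialize(): a fixed-width intermediate can be sent across the wire unchanged.
const BigIntVal SumSerialize(FunctionContext*, const BigIntVal& v) { return v; }

// Finalize(): for SUM, the final value is the intermediate value itself.
BigIntVal SumFinalize(FunctionContext*, const BigIntVal& v) { return v; }

// Pattern 1, as a node might run it: Init(), Update() per row, Serialize().
BigIntVal UpdateThenSerialize(FunctionContext* ctx, const std::vector<BigIntVal>& rows) {
  BigIntVal state;
  SumInit(ctx, &state);
  for (const BigIntVal& row : rows) SumUpdate(ctx, row, &state);
  return SumSerialize(ctx, state);
}

// Pattern 4, as a merging node might run it: Init(), Merge() per partial, Finalize().
BigIntVal MergeThenFinalize(FunctionContext* ctx, const std::vector<BigIntVal>& partials) {
  BigIntVal state;
  SumInit(ctx, &state);
  for (const BigIntVal& p : partials) SumMerge(ctx, p, &state);
  return SumFinalize(ctx, state);
}

}  // namespace sketch
```

Because the intermediate value here is a fixed-width BigIntVal, Serialize() can return it as-is; a UDA whose state lives in a FunctionContext::Allocate() buffer would instead copy the state out and Free() the buffer.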
typedef AnyVal impala_udf::InputType2 |
typedef AnyVal impala_udf::IntermediateType |
typedef AnyVal impala_udf::ResultType |
typedef ResultType(* impala_udf::UdaFinalize)(FunctionContext *context, const IntermediateType &v) |
Called once at the end to return the final value for this UDA. No additional functions will be called with this FunctionContext object, and the UDA should do its final cleanup (e.g. Free()) here.
typedef void(* impala_udf::UdaInit)(FunctionContext *context, IntermediateType *result) |
typedef void(* impala_udf::UdaMerge)(FunctionContext *context, const IntermediateType &src, IntermediateType *dst) |
typedef const IntermediateType(* impala_udf::UdaSerialize)(FunctionContext *context, const IntermediateType &type) |
Serialize the intermediate type. The serialized data is then sent across the wire. No additional functions will be called with this FunctionContext object, and the UDA should do its final cleanup (e.g. Free()) here.
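The rule that Free() belongs in both Serialize() and Finalize() can be checked with a small average aggregate whose state is a struct stored behind a StringVal pointer. In this sketch, `FunctionContext` is a hypothetical stand-in that counts outstanding Allocate() buffers and imitates engine-owned memory for the `StringVal(FunctionContext*, int)` constructor; `AvgState`, `AvgInit`, and the other names are illustrative only. Whichever of Serialize() or Finalize() ends the state's life, the Init() buffer is freed exactly once.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <new>
#include <vector>

namespace sketch {

// Stand-in context: counts live Allocate() buffers so that leaks are visible.
class FunctionContext {
 public:
  FunctionContext() = default;
  FunctionContext(const FunctionContext&) = delete;
  FunctionContext& operator=(const FunctionContext&) = delete;
  ~FunctionContext() {
    for (uint8_t* p : results_) std::free(p);  // the "engine" frees returned values
  }
  uint8_t* Allocate(int bytes) {
    ++outstanding_;
    return static_cast<uint8_t*>(std::malloc(bytes));
  }
  void Free(uint8_t* ptr) {
    assert(outstanding_ > 0 && "Free() without a matching Allocate()");
    --outstanding_;
    std::free(ptr);
  }
  // Backs StringVal(FunctionContext*, int): memory owned by the engine, not the UDA.
  uint8_t* AllocateResult(int bytes) {
    results_.push_back(static_cast<uint8_t*>(std::malloc(bytes)));
    return results_.back();
  }
  int outstanding() const { return outstanding_; }

 private:
  int outstanding_ = 0;
  std::vector<uint8_t*> results_;
};

struct DoubleVal {
  bool is_null;
  double val;
  DoubleVal(double v = 0.0) : is_null(false), val(v) {}
  static DoubleVal null() {
    DoubleVal r;
    r.is_null = true;
    return r;
  }
};

struct StringVal {
  bool is_null = false;
  uint8_t* ptr = nullptr;
  int len = 0;
  StringVal() = default;
  StringVal(FunctionContext* ctx, int n) : ptr(ctx->AllocateResult(n)), len(n) {}
};

// The "complex" intermediate state, stored in the StringVal's buffer.
struct AvgState {
  double sum;
  int64_t count;
};

// Init(): Allocate() the state; it is never returned to Impala, so the UDA frees it.
void AvgInit(FunctionContext* ctx, StringVal* dst) {
  dst->len = static_cast<int>(sizeof(AvgState));
  dst->ptr = ctx->Allocate(dst->len);
  new (dst->ptr) AvgState{0.0, 0};
}

void AvgUpdate(FunctionContext*, const DoubleVal& input, StringVal* dst) {
  if (input.is_null) return;
  AvgState* st = reinterpret_cast<AvgState*>(dst->ptr);
  st->sum += input.val;
  ++st->count;
}

// Serialize(): copy the state into engine-owned memory, then Free() the Init() buffer.
const StringVal AvgSerialize(FunctionContext* ctx, const StringVal& src) {
  StringVal out(ctx, src.len);
  std::memcpy(out.ptr, src.ptr, src.len);
  ctx->Free(src.ptr);
  return out;
}

// Finalize(): compute the result, then Free() the Init() buffer.
DoubleVal AvgFinalize(FunctionContext* ctx, const StringVal& src) {
  const AvgState* st = reinterpret_cast<const AvgState*>(src.ptr);
  DoubleVal result = st->count == 0 ? DoubleVal::null() : DoubleVal(st->sum / st->count);
  ctx->Free(src.ptr);
  return result;
}

}  // namespace sketch
```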
typedef void(* impala_udf::UdaUpdate)(FunctionContext *context, const InputType &input, IntermediateType *result) |
typedef void(* impala_udf::UdaUpdate2)(FunctionContext *context, const InputType &input, const InputType2 &input2, IntermediateType *result) |
typedef void(* impala_udf::UdfClose)(FunctionContext *context, FunctionContext::FunctionStateScope scope) |
The UDF can also optionally include a close function, specified in the "CREATE FUNCTION" statement using "close_fn=<close function symbol>". The close function is called after all calls to the UDF have completed. This is the appropriate time for the UDF to deallocate any shared data structures that are not needed to maintain the results. If there is an error, this function should call FunctionContext::SetError() or FunctionContext::AddWarning(). The close function is called multiple times with different FunctionStateScopes: once per fragment with 'scope' set to FRAGMENT_LOCAL, and once per execution thread with 'scope' set to THREAD_LOCAL.
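Because the close function is invoked once per thread and once per fragment, it has to check 'scope' and release only the state that belongs to that scope. The sketch below assumes a simplified `FunctionContext` stand-in with one state slot per scope; `Scratch`, `ScratchClose`, and `CloseFragment` are hypothetical, and the order in which `CloseFragment` issues the calls is an assumption rather than documented engine behavior.

```cpp
#include <cassert>

namespace sketch {

// Simplified stand-in for FunctionContext: one state slot per scope.
class FunctionContext {
 public:
  enum FunctionStateScope { THREAD_LOCAL, FRAGMENT_LOCAL };
  void SetFunctionState(FunctionStateScope scope, void* ptr) { state_[scope] = ptr; }
  void* GetFunctionState(FunctionStateScope scope) const { return state_[scope]; }

 private:
  void* state_[2] = {nullptr, nullptr};
};

// Hypothetical per-thread scratch object, created by a matching prepare function.
struct Scratch {
  static int live;  // Scratch objects not yet deleted
  Scratch() { ++live; }
  ~Scratch() { --live; }
};
int Scratch::live = 0;

// Close function: releases only the state that belongs to 'scope'.
void ScratchClose(FunctionContext* ctx, FunctionContext::FunctionStateScope scope) {
  if (scope != FunctionContext::THREAD_LOCAL) return;  // no fragment-local state here
  delete static_cast<Scratch*>(ctx->GetFunctionState(scope));
  ctx->SetFunctionState(scope, nullptr);
}

// Simulated teardown of one fragment that ran 'num_threads' threads, one context
// each. Issuing the THREAD_LOCAL calls before the FRAGMENT_LOCAL one is an assumption.
void CloseFragment(FunctionContext* ctxs, int num_threads) {
  assert(num_threads > 0);
  for (int i = 0; i < num_threads; ++i) {
    ScratchClose(&ctxs[i], FunctionContext::THREAD_LOCAL);
  }
  ScratchClose(&ctxs[0], FunctionContext::FRAGMENT_LOCAL);
}

}  // namespace sketch
```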
typedef void(* impala_udf::UdfPrepare)(FunctionContext *context, FunctionContext::FunctionStateScope scope) |
The UDF must implement this function prototype. This is not a typedef, as the actual UDF's signature varies from UDF to UDF:
typedef <Val> Evaluate(FunctionContext* context, <const Val& arg>);
The UDF must return one of the Val structs. The UDF must accept a pointer to a FunctionContext object and then a const reference for each of the input arguments. Examples of valid UDF signatures are:
1) DoubleVal Example1(FunctionContext* context);
2) IntVal Example2(FunctionContext* context, const IntVal& a1, const DoubleVal& a2);
UDFs can be variadic. The variable arguments must all come at the end and must be the same type. An example signature is:
StringVal Concat(FunctionContext* context, const StringVal& separator, int num_var_args, const StringVal* args);
In this case args[0] is the first variable argument and args[num_var_args - 1] is the last.
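A possible body for the variadic Concat signature above is sketched here. It is illustrative only: `FunctionContext` and `StringVal` are simplified stand-ins (the stand-in context owns result buffers the way the engine would), and returning NULL when any argument is NULL is a design choice of this sketch, not documented Impala behavior. The result is allocated through a `StringVal(FunctionContext*, int)`-style constructor so the UDF never frees it itself.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <memory>
#include <string>
#include <vector>

namespace sketch {

// Stand-in context: owns result buffers, as the engine would (assumption).
class FunctionContext {
 public:
  uint8_t* AllocateResult(int len) {
    pool_.emplace_back(new uint8_t[len > 0 ? len : 1]);
    return pool_.back().get();
  }

 private:
  std::vector<std::unique_ptr<uint8_t[]>> pool_;
};

struct StringVal {
  bool is_null = false;
  uint8_t* ptr = nullptr;
  int len = 0;
  StringVal() = default;
  // Mirrors the StringVal(FunctionContext*, int) constructor: context-owned memory.
  StringVal(FunctionContext* ctx, int n) : ptr(ctx->AllocateResult(n)), len(n) {}
  // Wraps (does not copy) a NUL-terminated string; handy for building inputs.
  explicit StringVal(const char* s)
      : ptr(reinterpret_cast<uint8_t*>(const_cast<char*>(s))),
        len(static_cast<int>(std::strlen(s))) {}
  static StringVal null() {
    StringVal v;
    v.is_null = true;
    return v;
  }
  std::string str() const { return std::string(reinterpret_cast<const char*>(ptr), len); }
};

// One fixed argument, then num_var_args arguments of the same type,
// args[0] .. args[num_var_args - 1].
StringVal Concat(FunctionContext* context, const StringVal& separator,
                 int num_var_args, const StringVal* args) {
  assert(num_var_args == 0 || args != nullptr);
  if (separator.is_null) return StringVal::null();
  int total = 0;
  for (int i = 0; i < num_var_args; ++i) {
    if (args[i].is_null) return StringVal::null();  // any NULL argument -> NULL
    total += args[i].len;
  }
  if (num_var_args > 1) total += separator.len * (num_var_args - 1);

  StringVal result(context, total);
  uint8_t* out = result.ptr;
  for (int i = 0; i < num_var_args; ++i) {
    if (i > 0) {
      std::memcpy(out, separator.ptr, separator.len);
      out += separator.len;
    }
    std::memcpy(out, args[i].ptr, args[i].len);
    out += args[i].len;
  }
  return result;
}

}  // namespace sketch
```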
The UDF can assume that memory from input arguments will have the same lifetime as results for the UDF. In other words, the UDF can return memory from input arguments without making copies. For example, a function like substring will not need to allocate and copy the smaller string. Any state needed across calls must be stored and accessed via FunctionContext::SetFunctionState() and FunctionContext::GetFunctionState(). The UDF should not maintain any other state across calls since there is no guarantee on how
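The no-copy rule means a substring-like UDF can return a view into its input. The sketch below uses a hypothetical `Prefix` UDF (first n bytes of a string) with simplified stand-in `FunctionContext`, `IntVal`, and `StringVal` types; it is not Impala's substring implementation. The result shares the input's buffer and only shortens `len`.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

namespace sketch {

struct FunctionContext {};  // stand-in; unused by this UDF

struct IntVal {
  bool is_null;
  int32_t val;
  IntVal(int32_t v = 0) : is_null(false), val(v) {}
};

struct StringVal {
  bool is_null = false;
  uint8_t* ptr = nullptr;
  int len = 0;
  StringVal() = default;
  // Wraps (does not copy) a NUL-terminated string.
  explicit StringVal(const char* s)
      : ptr(reinterpret_cast<uint8_t*>(const_cast<char*>(s))),
        len(static_cast<int>(std::strlen(s))) {}
  static StringVal null() {
    StringVal v;
    v.is_null = true;
    return v;
  }
};

// Hypothetical UDF: the first n bytes of s. The result points into s's buffer,
// which is allowed because input memory has the same lifetime as the result.
StringVal Prefix(FunctionContext*, const StringVal& s, const IntVal& n) {
  if (s.is_null || n.is_null || n.val < 0) return StringVal::null();
  StringVal result = s;  // copies the pointer/length pair, not the bytes
  if (n.val < result.len) result.len = n.val;
  assert(result.ptr == s.ptr);  // no allocation, no copy
  return result;
}

}  // namespace sketch
```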
Execution model: For each UDF use occurring in a given query, at least one FunctionContext will be created. For a given FunctionContext, the UDF's functions are never called concurrently and therefore do not need to be thread-safe. State shared across UDF invocations should be initialized and cleaned up using prepare and close functions (described below). Note that a single UDF use may produce multiple FunctionContexts for that UDF (this is so the UDF can be executed concurrently in different threads). For example, the query "select * from tbl where my_udf(x) > 0" may produce multiple FunctionContexts for 'my_udf', each of which may concurrently be passed to 'my_udf's prepare, close, and
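One consequence of this execution model is that state reached through a UDF's own FunctionContext can be updated without locks. The sketch assumes a simplified `FunctionContext` stand-in exposing only SetFunctionState()/GetFunctionState(); `CallCount` and its prepare/close functions are hypothetical. Two contexts are driven alternately on one thread to stand in for two execution threads, each owning its own context and counter.

```cpp
#include <cassert>
#include <cstdint>

namespace sketch {

// Simplified stand-in for FunctionContext: one state slot per scope.
class FunctionContext {
 public:
  enum FunctionStateScope { THREAD_LOCAL, FRAGMENT_LOCAL };
  void SetFunctionState(FunctionStateScope scope, void* ptr) { state_[scope] = ptr; }
  void* GetFunctionState(FunctionStateScope scope) const { return state_[scope]; }

 private:
  void* state_[2] = {nullptr, nullptr};
};

struct BigIntVal {
  bool is_null;
  int64_t val;
  BigIntVal(int64_t v = 0) : is_null(false), val(v) {}
};

// Prepare: give each thread's context its own counter.
void CallCountPrepare(FunctionContext* ctx, FunctionContext::FunctionStateScope scope) {
  if (scope == FunctionContext::THREAD_LOCAL) ctx->SetFunctionState(scope, new int64_t(0));
}

// Hypothetical UDF returning 1, 2, 3, ... per context. No lock is taken: a given
// FunctionContext is never used by two calls concurrently.
BigIntVal CallCount(FunctionContext* ctx) {
  int64_t* n = static_cast<int64_t*>(ctx->GetFunctionState(FunctionContext::THREAD_LOCAL));
  assert(n != nullptr && "prepare must run first");
  return BigIntVal(++*n);
}

// Close: free the counter in the THREAD_LOCAL call.
void CallCountClose(FunctionContext* ctx, FunctionContext::FunctionStateScope scope) {
  if (scope != FunctionContext::THREAD_LOCAL) return;
  delete static_cast<int64_t*>(ctx->GetFunctionState(scope));
  ctx->SetFunctionState(scope, nullptr);
}

}  // namespace sketch
```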
The UDF can optionally include a prepare function, specified in the "CREATE FUNCTION" statement using "prepare_fn=<prepare function symbol>". The prepare function is called before any calls to the UDF to evaluate values. This is the appropriate time for the UDF to initialize any shared data structures, validate versions, etc. If there is an error, this function should call FunctionContext::SetError() or FunctionContext::AddWarning(). The prepare function is called multiple times with different FunctionStateScopes: once per fragment with 'scope' set to FRAGMENT_LOCAL, and once per execution thread with 'scope' set to THREAD_LOCAL.
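A prepare function typically does its one-time work in the FRAGMENT_LOCAL call and reports failures through SetError(). The sketch below is a hypothetical stop-word UDF: `FunctionContext` is a simplified stand-in with per-scope state and SetError(), and words are passed as `std::string` rather than `StringVal` to keep it short. The table is built once per fragment, only read by the row function, and freed by the matching close function.

```cpp
#include <cassert>
#include <new>
#include <set>
#include <string>

namespace sketch {

// Simplified stand-in for FunctionContext: per-scope state plus SetError().
class FunctionContext {
 public:
  enum FunctionStateScope { THREAD_LOCAL, FRAGMENT_LOCAL };
  void SetFunctionState(FunctionStateScope scope, void* ptr) { state_[scope] = ptr; }
  void* GetFunctionState(FunctionStateScope scope) const { return state_[scope]; }
  void SetError(const char* msg) { error_ = msg; }
  bool has_error() const { return !error_.empty(); }

 private:
  void* state_[2] = {nullptr, nullptr};
  std::string error_;
};

using StopWords = std::set<std::string>;

// Prepare: build the shared, read-only table once per fragment.
void StopWordPrepare(FunctionContext* ctx, FunctionContext::FunctionStateScope scope) {
  if (scope != FunctionContext::FRAGMENT_LOCAL) return;  // nothing per thread
  try {
    ctx->SetFunctionState(scope, new StopWords{"a", "an", "the"});
  } catch (const std::bad_alloc&) {
    ctx->SetError("StopWordPrepare: could not allocate the stop-word table");
  }
}

// Row function: only reads the fragment-local table.
bool IsStopWord(FunctionContext* ctx, const std::string& word) {
  const StopWords* words =
      static_cast<const StopWords*>(ctx->GetFunctionState(FunctionContext::FRAGMENT_LOCAL));
  assert(words != nullptr && "prepare did not run or failed");
  return words->count(word) > 0;
}

// Close: free the table once, in the FRAGMENT_LOCAL call.
void StopWordClose(FunctionContext* ctx, FunctionContext::FunctionStateScope scope) {
  if (scope != FunctionContext::FRAGMENT_LOCAL) return;
  delete static_cast<StopWords*>(ctx->GetFunctionState(scope));
  ctx->SetFunctionState(scope, nullptr);
}

}  // namespace sketch
```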
enum impala_udf::UdaExecutionMode
Enumerators: ALL, SINGLE_NODE, ONE_LEVEL, TWO_LEVEL
Definition at line 32 of file uda-test-harness.h.
template<typename T> inline std::string impala_udf::DebugString(const T &val)
Definition at line 27 of file udf-debug.h.
Referenced by impala::NullLiteral::DebugString(), impala::AndPredicate::DebugString(), impala::OrPredicate::DebugString(), impala::IsNullExpr::DebugString(), impala::NullIfExpr::DebugString(), impala::IfExpr::DebugString(), impala::CoalesceExpr::DebugString(), impala::Expr::DebugString(), impala::ImpalaServer::QueryExecState::Done(), impala_udf::UdaTestHarnessBase< RESULT, INTERMEDIATE >::Execute(), impala::ImpalaServer::ExpireQueries(), impala::ImpalaServer::ImpalaServer(), impala::InitCommonRuntime(), impala::ImpalaServer::PrepareQueryContext(), impala::ImpalaServer::QueryExecState::QueryExecState(), impala::StatestoreSubscriber::RecoveryModeChecker(), impala::Webserver::RootHandler(), impala::TEST(), impala::ColumnType::ToHs2Type(), and impala_udf::UdfTestHarness::Validate().
template<> inline std::string impala_udf::DebugString(const StringVal &val)
Definition at line 35 of file udf-debug.h.
References impala_udf::AnyVal::is_null, impala_udf::StringVal::len, and impala_udf::StringVal::ptr.
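The two DebugString() entries (a generic template, and a StringVal specialization that uses is_null, len, and ptr) could plausibly be structured as in the sketch below. It is an illustration inferred from the references listed above, not the contents of udf-debug.h, and `IntVal` and `StringVal` here are minimal stand-ins.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

namespace sketch {

// Minimal stand-ins for two of the *Val structs.
struct IntVal {
  bool is_null;
  int32_t val;
};
struct StringVal {
  bool is_null;
  uint8_t* ptr;
  int len;
};

// Generic version: any *Val type with 'is_null' and a streamable 'val' member.
template <typename T>
inline std::string DebugString(const T& v) {
  if (v.is_null) return "NULL";
  std::stringstream ss;
  ss << v.val;
  return ss.str();
}

// StringVal has no 'val' member, so it gets a full specialization built
// from is_null, ptr, and len.
template <>
inline std::string DebugString(const StringVal& v) {
  if (v.is_null) return "NULL";
  assert(v.len >= 0);
  return std::string(reinterpret_cast<const char*>(v.ptr), v.len);
}

}  // namespace sketch
```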