Impala
Impala is the open source, native analytic database for Apache Hadoop.
impala_udf Namespace Reference

Classes

class  UdaTestHarnessUtil
 
class  UdaTestHarnessBase
 
class  UdaTestHarness
 
class  UdaTestHarness2
 
class  UdaTestHarness3
 
class  UdaTestHarness4
 
class  UdfTestHarness
 Utility class to help test UDFs.
 
class  FunctionContext
 
struct  AnyVal
 
struct  BooleanVal
 
struct  TinyIntVal
 
struct  SmallIntVal
 
struct  IntVal
 
struct  BigIntVal
 
struct  FloatVal
 
struct  DoubleVal
 
struct  TimestampVal
 This object has a compatible storage format with boost::ptime.
 
struct  StringVal
 
struct  DecimalVal
 

Typedefs

typedef void(* UdfPrepare )(FunctionContext *context, FunctionContext::FunctionStateScope scope)
 
typedef void(* UdfClose )(FunctionContext *context, FunctionContext::FunctionStateScope scope)
 
typedef AnyVal InputType
 
typedef AnyVal InputType2
 
typedef AnyVal ResultType
 
typedef AnyVal IntermediateType
 
typedef void(* UdaInit )(FunctionContext *context, IntermediateType *result)
 
typedef void(* UdaUpdate )(FunctionContext *context, const InputType &input, IntermediateType *result)
 
typedef void(* UdaUpdate2 )(FunctionContext *context, const InputType &input, const InputType2 &input2, IntermediateType *result)
 
typedef void(* UdaMerge )(FunctionContext *context, const IntermediateType &src, IntermediateType *dst)
 Merge an intermediate result 'src' into 'dst'.
 
typedef const IntermediateType(* UdaSerialize )(FunctionContext *context, const IntermediateType &type)
 
typedef ResultType(* UdaFinalize )(FunctionContext *context, const IntermediateType &v)
 
typedef uint8_t * BufferVal
 

Enumerations

enum  UdaExecutionMode { ALL = 0, SINGLE_NODE = 1, ONE_LEVEL = 2, TWO_LEVEL = 3 }
 

Functions

template<typename T >
std::string DebugString (const T &val)
 
template<>
std::string DebugString (const StringVal &val)
 

Typedef Documentation

typedef uint8_t* impala_udf::BufferVal

Definition at line 600 of file udf.h.

typedef AnyVal impala_udf::InputType

The UDA execution is broken up into a few steps. The general calling pattern is one of these:
1) Init(), Update() (repeatedly), Serialize()
2) Init(), Update() (repeatedly), Finalize()
3) Init(), Merge() (repeatedly), Serialize()
4) Init(), Merge() (repeatedly), Finalize()

The UDA is registered with three types: the result type, the input type and the intermediate type. If the UDA needs a fixed byte width intermediate buffer, the type should be TYPE_FIXED_BUFFER and Impala will allocate the buffer. If the UDA needs an unknown sized buffer, it should use TYPE_STRING and allocate it from the FunctionContext manually. For UDAs that need a complex data structure as the intermediate state, the intermediate type should be string and the UDA can cast the ptr to the structure it is using.

Memory Management: For allocations that are not returned to Impala, the UDA should use the FunctionContext::Allocate()/Free() methods. In general, Allocate() is called in Init(), and then Free() must be called in both Serialize() and Finalize(), since either of these functions may be called to clean up the state. For StringVal allocations returned to Impala (e.g. returned by UdaSerialize()), the UDA should allocate the result via the StringVal(FunctionContext*, int) ctor and Impala will automatically handle freeing it.

For clarity in documenting the UDA interface, the various types are typedefed here. The actual execution resolves all the types at runtime and none of these types should actually be used.

Definition at line 322 of file udf.h.

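typedef AnyVal impala_udf::InputType2

Definition at line 323 of file udf.h.

typedef AnyVal impala_udf::IntermediateType

Definition at line 325 of file udf.h.

typedef AnyVal impala_udf::ResultType

Definition at line 324 of file udf.h.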
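
The calling patterns described above can be sketched with a small sum aggregate. This is only an illustration: `FunctionContext`, `IntVal` and `BigIntVal` below are simplified stand-ins defined inline rather than the real types from udf.h, and the `Sum*` functions and `RunTwoLevelSum` driver are hypothetical names. The driver runs pattern 1 (Init, Update, Serialize) on each partition and then pattern 4 (Init, Merge, Finalize) on the serialized results.

```cpp
#include <cstdint>
#include <vector>

// Simplified stand-ins for the real impala_udf types (assumptions for this sketch).
struct FunctionContext {};
struct IntVal {
  bool is_null;
  int32_t val;
  explicit IntVal(int32_t v = 0) : is_null(false), val(v) {}
  static IntVal null() { IntVal v; v.is_null = true; return v; }
};
struct BigIntVal {
  bool is_null;
  int64_t val;
  explicit BigIntVal(int64_t v = 0) : is_null(false), val(v) {}
};

// UdaInit shape: reset the intermediate state for a new aggregate group.
void SumInit(FunctionContext*, BigIntVal* dst) {
  dst->is_null = true;
  dst->val = 0;
}

// UdaUpdate shape: fold one input value into the intermediate state.
void SumUpdate(FunctionContext*, const IntVal& input, BigIntVal* dst) {
  if (input.is_null) return;
  dst->is_null = false;
  dst->val += input.val;
}

// UdaMerge shape: fold intermediate 'src' into 'dst'.
void SumMerge(FunctionContext*, const BigIntVal& src, BigIntVal* dst) {
  if (src.is_null) return;
  dst->is_null = false;
  dst->val += src.val;
}

// UdaSerialize / UdaFinalize shapes: the state is fixed width, so it is returned as is.
BigIntVal SumSerialize(FunctionContext*, const BigIntVal& v) { return v; }
BigIntVal SumFinalize(FunctionContext*, const BigIntVal& v) { return v; }

// Hypothetical driver: pattern 1 per partition, then pattern 4 over the results.
BigIntVal RunTwoLevelSum(const std::vector<std::vector<IntVal>>& partitions) {
  FunctionContext ctx;
  std::vector<BigIntVal> serialized;
  for (const auto& part : partitions) {
    BigIntVal state;
    SumInit(&ctx, &state);
    for (const IntVal& v : part) SumUpdate(&ctx, v, &state);
    serialized.push_back(SumSerialize(&ctx, state));
  }
  BigIntVal merged;
  SumInit(&ctx, &merged);
  for (const BigIntVal& s : serialized) SumMerge(&ctx, s, &merged);
  return SumFinalize(&ctx, merged);
}
```

In this sketch a group with no non-null inputs finalizes to a null result, which is a design choice of the example rather than something the interface requires.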

typedef ResultType(* impala_udf::UdaFinalize)(FunctionContext *context, const IntermediateType &v)

Called once at the end to return the final value for this UDA. No additional functions will be called with this FunctionContext object, so the UDA should do its final cleanup (e.g. Free()) here.

Definition at line 353 of file udf.h.

typedef void(* impala_udf::UdaInit)(FunctionContext *context, IntermediateType *result)

UdaInit is called once for each aggregate group before calls to any of the other functions below.

Definition at line 329 of file udf.h.

typedef void(* impala_udf::UdaMerge)(FunctionContext *context, const IntermediateType &src, IntermediateType *dst)

Merge an intermediate result 'src' into 'dst'.

Definition at line 340 of file udf.h.

typedef const IntermediateType(* impala_udf::UdaSerialize)(FunctionContext *context, const IntermediateType &type)

Serialize the intermediate type. The serialized data is then sent across the wire. No additional functions will be called with this FunctionContext object, so the UDA should do its final cleanup (e.g. Free()) here.

Definition at line 347 of file udf.h.
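
The rule that Allocate() happens in Init() and Free() happens in both Serialize() and Finalize() can be illustrated with an average aggregate that keeps a struct behind a string-typed intermediate. Everything here is a sketch under assumptions: the `FunctionContext` and `StringVal` stand-ins only mimic the Allocate()/Free() methods and the StringVal(FunctionContext*, int) ctor, the `outstanding` counter and `result_buffers` member exist only so the example can check for leaks, and plain `double` replaces DoubleVal for brevity.

```cpp
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <vector>

// Stand-in context (assumption): tracks Allocate()/Free() balance.
struct FunctionContext {
  int outstanding = 0;                   // Allocate() calls not yet freed
  std::vector<uint8_t*> result_buffers;  // buffers handed back to the "engine"
  uint8_t* Allocate(int byte_size) {
    ++outstanding;
    return static_cast<uint8_t*>(std::malloc(byte_size));
  }
  void Free(uint8_t* p) {
    --outstanding;
    std::free(p);
  }
  ~FunctionContext() {
    for (uint8_t* p : result_buffers) std::free(p);
  }
};

// Stand-in StringVal (assumption).
struct StringVal {
  uint8_t* ptr;
  int len;
  StringVal() : ptr(nullptr), len(0) {}
  // Mimics the StringVal(FunctionContext*, int) ctor: the engine owns this buffer.
  StringVal(FunctionContext* ctx, int l)
      : ptr(static_cast<uint8_t*>(std::malloc(l))), len(l) {
    ctx->result_buffers.push_back(ptr);
  }
};

// Hypothetical intermediate structure stored behind the string pointer.
struct AvgState {
  double sum;
  int64_t count;
};

void AvgInit(FunctionContext* ctx, StringVal* dst) {
  dst->len = static_cast<int>(sizeof(AvgState));
  dst->ptr = ctx->Allocate(dst->len);
  std::memset(dst->ptr, 0, dst->len);
}

void AvgUpdate(FunctionContext*, double input, StringVal* dst) {
  AvgState* s = reinterpret_cast<AvgState*>(dst->ptr);
  s->sum += input;
  ++s->count;
}

// Copies the state into an engine-owned buffer, then frees the UDA's own allocation.
StringVal AvgSerialize(FunctionContext* ctx, const StringVal& src) {
  StringVal out(ctx, src.len);
  std::memcpy(out.ptr, src.ptr, src.len);
  ctx->Free(src.ptr);
  return out;
}

// Computes the result, then frees the UDA's own allocation.
double AvgFinalize(FunctionContext* ctx, const StringVal& src) {
  const AvgState* s = reinterpret_cast<const AvgState*>(src.ptr);
  double result = s->count == 0 ? 0.0 : s->sum / s->count;
  ctx->Free(src.ptr);
  return result;
}
```

Whichever of the two functions ends the call sequence, the buffer obtained in AvgInit() is released exactly once.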

typedef void(* impala_udf::UdaUpdate)(FunctionContext *context, const InputType &input, IntermediateType *result)

This is called for each input value. The UDA should update result based on the input value. The update function can take any number of input arguments; UdaUpdate (one input) and UdaUpdate2 (two inputs) are examples.

Definition at line 334 of file udf.h.

typedef void(* impala_udf::UdaUpdate2)(FunctionContext *context, const InputType &input, const InputType2 &input2, IntermediateType *result)

Definition at line 336 of file udf.h.
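
As a rough illustration of the one-input and two-input update shapes, the sketch below defines a plain sum and a weighted sum. The `FunctionContext`, `DoubleVal` and `IntVal` types are simplified stand-ins declared inline, and both function names are hypothetical.

```cpp
#include <cstdint>

// Simplified stand-ins for the real impala_udf types (assumptions for this sketch).
struct FunctionContext {};
struct DoubleVal {
  bool is_null;
  double val;
  explicit DoubleVal(double v = 0.0) : is_null(false), val(v) {}
};
struct IntVal {
  bool is_null;
  int32_t val;
  explicit IntVal(int32_t v = 0) : is_null(false), val(v) {}
};

// Same shape as UdaUpdate: one input argument.
void SumUpdate(FunctionContext*, const DoubleVal& input, DoubleVal* result) {
  if (input.is_null) return;
  result->val += input.val;
}

// Same shape as UdaUpdate2: two input arguments.
void WeightedSumUpdate(FunctionContext*, const DoubleVal& value,
                       const IntVal& weight, DoubleVal* result) {
  if (value.is_null || weight.is_null) return;  // skip rows with a null input
  result->val += value.val * weight.val;
}
```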

typedef void(* impala_udf::UdfClose)(FunctionContext *context, FunctionContext::FunctionStateScope scope)

The UDF can also optionally include a close function, specified in the "CREATE FUNCTION" statement using "close_fn=<close function symbol>". The close function is called after all calls to the UDF have completed. This is the appropriate time for the UDF to deallocate any shared data structures that are not needed to maintain the results. If there is an error, this function should call FunctionContext::SetError()/ FunctionContext::AddWarning(). The close function is called multiple times with different FunctionStateScopes. It will be called once per fragment with 'scope' set to FRAGMENT_LOCAL, and once per execution thread with 'scope' set to THREAD_LOCAL.

Definition at line 288 of file udf.h.
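
A close function normally undoes what a matching prepare function set up. The sketch below shows one possible pairing that handles only the THREAD_LOCAL scope and ignores the FRAGMENT_LOCAL call. The `FunctionContext` here is a minimal stand-in that mimics SetFunctionState()/GetFunctionState(); `CallCounter` and the three functions are hypothetical, and `new`/`delete` are used only to keep the example short.

```cpp
#include <cstdint>

// Minimal stand-in for FunctionContext (assumption for this sketch).
struct FunctionContext {
  enum FunctionStateScope { FRAGMENT_LOCAL, THREAD_LOCAL };
  void SetFunctionState(FunctionStateScope scope, void* ptr) { state_[scope] = ptr; }
  void* GetFunctionState(FunctionStateScope scope) const { return state_[scope]; }

 private:
  void* state_[2] = {nullptr, nullptr};
};
struct IntVal {
  bool is_null;
  int32_t val;
  explicit IntVal(int32_t v = 0) : is_null(false), val(v) {}
};

// Hypothetical per-thread state: how many times the UDF was evaluated.
struct CallCounter {
  int64_t calls = 0;
};

// UdfPrepare shape: called once per scope; only THREAD_LOCAL is handled here.
void CountPrepare(FunctionContext* ctx, FunctionContext::FunctionStateScope scope) {
  if (scope != FunctionContext::THREAD_LOCAL) return;
  ctx->SetFunctionState(scope, new CallCounter());
}

// The UDF itself: returns its argument and bumps the counter.
IntVal CountingIdentity(FunctionContext* ctx, const IntVal& arg) {
  auto* counter = static_cast<CallCounter*>(
      ctx->GetFunctionState(FunctionContext::THREAD_LOCAL));
  ++counter->calls;
  return arg;
}

// UdfClose shape: releases the state created in CountPrepare.
void CountClose(FunctionContext* ctx, FunctionContext::FunctionStateScope scope) {
  if (scope != FunctionContext::THREAD_LOCAL) return;
  delete static_cast<CallCounter*>(ctx->GetFunctionState(scope));
  ctx->SetFunctionState(scope, nullptr);
}
```

Checking the scope in both functions keeps the allocation and the release paired even though each function is invoked once per scope.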

typedef void(* impala_udf::UdfPrepare)(FunctionContext *context, FunctionContext::FunctionStateScope scope)

The UDF must implement this function prototype. This is not a typedef as the actual UDF's signature varies from UDF to UDF.

typedef <Val> Evaluate(FunctionContext* context, <const Val& arg>);

The UDF must return one of the Val structs. The UDF must accept a pointer to a FunctionContext object and then a const reference for each of the input arguments. Examples of valid UDF signatures are:
1) DoubleVal Example1(FunctionContext* context);
2) IntVal Example2(FunctionContext* context, const IntVal& a1, const DoubleVal& a2);

UDFs can be variadic. The variable arguments must all come at the end and must be the same type. An example signature is:

StringVal Concat(FunctionContext* context, const StringVal& separator, int num_var_args, const StringVal* args);

In this case args[0] is the first variable argument and args[num_var_args - 1] is the last.

Memory Management

The UDF can assume that memory from input arguments will have the same lifetime as results for the UDF. In other words, the UDF can return memory from input arguments without making copies. For example, a function like substring will not need to allocate and copy the smaller string. Any state needed across calls must be stored and accessed via FunctionContext::SetFunctionState() and FunctionContext::GetFunctionState(). The UDF should not maintain any other state across calls since there is no guarantee on how the execution is multithreaded or distributed.

Execution Model

For each UDF use occurring in a given query, at least one FunctionContext will be created. For a given FunctionContext, the UDF's functions are never called concurrently and therefore do not need to be thread-safe. State shared across UDF invocations should be initialized and cleaned up using prepare and close functions (described below). Note that a single UDF use may produce multiple FunctionContexts for that UDF (this is so the UDF can be executed concurrently in different threads). For example, the query "select * from tbl where my_udf(x) > 0" may produce multiple FunctionContexts for 'my_udf', each of which may concurrently be passed to the prepare, close, and UDF functions of 'my_udf'.

Prepare / Close Functions

The UDF can optionally include a prepare function, specified in the "CREATE FUNCTION" statement using "prepare_fn=<prepare function symbol>". The prepare function is called before any calls to the UDF to evaluate values. This is the appropriate time for the UDF to initialize any shared data structures, validate versions, etc. If there is an error, this function should call FunctionContext::SetError()/FunctionContext::AddWarning(). The prepare function is called multiple times with different FunctionStateScopes. It will be called once per fragment with 'scope' set to FRAGMENT_LOCAL, and once per execution thread with 'scope' set to THREAD_LOCAL.

Definition at line 275 of file udf.h.
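
The signature rules in the description above (a FunctionContext pointer first, then const references, with variadic arguments last) and the note about returning input memory without copying can be sketched as follows. The stand-in `FunctionContext`, `IntVal` and `StringVal` types are simplified assumptions, not the definitions from udf.h; `Prefix`, `Wrap` and `ToString` are hypothetical helpers, and the `Concat` body is one possible implementation of the example signature.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Simplified stand-ins for the real impala_udf types (assumptions for this sketch).
struct FunctionContext {
  std::vector<std::vector<uint8_t>> buffers;  // plays the role of engine-owned memory
};
struct IntVal {
  bool is_null;
  int32_t val;
  explicit IntVal(int32_t v = 0) : is_null(false), val(v) {}
};
struct StringVal {
  bool is_null;
  uint8_t* ptr;
  int len;
  StringVal() : is_null(true), ptr(nullptr), len(0) {}
  StringVal(uint8_t* p, int l) : is_null(false), ptr(p), len(l) {}
  // Mimics the StringVal(FunctionContext*, int) ctor: result memory owned by the engine.
  StringVal(FunctionContext* ctx, int l) : is_null(false), ptr(nullptr), len(l) {
    ctx->buffers.emplace_back(static_cast<std::size_t>(l));
    ptr = ctx->buffers.back().data();
  }
};

// Fixed-arity UDF: returns the first n bytes of s by pointing into the input's memory.
StringVal Prefix(FunctionContext*, const StringVal& s, const IntVal& n) {
  if (s.is_null || n.is_null) return StringVal();
  int len = std::max(0, std::min<int>(n.val, s.len));
  return StringVal(s.ptr, len);  // no allocation, no copy
}

// Variadic UDF: args[0] .. args[num_var_args - 1] are the variable arguments.
StringVal Concat(FunctionContext* ctx, const StringVal& sep, int num_var_args,
                 const StringVal* args) {
  if (sep.is_null) return StringVal();
  int total = 0;
  for (int i = 0; i < num_var_args; ++i) {
    if (args[i].is_null) return StringVal();
    total += args[i].len + (i > 0 ? sep.len : 0);
  }
  StringVal out(ctx, total);
  uint8_t* p = out.ptr;
  for (int i = 0; i < num_var_args; ++i) {
    if (i > 0 && sep.len > 0) { std::memcpy(p, sep.ptr, sep.len); p += sep.len; }
    if (args[i].len > 0) { std::memcpy(p, args[i].ptr, args[i].len); p += args[i].len; }
  }
  return out;
}

// Hypothetical helpers for moving between std::string and the stand-in StringVal.
StringVal Wrap(std::string& s) {
  return StringVal(reinterpret_cast<uint8_t*>(&s[0]), static_cast<int>(s.size()));
}
std::string ToString(const StringVal& v) {
  return std::string(reinterpret_cast<const char*>(v.ptr), v.len);
}
```

`Prefix` relies on the stated guarantee that input memory lives as long as the result, while `Concat` has to build new bytes and so requests result memory through the context.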

Enumeration Type Documentation

enum impala_udf::UdaExecutionMode

Enumerator
ALL = 0
SINGLE_NODE = 1
ONE_LEVEL = 2
TWO_LEVEL = 3

Definition at line 32 of file uda-test-harness.h.

Function Documentation

template<>
std::string impala_udf::DebugString ( const StringVal &  val)
inline