Classes
class	UdaTestHarnessUtil

class	UdaTestHarnessBase

class	UdaTestHarness

class	UdaTestHarness2

class	UdaTestHarness3

class	UdaTestHarness4

class	UdfTestHarness
	Utility class to help test UDFs.

class	FunctionContext

struct	AnyVal

struct	BooleanVal

struct	TinyIntVal

struct	SmallIntVal

struct	IntVal

struct	BigIntVal

struct	FloatVal

struct	DoubleVal

struct	TimestampVal
	This object has a compatible storage format with boost::ptime.

struct	StringVal

struct	DecimalVal

Typedefs
typedef void(*	UdfPrepare )(FunctionContext *context, FunctionContext::FunctionStateScope scope)

typedef void(*	UdfClose )(FunctionContext *context, FunctionContext::FunctionStateScope scope)

typedef AnyVal	InputType

typedef AnyVal	InputType2

typedef AnyVal	ResultType

typedef AnyVal	IntermediateType

typedef void(*	UdaInit )(FunctionContext *context, IntermediateType *result)

typedef void(*	UdaUpdate )(FunctionContext *context, const InputType &input, IntermediateType *result)

typedef void(*	UdaUpdate2 )(FunctionContext *context, const InputType &input, const InputType2 &input2, IntermediateType *result)

typedef void(*	UdaMerge )(FunctionContext *context, const IntermediateType &src, IntermediateType *dst)
	Merge an intermediate result 'src' into 'dst'.

typedef const IntermediateType(*	UdaSerialize )(FunctionContext *context, const IntermediateType &type)

typedef ResultType(*	UdaFinalize )(FunctionContext *context, const IntermediateType &v)

typedef uint8_t *	BufferVal

Enumerations
enum	UdaExecutionMode { ALL = 0, SINGLE_NODE = 1, ONE_LEVEL = 2, TWO_LEVEL = 3 }

Functions
template<typename T >
std::string	DebugString (const T &val)

template<>
std::string	DebugString (const StringVal &val)

Typedef Documentation

typedef uint8_t* impala_udf::BufferVal

Definition at line 600 of file udf.h.

typedef AnyVal impala_udf::InputType

The UDA execution is broken up into a few steps. The general calling pattern is one of these:

1) Init(), Update() (repeatedly), Serialize()
2) Init(), Update() (repeatedly), Finalize()
3) Init(), Merge() (repeatedly), Serialize()
4) Init(), Merge() (repeatedly), Finalize()

The UDA is registered with three types: the result type, the input type and the intermediate type. If the UDA needs a fixed byte width intermediate buffer, the type should be TYPE_FIXED_BUFFER and Impala will allocate the buffer. If the UDA needs an unknown sized buffer, it should use TYPE_STRING and allocate it from the FunctionContext manually. For UDAs that need a complex data structure as the intermediate state, the intermediate type should be string and the UDA can cast the ptr to the structure it is using.

Memory Management: For allocations that are not returned to Impala, the UDA should use the FunctionContext::Allocate()/Free() methods. In general, Allocate() is called in Init(), and then Free() must be called in both Serialize() and Finalize(), since either of these functions may be called to clean up the state. For StringVal allocations returned to Impala (e.g. returned by UdaSerialize()), the UDA should allocate the result via StringVal(FunctionContext*, int) ctor and Impala will automatically handle freeing it.

For clarity in documenting the UDA interface, the various types will be typedefed here. The actual execution resolves all the types at runtime and none of these types should actually be used.
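The calling patterns above can be illustrated with a small sum aggregate. This is only a sketch under stated assumptions: FunctionContext, AnyVal, IntVal and BigIntVal below are stand-ins written for this example (not the declarations from udf.h), and SumInit, SumUpdate, SumMerge, SumSerialize, SumFinalize and RunTwoLevelSum are hypothetical names. The driver runs pattern 1 on each partition and then pattern 4 on the serialized results.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Stand-in types: NOT the real udf.h declarations. They only mirror the
// names used on this page so that the sketch compiles on its own.
struct FunctionContext {};
struct AnyVal { bool is_null = true; };
struct IntVal : AnyVal { int32_t val = 0; };
struct BigIntVal : AnyVal { int64_t val = 0; };

// UdaInit shape: called once per aggregate group before anything else.
void SumInit(FunctionContext*, BigIntVal* result) {
  result->is_null = false;
  result->val = 0;
}

// UdaUpdate shape: called for each input value; NULL inputs are skipped.
void SumUpdate(FunctionContext*, const IntVal& input, BigIntVal* result) {
  if (input.is_null) return;
  result->val += input.val;
}

// UdaMerge shape: fold the intermediate result 'src' into 'dst'.
void SumMerge(FunctionContext*, const BigIntVal& src, BigIntVal* dst) {
  dst->val += src.val;
}

// UdaSerialize shape: the intermediate is a plain value, so nothing to free.
const BigIntVal SumSerialize(FunctionContext*, const BigIntVal& v) { return v; }

// UdaFinalize shape: produce the final value.
BigIntVal SumFinalize(FunctionContext*, const BigIntVal& v) { return v; }

// Hypothetical driver: Init/Update/Serialize per partition (pattern 1),
// then Init/Merge/Finalize over the serialized results (pattern 4).
int64_t RunTwoLevelSum(const std::vector<std::vector<int32_t>>& partitions) {
  std::vector<BigIntVal> serialized;
  for (const auto& part : partitions) {
    FunctionContext ctx;  // one context per first-level aggregation
    BigIntVal state;
    SumInit(&ctx, &state);
    for (int32_t x : part) {
      IntVal v;
      v.is_null = false;
      v.val = x;
      SumUpdate(&ctx, v, &state);
    }
    serialized.push_back(SumSerialize(&ctx, state));
  }
  FunctionContext merge_ctx;  // separate context for the merge phase
  BigIntVal merged;
  SumInit(&merge_ctx, &merged);
  for (const auto& s : serialized) SumMerge(&merge_ctx, s, &merged);
  return SumFinalize(&merge_ctx, merged).val;
}
```

Because the intermediate here is a fixed-width value, no Allocate()/Free() calls are needed; a UDA with a string intermediate would additionally have to release its state in both the serialize and finalize functions, as described above.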

Definition at line 322 of file udf.h.

typedef AnyVal impala_udf::InputType2

Definition at line 323 of file udf.h.

typedef AnyVal impala_udf::IntermediateType

Definition at line 325 of file udf.h.

typedef AnyVal impala_udf::ResultType

Definition at line 324 of file udf.h.

typedef ResultType(* impala_udf::UdaFinalize)(FunctionContext *context, const IntermediateType &v)

Called once at the end to return the final value for this UDA. No additional functions will be called with this FunctionContext object, so the UDA should do its final cleanup (e.g. Free()) here.

Definition at line 353 of file udf.h.

typedef void(* impala_udf::UdaInit)(FunctionContext *context, IntermediateType *result)

UdaInit is called once for each aggregate group before calls to any of the other functions below.

Definition at line 329 of file udf.h.

typedef void(* impala_udf::UdaMerge)(FunctionContext *context, const IntermediateType &src, IntermediateType *dst)

Merge an intermediate result 'src' into 'dst'.

Definition at line 340 of file udf.h.

typedef const IntermediateType(* impala_udf::UdaSerialize)(FunctionContext *context, const IntermediateType &type)

Serialize the intermediate type. The serialized data is then sent across the wire. No additional functions will be called with this FunctionContext object, so the UDA should do its final cleanup (e.g. Free()) here.

Definition at line 347 of file udf.h.

typedef void(* impala_udf::UdaUpdate)(FunctionContext *context, const InputType &input, IntermediateType *result)

This is called for each input value. The UDA should update result based on the input value. The update function can take any number of input arguments. UdaUpdate and UdaUpdate2 are examples taking one and two input arguments.

Definition at line 334 of file udf.h.

typedef void(* impala_udf::UdaUpdate2)(FunctionContext *context, const InputType &input, const InputType2 &input2, IntermediateType *result)

Definition at line 336 of file udf.h.

typedef void(* impala_udf::UdfClose)(FunctionContext *context, FunctionContext::FunctionStateScope scope)

The UDF can also optionally include a close function, specified in the "CREATE FUNCTION" statement using "close_fn=<close function symbol>". The close function is called after all calls to the UDF have completed. This is the appropriate time for the UDF to deallocate any shared data structures that are not needed to maintain the results. If there is an error, this function should call FunctionContext::SetError()/ FunctionContext::AddWarning(). The close function is called multiple times with different FunctionStateScopes. It will be called once per fragment with 'scope' set to FRAGMENT_LOCAL, and once per execution thread with 'scope' set to THREAD_LOCAL.
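The pairing of a prepare function with a close function might look like the following minimal sketch. The FunctionContext, AnyVal and IntVal definitions are stand-ins written for this example rather than the real udf.h declarations, and it assumes that SetFunctionState() and GetFunctionState() take the scope as their first argument. Offset, AddOffsetPrepare, AddOffset, AddOffsetClose and RunAddOffset are hypothetical names, and plain new/delete is used only to keep the sketch short.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in types: NOT the real udf.h declarations.
struct FunctionContext {
  enum FunctionStateScope { FRAGMENT_LOCAL, THREAD_LOCAL };
  // Assumed shape of the state accessors: one pointer slot per scope.
  void SetFunctionState(FunctionStateScope scope, void* ptr) { state_[scope] = ptr; }
  void* GetFunctionState(FunctionStateScope scope) const { return state_[scope]; }
 private:
  void* state_[2] = {nullptr, nullptr};
};
struct AnyVal { bool is_null = true; };
struct IntVal : AnyVal { int32_t val = 0; };

// Hypothetical state shared across calls to the UDF.
struct Offset { int32_t value; };

// UdfPrepare shape: called once per scope; this sketch only acts on THREAD_LOCAL.
void AddOffsetPrepare(FunctionContext* context,
                      FunctionContext::FunctionStateScope scope) {
  if (scope != FunctionContext::THREAD_LOCAL) return;
  context->SetFunctionState(scope, new Offset{100});
}

// The UDF itself: reads the state that the prepare function stored.
IntVal AddOffset(FunctionContext* context, const IntVal& arg) {
  IntVal out;
  if (arg.is_null) return out;  // NULL in, NULL out
  auto* state = static_cast<Offset*>(
      context->GetFunctionState(FunctionContext::THREAD_LOCAL));
  out.is_null = false;
  out.val = arg.val + state->value;
  return out;
}

// UdfClose shape: releases whatever the matching prepare call set up.
void AddOffsetClose(FunctionContext* context,
                    FunctionContext::FunctionStateScope scope) {
  if (scope != FunctionContext::THREAD_LOCAL) return;
  delete static_cast<Offset*>(context->GetFunctionState(scope));
  context->SetFunctionState(scope, nullptr);
}

// Hypothetical driver: prepare and close are each invoked for both scopes.
int32_t RunAddOffset(int32_t x) {
  FunctionContext ctx;
  AddOffsetPrepare(&ctx, FunctionContext::FRAGMENT_LOCAL);
  AddOffsetPrepare(&ctx, FunctionContext::THREAD_LOCAL);
  IntVal in;
  in.is_null = false;
  in.val = x;
  IntVal out = AddOffset(&ctx, in);
  AddOffsetClose(&ctx, FunctionContext::THREAD_LOCAL);
  AddOffsetClose(&ctx, FunctionContext::FRAGMENT_LOCAL);
  return out.val;
}
```

Checking the scope in both functions matters because each is called once per scope; without the check the state would be allocated or freed twice.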

Definition at line 288 of file udf.h.

typedef void(* impala_udf::UdfPrepare)(FunctionContext *context, FunctionContext::FunctionStateScope scope)

The UDF must implement this function prototype. This is not a typedef as the actual UDF's signature varies from UDF to UDF.

typedef <Val> Evaluate(FunctionContext* context, <const Val& arg>);

The UDF must return one of the Val structs. The UDF must accept a pointer to a FunctionContext object and then a const reference for each of the input arguments. Examples of valid UDF signatures are:

1) DoubleVal Example1(FunctionContext* context);
2) IntVal Example2(FunctionContext* context, const IntVal& a1, const DoubleVal& a2);

UDFs can be variadic. The variable arguments must all come at the end and must be the same type. An example signature is:

StringVal Concat(FunctionContext* context, const StringVal& separator, int num_var_args, const StringVal* args);

In this case args[0] is the first variable argument and args[num_var_args - 1] is the last.

--- Memory Management ---

The UDF can assume that memory from input arguments will have the same lifetime as results for the UDF. In other words, the UDF can return memory from input arguments without making copies. For example, a function like substring will not need to allocate and copy the smaller string. Any state needed across calls must be stored and accessed via FunctionContext::SetFunctionState() and FunctionContext::GetFunctionState(). The UDF should not maintain any other state across calls since there is no guarantee on how the execution is multithreaded or distributed.

--- Execution Model ---

Execution model: For each UDF use occurring in a given query, at least one FunctionContext will be created. For a given FunctionContext, the UDF's functions are never called concurrently and therefore do not need to be thread-safe. State shared across UDF invocations should be initialized and cleaned up using prepare and close functions (described below). Note that a single UDF use may produce multiple FunctionContexts for that UDF (this is so the UDF can be executed concurrently in different threads). For example, the query "select * from tbl where my_udf(x) > 0" may produce multiple FunctionContexts for 'my_udf', each of which may concurrently be passed to 'my_udf's prepare, close, and UDF functions.

--- Prepare / Close Functions ---

The UDF can optionally include a prepare function, specified in the "CREATE FUNCTION" statement using "prepare_fn=<prepare function symbol>". The prepare function is called before any calls to the UDF to evaluate values. This is the appropriate time for the UDF to initialize any shared data structures, validate versions, etc. If there is an error, this function should call FunctionContext::SetError()/FunctionContext::AddWarning(). The prepare function is called multiple times with different FunctionStateScopes. It will be called once per fragment with 'scope' set to FRAGMENT_LOCAL, and once per execution thread with 'scope' set to THREAD_LOCAL.
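The variadic Concat signature described in this section could be implemented roughly as follows. This is a sketch only: FunctionContext, AnyVal and StringVal are stand-ins written for this example rather than the real udf.h declarations, AllocateResult() is a hypothetical replacement for Impala-managed result memory, and FromString and ConcatStrings are hypothetical helpers that exist only to exercise the function.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <memory>
#include <string>
#include <vector>

// Stand-in context: NOT the real udf.h declaration. Result buffers simply
// live as long as the context does.
struct FunctionContext {
  uint8_t* AllocateResult(int len) {  // hypothetical helper
    buffers_.emplace_back(new uint8_t[len > 0 ? len : 1]);
    return buffers_.back().get();
  }
 private:
  std::vector<std::unique_ptr<uint8_t[]>> buffers_;
};

struct AnyVal { bool is_null = true; };
struct StringVal : AnyVal {
  uint8_t* ptr = nullptr;
  int len = 0;
  StringVal() {}
  // Mirrors the StringVal(FunctionContext*, int) ctor mentioned on this page.
  StringVal(FunctionContext* context, int n)
      : ptr(context->AllocateResult(n)), len(n) { is_null = false; }
};

// Copies 'src' to 'out' and returns the position just past the copied bytes.
static uint8_t* Append(uint8_t* out, const StringVal& src) {
  if (src.len > 0) std::memcpy(out, src.ptr, src.len);
  return out + src.len;
}

// Variadic UDF: args[0] .. args[num_var_args - 1] are the variable arguments.
StringVal Concat(FunctionContext* context, const StringVal& separator,
                 int num_var_args, const StringVal* args) {
  if (separator.is_null) return StringVal();
  int total = 0;
  for (int i = 0; i < num_var_args; ++i) {
    if (args[i].is_null) return StringVal();  // any NULL input gives NULL
    total += args[i].len;
    if (i > 0) total += separator.len;
  }
  StringVal result(context, total);
  uint8_t* out = result.ptr;
  for (int i = 0; i < num_var_args; ++i) {
    if (i > 0) out = Append(out, separator);
    out = Append(out, args[i]);
  }
  return result;
}

// Hypothetical helper: builds a StringVal holding a copy of 's'.
StringVal FromString(FunctionContext* context, const std::string& s) {
  StringVal v(context, static_cast<int>(s.size()));
  if (!s.empty()) std::memcpy(v.ptr, s.data(), s.size());
  return v;
}

// Hypothetical driver: wraps Concat for std::string inputs.
std::string ConcatStrings(const std::string& sep,
                          const std::vector<std::string>& parts) {
  FunctionContext ctx;
  std::vector<StringVal> args;
  for (const auto& p : parts) args.push_back(FromString(&ctx, p));
  StringVal out = Concat(&ctx, FromString(&ctx, sep),
                         static_cast<int>(args.size()), args.data());
  return std::string(reinterpret_cast<const char*>(out.ptr), out.len);
}
```

Returning NULL when any argument is NULL is a choice made for this sketch, not something the signature requires.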

Definition at line 275 of file udf.h.

Enumeration Type Documentation

enum impala_udf::UdaExecutionMode

Enumerator
ALL
SINGLE_NODE
ONE_LEVEL
TWO_LEVEL

Definition at line 32 of file uda-test-harness.h.

Function Documentation

template<typename T >

std::string impala_udf::DebugString ( const T & val )

inline

template<>

std::string impala_udf::DebugString ( const StringVal & val )

inline

Definition at line 35 of file udf-debug.h.

References impala_udf::AnyVal::is_null, impala_udf::StringVal::len, and impala_udf::StringVal::ptr.
