Impala
Impala is the open source, native analytic database for Apache Hadoop.
#include <text-converter.h>
Public Member Functions | |
TextConverter (char escape_char, const std::string &null_col_val, bool check_null=true) | |
bool | WriteSlot (const SlotDescriptor *slot_desc, Tuple *tuple, const char *data, int len, bool copy_string, bool need_escape, MemPool *pool) |
void | UnescapeString (const char *src, char *dest, int *len, int64_t maxlen=-1) |
void | UnescapeString (StringValue *str, MemPool *pool) |
Static Public Member Functions | |
static llvm::Function * | CodegenWriteSlot (LlvmCodeGen *codegen, TupleDescriptor *tuple_desc, SlotDescriptor *slot_desc, const char *null_col_val, int len, bool check_null) |
Private Attributes | |
char | escape_char_ |
std::string | null_col_val_ |
Special string to indicate NULL column values.
bool | check_null_ |
Indicates whether we should check for null_col_val_ and set slots to NULL.
Helper class for dealing with text data, e.g., converting text data to numeric types, etc.
Definition at line 39 of file text-converter.h.
TextConverter::TextConverter (char escape_char, const std::string &null_col_val, bool check_null=true)
escape_char: Character to indicate escape sequences.
null_col_val: Special string to indicate NULL column values.
check_null: If set, the WriteSlot() functions set the target slot to NULL if their input string matches null_col_val.
Definition at line 32 of file text-converter.cc.
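static llvm::Function * TextConverter::CodegenWriteSlot (LlvmCodeGen *codegen, TupleDescriptor *tuple_desc, SlotDescriptor *slot_desc, const char *null_col_val, int len, bool check_null) [static]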
Codegens the function that writes a slot for slot_desc. Returns NULL if codegen was not successful. The signature of the generated function is: bool WriteSlot(Tuple* tuple, const char* data, int len); The generated function returns true if the slot could be written and false otherwise. If check_null is set, the generated function sets the target slot to NULL if its input string matches null_col_val. The generated function does not support escape characters and should not be used for partitions that contain escapes.
Definition at line 99 of file text-converter.cc.
References impala::LlvmCodeGen::FnPrototype::AddArgument(), impala::LlvmCodeGen::CastPtrToLlvmPtr(), impala::LlvmCodeGen::codegen_timer(), impala::SlotDescriptor::CodegenUpdateNull(), impala::LlvmCodeGen::context(), impala::LlvmCodeGen::CreateEntryBlockAlloca(), impala::LlvmCodeGen::CreateIfElseBlocks(), impala::LlvmCodeGen::false_value(), impala::SlotDescriptor::field_idx(), impala::LlvmCodeGen::FinalizeFunction(), impala::TupleDescriptor::GenerateLlvmStruct(), impala::LlvmCodeGen::GetFunction(), impala::LlvmCodeGen::GetIntConstant(), impala::LlvmCodeGen::GetType(), impala::ColumnType::IsVarLen(), impala::ColumnType::len, impala::StringParser::PARSE_FAILURE, impala::LlvmCodeGen::ptr_type(), SCOPED_TIMER, impala::LlvmCodeGen::true_value(), impala::ColumnType::type, impala::SlotDescriptor::type(), impala::TYPE_BIGINT, impala::TYPE_BOOLEAN, impala::TYPE_CHAR, impala::TYPE_DOUBLE, impala::TYPE_FLOAT, impala::TYPE_INT, impala::TYPE_SMALLINT, impala::TYPE_TINYINT, and impala::TYPE_VARCHAR.
Referenced by impala::HdfsScanner::CodegenWriteCompleteTuple().
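The restriction on escapes suggests that a caller has to choose between the generated function and the regular WriteSlot() path. The following is only a sketch of such a choice, not Impala's actual dispatch code: the Tuple struct is an empty stand-in, and WriteSlotFn, ChooseWriteSlotFn, interpreted_fn and partition_has_escapes are hypothetical names introduced for illustration.

```cpp
// Empty stand-in for impala::Tuple so the sketch compiles on its own.
struct Tuple {};

// Pointer type matching the documented signature of the generated function:
//   bool WriteSlot(Tuple* tuple, const char* data, int len);
using WriteSlotFn = bool (*)(Tuple* tuple, const char* data, int len);

// Hypothetical helper: use the generated function only if codegen succeeded
// (non-null pointer) and the partition contains no escape characters.
// Otherwise fall back to a caller-supplied non-codegen'd function.
inline WriteSlotFn ChooseWriteSlotFn(WriteSlotFn codegen_fn,
                                     WriteSlotFn interpreted_fn,
                                     bool partition_has_escapes) {
  if (codegen_fn != nullptr && !partition_has_escapes) return codegen_fn;
  return interpreted_fn;
}
```

The null check mirrors the documented contract that CodegenWriteSlot() returns NULL when codegen fails.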
void TextConverter::UnescapeString (const char *src, char *dest, int *len, int64_t maxlen=-1)
Removes escape characters from the first len characters of the null-terminated string src and copies the unescaped string into dest, setting *len to the unescaped length. No null-terminator is added to dest. If maxlen > 0, at most maxlen bytes are copied into dest.
Definition at line 45 of file text-converter.cc.
References escape_char_.
Referenced by UnescapeString(), and WriteSlot().
void TextConverter::UnescapeString (StringValue *str, MemPool *pool)
Removes escape characters from 'str', allocating a new string from pool. 'str' is updated with the new ptr and length.
Definition at line 39 of file text-converter.cc.
References impala::MemPool::Allocate(), impala::StringValue::len, impala::StringValue::ptr, and UnescapeString().
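bool TextConverter::WriteSlot (const SlotDescriptor *slot_desc, Tuple *tuple, const char *data, int len, bool copy_string, bool need_escape, MemPool *pool) [inline]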
Converts slot data, of length 'len', into the type of slot_desc, and writes the result into the tuple's slot. copy_string indicates whether we need to make a separate copy of the string data: for regular unescaped strings, we point to the original data in the file_buf_; for regular escaped strings, we copy the unescaped string into a separate buffer and point to it. If the string needs to be copied, the memory is allocated from 'pool'; otherwise 'pool' is unused. Unsuccessful conversions are turned into NULLs. Returns true if the value was written successfully.
Note: this function has a codegen'd version. Changing this function requires corresponding changes to CodegenWriteSlot.
Definition at line 37 of file text-converter.inline.h.
References impala::MemPool::Allocate(), check_null_, impala::Tuple::GetSlot(), impala::TimestampValue::HasDateOrTime(), impala::ColumnType::IsStringType(), impala::ColumnType::IsVarLen(), impala::StringValue::len, impala::ColumnType::len, null_col_val_, impala::SlotDescriptor::null_indicator_offset(), impala::StringValue::PadWithSpaces(), impala::StringParser::PARSE_FAILURE, impala::StringParser::PARSE_SUCCESS, impala::StringValue::ptr, impala::Tuple::SetNull(), impala::SlotDescriptor::slot_size(), impala::StringCompare(), impala::StringParser::StringToBool(), impala::SlotDescriptor::tuple_offset(), impala::ColumnType::type, impala::SlotDescriptor::type(), impala::TYPE_BIGINT, impala::TYPE_BOOLEAN, impala::TYPE_CHAR, impala::TYPE_DECIMAL, impala::TYPE_DOUBLE, impala::TYPE_FLOAT, impala::TYPE_INT, impala::TYPE_SMALLINT, impala::TYPE_STRING, impala::TYPE_TIMESTAMP, impala::TYPE_TINYINT, impala::TYPE_VARCHAR, and UnescapeString().
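As a rough illustration of the NULL handling described for WriteSlot(), the sketch below covers a single 32-bit integer slot. It is a simplification under stated assumptions, not the code from text-converter.inline.h: IntSlot and WriteIntSlotSketch are hypothetical names, std::from_chars stands in for impala::StringParser, and returning true after a NULL-marker match is an assumption of this sketch.

```cpp
#include <charconv>
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>
#include <system_error>

// Hypothetical stand-in for a nullable integer slot inside a tuple.
struct IntSlot {
  bool is_null = false;
  int32_t value = 0;
};

// Sketch of the documented flow for one integer column value.
inline bool WriteIntSlotSketch(IntSlot* slot, const char* data, int len,
                               const std::string& null_col_val,
                               bool check_null) {
  std::string_view text(data, static_cast<std::size_t>(len));
  // Input equal to the NULL marker sets the slot to NULL (assumed to count
  // as a successful write in this sketch).
  if (check_null && text == null_col_val) {
    slot->is_null = true;
    return true;
  }
  int32_t parsed = 0;
  const char* end = data + len;
  std::from_chars_result res = std::from_chars(data, end, parsed);
  // Any parse error or trailing garbage turns the value into NULL.
  if (res.ec != std::errc() || res.ptr != end) {
    slot->is_null = true;
    return false;
  }
  slot->is_null = false;
  slot->value = parsed;
  return true;
}
```

Calling the helper with the text `42` stores the value and returns true, while the text `4x` leaves the slot NULL and returns false. The real method also handles strings, decimals, timestamps and escaped input, which this sketch omits.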
bool TextConverter::check_null_ [private]
Indicates whether we should check for null_col_val_ and set slots to NULL.
Definition at line 90 of file text-converter.h.
Referenced by WriteSlot().
char TextConverter::escape_char_ [private]
Definition at line 86 of file text-converter.h.
Referenced by UnescapeString().
std::string TextConverter::null_col_val_ [private]
Special string to indicate NULL column values.
Definition at line 88 of file text-converter.h.
Referenced by WriteSlot().