Impala
Impala is the open source, native analytic database for Apache Hadoop.
impala::TextConverter Class Reference

#include <text-converter.h>

Public Member Functions

 TextConverter (char escape_char, const std::string &null_col_val, bool check_null=true)
 
bool WriteSlot (const SlotDescriptor *slot_desc, Tuple *tuple, const char *data, int len, bool copy_string, bool need_escape, MemPool *pool)
 
void UnescapeString (const char *src, char *dest, int *len, int64_t maxlen=-1)
 
void UnescapeString (StringValue *str, MemPool *pool)
 

Static Public Member Functions

static llvm::Function * CodegenWriteSlot (LlvmCodeGen *codegen, TupleDescriptor *tuple_desc, SlotDescriptor *slot_desc, const char *null_col_val, int len, bool check_null)
 

Private Attributes

char escape_char_
 
std::string null_col_val_
 Special string to indicate NULL column values.
 
bool check_null_
 Indicates whether we should check for null_col_val_ and set slots to NULL.
 

Detailed Description

Helper class for dealing with text data, e.g., converting text data to numeric types, etc.

Definition at line 39 of file text-converter.h.

Constructor & Destructor Documentation

TextConverter::TextConverter ( char  escape_char,
const std::string &  null_col_val,
bool  check_null = true 
)

escape_char: Character to indicate escape sequences.
null_col_val: Special string to indicate NULL column values.
check_null: If set, the WriteSlot() functions set the target slot to NULL if their input string matches null_col_val.

Definition at line 32 of file text-converter.cc.

Member Function Documentation

Function * TextConverter::CodegenWriteSlot ( LlvmCodeGen *  codegen,
TupleDescriptor *  tuple_desc,
SlotDescriptor *  slot_desc,
const char *  null_col_val,
int  len,
bool  check_null 
)
static

Codegen the function to write a slot for slot_desc. Returns NULL if codegen was not successful. The signature of the generated function is: bool WriteSlot(Tuple* tuple, const char* data, int len); The codegen'd function returns true if the slot could be written and false otherwise. If check_null is set, then the codegen'd function sets the target slot to NULL if its input string matches null_col_val. The codegen'd function does not support escape characters and should not be used for partitions that contain escapes.

Definition at line 99 of file text-converter.cc.

References impala::LlvmCodeGen::FnPrototype::AddArgument(), impala::LlvmCodeGen::CastPtrToLlvmPtr(), impala::LlvmCodeGen::codegen_timer(), impala::SlotDescriptor::CodegenUpdateNull(), impala::LlvmCodeGen::context(), impala::LlvmCodeGen::CreateEntryBlockAlloca(), impala::LlvmCodeGen::CreateIfElseBlocks(), impala::LlvmCodeGen::false_value(), impala::SlotDescriptor::field_idx(), impala::LlvmCodeGen::FinalizeFunction(), impala::TupleDescriptor::GenerateLlvmStruct(), impala::LlvmCodeGen::GetFunction(), impala::LlvmCodeGen::GetIntConstant(), impala::LlvmCodeGen::GetType(), impala::ColumnType::IsVarLen(), impala::ColumnType::len, impala::StringParser::PARSE_FAILURE, impala::LlvmCodeGen::ptr_type(), SCOPED_TIMER, impala::LlvmCodeGen::true_value(), impala::ColumnType::type, impala::SlotDescriptor::type(), impala::TYPE_BIGINT, impala::TYPE_BOOLEAN, impala::TYPE_CHAR, impala::TYPE_DOUBLE, impala::TYPE_FLOAT, impala::TYPE_INT, impala::TYPE_SMALLINT, impala::TYPE_TINYINT, and impala::TYPE_VARCHAR.

Referenced by impala::HdfsScanner::CodegenWriteCompleteTuple().
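
The restriction on escapes described above can be illustrated with a small dispatch sketch. Only the function signature bool WriteSlot(Tuple*, const char*, int) comes from this page; WriteSlotFn, ChooseWriter and the partition_has_escapes flag are hypothetical names used for illustration, and this is not how Impala itself selects the code path.

```cpp
#include <cassert>

struct Tuple;  // Opaque placeholder; the real impala::Tuple is not needed here.

// Pointer type matching the documented signature of the generated function.
typedef bool (*WriteSlotFn)(Tuple* tuple, const char* data, int len);

// Hypothetical helper: returns the generated function only when it is safe to
// use it, i.e. codegen succeeded (non-null pointer) and the data being scanned
// contains no escape characters. A null result means the caller should fall
// back to the interpreted TextConverter::WriteSlot() path.
WriteSlotFn ChooseWriter(WriteSlotFn codegen_fn, bool partition_has_escapes) {
  if (codegen_fn == nullptr || partition_has_escapes) return nullptr;
  return codegen_fn;
}
```

A caller would test the returned pointer and, when it is null, use the interpreted path instead.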

void TextConverter::UnescapeString ( const char *  src,
char *  dest,
int *  len,
int64_t  maxlen = -1 
)

Removes escape characters from len characters of the null-terminated string src, and copies the unescaped string into dest, changing *len to the unescaped length. No null-terminator is added to dest. If maxlen > 0, at most maxlen bytes are copied into dest.

Definition at line 45 of file text-converter.cc.

References escape_char_.

Referenced by UnescapeString(), and WriteSlot().

void TextConverter::UnescapeString ( StringValue *  str,
MemPool *  pool 
)

Removes escape characters from 'str', allocating a new string from pool. 'str' is updated with the new ptr and length.

Definition at line 39 of file text-converter.cc.

References impala::MemPool::Allocate(), impala::StringValue::len, impala::StringValue::ptr, and UnescapeString().

bool impala::TextConverter::WriteSlot ( const SlotDescriptor *  slot_desc,
Tuple *  tuple,
const char *  data,
int  len,
bool  copy_string,
bool  need_escape,
MemPool *  pool 
)
inline

Converts slot data, of length 'len', into the type of slot_desc, and writes the result into the tuple's slot. copy_string indicates whether we need to make a separate copy of the string data: For regular unescaped strings, we point to the original data in the file_buf_. For regular escaped strings, we copy the unescaped string into a separate buffer and point to it. If the string needs to be copied, the memory is allocated from 'pool', otherwise 'pool' is unused. Unsuccessful conversions are turned into NULLs. Returns true if the value was written successfully.

Note: this function has a codegen'd version. Changing this function requires corresponding changes to CodegenWriteSlot.

Definition at line 37 of file text-converter.inline.h.

References impala::MemPool::Allocate(), check_null_, impala::Tuple::GetSlot(), impala::TimestampValue::HasDateOrTime(), impala::ColumnType::IsStringType(), impala::ColumnType::IsVarLen(), impala::StringValue::len, impala::ColumnType::len, null_col_val_, impala::SlotDescriptor::null_indicator_offset(), impala::StringValue::PadWithSpaces(), impala::StringParser::PARSE_FAILURE, impala::StringParser::PARSE_SUCCESS, impala::StringValue::ptr, impala::Tuple::SetNull(), impala::SlotDescriptor::slot_size(), impala::StringCompare(), impala::StringParser::StringToBool(), impala::SlotDescriptor::tuple_offset(), impala::ColumnType::type, impala::SlotDescriptor::type(), impala::TYPE_BIGINT, impala::TYPE_BOOLEAN, impala::TYPE_CHAR, impala::TYPE_DECIMAL, impala::TYPE_DOUBLE, impala::TYPE_FLOAT, impala::TYPE_INT, impala::TYPE_SMALLINT, impala::TYPE_STRING, impala::TYPE_TIMESTAMP, impala::TYPE_TINYINT, impala::TYPE_VARCHAR, and UnescapeString().
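
The NULL handling described above can be sketched for a single integer column without any Impala types. IntSlot, ParseInt32 and WriteIntSlotSketch are hypothetical names. Returning true when the input matches the NULL marker is an assumption of this sketch. The real function also handles many more types, string copying and escapes.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in for a nullable INT slot inside a tuple.
struct IntSlot {
  bool is_null;
  int32_t value;
};

// Parses an optionally signed decimal integer from data[0..len).
// Returns false on empty input, non-digit characters, or int32 overflow.
static bool ParseInt32(const char* data, int len, int32_t* result) {
  int i = 0;
  bool negative = false;
  if (len > 0 && (data[0] == '-' || data[0] == '+')) {
    negative = (data[0] == '-');
    i = 1;
  }
  if (i >= len) return false;  // No digits at all.
  int64_t acc = 0;
  for (; i < len; ++i) {
    if (data[i] < '0' || data[i] > '9') return false;
    acc = acc * 10 + (data[i] - '0');
    if (acc > static_cast<int64_t>(INT32_MAX) + 1) return false;  // Overflow.
  }
  if (negative) acc = -acc;
  if (acc > INT32_MAX) return false;
  *result = static_cast<int32_t>(acc);
  return true;
}

// Writes data[0..len) into slot. If check_null is set and the input equals
// null_col_val, the slot becomes NULL. A failed conversion also sets the slot
// to NULL and returns false.
bool WriteIntSlotSketch(const char* data, int len,
                        const std::string& null_col_val, bool check_null,
                        IntSlot* slot) {
  assert(data != nullptr && len >= 0 && slot != nullptr);
  if (check_null &&
      null_col_val.compare(0, std::string::npos, data, len) == 0) {
    slot->is_null = true;
    return true;  // Assumption: a NULL marker counts as a successful write.
  }
  int32_t value = 0;
  if (!ParseInt32(data, len, &value)) {
    slot->is_null = true;  // Unsuccessful conversions are turned into NULLs.
    return false;
  }
  slot->is_null = false;
  slot->value = value;
  return true;
}
```

With check_null disabled, an input equal to the NULL marker is treated like any other text, so for a numeric column it fails to parse and still ends up as NULL, but with a false return value.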

Member Data Documentation

bool impala::TextConverter::check_null_
private

Indicates whether we should check for null_col_val_ and set slots to NULL.

Definition at line 90 of file text-converter.h.

Referenced by WriteSlot().

char impala::TextConverter::escape_char_
private

Definition at line 86 of file text-converter.h.

Referenced by UnescapeString().

std::string impala::TextConverter::null_col_val_
private

Special string to indicate NULL column values.

Definition at line 88 of file text-converter.h.

Referenced by WriteSlot().


The documentation for this class was generated from the following files: