Impala
Impala is the open source, native analytic database for Apache Hadoop.
impala::StringParser Class Reference

#include <string-parser.h>

Classes

class  StringParseTraits
 

Public Types

enum  ParseResult { PARSE_SUCCESS = 0, PARSE_FAILURE, PARSE_OVERFLOW, PARSE_UNDERFLOW }
 

Static Public Member Functions

template<typename T >
static T StringToInt (const char *s, int len, ParseResult *result)
 
template<typename T >
static T StringToInt (const char *s, int len, int base, ParseResult *result)
 Convert a string s representing a number in given base into a decimal number.
 
template<typename T >
static T StringToFloat (const char *s, int len, ParseResult *result)
 
static bool StringToBool (const char *s, int len, ParseResult *result)
 Parses a string for 'true' or 'false', case insensitive.
 
template<typename T >
static DecimalValue< T > StringToDecimal (const uint8_t *s, int len, const ColumnType &type, StringParser::ParseResult *result)
 
template<typename T >
static DecimalValue< T > StringToDecimal (const char *s, int len, const ColumnType &type, StringParser::ParseResult *result)
 

Static Private Member Functions

template<typename T >
static T StringToIntInternal (const char *s, int len, ParseResult *result)
 
template<typename T >
static T StringToIntInternal (const char *s, int len, int base, ParseResult *result)
 
template<typename T >
static T StringToFloatInternal (const char *s, int len, ParseResult *result)
 
static bool StringToBoolInternal (const char *s, int len, ParseResult *result)
 
static int SkipLeadingWhitespace (const char *s, int len)
 Returns the position of the first non-whitespace character in s.
 
static bool IsAllWhitespace (const char *s, int len)
 Returns true if s only contains whitespace.
 
template<typename T >
static T StringToIntNoOverflow (const char *s, int len, ParseResult *result)
 
static bool IsWhitespace (const char &c)
 

Detailed Description

Utility functions for doing atoi/atof on non-null terminated strings. On micro benchmarks, this is significantly faster than libc (atoi/strtol and atof/strtod).

Strings with leading and trailing whitespace are accepted. Branching is heavily optimized for the non-whitespace successful case. All the StringTo* functions first parse the input string assuming it has no leading whitespace. If that first attempt is unsuccessful, these functions retry the parsing after removing whitespace. Therefore, strings with whitespace take a perf hit on branch misprediction.

For overflows, we follow the MySQL behavior and cap values at the max/min value for that data type. This is different from Hive, which returns NULL for overflow slots for int types and inf/-inf for float types.

Things we tried that did not work:

  • lookup table for converting character to digit

Improvements (TODO):

  • Validate input using _sidd_compare_ranges
  • Since we know the length, we can parallelize this: i.e. result = 100*s[0] + 10*s[1] + s[2]
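The two-pass strategy described above (try a fast path that assumes no leading whitespace, and only on failure skip whitespace and retry) can be sketched as follows. This is a minimal illustration, not the code in string-parser.h: the names SketchIsWhitespace, SketchParseNoLeadingWs and SketchParse are hypothetical, the whitespace set is an assumption, and the sketch handles only unsigned inputs of at most 18 digits so that overflow does not arise.

```cpp
#include <cstdint>

enum ParseResult { PARSE_SUCCESS = 0, PARSE_FAILURE, PARSE_OVERFLOW, PARSE_UNDERFLOW };

// Hypothetical whitespace test; the real IsWhitespace() may accept a different set.
static inline bool SketchIsWhitespace(char c) {
  return c == ' ' || c == '\t' || c == '\n' || c == '\v' || c == '\f' || c == '\r';
}

// Fast path: digits first, then optional trailing whitespace. Leading whitespace fails.
// Simplification: at most 18 digits are consumed, so the accumulator cannot overflow.
static inline int64_t SketchParseNoLeadingWs(const char* s, int len, ParseResult* result) {
  int64_t val = 0;
  int i = 0;
  while (i < len && i < 18 && s[i] >= '0' && s[i] <= '9') {
    val = val * 10 + (s[i] - '0');
    ++i;
  }
  // At least one digit is required, and everything after the digits must be whitespace.
  bool ok = i > 0;
  for (; ok && i < len; ++i) ok = SketchIsWhitespace(s[i]);
  *result = ok ? PARSE_SUCCESS : PARSE_FAILURE;
  return ok ? val : 0;
}

// Public entry point: try the fast path, retry after skipping leading whitespace.
static inline int64_t SketchParse(const char* s, int len, ParseResult* result) {
  const int64_t val = SketchParseNoLeadingWs(s, len, result);
  if (*result == PARSE_SUCCESS) return val;  // common case: no leading whitespace
  int i = 0;
  while (i < len && SketchIsWhitespace(s[i])) ++i;
  return SketchParseNoLeadingWs(s + i, len - i, result);
}
```

The input is addressed by pointer and length throughout, so no null terminator is needed. An input with leading whitespace pays for one failed attempt before the retry, which is the cost the description attributes to strings with whitespace.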

Definition at line 49 of file string-parser.h.

Member Enumeration Documentation

Enumerator
PARSE_SUCCESS 
PARSE_FAILURE 
PARSE_OVERFLOW 
PARSE_UNDERFLOW 

Definition at line 51 of file string-parser.h.

Member Function Documentation

static bool impala::StringParser::IsAllWhitespace ( const char *  s,
int  len 
)
inline static private

Returns true if s only contains whitespace.

Definition at line 494 of file string-parser.h.

References IsWhitespace(), and LIKELY.

Referenced by StringToBoolInternal(), StringToFloatInternal(), StringToIntInternal(), and StringToIntNoOverflow().

static bool impala::StringParser::IsWhitespace ( const char &  c)
inline static private

Definition at line 543 of file string-parser.h.

References UNLIKELY.

Referenced by IsAllWhitespace(), SkipLeadingWhitespace(), and StringToDecimal().

static int impala::StringParser::SkipLeadingWhitespace ( const char *  s,
int  len 
)
inline static private

Returns the position of the first non-whitespace character in s.

Definition at line 487 of file string-parser.h.

References IsWhitespace().

Referenced by StringToBool(), StringToFloat(), and StringToInt().

static bool impala::StringParser::StringToBool ( const char *  s,
int  len,
ParseResult *  result 
)
inline static

Parses a string for 'true' or 'false', case insensitive.

Definition at line 87 of file string-parser.h.

References LIKELY, PARSE_SUCCESS, SkipLeadingWhitespace(), and StringToBoolInternal().

Referenced by impala::TestBoolValue(), and impala::TextConverter::WriteSlot().

static bool impala::StringParser::StringToBoolInternal ( const char *  s,
int  len,
ParseResult *  result 
)
inline static private

Parses a string for 'true' or 'false', case insensitive. Return PARSE_FAILURE on leading whitespace. Trailing whitespace is allowed.

Definition at line 468 of file string-parser.h.

References IsAllWhitespace(), LIKELY, PARSE_FAILURE, and PARSE_SUCCESS.

Referenced by StringToBool().
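As an illustration of the behavior described for StringToBoolInternal (case-insensitive 'true'/'false', failure on leading whitespace, trailing whitespace allowed), here is a minimal standalone sketch. The helper names and the whitespace set are assumptions made for this example and are not taken from string-parser.h.

```cpp
enum ParseResult { PARSE_SUCCESS = 0, PARSE_FAILURE, PARSE_OVERFLOW, PARSE_UNDERFLOW };

// Hypothetical whitespace test; the real IsWhitespace() may accept a different set.
static inline bool SketchIsWs(char c) {
  return c == ' ' || c == '\t' || c == '\n' || c == '\v' || c == '\f' || c == '\r';
}

// Compares the first n bytes of s with a lowercase ASCII literal, ignoring the case of s.
static inline bool SketchMatchesIgnoreCase(const char* s, const char* lower_lit, int n) {
  for (int i = 0; i < n; ++i) {
    char c = s[i];
    if (c >= 'A' && c <= 'Z') c = static_cast<char>(c - 'A' + 'a');
    if (c != lower_lit[i]) return false;
  }
  return true;
}

// True if all len bytes are whitespace (vacuously true for len == 0).
static inline bool SketchAllWhitespace(const char* s, int len) {
  for (int i = 0; i < len; ++i) {
    if (!SketchIsWs(s[i])) return false;
  }
  return true;
}

// The literal must start at s[0]; only whitespace may follow it.
static inline bool SketchStringToBool(const char* s, int len, ParseResult* result) {
  if (len >= 4 && SketchMatchesIgnoreCase(s, "true", 4) &&
      SketchAllWhitespace(s + 4, len - 4)) {
    *result = PARSE_SUCCESS;
    return true;
  }
  if (len >= 5 && SketchMatchesIgnoreCase(s, "false", 5) &&
      SketchAllWhitespace(s + 5, len - 5)) {
    *result = PARSE_SUCCESS;
    return false;
  }
  *result = PARSE_FAILURE;
  return false;
}
```

Because the return value is false both for a parsed 'false' and for a failed parse, a caller has to inspect *result to tell the two apart.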

template<typename T >
static DecimalValue<T> impala::StringParser::StringToDecimal ( const uint8_t *  s,
int  len,
const ColumnType &  type,
StringParser::ParseResult *  result 
)
inline static

Parses a decimal from s, returning the result. The parse status is returned in *result. On overflow or invalid values, the return value is undefined. On underflow, the truncated value is returned.

Definition at line 100 of file string-parser.h.

template<typename T >
static DecimalValue<T> impala::StringParser::StringToDecimal ( const char *  s,
int  len,
const ColumnType &  type,
StringParser::ParseResult *  result 
)
inline static

template<typename T >
static T impala::StringParser::StringToFloat ( const char *  s,
int  len,
ParseResult *  result 
)
inline static

Definition at line 78 of file string-parser.h.

References LIKELY, PARSE_SUCCESS, and SkipLeadingWhitespace().

template<typename T >
static T impala::StringParser::StringToFloatInternal ( const char *  s,
int  len,
ParseResult *  result 
)
inline static private

This is considerably faster than glibc's implementation (>100x; why?). No special-case handling is needed for overflows: the floating-point spec already handles them and caps the values at -inf/inf. To avoid inaccurate conversions, this function falls back to strtod for scientific notation. Return PARSE_FAILURE on leading whitespace. Trailing whitespace is allowed.

TODO: Investigate using intrinsics to speed up the slow strtod path.
TODO: There are other possible optimizations; see IMPALA-1729.

Definition at line 357 of file string-parser.h.

References IsAllWhitespace(), LIKELY, PARSE_FAILURE, PARSE_OVERFLOW, PARSE_SUCCESS, and UNLIKELY.
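The split between a fast path for plain decimal notation and a strtod fallback for scientific notation might look roughly like the sketch below. It is an assumption-laden illustration rather than the actual implementation: SketchStringToDouble is a hypothetical name, trailing whitespace and PARSE_OVERFLOW reporting are omitted for brevity, and the fallback copies the input only because std::strtod requires a null-terminated string.

```cpp
#include <cstdlib>
#include <string>

enum ParseResult { PARSE_SUCCESS = 0, PARSE_FAILURE, PARSE_OVERFLOW, PARSE_UNDERFLOW };

static inline double SketchStringToDouble(const char* s, int len, ParseResult* result) {
  *result = PARSE_FAILURE;
  if (len <= 0) return 0;
  int i = 0;
  bool negative = false;
  if (s[0] == '-' || s[0] == '+') {
    negative = (s[0] == '-');
    i = 1;
  }
  double val = 0;      // all digits seen so far, read as one integer
  double divisor = 1;  // 10^(number of digits after the decimal point)
  bool seen_digit = false;
  bool seen_dot = false;
  for (; i < len; ++i) {
    const char c = s[i];
    if (c >= '0' && c <= '9') {
      seen_digit = true;
      val = val * 10 + (c - '0');
      if (seen_dot) divisor *= 10;
    } else if (c == '.' && !seen_dot) {
      seen_dot = true;
    } else if ((c == 'e' || c == 'E') && seen_digit) {
      // Slow path: hand the whole input (sign included) to strtod.
      const std::string copy(s, len);
      char* end = nullptr;
      const double slow = std::strtod(copy.c_str(), &end);
      // Reject the input unless strtod consumed every byte.
      if (end != copy.c_str() + copy.size()) return 0;
      *result = PARSE_SUCCESS;
      return slow;
    } else {
      return 0;  // leading whitespace and any other character fail in this sketch
    }
  }
  if (!seen_digit) return 0;
  *result = PARSE_SUCCESS;
  val /= divisor;
  return negative ? -val : val;
}
```

The fast path does no range checks of its own. With very long digit strings the accumulator simply saturates to infinity, which matches the remark that the floating-point format already caps overflowing values.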

template<typename T >
static T impala::StringParser::StringToInt ( const char *  s,
int  len,
ParseResult *  result 
)
inline static

Definition at line 59 of file string-parser.h.

References LIKELY, PARSE_SUCCESS, and SkipLeadingWhitespace().

template<typename T >
static T impala::StringParser::StringToInt ( const char *  s,
int  len,
int  base,
ParseResult *  result 
)
inline static

Convert a string s representing a number in given base into a decimal number.

Definition at line 69 of file string-parser.h.

References LIKELY, PARSE_SUCCESS, and SkipLeadingWhitespace().
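To make the base conversion concrete, the following sketch converts a length-delimited string in a given base to an integer of type T. It is only an illustration under stated assumptions: SketchDigitValue and SketchStringToIntBase are hypothetical names, bases 2 to 36 with digits 0-9 and a-z (either case) are assumed, a digit outside the base is treated as a failure, and signs and whitespace are not handled.

```cpp
#include <cstdint>
#include <limits>

enum ParseResult { PARSE_SUCCESS = 0, PARSE_FAILURE, PARSE_OVERFLOW, PARSE_UNDERFLOW };

// Maps an ASCII character to its digit value, or -1 if it is not a digit in any base.
static inline int SketchDigitValue(char c) {
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'a' && c <= 'z') return c - 'a' + 10;
  if (c >= 'A' && c <= 'Z') return c - 'A' + 10;
  return -1;
}

template <typename T>
T SketchStringToIntBase(const char* s, int len, int base, ParseResult* result) {
  *result = PARSE_FAILURE;
  if (len <= 0 || base < 2 || base > 36) return 0;
  const T max_val = std::numeric_limits<T>::max();
  T val = 0;
  for (int i = 0; i < len; ++i) {
    const int digit = SketchDigitValue(s[i]);
    if (digit < 0 || digit >= base) return 0;
    // val * base + digit would exceed max_val: cap instead of wrapping around.
    if (val > (max_val - digit) / base) {
      *result = PARSE_OVERFLOW;
      return max_val;
    }
    val = static_cast<T>(val * base + digit);
  }
  *result = PARSE_SUCCESS;
  return val;
}
```

With these assumptions, "ff" in base 16 yields 255 for a 32-bit T, while the same input with an 8-bit signed T is capped at 127 and reported as PARSE_OVERFLOW.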

template<typename T >
static T impala::StringParser::StringToIntInternal ( const char *  s,
int  len,
ParseResult *  result 
)
inline static private

This is considerably faster than glibc's implementation. In the case of overflow, the max/min value for the data type will be returned. Assumes s represents a decimal number. Return PARSE_FAILURE on leading whitespace. Trailing whitespace is allowed.

Definition at line 236 of file string-parser.h.

References IsAllWhitespace(), LIKELY, PARSE_FAILURE, PARSE_OVERFLOW, PARSE_SUCCESS, and UNLIKELY.
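The capping behavior on overflow can be illustrated with a small standalone sketch for signed integer types. This is one possible way to detect overflow before it happens, not necessarily how string-parser.h does it: SketchStringToIntCapped is a hypothetical name, PARSE_OVERFLOW is reported for both directions as an assumption, and trailing whitespace is left out to keep the example short.

```cpp
#include <cstdint>
#include <limits>

enum ParseResult { PARSE_SUCCESS = 0, PARSE_FAILURE, PARSE_OVERFLOW, PARSE_UNDERFLOW };

// T is assumed to be a signed integer type.
template <typename T>
T SketchStringToIntCapped(const char* s, int len, ParseResult* result) {
  *result = PARSE_FAILURE;
  if (len <= 0) return 0;
  const T max_val = std::numeric_limits<T>::max();
  const T min_val = std::numeric_limits<T>::min();
  int i = 0;
  bool negative = false;
  if (s[0] == '-' || s[0] == '+') {
    negative = (s[0] == '-');
    i = 1;
  }
  if (i == len) return 0;  // a sign with no digits is not a number
  T val = 0;
  for (; i < len; ++i) {
    if (s[i] < '0' || s[i] > '9') return 0;
    const int digit = s[i] - '0';
    if (negative) {
      // Accumulate downwards so that min_val itself is reachable.
      if (val < (min_val + digit) / 10) {
        *result = PARSE_OVERFLOW;
        return min_val;
      }
      val = static_cast<T>(val * 10 - digit);
    } else {
      if (val > (max_val - digit) / 10) {
        *result = PARSE_OVERFLOW;
        return max_val;
      }
      val = static_cast<T>(val * 10 + digit);
    }
  }
  *result = PARSE_SUCCESS;
  return val;
}
```

The range check runs before each multiply-and-add, so the accumulator never leaves the range of T and no signed overflow occurs. For an 8-bit T, "128" is capped at 127 and "-129" at -128.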

template<typename T >
static T impala::StringParser::StringToIntInternal ( const char *  s,
int  len,
int  base,
ParseResult *  result 
)
inline static private

Convert a string s representing a number in given base into a decimal number. Return PARSE_FAILURE on leading whitespace. Trailing whitespace is allowed.

Definition at line 292 of file string-parser.h.

References IsAllWhitespace(), LIKELY, PARSE_FAILURE, PARSE_OVERFLOW, PARSE_SUCCESS, and UNLIKELY.

template<typename T >
static T impala::StringParser::StringToIntNoOverflow ( const char *  s,
int  len,
ParseResult *  result 
)
inline static private

Converts an ascii string to an integer of type T assuming it cannot overflow and the number is positive. Leading whitespace is not allowed. Trailing whitespace will be skipped.

Definition at line 513 of file string-parser.h.

References IsAllWhitespace(), LIKELY, PARSE_FAILURE, PARSE_SUCCESS, and UNLIKELY.


The documentation for this class was generated from the following file: