Impala
Impala is the open source, native analytic database for Apache Hadoop.
impala::DictEncoder< T > Class Template Reference
#include <dict-encoding.h>
Classes | |
struct | Node |
Node in the chained hash table. More... | |
Public Member Functions | |
DictEncoder (MemPool *pool, int encoded_value_size) | |
int | Put (const T &value) |
virtual void | WriteDict (uint8_t *buffer) |
virtual int | num_entries () const |
The number of entries in the dictionary. More... | |
void | ClearIndices () |
Clears all the indices (but leaves the dictionary). More... | |
int | EstimatedDataEncodedSize () |
int | bit_width () const |
The minimum bit width required to encode the currently buffered indices. More... | |
int | WriteData (uint8_t *buffer, int buffer_len) |
int | dict_encoded_size () |
Protected Attributes | |
std::vector< int > | buffered_indices_ |
Indices that have not yet been written out by WriteData(). More... | |
int | dict_encoded_size_ |
The number of bytes needed to encode the dictionary. More... | |
MemPool * | pool_ |
Pool to store StringValue data. Not owned. More... | |
Private Types | |
enum | { HASH_TABLE_SIZE = 1 << 16 } |
Size of the table. Must be a power of 2. More... | |
typedef uint16_t | NodeIndex |
Dictates an upper bound on the capacity of the hash table. More... | |
Private Member Functions | |
uint32_t | Hash (const T &value) const |
Hash function for mapping a value to a bucket. More... | |
int | AddToTable (const T &value, NodeIndex *bucket) |
template<> | |
uint32_t | Hash (const StringValue &value) const |
template<> | |
int | AddToTable (const StringValue &value, NodeIndex *bucket) |
Private Attributes | |
std::vector< NodeIndex > | buckets_ |
std::vector< Node > | nodes_ |
int | encoded_value_size_ |
Size of each encoded dictionary value. -1 for variable-length types. More... | |
Definition at line 102 of file dict-encoding.h.
typedef uint16_t NodeIndex |
private |
Dictates an upper bound on the capacity of the hash table.
Definition at line 124 of file dict-encoding.h.
anonymous enum |
private |
Size of the table. Must be a power of 2.
Enumerator | |
---|---|
HASH_TABLE_SIZE |
Definition at line 121 of file dict-encoding.h.
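A common reason for requiring a power-of-two table size is that a hash can then be reduced to a bucket index with a bit mask rather than a modulo. The following is a minimal sketch of that idea, not the code from dict-encoding.h; `kTableSize`, `IsPowerOfTwo` and `BucketFor` are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical constant mirroring HASH_TABLE_SIZE = 1 << 16.
constexpr uint32_t kTableSize = 1u << 16;

// True if n has exactly one bit set.
constexpr bool IsPowerOfTwo(uint32_t n) { return n != 0 && (n & (n - 1)) == 0; }

// Assumed reduction step: keep only the low bits of the hash.
// This equals hash % kTableSize only because kTableSize is a power of 2.
inline uint32_t BucketFor(uint32_t hash) {
  static_assert(IsPowerOfTwo(kTableSize), "mask trick needs a power-of-2 size");
  return hash & (kTableSize - 1);
}
```

With a table of 1 << 16 buckets, the mask keeps the low 16 bits of the hash.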
DictEncoder (MemPool *pool, int encoded_value_size) |
inline |
Definition at line 104 of file dict-encoding.h.
int AddToTable (const T &value, NodeIndex *bucket) |
inlineprivate |
Adds value to the hash table and updates dict_encoded_size_. Returns the number of bytes added to dict_encoded_size_. bucket gives a pointer to the location (i.e. chain) to add the value so that the hash for value doesn't need to be recomputed.
Definition at line 240 of file dict-encoding.h.
template<> int AddToTable (const StringValue &value, NodeIndex *bucket) |
inlineprivate |
Definition at line 250 of file dict-encoding.h.
References impala::ParquetPlainEncoder::ByteSize(), impala::StringValue::len, and impala::StringValue::ptr.
int bit_width () const |
inlineinherited |
The minimum bit width required to encode the currently buffered indices.
Definition at line 71 of file dict-encoding.h.
References impala::BitUtil::Log2(), impala::DictEncoderBase::num_entries(), and UNLIKELY.
Referenced by impala::DictEncoderBase::EstimatedDataEncodedSize(), and impala::DictEncoderBase::WriteData().
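As a rough illustration of what "minimum bit width" means here: if the dictionary holds n entries, the indices run from 0 to n - 1, so the width is the number of bits needed to represent n - 1. The sketch below is a stand-in written for this page, not the Impala implementation; `MinBitWidth` is a hypothetical name, and treating a one-entry dictionary as needing 1 bit is an assumption.

```cpp
#include <cassert>

// Hypothetical stand-in for bit_width(): bits needed for indices 0..num_entries-1.
inline int MinBitWidth(int num_entries) {
  assert(num_entries >= 0);
  if (num_entries == 0) return 0;  // nothing to encode
  if (num_entries == 1) return 1;  // assumption: use at least 1 bit per index
  int bits = 0;
  // Count the bits in the largest index, num_entries - 1.
  for (unsigned max_index = static_cast<unsigned>(num_entries - 1); max_index != 0;
       max_index >>= 1) {
    ++bits;
  }
  return bits;
}
```

For example, 256 entries fit in 8 bits, while a 257th entry pushes the width to 9.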
void ClearIndices () |
inlineinherited |
Clears all the indices (but leaves the dictionary).
Definition at line 62 of file dict-encoding.h.
References impala::DictEncoderBase::buffered_indices_.
Referenced by impala::ValidateDict().
int dict_encoded_size () |
inlineinherited |
Definition at line 84 of file dict-encoding.h.
References impala::DictEncoderBase::dict_encoded_size_.
Referenced by impala::ValidateDict().
int EstimatedDataEncodedSize () |
inlineinherited |
Returns a conservative estimate of the number of bytes needed to encode the buffered indices. Used to size the buffer passed to WriteData().
Definition at line 66 of file dict-encoding.h.
References impala::DictEncoderBase::bit_width(), impala::DictEncoderBase::buffered_indices_, and impala::RleEncoder::MaxBufferSize().
Referenced by impala::ValidateDict().
uint32_t Hash (const T &value) const |
inlineprivate |
Hash function for mapping a value to a bucket.
Definition at line 230 of file dict-encoding.h.
References impala::HashUtil::Hash().
template<> uint32_t Hash (const StringValue &value) const |
inlineprivate |
Definition at line 235 of file dict-encoding.h.
References impala::HashUtil::Hash(), impala::StringValue::len, and impala::StringValue::ptr.
virtual int num_entries () const |
inlinevirtual |
The number of entries in the dictionary.
Implements impala::DictEncoderBase.
Definition at line 117 of file dict-encoding.h.
References impala::DictEncoder< T >::nodes_.
Referenced by impala::ValidateDict().
int Put (const T &value) |
inline |
Encode value. Returns the number of bytes added to the dictionary page length (will be 0 if this value is already in the dictionary) or -1 if the dictionary is full (in which case the caller should give up on dictionary encoding). Note that this does not actually write any data, just buffers the value's index to be written later.
Definition at line 209 of file dict-encoding.h.
References Hash(), LIKELY, impala::DictEncoder< T >::Node::next, UNLIKELY, and impala::DictEncoder< T >::Node::value.
Referenced by impala::ValidateDict().
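The contract described for Put() (bytes added for a new value, 0 for a repeat, -1 when full, with the index buffered either way) can be sketched with a small chained hash table. This is an illustration under stated assumptions, not the code in dict-encoding.h: `SketchDictEncoder`, the `0xFFFF` empty-bucket marker, the use of `std::hash`, and the restriction to fixed-size values are all choices made for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Illustrative sketch only. Member names mirror this page; the bodies, the
// sentinel value, and the hash function are assumptions.
using SketchNodeIndex = uint16_t;
constexpr uint32_t kSketchTableSize = 1u << 16;          // power of 2
constexpr SketchNodeIndex kSketchInvalidIndex = 0xFFFF;  // assumed empty-bucket marker

template <typename T>
class SketchDictEncoder {
 public:
  explicit SketchDictEncoder(int encoded_value_size)
      : buckets_(kSketchTableSize, kSketchInvalidIndex),
        encoded_value_size_(encoded_value_size) {
    assert(encoded_value_size > 0);  // this sketch handles fixed-size values only
  }

  // Buffers the index of value. Returns the bytes added to the dictionary,
  // 0 if value was already present, or -1 if the dictionary is full.
  int Put(const T& value) {
    SketchNodeIndex* bucket =
        &buckets_[std::hash<T>()(value) & (kSketchTableSize - 1)];
    // Walk the chain for this bucket looking for an existing entry.
    for (SketchNodeIndex i = *bucket; i != kSketchInvalidIndex; i = nodes_[i].next) {
      if (nodes_[i].value == value) {
        buffered_indices_.push_back(i);
        return 0;
      }
    }
    if (nodes_.size() >= kSketchInvalidIndex) return -1;  // no index left to assign
    // New value: append a node and make it the head of the chain.
    nodes_.push_back(Node{value, *bucket});
    *bucket = static_cast<SketchNodeIndex>(nodes_.size() - 1);
    buffered_indices_.push_back(*bucket);
    dict_encoded_size_ += encoded_value_size_;
    return encoded_value_size_;
  }

  int num_entries() const { return static_cast<int>(nodes_.size()); }
  int dict_encoded_size() const { return dict_encoded_size_; }
  const std::vector<int>& buffered_indices() const { return buffered_indices_; }

 private:
  struct Node {
    T value;
    SketchNodeIndex next;  // next node in the chain, or kSketchInvalidIndex
  };
  std::vector<SketchNodeIndex> buckets_;
  std::vector<Node> nodes_;  // ordered by dictionary index
  std::vector<int> buffered_indices_;
  int dict_encoded_size_ = 0;
  int encoded_value_size_;
};
```

Because nodes are appended in insertion order, a node's position in `nodes_` doubles as its dictionary index, which matches the description of nodes_ on this page.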
int WriteData (uint8_t *buffer, int buffer_len) |
inlineinherited |
Writes out any buffered indices to buffer preceded by the bit width of this data. Returns the number of bytes written. If the supplied buffer is not big enough, returns -1. buffer must be preallocated with buffer_len bytes. Use EstimatedDataEncodedSize() to size buffer.
Definition at line 296 of file dict-encoding.h.
References impala::DictEncoderBase::bit_width(), impala::DictEncoderBase::buffered_indices_, impala::RleEncoder::Flush(), impala::RleEncoder::len(), and impala::RleEncoder::Put().
Referenced by impala::ValidateDict().
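The size-then-write protocol described above can be sketched as follows. The real method hands the indices to impala::RleEncoder; this example deliberately swaps in plain bit-packing to stay self-contained, so the byte layout is not the one Impala produces. `EstimatedSize` and `WriteIndices` are hypothetical free functions, and the one-byte bit-width header is an assumption based on the phrase "preceded by the bit width".

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical sizing helper: one header byte plus the bit-packed payload.
inline int EstimatedSize(size_t num_indices, int bit_width) {
  return 1 + static_cast<int>((num_indices * bit_width + 7) / 8);
}

// Writes bit_width as a one-byte header, then each index in bit_width bits,
// least significant bit first. Returns bytes written, or -1 if buffer_len
// is too small. Plain bit-packing stands in for the RLE encoder here.
inline int WriteIndices(const std::vector<int>& indices, int bit_width,
                        uint8_t* buffer, int buffer_len) {
  const int needed = EstimatedSize(indices.size(), bit_width);
  if (buffer_len < needed) return -1;
  std::memset(buffer, 0, static_cast<size_t>(needed));
  buffer[0] = static_cast<uint8_t>(bit_width);
  size_t bit_pos = 8;  // payload starts after the header byte
  for (int index : indices) {
    for (int b = 0; b < bit_width; ++b, ++bit_pos) {
      if ((index >> b) & 1) {
        buffer[bit_pos / 8] |= static_cast<uint8_t>(1u << (bit_pos % 8));
      }
    }
  }
  return needed;
}
```

A caller would size the buffer with the estimate first, then check the return value for -1 before using the output.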
virtual void WriteDict (uint8_t *buffer) |
inlinevirtual |
Writes out the encoded dictionary to buffer. buffer must be preallocated to dict_encoded_size() bytes.
Implements impala::DictEncoderBase.
Definition at line 290 of file dict-encoding.h.
References impala::ParquetPlainEncoder::Encode(), and impala::DictEncoder< T >::Node::value.
Referenced by impala::ValidateDict().
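For fixed-size types, writing the dictionary amounts to emitting the values back to back in dictionary-index order, which is why the buffer must hold dict_encoded_size() bytes. The sketch below assumes a raw byte copy per value; the real method delegates to impala::ParquetPlainEncoder::Encode(), whose exact behavior is not shown on this page. `WriteDictSketch` is a hypothetical name.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: copies each fixed-size value into buffer in order.
// buffer must have room for values.size() * sizeof(T) bytes.
template <typename T>
void WriteDictSketch(const std::vector<T>& values, uint8_t* buffer) {
  for (const T& v : values) {
    std::memcpy(buffer, &v, sizeof(T));  // assumed plain encoding: raw bytes
    buffer += sizeof(T);
  }
}
```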
std::vector< NodeIndex > buckets_ |
private |
Hash table mapping value to dictionary index (i.e. the number used to encode this value in the data). Each table entry is an index into the nodes_ vector (giving the first node of a chain for this bucket) or Node::INVALID_INDEX for an empty bucket.
Definition at line 129 of file dict-encoding.h.
std::vector< int > buffered_indices_ |
protectedinherited |
Indices that have not yet been written out by WriteData().
Definition at line 92 of file dict-encoding.h.
Referenced by impala::DictEncoderBase::ClearIndices(), impala::DictEncoderBase::EstimatedDataEncodedSize(), impala::DictEncoderBase::WriteData(), and impala::DictEncoderBase::~DictEncoderBase().
int dict_encoded_size_ |
protectedinherited |
The number of bytes needed to encode the dictionary.
Definition at line 95 of file dict-encoding.h.
Referenced by impala::DictEncoderBase::dict_encoded_size().
int encoded_value_size_ |
private |
Size of each encoded dictionary value. -1 for variable-length types.
Definition at line 151 of file dict-encoding.h.
std::vector< Node > nodes_ |
private |
The nodes of the hash table. Ordered by dictionary index (and so also represents the reverse mapping from encoded index to value).
Definition at line 148 of file dict-encoding.h.
Referenced by impala::DictEncoder< T >::num_entries().
MemPool * pool_ |
protectedinherited |
Pool to store StringValue data. Not owned.
Definition at line 98 of file dict-encoding.h.