Impala
Impala is the open source, native analytic database for Apache Hadoop.
#include <dict-encoding.h>
Public Member Functions
virtual | ~DictEncoderBase ()
virtual void | WriteDict (uint8_t *buffer)=0
virtual int | num_entries () const =0
The number of entries in the dictionary.
void | ClearIndices ()
Clears all the indices (but leaves the dictionary).
int | EstimatedDataEncodedSize ()
int | bit_width () const
The minimum bit width required to encode the currently buffered indices.
int | WriteData (uint8_t *buffer, int buffer_len)
int | dict_encoded_size ()
Protected Member Functions
DictEncoderBase (MemPool *pool)
Protected Attributes
std::vector< int > | buffered_indices_
Indices that have not yet been written out by WriteData().
int | dict_encoded_size_
The number of bytes needed to encode the dictionary.
MemPool * | pool_
Pool to store StringValue data. Not owned.
See the dictionary encoding section of https://github.com/Parquet/parquet-format. This class supports dictionary encoding of all Impala types.

The encoding supports streaming encoding. Values are encoded as they are added while the dictionary is being constructed. At any time, the buffered values can be written out with the current dictionary size. More values can then be added to the encoder, including new dictionary entries.

TODO: if the dictionary was made to be ordered, the dictionary would compress better. Add this to the spec as future improvement.

Base class for encoders. This is convenient so users can have a type that abstracts over the actual dictionary type. Note: it does not provide a virtual Put(). Users are expected to know the subclass type when using Put().

TODO: once we can easily remove virtual calls with codegen, this interface can rely less on templating and be easier to follow. The type should be passed in as an argument rather than template argument.
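The streaming behaviour described above can be illustrated with a minimal sketch. `SketchDictEncoder` and its `Put()`, `TakeIndices()` and `entry()` members are hypothetical names invented for this example, and it handles only `std::string` values; it is not Impala's `DictEncoder< T >`. It only shows how indices can be flushed in batches while the dictionary keeps growing.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a typed dictionary encoder (not Impala's class).
class SketchDictEncoder {
 public:
  // Adds a value. A value seen before reuses its dictionary index; a new
  // value is appended to the dictionary. The index is buffered either way.
  void Put(const std::string& value) {
    int index;
    auto it = index_of_.find(value);
    if (it == index_of_.end()) {
      index = static_cast<int>(dict_.size());
      index_of_.emplace(value, index);
      dict_.push_back(value);
    } else {
      index = it->second;
    }
    buffered_indices_.push_back(index);
  }

  // Returns the buffered indices and clears the buffer. The dictionary is
  // kept, so later Put() calls can reference old entries or add new ones.
  std::vector<int> TakeIndices() {
    std::vector<int> out;
    out.swap(buffered_indices_);
    return out;
  }

  int num_entries() const { return static_cast<int>(dict_.size()); }

  // Looks up a dictionary entry by index.
  const std::string& entry(int index) const {
    assert(index >= 0 && index < num_entries());
    return dict_[index];
  }

 private:
  std::vector<std::string> dict_;
  std::unordered_map<std::string, int> index_of_;
  std::vector<int> buffered_indices_;
};
```

Putting "a", "b", "a" and flushing yields the indices 0, 1, 0; a later batch of "c", "b" yields 2, 1 against the same dictionary, which by then has three entries.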
Definition at line 48 of file dict-encoding.h.
virtual ~DictEncoderBase () | inline, virtual
Definition at line 50 of file dict-encoding.h.
References buffered_indices_.
DictEncoderBase (MemPool *pool) | inline, protected
Definition at line 87 of file dict-encoding.h.
int bit_width () const | inline
The minimum bit width required to encode the currently buffered indices.
Definition at line 71 of file dict-encoding.h.
References impala::BitUtil::Log2(), num_entries(), and UNLIKELY.
Referenced by EstimatedDataEncodedSize(), and WriteData().
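As a rough illustration of how a minimum bit width can be derived from the dictionary size, the sketch below takes the ceiling of log2 of the entry count. `CeilLog2` and `SketchBitWidth` are hypothetical helpers written for this example, and the special cases for 0 and 1 entries are assumptions; the source only states that the member references `impala::BitUtil::Log2()`, `num_entries()` and `UNLIKELY`.

```cpp
#include <cassert>
#include <cstdint>

// Smallest w such that 2^w >= n, i.e. the ceiling of log2(n), for n >= 1.
inline int CeilLog2(int n) {
  assert(n >= 1);
  int w = 0;
  while ((uint64_t{1} << w) < static_cast<uint64_t>(n)) ++w;
  return w;
}

// Hypothetical helper: bits needed per index for a dictionary holding
// num_entries entries, whose indices run from 0 to num_entries - 1.
inline int SketchBitWidth(int num_entries) {
  if (num_entries == 0) return 0;  // assumption: nothing to encode
  if (num_entries == 1) return 1;  // assumption: use at least one bit
  return CeilLog2(num_entries);
}
```

With 4 entries the indices 0 to 3 fit in 2 bits; a fifth entry pushes the width to 3 bits.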
void ClearIndices () | inline
Clears all the indices (but leaves the dictionary).
Definition at line 62 of file dict-encoding.h.
References buffered_indices_.
Referenced by impala::ValidateDict().
int dict_encoded_size () | inline
Definition at line 84 of file dict-encoding.h.
References dict_encoded_size_.
Referenced by impala::ValidateDict().
int EstimatedDataEncodedSize () | inline
Returns a conservative estimate of the number of bytes needed to encode the buffered indices. Used to size the buffer passed to WriteData().
Definition at line 66 of file dict-encoding.h.
References bit_width(), buffered_indices_, and impala::RleEncoder::MaxBufferSize().
Referenced by impala::ValidateDict().
virtual int num_entries () const | pure virtual
The number of entries in the dictionary.
Implemented in impala::DictEncoder< T >.
Referenced by bit_width().
int WriteData (uint8_t *buffer, int buffer_len) | inline
Writes out any buffered indices to buffer preceded by the bit width of this data. Returns the number of bytes written. If the supplied buffer is not big enough, returns -1. buffer must be preallocated with buffer_len bytes. Use EstimatedDataEncodedSize() to size buffer.
Definition at line 296 of file dict-encoding.h.
References bit_width(), buffered_indices_, impala::RleEncoder::Flush(), impala::RleEncoder::len(), and impala::RleEncoder::Put().
Referenced by impala::ValidateDict().
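The contract above (bit width first, then the indices, and -1 when the buffer is too small) can be sketched as follows. `SketchWriteData` is a hypothetical free function written for this example. It swaps in plain LSB-first bit-packing for the `impala::RleEncoder` that the real member uses, so the byte layout is an assumption and not Impala's or Parquet's actual format.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for WriteData(): writes one byte holding bit_width,
// then the indices bit-packed LSB-first (plain bit-packing, not RLE).
// Returns the number of bytes written, or -1 if buffer_len is too small.
int SketchWriteData(const std::vector<int>& indices, int bit_width,
                    uint8_t* buffer, int buffer_len) {
  const int64_t packed_bytes =
      (static_cast<int64_t>(indices.size()) * bit_width + 7) / 8;
  const int64_t needed = 1 + packed_bytes;
  if (buffer_len < needed) return -1;

  std::memset(buffer, 0, static_cast<size_t>(needed));
  buffer[0] = static_cast<uint8_t>(bit_width);

  int64_t bit_pos = 0;  // bit offset within the packed area
  for (int index : indices) {
    for (int b = 0; b < bit_width; ++b, ++bit_pos) {
      if ((index >> b) & 1) {
        buffer[1 + bit_pos / 8] = static_cast<uint8_t>(
            buffer[1 + bit_pos / 8] | (1u << (bit_pos % 8)));
      }
    }
  }
  return static_cast<int>(needed);
}
```

A caller would size the buffer first, call the function, and treat a return value of -1 as a signal to retry with a larger buffer. For the indices 0, 1, 2, 3 at a bit width of 2, this sketch writes two bytes: the width byte 2 followed by 0xE4.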
virtual void WriteDict (uint8_t *buffer) | pure virtual
Writes out the encoded dictionary to buffer. buffer must be preallocated to dict_encoded_size() bytes.
Implemented in impala::DictEncoder< T >.
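The split between a non-templated base and a templated implementation can be sketched as below. `SketchEncoderBase` and `SketchEncoder< T >` are hypothetical classes for illustration only. The sketch assumes fixed-width, trivially copyable values written back to back and uses a linear lookup for brevity; the real `impala::DictEncoder< T >` is not shown in this page and may differ on every one of these points.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical base: lets callers write the dictionary without knowing T.
class SketchEncoderBase {
 public:
  virtual ~SketchEncoderBase() {}
  virtual void WriteDict(uint8_t* buffer) = 0;
  virtual int num_entries() const = 0;
  int dict_encoded_size() const { return dict_encoded_size_; }

 protected:
  int dict_encoded_size_ = 0;  // bytes needed to encode the dictionary
};

// Hypothetical typed encoder for fixed-width values.
template <typename T>
class SketchEncoder : public SketchEncoderBase {
  static_assert(std::is_trivially_copyable<T>::value,
                "sketch assumes fixed-width, trivially copyable values");

 public:
  // Non-virtual: callers must know T to add values. Returns the index.
  int Put(const T& value) {
    for (std::size_t i = 0; i < dict_.size(); ++i) {
      if (dict_[i] == value) return static_cast<int>(i);
    }
    dict_.push_back(value);
    dict_encoded_size_ += static_cast<int>(sizeof(T));
    return static_cast<int>(dict_.size()) - 1;
  }

  // buffer must hold at least dict_encoded_size() bytes.
  void WriteDict(uint8_t* buffer) override {
    if (!dict_.empty()) {
      std::memcpy(buffer, dict_.data(), dict_.size() * sizeof(T));
    }
  }

  int num_entries() const override { return static_cast<int>(dict_.size()); }

 private:
  std::vector<T> dict_;
};
```

Code that only needs to serialize the dictionary can hold a `SketchEncoderBase*`, allocate `dict_encoded_size()` bytes, and call `WriteDict()`, while code that adds values works with the concrete `SketchEncoder< T >`.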
std::vector< int > buffered_indices_ | protected
Indices that have not yet been written out by WriteData().
Definition at line 92 of file dict-encoding.h.
Referenced by ClearIndices(), EstimatedDataEncodedSize(), WriteData(), and ~DictEncoderBase().
int dict_encoded_size_ | protected
The number of bytes needed to encode the dictionary.
Definition at line 95 of file dict-encoding.h.
Referenced by dict_encoded_size().
MemPool * pool_ | protected
Pool to store StringValue data. Not owned.
Definition at line 98 of file dict-encoding.h.