Impala
Impala is the open source, native analytic database for Apache Hadoop.
impala::DictEncoderBase Class Reference (abstract)

#include <dict-encoding.h>


Public Member Functions

virtual ~DictEncoderBase ()
 
virtual void WriteDict (uint8_t *buffer)=0
 
virtual int num_entries () const =0
 The number of entries in the dictionary.
 
void ClearIndices ()
 Clears all the indices (but leaves the dictionary).
 
int EstimatedDataEncodedSize ()
 
int bit_width () const
 The minimum bit width required to encode the currently buffered indices.
 
int WriteData (uint8_t *buffer, int buffer_len)
 
int dict_encoded_size ()
 

Protected Member Functions

 DictEncoderBase (MemPool *pool)
 

Protected Attributes

std::vector< int > buffered_indices_
 Indices that have not yet been written out by WriteData().
 
int dict_encoded_size_
 The number of bytes needed to encode the dictionary.
 
MemPool * pool_
 Pool to store StringValue data. Not owned.
 

Detailed Description

See the dictionary encoding section of https://github.com/Parquet/parquet-format. This class supports dictionary encoding of all Impala types.

The encoding is streaming: values are encoded as they are added, while the dictionary is still being constructed. At any time, the buffered values can be written out with the current dictionary size. More values can then be added to the encoder, including new dictionary entries.

TODO: if the dictionary was made to be ordered, the dictionary would compress better. Add this to the spec as a future improvement.

Base class for encoders. This is convenient so users can have a type that abstracts over the actual dictionary type. Note: it does not provide a virtual Put(). Users are expected to know the subclass type when using Put().

TODO: once we can easily remove virtual calls with codegen, this interface can rely less on templating and be easier to follow. The type should be passed in as an argument rather than as a template argument.
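The streaming behaviour described above can be illustrated with a small standalone sketch. ToyDictEncoder and its TakeIndices() and entry() methods are hypothetical names invented for this example; this is not Impala's DictEncoder< T >, and it ignores MemPool, RLE encoding and the Parquet byte layout entirely. It only shows the idea that indices are buffered as values arrive, can be drained at any point, and that the dictionary keeps growing afterwards.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical toy encoder for illustration only; not Impala's DictEncoder<T>.
class ToyDictEncoder {
 public:
  // Looks the value up in the dictionary, appends a new entry if it has not
  // been seen before, and buffers the index of the entry.
  int Put(const std::string& value) {
    int index;
    auto it = index_of_.find(value);
    if (it == index_of_.end()) {
      index = static_cast<int>(entries_.size());
      entries_.push_back(value);
      index_of_.emplace(value, index);
    } else {
      index = it->second;
    }
    buffered_indices_.push_back(index);
    return index;
  }

  // Hands back the buffered indices and clears them. The dictionary is kept,
  // so later Put() calls can reuse existing entries or add new ones.
  std::vector<int> TakeIndices() {
    std::vector<int> out;
    out.swap(buffered_indices_);
    return out;
  }

  // Number of distinct values seen so far.
  int num_entries() const { return static_cast<int>(entries_.size()); }

  // Dictionary entry for a previously returned index.
  const std::string& entry(int index) const {
    assert(index >= 0 && index < num_entries());
    return entries_[index];
  }

 private:
  std::vector<std::string> entries_;
  std::unordered_map<std::string, int> index_of_;
  std::vector<int> buffered_indices_;
};
```

In this sketch, a caller would Put() a batch of values, call TakeIndices() to emit one chunk of data, and then keep calling Put() on the same object; indices handed out earlier stay valid because entries are only ever appended.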

Definition at line 48 of file dict-encoding.h.

Constructor & Destructor Documentation

virtual impala::DictEncoderBase::~DictEncoderBase ( )
inline virtual

Definition at line 50 of file dict-encoding.h.

References buffered_indices_.

impala::DictEncoderBase::DictEncoderBase ( MemPool * pool)
inline protected

Definition at line 87 of file dict-encoding.h.

Member Function Documentation

int impala::DictEncoderBase::bit_width ( ) const
inline

The minimum bit width required to encode the currently buffered indices.

Definition at line 71 of file dict-encoding.h.

References impala::BitUtil::Log2(), num_entries(), and UNLIKELY.

Referenced by EstimatedDataEncodedSize(), and WriteData().
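As a rough illustration of what a minimum bit width means here, the sketch below computes the number of bits needed to represent every index in [0, num_entries). MinBitWidth is a hypothetical free function written for this example; the handling of the empty and single-entry dictionaries (0 bits and 1 bit respectively) is an assumption, not something this page states, and the actual implementation relies on impala::BitUtil::Log2() rather than a loop.

```cpp
#include <cassert>

// Hypothetical helper: smallest number of bits that can represent every
// index in [0, num_entries).
// Assumption: an empty dictionary needs 0 bits and a one-entry dictionary
// still uses 1 bit.
int MinBitWidth(int num_entries) {
  assert(num_entries >= 0);
  if (num_entries == 0) return 0;
  if (num_entries == 1) return 1;
  int bits = 0;
  // Count the bits occupied by the largest index, num_entries - 1.
  // For num_entries >= 2 this equals ceil(log2(num_entries)).
  for (unsigned max_index = static_cast<unsigned>(num_entries - 1);
       max_index != 0; max_index >>= 1) {
    ++bits;
  }
  return bits;
}
```

Under these assumptions a dictionary with 5 entries needs 3 bits per index, and one with 256 entries needs 8.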

void impala::DictEncoderBase::ClearIndices ( )
inline

Clears all the indices (but leaves the dictionary).

Definition at line 62 of file dict-encoding.h.

References buffered_indices_.

Referenced by impala::ValidateDict().

int impala::DictEncoderBase::dict_encoded_size ( )
inline

Definition at line 84 of file dict-encoding.h.

References dict_encoded_size_.

Referenced by impala::ValidateDict().

int impala::DictEncoderBase::EstimatedDataEncodedSize ( )
inline

Returns a conservative estimate of the number of bytes needed to encode the buffered indices. Used to size the buffer passed to WriteData().

Definition at line 66 of file dict-encoding.h.

References bit_width(), buffered_indices_, and impala::RleEncoder::MaxBufferSize().

Referenced by impala::ValidateDict().

virtual int impala::DictEncoderBase::num_entries ( ) const
pure virtual

The number of entries in the dictionary.

Implemented in impala::DictEncoder< T >.

Referenced by bit_width().

int impala::DictEncoderBase::WriteData ( uint8_t *  buffer,
int  buffer_len 
)
inline

Writes out any buffered indices to buffer preceded by the bit width of this data. Returns the number of bytes written. If the supplied buffer is not big enough, returns -1. buffer must be preallocated with buffer_len bytes. Use EstimatedDataEncodedSize() to size buffer.

Definition at line 296 of file dict-encoding.h.

References bit_width(), buffered_indices_, impala::RleEncoder::Flush(), impala::RleEncoder::len(), and impala::RleEncoder::Put().

Referenced by impala::ValidateDict().
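The calling pattern described for WriteData() (size the buffer from EstimatedDataEncodedSize(), write, check for -1) might look like the sketch below. FakeEncoder, AddIndex() and FlushIndices() are hypothetical stand-ins written for this example so that it compiles on its own. FakeEncoder only mimics the documented contract and stores one index per byte after a leading bit-width byte, whereas the real class goes through impala::RleEncoder. Calling ClearIndices() after a successful write is also an assumption about how a caller would reuse the encoder.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in that follows the documented contract of
// EstimatedDataEncodedSize(), WriteData() and ClearIndices().
// It is NOT the real encoding: one index per byte, no RLE or bit-packing.
class FakeEncoder {
 public:
  void AddIndex(uint8_t index) { buffered_indices_.push_back(index); }

  // One byte for the bit width plus one byte per buffered index.
  int EstimatedDataEncodedSize() const {
    return 1 + static_cast<int>(buffered_indices_.size());
  }

  // Writes the bit width followed by the indices. Returns the number of
  // bytes written, or -1 if buffer_len is too small.
  int WriteData(uint8_t* buffer, int buffer_len) {
    assert(buffer != nullptr);
    int needed = 1 + static_cast<int>(buffered_indices_.size());
    if (buffer_len < needed) return -1;
    buffer[0] = 8;  // This toy always uses whole bytes per index.
    for (std::size_t i = 0; i < buffered_indices_.size(); ++i) {
      buffer[1 + i] = buffered_indices_[i];
    }
    return needed;
  }

  void ClearIndices() { buffered_indices_.clear(); }

 private:
  std::vector<uint8_t> buffered_indices_;
};

// Sizes a buffer from the estimate, writes the indices, and trims the buffer
// to the bytes actually written. Returns false if WriteData() reports -1.
template <typename Encoder>
bool FlushIndices(Encoder* encoder, std::vector<uint8_t>* out) {
  out->resize(encoder->EstimatedDataEncodedSize());
  int written = encoder->WriteData(out->data(), static_cast<int>(out->size()));
  if (written < 0) return false;
  out->resize(written);
  encoder->ClearIndices();  // Assumption: the caller clears after writing.
  return true;
}
```

Because FlushIndices() is a template, the same helper would compile against any type exposing these three member functions with compatible signatures. The estimate is only an upper bound, so trimming to the returned byte count matters.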

virtual void impala::DictEncoderBase::WriteDict ( uint8_t *  buffer)
pure virtual

Writes out the encoded dictionary to buffer. buffer must be preallocated to dict_encoded_size() bytes.

Implemented in impala::DictEncoder< T >.

Member Data Documentation

std::vector<int> impala::DictEncoderBase::buffered_indices_
protected

Indices that have not yet been written out by WriteData().

Definition at line 92 of file dict-encoding.h.

Referenced by ClearIndices(), EstimatedDataEncodedSize(), WriteData(), and ~DictEncoderBase().

int impala::DictEncoderBase::dict_encoded_size_
protected

The number of bytes needed to encode the dictionary.

Definition at line 95 of file dict-encoding.h.

Referenced by dict_encoded_size().

MemPool* impala::DictEncoderBase::pool_
protected

Pool to store StringValue data. Not owned.

Definition at line 98 of file dict-encoding.h.


The documentation for this class was generated from the following file: dict-encoding.h