Impala
Impala is the open source, native analytic database for Apache Hadoop.
impala::DictEncoder< T > Class Template Reference

#include <dict-encoding.h>


Classes

struct  Node
 Node in the chained hash table.
 

Public Member Functions

 DictEncoder (MemPool *pool, int encoded_value_size)
 
int Put (const T &value)
 
virtual void WriteDict (uint8_t *buffer)
 
virtual int num_entries () const
 The number of entries in the dictionary.
 
void ClearIndices ()
 Clears all the indices (but leaves the dictionary).
 
int EstimatedDataEncodedSize ()
 
int bit_width () const
 The minimum bit width required to encode the currently buffered indices.
 
int WriteData (uint8_t *buffer, int buffer_len)
 
int dict_encoded_size ()
 

Protected Attributes

std::vector< int > buffered_indices_
 Indices that have not yet been written out by WriteData().
 
int dict_encoded_size_
 The number of bytes needed to encode the dictionary.
 
MemPool * pool_
 Pool to store StringValue data. Not owned.
 

Private Types

enum  { HASH_TABLE_SIZE = 1 << 16 }
 Size of the table. Must be a power of 2.
 
typedef uint16_t NodeIndex
 Dictates an upper bound on the capacity of the hash table.
 

Private Member Functions

uint32_t Hash (const T &value) const
 Hash function for mapping a value to a bucket.
 
int AddToTable (const T &value, NodeIndex *bucket)
 
template<>
uint32_t Hash (const StringValue &value) const
 
template<>
int AddToTable (const StringValue &value, NodeIndex *bucket)
 

Private Attributes

std::vector< NodeIndex > buckets_
 
std::vector< Node > nodes_
 
int encoded_value_size_
 Size of each encoded dictionary value. -1 for variable-length types.
 

Detailed Description

template<typename T>
class impala::DictEncoder< T >

Definition at line 102 of file dict-encoding.h.

Member Typedef Documentation

template<typename T>
typedef uint16_t impala::DictEncoder< T >::NodeIndex
private

Dictates an upper bound on the capacity of the hash table.

Definition at line 124 of file dict-encoding.h.

Member Enumeration Documentation

template<typename T>
anonymous enum
private

Size of the table. Must be a power of 2.

Enumerator
HASH_TABLE_SIZE 

Definition at line 121 of file dict-encoding.h.
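
The two private declarations above work together: a power-of-two table size lets a hash be reduced to a bucket with a bit mask, and a 16-bit NodeIndex caps how many nodes a bucket can point at. The following is a minimal sketch of that idea, not the code in dict-encoding.h. `BucketFor` and `kInvalidIndex` are hypothetical names, and using the largest NodeIndex value as the empty-bucket marker is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Mirrors the documented declarations.
enum { HASH_TABLE_SIZE = 1 << 16 };
typedef uint16_t NodeIndex;

// Hypothetical sentinel for an empty bucket (assumed value, for illustration only).
const NodeIndex kInvalidIndex = std::numeric_limits<NodeIndex>::max();

// A power of 2 has exactly one bit set, so (size & (size - 1)) is zero.
static_assert((HASH_TABLE_SIZE & (HASH_TABLE_SIZE - 1)) == 0,
              "HASH_TABLE_SIZE must be a power of 2");

// Hypothetical helper: reduces a 32-bit hash to a bucket number with a mask,
// which is equivalent to hash % HASH_TABLE_SIZE only because the size is a power of 2.
inline uint32_t BucketFor(uint32_t hash) {
  return hash & (HASH_TABLE_SIZE - 1);
}
```

Under these assumptions, at most 65535 distinct nodes can be indexed, since one NodeIndex value is set aside as the marker.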

Constructor & Destructor Documentation

template<typename T>
impala::DictEncoder< T >::DictEncoder ( MemPool *  pool,
int  encoded_value_size 
)
inline

Definition at line 104 of file dict-encoding.h.

Member Function Documentation

template<typename T >
int impala::DictEncoder< T >::AddToTable ( const T &  value,
NodeIndex *  bucket 
)
inline, private

Adds value to the hash table and updates dict_encoded_size_. Returns the number of bytes added to dict_encoded_size_. bucket gives a pointer to the location (i.e. chain) to add the value so that the hash for value doesn't need to be recomputed.

Definition at line 240 of file dict-encoding.h.

template<>
int impala::DictEncoder< StringValue >::AddToTable ( const StringValue &  value,
NodeIndex *  bucket 
)
inline, private

int impala::DictEncoderBase::bit_width ( ) const
inline, inherited

The minimum bit width required to encode the currently buffered indices.

Definition at line 71 of file dict-encoding.h.

References impala::BitUtil::Log2(), impala::DictEncoderBase::num_entries(), and UNLIKELY.

Referenced by impala::DictEncoderBase::EstimatedDataEncodedSize(), and impala::DictEncoderBase::WriteData().
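
As a rough illustration of what "minimum bit width" means here, the sketch below computes the number of bits needed to represent every index in a dictionary of a given size. It is not the implementation in dict-encoding.h. `CeilLog2` and `MinBitWidth` are hypothetical names, and the handling of empty and single-entry dictionaries (0 and 1 bits) is an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Smallest r such that 2^r >= x (the ceiling of log2), for x >= 1.
inline int CeilLog2(uint64_t x) {
  int r = 0;
  while ((uint64_t{1} << r) < x) ++r;
  return r;
}

// Hypothetical stand-in for bit_width(): bits needed to store any index
// in the range [0, num_entries).
inline int MinBitWidth(int num_entries) {
  if (num_entries == 0) return 0;  // assumption: nothing to encode
  if (num_entries == 1) return 1;  // assumption: always use at least one bit
  return CeilLog2(static_cast<uint64_t>(num_entries));
}
```

For example, a dictionary with 5 entries has indices 0 through 4, which need 3 bits each.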

void impala::DictEncoderBase::ClearIndices ( )
inline, inherited

Clears all the indices (but leaves the dictionary).

Definition at line 62 of file dict-encoding.h.

References impala::DictEncoderBase::buffered_indices_.

Referenced by impala::ValidateDict().

int impala::DictEncoderBase::dict_encoded_size ( )
inline, inherited

Definition at line 84 of file dict-encoding.h.

References impala::DictEncoderBase::dict_encoded_size_.

Referenced by impala::ValidateDict().

int impala::DictEncoderBase::EstimatedDataEncodedSize ( )
inline, inherited

Returns a conservative estimate of the number of bytes needed to encode the buffered indices. Used to size the buffer passed to WriteData().

Definition at line 66 of file dict-encoding.h.

References impala::DictEncoderBase::bit_width(), impala::DictEncoderBase::buffered_indices_, and impala::RleEncoder::MaxBufferSize().

Referenced by impala::ValidateDict().

template<typename T >
uint32_t impala::DictEncoder< T >::Hash ( const T &  value) const
inline, private

Hash function for mapping a value to a bucket.

Definition at line 230 of file dict-encoding.h.

References impala::HashUtil::Hash().

template<>
uint32_t impala::DictEncoder< StringValue >::Hash ( const StringValue &  value) const
inline, private

template<typename T>
virtual int impala::DictEncoder< T >::num_entries ( ) const
inline, virtual

The number of entries in the dictionary.

Implements impala::DictEncoderBase.

Definition at line 117 of file dict-encoding.h.

References impala::DictEncoder< T >::nodes_.

Referenced by impala::ValidateDict().

template<typename T >
int impala::DictEncoder< T >::Put ( const T &  value)
inline

Encode value. Returns the number of bytes added to the dictionary page length (will be 0 if this value is already in the dictionary) or -1 if the dictionary is full (in which case the caller should give up on dictionary encoding). Note that this does not actually write any data, just buffers the value's index to be written later.

Definition at line 209 of file dict-encoding.h.

References Hash(), LIKELY, impala::DictEncoder< T >::Node::next, UNLIKELY, and impala::DictEncoder< T >::Node::value.

Referenced by impala::ValidateDict().
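
The return-value contract of Put() can be illustrated with a toy encoder for int32_t values built on a chained hash table. This is a simplified sketch and not Impala's class: `ToyDictEncoder`, `kToyTableSize` and `kToyInvalid` are hypothetical names, std::hash stands in for the real hash function, and the sentinel value is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

const uint32_t kToyTableSize = 1u << 16;  // power of 2, so a mask selects the bucket
const uint16_t kToyInvalid = 0xFFFF;      // assumed marker for "no node"

// Toy dictionary encoder for int32_t values (illustrative only).
class ToyDictEncoder {
 public:
  ToyDictEncoder() : buckets_(kToyTableSize, kToyInvalid), dict_encoded_size_(0) {}

  // Returns the bytes added to the dictionary (0 if the value is already
  // present), or -1 if no dictionary index is left. Only buffers the index.
  int Put(int32_t value) {
    uint16_t* bucket = &buckets_[std::hash<int32_t>()(value) & (kToyTableSize - 1)];
    // Walk the chain for this bucket looking for an existing entry.
    for (uint16_t i = *bucket; i != kToyInvalid; i = nodes_[i].next) {
      if (nodes_[i].value == value) {
        buffered_indices_.push_back(i);
        return 0;
      }
    }
    if (nodes_.size() >= kToyInvalid) return -1;  // dictionary is full
    // Prepend a new node to the chain; its position in nodes_ is its index.
    Node node = {value, *bucket};
    nodes_.push_back(node);
    *bucket = static_cast<uint16_t>(nodes_.size() - 1);
    buffered_indices_.push_back(*bucket);
    dict_encoded_size_ += static_cast<int>(sizeof(int32_t));
    return static_cast<int>(sizeof(int32_t));
  }

  int num_entries() const { return static_cast<int>(nodes_.size()); }
  int dict_encoded_size() const { return dict_encoded_size_; }
  const std::vector<int>& buffered_indices() const { return buffered_indices_; }

 private:
  struct Node {
    int32_t value;
    uint16_t next;  // next node in the same bucket's chain, or kToyInvalid
  };

  std::vector<uint16_t> buckets_;      // first node of each chain
  std::vector<Node> nodes_;            // ordered by dictionary index
  std::vector<int> buffered_indices_;  // indices waiting to be written out
  int dict_encoded_size_;
};
```

A caller that receives -1 would stop dictionary encoding and fall back to another encoding, as the description above suggests.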

int impala::DictEncoderBase::WriteData ( uint8_t *  buffer,
int  buffer_len 
)
inline, inherited

Writes out any buffered indices to buffer preceded by the bit width of this data. Returns the number of bytes written. If the supplied buffer is not big enough, returns -1. buffer must be preallocated with buffer_len bytes. Use EstimatedDataEncodedSize() to size buffer.

Definition at line 296 of file dict-encoding.h.

References impala::DictEncoderBase::bit_width(), impala::DictEncoderBase::buffered_indices_, impala::RleEncoder::Flush(), impala::RleEncoder::len(), and impala::RleEncoder::Put().

Referenced by impala::ValidateDict().
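
The layout described above (one byte holding the bit width, followed by the encoded indices, with -1 returned when the buffer is too small) can be sketched as follows. Note that this sketch swaps in plain bit-packing for the RLE encoder that the real function references, so the payload bytes will differ from Impala's output. `WriteIndices` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: writes bit_width as the first byte, then packs each
// index into bit_width bits, least significant bit first.
// Returns the number of bytes written, or -1 if buffer_len is too small.
inline int WriteIndices(const std::vector<int>& indices, int bit_width,
                        uint8_t* buffer, int buffer_len) {
  const int64_t payload_bits = static_cast<int64_t>(indices.size()) * bit_width;
  const int64_t needed = 1 + (payload_bits + 7) / 8;
  if (buffer == nullptr || needed > buffer_len) return -1;

  std::memset(buffer, 0, static_cast<size_t>(needed));
  buffer[0] = static_cast<uint8_t>(bit_width);

  int64_t bit_pos = 8;  // the payload starts right after the width byte
  for (int idx : indices) {
    for (int b = 0; b < bit_width; ++b, ++bit_pos) {
      if ((idx >> b) & 1) {
        buffer[bit_pos / 8] |= static_cast<uint8_t>(1u << (bit_pos % 8));
      }
    }
  }
  return static_cast<int>(needed);
}
```

Storing the width in front of the data lets a reader decode the indices without access to the encoder's state.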

template<typename T >
void impala::DictEncoder< T >::WriteDict ( uint8_t *  buffer)
inline, virtual

Writes out the encoded dictionary to buffer. buffer must be preallocated to dict_encoded_size() bytes.

Implements impala::DictEncoderBase.

Definition at line 290 of file dict-encoding.h.

References impala::ParquetPlainEncoder::Encode(), and impala::DictEncoder< T >::Node::value.

Referenced by impala::ValidateDict().
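
For fixed-width values, writing the dictionary amounts to laying the values out back to back in dictionary-index order. The sketch below shows that layout for int32_t and is not the code in dict-encoding.h. `WriteDictSketch` is a hypothetical name, and it assumes that the plain encoding of a fixed-width value is its raw bytes in host byte order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: copies each value into `buffer` in dictionary order.
// The caller must preallocate values.size() * sizeof(int32_t) bytes, which
// plays the role of dict_encoded_size() in this sketch.
inline void WriteDictSketch(const std::vector<int32_t>& values, uint8_t* buffer) {
  for (size_t i = 0; i < values.size(); ++i) {
    std::memcpy(buffer + i * sizeof(int32_t), &values[i], sizeof(int32_t));
  }
}
```

Because entry i sits at byte offset i * sizeof(int32_t), a reader can map an encoded index straight back to its value.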

Member Data Documentation

template<typename T>
std::vector<NodeIndex> impala::DictEncoder< T >::buckets_
private

Hash table mapping value to dictionary index (i.e. the number used to encode this value in the data). Each table entry is an index into the nodes_ vector (giving the first node of a chain for this bucket) or Node::INVALID_INDEX for an empty bucket.

Definition at line 129 of file dict-encoding.h.

std::vector<int> impala::DictEncoderBase::buffered_indices_
protected, inherited

Indices that have not yet been written out by WriteData().

int impala::DictEncoderBase::dict_encoded_size_
protected, inherited

The number of bytes needed to encode the dictionary.

Definition at line 95 of file dict-encoding.h.

Referenced by impala::DictEncoderBase::dict_encoded_size().

template<typename T>
int impala::DictEncoder< T >::encoded_value_size_
private

Size of each encoded dictionary value. -1 for variable-length types.

Definition at line 151 of file dict-encoding.h.

template<typename T>
std::vector<Node> impala::DictEncoder< T >::nodes_
private

The nodes of the hash table. Ordered by dictionary index (and so also represents the reverse mapping from encoded index to value).

Definition at line 148 of file dict-encoding.h.

Referenced by impala::DictEncoder< T >::num_entries().

MemPool* impala::DictEncoderBase::pool_
protected, inherited

Pool to store StringValue data. Not owned.

Definition at line 98 of file dict-encoding.h.


The documentation for this class was generated from the following file: dict-encoding.h