Impala
Impalaistheopensource,nativeanalyticdatabaseforApacheHadoop.
 All Classes Namespaces Files Functions Variables Typedefs Enumerations Enumerator Friends Macros
impala::HashTable Class Reference

#include <hash-table.h>

Collaboration diagram for impala::HashTable:

Classes

struct  Bucket
 
struct  DuplicateNode
 Linked list of entries used for duplicates. More...
 
union  HtData
 Either the row in the tuple stream or a pointer to the single tuple of this row. More...
 
class  Iterator
 stl-like iterator interface. More...
 

Public Member Functions

 HashTable (RuntimeState *state, BufferedBlockMgr::Client *client, int num_build_tuples, BufferedTupleStream *tuple_stream, int64_t max_num_buckets, int64_t initial_num_buckets=1024)
 
 HashTable (MemPool *pool, bool quadratic_probing, int num_buckets)
 
bool Init ()
 Allocates the initial bucket structure. Returns false if OOM. More...
 
void Close ()
 Call to cleanup any resources. Must be called once. More...
 
bool IR_ALWAYS_INLINE Insert (HashTableCtx *ht_ctx, const BufferedTupleStream::RowIdx &idx, TupleRow *row, uint32_t hash)
 
bool IR_ALWAYS_INLINE Insert (HashTableCtx *ht_ctx, Tuple *tuple, uint32_t hash)
 
Iterator IR_ALWAYS_INLINE Find (HashTableCtx *ht_ctx, uint32_t hash)
 
int64_t size () const
 Returns number of elements inserted in the hash table. More...
 
int64_t EmptyBuckets () const
 Returns the number of empty buckets. More...
 
int64_t num_buckets () const
 Returns the number of buckets. More...
 
double load_factor () const
 Returns the load factor (the number of non-empty buckets) More...
 
int64_t CurrentMemSize () const
 
bool CheckAndResize (uint64_t buckets_to_fill, HashTableCtx *ht_ctx)
 
int64_t byte_size () const
 Returns the number of bytes allocated to the hash table. More...
 
Iterator Begin (HashTableCtx *ht_ctx)
 
Iterator FirstUnmatched (HashTableCtx *ctx)
 
bool HasMatches () const
 Return true if there was a least one match. More...
 
Iterator End ()
 Return end marker. More...
 
std::string DebugString (bool skip_empty, bool show_match, const RowDescriptor *build_desc)
 
void DebugStringTuple (std::stringstream &ss, HtData &htdata, const RowDescriptor *desc)
 Print the content of a bucket or node. More...
 
std::string PrintStats () const
 Update and print some statistics that can be used for performance debugging. More...
 

Static Public Member Functions

static int64_t EstimateNumBuckets (int64_t num_rows)
 
static int64_t EstimateSize (int64_t num_rows)
 

Private Member Functions

int64_t IR_ALWAYS_INLINE Probe (Bucket *buckets, int64_t num_buckets, HashTableCtx *ht_ctx, uint32_t hash, bool *found)
 There are wrappers of this function that perform the Find and Insert logic. More...
 
HtData *IR_ALWAYS_INLINE InsertInternal (HashTableCtx *ht_ctx, uint32_t hash)
 
void NextFilledBucket (int64_t *bucket_idx, DuplicateNode **node)
 
bool ResizeBuckets (int64_t num_buckets, HashTableCtx *ht_ctx)
 Resize the hash table to 'num_buckets'. Returns false on OOM. More...
 
DuplicateNode *IR_ALWAYS_INLINE AppendNextNode (Bucket *bucket)
 
DuplicateNode *IR_ALWAYS_INLINE InsertDuplicateNode (int64_t bucket_idx)
 
void IR_ALWAYS_INLINE PrepareBucketForInsert (int64_t bucket_idx, uint32_t hash)
 
TupleRowGetRow (HtData &htdata, TupleRow *row) const
 Return the TupleRow pointed by 'htdata'. More...
 
TupleRowGetRow (Bucket *bucket, TupleRow *row) const
 
bool GrowNodeArray ()
 Grow the node array. Returns false on OOM. More...
 

Private Attributes

RuntimeStatestate_
 
BufferedBlockMgr::Clientblock_mgr_client_
 Client to allocate data pages with. More...
 
BufferedTupleStreamtuple_stream_
 
MemPooldata_page_pool_
 Only used for tests to allocate data pages instead of the block mgr. More...
 
const bool stores_tuples_
 
const bool quadratic_probing_
 Quadratic probing enabled (as opposed to linear). More...
 
std::vector
< BufferedBlockMgr::Block * > 
data_pages_
 Data pages for all nodes. These are always pinned. More...
 
int64_t total_data_page_size_
 Byte size of all buffers in data_pages_. More...
 
DuplicateNodenext_node_
 Next duplicate node to insert. Vaild when node_remaining_current_page_ > 0. More...
 
int node_remaining_current_page_
 Number of nodes left in the current page. More...
 
int64_t num_duplicate_nodes_
 Number of duplicate nodes. More...
 
const int64_t max_num_buckets_
 
Bucketbuckets_
 
int64_t num_buckets_
 Total number of buckets (filled and empty). More...
 
int64_t num_filled_buckets_
 Number of non-empty buckets. Used to determine when to resize. More...
 
int64_t num_buckets_with_duplicates_
 
int num_build_tuples_
 
bool has_matches_
 
int64_t num_probes_
 
int64_t num_failed_probes_
 Number of probes that failed and had to fall back to linear probing without cap. More...
 
int64_t travel_length_
 
int64_t num_hash_collisions_
 
int64_t num_resizes_
 How many times this table has resized so far. More...
 

Static Private Attributes

static const double MAX_FILL_FACTOR = 0.75f
 

Friends

class Iterator
 
class HashTableTest
 

Detailed Description

The hash table consists of a contiguous array of buckets that contain a pointer to the data, the hash value and three flags: whether this bucket is filled, whether this entry has been matched (used in right and full joins) and whether this entry has duplicates. If there are duplicates, then the data is pointing to the head of a linked list of duplicate nodes that point to the actual data. Note that the duplicate nodes do not contain the hash value, because all the linked nodes have the same hash value, the one in the bucket. The data is either a tuple stream index or a Tuple*. This array of buckets is sparse, we are shooting for up to 3/4 fill factor (75%). The data allocated by the hash table comes from the BufferedBlockMgr.

Definition at line 281 of file hash-table.h.

Constructor & Destructor Documentation

HashTable::HashTable ( RuntimeState state,
BufferedBlockMgr::Client client,
int  num_build_tuples,
BufferedTupleStream tuple_stream,
int64_t  max_num_buckets,
int64_t  initial_num_buckets = 1024 
)

Create a hash table.

  • client: block mgr client to allocate data pages from.
  • num_build_tuples: number of Tuples in the build tuple row.
  • tuple_stream: the tuple stream which contains the tuple rows index by the hash table. Can be NULL if the rows contain only a single tuple, in which the 'tuple_stream' is unused.
  • max_num_buckets: the maximum number of buckets that can be stored. If we try to grow the number of buckets to a larger number, the inserts will fail. -1, if it unlimited.
  • initial_num_buckets: number of buckets that the hash table should be initialized with.

Definition at line 192 of file hash-table.cc.

References stores_tuples_.

HashTable::HashTable ( MemPool pool,
bool  quadratic_probing,
int  num_buckets 
)

Ctor used only for testing. Memory is allocated from the 'pool' instead of the block mgr.

Definition at line 219 of file hash-table.cc.

References Init().

Member Function Documentation

HashTable::DuplicateNode * impala::HashTable::AppendNextNode ( Bucket bucket)
inlineprivate

Appends the DuplicateNode pointed by next_node_ to 'bucket' and moves the next_node_ pointer to the next DuplicateNode in the page, updating the remaining node counter.

Definition at line 175 of file hash-table.inline.h.

References impala::HashTable::Bucket::bucketData, impala::HashTable::Bucket::duplicates, next_node_, node_remaining_current_page_, and num_duplicate_nodes_.

Referenced by InsertDuplicateNode().

HashTable::Iterator impala::HashTable::Begin ( HashTableCtx ht_ctx)
inline

Returns an iterator at the beginning of the hash table. Advancing this iterator will traverse all elements.

Definition at line 128 of file hash-table.inline.h.

References impala::HashTable::Iterator::BUCKET_NOT_FOUND, Iterator, NextFilledBucket(), and impala::HashTableCtx::row().

Referenced by impala::HashTableTest::FullScan(), and ResizeBuckets().

int64_t impala::HashTable::byte_size ( ) const
inline

Returns the number of bytes allocated to the hash table.

Definition at line 419 of file hash-table.h.

References total_data_page_size_.

bool HashTable::CheckAndResize ( uint64_t  buckets_to_fill,
HashTableCtx ht_ctx 
)

Calculates the fill factor if 'buckets_to_fill' additional buckets were to be filled and resizes the hash table so that the projected fill factor is below the max fill factor. If it returns true, then it is guaranteed at least 'rows_to_add' rows can be inserted without need to resize.

Definition at line 282 of file hash-table.cc.

References MAX_FILL_FACTOR, num_buckets_, num_filled_buckets_, and ResizeBuckets().

Referenced by impala::HashTableTest::BasicTest(), impala::HashTableTest::GrowTableTest(), and impala::HashTableTest::ScanTest().

int64_t HashTable::CurrentMemSize ( ) const

Returns the memory occupied by the hash table, takes into account the number of duplicates.

Definition at line 278 of file hash-table.cc.

References num_buckets_, and num_duplicate_nodes_.

string HashTable::DebugString ( bool  skip_empty,
bool  show_match,
const RowDescriptor build_desc 
)

Dump out the entire hash table to string. If 'skip_empty', empty buckets are skipped. If 'show_match', it also prints the matched flag of each node. If 'build_desc' is non-null, the build rows will be printed. Otherwise, only the the addresses of the build rows will be printed.

Definition at line 387 of file hash-table.cc.

References impala::HashTable::Bucket::bucketData, buckets_, DebugStringTuple(), impala::HashTable::Bucket::duplicates, impala::HashTable::DuplicateNode::htdata, impala::HashTable::Bucket::htdata, impala::HashTable::DuplicateNode::next, and num_buckets_.

void HashTable::DebugStringTuple ( std::stringstream &  ss,
HtData htdata,
const RowDescriptor desc 
)
int64_t impala::HashTable::EmptyBuckets ( ) const
inline

Returns the number of empty buckets.

Definition at line 385 of file hash-table.h.

References num_buckets_, and num_filled_buckets_.

Referenced by impala::HashTableTest::InsertFullTest().

Iterator impala::HashTable::End ( )
inline

Return end marker.

Definition at line 434 of file hash-table.h.

References Iterator.

Referenced by Find().

static int64_t impala::HashTable::EstimateNumBuckets ( int64_t  num_rows)
inlinestatic

Returns an estimate of the number of bytes needed to build the hash table structure for 'num_rows'. To do that, it estimates the number of buckets, rounded up to a power of two, and also assumes that there are no duplicates.

Assume max 66% fill factor and no duplicates.

Definition at line 398 of file hash-table.h.

References impala::BitUtil::NextPowerOfTwo().

Referenced by impala::PartitionedHashJoinNode::Partition::BuildHashTableInternal(), and EstimateSize().

static int64_t impala::HashTable::EstimateSize ( int64_t  num_rows)
inlinestatic
HashTable::Iterator impala::HashTable::Find ( HashTableCtx ht_ctx,
uint32_t  hash 
)
inline

Returns an iterator to the bucket matching the last row evaluated in 'ht_ctx'. Returns HashTable::End() if no match is found. The iterator can be iterated until HashTable::End() to find all the matching rows. Advancing the returned iterator will go to the next matching row. The matching rows do not need to be evaluated since all the nodes of a bucket are duplicates. One scan can be in progress for each 'ht_ctx'. Used during the probe phase of hash joins.

Definition at line 117 of file hash-table.inline.h.

References impala::HashTable::Bucket::bucketData, buckets_, impala::HashTable::Bucket::duplicates, End(), impala::hash, Iterator, num_buckets_, num_probes_, Probe(), and impala::HashTableCtx::row().

Referenced by impala::HashTableTest::GrowTableTest(), impala::HashTableTest::InsertFullTest(), impala::HashTableTest::ProbeTest(), and impala::PartitionedHashJoinNode::ProcessProbeBatch().

HashTable::Iterator impala::HashTable::FirstUnmatched ( HashTableCtx ctx)
inline

Return an iterator pointing to the first element (Bucket or DuplicateNode, if the bucket has duplicates) in the hash table that does not have its matched flag set. Used in right joins and full-outer joins.

Definition at line 135 of file hash-table.inline.h.

References impala::HashTable::Iterator::BUCKET_NOT_FOUND, buckets_, NextFilledBucket(), and impala::HashTableCtx::row().

TupleRow * impala::HashTable::GetRow ( HtData htdata,
TupleRow row 
) const
inlineprivate
TupleRow * impala::HashTable::GetRow ( Bucket bucket,
TupleRow row 
) const
inlineprivate

Returns the TupleRow of the pointed 'bucket'. In case of duplicates, it returns the content of the first chained duplicate node of the bucket.

Definition at line 219 of file hash-table.inline.h.

References impala::HashTable::Bucket::bucketData, impala::HashTable::Bucket::duplicates, GetRow(), impala::HashTable::Bucket::hasDuplicates, impala::HashTable::DuplicateNode::htdata, impala::HashTable::Bucket::htdata, and UNLIKELY.

bool impala::HashTable::HasMatches ( ) const
inline

Return true if there was a least one match.

Definition at line 431 of file hash-table.h.

References has_matches_.

bool HashTable::Init ( )

Allocates the initial bucket structure. Returns false if OOM.

Definition at line 245 of file hash-table.cc.

References impala::RuntimeState::block_mgr(), block_mgr_client_, buckets_, impala::BufferedBlockMgr::ConsumeMemory(), GrowNodeArray(), num_buckets_, and state_.

Referenced by HashTable().

bool impala::HashTable::Insert ( HashTableCtx ht_ctx,
const BufferedTupleStream::RowIdx idx,
TupleRow row,
uint32_t  hash 
)
inline

Inserts the row to the hash table. Returns true if the insertion was successful. 'idx' is the index into tuple_stream_ for this row. If the row contains more than one tuple, the 'idx' is stored instead of the 'row'. The 'row' is not copied by the hash table and the caller must guarantee it stays in memory. This will not grow the hash table. In the case that there is a need to insert a duplicate node, instead of filling a new bucket, and there is not enough memory to insert a duplicate node, the insert fails and this function returns false. Used during the build phase of hash joins.

Definition at line 94 of file hash-table.inline.h.

References impala::TupleRow::GetTuple(), impala::hash, gen_ir_descriptions::idx, InsertInternal(), LIKELY, and stores_tuples_.

Referenced by impala::HashTableTest::BasicTest(), impala::HashTableTest::GrowTableTest(), impala::HashTableTest::InsertFullTest(), and impala::HashTableTest::ScanTest().

bool impala::HashTable::Insert ( HashTableCtx ht_ctx,
Tuple tuple,
uint32_t  hash 
)
inline

Same as Insert() but for inserting a single Tuple. The 'tuple' is not copied by the hash table and the caller must guarantee it stays in memory.

Definition at line 106 of file hash-table.inline.h.

References InsertInternal(), LIKELY, stores_tuples_, and impala::HashTable::HtData::tuple.

HashTable::DuplicateNode * impala::HashTable::InsertDuplicateNode ( int64_t  bucket_idx)
inlineprivate

Creates a new DuplicateNode for a entry and chains it to the bucket with index 'bucket_idx'. The duplicate nodes of a bucket are chained as a linked list. This places the new duplicate node at the beginning of the list. If this is the first duplicate entry inserted in this bucket, then the entry already contained by the bucket is converted to a DuplicateNode. That is, the contents of 'data' of the bucket are copied to a DuplicateNode and 'data' is updated to pointing to a DuplicateNode. Returns NULL if the node array could not grow, i.e. there was not enough memory to allocate a new DuplicateNode.

Definition at line 183 of file hash-table.inline.h.

References AppendNextNode(), impala::HashTable::Bucket::bucketData, buckets_, impala::HashTable::Bucket::duplicates, impala::HashTable::Bucket::filled, GrowNodeArray(), impala::HashTable::Bucket::hasDuplicates, impala::HashTable::DuplicateNode::htdata, impala::HashTable::Bucket::htdata, impala::HashTable::HtData::idx, impala::HashTable::DuplicateNode::matched, impala::HashTable::Bucket::matched, impala::HashTable::DuplicateNode::next, next_node_, num_buckets_, num_buckets_with_duplicates_, and UNLIKELY.

Referenced by InsertInternal().

HashTable::HtData * impala::HashTable::InsertInternal ( HashTableCtx ht_ctx,
uint32_t  hash 
)
inlineprivate

Performs the insert logic. Returns the HtData* of the bucket or duplicate node where the data should be inserted. Returns NULL if the insert was not successful.

Definition at line 77 of file hash-table.inline.h.

References impala::HashTable::Iterator::BUCKET_NOT_FOUND, impala::HashTable::Bucket::bucketData, buckets_, impala::HashTable::DuplicateNode::htdata, impala::HashTable::Bucket::htdata, InsertDuplicateNode(), num_buckets_, num_probes_, PrepareBucketForInsert(), Probe(), and UNLIKELY.

Referenced by Insert().

double impala::HashTable::load_factor ( ) const
inline

Returns the load factor (the number of non-empty buckets)

Definition at line 391 of file hash-table.h.

References num_buckets_, and num_filled_buckets_.

void impala::HashTable::NextFilledBucket ( int64_t *  bucket_idx,
DuplicateNode **  node 
)
inlineprivate

Updates 'bucket_idx' to the index of the next non-empty bucket. If the bucket has duplicates, 'node' will be pointing to the head of the linked list of duplicates. Otherwise, 'node' should not be used. If there are no more buckets, sets 'bucket_idx' to BUCKET_NOT_FOUND.

Definition at line 150 of file hash-table.inline.h.

References impala::HashTable::Iterator::BUCKET_NOT_FOUND, impala::HashTable::Bucket::bucketData, buckets_, impala::HashTable::Bucket::duplicates, and num_buckets_.

Referenced by Begin(), FirstUnmatched(), and ResizeBuckets().

int64_t impala::HashTable::num_buckets ( ) const
inline

Returns the number of buckets.

Definition at line 388 of file hash-table.h.

References num_buckets_.

Referenced by impala::HashTableTest::BasicTest(), EstimateSize(), ResizeBuckets(), and impala::HashTableTest::ScanTest().

void impala::HashTable::PrepareBucketForInsert ( int64_t  bucket_idx,
uint32_t  hash 
)
inlineprivate

Resets the contents of the bucket with index 'bucket_idx', in preparation for an insert. Sets all the fields of the bucket other than 'data'.

Definition at line 163 of file hash-table.inline.h.

References buckets_, impala::HashTable::Bucket::filled, impala::HashTable::Bucket::hasDuplicates, impala::hash, impala::HashTable::Bucket::hash, impala::HashTable::Bucket::matched, num_buckets_, and num_filled_buckets_.

Referenced by InsertInternal().

string HashTable::PrintStats ( ) const

Update and print some statistics that can be used for performance debugging.

Definition at line 424 of file hash-table.cc.

References num_buckets_, num_buckets_with_duplicates_, num_duplicate_nodes_, num_failed_probes_, num_filled_buckets_, num_hash_collisions_, num_probes_, num_resizes_, and travel_length_.

Referenced by Close().

int64_t impala::HashTable::Probe ( Bucket buckets,
int64_t  num_buckets,
HashTableCtx ht_ctx,
uint32_t  hash,
bool found 
)
inlineprivate

There are wrappers of this function that perform the Find and Insert logic.

Performs the probing operation according to the probing algorithm (linear or quadratic. Returns one of the following: (a) the index of the bucket that contains the entry that matches with the last row evaluated in 'ht_ctx'. If 'ht_ctx' is NULL then it does not check for row equality and returns the index of the first empty bucket. (b) the index of the first empty bucket according to the probing algorithm (linear or quadratic), if the entry is not in the hash table or 'ht_ctx' is NULL. (c) Iterator::BUCKET_NOT_FOUND if the probe was not successful, i.e. the maximum distance was traveled without finding either an empty or a matching bucket. Using the returned index value, the caller can create an iterator that can be iterated until End() to find all the matching rows. EvalAndHashBuild() or EvalAndHashProb(e) must have been called before calling this. 'hash' must be the hash returned by these functions. 'found' indicates that a bucket that contains an equal row is found.

Definition at line 37 of file hash-table.inline.h.

References impala::HashTable::Iterator::BUCKET_NOT_FOUND, impala::HashTableCtx::Equals(), impala::HashTable::Bucket::filled, GetRow(), impala::hash, impala::HashTable::Bucket::hash, LIKELY, num_filled_buckets_, num_hash_collisions_, quadratic_probing_, impala::HashTableCtx::row_, and travel_length_.

Referenced by Find(), InsertInternal(), and ResizeBuckets().

int64_t impala::HashTable::size ( ) const
inline

Friends And Related Function Documentation

friend class HashTableTest
friend

Definition at line 518 of file hash-table.h.

friend class Iterator
friend

Definition at line 517 of file hash-table.h.

Referenced by Begin(), End(), and Find().

Member Data Documentation

BufferedBlockMgr::Client* impala::HashTable::block_mgr_client_
private

Client to allocate data pages with.

Definition at line 588 of file hash-table.h.

Referenced by Close(), GrowNodeArray(), Init(), and ResizeBuckets().

Bucket* impala::HashTable::buckets_
private

Array of all buckets. Owned by this node. Using c-style array to control control memory footprint.

Definition at line 626 of file hash-table.h.

Referenced by Close(), DebugString(), Find(), FirstUnmatched(), impala::HashTable::Iterator::GetRow(), Init(), InsertDuplicateNode(), InsertInternal(), NextFilledBucket(), PrepareBucketForInsert(), and ResizeBuckets().

MemPool* impala::HashTable::data_page_pool_
private

Only used for tests to allocate data pages instead of the block mgr.

Definition at line 596 of file hash-table.h.

Referenced by GrowNodeArray().

std::vector<BufferedBlockMgr::Block*> impala::HashTable::data_pages_
private

Data pages for all nodes. These are always pinned.

Definition at line 608 of file hash-table.h.

Referenced by Close(), and GrowNodeArray().

bool impala::HashTable::has_matches_
private

Flag used to disable spilling hash tables that already had matches in case of right joins (IMPALA-1488). TODO: Not fail when spilling hash tables with matches in right joins

Definition at line 645 of file hash-table.h.

Referenced by HasMatches().

const double HashTable::MAX_FILL_FACTOR = 0.75f
staticprivate

Load factor that will trigger growing the hash table on insert. This is defined as the number of non-empty buckets / total_buckets

Definition at line 583 of file hash-table.h.

Referenced by CheckAndResize().

const int64_t impala::HashTable::max_num_buckets_
private

Definition at line 622 of file hash-table.h.

Referenced by ResizeBuckets().

DuplicateNode* impala::HashTable::next_node_
private

Next duplicate node to insert. Vaild when node_remaining_current_page_ > 0.

Definition at line 614 of file hash-table.h.

Referenced by AppendNextNode(), GrowNodeArray(), and InsertDuplicateNode().

int impala::HashTable::node_remaining_current_page_
private

Number of nodes left in the current page.

Definition at line 617 of file hash-table.h.

Referenced by AppendNextNode(), and GrowNodeArray().

int64_t impala::HashTable::num_buckets_
private
int64_t impala::HashTable::num_buckets_with_duplicates_
private

Number of (non-empty) buckets with duplicates. These buckets do not point to slots in the tuple stream, rather than to a linked list of Nodes.

Definition at line 636 of file hash-table.h.

Referenced by InsertDuplicateNode(), PrintStats(), and size().

int impala::HashTable::num_build_tuples_
private

Number of build tuples, used for constructing temp row* for probes. TODO: We should remove it.

Definition at line 640 of file hash-table.h.

Referenced by DebugStringTuple().

int64_t impala::HashTable::num_duplicate_nodes_
private

Number of duplicate nodes.

Definition at line 620 of file hash-table.h.

Referenced by AppendNextNode(), CurrentMemSize(), PrintStats(), and size().

int64_t impala::HashTable::num_failed_probes_
private

Number of probes that failed and had to fall back to linear probing without cap.

Definition at line 653 of file hash-table.h.

Referenced by PrintStats().

int64_t impala::HashTable::num_filled_buckets_
private

Number of non-empty buckets. Used to determine when to resize.

Definition at line 632 of file hash-table.h.

Referenced by CheckAndResize(), EmptyBuckets(), load_factor(), PrepareBucketForInsert(), PrintStats(), Probe(), ResizeBuckets(), and size().

int64_t impala::HashTable::num_hash_collisions_
private

The number of cases where we had to compare buckets with the same hash value, but the row equality failed.

Definition at line 661 of file hash-table.h.

Referenced by PrintStats(), and Probe().

int64_t impala::HashTable::num_probes_
private

The stats below can be used for debugging perf. TODO: Should we make these statistics atomic? Number of calls to either Find, Insert or FindOrInsert an entry.

Definition at line 650 of file hash-table.h.

Referenced by Close(), Find(), InsertInternal(), and PrintStats().

int64_t impala::HashTable::num_resizes_
private

How many times this table has resized so far.

Definition at line 664 of file hash-table.h.

Referenced by PrintStats(), and ResizeBuckets().

const bool impala::HashTable::quadratic_probing_
private

Quadratic probing enabled (as opposed to linear).

Definition at line 605 of file hash-table.h.

Referenced by Probe().

RuntimeState* impala::HashTable::state_
private

Definition at line 585 of file hash-table.h.

Referenced by Close(), GrowNodeArray(), Init(), and ResizeBuckets().

const bool impala::HashTable::stores_tuples_
private

Constants on how the hash table should behave. Joins and aggs have slightly different behavior. TODO: these constants are an ideal candidate to be removed with codegen. TODO: ..or with template-ization

Definition at line 602 of file hash-table.h.

Referenced by DebugStringTuple(), GetRow(), HashTable(), and Insert().

int64_t impala::HashTable::total_data_page_size_
private

Byte size of all buffers in data_pages_.

Definition at line 611 of file hash-table.h.

Referenced by byte_size(), Close(), and GrowNodeArray().

int64_t impala::HashTable::travel_length_
private

Total distance traveled for each probe. That is the sum of the diff between the end position of a probe (find/insert) and its start position (hash & (num_buckets_ - 1)).

Definition at line 657 of file hash-table.h.

Referenced by PrintStats(), and Probe().

BufferedTupleStream* impala::HashTable::tuple_stream_
private

Stream contains the rows referenced by the hash table. Can be NULL if the row only contains a single tuple, in which case the TupleRow indirection is removed by the hash table.

Definition at line 593 of file hash-table.h.

Referenced by GetRow().


The documentation for this class was generated from the following files: