Impala
Impalaistheopensource,nativeanalyticdatabaseforApacheHadoop.
|
#include <hash-table.h>
Classes | |
struct | Bucket |
struct | DuplicateNode |
Linked list of entries used for duplicates. More... | |
union | HtData |
Either the row in the tuple stream or a pointer to the single tuple of this row. More... | |
class | Iterator |
stl-like iterator interface. More... | |
Public Member Functions | |
HashTable (RuntimeState *state, BufferedBlockMgr::Client *client, int num_build_tuples, BufferedTupleStream *tuple_stream, int64_t max_num_buckets, int64_t initial_num_buckets=1024) | |
HashTable (MemPool *pool, bool quadratic_probing, int num_buckets) | |
bool | Init () |
Allocates the initial bucket structure. Returns false if OOM. More... | |
void | Close () |
Call to cleanup any resources. Must be called once. More... | |
bool IR_ALWAYS_INLINE | Insert (HashTableCtx *ht_ctx, const BufferedTupleStream::RowIdx &idx, TupleRow *row, uint32_t hash) |
bool IR_ALWAYS_INLINE | Insert (HashTableCtx *ht_ctx, Tuple *tuple, uint32_t hash) |
Iterator IR_ALWAYS_INLINE | Find (HashTableCtx *ht_ctx, uint32_t hash) |
int64_t | size () const |
Returns number of elements inserted in the hash table. More... | |
int64_t | EmptyBuckets () const |
Returns the number of empty buckets. More... | |
int64_t | num_buckets () const |
Returns the number of buckets. More... | |
double | load_factor () const |
Returns the load factor (the number of non-empty buckets) More... | |
int64_t | CurrentMemSize () const |
bool | CheckAndResize (uint64_t buckets_to_fill, HashTableCtx *ht_ctx) |
int64_t | byte_size () const |
Returns the number of bytes allocated to the hash table. More... | |
Iterator | Begin (HashTableCtx *ht_ctx) |
Iterator | FirstUnmatched (HashTableCtx *ctx) |
bool | HasMatches () const |
Return true if there was a least one match. More... | |
Iterator | End () |
Return end marker. More... | |
std::string | DebugString (bool skip_empty, bool show_match, const RowDescriptor *build_desc) |
void | DebugStringTuple (std::stringstream &ss, HtData &htdata, const RowDescriptor *desc) |
Print the content of a bucket or node. More... | |
std::string | PrintStats () const |
Update and print some statistics that can be used for performance debugging. More... | |
Static Public Member Functions | |
static int64_t | EstimateNumBuckets (int64_t num_rows) |
static int64_t | EstimateSize (int64_t num_rows) |
Private Member Functions | |
int64_t IR_ALWAYS_INLINE | Probe (Bucket *buckets, int64_t num_buckets, HashTableCtx *ht_ctx, uint32_t hash, bool *found) |
There are wrappers of this function that perform the Find and Insert logic. More... | |
HtData *IR_ALWAYS_INLINE | InsertInternal (HashTableCtx *ht_ctx, uint32_t hash) |
void | NextFilledBucket (int64_t *bucket_idx, DuplicateNode **node) |
bool | ResizeBuckets (int64_t num_buckets, HashTableCtx *ht_ctx) |
Resize the hash table to 'num_buckets'. Returns false on OOM. More... | |
DuplicateNode *IR_ALWAYS_INLINE | AppendNextNode (Bucket *bucket) |
DuplicateNode *IR_ALWAYS_INLINE | InsertDuplicateNode (int64_t bucket_idx) |
void IR_ALWAYS_INLINE | PrepareBucketForInsert (int64_t bucket_idx, uint32_t hash) |
TupleRow * | GetRow (HtData &htdata, TupleRow *row) const |
Return the TupleRow pointed by 'htdata'. More... | |
TupleRow * | GetRow (Bucket *bucket, TupleRow *row) const |
bool | GrowNodeArray () |
Grow the node array. Returns false on OOM. More... | |
Private Attributes | |
RuntimeState * | state_ |
BufferedBlockMgr::Client * | block_mgr_client_ |
Client to allocate data pages with. More... | |
BufferedTupleStream * | tuple_stream_ |
MemPool * | data_page_pool_ |
Only used for tests to allocate data pages instead of the block mgr. More... | |
const bool | stores_tuples_ |
const bool | quadratic_probing_ |
Quadratic probing enabled (as opposed to linear). More... | |
std::vector < BufferedBlockMgr::Block * > | data_pages_ |
Data pages for all nodes. These are always pinned. More... | |
int64_t | total_data_page_size_ |
Byte size of all buffers in data_pages_. More... | |
DuplicateNode * | next_node_ |
Next duplicate node to insert. Vaild when node_remaining_current_page_ > 0. More... | |
int | node_remaining_current_page_ |
Number of nodes left in the current page. More... | |
int64_t | num_duplicate_nodes_ |
Number of duplicate nodes. More... | |
const int64_t | max_num_buckets_ |
Bucket * | buckets_ |
int64_t | num_buckets_ |
Total number of buckets (filled and empty). More... | |
int64_t | num_filled_buckets_ |
Number of non-empty buckets. Used to determine when to resize. More... | |
int64_t | num_buckets_with_duplicates_ |
int | num_build_tuples_ |
bool | has_matches_ |
int64_t | num_probes_ |
int64_t | num_failed_probes_ |
Number of probes that failed and had to fall back to linear probing without cap. More... | |
int64_t | travel_length_ |
int64_t | num_hash_collisions_ |
int64_t | num_resizes_ |
How many times this table has resized so far. More... | |
Static Private Attributes | |
static const double | MAX_FILL_FACTOR = 0.75f |
Friends | |
class | Iterator |
class | HashTableTest |
The hash table consists of a contiguous array of buckets that contain a pointer to the data, the hash value and three flags: whether this bucket is filled, whether this entry has been matched (used in right and full joins) and whether this entry has duplicates. If there are duplicates, then the data is pointing to the head of a linked list of duplicate nodes that point to the actual data. Note that the duplicate nodes do not contain the hash value, because all the linked nodes have the same hash value, the one in the bucket. The data is either a tuple stream index or a Tuple*. This array of buckets is sparse, we are shooting for up to 3/4 fill factor (75%). The data allocated by the hash table comes from the BufferedBlockMgr.
Definition at line 281 of file hash-table.h.
HashTable::HashTable | ( | RuntimeState * | state, |
BufferedBlockMgr::Client * | client, | ||
int | num_build_tuples, | ||
BufferedTupleStream * | tuple_stream, | ||
int64_t | max_num_buckets, | ||
int64_t | initial_num_buckets = 1024 |
||
) |
Create a hash table.
Definition at line 192 of file hash-table.cc.
References stores_tuples_.
Ctor used only for testing. Memory is allocated from the 'pool' instead of the block mgr.
Definition at line 219 of file hash-table.cc.
References Init().
|
inlineprivate |
Appends the DuplicateNode pointed by next_node_ to 'bucket' and moves the next_node_ pointer to the next DuplicateNode in the page, updating the remaining node counter.
Definition at line 175 of file hash-table.inline.h.
References impala::HashTable::Bucket::bucketData, impala::HashTable::Bucket::duplicates, next_node_, node_remaining_current_page_, and num_duplicate_nodes_.
Referenced by InsertDuplicateNode().
|
inline |
Returns an iterator at the beginning of the hash table. Advancing this iterator will traverse all elements.
Definition at line 128 of file hash-table.inline.h.
References impala::HashTable::Iterator::BUCKET_NOT_FOUND, Iterator, NextFilledBucket(), and impala::HashTableCtx::row().
Referenced by impala::HashTableTest::FullScan(), and ResizeBuckets().
|
inline |
Returns the number of bytes allocated to the hash table.
Definition at line 419 of file hash-table.h.
References total_data_page_size_.
bool HashTable::CheckAndResize | ( | uint64_t | buckets_to_fill, |
HashTableCtx * | ht_ctx | ||
) |
Calculates the fill factor if 'buckets_to_fill' additional buckets were to be filled and resizes the hash table so that the projected fill factor is below the max fill factor. If it returns true, then it is guaranteed at least 'rows_to_add' rows can be inserted without need to resize.
Definition at line 282 of file hash-table.cc.
References MAX_FILL_FACTOR, num_buckets_, num_filled_buckets_, and ResizeBuckets().
Referenced by impala::HashTableTest::BasicTest(), impala::HashTableTest::GrowTableTest(), and impala::HashTableTest::ScanTest().
void HashTable::Close | ( | ) |
Call to cleanup any resources. Must be called once.
Definition at line 257 of file hash-table.cc.
References impala::RuntimeState::block_mgr(), block_mgr_client_, buckets_, data_pages_, impala::ImpaladMetrics::HASH_TABLE_TOTAL_BYTES, num_buckets_, num_probes_, PrintStats(), impala::BufferedBlockMgr::ReleaseMemory(), state_, and total_data_page_size_.
Referenced by impala::HashTableTest::BasicTest(), impala::HashTableTest::GrowTableTest(), impala::HashTableTest::InsertFullTest(), impala::HashTableTest::ScanTest(), and impala::HashTableTest::SetupTest().
int64_t HashTable::CurrentMemSize | ( | ) | const |
Returns the memory occupied by the hash table, takes into account the number of duplicates.
Definition at line 278 of file hash-table.cc.
References num_buckets_, and num_duplicate_nodes_.
string HashTable::DebugString | ( | bool | skip_empty, |
bool | show_match, | ||
const RowDescriptor * | build_desc | ||
) |
Dump out the entire hash table to string. If 'skip_empty', empty buckets are skipped. If 'show_match', it also prints the matched flag of each node. If 'build_desc' is non-null, the build rows will be printed. Otherwise, only the the addresses of the build rows will be printed.
Definition at line 387 of file hash-table.cc.
References impala::HashTable::Bucket::bucketData, buckets_, DebugStringTuple(), impala::HashTable::Bucket::duplicates, impala::HashTable::DuplicateNode::htdata, impala::HashTable::Bucket::htdata, impala::HashTable::DuplicateNode::next, and num_buckets_.
void HashTable::DebugStringTuple | ( | std::stringstream & | ss, |
HtData & | htdata, | ||
const RowDescriptor * | desc | ||
) |
Print the content of a bucket or node.
Definition at line 373 of file hash-table.cc.
References impala::BufferedTupleStream::RowIdx::block(), GetRow(), impala::BufferedTupleStream::RowIdx::idx(), impala::HashTable::HtData::idx, num_build_tuples_, impala::BufferedTupleStream::RowIdx::offset(), impala::PrintRow(), stores_tuples_, and impala::HashTable::HtData::tuple.
Referenced by DebugString().
|
inline |
Returns the number of empty buckets.
Definition at line 385 of file hash-table.h.
References num_buckets_, and num_filled_buckets_.
Referenced by impala::HashTableTest::InsertFullTest().
|
inline |
Return end marker.
Definition at line 434 of file hash-table.h.
References Iterator.
Referenced by Find().
|
inlinestatic |
Returns an estimate of the number of bytes needed to build the hash table structure for 'num_rows'. To do that, it estimates the number of buckets, rounded up to a power of two, and also assumes that there are no duplicates.
Assume max 66% fill factor and no duplicates.
Definition at line 398 of file hash-table.h.
References impala::BitUtil::NextPowerOfTwo().
Referenced by impala::PartitionedHashJoinNode::Partition::BuildHashTableInternal(), and EstimateSize().
|
inlinestatic |
Definition at line 402 of file hash-table.h.
References EstimateNumBuckets(), and num_buckets().
Referenced by impala::PartitionedHashJoinNode::Partition::EstimatedInMemSize().
|
inline |
Returns an iterator to the bucket matching the last row evaluated in 'ht_ctx'. Returns HashTable::End() if no match is found. The iterator can be iterated until HashTable::End() to find all the matching rows. Advancing the returned iterator will go to the next matching row. The matching rows do not need to be evaluated since all the nodes of a bucket are duplicates. One scan can be in progress for each 'ht_ctx'. Used during the probe phase of hash joins.
Definition at line 117 of file hash-table.inline.h.
References impala::HashTable::Bucket::bucketData, buckets_, impala::HashTable::Bucket::duplicates, End(), impala::hash, Iterator, num_buckets_, num_probes_, Probe(), and impala::HashTableCtx::row().
Referenced by impala::HashTableTest::GrowTableTest(), impala::HashTableTest::InsertFullTest(), impala::HashTableTest::ProbeTest(), and impala::PartitionedHashJoinNode::ProcessProbeBatch().
|
inline |
Return an iterator pointing to the first element (Bucket or DuplicateNode, if the bucket has duplicates) in the hash table that does not have its matched flag set. Used in right joins and full-outer joins.
Definition at line 135 of file hash-table.inline.h.
References impala::HashTable::Iterator::BUCKET_NOT_FOUND, buckets_, NextFilledBucket(), and impala::HashTableCtx::row().
Return the TupleRow pointed by 'htdata'.
Definition at line 210 of file hash-table.inline.h.
References impala::BufferedTupleStream::GetTupleRow(), impala::HashTable::HtData::idx, stores_tuples_, impala::HashTable::HtData::tuple, and tuple_stream_.
Referenced by DebugStringTuple(), impala::HashTable::Iterator::GetRow(), GetRow(), and Probe().
Returns the TupleRow of the pointed 'bucket'. In case of duplicates, it returns the content of the first chained duplicate node of the bucket.
Definition at line 219 of file hash-table.inline.h.
References impala::HashTable::Bucket::bucketData, impala::HashTable::Bucket::duplicates, GetRow(), impala::HashTable::Bucket::hasDuplicates, impala::HashTable::DuplicateNode::htdata, impala::HashTable::Bucket::htdata, and UNLIKELY.
|
private |
Grow the node array. Returns false on OOM.
Definition at line 345 of file hash-table.cc.
References impala::MemPool::Allocate(), impala::BufferedBlockMgr::Block::Allocate(), impala::RuntimeState::block_mgr(), block_mgr_client_, data_page_pool_, data_pages_, impala::BufferedBlockMgr::GetNewBlock(), impala::ImpaladMetrics::HASH_TABLE_TOTAL_BYTES, INITIAL_DATA_PAGE_SIZES, impala::MemTracker::LimitExceeded(), impala::BufferedBlockMgr::max_block_size(), impala::MemPool::mem_tracker(), next_node_, node_remaining_current_page_, NUM_SMALL_DATA_PAGES, impala::Status::ok(), state_, TEST_PAGE_SIZE, and total_data_page_size_.
Referenced by Init(), and InsertDuplicateNode().
|
inline |
Return true if there was a least one match.
Definition at line 431 of file hash-table.h.
References has_matches_.
bool HashTable::Init | ( | ) |
Allocates the initial bucket structure. Returns false if OOM.
Definition at line 245 of file hash-table.cc.
References impala::RuntimeState::block_mgr(), block_mgr_client_, buckets_, impala::BufferedBlockMgr::ConsumeMemory(), GrowNodeArray(), num_buckets_, and state_.
Referenced by HashTable().
|
inline |
Inserts the row to the hash table. Returns true if the insertion was successful. 'idx' is the index into tuple_stream_ for this row. If the row contains more than one tuple, the 'idx' is stored instead of the 'row'. The 'row' is not copied by the hash table and the caller must guarantee it stays in memory. This will not grow the hash table. In the case that there is a need to insert a duplicate node, instead of filling a new bucket, and there is not enough memory to insert a duplicate node, the insert fails and this function returns false. Used during the build phase of hash joins.
Definition at line 94 of file hash-table.inline.h.
References impala::TupleRow::GetTuple(), impala::hash, gen_ir_descriptions::idx, InsertInternal(), LIKELY, and stores_tuples_.
Referenced by impala::HashTableTest::BasicTest(), impala::HashTableTest::GrowTableTest(), impala::HashTableTest::InsertFullTest(), and impala::HashTableTest::ScanTest().
|
inline |
Same as Insert() but for inserting a single Tuple. The 'tuple' is not copied by the hash table and the caller must guarantee it stays in memory.
Definition at line 106 of file hash-table.inline.h.
References InsertInternal(), LIKELY, stores_tuples_, and impala::HashTable::HtData::tuple.
|
inlineprivate |
Creates a new DuplicateNode for a entry and chains it to the bucket with index 'bucket_idx'. The duplicate nodes of a bucket are chained as a linked list. This places the new duplicate node at the beginning of the list. If this is the first duplicate entry inserted in this bucket, then the entry already contained by the bucket is converted to a DuplicateNode. That is, the contents of 'data' of the bucket are copied to a DuplicateNode and 'data' is updated to pointing to a DuplicateNode. Returns NULL if the node array could not grow, i.e. there was not enough memory to allocate a new DuplicateNode.
Definition at line 183 of file hash-table.inline.h.
References AppendNextNode(), impala::HashTable::Bucket::bucketData, buckets_, impala::HashTable::Bucket::duplicates, impala::HashTable::Bucket::filled, GrowNodeArray(), impala::HashTable::Bucket::hasDuplicates, impala::HashTable::DuplicateNode::htdata, impala::HashTable::Bucket::htdata, impala::HashTable::HtData::idx, impala::HashTable::DuplicateNode::matched, impala::HashTable::Bucket::matched, impala::HashTable::DuplicateNode::next, next_node_, num_buckets_, num_buckets_with_duplicates_, and UNLIKELY.
Referenced by InsertInternal().
|
inlineprivate |
Performs the insert logic. Returns the HtData* of the bucket or duplicate node where the data should be inserted. Returns NULL if the insert was not successful.
Definition at line 77 of file hash-table.inline.h.
References impala::HashTable::Iterator::BUCKET_NOT_FOUND, impala::HashTable::Bucket::bucketData, buckets_, impala::HashTable::DuplicateNode::htdata, impala::HashTable::Bucket::htdata, InsertDuplicateNode(), num_buckets_, num_probes_, PrepareBucketForInsert(), Probe(), and UNLIKELY.
Referenced by Insert().
|
inline |
Returns the load factor (the number of non-empty buckets)
Definition at line 391 of file hash-table.h.
References num_buckets_, and num_filled_buckets_.
|
inlineprivate |
Updates 'bucket_idx' to the index of the next non-empty bucket. If the bucket has duplicates, 'node' will be pointing to the head of the linked list of duplicates. Otherwise, 'node' should not be used. If there are no more buckets, sets 'bucket_idx' to BUCKET_NOT_FOUND.
Definition at line 150 of file hash-table.inline.h.
References impala::HashTable::Iterator::BUCKET_NOT_FOUND, impala::HashTable::Bucket::bucketData, buckets_, impala::HashTable::Bucket::duplicates, and num_buckets_.
Referenced by Begin(), FirstUnmatched(), and ResizeBuckets().
|
inline |
Returns the number of buckets.
Definition at line 388 of file hash-table.h.
References num_buckets_.
Referenced by impala::HashTableTest::BasicTest(), EstimateSize(), ResizeBuckets(), and impala::HashTableTest::ScanTest().
|
inlineprivate |
Resets the contents of the bucket with index 'bucket_idx', in preparation for an insert. Sets all the fields of the bucket other than 'data'.
Definition at line 163 of file hash-table.inline.h.
References buckets_, impala::HashTable::Bucket::filled, impala::HashTable::Bucket::hasDuplicates, impala::hash, impala::HashTable::Bucket::hash, impala::HashTable::Bucket::matched, num_buckets_, and num_filled_buckets_.
Referenced by InsertInternal().
string HashTable::PrintStats | ( | ) | const |
Update and print some statistics that can be used for performance debugging.
Definition at line 424 of file hash-table.cc.
References num_buckets_, num_buckets_with_duplicates_, num_duplicate_nodes_, num_failed_probes_, num_filled_buckets_, num_hash_collisions_, num_probes_, num_resizes_, and travel_length_.
Referenced by Close().
|
inlineprivate |
There are wrappers of this function that perform the Find and Insert logic.
Performs the probing operation according to the probing algorithm (linear or quadratic. Returns one of the following: (a) the index of the bucket that contains the entry that matches with the last row evaluated in 'ht_ctx'. If 'ht_ctx' is NULL then it does not check for row equality and returns the index of the first empty bucket. (b) the index of the first empty bucket according to the probing algorithm (linear or quadratic), if the entry is not in the hash table or 'ht_ctx' is NULL. (c) Iterator::BUCKET_NOT_FOUND if the probe was not successful, i.e. the maximum distance was traveled without finding either an empty or a matching bucket. Using the returned index value, the caller can create an iterator that can be iterated until End() to find all the matching rows. EvalAndHashBuild() or EvalAndHashProb(e) must have been called before calling this. 'hash' must be the hash returned by these functions. 'found' indicates that a bucket that contains an equal row is found.
Definition at line 37 of file hash-table.inline.h.
References impala::HashTable::Iterator::BUCKET_NOT_FOUND, impala::HashTableCtx::Equals(), impala::HashTable::Bucket::filled, GetRow(), impala::hash, impala::HashTable::Bucket::hash, LIKELY, num_filled_buckets_, num_hash_collisions_, quadratic_probing_, impala::HashTableCtx::row_, and travel_length_.
Referenced by Find(), InsertInternal(), and ResizeBuckets().
|
private |
Resize the hash table to 'num_buckets'. Returns false on OOM.
Definition at line 293 of file hash-table.cc.
References impala::HashTable::Iterator::AtEnd(), Begin(), impala::RuntimeState::block_mgr(), block_mgr_client_, impala::HashTable::Iterator::BUCKET_NOT_FOUND, buckets_, impala::BufferedBlockMgr::ConsumeMemory(), impala::HashTable::Bucket::hash, max_num_buckets_, NextFilledBucket(), num_buckets(), num_buckets_, num_filled_buckets_, num_resizes_, Probe(), impala::BufferedBlockMgr::ReleaseMemory(), and state_.
Referenced by CheckAndResize(), and impala::HashTableTest::ResizeTable().
|
inline |
Returns number of elements inserted in the hash table.
Definition at line 380 of file hash-table.h.
References num_buckets_with_duplicates_, num_duplicate_nodes_, and num_filled_buckets_.
Referenced by impala::HashTableTest::BasicTest(), impala::HashTableTest::GrowTableTest(), and impala::PartitionedHashJoinNode::NodeDebugString().
|
friend |
Definition at line 518 of file hash-table.h.
|
friend |
Definition at line 517 of file hash-table.h.
|
private |
Client to allocate data pages with.
Definition at line 588 of file hash-table.h.
Referenced by Close(), GrowNodeArray(), Init(), and ResizeBuckets().
|
private |
Array of all buckets. Owned by this node. Using c-style array to control control memory footprint.
Definition at line 626 of file hash-table.h.
Referenced by Close(), DebugString(), Find(), FirstUnmatched(), impala::HashTable::Iterator::GetRow(), Init(), InsertDuplicateNode(), InsertInternal(), NextFilledBucket(), PrepareBucketForInsert(), and ResizeBuckets().
|
private |
Only used for tests to allocate data pages instead of the block mgr.
Definition at line 596 of file hash-table.h.
Referenced by GrowNodeArray().
|
private |
Data pages for all nodes. These are always pinned.
Definition at line 608 of file hash-table.h.
Referenced by Close(), and GrowNodeArray().
|
private |
Flag used to disable spilling hash tables that already had matches in case of right joins (IMPALA-1488). TODO: Not fail when spilling hash tables with matches in right joins
Definition at line 645 of file hash-table.h.
Referenced by HasMatches().
|
staticprivate |
Load factor that will trigger growing the hash table on insert. This is defined as the number of non-empty buckets / total_buckets
Definition at line 583 of file hash-table.h.
Referenced by CheckAndResize().
|
private |
Definition at line 622 of file hash-table.h.
Referenced by ResizeBuckets().
|
private |
Next duplicate node to insert. Vaild when node_remaining_current_page_ > 0.
Definition at line 614 of file hash-table.h.
Referenced by AppendNextNode(), GrowNodeArray(), and InsertDuplicateNode().
|
private |
Number of nodes left in the current page.
Definition at line 617 of file hash-table.h.
Referenced by AppendNextNode(), and GrowNodeArray().
|
private |
Total number of buckets (filled and empty).
Definition at line 629 of file hash-table.h.
Referenced by CheckAndResize(), Close(), CurrentMemSize(), DebugString(), EmptyBuckets(), Find(), Init(), InsertDuplicateNode(), InsertInternal(), load_factor(), NextFilledBucket(), num_buckets(), PrepareBucketForInsert(), PrintStats(), and ResizeBuckets().
|
private |
Number of (non-empty) buckets with duplicates. These buckets do not point to slots in the tuple stream, rather than to a linked list of Nodes.
Definition at line 636 of file hash-table.h.
Referenced by InsertDuplicateNode(), PrintStats(), and size().
|
private |
Number of build tuples, used for constructing temp row* for probes. TODO: We should remove it.
Definition at line 640 of file hash-table.h.
Referenced by DebugStringTuple().
|
private |
Number of duplicate nodes.
Definition at line 620 of file hash-table.h.
Referenced by AppendNextNode(), CurrentMemSize(), PrintStats(), and size().
|
private |
Number of probes that failed and had to fall back to linear probing without cap.
Definition at line 653 of file hash-table.h.
Referenced by PrintStats().
|
private |
Number of non-empty buckets. Used to determine when to resize.
Definition at line 632 of file hash-table.h.
Referenced by CheckAndResize(), EmptyBuckets(), load_factor(), PrepareBucketForInsert(), PrintStats(), Probe(), ResizeBuckets(), and size().
|
private |
The number of cases where we had to compare buckets with the same hash value, but the row equality failed.
Definition at line 661 of file hash-table.h.
Referenced by PrintStats(), and Probe().
|
private |
The stats below can be used for debugging perf. TODO: Should we make these statistics atomic? Number of calls to either Find, Insert or FindOrInsert an entry.
Definition at line 650 of file hash-table.h.
Referenced by Close(), Find(), InsertInternal(), and PrintStats().
|
private |
How many times this table has resized so far.
Definition at line 664 of file hash-table.h.
Referenced by PrintStats(), and ResizeBuckets().
|
private |
Quadratic probing enabled (as opposed to linear).
Definition at line 605 of file hash-table.h.
Referenced by Probe().
|
private |
Definition at line 585 of file hash-table.h.
Referenced by Close(), GrowNodeArray(), Init(), and ResizeBuckets().
|
private |
Constants on how the hash table should behave. Joins and aggs have slightly different behavior. TODO: these constants are an ideal candidate to be removed with codegen. TODO: ..or with template-ization
Definition at line 602 of file hash-table.h.
Referenced by DebugStringTuple(), GetRow(), HashTable(), and Insert().
|
private |
Byte size of all buffers in data_pages_.
Definition at line 611 of file hash-table.h.
Referenced by byte_size(), Close(), and GrowNodeArray().
|
private |
Total distance traveled for each probe. That is the sum of the diff between the end position of a probe (find/insert) and its start position (hash & (num_buckets_ - 1)).
Definition at line 657 of file hash-table.h.
Referenced by PrintStats(), and Probe().
|
private |
Stream contains the rows referenced by the hash table. Can be NULL if the row only contains a single tuple, in which case the TupleRow indirection is removed by the hash table.
Definition at line 593 of file hash-table.h.
Referenced by GetRow().