Impala
Impalaistheopensource,nativeanalyticdatabaseforApacheHadoop.
|
#include <sorted-run-merger.h>
Classes | |
class | BatchedRowSupplier |
Public Types | |
typedef boost::function < Status(RowBatch **)> | RunBatchSupplier |
Public Member Functions | |
SortedRunMerger (const TupleRowComparator &compare_less_than, RowDescriptor *row_desc, RuntimeProfile *profile, bool deep_copy_input) | |
Status | Prepare (const std::vector< RunBatchSupplier > &input_runs) |
Status | GetNext (RowBatch *output_batch, bool *eos) |
Return the next batch of sorted rows from this merger. More... | |
void | TransferAllResources (RowBatch *transfer_resource_batch) |
Private Member Functions | |
void | Heapify (int parent_index) |
Private Attributes | |
std::vector< BatchedRowSupplier * > | min_heap_ |
TupleRowComparator | compare_less_than_ |
Row comparator. Returns true if lhs < rhs. More... | |
RowDescriptor * | input_row_desc_ |
bool | deep_copy_input_ |
True if rows must be deep copied into the output batch. More... | |
ObjectPool | pool_ |
Pool of BatchedRowSupplier instances. More... | |
RuntimeProfile::Counter * | get_next_timer_ |
Times calls to GetNext(). More... | |
RuntimeProfile::Counter * | get_next_batch_timer_ |
Times calls to get the next batch of rows from the input run. More... | |
SortedRunMerger is used to merge multiple sorted runs of tuples. A run is a sorted sequence of row batches, which are fetched from a RunBatchSupplier function object. Merging is implemented using a binary min-heap that maintains the run with the next tuple in sorted order at the top of the heap. Merged batches of rows are retrieved from SortedRunMerger via calls to GetNext(). The merger is constructed with a boolean flag deep_copy_input. If true, sorted output rows are deep copied into the data pool of the output batch. If false, GetNext() only copies tuple pointers (TupleRows) into the output batch, and transfers resource ownership from the input batches to the output batch when an input batch is processed.
Definition at line 42 of file sorted-run-merger.h.
typedef boost::function<Status (RowBatch**)> impala::SortedRunMerger::RunBatchSupplier |
Function that returns the next batch of rows from an input sorted run. The batch is owned by the supplier (i.e. not SortedRunMerger). eos is indicated by an NULL batch being returned.
Definition at line 47 of file sorted-run-merger.h.
impala::SortedRunMerger::SortedRunMerger | ( | const TupleRowComparator & | compare_less_than, |
RowDescriptor * | row_desc, | ||
RuntimeProfile * | profile, | ||
bool | deep_copy_input | ||
) |
Definition at line 118 of file sorted-run-merger.cc.
References ADD_TIMER, get_next_batch_timer_, and get_next_timer_.
Return the next batch of sorted rows from this merger.
Definition at line 146 of file sorted-run-merger.cc.
References impala::RowBatch::AddRow(), impala::RowBatch::AtCapacity(), impala::RowBatch::CommitLastRow(), impala::SortedRunMerger::BatchedRowSupplier::current_row(), deep_copy_input_, impala::TupleRow::DeepCopy(), get_next_timer_, impala::RowBatch::GetRow(), Heapify(), input_row_desc_, min_heap_, impala::SortedRunMerger::BatchedRowSupplier::Next(), impala::Status::OK, RETURN_IF_ERROR, impala::RowBatch::tuple_data_pool(), and impala::RowDescriptor::tuple_descriptors().
|
private |
Assuming the element at parent_index is the only out of place element in the heap, restore the heap property (i.e. swap elements so parent <= children).
Definition at line 95 of file sorted-run-merger.cc.
References compare_less_than_, and min_heap_.
Status impala::SortedRunMerger::Prepare | ( | const std::vector< RunBatchSupplier > & | input_runs | ) |
Prepare this merger to merge and return rows from the sorted runs in 'input_runs' Retrieves the first batch from each run and sets up the binary heap implementing the priority queue.
Definition at line 127 of file sorted-run-merger.cc.
References impala::ObjectPool::Add(), Heapify(), impala::SortedRunMerger::BatchedRowSupplier::Init(), min_heap_, impala::Status::OK, pool_, and RETURN_IF_ERROR.
void impala::SortedRunMerger::TransferAllResources | ( | RowBatch * | transfer_resource_batch | ) |
Called to finalize a merge when deep_copy is false. Transfers resources from all input batches to the specified output batch.
|
private |
Row comparator. Returns true if lhs < rhs.
Definition at line 81 of file sorted-run-merger.h.
Referenced by Heapify().
|
private |
True if rows must be deep copied into the output batch.
Definition at line 88 of file sorted-run-merger.h.
Referenced by GetNext().
|
private |
Times calls to get the next batch of rows from the input run.
Definition at line 97 of file sorted-run-merger.h.
Referenced by impala::SortedRunMerger::BatchedRowSupplier::Next(), and SortedRunMerger().
|
private |
Times calls to GetNext().
Definition at line 94 of file sorted-run-merger.h.
Referenced by GetNext(), and SortedRunMerger().
|
private |
Descriptor for the rows provided by the input runs. Owned by the exec-node through which this merger was created.
Definition at line 85 of file sorted-run-merger.h.
Referenced by GetNext().
|
private |
The binary min-heap used to merge rows from the sorted input runs. Since the heap is stored in a 0-indexed array, the 0-th element is the minimum element in the heap, and the children of the element at index i are 2*i+1 and 2*i+2. The heap property is that row of the parent element is <= the rows of the child elements according to the comparator compare_less_than_. The BatchedRowSupplier objects used in the min_heap_ are owned by this SortedRunMerger instance.
Definition at line 78 of file sorted-run-merger.h.
|
private |
Pool of BatchedRowSupplier instances.
Definition at line 91 of file sorted-run-merger.h.
Referenced by Prepare().