Impala
Impalaistheopensource,nativeanalyticdatabaseforApacheHadoop.
 All Classes Namespaces Files Functions Variables Typedefs Enumerations Enumerator Friends Macros
impala::SortedRunMerger Class Reference

#include <sorted-run-merger.h>

Collaboration diagram for impala::SortedRunMerger:

Classes

class  BatchedRowSupplier
 

Public Types

typedef boost::function
< Status(RowBatch **)> 
RunBatchSupplier
 

Public Member Functions

 SortedRunMerger (const TupleRowComparator &compare_less_than, RowDescriptor *row_desc, RuntimeProfile *profile, bool deep_copy_input)
 
Status Prepare (const std::vector< RunBatchSupplier > &input_runs)
 
Status GetNext (RowBatch *output_batch, bool *eos)
 Return the next batch of sorted rows from this merger. More...
 
void TransferAllResources (RowBatch *transfer_resource_batch)
 

Private Member Functions

void Heapify (int parent_index)
 

Private Attributes

std::vector< BatchedRowSupplier * > min_heap_
 
TupleRowComparator compare_less_than_
 Row comparator. Returns true if lhs < rhs. More...
 
RowDescriptorinput_row_desc_
 
bool deep_copy_input_
 True if rows must be deep copied into the output batch. More...
 
ObjectPool pool_
 Pool of BatchedRowSupplier instances. More...
 
RuntimeProfile::Counterget_next_timer_
 Times calls to GetNext(). More...
 
RuntimeProfile::Counterget_next_batch_timer_
 Times calls to get the next batch of rows from the input run. More...
 

Detailed Description

SortedRunMerger is used to merge multiple sorted runs of tuples. A run is a sorted sequence of row batches, which are fetched from a RunBatchSupplier function object. Merging is implemented using a binary min-heap that maintains the run with the next tuple in sorted order at the top of the heap. Merged batches of rows are retrieved from SortedRunMerger via calls to GetNext(). The merger is constructed with a boolean flag deep_copy_input. If true, sorted output rows are deep copied into the data pool of the output batch. If false, GetNext() only copies tuple pointers (TupleRows) into the output batch, and transfers resource ownership from the input batches to the output batch when an input batch is processed.

Definition at line 42 of file sorted-run-merger.h.

Member Typedef Documentation

Function that returns the next batch of rows from an input sorted run. The batch is owned by the supplier (i.e. not SortedRunMerger). eos is indicated by an NULL batch being returned.

Definition at line 47 of file sorted-run-merger.h.

Constructor & Destructor Documentation

impala::SortedRunMerger::SortedRunMerger ( const TupleRowComparator compare_less_than,
RowDescriptor row_desc,
RuntimeProfile profile,
bool  deep_copy_input 
)

Definition at line 118 of file sorted-run-merger.cc.

References ADD_TIMER, get_next_batch_timer_, and get_next_timer_.

Member Function Documentation

void impala::SortedRunMerger::Heapify ( int  parent_index)
private

Assuming the element at parent_index is the only out of place element in the heap, restore the heap property (i.e. swap elements so parent <= children).

Definition at line 95 of file sorted-run-merger.cc.

References compare_less_than_, and min_heap_.

Referenced by GetNext(), and Prepare().

Status impala::SortedRunMerger::Prepare ( const std::vector< RunBatchSupplier > &  input_runs)

Prepare this merger to merge and return rows from the sorted runs in 'input_runs' Retrieves the first batch from each run and sets up the binary heap implementing the priority queue.

Definition at line 127 of file sorted-run-merger.cc.

References impala::ObjectPool::Add(), Heapify(), impala::SortedRunMerger::BatchedRowSupplier::Init(), min_heap_, impala::Status::OK, pool_, and RETURN_IF_ERROR.

void impala::SortedRunMerger::TransferAllResources ( RowBatch transfer_resource_batch)

Called to finalize a merge when deep_copy is false. Transfers resources from all input batches to the specified output batch.

Member Data Documentation

TupleRowComparator impala::SortedRunMerger::compare_less_than_
private

Row comparator. Returns true if lhs < rhs.

Definition at line 81 of file sorted-run-merger.h.

Referenced by Heapify().

bool impala::SortedRunMerger::deep_copy_input_
private

True if rows must be deep copied into the output batch.

Definition at line 88 of file sorted-run-merger.h.

Referenced by GetNext().

RuntimeProfile::Counter* impala::SortedRunMerger::get_next_batch_timer_
private

Times calls to get the next batch of rows from the input run.

Definition at line 97 of file sorted-run-merger.h.

Referenced by impala::SortedRunMerger::BatchedRowSupplier::Next(), and SortedRunMerger().

RuntimeProfile::Counter* impala::SortedRunMerger::get_next_timer_
private

Times calls to GetNext().

Definition at line 94 of file sorted-run-merger.h.

Referenced by GetNext(), and SortedRunMerger().

RowDescriptor* impala::SortedRunMerger::input_row_desc_
private

Descriptor for the rows provided by the input runs. Owned by the exec-node through which this merger was created.

Definition at line 85 of file sorted-run-merger.h.

Referenced by GetNext().

std::vector<BatchedRowSupplier*> impala::SortedRunMerger::min_heap_
private

The binary min-heap used to merge rows from the sorted input runs. Since the heap is stored in a 0-indexed array, the 0-th element is the minimum element in the heap, and the children of the element at index i are 2*i+1 and 2*i+2. The heap property is that row of the parent element is <= the rows of the child elements according to the comparator compare_less_than_. The BatchedRowSupplier objects used in the min_heap_ are owned by this SortedRunMerger instance.

Definition at line 78 of file sorted-run-merger.h.

Referenced by GetNext(), Heapify(), and Prepare().

ObjectPool impala::SortedRunMerger::pool_
private

Pool of BatchedRowSupplier instances.

Definition at line 91 of file sorted-run-merger.h.

Referenced by Prepare().


The documentation for this class was generated from the following files: