Impala
Impala is the open source, native analytic database for Apache Hadoop.
impala::ScannerContext Class Reference

#include <scanner-context.h>

Classes

class  Stream
 

Public Member Functions

 ScannerContext (RuntimeState *, HdfsScanNode *, HdfsPartitionDescriptor *, DiskIoMgr::ScanRange *scan_range)
 
Stream * GetStream (int idx=0)
 
void ReleaseCompletedResources (RowBatch *batch, bool done)
 
Stream * AddStream (DiskIoMgr::ScanRange *range)
 
bool cancelled () const
 If true, the ScanNode has been cancelled and the scanner thread should finish up.
 
int num_completed_io_buffers () const
 
HdfsPartitionDescriptor * partition_descriptor ()
 

Private Attributes

RuntimeState * state_
 
HdfsScanNode * scan_node_
 
HdfsPartitionDescriptor * partition_desc_
 
std::vector< Stream * > streams_
 Vector of streams. Non-columnar formats will always have one stream per context.
 
int num_completed_io_buffers_
 Always equal to the sum of completed_io_buffers_.size() across all streams.
 

Friends

class Stream
 

Detailed Description

This class abstracts over getting buffers from the IoMgr. Each ScannerContext is 1:1 with an HdfsScanner. ScannerContexts contain Streams, which are 1:1 with a ScanRange. Columnar formats have multiple streams per context object. This class handles stitching data split across IO buffers and provides some basic parsing utilities.

This class is not thread safe. It is designed to have a single scanner thread reading from it. Each scanner context maps to a single HDFS split. Three threads interact with the context:

  1. IoMgr threads that read io buffers from the disk and enqueue them to the stream's underlying ScanRange object. This is the producer.
  2. Scanner thread that calls GetBytes() (which can block), materializing tuples from processing the bytes. This is the consumer.
  3. The scan node/main thread which calls into the context to trigger cancellation or other end of stream conditions.
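The three roles listed above follow a producer/consumer pattern with an external cancellation signal. The block below is a minimal sketch of that pattern using only the C++ standard library. `ToyStream`, `Enqueue`, `MarkEos`, `Cancel`, `GetNext`, and `RunToyScan` are hypothetical names made up for illustration; they are not part of Impala, and the real ScanRange and GetBytes() logic is more involved.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <string>
#include <thread>
#include <utility>

// Hypothetical stand-in for a stream's buffer queue; not Impala's ScanRange.
class ToyStream {
 public:
  // Producer side (role of an IoMgr thread): enqueue one buffer.
  void Enqueue(std::string buffer) {
    {
      std::lock_guard<std::mutex> l(lock_);
      buffers_.push_back(std::move(buffer));
    }
    cv_.notify_one();
  }

  // Producer side: signal that no more buffers will arrive.
  void MarkEos() {
    {
      std::lock_guard<std::mutex> l(lock_);
      eos_ = true;
    }
    cv_.notify_all();
  }

  // Main-thread side: request cancellation and wake a blocked consumer.
  void Cancel() {
    {
      std::lock_guard<std::mutex> l(lock_);
      cancelled_ = true;
    }
    cv_.notify_all();
  }

  // Consumer side (role of the scanner thread): blocks until a buffer is
  // available, the stream has ended, or the scan was cancelled. Returns
  // false when the caller should stop reading.
  bool GetNext(std::string* out) {
    std::unique_lock<std::mutex> l(lock_);
    cv_.wait(l, [this] { return cancelled_ || eos_ || !buffers_.empty(); });
    if (cancelled_ || buffers_.empty()) return false;
    *out = std::move(buffers_.front());
    buffers_.pop_front();
    return true;
  }

 private:
  std::mutex lock_;
  std::condition_variable cv_;
  std::deque<std::string> buffers_;
  bool eos_ = false;
  bool cancelled_ = false;
};

// Runs one producer thread and one consumer thread; the caller plays the
// role of the main thread and could call stream->Cancel() concurrently.
// Returns the number of bytes the consumer saw.
size_t RunToyScan(ToyStream* stream, int num_buffers) {
  size_t consumed = 0;
  std::thread producer([&] {
    for (int i = 0; i < num_buffers; ++i) stream->Enqueue("abcd");
    stream->MarkEos();
  });
  std::thread consumer([&] {
    std::string buf;
    while (stream->GetNext(&buf)) consumed += buf.size();
  });
  producer.join();
  consumer.join();
  return consumed;
}
```

In this sketch only the consumer ever dequeues, which mirrors the statement that a single scanner thread reads from the context; the lock protects the hand-off between threads, not concurrent readers.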

Definition at line 55 of file scanner-context.h.
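The description above mentions stitching data that is split across IO buffers. As a rough illustration of what that can involve, the sketch below reads a fixed number of bytes that may span several queued buffers. `StitchingReader`, `AddBuffer`, `ReadBytes`, and `Available` are hypothetical names, and copying into a `std::string` is an assumption made for simplicity; this is not how Impala's Stream is implemented.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>

// Hypothetical byte source fed by separately delivered I/O buffers.
class StitchingReader {
 public:
  void AddBuffer(const std::string& buf) { buffers_.push_back(buf); }

  // Number of unread bytes across all queued buffers.
  size_t Available() const {
    size_t total = 0;
    for (const std::string& b : buffers_) total += b.size();
    return total - offset_;
  }

  // Reads exactly `len` bytes into `out`, crossing buffer boundaries when
  // needed. Returns false and consumes nothing if fewer than `len` bytes
  // are currently available.
  bool ReadBytes(size_t len, std::string* out) {
    if (Available() < len) return false;
    out->clear();
    while (out->size() < len) {
      const std::string& front = buffers_.front();
      size_t take = std::min(len - out->size(), front.size() - offset_);
      out->append(front, offset_, take);
      offset_ += take;
      if (offset_ == front.size()) {  // Current buffer is fully consumed.
        buffers_.pop_front();
        offset_ = 0;
      }
    }
    return true;
  }

 private:
  std::deque<std::string> buffers_;
  size_t offset_ = 0;  // Read position inside buffers_.front().
};
```

A caller that adds the buffers "hel", "lo wor", and "ld" can read "hello" with `ReadBytes(5, &out)` even though those five bytes arrive in two separate buffers.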

Constructor & Destructor Documentation

ScannerContext::ScannerContext ( RuntimeState * state,
HdfsScanNode * scan_node,
HdfsPartitionDescriptor * partition_desc,
DiskIoMgr::ScanRange * scan_range 
)

Create a scanner context with the parent scan_node (where materialized row batches get pushed to) and the scan range to process. This context starts with 1 stream.

Definition at line 36 of file scanner-context.cc.

References AddStream().

Member Function Documentation

bool ScannerContext::cancelled ( ) const

If true, the ScanNode has been cancelled and the scanner thread should finish up.

Definition at line 282 of file scanner-context.cc.

References impala::HdfsScanNode::done_, and scan_node_.

Referenced by impala::HdfsParquetScanner::AssembleRows(), and impala::HdfsScanner::CommitRows().
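Since cancelled() references the scan node's done flag, a scanner loop can poll it between units of work and stop early. The following is a minimal sketch of that polling pattern. `FakeScanNode`, `FakeContext`, and `ProcessUntilCancelled` are hypothetical stand-ins, and the use of `std::atomic<bool>` for the flag is an assumption of this sketch, not a statement about how HdfsScanNode stores done_.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for HdfsScanNode; only the done flag is modeled.
struct FakeScanNode {
  std::atomic<bool> done{false};
};

// Hypothetical stand-in for ScannerContext.
class FakeContext {
 public:
  explicit FakeContext(FakeScanNode* node) : scan_node_(node) {}

  // Mirrors the documented behavior: true once the scan node is done.
  bool cancelled() const { return scan_node_->done.load(); }

 private:
  FakeScanNode* scan_node_;
};

// Scanner-side loop: checks cancelled() before each unit of work and stops
// early once the flag is set. Returns how many units were processed.
template <typename Work>
int ProcessUntilCancelled(const FakeContext& ctx, int num_units, Work work) {
  int processed = 0;
  for (int i = 0; i < num_units && !ctx.cancelled(); ++i) {
    work(i);
    ++processed;
  }
  return processed;
}
```

The unit of work that is already running when the flag flips still completes; the loop only notices the cancellation at the next check.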

Stream* impala::ScannerContext::GetStream ( int  idx = 0)
inline

int impala::ScannerContext::num_completed_io_buffers ( ) const
inline

Definition at line 277 of file scanner-context.h.

References num_completed_io_buffers_.

Referenced by impala::HdfsScanner::CommitRows().

void ScannerContext::ReleaseCompletedResources ( RowBatch * batch,
bool  done 
)

If a non-NULL 'batch' is passed, attaches completed io buffers and boundary mem pools from all streams to 'batch'. Attaching only completed resources ensures that buffers (and their cleanup) trail the rows that reference them (row batches are consumed and cleaned up in order by the rest of the query).

If a NULL 'batch' is passed, then it tries to release whatever resources can be released, i.e. completed io buffers if 'done' is not set, and the mem pool if 'done' is set. In that case, contains_tuple_data_ should be false.

If 'done' is true, this is the final call for the current streams: any pending resources in each stream are also passed to the row batch, and the streams are cleared from this context. This must be called with 'done' set when the scanner is complete and no longer needs any resources (e.g. tuple memory, io buffers) returned from the current streams. After calling with 'done' set, this should be called again if new streams are created via AddStream().
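The ownership hand-off described here can be sketched with move-only buffers. The block below is a simplified model under stated assumptions: `SketchBatch`, `SketchStream`, `SketchContext`, and `Release` are hypothetical names, boundary mem pools and contains_tuple_data_ are left out, and a NULL batch is modeled as simply freeing the buffers. It is not Impala's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

using Buffer = std::unique_ptr<std::string>;  // Toy stand-in for an I/O buffer.

// Toy stand-in for RowBatch: owns whatever buffers are attached to it.
struct SketchBatch {
  std::vector<Buffer> attached;
};

struct SketchStream {
  std::vector<Buffer> completed;  // Buffers the scanner has finished reading.
  Buffer current;                 // Buffer still in use (a pending resource).
};

class SketchContext {
 public:
  SketchStream* AddStream() {
    streams_.push_back(std::unique_ptr<SketchStream>(new SketchStream()));
    return streams_.back().get();
  }

  // Hands completed buffers to `batch` (or frees them if `batch` is null).
  // When `done` is true, pending buffers are handed over too and all
  // streams are dropped from the context.
  void Release(SketchBatch* batch, bool done) {
    for (auto& s : streams_) {
      if (done && s->current) s->completed.push_back(std::move(s->current));
      if (batch != nullptr) {
        for (Buffer& b : s->completed) batch->attached.push_back(std::move(b));
      }
      s->completed.clear();  // Frees any buffers that were not attached.
    }
    if (done) streams_.clear();
  }

  size_t num_streams() const { return streams_.size(); }

 private:
  std::vector<std::unique_ptr<SketchStream>> streams_;
};
```

Because the batch takes ownership of the buffers, they stay alive at least as long as the rows that point into them, which is the ordering guarantee the description relies on. Any `SketchStream*` obtained before a call with `done` set is invalid afterwards.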

Definition at line 45 of file scanner-context.cc.

References streams_.

Referenced by impala::HdfsScanner::AddFinalRowBatch(), impala::HdfsScanner::CommitRows(), impala::HdfsTextScanner::FillByteBufferCompressedFile(), impala::HdfsTextScanner::FillByteBufferGzip(), and impala::HdfsParquetScanner::ProcessSplit().

Friends And Related Function Documentation

friend class Stream
friend

Definition at line 281 of file scanner-context.h.

Referenced by AddStream().

Member Data Documentation

int impala::ScannerContext::num_completed_io_buffers_
private

Always equal to the sum of completed_io_buffers_.size() across all streams.

Definition at line 292 of file scanner-context.h.

Referenced by num_completed_io_buffers().

HdfsPartitionDescriptor* impala::ScannerContext::partition_desc_
private

Definition at line 286 of file scanner-context.h.

Referenced by partition_descriptor().

HdfsScanNode* impala::ScannerContext::scan_node_
private

Definition at line 284 of file scanner-context.h.

Referenced by AddStream(), and cancelled().

RuntimeState* impala::ScannerContext::state_
private

Definition at line 283 of file scanner-context.h.

Referenced by AddStream().

std::vector<Stream*> impala::ScannerContext::streams_
private

Vector of streams. Non-columnar formats will always have one stream per context.

Definition at line 289 of file scanner-context.h.

Referenced by AddStream(), GetStream(), and ReleaseCompletedResources().


The documentation for this class was generated from the following files: