Impala
Impala is the open source, native analytic database for Apache Hadoop.
#include <scanner-context.h>
Classes | |
class | Stream |
Public Member Functions | |
ScannerContext (RuntimeState *, HdfsScanNode *, HdfsPartitionDescriptor *, DiskIoMgr::ScanRange *scan_range) | |
Stream * | GetStream (int idx=0) |
void | ReleaseCompletedResources (RowBatch *batch, bool done) |
Stream * | AddStream (DiskIoMgr::ScanRange *range) |
bool | cancelled () const |
If true, the ScanNode has been cancelled and the scanner thread should finish up. | |
int | num_completed_io_buffers () const |
HdfsPartitionDescriptor * | partition_descriptor () |
Private Attributes | |
RuntimeState * | state_ |
HdfsScanNode * | scan_node_ |
HdfsPartitionDescriptor * | partition_desc_ |
std::vector< Stream * > | streams_ |
Vector of streams. Non-columnar formats will always have one stream per context. | |
int | num_completed_io_buffers_ |
Always equal to the sum of completed_io_buffers_.size() across all streams. | |
Friends | |
class | Stream |
This class abstracts over getting buffers from the IoMgr. Each ScannerContext is 1:1 with an HdfsScanner. ScannerContexts contain Streams, which are 1:1 with a ScanRange. Columnar formats have multiple streams per context object. This class handles stitching data split across IO buffers and provides some basic parsing utilities. This class is not thread safe. It is designed to have a single scanner thread reading from it. Each scanner context maps to a single HDFS split. There are three threads that are interacting with the context.
Definition at line 55 of file scanner-context.h.
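To make "stitching data split across IO buffers" concrete, here is a minimal sketch. It is not Impala's Stream implementation: StitchingStream, AddBuffer, ReadBytes and BytesLeft are hypothetical names, and buffers are modeled as plain strings rather than DiskIoMgr buffers. It only illustrates how a read that straddles a buffer boundary can be assembled while counting the buffers that have been fully consumed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <utility>

// Hypothetical toy model (not Impala's Stream): serves bytes from a queue of
// "I/O buffers" and stitches reads that cross a buffer boundary.
class StitchingStream {
 public:
  void AddBuffer(std::string buf) { buffers_.push_back(std::move(buf)); }

  // Bytes not yet returned to the caller.
  std::size_t BytesLeft() const {
    std::size_t total = 0;
    for (const std::string& b : buffers_) total += b.size();
    return total - pos_;
  }

  // Appends exactly 'n' bytes to 'out'. Returns false, leaving the stream
  // untouched, if fewer than 'n' bytes remain.
  bool ReadBytes(std::size_t n, std::string* out) {
    assert(out != nullptr);
    if (n > BytesLeft()) return false;
    while (n > 0) {
      const std::string& front = buffers_.front();
      const std::size_t take = std::min(n, front.size() - pos_);
      out->append(front, pos_, take);
      pos_ += take;
      n -= take;
      if (pos_ == front.size()) {  // buffer fully consumed: mark it completed
        buffers_.pop_front();
        pos_ = 0;
        ++num_completed_;
      }
    }
    return true;
  }

  int num_completed() const { return num_completed_; }

 private:
  std::deque<std::string> buffers_;
  std::size_t pos_ = 0;    // read offset inside buffers_.front()
  int num_completed_ = 0;  // buffers that have been read to the end
};
```

With buffers "abc" and "defg" queued, a single ReadBytes(5, &out) call yields "abcde" and leaves one completed buffer behind; the caller never sees the boundary.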
ScannerContext::ScannerContext | ( | RuntimeState * | state, |
HdfsScanNode * | scan_node, | ||
HdfsPartitionDescriptor * | partition_desc, | ||
DiskIoMgr::ScanRange * | scan_range | ||
) |
Create a scanner context with the parent scan_node (where materialized row batches get pushed to) and the scan range to process. This context starts with 1 stream.
Definition at line 36 of file scanner-context.cc.
References AddStream().
ScannerContext::Stream * ScannerContext::AddStream | ( | DiskIoMgr::ScanRange * | range | ) |
Add a stream to this ScannerContext for 'range'. Returns the added stream. The stream is created in the runtime state's object pool.
Definition at line 58 of file scanner-context.cc.
References impala::ObjectPool::Add(), impala::ScannerContext::Stream::boundary_buffer_bytes_left_, impala::ScannerContext::Stream::contains_tuple_data_, impala::ScannerContext::Stream::file_desc_, impala::ScannerContext::Stream::file_len_, impala::HdfsFileDesc::file_length, impala::ScannerContext::Stream::filename(), impala::HdfsScanNode::GetFileDesc(), impala::ScannerContext::Stream::io_buffer_, impala::ScannerContext::Stream::io_buffer_bytes_left_, impala::ScannerContext::Stream::io_buffer_pos_, impala::RuntimeState::obj_pool(), impala::ScannerContext::Stream::output_buffer_bytes_left_, OUTPUT_BUFFER_BYTES_LEFT_INIT, impala::ScannerContext::Stream::output_buffer_pos_, scan_node_, impala::ScannerContext::Stream::scan_range_, state_, Stream, streams_, impala::TupleDescriptor::string_slots(), impala::ScannerContext::Stream::total_bytes_returned_, and impala::HdfsScanNode::tuple_desc().
Referenced by impala::HdfsParquetScanner::InitColumns(), and ScannerContext().
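The ownership pattern described for the constructor and AddStream() can be sketched as follows. This is an illustration under stated assumptions, not the code in scanner-context.cc: SketchRange, PooledStream, SketchPool and SketchContext are hypothetical stand-ins for DiskIoMgr::ScanRange, Stream, ObjectPool and ScannerContext. The pool owns every stream, the context keeps non-owning pointers, and the constructor calls AddStream() once so the context starts with one stream.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a scan range: a byte span within a file.
struct SketchRange {
  long offset;
  long len;
};

// Hypothetical stand-in for a stream; it only remembers its range.
struct PooledStream {
  const SketchRange* range;
};

// Hypothetical pool: owns every object added to it and frees them all when
// the pool is destroyed.
class SketchPool {
 public:
  PooledStream* Add(std::unique_ptr<PooledStream> s) {
    owned_.push_back(std::move(s));
    return owned_.back().get();
  }
  std::size_t size() const { return owned_.size(); }

 private:
  std::vector<std::unique_ptr<PooledStream>> owned_;
};

class SketchContext {
 public:
  // The context starts with exactly one stream, for the initial range.
  SketchContext(SketchPool* pool, const SketchRange* first) : pool_(pool) {
    AddStream(first);
  }

  // Creates a stream in the pool and returns a non-owning pointer to it.
  PooledStream* AddStream(const SketchRange* range) {
    PooledStream* s =
        pool_->Add(std::unique_ptr<PooledStream>(new PooledStream{range}));
    streams_.push_back(s);
    return s;
  }

  PooledStream* GetStream(std::size_t idx = 0) {
    assert(idx < streams_.size());
    return streams_[idx];
  }

  std::size_t num_streams() const { return streams_.size(); }

 private:
  SketchPool* pool_;                    // not owned
  std::vector<PooledStream*> streams_;  // non-owning; the pool owns the streams
};
```

A columnar scanner in this model would call AddStream() once per additional column range, while a non-columnar scanner would only ever use GetStream() with the default index.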
bool ScannerContext::cancelled | ( | ) | const |
If true, the ScanNode has been cancelled and the scanner thread should finish up.
Definition at line 282 of file scanner-context.cc.
References impala::HdfsScanNode::done_, and scan_node_.
Referenced by impala::HdfsParquetScanner::AssembleRows(), and impala::HdfsScanner::CommitRows().
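One way a scanner thread can "finish up" is to poll cancelled() between units of work. The sketch below assumes a simplified model: SketchScanNode, CancellableContext and ScanRows are hypothetical names, the done flag is modeled as a std::atomic<bool>, and rows are plain integers. It is not how HdfsScanner::CommitRows() or HdfsParquetScanner::AssembleRows() are written.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical stand-in for the scan node; only the 'done' flag is modeled.
struct SketchScanNode {
  std::atomic<bool> done{false};
};

class CancellableContext {
 public:
  explicit CancellableContext(const SketchScanNode* node) : scan_node_(node) {
    assert(node != nullptr);
  }

  // True once the scan node has been marked done.
  bool cancelled() const { return scan_node_->done.load(); }

 private:
  const SketchScanNode* scan_node_;  // not owned
};

// Hands rows to 'on_row' one at a time and stops early once the context
// reports cancellation. Returns the number of rows handed to 'on_row'.
inline int ScanRows(const CancellableContext& ctx, const std::vector<int>& rows,
                    const std::function<void(int)>& on_row) {
  int processed = 0;
  for (int row : rows) {
    if (ctx.cancelled()) break;  // check before starting each unit of work
    on_row(row);
    ++processed;
  }
  return processed;
}
```

Checking the flag once per row keeps the example short; a real scanner would more likely check once per batch to keep the cost of the check negligible.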
Stream * ScannerContext::GetStream | ( | int | idx = 0 | ) | inline |
Definition at line 246 of file scanner-context.h.
References gen_ir_descriptions::idx, and streams_.
Referenced by impala::HdfsScanNode::CreateAndPrepareScanner(), impala::HdfsScanner::Prepare(), and impala::HdfsScanNode::ScannerThread().
int ScannerContext::num_completed_io_buffers | ( | ) | const | inline |
Definition at line 277 of file scanner-context.h.
References num_completed_io_buffers_.
Referenced by impala::HdfsScanner::CommitRows().
HdfsPartitionDescriptor * ScannerContext::partition_descriptor | ( | ) | inline |
Definition at line 278 of file scanner-context.h.
References partition_desc_.
Referenced by impala::HdfsTextScanner::InitNewRange(), impala::HdfsSequenceScanner::InitNewRange(), impala::HdfsScanner::Prepare(), and impala::HdfsTextScanner::ResetScanner().
void ScannerContext::ReleaseCompletedResources | ( | RowBatch * | batch, |
bool | done | ||
) |
If a non-NULL 'batch' is passed, attaches completed io buffers and boundary mem pools from all streams to 'batch'. Attaching only completed resources ensures that buffers (and their cleanup) trail the rows that reference them (row batches are consumed and cleaned up in order by the rest of the query).
If a NULL 'batch' is passed, this tries to release whatever resources can be released, i.e. completed io buffers if 'done' is not set, and the mem pool if 'done' is set. In that case, contains_tuple_data_ should be false.
If 'done' is true, this is the final call for the current streams: any pending resources in each stream are also passed to the row batch, and the streams are cleared from this context. This must be called with 'done' set when the scanner is complete and no longer needs any resources (e.g. tuple memory, io buffers) returned from the current streams. After calling with 'done' set, this should be called again if new streams are created via AddStream().
Definition at line 45 of file scanner-context.cc.
References streams_.
Referenced by impala::HdfsScanner::AddFinalRowBatch(), impala::HdfsScanner::CommitRows(), impala::HdfsTextScanner::FillByteBufferCompressedFile(), impala::HdfsTextScanner::FillByteBufferGzip(), and impala::HdfsParquetScanner::ProcessSplit().
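The attach-or-free behaviour of ReleaseCompletedResources() can be illustrated with a simplified model. Everything here is an assumption made for illustration: SketchBatch, ResourceStream and ResourceContext are hypothetical types, buffers are plain strings, boundary mem pools are not modeled, and a NULL batch simply drops the buffers rather than returning them to an I/O manager.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a row batch that takes ownership of buffers.
struct SketchBatch {
  std::vector<std::string> attached;
};

// Hypothetical stand-in for a stream's buffer state.
struct ResourceStream {
  std::vector<std::string> completed;  // buffers that were read to the end
  std::string current;                 // buffer still in use; empty if none
};

class ResourceContext {
 public:
  std::vector<ResourceStream> streams;

  // Sum of completed buffers across all streams.
  std::size_t num_completed() const {
    std::size_t n = 0;
    for (const ResourceStream& s : streams) n += s.completed.size();
    return n;
  }

  void Release(SketchBatch* batch, bool done) {
    for (ResourceStream& s : streams) {
      // On the final call the in-use buffer is released as well.
      if (done && !s.current.empty()) {
        s.completed.push_back(std::move(s.current));
        s.current.clear();
      }
      if (batch != nullptr) {
        // Ownership moves to the batch, so buffers outlive the rows using them.
        for (std::string& b : s.completed) {
          batch->attached.push_back(std::move(b));
        }
      }
      // Simplification: with a null batch the buffers are simply freed here.
      s.completed.clear();
    }
    if (done) streams.clear();
    assert(!done || streams.empty());
  }
};
```

Calling Release(&batch, false) after each row batch hands over only the buffers that are no longer being read, while the final Release(&batch, true) also hands over the in-use buffer and empties the stream list, which is why new streams need to be added before the context is used again.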
friend class Stream | friend |
Definition at line 281 of file scanner-context.h.
Referenced by AddStream().
int ScannerContext::num_completed_io_buffers_ | private |
Always equal to the sum of completed_io_buffers_.size() across all streams.
Definition at line 292 of file scanner-context.h.
Referenced by num_completed_io_buffers().
HdfsPartitionDescriptor * ScannerContext::partition_desc_ | private |
Definition at line 286 of file scanner-context.h.
Referenced by partition_descriptor().
HdfsScanNode * ScannerContext::scan_node_ | private |
Definition at line 284 of file scanner-context.h.
Referenced by AddStream(), and cancelled().
RuntimeState * ScannerContext::state_ | private |
Definition at line 283 of file scanner-context.h.
Referenced by AddStream().
std::vector< Stream * > ScannerContext::streams_ | private |
Vector of streams. Non-columnar formats will always have one stream per context.
Definition at line 289 of file scanner-context.h.
Referenced by AddStream(), GetStream(), and ReleaseCompletedResources().