Impala
Impala is the open source, native analytic database for Apache Hadoop.
|
#include <hdfs-text-scanner.h>
Public Member Functions | |
HdfsTextScanner (HdfsScanNode *scan_node, RuntimeState *state) | |
virtual | ~HdfsTextScanner () |
virtual Status | Prepare (ScannerContext *context) |
Implementation of HdfsScanner interface. More... | |
virtual Status | ProcessSplit () |
virtual void | Close () |
Static Public Member Functions | |
static Status | IssueInitialRanges (HdfsScanNode *scan_node, const std::vector< HdfsFileDesc * > &files) |
Issue io manager byte ranges for 'files'. More... | |
static llvm::Function * | Codegen (HdfsScanNode *, const std::vector< ExprContext * > &conjunct_ctxs) |
Codegen writing tuples and evaluating predicates. More... | |
Static Public Attributes | |
static const std::string | LZO_INDEX_SUFFIX = ".index" |
Suffix for lzo index files. More... | |
static const char * | LLVM_CLASS_NAME = "class.impala::HdfsTextScanner" |
static const int | FILE_BLOCK_SIZE = 4096 |
Protected Types | |
typedef int(* | WriteTuplesFn )(HdfsScanner *, MemPool *, TupleRow *, int, FieldLocation *, int, int, int, int) |
Protected Member Functions | |
Status | ResetScanner () |
Status | InitializeWriteTuplesFn (HdfsPartitionDescriptor *partition, THdfsFileFormat::type type, const std::string &scanner_name) |
void | StartNewRowBatch () |
Set batch_ to a new row batch and update tuple_mem_ accordingly. More... | |
int | GetMemory (MemPool **pool, Tuple **tuple_mem, TupleRow **tuple_row_mem) |
Status | CommitRows (int num_rows) |
void | AddFinalRowBatch () |
void | AttachPool (MemPool *pool, bool commit_batch) |
bool IR_ALWAYS_INLINE | EvalConjuncts (TupleRow *row) |
int | WriteEmptyTuples (RowBatch *row_batch, int num_tuples) |
int | WriteEmptyTuples (ScannerContext *context, TupleRow *tuple_row, int num_tuples) |
Write empty tuples and commit them to the context object. More... | |
int | WriteAlignedTuples (MemPool *pool, TupleRow *tuple_row_mem, int row_size, FieldLocation *fields, int num_tuples, int max_added_tuples, int slots_per_tuple, int row_start_indx) |
Status | UpdateDecompressor (const THdfsCompression::type &compression) |
Status | UpdateDecompressor (const std::string &codec) |
bool | ReportTupleParseError (FieldLocation *fields, uint8_t *errors, int row_idx) |
bool | WriteCompleteTuple (MemPool *pool, FieldLocation *fields, Tuple *tuple, TupleRow *tuple_row, Tuple *template_tuple, uint8_t *error_fields, uint8_t *error_in_row) |
void | ReportColumnParseError (const SlotDescriptor *desc, const char *data, int len) |
void | InitTuple (Tuple *template_tuple, Tuple *tuple) |
Tuple * | next_tuple (Tuple *t) const |
TupleRow * | next_row (TupleRow *r) const |
ExprContext * | GetConjunctCtx (int idx) const |
Static Protected Member Functions | |
static llvm::Function * | CodegenWriteCompleteTuple (HdfsScanNode *, LlvmCodeGen *, const std::vector< ExprContext * > &conjunct_ctxs) |
static llvm::Function * | CodegenWriteAlignedTuples (HdfsScanNode *, LlvmCodeGen *, llvm::Function *write_tuple_fn) |
Protected Attributes | |
char * | byte_buffer_ptr_ |
Current position in byte buffer. More... | |
char * | byte_buffer_end_ |
Ending position of HDFS buffer. More... | |
int64_t | byte_buffer_read_size_ |
Actual bytes received from last file read. More... | |
bool | only_parsing_header_ |
True if we are parsing the header for this scanner. More... | |
HdfsScanNode * | scan_node_ |
The scan node that started this scanner. More... | |
RuntimeState * | state_ |
RuntimeState for error reporting. More... | |
ScannerContext * | context_ |
Context for this scanner. More... | |
ScannerContext::Stream * | stream_ |
The first stream for context_. More... | |
std::vector< ExprContext * > | conjunct_ctxs_ |
Tuple * | template_tuple_ |
int | tuple_byte_size_ |
Fixed size of each tuple, in bytes. More... | |
Tuple * | tuple_ |
Current tuple pointer into tuple_mem_. More... | |
RowBatch * | batch_ |
uint8_t * | tuple_mem_ |
The tuple memory of batch_. More... | |
int | num_errors_in_file_ |
Number of errors in the current file. More... | |
boost::scoped_ptr< TextConverter > | text_converter_ |
Helper class for converting text to other types. More... | |
int32_t | num_null_bytes_ |
Number of null bytes in the tuple. More... | |
Status | parse_status_ |
boost::scoped_ptr< Codec > | decompressor_ |
Decompressor class to use, if any. More... | |
THdfsCompression::type | decompression_type_ |
The most recently used decompression type. More... | |
boost::scoped_ptr< MemPool > | data_buffer_pool_ |
RuntimeProfile::Counter * | decompress_timer_ |
Time spent decompressing bytes. More... | |
WriteTuplesFn | write_tuples_fn_ |
Jitted write tuples function pointer. Null if codegen is disabled. More... | |
Private Member Functions | |
virtual Status | InitNewRange () |
Status | FindFirstTuple (bool *tuple_found) |
Status | ProcessRange (int *num_tuples, bool past_scan_range) |
Status | FinishScanRange () |
Reads past the end of the scan range for the next tuple end. More... | |
virtual Status | FillByteBuffer (bool *eosr, int num_bytes=0) |
Status | FillByteBufferCompressedFile (bool *eosr) |
Status | FillByteBufferGzip (bool *eosr) |
void | CopyBoundaryField (FieldLocation *data, MemPool *pool) |
int | WriteFields (MemPool *, TupleRow *tuple_row_mem, int num_fields, int num_tuples) |
int | WritePartialTuple (FieldLocation *, int num_fields, bool copy_strings) |
virtual void | LogRowParseError (int row_idx, std::stringstream *) |
Private Attributes | |
boost::scoped_ptr< MemPool > | boundary_pool_ |
Mem pool for boundary_row_ and boundary_column_. More... | |
StringBuffer | boundary_row_ |
StringBuffer | boundary_column_ |
Helper string for dealing with columns that span file blocks. More... | |
int | slot_idx_ |
Index into materialized_slots_ for the next slot to output for the current tuple. More... | |
boost::scoped_ptr< DelimitedTextParser > | delimited_text_parser_ |
Helper class for picking fields and rows from delimited text. More... | |
std::vector< FieldLocation > | field_locations_ |
Return field locations from the Delimited Text Parser. More... | |
std::vector< char * > | row_end_locations_ |
char * | batch_start_ptr_ |
bool | error_in_row_ |
Tuple * | partial_tuple_ |
bool | partial_tuple_empty_ |
RuntimeProfile::Counter * | parse_delimiter_timer_ |
Time parsing text files. More... | |
Static Private Attributes | |
static const int | NEXT_BLOCK_READ_SIZE = 1024 |
HdfsScanner implementation that understands text-formatted records. Uses SSE instructions, if available, for performance.
Definition at line 30 of file hdfs-text-scanner.h.
|
protected inherited |
Matching typedef for WriteAlignedTuples for codegen. Refer to comments for that function.
Definition at line 212 of file hdfs-scanner.h.
HdfsTextScanner::HdfsTextScanner | ( | HdfsScanNode * | scan_node, |
RuntimeState * | state | ||
) |
Definition at line 51 of file hdfs-text-scanner.cc.
|
virtual |
Definition at line 64 of file hdfs-text-scanner.cc.
|
protected inherited |
Attach all remaining resources from context_ to batch_ and send batch_ to the scan node. This must be called after all rows have been committed and no further resources are needed from context_ (in practice this will happen in each scanner subclass's Close() implementation).
Definition at line 145 of file hdfs-scanner.cc.
References impala::HdfsScanNode::AddMaterializedRowBatch(), impala::HdfsScanner::batch_, impala::HdfsScanner::context_, impala::ScannerContext::ReleaseCompletedResources(), and impala::HdfsScanner::scan_node_.
Referenced by Close(), impala::BaseSequenceScanner::Close(), and impala::HdfsParquetScanner::Close().
Release all memory in 'pool' to batch_. If commit_batch is true, the row batch will be committed. commit_batch should be true if the attached pool is expected to be non-trivial (i.e. a decompression buffer) to minimize scanner mem usage.
Definition at line 256 of file hdfs-scanner.h.
References impala::MemPool::AcquireData(), impala::HdfsScanner::batch_, impala::HdfsScanner::CommitRows(), and impala::RowBatch::tuple_data_pool().
Referenced by Close(), impala::BaseSequenceScanner::Close(), impala::HdfsParquetScanner::Close(), FillByteBufferGzip(), impala::HdfsAvroScanner::ProcessRange(), impala::HdfsSequenceScanner::ReadCompressedBlock(), impala::HdfsParquetScanner::BaseColumnReader::ReadDataPage(), and impala::HdfsRCFileScanner::ResetRowGroup().
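The ownership transfer described for AttachPool can be pictured with a minimal sketch. ToyPool, ToyBatch and AttachPoolSketch are hypothetical stand-ins, not Impala's MemPool or RowBatch; in particular, the `committed` flag only marks where the real code would call CommitRows().

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a memory pool: a list of owned byte chunks.
struct ToyPool {
  std::vector<std::unique_ptr<char[]>> chunks;

  char* Allocate(std::size_t n) {
    chunks.push_back(std::make_unique<char[]>(n));
    return chunks.back().get();
  }

  // Take every chunk from 'src', leaving it empty. Pointers handed out
  // earlier by 'src' stay valid because only ownership moves.
  void AcquireData(ToyPool* src) {
    assert(src != nullptr && src != this);
    for (auto& chunk : src->chunks) chunks.push_back(std::move(chunk));
    src->chunks.clear();
  }
};

// Hypothetical row batch that owns the memory its tuples point into.
struct ToyBatch {
  ToyPool tuple_data_pool;
  bool committed = false;  // assumption: placeholder for CommitRows()
};

void AttachPoolSketch(ToyBatch* batch, ToyPool* pool, bool commit_batch) {
  batch->tuple_data_pool.AcquireData(pool);
  if (commit_batch) batch->committed = true;
}
```

After the call the scanner's pool is empty, and the memory lives exactly as long as the batch that references it.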
|
virtual |
Release all resources the scanner has allocated. This is the last chance for the scanner to attach any resources to the ScannerContext object.
Reimplemented from impala::HdfsScanner.
Definition at line 185 of file hdfs-text-scanner.cc.
References impala::HdfsScanner::AddFinalRowBatch(), impala::HdfsScanner::AttachPool(), boundary_pool_, impala::HdfsScanner::Close(), impala::HdfsScanner::data_buffer_pool_, impala::HdfsScanner::decompressor_, impala::HdfsFileDesc::file_compression, impala::ScannerContext::Stream::file_desc(), only_parsing_header_, impala::HdfsScanNode::RangeComplete(), impala::HdfsScanner::scan_node_, and impala::HdfsScanner::stream_.
|
static |
Codegen writing tuples and evaluating predicates.
Definition at line 609 of file hdfs-text-scanner.cc.
References impala::RuntimeState::codegen_enabled(), impala::HdfsScanner::CodegenWriteAlignedTuples(), impala::HdfsScanner::CodegenWriteCompleteTuple(), impala::RuntimeState::GetCodegen(), impala::Status::ok(), and impala::HdfsScanNode::runtime_state().
Referenced by impala::HdfsScanNode::Prepare().
|
static protected inherited |
Codegen function to replace WriteAlignedTuples. WriteAlignedTuples is cross compiled to IR. This function loads the precompiled IR function, modifies it and returns the resulting function.
Definition at line 495 of file hdfs-scanner.cc.
References impala::LlvmCodeGen::codegen_timer(), impala::LlvmCodeGen::FinalizeFunction(), impala::LlvmCodeGen::GetFunction(), impala::LlvmCodeGen::ReplaceCallSites(), and SCOPED_TIMER.
Referenced by Codegen(), and impala::HdfsSequenceScanner::Codegen().
|
static protected inherited |
Codegen function to replace WriteCompleteTuple. Should behave identically to WriteCompleteTuple.
Definition at line 296 of file hdfs-scanner.cc.
References impala::LlvmCodeGen::FnPrototype::AddArgument(), impala::TupleDescriptor::byte_size(), impala::LlvmCodeGen::codegen_timer(), impala::LlvmCodeGen::CodegenMemcpy(), impala::TextConverter::CodegenWriteSlot(), impala::HdfsScanNode::ComputeSlotMaterializationOrder(), impala::LlvmCodeGen::context(), impala::CodegenAnyVal::CreateCallWrapped(), impala::LlvmCodeGen::false_value(), impala::LlvmCodeGen::FinalizeFunction(), impala::TupleDescriptor::GenerateLlvmStruct(), impala::Status::GetDetail(), impala::LlvmCodeGen::GetFunction(), impala::LlvmCodeGen::GetIntConstant(), impala::LlvmCodeGen::GetType(), impala::CodegenAnyVal::GetVal(), impala::HdfsScanNode::hdfs_table(), impala::FieldLocation::LLVM_CLASS_NAME, impala::TupleRow::LLVM_CLASS_NAME, impala::Tuple::LLVM_CLASS_NAME, impala::HdfsScanner::LLVM_CLASS_NAME, impala::MemPool::LLVM_CLASS_NAME, impala::HdfsScanNode::materialized_slots(), impala::HdfsTableDescriptor::null_column_value(), impala::HdfsScanNode::num_materialized_partition_keys(), impala::TupleDescriptor::num_null_bytes(), impala::Status::ok(), impala::LlvmCodeGen::OptimizeFunctionWithExprs(), impala::HdfsScanNode::runtime_state(), SCOPED_TIMER, impala::LlvmCodeGen::true_value(), impala::HdfsScanNode::tuple_desc(), impala::HdfsScanNode::tuple_idx(), impala::ColumnType::type, impala::SlotDescriptor::type(), impala::TYPE_BOOLEAN, impala::TYPE_DECIMAL, impala::TYPE_INT, impala::TYPE_TIMESTAMP, and impala::TYPE_TINYINT.
Referenced by Codegen(), and impala::HdfsSequenceScanner::Codegen().
|
protected inherited |
Commit num_rows to the current row batch. If this completes, the row batch is enqueued with the scan node and StartNewRowBatch() is called. Returns Status::OK if the query is not cancelled and hasn't exceeded any mem limits. Scanner can call this with 0 rows to flush any pending resources (attached pools and io buffers) to minimize memory consumption.
Definition at line 124 of file hdfs-scanner.cc.
References impala::HdfsScanNode::AddMaterializedRowBatch(), impala::RowBatch::AtCapacity(), impala::HdfsScanner::batch_, impala::TupleDescriptor::byte_size(), impala::Status::CANCELLED, impala::ScannerContext::cancelled(), impala::RowBatch::capacity(), impala::RuntimeState::CheckQueryState(), impala::RowBatch::CommitRows(), impala::HdfsScanner::conjunct_ctxs_, impala::HdfsScanner::context_, impala::ExprContext::FreeLocalAllocations(), impala::ScannerContext::num_completed_io_buffers(), impala::RowBatch::num_rows(), impala::Status::OK, impala::ScannerContext::ReleaseCompletedResources(), RETURN_IF_ERROR, impala::HdfsScanner::scan_node_, impala::HdfsScanner::StartNewRowBatch(), impala::HdfsScanner::state_, impala::HdfsScanNode::tuple_desc(), and impala::HdfsScanner::tuple_mem_.
Referenced by impala::HdfsParquetScanner::AssembleRows(), impala::HdfsScanner::AttachPool(), FinishScanRange(), impala::HdfsSequenceScanner::ProcessDecompressedBlock(), impala::HdfsParquetScanner::ProcessFooter(), ProcessRange(), impala::HdfsAvroScanner::ProcessRange(), impala::HdfsSequenceScanner::ProcessRange(), impala::HdfsRCFileScanner::ProcessRange(), and impala::HdfsParquetScanner::ProcessSplit().
|
private |
Prepends field data from the previous file buffer, for a field that straddles two file buffers. 'data' already contains the pointer/len from the current file buffer; boundary_column_ contains the beginning of the data from the previous file buffer. This function allocates a new string from the tuple pool, concatenates the two pieces and updates 'data' to contain the new pointer/len.
Definition at line 757 of file hdfs-text-scanner.cc.
References impala::MemPool::Allocate(), boundary_column_, impala::FieldLocation::len, impala::StringValue::ptr, impala::StringBuffer::Size(), impala::FieldLocation::start, and impala::StringBuffer::str().
Referenced by ProcessRange().
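A minimal sketch of the concatenation described above, assuming a simplified field descriptor and arena. FieldLoc, ToyArena and CopyBoundaryFieldSketch are hypothetical names used for illustration; they are not Impala's FieldLocation, MemPool or the actual implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <deque>
#include <string>

// Hypothetical field descriptor: a pointer into a buffer plus a length.
struct FieldLoc {
  const char* start;
  int len;
};

// Hypothetical arena. std::deque keeps the addresses of existing elements
// stable when new blocks are appended.
struct ToyArena {
  std::deque<std::string> blocks;

  char* Allocate(std::size_t n) {
    blocks.emplace_back(n, '\0');
    return &blocks.back()[0];
  }
};

// 'boundary' holds the head of the field saved from the previous buffer;
// 'data' points at the tail found in the current buffer. On return, 'data'
// points at one contiguous copy of head + tail owned by 'pool'.
void CopyBoundaryFieldSketch(const std::string& boundary, FieldLoc* data,
                             ToyArena* pool) {
  assert(data != nullptr && data->len >= 0);
  const std::size_t tail_len = static_cast<std::size_t>(data->len);
  const std::size_t total = boundary.size() + tail_len;
  char* dst = pool->Allocate(total);
  std::memcpy(dst, boundary.data(), boundary.size());
  std::memcpy(dst + boundary.size(), data->start, tail_len);
  data->start = dst;
  data->len = static_cast<int>(total);
}
```

For example, with "hel" saved from the previous buffer and a two-byte tail "lo" in the current one, 'data' ends up describing the five bytes "hello".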
|
inline protected inherited |
Convenience function for evaluating conjuncts using this scanner's ExprContexts. This must always be inlined so we can correctly replace the call to ExecNode::EvalConjuncts() during codegen.
Definition at line 266 of file hdfs-scanner.h.
References impala::HdfsScanner::conjunct_ctxs_, and impala::ExecNode::EvalConjuncts().
Referenced by impala::HdfsParquetScanner::AssembleRows(), impala::HdfsAvroScanner::DecodeAvroData(), impala::HdfsRCFileScanner::ProcessRange(), impala::HdfsScanner::WriteCompleteTuple(), impala::HdfsScanner::WriteEmptyTuples(), and WriteFields().
Fills the next byte buffer from the context. This will block if there are no bytes ready. Updates byte_buffer_ptr_, byte_buffer_end_ and byte_buffer_read_size_. If num_bytes is 0, the scanner reads whatever the io mgr buffer size is; otherwise it reads just num_bytes.
Definition at line 410 of file hdfs-text-scanner.cc.
References byte_buffer_end_, byte_buffer_ptr_, byte_buffer_read_size_, impala::HdfsScanner::decompression_type_, impala::HdfsScanner::decompressor_, impala::ScannerContext::Stream::eosr(), FillByteBufferCompressedFile(), FillByteBufferGzip(), impala::ScannerContext::Stream::GetBuffer(), impala::ScannerContext::Stream::GetBytes(), RETURN_IF_ERROR, and impala::HdfsScanner::stream_.
Referenced by FindFirstTuple(), FinishScanRange(), and ProcessRange().
Fills the next byte buffer from the compressed data in stream_ by reading the entire file, decompressing it, and setting the byte_buffer_ptr_ to the decompressed buffer.
Definition at line 530 of file hdfs-text-scanner.cc.
References byte_buffer_ptr_, byte_buffer_read_size_, impala::HdfsScanner::context_, impala::HdfsScanner::decompress_timer_, impala::HdfsScanner::decompression_type_, impala::HdfsScanner::decompressor_, impala::ScannerContext::Stream::eosr(), impala::HdfsFileDesc::file_length, impala::ScannerContext::Stream::filename(), impala::ScannerContext::Stream::GetBytes(), impala::HdfsScanNode::GetFileDesc(), impala::Status::OK, impala::ScannerContext::ReleaseCompletedResources(), RETURN_IF_ERROR, impala::HdfsScanner::scan_node_, SCOPED_TIMER, impala::HdfsScanner::stream_, and VLOG_FILE.
Referenced by FillByteBuffer().
Fills the next byte buffer from the gzip compressed data in stream_. Unlike FillByteBufferCompressedFile(), the entire file does not need to be read at once. Buffers from stream_ are decompressed as they are read and byte_buffer_ptr_ is set to available decompressed data.
Definition at line 437 of file hdfs-text-scanner.cc.
References impala::RuntimeState::abort_on_error(), impala::HdfsScanner::AttachPool(), byte_buffer_ptr_, byte_buffer_read_size_, impala::HdfsScanner::context_, impala::HdfsScanner::data_buffer_pool_, impala::HdfsScanner::decompress_timer_, impala::HdfsScanner::decompressor_, impala::ScannerContext::Stream::eosr(), impala::ScannerContext::Stream::filename(), impala::ScannerContext::Stream::GetBuffer(), impala::ScannerContext::Stream::GetBytes(), GZIP_FIXED_READ_SIZE, impala::RuntimeState::LogError(), impala::RuntimeState::LogHasSpace(), impala::Status::OK, impala::HdfsScanner::parse_status_, impala::ScannerContext::ReleaseCompletedResources(), RETURN_IF_ERROR, SCOPED_TIMER, impala::ScannerContext::Stream::SkipBytes(), impala::HdfsScanner::state_, impala::HdfsScanner::stream_, and VLOG_FILE.
Referenced by FillByteBuffer().
Finds the start of the first tuple in this scan range and initializes byte_buffer_ptr_ to point to the next character (the start of the first tuple). If no tuple starts in the entire range (e.g. because there are really large columns), *tuple_found is set to false and no more processing needs to be done in this range.
Definition at line 577 of file hdfs-text-scanner.cc.
References byte_buffer_ptr_, byte_buffer_read_size_, delimited_text_parser_, FillByteBuffer(), impala::DiskIoMgr::RequestRange::offset(), impala::Status::OK, parse_delimiter_timer_, RETURN_IF_ERROR, impala::ScannerContext::Stream::scan_range(), SCOPED_TIMER, and impala::HdfsScanner::stream_.
Referenced by ProcessSplit().
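The search for the first tuple start can be sketched as a scan for the first row delimiter in a buffer. This is an illustration under simplifying assumptions: a single-byte delimiter, no escape characters, and a single buffer. FindFirstTupleSketch is a hypothetical helper, not the DelimitedTextParser API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Returns a pointer to the byte after the first row delimiter in
// [buf, buf + len). If the buffer holds no delimiter, no tuple starts in it:
// *tuple_found is set to false and buf + len is returned.
const char* FindFirstTupleSketch(const char* buf, std::size_t len,
                                 char row_delim, bool* tuple_found) {
  assert(buf != nullptr && tuple_found != nullptr);
  const void* hit = std::memchr(buf, row_delim, len);
  if (hit == nullptr) {
    *tuple_found = false;
    return buf + len;
  }
  *tuple_found = true;
  return static_cast<const char*>(hit) + 1;
}
```

The bytes before the returned pointer are the tail of a row that started in an earlier range, which is why this range skips them.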
|
private |
Reads past the end of the scan range for the next tuple end.
Definition at line 253 of file hdfs-text-scanner.cc.
References impala::RuntimeState::abort_on_error(), batch_start_ptr_, boundary_column_, boundary_row_, byte_buffer_read_size_, impala::HdfsScanner::CommitRows(), COUNTER_ADD, impala::HdfsScanner::decompressor_, delimited_text_parser_, impala::StringBuffer::Empty(), impala::ScannerContext::Stream::eof(), field_locations_, impala::ScannerContext::Stream::file_offset(), impala::ScannerContext::Stream::filename(), FillByteBuffer(), impala::Status::GetDetail(), impala::HdfsScanner::GetMemory(), impala::Status::IsCancelled(), impala::StringValue::len, impala::RuntimeState::LogError(), impala::RuntimeState::LogHasSpace(), impala::HdfsScanNode::materialized_slots(), NEXT_BLOCK_READ_SIZE, impala::HdfsScanNode::num_materialized_partition_keys(), impala::Status::OK, impala::Status::ok(), partial_tuple_empty_, pool, ProcessRange(), impala::StringValue::ptr, impala::ExecNode::ReachedLimit(), RETURN_IF_ERROR, row_end_locations_, impala::ScanNode::rows_read_counter(), impala::HdfsScanner::scan_node_, impala::StringBuffer::Size(), impala::HdfsScanner::state_, impala::StringBuffer::str(), impala::HdfsScanner::stream_, impala::HdfsScanner::tuple_, and WriteFields().
Referenced by ProcessSplit().
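Reading past the end of a scan range to finish the last row can be sketched as follows. The whole file is modelled as one in-memory string, and the sketch assumes that the extra reads happen in fixed-size chunks whose size mirrors NEXT_BLOCK_READ_SIZE; the source lists that constant among the references but does not state how it is used. FinishRowSketch and kNextBlockReadSize are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Assumed chunk size for reads beyond the range end.
constexpr std::size_t kNextBlockReadSize = 1024;

// Returns the bytes that complete the row left unfinished at 'range_end',
// excluding the delimiter. Reads one chunk at a time until a delimiter or
// the end of the file is reached.
std::string FinishRowSketch(const std::string& file, std::size_t range_end,
                            char row_delim) {
  assert(range_end <= file.size());
  std::string rest;
  std::size_t pos = range_end;
  while (pos < file.size()) {
    const std::size_t n = std::min(kNextBlockReadSize, file.size() - pos);
    const char* chunk = file.data() + pos;
    const void* hit = std::memchr(chunk, row_delim, n);
    if (hit != nullptr) {
      const std::size_t used =
          static_cast<std::size_t>(static_cast<const char*>(hit) - chunk);
      rest.append(chunk, used);
      return rest;
    }
    rest.append(chunk, n);
    pos += n;
  }
  return rest;  // End of file: the last row has no trailing delimiter.
}
```

Small chunks keep the overshoot into the neighbouring range short when rows are short, while the loop still copes with rows longer than one chunk.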
|
protected inherited |
Simple wrapper around conjunct_ctxs_. Used in the codegen'd version of WriteCompleteTuple() because it's easier than writing IR to access conjunct_ctxs_.
Definition at line 79 of file hdfs-scanner-ir.cc.
References impala::HdfsScanner::conjunct_ctxs_, and gen_ir_descriptions::idx.
|
protected inherited |
Gets memory for outputting tuples into batch_. *pool is the mem pool that should be used for memory allocated for those tuples. *tuple_mem should be the location to output tuples, and *tuple_row_mem for outputting tuple rows. Returns the maximum number of tuples/tuple rows that can be output (before the current row batch is complete and a new one is allocated). Memory returned from this call is invalidated after calling CommitRows. Callers must call GetMemory again after calling CommitRows.
Definition at line 115 of file hdfs-scanner.cc.
References impala::RowBatch::AddRow(), impala::HdfsScanner::batch_, impala::RowBatch::capacity(), impala::RowBatch::GetRow(), impala::RowBatch::num_rows(), impala::RowBatch::tuple_data_pool(), and impala::HdfsScanner::tuple_mem_.
Referenced by impala::HdfsParquetScanner::AssembleRows(), FinishScanRange(), impala::HdfsSequenceScanner::ProcessDecompressedBlock(), impala::HdfsParquetScanner::ProcessFooter(), ProcessRange(), impala::HdfsAvroScanner::ProcessRange(), impala::HdfsSequenceScanner::ProcessRange(), and impala::HdfsRCFileScanner::ProcessRange().
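The GetMemory/CommitRows calling pattern can be sketched with a toy scanner that tracks only row counts. ToyScanner and WriteRows are hypothetical; the real functions also hand out pool, tuple and row pointers and return a Status.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical scanner state: one current batch plus the batches already
// handed to the scan node (recorded here as row counts).
struct ToyScanner {
  int capacity;                       // rows per batch
  int num_rows = 0;                   // rows committed to the current batch
  std::vector<int> enqueued_batches;  // row counts of finished batches

  explicit ToyScanner(int cap) : capacity(cap) { assert(cap > 0); }

  // Maximum number of rows that still fit into the current batch.
  int GetMemory() const { return capacity - num_rows; }

  // Commits n rows. A full batch is enqueued and replaced by a new one,
  // which invalidates whatever the previous GetMemory() call returned.
  void CommitRows(int n) {
    assert(n >= 0 && n <= GetMemory());
    num_rows += n;
    if (num_rows == capacity) {
      enqueued_batches.push_back(num_rows);
      num_rows = 0;  // stands in for StartNewRowBatch()
    }
  }
};

// Writes 'total' rows, asking for fresh memory after every commit.
void WriteRows(ToyScanner* scanner, int total) {
  while (total > 0) {
    const int max_rows = scanner->GetMemory();
    const int n = std::min(total, max_rows);
    scanner->CommitRows(n);
    total -= n;
  }
}
```

Writing 10 rows with a capacity of 4 enqueues two full batches and leaves 2 rows in the current one.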
|
protected inherited |
Initializes write_tuples_fn_ to the jitted function if codegen is possible.
Definition at line 87 of file hdfs-scanner.cc.
References impala::HdfsPartitionDescriptor::escape_char(), impala::HdfsScanNode::GetCodegenFn(), impala::ExecNode::id(), impala::HdfsScanNode::IncNumScannersCodegenDisabled(), impala::HdfsScanNode::IncNumScannersCodegenEnabled(), impala::Status::OK, impala::HdfsScanner::scan_node_, impala::TupleDescriptor::string_slots(), impala::HdfsScanNode::tuple_desc(), and impala::HdfsScanner::write_tuples_fn_.
Referenced by impala::HdfsSequenceScanner::InitNewRange(), and ResetScanner().
|
private virtual |
Initializes this scanner for this context. The context maps to a single scan range.
Implements impala::HdfsScanner.
Definition at line 202 of file hdfs-text-scanner.cc.
References impala::HdfsPartitionDescriptor::collection_delim(), impala::HdfsScanner::context_, delimited_text_parser_, impala::HdfsPartitionDescriptor::escape_char(), impala::HdfsPartitionDescriptor::field_delim(), impala::HdfsFileDesc::file_compression, impala::ScannerContext::Stream::file_desc(), impala::HdfsScanNode::hdfs_table(), impala::HdfsScanNode::is_materialized_col(), impala::HdfsPartitionDescriptor::line_delim(), impala::HdfsScanNode::materialized_slots(), impala::HdfsTableDescriptor::null_column_value(), impala::TableDescriptor::num_cols(), impala::HdfsScanNode::num_partition_keys(), impala::Status::OK, impala::ScannerContext::partition_descriptor(), ResetScanner(), RETURN_IF_ERROR, impala::HdfsScanner::scan_node_, impala::ScannerContext::Stream::set_contains_tuple_data(), impala::HdfsScanner::stream_, and impala::HdfsScanner::text_converter_.
Referenced by ProcessSplit().
|
inline protected inherited |
Initialize a tuple. TODO: only copy over non-null slots. TODO: InitTuple is called frequently, avoid the if, perhaps via templatization.
Definition at line 355 of file hdfs-scanner.h.
References impala::HdfsScanner::num_null_bytes_, and impala::HdfsScanner::tuple_byte_size_.
Referenced by impala::HdfsParquetScanner::AssembleRows(), impala::HdfsAvroScanner::DecodeAvroData(), impala::HdfsRCFileScanner::ProcessRange(), impala::HdfsScanner::WriteCompleteTuple(), and WriteFields().
|
static |
Issue io manager byte ranges for 'files'.
Definition at line 67 of file hdfs-text-scanner.cc.
References impala::HdfsScanNode::AddDiskIoRanges(), impala::HdfsScanNode::AllocateScanRange(), impala::DiskIoMgr::RequestRange::disk_id(), impala::DiskIoMgr::ScanRange::expected_local(), impala::HdfsLzoTextScanner::IssueInitialRanges(), impala::RuntimeState::LogError(), LZO_INDEX_SUFFIX, impala::HdfsScanNode::max_compressed_text_file_length(), impala::DiskIoMgr::ScanRange::meta_data(), impala::DiskIoMgr::RequestRange::offset(), impala::Status::OK, impala::ScanRangeMetadata::partition_id, impala::HdfsScanNode::RangeComplete(), RETURN_IF_ERROR, impala::HdfsScanNode::runtime_state(), impala::RuntimeProfile::HighWaterMarkCounter::Set(), and impala::DiskIoMgr::ScanRange::try_cache().
Referenced by impala::HdfsScanNode::GetNext().
|
private virtual |
Appends the current file and line to the RuntimeState's error log. row_idx is the 0-based index (in the current batch) of the row where the parse error occurred.
Reimplemented from impala::HdfsScanner.
Definition at line 635 of file hdfs-text-scanner.cc.
References batch_start_ptr_, boundary_row_, impala::StringBuffer::Empty(), row_end_locations_, and impala::StringBuffer::str().
Referenced by WriteFields().
Definition at line 368 of file hdfs-scanner.h.
References impala::HdfsScanner::batch_, and impala::RowBatch::row_byte_size().
Referenced by impala::HdfsParquetScanner::AssembleRows(), impala::HdfsAvroScanner::DecodeAvroData(), impala::HdfsRCFileScanner::ProcessRange(), impala::HdfsScanner::WriteEmptyTuples(), and WriteFields().
Definition at line 363 of file hdfs-scanner.h.
References impala::HdfsScanner::tuple_byte_size_.
Referenced by impala::HdfsParquetScanner::AssembleRows(), impala::HdfsAvroScanner::DecodeAvroData(), impala::HdfsRCFileScanner::ProcessRange(), and WriteFields().
|
virtual |
Implementation of HdfsScanner interface.
Reimplemented from impala::HdfsScanner.
Definition at line 620 of file hdfs-text-scanner.cc.
References ADD_CHILD_TIMER, impala::RuntimeState::batch_size(), field_locations_, impala::HdfsScanNode::materialized_slots(), impala::Status::OK, parse_delimiter_timer_, impala::HdfsScanner::Prepare(), RETURN_IF_ERROR, row_end_locations_, impala::ExecNode::runtime_profile(), impala::HdfsScanner::scan_node_, impala::ScanNode::SCANNER_THREAD_TOTAL_WALLCLOCK_TIME, and impala::HdfsScanner::state_.
Process the entire scan range, reading bytes from context and appending materialized row batches to the scan node. *num_tuples returns the number of tuples parsed. past_scan_range is true if this is processing beyond the end of the scan range and this function should stop after finding one tuple.
Definition at line 325 of file hdfs-text-scanner.cc.
References impala::StringBuffer::Append(), batch_start_ptr_, boundary_column_, boundary_row_, byte_buffer_end_, byte_buffer_ptr_, impala::StringBuffer::Clear(), impala::HdfsScanner::CommitRows(), impala::HdfsScanner::context_, CopyBoundaryField(), COUNTER_ADD, delimited_text_parser_, impala::StringBuffer::Empty(), impala::ScannerContext::Stream::eosr(), field_locations_, FillByteBuffer(), impala::HdfsScanner::GetMemory(), impala::ScanNode::materialize_tuple_timer(), impala::HdfsScanNode::materialized_slots(), impala::Status::OK, parse_delimiter_timer_, impala::HdfsScanner::parse_status_, pool, impala::ExecNode::ReachedLimit(), RETURN_IF_ERROR, row_end_locations_, impala::ScanNode::rows_read_counter(), impala::HdfsScanner::scan_node_, SCOPED_TIMER, impala::HdfsScanner::stream_, impala::HdfsScanner::tuple_, impala::HdfsScanner::WriteEmptyTuples(), and WriteFields().
Referenced by FinishScanRange(), and ProcessSplit().
|
virtual |
Process an entire split, reading bytes from the context's streams. Context is initialized with the split data (e.g. template tuple, partition descriptor, etc). This function should only return on error or end of scan range.
Implements impala::HdfsScanner.
Definition at line 156 of file hdfs-text-scanner.cc.
References impala::HdfsFileDesc::file_compression, impala::ScannerContext::Stream::file_desc(), FindFirstTuple(), FinishScanRange(), InitNewRange(), impala::Status::OK, ProcessRange(), RETURN_IF_ERROR, impala::HdfsScanner::stream_, and impala::HdfsScanner::UpdateDecompressor().
|
protected inherited |
Report a parse error for the column described by desc. If abort_on_error is true, sets parse_status_ to the error message.
Definition at line 577 of file hdfs-scanner.cc.
References impala::RuntimeState::abort_on_error(), impala::SlotDescriptor::col_pos(), impala::RuntimeState::LogError(), impala::RuntimeState::LogHasSpace(), impala::HdfsScanNode::num_partition_keys(), impala::Status::ok(), impala::HdfsScanner::parse_status_, impala::HdfsScanner::scan_node_, impala::HdfsScanner::state_, and impala::SlotDescriptor::type().
Referenced by impala::HdfsRCFileScanner::ProcessRange(), impala::HdfsScanner::ReportTupleParseError(), and WritePartialTuple().
|
protected inherited |
Utility function to report parse errors for each field. If errors[i] is nonzero, fields[i] had a parse error. row_idx is the index of the row in the current batch that had the parse error. Returns false if parsing should be aborted. In this case parse_status_ is set to the error. This is called from WriteAlignedTuples.
Definition at line 546 of file hdfs-scanner.cc.
References impala::RuntimeState::abort_on_error(), impala::ScannerContext::Stream::filename(), impala::RuntimeState::LogError(), impala::RuntimeState::LogHasSpace(), impala::HdfsScanner::LogRowParseError(), impala::HdfsScanNode::materialized_slots(), impala::HdfsScanner::num_errors_in_file_, impala::Status::ok(), impala::HdfsScanner::parse_status_, impala::HdfsScanner::ReportColumnParseError(), impala::RuntimeState::ReportFileErrors(), impala::HdfsScanner::scan_node_, impala::HdfsScanner::state_, and impala::HdfsScanner::stream_.
Referenced by impala::HdfsSequenceScanner::ProcessRange(), and impala::HdfsScanner::WriteAlignedTuples().
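The errors[i] convention and the abort decision can be sketched like this. ToyErrorState and ReportTupleParseErrorSketch are hypothetical; the message format and the plain string used for the status are assumptions, not Impala's RuntimeState or Status types.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical error state: a log plus a status string (empty means OK).
struct ToyErrorState {
  bool abort_on_error = false;
  std::vector<std::string> log;
  std::string parse_status;
};

// Logs one message per field whose error flag is nonzero. Returns false if
// parsing should be aborted, in which case parse_status holds the error.
bool ReportTupleParseErrorSketch(const std::uint8_t* errors, int num_fields,
                                 int row_idx, ToyErrorState* state) {
  assert(errors != nullptr && state != nullptr);
  bool any_error = false;
  for (int i = 0; i < num_fields; ++i) {
    if (errors[i] == 0) continue;
    any_error = true;
    state->log.push_back("row " + std::to_string(row_idx) +
                         ": parse error in field " + std::to_string(i));
  }
  if (any_error && state->abort_on_error) {
    state->parse_status = state->log.back();
    return false;
  }
  return true;
}
```

With abort_on_error unset, the caller keeps going and the bad row is only logged; with it set, the first bad row stops the scan.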
|
protected |
Reset the scanner. This clears any partial state that needs to be cleared when starting or when restarting after an error.
Definition at line 228 of file hdfs-text-scanner.cc.
References boundary_column_, boundary_pool_, boundary_row_, byte_buffer_end_, byte_buffer_ptr_, impala::TupleDescriptor::byte_size(), impala::StringBuffer::Clear(), impala::HdfsScanner::context_, impala::Tuple::Create(), delimited_text_parser_, error_in_row_, impala::HdfsScanner::InitializeWriteTuplesFn(), impala::Status::OK, partial_tuple_, partial_tuple_empty_, impala::ScannerContext::partition_descriptor(), RETURN_IF_ERROR, impala::HdfsScanner::scan_node_, slot_idx_, and impala::HdfsScanNode::tuple_desc().
Referenced by InitNewRange().
|
protected inherited |
Set batch_ to a new row batch and update tuple_mem_ accordingly.
Definition at line 108 of file hdfs-scanner.cc.
References impala::MemPool::Allocate(), impala::HdfsScanner::batch_, impala::RuntimeState::batch_size(), impala::ExecNode::mem_tracker(), impala::ExecNode::row_desc(), impala::HdfsScanner::scan_node_, impala::HdfsScanner::state_, impala::HdfsScanner::tuple_byte_size_, impala::RowBatch::tuple_data_pool(), and impala::HdfsScanner::tuple_mem_.
Referenced by impala::HdfsScanner::CommitRows(), and impala::HdfsScanner::Prepare().
|
protected inherited |
Update the decompressor_ object given a compression type or codec name. Depending on the old compression type and the new one, it may close the old decompressor and/or create a new one of different type.
Definition at line 513 of file hdfs-scanner.cc.
References impala::Codec::CreateDecompressor(), impala::HdfsScanner::data_buffer_pool_, impala::HdfsScanner::decompression_type_, impala::HdfsScanner::decompressor_, impala::Status::OK, RETURN_IF_ERROR, impala::HdfsScanner::scan_node_, impala::TupleDescriptor::string_slots(), and impala::HdfsScanNode::tuple_desc().
Referenced by impala::HdfsAvroScanner::InitNewRange(), impala::HdfsSequenceScanner::InitNewRange(), and ProcessSplit().
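The "close the old decompressor and/or create a new one" behaviour can be sketched with a small state object. Compression, ToyCodec and ToyDecompressState are hypothetical; the sketch assumes that an unchanged compression type keeps the existing object, which the source implies but does not spell out.

```cpp
#include <cassert>
#include <memory>

// Hypothetical compression types and codec.
enum class Compression { NONE, GZIP, BZIP2, SNAPPY };

struct ToyCodec {
  Compression type;
};

struct ToyDecompressState {
  std::unique_ptr<ToyCodec> decompressor;  // null when type is NONE
  Compression type = Compression::NONE;
  int codecs_created = 0;

  void Update(Compression next) {
    if (next == type) return;  // same type: keep the existing object
    decompressor.reset();      // close the old decompressor, if any
    if (next != Compression::NONE) {
      decompressor = std::make_unique<ToyCodec>(ToyCodec{next});
      ++codecs_created;
    }
    type = next;
    // Invariant: a decompressor exists exactly when data is compressed.
    assert((type == Compression::NONE) == (decompressor == nullptr));
  }
};
```

Calling Update() once per scan range then only pays for codec construction when consecutive files differ in compression type.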
|
protected inherited |
|
protected inherited |
Processes batches of fields and writes them out to tuple_row_mem.
Definition at line 33 of file hdfs-scanner-ir.cc.
References impala::HdfsScanner::ReportTupleParseError(), impala::HdfsScanner::template_tuple_, impala::HdfsScanner::tuple_, impala::HdfsScanner::tuple_byte_size_, UNLIKELY, and impala::HdfsScanner::WriteCompleteTuple().
Referenced by impala::HdfsSequenceScanner::ProcessDecompressedBlock(), and WriteFields().
|
protected inherited |
Writes out all slots for 'tuple' from 'fields'. 'fields' must be aligned to the start of the tuple (e.g. fields[0] maps to slots[0]). After writing the tuple, it will be evaluated against the conjuncts.
Definition at line 217 of file hdfs-scanner.cc.
References impala::HdfsScanner::EvalConjuncts(), impala::HdfsScanner::InitTuple(), impala::FieldLocation::len, impala::HdfsScanNode::materialized_slots(), impala::HdfsScanner::scan_node_, impala::TupleRow::SetTuple(), impala::HdfsScanner::text_converter_, impala::HdfsScanNode::tuple_idx(), and UNLIKELY.
Referenced by impala::HdfsSequenceScanner::ProcessRange(), and impala::HdfsScanner::WriteAlignedTuples().
|
protected inherited |
Utility method to write out tuples when there are no materialized fields (e.g. select count(*) or only partition keys). num_tuples - Total number of tuples to write out. Returns the number of tuples added to the row batch.
Definition at line 157 of file hdfs-scanner.cc.
References impala::RowBatch::AddRow(), impala::RowBatch::AddRows(), impala::RowBatch::AtCapacity(), impala::RowBatch::capacity(), impala::RowBatch::CommitLastRow(), impala::RowBatch::CommitRows(), impala::HdfsScanner::EvalConjuncts(), impala::RowBatch::GetRow(), impala::RowBatch::INVALID_ROW_INDEX, impala::RowBatch::num_rows(), impala::HdfsScanner::scan_node_, impala::TupleRow::SetTuple(), impala::HdfsScanner::template_tuple_, and impala::HdfsScanNode::tuple_idx().
Referenced by impala::HdfsSequenceScanner::ProcessDecompressedBlock(), impala::HdfsParquetScanner::ProcessFooter(), ProcessRange(), impala::HdfsAvroScanner::ProcessRange(), impala::HdfsSequenceScanner::ProcessRange(), and impala::HdfsRCFileScanner::ProcessRange().
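As a rough sketch of the "no materialized fields" path: every output row points at the same template tuple, and the number of rows added is bounded by the space left in the batch. RowBatchSketch and Tuple below are hypothetical simplifications, and conjunct evaluation is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Tuple {};  // hypothetical placeholder for a materialized tuple

// Hypothetical, simplified row batch: one tuple pointer per row.
struct RowBatchSketch {
  explicit RowBatchSketch(int cap) : capacity(cap) {}
  int capacity;
  std::vector<const Tuple*> rows;
  bool AtCapacity() const { return static_cast<int>(rows.size()) >= capacity; }
};

// Adds up to num_tuples rows that all reference template_tuple (which may be
// null when there are no partition key slots). Returns the number added.
int WriteEmptyTuples(RowBatchSketch* batch, const Tuple* template_tuple,
                     int num_tuples) {
  int free_slots = batch->capacity - static_cast<int>(batch->rows.size());
  int to_add = std::max(0, std::min(num_tuples, free_slots));
  batch->rows.insert(batch->rows.end(), static_cast<std::size_t>(to_add),
                     template_tuple);
  return to_add;
}
```

A caller would loop, starting a new batch whenever the return value is smaller than the number of tuples still outstanding.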
|
protected inherited |
Write empty tuples and commit them to the context object.
Definition at line 195 of file hdfs-scanner.cc.
References impala::HdfsScanner::EvalConjuncts(), impala::HdfsScanner::next_row(), impala::HdfsScanner::scan_node_, impala::TupleRow::SetTuple(), impala::HdfsScanner::template_tuple_, and impala::HdfsScanNode::tuple_idx().
|
private |
Writes the intermediate parsed data into slots, outputting tuples to row_batch as they complete.
Input Parameters:
mempool: MemPool to allocate from for field data.
num_fields: Total number of fields contained in parsed_data_.
num_tuples: Number of tuples in parsed_data_. This includes the potential partial tuple at the beginning of 'field_locations_'.
Returns the number of tuples added to the row batch.
Definition at line 663 of file hdfs-text-scanner.cc.
References impala::RuntimeState::abort_on_error(), impala::HdfsScanner::batch_, boundary_row_, impala::TupleDescriptor::byte_size(), impala::StringBuffer::Clear(), error_in_row_, impala::RuntimeState::ErrorLog(), impala::HdfsScanner::EvalConjuncts(), field_locations_, impala::ScannerContext::Stream::filename(), impala::HdfsScanner::InitTuple(), impala::HdfsScanNode::limit(), impala::RuntimeState::LogError(), impala::RuntimeState::LogHasSpace(), LogRowParseError(), impala::ScanNode::materialize_tuple_timer(), impala::HdfsScanNode::materialized_slots(), impala::HdfsScanner::next_row(), impala::HdfsScanner::next_tuple(), impala::Status::ok(), impala::HdfsScanner::parse_status_, partial_tuple_, partial_tuple_empty_, impala::RowBatch::row_byte_size(), impala::ExecNode::rows_returned(), impala::HdfsScanner::scan_node_, SCOPED_TIMER, impala::TupleRow::SetTuple(), slot_idx_, impala::HdfsScanner::state_, impala::HdfsScanner::stream_, impala::HdfsScanner::template_tuple_, impala::HdfsScanner::tuple_, impala::HdfsScanNode::tuple_desc(), impala::HdfsScanNode::tuple_idx(), UNLIKELY, impala::HdfsScanner::write_tuples_fn_, impala::HdfsScanner::WriteAlignedTuples(), and WritePartialTuple().
Referenced by FinishScanRange(), and ProcessRange().
|
private |
Utility function to write out 'num_fields' to 'tuple_'. This is used to parse partial tuples. Returns bytes processed. If copy_strings is true, strings from fields will be copied into the boundary pool.
Definition at line 768 of file hdfs-text-scanner.cc.
References impala::HdfsScanner::data_buffer_pool_, error_in_row_, impala::FieldLocation::len, impala::HdfsScanNode::materialized_slots(), partial_tuple_, impala::HdfsScanner::ReportColumnParseError(), impala::HdfsScanner::scan_node_, slot_idx_, impala::FieldLocation::start, and impala::HdfsScanner::text_converter_.
Referenced by WriteFields().
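The interplay between complete tuples and a partial tuple carried over from the previous buffer can be illustrated with a small sketch. PartialTupleAssembler is a hypothetical class, fields are plain strings rather than FieldLocation entries, and slots_per_tuple is assumed to be greater than zero. Field values are copied, mirroring the copy_strings case, because the source buffer may be gone by the time the tuple completes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

class PartialTupleAssembler {
 public:
  explicit PartialTupleAssembler(std::size_t slots_per_tuple)
      : slots_per_tuple_(slots_per_tuple) {}

  // Consumes one batch of parsed fields. A tuple left unfinished by the
  // previous batch is completed first. Returns the number of tuples completed.
  int WriteFields(const std::vector<std::string>& fields) {
    int completed = 0;
    for (const std::string& field : fields) {
      partial_tuple_.push_back(field);  // copy the field value
      if (partial_tuple_.size() == slots_per_tuple_) {
        tuples_.push_back(partial_tuple_);
        partial_tuple_.clear();
        ++completed;
      }
    }
    return completed;
  }

  // False while a tuple is partially materialized.
  bool partial_tuple_empty() const { return partial_tuple_.empty(); }
  const std::vector<std::vector<std::string>>& tuples() const { return tuples_; }

 private:
  std::size_t slots_per_tuple_;
  std::vector<std::string> partial_tuple_;
  std::vector<std::vector<std::string>> tuples_;
};
```

Here the size of partial_tuple_ plays the role of slot_idx_: it records which slot the next field belongs to.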
|
protected inherited |
The current row batch being populated. Creating new row batches, attaching context resources, and handing off to the scan node is handled by this class in CommitRows(), but AttachPool() must be called by scanner subclasses to attach any memory allocated by that subclass. All row batches created by this class are transferred to the scan node (i.e., all batches are ultimately owned by the scan node).
Definition at line 177 of file hdfs-scanner.h.
Referenced by impala::HdfsScanner::AddFinalRowBatch(), impala::HdfsScanner::AttachPool(), impala::HdfsScanner::CommitRows(), impala::HdfsScanner::GetMemory(), impala::HdfsScanner::next_row(), impala::HdfsSequenceScanner::ProcessDecompressedBlock(), impala::HdfsParquetScanner::ProcessSplit(), impala::HdfsScanner::StartNewRowBatch(), WriteFields(), and impala::HdfsScanner::~HdfsScanner().
|
private |
Pointer into byte_buffer that is the start of the current batch being processed.
Definition at line 162 of file hdfs-text-scanner.h.
Referenced by FinishScanRange(), LogRowParseError(), and ProcessRange().
|
private |
Helper string for dealing with columns that span file blocks.
Definition at line 145 of file hdfs-text-scanner.h.
Referenced by CopyBoundaryField(), FinishScanRange(), ProcessRange(), and ResetScanner().
|
private |
Mem pool for boundary_row_ and boundary_column_.
Definition at line 137 of file hdfs-text-scanner.h.
Referenced by Close(), and ResetScanner().
|
private |
Helper string for dealing with input rows that span file blocks. We keep track of a whole line that spans file blocks to be able to report the line as erroneous in case of parsing errors.
Definition at line 142 of file hdfs-text-scanner.h.
Referenced by FinishScanRange(), LogRowParseError(), ProcessRange(), ResetScanner(), and WriteFields().
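A minimal sketch of the boundary-row idea, assuming newline-delimited rows: the unterminated tail of one block is kept and joined with the head of the next block, so the whole line is available if it has to be reported. BoundaryRowReader is a hypothetical class. For brevity it routes every row through the buffer, whereas a real scanner would copy only rows that actually span blocks.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

class BoundaryRowReader {
 public:
  // Feeds one file block and returns the rows completed by it.
  std::vector<std::string> ProcessBlock(const std::string& block) {
    std::vector<std::string> rows;
    std::size_t start = 0;
    while (true) {
      std::size_t nl = block.find('\n', start);
      if (nl == std::string::npos) break;
      boundary_row_.append(block, start, nl - start);
      rows.push_back(boundary_row_);
      boundary_row_.clear();
      start = nl + 1;
    }
    // Keep the unterminated tail for the next block.
    boundary_row_.append(block, start, std::string::npos);
    return rows;
  }

  const std::string& boundary_row() const { return boundary_row_; }

 private:
  std::string boundary_row_;
};
```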
|
protected |
Ending position of HDFS buffer.
Definition at line 62 of file hdfs-text-scanner.h.
Referenced by FillByteBuffer(), ProcessRange(), and ResetScanner().
|
protected |
Current position in byte buffer.
Definition at line 59 of file hdfs-text-scanner.h.
Referenced by FillByteBuffer(), FillByteBufferCompressedFile(), FillByteBufferGzip(), FindFirstTuple(), ProcessRange(), and ResetScanner().
|
protected |
Actual bytes received from last file read.
Definition at line 65 of file hdfs-text-scanner.h.
Referenced by FillByteBuffer(), FillByteBufferCompressedFile(), FillByteBufferGzip(), FindFirstTuple(), and FinishScanRange().
|
protected inherited |
ExprContext for each conjunct. Each scanner has its own ExprContexts so the conjuncts can be safely evaluated in parallel.
Definition at line 154 of file hdfs-scanner.h.
Referenced by impala::HdfsScanner::Close(), impala::HdfsScanner::CommitRows(), impala::HdfsScanner::EvalConjuncts(), impala::HdfsScanner::GetConjunctCtx(), and impala::HdfsScanner::Prepare().
|
protected inherited |
Context for this scanner.
Definition at line 147 of file hdfs-scanner.h.
Referenced by impala::HdfsScanner::AddFinalRowBatch(), impala::HdfsParquetScanner::AssembleRows(), impala::HdfsScanner::CommitRows(), FillByteBufferCompressedFile(), FillByteBufferGzip(), impala::HdfsParquetScanner::InitColumns(), InitNewRange(), impala::HdfsSequenceScanner::InitNewRange(), impala::HdfsScanner::Prepare(), impala::HdfsSequenceScanner::ProcessDecompressedBlock(), impala::HdfsParquetScanner::ProcessFooter(), ProcessRange(), impala::HdfsAvroScanner::ProcessRange(), impala::HdfsSequenceScanner::ProcessRange(), impala::HdfsRCFileScanner::ProcessRange(), impala::HdfsParquetScanner::ProcessSplit(), and ResetScanner().
|
protected inherited |
Pool to allocate per data block memory. This should be used with the decompressor and any other per data block allocations.
Definition at line 205 of file hdfs-scanner.h.
Referenced by Close(), impala::BaseSequenceScanner::Close(), FillByteBufferGzip(), impala::HdfsAvroScanner::ProcessRange(), impala::HdfsSequenceScanner::ReadCompressedBlock(), impala::HdfsRCFileScanner::ReadRowGroup(), impala::HdfsRCFileScanner::ResetRowGroup(), impala::HdfsScanner::UpdateDecompressor(), and WritePartialTuple().
|
protected inherited |
Time spent decompressing bytes.
Definition at line 208 of file hdfs-scanner.h.
Referenced by FillByteBufferCompressedFile(), FillByteBufferGzip(), impala::HdfsSequenceScanner::GetRecord(), impala::HdfsScanner::Prepare(), impala::HdfsAvroScanner::ProcessRange(), impala::HdfsRCFileScanner::ReadColumnBuffers(), impala::HdfsSequenceScanner::ReadCompressedBlock(), impala::HdfsParquetScanner::BaseColumnReader::ReadDataPage(), and impala::HdfsRCFileScanner::ReadKeyBuffers().
|
protected inherited |
The most recently used decompression type.
Definition at line 201 of file hdfs-scanner.h.
Referenced by FillByteBuffer(), FillByteBufferCompressedFile(), and impala::HdfsScanner::UpdateDecompressor().
|
protected inherited |
Decompressor class to use, if any.
Definition at line 198 of file hdfs-scanner.h.
Referenced by Close(), impala::BaseSequenceScanner::Close(), impala::HdfsScanner::Close(), FillByteBuffer(), FillByteBufferCompressedFile(), FillByteBufferGzip(), FinishScanRange(), impala::HdfsSequenceScanner::GetRecord(), impala::HdfsRCFileScanner::InitNewRange(), impala::HdfsAvroScanner::ProcessRange(), impala::HdfsRCFileScanner::ReadColumnBuffers(), impala::HdfsSequenceScanner::ReadCompressedBlock(), impala::HdfsRCFileScanner::ReadKeyBuffers(), and impala::HdfsScanner::UpdateDecompressor().
|
private |
Helper class for picking fields and rows from delimited text.
Definition at line 151 of file hdfs-text-scanner.h.
Referenced by FindFirstTuple(), FinishScanRange(), InitNewRange(), ProcessRange(), and ResetScanner().
|
private |
Whether or not there was a parse error in the current row. Used for counting the number of errors per file. Once the error log is full, error_in_row_ will still be set so that the errors per file can be recorded even if the details are not logged.
Definition at line 168 of file hdfs-text-scanner.h.
Referenced by ResetScanner(), WriteFields(), and WritePartialTuple().
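The separation between counting errors and logging their details might look like the sketch below. RowErrorTracker and its members are hypothetical names. The point is only that the per-file counter keeps advancing after the bounded log has filled up.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

class RowErrorTracker {
 public:
  explicit RowErrorTracker(std::size_t max_logged) : max_logged_(max_logged) {}

  // Always counts the error; records the detail only while the log has space.
  void ReportRowError(const std::string& file, const std::string& detail) {
    ++errors_per_file_[file];
    if (LogHasSpace()) log_.push_back(file + ": " + detail);
  }

  bool LogHasSpace() const { return log_.size() < max_logged_; }

  int errors_in_file(const std::string& file) const {
    auto it = errors_per_file_.find(file);
    return it == errors_per_file_.end() ? 0 : it->second;
  }

  std::size_t logged() const { return log_.size(); }

 private:
  std::size_t max_logged_;
  std::map<std::string, int> errors_per_file_;
  std::vector<std::string> log_;
};
```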
|
private |
Return field locations from the Delimited Text Parser.
Definition at line 154 of file hdfs-text-scanner.h.
Referenced by FinishScanRange(), Prepare(), ProcessRange(), and WriteFields().
|
static inherited |
Assumed size of an OS file block. Used mostly when reading file format headers, etc. This probably ought to be a number derived from the environment.
Definition at line 95 of file hdfs-scanner.h.
|
static |
Definition at line 51 of file hdfs-text-scanner.h.
|
static |
Suffix for lzo index files.
Definition at line 49 of file hdfs-text-scanner.h.
Referenced by IssueInitialRanges().
|
static private |
Definition at line 71 of file hdfs-text-scanner.h.
Referenced by FinishScanRange().
|
protected inherited |
Number of errors in the current file.
Definition at line 183 of file hdfs-scanner.h.
Referenced by impala::HdfsScanner::ReportTupleParseError().
|
protected inherited |
Number of null bytes in the tuple.
Definition at line 189 of file hdfs-scanner.h.
Referenced by impala::HdfsScanner::InitTuple().
|
protected |
True if we are parsing the header for this scanner.
Definition at line 68 of file hdfs-text-scanner.h.
Referenced by Close().
|
private |
Time spent parsing text files.
Definition at line 180 of file hdfs-text-scanner.h.
Referenced by FindFirstTuple(), Prepare(), and ProcessRange().
|
protected inherited |
Contains the current parse status to minimize the number of Status objects returned. This significantly reduces the cross-compile dependencies for LLVM, since Status objects inline a bunch of string functions. Also, Status objects aren't particularly cheap to create and destroy.
Definition at line 195 of file hdfs-scanner.h.
Referenced by FillByteBufferGzip(), impala::HdfsSequenceScanner::GetRecord(), impala::HdfsAvroScanner::ParseMetadata(), impala::HdfsSequenceScanner::ProcessBlockCompressedScanRange(), impala::HdfsSequenceScanner::ProcessDecompressedBlock(), ProcessRange(), impala::HdfsAvroScanner::ProcessRange(), impala::HdfsSequenceScanner::ProcessRange(), impala::HdfsRCFileScanner::ProcessRange(), impala::BaseSequenceScanner::ProcessSplit(), impala::HdfsSequenceScanner::ReadBlockHeader(), impala::HdfsRCFileScanner::ReadColumnBuffers(), impala::HdfsSequenceScanner::ReadCompressedBlock(), impala::HdfsAvroScanner::ReadFileHeader(), impala::HdfsSequenceScanner::ReadFileHeader(), impala::HdfsRCFileScanner::ReadFileHeader(), impala::HdfsRCFileScanner::ReadKeyBuffers(), impala::HdfsRCFileScanner::ReadNumColumnsMetadata(), impala::HdfsRCFileScanner::ReadRowGroupHeader(), impala::BaseSequenceScanner::ReadSync(), impala::HdfsScanner::ReportColumnParseError(), impala::HdfsScanner::ReportTupleParseError(), impala::BaseSequenceScanner::SkipToSync(), and WriteFields().
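The pattern described here, returning a cheap bool from hot-path functions and parking the detailed status in a member, can be sketched as follows. Status and FieldParser are hypothetical simplifications and do not reproduce Impala's Status class.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified status type.
struct Status {
  bool ok = true;
  std::string msg;
};

class FieldParser {
 public:
  // Hot-path function: returns only a bool. On failure the details are
  // stored in parse_status_ instead of being returned by value.
  bool ParseDigit(char c, int* out) {
    if (c < '0' || c > '9') {
      parse_status_.ok = false;
      parse_status_.msg = std::string("not a digit: ") + c;
      return false;
    }
    *out = c - '0';
    return true;
  }

  // Callers inspect the stored status only after a failure.
  const Status& parse_status() const { return parse_status_; }

 private:
  Status parse_status_;
};
```

On the success path no Status object is constructed or copied, which is the saving the description refers to.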
|
private |
Memory to store partial tuples split across buffers. Memory comes from boundary_pool_. There is only one tuple allocated for this object and reused for boundary tuples.
Definition at line 173 of file hdfs-text-scanner.h.
Referenced by ResetScanner(), WriteFields(), and WritePartialTuple().
|
private |
If false, there is a tuple that is partially materialized (i.e. partial_tuple_ contains data).
Definition at line 177 of file hdfs-text-scanner.h.
Referenced by FinishScanRange(), ResetScanner(), and WriteFields().
|
private |
Pointers into 'byte_buffer_' for the end ptr locations for each row processed in the current batch. Used to report row errors.
Definition at line 158 of file hdfs-text-scanner.h.
Referenced by FinishScanRange(), LogRowParseError(), Prepare(), and ProcessRange().
|
protected inherited |
The scan node that started this scanner.
Definition at line 141 of file hdfs-scanner.h.
Referenced by impala::HdfsScanner::AddFinalRowBatch(), impala::HdfsParquetScanner::AssembleRows(), impala::HdfsParquetScanner::BaseColumnReader::BaseColumnReader(), Close(), impala::BaseSequenceScanner::Close(), impala::HdfsParquetScanner::Close(), impala::BaseSequenceScanner::CloseFileRanges(), impala::HdfsScanner::CommitRows(), impala::HdfsParquetScanner::CreateColumnReaders(), impala::HdfsParquetScanner::CreateReader(), impala::HdfsRCFileScanner::DebugString(), impala::HdfsAvroScanner::DecodeAvroData(), FillByteBufferCompressedFile(), FinishScanRange(), impala::HdfsParquetScanner::InitColumns(), impala::HdfsScanner::InitializeWriteTuplesFn(), InitNewRange(), impala::HdfsAvroScanner::InitNewRange(), impala::HdfsSequenceScanner::InitNewRange(), impala::HdfsRCFileScanner::InitNewRange(), impala::HdfsAvroScanner::ParseMetadata(), Prepare(), impala::BaseSequenceScanner::Prepare(), impala::HdfsParquetScanner::Prepare(), impala::HdfsScanner::Prepare(), impala::HdfsSequenceScanner::Prepare(), impala::HdfsRCFileScanner::Prepare(), impala::HdfsSequenceScanner::ProcessBlockCompressedScanRange(), impala::HdfsSequenceScanner::ProcessDecompressedBlock(), impala::HdfsParquetScanner::ProcessFooter(), ProcessRange(), impala::HdfsAvroScanner::ProcessRange(), impala::HdfsSequenceScanner::ProcessRange(), impala::HdfsRCFileScanner::ProcessRange(), impala::BaseSequenceScanner::ProcessSplit(), impala::HdfsParquetScanner::BaseColumnReader::ReadDataPage(), impala::HdfsRCFileScanner::ReadRowGroup(), impala::HdfsScanner::ReportColumnParseError(), impala::HdfsScanner::ReportTupleParseError(), ResetScanner(), impala::HdfsAvroScanner::ResolveSchemas(), impala::HdfsScanner::StartNewRowBatch(), impala::HdfsScanner::UpdateDecompressor(), impala::HdfsAvroScanner::VerifyTypesMatch(), impala::HdfsScanner::WriteCompleteTuple(), impala::HdfsScanner::WriteEmptyTuples(), WriteFields(), and WritePartialTuple().
|
private |
Index into materialized_slots_ for the next slot to output for the current tuple.
Definition at line 148 of file hdfs-text-scanner.h.
Referenced by ResetScanner(), WriteFields(), and WritePartialTuple().
|
protected inherited |
RuntimeState for error reporting.
Definition at line 144 of file hdfs-scanner.h.
Referenced by impala::HdfsScanner::Close(), impala::HdfsScanner::CommitRows(), FillByteBufferGzip(), FinishScanRange(), Prepare(), impala::HdfsScanner::Prepare(), impala::HdfsSequenceScanner::Prepare(), impala::HdfsSequenceScanner::ProcessBlockCompressedScanRange(), impala::HdfsRCFileScanner::ProcessRange(), impala::BaseSequenceScanner::ProcessSplit(), impala::HdfsSequenceScanner::ReadCompressedBlock(), impala::BaseSequenceScanner::ReadPastSize(), impala::HdfsRCFileScanner::ReadRowGroup(), impala::HdfsScanner::ReportColumnParseError(), impala::HdfsScanner::ReportTupleParseError(), impala::HdfsAvroScanner::ResolveSchemas(), impala::HdfsScanner::StartNewRowBatch(), impala::HdfsParquetScanner::ValidateColumn(), and WriteFields().
|
protected inherited |
The first stream for context_.
Definition at line 150 of file hdfs-scanner.h.
Referenced by Close(), impala::BaseSequenceScanner::Close(), impala::HdfsParquetScanner::CreateColumnReaders(), impala::HdfsRCFileScanner::DebugString(), FillByteBuffer(), FillByteBufferCompressedFile(), FillByteBufferGzip(), FindFirstTuple(), FinishScanRange(), impala::HdfsSequenceScanner::GetRecord(), InitNewRange(), impala::HdfsRCFileScanner::InitNewRange(), impala::HdfsRCFileScanner::NextField(), impala::HdfsAvroScanner::ParseMetadata(), impala::BaseSequenceScanner::Prepare(), impala::HdfsScanner::Prepare(), impala::HdfsSequenceScanner::ProcessBlockCompressedScanRange(), impala::HdfsParquetScanner::ProcessFooter(), ProcessRange(), impala::HdfsAvroScanner::ProcessRange(), impala::HdfsSequenceScanner::ProcessRange(), impala::HdfsRCFileScanner::ProcessRange(), ProcessSplit(), impala::BaseSequenceScanner::ProcessSplit(), impala::HdfsParquetScanner::ProcessSplit(), impala::HdfsSequenceScanner::ReadBlockHeader(), impala::HdfsRCFileScanner::ReadColumnBuffers(), impala::HdfsSequenceScanner::ReadCompressedBlock(), impala::HdfsAvroScanner::ReadFileHeader(), impala::HdfsSequenceScanner::ReadFileHeader(), impala::HdfsRCFileScanner::ReadFileHeader(), impala::HdfsRCFileScanner::ReadKeyBuffers(), impala::HdfsRCFileScanner::ReadNumColumnsMetadata(), impala::HdfsRCFileScanner::ReadRowGroupHeader(), impala::BaseSequenceScanner::ReadSync(), impala::HdfsScanner::ReportTupleParseError(), impala::BaseSequenceScanner::SkipToSync(), impala::HdfsParquetScanner::ValidateFileMetadata(), impala::HdfsAvroScanner::VerifyTypesMatch(), and WriteFields().
|
protected inherited |
A partially materialized tuple with only partition key slots set. The non-partition key slots are set to NULL. The template tuple must be copied into tuple_ before any of the other slots are materialized. Pointer is NULL if there are no partition key slots. This template tuple is computed once for each file and valid for the duration of that file. It is owned by the HDFS scan node.
Definition at line 164 of file hdfs-scanner.h.
Referenced by impala::HdfsAvroScanner::AllocateFileHeader(), impala::HdfsParquetScanner::AssembleRows(), impala::HdfsParquetScanner::CreateColumnReaders(), impala::HdfsAvroScanner::DecodeAvroData(), impala::HdfsAvroScanner::InitNewRange(), impala::HdfsScanner::Prepare(), impala::HdfsSequenceScanner::ProcessRange(), impala::HdfsRCFileScanner::ProcessRange(), impala::HdfsAvroScanner::ResolveSchemas(), impala::HdfsScanner::WriteAlignedTuples(), impala::HdfsScanner::WriteEmptyTuples(), and WriteFields().
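The "copy the template first, then materialize the remaining slots" rule can be sketched with a tuple modelled as a flat byte buffer. The InitTuple() below is a hypothetical free function. Zeroing the whole tuple when there is no template is an assumption of this sketch, not a statement about what Impala's InitTuple() does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Prepares 'tuple' for materialization. With a template tuple, the partition
// key bytes are copied in; without one, the tuple is cleared (assumption).
void InitTuple(const std::uint8_t* template_tuple, std::uint8_t* tuple,
               std::size_t tuple_byte_size) {
  if (template_tuple != nullptr) {
    std::memcpy(tuple, template_tuple, tuple_byte_size);
  } else {
    std::memset(tuple, 0, tuple_byte_size);
  }
}
```

Because the template is copied rather than shared, writing the non-partition slots of one output tuple cannot disturb the template that the next tuple will start from.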
|
protected inherited |
Helper class for converting text to other types.
Definition at line 186 of file hdfs-scanner.h.
Referenced by InitNewRange(), impala::HdfsSequenceScanner::InitNewRange(), impala::HdfsRCFileScanner::Prepare(), impala::HdfsRCFileScanner::ProcessRange(), impala::HdfsScanner::WriteCompleteTuple(), and WritePartialTuple().
|
protected inherited |
Current tuple pointer into tuple_mem_.
Definition at line 170 of file hdfs-scanner.h.
Referenced by FinishScanRange(), impala::HdfsSequenceScanner::ProcessDecompressedBlock(), ProcessRange(), impala::HdfsSequenceScanner::ProcessRange(), impala::HdfsScanner::WriteAlignedTuples(), and WriteFields().
|
protected inherited |
Fixed size of each tuple, in bytes.
Definition at line 167 of file hdfs-scanner.h.
Referenced by impala::HdfsParquetScanner::AssembleRows(), impala::HdfsScanner::InitTuple(), impala::HdfsScanner::next_tuple(), impala::HdfsScanner::StartNewRowBatch(), and impala::HdfsScanner::WriteAlignedTuples().
|
protected inherited |
The tuple memory of batch_.
Definition at line 180 of file hdfs-scanner.h.
Referenced by impala::HdfsScanner::CommitRows(), impala::HdfsScanner::GetMemory(), and impala::HdfsScanner::StartNewRowBatch().
|
protected inherited |
Jitted write tuples function pointer. Null if codegen is disabled.
Definition at line 215 of file hdfs-scanner.h.
Referenced by impala::HdfsScanner::InitializeWriteTuplesFn(), impala::HdfsSequenceScanner::ProcessDecompressedBlock(), and WriteFields().