Impala
Impala is the open source, native analytic database for Apache Hadoop.
impala::HdfsRCFileScanner Class Reference

A scanner for reading RCFiles into tuples.

#include <hdfs-rcfile-scanner.h>


Classes

struct  ColumnInfo
 
struct  RcFileHeader
 Data that is fixed across headers. This struct is shared between scan ranges.
 

Public Member Functions

 HdfsRCFileScanner (HdfsScanNode *scan_node, RuntimeState *state)
 
virtual ~HdfsRCFileScanner ()
 
virtual Status Prepare (ScannerContext *context)
 One-time initialisation of state that is constant across scan ranges.
 
void DebugString (int indentation_level, std::stringstream *out) const
 
virtual void Close ()
 
virtual Status ProcessSplit ()
 

Static Public Member Functions

static Status IssueInitialRanges (HdfsScanNode *scan_node, const std::vector< HdfsFileDesc * > &files)
 Issue the initial ranges for all sequence container files.
 

Static Public Attributes

static const int FILE_BLOCK_SIZE = 4096
 
static const char * LLVM_CLASS_NAME = "class.impala::HdfsScanner"
 

Protected Types

typedef int(* WriteTuplesFn )(HdfsScanner *, MemPool *, TupleRow *, int, FieldLocation *, int, int, int, int)
 

Protected Member Functions

Status ReadSync ()
 
Status SkipToSync (const uint8_t *sync, int sync_size)
 
bool finished ()
 
Status InitializeWriteTuplesFn (HdfsPartitionDescriptor *partition, THdfsFileFormat::type type, const std::string &scanner_name)
 
void StartNewRowBatch ()
 Set batch_ to a new row batch and update tuple_mem_ accordingly.
 
int GetMemory (MemPool **pool, Tuple **tuple_mem, TupleRow **tuple_row_mem)
 
Status CommitRows (int num_rows)
 
void AddFinalRowBatch ()
 
void AttachPool (MemPool *pool, bool commit_batch)
 
bool IR_ALWAYS_INLINE EvalConjuncts (TupleRow *row)
 
int WriteEmptyTuples (RowBatch *row_batch, int num_tuples)
 
int WriteEmptyTuples (ScannerContext *context, TupleRow *tuple_row, int num_tuples)
 Write empty tuples and commit them to the context object.
 
int WriteAlignedTuples (MemPool *pool, TupleRow *tuple_row_mem, int row_size, FieldLocation *fields, int num_tuples, int max_added_tuples, int slots_per_tuple, int row_start_indx)
 
Status UpdateDecompressor (const THdfsCompression::type &compression)
 
Status UpdateDecompressor (const std::string &codec)
 
bool ReportTupleParseError (FieldLocation *fields, uint8_t *errors, int row_idx)
 
virtual void LogRowParseError (int row_idx, std::stringstream *)
 
bool WriteCompleteTuple (MemPool *pool, FieldLocation *fields, Tuple *tuple, TupleRow *tuple_row, Tuple *template_tuple, uint8_t *error_fields, uint8_t *error_in_row)
 
void ReportColumnParseError (const SlotDescriptor *desc, const char *data, int len)
 
void InitTuple (Tuple *template_tuple, Tuple *tuple)
 
Tuple * next_tuple (Tuple *t) const
 
TupleRow * next_row (TupleRow *r) const
 
ExprContext * GetConjunctCtx (int idx) const
 

Static Protected Member Functions

static llvm::Function * CodegenWriteCompleteTuple (HdfsScanNode *, LlvmCodeGen *, const std::vector< ExprContext * > &conjunct_ctxs)
 
static llvm::Function * CodegenWriteAlignedTuples (HdfsScanNode *, LlvmCodeGen *, llvm::Function *write_tuple_fn)
 

Protected Attributes

FileHeader * header_
 File header for this scan range. This is not owned by the parent scan node.
 
bool only_parsing_header_
 If true, this scanner object is only for processing the header.
 
HdfsScanNode * scan_node_
 The scan node that started this scanner.
 
RuntimeState * state_
 RuntimeState for error reporting.
 
ScannerContext * context_
 Context for this scanner.
 
ScannerContext::Stream * stream_
 The first stream for context_.
 
std::vector< ExprContext * > conjunct_ctxs_
 
Tuple * template_tuple_
 
int tuple_byte_size_
 Fixed size of each tuple, in bytes.
 
Tuple * tuple_
 Current tuple pointer into tuple_mem_.
 
RowBatch * batch_
 
uint8_t * tuple_mem_
 The tuple memory of batch_.
 
int num_errors_in_file_
 Number of errors in the current file.
 
boost::scoped_ptr< TextConverter > text_converter_
 Helper class for converting text to other types.
 
int32_t num_null_bytes_
 Number of null bytes in the tuple.
 
Status parse_status_
 
boost::scoped_ptr< Codec > decompressor_
 Decompressor class to use, if any.
 
THdfsCompression::type decompression_type_
 The most recently used decompression type.
 
boost::scoped_ptr< MemPool > data_buffer_pool_
 
RuntimeProfile::Counter * decompress_timer_
 Time spent decompressing bytes.
 
WriteTuplesFn write_tuples_fn_
 Jitted write tuples function pointer. Null if codegen is disabled.
 

Static Protected Attributes

static const int SYNC_HASH_SIZE = 16
 Size of the sync hash field.
 
static const int HEADER_SIZE = 1024
 
static const int SYNC_MARKER = -1
 Sync indicator.
 

Private Types

enum  Version { SEQ6, RCF1 }
 

Private Member Functions

virtual FileHeader * AllocateFileHeader ()
 Implementation of superclass functions.
 
virtual Status ReadFileHeader ()
 
virtual Status InitNewRange ()
 Reset internal state for a new scan range.
 
virtual Status ProcessRange ()
 
virtual THdfsFileFormat::type file_format () const
 Returns type of scanner: e.g. rcfile, seqfile.
 
Status ReadNumColumnsMetadata ()
 
Status ReadRowGroupHeader ()
 
Status ReadKeyBuffers ()
 
void GetCurrentKeyBuffer (int col_idx, bool skip_col_data, uint8_t **key_buf_ptr)
 
Status ReadColumnBuffers ()
 
Status NextField (int col_idx)
 
Status ReadRowGroup ()
 
void ResetRowGroup ()
 Reset state for a new row group.
 
Status NextRow ()
 

Private Attributes

std::vector< ColumnInfo > columns_
 
std::vector< uint8_t > key_buffer_
 Buffer for copying key buffers. This buffer is reused between row groups.
 
int num_rows_
 Number of rows in this row group.
 
int row_pos_
 
int key_length_
 
int compressed_key_length_
 
bool reuse_row_group_buffer_
 
uint8_t * row_group_buffer_
 
int row_group_length_
 
int row_group_buffer_size_
 

Static Private Attributes

static const char *const RCFILE_KEY_CLASS_NAME
 
static const char *const RCFILE_VALUE_CLASS_NAME
 
static const char *const RCFILE_METADATA_KEY_NUM_COLS
 
static const uint8_t RCFILE_VERSION_HEADER [4] = {'R', 'C', 'F', 1}
 

Detailed Description

A scanner for reading RCFiles into tuples.

Definition at line 231 of file hdfs-rcfile-scanner.h.

Member Typedef Documentation

typedef int(* impala::HdfsScanner::WriteTuplesFn)(HdfsScanner *, MemPool *, TupleRow *, int, FieldLocation *, int, int, int, int)
protected, inherited

Matching typedef for WriteAlignedTuples for codegen. Refer to comments for that function.

Definition at line 212 of file hdfs-scanner.h.
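
The typedef lets a JIT-compiled function be called through an ordinary function pointer, and write_tuples_fn_ is documented as null when codegen is disabled. The following is a minimal sketch of that dispatch pattern only. ToyWriteTuplesFn, InterpretedWriteTuples and WriteTuples are hypothetical names with a deliberately simplified signature; they are not part of Impala.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, simplified stand-in for WriteTuplesFn: the real typedef takes
// scanner, pool, row and field arguments that are omitted here.
typedef int (*ToyWriteTuplesFn)(int num_tuples);

// Interpreted fallback, used when no JIT-compiled function is available.
inline int InterpretedWriteTuples(int num_tuples) { return num_tuples; }

// Calls the jitted function if one was installed, otherwise the fallback.
inline int WriteTuples(ToyWriteTuplesFn jitted_fn, int num_tuples) {
  if (jitted_fn != nullptr) return jitted_fn(num_tuples);
  return InterpretedWriteTuples(num_tuples);
}
```

Keeping both paths behind one signature is what allows the generated function to be swapped in without changing any caller.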

Member Enumeration Documentation

enum impala::HdfsRCFileScanner::Version
private

Enumerator
SEQ6 
RCF1 

Definition at line 328 of file hdfs-rcfile-scanner.h.
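
As an illustration of how the two Version enumerators could relate to the bytes at the start of a file, the sketch below classifies a four-byte prefix. The bytes {'R', 'C', 'F', 1} come from RCFILE_VERSION_HEADER on this page; the bytes {'S', 'E', 'Q', 6} are an assumption inferred from the enumerator name SEQ6. ToyVersion and ClassifyHeader are hypothetical names, not Impala's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

enum class ToyVersion { SEQ6, RCF1, UNKNOWN };

// Classifies the first four bytes of a file.
// {'R','C','F',1} matches RCFILE_VERSION_HEADER from this page;
// {'S','E','Q',6} is an assumption based on the enumerator name SEQ6.
inline ToyVersion ClassifyHeader(const uint8_t* buf, std::size_t len) {
  static const uint8_t kRcf1[4] = {'R', 'C', 'F', 1};
  static const uint8_t kSeq6[4] = {'S', 'E', 'Q', 6};
  if (len < 4) return ToyVersion::UNKNOWN;
  if (std::memcmp(buf, kRcf1, 4) == 0) return ToyVersion::RCF1;
  if (std::memcmp(buf, kSeq6, 4) == 0) return ToyVersion::SEQ6;
  return ToyVersion::UNKNOWN;
}
```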

Constructor & Destructor Documentation

HdfsRCFileScanner::HdfsRCFileScanner ( HdfsScanNode * scan_node,
RuntimeState * state 
)

Definition at line 53 of file hdfs-rcfile-scanner.cc.

HdfsRCFileScanner::~HdfsRCFileScanner ( )
virtual

Definition at line 57 of file hdfs-rcfile-scanner.cc.

Member Function Documentation

void HdfsScanner::AddFinalRowBatch ( )
protected, inherited

Attach all remaining resources from context_ to batch_ and send batch_ to the scan node. This must be called after all rows have been committed and no further resources are needed from context_ (in practice this will happen in each scanner subclass's Close() implementation).

Definition at line 145 of file hdfs-scanner.cc.

References impala::HdfsScanNode::AddMaterializedRowBatch(), impala::HdfsScanner::batch_, impala::HdfsScanner::context_, impala::ScannerContext::ReleaseCompletedResources(), and impala::HdfsScanner::scan_node_.

Referenced by impala::HdfsTextScanner::Close(), impala::BaseSequenceScanner::Close(), and impala::HdfsParquetScanner::Close().

BaseSequenceScanner::FileHeader * HdfsRCFileScanner::AllocateFileHeader ( )
private, virtual

Implementation of superclass functions.

Implements impala::BaseSequenceScanner.

Definition at line 227 of file hdfs-rcfile-scanner.cc.

void impala::HdfsScanner::AttachPool ( MemPool * pool,
bool  commit_batch 
)
inline, protected, inherited

Release all memory in 'pool' to batch_. If commit_batch is true, the row batch will be committed. commit_batch should be true if the attached pool is expected to be non-trivial (e.g. a decompression buffer) to minimize scanner memory usage.

Definition at line 256 of file hdfs-scanner.h.

References impala::MemPool::AcquireData(), impala::HdfsScanner::batch_, impala::HdfsScanner::CommitRows(), and impala::RowBatch::tuple_data_pool().

Referenced by impala::HdfsTextScanner::Close(), impala::BaseSequenceScanner::Close(), impala::HdfsParquetScanner::Close(), impala::HdfsTextScanner::FillByteBufferGzip(), impala::HdfsAvroScanner::ProcessRange(), impala::HdfsSequenceScanner::ReadCompressedBlock(), impala::HdfsParquetScanner::BaseColumnReader::ReadDataPage(), and ResetRowGroup().

Function * HdfsScanner::CodegenWriteAlignedTuples ( HdfsScanNode * ,
LlvmCodeGen * ,
llvm::Function *  write_tuple_fn 
)
static, protected, inherited

Codegen function to replace WriteAlignedTuples. WriteAlignedTuples is cross compiled to IR. This function loads the precompiled IR function, modifies it and returns the resulting function.

Definition at line 495 of file hdfs-scanner.cc.

References impala::LlvmCodeGen::codegen_timer(), impala::LlvmCodeGen::FinalizeFunction(), impala::LlvmCodeGen::GetFunction(), impala::LlvmCodeGen::ReplaceCallSites(), and SCOPED_TIMER.

Referenced by impala::HdfsTextScanner::Codegen(), and impala::HdfsSequenceScanner::Codegen().

Function * HdfsScanner::CodegenWriteCompleteTuple ( HdfsScanNode * ,
LlvmCodeGen * ,
const std::vector< ExprContext * > &  conjunct_ctxs 
)
static, protected, inherited

Codegen function to replace WriteCompleteTuple. Should behave identically to WriteCompleteTuple.

Definition at line 296 of file hdfs-scanner.cc.

References impala::LlvmCodeGen::FnPrototype::AddArgument(), impala::TupleDescriptor::byte_size(), impala::LlvmCodeGen::codegen_timer(), impala::LlvmCodeGen::CodegenMemcpy(), impala::TextConverter::CodegenWriteSlot(), impala::HdfsScanNode::ComputeSlotMaterializationOrder(), impala::LlvmCodeGen::context(), impala::CodegenAnyVal::CreateCallWrapped(), impala::LlvmCodeGen::false_value(), impala::LlvmCodeGen::FinalizeFunction(), impala::TupleDescriptor::GenerateLlvmStruct(), impala::Status::GetDetail(), impala::LlvmCodeGen::GetFunction(), impala::LlvmCodeGen::GetIntConstant(), impala::LlvmCodeGen::GetType(), impala::CodegenAnyVal::GetVal(), impala::HdfsScanNode::hdfs_table(), impala::FieldLocation::LLVM_CLASS_NAME, impala::TupleRow::LLVM_CLASS_NAME, impala::Tuple::LLVM_CLASS_NAME, impala::HdfsScanner::LLVM_CLASS_NAME, impala::MemPool::LLVM_CLASS_NAME, impala::HdfsScanNode::materialized_slots(), impala::HdfsTableDescriptor::null_column_value(), impala::HdfsScanNode::num_materialized_partition_keys(), impala::TupleDescriptor::num_null_bytes(), impala::Status::ok(), impala::LlvmCodeGen::OptimizeFunctionWithExprs(), impala::HdfsScanNode::runtime_state(), SCOPED_TIMER, impala::LlvmCodeGen::true_value(), impala::HdfsScanNode::tuple_desc(), impala::HdfsScanNode::tuple_idx(), impala::ColumnType::type, impala::SlotDescriptor::type(), impala::TYPE_BOOLEAN, impala::TYPE_DECIMAL, impala::TYPE_INT, impala::TYPE_TIMESTAMP, and impala::TYPE_TINYINT.

Referenced by impala::HdfsTextScanner::Codegen(), and impala::HdfsSequenceScanner::Codegen().

Status HdfsScanner::CommitRows ( int  num_rows)
protected, inherited

Commit num_rows to the current row batch. If this completes, the row batch is enqueued with the scan node and StartNewRowBatch() is called. Returns Status::OK if the query is not cancelled and hasn't exceeded any mem limits. Scanner can call this with 0 rows to flush any pending resources (attached pools and io buffers) to minimize memory consumption.

Definition at line 124 of file hdfs-scanner.cc.

References impala::HdfsScanNode::AddMaterializedRowBatch(), impala::RowBatch::AtCapacity(), impala::HdfsScanner::batch_, impala::TupleDescriptor::byte_size(), impala::Status::CANCELLED, impala::ScannerContext::cancelled(), impala::RowBatch::capacity(), impala::RuntimeState::CheckQueryState(), impala::RowBatch::CommitRows(), impala::HdfsScanner::conjunct_ctxs_, impala::HdfsScanner::context_, impala::ExprContext::FreeLocalAllocations(), impala::ScannerContext::num_completed_io_buffers(), impala::RowBatch::num_rows(), impala::Status::OK, impala::ScannerContext::ReleaseCompletedResources(), RETURN_IF_ERROR, impala::HdfsScanner::scan_node_, impala::HdfsScanner::StartNewRowBatch(), impala::HdfsScanner::state_, impala::HdfsScanNode::tuple_desc(), and impala::HdfsScanner::tuple_mem_.

Referenced by impala::HdfsParquetScanner::AssembleRows(), impala::HdfsScanner::AttachPool(), impala::HdfsTextScanner::FinishScanRange(), impala::HdfsSequenceScanner::ProcessDecompressedBlock(), impala::HdfsParquetScanner::ProcessFooter(), impala::HdfsTextScanner::ProcessRange(), impala::HdfsAvroScanner::ProcessRange(), impala::HdfsSequenceScanner::ProcessRange(), ProcessRange(), and impala::HdfsParquetScanner::ProcessSplit().

void HdfsRCFileScanner::DebugString ( int  indentation_level,
std::stringstream *  out 
) const

bool IR_ALWAYS_INLINE impala::HdfsScanner::EvalConjuncts ( TupleRow * row)
inline, protected, inherited

Convenience function for evaluating conjuncts using this scanner's ExprContexts. This must always be inlined so we can correctly replace the call to ExecNode::EvalConjuncts() during codegen.

Definition at line 266 of file hdfs-scanner.h.

References impala::HdfsScanner::conjunct_ctxs_, and impala::ExecNode::EvalConjuncts().

Referenced by impala::HdfsParquetScanner::AssembleRows(), impala::HdfsAvroScanner::DecodeAvroData(), ProcessRange(), impala::HdfsScanner::WriteCompleteTuple(), impala::HdfsScanner::WriteEmptyTuples(), and impala::HdfsTextScanner::WriteFields().

virtual THdfsFileFormat::type impala::HdfsRCFileScanner::file_format ( ) const
inline, private, virtual

Returns type of scanner: e.g. rcfile, seqfile.

Implements impala::BaseSequenceScanner.

Definition at line 263 of file hdfs-rcfile-scanner.h.

ExprContext * HdfsScanner::GetConjunctCtx ( int  idx) const
protected, inherited

Simple wrapper around conjunct_ctxs_. Used in the codegen'd version of WriteCompleteTuple() because it's easier than writing IR to access conjunct_ctxs_.

Definition at line 79 of file hdfs-scanner-ir.cc.

References impala::HdfsScanner::conjunct_ctxs_, and gen_ir_descriptions::idx.

void HdfsRCFileScanner::GetCurrentKeyBuffer ( int  col_idx,
bool  skip_col_data,
uint8_t **  key_buf_ptr 
)
private

Process the current key buffer.

Inputs:
  • col_idx: column to process
  • skip_col_data: if true, just skip over the key data.

Input/Output:
  • key_buf_ptr: pointer to the buffered file data; this will be moved past the data for this column.

Sets: col_buf_len_, col_buf_uncompressed_len_, col_key_bufs_, col_bufs_off_

Definition at line 344 of file hdfs-rcfile-scanner.cc.

References impala::HdfsRCFileScanner::ColumnInfo::buffer_len, columns_, impala::ReadWriteUtil::GetVInt(), impala::HdfsRCFileScanner::ColumnInfo::key_buffer, row_group_length_, impala::HdfsRCFileScanner::ColumnInfo::start_offset, and impala::HdfsRCFileScanner::ColumnInfo::uncompressed_buffer_len.

Referenced by ReadKeyBuffers().

int HdfsScanner::GetMemory ( MemPool **  pool,
Tuple **  tuple_mem,
TupleRow **  tuple_row_mem 
)
protected, inherited

Gets memory for outputting tuples into batch_. *pool is the mem pool that should be used for memory allocated for those tuples. *tuple_mem should be the location to output tuples, and *tuple_row_mem for outputting tuple rows. Returns the maximum number of tuples/tuple rows that can be output (before the current row batch is complete and a new one is allocated). Memory returned from this call is invalidated after calling CommitRows, so callers must call GetMemory again after calling CommitRows.

Definition at line 115 of file hdfs-scanner.cc.

References impala::RowBatch::AddRow(), impala::HdfsScanner::batch_, impala::RowBatch::capacity(), impala::RowBatch::GetRow(), impala::RowBatch::num_rows(), impala::RowBatch::tuple_data_pool(), and impala::HdfsScanner::tuple_mem_.

Referenced by impala::HdfsParquetScanner::AssembleRows(), impala::HdfsTextScanner::FinishScanRange(), impala::HdfsSequenceScanner::ProcessDecompressedBlock(), impala::HdfsParquetScanner::ProcessFooter(), impala::HdfsTextScanner::ProcessRange(), impala::HdfsAvroScanner::ProcessRange(), impala::HdfsSequenceScanner::ProcessRange(), and ProcessRange().
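
The GetMemory/CommitRows contract described above (memory is invalidated by a commit, so it must be requested again) can be sketched with a toy model. ToyScanner and WriteRows are hypothetical and only track row counts; the real functions also hand out pool, tuple and row pointers and return a Status.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for the scanner's batch handling; not Impala's API.
struct ToyScanner {
  int capacity;                        // rows per batch
  int rows_in_batch = 0;               // rows committed to the current batch
  std::vector<int> finished_batches;   // sizes of batches already handed off

  explicit ToyScanner(int cap) : capacity(cap) {}

  // Returns how many more rows fit in the current batch.
  int GetMemory() const { return capacity - rows_in_batch; }

  // Commits rows; a full batch is handed off and a new one is started.
  void CommitRows(int num_rows) {
    rows_in_batch += num_rows;
    if (rows_in_batch == capacity) {
      finished_batches.push_back(rows_in_batch);
      rows_in_batch = 0;
    }
  }
};

// Writes total_rows rows, re-requesting memory after every commit.
inline void WriteRows(ToyScanner* scanner, int total_rows) {
  int remaining = total_rows;
  while (remaining > 0) {
    int max_rows = scanner->GetMemory();  // refreshed after each CommitRows
    int n = std::min(max_rows, remaining);
    scanner->CommitRows(n);
    remaining -= n;
  }
}
```

Writing 10 rows through a scanner with a capacity of 4 hands off two full batches and leaves 2 rows in the current one.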

Status HdfsScanner::InitializeWriteTuplesFn ( HdfsPartitionDescriptor * partition,
THdfsFileFormat::type  type,
const std::string &  scanner_name 
)
protected, inherited

void impala::HdfsScanner::InitTuple ( Tuple * template_tuple,
Tuple * tuple 
)
inline, protected, inherited

Initialize a tuple. TODO: only copy over non-null slots. TODO: InitTuple is called frequently, avoid the if, perhaps via templatization.

Definition at line 355 of file hdfs-scanner.h.

References impala::HdfsScanner::num_null_bytes_, and impala::HdfsScanner::tuple_byte_size_.

Referenced by impala::HdfsParquetScanner::AssembleRows(), impala::HdfsAvroScanner::DecodeAvroData(), ProcessRange(), impala::HdfsScanner::WriteCompleteTuple(), and impala::HdfsTextScanner::WriteFields().

Status BaseSequenceScanner::IssueInitialRanges ( HdfsScanNode * scan_node,
const std::vector< HdfsFileDesc * > &  files 
)
static, inherited

void HdfsScanner::LogRowParseError ( int  row_idx,
std::stringstream *   
)
protected, virtual, inherited

Utility function to append an error message for an invalid row. This is called from ReportTupleParseError(). row_idx is the index of the row in the current batch. Subclasses should override this function (e.g. text needs to join boundary rows). Since this is only in the error path, vtable overhead is acceptable.

Reimplemented in impala::HdfsSequenceScanner, and impala::HdfsTextScanner.

Definition at line 572 of file hdfs-scanner.cc.

Referenced by impala::HdfsScanner::ReportTupleParseError().

Tuple* impala::HdfsScanner::next_tuple ( Tuple * t) const
inline, protected, inherited

Status HdfsRCFileScanner::NextField ( int  col_idx)
inline, private

Look at the next field in the specified column buffer.

Input:
  • col_idx: column of the field.

Modifies: cur_field_length_rep_[col_idx], key_buf_pos_[col_idx], cur_field_length_[col_idx]

Definition at line 368 of file hdfs-rcfile-scanner.cc.

References impala::HdfsRCFileScanner::ColumnInfo::buffer_pos, columns_, impala::HdfsRCFileScanner::ColumnInfo::current_field_len, impala::HdfsRCFileScanner::ColumnInfo::current_field_len_rep, impala::ScannerContext::Stream::file_offset(), impala::ReadWriteUtil::GetVLong(), impala::HdfsRCFileScanner::ColumnInfo::key_buffer, impala::HdfsRCFileScanner::ColumnInfo::key_buffer_pos, impala::Status::OK, and impala::HdfsScanner::stream_.

Referenced by NextRow().
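
The members current_field_len and current_field_len_rep suggest that field lengths in the key buffer are run-length encoded. The sketch below expands such a sequence under an assumption taken from the Hive RCFile format rather than from this page: a negative entry n means "repeat the previous length ~n more times". ExpandFieldLengths is a hypothetical helper and does not reproduce NextField().

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Expands run-length-encoded field lengths.
// Assumption (Hive RCFile format, not stated on this page): a negative
// entry n means "repeat the previous length ~n more times".
inline std::vector<int64_t> ExpandFieldLengths(
    const std::vector<int64_t>& encoded) {
  std::vector<int64_t> lengths;
  int64_t current_len = 0;
  for (int64_t v : encoded) {
    if (v < 0) {
      // ~v is the number of additional repetitions of the previous length.
      for (int64_t rep = ~v; rep > 0; --rep) lengths.push_back(current_len);
    } else {
      current_len = v;
      lengths.push_back(current_len);
    }
  }
  return lengths;
}
```

Under that assumption, the encoded sequence 1, ~2, 2 expands to the lengths 1, 1, 1, 2.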

Status HdfsRCFileScanner::NextRow ( )
inline, private

Move to next row. Calls NextField on each column that we are reading. Modifies: row_pos_

Definition at line 400 of file hdfs-rcfile-scanner.cc.

References columns_, NextField(), num_rows_, impala::Status::OK, RETURN_IF_ERROR, and row_pos_.

Referenced by ProcessRange().

Status HdfsRCFileScanner::ProcessRange ( )
private, virtual

Process the current range until the end or until an error occurs. Note this might be called multiple times if we skip over bad data. This function should read from the underlying ScannerContext, materializing tuples to the context. When this function is called, it is guaranteed to be at the start of a data block (i.e. right after the sync marker).

Implements impala::BaseSequenceScanner.

Definition at line 451 of file hdfs-rcfile-scanner.cc.

References impala::RuntimeState::abort_on_error(), impala::HdfsRCFileScanner::ColumnInfo::buffer_pos, impala::SlotDescriptor::col_pos(), columns_, impala::HdfsScanner::CommitRows(), impala::HdfsScanner::context_, COUNTER_ADD, impala::HdfsRCFileScanner::ColumnInfo::current_field_len, impala::ScannerContext::Stream::eof(), impala::RuntimeState::ErrorLog(), impala::HdfsScanner::EvalConjuncts(), impala::ScannerContext::Stream::filename(), impala::BaseSequenceScanner::finished(), impala::HdfsScanner::GetMemory(), impala::HdfsScanner::InitTuple(), impala::RuntimeState::LogError(), impala::RuntimeState::LogHasSpace(), impala::HdfsRCFileScanner::ColumnInfo::materialize_column, impala::ScanNode::materialize_tuple_timer(), impala::HdfsScanNode::materialized_slots(), impala::HdfsScanner::next_row(), impala::HdfsScanner::next_tuple(), NextRow(), impala::SlotDescriptor::null_indicator_offset(), impala::HdfsScanNode::num_partition_keys(), num_rows_, impala::Status::OK, impala::HdfsScanner::parse_status_, pool, impala::ExecNode::ReachedLimit(), impala::ScannerContext::Stream::ReadInt(), ReadRowGroup(), impala::BaseSequenceScanner::ReadSync(), impala::HdfsScanner::ReportColumnParseError(), impala::RuntimeState::ReportFileErrors(), ResetRowGroup(), RETURN_IF_ERROR, RETURN_IF_FALSE, row_group_buffer_, row_group_length_, row_pos_, impala::ScanNode::rows_read_counter(), impala::HdfsScanner::scan_node_, SCOPED_TIMER, impala::Tuple::SetNull(), impala::TupleRow::SetTuple(), impala::HdfsRCFileScanner::ColumnInfo::start_offset, impala::HdfsScanner::state_, impala::HdfsScanner::stream_, impala::BaseSequenceScanner::SYNC_MARKER, impala::HdfsScanner::template_tuple_, impala::HdfsScanner::text_converter_, impala::HdfsScanNode::tuple_idx(), and impala::HdfsScanner::WriteEmptyTuples().

Status BaseSequenceScanner::ProcessSplit ( )
virtual, inherited

Process an entire split, reading bytes from the context's streams. Context is initialized with the split data (e.g. template tuple, partition descriptor, etc). This function should only return on error or end of scan range.

Implements impala::HdfsScanner.

Definition at line 100 of file base-sequence-scanner.cc.

References impala::RuntimeState::abort_on_error(), impala::ObjectPool::Add(), impala::HdfsScanNode::AddDiskIoRanges(), impala::BaseSequenceScanner::AllocateFileHeader(), impala::BaseSequenceScanner::bytes_skipped_counter_, impala::BaseSequenceScanner::CloseFileRanges(), COUNTER_ADD, impala::ScannerContext::Stream::eof(), impala::ScannerContext::Stream::file_offset(), impala::ScannerContext::Stream::filename(), impala::BaseSequenceScanner::finished_, impala::HdfsScanNode::GetFileDesc(), impala::HdfsScanNode::GetFileMetadata(), impala::BaseSequenceScanner::header_, impala::BaseSequenceScanner::FileHeader::header_size, impala::HdfsScanner::InitNewRange(), impala::BaseSequenceScanner::FileHeader::is_compressed, impala::Status::IsCancelled(), impala::Status::IsMemLimitExceeded(), impala::RuntimeState::LogError(), impala::Status::msg(), impala::RuntimeState::obj_pool(), impala::Status::OK, impala::Status::ok(), impala::BaseSequenceScanner::only_parsing_header_, impala::HdfsScanner::parse_status_, impala::BaseSequenceScanner::ProcessRange(), impala::BaseSequenceScanner::ReadFileHeader(), RETURN_IF_ERROR, RETURN_IF_FALSE, impala::HdfsScanner::scan_node_, impala::ScannerContext::Stream::set_contains_tuple_data(), impala::HdfsScanNode::SetFileMetadata(), impala::ScannerContext::Stream::SkipBytes(), impala::BaseSequenceScanner::SkipToSync(), impala::HdfsScanner::state_, impala::HdfsScanner::stream_, impala::BaseSequenceScanner::FileHeader::sync, and impala::BaseSequenceScanner::SYNC_HASH_SIZE.
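
ProcessSplit() references SkipToSync() and SYNC_HASH_SIZE, which points to a resynchronisation step: scan forward until the file's sync hash is found, then resume right after it. A minimal sketch of that search over an in-memory buffer follows. OffsetAfterSync is a hypothetical helper; the real SkipToSync() reads from a stream and may perform additional checks that are not shown.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Returns the offset just past the first occurrence of the sync hash in
// [buf, buf + len), or -1 if the hash does not occur. Hypothetical helper.
inline std::ptrdiff_t OffsetAfterSync(const uint8_t* buf, std::size_t len,
                                      const uint8_t* sync,
                                      std::size_t sync_size) {
  const uint8_t* end = buf + len;
  const uint8_t* hit = std::search(buf, end, sync, sync + sync_size);
  if (hit == end) return -1;
  return (hit - buf) + static_cast<std::ptrdiff_t>(sync_size);
}
```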

Status HdfsRCFileScanner::ReadKeyBuffers ( )
private

Read the rowgroup key buffers, decompress if necessary. The "keys" are really the lengths for the column values. They are read here and then used to decode the values in the column buffer. Calls GetCurrentKeyBuffer for each column to process the key data.

Definition at line 308 of file hdfs-rcfile-scanner.cc.

References columns_, compressed_key_length_, impala::HdfsScanner::decompress_timer_, impala::HdfsScanner::decompressor_, GetCurrentKeyBuffer(), impala::ReadWriteUtil::GetVInt(), impala::BaseSequenceScanner::header_, impala::BaseSequenceScanner::FileHeader::is_compressed, key_buffer_, key_length_, num_rows_, impala::Status::OK, impala::HdfsScanner::parse_status_, impala::ScannerContext::Stream::ReadBytes(), RETURN_IF_ERROR, RETURN_IF_FALSE, row_group_length_, SCOPED_TIMER, impala::HdfsScanner::stream_, and VLOG_FILE.

Referenced by ReadRowGroup().
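
Because the "keys" are lengths stored as variable-length integers, decoding them is the core of reading a key buffer. The sketch below decodes one integer in the Hadoop zero-compressed format (the encoding written by Hadoop's WritableUtils). That ReadWriteUtil::GetVInt() uses exactly this encoding is an assumption, and DecodeVLong is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Decodes one Hadoop-style zero-compressed integer starting at buf and
// stores it in *value. Returns the number of bytes consumed. The caller must
// ensure the buffer holds the full encoding (at most 9 bytes).
inline int DecodeVLong(const uint8_t* buf, int64_t* value) {
  int8_t first = static_cast<int8_t>(buf[0]);
  if (first >= -112) {  // single-byte encoding holds the value itself
    *value = first;
    return 1;
  }
  // Total encoded size, including the first byte.
  int size = (first < -120) ? (-119 - first) : (-111 - first);
  uint64_t bits = 0;
  for (int i = 1; i < size; ++i) bits = (bits << 8) | buf[i];
  // First bytes below -120 mark a negative value stored as its complement.
  *value = (first < -120) ? ~static_cast<int64_t>(bits)
                          : static_cast<int64_t>(bits);
  return size;
}
```

Small values therefore cost one byte, which suits key buffers that mostly hold short field lengths.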

Status HdfsRCFileScanner::ReadRowGroupHeader ( )
private

Reads the rowgroup header starting after the sync. Sets: key_length_, compressed_key_length_, num_rows_

Definition at line 278 of file hdfs-rcfile-scanner.cc.

References compressed_key_length_, impala::ScannerContext::Stream::file_offset(), key_length_, impala::Status::OK, impala::HdfsScanner::parse_status_, impala::ScannerContext::Stream::ReadInt(), RETURN_IF_FALSE, and impala::HdfsScanner::stream_.

Referenced by ReadRowGroup().

void HdfsScanner::ReportColumnParseError ( const SlotDescriptor * desc,
const char *  data,
int  len 
)
protected, inherited

bool HdfsScanner::ReportTupleParseError ( FieldLocation * fields,
uint8_t *  errors,
int  row_idx 
)
protected, inherited

Utility function to report parse errors for each field. If errors[i] is nonzero, fields[i] had a parse error. row_idx is the index of the row in the current batch that had the parse error. Returns false if parsing should be aborted; in this case parse_status_ is set to the error. This is called from WriteAlignedTuples.

Definition at line 546 of file hdfs-scanner.cc.

References impala::RuntimeState::abort_on_error(), impala::ScannerContext::Stream::filename(), impala::RuntimeState::LogError(), impala::RuntimeState::LogHasSpace(), impala::HdfsScanner::LogRowParseError(), impala::HdfsScanNode::materialized_slots(), impala::HdfsScanner::num_errors_in_file_, impala::Status::ok(), impala::HdfsScanner::parse_status_, impala::HdfsScanner::ReportColumnParseError(), impala::RuntimeState::ReportFileErrors(), impala::HdfsScanner::scan_node_, impala::HdfsScanner::state_, and impala::HdfsScanner::stream_.

Referenced by impala::HdfsSequenceScanner::ProcessRange(), and impala::HdfsScanner::WriteAlignedTuples().

void HdfsRCFileScanner::ResetRowGroup ( )
private

Status BaseSequenceScanner::SkipToSync ( const uint8_t *  sync,
int  sync_size 
)
protected, inherited

Status HdfsScanner::UpdateDecompressor ( const THdfsCompression::type &  compression)
protected, inherited

Update the decompressor_ object given a compression type or codec name. Depending on the old compression type and the new one, it may close the old decompressor and/or create a new one of different type.

Definition at line 513 of file hdfs-scanner.cc.

References impala::Codec::CreateDecompressor(), impala::HdfsScanner::data_buffer_pool_, impala::HdfsScanner::decompression_type_, impala::HdfsScanner::decompressor_, impala::Status::OK, RETURN_IF_ERROR, impala::HdfsScanner::scan_node_, impala::TupleDescriptor::string_slots(), and impala::HdfsScanNode::tuple_desc().

Referenced by impala::HdfsAvroScanner::InitNewRange(), impala::HdfsSequenceScanner::InitNewRange(), and impala::HdfsTextScanner::ProcessSplit().
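
The behaviour described for UpdateDecompressor() (keep the codec when the type is unchanged, otherwise drop the old one and possibly create a new one) can be sketched as follows. ToyCompression, ToyCodec and ToyDecompressorState are hypothetical; the real function uses Codec::CreateDecompressor() and returns a Status.

```cpp
#include <cassert>
#include <memory>

enum class ToyCompression { NONE, GZIP, SNAPPY };

struct ToyCodec {
  ToyCompression type;
  explicit ToyCodec(ToyCompression t) : type(t) {}
};

// Hypothetical holder mirroring decompressor_ and decompression_type_.
struct ToyDecompressorState {
  std::unique_ptr<ToyCodec> decompressor;
  ToyCompression type = ToyCompression::NONE;
  int num_created = 0;  // counts codec constructions, for illustration only

  // Reuses the current codec when the type is unchanged; otherwise drops the
  // old codec and, unless the new type is NONE, creates a fresh one.
  void Update(ToyCompression new_type) {
    if (new_type == type) return;
    decompressor.reset();
    if (new_type != ToyCompression::NONE) {
      decompressor.reset(new ToyCodec(new_type));
      ++num_created;
    }
    type = new_type;
  }
};
```

Tracking the most recently used type avoids rebuilding a codec for every scan range that shares the same compression.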

Status impala::HdfsScanner::UpdateDecompressor ( const std::string &  codec)
protected, inherited

int HdfsScanner::WriteAlignedTuples ( MemPool * pool,
TupleRow * tuple_row_mem,
int  row_size,
FieldLocation * fields,
int  num_tuples,
int  max_added_tuples,
int  slots_per_tuple,
int  row_start_indx 
)
protected, inherited

Processes batches of fields and writes them out to tuple_row_mem.

  • 'pool' mempool to allocate from for auxiliary tuple memory
  • 'tuple_row_mem' preallocated tuple_row memory this function must use.
  • 'fields' must start at the beginning of a tuple.
  • 'num_tuples' number of tuples to process
  • 'max_added_tuples' the maximum number of tuples that should be added to the batch.
  • 'row_start_indx' is the number of rows that have already been processed as part of WritePartialTuple.

Returns the number of tuples added to the row batch. This can be less than num_tuples/tuples_till_limit because of failed conjuncts. Returns -1 if parsing should be aborted due to parse errors.

Definition at line 33 of file hdfs-scanner-ir.cc.

References impala::HdfsScanner::ReportTupleParseError(), impala::HdfsScanner::template_tuple_, impala::HdfsScanner::tuple_, impala::HdfsScanner::tuple_byte_size_, UNLIKELY, and impala::HdfsScanner::WriteCompleteTuple().

Referenced by impala::HdfsSequenceScanner::ProcessDecompressedBlock(), and impala::HdfsTextScanner::WriteFields().

bool HdfsScanner::WriteCompleteTuple ( MemPool * pool,
FieldLocation * fields,
Tuple * tuple,
TupleRow * tuple_row,
Tuple * template_tuple,
uint8_t *  error_fields,
uint8_t *  error_in_row 
)
protected, inherited

Writes out all slots for 'tuple' from 'fields'. 'fields' must be aligned to the start of the tuple (e.g. fields[0] maps to slots[0]). After writing the tuple, it will be evaluated against the conjuncts.

  • error_fields is an out array. error_fields[i] will be set to true if the ith field had a parse error.
  • error_in_row is an out bool. It is set to true if any field had parse errors.

Returns whether the resulting tuplerow passed the conjuncts.

The parsing of the fields and evaluating against conjuncts is combined in this function. This is done so that conjuncts can be evaluated as slots are materialized (on partial tuples).

This function is replaced by a codegen'd function at runtime. This is the reason that the out error parameters are typed uint8_t instead of bool: the codegen'd function must match this function's signature exactly. Bools as out parameters can get converted to bytes by the compiler, and rather than implicitly depending on that to happen, they are explicitly typed as bytes. TODO: revisit this

Definition at line 217 of file hdfs-scanner.cc.

References impala::HdfsScanner::EvalConjuncts(), impala::HdfsScanner::InitTuple(), impala::FieldLocation::len, impala::HdfsScanNode::materialized_slots(), impala::HdfsScanner::scan_node_, impala::TupleRow::SetTuple(), impala::HdfsScanner::text_converter_, impala::HdfsScanNode::tuple_idx(), and UNLIKELY.

Referenced by impala::HdfsSequenceScanner::ProcessRange(), and impala::HdfsScanner::WriteAlignedTuples().

int HdfsScanner::WriteEmptyTuples ( ScannerContext * context,
TupleRow * tuple_row,
int  num_tuples 
)
protected, inherited

Member Data Documentation

RowBatch* impala::HdfsScanner::batch_
protected, inherited

The current row batch being populated. Creating new row batches, attaching context resources, and handing off to the scan node is handled by this class in CommitRows(), but AttachPool() must be called by scanner subclasses to attach any memory allocated by that subclass. All row batches created by this class are transferred to the scan node (i.e., all batches are ultimately owned by the scan node).

Definition at line 177 of file hdfs-scanner.h.

Referenced by impala::HdfsScanner::AddFinalRowBatch(), impala::HdfsScanner::AttachPool(), impala::HdfsScanner::CommitRows(), impala::HdfsScanner::GetMemory(), impala::HdfsScanner::next_row(), impala::HdfsSequenceScanner::ProcessDecompressedBlock(), impala::HdfsParquetScanner::ProcessSplit(), impala::HdfsScanner::StartNewRowBatch(), impala::HdfsTextScanner::WriteFields(), and impala::HdfsScanner::~HdfsScanner().

std::vector<ColumnInfo> impala::HdfsRCFileScanner::columns_
private

Vector of column descriptions for each column in the file (i.e., may contain a different number of non-partition columns than are in the table metadata). Indexed by column index, including non-materialized columns.

Definition at line 376 of file hdfs-rcfile-scanner.h.

Referenced by GetCurrentKeyBuffer(), InitNewRange(), NextField(), NextRow(), ProcessRange(), ReadColumnBuffers(), ReadKeyBuffers(), and ResetRowGroup().

int impala::HdfsRCFileScanner::compressed_key_length_
private

Compressed size of the row group's key buffers. Read from the row group header.

Definition at line 394 of file hdfs-rcfile-scanner.h.

Referenced by ReadKeyBuffers(), ReadRowGroupHeader(), and ResetRowGroup().

std::vector<ExprContext*> impala::HdfsScanner::conjunct_ctxs_
protected, inherited

ExprContext for each conjunct. Each scanner has its own ExprContexts so the conjuncts can be safely evaluated in parallel.

Definition at line 154 of file hdfs-scanner.h.

Referenced by impala::HdfsScanner::Close(), impala::HdfsScanner::CommitRows(), impala::HdfsScanner::EvalConjuncts(), impala::HdfsScanner::GetConjunctCtx(), and impala::HdfsScanner::Prepare().

boost::scoped_ptr<MemPool> impala::HdfsScanner::data_buffer_pool_
protected, inherited

THdfsCompression::type impala::HdfsScanner::decompression_type_
protected, inherited

const int impala::HdfsScanner::FILE_BLOCK_SIZE = 4096
static, inherited

Assumed size of an OS file block. Used mostly when reading file format headers, etc. This should probably be derived from the environment.

Definition at line 95 of file hdfs-scanner.h.

const int BaseSequenceScanner::HEADER_SIZE = 1024
static, protected, inherited

Estimate of header size in bytes. This is the initial number of bytes to issue per file. If the estimate is too low, more bytes will be read as necessary.
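The estimate-then-extend behaviour described above can be sketched roughly as follows. This is a minimal illustration, not Impala's I/O path: `FakeFile`, `ReadHeader`, and `kHeaderSizeEstimate` are hypothetical names introduced here, and the real scanner issues ranges through the I/O manager rather than reading synchronously.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical in-memory stand-in for a file; not part of Impala.
struct FakeFile {
  std::vector<uint8_t> bytes;

  // Appends up to 'len' bytes starting at 'offset' to 'out'.
  // Returns the number of bytes appended (0 at end of file).
  size_t Read(size_t offset, size_t len, std::vector<uint8_t>* out) const {
    if (offset >= bytes.size()) return 0;
    size_t n = std::min(len, bytes.size() - offset);
    out->insert(out->end(), bytes.begin() + offset, bytes.begin() + offset + n);
    return n;
  }
};

const size_t kHeaderSizeEstimate = 1024;  // mirrors HEADER_SIZE

// Reads the initial estimate first, then keeps reading until 'header_len'
// bytes are buffered or the file is exhausted.
// Returns true if the whole header is available in 'buf'.
bool ReadHeader(const FakeFile& file, size_t header_len,
                std::vector<uint8_t>* buf) {
  buf->clear();
  size_t want = kHeaderSizeEstimate;
  while (true) {
    size_t got = file.Read(buf->size(), want, buf);
    if (buf->size() >= header_len) return true;
    if (got == 0) return false;  // file ended before the header did
    want = header_len - buf->size();
  }
}
```

A header shorter than the estimate costs one read; a longer one costs one extra read for exactly the missing bytes.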

Definition at line 121 of file base-sequence-scanner.h.

Referenced by impala::BaseSequenceScanner::IssueInitialRanges().

std::vector<uint8_t> impala::HdfsRCFileScanner::key_buffer_
private

Buffer for copying key buffers. This buffer is reused between row groups.

Definition at line 379 of file hdfs-rcfile-scanner.h.

Referenced by ReadKeyBuffers().

int impala::HdfsRCFileScanner::key_length_
private

Size of the row group's key buffers. Read from the row group header.

Definition at line 390 of file hdfs-rcfile-scanner.h.

Referenced by ReadKeyBuffers(), ReadRowGroupHeader(), and ResetRowGroup().

const char * HdfsScanner::LLVM_CLASS_NAME = "class.impala::HdfsScanner"
static, inherited

Scanner subclasses must implement these static functions as well. Unfortunately, C++ does not allow static virtual functions.

Issue the initial ranges for 'files'. HdfsFileDesc groups all the splits assigned to this scan node by file. This is called before any of the scanner subclasses are created to process splits in 'files'. The strategy for parsing the scan ranges depends on the file format.

  • For simple text files, all the splits are simply issued to the io mgr and one split == one scan range.
  • For formats with a header, the metadata is first parsed, and then the ranges are issued to the io mgr. There is one scan range for the header and one range for each split.
  • For columnar formats, the header is parsed and only the relevant byte ranges should be issued to the io mgr. This is one range for the metadata and one range for each column, for each split.

This function is how scanners can pick their strategy.
void IssueInitialRanges(HdfsScanNode* scan_node, const std::vector<HdfsFileDesc*>& files);

Codegen all functions for this scanner. The codegen'd function is specific to the scanner subclass but not specific to each scanner object. We don't want to codegen the functions for each scanner object.
llvm::Function* Codegen(HdfsScanNode* scan_node);
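Because static functions cannot be virtual, the per-format strategy has to be selected explicitly. The sketch below illustrates the three strategies listed above with a plain switch. All types here (`FileFormat`, `Split`, `FileDesc`, `Range`, and the three strategy structs) are hypothetical simplifications introduced for illustration. They are not Impala's HdfsScanNode or HdfsFileDesc, and for simplicity the sketch counts every range up front rather than parsing the header first.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified types; Impala's real classes differ.
enum class FileFormat { TEXT, WITH_HEADER, COLUMNAR };
struct Split { long offset; long length; };
struct FileDesc { std::string name; std::vector<Split> splits; int num_columns; };
struct Range { std::string file; std::string kind; };

struct TextStrategy {
  // One split == one scan range.
  static void IssueInitialRanges(const std::vector<FileDesc>& files,
                                 std::vector<Range>* out) {
    for (const FileDesc& f : files)
      for (size_t i = 0; i < f.splits.size(); ++i)
        out->push_back(Range{f.name, "split"});
  }
};

struct HeaderStrategy {
  // One range for the header plus one range per split.
  static void IssueInitialRanges(const std::vector<FileDesc>& files,
                                 std::vector<Range>* out) {
    for (const FileDesc& f : files) {
      out->push_back(Range{f.name, "header"});
      for (size_t i = 0; i < f.splits.size(); ++i)
        out->push_back(Range{f.name, "split"});
    }
  }
};

struct ColumnarStrategy {
  // For each split: one metadata range plus one range per column.
  static void IssueInitialRanges(const std::vector<FileDesc>& files,
                                 std::vector<Range>* out) {
    for (const FileDesc& f : files)
      for (size_t i = 0; i < f.splits.size(); ++i) {
        out->push_back(Range{f.name, "metadata"});
        for (int c = 0; c < f.num_columns; ++c)
          out->push_back(Range{f.name, "column"});
      }
  }
};

// Explicit dispatch stands in for the missing "static virtual" call.
void IssueInitialRanges(FileFormat format, const std::vector<FileDesc>& files,
                        std::vector<Range>* out) {
  switch (format) {
    case FileFormat::TEXT: TextStrategy::IssueInitialRanges(files, out); break;
    case FileFormat::WITH_HEADER: HeaderStrategy::IssueInitialRanges(files, out); break;
    case FileFormat::COLUMNAR: ColumnarStrategy::IssueInitialRanges(files, out); break;
  }
}
```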

Definition at line 137 of file hdfs-scanner.h.

Referenced by impala::HdfsScanner::CodegenWriteCompleteTuple().

int impala::HdfsScanner::num_errors_in_file_
protected, inherited

Number of errors in the current file.

Definition at line 183 of file hdfs-scanner.h.

Referenced by impala::HdfsScanner::ReportTupleParseError().

int32_t impala::HdfsScanner::num_null_bytes_
protected, inherited

Number of null bytes in the tuple.

Definition at line 189 of file hdfs-scanner.h.

Referenced by impala::HdfsScanner::InitTuple().

int impala::HdfsRCFileScanner::num_rows_
private

Number of rows in this row group.

Definition at line 382 of file hdfs-rcfile-scanner.h.

Referenced by NextRow(), ProcessRange(), ReadKeyBuffers(), ReadRowGroup(), and ResetRowGroup().

bool impala::BaseSequenceScanner::only_parsing_header_
protected, inherited

const char *const HdfsRCFileScanner::RCFILE_KEY_CLASS_NAME
static, private
Initial value:
= "org.apache.hadoop.hive.ql.io.RCFile$KeyBuffer"

The key class name located in the RCFile header. This is always "org.apache.hadoop.hive.ql.io.RCFile$KeyBuffer".

Definition at line 243 of file hdfs-rcfile-scanner.h.

Referenced by ReadFileHeader().

const char *const HdfsRCFileScanner::RCFILE_METADATA_KEY_NUM_COLS
static, private
Initial value:
= "hive.io.rcfile.column.number"

RCFile metadata key for determining the number of columns present in the RCFile: "hive.io.rcfile.column.number".

Definition at line 251 of file hdfs-rcfile-scanner.h.

Referenced by ReadNumColumnsMetadata().

const char *const HdfsRCFileScanner::RCFILE_VALUE_CLASS_NAME
static, private
Initial value:
= "org.apache.hadoop.hive.ql.io.RCFile$ValueBuffer"

The value class name located in the RCFile header. This is always "org.apache.hadoop.hive.ql.io.RCFile$ValueBuffer".

Definition at line 247 of file hdfs-rcfile-scanner.h.

Referenced by ReadFileHeader().

const uint8_t HdfsRCFileScanner::RCFILE_VERSION_HEADER = {'R', 'C', 'F', 1}
static, private

The four-byte RCFile version header present at the beginning of the file: {'R', 'C', 'F', 1}.
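A check against this four-byte header might look like the sketch below. The names `kRcFileVersionHeader` and `HasRcFileVersionHeader` are hypothetical and chosen for this example. The actual validation lives in ReadFileHeader() and reads from the scanner's stream rather than a raw buffer.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Mirrors RCFILE_VERSION_HEADER; declared locally for this sketch.
static const uint8_t kRcFileVersionHeader[4] = {'R', 'C', 'F', 1};

// Returns true if 'data' begins with the RCFile version header.
// Buffers shorter than the header never match.
bool HasRcFileVersionHeader(const uint8_t* data, size_t len) {
  if (len < sizeof(kRcFileVersionHeader)) return false;
  return std::memcmp(data, kRcFileVersionHeader,
                     sizeof(kRcFileVersionHeader)) == 0;
}
```

Note that the fourth byte is the numeric value 1, not the character '1'.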

Definition at line 255 of file hdfs-rcfile-scanner.h.

Referenced by ReadFileHeader().

bool impala::HdfsRCFileScanner::reuse_row_group_buffer_
private

If true, row_group_buffer_ can be reused across row groups. Otherwise it (more specifically, the data_buffer_pool_ that allocated row_group_buffer_) must be attached to the row batch.

Definition at line 399 of file hdfs-rcfile-scanner.h.

Referenced by InitNewRange(), ReadRowGroup(), and ResetRowGroup().

uint8_t* impala::HdfsRCFileScanner::row_group_buffer_
private

Buffer containing the entire row group. We allocate a buffer for the entire row group, skipping non-materialized columns.

Definition at line 403 of file hdfs-rcfile-scanner.h.

Referenced by ProcessRange(), ReadColumnBuffers(), and ReadRowGroup().

int impala::HdfsRCFileScanner::row_group_buffer_size_
private

This is the allocated size of 'row_group_buffer_'. 'row_group_buffer_' is reused across row groups and will grow as necessary.
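The grow-as-necessary reuse pattern can be illustrated with a small sketch. `RowGroupBuffer` is a hypothetical class written for this example. It uses a std::vector for storage, whereas the scanner allocates row_group_buffer_ from a MemPool, so ownership and lifetime differ from the real code.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical holder for a reusable row group buffer; not Impala's class.
class RowGroupBuffer {
 public:
  // Returns storage with room for at least 'length' bytes. The existing
  // allocation is reused when it is already large enough.
  uint8_t* Reserve(int length) {
    if (length > allocated_size()) {
      storage_.resize(length);
      ++num_allocations_;
    }
    return storage_.data();
  }

  // Allocated size in bytes (the analogue of row_group_buffer_size_).
  int allocated_size() const { return static_cast<int>(storage_.size()); }

  // Number of times the buffer had to grow; exposed only to show reuse.
  int num_allocations() const { return num_allocations_; }

 private:
  std::vector<uint8_t> storage_;
  int num_allocations_ = 0;
};
```

The allocated size only ever grows, so a run of similarly sized row groups pays for one allocation.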

Definition at line 411 of file hdfs-rcfile-scanner.h.

Referenced by InitNewRange(), ReadRowGroup(), and ResetRowGroup().

int impala::HdfsRCFileScanner::row_group_length_
private

Sum of the byte lengths of the materialized columns in the current row group. This is the number of valid bytes in row_group_buffer_.
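As a rough illustration of how such a sum relates to the per-column descriptions, the sketch below walks a column vector, skips non-materialized columns, and assigns each materialized column an offset in a packed buffer. The reduced `ColumnInfo` fields and the `ComputeRowGroupLength` helper are assumptions made for this example, not the scanner's actual struct layout or code.

```cpp
#include <vector>

// Hypothetical, reduced column description; the real ColumnInfo may differ.
struct ColumnInfo {
  bool materialize_column;  // whether the query needs this column
  int buffer_len;           // byte length of the column in this row group
  int start_offset;         // offset of the column in the packed buffer
};

// Assigns each materialized column its offset in the packed row group buffer
// and returns the total number of bytes needed. Non-materialized columns are
// skipped and left untouched.
int ComputeRowGroupLength(std::vector<ColumnInfo>* columns) {
  int total = 0;
  for (ColumnInfo& col : *columns) {
    if (!col.materialize_column) continue;
    col.start_offset = total;
    total += col.buffer_len;
  }
  return total;
}
```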

Definition at line 407 of file hdfs-rcfile-scanner.h.

Referenced by GetCurrentKeyBuffer(), ProcessRange(), ReadColumnBuffers(), ReadKeyBuffers(), and ReadRowGroup().

int impala::HdfsRCFileScanner::row_pos_
private

Current row position in this row group. This value is incremented each time NextRow() is called.

Definition at line 386 of file hdfs-rcfile-scanner.h.

Referenced by NextRow(), ProcessRange(), and ResetRowGroup().

HdfsScanNode* impala::HdfsScanner::scan_node_
protected, inherited

The scan node that started this scanner.

Definition at line 141 of file hdfs-scanner.h.

Referenced by impala::HdfsScanner::AddFinalRowBatch(), impala::HdfsParquetScanner::AssembleRows(), impala::HdfsParquetScanner::BaseColumnReader::BaseColumnReader(), impala::HdfsTextScanner::Close(), impala::BaseSequenceScanner::Close(), impala::HdfsParquetScanner::Close(), impala::BaseSequenceScanner::CloseFileRanges(), impala::HdfsScanner::CommitRows(), impala::HdfsParquetScanner::CreateColumnReaders(), impala::HdfsParquetScanner::CreateReader(), DebugString(), impala::HdfsAvroScanner::DecodeAvroData(), impala::HdfsTextScanner::FillByteBufferCompressedFile(), impala::HdfsTextScanner::FinishScanRange(), impala::HdfsParquetScanner::InitColumns(), impala::HdfsScanner::InitializeWriteTuplesFn(), impala::HdfsTextScanner::InitNewRange(), impala::HdfsAvroScanner::InitNewRange(), impala::HdfsSequenceScanner::InitNewRange(), InitNewRange(), impala::HdfsAvroScanner::ParseMetadata(), impala::HdfsTextScanner::Prepare(), impala::BaseSequenceScanner::Prepare(), impala::HdfsParquetScanner::Prepare(), impala::HdfsScanner::Prepare(), impala::HdfsSequenceScanner::Prepare(), Prepare(), impala::HdfsSequenceScanner::ProcessBlockCompressedScanRange(), impala::HdfsSequenceScanner::ProcessDecompressedBlock(), impala::HdfsParquetScanner::ProcessFooter(), impala::HdfsTextScanner::ProcessRange(), impala::HdfsAvroScanner::ProcessRange(), impala::HdfsSequenceScanner::ProcessRange(), ProcessRange(), impala::BaseSequenceScanner::ProcessSplit(), impala::HdfsParquetScanner::BaseColumnReader::ReadDataPage(), ReadRowGroup(), impala::HdfsScanner::ReportColumnParseError(), impala::HdfsScanner::ReportTupleParseError(), impala::HdfsTextScanner::ResetScanner(), impala::HdfsAvroScanner::ResolveSchemas(), impala::HdfsScanner::StartNewRowBatch(), impala::HdfsScanner::UpdateDecompressor(), impala::HdfsAvroScanner::VerifyTypesMatch(), impala::HdfsScanner::WriteCompleteTuple(), impala::HdfsScanner::WriteEmptyTuples(), impala::HdfsTextScanner::WriteFields(), and impala::HdfsTextScanner::WritePartialTuple().

ScannerContext::Stream* impala::HdfsScanner::stream_
protected, inherited

The first stream for context_.

Definition at line 150 of file hdfs-scanner.h.

Referenced by impala::HdfsTextScanner::Close(), impala::BaseSequenceScanner::Close(), impala::HdfsParquetScanner::CreateColumnReaders(), DebugString(), impala::HdfsTextScanner::FillByteBuffer(), impala::HdfsTextScanner::FillByteBufferCompressedFile(), impala::HdfsTextScanner::FillByteBufferGzip(), impala::HdfsTextScanner::FindFirstTuple(), impala::HdfsTextScanner::FinishScanRange(), impala::HdfsSequenceScanner::GetRecord(), impala::HdfsTextScanner::InitNewRange(), InitNewRange(), NextField(), impala::HdfsAvroScanner::ParseMetadata(), impala::BaseSequenceScanner::Prepare(), impala::HdfsScanner::Prepare(), impala::HdfsSequenceScanner::ProcessBlockCompressedScanRange(), impala::HdfsParquetScanner::ProcessFooter(), impala::HdfsTextScanner::ProcessRange(), impala::HdfsAvroScanner::ProcessRange(), impala::HdfsSequenceScanner::ProcessRange(), ProcessRange(), impala::HdfsTextScanner::ProcessSplit(), impala::BaseSequenceScanner::ProcessSplit(), impala::HdfsParquetScanner::ProcessSplit(), impala::HdfsSequenceScanner::ReadBlockHeader(), ReadColumnBuffers(), impala::HdfsSequenceScanner::ReadCompressedBlock(), impala::HdfsAvroScanner::ReadFileHeader(), impala::HdfsSequenceScanner::ReadFileHeader(), ReadFileHeader(), ReadKeyBuffers(), ReadNumColumnsMetadata(), ReadRowGroupHeader(), impala::BaseSequenceScanner::ReadSync(), impala::HdfsScanner::ReportTupleParseError(), impala::BaseSequenceScanner::SkipToSync(), impala::HdfsParquetScanner::ValidateFileMetadata(), impala::HdfsAvroScanner::VerifyTypesMatch(), and impala::HdfsTextScanner::WriteFields().

const int impala::BaseSequenceScanner::SYNC_HASH_SIZE = 16
static, protected, inherited

const int BaseSequenceScanner::SYNC_MARKER = -1
static, protected, inherited

Sync indicator.
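One way a sync indicator of this kind is typically used: a length field equal to -1 signals that a sync hash follows instead of a record. The sketch below assumes a big-endian 32-bit length field followed by a 16-byte hash, which is the Hadoop sequence-file convention. The helper names (`ReadBigEndianInt`, `Classify`, `Next`) are hypothetical, and the scanner's own ProcessRange() logic may differ in detail.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

const int kSyncMarker = -1;    // mirrors SYNC_MARKER
const int kSyncHashSize = 16;  // mirrors SYNC_HASH_SIZE

// Decodes a big-endian 32-bit signed integer from 'p'.
int32_t ReadBigEndianInt(const uint8_t* p) {
  uint32_t v = (uint32_t(p[0]) << 24) | (uint32_t(p[1]) << 16) |
               (uint32_t(p[2]) << 8) | uint32_t(p[3]);
  return static_cast<int32_t>(v);
}

enum class Next { RECORD, SYNC_OK, SYNC_MISMATCH, TRUNCATED };

// Classifies what starts at 'data': a record length, or a sync marker whose
// hash does or does not match the file's sync hash 'file_sync'.
Next Classify(const uint8_t* data, size_t len, const uint8_t* file_sync) {
  if (len < 4) return Next::TRUNCATED;
  if (ReadBigEndianInt(data) != kSyncMarker) return Next::RECORD;
  if (len < 4 + static_cast<size_t>(kSyncHashSize)) return Next::TRUNCATED;
  return std::memcmp(data + 4, file_sync, kSyncHashSize) == 0
             ? Next::SYNC_OK
             : Next::SYNC_MISMATCH;
}
```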

Definition at line 124 of file base-sequence-scanner.h.

Referenced by impala::HdfsSequenceScanner::ProcessRange(), and ProcessRange().

Tuple* impala::HdfsScanner::template_tuple_
protected, inherited

A partially materialized tuple with only partition key slots set. The non-partition key slots are set to NULL. The template tuple must be copied into tuple_ before any of the other slots are materialized. Pointer is NULL if there are no partition key slots. This template tuple is computed once for each file and valid for the duration of that file. It is owned by the HDFS scan node.
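The copy-before-materialize rule can be sketched as below. Raw byte arrays stand in for Tuple, and the free function `InitTupleSketch` is a hypothetical stand-in, not HdfsScanner::InitTuple(). The choice to clear only the leading null-indicator bytes when there is no template is an assumption of this sketch.

```cpp
#include <cstdint>
#include <cstring>

// Prepares 'tuple' for slot materialization.
// - With a template tuple (partition key slots already set), the whole
//   template is copied so those slots carry over.
// - Without one (pointer is null), only the first 'num_null_bytes' bytes,
//   assumed here to be the null indicators, are cleared.
void InitTupleSketch(const uint8_t* template_tuple, uint8_t* tuple,
                     int tuple_byte_size, int num_null_bytes) {
  if (template_tuple != nullptr) {
    std::memcpy(tuple, template_tuple, tuple_byte_size);
  } else {
    std::memset(tuple, 0, num_null_bytes);
  }
}
```

Copying first matters because materializing a slot overwrites only that slot's bytes, so the template's partition key values would otherwise be missing from the tuple.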

Definition at line 164 of file hdfs-scanner.h.

Referenced by impala::HdfsAvroScanner::AllocateFileHeader(), impala::HdfsParquetScanner::AssembleRows(), impala::HdfsParquetScanner::CreateColumnReaders(), impala::HdfsAvroScanner::DecodeAvroData(), impala::HdfsAvroScanner::InitNewRange(), impala::HdfsScanner::Prepare(), impala::HdfsSequenceScanner::ProcessRange(), ProcessRange(), impala::HdfsAvroScanner::ResolveSchemas(), impala::HdfsScanner::WriteAlignedTuples(), impala::HdfsScanner::WriteEmptyTuples(), and impala::HdfsTextScanner::WriteFields().

boost::scoped_ptr<TextConverter> impala::HdfsScanner::text_converter_
protected, inherited

int impala::HdfsScanner::tuple_byte_size_
protected, inherited

uint8_t* impala::HdfsScanner::tuple_mem_
protected, inherited

The tuple memory of batch_.

Definition at line 180 of file hdfs-scanner.h.

Referenced by impala::HdfsScanner::CommitRows(), impala::HdfsScanner::GetMemory(), and impala::HdfsScanner::StartNewRowBatch().

WriteTuplesFn impala::HdfsScanner::write_tuples_fn_
protected, inherited

Jitted write tuples function pointer. Null if codegen is disabled.
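A null function pointer implies that callers need an interpreted fallback. The sketch below shows that dispatch pattern with a deliberately reduced, hypothetical signature (the real WriteTuplesFn takes nine arguments). `ScannerSketch`, `InterpretedWriteTuples`, and `FakeJittedWriteTuples` are illustrative names only.

```cpp
// Hypothetical, reduced signature for illustration.
typedef int (*WriteTuplesFn)(const int* field_lens, int num_fields);

// Interpreted fallback: counts the fields with a non-negative length.
int InterpretedWriteTuples(const int* field_lens, int num_fields) {
  int written = 0;
  for (int i = 0; i < num_fields; ++i) {
    if (field_lens[i] >= 0) ++written;
  }
  return written;
}

// Stand-in for a codegen'd function. It returns num_fields unconditionally
// so the two paths are distinguishable in a test.
int FakeJittedWriteTuples(const int* /*field_lens*/, int num_fields) {
  return num_fields;
}

struct ScannerSketch {
  WriteTuplesFn write_tuples_fn = nullptr;  // null when codegen is disabled

  // Uses the jitted function when present, else the interpreted path.
  int WriteTuples(const int* field_lens, int num_fields) const {
    if (write_tuples_fn != nullptr) return write_tuples_fn(field_lens, num_fields);
    return InterpretedWriteTuples(field_lens, num_fields);
  }
};
```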

Definition at line 215 of file hdfs-scanner.h.

Referenced by impala::HdfsScanner::InitializeWriteTuplesFn(), impala::HdfsSequenceScanner::ProcessDecompressedBlock(), and impala::HdfsTextScanner::WriteFields().


The documentation for this class was generated from the following files: