Impala
Impala is the open source, native analytic database for Apache Hadoop.
impala::HdfsTextTableWriter Class Reference

#include <hdfs-text-table-writer.h>


Public Member Functions

 HdfsTextTableWriter (HdfsTableSink *parent, RuntimeState *state, OutputPartition *output, const HdfsPartitionDescriptor *partition, const HdfsTableDescriptor *table_desc, const std::vector< ExprContext * > &output_expr_ctxs)
 
 ~HdfsTextTableWriter ()
 
virtual Status Init ()
 Initializes the writer.
 
virtual Status Finalize ()
 
virtual Status InitNewFile ()
 Called when a new file is started.
 
virtual void Close ()
 Called once when this writer should clean up any resources.
 
virtual uint64_t default_block_size () const
 
virtual std::string file_extension () const
 Returns the file extension for this writer.
 
Status AppendRowBatch (RowBatch *current_row, const std::vector< int32_t > &row_group_indices, bool *new_file)
 
TInsertStats & stats ()
 Returns the stats for this writer.
 

Protected Member Functions

Status Write (const char *data, int32_t len)
 Writes to the current HDFS file.
 
Status Write (const uint8_t *data, int32_t len)
 
template<typename T >
Status Write (T v)
 

Protected Attributes

HdfsTableSink * parent_
 Parent table sink object.
 
RuntimeState * state_
 Runtime state.
 
OutputPartition * output_
 Structure describing the partition written to by this writer.
 
const HdfsTableDescriptor * table_desc_
 Table descriptor of the table to be written.
 
std::vector< ExprContext * > output_expr_ctxs_
 Expressions that materialize output values.
 
TInsertStats stats_
 Subclasses should populate any file-format-specific stats.
 

Static Protected Attributes

static const int HDFS_FLUSH_WRITE_SIZE = 50 * 1024
 

Private Member Functions

void PrintEscaped (const StringValue *str_val)
 
Status Flush ()
 

Private Attributes

char tuple_delim_
 Character delimiting tuples.
 
char field_delim_
 Character delimiting fields (to become slots).
 
char escape_char_
 Escape character.
 
int64_t flush_size_
 Number of bytes to accumulate in rowbatch_stringstream_ before calling Flush().
 
std::stringstream rowbatch_stringstream_
 
THdfsCompression::type codec_
 Compression codec.
 
boost::scoped_ptr< Codec > compressor_
 Compressor if compression is enabled.
 
boost::scoped_ptr< MemPool > mem_pool_
 Memory pool to use with compressor_.
 

Detailed Description

The writer consumes all rows passed to it and writes the evaluated output expressions (output_expr_ctxs_) as delimited text into HDFS files.

Definition at line 40 of file hdfs-text-table-writer.h.

Constructor & Destructor Documentation

impala::HdfsTextTableWriter::~HdfsTextTableWriter ( )
inline

Definition at line 48 of file hdfs-text-table-writer.h.

Member Function Documentation

void impala::HdfsTextTableWriter::Close ( )
virtual

Called once when this writer should clean up any resources.

Implements impala::HdfsTableWriter.

Definition at line 82 of file hdfs-text-table-writer.cc.

References flush_size_, mem_pool_, impala::HdfsTableSink::mem_tracker(), impala::HdfsTableWriter::parent_, and impala::MemTracker::Release().

uint64_t impala::HdfsTextTableWriter::default_block_size ( ) const
virtual

Default block size to use for this file format. If the file format has no preference, it should return 0, and the HDFS configuration default will be used.

Implements impala::HdfsTableWriter.

Definition at line 87 of file hdfs-text-table-writer.cc.

References COMPRESSED_BLOCK_SIZE, and compressor_.

string impala::HdfsTextTableWriter::file_extension ( ) const
virtual

Returns the file extension for this writer.

Implements impala::HdfsTableWriter.

Definition at line 91 of file hdfs-text-table-writer.cc.

References compressor_.

Status impala::HdfsTextTableWriter::Finalize ( )
virtual

Finalize this partition. The writer must finish processing all data and have written it out by the time this call returns. This is called once for each call to InitNewFile().

Implements impala::HdfsTableWriter.

Definition at line 162 of file hdfs-text-table-writer.cc.

References Flush().

Status impala::HdfsTextTableWriter::Flush ( )
private

Status impala::HdfsTextTableWriter::Init ( )
virtual

Initializes the writer.

The sequence of calls to this object is:

  1. Init()
  2. InitNewFile()
  3. AppendRowBatch() - called repeatedly
  4. Finalize()

For file formats that are splittable (and therefore can be written to an arbitrarily large file), the sequence 1-4 runs once. For file formats that are not splittable (i.e. columnar formats, compressed text), step 1 runs once and steps 2-4 repeat for each file.

Implements impala::HdfsTableWriter.

Definition at line 59 of file hdfs-text-table-writer.cc.

References codec_, COMPRESSED_BUFFERED_SIZE, compressor_, impala::MemTracker::Consume(), impala::Codec::CreateCompressor(), flush_size_, impala::HdfsTableWriter::HDFS_FLUSH_WRITE_SIZE, mem_pool_, impala::HdfsTableSink::mem_tracker(), impala::Status::OK, impala::HdfsTableWriter::parent_, impala::RuntimeState::query_options(), RETURN_IF_ERROR, and impala::HdfsTableWriter::state_.

virtual Status impala::HdfsTextTableWriter::InitNewFile ( )
inline virtual

Called when a new file is started.

Implements impala::HdfsTableWriter.

Definition at line 52 of file hdfs-text-table-writer.h.

References impala::Status::OK.

void impala::HdfsTextTableWriter::PrintEscaped ( const StringValue * str_val )
inline private

Escapes occurrences of field_delim_ and escape_char_ by prefixing them with escape_char_, and writes the escaped result into rowbatch_stringstream_. Neither Hive nor Impala supports escaping tuple_delim_.

Definition at line 194 of file hdfs-text-table-writer.cc.

References escape_char_, field_delim_, impala::StringValue::len, impala::StringValue::ptr, rowbatch_stringstream_, and UNLIKELY.

Referenced by AppendRowBatch().

TInsertStats& impala::HdfsTableWriter::stats ( )
inline inherited

Returns the stats for this writer.

Definition at line 86 of file hdfs-table-writer.h.

References impala::HdfsTableWriter::stats_.

template<typename T >
Status impala::HdfsTableWriter::Write ( T v )
inline protected inherited

Definition at line 107 of file hdfs-table-writer.h.

References impala::HdfsTableWriter::Write().

Member Data Documentation

THdfsCompression::type impala::HdfsTextTableWriter::codec_
private

Compression codec.

Definition at line 90 of file hdfs-text-table-writer.h.

Referenced by Init().

boost::scoped_ptr<Codec> impala::HdfsTextTableWriter::compressor_
private

Compressor if compression is enabled.

Definition at line 93 of file hdfs-text-table-writer.h.

Referenced by AppendRowBatch(), default_block_size(), file_extension(), Flush(), and Init().

char impala::HdfsTextTableWriter::escape_char_
private

Escape character.

Definition at line 80 of file hdfs-text-table-writer.h.

Referenced by HdfsTextTableWriter(), and PrintEscaped().

char impala::HdfsTextTableWriter::field_delim_
private

Character delimiting fields (to become slots).

Definition at line 77 of file hdfs-text-table-writer.h.

Referenced by AppendRowBatch(), HdfsTextTableWriter(), and PrintEscaped().

int64_t impala::HdfsTextTableWriter::flush_size_
private

Number of bytes to accumulate in rowbatch_stringstream_ before calling Flush().

Definition at line 83 of file hdfs-text-table-writer.h.

Referenced by AppendRowBatch(), Close(), HdfsTextTableWriter(), and Init().

const int impala::HdfsTableWriter::HDFS_FLUSH_WRITE_SIZE = 50 * 1024
static protected inherited

Number of bytes of output to buffer before calling Write() (which calls hdfsWrite()), to minimize the overhead of Write() calls.

Definition at line 98 of file hdfs-table-writer.h.

Referenced by HdfsTextTableWriter(), and Init().

boost::scoped_ptr<MemPool> impala::HdfsTextTableWriter::mem_pool_
private

Memory pool to use with compressor_.

Definition at line 96 of file hdfs-text-table-writer.h.

Referenced by Close(), and Init().

OutputPartition* impala::HdfsTableWriter::output_
protected inherited

Structure describing the partition written to by this writer.

Definition at line 118 of file hdfs-table-writer.h.

Referenced by AppendRowBatch(), impala::HdfsParquetTableWriter::AppendRowBatch(), impala::HdfsParquetTableWriter::InitNewFile(), and impala::HdfsTableWriter::Write().

std::stringstream impala::HdfsTextTableWriter::rowbatch_stringstream_
private

Stringstream to buffer output. The stream is cleared between HDFS Write calls to allow for the internal buffers to be reused.

Definition at line 87 of file hdfs-text-table-writer.h.

Referenced by AppendRowBatch(), Flush(), HdfsTextTableWriter(), and PrintEscaped().

TInsertStats impala::HdfsTableWriter::stats_
protected inherited

Subclasses should populate any file-format-specific stats.

Definition at line 127 of file hdfs-table-writer.h.

Referenced by impala::HdfsParquetTableWriter::Finalize(), impala::HdfsTableWriter::stats(), and impala::HdfsTableWriter::Write().

char impala::HdfsTextTableWriter::tuple_delim_
private

Character delimiting tuples.

Definition at line 74 of file hdfs-text-table-writer.h.

Referenced by AppendRowBatch(), and HdfsTextTableWriter().

