Impala
Impalaistheopensource,nativeanalyticdatabaseforApacheHadoop.
|
#include <hdfs-sequence-table-writer.h>
Public Member Functions | |
HdfsSequenceTableWriter (HdfsTableSink *parent, RuntimeState *state, OutputPartition *output, const HdfsPartitionDescriptor *partition, const HdfsTableDescriptor *table_desc, const std::vector< ExprContext * > &output_exprs) | |
~HdfsSequenceTableWriter () | |
virtual Status | Init () |
Do initialization of writer. More... | |
virtual Status | Finalize () |
virtual Status | InitNewFile () |
Called when a new file is started. More... | |
virtual void | Close () |
Called once when this writer should cleanup any resources. More... | |
virtual uint64_t | default_block_size () const |
virtual std::string | file_extension () const |
Returns the file extension for this writer. More... | |
virtual Status | AppendRowBatch (RowBatch *rows, const std::vector< int32_t > &row_group_indices, bool *new_file) |
TInsertStats & | stats () |
Returns the stats for this writer. More... | |
Protected Member Functions | |
Status | Write (const char *data, int32_t len) |
Write to the current hdfs file. More... | |
Status | Write (const uint8_t *data, int32_t len) |
template<typename T > | |
Status | Write (T v) |
Protected Attributes | |
HdfsTableSink * | parent_ |
Parent table sink object. More... | |
RuntimeState * | state_ |
Runtime state. More... | |
OutputPartition * | output_ |
Structure describing partition written to by this writer. More... | |
const HdfsTableDescriptor * | table_desc_ |
Table descriptor of table to be written. More... | |
std::vector< ExprContext * > | output_expr_ctxs_ |
Expressions that materialize output values. More... | |
TInsertStats | stats_ |
Subclass should populate any file format specific stats. More... | |
Static Protected Attributes | |
static const int | HDFS_FLUSH_WRITE_SIZE = 50 * 1024 |
Private Member Functions | |
Status | ConsumeRow (TupleRow *row) |
processes a single row, delegates to Compress or NoCompress ConsumeRow(). More... | |
Status | WriteFileHeader () |
writes the SEQ file header to HDFS More... | |
Status | WriteCompressedBlock () |
writes the contents of out_ as a single compressed block More... | |
void | EncodeRow (TupleRow *row, WriteStream *buf) |
void | WriteEscapedString (const StringValue *str_val, WriteStream *buf) |
writes the str_val to the buffer, escaping special characters More... | |
Status | Flush () |
Private Attributes | |
uint64_t | approx_block_size_ |
WriteStream | out_ |
buffer which holds accumulated output More... | |
WriteStream | row_buf_ |
Temporary Buffer for a single row. More... | |
MemPool * | mem_pool_ |
memory pool used by codec to allocate output buffer More... | |
bool | compress_flag_ |
true if compression is enabled More... | |
uint64_t | unflushed_rows_ |
number of rows consumed since last flush More... | |
std::string | codec_name_ |
name of codec, only set if compress_flag_ More... | |
boost::scoped_ptr< Codec > | compressor_ |
the codec for compressing, only set if compress_flag_ More... | |
bool | record_compression_ |
true if compression is applied on each record individually More... | |
char | field_delim_ |
Character delimiting fields. More... | |
char | escape_char_ |
Escape character for text encoding. More... | |
std::string | sync_marker_ |
16 byte sync marker (a uuid) More... | |
std::string | neg1_sync_marker_ |
A -1 infront of the sync marker, used in decompressed formats. More... | |
Static Private Attributes | |
static const char * | VALUE_CLASS_NAME = "org.apache.hadoop.io.Text" |
Name of java class to use when reading the values. More... | |
static uint8_t | SEQ6_CODE [4] = {'S', 'E', 'Q', 6} |
Magic characters used to identify the file type. More... | |
Consumes rows and outputs the rows into a sequence file in HDFS Output is buffered to fill sequence file blocks.
Definition at line 38 of file hdfs-sequence-table-writer.h.
impala::HdfsSequenceTableWriter::HdfsSequenceTableWriter | ( | HdfsTableSink * | parent, |
RuntimeState * | state, | ||
OutputPartition * | output, | ||
const HdfsPartitionDescriptor * | partition, | ||
const HdfsTableDescriptor * | table_desc, | ||
const std::vector< ExprContext * > & | output_exprs | ||
) |
Definition at line 40 of file hdfs-sequence-table-writer.cc.
References approx_block_size_, impala::MemTracker::Consume(), impala::HdfsPartitionDescriptor::escape_char(), escape_char_, impala::HdfsPartitionDescriptor::field_delim(), field_delim_, and impala::HdfsTableSink::mem_tracker().
|
inline |
Definition at line 46 of file hdfs-sequence-table-writer.h.
|
virtual |
Outputs the given rows into an HDFS sequence file. The rows are buffered to fill a sequence file block.
Implements impala::HdfsTableWriter.
Definition at line 90 of file hdfs-sequence-table-writer.cc.
References approx_block_size_, compress_flag_, ConsumeRow(), COUNTER_ADD, impala::HdfsTableSink::DebugString(), impala::HdfsTableSink::encode_timer(), Flush(), impala::RowBatch::GetRow(), neg1_sync_marker_, impala::TableDescriptor::num_clustering_cols(), impala::TableDescriptor::num_cols(), impala::RowBatch::num_rows(), impala::Status::OK, out_, impala::HdfsTableWriter::output_expr_ctxs_, impala::HdfsTableWriter::parent_, RETURN_IF_ERROR, impala::HdfsTableSink::rows_inserted_counter(), SCOPED_TIMER, impala::WriteStream::Size(), impala::HdfsTableWriter::table_desc_, and impala::WriteStream::WriteBytes().
|
inlinevirtual |
Called once when this writer should cleanup any resources.
Implements impala::HdfsTableWriter.
Definition at line 51 of file hdfs-sequence-table-writer.h.
processes a single row, delegates to Compress or NoCompress ConsumeRow().
Definition at line 233 of file hdfs-sequence-table-writer.cc.
References impala::WriteStream::Clear(), compress_flag_, impala::HdfsTableSink::compress_timer(), compressor_, EncodeRow(), impala::Status::OK, out_, impala::HdfsTableWriter::parent_, record_compression_, RETURN_IF_ERROR, row_buf_, SCOPED_TIMER, impala::WriteStream::Size(), impala::WriteStream::String(), unflushed_rows_, impala::ReadWriteUtil::VLongRequiredBytes(), impala::WriteStream::WriteBytes(), impala::WriteStream::WriteInt(), impala::WriteStream::WriteText(), and impala::WriteStream::WriteVLong().
Referenced by AppendRowBatch().
|
inlinevirtual |
Default block size to use for this file format. If the file format doesn't care, it should return 0 and the hdfs config default will be used.
Implements impala::HdfsTableWriter.
Definition at line 52 of file hdfs-sequence-table-writer.h.
|
inlineprivate |
writes the tuple row to the given buffer; separates fields by field_delim_, escapes string.
Definition at line 206 of file hdfs-sequence-table-writer.cc.
References impala::HdfsTableSink::DebugString(), field_delim_, impala::HdfsTableDescriptor::null_column_value(), impala::TableDescriptor::num_clustering_cols(), impala::TableDescriptor::num_cols(), impala::HdfsTableWriter::output_expr_ctxs_, impala::HdfsTableWriter::parent_, row_buf_, impala::HdfsTableWriter::table_desc_, impala::TYPE_STRING, impala::WriteStream::WriteByte(), impala::WriteStream::WriteBytes(), and WriteEscapedString().
Referenced by ConsumeRow().
|
inlinevirtual |
Returns the file extension for this writer.
Implements impala::HdfsTableWriter.
Definition at line 53 of file hdfs-sequence-table-writer.h.
|
inlinevirtual |
Finalize this partition. The writer needs to finish processing all data have written out after the return from this call. This is called once for each call to InitNewFile()
Implements impala::HdfsTableWriter.
Definition at line 49 of file hdfs-sequence-table-writer.h.
References Flush().
|
private |
flushes the output – clearing out_ and writing to HDFS if compress_flag_, will write contents of out_ as a single compressed block
Definition at line 291 of file hdfs-sequence-table-writer.cc.
References impala::WriteStream::Clear(), compress_flag_, impala::HdfsTableSink::hdfs_write_timer(), impala::Status::OK, out_, impala::HdfsTableWriter::parent_, record_compression_, RETURN_IF_ERROR, SCOPED_TIMER, impala::WriteStream::String(), unflushed_rows_, impala::HdfsTableWriter::Write(), and WriteCompressedBlock().
Referenced by AppendRowBatch(), and Finalize().
|
virtual |
Do initialization of writer.
The sequence of calls to this object are:
Implements impala::HdfsTableWriter.
Definition at line 54 of file hdfs-sequence-table-writer.cc.
References codec_name_, compress_flag_, compressor_, impala::Codec::CreateCompressor(), impala::GenerateUUIDString(), impala::Codec::GetHadoopCodecClassName(), mem_pool_, neg1_sync_marker_, impala::Status::OK, impala::ReadWriteUtil::PutInt(), impala::RuntimeState::query_options(), record_compression_, RETURN_IF_ERROR, impala::HdfsTableWriter::state_, and sync_marker_.
|
inlinevirtual |
Called when a new file is started.
Implements impala::HdfsTableWriter.
Definition at line 50 of file hdfs-sequence-table-writer.h.
References WriteFileHeader().
|
inlineinherited |
Returns the stats for this writer.
Definition at line 86 of file hdfs-table-writer.h.
References impala::HdfsTableWriter::stats_.
|
inlineprotectedinherited |
Write to the current hdfs file.
Definition at line 101 of file hdfs-table-writer.h.
Referenced by impala::HdfsTextTableWriter::Flush(), Flush(), impala::HdfsAvroTableWriter::Flush(), impala::HdfsParquetTableWriter::FlushCurrentRowGroup(), impala::HdfsTableWriter::Write(), WriteCompressedBlock(), impala::HdfsParquetTableWriter::WriteFileFooter(), WriteFileHeader(), impala::HdfsAvroTableWriter::WriteFileHeader(), and impala::HdfsParquetTableWriter::WriteFileHeader().
|
protectedinherited |
Definition at line 36 of file hdfs-table-writer.cc.
References impala::HdfsTableSink::bytes_written_counter(), COUNTER_ADD, impala::OutputPartition::current_file_name, impala::GetHdfsErrorMsg(), impala::OutputPartition::hdfs_connection, impala::Status::OK, impala::HdfsTableWriter::output_, impala::HdfsTableWriter::parent_, impala::HdfsTableWriter::stats_, and impala::OutputPartition::tmp_hdfs_file.
|
inlineprotectedinherited |
Definition at line 107 of file hdfs-table-writer.h.
References impala::HdfsTableWriter::Write().
|
private |
writes the contents of out_ as a single compressed block
Definition at line 163 of file hdfs-sequence-table-writer.cc.
References compress_flag_, impala::HdfsTableSink::compress_timer(), compressor_, impala::Status::OK, out_, impala::HdfsTableWriter::parent_, RETURN_IF_ERROR, SCOPED_TIMER, impala::WriteStream::String(), sync_marker_, unflushed_rows_, impala::HdfsTableWriter::Write(), impala::WriteStream::WriteBytes(), impala::WriteStream::WriteEmptyText(), impala::WriteStream::WriteVInt(), and impala::WriteStream::WriteVLong().
Referenced by Flush().
|
inlineprivate |
writes the str_val to the buffer, escaping special characters
Definition at line 196 of file hdfs-sequence-table-writer.cc.
References escape_char_, field_delim_, impala::StringValue::len, impala::StringValue::ptr, and impala::WriteStream::WriteByte().
Referenced by EncodeRow().
|
private |
writes the SEQ file header to HDFS
Definition at line 129 of file hdfs-sequence-table-writer.cc.
References impala::WriteStream::Clear(), codec_name_, compress_flag_, impala::Status::OK, out_, record_compression_, RETURN_IF_ERROR, SEQ6_CODE, impala::WriteStream::String(), sync_marker_, VALUE_CLASS_NAME, impala::HdfsTableWriter::Write(), impala::WriteStream::WriteBoolean(), impala::WriteStream::WriteBytes(), impala::WriteStream::WriteEmptyText(), impala::WriteStream::WriteInt(), and impala::WriteStream::WriteText().
Referenced by InitNewFile().
|
private |
desired size of each block (bytes); actual block size will vary +/- the size of a row; this is before compression is applied.
Definition at line 84 of file hdfs-sequence-table-writer.h.
Referenced by AppendRowBatch(), and HdfsSequenceTableWriter().
|
private |
name of codec, only set if compress_flag_
Definition at line 102 of file hdfs-sequence-table-writer.h.
Referenced by Init(), and WriteFileHeader().
|
private |
true if compression is enabled
Definition at line 96 of file hdfs-sequence-table-writer.h.
Referenced by AppendRowBatch(), ConsumeRow(), Flush(), Init(), WriteCompressedBlock(), and WriteFileHeader().
|
private |
the codec for compressing, only set if compress_flag_
Definition at line 104 of file hdfs-sequence-table-writer.h.
Referenced by ConsumeRow(), Init(), and WriteCompressedBlock().
|
private |
Escape character for text encoding.
Definition at line 113 of file hdfs-sequence-table-writer.h.
Referenced by HdfsSequenceTableWriter(), and WriteEscapedString().
|
private |
Character delimiting fields.
Definition at line 110 of file hdfs-sequence-table-writer.h.
Referenced by EncodeRow(), HdfsSequenceTableWriter(), and WriteEscapedString().
|
staticprotectedinherited |
Size to buffer output before calling Write() (which calls hdfsWrite), in bytes to minimize the overhead of Write()
Definition at line 98 of file hdfs-table-writer.h.
Referenced by impala::HdfsTextTableWriter::HdfsTextTableWriter(), and impala::HdfsTextTableWriter::Init().
|
private |
memory pool used by codec to allocate output buffer
Definition at line 93 of file hdfs-sequence-table-writer.h.
Referenced by Init().
|
private |
A -1 infront of the sync marker, used in decompressed formats.
Definition at line 118 of file hdfs-sequence-table-writer.h.
Referenced by AppendRowBatch(), and Init().
|
private |
buffer which holds accumulated output
Definition at line 87 of file hdfs-sequence-table-writer.h.
Referenced by AppendRowBatch(), ConsumeRow(), Flush(), WriteCompressedBlock(), and WriteFileHeader().
|
protectedinherited |
Structure describing partition written to by this writer.
Definition at line 118 of file hdfs-table-writer.h.
Referenced by impala::HdfsTextTableWriter::AppendRowBatch(), impala::HdfsParquetTableWriter::AppendRowBatch(), impala::HdfsParquetTableWriter::InitNewFile(), and impala::HdfsTableWriter::Write().
|
protectedinherited |
Expressions that materialize output values.
Definition at line 124 of file hdfs-table-writer.h.
Referenced by AppendRowBatch(), impala::HdfsTextTableWriter::AppendRowBatch(), impala::HdfsAvroTableWriter::ConsumeRow(), impala::HdfsParquetTableWriter::CreateSchema(), EncodeRow(), impala::HdfsTableWriter::HdfsTableWriter(), and impala::HdfsParquetTableWriter::Init().
|
protectedinherited |
Parent table sink object.
Definition at line 112 of file hdfs-table-writer.h.
Referenced by AppendRowBatch(), impala::HdfsTextTableWriter::AppendRowBatch(), impala::HdfsParquetTableWriter::AppendRowBatch(), impala::HdfsAvroTableWriter::AppendRowBatch(), impala::HdfsTextTableWriter::Close(), ConsumeRow(), EncodeRow(), impala::HdfsParquetTableWriter::Finalize(), impala::HdfsTextTableWriter::Flush(), Flush(), impala::HdfsAvroTableWriter::Flush(), impala::HdfsTableWriter::HdfsTableWriter(), impala::HdfsTextTableWriter::Init(), impala::HdfsTableWriter::Write(), and WriteCompressedBlock().
|
private |
true if compression is applied on each record individually
Definition at line 107 of file hdfs-sequence-table-writer.h.
Referenced by ConsumeRow(), Flush(), Init(), and WriteFileHeader().
|
private |
Temporary Buffer for a single row.
Definition at line 90 of file hdfs-sequence-table-writer.h.
Referenced by ConsumeRow(), and EncodeRow().
|
staticprivate |
Magic characters used to identify the file type.
Definition at line 123 of file hdfs-sequence-table-writer.h.
Referenced by WriteFileHeader().
|
protectedinherited |
Runtime state.
Definition at line 115 of file hdfs-table-writer.h.
Referenced by impala::HdfsParquetTableWriter::default_block_size(), Init(), impala::HdfsTextTableWriter::Init(), impala::HdfsParquetTableWriter::Init(), and impala::HdfsAvroTableWriter::Init().
|
protectedinherited |
Subclass should populate any file format specific stats.
Definition at line 127 of file hdfs-table-writer.h.
Referenced by impala::HdfsParquetTableWriter::Finalize(), impala::HdfsTableWriter::stats(), and impala::HdfsTableWriter::Write().
|
private |
16 byte sync marker (a uuid)
Definition at line 116 of file hdfs-sequence-table-writer.h.
Referenced by Init(), WriteCompressedBlock(), and WriteFileHeader().
|
protectedinherited |
Table descriptor of table to be written.
Definition at line 121 of file hdfs-table-writer.h.
Referenced by impala::HdfsParquetTableWriter::AddRowGroup(), AppendRowBatch(), impala::HdfsTextTableWriter::AppendRowBatch(), impala::HdfsAvroTableWriter::ConsumeRow(), impala::HdfsParquetTableWriter::CreateSchema(), EncodeRow(), impala::HdfsParquetTableWriter::FlushCurrentRowGroup(), impala::HdfsTableWriter::HdfsTableWriter(), impala::HdfsParquetTableWriter::Init(), and impala::HdfsAvroTableWriter::WriteFileHeader().
|
private |
number of rows consumed since last flush
Definition at line 99 of file hdfs-sequence-table-writer.h.
Referenced by ConsumeRow(), Flush(), and WriteCompressedBlock().
|
staticprivate |
Name of java class to use when reading the values.
Definition at line 121 of file hdfs-sequence-table-writer.h.
Referenced by WriteFileHeader().