Impala
Impala is the open source, native analytic database for Apache Hadoop.
#include <hdfs-text-table-writer.h>
Public Member Functions | |
HdfsTextTableWriter (HdfsTableSink *parent, RuntimeState *state, OutputPartition *output, const HdfsPartitionDescriptor *partition, const HdfsTableDescriptor *table_desc, const std::vector< ExprContext * > &output_expr_ctxs) | |
~HdfsTextTableWriter () | |
virtual Status | Init () |
Do initialization of writer. | |
virtual Status | Finalize () |
virtual Status | InitNewFile () |
Called when a new file is started. | |
virtual void | Close () |
Called once when this writer should clean up any resources. | |
virtual uint64_t | default_block_size () const |
virtual std::string | file_extension () const |
Returns the file extension for this writer. | |
Status | AppendRowBatch (RowBatch *current_row, const std::vector< int32_t > &row_group_indices, bool *new_file) |
TInsertStats & | stats () |
Returns the stats for this writer. | |
Protected Member Functions | |
Status | Write (const char *data, int32_t len) |
Write to the current HDFS file. | |
Status | Write (const uint8_t *data, int32_t len) |
template<typename T > | |
Status | Write (T v) |
Protected Attributes | |
HdfsTableSink * | parent_ |
Parent table sink object. | |
RuntimeState * | state_ |
Runtime state. | |
OutputPartition * | output_ |
Structure describing partition written to by this writer. | |
const HdfsTableDescriptor * | table_desc_ |
Table descriptor of table to be written. | |
std::vector< ExprContext * > | output_expr_ctxs_ |
Expressions that materialize output values. | |
TInsertStats | stats_ |
Subclass should populate any file format specific stats. | |
Static Protected Attributes | |
static const int | HDFS_FLUSH_WRITE_SIZE = 50 * 1024 |
Private Member Functions | |
void | PrintEscaped (const StringValue *str_val) |
Status | Flush () |
Private Attributes | |
char | tuple_delim_ |
Character delimiting tuples. | |
char | field_delim_ |
Character delimiting fields (to become slots). | |
char | escape_char_ |
Escape character. | |
int64_t | flush_size_ |
Size of rowbatch_stringstream_ at which Flush() is called. | |
std::stringstream | rowbatch_stringstream_ |
THdfsCompression::type | codec_ |
Compression codec. | |
boost::scoped_ptr< Codec > | compressor_ |
Compressor if compression is enabled. | |
boost::scoped_ptr< MemPool > | mem_pool_ |
Memory pool to use with compressor_. | |
The writer consumes all rows passed to it and writes the evaluated output expressions (output_expr_ctxs_) as delimited text into HDFS files.
Definition at line 40 of file hdfs-text-table-writer.h.
impala::HdfsTextTableWriter::HdfsTextTableWriter (HdfsTableSink *parent, RuntimeState *state, OutputPartition *output, const HdfsPartitionDescriptor *partition, const HdfsTableDescriptor *table_desc, const std::vector< ExprContext * > &output_expr_ctxs)
Definition at line 41 of file hdfs-text-table-writer.cc.
References impala::RawValue::ASCII_PRECISION, impala::HdfsPartitionDescriptor::escape_char(), escape_char_, impala::HdfsPartitionDescriptor::field_delim(), field_delim_, flush_size_, impala::HdfsTableWriter::HDFS_FLUSH_WRITE_SIZE, impala::HdfsPartitionDescriptor::line_delim(), rowbatch_stringstream_, and tuple_delim_.
impala::HdfsTextTableWriter::~HdfsTextTableWriter () | inline |
Definition at line 48 of file hdfs-text-table-writer.h.
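Status impala::HdfsTextTableWriter::AppendRowBatch (RowBatch *current_row, const std::vector< int32_t > &row_group_indices, bool *new_file) | virtual |
Appends a delimited string representation of the rows in the batch to the output partition. The resulting output is buffered until it reaches HDFS_FLUSH_WRITE_SIZE and is then written to HDFS.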
Implements impala::HdfsTableWriter.
Definition at line 96 of file hdfs-text-table-writer.cc.
References impala::StringValue::CharSlotToPtr(), compressor_, COUNTER_ADD, impala::HdfsTableSink::DebugString(), impala::HdfsTableSink::encode_timer(), field_delim_, Flush(), flush_size_, impala::RowBatch::GetRow(), impala::ColumnType::IsVarLen(), impala::ColumnType::len, impala::HdfsTableDescriptor::null_column_value(), impala::TableDescriptor::num_clustering_cols(), impala::TableDescriptor::num_cols(), impala::OutputPartition::num_rows, impala::RowBatch::num_rows(), impala::Status::OK, impala::HdfsTableWriter::output_, impala::HdfsTableWriter::output_expr_ctxs_, impala::HdfsTableWriter::parent_, PrintEscaped(), RETURN_IF_ERROR, rowbatch_stringstream_, impala::HdfsTableSink::rows_inserted_counter(), SCOPED_TIMER, impala::HdfsTableWriter::table_desc_, tuple_delim_, impala::ColumnType::type, impala::TYPE_CHAR, and impala::StringValue::UnpaddedCharLength().
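The buffer-then-flush behaviour described above can be sketched as follows. This is a minimal illustration rather than Impala's implementation: `BufferedTextWriter`, `AppendRow`, and the in-memory `sink_` string (standing in for an HDFS file) are hypothetical names, and rows are simplified to vectors of already-formatted strings.

```cpp
#include <cstddef>
#include <ios>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for the writer's buffering behaviour; not Impala's code.
class BufferedTextWriter {
 public:
  BufferedTextWriter(char field_delim, char tuple_delim, std::size_t flush_size)
      : field_delim_(field_delim), tuple_delim_(tuple_delim), flush_size_(flush_size) {}

  // Appends one row as delimited text; flushes once the buffer reaches flush_size_.
  void AppendRow(const std::vector<std::string>& fields) {
    for (std::size_t i = 0; i < fields.size(); ++i) {
      if (i > 0) buffer_ << field_delim_;
      buffer_ << fields[i];
    }
    buffer_ << tuple_delim_;
    const std::streamoff buffered = static_cast<std::streamoff>(buffer_.tellp());
    if (static_cast<std::size_t>(buffered) >= flush_size_) Flush();
  }

  // Moves the buffered bytes to the sink (a string standing in for an HDFS file).
  void Flush() {
    sink_ += buffer_.str();
    buffer_.str(std::string());  // drop buffered characters
    buffer_.clear();             // reset stream state flags
  }

  const std::string& sink() const { return sink_; }

 private:
  char field_delim_;
  char tuple_delim_;
  std::size_t flush_size_;
  std::ostringstream buffer_;
  std::string sink_;
};
```

With a threshold of 8 bytes, a first 4-byte row stays in the buffer and the second row triggers a single write of both rows, which is the point of buffering: fewer, larger writes to the underlying file.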
void impala::HdfsTextTableWriter::Close () | virtual |
Called once when this writer should clean up any resources.
Implements impala::HdfsTableWriter.
Definition at line 82 of file hdfs-text-table-writer.cc.
References flush_size_, mem_pool_, impala::HdfsTableSink::mem_tracker(), impala::HdfsTableWriter::parent_, and impala::MemTracker::Release().
uint64_t impala::HdfsTextTableWriter::default_block_size () const | virtual |
Default block size to use for this file format. If the file format doesn't care, it should return 0 and the HDFS config default will be used.
Implements impala::HdfsTableWriter.
Definition at line 87 of file hdfs-text-table-writer.cc.
References COMPRESSED_BLOCK_SIZE, and compressor_.
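The "return 0 to defer to the HDFS default" convention can be illustrated with a small caller-side sketch. `EffectiveBlockSize` and its parameters are hypothetical names used only for this example; how Impala's sink actually consumes the value is not shown on this page.

```cpp
#include <cstdint>

// Hypothetical caller-side helper: uses the writer's preferred block size
// when it is nonzero, otherwise falls back to the filesystem's configured default.
inline std::uint64_t EffectiveBlockSize(std::uint64_t writer_default,
                                        std::uint64_t fs_config_default) {
  return writer_default != 0 ? writer_default : fs_config_default;
}
```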
std::string impala::HdfsTextTableWriter::file_extension () const | virtual |
Returns the file extension for this writer.
Implements impala::HdfsTableWriter.
Definition at line 91 of file hdfs-text-table-writer.cc.
References compressor_.
Status impala::HdfsTextTableWriter::Finalize () | virtual |
Finalize this partition. The writer must have processed and written out all of its data by the time this call returns. This is called once for each call to InitNewFile().
Implements impala::HdfsTableWriter.
Definition at line 162 of file hdfs-text-table-writer.cc.
References Flush().
Status impala::HdfsTextTableWriter::Flush () | private |
Writes the buffered data in rowbatch_stringstream_ to HDFS, applying compression if necessary.
Definition at line 166 of file hdfs-text-table-writer.cc.
References impala::HdfsTableSink::compress_timer(), compressor_, impala::HdfsTableSink::hdfs_write_timer(), impala::Status::OK, impala::HdfsTableWriter::parent_, RETURN_IF_ERROR, rowbatch_stringstream_, SCOPED_TIMER, and impala::HdfsTableWriter::Write().
Referenced by AppendRowBatch(), and Finalize().
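A rough sketch of a flush step with optional compression is shown below. It is not Impala's code: `FlushBuffer`, the `Transform` alias (a stand-in for a compressor), and the `sink` string (a stand-in for the HDFS file) are all assumptions made for illustration.

```cpp
#include <functional>
#include <sstream>
#include <string>

// Hypothetical stand-in for a compressor: maps raw bytes to encoded bytes.
using Transform = std::function<std::string(const std::string&)>;

// Drains `buffer`, passes the data through `compress` if one is set,
// and appends the result to `sink`.
void FlushBuffer(std::stringstream& buffer, const Transform& compress, std::string& sink) {
  std::string data = buffer.str();
  buffer.str(std::string());  // clear contents so the stream can be reused
  buffer.clear();             // reset stream state flags
  if (data.empty()) return;   // nothing buffered, nothing to write
  sink += compress ? compress(data) : data;
}
```

Passing an empty `Transform` models the uncompressed case; any callable with the same signature models a codec.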
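Status impala::HdfsTextTableWriter::Init () | virtual |
Do initialization of writer.
The sequence of calls to this object is:
1. Init()
2. InitNewFile()
3. AppendRowBatch(), called repeatedly
4. Finalize()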
Implements impala::HdfsTableWriter.
Definition at line 59 of file hdfs-text-table-writer.cc.
References codec_, COMPRESSED_BUFFERED_SIZE, compressor_, impala::MemTracker::Consume(), impala::Codec::CreateCompressor(), flush_size_, impala::HdfsTableWriter::HDFS_FLUSH_WRITE_SIZE, mem_pool_, impala::HdfsTableSink::mem_tracker(), impala::Status::OK, impala::HdfsTableWriter::parent_, impala::RuntimeState::query_options(), RETURN_IF_ERROR, and impala::HdfsTableWriter::state_.
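The member descriptions on this page suggest a lifecycle in which Init() and Close() each run once, while InitNewFile() and Finalize() bracket each output file. The sketch below records that order with a hypothetical `RecordingWriter`; the driver `WriteFiles` and the exact ordering are assumptions inferred from those descriptions, not taken from Impala's sink code.

```cpp
#include <string>
#include <vector>

// Hypothetical writer that only records the order in which it is called.
struct RecordingWriter {
  std::vector<std::string> calls;
  void Init() { calls.push_back("Init"); }
  void InitNewFile() { calls.push_back("InitNewFile"); }
  void AppendRowBatch() { calls.push_back("AppendRowBatch"); }
  void Finalize() { calls.push_back("Finalize"); }
  void Close() { calls.push_back("Close"); }
};

// Drives the writer across `num_files` files with `batches_per_file` batches each.
void WriteFiles(RecordingWriter& w, int num_files, int batches_per_file) {
  w.Init();  // once per writer
  for (int f = 0; f < num_files; ++f) {
    w.InitNewFile();  // once per file
    for (int b = 0; b < batches_per_file; ++b) w.AppendRowBatch();
    w.Finalize();  // once per InitNewFile()
  }
  w.Close();  // once, to release resources
}
```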
Status impala::HdfsTextTableWriter::InitNewFile () | inline, virtual |
Called when a new file is started.
Implements impala::HdfsTableWriter.
Definition at line 52 of file hdfs-text-table-writer.h.
References impala::Status::OK.
void impala::HdfsTextTableWriter::PrintEscaped (const StringValue *str_val) | inline, private |
Escapes occurrences of field_delim_ and escape_char_ with escape_char_ and writes the escaped result into rowbatch_stringstream_. Neither Hive nor Impala supports escaping tuple_delim_.
Definition at line 194 of file hdfs-text-table-writer.cc.
References escape_char_, field_delim_, impala::StringValue::len, impala::StringValue::ptr, rowbatch_stringstream_, and UNLIKELY.
Referenced by AppendRowBatch().
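The escaping rule can be sketched as a standalone function. `EscapeField` is a hypothetical helper operating on `std::string` rather than Impala's `StringValue`, and it returns the result instead of writing into a stream; it only illustrates the rule stated above.

```cpp
#include <string>

// Illustrative helper (not Impala's PrintEscaped): prefixes every field
// delimiter and every escape character in `in` with `escape_char`.
// Tuple delimiters are deliberately left untouched.
std::string EscapeField(const std::string& in, char field_delim, char escape_char) {
  std::string out;
  out.reserve(in.size());
  for (char c : in) {
    if (c == field_delim || c == escape_char) out.push_back(escape_char);
    out.push_back(c);
  }
  return out;
}
```

Because the tuple delimiter is not escaped, a value that contains it (for example a newline in a newline-delimited file) passes through unchanged and would split the row when read back.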
TInsertStats & impala::HdfsTableWriter::stats () | inline, inherited |
Returns the stats for this writer.
Definition at line 86 of file hdfs-table-writer.h.
References impala::HdfsTableWriter::stats_.
Status impala::HdfsTableWriter::Write (const char *data, int32_t len) | inline, protected, inherited |
Write to the current HDFS file.
Definition at line 101 of file hdfs-table-writer.h.
Referenced by Flush(), impala::HdfsSequenceTableWriter::Flush(), impala::HdfsAvroTableWriter::Flush(), impala::HdfsParquetTableWriter::FlushCurrentRowGroup(), impala::HdfsTableWriter::Write(), impala::HdfsSequenceTableWriter::WriteCompressedBlock(), impala::HdfsParquetTableWriter::WriteFileFooter(), impala::HdfsSequenceTableWriter::WriteFileHeader(), impala::HdfsAvroTableWriter::WriteFileHeader(), and impala::HdfsParquetTableWriter::WriteFileHeader().
Status impala::HdfsTableWriter::Write (const uint8_t *data, int32_t len) | protected, inherited |
Definition at line 36 of file hdfs-table-writer.cc.
References impala::HdfsTableSink::bytes_written_counter(), COUNTER_ADD, impala::OutputPartition::current_file_name, impala::GetHdfsErrorMsg(), impala::OutputPartition::hdfs_connection, impala::Status::OK, impala::HdfsTableWriter::output_, impala::HdfsTableWriter::parent_, impala::HdfsTableWriter::stats_, and impala::OutputPartition::tmp_hdfs_file.
template<typename T > Status impala::HdfsTableWriter::Write (T v) | inline, protected, inherited |
Definition at line 107 of file hdfs-table-writer.h.
References impala::HdfsTableWriter::Write().
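Since the templated overload is documented only as referencing Write(), one plausible reading is that it forwards the raw bytes of the value to the byte-pointer overload. The sketch below shows that pattern on a hypothetical `ByteSink` type; it is an assumption about the shape of the code, not a copy of Impala's definition, and it returns void rather than Status for brevity.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <type_traits>

// Hypothetical sink that collects written bytes in memory.
struct ByteSink {
  std::string bytes;

  // Byte-level write: appends `len` bytes starting at `data`.
  void Write(const std::uint8_t* data, std::int32_t len) {
    bytes.append(reinterpret_cast<const char*>(data), static_cast<std::size_t>(len));
  }

  // Typed write: forwards the in-memory representation of `v` (host byte order).
  template <typename T>
  void Write(T v) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "only trivially copyable values can be written as raw bytes");
    Write(reinterpret_cast<const std::uint8_t*>(&v), static_cast<std::int32_t>(sizeof(T)));
  }
};
```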
THdfsCompression::type impala::HdfsTextTableWriter::codec_ | private |
boost::scoped_ptr< Codec > impala::HdfsTextTableWriter::compressor_ | private |
Compressor if compression is enabled.
Definition at line 93 of file hdfs-text-table-writer.h.
Referenced by AppendRowBatch(), default_block_size(), file_extension(), Flush(), and Init().
char impala::HdfsTextTableWriter::escape_char_ | private |
Escape character.
Definition at line 80 of file hdfs-text-table-writer.h.
Referenced by HdfsTextTableWriter(), and PrintEscaped().
char impala::HdfsTextTableWriter::field_delim_ | private |
Character delimiting fields (to become slots).
Definition at line 77 of file hdfs-text-table-writer.h.
Referenced by AppendRowBatch(), HdfsTextTableWriter(), and PrintEscaped().
int64_t impala::HdfsTextTableWriter::flush_size_ | private |
Size of rowbatch_stringstream_ at which Flush() is called.
Definition at line 83 of file hdfs-text-table-writer.h.
Referenced by AppendRowBatch(), Close(), HdfsTextTableWriter(), and Init().
const int impala::HdfsTableWriter::HDFS_FLUSH_WRITE_SIZE = 50 * 1024 | static, protected, inherited |
Size, in bytes, of output to buffer before calling Write() (which calls hdfsWrite), to minimize the overhead of Write().
Definition at line 98 of file hdfs-table-writer.h.
Referenced by HdfsTextTableWriter(), and Init().
boost::scoped_ptr< MemPool > impala::HdfsTextTableWriter::mem_pool_ | private |
Memory pool to use with compressor_.
Definition at line 96 of file hdfs-text-table-writer.h.
OutputPartition * impala::HdfsTableWriter::output_ | protected, inherited |
Structure describing partition written to by this writer.
Definition at line 118 of file hdfs-table-writer.h.
Referenced by AppendRowBatch(), impala::HdfsParquetTableWriter::AppendRowBatch(), impala::HdfsParquetTableWriter::InitNewFile(), and impala::HdfsTableWriter::Write().
std::vector< ExprContext * > impala::HdfsTableWriter::output_expr_ctxs_ | protected, inherited |
Expressions that materialize output values.
Definition at line 124 of file hdfs-table-writer.h.
Referenced by impala::HdfsSequenceTableWriter::AppendRowBatch(), AppendRowBatch(), impala::HdfsAvroTableWriter::ConsumeRow(), impala::HdfsParquetTableWriter::CreateSchema(), impala::HdfsSequenceTableWriter::EncodeRow(), impala::HdfsTableWriter::HdfsTableWriter(), and impala::HdfsParquetTableWriter::Init().
HdfsTableSink * impala::HdfsTableWriter::parent_ | protected, inherited |
Parent table sink object.
Definition at line 112 of file hdfs-table-writer.h.
Referenced by impala::HdfsSequenceTableWriter::AppendRowBatch(), AppendRowBatch(), impala::HdfsParquetTableWriter::AppendRowBatch(), impala::HdfsAvroTableWriter::AppendRowBatch(), Close(), impala::HdfsSequenceTableWriter::ConsumeRow(), impala::HdfsSequenceTableWriter::EncodeRow(), impala::HdfsParquetTableWriter::Finalize(), Flush(), impala::HdfsSequenceTableWriter::Flush(), impala::HdfsAvroTableWriter::Flush(), impala::HdfsTableWriter::HdfsTableWriter(), Init(), impala::HdfsTableWriter::Write(), and impala::HdfsSequenceTableWriter::WriteCompressedBlock().
std::stringstream impala::HdfsTextTableWriter::rowbatch_stringstream_ | private |
Stringstream to buffer output. The stream is cleared between HDFS Write calls to allow for the internal buffers to be reused.
Definition at line 87 of file hdfs-text-table-writer.h.
Referenced by AppendRowBatch(), Flush(), HdfsTextTableWriter(), and PrintEscaped().
RuntimeState * impala::HdfsTableWriter::state_ | protected, inherited |
Runtime state.
Definition at line 115 of file hdfs-table-writer.h.
Referenced by impala::HdfsParquetTableWriter::default_block_size(), impala::HdfsSequenceTableWriter::Init(), Init(), impala::HdfsParquetTableWriter::Init(), and impala::HdfsAvroTableWriter::Init().
TInsertStats impala::HdfsTableWriter::stats_ | protected, inherited |
Subclass should populate any file format specific stats.
Definition at line 127 of file hdfs-table-writer.h.
Referenced by impala::HdfsParquetTableWriter::Finalize(), impala::HdfsTableWriter::stats(), and impala::HdfsTableWriter::Write().
const HdfsTableDescriptor * impala::HdfsTableWriter::table_desc_ | protected, inherited |
Table descriptor of table to be written.
Definition at line 121 of file hdfs-table-writer.h.
Referenced by impala::HdfsParquetTableWriter::AddRowGroup(), impala::HdfsSequenceTableWriter::AppendRowBatch(), AppendRowBatch(), impala::HdfsAvroTableWriter::ConsumeRow(), impala::HdfsParquetTableWriter::CreateSchema(), impala::HdfsSequenceTableWriter::EncodeRow(), impala::HdfsParquetTableWriter::FlushCurrentRowGroup(), impala::HdfsTableWriter::HdfsTableWriter(), impala::HdfsParquetTableWriter::Init(), and impala::HdfsAvroTableWriter::WriteFileHeader().
char impala::HdfsTextTableWriter::tuple_delim_ | private |
Character delimiting tuples.
Definition at line 74 of file hdfs-text-table-writer.h.
Referenced by AppendRowBatch(), and HdfsTextTableWriter().