Impala
Impala is the open source, native analytic database for Apache Hadoop.
#include <hdfs-table-writer.h>
Public Member Functions
HdfsTableWriter (HdfsTableSink *parent, RuntimeState *state, OutputPartition *output_partition, const HdfsPartitionDescriptor *partition_desc, const HdfsTableDescriptor *table_desc, const std::vector< ExprContext * > &output_expr_ctxs) |
virtual | ~HdfsTableWriter () |
virtual Status | Init ()=0 |
Do initialization of writer.
virtual Status | InitNewFile ()=0 |
Called when a new file is started.
virtual Status | AppendRowBatch (RowBatch *batch, const std::vector< int32_t > &row_group_indices, bool *new_file)=0 |
virtual Status | Finalize ()=0 |
virtual void | Close ()=0 |
Called once when this writer should clean up any resources.
TInsertStats & | stats () |
Returns the stats for this writer.
virtual uint64_t | default_block_size () const =0 |
virtual std::string | file_extension () const =0 |
Returns the file extension for this writer.
Protected Member Functions
Status | Write (const char *data, int32_t len) |
Write to the current hdfs file.
Status | Write (const uint8_t *data, int32_t len) |
template<typename T > |
Status | Write (T v) |
Protected Attributes
HdfsTableSink * | parent_ |
Parent table sink object.
RuntimeState * | state_ |
Runtime state.
OutputPartition * | output_ |
Structure describing partition written to by this writer.
const HdfsTableDescriptor * | table_desc_ |
Table descriptor of table to be written.
std::vector< ExprContext * > | output_expr_ctxs_ |
Expressions that materialize output values.
TInsertStats | stats_ |
Subclass should populate any file format specific stats.
Static Protected Attributes
static const int | HDFS_FLUSH_WRITE_SIZE = 50 * 1024 |
Pure virtual class for writing to hdfs table partition files. Subclasses implement the code needed to write to a specific file type. A subclass needs to implement functions to format and add rows to the file and to do whatever processing is needed prior to closing the file.
Definition at line 33 of file hdfs-table-writer.h.
impala::HdfsTableWriter::HdfsTableWriter | ( | HdfsTableSink * | parent, |
RuntimeState * | state, | ||
OutputPartition * | output_partition, | ||
const HdfsPartitionDescriptor * | partition_desc, | ||
const HdfsTableDescriptor * | table_desc, | ||
const std::vector< ExprContext * > & | output_expr_ctxs | ||
) |
The implementation of a writer may reference the parameters to the constructor during the lifetime of the object.
output_partition – information on the output partition file.
partition_desc – the descriptor for the partition being written.
table_desc – the descriptor for the table being written.
output_expr_ctxs – expressions which generate the output values.
Definition at line 21 of file hdfs-table-writer.cc.
References impala::HdfsTableSink::DebugString(), impala::TableDescriptor::num_clustering_cols(), impala::TableDescriptor::num_cols(), output_expr_ctxs_, parent_, and table_desc_.
virtual impala::HdfsTableWriter::~HdfsTableWriter () |
inline virtual |
Definition at line 47 of file hdfs-table-writer.h.
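virtual Status impala::HdfsTableWriter::AppendRowBatch (RowBatch *batch, const std::vector< int32_t > &row_group_indices, bool *new_file) |
pure virtual |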
Appends the current batch of rows to the partition. If there are multiple partitions, row_group_indices contains the indices of the rows that belong to this partition; otherwise all rows in the batch are appended. If the current file is full, the writer stops appending and returns with *new_file == true. A new file is then opened and the same row batch is passed again, so the writer must track how much of the batch it had already processed before asking for a new file. Otherwise the writer returns with *new_file == false.
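The retry protocol around *new_file can be illustrated with a minimal sketch. ToyWriter, WriteBatch, and the in-memory "files" are hypothetical stand-ins, not Impala classes; row_group_indices and Status are left out to keep the example short, and a real sink would also finalize the full file before opening the next one.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a file-format writer (not an Impala class).
class ToyWriter {
 public:
  explicit ToyWriter(size_t max_rows_per_file)
      : max_rows_per_file_(max_rows_per_file) {
    assert(max_rows_per_file > 0);
  }

  // Starts a new (in-memory) output file.
  void InitNewFile() { files_.emplace_back(); }

  // Appends rows from `batch`, resuming where the previous call stopped.
  // Sets *new_file to true if the current file filled up before the batch
  // was fully consumed; the caller then opens a new file and passes the
  // same batch again.
  void AppendRowBatch(const std::vector<std::string>& batch, bool* new_file) {
    assert(!files_.empty());
    *new_file = false;
    while (next_row_ < batch.size()) {
      if (files_.back().size() == max_rows_per_file_) {
        *new_file = true;
        return;  // Keep next_row_ so the retry resumes mid-batch.
      }
      files_.back().push_back(batch[next_row_++]);
    }
    next_row_ = 0;  // Batch fully consumed; reset for the next batch.
  }

  const std::vector<std::vector<std::string>>& files() const { return files_; }

 private:
  size_t max_rows_per_file_;
  size_t next_row_ = 0;  // Progress within the batch currently being written.
  std::vector<std::vector<std::string>> files_;
};

// Caller side: re-submits the same batch until the writer stops asking
// for a new file.
std::vector<std::vector<std::string>> WriteBatch(
    const std::vector<std::string>& batch, size_t max_rows_per_file) {
  ToyWriter writer(max_rows_per_file);
  writer.InitNewFile();
  bool new_file = false;
  do {
    writer.AppendRowBatch(batch, &new_file);
    if (new_file) writer.InitNewFile();
  } while (new_file);
  return writer.files();
}
```

Calling WriteBatch with five rows and a limit of two rows per file spreads the rows over three files, with no row written twice, because the writer remembers its position in the batch across calls.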
Implemented in impala::HdfsAvroTableWriter, impala::HdfsParquetTableWriter, impala::HdfsTextTableWriter, and impala::HdfsSequenceTableWriter.
virtual void impala::HdfsTableWriter::Close () |
pure virtual |
Called once when this writer should clean up any resources.
Implemented in impala::HdfsParquetTableWriter, impala::HdfsAvroTableWriter, impala::HdfsTextTableWriter, and impala::HdfsSequenceTableWriter.
virtual uint64_t impala::HdfsTableWriter::default_block_size () const |
pure virtual |
Default block size to use for this file format. If the file format doesn't care, it should return 0 and the hdfs config default will be used.
Implemented in impala::HdfsParquetTableWriter, impala::HdfsAvroTableWriter, impala::HdfsTextTableWriter, and impala::HdfsSequenceTableWriter.
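How a caller might apply the "return 0 to use the HDFS default" convention can be sketched as below. EffectiveBlockSize and both parameter names are hypothetical and only illustrate the fallback rule stated above.

```cpp
#include <cstdint>

// Picks the block size for a new file. A writer-reported value of 0 means
// "no preference", so the HDFS configuration default is used instead.
// Function and parameter names are hypothetical.
uint64_t EffectiveBlockSize(uint64_t writer_default_block_size,
                            uint64_t hdfs_config_default) {
  return writer_default_block_size != 0 ? writer_default_block_size
                                        : hdfs_config_default;
}
```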
virtual std::string impala::HdfsTableWriter::file_extension () const |
pure virtual |
Returns the file extension for this writer.
Implemented in impala::HdfsParquetTableWriter, impala::HdfsAvroTableWriter, impala::HdfsTextTableWriter, and impala::HdfsSequenceTableWriter.
virtual Status impala::HdfsTableWriter::Finalize () |
pure virtual |
Finalize this partition. The writer must finish processing all data, so that everything has been written out by the time this call returns. This is called once for each call to InitNewFile().
Implemented in impala::HdfsParquetTableWriter, impala::HdfsAvroTableWriter, impala::HdfsTextTableWriter, and impala::HdfsSequenceTableWriter.
virtual Status impala::HdfsTableWriter::Init () |
pure virtual |
Do initialization of writer.
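The sequence of calls to this object is:
1. Init(): do initialization of the writer.
2. InitNewFile(): called when a new file is started.
3. AppendRowBatch(): appends rows to the current file; if it returns with *new_file == true, a new file is opened and the same batch is passed again.
4. Finalize(): called once for each call to InitNewFile().
5. Close(): called once when the writer should clean up any resources.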
Implemented in impala::HdfsAvroTableWriter, impala::HdfsParquetTableWriter, impala::HdfsTextTableWriter, and impala::HdfsSequenceTableWriter.
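The lifecycle described on this page (initialize, start a file, append batches, finalize each file, close) can be sketched with a simplified interface. ToyTableWriter, RecordingWriter, DriveWriter, and RecordCalls are hypothetical names; Status, RowBatch, and row_group_indices are omitted, and the driver loop is an assumption about how a sink might call the writer, not the actual HdfsTableSink code.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Simplified stand-in for the writer interface (hypothetical).
class ToyTableWriter {
 public:
  virtual ~ToyTableWriter() = default;
  virtual void Init() = 0;
  virtual void InitNewFile() = 0;
  virtual void AppendRowBatch(int batch_id, bool* new_file) = 0;
  virtual void Finalize() = 0;
  virtual void Close() = 0;
};

// Records every call; pretends a file is full after `batches_per_file`
// batches have been appended to it.
class RecordingWriter : public ToyTableWriter {
 public:
  explicit RecordingWriter(int batches_per_file)
      : batches_per_file_(batches_per_file) {
    assert(batches_per_file > 0);
  }
  void Init() override { log_.push_back("Init"); }
  void InitNewFile() override {
    batches_in_file_ = 0;
    log_.push_back("InitNewFile");
  }
  void AppendRowBatch(int /*batch_id*/, bool* new_file) override {
    log_.push_back("AppendRowBatch");
    *new_file = (batches_in_file_ == batches_per_file_);
    if (!*new_file) ++batches_in_file_;  // Batch accepted into this file.
  }
  void Finalize() override { log_.push_back("Finalize"); }
  void Close() override { log_.push_back("Close"); }
  const std::vector<std::string>& log() const { return log_; }

 private:
  int batches_per_file_;
  int batches_in_file_ = 0;
  std::vector<std::string> log_;
};

// Drives a writer through its lifecycle: Init once, one Finalize per
// InitNewFile, and Close once at the end.
void DriveWriter(ToyTableWriter* writer, int num_batches) {
  writer->Init();
  writer->InitNewFile();
  for (int i = 0; i < num_batches; ++i) {
    bool new_file = false;
    writer->AppendRowBatch(i, &new_file);
    while (new_file) {  // File was full: finish it and retry the batch.
      writer->Finalize();
      writer->InitNewFile();
      writer->AppendRowBatch(i, &new_file);
    }
  }
  writer->Finalize();
  writer->Close();
}

// Convenience helper that returns the recorded call sequence.
std::vector<std::string> RecordCalls(int num_batches, int batches_per_file) {
  RecordingWriter writer(batches_per_file);
  DriveWriter(&writer, num_batches);
  return writer.log();
}
```

In this sketch every InitNewFile() is matched by exactly one Finalize(), which mirrors the pairing stated in the Finalize() documentation.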
virtual Status impala::HdfsTableWriter::InitNewFile () |
pure virtual |
Called when a new file is started.
Implemented in impala::HdfsAvroTableWriter, impala::HdfsParquetTableWriter, impala::HdfsTextTableWriter, and impala::HdfsSequenceTableWriter.
TInsertStats& impala::HdfsTableWriter::stats () |
inline |
Returns the stats for this writer.
Definition at line 86 of file hdfs-table-writer.h.
References stats_.
Status impala::HdfsTableWriter::Write (const char *data, int32_t len) |
inline protected |
Write to the current hdfs file.
Definition at line 101 of file hdfs-table-writer.h.
Referenced by impala::HdfsTextTableWriter::Flush(), impala::HdfsSequenceTableWriter::Flush(), impala::HdfsAvroTableWriter::Flush(), impala::HdfsParquetTableWriter::FlushCurrentRowGroup(), Write(), impala::HdfsSequenceTableWriter::WriteCompressedBlock(), impala::HdfsParquetTableWriter::WriteFileFooter(), impala::HdfsSequenceTableWriter::WriteFileHeader(), impala::HdfsAvroTableWriter::WriteFileHeader(), and impala::HdfsParquetTableWriter::WriteFileHeader().
Status impala::HdfsTableWriter::Write (const uint8_t *data, int32_t len) |
protected |
Definition at line 36 of file hdfs-table-writer.cc.
References impala::HdfsTableSink::bytes_written_counter(), COUNTER_ADD, impala::OutputPartition::current_file_name, impala::GetHdfsErrorMsg(), impala::OutputPartition::hdfs_connection, impala::Status::OK, output_, parent_, stats_, and impala::OutputPartition::tmp_hdfs_file.
template<typename T > Status impala::HdfsTableWriter::Write (T v) |
inline protected |
Definition at line 107 of file hdfs-table-writer.h.
References Write().
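One plausible shape for the three Write() overloads is shown below: the typed and char overloads forward to a single byte-oriented overload. This is a sketch under assumptions, not the code in hdfs-table-writer.h. ByteSink is hypothetical, it appends to an in-memory buffer instead of an HDFS file, and it returns void rather than Status.

```cpp
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical in-memory sink that mimics the overload set.
class ByteSink {
 public:
  // Byte-oriented overload: the only one that touches the output.
  void Write(const uint8_t* data, int32_t len) {
    buffer_.insert(buffer_.end(), data, data + len);
  }

  // Convenience overload for character data; forwards to the byte overload.
  void Write(const char* data, int32_t len) {
    Write(reinterpret_cast<const uint8_t*>(data), len);
  }

  // Writes the raw in-memory bytes of a value (host byte order).
  template <typename T>
  void Write(T v) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "only trivially copyable values can be written as bytes");
    Write(reinterpret_cast<const uint8_t*>(&v),
          static_cast<int32_t>(sizeof(T)));
  }

  const std::vector<uint8_t>& buffer() const { return buffer_; }

 private:
  std::vector<uint8_t> buffer_;
};
```

Funnelling every overload through one byte-level function keeps error handling and byte accounting in a single place, which is consistent with the template overload referencing Write() above.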
const int impala::HdfsTableWriter::HDFS_FLUSH_WRITE_SIZE = 50 * 1024 |
static protected |
Size, in bytes, of the output to buffer before calling Write() (which calls hdfsWrite). Buffering minimizes the overhead of Write().
Definition at line 98 of file hdfs-table-writer.h.
Referenced by impala::HdfsTextTableWriter::HdfsTextTableWriter(), and impala::HdfsTextTableWriter::Init().
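The buffering idea behind this constant can be sketched as follows. BufferedRowWriter and kFlushWriteSize are hypothetical names (the constant reuses the 50 * 1024 value shown above), and the underlying write call is simulated by recording the size of each flush rather than calling hdfsWrite.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Same value as HDFS_FLUSH_WRITE_SIZE; the name is hypothetical.
constexpr size_t kFlushWriteSize = 50 * 1024;

class BufferedRowWriter {
 public:
  // Appends one encoded row; flushes once the buffer reaches the threshold.
  void AppendRow(const std::string& row) {
    buffer_ += row;
    if (buffer_.size() >= kFlushWriteSize) Flush();
  }

  // Hands the buffered bytes to the (simulated) underlying write call.
  void Flush() {
    if (buffer_.empty()) return;
    write_sizes_.push_back(buffer_.size());
    buffer_.clear();
  }

  // Size of each simulated write, in call order.
  const std::vector<size_t>& write_sizes() const { return write_sizes_; }

 private:
  std::string buffer_;
  std::vector<size_t> write_sizes_;
};
```

With 100-byte rows, this sketch issues one write per 512 rows instead of one per row, which is the overhead reduction the constant is meant to provide.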
OutputPartition* impala::HdfsTableWriter::output_ |
protected |
Structure describing partition written to by this writer.
Definition at line 118 of file hdfs-table-writer.h.
Referenced by impala::HdfsTextTableWriter::AppendRowBatch(), impala::HdfsParquetTableWriter::AppendRowBatch(), impala::HdfsParquetTableWriter::InitNewFile(), and Write().
std::vector<ExprContext*> impala::HdfsTableWriter::output_expr_ctxs_ |
protected |
Expressions that materialize output values.
Definition at line 124 of file hdfs-table-writer.h.
Referenced by impala::HdfsSequenceTableWriter::AppendRowBatch(), impala::HdfsTextTableWriter::AppendRowBatch(), impala::HdfsAvroTableWriter::ConsumeRow(), impala::HdfsParquetTableWriter::CreateSchema(), impala::HdfsSequenceTableWriter::EncodeRow(), HdfsTableWriter(), and impala::HdfsParquetTableWriter::Init().
HdfsTableSink* impala::HdfsTableWriter::parent_ |
protected |
Parent table sink object.
Definition at line 112 of file hdfs-table-writer.h.
Referenced by impala::HdfsSequenceTableWriter::AppendRowBatch(), impala::HdfsTextTableWriter::AppendRowBatch(), impala::HdfsParquetTableWriter::AppendRowBatch(), impala::HdfsAvroTableWriter::AppendRowBatch(), impala::HdfsTextTableWriter::Close(), impala::HdfsSequenceTableWriter::ConsumeRow(), impala::HdfsSequenceTableWriter::EncodeRow(), impala::HdfsParquetTableWriter::Finalize(), impala::HdfsTextTableWriter::Flush(), impala::HdfsSequenceTableWriter::Flush(), impala::HdfsAvroTableWriter::Flush(), HdfsTableWriter(), impala::HdfsTextTableWriter::Init(), Write(), and impala::HdfsSequenceTableWriter::WriteCompressedBlock().
RuntimeState* impala::HdfsTableWriter::state_ |
protected |
Runtime state.
Definition at line 115 of file hdfs-table-writer.h.
Referenced by impala::HdfsParquetTableWriter::default_block_size(), impala::HdfsSequenceTableWriter::Init(), impala::HdfsTextTableWriter::Init(), impala::HdfsParquetTableWriter::Init(), and impala::HdfsAvroTableWriter::Init().
TInsertStats impala::HdfsTableWriter::stats_ |
protected |
Subclass should populate any file format specific stats.
Definition at line 127 of file hdfs-table-writer.h.
Referenced by impala::HdfsParquetTableWriter::Finalize(), stats(), and Write().
const HdfsTableDescriptor* impala::HdfsTableWriter::table_desc_ |
protected |
Table descriptor of table to be written.
Definition at line 121 of file hdfs-table-writer.h.
Referenced by impala::HdfsParquetTableWriter::AddRowGroup(), impala::HdfsSequenceTableWriter::AppendRowBatch(), impala::HdfsTextTableWriter::AppendRowBatch(), impala::HdfsAvroTableWriter::ConsumeRow(), impala::HdfsParquetTableWriter::CreateSchema(), impala::HdfsSequenceTableWriter::EncodeRow(), impala::HdfsParquetTableWriter::FlushCurrentRowGroup(), HdfsTableWriter(), impala::HdfsParquetTableWriter::Init(), and impala::HdfsAvroTableWriter::WriteFileHeader().