Impala
Impalaistheopensource,nativeanalyticdatabaseforApacheHadoop.
|
#include <data-stream-sender.h>
Classes | |
class | Channel |
Public Member Functions | |
DataStreamSender (ObjectPool *pool, int sender_id, const RowDescriptor &row_desc, const TDataStreamSink &sink, const std::vector< TPlanFragmentDestination > &destinations, int per_channel_buffer_size) | |
virtual | ~DataStreamSender () |
virtual Status | Prepare (RuntimeState *state) |
virtual Status | Open (RuntimeState *state) |
virtual Status | Send (RuntimeState *state, RowBatch *batch, bool eos) |
virtual void | Close (RuntimeState *state) |
void | SerializeBatch (RowBatch *src, TRowBatch *dest, int num_receivers=1) |
int64_t | GetNumDataBytesSent () const |
virtual RuntimeProfile * | profile () |
Returns the runtime profile for the sink. More... | |
Static Public Member Functions | |
static Status | CreateDataSink (ObjectPool *pool, const TDataSink &thrift_sink, const std::vector< TExpr > &output_exprs, const TPlanFragmentExecParams ¶ms, const RowDescriptor &row_desc, boost::scoped_ptr< DataSink > *sink) |
static void | MergeInsertStats (const TInsertStats &src_stats, TInsertStats *dst_stats) |
static std::string | OutputInsertStats (const PartitionStatusMap &stats, const std::string &prefix="") |
Outputs the insert stats contained in the map of insert partition updates to a string. More... | |
Protected Attributes | |
boost::scoped_ptr< MemTracker > | expr_mem_tracker_ |
Single sender of an m:n data stream. Row batch data is routed to destinations based on the provided partitioning specification. Not thread-safe. TODO: capture stats that describe distribution of rows/data volume across channels.
Definition at line 46 of file data-stream-sender.h.
impala::DataStreamSender::DataStreamSender | ( | ObjectPool * | pool, |
int | sender_id, | ||
const RowDescriptor & | row_desc, | ||
const TDataStreamSink & | sink, | ||
const std::vector< TPlanFragmentDestination > & | destinations, | ||
int | per_channel_buffer_size | ||
) |
Construct a sender according to the output specification (sink), sending to the given destinations. sender_id identifies this sender instance, and is unique within a fragment. Per_channel_buffer_size is the buffer size allocated to each channel and is specified in bytes. The RowDescriptor must live until Close() is called. NOTE: supported partition types are UNPARTITIONED (broadcast), HASH_PARTITIONED, and RANDOM.
Definition at line 310 of file data-stream-sender.cc.
References broadcast_, channels_, impala::Expr::CreateExprTrees(), impala::Status::ok(), partition_expr_ctxs_, and random_.
|
virtual |
Definition at line 354 of file data-stream-sender.cc.
References channels_.
|
virtual |
Flush all buffered data and close all existing channels to destination hosts. Further Send() calls are illegal after calling Close().
Implements impala::DataSink.
Definition at line 450 of file data-stream-sender.cc.
References channels_, impala::Expr::Close(), closed_, and partition_expr_ctxs_.
Referenced by impala::DataStreamTest::Sender().
|
staticinherited |
Creates a new data sink from thrift_sink. A pointer to the new sink is written to *sink, and is owned by the caller.
Definition at line 34 of file data-sink.cc.
References impala::Status::OK.
Referenced by impala::PlanFragmentExecutor::Prepare().
int64_t impala::DataStreamSender::GetNumDataBytesSent | ( | ) | const |
Return total number of bytes sent in TRowBatch.data. If batches are broadcast to multiple receivers, they are counted once per receiver.
Definition at line 469 of file data-stream-sender.cc.
References channels_.
Referenced by impala::DataStreamTest::Sender().
|
staticinherited |
Merges one update to the insert stats for a partition. dst_stats will have the combined stats of src_stats and dst_stats after this method returns.
Definition at line 90 of file data-sink.cc.
Referenced by impala::HdfsTableSink::FinalizePartitionFile().
|
virtual |
Must be called before Send() or Close(), and after the codegen'd IR module is compiled (i.e. in an ExecNode's Open() function).
Implements impala::DataSink.
Definition at line 397 of file data-stream-sender.cc.
References impala::Expr::Open(), and partition_expr_ctxs_.
Referenced by impala::DataStreamTest::Sender().
|
staticinherited |
Outputs the insert stats contained in the map of insert partition updates to a string.
Definition at line 103 of file data-sink.cc.
References impala::PrettyPrinter::Print().
|
virtual |
Must be called before other API calls, and before the codegen'd IR module is compiled (i.e. in an ExecNode's Prepare() function).
Reimplemented from impala::DataSink.
Definition at line 362 of file data-stream-sender.cc.
References impala::ObjectPool::Add(), ADD_COUNTER, ADD_TIMER, impala::RuntimeProfile::AddDerivedCounter(), bytes_sent_counter_, channels_, dest_node_id_, impala::RuntimeState::instance_mem_tracker(), mem_tracker_, network_throughput_, impala::Status::OK, overall_throughput_, partition_expr_ctxs_, pool_, impala::Expr::Prepare(), profile(), profile_, RETURN_IF_ERROR, row_desc_, SCOPED_TIMER, serialize_batch_timer_, state_, thrift_transmit_timer_, impala::RuntimeProfile::total_time_counter(), uncompressed_bytes_counter_, and impala::RuntimeProfile::UnitsPerSecond().
Referenced by impala::DataStreamTest::Sender().
|
inlinevirtual |
Returns the runtime profile for the sink.
Implements impala::DataSink.
Definition at line 90 of file data-stream-sender.h.
References profile_.
Referenced by Prepare().
|
virtual |
Send data in 'batch' to destination nodes according to partitioning specification provided in c'tor. Blocks until all rows in batch are placed in their appropriate outgoing buffers (ie, blocks if there are still in-flight rpcs from the last Send() call).
Implements impala::DataSink.
Definition at line 401 of file data-stream-sender.cc.
References broadcast_, channels_, impala::RuntimeState::CheckQueryState(), closed_, current_channel_idx_, current_thrift_batch_, impala::HashUtil::FNV_SEED, impala::ExprContext::FreeLocalAllocations(), impala::RawValue::GetHashValueFnv(), impala::RowBatch::GetRow(), impala::ExprContext::GetValue(), impala::RowBatch::num_rows(), impala::Status::OK, partition_expr_ctxs_, profile_, random_, RETURN_IF_ERROR, impala::ExprContext::root(), SCOPED_TIMER, impala::DataStreamSender::Channel::SendBatch(), SerializeBatch(), impala::DataStreamSender::Channel::thrift_batch(), thrift_batch1_, thrift_batch2_, impala::RuntimeProfile::total_time_counter(), impala::Expr::type(), and impala::DataStreamSender::Channel::WaitForRpc().
Referenced by impala::DataStreamTest::Sender().
void impala::DataStreamSender::SerializeBatch | ( | RowBatch * | src, |
TRowBatch * | dest, | ||
int | num_receivers = 1 |
||
) |
Serializes the src batch into the dest thrift batch. Maintains metrics. num_receivers is the number of receivers this batch will be sent to. Only used to maintain metrics.
Definition at line 459 of file data-stream-sender.cc.
References bytes_sent_counter_, COUNTER_ADD, impala::RowBatch::GetBatchSize(), impala::RowBatch::num_rows(), SCOPED_TIMER, impala::RowBatch::Serialize(), serialize_batch_timer_, uncompressed_bytes_counter_, and VLOG_ROW.
Referenced by Send().
|
private |
Definition at line 100 of file data-stream-sender.h.
Referenced by DataStreamSender(), and Send().
|
private |
Definition at line 119 of file data-stream-sender.h.
Referenced by Prepare(), and SerializeBatch().
|
private |
Definition at line 114 of file data-stream-sender.h.
Referenced by Close(), DataStreamSender(), GetNumDataBytesSent(), Prepare(), Send(), and ~DataStreamSender().
|
private |
If true, this sender has been closed. Not valid to call Send() anymore.
Definition at line 105 of file data-stream-sender.h.
|
private |
Definition at line 102 of file data-stream-sender.h.
Referenced by Send().
|
private |
Definition at line 111 of file data-stream-sender.h.
Referenced by Send().
|
private |
Identifier of the destination plan node.
Definition at line 130 of file data-stream-sender.h.
Referenced by Prepare().
|
protectedinherited |
Definition at line 85 of file data-sink.h.
Referenced by impala::DataSink::Prepare(), and impala::HdfsTableSink::PrepareExprs().
|
private |
Definition at line 121 of file data-stream-sender.h.
Referenced by Prepare().
|
private |
Throughput per time spent in TransmitData.
Definition at line 124 of file data-stream-sender.h.
Referenced by Prepare().
|
private |
Throughput per total time spent in sender.
Definition at line 127 of file data-stream-sender.h.
Referenced by Prepare().
|
private |
Definition at line 113 of file data-stream-sender.h.
Referenced by Close(), DataStreamSender(), Open(), Prepare(), and Send().
|
private |
Definition at line 98 of file data-stream-sender.h.
Referenced by Prepare().
|
private |
Definition at line 116 of file data-stream-sender.h.
|
private |
Definition at line 101 of file data-stream-sender.h.
Referenced by DataStreamSender(), and Send().
|
private |
Definition at line 99 of file data-stream-sender.h.
Referenced by Prepare().
|
private |
Sender instance id, unique within a fragment.
Definition at line 93 of file data-stream-sender.h.
|
private |
Definition at line 117 of file data-stream-sender.h.
Referenced by Prepare(), and SerializeBatch().
|
private |
Definition at line 97 of file data-stream-sender.h.
Referenced by Prepare().
|
private |
serialized batches for broadcasting; we need two so we can write one while the other one is still being sent
Definition at line 109 of file data-stream-sender.h.
Referenced by Send().
|
private |
Definition at line 110 of file data-stream-sender.h.
Referenced by Send().
|
private |
Definition at line 118 of file data-stream-sender.h.
Referenced by Prepare().
|
private |
Definition at line 120 of file data-stream-sender.h.
Referenced by Prepare(), and SerializeBatch().