Impala
Impalaistheopensource,nativeanalyticdatabaseforApacheHadoop.
 All Classes Namespaces Files Functions Variables Typedefs Enumerations Enumerator Friends Macros
impala::DataStreamSender Class Reference

#include <data-stream-sender.h>

Inheritance diagram for impala::DataStreamSender:
Collaboration diagram for impala::DataStreamSender:

Classes

class  Channel
 

Public Member Functions

 DataStreamSender (ObjectPool *pool, int sender_id, const RowDescriptor &row_desc, const TDataStreamSink &sink, const std::vector< TPlanFragmentDestination > &destinations, int per_channel_buffer_size)
 
virtual ~DataStreamSender ()
 
virtual Status Prepare (RuntimeState *state)
 
virtual Status Open (RuntimeState *state)
 
virtual Status Send (RuntimeState *state, RowBatch *batch, bool eos)
 
virtual void Close (RuntimeState *state)
 
void SerializeBatch (RowBatch *src, TRowBatch *dest, int num_receivers=1)
 
int64_t GetNumDataBytesSent () const
 
virtual RuntimeProfileprofile ()
 Returns the runtime profile for the sink. More...
 

Static Public Member Functions

static Status CreateDataSink (ObjectPool *pool, const TDataSink &thrift_sink, const std::vector< TExpr > &output_exprs, const TPlanFragmentExecParams &params, const RowDescriptor &row_desc, boost::scoped_ptr< DataSink > *sink)
 
static void MergeInsertStats (const TInsertStats &src_stats, TInsertStats *dst_stats)
 
static std::string OutputInsertStats (const PartitionStatusMap &stats, const std::string &prefix="")
 Outputs the insert stats contained in the map of insert partition updates to a string. More...
 

Protected Attributes

boost::scoped_ptr< MemTrackerexpr_mem_tracker_
 

Private Attributes

int sender_id_
 Sender instance id, unique within a fragment. More...
 
RuntimeStatestate_
 
ObjectPoolpool_
 
const RowDescriptorrow_desc_
 
bool broadcast_
 
bool random_
 
int current_channel_idx_
 
bool closed_
 If true, this sender has been closed. Not valid to call Send() anymore. More...
 
TRowBatch thrift_batch1_
 
TRowBatch thrift_batch2_
 
TRowBatch * current_thrift_batch_
 
std::vector< ExprContext * > partition_expr_ctxs_
 
std::vector< Channel * > channels_
 
RuntimeProfileprofile_
 
RuntimeProfile::Counterserialize_batch_timer_
 
RuntimeProfile::Counterthrift_transmit_timer_
 
RuntimeProfile::Counterbytes_sent_counter_
 
RuntimeProfile::Counteruncompressed_bytes_counter_
 
boost::scoped_ptr< MemTrackermem_tracker_
 
RuntimeProfile::Counternetwork_throughput_
 Throughput per time spent in TransmitData. More...
 
RuntimeProfile::Counteroverall_throughput_
 Throughput per total time spent in sender. More...
 
PlanNodeId dest_node_id_
 Identifier of the destination plan node. More...
 

Detailed Description

Single sender of an m:n data stream. Row batch data is routed to destinations based on the provided partitioning specification. Not thread-safe. TODO: capture stats that describe distribution of rows/data volume across channels.

Definition at line 46 of file data-stream-sender.h.

Constructor & Destructor Documentation

impala::DataStreamSender::DataStreamSender ( ObjectPool pool,
int  sender_id,
const RowDescriptor row_desc,
const TDataStreamSink &  sink,
const std::vector< TPlanFragmentDestination > &  destinations,
int  per_channel_buffer_size 
)

Construct a sender according to the output specification (sink), sending to the given destinations. sender_id identifies this sender instance, and is unique within a fragment. Per_channel_buffer_size is the buffer size allocated to each channel and is specified in bytes. The RowDescriptor must live until Close() is called. NOTE: supported partition types are UNPARTITIONED (broadcast), HASH_PARTITIONED, and RANDOM.

Definition at line 310 of file data-stream-sender.cc.

References broadcast_, channels_, impala::Expr::CreateExprTrees(), impala::Status::ok(), partition_expr_ctxs_, and random_.

impala::DataStreamSender::~DataStreamSender ( )
virtual

Definition at line 354 of file data-stream-sender.cc.

References channels_.

Member Function Documentation

void impala::DataStreamSender::Close ( RuntimeState state)
virtual

Flush all buffered data and close all existing channels to destination hosts. Further Send() calls are illegal after calling Close().

Implements impala::DataSink.

Definition at line 450 of file data-stream-sender.cc.

References channels_, impala::Expr::Close(), closed_, and partition_expr_ctxs_.

Referenced by impala::DataStreamTest::Sender().

Status impala::DataSink::CreateDataSink ( ObjectPool pool,
const TDataSink &  thrift_sink,
const std::vector< TExpr > &  output_exprs,
const TPlanFragmentExecParams &  params,
const RowDescriptor row_desc,
boost::scoped_ptr< DataSink > *  sink 
)
staticinherited

Creates a new data sink from thrift_sink. A pointer to the new sink is written to *sink, and is owned by the caller.

Definition at line 34 of file data-sink.cc.

References impala::Status::OK.

Referenced by impala::PlanFragmentExecutor::Prepare().

int64_t impala::DataStreamSender::GetNumDataBytesSent ( ) const

Return total number of bytes sent in TRowBatch.data. If batches are broadcast to multiple receivers, they are counted once per receiver.

Definition at line 469 of file data-stream-sender.cc.

References channels_.

Referenced by impala::DataStreamTest::Sender().

void impala::DataSink::MergeInsertStats ( const TInsertStats &  src_stats,
TInsertStats *  dst_stats 
)
staticinherited

Merges one update to the insert stats for a partition. dst_stats will have the combined stats of src_stats and dst_stats after this method returns.

Definition at line 90 of file data-sink.cc.

Referenced by impala::HdfsTableSink::FinalizePartitionFile().

Status impala::DataStreamSender::Open ( RuntimeState state)
virtual

Must be called before Send() or Close(), and after the codegen'd IR module is compiled (i.e. in an ExecNode's Open() function).

Implements impala::DataSink.

Definition at line 397 of file data-stream-sender.cc.

References impala::Expr::Open(), and partition_expr_ctxs_.

Referenced by impala::DataStreamTest::Sender().

string impala::DataSink::OutputInsertStats ( const PartitionStatusMap stats,
const std::string &  prefix = "" 
)
staticinherited

Outputs the insert stats contained in the map of insert partition updates to a string.

Definition at line 103 of file data-sink.cc.

References impala::PrettyPrinter::Print().

virtual RuntimeProfile* impala::DataStreamSender::profile ( )
inlinevirtual

Returns the runtime profile for the sink.

Implements impala::DataSink.

Definition at line 90 of file data-stream-sender.h.

References profile_.

Referenced by Prepare().

void impala::DataStreamSender::SerializeBatch ( RowBatch src,
TRowBatch *  dest,
int  num_receivers = 1 
)

Serializes the src batch into the dest thrift batch. Maintains metrics. num_receivers is the number of receivers this batch will be sent to. Only used to maintain metrics.

Definition at line 459 of file data-stream-sender.cc.

References bytes_sent_counter_, COUNTER_ADD, impala::RowBatch::GetBatchSize(), impala::RowBatch::num_rows(), SCOPED_TIMER, impala::RowBatch::Serialize(), serialize_batch_timer_, uncompressed_bytes_counter_, and VLOG_ROW.

Referenced by Send().

Member Data Documentation

bool impala::DataStreamSender::broadcast_
private

Definition at line 100 of file data-stream-sender.h.

Referenced by DataStreamSender(), and Send().

RuntimeProfile::Counter* impala::DataStreamSender::bytes_sent_counter_
private

Definition at line 119 of file data-stream-sender.h.

Referenced by Prepare(), and SerializeBatch().

std::vector<Channel*> impala::DataStreamSender::channels_
private
bool impala::DataStreamSender::closed_
private

If true, this sender has been closed. Not valid to call Send() anymore.

Definition at line 105 of file data-stream-sender.h.

Referenced by Close(), and Send().

int impala::DataStreamSender::current_channel_idx_
private

Definition at line 102 of file data-stream-sender.h.

Referenced by Send().

TRowBatch* impala::DataStreamSender::current_thrift_batch_
private

Definition at line 111 of file data-stream-sender.h.

Referenced by Send().

PlanNodeId impala::DataStreamSender::dest_node_id_
private

Identifier of the destination plan node.

Definition at line 130 of file data-stream-sender.h.

Referenced by Prepare().

boost::scoped_ptr<MemTracker> impala::DataSink::expr_mem_tracker_
protectedinherited

Definition at line 85 of file data-sink.h.

Referenced by impala::DataSink::Prepare(), and impala::HdfsTableSink::PrepareExprs().

boost::scoped_ptr<MemTracker> impala::DataStreamSender::mem_tracker_
private

Definition at line 121 of file data-stream-sender.h.

Referenced by Prepare().

RuntimeProfile::Counter* impala::DataStreamSender::network_throughput_
private

Throughput per time spent in TransmitData.

Definition at line 124 of file data-stream-sender.h.

Referenced by Prepare().

RuntimeProfile::Counter* impala::DataStreamSender::overall_throughput_
private

Throughput per total time spent in sender.

Definition at line 127 of file data-stream-sender.h.

Referenced by Prepare().

std::vector<ExprContext*> impala::DataStreamSender::partition_expr_ctxs_
private

Definition at line 113 of file data-stream-sender.h.

Referenced by Close(), DataStreamSender(), Open(), Prepare(), and Send().

ObjectPool* impala::DataStreamSender::pool_
private

Definition at line 98 of file data-stream-sender.h.

Referenced by Prepare().

RuntimeProfile* impala::DataStreamSender::profile_
private

Definition at line 116 of file data-stream-sender.h.

Referenced by Prepare(), profile(), and Send().

bool impala::DataStreamSender::random_
private

Definition at line 101 of file data-stream-sender.h.

Referenced by DataStreamSender(), and Send().

const RowDescriptor& impala::DataStreamSender::row_desc_
private

Definition at line 99 of file data-stream-sender.h.

Referenced by Prepare().

int impala::DataStreamSender::sender_id_
private

Sender instance id, unique within a fragment.

Definition at line 93 of file data-stream-sender.h.

RuntimeProfile::Counter* impala::DataStreamSender::serialize_batch_timer_
private

Definition at line 117 of file data-stream-sender.h.

Referenced by Prepare(), and SerializeBatch().

RuntimeState* impala::DataStreamSender::state_
private

Definition at line 97 of file data-stream-sender.h.

Referenced by Prepare().

TRowBatch impala::DataStreamSender::thrift_batch1_
private

serialized batches for broadcasting; we need two so we can write one while the other one is still being sent

Definition at line 109 of file data-stream-sender.h.

Referenced by Send().

TRowBatch impala::DataStreamSender::thrift_batch2_
private

Definition at line 110 of file data-stream-sender.h.

Referenced by Send().

RuntimeProfile::Counter* impala::DataStreamSender::thrift_transmit_timer_
private

Definition at line 118 of file data-stream-sender.h.

Referenced by Prepare().

RuntimeProfile::Counter* impala::DataStreamSender::uncompressed_bytes_counter_
private

Definition at line 120 of file data-stream-sender.h.

Referenced by Prepare(), and SerializeBatch().


The documentation for this class was generated from the following files: