Impala
Impala is the open source, native analytic database for Apache Hadoop.
DataProvider Class Reference

See data-provider-test.cc for how to use this.

#include <data-provider.h>


Classes

class  ColDesc
 
struct  Value
 

Public Types

enum  DataGen { UNIFORM_RANDOM, SEQUENTIAL }
 How the data should be generated.
 

Public Member Functions

 DataProvider (impala::MemPool *pool, impala::RuntimeProfile *profile)
 
void Reset (int num_rows, int batch_size, const std::vector< ColDesc > &columns)
 
void SetSeed (int seed)
 
int row_size () const
 The size of a row (tuple size).
 
int total_rows () const
 The total number of rows that will be generated.
 
void * NextBatch (int *rows_returned)
 
void Print (std::ostream *, char *data, int num_rows) const
 Print the row data in CSV format.
 

Private Attributes

impala::MemPool * pool_
 
impala::RuntimeProfile * profile_
 
int num_rows_
 
int batch_size_
 
int rows_returned_
 
boost::scoped_ptr< char > data_
 
int row_size_
 
boost::minstd_rand rand_generator_
 
std::vector< ColDesc > cols_
 
impala::RuntimeProfile::Counter * bytes_generated_
 

Detailed Description

See data-provider-test.cc for how to use this.

This is a test utility class that generates data similar to the tuple data Impala uses. It accepts column descriptions and generates rows in batches through an iterator interface.

TODO: provide a way to have better control over the pool that strings are allocated from.
TODO: provide a way to control data skew. This is pretty easy with the boost rand classes.
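The batch-at-a-time iterator interface described above can be illustrated with a minimal, self-contained sketch. SketchProvider and SumAllRows are hypothetical names, not part of Impala. Each row here is a single int, and the sketch assumes that NextBatch() signals exhaustion by returning nullptr with zero rows, which this page does not confirm.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for DataProvider's batch interface (not Impala code).
// Each row is one int holding its sequential row index.
class SketchProvider {
 public:
  void Reset(int num_rows, int batch_size) {
    num_rows_ = num_rows;
    batch_size_ = batch_size;
    rows_returned_ = 0;
    // Reallocating the buffer is why earlier batches become invalid.
    buffer_.assign(batch_size, 0);
  }

  // Fills the next batch and reports its row count; returns nullptr when done.
  void* NextBatch(int* rows_returned) {
    int n = std::min(batch_size_, num_rows_ - rows_returned_);
    if (n <= 0) {
      *rows_returned = 0;
      return nullptr;
    }
    for (int i = 0; i < n; ++i) buffer_[i] = rows_returned_ + i;
    rows_returned_ += n;
    *rows_returned = n;
    return buffer_.data();
  }

 private:
  int num_rows_ = 0;
  int batch_size_ = 0;
  int rows_returned_ = 0;
  std::vector<int> buffer_;
};

// Drains the provider batch by batch and sums every generated value.
long long SumAllRows(SketchProvider* provider) {
  long long sum = 0;
  int rows = 0;
  while (void* batch = provider->NextBatch(&rows)) {
    const int* values = static_cast<const int*>(batch);
    for (int i = 0; i < rows; ++i) sum += values[i];
  }
  return sum;
}
```

With Reset(10, 4), successive NextBatch() calls return 4, 4 and 2 rows, then nullptr.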

Definition at line 33 of file data-provider.h.

Member Enumeration Documentation

How the data should be generated.

Enumerator
UNIFORM_RANDOM 
SEQUENTIAL 

Definition at line 49 of file data-provider.h.
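One way the two DataGen modes could differ for an integer column is sketched below. The inclusive [min, max] range and the GenerateInt helper are hypothetical simplifications; this page does not document how ColDesc encodes value ranges. std::minstd_rand stands in for the boost::minstd_rand member rand_generator_.

```cpp
#include <cassert>
#include <random>

// Mirrors the two documented enumerators; everything else is hypothetical.
enum DataGen { UNIFORM_RANDOM, SEQUENTIAL };

// Produces the value for row `row_idx` of an int column bounded by [min, max].
int GenerateInt(DataGen mode, int row_idx, int min, int max,
                std::minstd_rand* rng) {
  if (mode == SEQUENTIAL) {
    // Walk the range in order, wrapping back to min after max.
    return min + row_idx % (max - min + 1);
  }
  // UNIFORM_RANDOM: every value in [min, max] is equally likely.
  std::uniform_int_distribution<int> dist(min, max);
  return dist(*rng);
}
```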

Constructor & Destructor Documentation

DataProvider::DataProvider ( impala::MemPool *  pool,
impala::RuntimeProfile *  profile 
)

Create a data provider object with a pool for allocating memory and a profile to collect metrics.

Definition at line 13 of file data-provider.cc.

References ADD_COUNTER, bytes_generated_, and SetSeed().

Member Function Documentation

void * DataProvider::NextBatch ( int *  rows_returned)
void DataProvider::Print ( std::ostream *  ,
char *  data,
int  num_rows 
) const
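Print() writes row data in CSV format. A minimal sketch of that idea follows. It assumes a hypothetical fixed-width layout of two int32_t columns per row; PrintCsv and kRowSize are illustrative names, and the real layout is derived from the ColDesc list.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical fixed-width row layout: two int32_t columns, 8 bytes per row.
constexpr int kRowSize = 2 * sizeof(int32_t);

// Prints `num_rows` rows starting at `data` as CSV, one row per line.
void PrintCsv(std::ostream* out, const char* data, int num_rows) {
  for (int r = 0; r < num_rows; ++r) {
    const char* row = data + r * kRowSize;  // rows are contiguous
    int32_t a, b;
    std::memcpy(&a, row, sizeof(a));              // column 0 at offset 0
    std::memcpy(&b, row + sizeof(a), sizeof(b));  // column 1 at offset 4
    *out << a << "," << b << "\n";
  }
}
```

Copying each field out with std::memcpy avoids relying on the buffer being suitably aligned for int32_t.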
void DataProvider::Reset ( int  num_rows,
int  batch_size,
const std::vector< ColDesc > &  columns 
)

Reset the generator with the column descriptions.

  • num_rows: total rows to generate
  • batch_size: size of the batches returned by NextBatch()
  • columns: descriptions of the columns to generate

Data returned by previous NextBatch() calls is no longer valid after Reset().

Definition at line 26 of file data-provider.cc.

References batch_size_, bytes_generated_, cols_, COUNTER_SET, data_, num_rows_, row_size_, and rows_returned_.

Referenced by main().
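The bookkeeping Reset() performs on row_size_ and data_ might look like the sketch below. ResetSketch and ColumnSketch are hypothetical, and the sketch assumes columns are packed back to back with no alignment padding, which this page does not state.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical column description: only the byte width matters here.
struct ColumnSketch {
  int byte_size;
};

class ResetSketch {
 public:
  void Reset(int batch_size, const std::vector<ColumnSketch>& cols) {
    // Row size is the sum of the column widths (no padding assumed).
    row_size_ = 0;
    for (const ColumnSketch& c : cols) row_size_ += c.byte_size;
    buffer_size_ = batch_size * row_size_;
    // Replacing the buffer frees the old one, so pointers handed out for
    // earlier batches must not be used after Reset().
    data_.reset(new char[buffer_size_]);
  }
  int row_size() const { return row_size_; }
  int buffer_size() const { return buffer_size_; }

 private:
  int row_size_ = 0;
  int buffer_size_ = 0;
  std::unique_ptr<char[]> data_;
};
```

std::unique_ptr<char[]> is used so that a buffer allocated with new[] is released with the matching delete[].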

int DataProvider::row_size ( ) const
inline

The size of a row (tuple size)

Definition at line 107 of file data-provider.h.

References row_size_.

Referenced by main().

void DataProvider::SetSeed ( int  seed)

Sets the seed to use for randomly generated data. By default, the generator uses seed(0).

Definition at line 39 of file data-provider.cc.

References rand_generator_.

Referenced by DataProvider().
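SetSeed() matters because two generators seeded identically produce identical sequences, which makes generated test data reproducible. The sketch below demonstrates this with std::minstd_rand, the C++ standard-library counterpart of the boost::minstd_rand engine used for rand_generator_. Draw is a hypothetical helper, and the reseeding it does is assumed to be analogous to SetSeed().

```cpp
#include <cassert>
#include <random>
#include <vector>

using Result = std::minstd_rand::result_type;

// Returns the first `n` raw outputs of a std::minstd_rand seeded with `seed`.
std::vector<Result> Draw(int seed, int n) {
  std::minstd_rand gen;
  gen.seed(static_cast<Result>(seed));  // reseed, as SetSeed() is assumed to do
  std::vector<Result> out;
  for (int i = 0; i < n; ++i) out.push_back(gen());
  return out;
}
```

For std::minstd_rand, a seed of 0 is valid: because the engine's increment is 0, the standard maps a zero seed to the state 1, so seeding with 0 and seeding with 1 yield the same sequence.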

int DataProvider::total_rows ( ) const
inline

The total number of rows that will be generated.

Definition at line 110 of file data-provider.h.

References num_rows_.

Referenced by main().
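total_rows(), together with the batch size, determines how many NextBatch() calls produce data. Assuming every batch except possibly the last is full (this page does not specify NextBatch()'s batching rule), the counts follow from ceiling division. NumBatches and LastBatchRows are hypothetical helpers.

```cpp
#include <cassert>

// Number of non-empty batches needed to return `total_rows` rows.
int NumBatches(int total_rows, int batch_size) {
  return (total_rows + batch_size - 1) / batch_size;  // ceiling division
}

// Row count of the final non-empty batch (0 if there are no rows at all).
int LastBatchRows(int total_rows, int batch_size) {
  if (total_rows == 0) return 0;
  int rem = total_rows % batch_size;
  return rem == 0 ? batch_size : rem;
}
```

For example, 10 rows with a batch size of 4 give 3 batches of 4, 4 and 2 rows.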

Member Data Documentation

int DataProvider::batch_size_
private

Definition at line 124 of file data-provider.h.

Referenced by NextBatch(), and Reset().

impala::RuntimeProfile::Counter* DataProvider::bytes_generated_
private

Definition at line 131 of file data-provider.h.

Referenced by DataProvider(), NextBatch(), and Reset().
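Reset() sets bytes_generated_ through COUNTER_SET, and NextBatch() also references it. The sketch below shows one plausible accounting scheme, assumed here rather than documented: Reset() zeroes the counter, and each batch adds its row count times the row size. CounterSketch is a hypothetical stand-in and does not reproduce the real impala::RuntimeProfile::Counter interface.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a runtime-profile counter.
struct CounterSketch {
  int64_t value = 0;
  void Set(int64_t v) { value = v; }
  void Add(int64_t delta) { value += delta; }
};

// Zeroes the counter (as Reset() might), then charges each batch
// rows * row_size bytes (as NextBatch() might).
int64_t BytesAfterBatches(CounterSketch* counter, int row_size,
                          const int* batch_rows, int num_batches) {
  counter->Set(0);
  for (int i = 0; i < num_batches; ++i) {
    counter->Add(static_cast<int64_t>(batch_rows[i]) * row_size);
  }
  return counter->value;
}
```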

std::vector<ColDesc> DataProvider::cols_
private

Definition at line 129 of file data-provider.h.

Referenced by NextBatch(), Print(), and Reset().

boost::scoped_ptr<char> DataProvider::data_
private

Definition at line 126 of file data-provider.h.

Referenced by NextBatch(), and Reset().

int DataProvider::num_rows_
private

Definition at line 123 of file data-provider.h.

Referenced by NextBatch(), Reset(), and total_rows().

impala::MemPool* DataProvider::pool_
private

Definition at line 121 of file data-provider.h.

Referenced by NextBatch().

impala::RuntimeProfile* DataProvider::profile_
private

Definition at line 122 of file data-provider.h.

boost::minstd_rand DataProvider::rand_generator_
private

Definition at line 128 of file data-provider.h.

Referenced by NextBatch(), and SetSeed().

int DataProvider::row_size_
private

Definition at line 127 of file data-provider.h.

Referenced by NextBatch(), Reset(), and row_size().

int DataProvider::rows_returned_
private

Definition at line 125 of file data-provider.h.

Referenced by NextBatch(), and Reset().


The documentation for this class was generated from the following files: