Impala
Impala is the open source, native analytic database for Apache Hadoop.
See data-provider-test.cc for how to use this class.
#include <data-provider.h>
Classes
  class ColDesc
  struct Value

Public Types
  enum DataGen { UNIFORM_RANDOM, SEQUENTIAL }
    How the data should be generated.

Public Member Functions
  DataProvider(impala::MemPool *pool, impala::RuntimeProfile *profile)
  void Reset(int num_rows, int batch_size, const std::vector<ColDesc> &columns)
  void SetSeed(int seed)
  int row_size() const
    The size of a row (tuple size).
  int total_rows() const
    The total number of rows that will be generated.
  void *NextBatch(int *rows_returned)
  void Print(std::ostream *, char *data, int num_rows) const
    Print the row data in CSV format.

Private Attributes
  impala::MemPool *pool_
  impala::RuntimeProfile *profile_
  int num_rows_
  int batch_size_
  int rows_returned_
  boost::scoped_ptr<char> data_
  int row_size_
  boost::minstd_rand rand_generator_
  std::vector<ColDesc> cols_
  impala::RuntimeProfile::Counter *bytes_generated_
This is a test utility class that generates data similar to the tuple data we use. It accepts column descriptions and generates rows, in batches, through an iterator interface.
TODO: provide a way to have better control over the pool that strings are allocated from.
TODO: provide a way to control data skew. This should be fairly easy with the Boost random classes.
Definition at line 33 of file data-provider.h.
enum DataProvider::DataGen
How the data should be generated.
Enumerators:
  UNIFORM_RANDOM
  SEQUENTIAL
Definition at line 49 of file data-provider.h.
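The enumerator names suggest two generation modes. The sketch below shows one plausible reading for an integer column bounded by a minimum and a maximum (the ColDesc::min and ColDesc::max members referenced by NextBatch()). The wrap-around behavior assumed for SEQUENTIAL and the helper GenerateInt are hypothetical, not the actual logic in data-provider.cc, and std::minstd_rand stands in for the boost::minstd_rand member.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Hypothetical stand-in for DataProvider::DataGen (the real enum is declared
// in data-provider.h).
enum class DataGen { UNIFORM_RANDOM, SEQUENTIAL };

// Sketch of one integer column value for row `row_idx`, under assumed semantics:
//   SEQUENTIAL     -> min, min + 1, ..., max, then wraps back to min
//   UNIFORM_RANDOM -> an independent draw from [min, max]
std::int64_t GenerateInt(DataGen gen, std::int64_t min, std::int64_t max,
                         std::int64_t row_idx, std::minstd_rand* rng) {
  assert(min <= max && row_idx >= 0 && rng != nullptr);
  switch (gen) {
    case DataGen::SEQUENTIAL:
      return min + row_idx % (max - min + 1);
    case DataGen::UNIFORM_RANDOM: {
      std::uniform_int_distribution<std::int64_t> dist(min, max);
      return dist(*rng);
    }
  }
  return min;  // not reached: every enumerator is handled above
}
```

With min = 10 and max = 12, SEQUENTIAL yields 10, 11, 12, 10, ... for rows 0, 1, 2, 3, ...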
DataProvider::DataProvider(impala::MemPool *pool, impala::RuntimeProfile *profile)
Create a data provider object with a pool for allocating memory and a profile to collect metrics.
Definition at line 13 of file data-provider.cc.
References ADD_COUNTER, bytes_generated_, and SetSeed().
void *DataProvider::NextBatch(int *rows_returned)
Generates the next batch, returning a pointer to the start of the batch and storing the number of rows generated in *rows_returned. Returns NULL (with 0 rows) when the generator is done.
Definition at line 60 of file data-provider.cc.
References batch_size_, DataProvider::ColDesc::bytes, bytes_generated_, cols_, COUNTER_ADD, data_, DataProvider::ColDesc::Generate(), DataProvider::ColDesc::max, DataProvider::ColDesc::min, num_rows_, pool_, rand_generator_, RandString(), row_size_, rows_returned_, DataProvider::Value::s, DataProvider::ColDesc::type, impala::TYPE_BIGINT, impala::TYPE_BOOLEAN, impala::TYPE_DOUBLE, impala::TYPE_FLOAT, impala::TYPE_INT, impala::TYPE_SMALLINT, impala::TYPE_STRING, impala::TYPE_TINYINT, and impala::TYPE_VARCHAR.
Referenced by main().
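A minimal sketch of the NextBatch() contract and the usual consumer loop. ToyProvider and CountRows are hypothetical names; ToyProvider assumes, as the data_ member suggests, that one batch-sized buffer is reused across calls, and it zero-fills rows instead of generating real column values.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in following the documented NextBatch() contract:
// returns a pointer to the start of a batch, reports the row count through
// *rows_returned, and returns NULL/0 once all rows have been produced.
class ToyProvider {
 public:
  void Reset(int num_rows, int batch_size, int row_size) {
    assert(num_rows >= 0 && batch_size > 0 && row_size > 0);
    num_rows_ = num_rows;
    batch_size_ = batch_size;
    row_size_ = row_size;
    rows_returned_ = 0;
    data_.assign(static_cast<std::size_t>(batch_size) * row_size, '\0');
  }

  void* NextBatch(int* rows_returned) {
    int n = std::min(batch_size_, num_rows_ - rows_returned_);
    if (n <= 0) {
      *rows_returned = 0;
      return nullptr;
    }
    // A real provider would fill each row from the column descriptions here;
    // this sketch only zero-fills the reused buffer.
    std::memset(data_.data(), 0, static_cast<std::size_t>(n) * row_size_);
    rows_returned_ += n;
    *rows_returned = n;
    return data_.data();
  }

 private:
  int num_rows_ = 0, batch_size_ = 1, row_size_ = 1, rows_returned_ = 0;
  std::vector<char> data_;
};

// Typical consumer loop: keep pulling batches until NextBatch() returns NULL.
int CountRows(ToyProvider* provider, int* num_batches) {
  int total = 0;
  int rows = 0;
  *num_batches = 0;
  while (provider->NextBatch(&rows) != nullptr) {
    total += rows;
    ++*num_batches;
  }
  return total;
}
```

With 10 rows and a batch size of 4, the loop sees batches of 4, 4, and 2 rows.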
void DataProvider::Print(std::ostream *, char *data, int num_rows) const
Print the row data in CSV format.
Definition at line 113 of file data-provider.cc.
References cols_, impala::TYPE_BIGINT, impala::TYPE_BOOLEAN, impala::TYPE_DOUBLE, impala::TYPE_FLOAT, impala::TYPE_INT, impala::TYPE_SMALLINT, impala::TYPE_STRING, impala::TYPE_TINYINT, and impala::TYPE_VARCHAR.
Referenced by main().
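The real Print() decodes each column according to its type in cols_. As a simplified illustration, assuming only 4-byte integer columns packed back to back without padding, CSV output could be produced as below; PrintInt32RowsAsCsv and Int32RowsToCsv are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical sketch: each row is num_cols std::int32_t values packed back to
// back; each row is written as one comma-separated line.
void PrintInt32RowsAsCsv(std::ostream* out, const char* data, int num_rows,
                         int num_cols) {
  assert(out != nullptr && num_rows >= 0 && num_cols > 0);
  const std::size_t row_size = sizeof(std::int32_t) * num_cols;
  for (int r = 0; r < num_rows; ++r) {
    const char* row = data + r * row_size;
    for (int c = 0; c < num_cols; ++c) {
      std::int32_t v;
      // memcpy sidesteps alignment and strict-aliasing problems when
      // reading typed values out of a char buffer.
      std::memcpy(&v, row + c * sizeof(std::int32_t), sizeof(v));
      if (c > 0) *out << ',';
      *out << v;
    }
    *out << '\n';
  }
}

// Convenience wrapper that captures the CSV text in a string.
std::string Int32RowsToCsv(const char* data, int num_rows, int num_cols) {
  std::ostringstream ss;
  PrintInt32RowsAsCsv(&ss, data, num_rows, num_cols);
  return ss.str();
}
```

For the two rows {1, 2, 3} and {-4, 5, 6}, Int32RowsToCsv returns "1,2,3\n-4,5,6\n".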
void DataProvider::Reset(int num_rows, int batch_size, const std::vector<ColDesc> &columns)
Reset the generator to produce num_rows rows, in batches of batch_size rows, using the given column descriptions.
Definition at line 26 of file data-provider.cc.
References batch_size_, bytes_generated_, cols_, COUNTER_SET, data_, num_rows_, row_size_, and rows_returned_.
Referenced by main().
int DataProvider::row_size() const [inline]
The size of a row (tuple size).
Definition at line 107 of file data-provider.h.
References row_size_.
Referenced by main().
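row_size() returns the tuple size that Reset() stores in row_size_. A sketch assuming the tuple is simply the column widths (ColDesc::bytes) packed back to back with no padding; ColWidth, RowSize, and BatchBytes are hypothetical, and the real layout may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical column descriptor: only the byte width matters here.
struct ColWidth {
  int bytes;
};

// Assumed layout: columns packed back to back with no padding,
// so the tuple size is the sum of the column widths.
int RowSize(const std::vector<ColWidth>& cols) {
  int size = 0;
  for (const ColWidth& c : cols) {
    assert(c.bytes > 0);
    size += c.bytes;
  }
  return size;
}

// Bytes needed to hold one full batch of rows under the same assumption.
std::size_t BatchBytes(const std::vector<ColWidth>& cols, int batch_size) {
  assert(batch_size > 0);
  return static_cast<std::size_t>(RowSize(cols)) * batch_size;
}
```

Under these assumptions, columns of 4, 8, and 1 bytes give a 13-byte row and a 1300-byte buffer for 100 rows.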
void DataProvider::SetSeed(int seed)
Sets the seed used for randomly generated data. By default, the generator uses seed 0.
Definition at line 39 of file data-provider.cc.
References rand_generator_.
Referenced by DataProvider().
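Seeding makes the random output reproducible: a pseudo-random engine reseeded with the same value always yields the same sequence. The sketch uses the standard std::minstd_rand in place of the boost::minstd_rand member rand_generator_; Draw is a hypothetical helper.

```cpp
#include <cassert>
#include <random>
#include <vector>

// Hypothetical helper: draw n values from a freshly seeded generator.
// std::minstd_rand stands in for the boost::minstd_rand member rand_generator_.
std::vector<unsigned> Draw(unsigned seed, int n) {
  assert(n >= 0);
  std::minstd_rand gen;
  gen.seed(seed);  // analogous to DataProvider::SetSeed(seed)
  std::vector<unsigned> out;
  out.reserve(n);
  for (int i = 0; i < n; ++i) out.push_back(static_cast<unsigned>(gen()));
  return out;
}
```

Two providers given the same seed, column descriptions, and row counts should therefore produce the same random values, which keeps tests and benchmarks repeatable.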
int DataProvider::total_rows() const [inline]
The total number of rows that will be generated.
Definition at line 110 of file data-provider.h.
References num_rows_.
Referenced by main().
int DataProvider::batch_size_ [private]
Definition at line 124 of file data-provider.h.
Referenced by NextBatch(), and Reset().
impala::RuntimeProfile::Counter *DataProvider::bytes_generated_ [private]
Definition at line 131 of file data-provider.h.
Referenced by DataProvider(), NextBatch(), and Reset().
std::vector<ColDesc> DataProvider::cols_ [private]
Definition at line 129 of file data-provider.h.
Referenced by NextBatch(), Print(), and Reset().
boost::scoped_ptr<char> DataProvider::data_ [private]
Definition at line 126 of file data-provider.h.
Referenced by NextBatch(), and Reset().
int DataProvider::num_rows_ [private]
Definition at line 123 of file data-provider.h.
Referenced by NextBatch(), Reset(), and total_rows().
impala::MemPool *DataProvider::pool_ [private]
Definition at line 121 of file data-provider.h.
Referenced by NextBatch().
impala::RuntimeProfile *DataProvider::profile_ [private]
Definition at line 122 of file data-provider.h.
boost::minstd_rand DataProvider::rand_generator_ [private]
Definition at line 128 of file data-provider.h.
Referenced by NextBatch(), and SetSeed().
int DataProvider::row_size_ [private]
Definition at line 127 of file data-provider.h.
Referenced by NextBatch(), Reset(), and row_size().
int DataProvider::rows_returned_ [private]
Definition at line 125 of file data-provider.h.
Referenced by NextBatch(), and Reset().