Impala
Impala is the open source, native analytic database for Apache Hadoop.
impala::SnappyBlockCompressor Class Reference

#include <compress.h>

Inheritance diagram for impala::SnappyBlockCompressor:
Collaboration diagram for impala::SnappyBlockCompressor:

Public Types

typedef std::map< const std::string, const THdfsCompression::type > CodecMap
 Map from codec string to compression format.
 

Public Member Functions

virtual ~SnappyBlockCompressor ()
 
virtual int64_t MaxOutputLen (int64_t input_len, const uint8_t *input=NULL)
 
virtual Status ProcessBlock (bool output_preallocated, int64_t input_length, const uint8_t *input, int64_t *output_length, uint8_t **output)
 Process a block of data, either compressing or decompressing it.
 
virtual std::string file_extension () const
 File extension to use for this compression codec.
 
Status ProcessBlock32 (bool output_preallocated, int input_length, const uint8_t *input, int *output_length, uint8_t **output)
 
virtual Status ProcessBlockStreaming (int64_t input_length, const uint8_t *input, int64_t *input_bytes_read, int64_t *output_length, uint8_t **output, bool *eos)
 
virtual void Close ()
 Must be called on codec before destructor for final cleanup.
 
bool reuse_output_buffer () const
 

Static Public Member Functions

static Status CreateDecompressor (MemPool *mem_pool, bool reuse, THdfsCompression::type format, boost::scoped_ptr< Codec > *decompressor)
 
static Status CreateDecompressor (MemPool *mem_pool, bool reuse, const std::string &codec, boost::scoped_ptr< Codec > *decompressor)
 Alternate factory method: takes a codec string and populates a scoped pointer.
 
static Status CreateCompressor (MemPool *mem_pool, bool reuse, THdfsCompression::type format, boost::scoped_ptr< Codec > *compressor)
 
static Status CreateCompressor (MemPool *mem_pool, bool reuse, const std::string &codec, boost::scoped_ptr< Codec > *compressor)
 Alternate factory method: takes a codec string and populates a scoped pointer.
 
static std::string GetCodecName (THdfsCompression::type)
 Return the name of a compression algorithm.
 
static Status GetHadoopCodecClassName (THdfsCompression::type, std::string *out_name)
 Returns the java class name for the given compression type.
 

Static Public Attributes

static const char *const DEFAULT_COMPRESSION
 These are the codec string representations used in Hadoop.
 
static const char *const GZIP_COMPRESSION = "org.apache.hadoop.io.compress.GzipCodec"
 
static const char *const BZIP2_COMPRESSION = "org.apache.hadoop.io.compress.BZip2Codec"
 
static const char *const SNAPPY_COMPRESSION = "org.apache.hadoop.io.compress.SnappyCodec"
 
static const char *const UNKNOWN_CODEC_ERROR
 
static const CodecMap CODEC_MAP
 
static const int MAX_BLOCK_SIZE = (2L * 1024 * 1024 * 1024) - 1
 

Protected Attributes

MemPool * memory_pool_
 Pool to allocate the buffer to hold transformed data.
 
boost::scoped_ptr< MemPool > temp_memory_pool_
 
bool reuse_buffer_
 Can we reuse the output buffer or do we need to allocate on each call?
 
uint8_t * out_buffer_
 
int64_t buffer_length_
 Length of the output buffer.
 

Private Member Functions

 SnappyBlockCompressor (MemPool *mem_pool, bool reuse_buffer)
 
virtual Status Init ()
 Initialize the codec. This should only be called once.
 

Friends

class Codec
 

Detailed Description

Definition at line 84 of file compress.h.

Member Typedef Documentation

typedef std::map<const std::string, const THdfsCompression::type> impala::Codec::CodecMap
inherited

Map from codec string to compression format.

Definition at line 51 of file codec.h.

Constructor & Destructor Documentation

virtual impala::SnappyBlockCompressor::~SnappyBlockCompressor ( )
inline virtual

Definition at line 86 of file compress.h.

SnappyBlockCompressor::SnappyBlockCompressor ( MemPool *  mem_pool,
bool  reuse_buffer 
)
private

Definition at line 188 of file compress.cc.

Member Function Documentation

void Codec::Close ( )
virtual inherited

Must be called on codec before destructor for final cleanup.

Definition at line 174 of file codec.cc.

References impala::MemPool::AcquireData(), impala::Codec::memory_pool_, and impala::Codec::temp_memory_pool_.

static Status impala::Codec::CreateCompressor ( MemPool *  mem_pool,
bool  reuse,
THdfsCompression::type  format,
boost::scoped_ptr< Codec > *  compressor
)
static inherited

Create a compressor.
Input:
  mem_pool: the memory pool used to store the compressed data.
  reuse: if true, the allocated buffer can be reused.
  format: the type of compressor to create.
Output:
  compressor: scoped pointer to the compressor class to use.

Referenced by impala::HdfsParquetTableWriter::BaseColumnWriter::BaseColumnWriter(), impala::HdfsSequenceTableWriter::Init(), impala::HdfsTextTableWriter::Init(), impala::HdfsAvroTableWriter::Init(), impala::DecompressorTest::RunTest(), impala::DecompressorTest::RunTestStreaming(), impala::RowBatch::Serialize(), impala::TEST_F(), and impala::TestCompression().

static Status impala::Codec::CreateCompressor ( MemPool *  mem_pool,
bool  reuse,
const std::string &  codec,
boost::scoped_ptr< Codec > *  compressor
)
static inherited

Alternate factory method: takes a codec string and populates a scoped pointer.

static Status impala::Codec::CreateDecompressor ( MemPool *  mem_pool,
bool  reuse,
THdfsCompression::type  format,
boost::scoped_ptr< Codec > *  decompressor
)
static inherited

Create a decompressor.
Input:
  mem_pool: the memory pool used to store the decompressed data.
  reuse: if true, the allocated buffer can be reused.
  format: the type of decompressor to create.
Output:
  decompressor: scoped pointer to the decompressor class to use.
If mem_pool is NULL, the resulting codec never allocates memory and the caller is responsible for allocating it.

Referenced by impala::HdfsRCFileScanner::InitNewRange(), impala::HdfsParquetScanner::BaseColumnReader::Reset(), impala::RowBatch::RowBatch(), impala::DecompressorTest::RunTest(), impala::DecompressorTest::RunTestStreaming(), and impala::HdfsScanner::UpdateDecompressor().

static Status impala::Codec::CreateDecompressor ( MemPool *  mem_pool,
bool  reuse,
const std::string &  codec,
boost::scoped_ptr< Codec > *  decompressor
)
static inherited

Alternate factory method: takes a codec string and populates a scoped pointer.

virtual std::string impala::SnappyBlockCompressor::file_extension ( ) const
inline virtual

File extension to use for this compression codec.

Implements impala::Codec.

Definition at line 90 of file compress.h.

string Codec::GetCodecName ( THdfsCompression::type  type)
static inherited

Return the name of a compression algorithm.

Definition at line 50 of file codec.cc.

Referenced by impala::HdfsParquetTableWriter::Init().

Status Codec::GetHadoopCodecClassName ( THdfsCompression::type  ,
std::string *  out_name 
)
static inherited

Returns the java class name for the given compression type.

Definition at line 59 of file codec.cc.

References impala::Status::OK.

Referenced by impala::HdfsSequenceTableWriter::Init().

virtual Status impala::SnappyBlockCompressor::Init ( )
inline private virtual

Initialize the codec. This should only be called once.

Implements impala::Codec.

Definition at line 95 of file compress.h.

References impala::Status::OK.

int64_t SnappyBlockCompressor::MaxOutputLen ( int64_t  input_len,
const uint8_t *  input = NULL 
)
virtual

Returns the maximum result length from applying the codec to input. Note this is not the exact result length, simply a bound to allow preallocating a buffer. This must be an O(1) operation (i.e. cannot read all of input). Codecs that don't support this should return -1.

Implements impala::Codec.

Definition at line 192 of file compress.cc.

Status SnappyBlockCompressor::ProcessBlock ( bool  output_preallocated,
int64_t  input_length,
const uint8_t *  input,
int64_t *  output_length,
uint8_t **  output 
)
virtual

Process a block of data, either compressing or decompressing it.

If output_preallocated is true, *output_length must be the length of *output and data will be written directly to *output (*output must be big enough to contain the transformed output). If output_preallocated is false, *output will be allocated from the codec's mempool. In this case, a mempool must have been passed into the constructor. In either case, *output_length will be set to the actual length of the transformed output.
Inputs:
  input_length: length of the data to process.
  input: data to process.

Implements impala::Codec.

Definition at line 197 of file compress.cc.

References impala::MemPool::Allocate(), impala::Codec::buffer_length_, impala::Codec::memory_pool_, impala::Status::OK, impala::Codec::out_buffer_, impala::ReadWriteUtil::PutInt(), and impala::Codec::reuse_buffer_.
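
The buffer contract described above can be illustrated with a minimal sketch. `ToyCodec` is a hypothetical identity codec, not part of Impala: it returns `bool` instead of `Status`, and a `std::vector` stands in for the mempool. It only shows how the two `output_preallocated` modes treat `*output` and `*output_length`.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical identity "codec" that follows the ProcessBlock() buffer
// contract. It copies input to output unchanged.
class ToyCodec {
 public:
  bool ProcessBlock(bool output_preallocated, int64_t input_length,
                    const uint8_t* input, int64_t* output_length,
                    uint8_t** output) {
    assert(input_length >= 0 && output_length != nullptr && output != nullptr);
    if (output_preallocated) {
      // Caller owns *output; *output_length holds its capacity on entry.
      if (*output_length < input_length) return false;
    } else {
      // Codec owns the buffer; a std::vector stands in for the mempool.
      buffer_.resize(static_cast<size_t>(input_length));
      *output = buffer_.data();
    }
    if (input_length > 0) {
      std::memcpy(*output, input, static_cast<size_t>(input_length));
    }
    *output_length = input_length;  // actual length, set in both modes
    return true;
  }

 private:
  std::vector<uint8_t> buffer_;
};
```

In the preallocated mode the caller's pointer is left untouched and only the length changes; in the other mode the codec overwrites `*output` with a buffer it owns.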

Status Codec::ProcessBlock32 ( bool  output_preallocated,
int  input_length,
const uint8_t *  input,
int *  output_length,
uint8_t **  output 
)
inherited

Wrapper to the actual ProcessBlock() function. This wrapper uses lengths as ints and not int64_ts. We need to keep this interface because the Parquet thrift uses ints. See IMPALA-1116.

Definition at line 181 of file codec.cc.

References impala::Status::OK, impala::Codec::ProcessBlock(), RETURN_IF_ERROR, and UNLIKELY.
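
A rough sketch of what such a 32-bit wrapper might look like follows. `ProcessBlock64` and `ProcessBlock32Sketch` are hypothetical names, `bool` replaces `Status`, and the 64-bit function is a trivial copy; the point is only the widen, delegate, range-check, narrow sequence, not the code in codec.cc.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for the 64-bit ProcessBlock(): copies input into a
// preallocated output buffer. Returns false on error.
bool ProcessBlock64(int64_t input_length, const uint8_t* input,
                    int64_t* output_length, uint8_t** output) {
  if (input_length < 0 || *output_length < input_length) return false;
  if (input_length > 0) {
    std::memcpy(*output, input, static_cast<size_t>(input_length));
  }
  *output_length = input_length;
  return true;
}

// 32-bit wrapper: widens the int lengths, delegates to the 64-bit function,
// then checks that the result still fits in an int before narrowing it.
bool ProcessBlock32Sketch(int input_length, const uint8_t* input,
                          int* output_length, uint8_t** output) {
  assert(output_length != nullptr && output != nullptr);
  int64_t output_length64 = *output_length;
  if (!ProcessBlock64(input_length, input, &output_length64, output)) {
    return false;
  }
  if (output_length64 > INT_MAX) return false;  // would overflow the out-param
  *output_length = static_cast<int>(output_length64);
  return true;
}
```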

virtual Status impala::Codec::ProcessBlockStreaming ( int64_t  input_length,
const uint8_t *  input,
int64_t *  input_bytes_read,
int64_t *  output_length,
uint8_t **  output,
bool *  eos
)
inline virtual inherited

Process data like ProcessBlock(), but can consume partial input and may only produce partial output. *input_bytes_read returns the number of bytes of input that have been consumed. Even if all input has been consumed, the caller must continue calling to fetch output until *eos returns true.

Reimplemented in impala::GzipDecompressor.

Definition at line 117 of file codec.h.

Referenced by impala::DecompressorTest::CompressAndStreamingDecompress().
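
The calling convention above implies a loop on the caller's side. The sketch below assumes a hypothetical `ToyStreamingCodec` (an identity codec returning `bool` rather than `Status`) that consumes all offered input immediately but emits at most four bytes per call, so the caller has to keep calling with no remaining input until `*eos` is true.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr size_t kChunk = 4;  // hypothetical per-call output limit

// Hypothetical streaming identity codec: queues all offered input, then
// hands it back at most kChunk bytes per call.
class ToyStreamingCodec {
 public:
  bool ProcessBlockStreaming(int64_t input_length, const uint8_t* input,
                             int64_t* input_bytes_read, int64_t* output_length,
                             uint8_t** output, bool* eos) {
    if (input_length < 0) return false;
    pending_.insert(pending_.end(), input, input + input_length);
    *input_bytes_read = input_length;  // everything offered was consumed
    const size_t n = std::min(pending_.size(), kChunk);
    const auto end = pending_.begin() + static_cast<std::ptrdiff_t>(n);
    chunk_.assign(pending_.begin(), end);
    pending_.erase(pending_.begin(), end);
    *output = chunk_.data();
    *output_length = static_cast<int64_t>(n);
    *eos = pending_.empty();  // true only once all output has been fetched
    return true;
  }

 private:
  std::vector<uint8_t> pending_;
  std::vector<uint8_t> chunk_;
};

// Caller loop: advance past consumed input and keep calling until eos.
std::vector<uint8_t> DrainAll(ToyStreamingCodec* codec,
                              const std::vector<uint8_t>& input) {
  assert(codec != nullptr);
  std::vector<uint8_t> result;
  int64_t offset = 0;
  bool eos = false;
  while (!eos) {
    int64_t bytes_read = 0;
    int64_t output_length = 0;
    uint8_t* output = nullptr;
    const int64_t remaining = static_cast<int64_t>(input.size()) - offset;
    if (!codec->ProcessBlockStreaming(remaining, input.data() + offset,
                                      &bytes_read, &output_length, &output,
                                      &eos)) {
      break;
    }
    result.insert(result.end(), output, output + output_length);
    offset += bytes_read;
  }
  return result;
}
```

With a ten-byte input, the first call consumes all ten bytes and the loop makes two further calls with zero remaining input to collect the rest of the output.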

bool impala::Codec::reuse_output_buffer ( ) const
inline inherited

Definition at line 135 of file codec.h.

References impala::Codec::reuse_buffer_.

Friends And Related Function Documentation

friend class Codec
friend

Definition at line 93 of file compress.h.

Member Data Documentation

const char *const Codec::BZIP2_COMPRESSION = "org.apache.hadoop.io.compress.BZip2Codec"
static inherited

Definition at line 46 of file codec.h.

const Codec::CodecMap Codec::CODEC_MAP
static inherited
Initial value:
= map_list_of
("", THdfsCompression::NONE)
(DEFAULT_COMPRESSION, THdfsCompression::DEFAULT)
(GZIP_COMPRESSION, THdfsCompression::GZIP)
(BZIP2_COMPRESSION, THdfsCompression::BZIP2)
(SNAPPY_COMPRESSION, THdfsCompression::SNAPPY_BLOCKED)

Definition at line 52 of file codec.h.

Referenced by impala::HdfsSequenceScanner::ReadFileHeader(), and impala::HdfsRCFileScanner::ReadFileHeader().
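
As an illustration of how a scanner might use such a map when it reads a codec class name from a file header, here is a self-contained sketch. The `Compression` enum is a hypothetical stand-in for the Thrift-generated `THdfsCompression::type`, `LookupCodec` is an invented helper, and a brace initializer replaces Boost's `map_list_of`.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for the Thrift-generated THdfsCompression::type.
enum class Compression { NONE, DEFAULT, GZIP, BZIP2, SNAPPY_BLOCKED };

// Same string-to-format pairs as the CODEC_MAP initial value listed above.
static const std::map<std::string, Compression> kCodecMap = {
    {"", Compression::NONE},
    {"org.apache.hadoop.io.compress.DefaultCodec", Compression::DEFAULT},
    {"org.apache.hadoop.io.compress.GzipCodec", Compression::GZIP},
    {"org.apache.hadoop.io.compress.BZip2Codec", Compression::BZIP2},
    {"org.apache.hadoop.io.compress.SnappyCodec", Compression::SNAPPY_BLOCKED},
};

// Looks up a codec class name. Returns false when the name is not in the
// map, which a caller could report as an unsupported codec.
bool LookupCodec(const std::string& codec, Compression* out) {
  assert(out != nullptr);
  const auto it = kCodecMap.find(codec);
  if (it == kCodecMap.end()) return false;
  *out = it->second;
  return true;
}
```

Note that the Hadoop `SnappyCodec` class name maps to the block-framed Snappy format, which matches the `SNAPPY_BLOCKED` entry in the initial value above.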

const char *const Codec::DEFAULT_COMPRESSION
static inherited
Initial value:
=
"org.apache.hadoop.io.compress.DefaultCodec"

These are the codec string representations used in Hadoop.

Definition at line 44 of file codec.h.

const char *const Codec::GZIP_COMPRESSION = "org.apache.hadoop.io.compress.GzipCodec"
static inherited

Definition at line 45 of file codec.h.

const int impala::Codec::MAX_BLOCK_SIZE = (2L * 1024 * 1024 * 1024) - 1
static inherited

Largest block we will compress/decompress: 2 GB minus one byte. Real compressed blocks are never this big, but the limit guards against a corrupt file whose block length field holds some very large number.

Definition at line 140 of file codec.h.

Referenced by impala::GzipDecompressor::ProcessBlock(), impala::BzipDecompressor::ProcessBlock(), impala::SnappyDecompressor::ProcessBlock(), impala::SnappyBlockDecompressor::ProcessBlock(), and SnappyBlockDecompress().
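
To make the arithmetic concrete: (2 * 1024 * 1024 * 1024) - 1 is 2,147,483,647, the largest value of a 32-bit signed integer. The sketch below recomputes it in 64-bit arithmetic and shows one way a length read from a file could be checked against it. `kMaxBlockSize`, `IsValidBlockLength`, and `ToIntLength` are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// Same value as MAX_BLOCK_SIZE, computed in 64-bit arithmetic so the
// expression does not depend on the width of long.
constexpr int64_t kMaxBlockSize = (INT64_C(2) * 1024 * 1024 * 1024) - 1;
static_assert(kMaxBlockSize == INT32_MAX,
              "2 GB minus one byte equals INT32_MAX");

// Returns true if a block length read from a file header is plausible.
// A negative or oversized value suggests a corrupt file.
bool IsValidBlockLength(int64_t length) {
  return length >= 0 && length <= kMaxBlockSize;
}

// Narrows an already validated length to int.
int ToIntLength(int64_t length) {
  assert(IsValidBlockLength(length));
  return static_cast<int>(length);
}
```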

const char *const Codec::SNAPPY_COMPRESSION = "org.apache.hadoop.io.compress.SnappyCodec"
static inherited

Definition at line 47 of file codec.h.

boost::scoped_ptr<MemPool> impala::Codec::temp_memory_pool_
protected inherited

Temporary memory pool: if the output size we guessed turns out to be too small, we can use this pool to free the unused buffers.

Definition at line 158 of file codec.h.

Referenced by impala::Codec::Close(), impala::Codec::Codec(), impala::GzipDecompressor::ProcessBlock(), impala::BzipDecompressor::ProcessBlock(), and impala::BzipCompressor::ProcessBlock().

const char *const Codec::UNKNOWN_CODEC_ERROR
static inherited
Initial value:
=
"This compression codec is currently unsupported: "

Definition at line 48 of file codec.h.


The documentation for this class was generated from the following files: