Impala
Impala is the open source, native analytic database for Apache Hadoop.
#include <compress.h>
Public Types | |
typedef std::map< const std::string, const THdfsCompression::type > | CodecMap |
Map from codec string to compression format. | |
Public Member Functions | |
virtual | ~Lz4Compressor () |
virtual int64_t | MaxOutputLen (int64_t input_len, const uint8_t *input=NULL) |
virtual Status | ProcessBlock (bool output_preallocated, int64_t input_length, const uint8_t *input, int64_t *output_length, uint8_t **output) |
Process a block of data, either compressing or decompressing it. | |
virtual std::string | file_extension () const |
File extension to use for this compression codec. | |
Status | ProcessBlock32 (bool output_preallocated, int input_length, const uint8_t *input, int *output_length, uint8_t **output) |
virtual Status | ProcessBlockStreaming (int64_t input_length, const uint8_t *input, int64_t *input_bytes_read, int64_t *output_length, uint8_t **output, bool *eos) |
virtual void | Close () |
Must be called on codec before destructor for final cleanup. | |
bool | reuse_output_buffer () const |
Static Public Member Functions | |
static Status | CreateDecompressor (MemPool *mem_pool, bool reuse, THdfsCompression::type format, boost::scoped_ptr< Codec > *decompressor) |
static Status | CreateDecompressor (MemPool *mem_pool, bool reuse, const std::string &codec, boost::scoped_ptr< Codec > *decompressor) |
Alternate factory method: takes a codec string and populates a scoped pointer. | |
static Status | CreateCompressor (MemPool *mem_pool, bool reuse, THdfsCompression::type format, boost::scoped_ptr< Codec > *compressor) |
static Status | CreateCompressor (MemPool *mem_pool, bool reuse, const std::string &codec, boost::scoped_ptr< Codec > *compressor) |
Alternate factory method: takes a codec string and populates a scoped pointer. | |
static std::string | GetCodecName (THdfsCompression::type) |
Return the name of a compression algorithm. | |
static Status | GetHadoopCodecClassName (THdfsCompression::type, std::string *out_name) |
Returns the Java class name for the given compression type. | |
Static Public Attributes | |
static const char *const | DEFAULT_COMPRESSION |
These are the codec string representations used in Hadoop. | |
static const char *const | GZIP_COMPRESSION = "org.apache.hadoop.io.compress.GzipCodec" |
static const char *const | BZIP2_COMPRESSION = "org.apache.hadoop.io.compress.BZip2Codec" |
static const char *const | SNAPPY_COMPRESSION = "org.apache.hadoop.io.compress.SnappyCodec" |
static const char *const | UNKNOWN_CODEC_ERROR |
static const CodecMap | CODEC_MAP |
static const int | MAX_BLOCK_SIZE = (2L * 1024 * 1024 * 1024) - 1 |
Protected Attributes | |
MemPool * | memory_pool_ |
Pool to allocate the buffer to hold transformed data. | |
boost::scoped_ptr< MemPool > | temp_memory_pool_ |
bool | reuse_buffer_ |
Can we reuse the output buffer or do we need to allocate on each call? | |
uint8_t * | out_buffer_ |
int64_t | buffer_length_ |
Length of the output buffer. | |
Private Member Functions | |
Lz4Compressor (MemPool *mem_pool=NULL, bool reuse_buffer=false) | |
virtual Status | Init () |
Initialize the codec. This should only be called once. | |
Friends | |
class | Codec |
Lz4 is a compression codec with compression ratios similar to Snappy but much faster decompression. This compressor can only compress into an output buffer that the caller has already allocated; asking it to allocate the output itself causes an error.
Definition at line 121 of file compress.h.
|
inherited |
|
inline virtual |
Definition at line 123 of file compress.h.
Definition at line 281 of file compress.cc.
|
virtual inherited |
Must be called on codec before destructor for final cleanup.
Definition at line 174 of file codec.cc.
References impala::MemPool::AcquireData(), impala::Codec::memory_pool_, and impala::Codec::temp_memory_pool_.
|
static inherited |
Create a compressor.
Input:
  mem_pool: the memory pool used to store the compressed data.
  reuse: if true, the allocated buffer can be reused.
  format: the type of compressor to create.
Output:
  compressor: scoped pointer to the compressor class to use.
Referenced by impala::HdfsParquetTableWriter::BaseColumnWriter::BaseColumnWriter(), impala::HdfsSequenceTableWriter::Init(), impala::HdfsTextTableWriter::Init(), impala::HdfsAvroTableWriter::Init(), impala::DecompressorTest::RunTest(), impala::DecompressorTest::RunTestStreaming(), impala::RowBatch::Serialize(), impala::TEST_F(), and impala::TestCompression().
|
static inherited |
Alternate factory method: takes a codec string and populates a scoped pointer.
|
static inherited |
Create a decompressor.
Input:
  mem_pool: the memory pool used to store the decompressed data.
  reuse: if true, the allocated buffer can be reused.
  format: the type of decompressor to create.
Output:
  decompressor: scoped pointer to the decompressor class to use.
If mem_pool is NULL, the resulting codec never allocates memory and the caller is responsible for allocating it.
Referenced by impala::HdfsRCFileScanner::InitNewRange(), impala::HdfsParquetScanner::BaseColumnReader::Reset(), impala::RowBatch::RowBatch(), impala::DecompressorTest::RunTest(), impala::DecompressorTest::RunTestStreaming(), and impala::HdfsScanner::UpdateDecompressor().
|
static inherited |
Alternate factory method: takes a codec string and populates a scoped pointer.
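The string-based factory methods rely on a map from Hadoop codec class names to compression formats (CodecMap). A minimal sketch of that lookup follows. The enum CompressionSketch is a hypothetical stand-in for THdfsCompression::type, whose values are not listed on this page, and LookupCodecSketch is an illustrative helper, not part of Impala. Only the three codec strings shown under Static Public Attributes are used.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for THdfsCompression::type (assumption, not the real enum).
enum class CompressionSketch { GZIP, BZIP2, SNAPPY };

// Builds the codec-string map once; the keys are the Hadoop class names
// listed on this page.
inline const std::map<std::string, CompressionSketch>& CodecMapSketch() {
  static const std::map<std::string, CompressionSketch> codec_map = {
      {"org.apache.hadoop.io.compress.GzipCodec", CompressionSketch::GZIP},
      {"org.apache.hadoop.io.compress.BZip2Codec", CompressionSketch::BZIP2},
      {"org.apache.hadoop.io.compress.SnappyCodec", CompressionSketch::SNAPPY},
  };
  return codec_map;
}

// Resolves a codec string to a format. Returns false for an unknown codec,
// which is where a real factory would report an error status instead.
inline bool LookupCodecSketch(const std::string& codec, CompressionSketch* format) {
  const auto& codec_map = CodecMapSketch();
  auto it = codec_map.find(codec);
  if (it == codec_map.end()) return false;
  *format = it->second;
  return true;
}
```

Once the format is resolved, a string-based factory can presumably delegate to the overload that takes the format directly.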
|
inline virtual |
File extension to use for this compression codec.
Implements impala::Codec.
Definition at line 127 of file compress.h.
|
static inherited |
Return the name of a compression algorithm.
Definition at line 50 of file codec.cc.
Referenced by impala::HdfsParquetTableWriter::Init().
|
static inherited |
Returns the Java class name for the given compression type.
Definition at line 59 of file codec.cc.
References impala::Status::OK.
Referenced by impala::HdfsSequenceTableWriter::Init().
|
inline private virtual |
Initialize the codec. This should only be called once.
Implements impala::Codec.
Definition at line 132 of file compress.h.
References impala::Status::OK.
|
virtual |
Returns the maximum result length from applying the codec to input. Note this is not the exact result length, simply a bound to allow preallocating a buffer. This must be an O(1) operation (i.e. cannot read all of input). Codecs that don't support this should return -1.
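To illustrate what an O(1) bound can look like, the sketch below uses the worst-case formula that the LZ4 library publishes as LZ4_COMPRESSBOUND in lz4.h: input_len + input_len / 255 + 16. Whether Lz4Compressor::MaxOutputLen() uses this exact bound is an assumption, since its body (compress.cc, line 285) is not reproduced on this page. MaxOutputLenSketch and kLz4MaxInputSize are hypothetical names, and returning -1 for out-of-range input follows the convention described above rather than LZ4's own behavior of returning 0.

```cpp
#include <cassert>
#include <cstdint>

// Largest input LZ4 accepts in one call (LZ4_MAX_INPUT_SIZE in lz4.h).
constexpr int64_t kLz4MaxInputSize = 0x7E000000;

// O(1) upper bound on the compressed size: pure arithmetic on the length,
// never touching the input bytes. Returns -1 when no bound can be given.
inline int64_t MaxOutputLenSketch(int64_t input_len) {
  if (input_len < 0 || input_len > kLz4MaxInputSize) return -1;
  return input_len + input_len / 255 + 16;
}
```

A caller would size its output buffer with this value before calling ProcessBlock() with output_preallocated set to true.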
Implements impala::Codec.
Definition at line 285 of file compress.cc.
|
virtual |
Process a block of data, either compressing or decompressing it.
If output_preallocated is true, *output_length must be the length of *output, and data is written directly to *output (*output must be big enough to contain the transformed output).
If output_preallocated is false, *output is allocated from the codec's mempool. In this case, a mempool must have been passed into the constructor.
In either case, *output_length is set to the actual length of the transformed output.
Inputs:
  input_length: length of the data to process
  input: data to process
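The buffer contract above can be sketched with a toy codec that copies bytes unchanged. Everything here is illustrative: CopyCodecSketch is hypothetical, a bool stands in for impala::Status, and a std::vector stands in for the MemPool. Constructing it with allow_alloc set to false mimics a codec such as Lz4Compressor that refuses to run unless the output is preallocated.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Toy codec (assumption for illustration): the "transform" is a plain copy.
class CopyCodecSketch {
 public:
  explicit CopyCodecSketch(bool allow_alloc) : allow_alloc_(allow_alloc) {}

  // Returns true on success, false on error (stand-in for impala::Status).
  bool ProcessBlock(bool output_preallocated, int64_t input_length,
                    const uint8_t* input, int64_t* output_length,
                    uint8_t** output) {
    if (input_length < 0) return false;
    const size_t n = static_cast<size_t>(input_length);
    if (output_preallocated) {
      // On entry *output_length is the capacity of the caller's buffer.
      if (*output_length < input_length) return false;
    } else {
      // The codec must allocate; refuse if it has no pool to allocate from.
      if (!allow_alloc_) return false;
      pool_.assign(n, 0);
      *output = pool_.data();
    }
    if (n > 0) std::memcpy(*output, input, n);
    // On exit *output_length is the actual length of the transformed data.
    *output_length = input_length;
    return true;
  }

 private:
  bool allow_alloc_;
  std::vector<uint8_t> pool_;  // stands in for the codec's MemPool
};
```

Note that *output_length plays two roles in the preallocated case: capacity on the way in, actual length on the way out.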
Implements impala::Codec.
Definition at line 289 of file compress.cc.
References impala::Status::OK.
|
inherited |
Wrapper around the actual ProcessBlock() function. This wrapper takes lengths as int rather than int64_t. We need to keep this interface because the Parquet Thrift definitions use ints. See IMPALA-1116.
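A narrowing wrapper of this kind typically widens the lengths, calls the 64-bit routine, and checks that the result still fits in an int. The sketch below shows that pattern only; it is not the code at codec.cc line 181. ProcessBlock64Sketch is a hypothetical stand-in that pretends the output is twice as long as the input, and a bool stands in for impala::Status.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical 64-bit routine: reports an output twice the input length.
inline bool ProcessBlock64Sketch(int64_t input_length, int64_t* output_length) {
  *output_length = input_length * 2;
  return true;
}

// 32-bit wrapper: widens the lengths, delegates, then narrows with a range
// check so an oversized result becomes an error instead of a silent overflow.
inline bool ProcessBlock32Sketch(int input_length, int* output_length) {
  if (input_length < 0 || *output_length < 0) return false;
  int64_t output_length_64 = *output_length;
  if (!ProcessBlock64Sketch(input_length, &output_length_64)) return false;
  if (output_length_64 > std::numeric_limits<int>::max()) return false;
  *output_length = static_cast<int>(output_length_64);
  return true;
}
```

On failure the caller's *output_length is left untouched, so a stale value is never mistaken for a valid length.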
Definition at line 181 of file codec.cc.
References impala::Status::OK, impala::Codec::ProcessBlock(), RETURN_IF_ERROR, and UNLIKELY.
|
inline virtual inherited |
Process data like ProcessBlock(), but can consume partial input and may only produce partial output. *input_bytes_read returns the number of bytes of input that have been consumed. Even if all input has been consumed, the caller must continue calling to fetch output until *eos returns true.
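The calling convention described above, where the caller keeps calling until *eos is true even after all input is consumed, can be sketched with a toy streaming codec. ChunkedCopySketch is hypothetical: it accepts all offered input at once but hands back at most four bytes per call, so the driver loop in DrainSketch has to iterate. A bool return stands in for impala::Status.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Toy streaming codec (assumption for illustration): copies bytes unchanged
// but emits them in small chunks.
class ChunkedCopySketch {
 public:
  bool ProcessBlockStreaming(int64_t input_length, const uint8_t* input,
                             int64_t* input_bytes_read, int64_t* output_length,
                             uint8_t** output, bool* eos) {
    if (input_length < 0) return false;
    // Consume everything offered and buffer it internally.
    pending_.insert(pending_.end(), input, input + input_length);
    *input_bytes_read = input_length;
    // Produce only a partial output: at most max_chunk bytes per call.
    const size_t max_chunk = 4;
    const size_t n = std::min(max_chunk, pending_.size() - emitted_);
    const auto first = pending_.begin() + static_cast<std::ptrdiff_t>(emitted_);
    chunk_.assign(first, first + static_cast<std::ptrdiff_t>(n));
    emitted_ += n;
    *output = chunk_.data();
    *output_length = static_cast<int64_t>(n);
    *eos = (emitted_ == pending_.size());
    return true;
  }

 private:
  std::vector<uint8_t> pending_;  // input accepted so far
  std::vector<uint8_t> chunk_;    // output handed back by the last call
  size_t emitted_ = 0;            // bytes of pending_ already returned
};

// Driver loop: advance past consumed input, collect output, stop on eos.
inline std::string DrainSketch(const std::string& in) {
  ChunkedCopySketch codec;
  const uint8_t* cursor = reinterpret_cast<const uint8_t*>(in.data());
  int64_t remaining = static_cast<int64_t>(in.size());
  std::string result;
  bool eos = false;
  while (!eos) {
    int64_t bytes_read = 0;
    int64_t out_len = 0;
    uint8_t* out = nullptr;
    if (!codec.ProcessBlockStreaming(remaining, cursor, &bytes_read, &out_len,
                                     &out, &eos)) {
      break;
    }
    cursor += bytes_read;
    remaining -= bytes_read;
    if (out_len > 0) {
      result.append(reinterpret_cast<const char*>(out),
                    static_cast<size_t>(out_len));
    }
  }
  return result;
}
```

After the first iteration remaining is zero, yet the loop continues to call the codec with empty input until eos is reported.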
Reimplemented in impala::GzipDecompressor.
Definition at line 117 of file codec.h.
Referenced by impala::DecompressorTest::CompressAndStreamingDecompress().
|
inline inherited |
Definition at line 135 of file codec.h.
References impala::Codec::reuse_buffer_.
|
friend |
Definition at line 130 of file compress.h.
|
protected inherited |
Length of the output buffer.
Definition at line 168 of file codec.h.
Referenced by impala::GzipDecompressor::ProcessBlock(), impala::GzipCompressor::ProcessBlock(), impala::BzipDecompressor::ProcessBlock(), impala::BzipCompressor::ProcessBlock(), impala::SnappyDecompressor::ProcessBlock(), impala::SnappyBlockCompressor::ProcessBlock(), impala::SnappyCompressor::ProcessBlock(), impala::SnappyBlockDecompressor::ProcessBlock(), and impala::GzipDecompressor::ProcessBlockStreaming().
|
static inherited |
|
static inherited |
Definition at line 52 of file codec.h.
Referenced by impala::HdfsSequenceScanner::ReadFileHeader(), and impala::HdfsRCFileScanner::ReadFileHeader().
|
static inherited |
|
static inherited |
|
static inherited |
Largest block we will compress/decompress: 2GB. We are dealing with compressed blocks that are never this big but we want to guard against a corrupt file that has the block length as some large number.
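The guard described above amounts to a range check on a length field read from a file before any buffer is sized from it. A minimal sketch, with the hypothetical names kMaxBlockSizeSketch and IsPlausibleBlockLength (the constant reproduces the value of MAX_BLOCK_SIZE shown in the attribute list):

```cpp
#include <cassert>
#include <cstdint>

// Same value as MAX_BLOCK_SIZE: 2 GB minus one byte.
constexpr int64_t kMaxBlockSizeSketch = (2LL * 1024 * 1024 * 1024) - 1;

// Rejects lengths that cannot belong to a real block, such as a negative
// value or a huge number produced by a corrupt header.
inline bool IsPlausibleBlockLength(int64_t block_length) {
  return block_length >= 0 && block_length <= kMaxBlockSizeSketch;
}
```

A decompressor would run a check like this on the stored block length and return an error status instead of attempting the allocation.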
Definition at line 140 of file codec.h.
Referenced by impala::GzipDecompressor::ProcessBlock(), impala::BzipDecompressor::ProcessBlock(), impala::SnappyDecompressor::ProcessBlock(), impala::SnappyBlockDecompressor::ProcessBlock(), and SnappyBlockDecompress().
|
protected inherited |
Pool to allocate the buffer to hold transformed data.
Definition at line 154 of file codec.h.
Referenced by impala::Codec::Close(), impala::Codec::Codec(), impala::GzipDecompressor::ProcessBlock(), impala::GzipCompressor::ProcessBlock(), impala::BzipDecompressor::ProcessBlock(), impala::BzipCompressor::ProcessBlock(), impala::SnappyDecompressor::ProcessBlock(), impala::SnappyBlockCompressor::ProcessBlock(), impala::SnappyCompressor::ProcessBlock(), impala::SnappyBlockDecompressor::ProcessBlock(), and impala::GzipDecompressor::ProcessBlockStreaming().
|
protected inherited |
Buffer to hold transformed data. Either passed from the caller or allocated from memory_pool_.
Definition at line 165 of file codec.h.
Referenced by impala::GzipDecompressor::ProcessBlock(), impala::GzipCompressor::ProcessBlock(), impala::BzipDecompressor::ProcessBlock(), impala::BzipCompressor::ProcessBlock(), impala::SnappyDecompressor::ProcessBlock(), impala::SnappyBlockCompressor::ProcessBlock(), impala::SnappyCompressor::ProcessBlock(), impala::SnappyBlockDecompressor::ProcessBlock(), and impala::GzipDecompressor::ProcessBlockStreaming().
|
protected inherited |
Can we reuse the output buffer or do we need to allocate on each call?
Definition at line 161 of file codec.h.
Referenced by impala::GzipDecompressor::ProcessBlock(), impala::GzipCompressor::ProcessBlock(), impala::BzipDecompressor::ProcessBlock(), impala::BzipCompressor::ProcessBlock(), impala::SnappyDecompressor::ProcessBlock(), impala::SnappyBlockCompressor::ProcessBlock(), impala::SnappyCompressor::ProcessBlock(), impala::SnappyBlockDecompressor::ProcessBlock(), impala::GzipDecompressor::ProcessBlockStreaming(), and impala::Codec::reuse_output_buffer().
|
static inherited |
|
protected inherited |
Temporary memory pool: if the estimated output size turns out to be too small, this pool lets us free the unused buffers.
Definition at line 158 of file codec.h.
Referenced by impala::Codec::Close(), impala::Codec::Codec(), impala::GzipDecompressor::ProcessBlock(), impala::BzipDecompressor::ProcessBlock(), and impala::BzipCompressor::ProcessBlock().
|
static inherited |