Impala
Impalaistheopensource,nativeanalyticdatabaseforApacheHadoop.
 All Classes Namespaces Files Functions Variables Typedefs Enumerations Enumerator Friends Macros
impala::HashUtil Class Reference

Utility class to compute hash values. More...

#include <hash-util.h>

Collaboration diagram for impala::HashUtil:

Static Public Member Functions

static uint32_t CrcHash (const void *data, int32_t bytes, uint32_t hash)
 
static uint64_t MurmurHash2_64 (const void *input, int len, uint64_t seed)
 Murmur2 hash implementation returning 64-bit hashes. More...
 
static uint64_t FnvHash64 (const void *data, int32_t bytes, uint64_t hash)
 
static uint32_t FnvHash64to32 (const void *data, int32_t bytes, uint32_t hash)
 
static uint32_t Hash (const void *data, int32_t bytes, uint32_t seed)
 

Static Public Attributes

static const uint64_t MURMUR_PRIME = 0xc6a4a7935bd1e995
 
static const int MURMUR_R = 47
 
static const uint32_t FNV_PRIME = 0x01000193
 default values recommended by http://isthe.com/chongo/tech/comp/fnv/ More...
 
static const uint32_t FNV_SEED = 0x811C9DC5
 
static const uint64_t FNV64_PRIME = 1099511628211UL
 
static const uint64_t FNV64_SEED = 14695981039346656037UL
 

Detailed Description

Utility class to compute hash values.

Definition at line 28 of file hash-util.h.

Member Function Documentation

static uint32_t impala::HashUtil::CrcHash ( const void *  data,
int32_t  bytes,
uint32_t  hash 
)
inlinestatic

Compute the Crc32 hash for data using SSE4 instructions. The input hash parameter is the current hash/seed value. This should only be called if SSE is supported. This is ~4x faster than Fnv/Boost Hash. NOTE: Any changes made to this function need to be reflected in Codegen::GetHashFn. TODO: crc32 hashes with different seeds do not result in different hash functions. The resulting hashes are correlated.

Definition at line 37 of file hash-util.h.

References impala::hash, impala::CpuInfo::IsSupported(), impala::CpuInfo::SSE4_2, impala::SSE4_crc32_u32(), and impala::SSE4_crc32_u8().

Referenced by Hash(), TestCompactStringsRandom(), TestCompactStringsSequential(), TestCrcIntHash(), TestCrcMixedHash(), TestNormalStringsRandom(), and TestNormalStringsSequential().

static uint64_t impala::HashUtil::FnvHash64 ( const void *  data,
int32_t  bytes,
uint64_t  hash 
)
inlinestatic

Implementation of the Fowler–Noll–Vo hash function. This is not as performant as boost's hash on int types (2x slower) but has bit entropy. For ints, boost just returns the value of the int which can be pathological. For example, if the data is <1000, 2000, 3000, 4000, ..> and then the mod of 1000 is taken on the hash, all values will collide to the same bucket. For string values, Fnv is slightly faster than boost. IMPORTANT: FNV hash suffers from poor diffusion of the least significant bit, which can lead to poor results when input bytes are duplicated. See FnvHash64to32() for how this can be mitigated.

Definition at line 112 of file hash-util.h.

References FNV64_PRIME, and impala::hash.

Referenced by FnvHash64to32().

static uint32_t impala::HashUtil::FnvHash64to32 ( const void *  data,
int32_t  bytes,
uint32_t  hash 
)
inlinestatic

Return a 32-bit hash computed by invoking FNV-64 and folding the result to 32-bits. This technique is recommended instead of FNV-32 since the LSB of an FNV hash is the XOR of the LSBs of its input bytes, leading to poor results for duplicate inputs. The input seed 'hash' is duplicated so the top half of the seed is not all zero.

Definition at line 125 of file hash-util.h.

References FnvHash64().

Referenced by impala::RawValue::GetHashValueFnv(), TestFnvIntHash(), and TestFnvMixedHash().

static uint32_t impala::HashUtil::Hash ( const void *  data,
int32_t  bytes,
uint32_t  seed 
)
inlinestatic

Computes the hash value for data. Will call either CrcHash or MurmurHash depending on hardware capabilities. Seed values for different steps of the query execution should use different seeds to prevent accidental key collisions. (See IMPALA-219 for more details).

Definition at line 135 of file hash-util.h.

References CrcHash(), impala::CpuInfo::IsSupported(), LIKELY, MurmurHash2_64(), and impala::CpuInfo::SSE4_2.

Referenced by impala::RawValue::GetHashValue(), impala::DictEncoder< T >::Hash(), impala::HashTableCtx::Hash(), impala::TimestampValue::Hash(), impala::DecimalValue< int128_t >::Hash(), impala::hash_value(), impala::OldHashTable::HashCurrentRow(), impala::OldHashTable::HashVariableLenRow(), main(), and impala::TEST().

static uint64_t impala::HashUtil::MurmurHash2_64 ( const void *  input,
int  len,
uint64_t  seed 
)
inlinestatic

Murmur2 hash implementation returning 64-bit hashes.

Definition at line 64 of file hash-util.h.

References MURMUR_PRIME, and MURMUR_R.

Referenced by Hash(), and impala::HashTableCtx::Hash().

Member Data Documentation

const uint64_t impala::HashUtil::FNV64_PRIME = 1099511628211UL
static

Definition at line 100 of file hash-util.h.

Referenced by FnvHash64().

const uint64_t impala::HashUtil::FNV64_SEED = 14695981039346656037UL
static

Definition at line 101 of file hash-util.h.

Referenced by impala::AggregateFunctions::HllUpdate().

const uint32_t impala::HashUtil::FNV_PRIME = 0x01000193
static

default values recommended by http://isthe.com/chongo/tech/comp/fnv/

Definition at line 98 of file hash-util.h.

const uint64_t impala::HashUtil::MURMUR_PRIME = 0xc6a4a7935bd1e995
static

Definition at line 60 of file hash-util.h.

Referenced by MurmurHash2_64().

const int impala::HashUtil::MURMUR_R = 47
static

Definition at line 61 of file hash-util.h.

Referenced by MurmurHash2_64().


The documentation for this class was generated from the following file: