Impala
Impalaistheopensource,nativeanalyticdatabaseforApacheHadoop.
|
Utility class to compute hash values. More...
#include <hash-util.h>
Static Public Member Functions | |
static uint32_t | CrcHash (const void *data, int32_t bytes, uint32_t hash) |
static uint64_t | MurmurHash2_64 (const void *input, int len, uint64_t seed) |
Murmur2 hash implementation returning 64-bit hashes. More... | |
static uint64_t | FnvHash64 (const void *data, int32_t bytes, uint64_t hash) |
static uint32_t | FnvHash64to32 (const void *data, int32_t bytes, uint32_t hash) |
static uint32_t | Hash (const void *data, int32_t bytes, uint32_t seed) |
Static Public Attributes | |
static const uint64_t | MURMUR_PRIME = 0xc6a4a7935bd1e995 |
static const int | MURMUR_R = 47 |
static const uint32_t | FNV_PRIME = 0x01000193 |
default values recommended by http://isthe.com/chongo/tech/comp/fnv/ More... | |
static const uint32_t | FNV_SEED = 0x811C9DC5 |
static const uint64_t | FNV64_PRIME = 1099511628211UL |
static const uint64_t | FNV64_SEED = 14695981039346656037UL |
Utility class to compute hash values.
Definition at line 28 of file hash-util.h.
|
inlinestatic |
Compute the Crc32 hash for data using SSE4 instructions. The input hash parameter is the current hash/seed value. This should only be called if SSE is supported. This is ~4x faster than Fnv/Boost Hash. NOTE: Any changes made to this function need to be reflected in Codegen::GetHashFn. TODO: crc32 hashes with different seeds do not result in different hash functions. The resulting hashes are correlated.
Definition at line 37 of file hash-util.h.
References impala::hash, impala::CpuInfo::IsSupported(), impala::CpuInfo::SSE4_2, impala::SSE4_crc32_u32(), and impala::SSE4_crc32_u8().
Referenced by Hash(), TestCompactStringsRandom(), TestCompactStringsSequential(), TestCrcIntHash(), TestCrcMixedHash(), TestNormalStringsRandom(), and TestNormalStringsSequential().
|
inlinestatic |
Implementation of the Fowler–Noll–Vo hash function. This is not as performant as boost's hash on int types (2x slower) but has bit entropy. For ints, boost just returns the value of the int which can be pathological. For example, if the data is <1000, 2000, 3000, 4000, ..> and then the mod of 1000 is taken on the hash, all values will collide to the same bucket. For string values, Fnv is slightly faster than boost. IMPORTANT: FNV hash suffers from poor diffusion of the least significant bit, which can lead to poor results when input bytes are duplicated. See FnvHash64to32() for how this can be mitigated.
Definition at line 112 of file hash-util.h.
References FNV64_PRIME, and impala::hash.
Referenced by FnvHash64to32().
|
inlinestatic |
Return a 32-bit hash computed by invoking FNV-64 and folding the result to 32-bits. This technique is recommended instead of FNV-32 since the LSB of an FNV hash is the XOR of the LSBs of its input bytes, leading to poor results for duplicate inputs. The input seed 'hash' is duplicated so the top half of the seed is not all zero.
Definition at line 125 of file hash-util.h.
References FnvHash64().
Referenced by impala::RawValue::GetHashValueFnv(), TestFnvIntHash(), and TestFnvMixedHash().
|
inlinestatic |
Computes the hash value for data. Will call either CrcHash or MurmurHash depending on hardware capabilities. Seed values for different steps of the query execution should use different seeds to prevent accidental key collisions. (See IMPALA-219 for more details).
Definition at line 135 of file hash-util.h.
References CrcHash(), impala::CpuInfo::IsSupported(), LIKELY, MurmurHash2_64(), and impala::CpuInfo::SSE4_2.
Referenced by impala::RawValue::GetHashValue(), impala::DictEncoder< T >::Hash(), impala::HashTableCtx::Hash(), impala::TimestampValue::Hash(), impala::DecimalValue< int128_t >::Hash(), impala::hash_value(), impala::OldHashTable::HashCurrentRow(), impala::OldHashTable::HashVariableLenRow(), main(), and impala::TEST().
|
inlinestatic |
Murmur2 hash implementation returning 64-bit hashes.
Definition at line 64 of file hash-util.h.
References MURMUR_PRIME, and MURMUR_R.
Referenced by Hash(), and impala::HashTableCtx::Hash().
|
static |
Definition at line 100 of file hash-util.h.
Referenced by FnvHash64().
|
static |
Definition at line 101 of file hash-util.h.
Referenced by impala::AggregateFunctions::HllUpdate().
|
static |
default values recommended by http://isthe.com/chongo/tech/comp/fnv/
Definition at line 98 of file hash-util.h.
|
static |
Definition at line 99 of file hash-util.h.
Referenced by impala::DataStreamTest::CheckReceivers(), CodegenAssignNullValue(), CodegenCrcHash(), impala::DataStreamSender::Send(), TestBoostIntHash(), TestBoostMixedHash(), TestCrcIntHash(), TestCrcMixedHash(), TestFnvIntHash(), and TestFnvMixedHash().
|
static |
Definition at line 60 of file hash-util.h.
Referenced by MurmurHash2_64().
|
static |
Definition at line 61 of file hash-util.h.
Referenced by MurmurHash2_64().