- Maven-Central
- /
- com.ning.billing /.....bundles-analytics
- /
- 0.3.4
- /
- com.google.common.hash.HashFunction

Stack Overflow Questions

` * Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except`

` * Unless required by applicable law or agreed to in writing, software distributed under the License`

` * is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express`

` * or implied. See the License for the specific language governing permissions and limitations under`

A hash function is a collision-averse pure function that maps an arbitrary block of
data to a number called a *hash code*.
### Definition

### Desirable properties

### Providing input to a hash function

### Relationship to

Unpacking this definition:

**block of data:**the input for a hash function is always, in concept, an ordered byte array. This hashing API accepts an arbitrary sequence of byte and multibyte values (via

), but this is merely a convenience; these are always translated into raw byte sequences under the covers.`Hasher`

**hash code:**each hash function always yields hash codes of the same fixed bit length (given by

). For example,`bits()`

produces a 160-bit number, while`Hashing.sha1()`

yields only 32 bits. Because a`Hashing.murmur3_32()`

`long`

value is clearly insufficient to hold all hash code values, this API represents a hash code as an instance of

.`HashCode`

**pure function:**the value produced must depend only on the input bytes, in the order they appear. Input data is never modified.

instances should always be stateless, and therefore thread-safe.`HashFunction`

**collision-averse:**while it can't be helped that a hash function will sometimes produce the same hash code for distinct inputs (a "collision"), every hash function strives to*some*degree to make this unlikely. (Without this condition, a function that always returns zero could be called a hash function. It is not.)

Summarizing the last two points: "equal yield equal *always*; unequal yield
unequal *often*." This is the most important characteristic of all hash functions.

A high-quality hash function strives for some subset of the following virtues:

**collision-resistant:**while the definition above requires making at least*some*token attempt, one measure of the quality of a hash function is*how well*it succeeds at this goal. Important note: it may be easy to achieve the theoretical minimum collision rate when using completely*random*sample input. The true test of a hash function is how it performs on representative real-world data, which tends to contain many hidden patterns and clumps. The goal of a good hash function is to stamp these patterns out as thoroughly as possible.**bit-dispersing:**masking out any*single bit*from a hash code should yield only the expected*twofold*increase to all collision rates. Informally, the "information" in the hash code should be as evenly "spread out" through the hash code's bits as possible. The result is that, for example, when choosing a bucket in a hash table of size 2^8,*any*eight bits could be consistently used.**cryptographic:**certain hash functions such as

are designed to make it as infeasible as possible to reverse-engineer the input that produced a given hash code, or even to discover`Hashing.sha512()`

*any*two distinct inputs that yield the same result. These are called*cryptographic hash functions*. But, whenever it is learned that either of these feats has become computationally feasible, the function is deemed "broken" and should no longer be used for secure purposes. (This is the likely eventual fate of*all*cryptographic hashes.)**fast:**perhaps self-explanatory, but often the most important consideration. We have published microbenchmark results for many common hash functions.

The primary way to provide the data that your hash function should act on is via a

. Obtain a new hasher from the hash function using `Hasher`

,
"push" the relevant data into it using methods like `newHasher()`

,
and finally ask for the `Hasher.putBytes(byte[])`

`HashCode`

when finished using

. (See
an example of this.)
`Hasher.hash()`

If all you want to hash is a single byte array, string or `long`

value, there
are convenient shortcut methods defined directly on

to make this
easier.
`HashFunction`

Hasher accepts primitive data types, but can also accept any Object of type `T`

provided that you implement a `Funnel<T>`

to specify how to "feed" data
from that object into the function. (See an example of
this.)

**Compatibility note:** Throughout this API, multibyte values are always
interpreted in *little-endian* order. That is, hashing the byte array `{0x01, 0x02, 0x03, 0x04`

} is equivalent to hashing the `int`

value `0x04030201`

. If this isn't what you need, methods such as

and `java.lang.Integer.reverseBytes(int)`

will help.
`com.google.common.primitives.Ints.toByteArray(int)`

`java.lang.Object.hashCode()`

Java's baked-in concept of hash codes is constrained to 32 bits, and provides no
separation between hash algorithms and the data they act on, so alternate hash
algorithms can't be easily substituted. Also, implementations of `hashCode`

tend
to be poor-quality, in part because they end up depending on *other* existing
poor-quality `hashCode`

implementations, including those in many JDK classes.

`Object.hashCode`

implementations tend to be very fast, but have weak
collision prevention and *no* expectation of bit dispersion. This leaves them
perfectly suitable for use in hash tables, because extra collisions cause only a slight
performance hit, while poor bit dispersion is easily corrected using a secondary hash
function (which all reasonable hash table implementations in Java use). For the many
uses of hash functions beyond data structures, however, `Object.hashCode`

almost
always falls short -- hence this library.

- Author(s):
- Kevin Bourrillion
- Since:
- 11.0

Begins a new hash code computation as

`newHasher()`

, but provides a hint of the
expected size of the input (in bytes). This is only important for non-streaming hash
functions (hash functions that need to buffer their whole input before processing any
of it).
Shortcut for *might* perform better than its longhand equivalent, but should not perform
worse.

`newHasher().putBytes(input, off, len).hash()`

. The implementation
- Throws:
`java.lang.IndexOutOfBoundsException`

if`off < 0`

or`off + len > bytes.length`

or`len < 0`

Shortcut for *might*
perform better than its longhand equivalent, but should not perform worse. Note that no
character encoding is performed; the low byte and high byte of each character are hashed
directly (in that order). This is equivalent to using

`newHasher().putString(input).hash()`

. The implementation `hashString(input, Charsets.UTF_16LE)`

.
Shortcut for *might* perform better than its
longhand equivalent, but should not perform worse.

`newHasher().putString(input, charset).hash()`

. Characters are encoded
using the given `java.nio.charset.Charset`

. The implementation