008 - djb2 hash
Written by Daniel J. Bernstein (also known as djb), this simple hash function dates back to 1991.
Hash functions have wide applications in computer science and in cryptography. They are used to map a potentially large amount of data to a number that represents it.
For example, the hash function from this code snippet maps Hello
to
210676686969
, but Hello!
to 6952330670010
. Despite the fact there’s only
one character different (the exclamation mark), the number returned is
completely different.
Snippet explanation
The simple C function starts with the hash
variable set to the number 5381.
It then iterates the given array of characters str
and performs the following
operations for each character:
-
Multiply the
hash
variable by 33. -
Add the ASCII value of the current character to it.
After iterating through the whole array, it returns the value held by hash
.
Multiplying hash
by 33 could be performed by doing hash * 33
. However,
the function instead uses (hash << 5) + hash
bit shifts
which is on many CPUs a faster way to perform this operation. hash << 5
“shifts” the bits to the left by 5 spaces, multiplying the number by 32 (2^5)
and + hash
adds another value of hash
, turning this into multiplying by 33.
More information
The starting number 5381 was picked by djb simply because testing showed that it results in fewer collisions and better avalanching.
Interestingly, the choice of 33 has never been adequately explained.
- Why are 5381 and 33 so important in the djb2 algorithm?
- Reason for 5381 number in DJB hash function?
- hash function for mac needed (original source of the function)