Data Structures and Algorithms: Hash Functions (2024)

Data Structures and Algorithms
8.3.3 Hashing Functions

Choosing a good hashing function, h(k), is essential for hash-table based searching. h should distribute the elements of our collection as uniformly as possible to the "slots" of the hash table. The key criterion is that there should be a minimum number of collisions.

If the probability that a key, k, occurs in our collection is P(k), then if there are m slots in our hash table, a uniform hashing function, h(k), would ensure that every slot receives the same share of the probability:

sum of P(k) over all k with h(k) = j  =  1/m,   for each slot j = 0, 1, ..., m-1

Sometimes, this is easy to ensure. For example, if the keys are randomly and uniformly distributed in [0,r), then,

h(k) = floor((mk)/r)

will provide uniform hashing.
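As a rough illustration, the sketch below applies h(k) = floor((mk)/r) to randomly generated keys and counts how many land in each slot. It assumes the keys lie in the half-open range [0, r), so that the result never reaches m; the names uniform_hash and slot_counts, the seed and the sample sizes are illustrative choices, not part of the original notes.

```python
import math
import random
from collections import Counter


def uniform_hash(k: float, m: int, r: float) -> int:
    """Map a key k in [0, r) to a slot index in 0..m-1 via floor(m*k/r)."""
    slot = math.floor(m * k / r)
    # Guard against floating-point rounding pushing a key just below r into slot m.
    return min(slot, m - 1)


def slot_counts(n: int, m: int, r: float, seed: int = 0) -> Counter:
    """Hash n pseudo-random keys drawn uniformly from [0, r) and count slot usage."""
    rng = random.Random(seed)
    return Counter(uniform_hash(rng.random() * r, m, r) for _ in range(n))


if __name__ == "__main__":
    counts = slot_counts(n=10_000, m=10, r=1000.0)
    for slot in range(10):
        # Each slot should receive roughly n/m = 1000 keys.
        print(slot, counts[slot])
```

With uniformly distributed keys, each slot count should come out close to n/m; skewed keys would show up as uneven counts.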

Mapping keys to natural numbers

Most hashing functions will first map the keys to some set of natural numbers, say (0,r]. There are many ways to do this: for example, if the key is a string of ASCII characters, we can simply add the ASCII representations of the characters mod 255 to produce a number in the range 0 to 254, or we could xor them, or we could add them in pairs mod 2^16-1, or ...
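The three string mappings mentioned above might be sketched as follows. The function names are hypothetical, and the way characters are paired (two bytes combined into one 16-bit value, with a zero byte padding an odd-length string) is one assumed reading of "add them in pairs".

```python
def additive_hash(s: str) -> int:
    """Add the ASCII codes of the characters, mod 255."""
    return sum(s.encode("ascii")) % 255


def xor_hash(s: str) -> int:
    """XOR the ASCII codes of the characters together."""
    h = 0
    for byte in s.encode("ascii"):
        h ^= byte
    return h


def pair_sum_hash(s: str) -> int:
    """Add the characters two at a time as 16-bit values, mod 2^16 - 1."""
    data = s.encode("ascii")
    if len(data) % 2:
        data += b"\x00"  # assumed padding for odd-length strings
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    return total % (2**16 - 1)


if __name__ == "__main__":
    for word in ("ab", "ba", "hash"):
        print(word, additive_hash(word), xor_hash(word), pair_sum_hash(word))
```

Note that the additive and xor mappings ignore character order, so anagrams such as "ab" and "ba" collide, whereas the pairwise sum distinguishes them.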

Having mapped the keys to a set of natural numbers, we then have a number of possibilities.

  1. Use a mod function:

    h(k) = k mod m.

    When using this method, we usually avoid certain values of m. Powers of 2 are usually avoided, for k mod 2^b simply selects the b low order bits of k. Unless we know that all the 2^b possible values of the lower order bits are equally likely, this will not be a good choice, because some bits of the key are not used in the hash function.

    Prime numbers which are close to powers of 2 seem to be generally good choices for m.

    For example, if we have 4000 elements, and we have chosen an overflow table organization, but wish to have the probability of collisions quite low, then we might choose m = 4093. (4093 is the largest prime less than 4096 = 2^12.)
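To make the point about powers of 2 concrete, the sketch below hashes 4000 keys that are all multiples of 16 (an assumed, deliberately unfavourable key pattern) with m = 4096 and with m = 4093, and counts how many distinct slots are used in each case.

```python
def mod_hash(k: int, m: int) -> int:
    """Division-method hash: h(k) = k mod m."""
    return k % m


# Hypothetical key set: 4000 keys whose four low-order bits are always zero.
keys = [16 * i for i in range(4000)]

slots_pow2 = {mod_hash(k, 4096) for k in keys}   # m = 2^12
slots_prime = {mod_hash(k, 4093) for k in keys}  # m = largest prime below 2^12

if __name__ == "__main__":
    print("distinct slots with m = 4096:", len(slots_pow2))
    print("distinct slots with m = 4093:", len(slots_prime))
```

Because k mod 4096 keeps only the 12 low-order bits, and four of those bits are constant for these keys, only 256 slots are ever used. With the prime modulus every one of the 4000 keys lands in its own slot.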

  2. Use the multiplication method:
    • Multiply the key by a constant A, 0 < A < 1,
    • Extract the fractional part of the product,
    • Multiply this value by m.

    Thus the hash function is:

    h(k) = floor(m * (kA - floor(kA)))

    In this case, the value of m is not critical and we typically choose a power of 2 so that we can get the following efficient procedure on most digital computers:

    • Choose m = 2^p.
    • Multiply the w bits of k by floor(A * 2^w) to obtain a 2w-bit product.
    • Extract the p most significant bits of the lower half of this product.

      It seems that:

      A = (sqrt(5)-1)/2 = 0.6180339887

      is a good choice (see Knuth, "Sorting and Searching", v. 3 of "The Art of Computer Programming").
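The multiplication method can be sketched in two equivalent ways: directly from the formula, and with the shift-and-mask procedure for m = 2^p. The word size w = 32, the table size p = 14 and the sample key 123456 are assumptions chosen for illustration; the function names are hypothetical.

```python
import math

A = (math.sqrt(5) - 1) / 2  # the constant suggested in the text, about 0.6180339887


def mult_hash_float(k: int, m: int) -> int:
    """h(k) = floor(m * (kA - floor(kA))), computed in floating point."""
    frac = (k * A) % 1.0  # fractional part of k*A
    return int(m * frac)


def mult_hash_bits(k: int, p: int, w: int = 32) -> int:
    """Same idea for m = 2^p using integer arithmetic on a w-bit key."""
    s = int(A * (1 << w))            # floor(A * 2^w)
    product = k * s                  # up to 2w bits
    low = product & ((1 << w) - 1)   # lower half (w bits) of the product
    return low >> (w - p)            # p most significant bits of the lower half


if __name__ == "__main__":
    k, p = 123456, 14
    print(mult_hash_float(k, 1 << p), mult_hash_bits(k, p))
```

The two versions can differ occasionally, because the integer version uses A rounded down to w bits; for the sample key both give slot 67.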

  3. Use universal hashing:

    If the hash function is fixed and known, a malicious adversary can always choose the keys so that they all hash to the same slot, leading to an average O(n) retrieval time. Universal hashing seeks to avoid this by choosing the hashing function randomly from a collection of hash functions (cf Cormen et al, p 229- ). This makes the probability that the hash function will generate poor behaviour small and produces good average performance.
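The notes do not say which collection of hash functions to draw from. One commonly used textbook family is h(k) = ((a*k + b) mod p) mod m, where p is a prime larger than any key and a, b are chosen at random; the sketch below assumes that family, with p = 2^31 - 1 and the name make_universal_hash as illustrative choices.

```python
import random
from typing import Callable, Optional

P = 2_147_483_647  # the Mersenne prime 2^31 - 1; keys are assumed to lie in 0..P-1


def make_universal_hash(m: int, rng: Optional[random.Random] = None) -> Callable[[int], int]:
    """Pick one function at random from the family ((a*k + b) mod P) mod m."""
    rng = rng or random.Random()
    a = rng.randrange(1, P)  # a must be non-zero
    b = rng.randrange(0, P)

    def h(k: int) -> int:
        return ((a * k + b) % P) % m

    return h


if __name__ == "__main__":
    m = 100
    # Keys chosen by an "adversary" so that k mod 100 is always 0.
    bad_keys = [100 * i for i in range(1000)]
    print("slots used by k mod m:       ", len({k % m for k in bad_keys}))
    h = make_universal_hash(m)
    print("slots used by a random h(k): ", len({h(k) for k in bad_keys}))
```

Because a and b are drawn when the table is created, an adversary who does not know them cannot prepare a key set that is bad for the particular function in use.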

Key terms

Universal hashing
A technique for choosing a hashing function randomly so as to produce good average performance.
© Data Structures and Algorithms: Hash Functions (2), 1998


The key concepts covered in the article on hashing functions are summarized below:

Hashing Functions and Uniform Distribution:

The article emphasizes the importance of choosing a good hashing function, denoted as h(k), especially for hash-table based searching. The goal is to distribute elements uniformly across the slots of the hash table to minimize collisions.

Mapping Keys to Natural Numbers:

The process involves mapping keys to a set of natural numbers, typically in the range (0, r]. Various methods, such as adding ASCII representations, XORing, or adding in pairs mod (2^16 - 1), are mentioned. This initial mapping sets the stage for further processing.

Mod Function and Prime Numbers:

The use of a mod function (h(k) = k mod m) is discussed, with a cautionary note about avoiding certain values of m, particularly powers of 2. Prime numbers close to powers of 2 are recommended for achieving a good distribution, reducing the likelihood of collisions.

Multiplication Method:

An alternative method involves multiplying the key by a constant A (0 < A < 1), extracting the fractional part, and then multiplying by m. With this method the value of m is not critical, so a power of 2 is typically chosen for efficiency. The article suggests A = (sqrt(5)-1)/2 as a good choice.

Universal Hashing:

To mitigate the risk of a malicious adversary causing poor hash behavior, universal hashing is introduced. This technique involves randomly choosing the hashing function from a collection of hash functions. The randomness helps achieve good average performance and reduces the likelihood of intentional key choices causing issues.

In summary, the article provides insights into key aspects of hashing functions, from initial key mapping to methods like mod and multiplication, and introduces the concept of universal hashing to enhance overall performance.


FAQs

What is a hash function in data structures and algorithms?

A hash function is a function that converts a given numeric or alphanumeric key to a small practical integer value. The mapped integer value is used as an index in the hash table. In simple terms, a hash function maps a large number or string to a small integer that can be used as the index in the hash table.

What is an example of a hash function?

Here's a toy example: take the first three characters of a string, so that the hash of "Hello world!" is "Hel". If you're given "Hel", you cannot recreate "Hello world!". A practical hash function is designed so that clashes between different inputs are much rarer than in this toy case.

What type of data structure is a hash?

Hash tables are a type of data structure in which the address/ index value of the data element is generated from a hash function. This enables very fast data access as the index value behaves as a key for the data value.

How do you calculate a hash function?

With modular hashing, the hash function is simply h(k) = k mod m for some m (usually, the number of buckets). The value k is an integer hash code generated from the key. If m is a power of two (i.e., m = 2^p), then h(k) is just the p lowest-order bits of k.

What is the most famous hash function?

The MD5 algorithm, defined in RFC 1321, is probably the most well-known and widely used hash function. It is the fastest of all the .NET hashing algorithms, but it uses a smaller 128-bit hash value, making it the most vulnerable to attack over the long term.

What is the most commonly used hash function?

Commonly used hash functions:
  • SHA-1: SHA-1 is a 160-bit hash function that was widely used for digital signatures and other applications. ...
  • SHA-2: SHA-2 is a family of hash functions that includes SHA-224, SHA-256, SHA-384, and SHA-512.

What is the simplest hash function?

The simplest example of a hash function encodes the input in the same way as the output range and then discards everything that exceeds the output range. For example, if the output range of the hash function is 0–9, then we can interpret every input as a base-10 integer and discard all but the last digit.

What are the three commonly used hash functions in data structure?

The primary types of hash functions are:
  • Division Method
  • Mid Square Method
  • Folding Method

Which hashing technique is best in data structure?

The most popular algorithms include the following: MD5: A widely used hashing algorithm that produces a 128-bit hash value. SHA-1: A popular hashing algorithm that produces a 160-bit hash value. SHA-256: A more secure hashing algorithm that produces a 256-bit hash value.

What is the strongest hashing algorithm?

At the time of writing, SHA-256 has no known practical collision or preimage attacks and is widely regarded as secure. It is used by many software organizations and institutions, including the U.S. government, to protect sensitive information.

How do hashing algorithms work?

The user enters a message into a computer running the algorithm, and the system transforms the message, which might be of any length, to a predetermined bit size. Typically, programs break the message into a series of equal-sized blocks, and each one is compressed in sequence.

Why do we need hashing in data structure?

Hashing gives a fast and flexible method of retrieving data compared with many other data structures. It is quicker than searching lists and arrays. A hash table can retrieve data in about 1.5 probes on average, which is typically fewer steps than searching for the same item in a tree.

What makes a bad hash function?

A poor choice of hash function is likely to lead to clustering behavior, in which the probability of keys mapping to the same hash bucket (i.e. a collision) is significantly greater than would be expected from a random function.

Why is hashing irreversible?

A hash function is generally not reversible because it is lossy (perhaps with the exception of extremely short plaintexts). That is why hash collisions are possible: one hash value may correspond to more than one plaintext.

What is the purpose of a hash function?

Hash functions are used for data integrity and often in combination with digital signatures. With a good hash function, even a 1-bit change in a message will produce a different hash (on average, half of the bits change). With digital signatures, a message is hashed and then the hash itself is signed.

What is an example of a hashing function in a hash table?

An example of one of the most common hashing functions is modular hashing. In this method, the number of buckets available for key-value pairs (M) should be a prime number (to minimize collisions). For any positive key (k), the function computes the remainder after dividing k by M (k % M).

What is the MD5 hash function?

MD5 (Message-Digest algorithm 5) is a widely used cryptographic hash function that results in a 128-bit hash value. The 128-bit (16-byte) MD5 hashes (also termed message digests) typically are represented as 32-digit hexadecimal numbers (for example, ec55d3e698d289f2afd663725127bace).

What are the two hash functions in double hashing?

In double hashing, the first hash function is used to compute the initial hash value, and the second hash function is used to compute the step size for the probing sequence.
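A minimal sketch of the probe sequence produced by double hashing follows. The particular pair h1(k) = k mod m and h2(k) = 1 + (k mod (m - 1)), with m prime, is a common textbook choice assumed here for illustration; the answer above does not name specific functions.

```python
def probe_sequence(k: int, m: int) -> list:
    """Return the slots examined for key k in a table of prime size m."""
    h1 = k % m              # initial slot
    h2 = 1 + (k % (m - 1))  # step size, never zero
    return [(h1 + i * h2) % m for i in range(m)]


if __name__ == "__main__":
    # For k = 14 and m = 13 the sequence starts at slot 1 and advances in steps of 3.
    print(probe_sequence(14, 13))
```

Because m is prime and the step size lies between 1 and m - 1, the sequence visits every slot exactly once before repeating.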

What is an example of a hash function division method?

In the division method of hash functions, we map a key k into one of the m slots by taking the remainder of k divided by m i.e. the hash function is h(k) = k mod m. For example, if the hash table has size m = 12 and the key is k = 100, then h(k) = 4.

Article information

Author: Sen. Ignacio Ratke
