How To Implement a Sample Hash Table in C/C++ | DigitalOcean (2024)

Introduction

A hash table in C/C++ is a data structure that maps keys to values. A hash table uses a hash function to compute indexes for a key. You can store the value at the appropriate location based on the hash table index.

The benefit of using a hash table is its very fast access time. Typically, the time complexity (amortized time complexity) is a constant O(1) access time.

If two different keys get the same index, you will need to use other data structures (buckets) to account for these collisions. If you choose a very good hash function, the likelihood of a collision can be negligible.

The C++ STL (Standard Template Library) has the std::unordered_map() data structure.

In this article, you will construct a hash table from scratch comprised of:

  • A hash function to map keys to values.
  • A hash table data structure that supports insert, search, and delete operations.
  • A data structure to account for a collision of keys.

Choosing a Hash Function

The first step is to choose a reasonably good hash function that has a low chance of collision. However, for the purposes of this tutorial, a poor hash function will be applied to better illustrate hash collisions. This limited example will also only utilize strings (or character arrays in C).

HashTable.cpp

#define CAPACITY 50000 // Size of the HashTable.unsigned long hash_function(char* str){ unsigned long i = 0; for (int j = 0; str[j]; j++) i += str[j]; return i % CAPACITY;}

Run this code and test different strings for potential collisions. For example, the strings Hel and Cau will collide since they have the same ASCII value.

This code must return a number within the bounds of the capacity of the hash table. Otherwise, it may access an unbound memory location, leading to an error.

Defining the Hash Table Data Structures

A hash table is an array of items, which are { key: value } pairs.

First, define the item structure:

HashTable.cpp

// Defines the HashTable item.typedef struct Ht_item{ char* key; char* value;} Ht_item;

Now, the hash table has an array of pointers that point to Ht_item, so it is a double-pointer.

HashTable.cpp

// Defines the HashTable.typedef struct HashTable{ // Contains an array of pointers to items. Ht_item** items; int size; int count;} HashTable;

Your hash table will need to return the number of elements in the hash table using count and size of the hash table using size.

Creating the Hash Table and Hash Table Items

Next, create functions for allocating memory and creating items.

Create items by allocating memory for a key and value, and return a pointer to the item:

HashTable.cpp

Ht_item* create_item(char* key, char* value){ // Creates a pointer to a new HashTable item. Ht_item* item = (Ht_item*) malloc(sizeof(Ht_item)); item->key = (char*) malloc(strlen(key) + 1); item->value = (char*) malloc(strlen(value) + 1); strcpy(item->key, key); strcpy(item->value, value); return item;}

Create the table by allocating memory and setting size, count, and items:

HashTable.cpp

HashTable* create_table(int size){ // Creates a new HashTable. HashTable* table = (HashTable*) malloc(sizeof(HashTable)); table->size = size; table->count = 0; table->items = (Ht_item**) calloc(table->size, sizeof(Ht_item*)); for (int i = 0; i < table->size; i++) table->items[i] = NULL; return table;}

The preceding example allocates memory for the wrapper structure HashTable and sets all the items to NULL.

Freeing up memory is a C/C++ best practice. Free up memory that you’ve allocated on the heap with malloc() and calloc().

Write functions that free up a table item and the whole table.

HashTable.cpp

void free_item(Ht_item* item){ // Frees an item. free(item->key); free(item->value); free(item);}void free_table(HashTable* table){ // Frees the table. for (int i = 0; i < table->size; i++) { Ht_item* item = table->items[i]; if (item != NULL) free_item(item); } free(table->items); free(table);}

Add a print_table() to display the index, key, and value for each item:

HashTable.cpp

void print_table(HashTable* table){ printf("\nHash Table\n-------------------\n"); for (int i = 0; i < table->size; i++) { if (table->items[i]) { printf("Index:%d, Key:%s, Value:%s\n", i, table->items[i] -> key, table->items[i]->value); } } printf("-------------------\n\n");}

This concludes the basic functionality of your custom hash table. You will now write insert, search, and delete functions.

Inserting into the Hash Table

Create a function, ht_insert(), that performs insertions.

The function takes a HashTable pointer, a key, and a value as parameters:

void ht_insert(HashTable* table, char* key, char* value){ ...}

Now, there are certain steps involved in the ht_insert() function.

  • Create the item based on the { key: value } pair.
  • Compute the index based on the hash function.
  • Check if the index is already occupied or not, by comparing the key.
    • If it is not occupied, you can directly insert it into index.
    • Otherwise, it is a collision, and you will need to handle it.

This tutorial will address handling collisions after the initial model has been created.

First, create the item:

create_item(key, value)

Then, compute the index:

int index = hash_function(key);

When inserting the key for the first time, the item must be a NULL:

HashTable.cpp

// Creates the item.Ht_item* item = create_item(key, value);// Computes the index.int index = hash_function(key);Ht_item* current_item = table->items[index];if (current_item == NULL){ // Key does not exist. if (table->count == table->size) { // HashTable is full. printf("Insert Error: Hash Table is full\n"); free_item(item); return; } // Insert directly. table->items[index] = item; table->count++;}

Consider the scenario where the { key: value } pair already exists because the same item has been inserted into the hash table. To address this, the code must update the item value to the new one:

HashTable.cpp

if (current_item == NULL){ ...}else { // Scenario 1: Update the value. if (strcmp(current_item->key, key) == 0) { strcpy(table->items[index] -> value, value); return; }}

Consider the scenario where a collision has to be handled. To address this, a placeholder has been added:

HashTable.cpp

void handle_collision(HashTable* table, Ht_item* item){}void ht_insert(HashTable* table, char* key, char* value){ ... if (current_item == NULL) { ... } else { // Scenario 1: Update the value. if (strcmp(current_item->key, key) == 0) { ... } else { // Scenario 2: Handle the collision. handle_collision(table, item); return; } }}

Now, your ht_insert() function is complete.

Searching for Items in the Hash Table

Create a function, ht_search(), that checks if the key exists, and returns the corresponding value if it does.

The function takes a HashTable pointer and a key as parameters:

char* ht_search(HashTable* table, char* key){ ...}

Search for an item with the key in the HashTable. If the item cannot be found in the HashTable, NULL is returned.

HashTable.cpp

char* ht_search(HashTable* table, char* key){ // Searches for the key in the HashTable. // Returns NULL if it doesn't exist. int index = hash_function(key); Ht_item* item = table->items[index]; // Provide only non-NULL values. if (item != NULL) { if (strcmp(item->key, key) == 0) return item->value; } return NULL;}

Add a print_search() to display the item that matches the key:

HashTable.cpp

void print_search(HashTable* table, char* key){ char* val; if ((val = ht_search(table, key)) == NULL) { printf("Key:%s does not exist\n", key); return; } else { printf("Key:%s, Value:%s\n", key, val); }}

Now, your ht_search() function is complete.

Handling Collisions

There are different ways to resolve a collision. This tutorial will rely upon a method called Separate Chaining, which aims to create independent chains for all items that have the same hash index. The implementation in this tutorial will create these chains using linked lists.

Whenever there is a collision, additional items that collide on the same index are added to an overflow bucket list. Thus, you will not have to delete any existing records on the hash table.

Due to linked lists having O(n) time complexity for insertion, searching, and deletion, in case of a collision, you will have a worst-case access time of O(n) as well. The advantage of this method is that it is a good choice if your hash table has a low capacity.

Implement the overflow bucket list:

HashTable.cpp

// Defines the LinkedList.typedef struct LinkedList { Ht_item* item; struct LinkedList* next;} LinkedList;;LinkedList* allocate_list(){ // Allocates memory for a LinkedList pointer. LinkedList* list = (LinkedList*) malloc(sizeof(LinkedList)); return list;}LinkedList* linkedlist_insert(LinkedList* list, Ht_item* item){ // Inserts the item onto the LinkedList. if (!list) { LinkedList* head = allocate_list(); head->item = item; head->next = NULL; list = head; return list; } else if (list->next == NULL) { LinkedList* node = allocate_list(); node->item = item; node->next = NULL; list->next = node; return list; } LinkedList* temp = list; while (temp->next->next) { temp = temp->next; } LinkedList* node = allocate_list(); node->item = item; node->next = NULL; temp->next = node; return list;}Ht_item* linkedlist_remove(LinkedList* list){ // Removes the head from the LinkedList. // Returns the item of the popped element. if (!list) return NULL; if (!list->next) return NULL; LinkedList* node = list->next; LinkedList* temp = list; temp->next = NULL; list = node; Ht_item* it = NULL; memcpy(temp->item, it, sizeof(Ht_item)); free(temp->item->key); free(temp->item->value); free(temp->item); free(temp); return it;}void free_linkedlist(LinkedList* list){ LinkedList* temp = list; while (list) { temp = list; list = list->next; free(temp->item->key); free(temp->item->value); free(temp->item); free(temp); }}

Now, add these overflow bucket lists to your HashTable. Every item will have a chain, so the whole table is an array of LinkedList pointers.

HashTable.cpp

typedef struct HashTable HashTable;// Defines the HashTable.struct HashTable{ // Contains an array of pointers to items. Ht_item** items; LinkedList** overflow_buckets; int size; int count;};

Now that overflow_buckets have been defined, add functions to create and delete them. You will also need to account for them in the create_table() and free_table() functions.

HashTable.cpp

LinkedList** create_overflow_buckets(HashTable* table){ // Create the overflow buckets; an array of LinkedLists. LinkedList** buckets = (LinkedList**) calloc(table->size, sizeof(LinkedList*)); for (int i = 0; i < table->size; i++) buckets[i] = NULL; return buckets;}void free_overflow_buckets(HashTable* table){ // Free all the overflow bucket lists. LinkedList** buckets = table->overflow_buckets; for (int i = 0; i < table->size; i++) free_linkedlist(buckets[i]); free(buckets);}HashTable* create_table(int size){ // Creates a new HashTable. HashTable* table = (HashTable*) malloc(sizeof(HashTable)); table->size = size; table->count = 0; table->items = (Ht_item**) calloc(table->size, sizeof(Ht_item*)); for (int i = 0; i < table->size; i++) table->items[i] = NULL; table->overflow_buckets = create_overflow_buckets(table); return table;}void free_table(HashTable* table){ // Frees the table. for (int i = 0; i < table->size; i++) { Ht_item* item = table->items[i]; if (item != NULL) free_item(item); } // Free the overflow bucket lists and its items. free_overflow_buckets(table); free(table->items); free(table);}

If the overflow bucket list for the item does not exist, create a list and add the item to it.

Update handle_collision() for insertions:

HashTable.cpp

void handle_collision(HashTable* table, unsigned long index, Ht_item* item){ LinkedList* head = table->overflow_buckets[index]; if (head == NULL) { // Creates the list. head = allocate_list(); head->item = item; table->overflow_buckets[index] = head; return; } else { // Insert to the list. table->overflow_buckets[index] = linkedlist_insert(head, item); return; }}

And the call:

HashTable.cpp

void ht_insert(HashTable* table, char* key, char* value){ ... if (current_item == NULL) { ... } else { // Scenario 1: Update the value. if (strcmp(current_item->key, key) == 0) { ... } else { // Scenario 2: Handle the collision. handle_collision(table, index, item); return; } }}

Now, update the search method to use overflow buckets:

HashTable.cpp

char* ht_search(HashTable* table, char* key){ // Searches for the key in the HashTable. // Returns NULL if it doesn't exist. int index = hash_function(key); Ht_item* item = table->items[index]; LinkedList* head = table->overflow_buckets[index]; // Provide only non-NULL values. if (item != NULL) { if (strcmp(item->key, key) == 0) return item->value; if (head == NULL) return NULL; item = head->item; head = head->next; } return NULL;}

Finally, collisions are now handled in insert() and search()!

Deleting from the Hash Table

Let’s now finally look at the delete function:

void ht_delete(HashTable* table, char* key){ ...}

Again, the method is similar to insertion.

  1. Compute the hash index and get the item.
  2. If it is NULL, don’t do anything.
  3. Otherwise, after comparing keys, if there is no collision chain for that index, remove the item from the table.
  4. If a collision chain exists, remove that element and shift the links accordingly.

HashTable.cpp

void ht_delete(HashTable* table, char* key){ // Deletes an item from the table. int index = hash_function(key); Ht_item* item = table->items[index]; LinkedList* head = table->overflow_buckets[index]; if (item == NULL) { // Does not exist. return; } else { if (head == NULL && strcmp(item->key, key) == 0) { // Collision chain does not exist. // Remove the item. // Set table index to NULL. table->items[index] = NULL; free_item(item); table->count--; return; } else if (head != NULL) { // Collision chain exists. if (strcmp(item->key, key) == 0) { // Remove this item. // Set the head of the list as the new item. free_item(item); LinkedList* node = head; head = head->next; node->next = NULL; table->items[index] = create_item(node->item->key, node->item->value); free_linkedlist(node); table->overflow_buckets[index] = head; return; } LinkedList* curr = head; LinkedList* prev = NULL; while (curr) { if (strcmp(curr->item->key, key) == 0) { if (prev == NULL) { // First element of the chain. // Remove the chain. free_linkedlist(head); table->overflow_buckets[index] = NULL; return; } else { // This is somewhere in the chain. prev->next = curr->next; curr->next = NULL; free_linkedlist(curr); table->overflow_buckets[index] = head; return; } } curr = curr->next; prev = curr; } } }}
#include <stdio.h>#include <stdlib.h>#include <string.h>#define CAPACITY 50000 // Size of the HashTable.unsigned long hash_function(char *str){ unsigned long i = 0; for (int j = 0; str[j]; j++) i += str[j]; return i % CAPACITY;}// Defines the HashTable item.typedef struct Ht_item{ char *key; char *value;} Ht_item;// Defines the LinkedList.typedef struct LinkedList{ Ht_item *item; LinkedList *next;} LinkedList;// Defines the HashTable.typedef struct HashTable{ // Contains an array of pointers to items. Ht_item **items; LinkedList **overflow_buckets; int size; int count;} HashTable;LinkedList *allocate_list(){ // Allocates memory for a LinkedList pointer. LinkedList *list = (LinkedList *)malloc(sizeof(LinkedList)); return list;}LinkedList *linkedlist_insert(LinkedList *list, Ht_item *item){ // Inserts the item onto the LinkedList. if (!list) { LinkedList *head = allocate_list(); head->item = item; head->next = NULL; list = head; return list; } else if (list->next == NULL) { LinkedList *node = allocate_list(); node->item = item; node->next = NULL; list->next = node; return list; } LinkedList *temp = list; while (temp->next->next) { temp = temp->next; } LinkedList *node = allocate_list(); node->item = item; node->next = NULL; temp->next = node; return list;}Ht_item *linkedlist_remove(LinkedList *list){ // Removes the head from the LinkedList. // Returns the item of the popped element. if (!list) return NULL; if (!list->next) return NULL; LinkedList *node = list->next; LinkedList *temp = list; temp->next = NULL; list = node; Ht_item *it = NULL; memcpy(temp->item, it, sizeof(Ht_item)); free(temp->item->key); free(temp->item->value); free(temp->item); free(temp); return it;}void free_linkedlist(LinkedList *list){ LinkedList *temp = list; while (list) { temp = list; list = list->next; free(temp->item->key); free(temp->item->value); free(temp->item); free(temp); }}LinkedList **create_overflow_buckets(HashTable *table){ // Create the overflow buckets; an array of LinkedLists. LinkedList **buckets = (LinkedList **)calloc(table->size, sizeof(LinkedList *)); for (int i = 0; i < table->size; i++) buckets[i] = NULL; return buckets;}void free_overflow_buckets(HashTable *table){ // Free all the overflow bucket lists. LinkedList **buckets = table->overflow_buckets; for (int i = 0; i < table->size; i++) free_linkedlist(buckets[i]); free(buckets);}Ht_item *create_item(char *key, char *value){ // Creates a pointer to a new HashTable item. Ht_item *item = (Ht_item *)malloc(sizeof(Ht_item)); item->key = (char *)malloc(strlen(key) + 1); item->value = (char *)malloc(strlen(value) + 1); strcpy(item->key, key); strcpy(item->value, value); return item;}HashTable *create_table(int size){ // Creates a new HashTable. HashTable *table = (HashTable *)malloc(sizeof(HashTable)); table->size = size; table->count = 0; table->items = (Ht_item **)calloc(table->size, sizeof(Ht_item *)); for (int i = 0; i < table->size; i++) table->items[i] = NULL; table->overflow_buckets = create_overflow_buckets(table); return table;}void free_item(Ht_item *item){ // Frees an item. free(item->key); free(item->value); free(item);}void free_table(HashTable *table){ // Frees the table. for (int i = 0; i < table->size; i++) { Ht_item *item = table->items[i]; if (item != NULL) free_item(item); } // Free the overflow bucket lists and its items. free_overflow_buckets(table); free(table->items); free(table);}void handle_collision(HashTable *table, unsigned long index, Ht_item *item){ LinkedList *head = table->overflow_buckets[index]; if (head == NULL) { // Creates the list. head = allocate_list(); head->item = item; table->overflow_buckets[index] = head; return; } else { // Insert to the list. table->overflow_buckets[index] = linkedlist_insert(head, item); return; }}void ht_insert(HashTable *table, char *key, char *value){ // Creates the item. Ht_item *item = create_item(key, value); // Computes the index. int index = hash_function(key); Ht_item *current_item = table->items[index]; if (current_item == NULL) { // Key does not exist. if (table->count == table->size) { // HashTable is full. printf("Insert Error: Hash Table is full\n"); free_item(item); return; } // Insert directly. table->items[index] = item; table->count++; } else { // Scenario 1: Update the value. if (strcmp(current_item->key, key) == 0) { strcpy(table->items[index]->value, value); return; } else { // Scenario 2: Handle the collision. handle_collision(table, index, item); return; } }}char *ht_search(HashTable *table, char *key){ // Searches for the key in the HashTable. // Returns NULL if it doesn't exist. int index = hash_function(key); Ht_item *item = table->items[index]; LinkedList *head = table->overflow_buckets[index]; // Provide only non-NULL values. if (item != NULL) { if (strcmp(item->key, key) == 0) return item->value; if (head == NULL) return NULL; item = head->item; head = head->next; } return NULL;}void ht_delete(HashTable *table, char *key){ // Deletes an item from the table. int index = hash_function(key); Ht_item *item = table->items[index]; LinkedList *head = table->overflow_buckets[index]; if (item == NULL) { // Does not exist. return; } else { if (head == NULL && strcmp(item->key, key) == 0) { // Collision chain does not exist. // Remove the item. // Set table index to NULL. table->items[index] = NULL; free_item(item); table->count--; return; } else if (head != NULL) { // Collision chain exists. if (strcmp(item->key, key) == 0) { // Remove this item. // Set the head of the list as the new item. free_item(item); LinkedList *node = head; head = head->next; node->next = NULL; table->items[index] = create_item(node->item->key, node->item->value); free_linkedlist(node); table->overflow_buckets[index] = head; return; } LinkedList *curr = head; LinkedList *prev = NULL; while (curr) { if (strcmp(curr->item->key, key) == 0) { if (prev == NULL) { // First element of the chain. // Remove the chain. free_linkedlist(head); table->overflow_buckets[index] = NULL; return; } else { // This is somewhere in the chain. prev->next = curr->next; curr->next = NULL; free_linkedlist(curr); table->overflow_buckets[index] = head; return; } } curr = curr->next; prev = curr; } } }}void print_search(HashTable *table, char *key){ char *val; if ((val = ht_search(table, key)) == NULL) { printf("Key:%s does not exist\n", key); return; } else { printf("Key:%s, Value:%s\n", key, val); }}void print_table(HashTable *table){ printf("\nHash Table\n-------------------\n"); for (int i = 0; i < table -> size; i++) { if (table -> items[i]) { printf("Index:%d, Key:%s, Value:%s\n", i, table -> items[i] -> key, table -> items[i] -> value); } } printf("-------------------\n\n");}int main(){ HashTable *ht = create_table(CAPACITY); ht_insert(ht, (char *)"1", (char *)"First address"); ht_insert(ht, (char *)"2", (char *)"Second address"); ht_insert(ht, (char *)"Hel", (char *)"Third address"); ht_insert(ht, (char *)"Cau", (char *)"Fourth address"); print_search(ht, (char *)"1"); print_search(ht, (char *)"2"); print_search(ht, (char *)"3"); print_search(ht, (char *)"Hel"); print_search(ht, (char *)"Cau"); // Collision! print_table(ht); ht_delete(ht, (char *)"1"); ht_delete(ht, (char *)"Cau"); print_table(ht); free_table(ht); return 0;}

This code will produce the following output:

Output

Key:1, Value:First addressKey:2, Value:Second addressKey:3 does not existKey:Hel, Value:Third addressKey:Cau does not existHash Table-------------------Index:49, Key:1, Value:First addressIndex:50, Key:2, Value:Second addressIndex:281, Key:Hel, Value:Third address-------------------Hash Table-------------------Index:50, Key:2, Value:Second addressIndex:281, Key:Hel, Value:Third address-------------------

ht_insert(), ht_search(), and ht_delete behave as expected.

Conclusion

In this article, you implemented a hash table from scratch in C/C++.

Experiment with other collision handling algorithms and different hash functions. Continue your learning with more C++ tutorials

How To Implement a Sample Hash Table in C/C++  | DigitalOcean (2024)
Top Articles
Latest Posts
Article information

Author: Fr. Dewey Fisher

Last Updated:

Views: 5966

Rating: 4.1 / 5 (42 voted)

Reviews: 81% of readers found this page helpful

Author information

Name: Fr. Dewey Fisher

Birthday: 1993-03-26

Address: 917 Hyun Views, Rogahnmouth, KY 91013-8827

Phone: +5938540192553

Job: Administration Developer

Hobby: Embroidery, Horseback riding, Juggling, Urban exploration, Skiing, Cycling, Handball

Introduction: My name is Fr. Dewey Fisher, I am a powerful, open, faithful, combative, spotless, faithful, fair person who loves writing and wants to share my knowledge and understanding with you.