Learn Hash Table | Advanced Data Structures

A hash table, also known as a hash map or associative array, is a data structure that stores key-value pairs. It offers efficient insertion, deletion, and lookup operations by mapping keys to corresponding values through a process called hashing.

The hash table can be implemented with various data structures but is more commonly implemented with arrays and defines three operations: insertion, deletion, and searching for a given element.

Note

Python's dictionary data structure implements a hash table, offering efficient storage and retrieval of key-value pairs.

Basic operations time complexity

Collisions

But it may happen that the hash values will be the same for two or more different keys. This is called collision. So we need ways to deal with this problem, and there are two major approaches to it.

The first is to use a chaining mechanism. If we have a collision (if there is already an element in the array slot with an index generated by the hash function), we insert the new element as the next element in the list associated with that array index or dictionary key.


              12345678910111213141516171819202122232425262728293031323334353637383940
            
import numpy as np

class HashTable:
    def __init__(self, size):
        # Initialize the hash table with a specified size
        self.size = size
        # Create a dictionary where each key corresponds to an index in the hash table
        # and each value is an empty numpy array
        self.table = {i: np.array([]) for i in range(size)}
    
    def hash_function(self, key):
        # Simple modular hashing function to determine the index for a given key
        return key % self.size
    
    def insert(self, key, value):
        # Compute the index using the hash function
        index = self.hash_function(key)
        # Check if the array at the computed index is empty
        if len(self.table[index]) == 0:
            # If empty, initialize the array with a tuple containing the key-value pair
            self.table[index] = np.array([(key, value)])
        else:
            # If not empty, append the key-value pair to the existing array
            self.table[index] = np.append(self.table[index], np.array([(key, value)]))
        print(f"The element with value {value} inserted into a hash table at index {index}")

# Define key-value pairs
pairs = [
    {'Phone': 12345, 'Name': 'Will'},
    {'Phone': 13492, 'Name': 'Sam'},
    {'Phone': 23770, 'Name': 'Mike'},
    {'Phone': 12345, 'Name': 'Jack'}
]

# Initialize the hash table
hash_table = HashTable(size=7)

# Insert key-value pairs into the hash table
for pair in pairs:
    hash_table.insert(pair['Phone'], pair['Name'])

In the chaining approach, the worst-case scenario occurs when a poorly designed hash function results in all keys hashing to the same index. This causes the hash table to essentially function as a regular array, leading to O(N) time complexity for insert, search, and delete operations.

Chaining is typically employed when the number of keys is expected to greatly exceed the memory allocated for the underlying array, and the exact number of keys is uncertain. A well-designed hash function is crucial to ensure that operations perform at an average-case time complexity.

Open addressing

The second approach is called open addressing. In this case, if the collision occurs and the value is already present in the slot with the index generated by the hash function, we generate a new index and try the corresponding slot until we find the one available.
We can implement various open addressing methods:

linear probing (when we increment an index by 1);
quadratic probing (when we increment an index by a degree of 2);
double hashing (when we generate a new hash value based on the previous one).


              1234567891011121314151617181920212223242526272829303132333435363738394041
            
import numpy as np

class HashTable:
    def __init__(self, size):
        # Define an empty dictionary to represent the hash table
        self.table = {}
        # Initialize the hash table with 'None' values for each index
        for i in range(size):
            self.table[i] = None
    
    def hash_function(self, key):
        # Simple modular hash function
        return key % len(self.table)
    
    def insert(self, key, value):
        index = self.hash_function(key)
        # Check if the current index is available
        if self.table[index] is None:
            self.table[index] = value
        else:
            # Perform linear probing until an available index is found
            while self.table[index] is not None:
                print(f'Index {index} is already occupied. Trying index {index + 1}')
                index += 1
            self.table[index] = value
        print(f'The element with value {value} inserted into a hash table at index {index}')

# Define key-value pairs
pairs = [
    {'Phone': 12345, 'Name': 'Will'},
    {'Phone': 13492, 'Name': 'Sam'},
    {'Phone': 23770, 'Name': 'Mike'},
    {'Phone': 12345, 'Name': 'Jack'}
]

# Initialize the hash table
hash_table = HashTable(size=7)

# Insert key-value pairs into the hash table
for pair in pairs:
    hash_table.insert(pair['Phone'], pair['Name'])

This is a more efficient approach compared to chaining, but it may happen that the values in our underlying array are distributed unequally, and there are large clusters present. In this case, the effectiveness of this approach decreases.

So, we can come to the conclusion that both approaches, in their effectiveness, rely heavily on good implementation of a hash function. Building a good hash function is a major problem in computer science. We want to have a hash function that provides us with an equal distribution of items in a data structure.

Everything was clear?

Thanks for your feedback!

Section 3. Chapter 4

Ask AI

Ask anything or try one of the suggested questions to begin our chat

Swipe to show menu