Hashmaps and hashing functions are fundamental concepts in computer science, widely used in data storage and retrieval. This article aims to demystify these concepts and provide a practical understanding through detailed examples.
What is a hashmap?
A hashmap, also known as a hash table, is a data structure that stores data in key-value pairs. It offers efficient data retrieval by using a unique key. The efficiency of a hashmap lies in its ability to provide near-constant time complexity for basic operations like add, delete, and find, making it highly efficient for lookups.
Understanding hashing functions
The core of a hashmap’s efficiency is the hashing function. A hashing function is an algorithm that converts a key into a numerical value, known as a hash code. This hash code determines where the key-value pair will be stored in the hashmap. The ideal hashing function distributes data uniformly across the storage, minimizing the chances of two keys producing the same hash code—a situation known as a collision.
Working of a hashmap
- Key Input: When a key-value pair is added to a hashmap, the key is passed through the hashing function.
- Hash Code Generation: The hashing function processes the key and produces a hash code.
- Index Calculation: The hash code is then used to calculate the index at which the value should be stored in the hashmap.
- Data Storage: The key-value pair is stored at the computed index.
Handling collisions
Collisions occur when two keys produce the same hash code. Common strategies to handle collisions are:
- Chaining: Each index of the hashmap stores a list of key-value pairs. If a collision occurs, the new key-value pair is simply added to the list at that index.
- Open Addressing: If a collision occurs, the hashmap searches for the next available slot and stores the key-value pair there.
Example: Implementing a simple hashmap
Let’s consider a simple example of a hashmap that maps employee IDs to their names.
Code Example in Python:
class SimpleHashMap:
def __init__(self):
self.size = 10
self.map = [None] * self.size
def get_hash(self, key):
hash = 0
for char in str(key):
hash += ord(char)
return hash % self.size
def add(self, key, value):
key_hash = self.get_hash(key)
key_value = [key, value]
if self.map[key_hash] is None:
self.map[key_hash] = list([key_value])
return True
else:
for pair in self.map[key_hash]:
if pair[0] == key:
pair[1] = value
return True
self.map[key_hash].append(key_value)
return True
def get(self, key):
key_hash = self.get_hash(key)
if self.map[key_hash] is not None:
for pair in self.map[key_hash]:
if pair[0] == key:
return pair[1]
return None
Explanation:
- Initialization: A hashmap of a fixed size is created.
- Hash Function:
get_hash
converts the key to a hash code. - Add Method: Adds a key-value pair to the hashmap. If a collision occurs, the key-value pair is added to the list at that index.
- Get Method: Retrieves a value for a given key from the hashmap.