Welcome to Subscribe On Youtube
705. Design HashSet
Description
Design a HashSet without using any built-in hash table libraries.
Implement MyHashSet
class:
void add(key)
Inserts the valuekey
into the HashSet.bool contains(key)
Returns whether the valuekey
exists in the HashSet or not.void remove(key)
Removes the valuekey
in the HashSet. Ifkey
does not exist in the HashSet, do nothing.
Example 1:
Input ["MyHashSet", "add", "add", "contains", "contains", "add", "contains", "remove", "contains"] [[], [1], [2], [1], [3], [2], [2], [2], [2]] Output [null, null, null, true, false, null, true, null, false] Explanation MyHashSet myHashSet = new MyHashSet(); myHashSet.add(1); // set = [1] myHashSet.add(2); // set = [1, 2] myHashSet.contains(1); // return True myHashSet.contains(3); // return False, (not found) myHashSet.add(2); // set = [1, 2] myHashSet.contains(2); // return True myHashSet.remove(2); // set = [1] myHashSet.contains(2); // return False, (already removed)
Constraints:
0 <= key <= 106
- At most
104
calls will be made toadd
,remove
, andcontains
.
Solutions
There are multiple ways to implement a hash set, but one straightforward method is to use a list of lists (also known as bucketing or chaining) to handle collisions (when two different elements have the same hash).
List of lists, ie. bucketing or chaining
Since all values are in the range of [0, 1000000]
, create an array with length 1000001, which has boolean
type.
If a value is added, set the element at the corresponding index in the array to true
. If a value is removed, set the element at the corresponding index in the array to false
.
For the contains
function, simply return the element at the corresponding index in the array, where true
means the element exists in the HashSet and false
means the element does not exist in the HashSet.
Open Addressing with Linear Probing
In open addressing, all elements occupy the hash table array itself. When a collision occurs (two keys hash to the same index), the hash table searches for the next empty slot in the array by probing subsequent indices until an empty slot is found.
For simplicity, below code assumes keys are positive integers and uses deletion markers for removed elements.
Explanation
- Initialization: Sets initial capacity and creates an array (
self.data
) to store the elements. It also defines a load factor to determine when to resize the hash table to maintain efficiency. _hash
Method: Computes the hash index of a given key._resize
Method: Doubles the capacity of the hash table and rehashes all existing elements when the load factor is exceeded.add
Method: Inserts a new key if it’s not already present, resizing the table if necessary. It linearly probes for an empty slot if there’s a collision.remove
Method: Searches for the key and marks its slot as “DEL” if found. It uses linear probing to handle collisions.contains
Method: Checks if a key exists in the hash table, returningTrue
if found.
Differences from Chaining
- Space Efficiency: Open addressing might be more space-efficient since it doesn’t use additional data structures for chaining. However, it requires careful management of load factors and resizing to maintain performance.
- Handling of Deletions: Open addressing uses special markers to indicate deleted elements (“DEL” in this case), which is necessary to ensure subsequent elements found by probing can still be located.
- Resizing Logic: To maintain efficient access times, the hash table size is increased based on the load factor, requiring all elements to be rehashed to new positions.
This open addressing implementation is just one of many ways to design a hash set. The choice between chaining, open addressing, or other methods depends on the specific requirements, including the expected number of elements, access patterns, and performance criteria.
-
class MyHashSet { private boolean[] data = new boolean[1000001]; public MyHashSet() { } public void add(int key) { data[key] = true; } public void remove(int key) { data[key] = false; } public boolean contains(int key) { return data[key]; } } /** * Your MyHashSet object will be instantiated and called as such: * MyHashSet obj = new MyHashSet(); * obj.add(key); * obj.remove(key); * boolean param_3 = obj.contains(key); */
-
class MyHashSet { public: bool data[1000001]; MyHashSet() { memset(data, false, sizeof data); } void add(int key) { data[key] = true; } void remove(int key) { data[key] = false; } bool contains(int key) { return data[key]; } }; /** * Your MyHashSet object will be instantiated and called as such: * MyHashSet* obj = new MyHashSet(); * obj->add(key); * obj->remove(key); * bool param_3 = obj->contains(key); */
-
class MyHashSet: def __init__(self): self.data = [False] * 1000001 def add(self, key: int) -> None: self.data[key] = True def remove(self, key: int) -> None: self.data[key] = False def contains(self, key: int) -> bool: return self.data[key] # Your MyHashSet object will be instantiated and called as such: # obj = MyHashSet() # obj.add(key) # obj.remove(key) # param_3 = obj.contains(key) ############### class MyHashSet: # hash to bucket def __init__(self): """ Initialize your data structure here. """ self.size = 1000 # Choosing a size for the outer list self.buckets = [[] for _ in range(self.size)] # List of lists def _hash(self, key: int) -> int: """ Generate a hash for a given key. """ return key % self.size def add(self, key: int) -> None: """ Insert a value into the HashSet. """ hash_key = self._hash(key) if key not in self.buckets[hash_key]: self.buckets[hash_key].append(key) def remove(self, key: int) -> None: """ Remove a value in the HashSet. If the value does not exist, do nothing. """ hash_key = self._hash(key) if key in self.buckets[hash_key]: self.buckets[hash_key].remove(key) def contains(self, key: int) -> bool: """ Returns true if this set contains the specified element. """ hash_key = self._hash(key) return key in self.buckets[hash_key] # Your MyHashSet object will be instantiated and called as such: # obj = MyHashSet() # obj.add(key) # obj.remove(key) # param_3 = obj.contains(key) ############ # Collision handling, optimized class MyHashSet: def __init__(self): self.capacity = 1000 self.data = [None] * self.capacity self.load_factor = 0.7 self.size = 0 def _hash(self, key): return key % self.capacity def _resize(self): if self.size / self.capacity >= self.load_factor: self.capacity *= 2 old_data = self.data self.data = [None] * self.capacity self.size = 0 for item in old_data: if item is not None: self.add(item) def add(self, key: int) -> None: if self.contains(key): return self._resize() idx = self._hash(key) while self.data[idx] is not None: idx = (idx + 1) % self.capacity # Collision handling self.data[idx] = key self.size += 1 def remove(self, key: int) -> None: idx = self._hash(key) while self.data[idx] is not None: if self.data[idx] == key: self.data[idx] = "DEL" # Mark as deleted self.size -= 1 return idx = (idx + 1) % self.capacity def contains(self, key: int) -> bool: idx = self._hash(key) while self.data[idx] is not None: if self.data[idx] == key: return True idx = (idx + 1) % self.capacity return False
-
type MyHashSet struct { data []bool } func Constructor() MyHashSet { data := make([]bool, 1000010) return MyHashSet{data} } func (this *MyHashSet) Add(key int) { this.data[key] = true } func (this *MyHashSet) Remove(key int) { this.data[key] = false } func (this *MyHashSet) Contains(key int) bool { return this.data[key] } /** * Your MyHashSet object will be instantiated and called as such: * obj := Constructor(); * obj.Add(key); * obj.Remove(key); * param_3 := obj.Contains(key); */
-
class MyHashSet { data: Array<boolean>; constructor() { this.data = new Array(10 ** 6 + 1).fill(false); } add(key: number): void { this.data[key] = true; } remove(key: number): void { this.data[key] = false; } contains(key: number): boolean { return this.data[key]; } } /** * Your MyHashSet object will be instantiated and called as such: * var obj = new MyHashSet() * obj.add(key) * obj.remove(key) * var param_3 = obj.contains(key) */