Welcome to Subscribe On Youtube

705. Design HashSet

Description

Design a HashSet without using any built-in hash table libraries.

Implement MyHashSet class:

  • void add(key) Inserts the value key into the HashSet.
  • bool contains(key) Returns whether the value key exists in the HashSet or not.
  • void remove(key) Removes the value key in the HashSet. If key does not exist in the HashSet, do nothing.

 

Example 1:

Input
["MyHashSet", "add", "add", "contains", "contains", "add", "contains", "remove", "contains"]
[[], [1], [2], [1], [3], [2], [2], [2], [2]]
Output
[null, null, null, true, false, null, true, null, false]

Explanation
MyHashSet myHashSet = new MyHashSet();
myHashSet.add(1);      // set = [1]
myHashSet.add(2);      // set = [1, 2]
myHashSet.contains(1); // return True
myHashSet.contains(3); // return False, (not found)
myHashSet.add(2);      // set = [1, 2]
myHashSet.contains(2); // return True
myHashSet.remove(2);   // set = [1]
myHashSet.contains(2); // return False, (already removed)

 

Constraints:

  • 0 <= key <= 106
  • At most 104 calls will be made to add, remove, and contains.

Solutions

There are multiple ways to implement a hash set, but one straightforward method is to use a list of lists (also known as bucketing or chaining) to handle collisions (when two different elements have the same hash).

List of lists, ie. bucketing or chaining

Since all values are in the range of [0, 1000000], create an array with length 1000001, which has boolean type.

If a value is added, set the element at the corresponding index in the array to true. If a value is removed, set the element at the corresponding index in the array to false.

For the contains function, simply return the element at the corresponding index in the array, where true means the element exists in the HashSet and false means the element does not exist in the HashSet.

Open Addressing with Linear Probing

In open addressing, all elements occupy the hash table array itself. When a collision occurs (two keys hash to the same index), the hash table searches for the next empty slot in the array by probing subsequent indices until an empty slot is found.

For simplicity, below code assumes keys are positive integers and uses deletion markers for removed elements.

Explanation
  • Initialization: Sets initial capacity and creates an array (self.data) to store the elements. It also defines a load factor to determine when to resize the hash table to maintain efficiency.
  • _hash Method: Computes the hash index of a given key.
  • _resize Method: Doubles the capacity of the hash table and rehashes all existing elements when the load factor is exceeded.
  • add Method: Inserts a new key if it’s not already present, resizing the table if necessary. It linearly probes for an empty slot if there’s a collision.
  • remove Method: Searches for the key and marks its slot as “DEL” if found. It uses linear probing to handle collisions.
  • contains Method: Checks if a key exists in the hash table, returning True if found.
Differences from Chaining
  • Space Efficiency: Open addressing might be more space-efficient since it doesn’t use additional data structures for chaining. However, it requires careful management of load factors and resizing to maintain performance.
  • Handling of Deletions: Open addressing uses special markers to indicate deleted elements (“DEL” in this case), which is necessary to ensure subsequent elements found by probing can still be located.
  • Resizing Logic: To maintain efficient access times, the hash table size is increased based on the load factor, requiring all elements to be rehashed to new positions.

This open addressing implementation is just one of many ways to design a hash set. The choice between chaining, open addressing, or other methods depends on the specific requirements, including the expected number of elements, access patterns, and performance criteria.

  • class MyHashSet {
        private boolean[] data = new boolean[1000001];
    
        public MyHashSet() {
        }
    
        public void add(int key) {
            data[key] = true;
        }
    
        public void remove(int key) {
            data[key] = false;
        }
    
        public boolean contains(int key) {
            return data[key];
        }
    }
    
    /**
     * Your MyHashSet object will be instantiated and called as such:
     * MyHashSet obj = new MyHashSet();
     * obj.add(key);
     * obj.remove(key);
     * boolean param_3 = obj.contains(key);
     */
    
  • class MyHashSet {
    public:
        bool data[1000001];
    
        MyHashSet() {
            memset(data, false, sizeof data);
        }
    
        void add(int key) {
            data[key] = true;
        }
    
        void remove(int key) {
            data[key] = false;
        }
    
        bool contains(int key) {
            return data[key];
        }
    };
    
    /**
     * Your MyHashSet object will be instantiated and called as such:
     * MyHashSet* obj = new MyHashSet();
     * obj->add(key);
     * obj->remove(key);
     * bool param_3 = obj->contains(key);
     */
    
  • class MyHashSet:
        def __init__(self):
            self.data = [False] * 1000001
    
        def add(self, key: int) -> None:
            self.data[key] = True
    
        def remove(self, key: int) -> None:
            self.data[key] = False
    
        def contains(self, key: int) -> bool:
            return self.data[key]
    
    
    # Your MyHashSet object will be instantiated and called as such:
    # obj = MyHashSet()
    # obj.add(key)
    # obj.remove(key)
    # param_3 = obj.contains(key)
    
    ###############
    
    class MyHashSet: # hash to bucket
    
        def __init__(self):
            """
            Initialize your data structure here.
            """
            self.size = 1000  # Choosing a size for the outer list
            self.buckets = [[] for _ in range(self.size)]  # List of lists
    
        def _hash(self, key: int) -> int:
            """
            Generate a hash for a given key.
            """
            return key % self.size
        
        def add(self, key: int) -> None:
            """
            Insert a value into the HashSet.
            """
            hash_key = self._hash(key)
            if key not in self.buckets[hash_key]:
                self.buckets[hash_key].append(key)
    
        def remove(self, key: int) -> None:
            """
            Remove a value in the HashSet. If the value does not exist, do nothing.
            """
            hash_key = self._hash(key)
            if key in self.buckets[hash_key]:
                self.buckets[hash_key].remove(key)
    
        def contains(self, key: int) -> bool:
            """
            Returns true if this set contains the specified element.
            """
            hash_key = self._hash(key)
            return key in self.buckets[hash_key]
    
    # Your MyHashSet object will be instantiated and called as such:
    # obj = MyHashSet()
    # obj.add(key)
    # obj.remove(key)
    # param_3 = obj.contains(key)
    
    ############
    
    # Collision handling, optimized
    class MyHashSet:
        def __init__(self):
            self.capacity = 1000
            self.data = [None] * self.capacity
            self.load_factor = 0.7
            self.size = 0
        
        def _hash(self, key):
            return key % self.capacity
        
        def _resize(self):
            if self.size / self.capacity >= self.load_factor:
                self.capacity *= 2
                old_data = self.data
                self.data = [None] * self.capacity
                self.size = 0
                for item in old_data:
                    if item is not None:
                        self.add(item)
        
        def add(self, key: int) -> None:
            if self.contains(key):
                return
            self._resize()
            idx = self._hash(key)
            while self.data[idx] is not None:
                idx = (idx + 1) % self.capacity # Collision handling
            self.data[idx] = key
            self.size += 1
        
        def remove(self, key: int) -> None:
            idx = self._hash(key)
            while self.data[idx] is not None:
                if self.data[idx] == key:
                    self.data[idx] = "DEL"  # Mark as deleted
                    self.size -= 1
                    return
                idx = (idx + 1) % self.capacity
        
        def contains(self, key: int) -> bool:
            idx = self._hash(key)
            while self.data[idx] is not None:
                if self.data[idx] == key:
                    return True
                idx = (idx + 1) % self.capacity
            return False
    
    
  • type MyHashSet struct {
    	data []bool
    }
    
    func Constructor() MyHashSet {
    	data := make([]bool, 1000010)
    	return MyHashSet{data}
    }
    
    func (this *MyHashSet) Add(key int) {
    	this.data[key] = true
    }
    
    func (this *MyHashSet) Remove(key int) {
    	this.data[key] = false
    }
    
    func (this *MyHashSet) Contains(key int) bool {
    	return this.data[key]
    }
    
    /**
     * Your MyHashSet object will be instantiated and called as such:
     * obj := Constructor();
     * obj.Add(key);
     * obj.Remove(key);
     * param_3 := obj.Contains(key);
     */
    
  • class MyHashSet {
        data: Array<boolean>;
        constructor() {
            this.data = new Array(10 ** 6 + 1).fill(false);
        }
    
        add(key: number): void {
            this.data[key] = true;
        }
    
        remove(key: number): void {
            this.data[key] = false;
        }
    
        contains(key: number): boolean {
            return this.data[key];
        }
    }
    
    /**
     * Your MyHashSet object will be instantiated and called as such:
     * var obj = new MyHashSet()
     * obj.add(key)
     * obj.remove(key)
     * var param_3 = obj.contains(key)
     */
    
    

All Problems

All Solutions