1698. Number of Distinct Substrings in a String

Description

Given a string s, return the number of distinct substrings of s.

A substring of a string is obtained by deleting any number of characters (possibly zero) from the front of the string and any number (possibly zero) from the back of the string.

 

Example 1:

Input: s = "aabbaba"
Output: 21
Explanation: The set of distinct strings is ["a","b","aa","bb","ab","ba","aab","abb","bab","bba","aba","aabb","abba","bbab","baba","aabba","abbab","bbaba","aabbab","abbaba","aabbaba"]

Example 2:

Input: s = "abcdefg"
Output: 28

 

Constraints:

  • 1 <= s.length <= 500
  • s consists of lowercase English letters.

 

Follow up: Can you solve this problem in O(n) time complexity?

Solutions

Solution 1: Brute Force Enumeration

Enumerate all substrings and insert each one into a hash set. The size of the set is the number of distinct substrings.

The time complexity is $O(n^3)$, and the space complexity is $O(n^2)$. Here, $n$ is the length of the string.

Solution 2: String Hashing

String hashing maps a string of any length to a non-negative integer in such a way that collisions are very unlikely. Comparing the hash values of two strings then gives a fast, probabilistic test of whether the strings are equal.

We take a fixed value BASE, treat the string as a number in radix BASE, and assign each character a value greater than 0. The assigned values are generally much smaller than BASE; for a string of lowercase letters, for example, we can set a=1, b=2, …, z=26. We then take a fixed value MOD and use the remainder of that radix-BASE number modulo MOD as the hash value of the string.
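As a small worked illustration of the definition above, the following sketch computes the hash of a whole string. The function name `poly_hash` and the choice of MOD here are assumptions made for the example, not part of the original solution.

```python
BASE = 131      # radix
MOD = 1 << 64   # illustrative modulus (an assumption for this sketch)


def poly_hash(s: str, base: int = BASE, mod: int = MOD) -> int:
    """Read s as a base-`base` number with a=1, ..., z=26, reduced modulo `mod`."""
    h = 0
    for c in s:
        value = ord(c) - ord("a") + 1  # a=1, b=2, ..., z=26; never 0
        h = (h * base + value) % mod
    return h


print(poly_hash("ab"))   # 1 * 131 + 2 = 133
print(poly_hash("abc"))  # 133 * 131 + 3 = 17426
```

Mapping characters to values greater than 0 matters: if a were 0, the strings "a" and "aa" would both hash to 0.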

Generally, we take BASE=131 or BASE=13331, for which hash collisions are extremely unlikely in practice. Whenever two strings have the same hash value, we treat them as equal. MOD is usually taken as $2^{64}$: in C++, the hash can be stored in an unsigned long long and arithmetic overflow left unhandled, because unsigned overflow wraps around, which is equivalent to taking the result modulo $2^{64}$. This avoids slow explicit modulus operations.
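Python integers do not overflow, so the wraparound described above has to be emulated. The sketch below does that with a 64-bit mask and derives the hash of any substring in $O(1)$ from prefix hashes. The helper names (`build_prefix_hashes`, `substring_hash`, `count_distinct_hashed`) are hypothetical, and storing the substring length next to the hash is an extra precaution added for this example.

```python
BASE = 131
MASK = (1 << 64) - 1  # keeping the low 64 bits is arithmetic modulo 2^64


def build_prefix_hashes(s: str):
    """Return (h, p): h[i] is the hash of s[:i] and p[i] is BASE**i, both mod 2^64."""
    n = len(s)
    h = [0] * (n + 1)
    p = [1] * (n + 1)
    for i, c in enumerate(s):
        p[i + 1] = (p[i] * BASE) & MASK
        h[i + 1] = (h[i] * BASE + ord(c) - ord("a") + 1) & MASK
    return h, p


def substring_hash(h, p, i: int, j: int) -> int:
    """Hash of s[i:j]: remove the shifted contribution of the prefix s[:i]."""
    return (h[j] - h[i] * p[j - i]) & MASK


def count_distinct_hashed(s: str) -> int:
    h, p = build_prefix_hashes(s)
    n = len(s)
    seen = set()
    for i in range(n):
        for j in range(i + 1, n + 1):
            seen.add((j - i, substring_hash(h, p, i, j)))
    return len(seen)


print(count_distinct_hashed("aabbaba"))  # 21, as in Example 1
```

The subtraction can go negative before masking; in Python, `& MASK` maps the result back into the range $[0, 2^{64})$, matching unsigned wraparound.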

Except on specially constructed adversarial input, this hash is unlikely to produce collisions, and it is commonly accepted in reference solutions. To make collisions even harder to construct, we can choose several BASE and MOD pairs (for example, large primes for MOD), compute one hash per pair, and consider two strings equal only when all of the hash values match.
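The multi-hash idea can be sketched as follows. The two prime moduli, the pairing of bases with moduli, and the function name `count_distinct_double_hash` are assumptions chosen for illustration; any suitable BASE and MOD pairs would do.

```python
def count_distinct_double_hash(s: str) -> int:
    bases = (131, 13331)
    mods = (1_000_000_007, 998_244_353)  # two large primes (assumed choice)
    n = len(s)
    # prefix[k][i]: hash of s[:i] under (bases[k], mods[k])
    # power[k][i]:  bases[k] ** i modulo mods[k]
    prefix = [[0] * (n + 1) for _ in mods]
    power = [[1] * (n + 1) for _ in mods]
    for k, (b, m) in enumerate(zip(bases, mods)):
        for i, c in enumerate(s):
            power[k][i + 1] = power[k][i] * b % m
            prefix[k][i + 1] = (prefix[k][i] * b + ord(c) - ord("a") + 1) % m

    seen = set()
    for i in range(n):
        for j in range(i + 1, n + 1):
            # Two substrings count as equal only if every hash agrees.
            key = tuple(
                (prefix[k][j] - prefix[k][i] * power[k][j - i]) % mods[k]
                for k in range(len(mods))
            )
            seen.add((j - i, key))
    return len(seen)


print(count_distinct_double_hash("abcdefg"))  # 28, as in Example 2
```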

The time complexity is $O(n^2)$, and the space complexity is $O(n^2)$. Here, $n$ is the length of the string.

Solution 3: Trie

Insert every suffix of s into a Trie. Each newly created Trie node corresponds to a substring that has not been seen before, so the number of nodes created is the answer.
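A compact way to see the node-counting idea is a Trie built from nested dictionaries. This is only a sketch: the dictionary representation and the name `count_distinct_trie` are choices made for this example, not the array-based Trie used in the solutions.

```python
def count_distinct_trie(s: str) -> int:
    root = {}    # each node maps a character to its child node
    created = 0  # number of Trie nodes created so far
    for i in range(len(s)):
        node = root
        for c in s[i:]:  # walk the suffix that starts at index i
            if c not in node:
                node[c] = {}  # first time this substring is seen
                created += 1
            node = node[c]
    return created


print(count_distinct_trie("aabbaba"))  # 21
print(count_distinct_trie("aaaa"))     # 4: "a", "aa", "aaa", "aaaa"
```

For a string of pairwise distinct characters such as "abcdefg", every step of every suffix creates a node, giving $n(n+1)/2 = 28$ nodes.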

More on the complexity analysis of the Trie solution:

  • Building the Trie iterates over all $n$ suffixes of the string and walks each suffix character by character. The number of steps equals the total length of all suffixes, $n(n+1)/2$, so the time complexity is $O(n^2)$.
  • Each character lookup or insertion is $O(1)$ for a fixed character set, and a node is created only the first time a substring appears, so repeated substrings cost no extra memory. Every suffix is still walked in full, however, so the running time stays $O(n^2)$ even when few nodes are created. The space is proportional to the number of distinct substrings, which is $O(n^2)$ in the worst case.
  • This does not meet the $O(n)$ bound asked for in the follow-up, but the Trie avoids hashing or comparing whole substrings, which makes it faster in practice than the brute-force approach.
  • import java.util.HashSet;
    import java.util.Set;
    
    class Solution {
        public int countDistinct(String s) {
            Set<String> ss = new HashSet<>();
            int n = s.length();
            for (int i = 0; i < n; ++i) {
                for (int j = i + 1; j <= n; ++j) {
                    ss.add(s.substring(i, j));
                }
            }
            return ss.size();
        }
    }
    
    //////
    
    
    public class Solution {
        class Trie {
            Trie[] children = new Trie[26];
        }
    
        public int countDistinct(String s) {
            Trie root = new Trie();
            Trie current;
            int count = 0;
            for (int i = 0; i < s.length(); i++) {
                current = root;
                for (int j = i; j < s.length(); j++) {
    
                    if (current.children[s.charAt(j) - 'a'] == null) {
                        current.children[s.charAt(j) - 'a'] = new Trie();
                        count++;
                    }
    
                    current = current.children[s.charAt(j) - 'a'];
                }
            }
    
            return count;
        }
    }
    
    
  • #include <string>
    #include <string_view>
    #include <unordered_set>
    using namespace std;
    
    class Solution {
    public:
        int countDistinct(string s) {
            unordered_set<string_view> ss;
            int n = s.size();
            string_view t, v = s;
            for (int i = 0; i < n; ++i) {
                for (int j = i + 1; j <= n; ++j) {
                    t = v.substr(i, j - i);
                    ss.insert(t);
                }
            }
            return ss.size();
        }
    };
    
  • class Solution:
        def countDistinct(self, s: str) -> int:
            n = len(s)
            return len({s[i:j] for i in range(n) for j in range(i + 1, n + 1)})
    
    ############
    
    class Trie:
        def __init__(self):
            self.children = [None] * 26
    
    class Solution:
        def countDistinct(self, s: str) -> int:
            root = Trie()
            count = 0
    
            for i in range(len(s)):
                current = root
                for j in range(i, len(s)):
                    index = ord(s[j]) - ord('a')
    
                    if current.children[index] is None:
                        current.children[index] = Trie()
                        count += 1
    
                    current = current.children[index]
    
            return count
    
    ############
    
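    class Solution:
        def countDistinct(self, s: str) -> int:
            # Python integers never overflow, so these hashes are exact
            # base-131 values rather than values reduced modulo 2^64.
            base = 131
            n = len(s)
            p = [0] * (n + 10)  # p[i] = base ** i
            h = [0] * (n + 10)  # h[i] = hash of the prefix s[:i]
            p[0] = 1
            for i, c in enumerate(s):
                p[i + 1] = p[i] * base
                h[i + 1] = h[i] * base + ord(c)
            ss = set()
            for i in range(1, n + 1):
                for j in range(i, n + 1):
                    # Hash of s[i-1:j], with the prefix contribution removed.
                    t = h[j] - h[i - 1] * p[j - i + 1]
                    ss.add(t)
            return len(ss)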
    
  • func countDistinct(s string) int {
    	ss := map[string]struct{}{}
    	for i := range s {
    		for j := i + 1; j <= len(s); j++ {
    			ss[s[i:j]] = struct{}{}
    		}
    	}
    	return len(ss)
    }
    
