Formatted question description: https://leetcode.ca/all/940.html

940. Distinct Subsequences II (Hard)

Given a string S, count the number of distinct, non-empty subsequences of S .

Since the result may be large, return the answer modulo 10^9 + 7.

Example 1:

Input: “abc”
Output: 7
Explanation: The 7 distinct subsequences are “a”, “b”, “c”, “ab”, “ac”, “bc”, and “abc”.

Example 2:

Input: “aba”
Output: 6
Explanation: The 6 distinct subsequences are “a”, “b”, “ab”, “ba”, “aa” and “aba”.

Example 3:

Input: “aaa” Output: 3
Explanation: The 3 distinct subsequences are “a”, “aa” and “aaa”.

Note:

  1. S contains only lowercase letters.
  2. 1 <= S.length <= 2000

Thoughts

There are 2^N - 1 subsequences because for each letter you can either choose or not choose it which results in 2^N and -1 is to remove empty string.

One naive solution is to generate all the subsequences and deduplicate using unordered_set. But this is too time and space inefficient.

Can we reduce the complexity?

Try divide and conquer?

Split S into S1 and S2.

Example:
aaab => aa and ab
R(aaab) = a, aa, aaa, ab, aab, aaab
R(aa) = a, aa
R(ab) = a, b, ab
R(aaab) is not a simple addition or multiplication of R(aa) and R(ab).

This idea doesn’t seem to work :(

Solution 1. DP

Try consume the letters one by one?

When we visit a new letter S[i], we have two options – pick it or don’t pick it. Denote the results as P(i) and N(i) respectively.

  1. If don’t pick, then N(i) = N(i - 1) + P(i - 1).
  2. If pick, we can try to append S[i] to all the subsequences previously generated, yielding N(i - 1) + P(i - 1) subsequences. But all the previously generated subsequences ending with the same letter as S[i] will be duplicate with the subsequences ending with exactly this S[i]. So P(i) = N(i - 1) + P(i - 1) - Sum{P(k) | where S[k] == S[i]}.

Let’s use 2-d array dp to store the results. dp[1][i] means “pick S[i - 1]” while dp[0][i] means “don’t pick S[i - 1]”, 1 <= i <= N.

// For 1 <= i <= N
dp[0][i] = dp[0][i - 1] + dp[1][i - 1]
dp[1][i] = dp[0][i] - Sum{ dp[1][k] | where S[k - 1] == S[i - 1] }

dp[0][0] = 1
dp[1][0] = 0

It doesn’t matter dp[0][0] or dp[0][1] as long as only one equals 1.

// OJ: https://leetcode.com/problems/distinct-subsequences-ii/

// Time: O(N^2)
// Space: O(N)
class Solution {
public:
    int distinctSubseqII(string S) {
        int N = S.size(), mod = pow(10, 9) + 7;
        vector<vector<long long>> dp(2, vector<long long>(N + 1, 0));
        dp[0][0] = 1;
        for (int i = 1; i <= N; ++i) {
            dp[0][i] = (dp[0][i - 1] + dp[1][i - 1]) % mod;
            dp[1][i] = dp[0][i];
            for (int j = 1; j < i; ++j) {
                if (S[i - 1] == S[j - 1]) dp[1][i] = (dp[1][i] - dp[1][j] + mod) % mod;
            }
        }
        return (dp[0][N] + dp[1][N] - 1) % mod;
    }
};

Solution 2.

In the formula of Solution 1, we notice that dp[?][i] almost just rely on the dp[?][i - 1].

The dp array is mainly for computing the the sum of duplicates. And the iterative summation

If we store the sum of previous dp[1][k] where S[k - 1] == S[i - 1], we can reduce the 2N extra space to constant space.

// OJ: https://leetcode.com/problems/distinct-subsequences-ii/

// Time: O(N)
// Space: O(1)
class Solution {
public:
    int distinctSubseqII(string S) {
        int N = S.size(), mod = pow(10, 9) + 7;
        int picks[26] = {0};
        int pick = 0, noPick = 1;
        for (int i = 0; i < N; ++i) {
            noPick = (noPick + pick) % mod;
            pick = (noPick - picks[S[i] - 'a'] + mod) % mod;
            picks[S[i] - 'a'] = (picks[S[i] - 'a'] + pick) % mod;
        }
        return (pick + noPick - 1) % mod;
    }
};

Solution 3.

Same idea as Solution 2, but shorter and unnoticeably slower (it’s O(26N) while solution 2 is real O(N)).

// OJ: https://leetcode.com/problems/distinct-subsequences-ii/

// Time: O(N)
// Space: O(1)
// Ref: https://leetcode.com/problems/distinct-subsequences-ii/discuss/192017/C%2B%2BJavaPython-4-lines-O(N)-Time-O(1)-Space
class Solution {
public:
    int distinctSubseqII(string S) {
        long endsWith[26] = {}, mod = 1e9 + 7;
        for (char c : S)
            endsWith[c - 'a'] = accumulate(begin(endsWith), end(endsWith), 1L) % mod;
        return accumulate(begin(endsWith), end(endsWith), 0L) % mod;
    }
};

Solution 4

In solution 2 we notice that we always sum pick and noPick, meaning we only care about the sum.

// OJ: https://leetcode.com/problems/distinct-subsequences-ii/

// Time: O(N)
// Space: O(1)
// Ref: https://leetcode.com/problems/distinct-subsequences-ii/discuss/192095/C%2B%2B-O(n)-or-O-(n)-Geeks4Geeks-improved-to-O(n)-or-O(1)
class Solution {
public:
    int distinctSubseqII(string S) {
        int mod = pow(10, 9) + 7, picks[26] = {0}, sum = 1;
        for (char c : S) {
            int oldSum = sum;
            sum = (sum * 2 % mod - picks[c - 'a'] + mod) % mod;
            picks[c - 'a'] = oldSum;
        }
        return sum - 1;
    }
};

Java

class Solution {
    public int distinctSubseqII(String S) {
        final int MODULO = 1000000007;
        int length = S.length();
        int[] dp = new int[length + 1];
        dp[0] = 1;
        int[] last = new int[26];
        Arrays.fill(last, -1);
        for (int i = 0; i < length; i++) {
            int index = S.charAt(i) - 'a';
            dp[i + 1] = dp[i] * 2 % MODULO;
            if (last[index] >= 0)
                dp[i + 1] -= dp[last[index]];
            dp[i + 1] = (dp[i + 1] + MODULO) % MODULO;
            last[index] = i;
        }
        int distinctSubsequences = (dp[length] - 1 + MODULO) % MODULO;
        return distinctSubsequences;
    }
}

All Problems

All Solutions