Formatted question description: https://leetcode.ca/all/392.html
392 Is Subsequence Given a string s and a string t, check if s is subsequence of t. You may assume that there is only lower case English letters in both s and t. t is potentially a very long (length ~= 500,000) string, and s is a short string (<=100). A subsequence of a string is a new string which is formed from the original string by deleting some (can be none) of the characters without disturbing the relative positions of the remaining characters. (ie, "ace" is a subsequence of "abcde" while "aec" is not). Example 1: s = "abc", t = "ahbgdc" Return true. Example 2: s = "axc", t = "ahbgdc" Return false. Follow up: If there are lots of incoming S, say S1, S2, ... , Sk where k >= 1B, and you want to check one by one to see if T has its subsequence. In this scenario, how would you change your code? @tag-dp
The two pointers point to the strings s and t respectively, and then if the characters are equal, i and j are incremented by 1, otherwise only j increments by 1.
Finally, see if i is equal to the length of s, which means that s has been traversed, and the characters are all appeared in t.
Follow up in the topic said that if there are a large number of strings s, ask us how to optimize.
So since the string
t is always the same, we can do some pre-processing on t.
Although the subsequence does not need to be a contiguous substring, but the order between characters is required, then we can establish a direct mapping
- from: each character in the string
- to: its position
Since there may be repeated characters in t, all positions where the same character appears are added to an array in order.
Therefore, HashMap is used to establish the mapping between each character and its position array.
Then iterate over each character in the string s:
- For each traversed character c, we search in the corresponding character array in HashMap.
- Since the position array is ordered, we use
binary searchto speed up the search.
It should be noted here that because the subsequence is ordered, a variable
pre is needed to record the position of the current match in the t string. For the character c in the current s string, even if it exists in the t string, if It cannot be matched before the position pre.
So we can use uppper_bound() to find the first position greater than pre in binary, If it does not exist, return false directly, otherwise, update pre to the result of binary search and continue the loop.