Dynamic programming (DP) is a group of very useful algorithms to solve searching problems. In many cases, it is easy to realize that a particular problem can be solved in DP, but you may spend a lot of time on finding the iterative equations. Distinct Subsequences is one such problem.
Here is the description from leetcode.
Given a string S and a string T, count the number of distinct subsequences of T in S.
A subsequence of a string is a new string which is formed from the original string by deleting some (can be none) of the characters without disturbing the relative positions of the remaining characters. (ie,
"ACE"
is a subsequence of"ABCDE"
while"AEC"
is not).
Here is an example:
S ="rabbbit"
, T ="rabbit"
Return
3
.
If you are sensitive enough, "two strings" and "subsequence" will guide you to DP immediately. From some previous experience like Edit Distance, you know it is a good idea to construct an m
by n
matrix, where m = T.length()
and n = S.length()
. Each element in the matrix, say matrix[i][j]
, represents the number of distinct subsequences of string T[0:i+1]
and S[0:j+1]
. Now comes the hard part: how to fill in the matrix?
The matrix can be filled row by row, i.e. each time we add a single character in T
and evaluate the number of distinct subsequences for all prefix of S
. If T[i] != S[j]
, since T[0:i+1]
is a subsequence of S[0:j+1]
(if exists), the new character T[i]
should be matched before S[j]
. As a result we have the equation matrix[i][j] = matrix[i][j-1]
.
If T[i] == S[j]
, there are two possible cases: (1) T[i]
is the character that matches to S[j]
, so we should throw away both T[i]
and S[j]
, and the number of subsequences should equal to matrix[i-1][j-1]
; or (2) T[i]
is matched to a character before S[j]
, where we can get the number from matrix[i][j-1]
, as we have illustrated before. The total number of distinct subsequences is the sum of those two cases, so the equation should be matrix[i][j] = matrix[i-1][j-1] + matrix[i][j-1]
.
To summarize, the core logic of this DP problem is shown in the following code block
1 | if (S[j] != T[i]) |
Since we only use two adjacent rows in the matrix at any time, the memory consumption can be cut down to $O(n)$ instead of $O(mn)$.
In conclusion, the difficulty of this common DP problem stands on finding the correct iterative equations. Some patterns are well-known for similar problems, such as the allocation of m
by n
matrix. However you will need to try hard to understand the exact meaning of a cell, and how to connect the meaning among multiple cells.