Deletion Distance
Question
The deletion distance of two strings is the minimum number of characters you need to delete in the two strings in order to get the same string.
For instance, the deletion distance between "heat" and "hit" is 3:
By deleting 'e' and 'a' in "heat", and 'i' in "hit", we get the string "ht" in both cases.
Given the strings str1 and str2, write an efficient function deletionDistance that returns the deletion distance between them.
Example
input: str1 = "dog", str2 = "frog"
output: 3
input: str1 = "some", str2 = "some"
output: 0
input: str1 = "some", str2 = "thing"
output: 9
input: str1 = "", str2 = ""
output: 0
Solution
In order to determine the minimum number of delete operations needed, we can use the length of the longest common sequence among the two given strings s1 and s2, say given by lcs. If m and n are the lengths of two strings, then we will have:
We could use Dynamic Programming to get the lcs.
We make use of a 2-D array dp, in which dp[i][j] represents the length of the longest common subsequence among the strings s1 and s2 considering their lengths upto (i-1)th index and (j-1)th index only respectively.
We fill the dp array in row-by-row order.
In order to fill the entry for dp[i][j], we can have two cases:
- If the characters s1[i-1] and s2[j-1] match with each other. Then dp[i][j] = 1 + dp[i-1][j-1].
- If The characters s1[i-1] and s2[j-1] don't match each other. We need to remove the current character from either s1 or s2. So dp[i][j] = max(dp[i-1][j] , dp[i][j-1]).
Code
def lcs_dp(s1, s2, m, n):
dp = [[0 for _ in range(n + 1)] for _ in range(m + 1)]
for i in range(1, m + 1):
for j in range(1, n + 1):
if s1[i - 1] == s2[j - 1]:
dp[i][j] = 1 + dp[i - 1][j - 1]
else:
dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
return dp[m][n]
def solution(s1, s2):
m, n = len(s1), len(s2)
lcs_result = lcs_dp(s1, s2, m, n)
return m + n - 2 * lcs_result
Time& Space Complexity
Time complexity :
O(mn). We need to fill in the dp array of size mn. Here, m and n refer to the lengths of s1 and s2.
Space complexity :
O(mn). dp array of size mn is used.