Longest Common Subsequence: LeetCode Guide & Examples
Longest Common Subsequence: LeetCode Guide & Examples
Hey guys! Ever stumbled upon the Longest Common Subsequence (LCS) problem on LeetCode and felt a bit lost? Don’t worry, you’re not alone! This problem can seem tricky at first, but with a clear explanation and some examples, you’ll be solving it like a pro in no time. In this article, we’re going to break down the LCS problem, walk through a dynamic programming approach to solve it, and provide a clear example with code snippets. So, let’s dive in and conquer this LeetCode challenge together!
Table of Contents
Understanding the Longest Common Subsequence Problem
Before we jump into the solution, let’s make sure we understand the problem clearly. The Longest Common Subsequence (LCS) problem is a classic computer science challenge that involves finding the longest sequence of characters that is common to two given strings. It’s important to note that a subsequence is different from a substring. A subsequence doesn’t require the characters to be contiguous in the original string, while a substring does. For example, if we have the strings “ABCDGH” and “AEDFHR”, the longest common subsequence is “ADH”. Notice how the characters ‘A’, ’D’, and ‘H’ appear in both strings in the same order, but they are not necessarily next to each other.
The Longest Common Subsequence (LCS) problem frequently appears in coding interviews and is a cornerstone for understanding dynamic programming techniques. To truly grasp this problem, it’s essential to distinguish between a subsequence and a substring. A subsequence is a sequence of characters derived from another string by deleting some or no characters without changing the order of the remaining characters. Think of it like picking letters out of a word while keeping them in the same order. On the other hand, a substring is a contiguous sequence of characters within a string. For instance, in the string “programming,” “program” is a substring, but “proram” is only a subsequence, as the ‘g’ is missing. This distinction is crucial because the LCS problem deals with subsequences, not substrings. Another key aspect to understand is the concept of “common.” When we talk about a common subsequence, we mean a sequence that exists in both of the given strings. The goal is to find the longest such sequence. For example, consider the strings “ABCDGH” and “AEDFHR” again. The common subsequences include “A”, “AD”, “AH”, and “ADH”. Among these, “ADH” is the longest, making it the LCS. Understanding these core concepts sets the stage for tackling the problem effectively. We need to find a method that not only identifies common elements but also ensures we capture the longest possible sequence. This is where dynamic programming shines, offering an efficient way to explore all possible subsequences and determine the longest one.
Dynamic Programming Approach to Solve LCS
The most efficient way to solve the
Longest Common Subsequence (LCS)
problem is by using dynamic programming. Dynamic programming is a powerful technique that breaks down a problem into smaller overlapping subproblems, solves each subproblem only once, and stores the solutions to avoid redundant computations. This approach is perfect for the LCS problem because we can build up the solution incrementally, considering smaller prefixes of the strings. So, how does dynamic programming work for the LCS problem? We create a 2D table (or matrix) to store the lengths of the longest common subsequences for different prefixes of the two strings. Let’s say we have two strings,
text1
and
text2
. We’ll create a table
dp
where
dp[i][j]
represents the length of the LCS of
text1[0...i-1]
and
text2[0...j-1]
. The dimensions of the table will be
(text1.length() + 1) x (text2.length() + 1)
. We add 1 to the lengths to accommodate the base case of empty prefixes. The table is initialized with zeros because the LCS of an empty string and any other string is zero. Now, we fill in the table using the following rules:
-
If
text1[i-1]is equal totext2[j-1], it means we’ve found a common character. So, we extend the LCS found so far by 1. In this case,dp[i][j] = dp[i-1][j-1] + 1. We look at the LCS of the prefixes without these characters and add 1 to include the current common character. -
If
text1[i-1]is not equal totext2[j-1], it means the current characters don’t match. So, the LCS will be the maximum of the LCS found by either excluding the character fromtext1or excluding the character fromtext2. In this case,dp[i][j] = Math.max(dp[i-1][j], dp[i][j-1]). We take the maximum because we want the longest possible subsequence.
By filling the table in this way, we ensure that each entry
dp[i][j]
contains the length of the LCS for the prefixes
text1[0...i-1]
and
text2[0...j-1]
. The final answer, the length of the LCS of the entire strings, will be stored in
dp[text1.length()][text2.length()]
. This dynamic programming approach efficiently explores all possible subsequences, ensuring we find the longest one without redundant calculations. The table acts as a memory, storing intermediate results and allowing us to build up the solution step by step. The beauty of this method lies in its ability to transform a seemingly complex problem into a series of simpler, interconnected subproblems, making it a powerful tool for solving sequence-related challenges.
Example with Code Snippets
Let’s walk through an example to make this even clearer. Suppose we have two strings:
text1 = "ABCDGH"
text2 = "AEDFHR"
We’ll create a
dp
table with dimensions 7x7 (6 characters + 1 for the empty prefix). Here’s how the table will be filled:
- Initialize the table with zeros.
- Iterate through the table, filling it according to the rules we discussed.
Here’s a snippet of Java code that implements the dynamic programming solution:
class Solution {
public int longestCommonSubsequence(String text1, String text2) {
int n = text1.length();
int m = text2.length();
int[][] dp = new int[n + 1][m + 1];
for (int i = 1; i <= n; i++) {
for (int j = 1; j <= m; j++) {
if (text1.charAt(i - 1) == text2.charAt(j - 1)) {
dp[i][j] = dp[i - 1][j - 1] + 1;
} else {
dp[i][j] = Math.max(dp[i - 1][j], dp[i][j - 1]);
}
}
}
return dp[n][m];
}
}
This Java code perfectly embodies the dynamic programming approach we’ve discussed. Let’s break it down step by step to ensure we fully understand how it works. First, we define a class
Solution
with a method
longestCommonSubsequence
that takes two strings,
text1
and
text2
, as input. The goal is to find the length of their longest common subsequence. We start by getting the lengths of the input strings, storing them in the variables
n
and
m
respectively. These lengths will be used to determine the dimensions of our dynamic programming table. Next, we initialize the 2D integer array
dp
with dimensions
(n + 1) x (m + 1)
. As we discussed earlier, the extra row and column are to accommodate the base case of empty prefixes. Each cell
dp[i][j]
will store the length of the LCS of
text1.substring(0, i)
and
text2.substring(0, j)
. The nested
for
loops are where the magic happens. We iterate through the
dp
table, starting from index 1 to
n
for rows and 1 to
m
for columns. Inside the loops, we check if the characters
text1.charAt(i - 1)
and
text2.charAt(j - 1)
are equal. If they are, it means we’ve found a common character, and we can extend the LCS we’ve found so far. We update
dp[i][j]
by adding 1 to the value in the diagonally previous cell
dp[i - 1][j - 1]
. This is because
dp[i - 1][j - 1]
represents the LCS of the prefixes without these characters, and we’re now adding the common character to it. If the characters are not equal, we need to consider two possibilities: either the current character in
text1
is not part of the LCS, or the current character in
text2
is not part of the LCS. We take the maximum of the LCS lengths we get by excluding either character, which is represented by
Math.max(dp[i - 1][j], dp[i][j - 1])
. This ensures we choose the longest possible subsequence. Finally, after filling the entire
dp
table, the length of the LCS of the entire strings
text1
and
text2
is stored in
dp[n][m]
. We return this value as the result. The code efficiently computes the LCS length by systematically building up solutions to subproblems and storing them in the
dp
table. This avoids redundant computations and ensures we find the optimal solution. The use of a 2D array to represent the dynamic programming table is a classic technique for sequence alignment problems like this one, and understanding how this code works is a valuable skill for any programmer.
In this example, the
dp
table would eventually have a value of 3 in
dp[6][6]
, indicating that the
Longest Common Subsequence
is of length 3 (