240. Shortest Common Supersequence
๐ LeetCode Problem: 1092. Shortest Common Supersequence
๐ Difficulty: Hard
๐ท๏ธ Topics: Dynamic Programming, String, DP - Strings
Problem Statement
Given two strings str1 and str2, return the shortest string that has both str1 and str2 as subsequences. If there are multiple valid strings, return any of them.
A string s is a subsequence of string t if deleting some number of characters from t (possibly 0) results in the string s.
Example 1:
Input: str1 = "abac", str2 = "cab"
Output: "cabac"
Explanation:
str1 = "abac" is a subsequence of "cabac" because we can delete the first "c".
str2 = "cab" is a subsequence of "cabac" because we can delete the last "ac".
The answer provided is the shortest such string that satisfies these properties.
Example 2:
Input: str1 = "aaaaaaaa", str2 = "aaaaaaaa"
Output: "aaaaaaaa"
Constraints:
- 1 <= str1.length, str2.length <= 1000
- str1 and str2 consist of lowercase English letters.
๐ง Let's Build Understanding from Absolute Scratch
What Does This Problem REALLY Ask?
Forget "supersequence" for a moment. Let me understand what we're doing:
Given: str1 = "ab", str2 = "bc"
Goal: Find a string that CONTAINS both as subsequences
Let me try:
"abc" โ Can I find "ab"? a[b]c โ Yes! โ
Can I find "bc"? a[bc] โ Yes! โ
So "abc" works! โ
But wait, let me try another:
"abbc" โ Can I find "ab"? [ab]bc โ Yes! โ
Can I find "bc"? ab[bc] โ Yes! โ
So "abbc" also works! โ
Even "abbcc" works! And "aabbcc"! Many solutions!
INSIGHT: There are MANY valid supersequences!
But we want the SHORTEST one! ๐
For "ab" and "bc":
- "abc" has length 3 โ (shortest!)
- "abbc" has length 4
- "abbcc" has length 5
The shortest is "abc"!
KEY INSIGHT: We're finding a string that's a "container" for both!
The Hidden Pattern - What Are We REALLY Finding?
Let me think about this differently:
Naive approach: Just concatenate!
str1 = "ab"
str2 = "bc"
Concatenate: "ab" + "bc" = "abbc" (length 4)
But we found "abc" (length 3) works!
What happened?
str1 = "ab" โ a b
str2 = "bc" โ b c
They both have 'b'!
In "abc", we only included 'b' ONCE!
a b c
โ โ โ
| shared
from both strings
EUREKA MOMENT:
We can SHARE common characters!
Instead of: a b + b c = a b b c (4 chars)
We did: a b c (3 chars, sharing 'b')
But which characters can we share? ๐ค
The characters that appear in BOTH strings IN ORDER!
This is... LONGEST COMMON SUBSEQUENCE! ๐ฏ
๐ค Why Longest Common Subsequence? Let Me Prove It!
Discovering the Formula
Let me analyze with examples:
Example 1: str1 = "abac", str2 = "cab"
Length of str1: 4
Length of str2: 3
What's common in order?
str1 = "abac" โ [ab]ac
str2 = "cab" โ c[ab]
LCS = "ab" (length 2)
What's the shortest supersequence?
Result: "cabac" (length 5)
Notice: 4 + 3 - 2 = 5! ๐ฏ
Example 2: str1 = "abc", str2 = "def"
No common characters!
LCS = "" (length 0)
Supersequence: "abcdef" (length 6)
Check: 3 + 3 - 0 = 6! โ
Example 3: str1 = "abc", str2 = "abc"
Identical strings!
LCS = "abc" (length 3)
Supersequence: "abc" (length 3)
Check: 3 + 3 - 3 = 3! โ
FORMULA DISCOVERED:
Shortest supersequence length = len(str1) + len(str2) - len(LCS)
But WHY does this work mathematically?
Mathematical Proof - Why The Formula Is Correct
THEOREM:
|Shortest Supersequence| = |str1| + |str2| - |LCS|
PROOF (Part 1 - We CAN achieve this length):
Construction Proof:
Let LCS have length k
Let str1 have length m
Let str2 have length n
Strategy to build supersequence:
1. Include ALL characters from str1: m characters
2. Include characters from str2 NOT in LCS: (n - k) characters
3. Total: m + (n - k) = m + n - k โ
Why does this contain str1?
โ We included every character from str1! โ
Why does this contain str2?
โ LCS characters come from str1 (in correct order)
โ Non-LCS characters we added from str2
โ Together they form complete str2! โ
Therefore: We CAN build supersequence with m + n - k characters! โ
PROOF (Part 2 - We CANNOT do better):
Proof by Contradiction:
Suppose there exists supersequence S with |S| < m + n - k
S must contain all m characters from str1 (as subsequence)
S must contain all n characters from str2 (as subsequence)
Some characters in S serve BOTH strings (shared)
Maximum sharing = LCS (by definition of "longest")
Minimum characters needed:
= characters needed for str1
+ characters needed for str2
- characters that can be shared
= m + n - k
So |S| โฅ m + n - k
Contradiction with assumption |S| < m + n - k! โ
Therefore: We CANNOT do better than m + n - k! โ
QED: The formula m + n - LCS is OPTIMAL! โ
The Intuition Behind This
Think of it like set theory:
Supersequence = Union of both strings (in sequence form)
Naive union: Just add everything
|str1| + |str2| = m + n
Smart union: Don't duplicate the common part
|str1| + |str2| - |common part|
= m + n - LCS
This is exactly like the set formula:
|A โช B| = |A| + |B| - |A โฉ B|
But for sequences!
|Supersequence| = |str1| + |str2| - |LCS|
Beautiful mathematical parallel! ๐ฏ
๐ก Building the Solution Step by Step
Step 1: Reframe the Problem
ORIGINAL PROBLEM:
"Find shortest string containing both str1 and str2 as subsequences"
REFRAMED PROBLEM:
"Find longest common subsequence (LCS)"
"Use formula: length = len(str1) + len(str2) - LCS"
"Build the actual string using LCS structure"
Why this helps?
LCS is a CLASSIC DP problem!
We know how to solve it! โ
But wait... the problem asks for the STRING, not just length!
So we need to CONSTRUCT it!
This requires more than just LCS length!
We need to understand the LCS STRUCTURE! ๐
Step 2: Understanding LCS Structure
When we compute LCS, we make DECISIONS:
- If characters match โ include in LCS
- If they don't match โ skip one
Example: str1 = "ab", str2 = "bc"
Decisions made:
Position (0,0): 'a' vs 'b' โ different, skip one
Position (1,0): 'b' vs 'b' โ match! include
Position (1,1): 'b' vs 'c' โ different, skip one
These decisions tell us:
- Which characters are SHARED (in LCS)
- Which characters are UNIQUE to each string
For supersequence:
- SHARED characters: include ONCE
- UNIQUE characters: must include separately
The DP table RECORDS these decisions! โ
Step 3: How Do We Find LCS?
This is where DP comes in!
Define: LCS(i, j) = LCS of str1[i...] and str2[j...]
Base cases:
If i == len(str1): return 0 (no characters left)
If j == len(str2): return 0 (no characters left)
Recursive cases:
Case 1: str1[i] == str2[j]
Characters match!
They're BOTH in LCS!
LCS(i, j) = 1 + LCS(i+1, j+1)
Why? Include this character, solve rest!
Case 2: str1[i] != str2[j]
Characters don't match!
At least ONE must be skipped!
Option A: Skip str1[i]
LCS(i+1, j)
Option B: Skip str2[j]
LCS(i, j+1)
LCS(i, j) = max(Option A, Option B)
Why max? We want the LONGEST!
This is the foundation! โ
๐ด Approach 1: Pure Recursion - Understanding the Pattern
The Recursive Insight for Building
I know LCS tells me which characters to share.
But how do I actually BUILD the supersequence?
Let me think recursively:
Compare str1[i] and str2[j]:
Case 1: str1[i] == str2[j]
Same character!
This is SHARED in both!
Include ONCE in supersequence!
Result: str1[i] + buildSupersequence(i+1, j+1)
Why? Include the shared character, then solve rest!
Case 2: str1[i] != str2[j]
Different characters!
Both must be in supersequence!
But which first?
I don't know! Try BOTH:
Option A: str1[i] first
str1[i] + buildSupersequence(i+1, j)
Option B: str2[j] first
str2[j] + buildSupersequence(i, j+1)
Return whichever is SHORTER!
Base cases:
If i == len(str1): return str2[j...] (add remaining)
If j == len(str2): return str1[i...] (add remaining)
This is a complete recursive definition! โ
Visualizing the Recursion Tree
Example: str1 = "ab", str2 = "bc"
buildSCS(0,0)
str1[0]='a', str2[0]='b'
Different! Try both โ
'a' + buildSCS(1,0) 'b' + buildSCS(0,1)
/ \
str1[1]='b' str1[0]='a'
str2[0]='b' str2[1]='c'
Match! โ Different!
'b' + buildSCS(2,1) 'a' + buildSCS(1,1) 'c' + buildSCS(0,2)
buildSCS(2,1): buildSCS(1,1): buildSCS(0,2):
i=2 (end of str1) str1[1]='b' j=2 (end of str2)
return str2[1..] = "c" str2[1]='c' return str1[0..] = "ab"
Result: "abc" Different! Result: "cab"
Try both...
Results: "abbc" or "abc"
Return: "abc"
Left path: "a" + "b" + "c" = "abc" (length 3) โ
Right path: "b" + "a" + "c" = "bac" or similar...
Both find length 3! Return shorter/first!
OBSERVATION: Same (i,j) computed MULTIPLE times!
Overlapping subproblems! ๐
Implementation
class Solution {
public String shortestCommonSupersequence(String str1, String str2) {
return buildSCS(str1, str2, 0, 0);
}
private String buildSCS(String str1, String str2, int i, int j) {
// Base case: reached end of str1
if (i == str1.length()) {
return str2.substring(j);
}
// Base case: reached end of str2
if (j == str2.length()) {
return str1.substring(i);
}
// Case 1: Characters match - include ONCE
if (str1.charAt(i) == str2.charAt(j)) {
return str1.charAt(i) + buildSCS(str1, str2, i + 1, j + 1);
}
// Case 2: Characters don't match - try BOTH
String option1 = str1.charAt(i) + buildSCS(str1, str2, i + 1, j);
String option2 = str2.charAt(j) + buildSCS(str1, str2, i, j + 1);
// Return shorter one
return option1.length() < option2.length() ? option1 : option2;
}
}
Why This Is Too Slow
TIME COMPLEXITY ANALYSIS:
At each non-matching position:
We make 2 recursive calls
Recursion tree grows exponentially!
(0,0)
/ \
(1,0) (0,1)
/ \ / \
(2,0)(1,1)(1,1)(0,2)
โ โ
SAME STATE called TWICE!
Number of unique states: m ร n
But each state called MANY times!
Worst case: O(2^(m+n)) โ
For m = n = 100:
2^200 operations! โ
SPACE COMPLEXITY:
Call stack depth: O(m + n)
Plus string concatenation overhead!
OBSERVATION:
Same (i,j) computed multiple times!
Overlapping subproblems!
This screams MEMOIZATION! ๐
But wait... can we memoize STRINGS?
๐ก Approach 2: Memoization - The Challenge
The Problem with Memoizing Strings
Normal memoization pattern:
memo[i][j] = result for state (i,j)
For this problem:
memo[i][j] = supersequence string for str1[i..] and str2[j..]
But strings are LARGE objects!
Example: str1 = "abc...xyz" (1000 chars)
str2 = "pqr...tuv" (1000 chars)
Number of states: 1000 ร 1000 = 1,000,000
Each state stores a string of length ~2000
Memory needed: 1,000,000 ร 2000 = 2 billion characters! โ
String operations (concat, copy, compare) are expensive!
This is INEFFICIENT! ๐ค
So how do we optimize?
The Key Insight: Separate Structure from Construction!
REALIZATION:
What if I separate the problem into TWO phases?
PHASE 1: Find the STRUCTURE (LCS)
Compute: LCS length (just a number!)
Store: DP table with INTEGER values
Memory: m ร n integers (efficient!)
PHASE 2: BUILD the string using the structure
Use: The DP table decisions
Method: Traceback!
Time: Single pass O(m + n)
This is MUCH better! โ
Why does this work?
The DP table tells us:
- Which characters match (part of LCS)
- Which characters don't match (unique to each)
By following the decisions in the table,
we can reconstruct the optimal supersequence! ๐ฏ
Let me understand both phases deeply!
๐ข Approach 3: Bottom-Up DP - The Complete Journey
Phase 1: Build LCS DP Table - Understanding Each Cell
Let me build the LCS table for: str1 = "ab", str2 = "bc"
I'll understand not just VALUES but also MEANING!
Initial state:
"" b c
"" 0 0 0
a 0 ? ?
b 0 ? ?
Definition: dp[i][j] = LCS length of first i chars of str1, first j chars of str2
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
Cell dp[1][1]: Comparing "a" with "b"
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
str1[0]='a', str2[0]='b' โ Different! โ
Question: What's LCS of "a" and "b"?
No common characters!
LCS = 0
How do I compute from previous cells?
Option A: LCS of "" and "b" = dp[0][1] = 0
(Skip 'a' from str1, keep 'b')
Option B: LCS of "a" and "" = dp[1][0] = 0
(Keep 'a' from str1, skip 'b')
Take max: dp[1][1] = max(0, 0) = 0
INSIGHT:
When chars don't match, LCS comes from:
- Either skipping current char of str1, OR
- Skipping current char of str2
We take the BETTER option! โ
Decision recorded: We could skip either (both give 0)
"" b c
"" 0 0 0
a 0 0 ?
b 0 ? ?
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
Cell dp[1][2]: Comparing "a" with "bc"
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
str1[0]='a', str2[1]='c' โ Different! โ
LCS of "a" and "bc"?
Still no common characters!
LCS = 0
Options:
Option A: LCS of "" and "bc" = dp[0][2] = 0
Option B: LCS of "a" and "b" = dp[1][1] = 0
dp[1][2] = max(0, 0) = 0
"" b c
"" 0 0 0
a 0 0 0
b 0 ? ?
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
Cell dp[2][1]: Comparing "ab" with "b"
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
str1[1]='b', str2[0]='b' โ MATCH! โโโ
This is IMPORTANT!
LCS of "ab" and "b"?
'b' is common! LCS = "b" (length 1)
How do I compute?
Since they MATCH, both can be in LCS!
LCS = 1 + LCS of strings WITHOUT these matching chars
dp[2][1] = 1 + dp[1][0] = 1 + 0 = 1
INSIGHT:
When chars MATCH:
- They're BOTH in the LCS!
- Add 1 to the LCS of remaining strings
- Look at DIAGONAL cell (i-1, j-1)! โ
Decision recorded: Include 'b' in LCS!
"" b c
"" 0 0 0
a 0 0 0
b 0 1 ?
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
Cell dp[2][2]: Comparing "ab" with "bc"
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
str1[1]='b', str2[1]='c' โ Different! โ
LCS of "ab" and "bc"?
'b' is common! LCS = "b" (length 1)
Options:
Option A: LCS of "a" and "bc" = dp[1][2] = 0
(Skip 'b' from str1)
Option B: LCS of "ab" and "b" = dp[2][1] = 1
(Skip 'c' from str2)
dp[2][2] = max(0, 1) = 1
Which option is better? Option B!
INTERPRETATION:
dp[2][1] = 1 is better than dp[1][2] = 0
So we DON'T skip 'b' from str1
We DO skip 'c' from str2
Decision recorded: Skip str2[1]='c'!
"" b c
"" 0 0 0
a 0 0 0
b 0 1 1
LCS length = dp[2][2] = 1 โ
LCS string = "b"
Understanding "Came From" - The Most Important Concept!
This is CRITICAL for building the supersequence!
When we computed dp[2][2] = 1:
We had two options:
dp[1][2] = 0 (from top)
dp[2][1] = 1 (from left)
We chose dp[2][1] because 1 > 0 (better LCS)
This choice means:
"The LCS of 'ab' and 'bc' CAME FROM 'ab' and 'b'"
"We DIDN'T need the 'c' from str2"
"So 'c' is UNIQUE to str2, not shared!"
Geometrically:
dp[2][2] โ dp[2][1]
We moved LEFT (from column 1 to column 2)
This is what "came from left" means! ๐
Let me visualize ALL cells and their sources:
"" b c
"" 0 0 0
a 0 0โ 0โ
โ โ
b 0 1โ 1
โ
Arrows show WHERE each value came from:
dp[1][1] = 0: Came from top or left (both 0)
Arrow: โ (let's say top)
dp[1][2] = 0: Came from left
Arrow: โ (left)
dp[2][1] = 1: Came from diagonal (match!)
Arrow: โ (diagonal)
dp[2][2] = 1: Came from left (1 > 0)
Arrow: โ (left)
These arrows show the PATH of optimal decisions! โ
WHY are these arrows important?
Because they tell us WHICH CHARACTERS are:
- SHARED (diagonal arrows - in LCS)
- UNIQUE to str1 (came from top - skipped from str1)
- UNIQUE to str2 (came from left - skipped from str2)
This is the KEY to building the supersequence! ๐ฏ
The Three Arrow Patterns - Complete Visual Understanding
When filling the DP table, THREE patterns emerge:
PATTERN 1: Diagonal Arrow โ (Characters MATCH)
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
if (str1[i-1] == str2[j-1]):
dp[i][j] = 1 + dp[i-1][j-1]
Visual:
j-1 j
i-1 X
โ
i [i,j]
What it means:
- These characters are THE SAME!
- They're part of LCS (shared)
- Value comes from diagonal
Example: str1[1]='b', str2[0]='b'
Both are 'b' โ Match!
dp[2][1] = 1 + dp[1][0]
Arrow: โ from (1,0) to (2,1)
PATTERN 2: Vertical Arrow โ (Came from TOP)
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
if (dp[i-1][j] > dp[i][j-1]):
dp[i][j] = dp[i-1][j]
Visual:
j-1 j
i-1 [X]
โ
i [i,j]
What it means:
- Better LCS came from ABOVE
- str1[i-1] was SKIPPED in LCS
- str1[i-1] is UNIQUE to str1!
Example: If dp[1][2]=2 > dp[2][1]=1
Value comes from top
str1[1] was skipped
Arrow: โ from (1,2) to (2,2)
PATTERN 3: Horizontal Arrow โ (Came from LEFT)
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
if (dp[i][j-1] >= dp[i-1][j]):
dp[i][j] = dp[i][j-1]
Visual:
j-1 j
i-1
i [X] โ [i,j]
What it means:
- Better LCS came from LEFT
- str2[j-1] was SKIPPED in LCS
- str2[j-1] is UNIQUE to str2!
Example: At dp[2][2], dp[2][1]=1 >= dp[1][2]=0
Value comes from left
str2[1]='c' was skipped
Arrow: โ from (2,1) to (2,2)
SUMMARY:
Diagonal โ: Match (shared in LCS)
Vertical โ: Skip from str1 (unique to str1)
Horizontal โ: Skip from str2 (unique to str2)
These patterns are EVERYTHING for traceback! ๐
Phase 2: Traceback - Following Arrows Backwards!
Now I have the DP table with its "arrows" (decisions).
To build the supersequence, I FOLLOW THE ARROWS BACKWARDS!
Why backwards?
- DP table is built FORWARD (small to large)
- Final answer is at dp[m][n] (bottom-right)
- We trace back to dp[0][0] (top-left)
- This is END โ START (backwards!)
Start at dp[m][n] (full strings)
End at dp[0][0] (empty strings)
Let me trace for str1 = "ab", str2 = "bc":
Table with arrows:
"" b c
"" 0 0 0
a 0 0โ 0โ
โ โ
b 0 1โ 1
โ
Current position: (2,2), result = ""
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
Step 1: At dp[2][2] = 1
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
Looking at current characters:
str1[1]='b', str2[1]='c' โ Different
Checking where value came from:
dp[1][2]=0 (from top?)
dp[2][1]=1 (from left?)
1 > 0, so came from LEFT! โ
What does "came from left" mean?
We moved from column 1 to column 2
We added column 2 (str2[1]='c')
But 'c' was SKIPPED in LCS (not shared)!
So 'c' is UNIQUE to str2!
Action: Add 'c' to supersequence
result = "c"
reason = "c is unique to str2, must include it"
Move: LEFT to dp[2][1]
i stays 2, j becomes 1
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
Step 2: At dp[2][1] = 1
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
Looking at current characters:
str1[1]='b', str2[0]='b' โ MATCH! โ
When characters match:
They're in LCS (shared)!
Value came from diagonal
This 'b' is SHARED between both strings!
Action: Add 'b' to supersequence (ONCE, not twice!)
result = "cb"
reason = "b is shared in LCS, include once"
Move: DIAGONAL to dp[1][0]
i becomes 1, j becomes 0
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
Step 3: At dp[1][0] = 0
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
j=0 means we've finished with str2!
But i=1, so str1 still has characters left
Remaining: str1[0]='a'
Action: Add all remaining from str1
result = "cba"
reason = "str2 done, add rest of str1"
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
Step 4: Reverse the result!
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
We built from END to START!
result = "cba"
Reverse: "abc" โ
Verify:
str1 "ab" in "abc"? [ab]c โ
str2 "bc" in "abc"? a[bc] โ
Length: 2 + 2 - 1 = 3 โ
Perfect! โ
Why Do We Reverse? - Complete Understanding
QUESTION: Why do we build backwards then reverse?
ANSWER: Because of how DP works!
DP Table Construction:
Start: dp[0][0] (empty strings)
End: dp[m][n] (full strings)
Direction: START โ END (forward)
DP Table Values:
dp[0][0]: LCS of "" and "" = 0
dp[m][n]: LCS of full str1 and full str2 = final answer
Traceback:
Start: dp[m][n] (where the answer is!)
End: dp[0][0] (base case)
Direction: END โ START (backward)
Characters Processed:
First processed: Last characters (at m,n)
Last processed: First characters (at 0,0)
Order: END โ START
But Supersequence Needs:
Order: START โ END
Solution: REVERSE!
Example visualization:
Processing order: [last char] ... [middle] ... [first char]
Built string: "...tsal ...elddim ...tsrif"
Reversed: "first... middle... last..."
This is natural for traceback! โ
Alternative: Build forward?
Possible but more complex
Would need recursion or different structure
Traceback + reverse is SIMPLER! โ
Complete Implementation with Detailed Comments
class Solution {
public String shortestCommonSupersequence(String str1, String str2) {
int m = str1.length();
int n = str2.length();
// PHASE 1: Build LCS DP table
// dp[i][j] = LCS length of first i chars of str1, first j chars of str2
int[][] dp = new int[m + 1][n + 1];
// Fill table with forward loops
for (int i = 1; i <= m; i++) {
for (int j = 1; j <= n; j++) {
if (str1.charAt(i - 1) == str2.charAt(j - 1)) {
// PATTERN 1: Characters match
// Both in LCS - came from diagonal
dp[i][j] = 1 + dp[i - 1][j - 1];
} else {
// PATTERN 2 or 3: Characters don't match
// Take better option
dp[i][j] = Math.max(dp[i - 1][j], // came from top
dp[i][j - 1]); // came from left
}
}
}
// LCS length = dp[m][n]
// Expected supersequence length = m + n - dp[m][n]
// PHASE 2: Traceback to build supersequence
StringBuilder result = new StringBuilder();
int i = m, j = n;
while (i > 0 && j > 0) {
if (str1.charAt(i - 1) == str2.charAt(j - 1)) {
// PATTERN 1: Characters match (diagonal arrow)
// This character is SHARED (in LCS)
// Include ONCE in supersequence
result.append(str1.charAt(i - 1));
i--; // Move diagonal
j--;
} else if (dp[i - 1][j] > dp[i][j - 1]) {
// PATTERN 2: Came from TOP (vertical arrow)
// str1[i-1] was SKIPPED in LCS
// str1[i-1] is UNIQUE to str1
// Must include it in supersequence
result.append(str1.charAt(i - 1));
i--; // Move up
} else {
// PATTERN 3: Came from LEFT (horizontal arrow)
// str2[j-1] was SKIPPED in LCS
// str2[j-1] is UNIQUE to str2
// Must include it in supersequence
result.append(str2.charAt(j - 1));
j--; // Move left
}
}
// PHASE 3: Add remaining characters
// If one string has more characters, add them all
while (i > 0) {
result.append(str1.charAt(i - 1));
i--;
}
while (j > 0) {
result.append(str2.charAt(j - 1));
j--;
}
// PHASE 4: Reverse (built backwards)
return result.reverse().toString();
}
}
Why This Algorithm Is Correct - The Proof
THEOREM: The traceback algorithm produces the shortest supersequence
PROOF (by loop invariant):
Invariant:
At position (i, j) during traceback,
result contains all characters from str1[i..m-1] and str2[j..n-1],
merged optimally using LCS structure.
Base case: (m, n)
result = "" (empty)
Must contain str1[m..m-1] and str2[n..n-1] (both empty ranges)
Invariant holds! โ
Inductive step:
Assume invariant holds at (i, j)
Show it holds after one step
Case 1: str1[i-1] == str2[j-1]
These chars match (from DP table construction)
They're SHARED in LCS
Including once serves BOTH strings
After step:
result += str1[i-1]
Move to (i-1, j-1)
New result contains:
- This character (serves both)
- All from str1[i..m-1] (already had)
- All from str2[j..n-1] (already had)
= All from str1[i-1..m-1] and str2[j-1..n-1] โ
Case 2: dp[i-1][j] > dp[i][j-1]
str1[i-1] was SKIPPED in LCS (from DP construction)
It's UNIQUE to str1
Must include separately
After step:
result += str1[i-1]
Move to (i-1, j)
New result contains:
- str1[i-1] (added)
- All from str1[i..m-1] (already had)
- All from str2[j..n-1] (already had)
= All from str1[i-1..m-1] and str2[j..n-1] โ
Case 3: dp[i][j-1] >= dp[i-1][j]
str2[j-1] was SKIPPED in LCS
It's UNIQUE to str2
Must include separately
(Similar proof to Case 2) โ
By induction, invariant holds throughout!
At (0, 0):
result contains all from str1[0..m-1] and str2[0..n-1]
Merged optimally (using LCS structure)
Length = m + n - LCS (by construction)
This is the shortest supersequence! QED โ
๐ Detailed Example - Complete Walkthrough
Example: str1 = "abac", str2 = "cab"
PHASE 1: Build LCS DP Table
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
m = 4 (abac), n = 3 (cab)
Final table:
"" c a b
"" 0 0 0 0
a 0 0 1 1
b 0 0 1 2
a 0 0 1 2
c 0 1 1 2
Key cells explained:
dp[1][2]: str1[0]='a', str2[1]='a' โ Match!
dp[1][2] = 1 + dp[0][1] = 1
Arrow: โ (diagonal)
dp[2][3]: str1[1]='b', str2[2]='b' โ Match!
dp[2][3] = 1 + dp[1][2] = 2
Arrow: โ (diagonal)
dp[4][1]: str1[3]='c', str2[0]='c' โ Match!
dp[4][1] = 1 + dp[3][0] = 1
Arrow: โ (diagonal)
dp[4][3]: str1[3]='c', str2[2]='b' โ Different
dp[3][3]=2 vs dp[4][2]=1
2 > 1 โ Came from TOP!
Arrow: โ (up)
LCS length = 2
LCS = "ab"
Expected supersequence length = 4 + 3 - 2 = 5
PHASE 2: Traceback
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
Start: (4,3), result = ""
Step 1: At (4,3)
str1[3]='c', str2[2]='b' โ Different
dp[3][3]=2, dp[4][2]=1
2 > 1 โ Came from TOP!
Action: Add str1[3]='c' (unique to str1)
result = "c"
Move: (3,3) UP
Step 2: At (3,3)
str1[2]='a', str2[2]='b' โ Different
dp[2][3]=2, dp[3][2]=1
2 > 1 โ Came from TOP!
Action: Add str1[2]='a' (unique to str1)
result = "ca"
Move: (2,3) UP
Step 3: At (2,3)
str1[1]='b', str2[2]='b' โ MATCH! โ
Action: Add 'b' ONCE (shared in LCS)
result = "cab"
Move: (1,2) DIAGONAL
Step 4: At (1,2)
str1[0]='a', str2[1]='a' โ MATCH! โ
Action: Add 'a' ONCE (shared in LCS)
result = "caba"
Move: (0,1) DIAGONAL
Step 5: At (0,1)
i=0 โ str1 done
Remaining: str2[0]='c'
Action: Add 'c'
result = "cabac"
Done!
Built: "cabac" (already correct, no reverse needed!)
Verify:
str1 "abac" โ "cabac"? c[abac] โ
str2 "cab" โ "cabac"? [cab]ac โ
Length: 5 = 4 + 3 - 2 โ
Perfect! โ
๐ Complexity Analysis
All Approaches Compared
โโโโโโโโโโโโโโโโฌโโโโโโโโโโโโโโฌโโโโโโโโโโโโโโโฌโโโโโโโโโโโโโโโ
โ Approach โ Time โ Space โ Practical? โ
โโโโโโโโโโโโโโโโผโโโโโโโโโโโโโโผโโโโโโโโโโโโโโโผโโโโโโโโโโโโโโโค
โ Recursion โ O(2^(m+n)) โ O(m+n) โ NO โ โ
โ (Approach 1) โ Exponential โ Stack โ Too slow โ
โ โ โ + strings โ โ
โโโโโโโโโโโโโโโโผโโโโโโโโโโโโโโผโโโโโโโโโโโโโโโผโโโโโโโโโโโโโโโค
โ Bottom-Up DP โ O(mรn) โ O(mรn) โ YES โ โ
โ (Approach 3) โ Polynomial โ Table โ Optimal โ
โ โ โ + O(m+n) โ โ
โ โ โ result โ โ
โโโโโโโโโโโโโโโโดโโโโโโโโโโโโโโดโโโโโโโโโโโโโโโดโโโโโโโโโโโโโโโ
Note: "Memoization" approach would be same as bottom-up here
because we can't efficiently memoize strings!
Detailed Time Complexity
APPROACH 1 - Recursion:
At each step with non-matching chars:
Two recursive calls
Recurrence: T(m,n) = T(m-1,n) + T(m,n-1) + O(string concat)
Tree height: m + n
Branching factor: 2
Total nodes: O(2^(m+n))
Plus string operations at each node!
Total: O(2^(m+n) ร string operations) โ
APPROACH 3 - Bottom-Up DP:
Phase 1 - Build table:
Nested loops: i from 1 to m, j from 1 to n
Each cell: O(1) work
Total: O(m ร n) โ
Phase 2 - Traceback:
Start at (m,n), move to (0,0)
Each step: move i or j down by 1
Maximum steps: m + n
Total: O(m + n) โ
Phase 3 - Reverse:
String length: at most m + n
Total: O(m + n) โ
Overall: O(m ร n) + O(m + n) + O(m + n) = O(m ร n) โ
Detailed Space Complexity
APPROACH 1 - Recursion:
Call stack depth: O(m + n)
String storage at each level: O(m + n)
Total stack space: O((m + n)^2) โ
APPROACH 3 - Bottom-Up DP:
DP table: (m+1) ร (n+1) integers
Size: O(m ร n) โ
Result string: at most m + n characters
Size: O(m + n) โ
StringBuilder: O(m + n) โ
Total: O(m ร n) โ
SPACE OPTIMIZATION:
Could we use less space?
For LCS computation: YES!
Only need current and previous row
Space: O(min(m, n))
But for traceback: NO!
Need full table to reconstruct path
Trade-off: Can't optimize space without losing ability to build string!
For this problem: O(m ร n) is NECESSARY! โ
๐ฏ Pattern Recognition
Supersequence vs Subsequence - The Duality
MATHEMATICAL RELATIONSHIP:
LCS (Longest Common Subsequence):
= What they SHARE
= Intersection (in sequence form)
= Maximum overlap
SCS (Shortest Common Supersequence):
= What covers BOTH
= Union (in sequence form)
= Minimum container
Set Theory Analogy:
|A โช B| = |A| + |B| - |A โฉ B|
Sequence analog:
|SCS| = |str1| + |str2| - |LCS|
This duality is beautiful! โ
Visual:
str1: [unique to str1] [shared] [unique to str1]
str2: [unique to str2] [shared] [unique to str2]
LCS: [shared only]
SCS: [all parts combined, shared counted once]
The Problem Family
STRING TRANSFORMATION PROBLEMS:
All connected through LCS!
1. Problem 239: Delete Operations
Question: Min deletions to make equal
Formula: deletions = m + n - 2รLCS
Why: Keep LCS, delete rest from both
2. Problem 240: Shortest Supersequence
Question: Shortest string containing both
Formula: length = m + n - LCS
Why: Share LCS, add unique parts
3. Problem 241: LCS itself
Question: Longest common subsequence
The FOUNDATION for others!
4. Edit Distance:
Question: Min operations (insert/delete/replace)
Related: Uses similar DP structure
5. Longest Increasing Subsequence:
Question: LIS length
Related: Similar DP pattern
LCS is a BUILDING BLOCK! ๐
Understanding LCS deeply unlocks:
- String comparison (diff tools)
- DNA sequence alignment
- Version control systems
- Text similarity
- And more!
When to Recognize This Pattern
TRIGGER WORDS that suggest this approach:
รขล" "Shortest string containing both"
รขล" "Merge two sequences"
รขล" "Common subsequence"
รขล" "Minimum additions"
รขล" "Optimal combination"
PROBLEM STRUCTURE:
- Two strings/sequences
- Need to combine/transform
- Order preservation matters
- Optimization required (min/max)
SOLUTION PATTERN:
1. Find LCS (structure)
2. Use formula for length
3. Traceback for construction
This is a MASTER PATTERN! โ
๐ป Complete Working Code
class Solution {
public String shortestCommonSupersequence(String str1, String str2) {
int m = str1.length();
int n = str2.length();
// Build LCS table
int[][] dp = new int[m + 1][n + 1];
for (int i = 1; i <= m; i++) {
for (int j = 1; j <= n; j++) {
if (str1.charAt(i - 1) == str2.charAt(j - 1)) {
dp[i][j] = 1 + dp[i - 1][j - 1];
} else {
dp[i][j] = Math.max(dp[i - 1][j], dp[i][j - 1]);
}
}
}
// Traceback to build supersequence
StringBuilder result = new StringBuilder();
int i = m, j = n;
while (i > 0 && j > 0) {
if (str1.charAt(i - 1) == str2.charAt(j - 1)) {
// Shared - include once
result.append(str1.charAt(i - 1));
i--;
j--;
} else if (dp[i - 1][j] > dp[i][j - 1]) {
// Unique to str1
result.append(str1.charAt(i - 1));
i--;
} else {
// Unique to str2
result.append(str2.charAt(j - 1));
j--;
}
}
// Add remaining
while (i > 0) {
result.append(str1.charAt(i - 1));
i--;
}
while (j > 0) {
result.append(str2.charAt(j - 1));
j--;
}
// Reverse
return result.reverse().toString();
}
public static void main(String[] args) {
Solution s = new Solution();
System.out.println(s.shortestCommonSupersequence("abac", "cab"));
// Expected: "cabac"
System.out.println(s.shortestCommonSupersequence("aaaaaaaa", "aaaaaaaa"));
// Expected: "aaaaaaaa"
}
}
๐ Key Insights Summary
The Learning Journey
CRAWL (Understanding):
รขล" What is supersequence?
รขล" Why LCS helps?
รขล" Mathematical proof
WALK (Building):
รขล" Recursive structure
รขล" DP table construction
รขล" Understanding "came from"
RUN (Mastery):
รขล" Traceback algorithm
รขล" Three arrow patterns
รขล" Complete implementation
Every step builds naturally! ๐ฏ
The Core Understanding
1. WHY LCS?
Shared chars counted once, not twice
Formula: m + n - LCS
2. WHY DP table?
Records decisions about sharing
Tells us structure
3. WHAT are arrows?
Diagonal: Shared (in LCS)
Up: Unique to str1
Left: Unique to str2
4. HOW traceback?
Follow arrows backwards
Build string from decisions
5. WHY reverse?
Built END โ START
Need START โ END
Mental Model
Think of it as MERGING with SHARING:
str1: a b a c
str2: c a b
LCS: "ab" (length 2)
Build supersequence:
- Include ALL from str1: a b a c
- Include UNIQUE from str2: c
- Share LCS chars
Result: c a b a c (length 5)
Visual:
c (unique to str2)
a (shared from LCS)
b (shared from LCS)
a (unique to str1)
c (unique to str1)
This MENTAL MODEL sticks! ๐ง
๐ Quick Revision
๐ฏ Core Formula
|SCS| = |str1| + |str2| - |LCS|
Why? Share LCS, add unique parts
โก Algorithm
1. Build LCS table O(mรn):
Match โ 1 + dp[i-1][j-1]
No match โ max(dp[i-1][j], dp[i][j-1])
2. Traceback from (m,n) to (0,0):
Match โ include once, diagonal
dp[i-1][j] > dp[i][j-1] โ str1 unique, up
dp[i][j-1] >= dp[i-1][j] โ str2 unique, left
3. Reverse result
๐ Three Patterns
โ Diagonal: Shared (LCS)
โ Up: Unique to str1
โ Left: Unique to str2
๐ช Quick Implementation
public class Solution {
public String shortestCommonSupersequence(String s1, String s2) {
// return recursive(s1, s2, 0, 0);
// // This is NOT RECOMMENDED and INEFFICIENT. Will get
// // memory limit though logically correct.
// String[][] memo = new String[s1.length() + 1][s2.length() + 1];
// return topDown(s1, s2, 0, 0, memo);
return bottomUp(s1, s2);
}
private String bottomUp(String s1, String s2) {
int len1 = s1.length();
int len2 = s2.length();
// Step 1: build LCS first.
// No need to go with recursion. Building LCS table is very intuitive.
// If we put on paper (fill table manually), we can code this very easily.
int[][] dp = new int[len1 + 1][len2 + 1];
for (int i = 1; i <= len1; i++) {
for (int j = 1; j <= len2; j++) {
if (s1.charAt(i - 1) == s2.charAt(j - 1)) {
dp[i][j] = 1 + dp[i - 1][j - 1];
} else {
dp[i][j] = Math.max(dp[i][j - 1], dp[i - 1][j]);
}
}
}
// Step 2: traverse LCS table from back
StringBuilder sb = new StringBuilder();
// Step 3: start from last portion of the table.
int i = len1;
int j = len2;
while (i > 0 && j > 0) {
// step 4: if both chars equal, take that char in the result and
// move 1 step back diagonally.
if (s1.charAt(i - 1) == s2.charAt(j - 1)) {
sb.append(s1.charAt(i - 1));
i--;
j--;
// if not equal, move to either left or top based on where we came from.
// Means, from where we came to the current point, we need to traverse back in
// the same path. While traversing back in that path, include the part that we
// are leaving here in the result.
// If we are going back to TOP, include char that is on LEFT.
// If we are going back to LEFT.include char that is on TOP.
} else if (dp[i][j - 1] < dp[i - 1][j]) {
sb.append(s1.charAt(i - 1));
i--;
} else {
sb.append(s2.charAt(j - 1));
j--;
}
}
// remaining chars
while (i > 0) {
sb.append(s1.charAt(i - 1));
i--;
}
while (j > 0) {
sb.append(s2.charAt(j - 1));
j--;
}
return sb.reverse().toString();
}
private String topDown(String s1, String s2, int i, int j, String[][] memo) {
if (i == s1.length() && j == s2.length()) {
return "";
}
if (i == s1.length()) {
return s2.substring(j, s2.length());
}
if (j == s2.length()) {
return s1.substring(i, s1.length());
}
if (memo[i][j] != null) {
return memo[i][j];
}
if (s1.charAt(i) == s2.charAt(j)) {
return s1.charAt(i) + topDown(s1, s2, i + 1, j + 1, memo);
}
String option1 = s1.charAt(i) + topDown(s1, s2, i + 1, j, memo);
String option2 = s2.charAt(j) + topDown(s1, s2, i, j + 1, memo);
return memo[i][j] = option1.length() < option2.length() ? option1 : option2;
}
private String recursive(String s1, String s2, int i, int j) {
// step 3: base case 1 (both strings reached end)
if (i == s1.length() && j == s2.length()) {
return "";
}
// step 4a: base case 1 (only s1 ended)
if (i == s1.length()) {
return s2.substring(j, s2.length());
}
// step 4b: base case 2 (only s2 ended)
if (j == s2.length()) {
return s1.substring(i, s1.length());
}
// step 1: if both chars at i and j match
// include that char in sequence and check for rest of the portions
if (s1.charAt(i) == s2.charAt(j)) {
return s1.charAt(i) + recursive(s1, s2, i + 1, j + 1);
}
// step 2: if chars do not match
// add char from s1 and check rest of the portions
String option1 = s1.charAt(i) + recursive(s1, s2, i + 1, j);
// add char from s2 and check rest of the portions
String option2 = s2.charAt(j) + recursive(s1, s2, i, j + 1);
return option1.length() < option2.length() ? option1 : option2;
}
public static void main(String[] args) {
Solution s = new Solution();
System.out.println(s.shortestCommonSupersequence("abac",
"cab").equals("cabac"));
System.out.println(s.shortestCommonSupersequence("aaaaaaaa",
"aaaaaaaa").equals("aaaaaaaa"));
}
}
๐ Congratulations!
You've mastered through baby steps: - โ CRAWL: Understanding supersequence - โ WALK: Building DP table - โ RUN: Complete traceback mastery - โ All three approaches explained - โ Mathematical proofs provided - โ Pattern connections made - โ Complete intuition built
True comprehensive mastery! ๐โจ