240. Shortest Common Supersequence

🔗 LeetCode Problem: 1092. Shortest Common Supersequence
📊 Difficulty: Hard
🏷️ Topics: Dynamic Programming, String, DP - Strings

Problem Statement

Given two strings str1 and str2, return the shortest string that has both str1 and str2 as subsequences. If there are multiple valid strings, return any of them.

A string s is a subsequence of string t if deleting some number of characters from t (possibly 0) results in the string s.

Example 1:

Input: str1 = "abac", str2 = "cab"
Output: "cabac"
Explanation: 
str1 = "abac" is a subsequence of "cabac" because we can delete the first "c".
str2 = "cab" is a subsequence of "cabac" because we can delete the last "ac".
The answer provided is the shortest such string that satisfies these properties.

Example 2:

Input: str1 = "aaaaaaaa", str2 = "aaaaaaaa"
Output: "aaaaaaaa"

Constraints: - 1 <= str1.length, str2.length <= 1000 - str1 and str2 consist of lowercase English letters.

🧒 Let's Build Understanding from Absolute Scratch

What Does This Problem REALLY Ask?

Forget "supersequence" for a moment. Let me understand what we're doing:

Given: str1 = "ab", str2 = "bc"

Goal: Find a string that CONTAINS both as subsequences

Let me try:

  "abc" → Can I find "ab"? a[b]c → Yes! ✓
          Can I find "bc"? a[bc] → Yes! ✓

So "abc" works! ✓

But wait, let me try another:

  "abbc" → Can I find "ab"? [ab]bc → Yes! ✓
           Can I find "bc"? ab[bc] → Yes! ✓

So "abbc" also works! ✓

Even "abbcc" works! And "aabbcc"! Many solutions!

INSIGHT: There are MANY valid supersequences!
         But we want the SHORTEST one! 🔑

For "ab" and "bc":
  - "abc" has length 3 ✓ (shortest!)
  - "abbc" has length 4
  - "abbcc" has length 5

The shortest is "abc"!

KEY INSIGHT: We're finding a string that's a "container" for both!

The Hidden Pattern - What Are We REALLY Finding?

Let me think about this differently:

Naive approach: Just concatenate!
  str1 = "ab"
  str2 = "bc"
  Concatenate: "ab" + "bc" = "abbc" (length 4)

But we found "abc" (length 3) works!

What happened?
  str1 = "ab"  → a b
  str2 = "bc"  →   b c

  They both have 'b'!

In "abc", we only included 'b' ONCE!
  a b c
  ↑ ↑ ↑
  | shared
  from both strings

EUREKA MOMENT:
  We can SHARE common characters!
  Instead of: a b + b c = a b b c (4 chars)
  We did: a b c (3 chars, sharing 'b')

But which characters can we share? 🤔

The characters that appear in BOTH strings IN ORDER!
This is... LONGEST COMMON SUBSEQUENCE! 🎯

🤔 Why Longest Common Subsequence? Let Me Prove It!

Discovering the Formula

Let me analyze with examples:

Example 1: str1 = "abac", str2 = "cab"
  Length of str1: 4
  Length of str2: 3

  What's common in order?
    str1 = "abac" → [ab]ac
    str2 = "cab" → c[ab]
    LCS = "ab" (length 2)

  What's the shortest supersequence?
    Result: "cabac" (length 5)

  Notice: 4 + 3 - 2 = 5! 🎯

Example 2: str1 = "abc", str2 = "def"
  No common characters!
  LCS = "" (length 0)

  Supersequence: "abcdef" (length 6)
  Check: 3 + 3 - 0 = 6! ✓

Example 3: str1 = "abc", str2 = "abc"
  Identical strings!
  LCS = "abc" (length 3)

  Supersequence: "abc" (length 3)
  Check: 3 + 3 - 3 = 3! ✓

FORMULA DISCOVERED:
  Shortest supersequence length = len(str1) + len(str2) - len(LCS)

But WHY does this work mathematically?

Mathematical Proof - Why The Formula Is Correct

THEOREM:
  |Shortest Supersequence| = |str1| + |str2| - |LCS|

PROOF (Part 1 - We CAN achieve this length):

Construction Proof:
  Let LCS have length k
  Let str1 have length m
  Let str2 have length n

  Strategy to build supersequence:
    1. Include ALL characters from str1: m characters
    2. Include characters from str2 NOT in LCS: (n - k) characters
    3. Total: m + (n - k) = m + n - k ✓

  Why does this contain str1?
    → We included every character from str1! ✓

  Why does this contain str2?
    → LCS characters come from str1 (in correct order)
    → Non-LCS characters we added from str2
    → Together they form complete str2! ✓

  Therefore: We CAN build supersequence with m + n - k characters! ✓

PROOF (Part 2 - We CANNOT do better):

Proof by Contradiction:
  Suppose there exists supersequence S with |S| < m + n - k

  S must contain all m characters from str1 (as subsequence)
  S must contain all n characters from str2 (as subsequence)

  Some characters in S serve BOTH strings (shared)
  Maximum sharing = LCS (by definition of "longest")

  Minimum characters needed:
    = characters needed for str1
    + characters needed for str2
    - characters that can be shared
    = m + n - k

  So |S| ≥ m + n - k

  Contradiction with assumption |S| < m + n - k! ✗

  Therefore: We CANNOT do better than m + n - k! ✓

QED: The formula m + n - LCS is OPTIMAL! ✓

The Intuition Behind This

Think of it like set theory:

Supersequence = Union of both strings (in sequence form)

Naive union: Just add everything
  |str1| + |str2| = m + n

Smart union: Don't duplicate the common part
  |str1| + |str2| - |common part|
  = m + n - LCS

This is exactly like the set formula:
  |A ∪ B| = |A| + |B| - |A ∩ B|

But for sequences!
  |Supersequence| = |str1| + |str2| - |LCS|

Beautiful mathematical parallel! 🎯

💡 Building the Solution Step by Step

Step 1: Reframe the Problem

ORIGINAL PROBLEM:
  "Find shortest string containing both str1 and str2 as subsequences"

REFRAMED PROBLEM:
  "Find longest common subsequence (LCS)"
  "Use formula: length = len(str1) + len(str2) - LCS"
  "Build the actual string using LCS structure"

Why this helps?
  LCS is a CLASSIC DP problem!
  We know how to solve it! ✓

But wait... the problem asks for the STRING, not just length!
  So we need to CONSTRUCT it!
  This requires more than just LCS length!
  We need to understand the LCS STRUCTURE! 🔑

Step 2: Understanding LCS Structure

When we compute LCS, we make DECISIONS:
  - If characters match → include in LCS
  - If they don't match → skip one

Example: str1 = "ab", str2 = "bc"

Decisions made:
  Position (0,0): 'a' vs 'b' → different, skip one
  Position (1,0): 'b' vs 'b' → match! include
  Position (1,1): 'b' vs 'c' → different, skip one

These decisions tell us:
  - Which characters are SHARED (in LCS)
  - Which characters are UNIQUE to each string

For supersequence:
  - SHARED characters: include ONCE
  - UNIQUE characters: must include separately

The DP table RECORDS these decisions! ✓

Step 3: How Do We Find LCS?

This is where DP comes in!

Define: LCS(i, j) = LCS of str1[i...] and str2[j...]

Base cases:
  If i == len(str1): return 0 (no characters left)
  If j == len(str2): return 0 (no characters left)

Recursive cases:

Case 1: str1[i] == str2[j]
  Characters match!
  They're BOTH in LCS!
  LCS(i, j) = 1 + LCS(i+1, j+1)

  Why? Include this character, solve rest!

Case 2: str1[i] != str2[j]
  Characters don't match!
  At least ONE must be skipped!

  Option A: Skip str1[i]
            LCS(i+1, j)

  Option B: Skip str2[j]
            LCS(i, j+1)

  LCS(i, j) = max(Option A, Option B)

  Why max? We want the LONGEST!

This is the foundation! ✓

🔴 Approach 1: Pure Recursion - Understanding the Pattern

The Recursive Insight for Building

I know LCS tells me which characters to share.

But how do I actually BUILD the supersequence?

Let me think recursively:

Compare str1[i] and str2[j]:

Case 1: str1[i] == str2[j]
  Same character!
  This is SHARED in both!
  Include ONCE in supersequence!

  Result: str1[i] + buildSupersequence(i+1, j+1)

  Why? Include the shared character, then solve rest!

Case 2: str1[i] != str2[j]
  Different characters!
  Both must be in supersequence!
  But which first?

  I don't know! Try BOTH:

  Option A: str1[i] first
            str1[i] + buildSupersequence(i+1, j)

  Option B: str2[j] first
            str2[j] + buildSupersequence(i, j+1)

  Return whichever is SHORTER!

Base cases:
  If i == len(str1): return str2[j...] (add remaining)
  If j == len(str2): return str1[i...] (add remaining)

This is a complete recursive definition! ✓

Visualizing the Recursion Tree

Example: str1 = "ab", str2 = "bc"

                    buildSCS(0,0)
                  str1[0]='a', str2[0]='b'
                   Different! Try both ↓

         'a' + buildSCS(1,0)      'b' + buildSCS(0,1)
         /                         \
    str1[1]='b'                 str1[0]='a'
    str2[0]='b'                 str2[1]='c'
    Match! ✓                    Different!

    'b' + buildSCS(2,1)      'a' + buildSCS(1,1)    'c' + buildSCS(0,2)

buildSCS(2,1):              buildSCS(1,1):          buildSCS(0,2):
i=2 (end of str1)           str1[1]='b'             j=2 (end of str2)
return str2[1..] = "c"      str2[1]='c'             return str1[0..] = "ab"
Result: "abc"               Different!              Result: "cab"

                            Try both...
                            Results: "abbc" or "abc"
                            Return: "abc"

Left path: "a" + "b" + "c" = "abc" (length 3) ✓
Right path: "b" + "a" + "c" = "bac" or similar...

Both find length 3! Return shorter/first!

OBSERVATION: Same (i,j) computed MULTIPLE times!
             Overlapping subproblems! 🔑

Implementation

class Solution {
    public String shortestCommonSupersequence(String str1, String str2) {
        return buildSCS(str1, str2, 0, 0);
    }

    private String buildSCS(String str1, String str2, int i, int j) {
        // Base case: reached end of str1
        if (i == str1.length()) {
            return str2.substring(j);
        }

        // Base case: reached end of str2
        if (j == str2.length()) {
            return str1.substring(i);
        }

        // Case 1: Characters match - include ONCE
        if (str1.charAt(i) == str2.charAt(j)) {
            return str1.charAt(i) + buildSCS(str1, str2, i + 1, j + 1);
        }

        // Case 2: Characters don't match - try BOTH
        String option1 = str1.charAt(i) + buildSCS(str1, str2, i + 1, j);
        String option2 = str2.charAt(j) + buildSCS(str1, str2, i, j + 1);

        // Return shorter one
        return option1.length() < option2.length() ? option1 : option2;
    }
}

Why This Is Too Slow

TIME COMPLEXITY ANALYSIS:

At each non-matching position:
  We make 2 recursive calls

Recursion tree grows exponentially!

                        (0,0)
                       /    \
                   (1,0)    (0,1)
                   /  \      /  \
               (2,0)(1,1)(1,1)(0,2)
                         ↑    ↑
                    SAME STATE called TWICE!

Number of unique states: m × n
But each state called MANY times!

Worst case: O(2^(m+n)) ❌

For m = n = 100:
  2^200 operations! ❌

SPACE COMPLEXITY:
  Call stack depth: O(m + n)
  Plus string concatenation overhead!

OBSERVATION:
  Same (i,j) computed multiple times!
  Overlapping subproblems!

This screams MEMOIZATION! 🔑

But wait... can we memoize STRINGS?

🟡 Approach 2: Memoization - The Challenge

The Problem with Memoizing Strings

Normal memoization pattern:
  memo[i][j] = result for state (i,j)

For this problem:
  memo[i][j] = supersequence string for str1[i..] and str2[j..]

But strings are LARGE objects!

Example: str1 = "abc...xyz" (1000 chars)
         str2 = "pqr...tuv" (1000 chars)

Number of states: 1000 × 1000 = 1,000,000
Each state stores a string of length ~2000

Memory needed: 1,000,000 × 2000 = 2 billion characters! ❌

String operations (concat, copy, compare) are expensive!

This is INEFFICIENT! 🤔

So how do we optimize?

The Key Insight: Separate Structure from Construction!

REALIZATION:

What if I separate the problem into TWO phases?

PHASE 1: Find the STRUCTURE (LCS)
  Compute: LCS length (just a number!)
  Store: DP table with INTEGER values
  Memory: m × n integers (efficient!)

PHASE 2: BUILD the string using the structure
  Use: The DP table decisions
  Method: Traceback!
  Time: Single pass O(m + n)

This is MUCH better! ✓

Why does this work?

The DP table tells us:
  - Which characters match (part of LCS)
  - Which characters don't match (unique to each)

By following the decisions in the table,
we can reconstruct the optimal supersequence! 🎯

Let me understand both phases deeply!

🟢 Approach 3: Bottom-Up DP - The Complete Journey

Phase 1: Build LCS DP Table - Understanding Each Cell

Let me build the LCS table for: str1 = "ab", str2 = "bc"

I'll understand not just VALUES but also MEANING!

Initial state:
      ""  b  c
  ""   0  0  0
  a    0  ?  ?
  b    0  ?  ?

Definition: dp[i][j] = LCS length of first i chars of str1, first j chars of str2

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Cell dp[1][1]: Comparing "a" with "b"
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

str1[0]='a', str2[0]='b' → Different! ✗

Question: What's LCS of "a" and "b"?
  No common characters!
  LCS = 0

How do I compute from previous cells?
  Option A: LCS of "" and "b" = dp[0][1] = 0
            (Skip 'a' from str1, keep 'b')

  Option B: LCS of "a" and "" = dp[1][0] = 0
            (Keep 'a' from str1, skip 'b')

  Take max: dp[1][1] = max(0, 0) = 0

INSIGHT:
  When chars don't match, LCS comes from:
    - Either skipping current char of str1, OR
    - Skipping current char of str2
  We take the BETTER option! ✓

Decision recorded: We could skip either (both give 0)

      ""  b  c
  ""   0  0  0
  a    0  0  ?
  b    0  ?  ?

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Cell dp[1][2]: Comparing "a" with "bc"
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

str1[0]='a', str2[1]='c' → Different! ✗

LCS of "a" and "bc"?
  Still no common characters!
  LCS = 0

Options:
  Option A: LCS of "" and "bc" = dp[0][2] = 0
  Option B: LCS of "a" and "b" = dp[1][1] = 0

  dp[1][2] = max(0, 0) = 0

      ""  b  c
  ""   0  0  0
  a    0  0  0
  b    0  ?  ?

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Cell dp[2][1]: Comparing "ab" with "b"
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

str1[1]='b', str2[0]='b' → MATCH! ✓✓✓

This is IMPORTANT!

LCS of "ab" and "b"?
  'b' is common! LCS = "b" (length 1)

How do I compute?
  Since they MATCH, both can be in LCS!

  LCS = 1 + LCS of strings WITHOUT these matching chars
  dp[2][1] = 1 + dp[1][0] = 1 + 0 = 1

INSIGHT:
  When chars MATCH:
    - They're BOTH in the LCS!
    - Add 1 to the LCS of remaining strings
    - Look at DIAGONAL cell (i-1, j-1)! ✓

Decision recorded: Include 'b' in LCS!

      ""  b  c
  ""   0  0  0
  a    0  0  0
  b    0  1  ?

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Cell dp[2][2]: Comparing "ab" with "bc"
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

str1[1]='b', str2[1]='c' → Different! ✗

LCS of "ab" and "bc"?
  'b' is common! LCS = "b" (length 1)

Options:
  Option A: LCS of "a" and "bc" = dp[1][2] = 0
            (Skip 'b' from str1)

  Option B: LCS of "ab" and "b" = dp[2][1] = 1
            (Skip 'c' from str2)

  dp[2][2] = max(0, 1) = 1

Which option is better? Option B!

INTERPRETATION:
  dp[2][1] = 1 is better than dp[1][2] = 0
  So we DON'T skip 'b' from str1
  We DO skip 'c' from str2

Decision recorded: Skip str2[1]='c'!

      ""  b  c
  ""   0  0  0
  a    0  0  0
  b    0  1  1

LCS length = dp[2][2] = 1 ✓
LCS string = "b"

Understanding "Came From" - The Most Important Concept!

This is CRITICAL for building the supersequence!

When we computed dp[2][2] = 1:
  We had two options:
    dp[1][2] = 0 (from top)
    dp[2][1] = 1 (from left)

  We chose dp[2][1] because 1 > 0 (better LCS)

This choice means:
  "The LCS of 'ab' and 'bc' CAME FROM 'ab' and 'b'"
  "We DIDN'T need the 'c' from str2"
  "So 'c' is UNIQUE to str2, not shared!"

Geometrically:
  dp[2][2] ← dp[2][1]
  We moved LEFT (from column 1 to column 2)

This is what "came from left" means! 🔑

Let me visualize ALL cells and their sources:

      ""  b  c
  ""   0  0  0
  a    0  0→ 0→
         ↓  ↓
  b    0  1→ 1
         ↘

Arrows show WHERE each value came from:

dp[1][1] = 0: Came from top or left (both 0)
  Arrow: ↓ (let's say top)

dp[1][2] = 0: Came from left
  Arrow: → (left)

dp[2][1] = 1: Came from diagonal (match!)
  Arrow: ↘ (diagonal)

dp[2][2] = 1: Came from left (1 > 0)
  Arrow: → (left)

These arrows show the PATH of optimal decisions! ✓

WHY are these arrows important?

Because they tell us WHICH CHARACTERS are:
  - SHARED (diagonal arrows - in LCS)
  - UNIQUE to str1 (came from top - skipped from str1)
  - UNIQUE to str2 (came from left - skipped from str2)

This is the KEY to building the supersequence! 🎯

The Three Arrow Patterns - Complete Visual Understanding

When filling the DP table, THREE patterns emerge:

PATTERN 1: Diagonal Arrow ↘ (Characters MATCH)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

  if (str1[i-1] == str2[j-1]):
    dp[i][j] = 1 + dp[i-1][j-1]

Visual:
        j-1  j
  i-1   X   
         ↘
  i         [i,j]

What it means:
  - These characters are THE SAME!
  - They're part of LCS (shared)
  - Value comes from diagonal

Example: str1[1]='b', str2[0]='b'
  Both are 'b' → Match!
  dp[2][1] = 1 + dp[1][0]
  Arrow: ↘ from (1,0) to (2,1)

PATTERN 2: Vertical Arrow ↓ (Came from TOP)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

  if (dp[i-1][j] > dp[i][j-1]):
    dp[i][j] = dp[i-1][j]

Visual:
        j-1  j
  i-1   [X]
         ↓
  i     [i,j]

What it means:
  - Better LCS came from ABOVE
  - str1[i-1] was SKIPPED in LCS
  - str1[i-1] is UNIQUE to str1!

Example: If dp[1][2]=2 > dp[2][1]=1
  Value comes from top
  str1[1] was skipped
  Arrow: ↓ from (1,2) to (2,2)

PATTERN 3: Horizontal Arrow → (Came from LEFT)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

  if (dp[i][j-1] >= dp[i-1][j]):
    dp[i][j] = dp[i][j-1]

Visual:
        j-1  j
  i-1   

  i     [X] ← [i,j]

What it means:
  - Better LCS came from LEFT
  - str2[j-1] was SKIPPED in LCS
  - str2[j-1] is UNIQUE to str2!

Example: At dp[2][2], dp[2][1]=1 >= dp[1][2]=0
  Value comes from left
  str2[1]='c' was skipped
  Arrow: → from (2,1) to (2,2)

SUMMARY:
  Diagonal ↘: Match (shared in LCS)
  Vertical ↓: Skip from str1 (unique to str1)
  Horizontal →: Skip from str2 (unique to str2)

These patterns are EVERYTHING for traceback! 🔑

Phase 2: Traceback - Following Arrows Backwards!

Now I have the DP table with its "arrows" (decisions).

To build the supersequence, I FOLLOW THE ARROWS BACKWARDS!

Why backwards?
  - DP table is built FORWARD (small to large)
  - Final answer is at dp[m][n] (bottom-right)
  - We trace back to dp[0][0] (top-left)
  - This is END → START (backwards!)

Start at dp[m][n] (full strings)
End at dp[0][0] (empty strings)

Let me trace for str1 = "ab", str2 = "bc":

Table with arrows:
      ""  b  c
  ""   0  0  0
  a    0  0→ 0→
         ↓  ↓
  b    0  1→ 1
         ↘

Current position: (2,2), result = ""

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Step 1: At dp[2][2] = 1
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Looking at current characters:
  str1[1]='b', str2[1]='c' → Different

Checking where value came from:
  dp[1][2]=0 (from top?)
  dp[2][1]=1 (from left?)

  1 > 0, so came from LEFT! ←

What does "came from left" mean?
  We moved from column 1 to column 2
  We added column 2 (str2[1]='c')
  But 'c' was SKIPPED in LCS (not shared)!

So 'c' is UNIQUE to str2!

Action: Add 'c' to supersequence
result = "c"
reason = "c is unique to str2, must include it"

Move: LEFT to dp[2][1]
  i stays 2, j becomes 1

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Step 2: At dp[2][1] = 1
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Looking at current characters:
  str1[1]='b', str2[0]='b' → MATCH! ✓

When characters match:
  They're in LCS (shared)!
  Value came from diagonal

This 'b' is SHARED between both strings!

Action: Add 'b' to supersequence (ONCE, not twice!)
result = "cb"
reason = "b is shared in LCS, include once"

Move: DIAGONAL to dp[1][0]
  i becomes 1, j becomes 0

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Step 3: At dp[1][0] = 0
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

j=0 means we've finished with str2!
But i=1, so str1 still has characters left

Remaining: str1[0]='a'

Action: Add all remaining from str1
result = "cba"
reason = "str2 done, add rest of str1"

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Step 4: Reverse the result!
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

We built from END to START!
result = "cba"

Reverse: "abc" ✓

Verify:
  str1 "ab" in "abc"? [ab]c ✓
  str2 "bc" in "abc"? a[bc] ✓
  Length: 2 + 2 - 1 = 3 ✓

Perfect! ✓

Why Do We Reverse? - Complete Understanding

QUESTION: Why do we build backwards then reverse?

ANSWER: Because of how DP works!

DP Table Construction:
  Start: dp[0][0] (empty strings)
  End: dp[m][n] (full strings)
  Direction: START → END (forward)

DP Table Values:
  dp[0][0]: LCS of "" and "" = 0
  dp[m][n]: LCS of full str1 and full str2 = final answer

Traceback:
  Start: dp[m][n] (where the answer is!)
  End: dp[0][0] (base case)
  Direction: END → START (backward)

Characters Processed:
  First processed: Last characters (at m,n)
  Last processed: First characters (at 0,0)
  Order: END → START

But Supersequence Needs:
  Order: START → END

Solution: REVERSE!

Example visualization:
  Processing order: [last char] ... [middle] ... [first char]
  Built string: "...tsal ...elddim ...tsrif"
  Reversed: "first... middle... last..."

This is natural for traceback! ✓

Alternative: Build forward?
  Possible but more complex
  Would need recursion or different structure
  Traceback + reverse is SIMPLER! ✓

Complete Implementation with Detailed Comments

class Solution {
    public String shortestCommonSupersequence(String str1, String str2) {
        int m = str1.length();
        int n = str2.length();

        // PHASE 1: Build LCS DP table
        // dp[i][j] = LCS length of first i chars of str1, first j chars of str2
        int[][] dp = new int[m + 1][n + 1];

        // Fill table with forward loops
        for (int i = 1; i <= m; i++) {
            for (int j = 1; j <= n; j++) {
                if (str1.charAt(i - 1) == str2.charAt(j - 1)) {
                    // PATTERN 1: Characters match
                    // Both in LCS - came from diagonal
                    dp[i][j] = 1 + dp[i - 1][j - 1];
                } else {
                    // PATTERN 2 or 3: Characters don't match
                    // Take better option
                    dp[i][j] = Math.max(dp[i - 1][j],  // came from top
                                       dp[i][j - 1]);   // came from left
                }
            }
        }

        // LCS length = dp[m][n]
        // Expected supersequence length = m + n - dp[m][n]

        // PHASE 2: Traceback to build supersequence
        StringBuilder result = new StringBuilder();
        int i = m, j = n;

        while (i > 0 && j > 0) {
            if (str1.charAt(i - 1) == str2.charAt(j - 1)) {
                // PATTERN 1: Characters match (diagonal arrow)
                // This character is SHARED (in LCS)
                // Include ONCE in supersequence
                result.append(str1.charAt(i - 1));
                i--;  // Move diagonal
                j--;
            } else if (dp[i - 1][j] > dp[i][j - 1]) {
                // PATTERN 2: Came from TOP (vertical arrow)
                // str1[i-1] was SKIPPED in LCS
                // str1[i-1] is UNIQUE to str1
                // Must include it in supersequence
                result.append(str1.charAt(i - 1));
                i--;  // Move up
            } else {
                // PATTERN 3: Came from LEFT (horizontal arrow)
                // str2[j-1] was SKIPPED in LCS
                // str2[j-1] is UNIQUE to str2
                // Must include it in supersequence
                result.append(str2.charAt(j - 1));
                j--;  // Move left
            }
        }

        // PHASE 3: Add remaining characters
        // If one string has more characters, add them all
        while (i > 0) {
            result.append(str1.charAt(i - 1));
            i--;
        }

        while (j > 0) {
            result.append(str2.charAt(j - 1));
            j--;
        }

        // PHASE 4: Reverse (built backwards)
        return result.reverse().toString();
    }
}

Why This Algorithm Is Correct - The Proof

THEOREM: The traceback algorithm produces the shortest supersequence

PROOF (by loop invariant):

Invariant:
  At position (i, j) during traceback,
  result contains all characters from str1[i..m-1] and str2[j..n-1],
  merged optimally using LCS structure.

Base case: (m, n)
  result = "" (empty)
  Must contain str1[m..m-1] and str2[n..n-1] (both empty ranges)
  Invariant holds! ✓

Inductive step:
  Assume invariant holds at (i, j)
  Show it holds after one step

  Case 1: str1[i-1] == str2[j-1]
    These chars match (from DP table construction)
    They're SHARED in LCS
    Including once serves BOTH strings

    After step:
      result += str1[i-1]
      Move to (i-1, j-1)

    New result contains:
      - This character (serves both)
      - All from str1[i..m-1] (already had)
      - All from str2[j..n-1] (already had)
    = All from str1[i-1..m-1] and str2[j-1..n-1] ✓

  Case 2: dp[i-1][j] > dp[i][j-1]
    str1[i-1] was SKIPPED in LCS (from DP construction)
    It's UNIQUE to str1
    Must include separately

    After step:
      result += str1[i-1]
      Move to (i-1, j)

    New result contains:
      - str1[i-1] (added)
      - All from str1[i..m-1] (already had)
      - All from str2[j..n-1] (already had)
    = All from str1[i-1..m-1] and str2[j..n-1] ✓

  Case 3: dp[i][j-1] >= dp[i-1][j]
    str2[j-1] was SKIPPED in LCS
    It's UNIQUE to str2
    Must include separately

    (Similar proof to Case 2) ✓

By induction, invariant holds throughout!

At (0, 0):
  result contains all from str1[0..m-1] and str2[0..n-1]
  Merged optimally (using LCS structure)
  Length = m + n - LCS (by construction)

This is the shortest supersequence! QED ✓

🔍 Detailed Example - Complete Walkthrough

Example: str1 = "abac", str2 = "cab"

PHASE 1: Build LCS DP Table
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

m = 4 (abac), n = 3 (cab)

Final table:
      ""  c  a  b
  ""   0  0  0  0
  a    0  0  1  1
  b    0  0  1  2
  a    0  0  1  2
  c    0  1  1  2

Key cells explained:

dp[1][2]: str1[0]='a', str2[1]='a' → Match!
  dp[1][2] = 1 + dp[0][1] = 1
  Arrow: ↘ (diagonal)

dp[2][3]: str1[1]='b', str2[2]='b' → Match!
  dp[2][3] = 1 + dp[1][2] = 2
  Arrow: ↘ (diagonal)

dp[4][1]: str1[3]='c', str2[0]='c' → Match!
  dp[4][1] = 1 + dp[3][0] = 1
  Arrow: ↘ (diagonal)

dp[4][3]: str1[3]='c', str2[2]='b' → Different
  dp[3][3]=2 vs dp[4][2]=1
  2 > 1 → Came from TOP!
  Arrow: ↑ (up)

LCS length = 2
LCS = "ab"

Expected supersequence length = 4 + 3 - 2 = 5

PHASE 2: Traceback
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Start: (4,3), result = ""

Step 1: At (4,3)
  str1[3]='c', str2[2]='b' → Different
  dp[3][3]=2, dp[4][2]=1
  2 > 1 → Came from TOP!

  Action: Add str1[3]='c' (unique to str1)
  result = "c"
  Move: (3,3) UP

Step 2: At (3,3)
  str1[2]='a', str2[2]='b' → Different
  dp[2][3]=2, dp[3][2]=1
  2 > 1 → Came from TOP!

  Action: Add str1[2]='a' (unique to str1)
  result = "ca"
  Move: (2,3) UP

Step 3: At (2,3)
  str1[1]='b', str2[2]='b' → MATCH! ✓

  Action: Add 'b' ONCE (shared in LCS)
  result = "cab"
  Move: (1,2) DIAGONAL

Step 4: At (1,2)
  str1[0]='a', str2[1]='a' → MATCH! ✓

  Action: Add 'a' ONCE (shared in LCS)
  result = "caba"
  Move: (0,1) DIAGONAL

Step 5: At (0,1)
  i=0 → str1 done
  Remaining: str2[0]='c'

  Action: Add 'c'
  result = "cabac"

Done!

Built: "cabac" (already correct, no reverse needed!)

Verify:
  str1 "abac" ⊆ "cabac"? c[abac] ✓
  str2 "cab" ⊆ "cabac"? [cab]ac ✓
  Length: 5 = 4 + 3 - 2 ✓

Perfect! ✓

📊 Complexity Analysis

All Approaches Compared

┌──────────────┬─────────────┬──────────────┬──────────────┐
│ Approach     │ Time        │ Space        │ Practical?   │
├──────────────┼─────────────┼──────────────┼──────────────┤
│ Recursion    │ O(2^(m+n))  │ O(m+n)       │ NO ❌        │
│ (Approach 1) │ Exponential │ Stack        │ Too slow     │
│              │             │ + strings    │              │
├──────────────┼─────────────┼──────────────┼──────────────┤
│ Bottom-Up DP │ O(m×n)      │ O(m×n)       │ YES ✓        │
│ (Approach 3) │ Polynomial  │ Table        │ Optimal      │
│              │             │ + O(m+n)     │              │
│              │             │ result       │              │
└──────────────┴─────────────┴──────────────┴──────────────┘

Note: "Memoization" approach would be same as bottom-up here
because we can't efficiently memoize strings!

Detailed Time Complexity

APPROACH 1 - Recursion:

At each step with non-matching chars:
  Two recursive calls

Recurrence: T(m,n) = T(m-1,n) + T(m,n-1) + O(string concat)

Tree height: m + n
Branching factor: 2
Total nodes: O(2^(m+n))

Plus string operations at each node!

Total: O(2^(m+n) × string operations) ❌

APPROACH 3 - Bottom-Up DP:

Phase 1 - Build table:
  Nested loops: i from 1 to m, j from 1 to n
  Each cell: O(1) work
  Total: O(m × n) ✓

Phase 2 - Traceback:
  Start at (m,n), move to (0,0)
  Each step: move i or j down by 1
  Maximum steps: m + n
  Total: O(m + n) ✓

Phase 3 - Reverse:
  String length: at most m + n
  Total: O(m + n) ✓

Overall: O(m × n) + O(m + n) + O(m + n) = O(m × n) ✓

Detailed Space Complexity

APPROACH 1 - Recursion:

Call stack depth: O(m + n)
String storage at each level: O(m + n)
Total stack space: O((m + n)^2) ❌

APPROACH 3 - Bottom-Up DP:

DP table: (m+1) × (n+1) integers
  Size: O(m × n) ✓

Result string: at most m + n characters
  Size: O(m + n) ✓

StringBuilder: O(m + n) ✓

Total: O(m × n) ✓

SPACE OPTIMIZATION:

Could we use less space?
  For LCS computation: YES!
    Only need current and previous row
    Space: O(min(m, n))

  But for traceback: NO!
    Need full table to reconstruct path

Trade-off: Can't optimize space without losing ability to build string!

For this problem: O(m × n) is NECESSARY! ✓

🎯 Pattern Recognition

Supersequence vs Subsequence - The Duality

MATHEMATICAL RELATIONSHIP:

LCS (Longest Common Subsequence):
  = What they SHARE
  = Intersection (in sequence form)
  = Maximum overlap

SCS (Shortest Common Supersequence):
  = What covers BOTH
  = Union (in sequence form)
  = Minimum container

Set Theory Analogy:
  |A ∪ B| = |A| + |B| - |A ∩ B|

  Sequence analog:
  |SCS| = |str1| + |str2| - |LCS|

This duality is beautiful! ✓

Visual:
  str1: [unique to str1] [shared] [unique to str1]
  str2: [unique to str2] [shared] [unique to str2]

  LCS: [shared only]
  SCS: [all parts combined, shared counted once]

The Problem Family

STRING TRANSFORMATION PROBLEMS:

All connected through LCS!

1. Problem 239: Delete Operations
   Question: Min deletions to make equal
   Formula: deletions = m + n - 2×LCS
   Why: Keep LCS, delete rest from both

2. Problem 240: Shortest Supersequence
   Question: Shortest string containing both
   Formula: length = m + n - LCS
   Why: Share LCS, add unique parts

3. Problem 241: LCS itself
   Question: Longest common subsequence
   The FOUNDATION for others!

4. Edit Distance:
   Question: Min operations (insert/delete/replace)
   Related: Uses similar DP structure

5. Longest Increasing Subsequence:
   Question: LIS length
   Related: Similar DP pattern

LCS is a BUILDING BLOCK! 🔑

Understanding LCS deeply unlocks:
  - String comparison (diff tools)
  - DNA sequence alignment
  - Version control systems
  - Text similarity
  - And more!

When to Recognize This Pattern

TRIGGER WORDS that suggest this approach:

âœ" "Shortest string containing both"
âœ" "Merge two sequences"
âœ" "Common subsequence"
âœ" "Minimum additions"
âœ" "Optimal combination"

PROBLEM STRUCTURE:
  - Two strings/sequences
  - Need to combine/transform
  - Order preservation matters
  - Optimization required (min/max)

SOLUTION PATTERN:
  1. Find LCS (structure)
  2. Use formula for length
  3. Traceback for construction

This is a MASTER PATTERN! ✓

💻 Complete Working Code

class Solution {
  public String shortestCommonSupersequence(String str1, String str2) {
    int m = str1.length();
    int n = str2.length();

    // Build LCS table
    int[][] dp = new int[m + 1][n + 1];

    for (int i = 1; i <= m; i++) {
      for (int j = 1; j <= n; j++) {
        if (str1.charAt(i - 1) == str2.charAt(j - 1)) {
          dp[i][j] = 1 + dp[i - 1][j - 1];
        } else {
          dp[i][j] = Math.max(dp[i - 1][j], dp[i][j - 1]);
        }
      }
    }

    // Traceback to build supersequence
    StringBuilder result = new StringBuilder();
    int i = m, j = n;

    while (i > 0 && j > 0) {
      if (str1.charAt(i - 1) == str2.charAt(j - 1)) {
        // Shared - include once
        result.append(str1.charAt(i - 1));
        i--;
        j--;
      } else if (dp[i - 1][j] > dp[i][j - 1]) {
        // Unique to str1
        result.append(str1.charAt(i - 1));
        i--;
      } else {
        // Unique to str2
        result.append(str2.charAt(j - 1));
        j--;
      }
    }

    // Add remaining
    while (i > 0) {
      result.append(str1.charAt(i - 1));
      i--;
    }

    while (j > 0) {
      result.append(str2.charAt(j - 1));
      j--;
    }

    // Reverse
    return result.reverse().toString();
  }

  public static void main(String[] args) {
    Solution s = new Solution();

    System.out.println(s.shortestCommonSupersequence("abac", "cab"));
    // Expected: "cabac"

    System.out.println(s.shortestCommonSupersequence("aaaaaaaa", "aaaaaaaa"));
    // Expected: "aaaaaaaa"
  }
}

🔑 Key Insights Summary

The Learning Journey

CRAWL (Understanding):
  âœ" What is supersequence?
  âœ" Why LCS helps?
  âœ" Mathematical proof

WALK (Building):
  âœ" Recursive structure
  âœ" DP table construction
  âœ" Understanding "came from"

RUN (Mastery):
  âœ" Traceback algorithm
  âœ" Three arrow patterns
  âœ" Complete implementation

Every step builds naturally! 🎯

The Core Understanding

1. WHY LCS?
   Shared chars counted once, not twice
   Formula: m + n - LCS

2. WHY DP table?
   Records decisions about sharing
   Tells us structure

3. WHAT are arrows?
   Diagonal: Shared (in LCS)
   Up: Unique to str1
   Left: Unique to str2

4. HOW traceback?
   Follow arrows backwards
   Build string from decisions

5. WHY reverse?
   Built END → START
   Need START → END

Mental Model

Think of it as MERGING with SHARING:

str1: a b a c
str2:   c a b

LCS: "ab" (length 2)

Build supersequence:
  - Include ALL from str1: a b a c
  - Include UNIQUE from str2: c
  - Share LCS chars

Result: c a b a c (length 5)

Visual:
  c (unique to str2)
  a (shared from LCS)
  b (shared from LCS)
  a (unique to str1)
  c (unique to str1)

This MENTAL MODEL sticks! 🧠

📝 Quick Revision

🎯 Core Formula

|SCS| = |str1| + |str2| - |LCS|

Why? Share LCS, add unique parts

⚡ Algorithm

1. Build LCS table O(m×n):
   Match → 1 + dp[i-1][j-1]
   No match → max(dp[i-1][j], dp[i][j-1])

2. Traceback from (m,n) to (0,0):
   Match → include once, diagonal
   dp[i-1][j] > dp[i][j-1] → str1 unique, up
   dp[i][j-1] >= dp[i-1][j] → str2 unique, left

3. Reverse result

🔑 Three Patterns

↘ Diagonal: Shared (LCS)
↑ Up: Unique to str1
← Left: Unique to str2

🎪 Quick Implementation

public class Solution {
  public String shortestCommonSupersequence(String s1, String s2) {
    // return recursive(s1, s2, 0, 0);

    // // This is NOT RECOMMENDED and INEFFICIENT. Will get
    // // memory limit though logically correct.
    // String[][] memo = new String[s1.length() + 1][s2.length() + 1];
    // return topDown(s1, s2, 0, 0, memo);

    return bottomUp(s1, s2);
  }

  private String bottomUp(String s1, String s2) {
    int len1 = s1.length();
    int len2 = s2.length();

    // Step 1: build LCS first.
    // No need to go with recursion. Building LCS table is very intuitive.
    // If we put on paper (fill table manually), we can code this very easily.
    int[][] dp = new int[len1 + 1][len2 + 1];
    for (int i = 1; i <= len1; i++) {
      for (int j = 1; j <= len2; j++) {
        if (s1.charAt(i - 1) == s2.charAt(j - 1)) {
          dp[i][j] = 1 + dp[i - 1][j - 1];
        } else {
          dp[i][j] = Math.max(dp[i][j - 1], dp[i - 1][j]);
        }
      }
    }

    // Step 2: traverse LCS table from back
    StringBuilder sb = new StringBuilder();

    // Step 3: start from last portion of the table.
    int i = len1;
    int j = len2;

    while (i > 0 && j > 0) {
      // step 4: if both chars equal, take that char in the result and
      // move 1 step back diagonally.
      if (s1.charAt(i - 1) == s2.charAt(j - 1)) {
        sb.append(s1.charAt(i - 1));
        i--;
        j--;
        // if not equal, move to either left or top based on where we came from.
        // Means, from where we came to the current point, we need to traverse back in
        // the same path. While traversing back in that path, include the part that we
        // are leaving here in the result.
        // If we are going back to TOP, include char that is on LEFT.
        // If we are going back to LEFT.include char that is on TOP.
      } else if (dp[i][j - 1] < dp[i - 1][j]) {
        sb.append(s1.charAt(i - 1));
        i--;
      } else {
        sb.append(s2.charAt(j - 1));
        j--;
      }
    }

    // remaining chars
    while (i > 0) {
      sb.append(s1.charAt(i - 1));
      i--;
    }

    while (j > 0) {
      sb.append(s2.charAt(j - 1));
      j--;
    }

    return sb.reverse().toString();
  }

  private String topDown(String s1, String s2, int i, int j, String[][] memo) {
    if (i == s1.length() && j == s2.length()) {
      return "";
    }

    if (i == s1.length()) {
      return s2.substring(j, s2.length());
    }

    if (j == s2.length()) {
      return s1.substring(i, s1.length());
    }

    if (memo[i][j] != null) {
      return memo[i][j];
    }

    if (s1.charAt(i) == s2.charAt(j)) {
      return s1.charAt(i) + topDown(s1, s2, i + 1, j + 1, memo);
    }

    String option1 = s1.charAt(i) + topDown(s1, s2, i + 1, j, memo);
    String option2 = s2.charAt(j) + topDown(s1, s2, i, j + 1, memo);

    return memo[i][j] = option1.length() < option2.length() ? option1 : option2;
  }

  private String recursive(String s1, String s2, int i, int j) {
    // step 3: base case 1 (both strings reached end)
    if (i == s1.length() && j == s2.length()) {
      return "";
    }

    // step 4a: base case 1 (only s1 ended)
    if (i == s1.length()) {
      return s2.substring(j, s2.length());
    }

    // step 4b: base case 2 (only s2 ended)
    if (j == s2.length()) {
      return s1.substring(i, s1.length());
    }

    // step 1: if both chars at i and j match
    // include that char in sequence and check for rest of the portions
    if (s1.charAt(i) == s2.charAt(j)) {
      return s1.charAt(i) + recursive(s1, s2, i + 1, j + 1);
    }

    // step 2: if chars do not match
    // add char from s1 and check rest of the portions
    String option1 = s1.charAt(i) + recursive(s1, s2, i + 1, j);
    // add char from s2 and check rest of the portions
    String option2 = s2.charAt(j) + recursive(s1, s2, i, j + 1);

    return option1.length() < option2.length() ? option1 : option2;
  }

  public static void main(String[] args) {
    Solution s = new Solution();
    System.out.println(s.shortestCommonSupersequence("abac",
        "cab").equals("cabac"));
    System.out.println(s.shortestCommonSupersequence("aaaaaaaa",
        "aaaaaaaa").equals("aaaaaaaa"));
  }
}

🎉 Congratulations!

You've mastered through baby steps: - ✅ CRAWL: Understanding supersequence - ✅ WALK: Building DP table - ✅ RUN: Complete traceback mastery - ✅ All three approaches explained - ✅ Mathematical proofs provided - ✅ Pattern connections made - ✅ Complete intuition built

True comprehensive mastery! 🚀✨