43. Substring with Concatenation of All Words

🔗 LeetCode Problem: 30. Substring with Concatenation of All Words
📊 Difficulty: Hard
🏷️ Topics: Hash Table, String, Sliding Window

Problem Statement

You are given a string s and an array of strings words. All the strings of words are of the same length.

A concatenated substring in s is a substring that contains all the strings of any permutation of words concatenated.

For example, if words = ["ab","cd","ef"], then "abcdef", "abefcd", "cdabef", "cdefab", "efabcd", and "efcdab" are all concatenated strings. "acdbef" is not a concatenated substring because it is not the concatenation of any permutation of words.

Return the starting indices of all the concatenated substrings in s. You can return the answer in any order.

Example 1:

Input: s = "barfoothefoobarman", words = ["foo","bar"]
Output: [0,9]
Explanation: 
The substring starting at 0 is "barfoo". It is the concatenation of ["bar","foo"] which is a permutation of words.
The substring starting at 9 is "foobar". It is the concatenation of ["foo","bar"] which is a permutation of words.

Example 2:

Input: s = "wordgoodgoodgoodbestword", words = ["word","good","best","word"]
Output: []
Explanation: There is no concatenated substring.

Example 3:

Input: s = "barfoofoobarthefoobarman", words = ["bar","foo","the"]
Output: [6,9,12]

Constraints: - 1 <= s.length <= 10^4 - 1 <= words.length <= 5000 - 1 <= words[i].length <= 30 - s and words[i] consist of lowercase English letters.

🎨 Visual Understanding

The Problem Visualized

s = "barfoothefoobarman", words = ["foo","bar"]

Each word has length 3
Total concatenation length = 2 words × 3 = 6

Check substrings of length 6:

Index 0: "barfoo" 
         "bar" + "foo" = words ✓ (permutation)

Index 1: "arfoot"
         Not valid (can't split into valid words)

Index 2: "rfooth"
         Not valid

Index 9: "foobar"
         "foo" + "bar" = words ✓ (permutation)

Answer: [0, 9]

Key Observation: Fixed-Size Window with Word Matching

This is a HARD variation of sliding window because:

1. Fixed window size = wordLen × numWords
2. Must split window into words of exact length
3. Each word must match exactly (with frequency)
4. All words in `words` must be used exactly once

Example:
s = "barfoothefoobarman"
words = ["foo", "bar"]
wordLen = 3
windowSize = 3 × 2 = 6

Window [0:6]: "barfoo"
Split: ["bar", "foo"]
Match words? {"bar":1, "foo":1} = words ✓

Window [1:7]: "arfoot"
Split: ["arf", "oot"]
Match words? {"arf":1, "oot":1} ≠ words ✗

Visual Process

s = "barfoothefoobarman", words = ["foo","bar"]
wordLen = 3, numWords = 2, windowSize = 6

words frequency: {foo:1, bar:1}

Position 0: "barfoo"
  Split: "bar", "foo"
  Frequency: {bar:1, foo:1}
  Match? YES ✓ → Add 0 to result

Position 1: "arfoot"
  Split: "arf", "oot"
  Frequency: {arf:1, oot:1}
  Match? NO ✗

Position 2: "rfooth"
  Split: "rfo", "oth"
  Match? NO ✗

...continue checking...

Position 9: "foobar"
  Split: "foo", "bar"
  Frequency: {foo:1, bar:1}
  Match? YES ✓ → Add 9 to result

Result: [0, 9]

🎯 Approach 1: Brute Force

The Most Natural Thinking 💭

Core Logic:

Check every possible starting position:
1. For each index i in s
2. Extract substring of length (wordLen × numWords)
3. Split into words of wordLen
4. Check if words match (with frequency)
5. If match, add i to result

Why This Works: - Checks all possibilities - Guaranteed to find all answers - Easy to understand

Why This Fails Requirements: - Time complexity O(n × m × wordLen) where n = s.length, m = words.length - Recounts frequencies for every position - Too slow for large inputs - Not optimal for interviews

Implementation

public List<Integer> findSubstring(String s, String[] words) {
    List<Integer> result = new ArrayList<>();
    if (s == null || s.length() == 0 || words == null || words.length == 0) {
        return result;
    }

    int wordLen = words[0].length();
    int numWords = words.length;
    int windowSize = wordLen * numWords;

    // Build word frequency map
    Map<String, Integer> wordCount = new HashMap<>();
    for (String word : words) {
        wordCount.put(word, wordCount.getOrDefault(word, 0) + 1);
    }

    // Check every position
    for (int i = 0; i <= s.length() - windowSize; i++) {
        if (isMatch(s, i, wordLen, numWords, wordCount)) {
            result.add(i);
        }
    }

    return result;
}

private boolean isMatch(String s, int start, int wordLen, int numWords, 
                        Map<String, Integer> wordCount) {
    Map<String, Integer> seen = new HashMap<>();

    for (int j = 0; j < numWords; j++) {
        int wordStart = start + j * wordLen;
        String word = s.substring(wordStart, wordStart + wordLen);

        seen.put(word, seen.getOrDefault(word, 0) + 1);

        // Check if word is valid and not over-counted
        if (!wordCount.containsKey(word) || 
            seen.get(word) > wordCount.get(word)) {
            return false;
        }
    }

    return true;
}

⏰ Time: O(n × m × wordLen) - n positions, each checks m words
💾 Space: O(m) - for frequency maps

✅ Approach 2: Sliding Window (Optimal)

The Breakthrough Insight 💡

Core Logic:

Use sliding window with word-by-word movement:

Key optimization:
Instead of checking every position, we only need to check
wordLen different starting positions (0, 1, 2, ..., wordLen-1)

For each starting offset:
1. Build a sliding window of numWords words
2. Slide word by word (not character by character)
3. Maintain word frequency in current window
4. Check if frequency matches

This reduces redundant work!

Visual Explanation:

s = "barfoothefoobarman", words = ["foo","bar"]
wordLen = 3, numWords = 2

Instead of checking positions 0,1,2,3,4,5,6,...
We check 3 separate sequences:

Offset 0: positions 0, 3, 6, 9, 12, 15
  Window at 0: "bar" + "foo" = "barfoo" ✓
  Window at 3: "foo" + "the" = "foothe" ✗
  Window at 6: "the" + "foo" = "thefoo" ✗
  Window at 9: "foo" + "bar" = "foobar" ✓

Offset 1: positions 1, 4, 7, 10, 13, 16
  All invalid

Offset 2: positions 2, 5, 8, 11, 14, 17
  All invalid

This way we only move by wordLen, not by 1!

Why This Works:

Any valid concatenation must start at some position i.
If we check offsets 0, 1, 2, ..., wordLen-1,
we cover all possible starting positions.

Within each offset, we slide by full words,
which is more efficient than character-by-character.

Time saved:
- Brute force: n positions × m words = O(n × m)
- Optimized: wordLen offsets × (n/wordLen) positions = O(n × wordLen)
- Since wordLen is usually small (≤ 30), this is much better!

Implementation

public List<Integer> findSubstring(String s, String[] words) {
    List<Integer> result = new ArrayList<>();
    if (s == null || s.length() == 0 || words == null || words.length == 0) {
        return result;
    }

    int wordLen = words[0].length();
    int numWords = words.length;
    int windowSize = wordLen * numWords;

    if (s.length() < windowSize) {
        return result;
    }

    // Build word frequency map
    Map<String, Integer> wordCount = new HashMap<>();
    for (String word : words) {
        wordCount.put(word, wordCount.getOrDefault(word, 0) + 1);
    }

    // Check wordLen different starting offsets
    for (int offset = 0; offset < wordLen; offset++) {
        slidingWindow(s, offset, wordLen, numWords, wordCount, result);
    }

    return result;
}

private void slidingWindow(String s, int offset, int wordLen, int numWords,
                           Map<String, Integer> wordCount, List<Integer> result) {
    Map<String, Integer> windowCount = new HashMap<>();
    int wordsUsed = 0;

    // Process words starting from offset
    for (int i = offset; i <= s.length() - wordLen; i += wordLen) {
        String word = s.substring(i, i + wordLen);

        // Expand window
        if (wordCount.containsKey(word)) {
            windowCount.put(word, windowCount.getOrDefault(word, 0) + 1);
            wordsUsed++;

            // Shrink window if word is over-counted
            while (windowCount.get(word) > wordCount.get(word)) {
                int leftIdx = i - (wordsUsed - 1) * wordLen;
                String leftWord = s.substring(leftIdx, leftIdx + wordLen);
                windowCount.put(leftWord, windowCount.get(leftWord) - 1);
                wordsUsed--;
            }

            // Check if we have a valid window
            if (wordsUsed == numWords) {
                result.add(i - (numWords - 1) * wordLen);
            }
        } else {
            // Invalid word, reset window
            windowCount.clear();
            wordsUsed = 0;
        }
    }
}

⏰ Time: O(n × wordLen) - wordLen offsets, each processes n chars
💾 Space: O(m) - for frequency maps

🔍 Step-by-Step Execution

Example: s = "barfoothefoobarman", words = ["foo","bar"]

Initial Setup:
═══════════════════════════════════════════════════════════════════
s = "barfoothefoobarman"
words = ["foo", "bar"]
wordLen = 3
numWords = 2
windowSize = 6

wordCount = {foo:1, bar:1}


OFFSET 0: Check positions 0, 3, 6, 9, 12, 15
═══════════════════════════════════════════════════════════════════

Position 0: word = "bar"
  windowCount = {bar:1}
  wordsUsed = 1
  Valid? wordsUsed (1) < numWords (2)

Position 3: word = "foo"
  windowCount = {bar:1, foo:1}
  wordsUsed = 2
  Valid? wordsUsed (2) == numWords (2) ✓

  Starting index = 3 - (2-1)×3 = 0
  Add 0 to result ✓

Position 6: word = "the"
  "the" not in wordCount
  Reset: windowCount = {}, wordsUsed = 0

Position 9: word = "foo"
  windowCount = {foo:1}
  wordsUsed = 1

Position 12: word = "bar"
  windowCount = {foo:1, bar:1}
  wordsUsed = 2
  Valid? wordsUsed (2) == numWords (2) ✓

  Starting index = 12 - (2-1)×3 = 9
  Add 9 to result ✓

Position 15: word = "man"
  "man" not in wordCount
  Reset


OFFSET 1: Check positions 1, 4, 7, 10, 13, 16
═══════════════════════════════════════════════════════════════════

Position 1: word = "arf"
  "arf" not in wordCount
  Reset

Position 4: word = "oot"
  "oot" not in wordCount
  Reset

...all invalid


OFFSET 2: Check positions 2, 5, 8, 11, 14, 17
═══════════════════════════════════════════════════════════════════

Position 2: word = "rfo"
  "rfo" not in wordCount
  Reset

...all invalid


═══════════════════════════════════════════════════════════════════
🎯 FINAL RESULT: [0, 9]
═══════════════════════════════════════════════════════════════════

Example 2: s = "wordgoodgoodgoodbestword", words = ["word","good","best","word"]

Setup:
wordLen = 4, numWords = 4, windowSize = 16
wordCount = {word:2, good:1, best:1}

Need to find: any permutation of ["word","good","best","word"]

OFFSET 0: Check positions 0, 4, 8, 12, 16, 20
═══════════════════════════════════════════════════════════════════

Position 0: "word" → windowCount = {word:1}, wordsUsed = 1
Position 4: "good" → windowCount = {word:1, good:1}, wordsUsed = 2
Position 8: "good" → windowCount = {word:1, good:2}, wordsUsed = 3
  "good" over-counted! good:2 > wordCount[good]:1
  Shrink: remove "word" from left
  windowCount = {good:2}, wordsUsed = 2
  Still over-counted!
  Shrink: remove "good" from left
  windowCount = {good:1}, wordsUsed = 1

Position 12: "good" → windowCount = {good:2}, wordsUsed = 2
  Over-counted again, shrink...

...no valid concatenation found

Result: []

🔍 Edge Cases

Case 1: Empty string
Input: s = "", words = ["foo"]
Output: []

Case 2: Words longer than string
Input: s = "bar", words = ["barr"]
Output: []

Case 3: Single word
Input: s = "barfoo", words = ["bar"]
Output: [0]

Case 4: Duplicate words
Input: s = "wordgoodgoodgoodbestword", words = ["word","good","best","good"]
Output: [8]
Explanation: "goodgoodbestword"

Case 5: All same words
Input: s = "aaaaaaaa", words = ["aa","aa","aa"]
Output: [0,1,2]

Case 6: No match
Input: s = "barfoo", words = ["foo","bar","the"]
Output: []

Case 7: Entire string matches
Input: s = "foobar", words = ["foo","bar"]
Output: [0]

Case 8: Multiple overlapping matches
Input: s = "barfoofoobarthefoobarman", words = ["bar","foo","the"]
Output: [6,9,12]

Common Mistakes

Mistake 1: Not handling duplicate words
❌ Wrong: 
    // Using Set instead of frequency map
    Set<String> words = new HashSet<>(Arrays.asList(words));
✓ Right: 
    Map<String, Integer> wordCount = new HashMap<>();
    for (String word : words) {
        wordCount.put(word, wordCount.getOrDefault(word, 0) + 1);
    }
Reason: Words can appear multiple times

Mistake 2: Checking every character position
❌ Wrong: 
    for (int i = 0; i < s.length(); i++) {
        // Check every position
    }
✓ Right: 
    for (int offset = 0; offset < wordLen; offset++) {
        // Only check wordLen offsets
    }
Reason: Optimize by sliding word-by-word

Mistake 3: Not resetting on invalid word
❌ Wrong: 
    if (!wordCount.containsKey(word)) {
        continue;  // Just skip
    }
✓ Right: 
    if (!wordCount.containsKey(word)) {
        windowCount.clear();
        wordsUsed = 0;
    }
Reason: Invalid word breaks the concatenation

Mistake 4: Wrong starting index calculation
❌ Wrong: 
    result.add(i);  // Current position
✓ Right: 
    result.add(i - (numWords - 1) * wordLen);
Reason: Need start of window, not end

Mistake 5: Not handling over-counting
❌ Wrong: 
    windowCount.put(word, windowCount.get(word) + 1);
    // Don't check if over-counted
✓ Right: 
    windowCount.put(word, windowCount.get(word) + 1);
    while (windowCount.get(word) > wordCount.get(word)) {
        // Shrink from left
    }
Reason: Must maintain exact frequencies

Mistake 6: Wrong window size check
❌ Wrong: 
    if (s.length() < wordLen) return result;
✓ Right: 
    if (s.length() < wordLen * numWords) return result;
Reason: Need space for all words

🎯 Key Takeaways

⚡ Algorithm Comparison

Approach 1: Brute Force
  Time: O(n × m × wordLen)
  Space: O(m)
  Use: Only for understanding

Approach 2: Sliding Window (RECOMMENDED)
  Time: O(n × wordLen)
  Space: O(m)
  Use: Optimal solution, best for interviews

🔑 The Core Insight

Problem Breakdown:
═══════════════════════════════════════════════════════════════

This is a HARD sliding window problem because:

1. Fixed window size (wordLen × numWords)
2. Must split into exact-length words
3. Word frequency matching required
4. All words must be used exactly once

Key Optimization:
═══════════════════════════════════════════════════════════════

Instead of checking every position (0,1,2,3,...):
Check only wordLen different offsets (0 to wordLen-1)

For each offset, slide by full words:
- Offset 0: 0, wordLen, 2×wordLen, 3×wordLen, ...
- Offset 1: 1, 1+wordLen, 1+2×wordLen, ...
- Offset 2: 2, 2+wordLen, 2+2×wordLen, ...

This covers all positions but slides more efficiently!

Algorithm Template:
═══════════════════════════════════════════════════════════════

for offset in 0 to wordLen-1:
    windowCount = {}
    wordsUsed = 0

    for i from offset by wordLen:
        word = s[i:i+wordLen]

        if word in wordCount:
            add word to windowCount
            wordsUsed++

            // Shrink if over-counted
            while windowCount[word] > wordCount[word]:
                remove left word
                wordsUsed--

            // Check if valid
            if wordsUsed == numWords:
                add starting index to result
        else:
            // Invalid word, reset
            clear windowCount
            wordsUsed = 0

Key Techniques:
═══════════════════════════════════════════════════════════════

1. Word frequency map (not character)
2. Slide by wordLen, not by 1
3. Multiple offset starting points
4. Shrink window when over-counting
5. Reset window on invalid word
6. Calculate starting index correctly

🎯 Pattern Recognition

Problem Type: Concatenation Matching with Permutation
Core Pattern: Fixed-Size Sliding Window + Word Frequency Matching

When to Apply:
✓ Finding all occurrences of concatenations
✓ Words can be in any order (permutation)
✓ All words same length
✓ Must use each word exact number of times
✓ Want O(n) time (with small constant)

Unique Characteristics:
- Works with WORDS not characters
- Slides by wordLen not 1
- Checks wordLen different offsets
- Frequency matching at word level

Comparison to Similar Problems:
┌────────────────────────────────┬─────────────────────────┐
│ Find All Anagrams (LC 438)     │ This Problem (LC 30)    │
├────────────────────────────────┼─────────────────────────┤
│ Character frequency            │ Word frequency          │
│ Slide by 1                     │ Slide by wordLen        │
│ Single offset                  │ Multiple offsets        │
│ Pattern length fixed           │ Total length fixed      │
└────────────────────────────────┴─────────────────────────┘

Related Problems:
- Find All Anagrams in a String (LC 438)
- Permutation in String (LC 567)
- Minimum Window Substring (LC 76)

🧠 Interview Strategy

Step 1: "Need to find all concatenations of word permutations"
Step 2: "Words are same length, any order allowed"
Step 3: "Use sliding window with word-level matching"
Step 4: "Optimize: slide by wordLen, check wordLen offsets"
Step 5: "Track word frequencies in window"
Step 6: "Shrink when over-counted, reset on invalid"

Key Points to Mention:
- Fixed window size = wordLen × numWords
- Slide by full words, not characters
- Check wordLen different offsets (optimization)
- Use frequency maps for word counting
- Handle duplicates correctly
- Shrink window when word over-counted
- Reset window on invalid word
- Calculate starting index: i - (numWords-1) × wordLen
- Time: O(n × wordLen), much better than O(n × m)
- Space: O(m) for word frequencies

Why Multiple Offsets:
"Any valid concatenation starts at some position.
By checking offsets 0, 1, ..., wordLen-1 and sliding
by wordLen within each offset, we cover all positions
more efficiently."

This is a HARD problem - focus on correctness first! 🏆

📝 Quick Revision Notes

🎯 Core Concept:

Sliding window at WORD level, not character level. Check wordLen offsets, slide by wordLen. Match word frequencies.

⚡ Quick Implementation (Simplified):

public List<Integer> findSubstring(String s, String[] words) {
    List<Integer> result = new ArrayList<>();
    if (s.length() == 0 || words.length == 0) return result;

    int wordLen = words[0].length();
    int numWords = words.length;
    int windowSize = wordLen * numWords;

    // Build word frequency map
    Map<String, Integer> wordCount = new HashMap<>();
    for (String word : words) {
        wordCount.put(word, wordCount.getOrDefault(word, 0) + 1);
    }

    // Check wordLen different offsets
    for (int offset = 0; offset < wordLen; offset++) {
        Map<String, Integer> windowCount = new HashMap<>();
        int wordsUsed = 0;

        for (int i = offset; i <= s.length() - wordLen; i += wordLen) {
            String word = s.substring(i, i + wordLen);

            if (wordCount.containsKey(word)) {
                windowCount.put(word, windowCount.getOrDefault(word, 0) + 1);
                wordsUsed++;

                // Shrink if over-counted
                while (windowCount.get(word) > wordCount.get(word)) {
                    int leftIdx = i - (wordsUsed - 1) * wordLen;
                    String leftWord = s.substring(leftIdx, leftIdx + wordLen);
                    windowCount.put(leftWord, windowCount.get(leftWord) - 1);
                    wordsUsed--;
                }

                // Check if valid
                if (wordsUsed == numWords) {
                    result.add(i - (numWords - 1) * wordLen);
                }
            } else {
                // Invalid word, reset
                windowCount.clear();
                wordsUsed = 0;
            }
        }
    }

    return result;
}

🔑 Key Insights:

Window size: wordLen × numWords (fixed)
Slide by: wordLen (not 1!)
Check offsets: 0 to wordLen-1 (covers all positions)
Word frequency: Use HashMap, handle duplicates
Expand: Add word to windowCount, increment wordsUsed
Shrink: When word over-counted, remove from left
Reset: When invalid word found, clear window
Starting index: i - (numWords - 1) × wordLen
Time: O(n × wordLen) - much better than brute force
Space: O(m) - for word frequency maps

🎪 Memory Aid:

"Slide by WORDS not chars! Check wordLen OFFSETS! Match FREQUENCIES!"
Think: "WORDS have fixed length, slide efficiently!"

💡 The Optimization:

Brute Force: Check positions 0,1,2,3,4,5,6,7,8,...
             O(n) positions × O(m) words = O(n×m)

Optimized:   Check 3 offsets: [0,3,6,9,...], [1,4,7,10,...], [2,5,8,11,...]
             O(wordLen) offsets × O(n/wordLen) positions = O(n)

Huge improvement when wordLen is small!

Find All Anagrams in a String (LC 438)
Permutation in String (LC 567)
Minimum Window Substring (LC 76)
Longest Substring Without Repeating Characters (LC 3)

Happy practicing! 🎯

43. Substring with Concatenation of All Words

Problem Statement

🎨 Visual Understanding

The Problem Visualized

Key Observation: Fixed-Size Window with Word Matching

Visual Process

🎯 Approach 1: Brute Force

The Most Natural Thinking 💭

Implementation

✅ Approach 2: Sliding Window (Optimal)

The Breakthrough Insight 💡

Implementation

🔍 Step-by-Step Execution

Example: s = "barfoothefoobarman", words = ["foo","bar"]

Example 2: s = "wordgoodgoodgoodbestword", words = ["word","good","best","word"]

🔍 Edge Cases

Common Mistakes

🎯 Key Takeaways

⚡ Algorithm Comparison

🔑 The Core Insight

🎯 Pattern Recognition

🧠 Interview Strategy

📝 Quick Revision Notes

Related Patterns