43. Substring with Concatenation of All Words
π LeetCode Problem: 30. Substring with Concatenation of All Words
π Difficulty: Hard
π·οΈ Topics: Hash Table, String, Sliding Window
Problem Statement
You are given a string s and an array of strings words. All the strings of words are of the same length.
A concatenated substring in s is a substring that contains all the strings of any permutation of words concatenated.
- For example, if
words = ["ab","cd","ef"], then"abcdef","abefcd","cdabef","cdefab","efabcd", and"efcdab"are all concatenated strings."acdbef"is not a concatenated substring because it is not the concatenation of any permutation ofwords.
Return the starting indices of all the concatenated substrings in s. You can return the answer in any order.
Example 1:
Input: s = "barfoothefoobarman", words = ["foo","bar"]
Output: [0,9]
Explanation:
The substring starting at 0 is "barfoo". It is the concatenation of ["bar","foo"] which is a permutation of words.
The substring starting at 9 is "foobar". It is the concatenation of ["foo","bar"] which is a permutation of words.
Example 2:
Input: s = "wordgoodgoodgoodbestword", words = ["word","good","best","word"]
Output: []
Explanation: There is no concatenated substring.
Example 3:
Input: s = "barfoofoobarthefoobarman", words = ["bar","foo","the"]
Output: [6,9,12]
Constraints:
- 1 <= s.length <= 10^4
- 1 <= words.length <= 5000
- 1 <= words[i].length <= 30
- s and words[i] consist of lowercase English letters.
π¨ Visual Understanding
The Problem Visualized
s = "barfoothefoobarman", words = ["foo","bar"]
Each word has length 3
Total concatenation length = 2 words Γ 3 = 6
Check substrings of length 6:
Index 0: "barfoo"
"bar" + "foo" = words β (permutation)
Index 1: "arfoot"
Not valid (can't split into valid words)
Index 2: "rfooth"
Not valid
Index 9: "foobar"
"foo" + "bar" = words β (permutation)
Answer: [0, 9]
Key Observation: Fixed-Size Window with Word Matching
This is a HARD variation of sliding window because:
1. Fixed window size = wordLen Γ numWords
2. Must split window into words of exact length
3. Each word must match exactly (with frequency)
4. All words in `words` must be used exactly once
Example:
s = "barfoothefoobarman"
words = ["foo", "bar"]
wordLen = 3
windowSize = 3 Γ 2 = 6
Window [0:6]: "barfoo"
Split: ["bar", "foo"]
Match words? {"bar":1, "foo":1} = words β
Window [1:7]: "arfoot"
Split: ["arf", "oot"]
Match words? {"arf":1, "oot":1} β words β
Visual Process
s = "barfoothefoobarman", words = ["foo","bar"]
wordLen = 3, numWords = 2, windowSize = 6
words frequency: {foo:1, bar:1}
Position 0: "barfoo"
Split: "bar", "foo"
Frequency: {bar:1, foo:1}
Match? YES β β Add 0 to result
Position 1: "arfoot"
Split: "arf", "oot"
Frequency: {arf:1, oot:1}
Match? NO β
Position 2: "rfooth"
Split: "rfo", "oth"
Match? NO β
...continue checking...
Position 9: "foobar"
Split: "foo", "bar"
Frequency: {foo:1, bar:1}
Match? YES β β Add 9 to result
Result: [0, 9]
π― Approach 1: Brute Force
The Most Natural Thinking π
Core Logic:
Check every possible starting position:
1. For each index i in s
2. Extract substring of length (wordLen Γ numWords)
3. Split into words of wordLen
4. Check if words match (with frequency)
5. If match, add i to result
Why This Works: - Checks all possibilities - Guaranteed to find all answers - Easy to understand
Why This Fails Requirements: - Time complexity O(n Γ m Γ wordLen) where n = s.length, m = words.length - Recounts frequencies for every position - Too slow for large inputs - Not optimal for interviews
Implementation
public List<Integer> findSubstring(String s, String[] words) {
List<Integer> result = new ArrayList<>();
if (s == null || s.length() == 0 || words == null || words.length == 0) {
return result;
}
int wordLen = words[0].length();
int numWords = words.length;
int windowSize = wordLen * numWords;
// Build word frequency map
Map<String, Integer> wordCount = new HashMap<>();
for (String word : words) {
wordCount.put(word, wordCount.getOrDefault(word, 0) + 1);
}
// Check every position
for (int i = 0; i <= s.length() - windowSize; i++) {
if (isMatch(s, i, wordLen, numWords, wordCount)) {
result.add(i);
}
}
return result;
}
private boolean isMatch(String s, int start, int wordLen, int numWords,
Map<String, Integer> wordCount) {
Map<String, Integer> seen = new HashMap<>();
for (int j = 0; j < numWords; j++) {
int wordStart = start + j * wordLen;
String word = s.substring(wordStart, wordStart + wordLen);
seen.put(word, seen.getOrDefault(word, 0) + 1);
// Check if word is valid and not over-counted
if (!wordCount.containsKey(word) ||
seen.get(word) > wordCount.get(word)) {
return false;
}
}
return true;
}
β° Time: O(n Γ m Γ wordLen) - n positions, each checks m words
πΎ Space: O(m) - for frequency maps
β Approach 2: Sliding Window (Optimal)
The Breakthrough Insight π‘
Core Logic:
Use sliding window with word-by-word movement:
Key optimization:
Instead of checking every position, we only need to check
wordLen different starting positions (0, 1, 2, ..., wordLen-1)
For each starting offset:
1. Build a sliding window of numWords words
2. Slide word by word (not character by character)
3. Maintain word frequency in current window
4. Check if frequency matches
This reduces redundant work!
Visual Explanation:
s = "barfoothefoobarman", words = ["foo","bar"]
wordLen = 3, numWords = 2
Instead of checking positions 0,1,2,3,4,5,6,...
We check 3 separate sequences:
Offset 0: positions 0, 3, 6, 9, 12, 15
Window at 0: "bar" + "foo" = "barfoo" β
Window at 3: "foo" + "the" = "foothe" β
Window at 6: "the" + "foo" = "thefoo" β
Window at 9: "foo" + "bar" = "foobar" β
Offset 1: positions 1, 4, 7, 10, 13, 16
All invalid
Offset 2: positions 2, 5, 8, 11, 14, 17
All invalid
This way we only move by wordLen, not by 1!
Why This Works:
Any valid concatenation must start at some position i.
If we check offsets 0, 1, 2, ..., wordLen-1,
we cover all possible starting positions.
Within each offset, we slide by full words,
which is more efficient than character-by-character.
Time saved:
- Brute force: n positions Γ m words = O(n Γ m)
- Optimized: wordLen offsets Γ (n/wordLen) positions = O(n Γ wordLen)
- Since wordLen is usually small (β€ 30), this is much better!
Implementation
public List<Integer> findSubstring(String s, String[] words) {
List<Integer> result = new ArrayList<>();
if (s == null || s.length() == 0 || words == null || words.length == 0) {
return result;
}
int wordLen = words[0].length();
int numWords = words.length;
int windowSize = wordLen * numWords;
if (s.length() < windowSize) {
return result;
}
// Build word frequency map
Map<String, Integer> wordCount = new HashMap<>();
for (String word : words) {
wordCount.put(word, wordCount.getOrDefault(word, 0) + 1);
}
// Check wordLen different starting offsets
for (int offset = 0; offset < wordLen; offset++) {
slidingWindow(s, offset, wordLen, numWords, wordCount, result);
}
return result;
}
private void slidingWindow(String s, int offset, int wordLen, int numWords,
Map<String, Integer> wordCount, List<Integer> result) {
Map<String, Integer> windowCount = new HashMap<>();
int wordsUsed = 0;
// Process words starting from offset
for (int i = offset; i <= s.length() - wordLen; i += wordLen) {
String word = s.substring(i, i + wordLen);
// Expand window
if (wordCount.containsKey(word)) {
windowCount.put(word, windowCount.getOrDefault(word, 0) + 1);
wordsUsed++;
// Shrink window if word is over-counted
while (windowCount.get(word) > wordCount.get(word)) {
int leftIdx = i - (wordsUsed - 1) * wordLen;
String leftWord = s.substring(leftIdx, leftIdx + wordLen);
windowCount.put(leftWord, windowCount.get(leftWord) - 1);
wordsUsed--;
}
// Check if we have a valid window
if (wordsUsed == numWords) {
result.add(i - (numWords - 1) * wordLen);
}
} else {
// Invalid word, reset window
windowCount.clear();
wordsUsed = 0;
}
}
}
β° Time: O(n Γ wordLen) - wordLen offsets, each processes n chars
πΎ Space: O(m) - for frequency maps
π Step-by-Step Execution
Example: s = "barfoothefoobarman", words = ["foo","bar"]
Initial Setup:
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
s = "barfoothefoobarman"
words = ["foo", "bar"]
wordLen = 3
numWords = 2
windowSize = 6
wordCount = {foo:1, bar:1}
OFFSET 0: Check positions 0, 3, 6, 9, 12, 15
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Position 0: word = "bar"
windowCount = {bar:1}
wordsUsed = 1
Valid? wordsUsed (1) < numWords (2)
Position 3: word = "foo"
windowCount = {bar:1, foo:1}
wordsUsed = 2
Valid? wordsUsed (2) == numWords (2) β
Starting index = 3 - (2-1)Γ3 = 0
Add 0 to result β
Position 6: word = "the"
"the" not in wordCount
Reset: windowCount = {}, wordsUsed = 0
Position 9: word = "foo"
windowCount = {foo:1}
wordsUsed = 1
Position 12: word = "bar"
windowCount = {foo:1, bar:1}
wordsUsed = 2
Valid? wordsUsed (2) == numWords (2) β
Starting index = 12 - (2-1)Γ3 = 9
Add 9 to result β
Position 15: word = "man"
"man" not in wordCount
Reset
OFFSET 1: Check positions 1, 4, 7, 10, 13, 16
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Position 1: word = "arf"
"arf" not in wordCount
Reset
Position 4: word = "oot"
"oot" not in wordCount
Reset
...all invalid
OFFSET 2: Check positions 2, 5, 8, 11, 14, 17
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Position 2: word = "rfo"
"rfo" not in wordCount
Reset
...all invalid
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
π― FINAL RESULT: [0, 9]
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Example 2: s = "wordgoodgoodgoodbestword", words = ["word","good","best","word"]
Setup:
wordLen = 4, numWords = 4, windowSize = 16
wordCount = {word:2, good:1, best:1}
Need to find: any permutation of ["word","good","best","word"]
OFFSET 0: Check positions 0, 4, 8, 12, 16, 20
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Position 0: "word" β windowCount = {word:1}, wordsUsed = 1
Position 4: "good" β windowCount = {word:1, good:1}, wordsUsed = 2
Position 8: "good" β windowCount = {word:1, good:2}, wordsUsed = 3
"good" over-counted! good:2 > wordCount[good]:1
Shrink: remove "word" from left
windowCount = {good:2}, wordsUsed = 2
Still over-counted!
Shrink: remove "good" from left
windowCount = {good:1}, wordsUsed = 1
Position 12: "good" β windowCount = {good:2}, wordsUsed = 2
Over-counted again, shrink...
...no valid concatenation found
Result: []
π Edge Cases
Case 1: Empty string
Input: s = "", words = ["foo"]
Output: []
Case 2: Words longer than string
Input: s = "bar", words = ["barr"]
Output: []
Case 3: Single word
Input: s = "barfoo", words = ["bar"]
Output: [0]
Case 4: Duplicate words
Input: s = "wordgoodgoodgoodbestword", words = ["word","good","best","good"]
Output: [8]
Explanation: "goodgoodbestword"
Case 5: All same words
Input: s = "aaaaaaaa", words = ["aa","aa","aa"]
Output: [0,1,2]
Case 6: No match
Input: s = "barfoo", words = ["foo","bar","the"]
Output: []
Case 7: Entire string matches
Input: s = "foobar", words = ["foo","bar"]
Output: [0]
Case 8: Multiple overlapping matches
Input: s = "barfoofoobarthefoobarman", words = ["bar","foo","the"]
Output: [6,9,12]
Common Mistakes
Mistake 1: Not handling duplicate words
β Wrong:
// Using Set instead of frequency map
Set<String> words = new HashSet<>(Arrays.asList(words));
β Right:
Map<String, Integer> wordCount = new HashMap<>();
for (String word : words) {
wordCount.put(word, wordCount.getOrDefault(word, 0) + 1);
}
Reason: Words can appear multiple times
Mistake 2: Checking every character position
β Wrong:
for (int i = 0; i < s.length(); i++) {
// Check every position
}
β Right:
for (int offset = 0; offset < wordLen; offset++) {
// Only check wordLen offsets
}
Reason: Optimize by sliding word-by-word
Mistake 3: Not resetting on invalid word
β Wrong:
if (!wordCount.containsKey(word)) {
continue; // Just skip
}
β Right:
if (!wordCount.containsKey(word)) {
windowCount.clear();
wordsUsed = 0;
}
Reason: Invalid word breaks the concatenation
Mistake 4: Wrong starting index calculation
β Wrong:
result.add(i); // Current position
β Right:
result.add(i - (numWords - 1) * wordLen);
Reason: Need start of window, not end
Mistake 5: Not handling over-counting
β Wrong:
windowCount.put(word, windowCount.get(word) + 1);
// Don't check if over-counted
β Right:
windowCount.put(word, windowCount.get(word) + 1);
while (windowCount.get(word) > wordCount.get(word)) {
// Shrink from left
}
Reason: Must maintain exact frequencies
Mistake 6: Wrong window size check
β Wrong:
if (s.length() < wordLen) return result;
β Right:
if (s.length() < wordLen * numWords) return result;
Reason: Need space for all words
π― Key Takeaways
β‘ Algorithm Comparison
Approach 1: Brute Force
Time: O(n Γ m Γ wordLen)
Space: O(m)
Use: Only for understanding
Approach 2: Sliding Window (RECOMMENDED)
Time: O(n Γ wordLen)
Space: O(m)
Use: Optimal solution, best for interviews
π The Core Insight
Problem Breakdown:
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
This is a HARD sliding window problem because:
1. Fixed window size (wordLen Γ numWords)
2. Must split into exact-length words
3. Word frequency matching required
4. All words must be used exactly once
Key Optimization:
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Instead of checking every position (0,1,2,3,...):
Check only wordLen different offsets (0 to wordLen-1)
For each offset, slide by full words:
- Offset 0: 0, wordLen, 2ΓwordLen, 3ΓwordLen, ...
- Offset 1: 1, 1+wordLen, 1+2ΓwordLen, ...
- Offset 2: 2, 2+wordLen, 2+2ΓwordLen, ...
This covers all positions but slides more efficiently!
Algorithm Template:
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
for offset in 0 to wordLen-1:
windowCount = {}
wordsUsed = 0
for i from offset by wordLen:
word = s[i:i+wordLen]
if word in wordCount:
add word to windowCount
wordsUsed++
// Shrink if over-counted
while windowCount[word] > wordCount[word]:
remove left word
wordsUsed--
// Check if valid
if wordsUsed == numWords:
add starting index to result
else:
// Invalid word, reset
clear windowCount
wordsUsed = 0
Key Techniques:
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
1. Word frequency map (not character)
2. Slide by wordLen, not by 1
3. Multiple offset starting points
4. Shrink window when over-counting
5. Reset window on invalid word
6. Calculate starting index correctly
π― Pattern Recognition
Problem Type: Concatenation Matching with Permutation
Core Pattern: Fixed-Size Sliding Window + Word Frequency Matching
When to Apply:
β Finding all occurrences of concatenations
β Words can be in any order (permutation)
β All words same length
β Must use each word exact number of times
β Want O(n) time (with small constant)
Unique Characteristics:
- Works with WORDS not characters
- Slides by wordLen not 1
- Checks wordLen different offsets
- Frequency matching at word level
Comparison to Similar Problems:
ββββββββββββββββββββββββββββββββββ¬ββββββββββββββββββββββββββ
β Find All Anagrams (LC 438) β This Problem (LC 30) β
ββββββββββββββββββββββββββββββββββΌββββββββββββββββββββββββββ€
β Character frequency β Word frequency β
β Slide by 1 β Slide by wordLen β
β Single offset β Multiple offsets β
β Pattern length fixed β Total length fixed β
ββββββββββββββββββββββββββββββββββ΄ββββββββββββββββββββββββββ
Related Problems:
- Find All Anagrams in a String (LC 438)
- Permutation in String (LC 567)
- Minimum Window Substring (LC 76)
π§ Interview Strategy
Step 1: "Need to find all concatenations of word permutations"
Step 2: "Words are same length, any order allowed"
Step 3: "Use sliding window with word-level matching"
Step 4: "Optimize: slide by wordLen, check wordLen offsets"
Step 5: "Track word frequencies in window"
Step 6: "Shrink when over-counted, reset on invalid"
Key Points to Mention:
- Fixed window size = wordLen Γ numWords
- Slide by full words, not characters
- Check wordLen different offsets (optimization)
- Use frequency maps for word counting
- Handle duplicates correctly
- Shrink window when word over-counted
- Reset window on invalid word
- Calculate starting index: i - (numWords-1) Γ wordLen
- Time: O(n Γ wordLen), much better than O(n Γ m)
- Space: O(m) for word frequencies
Why Multiple Offsets:
"Any valid concatenation starts at some position.
By checking offsets 0, 1, ..., wordLen-1 and sliding
by wordLen within each offset, we cover all positions
more efficiently."
This is a HARD problem - focus on correctness first! π
π Quick Revision Notes
π― Core Concept:
Sliding window at WORD level, not character level. Check wordLen offsets, slide by wordLen. Match word frequencies.
β‘ Quick Implementation (Simplified):
public List<Integer> findSubstring(String s, String[] words) {
List<Integer> result = new ArrayList<>();
if (s.length() == 0 || words.length == 0) return result;
int wordLen = words[0].length();
int numWords = words.length;
int windowSize = wordLen * numWords;
// Build word frequency map
Map<String, Integer> wordCount = new HashMap<>();
for (String word : words) {
wordCount.put(word, wordCount.getOrDefault(word, 0) + 1);
}
// Check wordLen different offsets
for (int offset = 0; offset < wordLen; offset++) {
Map<String, Integer> windowCount = new HashMap<>();
int wordsUsed = 0;
for (int i = offset; i <= s.length() - wordLen; i += wordLen) {
String word = s.substring(i, i + wordLen);
if (wordCount.containsKey(word)) {
windowCount.put(word, windowCount.getOrDefault(word, 0) + 1);
wordsUsed++;
// Shrink if over-counted
while (windowCount.get(word) > wordCount.get(word)) {
int leftIdx = i - (wordsUsed - 1) * wordLen;
String leftWord = s.substring(leftIdx, leftIdx + wordLen);
windowCount.put(leftWord, windowCount.get(leftWord) - 1);
wordsUsed--;
}
// Check if valid
if (wordsUsed == numWords) {
result.add(i - (numWords - 1) * wordLen);
}
} else {
// Invalid word, reset
windowCount.clear();
wordsUsed = 0;
}
}
}
return result;
}
π Key Insights:
- Window size:
wordLen Γ numWords(fixed) - Slide by:
wordLen(not 1!) - Check offsets: 0 to wordLen-1 (covers all positions)
- Word frequency: Use HashMap, handle duplicates
- Expand: Add word to windowCount, increment wordsUsed
- Shrink: When word over-counted, remove from left
- Reset: When invalid word found, clear window
- Starting index:
i - (numWords - 1) Γ wordLen - Time: O(n Γ wordLen) - much better than brute force
- Space: O(m) - for word frequency maps
πͺ Memory Aid:
"Slide by WORDS not chars! Check wordLen OFFSETS! Match FREQUENCIES!"
Think: "WORDS have fixed length, slide efficiently!"
π‘ The Optimization:
Brute Force: Check positions 0,1,2,3,4,5,6,7,8,...
O(n) positions Γ O(m) words = O(nΓm)
Optimized: Check 3 offsets: [0,3,6,9,...], [1,4,7,10,...], [2,5,8,11,...]
O(wordLen) offsets Γ O(n/wordLen) positions = O(n)
Huge improvement when wordLen is small!
Related Patterns
- Find All Anagrams in a String (LC 438)
- Permutation in String (LC 567)
- Minimum Window Substring (LC 76)
- Longest Substring Without Repeating Characters (LC 3)
Happy practicing! π―