285. Find Median from Data Stream
🔗 LeetCode Problem: 295. Find Median from Data Stream
📊 Difficulty: Hard
🏷️ Topics: Two Pointers, Design, Sorting, Heap (Priority Queue), Data Stream
Problem Statement
The median is the middle value in an ordered integer list. If the size of the list is even, there is no middle value, and the median is the mean of the two middle values.
- For example, for
arr = [2,3,4], the median is3. - For example, for
arr = [2,3], the median is(2 + 3) / 2 = 2.5.
Implement the MedianFinder class:
MedianFinder()initializes theMedianFinderobject.void addNum(int num)adds the integernumfrom the data stream to the data structure.double findMedian()returns the median of all elements so far. Answers within10^-5of the actual answer will be accepted.
Example 1:
Input
["MedianFinder", "addNum", "addNum", "findMedian", "addNum", "findMedian"]
[[], [1], [2], [], [3], []]
Output
[null, null, null, 1.5, null, 2.0]
Explanation
MedianFinder medianFinder = new MedianFinder();
medianFinder.addNum(1); // arr = [1]
medianFinder.addNum(2); // arr = [1, 2]
medianFinder.findMedian(); // return 1.5 (i.e., (1 + 2) / 2)
medianFinder.addNum(3); // arr = [1, 2, 3]
medianFinder.findMedian(); // return 2.0
Constraints:
- -10^5 <= num <= 10^5
- There will be at least one element in the data structure before calling findMedian.
- At most 3 * 10^4 calls will be made to addNum and findMedian.
Follow up:
- If all integer numbers from the stream are in the range [0, 100], how would you optimize your solution?
- If 99% of all integer numbers from the stream are in the range [0, 100], how would you optimize your solution?
🌟 Understanding the Problem
What is a median?
Median = middle value in SORTED order
Odd count:
arr = [1, 2, 3, 4, 5]
median = middle = 3
Even count:
arr = [1, 2, 3, 4]
median = average of two middles = (2 + 3) / 2 = 2.5
Key insight: Median is the "balance point"!
The Challenge:
Data comes as a STREAM:
Can't sort the entire array each time (too slow!)
Need to:
1. Add numbers efficiently
2. Find median efficiently
Can't just sort repeatedly - O(n log n) per add is too slow!
Key Observations:
1. Median divides data into two halves:
- Left half: smaller values
- Right half: larger values
2. For median, we need:
- Maximum of left half
- Minimum of right half
3. Two heaps can maintain this:
- Max-heap for left half (smaller values)
- Min-heap for right half (larger values)
4. Balance: Keep heaps equal size (or differ by 1)
🌟 The Natural Thinking Process
When you first see this problem:
Question: Find median from streaming data
First Thought: "Keep array sorted, find middle"
- Add new number
- Sort array
- Return middle element(s)
Is this optimal? NO (O(n log n) per add!), but it WORKS!
The Evolution:
BRUTE FORCE THINKING:
"Maintain sorted array"
"Add: insert in sorted position"
"Median: return middle"
Time: O(n) per add, O(1) for median
Problem: Linear search for insertion!
⬇
REALIZATION 1:
"Binary search for insertion position!"
Time: O(log n) search + O(n) insertion
Problem: Still O(n) to insert in array!
⬇
REALIZATION 2:
"For median, I only need TWO values:"
"- Max of left half"
"- Min of right half"
"I don't need FULL sorted order!"
⬇
OPTIMIZATION: Two Heaps
"Max-heap for left half (smaller values)"
"Min-heap for right half (larger values)"
"Keep them balanced!"
Time: O(log n) per add, O(1) for median ✨
🎯 Approach 1: Brute Force - Sorted List ⭐
The First Natural Solution
The Thought Process:
Step 1: Maintain a sorted list
Step 2: When adding, insert in correct position
Step 3: Median = middle element(s)
Simple but inefficient!
Visual Tracking - Complete Example
Operations: add(1), add(2), findMedian(), add(3), findMedian()
═══════════════════════════════════════════════════════════════
add(1)
═══════════════════════════════════════════════════════════════
List: []
Add 1: [1]
Sorted list: [1]
═══════════════════════════════════════════════════════════════
add(2)
═══════════════════════════════════════════════════════════════
List: [1]
Add 2: Find position (binary search)
2 > 1, insert at index 1
Sorted list: [1, 2]
═══════════════════════════════════════════════════════════════
findMedian()
═══════════════════════════════════════════════════════════════
List: [1, 2]
Size: 2 (even)
Median = (list[0] + list[1]) / 2
= (1 + 2) / 2
= 1.5 ✓
═══════════════════════════════════════════════════════════════
add(3)
═══════════════════════════════════════════════════════════════
List: [1, 2]
Add 3: Find position
3 > 2, insert at index 2
Sorted list: [1, 2, 3]
═══════════════════════════════════════════════════════════════
findMedian()
═══════════════════════════════════════════════════════════════
List: [1, 2, 3]
Size: 3 (odd)
Median = list[1]
= 2.0 ✓
Implementation
import java.util.*;
/**
* Brute Force: Sorted List
* Time: O(n) per add (insertion), O(1) per findMedian
* Space: O(n)
*/
class MedianFinder {
private List<Integer> nums;
public MedianFinder() {
nums = new ArrayList<>();
}
public void addNum(int num) {
// Binary search for insertion position
int left = 0, right = nums.size();
while (left < right) {
int mid = left + (right - left) / 2;
if (nums.get(mid) < num) {
left = mid + 1;
} else {
right = mid;
}
}
// Insert at position 'left'
nums.add(left, num); // O(n) to shift elements
}
public double findMedian() {
int n = nums.size();
if (n % 2 == 1) {
// Odd: return middle
return nums.get(n / 2);
} else {
// Even: return average of two middles
return (nums.get(n / 2 - 1) + nums.get(n / 2)) / 2.0;
}
}
}
⏰ Time: - addNum: O(log n) search + O(n) insertion = O(n) - findMedian: O(1)
💾 Space: O(n)
Why This is Inefficient:
Problem: Inserting into ArrayList is O(n)!
Example: 10,000 numbers
Each insert: O(10,000) to shift
Total: O(100,000,000) operations!
Can we avoid shifting?
💡 The AHA Moment
CRITICAL REALIZATION:
"For median, I don't need ENTIRE sorted array!"
"I only need TWO key values:"
1. Maximum of left half (smaller values)
2. Minimum of right half (larger values)
Example: [1, 2, 3, 4, 5]
Left half: [1, 2, 3] → need max = 3
Right half: [4, 5] → need min = 4
Median (odd) = 3 (max of left)
Example: [1, 2, 3, 4]
Left half: [1, 2] → need max = 2
Right half: [3, 4] → need min = 3
Median (even) = (2 + 3) / 2 = 2.5
How to maintain these two values efficiently?
MAX-HEAP for left half (gives max)
MIN-HEAP for right half (gives min)
Perfect! ✨
🎯 Approach 2: Two Heaps (Optimal) ⭐⭐⭐
The Optimal Solution
The Evolution of Thought:
Sorted list: O(n) per add
⬇
Observation: "Only need max(left) and min(right)!"
⬇
Solution: "Two heaps!"
⬇
Optimal: O(log n) per add, O(1) for median!
Understanding Two Heaps - Complete Reasoning
The Core Idea:
PRINCIPLE: Divide and Conquer the Data
═══════════════════════════════════════════════════════════════
Concept: Split all numbers into two halves
LEFT HALF RIGHT HALF
(smaller values) (larger values)
─────────── ───────────
stored in stored in
MAX-HEAP MIN-HEAP
(largest on top) (smallest on top)
Properties maintained:
1. All values in left ≤ all values in right
2. Sizes differ by at most 1
3. If sizes equal: median = (max(left) + min(right)) / 2
4. If sizes differ: median = top of larger heap
═══════════════════════════════════════════════════════════════
WHY Max-Heap for Left and Min-Heap for Right?
═══════════════════════════════════════════════════════════════
Left half (smaller values):
Need: Maximum value (closest to median)
Use: MAX-HEAP (max on top) ✓
Right half (larger values):
Need: Minimum value (closest to median)
Use: MIN-HEAP (min on top) ✓
Example:
Numbers: [1, 2, 3, 4, 5]
Left (max-heap): [1, 2, 3] → top = 3
Right (min-heap): [4, 5] → top = 4
Median (odd count): 3 (from left)
The two heap tops give us the median! ✓
═══════════════════════════════════════════════════════════════
INVARIANT: Balanced Heaps
═══════════════════════════════════════════════════════════════
We maintain:
|size(left) - size(right)| ≤ 1
Why?
So median is always at the "boundary"
If equal sizes: median = average of tops
If left bigger: median = left top
If right bigger: median = right top (we'll use left bigger)
═══════════════════════════════════════════════════════════════
ADD ALGORITHM
═══════════════════════════════════════════════════════════════
Strategy:
1. Add to appropriate heap
2. Rebalance if needed
Detailed steps:
Step 1: Choose heap
If left empty OR num ≤ left.top:
Add to left (smaller values)
Else:
Add to right (larger values)
Step 2: Rebalance
If left.size > right.size + 1:
Move left.top to right
Else if right.size > left.size:
Move right.top to left
Result: Always balanced! ✓
═══════════════════════════════════════════════════════════════
FIND MEDIAN ALGORITHM
═══════════════════════════════════════════════════════════════
Check sizes:
If left.size > right.size:
Median = left.top (odd count, extra in left)
Else:
Median = (left.top + right.top) / 2.0 (even count)
O(1) operation! ✓
Visual Tracking - Complete Example with Detailed Execution
Operations: add(1), add(2), findMedian(), add(3), findMedian()
═══════════════════════════════════════════════════════════════
INITIAL STATE
═══════════════════════════════════════════════════════════════
Left (max-heap): []
Right (min-heap): []
═══════════════════════════════════════════════════════════════
add(1)
═══════════════════════════════════════════════════════════════
Step 1: Choose heap
Left empty? YES
Add to left
Left: [1]
Right: []
Step 2: Check balance
left.size = 1
right.size = 0
1 > 0 + 1? NO
Balanced! ✓
State:
Left (max-heap): [1] ← top = 1
Right (min-heap): []
═══════════════════════════════════════════════════════════════
add(2)
═══════════════════════════════════════════════════════════════
Step 1: Choose heap
Left empty? NO
2 ≤ left.top (1)? NO
Add to right
Left: [1]
Right: [2]
Step 2: Check balance
left.size = 1
right.size = 1
Balanced! ✓
State:
Left (max-heap): [1] ← top = 1
Right (min-heap): [2] ← top = 2
Visual:
[1] | [2]
↑ ↑
max min
(smaller) | (larger)
═══════════════════════════════════════════════════════════════
findMedian()
═══════════════════════════════════════════════════════════════
Sizes: left = 1, right = 1 (equal)
Median = (left.top + right.top) / 2
= (1 + 2) / 2
= 1.5 ✓
═══════════════════════════════════════════════════════════════
add(3)
═══════════════════════════════════════════════════════════════
Step 1: Choose heap
Left empty? NO
3 ≤ left.top (1)? NO
Add to right
Left: [1]
Right: [2, 3] → min-heap, so 2 on top
Step 2: Check balance
left.size = 1
right.size = 2
right.size > left.size? YES (2 > 1)
REBALANCE: Move right.top to left
Remove from right: 2
Add to left: 2
Left: [1, 2] → max-heap, so 2 on top
Right: [3]
State after rebalance:
Left (max-heap): [2, 1] ← top = 2
2
/
1
Right (min-heap): [3] ← top = 3
Visual:
[1, 2] | [3]
↑ ↑
max min
═══════════════════════════════════════════════════════════════
findMedian()
═══════════════════════════════════════════════════════════════
Sizes: left = 2, right = 1
left.size > right.size? YES
Median = left.top
= 2.0 ✓
═══════════════════════════════════════════════════════════════
SUMMARY OF STATE TRANSITIONS
═══════════════════════════════════════════════════════════════
add(1):
Left: [1] | Right: []
add(2):
Left: [1] | Right: [2]
add(3):
Before rebalance: Left: [1] | Right: [2,3]
After rebalance: Left: [2,1] | Right: [3]
Key insight:
✓ Always balanced (sizes differ by ≤ 1)
✓ Left top = max of smaller half
✓ Right top = min of larger half
✓ Median always available at tops! ✨
Implementation with Detailed Code Walkthrough
import java.util.*;
/**
* Two Heaps - Optimal Solution
* Time: O(log n) per addNum, O(1) per findMedian
* Space: O(n)
*/
class MedianFinder {
// Max-heap for left half (smaller values)
// Top = largest of small values
private PriorityQueue<Integer> left;
// Min-heap for right half (larger values)
// Top = smallest of large values
private PriorityQueue<Integer> right;
public MedianFinder() {
// Max-heap: reverse order (larger values have higher priority)
left = new PriorityQueue<>((a, b) -> Integer.compare(b, a));
// Min-heap: natural order (smaller values have higher priority)
right = new PriorityQueue<>();
}
/**
* Add number to data structure
*
* Strategy:
* 1. Add to appropriate heap
* 2. Rebalance to maintain invariant
*
* Invariant: |left.size - right.size| ≤ 1
*/
public void addNum(int num) {
// Step 1: Add to appropriate heap
if (left.isEmpty() || num <= left.peek()) {
// Number belongs in left half (smaller values)
left.offer(num);
} else {
// Number belongs in right half (larger values)
right.offer(num);
}
// Step 2: Rebalance if needed
// We maintain: left.size >= right.size
// And: left.size - right.size <= 1
if (left.size() > right.size() + 1) {
// Left has too many, move one to right
right.offer(left.poll());
} else if (right.size() > left.size()) {
// Right has more, move one to left
left.offer(right.poll());
}
// After rebalance:
// - If total odd: left has 1 more
// - If total even: both equal
}
/**
* Find median in O(1)
*
* Cases:
* - Odd count: left has extra, return left.top
* - Even count: return average of tops
*/
public double findMedian() {
if (left.size() > right.size()) {
// Odd count: extra element in left
return left.peek();
} else {
// Even count: average of two middles
return (left.peek() + right.peek()) / 2.0;
}
}
}
Detailed Code Walkthrough:
Example: add(5), add(15), add(1), add(3)
═══════════════════════════════════════════════════════════════
add(5)
═══════════════════════════════════════════════════════════════
Code execution:
left.isEmpty()? YES
→ left.offer(5)
left.size() = 1
right.size() = 0
1 > 0 + 1? NO → no rebalance
State:
left: [5]
right: []
═══════════════════════════════════════════════════════════════
add(15)
═══════════════════════════════════════════════════════════════
Code execution:
left.isEmpty()? NO
15 <= left.peek() (5)? NO
→ right.offer(15)
left.size() = 1
right.size() = 1
1 > 1 + 1? NO
1 > 1? NO → no rebalance
State:
left: [5] ← max-heap top = 5
right: [15] ← min-heap top = 15
═══════════════════════════════════════════════════════════════
add(1)
═══════════════════════════════════════════════════════════════
Code execution:
left.isEmpty()? NO
1 <= left.peek() (5)? YES
→ left.offer(1)
left.size() = 2
right.size() = 1
2 > 1 + 1? NO → no rebalance
State:
left: [5, 1] ← max-heap, top = 5
5
/
1
right: [15]
═══════════════════════════════════════════════════════════════
add(3)
═══════════════════════════════════════════════════════════════
Code execution:
left.isEmpty()? NO
3 <= left.peek() (5)? YES
→ left.offer(3)
left.size() = 3
right.size() = 1
3 > 1 + 1? YES → REBALANCE!
Rebalance:
right.offer(left.poll())
left.poll() removes 5 (top of max-heap)
right.offer(5)
After rebalance:
left.size() = 2
right.size() = 2
State:
left: [3, 1] ← max-heap, top = 3
3
/
1
right: [5, 15] ← min-heap, top = 5
5
\
15
findMedian():
left.size() > right.size()? NO (2 == 2)
→ return (left.peek() + right.peek()) / 2.0
→ return (3 + 5) / 2.0 = 4.0 ✓
All numbers sorted: [1, 3, 5, 15]
Median of 4 numbers: (3 + 5) / 2 = 4.0 ✓ CORRECT!
⏰ Time Complexity:
addNum: O(log n)
- Heap insertion: O(log n)
- Rebalance (at most 1 move): O(log n)
- Total: O(log n) ✓
findMedian: O(1)
- Just peek at tops
- Total: O(1) ✓
For n operations:
Total: O(n log n) ✓
💾 Space: O(n) - Store all numbers
Why This is Optimal:
✓ O(log n) per add (vs O(n) sorted list)
✓ O(1) per median (same as sorted list)
✓ No shifting elements (heap operations)
✓ Elegant balance maintenance
Example: 10,000 numbers
Sorted list: 10,000 × O(10,000) = O(100M) ops
Two heaps: 10,000 × O(log 10,000) ≈ O(130K) ops
770x faster! 🚀
📊 Approach Comparison
┌────────────────┬────────────┬──────────┬─────────────────────┐
│ Approach │ addNum │ Median │ Key Insight │
├────────────────┼────────────┼──────────┼─────────────────────┤
│ Sorted List │ O(n) │ O(1) │ Maintain sorted │
│ Two Heaps │ O(log n) │ O(1) │ Only need boundary │
└────────────────┴────────────┴──────────┴─────────────────────┘
For streaming data with frequent adds:
Two Heaps is MUCH better! ✨
💡 Key Learnings
1. MEDIAN INSIGHT:
✓ Don't need full sorted order
✓ Only need max(left) and min(right)
✓ Two heaps maintain these efficiently
2. HEAP CHOICES:
✓ Max-heap for left (smaller values)
✓ Min-heap for right (larger values)
✓ Tops give median boundary
3. BALANCE INVARIANT:
✓ Keep sizes equal or differ by 1
✓ If odd count: left has 1 extra
✓ If even count: both equal
4. PATTERN:
✓ Appears in: sliding window median
✓ Two heaps for maintaining "middle"
✓ Classic streaming data problem
5. COMPLEXITY:
✓ Add: O(log n) - heap operations
✓ Median: O(1) - just peek
✓ Optimal for streaming!
⚠️ Common Mistakes
// Mistake 1: Wrong heap types
PriorityQueue<Integer> left = new PriorityQueue<>(); // ❌ Min-heap!
PriorityQueue<Integer> right = new PriorityQueue<>((a,b) -> b - a); // ❌ Max-heap!
// ✓ CORRECT
PriorityQueue<Integer> left = new PriorityQueue<>((a,b) -> b - a); // Max-heap
PriorityQueue<Integer> right = new PriorityQueue<>(); // Min-heap
// Mistake 2: Not rebalancing
public void addNum(int num) {
left.offer(num); // ❌ Forgot to rebalance!
}
// ✓ CORRECT - Always rebalance after adding
// Mistake 3: Wrong median calculation
public double findMedian() {
return (left.peek() + right.peek()) / 2.0; // ❌ What if odd count?
}
// ✓ CORRECT - Check size difference
// Mistake 4: Integer division
return (left.peek() + right.peek()) / 2; // ❌ Returns integer!
// ✓ CORRECT
return (left.peek() + right.peek()) / 2.0; // Returns double
🎯 Pattern Recognition
Problem Type: Streaming Median
Core Pattern: Two Heaps for Maintaining Middle
When to Apply:
✓ Find median in streaming data
✓ Need quick adds and median queries
✓ Can't afford full sort each time
✓ Maintaining "middle" of data
Recognition Keywords:
- "median"
- "data stream"
- "running median"
- "online algorithm"
Similar Problems:
- Sliding Window Median (LC 480)
- IPO (LC 502) - uses similar two-heap pattern
- Find K-th Smallest Pair Distance
Key Components:
┌────────────────────────────────────────────┐
│ Max-heap: Left half (smaller values) │
│ Min-heap: Right half (larger values) │
│ Balance: Sizes differ by ≤ 1 │
│ Median: From heap tops in O(1) │
└────────────────────────────────────────────┘
📝 Quick Revision
🎯 Core: Find median from streaming data. Naive: Sort each time O(n log n). Optimal: Two heaps! Max-heap for left (smaller), min-heap for right (larger). Balance: sizes differ by ≤ 1. Median from tops in O(1)!
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.PriorityQueue;
// // Approach 1: Using sorting or binary search
// class MedianFinder {
// ArrayList<Integer> list;
// public MedianFinder() {
// // step 1: initialize freshly
// list = new ArrayList<>();
// }
// public void addNum(int num) {
// // step 2: find correct place for num to insert
// int left = 0;
// int right = list.size() - 1;
// int index = -1;
// while (left <= right) {
// int mid = right + (left - right) / 2;
// if (list.get(mid) == num) {
// index = mid;
// } else if (list.get(mid) < num) {
// left = mid + 1;
// } else {
// right = mid - 1;
// }
// }
// if (index == -1) {
// index = left;
// }
// list.add(index, num);
// }
// public double findMedian() {
// // step 3: if odd, mid element. Else, mean of 2 elements
// int size = list.size();
// if (size % 2 == 1) {
// return list.get(size / 2);
// }
// // [1,2,3,4] => (4/2-1, 4/2) => (1,2)
// return (list.get(size / 2 - 1) + list.get(size / 2)) / 2.0;
// }
// }
// Approach 2: Heap
class MedianFinder {
PriorityQueue<Integer> left;
PriorityQueue<Integer> right;
public MedianFinder() {
// step 1: create heaps
// left heap (max heap) maintains smaller elements and
// right heap (min heap) maintains larger elements
left = new PriorityQueue<>(Collections.reverseOrder());
right = new PriorityQueue<>();
}
public void addNum(int num) {
// Step 2: Choose heap
// if left empty OR num ≤ left.top => Add to left (smaller values)
// else Add to right (larger values)
if (left.size() == 0 || left.peek() >= num) {
left.offer(num);
} else {
right.offer(num);
}
// step 3: rebalance immediately ((|size(left) - size(right)| ≤ 1))
// Condition for constant time median retrieval:
// The left heap is maintained with the higher size (at most one extra element).
// if still left is greater, push to right
if (left.size() - 1 > right.size()) {
right.offer(left.poll());
} else if (left.size() < right.size()) {
// else take from right and put to left
left.offer(right.poll());
}
}
public double findMedian() {
// step 4:
// If sizes equal: median = (max(left) + min(right)) / 2
// If sizes differ: median = top of larger heap
if (left.size() == right.size()) {
return (left.peek() + right.peek()) / 2.0;
} else if (left.size() > right.size()) {
return left.peek();
} else {
return right.peek();
}
}
}
public class Solution {
public static void main(String[] args) {
MedianFinder mf = new MedianFinder();
mf.addNum(1);
mf.addNum(2);
System.out.println(mf.findMedian());
mf.addNum(3);
System.out.println(mf.findMedian());
}
}
🔑 Key: Max-heap left, min-heap right. Balance sizes. Tops = median boundary. Add O(log n), median O(1). ✨
Happy practicing! 🎯💪✨
Note: This problem demonstrates the classic two-heap pattern for maintaining median in streaming data! You learn: (1) WHY two heaps (only need boundary values), (2) WHY max-heap for left and min-heap for right (efficient access to median), (3) Balance invariant maintenance, (4) Pattern appears in many streaming/online algorithm problems. The detailed code walkthroughs show exact heap states at each step - true mastery through visualization! 💪✨🏆