Skip to content

285. Find Median from Data Stream

🔗 LeetCode Problem: 295. Find Median from Data Stream
📊 Difficulty: Hard
🏷️ Topics: Two Pointers, Design, Sorting, Heap (Priority Queue), Data Stream

Problem Statement

The median is the middle value in an ordered integer list. If the size of the list is even, there is no middle value, and the median is the mean of the two middle values.

  • For example, for arr = [2,3,4], the median is 3.
  • For example, for arr = [2,3], the median is (2 + 3) / 2 = 2.5.

Implement the MedianFinder class:

  • MedianFinder() initializes the MedianFinder object.
  • void addNum(int num) adds the integer num from the data stream to the data structure.
  • double findMedian() returns the median of all elements so far. Answers within 10^-5 of the actual answer will be accepted.

Example 1:

Input
["MedianFinder", "addNum", "addNum", "findMedian", "addNum", "findMedian"]
[[], [1], [2], [], [3], []]
Output
[null, null, null, 1.5, null, 2.0]

Explanation
MedianFinder medianFinder = new MedianFinder();
medianFinder.addNum(1);    // arr = [1]
medianFinder.addNum(2);    // arr = [1, 2]
medianFinder.findMedian(); // return 1.5 (i.e., (1 + 2) / 2)
medianFinder.addNum(3);    // arr = [1, 2, 3]
medianFinder.findMedian(); // return 2.0

Constraints: - -10^5 <= num <= 10^5 - There will be at least one element in the data structure before calling findMedian. - At most 3 * 10^4 calls will be made to addNum and findMedian.

Follow up: - If all integer numbers from the stream are in the range [0, 100], how would you optimize your solution? - If 99% of all integer numbers from the stream are in the range [0, 100], how would you optimize your solution?


🌟 Understanding the Problem

What is a median?

Median = middle value in SORTED order

Odd count:
  arr = [1, 2, 3, 4, 5]
  median = middle = 3

Even count:
  arr = [1, 2, 3, 4]
  median = average of two middles = (2 + 3) / 2 = 2.5

Key insight: Median is the "balance point"!

The Challenge:

Data comes as a STREAM:
  Can't sort the entire array each time (too slow!)

Need to:
  1. Add numbers efficiently
  2. Find median efficiently

Can't just sort repeatedly - O(n log n) per add is too slow!

Key Observations:

1. Median divides data into two halves:
   - Left half: smaller values
   - Right half: larger values

2. For median, we need:
   - Maximum of left half
   - Minimum of right half

3. Two heaps can maintain this:
   - Max-heap for left half (smaller values)
   - Min-heap for right half (larger values)

4. Balance: Keep heaps equal size (or differ by 1)

🌟 The Natural Thinking Process

When you first see this problem:

Question: Find median from streaming data

First Thought: "Keep array sorted, find middle"
  - Add new number
  - Sort array
  - Return middle element(s)

Is this optimal? NO (O(n log n) per add!), but it WORKS!

The Evolution:

BRUTE FORCE THINKING:
  "Maintain sorted array"
  "Add: insert in sorted position"
  "Median: return middle"
  Time: O(n) per add, O(1) for median
  Problem: Linear search for insertion!
  ⬇

REALIZATION 1:
  "Binary search for insertion position!"
  Time: O(log n) search + O(n) insertion
  Problem: Still O(n) to insert in array!
  ⬇

REALIZATION 2:
  "For median, I only need TWO values:"
  "- Max of left half"
  "- Min of right half"
  "I don't need FULL sorted order!"
  ⬇

OPTIMIZATION: Two Heaps
  "Max-heap for left half (smaller values)"
  "Min-heap for right half (larger values)"
  "Keep them balanced!"
  Time: O(log n) per add, O(1) for median ✨

🎯 Approach 1: Brute Force - Sorted List ⭐

The First Natural Solution

The Thought Process:

Step 1: Maintain a sorted list

Step 2: When adding, insert in correct position

Step 3: Median = middle element(s)

Simple but inefficient!

Visual Tracking - Complete Example

Operations: add(1), add(2), findMedian(), add(3), findMedian()

═══════════════════════════════════════════════════════════════
add(1)
═══════════════════════════════════════════════════════════════

List: []
Add 1: [1]

Sorted list: [1]

═══════════════════════════════════════════════════════════════
add(2)
═══════════════════════════════════════════════════════════════

List: [1]
Add 2: Find position (binary search)
  2 > 1, insert at index 1

Sorted list: [1, 2]

═══════════════════════════════════════════════════════════════
findMedian()
═══════════════════════════════════════════════════════════════

List: [1, 2]
Size: 2 (even)

Median = (list[0] + list[1]) / 2
       = (1 + 2) / 2
       = 1.5 ✓

═══════════════════════════════════════════════════════════════
add(3)
═══════════════════════════════════════════════════════════════

List: [1, 2]
Add 3: Find position
  3 > 2, insert at index 2

Sorted list: [1, 2, 3]

═══════════════════════════════════════════════════════════════
findMedian()
═══════════════════════════════════════════════════════════════

List: [1, 2, 3]
Size: 3 (odd)

Median = list[1]
       = 2.0 ✓

Implementation

import java.util.*;

/**
 * Brute Force: Sorted List
 * Time: O(n) per add (insertion), O(1) per findMedian
 * Space: O(n)
 */
class MedianFinder {
    private List<Integer> nums;

    public MedianFinder() {
        nums = new ArrayList<>();
    }

    public void addNum(int num) {
        // Binary search for insertion position
        int left = 0, right = nums.size();
        while (left < right) {
            int mid = left + (right - left) / 2;
            if (nums.get(mid) < num) {
                left = mid + 1;
            } else {
                right = mid;
            }
        }
        // Insert at position 'left'
        nums.add(left, num);  // O(n) to shift elements
    }

    public double findMedian() {
        int n = nums.size();
        if (n % 2 == 1) {
            // Odd: return middle
            return nums.get(n / 2);
        } else {
            // Even: return average of two middles
            return (nums.get(n / 2 - 1) + nums.get(n / 2)) / 2.0;
        }
    }
}

⏰ Time: - addNum: O(log n) search + O(n) insertion = O(n) - findMedian: O(1)

💾 Space: O(n)

Why This is Inefficient:

Problem: Inserting into ArrayList is O(n)!

Example: 10,000 numbers
  Each insert: O(10,000) to shift
  Total: O(100,000,000) operations!

Can we avoid shifting?


💡 The AHA Moment

CRITICAL REALIZATION:

"For median, I don't need ENTIRE sorted array!"
"I only need TWO key values:"
  1. Maximum of left half (smaller values)
  2. Minimum of right half (larger values)

Example: [1, 2, 3, 4, 5]
  Left half: [1, 2, 3]  → need max = 3
  Right half: [4, 5]    → need min = 4
  Median (odd) = 3 (max of left)

Example: [1, 2, 3, 4]
  Left half: [1, 2]  → need max = 2
  Right half: [3, 4] → need min = 3
  Median (even) = (2 + 3) / 2 = 2.5

How to maintain these two values efficiently?
  MAX-HEAP for left half (gives max)
  MIN-HEAP for right half (gives min)

Perfect! ✨

🎯 Approach 2: Two Heaps (Optimal) ⭐⭐⭐

The Optimal Solution

The Evolution of Thought:

Sorted list: O(n) per add
  ⬇
Observation: "Only need max(left) and min(right)!"
  ⬇
Solution: "Two heaps!"
  ⬇
Optimal: O(log n) per add, O(1) for median!

Understanding Two Heaps - Complete Reasoning

The Core Idea:

PRINCIPLE: Divide and Conquer the Data
═══════════════════════════════════════════════════════════════

Concept: Split all numbers into two halves

  LEFT HALF                RIGHT HALF
  (smaller values)         (larger values)
  ───────────              ───────────
  stored in                stored in
  MAX-HEAP                 MIN-HEAP
  (largest on top)         (smallest on top)

Properties maintained:
  1. All values in left ≤ all values in right
  2. Sizes differ by at most 1
  3. If sizes equal: median = (max(left) + min(right)) / 2
  4. If sizes differ: median = top of larger heap

═══════════════════════════════════════════════════════════════
WHY Max-Heap for Left and Min-Heap for Right?
═══════════════════════════════════════════════════════════════

Left half (smaller values):
  Need: Maximum value (closest to median)
  Use: MAX-HEAP (max on top) ✓

Right half (larger values):
  Need: Minimum value (closest to median)
  Use: MIN-HEAP (min on top) ✓

Example:
  Numbers: [1, 2, 3, 4, 5]

  Left (max-heap): [1, 2, 3]  → top = 3
  Right (min-heap): [4, 5]    → top = 4

  Median (odd count): 3 (from left)

The two heap tops give us the median! ✓

═══════════════════════════════════════════════════════════════
INVARIANT: Balanced Heaps
═══════════════════════════════════════════════════════════════

We maintain:
  |size(left) - size(right)| ≤ 1

Why?
  So median is always at the "boundary"

  If equal sizes: median = average of tops
  If left bigger: median = left top
  If right bigger: median = right top (we'll use left bigger)

═══════════════════════════════════════════════════════════════
ADD ALGORITHM
═══════════════════════════════════════════════════════════════

Strategy:
  1. Add to appropriate heap
  2. Rebalance if needed

Detailed steps:

  Step 1: Choose heap
    If left empty OR num ≤ left.top:
      Add to left (smaller values)
    Else:
      Add to right (larger values)

  Step 2: Rebalance
    If left.size > right.size + 1:
      Move left.top to right
    Else if right.size > left.size:
      Move right.top to left

Result: Always balanced! ✓

═══════════════════════════════════════════════════════════════
FIND MEDIAN ALGORITHM
═══════════════════════════════════════════════════════════════

Check sizes:

  If left.size > right.size:
    Median = left.top (odd count, extra in left)

  Else:
    Median = (left.top + right.top) / 2.0 (even count)

O(1) operation! ✓

Visual Tracking - Complete Example with Detailed Execution

Operations: add(1), add(2), findMedian(), add(3), findMedian()

═══════════════════════════════════════════════════════════════
INITIAL STATE
═══════════════════════════════════════════════════════════════

Left (max-heap): []
Right (min-heap): []

═══════════════════════════════════════════════════════════════
add(1)
═══════════════════════════════════════════════════════════════

Step 1: Choose heap
  Left empty? YES
  Add to left

  Left: [1]
  Right: []

Step 2: Check balance
  left.size = 1
  right.size = 0
  1 > 0 + 1? NO
  Balanced! ✓

State:
  Left (max-heap): [1]  ← top = 1
  Right (min-heap): []

═══════════════════════════════════════════════════════════════
add(2)
═══════════════════════════════════════════════════════════════

Step 1: Choose heap
  Left empty? NO
  2 ≤ left.top (1)? NO
  Add to right

  Left: [1]
  Right: [2]

Step 2: Check balance
  left.size = 1
  right.size = 1
  Balanced! ✓

State:
  Left (max-heap): [1]  ← top = 1
  Right (min-heap): [2] ← top = 2

Visual:
  [1] | [2]
   ↑     ↑
  max   min
  (smaller) | (larger)

═══════════════════════════════════════════════════════════════
findMedian()
═══════════════════════════════════════════════════════════════

Sizes: left = 1, right = 1 (equal)

Median = (left.top + right.top) / 2
       = (1 + 2) / 2
       = 1.5 ✓

═══════════════════════════════════════════════════════════════
add(3)
═══════════════════════════════════════════════════════════════

Step 1: Choose heap
  Left empty? NO
  3 ≤ left.top (1)? NO
  Add to right

  Left: [1]
  Right: [2, 3]  → min-heap, so 2 on top

Step 2: Check balance
  left.size = 1
  right.size = 2
  right.size > left.size? YES (2 > 1)

  REBALANCE: Move right.top to left
    Remove from right: 2
    Add to left: 2

  Left: [1, 2]  → max-heap, so 2 on top
  Right: [3]

State after rebalance:
  Left (max-heap): [2, 1]  ← top = 2
                    2
                   /
                  1

  Right (min-heap): [3]    ← top = 3

Visual:
  [1, 2] | [3]
     ↑      ↑
    max    min

═══════════════════════════════════════════════════════════════
findMedian()
═══════════════════════════════════════════════════════════════

Sizes: left = 2, right = 1

left.size > right.size? YES

Median = left.top
       = 2.0 ✓

═══════════════════════════════════════════════════════════════
SUMMARY OF STATE TRANSITIONS
═══════════════════════════════════════════════════════════════

add(1):
  Left: [1] | Right: []

add(2):
  Left: [1] | Right: [2]

add(3):
  Before rebalance: Left: [1] | Right: [2,3]
  After rebalance:  Left: [2,1] | Right: [3]

Key insight:
  ✓ Always balanced (sizes differ by ≤ 1)
  ✓ Left top = max of smaller half
  ✓ Right top = min of larger half
  ✓ Median always available at tops! ✨

Implementation with Detailed Code Walkthrough

import java.util.*;

/**
 * Two Heaps - Optimal Solution
 * Time: O(log n) per addNum, O(1) per findMedian
 * Space: O(n)
 */
class MedianFinder {
    // Max-heap for left half (smaller values)
    // Top = largest of small values
    private PriorityQueue<Integer> left;

    // Min-heap for right half (larger values)
    // Top = smallest of large values
    private PriorityQueue<Integer> right;

    public MedianFinder() {
        // Max-heap: reverse order (larger values have higher priority)
        left = new PriorityQueue<>((a, b) -> Integer.compare(b, a));

        // Min-heap: natural order (smaller values have higher priority)
        right = new PriorityQueue<>();
    }

    /**
     * Add number to data structure
     * 
     * Strategy:
     *   1. Add to appropriate heap
     *   2. Rebalance to maintain invariant
     * 
     * Invariant: |left.size - right.size| ≤ 1
     */
    public void addNum(int num) {
        // Step 1: Add to appropriate heap
        if (left.isEmpty() || num <= left.peek()) {
            // Number belongs in left half (smaller values)
            left.offer(num);
        } else {
            // Number belongs in right half (larger values)
            right.offer(num);
        }

        // Step 2: Rebalance if needed
        // We maintain: left.size >= right.size
        // And: left.size - right.size <= 1

        if (left.size() > right.size() + 1) {
            // Left has too many, move one to right
            right.offer(left.poll());
        } else if (right.size() > left.size()) {
            // Right has more, move one to left
            left.offer(right.poll());
        }

        // After rebalance:
        // - If total odd: left has 1 more
        // - If total even: both equal
    }

    /**
     * Find median in O(1)
     * 
     * Cases:
     *   - Odd count: left has extra, return left.top
     *   - Even count: return average of tops
     */
    public double findMedian() {
        if (left.size() > right.size()) {
            // Odd count: extra element in left
            return left.peek();
        } else {
            // Even count: average of two middles
            return (left.peek() + right.peek()) / 2.0;
        }
    }
}

Detailed Code Walkthrough:

Example: add(5), add(15), add(1), add(3)

═══════════════════════════════════════════════════════════════
add(5)
═══════════════════════════════════════════════════════════════

Code execution:
  left.isEmpty()? YES
  → left.offer(5)

  left.size() = 1
  right.size() = 0
  1 > 0 + 1? NO → no rebalance

State:
  left: [5]
  right: []

═══════════════════════════════════════════════════════════════
add(15)
═══════════════════════════════════════════════════════════════

Code execution:
  left.isEmpty()? NO
  15 <= left.peek() (5)? NO
  → right.offer(15)

  left.size() = 1
  right.size() = 1
  1 > 1 + 1? NO
  1 > 1? NO → no rebalance

State:
  left: [5]   ← max-heap top = 5
  right: [15] ← min-heap top = 15

═══════════════════════════════════════════════════════════════
add(1)
═══════════════════════════════════════════════════════════════

Code execution:
  left.isEmpty()? NO
  1 <= left.peek() (5)? YES
  → left.offer(1)

  left.size() = 2
  right.size() = 1
  2 > 1 + 1? NO → no rebalance

State:
  left: [5, 1]  ← max-heap, top = 5
         5
        /
       1
  right: [15]

═══════════════════════════════════════════════════════════════
add(3)
═══════════════════════════════════════════════════════════════

Code execution:
  left.isEmpty()? NO
  3 <= left.peek() (5)? YES
  → left.offer(3)

  left.size() = 3
  right.size() = 1
  3 > 1 + 1? YES → REBALANCE!

  Rebalance:
    right.offer(left.poll())
    left.poll() removes 5 (top of max-heap)
    right.offer(5)

  After rebalance:
    left.size() = 2
    right.size() = 2

State:
  left: [3, 1]  ← max-heap, top = 3
         3
        /
       1

  right: [5, 15] ← min-heap, top = 5
          5
           \
            15

findMedian():
  left.size() > right.size()? NO (2 == 2)
  → return (left.peek() + right.peek()) / 2.0
  → return (3 + 5) / 2.0 = 4.0 ✓

All numbers sorted: [1, 3, 5, 15]
Median of 4 numbers: (3 + 5) / 2 = 4.0 ✓ CORRECT!

⏰ Time Complexity:

addNum: O(log n)
  - Heap insertion: O(log n)
  - Rebalance (at most 1 move): O(log n)
  - Total: O(log n) ✓

findMedian: O(1)
  - Just peek at tops
  - Total: O(1) ✓

For n operations:
  Total: O(n log n) ✓

💾 Space: O(n) - Store all numbers

Why This is Optimal:

✓ O(log n) per add (vs O(n) sorted list)
✓ O(1) per median (same as sorted list)
✓ No shifting elements (heap operations)
✓ Elegant balance maintenance

Example: 10,000 numbers
  Sorted list: 10,000 × O(10,000) = O(100M) ops
  Two heaps: 10,000 × O(log 10,000) ≈ O(130K) ops

  770x faster! 🚀


📊 Approach Comparison

┌────────────────┬────────────┬──────────┬─────────────────────┐
│ Approach       │ addNum     │ Median   │ Key Insight         │
├────────────────┼────────────┼──────────┼─────────────────────┤
│ Sorted List    │ O(n)       │ O(1)     │ Maintain sorted     │
│ Two Heaps      │ O(log n)   │ O(1)     │ Only need boundary  │
└────────────────┴────────────┴──────────┴─────────────────────┘

For streaming data with frequent adds:
  Two Heaps is MUCH better! ✨

💡 Key Learnings

1. MEDIAN INSIGHT:
   ✓ Don't need full sorted order
   ✓ Only need max(left) and min(right)
   ✓ Two heaps maintain these efficiently

2. HEAP CHOICES:
   ✓ Max-heap for left (smaller values)
   ✓ Min-heap for right (larger values)
   ✓ Tops give median boundary

3. BALANCE INVARIANT:
   ✓ Keep sizes equal or differ by 1
   ✓ If odd count: left has 1 extra
   ✓ If even count: both equal

4. PATTERN:
   ✓ Appears in: sliding window median
   ✓ Two heaps for maintaining "middle"
   ✓ Classic streaming data problem

5. COMPLEXITY:
   ✓ Add: O(log n) - heap operations
   ✓ Median: O(1) - just peek
   ✓ Optimal for streaming!

⚠️ Common Mistakes

// Mistake 1: Wrong heap types
PriorityQueue<Integer> left = new PriorityQueue<>();  // ❌ Min-heap!
PriorityQueue<Integer> right = new PriorityQueue<>((a,b) -> b - a);  // ❌ Max-heap!

// ✓ CORRECT
PriorityQueue<Integer> left = new PriorityQueue<>((a,b) -> b - a);  // Max-heap
PriorityQueue<Integer> right = new PriorityQueue<>();  // Min-heap

// Mistake 2: Not rebalancing
public void addNum(int num) {
    left.offer(num);  // ❌ Forgot to rebalance!
}

// ✓ CORRECT - Always rebalance after adding

// Mistake 3: Wrong median calculation
public double findMedian() {
    return (left.peek() + right.peek()) / 2.0;  // ❌ What if odd count?
}

// ✓ CORRECT - Check size difference

// Mistake 4: Integer division
return (left.peek() + right.peek()) / 2;  // ❌ Returns integer!

// ✓ CORRECT
return (left.peek() + right.peek()) / 2.0;  // Returns double

🎯 Pattern Recognition

Problem Type: Streaming Median
Core Pattern: Two Heaps for Maintaining Middle

When to Apply:
✓ Find median in streaming data
✓ Need quick adds and median queries
✓ Can't afford full sort each time
✓ Maintaining "middle" of data

Recognition Keywords:
- "median"
- "data stream"
- "running median"
- "online algorithm"

Similar Problems:
- Sliding Window Median (LC 480)
- IPO (LC 502) - uses similar two-heap pattern
- Find K-th Smallest Pair Distance

Key Components:
┌────────────────────────────────────────────┐
│ Max-heap: Left half (smaller values)      │
│ Min-heap: Right half (larger values)      │
│ Balance: Sizes differ by ≤ 1              │
│ Median: From heap tops in O(1)            │
└────────────────────────────────────────────┘

📝 Quick Revision

🎯 Core: Find median from streaming data. Naive: Sort each time O(n log n). Optimal: Two heaps! Max-heap for left (smaller), min-heap for right (larger). Balance: sizes differ by ≤ 1. Median from tops in O(1)!

import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.PriorityQueue;

// // Approach 1: Using sorting or binary search
// class MedianFinder {
//   ArrayList<Integer> list;

//   public MedianFinder() {
//     // step 1: initialize freshly
//     list = new ArrayList<>();
//   }

//   public void addNum(int num) {
//     // step 2: find correct place for num to insert
//     int left = 0;
//     int right = list.size() - 1;
//     int index = -1;

//     while (left <= right) {
//       int mid = right + (left - right) / 2;

//       if (list.get(mid) == num) {
//         index = mid;
//       } else if (list.get(mid) < num) {
//         left = mid + 1;
//       } else {
//         right = mid - 1;
//       }
//     }

//     if (index == -1) {
//       index = left;
//     }

//     list.add(index, num);
//   }

//   public double findMedian() {
//     // step 3: if odd, mid element. Else, mean of 2 elements
//     int size = list.size();

//     if (size % 2 == 1) {
//       return list.get(size / 2);
//     }

//     // [1,2,3,4] => (4/2-1, 4/2) => (1,2)
//     return (list.get(size / 2 - 1) + list.get(size / 2)) / 2.0;
//   }
// }

// Approach 2: Heap
class MedianFinder {
  PriorityQueue<Integer> left;
  PriorityQueue<Integer> right;

  public MedianFinder() {
    // step 1: create heaps
    // left heap (max heap) maintains smaller elements and
    // right heap (min heap) maintains larger elements
    left = new PriorityQueue<>(Collections.reverseOrder());
    right = new PriorityQueue<>();
  }

  public void addNum(int num) {
    // Step 2: Choose heap
    // if left empty OR num ≤ left.top => Add to left (smaller values)
    // else Add to right (larger values)
    if (left.size() == 0 || left.peek() >= num) {
      left.offer(num);
    } else {
      right.offer(num);
    }

    // step 3: rebalance immediately ((|size(left) - size(right)| ≤ 1))
    // Condition for constant time median retrieval:
    // The left heap is maintained with the higher size (at most one extra element).
    // if still left is greater, push to right
    if (left.size() - 1 > right.size()) {
      right.offer(left.poll());
    } else if (left.size() < right.size()) {
      // else take from right and put to left
      left.offer(right.poll());
    }
  }

  public double findMedian() {
    // step 4:
    // If sizes equal: median = (max(left) + min(right)) / 2
    // If sizes differ: median = top of larger heap
    if (left.size() == right.size()) {
      return (left.peek() + right.peek()) / 2.0;
    } else if (left.size() > right.size()) {
      return left.peek();
    } else {
      return right.peek();
    }
  }
}

public class Solution {
  public static void main(String[] args) {
    MedianFinder mf = new MedianFinder();
    mf.addNum(1);
    mf.addNum(2);
    System.out.println(mf.findMedian());
    mf.addNum(3);
    System.out.println(mf.findMedian());
  }
}

🔑 Key: Max-heap left, min-heap right. Balance sizes. Tops = median boundary. Add O(log n), median O(1). ✨


Happy practicing! 🎯💪✨

Note: This problem demonstrates the classic two-heap pattern for maintaining median in streaming data! You learn: (1) WHY two heaps (only need boundary values), (2) WHY max-heap for left and min-heap for right (efficient access to median), (3) Balance invariant maintenance, (4) Pattern appears in many streaming/online algorithm problems. The detailed code walkthroughs show exact heap states at each step - true mastery through visualization! 💪✨🏆