283. K Closest Points to Origin
📝 LeetCode Problem: 973. K Closest Points to Origin
📊 Difficulty: Medium
🏷️ Topics: Array, Math, Divide and Conquer, Geometry, Sorting, Heap (Priority Queue), Quickselect
Problem Statement
Given an array of points where points[i] = [xi, yi] represents a point on the X-Y plane and an integer k, return the k closest points to the origin (0, 0).
The distance between two points on the X-Y plane is the Euclidean distance (i.e., √((x1 - x2)² + (y1 - y2)²)).
You may return the answer in any order. The answer is guaranteed to be unique (except for the order that it is in).
Example 1:
Input: points = [[1,3],[-2,2]], k = 1
Output: [[-2,2]]
Explanation:
The distance between (1, 3) and the origin is sqrt(10).
The distance between (-2, 2) and the origin is sqrt(8).
Since sqrt(8) < sqrt(10), (-2, 2) is closer to the origin.
We only want the closest k = 1 points from the origin, so the answer is just [[-2,2]].
Example 2:
Input: points = [[3,3],[5,-1],[-2,4]], k = 2
Output: [[3,3],[-2,4]]
Explanation: The answer [[-2,4],[3,3]] would also be accepted.
Constraints:
- 1 <= k <= points.length <= 10^4
- -10^4 <= xi, yi <= 10^4
🔍 Understanding the Problem
CRITICAL: Distance Formula Clarification
Question: Why is the formula x² + y² and not (x₁-x₂)² + (y₁-y₂)²?
GENERAL distance formula between TWO arbitrary points:
Point 1: (x₁, y₁)
Point 2: (x₂, y₂)
Distance = √((x₁-x₂)² + (y₁-y₂)²)
But in THIS problem, one point is ALWAYS the origin (0, 0)!
Distance from point (x, y) to ORIGIN (0, 0):
Applying the general formula:
Distance = √((x-0)² + (y-0)²)
         = √(x² + y²)
Why the simplification?
(x-0) = x, so (x-0)² = x²
(y-0) = y, so (y-0)² = y²
The origin coordinates (0, 0) subtract to nothing!
Examples demonstrating the simplification:
Point (3, 4) to origin (0, 0):
Full formula: √((3-0)² + (4-0)²) = √(9 + 16) = √25 = 5
Simplified: √(3² + 4²) = √25 = 5 ✓
Point (-2, 2) to origin (0, 0):
Full formula: √((-2-0)² + (2-0)²) = √(4 + 4) = √8 ≈ 2.83
Simplified: √((-2)² + 2²) = √8 ≈ 2.83 ✓
Point (5, 0) to origin (0, 0):
Simplified: √(5² + 0²) = √25 = 5 ✓
In code:
int dist = x * x + y * y;
This IS the formula (x-0)² + (y-0)²
The -0 disappears, leaving just x² + y²!
Why we can skip sqrt:
For COMPARISON only, we don't need the actual distance value!
Mathematical property (sqrt is monotonically increasing):
√a < √b ⇔ a < b
So comparing distances:
√8 < √10 is equivalent to 8 < 10
We can compare SQUARED distances directly:
x₁² + y₁² vs x₂² + y₂²
Example:
Point A (1, 3): dist² = 1² + 3² = 1 + 9 = 10
Point B (-2, 2): dist² = (-2)² + 2² = 4 + 4 = 8
Compare squared: 8 < 10
Therefore: B is closer! ✓
No need to compute √8 ≈ 2.83 and √10 ≈ 3.16!
Benefits:
✅ Faster computation (no expensive sqrt operation)
✅ Exact comparison (no floating-point errors)
✅ Simpler code
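The squared-distance comparison can be checked with a tiny standalone demo (the class name is mine, not part of the solutions below):

```java
// Tiny demo: comparing squared distances gives the same ordering as
// comparing true Euclidean distances, with no sqrt and no rounding.
public class SquaredDistanceDemo {
    static int distSq(int x, int y) {
        return x * x + y * y; // (x-0)^2 + (y-0)^2 - sqrt skipped
    }

    public static void main(String[] args) {
        int a = distSq(1, 3);  // point A (1, 3)  -> 10
        int b = distSq(-2, 2); // point B (-2, 2) -> 8
        System.out.println(b < a);                       // same verdict as...
        System.out.println(Math.sqrt(b) < Math.sqrt(a)); // ...the sqrt version
    }
}
```

Both comparisons print `true`: the ordering is identical with or without the square root.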
Key Observations:
1. Distance to origin SIMPLIFIES
General: √((x₁-x₂)² + (y₁-y₂)²)
To origin: √(x² + y²) [because origin is (0,0)]
2. Can skip sqrt for comparison
Compare x² + y² directly
Saves computation time!
3. k closest = k smallest distances
Similar to the "Kth Largest Element" pattern!
4. Order doesn't matter in result
[[-2,2], [1,0]] and [[1,0], [-2,2]] both valid
🧠 The Natural Thinking Process
When you first see this problem:
Question: Find k closest points to origin
First Thought: "Calculate distances, sort, take first k"
- Compute distance for each point
- Sort by distance
- Return first k points
Is this optimal? NO (sorts all!), but it WORKS!
Let's code this first to understand.
The Evolution:
BRUTE FORCE THINKING:
"Calculate distances → Sort all → Take k"
Time: O(n log n) for sorting
Problem: Sorting ALL when only need k!
⬇
REALIZATION 1:
"I don't need ALL sorted, just k smallest!"
"This is exactly like Problem 278 (Top K Frequent)!"
"And Problem 281 (Kth Largest in Stream)!"
"Use heap of size k!"
⬇
OPTIMIZATION 1: Heap of Size K
"Keep heap of k closest"
"But... min-heap or max-heap?"
Time: O(n log k) - better when k << n!
⬇
CRITICAL QUESTION:
"Closest = smallest distance"
"So min-heap for smallest, right?"
"NO! Need max-heap! Let me understand why..."
⬇
OPTIMIZATION 2: Max-Heap of Size K
"Max-heap tracks k smallest distances!"
"Top = largest of k smallest = boundary!"
"Easy to evict farthest when find closer!"
Time: O(n log k) ✨
⬇
REALIZATION 2:
"Can I do even better?"
"Quickselect for O(n) average!"
⬇
OPTIMIZATION 3: Quickselect
"Partition-based selection"
Time: O(n) average - OPTIMAL! ✨
🎯 Approach 1: Brute Force - Sort All Points ⭐
The First Natural Solution
The Thought Process:
Step 1: Calculate distance for each point
distanceΒ² = xΒ² + yΒ² (no need for sqrt)
Step 2: Sort points by distance (ascending)
Step 3: Return first k points
Simple and straightforward!
Visual Tracking - Complete Example
Input: points = [[1,3], [-2,2], [2,2]], k = 2
───────────────────────────────────────────────────────────────
STEP 1: Calculate Distances (squared, no sqrt needed)
───────────────────────────────────────────────────────────────
Point [1, 3]:
distance² = 1² + 3² = 1 + 9 = 10
(This is (1-0)² + (3-0)² since origin is (0,0))
Point [-2, 2]:
distance² = (-2)² + 2² = 4 + 4 = 8
Point [2, 2]:
distance² = 2² + 2² = 4 + 4 = 8
Points with distances:
[[1,3], 10]
[[-2,2], 8]
[[2,2], 8]
───────────────────────────────────────────────────────────────
STEP 2: Sort by Distance (Ascending)
───────────────────────────────────────────────────────────────
Before sort:
[[1,3]=10, [-2,2]=8, [2,2]=8]
After sort (by distance ascending):
[[-2,2]=8, [2,2]=8, [1,3]=10]
 └────closest─────┘ └farther┘
───────────────────────────────────────────────────────────────
STEP 3: Take First K
───────────────────────────────────────────────────────────────
k = 2
Take first 2 points:
[[-2,2], [2,2]]
Result: [[-2,2], [2,2]] ✓
Verification:
Both have distance² = 8 (closest)
Point [1,3] has distance² = 10 (farther)
Correct! ✓
───────────────────────────────────────────────────────────────
Implementation
import java.util.*;
/**
 * Brute Force: Sort all points
 * Time: O(n log n)
 * Space: O(log n) to O(n) for the sort itself
 * (Java's Arrays.sort on object arrays uses TimSort)
 */
class Solution {
public int[][] kClosest(int[][] points, int k) {
// Sort by distance to origin
Arrays.sort(points, (a, b) -> {
int distA = a[0] * a[0] + a[1] * a[1];
int distB = b[0] * b[0] + b[1] * b[1];
return Integer.compare(distA, distB); // Safe from overflow
});
// Return first k points
return Arrays.copyOfRange(points, 0, k);
}
}
⏰ Time: O(n log n) - Sorting dominates
💾 Space: O(log n) to O(n) for the sort itself
Why This is Inefficient:
Problem: Sorts ALL n points just to get k!
Example: n = 10,000, k = 10
Must sort all 10,000 points
Only need the 10 closest
Fully ordered 9,990 points unnecessarily! Wasteful!
Can we avoid the full sort?
💡 The First AHA Moment
OBSERVATION:
"I'm sorting ALL points just to get k closest!"
"This is EXACTLY like Problem 278 (Top K Frequent)!"
"And Problem 281 (Kth Largest in Stream)!"
Pattern recognition:
Top K Frequent → Use min-heap of size k
Kth Largest → Use min-heap of size k
K Closest → Use... heap of size k!
But wait - which heap?
Closest = smallest distance
So k closest = k smallest distances
Should I use min-heap for smallest?
Let me think carefully about this...
🎯 Approach 2: Max-Heap of Size K (Optimal for Most Cases) ⭐⭐⭐
The Better Solution
The Evolution of Thought:
Brute Force: Sort all
⬇
Observation: "Only need k closest, not all sorted!"
⬇
Pattern: "Like Top K problems!"
⬇
Critical Question: "Min-heap or max-heap?"
⬇
Answer: "MAX-HEAP for k smallest distances!"
WHY Max-Heap for K Closest - Complete Textbook Reasoning
The Core Question:
For k CLOSEST points (k smallest distances),
should we use min-heap or max-heap?
Intuition says: "Smallest distances → min-heap!"
But that's WRONG!
Let me prove why MAX-HEAP is correct...
Complete Answer with Proof:
PRINCIPLE 1: Understanding the Goal
───────────────────────────────────────────────────────────────
Goal: Track k points with SMALLEST distances (k closest)
Strategy: Keep heap of exactly k elements
- These k elements = k points with smallest distances
- Need to decide: which element is "on top"?
───────────────────────────────────────────────────────────────
PRINCIPLE 2: Comparison with K Largest
───────────────────────────────────────────────────────────────
Let's compare with a known pattern:
Problem 281: Kth LARGEST element
Goal: Track k largest values
Heap contains: k largest values
Top of heap: Smallest of k largest = kth largest
Heap type: MIN-heap ✓
Why min-heap?
- Top = smallest of k largest
- When new larger value arrives, evict this smallest
- Easy to find what to evict!
Problem 283: K CLOSEST points
Goal: Track k smallest distances
Heap contains: k smallest distances
Top of heap: Largest of k smallest = kth smallest
Heap type: MAX-heap ✓
Why max-heap?
- Top = largest of k smallest
- When new smaller value arrives, evict this largest
- Easy to find what to evict!
Pattern: Use the "OPPOSITE" heap!
- k largest → MIN-heap
- k smallest → MAX-heap
───────────────────────────────────────────────────────────────
PRINCIPLE 3: Max-Heap Strategy Explained
───────────────────────────────────────────────────────────────
With MAX-HEAP of size k:
Invariant maintained:
- Heap contains exactly k points with SMALLEST distances
- Top element = point with LARGEST distance among these k
- Top = "boundary" = kth smallest distance
When new point arrives with distance d:
CASE 1: Heap size < k
Decision: Add unconditionally
Why? Still building to k elements
CASE 2: Heap size = k, and d < top (max)
Observation: New distance smaller than current boundary
Decision: New point is closer than current kth closest
Action: Remove top (farthest of k), add new point
Result: Heap still has k closest ✓
CASE 3: Heap size = k, and d ≥ top (max)
Observation: New distance ≥ boundary
Decision: New point NOT in k closest
Action: Skip (don't add)
Result: Heap unchanged ✓
───────────────────────────────────────────────────────────────
CONCRETE EXAMPLE: Max-Heap in Action
───────────────────────────────────────────────────────────────
k = 2
Current heap distances: [8, 5]
Max-heap structure: top = 8 (larger)
Visualization:
Heap represents: 2 closest points so far
Point with dist 5: closest
Point with dist 8: 2nd closest (farthest of 2 closest)
Top = 8 = "boundary"
Scenario 1: Add point with distance 6
Compare: 6 vs 8 (top)
6 < 8? YES
Analysis:
New point closer than current boundary (8)
New point should be in top 2!
Current point with dist 8 should be evicted
Action:
Remove top (dist 8)
Add new (dist 6)
New heap: [6, 5]
New top: 6
New boundary: 6 ✓
Scenario 2: Add point with distance 10
Compare: 10 vs 6 (top)
10 < 6? NO
Analysis:
New point farther than boundary
Not in top 2 closest
Action: Skip
Heap: [6, 5] (unchanged) ✓
───────────────────────────────────────────────────────────────
WHY NOT Min-Heap? (Proof by Counter-Example)
───────────────────────────────────────────────────────────────
Attempt: Use MIN-HEAP of size k for k smallest
Min-heap structure:
Top = SMALLEST element in heap
Problem Scenario:
Current heap: [5, 8]
Min-heap: top = 5
Add new point with distance 6:
Compare: 6 vs 5 (top)
6 > 5... now what?
Analysis:
Should we add 6? YES (it's 3rd smallest so far)
Should we remove 5? NO (5 is smaller, keep it!)
Should we remove 8? YES (8 is the largest we have)
Problem:
8 is NOT at the top!
8 is buried somewhere in the heap!
Min-heap gives us the minimum, but we need the MAXIMUM!
Can't easily access what to evict! ❌
Fundamental flaw:
Min-heap optimized for: finding minimum
What we need: finding maximum (to evict)
Min-heap is the WRONG tool for this job! ❌
───────────────────────────────────────────────────────────────
INTUITION: Boundary Visualization
───────────────────────────────────────────────────────────────
Think of the k closest points forming a circle around the origin:
      Origin
        •
       /|\
      / | \
     /  •  \   ← k closest points
    •───•───•
        ↑
  boundary point
  (kth closest = farthest of k closest)
The boundary point:
- Farthest among k closest
- kth smallest distance
- Decision criterion!
Max-heap keeps the boundary point on top!
Any point:
Inside boundary (dist < top) → add to k
Outside boundary (dist ≥ top) → skip
Max-heap top = boundary = decision point! ✓
───────────────────────────────────────────────────────────────
COMPARISON TABLE
───────────────────────────────────────────────────────────────
┌───────────────────┬──────────────────┬──────────────────┐
│ Aspect            │ Min-Heap         │ Max-Heap         │
├───────────────────┼──────────────────┼──────────────────┤
│ Top Element       │ Smallest in heap │ Largest in heap  │
│ For k closest     │ Closest point    │ Farthest of k    │
│ Add decision      │ Compare with min │ Compare with max │
│ When to add?      │ Unclear!         │ If dist < top    │
│ What to evict?    │ Need max (hard!) │ Top (easy!)      │
│ Correctness       │ ❌ WRONG         │ ✅ CORRECT       │
└───────────────────┴──────────────────┴──────────────────┘
───────────────────────────────────────────────────────────────
SUMMARY: Complete Logic
───────────────────────────────────────────────────────────────
Goal: Find k smallest distances (k closest points)
Solution: Max-heap of size k
Why max-heap?
1. Contains k smallest distances
2. Top = largest of these k
3. Top = kth smallest = boundary
4. Easy check: if new < top → closer than boundary
5. Easy evict: remove top (farthest of k)
Algorithm:
for each point:
    if heap.size < k:
        add point
    else if distance < heap.top:
        remove heap.top
        add point
    else:
        skip
This is the SAME pattern as:
- Problem 281: Min-heap for k largest
- Problem 278: Min-heap for k frequent
Universal pattern: "Opposite heap for boundary"! ✨
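This decision loop can be coded directly - a sketch (the `distSq` helper and class name are mine, not from the solutions in this note). Unlike the offer-then-trim implementation later on, this variant never pushes a far point into the heap, so points outside the boundary cost O(1):

```java
import java.util.PriorityQueue;

// Sketch of the compare-before-add loop: a max-heap of size k where
// far points are rejected via peek() instead of an offer()+poll().
class KClosestSketch {
    static int distSq(int[] p) {
        return p[0] * p[0] + p[1] * p[1]; // squared distance to origin
    }

    static int[][] kClosest(int[][] points, int k) {
        // Max-heap on squared distance: farthest of the kept k on top.
        PriorityQueue<int[]> heap = new PriorityQueue<>(
                (a, b) -> Integer.compare(distSq(b), distSq(a)));
        for (int[] p : points) {
            if (heap.size() < k) {
                heap.offer(p);                           // still building to k
            } else if (distSq(p) < distSq(heap.peek())) {
                heap.poll();                             // evict the boundary
                heap.offer(p);                           // admit the closer point
            }                                            // else: skip, O(1)
        }
        int[][] result = new int[k][];
        for (int i = 0; i < k; i++) result[i] = heap.poll();
        return result;
    }
}
```

Same O(n log k) bound, but the heap is only touched when a point actually belongs in the current k.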
Visual Tracking - Complete Example
Input: points = [[1,3], [-2,2], [2,2], [5,1]], k = 2
Distances:
[1,3]: 10
[-2,2]: 8
[2,2]: 8
[5,1]: 26
───────────────────────────────────────────────────────────────
BUILD MAX-HEAP (SIZE k=2)
───────────────────────────────────────────────────────────────
─────────────────────────────────────────────────────────────
Iteration 1: Process [1,3], dist=10
─────────────────────────────────────────────────────────────
Current heap: []
Heap size: 0 < k=2
Decision: Add unconditionally (building to k)
Action: Add [1,3]
Heap state:
[[1,3]]
dist=10
Size: 1
─────────────────────────────────────────────────────────────
Iteration 2: Process [-2,2], dist=8
─────────────────────────────────────────────────────────────
Current heap: [[1,3]=10]
Size: 1 < k=2
Decision: Add unconditionally
Action: Add [-2,2]
Heap state (max-heap):
[[1,3]=10, [-2,2]=8]
  ↑ top (max)
Size: 2 = k (full now!)
Explanation:
Max-heap property: parent ≥ children
10 > 8, so [1,3] on top
─────────────────────────────────────────────────────────────
Iteration 3: Process [2,2], dist=8
─────────────────────────────────────────────────────────────
Current heap: [[1,3]=10, [-2,2]=8]
Top: 10
Size: 2 = k
Compare new distance with top:
8 vs 10
8 < 10? YES
Analysis:
New point [2,2] has distance 8
Top has distance 10
8 < 10 → new point CLOSER than farthest!
New point should be in k closest
Old farthest [1,3] should be evicted
Decision: Replace top
Action:
Step 1: poll() removes top [1,3]
Heap: [[-2,2]=8]
Step 2: offer([2,2])
Heap: [[-2,2]=8, [2,2]=8]
Size: 2
Heap state:
Both distances equal (8)
Either can be on top
Max-heap: [[2,2]=8, [-2,2]=8] or [[-2,2]=8, [2,2]=8]
Top = 8
─────────────────────────────────────────────────────────────
Iteration 4: Process [5,1], dist=26
─────────────────────────────────────────────────────────────
Current heap: top = 8
Size: 2 = k
Compare new distance with top:
26 vs 8
26 < 8? NO
Analysis:
New point [5,1] has distance 26
Top has distance 8
26 > 8 → new point FARTHER than boundary!
New point NOT in k closest
Decision: Skip (don't add)
Action: None
Heap: [[-2,2]=8, [2,2]=8] (unchanged)
Size: 2
───────────────────────────────────────────────────────────────
RESULT
───────────────────────────────────────────────────────────────
Final heap: [[-2,2], [2,2]]
Verification:
Both have distance 8
These are the 2 closest to origin! ✓
Operations performed:
- 4 points processed
- 3 additions to heap
- 1 removal from heap
- Time: O(4 log 2) = O(4)
Compare with sorting:
- Sort 4 points: O(4 log 4) ≈ O(8)
- Heap is better even for small n!
Key observations:
✅ Heap never exceeded size k
✅ Top always = farthest of k closest
✅ Easy decision: compare with top
✅ Efficient eviction: just poll()
Implementation
import java.util.*;
/**
* Max-Heap of size k
* Time: O(n log k)
* Space: O(k)
*/
class Solution {
public int[][] kClosest(int[][] points, int k) {
// Max-heap by distance (farthest on top)
PriorityQueue<int[]> maxHeap = new PriorityQueue<>((a, b) -> {
int distA = a[0] * a[0] + a[1] * a[1];
int distB = b[0] * b[0] + b[1] * b[1];
return Integer.compare(distB, distA); // Max-heap, safe from overflow
});
for (int[] point : points) {
maxHeap.offer(point);
// Keep heap size ≤ k
if (maxHeap.size() > k) {
maxHeap.poll(); // Remove farthest
}
}
// Extract k closest points
int[][] result = new int[k][2];
for (int i = 0; i < k; i++) {
result[i] = maxHeap.poll();
}
return result;
}
public static void main(String[] args) {
Solution sol = new Solution();
int[][] result = sol.kClosest(new int[][]{{1,3},{-2,2}}, 1);
System.out.println(Arrays.deepToString(result)); // [[-2,2]]
}
}
⏰ Time: O(n log k)
💾 Space: O(k)
Why This is Better:
✅ O(n log k) vs O(n log n)
✅ Only k points in heap vs n sorted
✅ Much better when k << n
Example: n = 10,000, k = 10
Sort: 10,000 × log(10,000) ≈ 130,000 ops
Heap: 10,000 × log(10) ≈ 33,000 ops
4x faster! ✨
Space: 10 vs 10,000 points
1000x less memory!
💡 The Second AHA Moment
OBSERVATION:
Max-heap is O(n log k), better than O(n log n)!
But can we do even better?
From Problem 275 (Kth Largest Element):
"Quickselect achieves O(n) average time!"
Can I use quickselect here too?
YES! Quickselect works for kth smallest element!
When is it better?
Max-heap: O(n log k)
Quickselect: O(n) average
When k is large (k ≈ n/2):
log k becomes significant
O(n log k) approaches O(n log n)
Quickselect's O(n) is better!
Let's understand how quickselect works for this problem...
🎯 Approach 3: Quickselect (Optimal for Large K) ⭐⭐⭐
The Optimal Solution
The Evolution of Thought:
Sort all: O(n log n)
⬇
Max-heap: O(n log k)
⬇
Observation: "Quickselect for O(n) average!"
⬇
Question: "How does it work?"
⬇
Optimal: "Partition-based selection!"
Understanding Quickselect - Complete Explanation
How does quickselect work here?
QUICKSELECT CONCEPT:
Similar to quicksort, but ONLY recurse on one side!
Goal: Partition array so that:
- First k positions contain k closest points
- Remaining positions contain farther points
- Don't need full sort! Just need correct k in first k positions
Strategy (like quicksort partitioning):
1. Pick a pivot point
2. Partition around the pivot distance:
- Points closer than pivot → left
- Points farther than pivot → right
3. Check the pivot's final position:
- If position = k → DONE! First k are the k closest
- If position < k → need more, recurse RIGHT
- If position > k → have too many, recurse LEFT
Average case: O(n)
T(n) = n + T(n/2)
     = n + n/2 + n/4 + ...
     ≈ 2n = O(n) ✓
Much better than O(n log k) when k is large! ✨
WHY Quickselect Works - Detailed Reasoning
The Core Question:
How does partitioning help find k closest?
Why don't we need full sort?
Answer Through Reasoning:
PRINCIPLE: Partial Ordering is Sufficient
───────────────────────────────────────────────────────────────
For k closest points, we need:
- k points with smallest distances in the first k positions
- Order within the first k doesn't matter!
- Order of the remaining points doesn't matter!
Full sort gives:
[1st closest, 2nd, 3rd, ..., kth, (k+1)th, ..., nth]
Time: O(n log n)
Quickselect gives:
[k closest in any order] [remaining in any order]
Time: O(n) average
We only need the first part! ✓
───────────────────────────────────────────────────────────────
HOW PARTITIONING WORKS
───────────────────────────────────────────────────────────────
Partition operation:
1. Choose a pivot point with distance d_pivot
2. Rearrange the array:
[points with dist < d_pivot] [pivot] [points with dist ≥ d_pivot]
3. Pivot ends at position p
After partition:
- All points at positions [0, p-1] are closer than the pivot
- The point at position p is the pivot
- All points at positions [p+1, n-1] are at least as far as the pivot
Check position p:
If p = k:
First k positions have the k closest! ✓
DONE!
If p < k:
Positions 0..p are settled (p+1 closest so far)
Still need more
They must be in the RIGHT partition
Recurse on [p+1, right]
If p > k:
We have too many settled on the left
The k closest are among the first p
Recurse on [left, p-1]
───────────────────────────────────────────────────────────────
WHY O(n) Average Time?
───────────────────────────────────────────────────────────────
Best case: Pivot always splits the array in half
Recursion:
T(n) = partition time + recurse on half
     = O(n) + T(n/2)
     = n + n/2 + n/4 + n/8 + ...
     = n × (1 + 1/2 + 1/4 + ...)
     = n × 2
     = O(n) ✓
The geometric series sums to a constant!
With a random pivot, the average case is O(n)
Worst case: O(n²) (e.g., sorted distances with a bad pivot every time)
But with a random pivot, that is very unlikely
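One standard way to dodge that worst case - an addition here, not part of the implementation below - is to pick the pivot index at random and swap it to the end before the usual partition:

```java
import java.util.concurrent.ThreadLocalRandom;

// Sketch: randomized pivot selection for the Lomuto-style partition
// used in this note. Swapping a random element into the `right` slot
// makes consistently bad pivots (and the O(n^2) case) very unlikely.
class RandomPivot {
    static void swap(int[][] a, int i, int j) {
        int[] t = a[i];
        a[i] = a[j];
        a[j] = t;
    }

    // Call this at the top of partition(points, left, right).
    static void randomizePivot(int[][] points, int left, int right) {
        int r = ThreadLocalRandom.current().nextInt(left, right + 1);
        swap(points, r, right); // partition then proceeds with points[right]
    }
}
```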
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
WHEN IS QUICKSELECT BETTER THAN HEAP?
───────────────────────────────────────────────────────────────
Comparison:
Max-heap: O(n log k)
Quickselect: O(n) average
When k is small (k << n):
log k ≈ log(10) ≈ 3
O(n log k) ≈ O(3n) ≈ O(n)
Both similar, heap is simpler
When k is large (k ≈ n/2):
log k ≈ log(n/2) = log n - 1
O(n log k) ≈ O(n log n)
Quickselect's O(n) is much better! ✓
Rule of thumb:
k < 100: Use max-heap (simpler)
k > n/2: Use quickselect (faster)
Space:
Max-heap: O(k)
Quickselect: O(1)
Quickselect better for space! ✓
Visual Tracking - Complete Example
Input: points = [[1,3], [-2,2], [2,2], [5,1]], k = 2
Distances:
[1,3]: 10
[-2,2]: 8
[2,2]: 8
[5,1]: 26
Goal: Rearrange so first k=2 positions have 2 closest points
───────────────────────────────────────────────────────────────
INITIAL STATE
───────────────────────────────────────────────────────────────
Array: [[1,3], [-2,2], [2,2], [5,1]]
Dist:    10      8       8      26
Index:    0      1       2       3
Target: First 2 positions should have the 2 closest
───────────────────────────────────────────────────────────────
PARTITION 1: Range [0, 3], pivot = last element
───────────────────────────────────────────────────────────────
Pivot: [5,1] at index 3, distance = 26
Partition process:
i = 0 (tracks position for smaller elements)
j = 0: [1,3], dist = 10
10 < 26? YES → swap with position i=0
Array: [[1,3], [-2,2], [2,2], [5,1]]
i becomes 1
j = 1: [-2,2], dist = 8
8 < 26? YES → swap with position i=1
Array: [[1,3], [-2,2], [2,2], [5,1]]
i becomes 2
j = 2: [2,2], dist = 8
8 < 26? YES → swap with position i=2
Array: [[1,3], [-2,2], [2,2], [5,1]]
i becomes 3
Place pivot at position i=3:
Swap [5,1] with [5,1] (no change)
Array after partition:
[[1,3], [-2,2], [2,2], [5,1]]
 └──closer than 26───┘ └26─┘
  positions 0,1,2      pos 3
Pivot position: p = 3
Check: p vs k
3 > 2 → Too many! The k closest are in the LEFT part
Recurse on [0, 2]
───────────────────────────────────────────────────────────────
PARTITION 2: Range [0, 2], pivot = last element
───────────────────────────────────────────────────────────────
Array segment: [[1,3], [-2,2], [2,2]]
                 10      8       8
Pivot: [2,2] at index 2, distance = 8
Partition process:
i = 0
j = 0: [1,3], dist = 10
10 < 8? NO → don't swap
i stays 0
j = 1: [-2,2], dist = 8
8 < 8? NO → don't swap
i stays 0
Place pivot at position i=0:
Swap [1,3] and [2,2]
Array: [[2,2], [-2,2], [1,3], [5,1]]
          8      8       10     26
Pivot position: p = 0
Check: p vs k
0 < 2 → Need more!
Recurse on [1, 2]
───────────────────────────────────────────────────────────────
PARTITION 3: Range [1, 2], pivot = last element
───────────────────────────────────────────────────────────────
Array segment: [[-2,2], [1,3]]
                  8       10
Pivot: [1,3] at index 2, distance = 10
Partition process:
i = 1
j = 1: [-2,2], dist = 8
8 < 10? YES → swap with position i=1
Array: [[2,2], [-2,2], [1,3], [5,1]]
i becomes 2
Place pivot at position i=2:
Swap [1,3] with [1,3] (no change)
Pivot position: p = 2
Check: p vs k
2 = 2 → Perfect! DONE! ✓
───────────────────────────────────────────────────────────────
FINAL STATE
───────────────────────────────────────────────────────────────
Array: [[2,2], [-2,2], [1,3], [5,1]]
        └───k=2────┘   └──rest──┘
First k=2 positions: [[2,2], [-2,2]]
Both have distance 8 ✓
Verification:
Distances: [8, 8, 10, 26]
First 2 are the smallest! ✓
───────────────────────────────────────────────────────────────
COMPLEXITY ANALYSIS
───────────────────────────────────────────────────────────────
Partition costs:
Partition 1: Process 4 elements → O(4)
Partition 2: Process 3 elements → O(3)
Partition 3: Process 2 elements → O(2)
Total: O(4 + 3 + 2) = O(9) ≈ O(n)
General formula:
n + n/2 + n/4 + ... ≈ 2n = O(n) ✓
Compare with other approaches:
Sort: O(4 log 4) ≈ O(8)
Heap: O(4 log 2) ≈ O(4 × 1) = O(4)
Quickselect: O(9)
For small n, all similar.
For large n with large k:
n = 10,000, k = 5,000
Heap: 10,000 × log(5,000) ≈ 120,000 ops
Quickselect: ≈ 20,000 ops
6x faster! 🚀
Implementation
import java.util.*;
/**
 * Quickselect (Optimal for large k)
 * Time: O(n) average, O(n²) worst
 * Space: O(1)
 */
class Solution {
public int[][] kClosest(int[][] points, int k) {
// Partition so first k positions have k closest
quickselect(points, 0, points.length - 1, k);
// Return first k points
return Arrays.copyOfRange(points, 0, k);
}
/**
* Partition array so first k positions have k smallest distances
*
* @param points - array of points
* @param left - left boundary of range to partition
* @param right - right boundary of range to partition
* @param k - number of closest points needed
*/
private void quickselect(int[][] points, int left, int right, int k) {
// Base case: range has 0 or 1 element
if (left >= right) return;
// Partition around pivot
int pivotIndex = partition(points, left, right);
// Check pivot position
if (pivotIndex == k) {
// Perfect! First k positions have k closest
return;
} else if (pivotIndex < k) {
// Need more points, recurse right
quickselect(points, pivotIndex + 1, right, k);
} else {
// Have too many, recurse left
quickselect(points, left, pivotIndex - 1, k);
}
}
/**
* Partition points around pivot
* Points closer than pivot go left, farther go right
*
* @return final position of pivot
*/
private int partition(int[][] points, int left, int right) {
// Choose last element as pivot
int[] pivot = points[right];
int pivotDist = distance(pivot);
// i tracks next position for smaller elements
int i = left;
// Scan through range
for (int j = left; j < right; j++) {
// If current point closer than pivot
if (distance(points[j]) < pivotDist) {
// Swap to left section
swap(points, i, j);
i++;
}
}
// Place pivot in final position
swap(points, i, right);
// Return pivot's final position
return i;
}
/**
* Calculate squared distance to origin
* No sqrt needed for comparison
*/
private int distance(int[] point) {
return point[0] * point[0] + point[1] * point[1];
}
/**
* Swap two points in array
*/
private void swap(int[][] points, int i, int j) {
int[] temp = points[i];
points[i] = points[j];
points[j] = temp;
}
public static void main(String[] args) {
Solution sol = new Solution();
int[][] result = sol.kClosest(new int[][]{{3,3},{5,-1},{-2,4}}, 2);
System.out.println(Arrays.deepToString(result));
}
}
⏰ Time: O(n) average, O(n²) worst
💾 Space: O(1) in-place
Why This is Optimal (for large k):
✅ O(n) average vs O(n log k) heap
✅ O(1) space vs O(k) space
✅ In-place partitioning
✅ No extra data structures
When to use:
Small k (< 100): Max-heap simpler, similar speed
Large k (> n/2): Quickselect faster
Space critical: Quickselect (O(1))
Simplicity preferred: Max-heap easier
Example advantage (n=10,000, k=5,000):
Heap: 10,000 × log(5,000) ≈ 120,000 ops
Quickselect: ≈ 20,000 ops
6x faster! 🚀
📊 Approach Comparison - The Complete Growth Journey
┌───────────────┬─────────────┬──────────┬──────────────────────┐
│ Approach      │ Time        │ Space    │ Key Insight          │
├───────────────┼─────────────┼──────────┼──────────────────────┤
│ Sort all      │ O(n log n)  │ O(1)     │ Simple but wasteful  │
│ Max-heap k    │ O(n log k)  │ O(k)     │ Only track k         │
│ Quickselect   │ O(n) avg    │ O(1)     │ Partition-based      │
└───────────────┴─────────────┴──────────┴──────────────────────┘
THE COMPLETE LEARNING PROGRESSION:
Level 1: Brute Force
Thought: "Sort all by distance, take k"
Works? YES ✅
Optimal? NO ❌
Time: O(n log n)
Problem: Sorts all when only need k
Learning: Understand problem, get working solution
Level 2: Pattern Recognition
Observation: "Only need k closest!"
Pattern: "Like Top K problems!"
Connection: Problems 275, 278, 281
Key: "Closest = smallest distance"
Insight: Use heap-based approach!
Level 3: Max-Heap Solution
Implementation: Max-heap of size k
WHY max?: Top = farthest of k closest = boundary
Result: O(n log k) ✅
Better: Much better when k << n!
Trade-off: O(k) space vs O(1)
Level 4: Further Optimization
Observation: "Can use quickselect!"
Pattern: "Like Problem 275!"
Idea: "Partition for O(n) average!"
When better?: k large (k ≈ n/2)
Level 5: Optimal Solution
Implementation: Quickselect
Result: O(n) average ✅
Perfect: Can't do better asymptotically! ✨
Growth: Learned when each approach is best!
CONCRETE EXAMPLE (n=10,000, k=100):
Sort all:
10,000 × log(10,000) ≈ 130,000 ops
Baseline
Max-heap:
10,000 × log(100) ≈ 67,000 ops
2x faster than sort! ✨
Quickselect:
10,000 + 5,000 + 2,500 + ... ≈ 20,000 ops
6.5x faster than sort!
3x faster than heap! 🚀
When k=5,000 (large k):
Heap: 10,000 × log(5,000) ≈ 120,000 ops (close to sort!)
Quickselect: Still ≈ 20,000 ops
Quickselect 6x better! ✨
💡 Key Learnings - Your Complete Growth
WHAT YOU LEARNED:
1. DISTANCE FORMULA CLARIFICATION:
✅ General: √((x₁-x₂)² + (y₁-y₂)²)
✅ To origin: √(x² + y²) [since origin is (0,0)]
✅ In code: x*x + y*y [since (x-0)=x, (y-0)=y]
2. DISTANCE OPTIMIZATION:
✅ Don't need sqrt for comparison
✅ √a < √b ⇔ a < b (monotonic property)
✅ Compare x² + y² directly
✅ Faster and more accurate!
3. WHY MAX-HEAP FOR K SMALLEST:
✅ Closest = smallest distance
✅ Track k smallest distances
✅ Top = largest of k smallest = boundary
✅ Easy eviction: remove top when find closer
✅ O(log k) per operation
4. WHY NOT MIN-HEAP:
❌ Top = smallest, but need to evict largest!
❌ Can't easily find what to remove
❌ Wrong data structure for this problem
5. PATTERN: "Opposite Heap for Boundary"
✅ k largest → min-heap (boundary on small side)
✅ k smallest → max-heap (boundary on large side)
✅ Top always = boundary for easy comparison
✅ Same pattern as Problems 275, 278, 281!
6. QUICKSELECT UNDERSTANDING:
✅ Partial ordering sufficient
✅ Partition-based selection
✅ O(n) average time
✅ Better for large k
7. WHEN TO USE EACH APPROACH:
✅ Small k (<100): Max-heap simpler, similar speed
✅ Large k (>n/2): Quickselect better
✅ Need streaming: Max-heap (can't quickselect a stream)
✅ One-time query: Either (based on k size)
✅ Simple code: Sort acceptable for interviews
8. COMPARATOR SAFETY:
✅ Use Integer.compare(distB, distA) out of habit
✅ Note: under these constraints dist² ≤ 2·10⁸, so distB - distA
   actually fits in an int here - but the subtraction idiom overflows
   as soon as values can be negative or exceed roughly 10⁹
✅ Integer.compare is always safe; subtraction is safe only after
   checking the value ranges
⚠️ Common Mistakes
Mistake 1: Using min-heap for k closest
// β WRONG - Min-heap for k smallest distances
PriorityQueue<int[]> minHeap = new PriorityQueue<>((a,b) ->
Integer.compare(distance(a), distance(b)));
// Why wrong?
// Min-heap gives smallest, but we need to evict LARGEST
// Can't easily find what to remove!
// β CORRECT - Max-heap for k smallest distances
PriorityQueue<int[]> maxHeap = new PriorityQueue<>((a,b) -> {
int distA = a[0]*a[0] + a[1]*a[1];
int distB = b[0]*b[0] + b[1]*b[1];
return Integer.compare(distB, distA); // Max-heap
});
Mistake 2: Computing sqrt unnecessarily
// β WRONG - Expensive and unnecessary
double dist = Math.sqrt(x*x + y*y);
// β CORRECT - Compare squared distances
int distSq = x*x + y*y;
Mistake 3: Integer overflow in comparator
// β DANGEROUS - Can overflow!
return distB - distA;
// Why dangerous?
// distA, distB each up to 200,000,000 (fits in int)
// But distB - distA can overflow!
// Example: distB = 200M, distA = -200M β overflow
// β SAFE - Uses Integer.compare
return Integer.compare(distB, distA);
Mistake 4: Wrong understanding of the distance formula
// ❌ WRONG - Treating it as a general two-point distance
int dist = (x2-x1)*(x2-x1) + (y2-y1)*(y2-y1);
// ✅ CORRECT - One point is the origin (0, 0)!
int dist = x*x + y*y; // This IS (x-0)² + (y-0)²
Mistake 5: Not maintaining the heap size
// ❌ WRONG - Heap grows unbounded
maxHeap.offer(point);
// ✅ CORRECT - Maintain size ≤ k
maxHeap.offer(point);
if (maxHeap.size() > k) {
maxHeap.poll();
}
Mistake 6: Wrong quickselect termination
// ❌ WRONG - Misses pivotIndex == k and recurses into the wrong half
if (pivotIndex < k) {
quickselect(left, pivotIndex - 1, k);
}
// ✅ CORRECT - Handle all three cases
if (pivotIndex == k) {
return; // Done!
} else if (pivotIndex < k) {
quickselect(pivotIndex + 1, right, k);
} else {
quickselect(left, pivotIndex - 1, k);
}
🎯 Pattern Recognition
Problem Type: K Smallest/Closest Selection
Core Pattern: Max-Heap for K Smallest
When to Apply:
✓ Find k smallest/closest elements
✓ Distance-based selection
✓ Don't need sorted order
✓ Can optimize the distance calculation
Recognition Keywords:
- "k closest"
- "k smallest"
- "distance to origin"
- "nearest points"
- "k points with smallest"
Similar Problems:
- Kth Largest Element (LC 215) - Min-heap for k largest
- Top K Frequent Elements (LC 347) - Min-heap of the k most frequent
- Kth Largest in Stream (LC 703) - Min-heap for k largest
- Find K Pairs with Smallest Sums (LC 373)
- Kth Smallest in Sorted Matrix (LC 378)
Key Components:
┌───────────────────────────────────────┐
│ Distance: x² + y² (no sqrt needed)    │
│ Max-Heap: For k smallest distances    │
│ Top: Largest of k smallest = boundary │
│ Quickselect: O(n) average alternative │
│ Pattern: "Opposite heap for boundary" │
└───────────────────────────────────────┘
Universal Pattern Across "Top K" Problems:
Track k LARGEST → use MIN-heap
Track k SMALLEST → use MAX-heap
Why? Keep the boundary at the top for easy comparison!
📝 Quick Revision Notes
🎯 Core Concept:
Find the k closest points to the origin. Distance to origin: √(x² + y²), not √((x₁-x₂)² + (y₁-y₂)²), because the origin is (0,0). Skip sqrt: compare x² + y² directly. Sort: O(n log n), wasteful. Max-heap of k: tracks the k smallest distances; top = farthest of the k closest = boundary. NOT a min-heap: it can't decide what to evict! Quickselect: O(n) average, partition-based, best for large k.
⚡ Quick Implementation:
import java.util.Arrays;
import java.util.PriorityQueue;
public class Solution {
public int[][] kClosest(int[][] a, int k) {
// return naive(a, k);
// return maxHeap(a, k);
quickselect(a, 0, a.length - 1, k);
// Return first k points
return Arrays.copyOfRange(a, 0, k);
}
private void quickselect(int[][] points, int left, int right, int k) {
// Base case: range has 0 or 1 element
if (left >= right)
return;
// Partition around pivot
int pivotIndex = partition(points, left, right);
// Check pivot position
if (pivotIndex == k) {
// Perfect! First k positions have k closest
return;
} else if (pivotIndex < k) {
// Need more points, recurse right
quickselect(points, pivotIndex + 1, right, k);
} else {
// Have too many, recurse left
quickselect(points, left, pivotIndex - 1, k);
}
}
private int partition(int[][] points, int left, int right) {
// Choose last element as pivot
int[] pivot = points[right];
int pivotDist = distance(pivot);
// i tracks next position for smaller elements
int i = left;
// Scan through range
for (int j = left; j < right; j++) {
// If current point closer than pivot
if (distance(points[j]) < pivotDist) {
// Swap to left section
swap(points, i, j);
i++;
}
}
// Place pivot in final position
swap(points, i, right);
// Return pivot's final position
return i;
}
private int distance(int[] point) {
return point[0] * point[0] + point[1] * point[1];
}
private void swap(int[][] points, int i, int j) {
int[] temp = points[i];
points[i] = points[j];
points[j] = temp;
}
private int[][] maxHeap(int[][] a, int k) {
// step 1: create a max-heap ordered by squared distance from origin,
// so polling always removes the farthest point currently held
// (its size is capped at k in step 2; PriorityQueue itself is unbounded).
PriorityQueue<int[]> pq = new PriorityQueue<>((p1, p2) -> {
int distP1FromOrigin = p1[1] * p1[1] + p1[0] * p1[0];
int distP2FromOrigin = p2[1] * p2[1] + p2[0] * p2[0];
return Integer.compare(distP2FromOrigin, distP1FromOrigin);
});
// step 2: process data
for (int[] curr : a) {
pq.offer(curr);
if (pq.size() > k) {
pq.poll();
}
}
// step 3: drain the heap's k points (the k closest, in no particular order)
int[][] res = new int[k][2];
for (int i = 0; i < k; i++) {
int[] curr = pq.poll();
res[i][0] = curr[0];
res[i][1] = curr[1];
}
return res;
}
private int[][] naive(int[][] a, int k) {
// step 1: sort points ascending by squared distance from origin.
// dist from (x,y) to (0,0) = sqrt(x^2 + y^2); the full formula
// sqrt((x-0)^2 + (y-0)^2) simplifies because the origin is (0,0).
// We use Integer.compare rather than subtraction, as advised above.
Arrays.sort(a, (p1, p2) -> {
int distP1FromOrigin = p1[1] * p1[1] + p1[0] * p1[0];
int distP2FromOrigin = p2[1] * p2[1] + p2[0] * p2[0];
return Integer.compare(distP1FromOrigin, distP2FromOrigin);
});
// step 2: take first k results
int[][] res = new int[k][2];
for (int i = 0; i < k; i++) {
res[i][0] = a[i][0];
res[i][1] = a[i][1];
}
return res;
}
public static void main(String[] args) {
Solution s = new Solution();
System.out.println(Arrays.deepToString(s.kClosest(new int[][] { { 1, 3 }, { -2, 2 } }, 1)));
System.out.println(Arrays.deepToString(s.kClosest(new int[][] { { 3, 3 }, { 5, -1 }, { -2, 4 } }, 2)));
}
}
🔑 Key Insights:
- Distance Formula: to the origin (0,0) → √(x²+y²), not √((x₁-x₂)²+(y₁-y₂)²)
- Why It Simplifies: (x-0)=x and (y-0)=y, so the formula becomes √(x²+y²)
- Why Max-Heap?: closest = smallest; track the k smallest; top = largest of the k smallest = boundary ✓
- Why Not Min-Heap?: top = smallest, but we need to evict the largest, which it can't find easily ❌
- Integer.compare: overflow-safe habit, unlike distB-distA
- No Sqrt: compare squared distances (sqrt is monotonic)
- Quickselect: partial ordering, O(n) average, best for large k
- Pattern: k largest→min-heap, k smallest→max-heap (the opposite heap guards the boundary)
- Time: Sort O(n log n), Heap O(n log k), Quickselect O(n) avg
- Space: Sort O(1), Heap O(k), Quickselect O(1) ✓
💪 Memory Aid:
"To origin? Just x²+y²!"
"Closest = smallest! Max-heap for k smallest!"
"Top = boundary = farthest of k closest!" ✨
🧪 Edge Cases
Case 1: k = n (all points)
Input: points = [[1,0],[0,1]], k = 2
Output: [[1,0],[0,1]] (any order)
All points returned
Case 2: k = 1 (single closest)
Input: points = [[1,3],[-2,2]], k = 1
Output: [[-2,2]]
Single closest point
Case 3: Equal distances
Input: points = [[1,0],[0,1],[-1,0],[0,-1]], k = 2
All at distance 1 from origin
Any 2 points valid
Case 4: Origin included
Input: points = [[0,0],[1,1]], k = 1
Output: [[0,0]]
Origin has distance 0 (closest possible)
Case 5: Large coordinates
Input: points = [[10000,10000],[-10000,-10000]], k = 1
Both have distance² = 200,000,000 (still fits in an int)
Either answer is valid (equal distances)
Integer.compare keeps the comparison safe even at these magnitudes
All handled correctly! ✅
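The cases above can be checked mechanically; a minimal standalone harness (class and helper names are my own - it re-implements the max-heap approach compactly rather than reusing the Solution class):

```java
import java.util.Arrays;
import java.util.PriorityQueue;

public class EdgeCaseCheck {
    // Compact max-heap k-closest, mirroring the approach above.
    static int[][] kClosest(int[][] points, int k) {
        PriorityQueue<int[]> pq = new PriorityQueue<>((p, q) ->
            Integer.compare(q[0]*q[0] + q[1]*q[1], p[0]*p[0] + p[1]*p[1]));
        for (int[] p : points) {
            pq.offer(p);
            if (pq.size() > k) pq.poll(); // evict the farthest kept point
        }
        return pq.toArray(new int[0][]);
    }

    // Sorted squared distances, for order-independent comparison of answers.
    static int[] distances(int[][] pts) {
        int[] d = new int[pts.length];
        for (int i = 0; i < pts.length; i++) d[i] = pts[i][0]*pts[i][0] + pts[i][1]*pts[i][1];
        Arrays.sort(d);
        return d;
    }

    public static void main(String[] args) { // run with java -ea to enable assertions
        // Case 1: k = n returns every point
        assert kClosest(new int[][]{{1,0},{0,1}}, 2).length == 2;
        // Case 3: four tied points at distance 1 - any two are acceptable
        assert Arrays.equals(distances(kClosest(new int[][]{{1,0},{0,1},{-1,0},{0,-1}}, 2)),
                             new int[]{1, 1});
        // Case 4: the origin itself (distance 0) always wins
        assert Arrays.equals(distances(kClosest(new int[][]{{0,0},{1,1}}, 1)), new int[]{0});
        // Case 5: 10000² + 10000² = 200,000,000 still fits in int
        assert Arrays.equals(distances(kClosest(new int[][]{{10000,10000},{-10000,-10000}}, 1)),
                             new int[]{200_000_000});
        System.out.println("all edge cases pass");
    }
}
```

Comparing sorted distance arrays rather than the points themselves is what makes the tie cases checkable, since any valid answer is accepted.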
📊 Complexity Analysis
Approach 1: Sort All
Time: O(n log n)
Calculate distances: O(n)
Sort: O(n log n)
Extract k: O(k)
Total: O(n log n)
Space: O(1) auxiliary if the sort is in-place
(Java's Arrays.sort on objects is a mergesort variant and can use up to O(n) internally)
Approach 2: Max-Heap of K
Time: O(n log k)
Process n points
Each: offer O(log k) + maybe poll O(log k)
Total: O(n log k)
Space: O(k)
Heap contains at most k elements
When k << n: much better!
n = 10,000, k = 10
Sort: ~130,000 comparisons (n log n)
Heap: ~33,000 comparisons (n log k)
~4x faster! ✨
Approach 3: Quickselect
Time: O(n) average, O(n²) worst
Average: n + n/2 + n/4 + ... ≈ 2n
Worst: adversarial input where every pivot is bad (e.g. sorted data with a last-element pivot)
Space: O(1)
In-place partitioning
No extra data structures
When k is large: best choice!
n = 10,000, k = 5,000
Heap: ~120,000 comparisons (n log k)
Quickselect: ~20,000 comparisons (≈ 2n)
~6x faster! 🚀
Optimal on average - you can't beat O(n), since every point must be examined at least once!
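The back-of-envelope counts quoted above can be reproduced with a few lines; these are rough estimates only, since big-O constants are ignored:

```java
public class OpCountEstimates {
    static double log2(double x) { return Math.log(x) / Math.log(2); }

    public static void main(String[] args) {
        int n = 10_000;
        // Small k: sorting everything vs a size-10 max-heap
        System.out.printf("sort  ~ %.0f%n", n * log2(n));    // ~132877 (the ~130,000 above)
        System.out.printf("heap  ~ %.0f%n", n * log2(10));   // ~33219  (the ~33,000 above)
        // Large k: a size-5000 max-heap vs quickselect's ~2n partitioning work
        System.out.printf("heap  ~ %.0f%n", n * log2(5000)); // ~122877 (the ~120,000 above)
        System.out.printf("quick ~ %.0f%n", 2.0 * n);        // 20000
    }
}
```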
Happy practicing! 🎯
Note: This problem perfectly demonstrates the "max-heap for k smallest" pattern! You learn: (1) the distance to the origin simplifies from √((x₁-x₂)²+(y₁-y₂)²) to √(x²+y²) because the origin is (0,0), (2) WHY a max-heap and not a min-heap (top = boundary = largest of the k smallest, for easy eviction), (3) WHY Integer.compare rather than subtraction (an overflow-safe habit), (4) quickselect for O(n) average time when k is large, (5) pattern recognition across all "Top K" problems (the opposite heap tracks the boundary). The complete reasoning - distance formula clarification, min vs max heap comparison, quickselect partitioning logic, overflow discussion - builds production-ready problem-solving skills! True mastery! 💪✨🚀