The “Kth Largest Element” problem is a classic. It’s frequently asked in coding interviews, but more importantly, understanding its solutions provides practical insights into algorithms like sorting, heaps, and selection algorithms. It’s not just an academic exercise; think about finding the top N performers, the median value, or filtering critical data points efficiently.
In this post, we’ll break down what the problem entails and explore several common, pragmatic approaches, complete with executable code examples in Python and relevant CLI commands.
What is the Kth Largest Element?
Given an unsorted array of numbers nums
and an integer k
, the goal is to find the k
-th largest element in that array.
Important Note: This means the k
-th largest element in the sorted order, not the k
-th distinct largest element. Duplicates are counted.
Example:
If nums = [3, 2, 1, 5, 6, 4]
and k = 2
:
- Sort
nums
:[1, 2, 3, 4, 5, 6]
- The 1st largest is
6
. - The 2nd largest is
5
. So, the answer is5
.
Let’s dive into the solutions.
Approach 1: Sorting (The Simple Way)
The most straightforward approach is to simply sort the array and then pick the element at the appropriate index.
How it Works
- Sort the array in descending order. The
k
-th largest element will be at indexk-1
. - Alternatively, sort in ascending order. The
k
-th largest element will be at indexn - k
, wheren
is the total number of elements.
Complexity
- Time Complexity: O(N log N), primarily due to the sorting step. This is generally the bottleneck for larger
N
. - Space Complexity: O(1) if an in-place sort is used (e.g., QuickSort, HeapSort) or O(N) if a new array is created (e.g., MergeSort).
Python Example
# sort_and_find.py
def find_kth_largest_sorted(nums: list[int], k: int) -> int:
"""
Finds the k-th largest element by sorting the array.
"""
if not nums or k <= 0 or k > len(nums):
raise ValueError("Invalid input: nums must not be empty, k must be positive and within bounds.")
nums.sort() # Sorts in ascending order
# The k-th largest element is at index len(nums) - k
return nums[len(nums) - k]
# Test cases
test_cases = [
([3, 2, 1, 5, 6, 4], 2, 5),
([3, 2, 3, 1, 2, 4, 5, 5, 6], 4, 4),
([1], 1, 1),
([7, 6, 5, 4, 3, 2, 1], 5, 3), # Expected: 3 (5th largest)
]
for nums, k, expected in test_cases:
result = find_kth_largest_sorted(nums, k)
print(f"Nums: {nums}, k: {k}, Result: {result}, Expected: {expected} {'(PASS)' if result == expected else '(FAIL)'}")
# Example of an edge case handling
try:
find_kth_largest_sorted([], 1)
except ValueError as e:
print(f"\nError for empty array: {e}")
try:
find_kth_largest_sorted([1, 2], 3)
except ValueError as e:
print(f"Error for k out of bounds: {e}")
Output
Nums: [3, 2, 1, 5, 6, 4], k: 2, Result: 5, Expected: 5 (PASS)
Nums: [3, 2, 3, 1, 2, 4, 5, 5, 6], k: 4, Result: 4, Expected: 4 (PASS)
Nums: [1], k: 1, Result: 1, Expected: 1 (PASS)
Nums: [7, 6, 5, 4, 3, 2, 1], k: 5, Result: 3, Expected: 3 (PASS)
Error for empty array: Invalid input: nums must not be empty, k must be positive and within bounds.
Error for k out of bounds: Invalid input: nums must not be empty, k must be positive and within bounds.
CLI Example (Conceptual)
You can achieve similar results using standard Unix tools, though it’s more for data processing pipelines than algorithm implementation.
# sort_cli.sh
# Define the array elements
NUMS="3 2 1 5 6 4"
K=2
# Find the Kth largest element using sort and awk/head/tail
echo "Array: $NUMS, k: $K"
# Method 1: Sort numerically in reverse, then get the Kth line
echo "$NUMS" | tr ' ' '\n' | sort -nr | head -n "$K" | tail -n 1
Output
Array: 3 2 1 5 6 4, k: 2
5
Note: While simple, O(N log N) might be too slow for very large datasets where only the Kth element is needed, not the full sorted list.
Approach 2: Min-Heap (Priority Queue)
This approach is significantly more efficient when k
is much smaller than N
. It leverages a min-heap (or priority queue) to keep track of the k
largest elements encountered so far.
How it Works
- Initialize a min-heap.
- Iterate through each number in the input array
nums
. - For each number:
- Add the number to the heap.
- If the heap’s size exceeds
k
, remove the smallest element (which is at the root of a min-heap) from the heap.
- After iterating through all numbers, the heap will contain the
k
largest elements from the array, and its root (the smallest element in the heap) will be thek
-th largest element overall.
Why a Min-Heap for Kth Largest?
This is a common point of confusion. We use a min-heap because we want to maintain the k
largest elements. If a new number comes in that is larger than the smallest of the k
elements currently in our heap, it means this new number is among the k
largest we’ve seen. We pop the current smallest (which is no longer among the k
largest) and add the new number. This way, the heap’s minimum element (its root) is always the k
-th largest element.
Complexity
- Time Complexity: O(N log K). We iterate through
N
elements, and each heap operation (push/pop) takes O(log K) time since the heap’s maximum size isK
. This is a significant improvement over O(N log N) whenK << N
. - Space Complexity: O(K), as the heap stores at most
K
elements.
Python Example
Python’s heapq
module provides a min-heap implementation.
import heapq
# min_heap_find.py
def find_kth_largest_heap(nums: list[int], k: int) -> int:
"""
Finds the k-th largest element using a min-heap.
"""
if not nums or k <= 0 or k > len(nums):
raise ValueError("Invalid input: nums must not be empty, k must be positive and within bounds.")
min_heap = []
for num in nums:
heapq.heappush(min_heap, num)
if len(min_heap) > k:
heapq.heappop(min_heap) # Remove the smallest element if heap size exceeds k
return min_heap[0] # The root of the min-heap is the k-th largest element
# Test cases
test_cases = [
([3, 2, 1, 5, 6, 4], 2, 5),
([3, 2, 3, 1, 2, 4, 5, 5, 6], 4, 4),
([1], 1, 1),
([7, 6, 5, 4, 3, 2, 1], 5, 3),
]
for nums, k, expected in test_cases:
result = find_kth_largest_heap(nums, k)
print(f"Nums: {nums}, k: {k}, Result: {result}, Expected: {expected} {'(PASS)' if result == expected else '(FAIL)'}")
# Example of an edge case handling
try:
find_kth_largest_heap([], 1)
except ValueError as e:
print(f"\nError for empty array: {e}")
Output
Nums: [3, 2, 1, 5, 6, 4], k: 2, Result: 5, Expected: 5 (PASS)
Nums: [3, 2, 3, 1, 2, 4, 5, 5, 6], k: 4, Result: 4, Expected: 4 (PASS)
Nums: [1], k: 1, Result: 1, Expected: 1 (PASS)
Nums: [7, 6, 5, 4, 3, 2, 1], k: 5, Result: 3, Expected: 3 (PASS)
Error for empty array: Invalid input: nums must not be empty, k must be positive and within bounds.
Note: Python’s heapq
also has a convenient nlargest
function which directly implements this logic. It’s often the most pragmatic choice in real-world Python code.
import heapq
# heapq_nlargest.py
def find_kth_largest_nlargest(nums: list[int], k: int) -> int:
"""
Finds the k-th largest element using heapq.nlargest.
"""
if not nums or k <= 0 or k > len(nums):
raise ValueError("Invalid input: nums must not be empty, k must be positive and within bounds.")
return heapq.nlargest(k, nums)[-1] # nlargest returns a list, the last element is the k-th largest
# Test cases (same as above)
test_cases = [
([3, 2, 1, 5, 6, 4], 2, 5),
([3, 2, 3, 1, 2, 4, 5, 5, 6], 4, 4),
([1], 1, 1),
([7, 6, 5, 4, 3, 2, 1], 5, 3),
]
for nums, k, expected in test_cases:
result = find_kth_largest_nlargest(nums, k)
print(f"Nums: {nums}, k: {k}, Result: {result}, Expected: {expected} {'(PASS)' if result == expected else '(FAIL)'}")
Output
Nums: [3, 2, 1, 5, 6, 4], k: 2, Result: 5, Expected: 5 (PASS)
Nums: [3, 2, 3, 1, 2, 4, 5, 5, 6], k: 4, Result: 4, Expected: 4 (PASS)
Nums: [1], k: 1, Result: 1, Expected: 1 (PASS)
Nums: [7, 6, 5, 4, 3, 2, 1], k: 5, Result: 3, Expected: 3 (PASS)
Approach 3: QuickSelect (Hoare’s Selection Algorithm)
QuickSelect is a selection algorithm that finds the k
-th smallest (or largest) element in an unsorted array. It’s an average-case linear time algorithm, making it one of the most efficient solutions for this problem. It’s a modification of QuickSort.
How it Works
- Partitioning: Choose a pivot element from the array. Rearrange the array such that all elements smaller than the pivot are to its left, and all elements greater than or equal to the pivot are to its right. The pivot is now in its final sorted position.
- Comparison: Let the pivot’s final index be
p_idx
.- If
p_idx
is the target index (target_idx = n - k
fork
-th largest in an ascending array), then the pivot is our answer. - If
p_idx
is less thantarget_idx
, thek
-th largest element must be in the right subarray. Recurse on the right subarray. - If
p_idx
is greater thantarget_idx
, thek
-th largest element must be in the left subarray. Recurse on the left subarray.
- If
This process eliminates a significant portion of the array in each step, leading to the efficient average-case performance.
Complexity
- Time Complexity:
- Average Case: O(N). On average, each partition reduces the problem size by half, leading to
N + N/2 + N/4 + ... = 2N
operations. - Worst Case: O(N^2). If the pivot selection consistently leads to highly unbalanced partitions (e.g., always picking the smallest or largest element), the algorithm degrades to
N + (N-1) + (N-2) + ... = N^2
. Random pivot selection helps mitigate this.
- Average Case: O(N). On average, each partition reduces the problem size by half, leading to
- Space Complexity: O(log N) on average due to recursion stack, O(N) in worst case. QuickSelect can be implemented iteratively to achieve O(1) space.
Python Example
Here’s an implementation of QuickSelect with random pivot selection for robustness. We’ll target the (n - k)
-th smallest element, which is equivalent to the k
-th largest.
import random
# quick_select.py
def find_kth_largest_quickselect(nums: list[int], k: int) -> int:
"""
Finds the k-th largest element using the QuickSelect algorithm.
"""
if not nums or k <= 0 or k > len(nums):
raise ValueError("Invalid input: nums must not be empty, k must be positive and within bounds.")
# Convert k-th largest to (n-k)-th smallest for easier indexing
target_idx = len(nums) - k
def _quick_select(arr: list[int], left: int, right: int) -> int:
if left == right:
return arr[left]
# Choose a random pivot index to prevent worst-case O(N^2) on sorted/reverse-sorted data
pivot_idx = random.randint(left, right)
pivot_value = arr[pivot_idx]
# Move pivot to the end
arr[pivot_idx], arr[right] = arr[right], arr[pivot_idx]
store_idx = left
for i in range(left, right):
if arr[i] < pivot_value:
arr[store_idx], arr[i] = arr[i], arr[store_idx]
store_idx += 1
# Move pivot to its final sorted position
arr[right], arr[store_idx] = arr[store_idx], arr[right]
if store_idx == target_idx:
return arr[store_idx]
elif store_idx < target_idx:
return _quick_select(arr, store_idx + 1, right)
else: # store_idx > target_idx
return _quick_select(arr, left, store_idx - 1)
# We pass a copy because QuickSelect modifies the list in place
return _quick_select(list(nums), 0, len(nums) - 1)
# Test cases
test_cases = [
([3, 2, 1, 5, 6, 4], 2, 5),
([3, 2, 3, 1, 2, 4, 5, 5, 6], 4, 4),
([1], 1, 1),
([7, 6, 5, 4, 3, 2, 1], 5, 3),
([2, 1], 1, 2),
([2, 1], 2, 1),
([0,0,0,1,2,3], 3, 1)
]
for nums, k, expected in test_cases:
result = find_kth_largest_quickselect(nums, k)
print(f"Nums: {nums}, k: {k}, Result: {result}, Expected: {expected} {'(PASS)' if result == expected else '(FAIL)'}")
# Example of an edge case handling
try:
find_kth_largest_quickselect([], 1)
except ValueError as e:
print(f"\nError for empty array: {e}")
Output
Nums: [3, 2, 1, 5, 6, 4], k: 2, Result: 5, Expected: 5 (PASS)
Nums: [3, 2, 3, 1, 2, 4, 5, 5, 6], k: 4, Result: 4, Expected: 4 (PASS)
Nums: [1], k: 1, Result: 1, Expected: 1 (PASS)
Nums: [7, 6, 5, 4, 3, 2, 1], k: 5, Result: 3, Expected: 3 (PASS)
Nums: [2, 1], k: 1, Result: 2, Expected: 2 (PASS)
Nums: [2, 1], k: 2, Result: 1, Expected: 1 (PASS)
Nums: [0, 0, 0, 1, 2, 3], k: 3, Result: 1, Expected: 1 (PASS)
Error for empty array: Invalid input: nums must not be empty, k must be positive and within bounds.
Note: Implementing QuickSelect correctly can be tricky, especially the partitioning step. The random.randint
pivot selection helps ensure average-case performance even with adversarial inputs. It modifies the list in place, so if you need the original list untouched, pass a copy.
Approach 4: Using collections.Counter
(For Specific Integer Ranges/Frequencies)
If the array contains integers within a relatively small, known range, or if the problem implicitly involves frequencies (e.g., finding the Kth most frequent element, though that’s a different problem), you can use a frequency map or an array to count occurrences.
How it Works
- Count the frequency of each number in the input array.
- Iterate from the largest possible number down to the smallest.
- As you iterate, accumulate the counts. The first number where the accumulated count equals or exceeds
k
is your answer.
Complexity
- Time Complexity: O(N + M), where
N
is the number of elements andM
is the range of possible values (max - min value). IfM
is small, this can be O(N). - Space Complexity: O(M) to store the counts.
Python Example
from collections import Counter
# counter_find.py
def find_kth_largest_counter(nums: list[int], k: int) -> int:
"""
Finds the k-th largest element using collections.Counter.
Best for integer arrays with a constrained range.
"""
if not nums or k <= 0 or k > len(nums):
raise ValueError("Invalid input: nums must not be empty, k must be positive and within bounds.")
counts = Counter(nums)
# Determine the range of numbers for iteration
min_val = min(nums)
max_val = max(nums)
current_count = 0
# Iterate from largest possible value down to smallest
for num_val in range(max_val, min_val - 1, -1):
if num_val in counts:
current_count += counts[num_val]
if current_count >= k:
return num_val
# This part should ideally not be reached if k is within bounds and nums is not empty
# unless there's an issue with logic or k is too high for frequency-based search
raise RuntimeError("Should not happen: Kth element not found. Check k or input.")
# Test cases
test_cases = [
([3, 2, 1, 5, 6, 4], 2, 5),
([3, 2, 3, 1, 2, 4, 5, 5, 6], 4, 4), # Expected 4th largest
([1], 1, 1),
([7, 6, 5, 4, 3, 2, 1], 5, 3),
([0, 0, 0, 1, 2, 3], 3, 1), # Duplicates handle correctly
]
for nums, k, expected in test_cases:
try:
result = find_kth_largest_counter(nums, k)
print(f"Nums: {nums}, k: {k}, Result: {result}, Expected: {expected} {'(PASS)' if result == expected else '(FAIL)'}")
except ValueError as e:
print(f"Nums: {nums}, k: {k}, Error: {e}")
Output
Nums: [3, 2, 1, 5, 6, 4], k: 2, Result: 5, Expected: 5 (PASS)
Nums: [3, 2, 3, 1, 2, 4, 5, 5, 6], k: 4, Result: 4, Expected: 4 (PASS)
Nums: [1], k: 1, Result: 1, Expected: 1 (PASS)
Nums: [7, 6, 5, 4, 3, 2, 1], k: 5, Result: 3, Expected: 3 (PASS)
Nums: [0, 0, 0, 1, 2, 3], k: 3, Result: 1, Expected: 1 (PASS)
Note: This approach shines when the range of numbers is small (e.g., ages 0-100, scores 0-1000) or when dealing with character counts, making it a “counting sort” like approach. If the numbers are arbitrary large integers, M
could be enormous, making this method impractical due to memory usage for the counts
dictionary and the iteration range.
Choosing the Right Approach
Here’s a quick summary and guide for when to use which method:
| Approach | Time Complexity (Avg/Worst) | Space Complexity (Avg/Worst) | When to Use QuickSort selects a pivot
, then places elements smaller than the pivot
before it, and elements greater than the pivot
after it.
The Problem:
Given an unsorted array nums
, return the k
th largest element in the array. Note that it is the k
th largest element in the sorted order, not the k
th distinct element.
Example 1:
Input: nums = [3,2,1,5,6,4], k = 2
Output: 5
Example 2:
Input: nums = [3,2,3,1,2,4,5,5,6], k = 4
Output: 4
Constraints:
1 <= k <= nums.length <= 10^5
-10^4 <= nums[i] <= 10^4
1. Simple Solution: Sorting
The most straightforward approach is to sort the array and then pick the element at the k
-th position from the end (or n-k
from the beginning, where n
is the array length).
Algorithm:
- Sort the given array
nums
in ascending order. - The
k
th largest element will be at indexlen(nums) - k
.
Complexity Analysis:
- Time Complexity: O(N log N) due to the sorting operation. This is generally efficient enough for
N
up to10^5
but might be overkill ifk
is very small. - Space Complexity: O(log N) or O(N) depending on the sorting algorithm used (e.g., Python’s
sort()
uses Timsort which is O(N) in worst case). For in-place sorts like Heap Sort or Quick Sort, it can be O(log N) average for recursion stack.
Python Code Example:
# sort_solution.py
def find_kth_largest_sorting(nums: list[int], k: int) -> int:
"""
Finds the k-th largest element by sorting the array.
"""
if not nums or not (1 <= k <= len(nums)):
raise ValueError("Invalid input: nums cannot be empty, and k must be within bounds.")
nums.sort() # Sorts in ascending order
return nums[len(nums) - k]
# Test cases
test_cases = [
([3, 2, 1, 5, 6, 4], 2, 5),
([3, 2, 3, 1, 2, 4, 5, 5, 6], 4, 4),
([1], 1, 1),
([7, 6, 5, 4, 3, 2, 1], 5, 3), # Expected: 3 (5th largest)
]
for nums, k, expected in test_cases:
result = find_kth_largest_sorting(list(nums), k) # Pass a copy to preserve original list
print(f"Input: {nums}, k: {k}, Output: {result}, Expected: {expected} -> {'PASS' if result == expected else 'FAIL'}")
# Edge case: invalid k
try:
find_kth_largest_sorting([1, 2, 3], 0)
except ValueError as e:
print(f"Error for k=0: {e}")
Sample Output:
Input: [3, 2, 1, 5, 6, 4], k: 2, Output: 5, Expected: 5 -> PASS
Input: [3, 2, 3, 1, 2, 4, 5, 5, 6], k: 4, Output: 4, Expected: 4 -> PASS
Input: [1], k: 1, Output: 1, Expected: 1 -> PASS
Input: [7, 6, 5, 4, 3, 2, 1], k: 5, Output: 3, Expected: 3 -> PASS
Error for k=0: Invalid input: nums cannot be empty, and k must be within bounds.
Note: This solution is simple to implement and understand, making it a good first approach if performance isn’t the absolute top priority or N
is relatively small. However, we can do better.
2. Efficient Solution: Min-Heap (Priority Queue)
A min-heap (or priority queue) is an excellent choice when you need to maintain a collection of the k
largest (or smallest) elements seen so far.
Algorithm:
- Initialize a min-heap (priority queue).
- Iterate through each number
num
in the input arraynums
:- Add
num
to the heap. - If the size of the heap exceeds
k
, remove the smallest element from the heap (this isheapq.heappop()
for a min-heap).
- Add
- After processing all numbers, the heap will contain exactly the
k
largest elements. The smallest element in this heap (its root) will be thek
-th largest element overall.
Why a Min-Heap for Kth Largest?
This often confuses people. We use a min-heap because we want to keep track of the k
largest numbers. When a new number comes in, if it’s larger than the smallest element currently in our heap (which is the current k
-th largest), it means this new number should replace that smallest element. We pop the smallest to maintain a heap of size k
containing only the largest elements.
Complexity Analysis:
- Time Complexity: O(N log K). We iterate through
N
elements. For each element, a heap push or pop operation takes O(log K) time (since the heap’s maximum size isK
). IfK
is close toN
, this approaches O(N log N). IfK
is very small, it’s closer to O(N). - Space Complexity: O(K), as the heap stores at most
K
elements.
Python Code Example:
Python’s heapq
module provides min-heap functionality.
import heapq
# min_heap_solution.py
def find_kth_largest_heap(nums: list[int], k: int) -> int:
"""
Finds the k-th largest element using a min-heap.
"""
if not nums or not (1 <= k <= len(nums)):
raise ValueError("Invalid input: nums cannot be empty, and k must be within bounds.")
min_heap = []
for num in nums:
heapq.heappush(min_heap, num)
if len(min_heap) > k:
heapq.heappop(min_heap) # Remove the smallest element if heap size exceeds k
return min_heap[0] # The root of the min-heap is the k-th largest element
# Test cases (same as above)
test_cases = [
([3, 2, 1, 5, 6, 4], 2, 5),
([3, 2, 3, 1, 2, 4, 5, 5, 6], 4, 4),
([1], 1, 1),
([7, 6, 5, 4, 3, 2, 1], 5, 3),
]
for nums, k, expected in test_cases:
result = find_kth_largest_heap(list(nums), k)
print(f"Input: {nums}, k: {k}, Output: {result}, Expected: {expected} -> {'PASS' if result == expected else 'FAIL'}")
# Edge case: invalid k
try:
find_kth_largest_heap([1, 2, 3], 0)
except ValueError as e:
print(f"Error for k=0: {e}")
Sample Output:
Input: [3, 2, 1, 5, 6, 4], k: 2, Output: 5, Expected: 5 -> PASS
Input: [3, 2, 3, 1, 2, 4, 5, 5, 6], k: 4, Output: 4, Expected: 4 -> PASS
Input: [1], k: 1, Output: 1, Expected: 1 -> PASS
Input: [7, 6, 5, 4, 3, 2, 1], k: 5, Output: 3, Expected: 3 -> PASS
Error for k=0: Invalid input: nums cannot be empty, and k must be within bounds.
Note: Python’s heapq
module also provides a highly optimized nlargest(k, iterable)
function which directly returns a list of the k
largest elements. For practical purposes, this is often the most concise and robust solution in Python.
import heapq
# heapq_nlargest_solution.py
def find_kth_largest_nlargest(nums: list[int], k: int) -> int:
"""
Finds the k-th largest element using heapq.nlargest.
"""
if not nums or not (1 <= k <= len(nums)):
raise ValueError("Invalid input: nums cannot be empty, and k must be within bounds.")
return heapq.nlargest(k, nums)[-1] # nlargest returns a list, the last element is the k-th largest
# Test cases (same as above)
test_cases = [
([3, 2, 1, 5, 6, 4], 2, 5),
([3, 2, 3, 1, 2, 4, 5, 5, 6], 4, 4),
([1], 1, 1),
([7, 6, 5, 4, 3, 2, 1], 5, 3),
]
for nums, k, expected in test_cases:
result = find_kth_largest_nlargest(list(nums), k)
print(f"Input: {nums}, k: {k}, Output: {result}, Expected: {expected} -> {'PASS' if result == expected else 'FAIL'}")
Sample Output:
Input: [3, 2, 1, 5, 6, 4], k: 2, Output: 5, Expected: 5 -> PASS
Input: [3, 2, 3, 1, 2, 4, 5, 5, 6], k: 4, Output: 4, Expected: 4 -> PASS
Input: [1], k: 1, Output: 1, Expected: 1 -> PASS
Input: [7, 6, 5, 4, 3, 2, 1], k: 5, Output: 3, Expected: 3 -> PASS
3. Optimal Solution: QuickSelect (Hoare’s Selection Algorithm)
QuickSelect is an in-place, average O(N) time complexity algorithm, making it theoretically the most efficient for large N
. It’s a variation of QuickSort that only recurses on the partition that potentially contains the k
-th element, rather than both.
Algorithm:
- Transform
k
: For thek
-th largest element in an array of sizen
, we are looking for the element at indexn - k
if the array were sorted in ascending order (0-indexed). Let’s call thistarget_idx
. partition(arr, left, right)
function:- Choose a
pivot_value
fromarr[left...right]
. A common robust strategy is to pick a random element to avoid worst-case scenarios. - Rearrange elements such that all elements
x < pivot_value
are to its left, and allx >= pivot_value
are to its right. - Return the
pivot_index
(the final position of the pivot).
- Choose a
- Recursive Search:
- Call
partition(nums, left, right)
. Getpivot_index
. - If
pivot_index == target_idx
, thennums[pivot_index]
is our answer. - If
pivot_index < target_idx
, the element is in the right partition. Recursively callquick_select(nums, pivot_index + 1, right)
. - If
pivot_index > target_idx
, the element is in the left partition. Recursively callquick_select(nums, left, pivot_index - 1)
.
- Call
Complexity Analysis:
- Time Complexity:
- Average Case: O(N). Each recursive call roughly halves the search space.
- Worst Case: O(N^2). This occurs if pivot selection consistently leads to highly unbalanced partitions (e.g., always picking the smallest or largest element). Random pivot selection significantly reduces the probability of hitting this worst case.
- Space Complexity: O(log N) on average due to the recursion stack. O(N) in the worst case (unbalanced partitions). An iterative implementation can achieve O(1) space.
Python Code Example:
import random
# quick_select_solution.py
def find_kth_largest_quickselect(nums: list[int], k: int) -> int:
"""
Finds the k-th largest element using the QuickSelect algorithm.
"""
if not nums or not (1 <= k <= len(nums)):
raise ValueError("Invalid input: nums cannot be empty, and k must be within bounds.")
# Convert k-th largest to (n-k)-th smallest for easier indexing with ascending partition
target_idx = len(nums) - k
# Inner helper function for QuickSelect
def _quick_select(arr: list[int], left: int, right: int) -> int:
if left == right:
return arr[left]
# Choose a random pivot index to help prevent worst-case O(N^2)
pivot_idx = random.randint(left, right)
pivot_value = arr[pivot_idx]
# Move pivot to the end of the current partition
arr[pivot_idx], arr[right] = arr[right], arr[pivot_idx]
store_idx = left # This will be the position for elements smaller than pivot
# Partition step: elements < pivot go to the left of store_idx
for i in range(left, right):
if arr[i] < pivot_value:
arr[store_idx], arr[i] = arr[i], arr[store_idx]
store_idx += 1
# Place the pivot at its correct sorted position
arr[right], arr[store_idx] = arr[store_idx], arr[right]
# Check if the pivot is at the target index
if store_idx == target_idx:
return arr[store_idx]
elif store_idx < target_idx:
# Recurse on the right partition (elements larger than pivot)
return _quick_select(arr, store_idx + 1, right)
else: # store_idx > target_idx
# Recurse on the left partition (elements smaller than pivot)
return _quick_select(arr, left, store_idx - 1)
# QuickSelect modifies the array in-place, so pass a copy if you need original nums intact
return _quick_select(list(nums), 0, len(nums) - 1)
# Test cases
test_cases = [
([3, 2, 1, 5, 6, 4], 2, 5),
([3, 2, 3, 1, 2, 4, 5, 5, 6], 4, 4),
([1], 1, 1),
([7, 6, 5, 4, 3, 2, 1], 5, 3),
([2, 1], 1, 2), # k=1 (largest) for [2,1] should be 2
([2, 1], 2, 1), # k=2 (2nd largest) for [2,1] should be 1
([0, 0, 0, 1, 2, 3], 3, 1), # 3rd largest: [3,2,1,0,0,0] -> 1
]
for nums, k, expected in test_cases:
result = find_kth_largest_quickselect(list(nums), k) # Pass a copy
print(f"Input: {nums}, k: {k}, Output: {result}, Expected: {expected} -> {'PASS' if result == expected else 'FAIL'}")
# Edge case: invalid k
try:
find_kth_largest_quickselect([1, 2, 3], 0)
except ValueError as e:
print(f"Error for k=0: {e}")
Sample Output:
Input: [3, 2, 1, 5, 6, 4], k: 2, Output: 5, Expected: 5 -> PASS
Input: [3, 2, 3, 1, 2, 4, 5, 5, 6], k: 4, Output: 4, Expected: 4 -> PASS
Input: [1], k: 1, Output: 1, Expected: 1 -> PASS
Input: [7, 6, 5, 4, 3, 2, 1], k: 5, Output: 3, Expected: 3 -> PASS
Input: [2, 1], k: 1, Output: 2, Expected: 2 -> PASS
Input: [2, 1], k: 2, Output: 1, Expected: 1 -> PASS
Input: [0, 0, 0, 1, 2, 3], k: 3, Output: 1, Expected: 1 -> PASS
Error for k=0: Invalid input: nums cannot be empty, and k must be within bounds.
Note: While QuickSelect offers the best average-case performance, its worst-case scenario is a concern. Randomizing the pivot choice is crucial in practice to almost guarantee average-case behavior. For guaranteed O(N) performance, a more complex algorithm like Median-of-Medians can be used, but it’s rarely implemented in practice due to its complexity.
Real-World Considerations & Gotchas
- Edge Cases: Always validate
k
and the input arraynums
.k
is less than 1.k
is greater thanlen(nums)
.nums
is empty.- All elements are the same.
- Duplicates: The problem statement explicitly says “kth largest element in the sorted order, not the kth distinct element.” All the provided solutions inherently handle duplicates correctly based on this interpretation.
- Data Types: The examples use integers, but these algorithms generally extend to floats or any comparable data types. For complex objects, you’d need to define a custom comparison or key function (e.g.,
key=lambda x: x.attribute
forsort
andheapq
). - Language Built-ins: For languages like Python,
heapq.nlargest
is often the most pragmatic and efficient choice for real-world code, as it’s highly optimized and concise. Many languages have similar standard library functions (e.g.,Collections.sort
in Java,std::sort
in C++). Don’t reinvent the wheel unless you’re explicitly learning or demonstrating the underlying algorithm. - Streaming Data: If you can’t load the entire dataset into memory (e.g., processing a large log file), the min-heap approach is ideal. It only requires O(K) space, making it suitable for streaming scenarios.
Conclusion
The “Kth Largest Element” problem is a fantastic way to understand the trade-offs between different algorithmic approaches:
- Sorting: Easiest to implement, good for small
N
or when the full sorted list is needed anyway. Time: O(N log N). - Min-Heap: Robust and efficient, particularly when
k
is much smaller thanN
. Excellent for streaming data. Time: O(N log K). - QuickSelect: Optimal average-case performance (O(N)), making it the fastest for large
N
if worst-case is mitigated. Time: O(N) average, O(N^2) worst.
Choosing the “best” solution depends on your specific constraints: the size of N
, the value of k
, memory limitations, and the acceptable worst-case performance. For most practical scenarios in Python, heapq.nlargest
is often your best bet due to its balance of performance and readability. For interview settings, being able to implement QuickSelect demonstrates a deeper understanding of algorithms.
Happy coding!