Find the Median of an Array in Python: Sorted Approach and QuickSelect
The median of a dataset is its middle value when sorted. For an odd-length list, that is the value at position n // 2. For an even-length list, it is the average of the two middle values. Simple definition — but the implementations range from a one-liner to a genuinely interesting algorithm.
Approach 1: Sort and Index
Sort the array, then compute the middle position. This is correct, readable, and O(n log n).
def find_median(arr): """Return the median of a list. Handles both odd and even length.""" if not arr: raise ValueError("Cannot find median of an empty list")
sorted_arr = sorted(arr) # returns a new sorted list, arr unchanged n = len(sorted_arr) mid = n // 2
if n % 2 == 1: # Odd length: exact middle element return sorted_arr[mid] else: # Even length: average of two middle elements return (sorted_arr[mid - 1] + sorted_arr[mid]) / 2
# Odd-length exampleprint(find_median([7, 2, 5, 1, 9, 3, 7]))# Output: 5
# Even-length exampleprint(find_median([3, 1, 4, 1, 5, 9]))# Output: 3.5For most practical purposes, this is all you need. Python’s statistics library also has statistics.median() if you want a one-liner.
Using the statistics Module
import statistics
data = [7, 2, 5, 1, 9, 3, 7]print(statistics.median(data))# Output: 5
# For integer median (no averaging for even-length)print(statistics.median_low(data)) # lower of the two middle valuesprint(statistics.median_high(data)) # higher of the two middle valuesApproach 2: QuickSelect — O(n) Average Time
Sorting everything just to find one element is wasteful. QuickSelect finds the kth smallest element (and thus the median) without fully sorting the array. Average time: O(n). Worst case: O(n²), but randomised pivoting makes that unlikely.
import random
def quickselect(arr, k): """Return the kth smallest element (0-indexed) using QuickSelect.""" if len(arr) == 1: return arr[0]
pivot = random.choice(arr) # random pivot reduces worst-case probability
smaller = [x for x in arr if x < pivot] equal = [x for x in arr if x == pivot] larger = [x for x in arr if x > pivot]
if k < len(smaller): return quickselect(smaller, k) elif k < len(smaller) + len(equal): return pivot else: return quickselect(larger, k - len(smaller) - len(equal))
def find_median_quickselect(arr): """Find median using QuickSelect — O(n) average time.""" if not arr: raise ValueError("Cannot find median of an empty list")
n = len(arr) if n % 2 == 1: return quickselect(arr, n // 2) else: lo = quickselect(arr, n // 2 - 1) hi = quickselect(arr, n // 2) return (lo + hi) / 2
data = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]print(find_median_quickselect(data))# Output: 4.0The QuickSelect approach is more useful when you only need the median from a large dataset and want to avoid the O(n log n) sorting cost.
Edge Cases Worth Testing
# Single elementprint(find_median([42])) # 42
# Two elementsprint(find_median([3, 7])) # 5.0
# All same valuesprint(find_median([4, 4, 4])) # 4
# Already sortedprint(find_median([1, 2, 3, 4, 5])) # 3
# Reverse sortedprint(find_median([9, 7, 5, 3, 1])) # 5Which Approach to Use
Use sorted() + index arithmetic for everyday code. It is readable and correct. Use QuickSelect when you are specifically optimising for a large array and need better than O(n log n). The statistics.median() function is fine for data analysis scripts where you want the library to handle edge cases.
In interviews, implement the sort-based approach first, state its complexity, then mention QuickSelect as the optimisation if they ask.