Find Duplicates in a Python Array: Hash Set, Sorting, and Floyd’s Algorithm
Finding duplicates in an array is a classic interview question that comes in several flavours. The simplest version just asks you to identify which values appear more than once. A harder variant constrains you to O(1) extra space. Understanding both is worth your time.
Approach 1: Hash Set — O(n) Time, O(n) Space
Walk through the array once. For each element, check whether you have seen it before. If yes, it is a duplicate; if no, add it to your seen set.
def find_duplicates_set(nums): """Return a list of all values that appear more than once.""" seen = set() duplicates = set()
for num in nums: if num in seen: duplicates.add(num) # add to set so we don't report it twice else: seen.add(num)
return sorted(duplicates) # sorted for deterministic output
numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 11, 17, 17, 20, 20]print(find_duplicates_set(numbers))# Output: [10, 17, 20]Both seen and duplicates use a set, so membership checks are O(1) on average. Total time: O(n). Total extra space: O(n).
Approach 2: Sorting — O(n log n) Time, O(1) Extra Space
Sort the array first. Duplicates will then be adjacent — a single pass finds them.
def find_duplicates_sorted(nums): """Find duplicates by sorting — modifies the input array.""" nums.sort() # in-place sort, O(n log n) duplicates = []
for i in range(1, len(nums)): if nums[i] == nums[i - 1]: # Avoid adding the same duplicate multiple times if not duplicates or duplicates[-1] != nums[i]: duplicates.append(nums[i])
return duplicates
numbers = [4, 3, 2, 7, 8, 2, 3, 1]print(find_duplicates_sorted(numbers))# Output: [2, 3]The sorting approach mutates the input. If preserving the original order matters, sort a copy: sorted_nums = sorted(nums).
Approach 3: Using Counter for Frequency Analysis
When you need the count of each element — not just whether it is a duplicate:
from collections import Counter
def find_duplicates_counter(nums): """Return a dict of {element: count} for elements appearing more than once.""" freq = Counter(nums) return {num: count for num, count in freq.items() if count > 1}
numbers = [1, 1, 2, 3, 3, 3, 4]print(find_duplicates_counter(numbers))# Output: {1: 2, 3: 3}Approach 4: Floyd’s Cycle Detection (Constrained Problem)
The most elegant version — given an array of n+1 integers where each value is in the range [1, n], find the duplicate using only O(1) extra space and without modifying the array.
The idea: treat each value as a pointer to the next index. A duplicate creates a cycle, just like a linked list with a loop.
def find_duplicate_floyd(nums): """ Find the single duplicate in nums[1..n] with n+1 elements. Uses Floyd's tortoise-and-hare cycle detection. O(n) time, O(1) space. """ # Phase 1: detect the cycle slow = nums[0] fast = nums[0]
while True: slow = nums[slow] # move one step fast = nums[nums[fast]] # move two steps if slow == fast: break
# Phase 2: find the entry point of the cycle (= the duplicate) slow = nums[0] while slow != fast: slow = nums[slow] fast = nums[fast]
return slow
nums = [1, 3, 4, 2, 2]print(find_duplicate_floyd(nums))# Output: 2This only works under specific constraints: exactly one duplicate, values in [1, n]. For the general case, use the hash set approach.
Choosing the Right Approach
| Approach | Time | Extra Space | Notes |
|---|---|---|---|
| Hash set | O(n) | O(n) | General purpose, most readable |
| Sorting | O(n log n) | O(1)* | *Mutates the array |
| Counter | O(n) | O(n) | Best when frequency counts matter |
| Floyd’s cycle | O(n) | O(1) | Constrained problem only |
In a standard interview, the hash set solution is the right default answer. Mention Floyd’s only if space complexity is explicitly constrained.