Technology  /  Python

🐍 Python 78 guides · updated 2026

From first variable to OOP, generators, and real projects — the language that runs everything from data pipelines to AI agents, taught the practical way.

Find Duplicates in a Python Array: Hash Set, Sorting, and Floyd’s Algorithm

Finding duplicates in an array is a classic interview question that comes in several flavours. The simplest version just asks you to identify which values appear more than once. A harder variant constrains you to O(1) extra space. Understanding both is worth your time.

Approach 1: Hash Set — O(n) Time, O(n) Space

Walk through the array once. For each element, check whether you have seen it before. If yes, it is a duplicate; if no, add it to your seen set.

def find_duplicates_set(nums):
"""Return a list of all values that appear more than once."""
seen = set()
duplicates = set()
for num in nums:
if num in seen:
duplicates.add(num) # add to set so we don't report it twice
else:
seen.add(num)
return sorted(duplicates) # sorted for deterministic output
numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 11, 17, 17, 20, 20]
print(find_duplicates_set(numbers))
# Output: [10, 17, 20]

Both seen and duplicates use a set, so membership checks are O(1) on average. Total time: O(n). Total extra space: O(n).

Approach 2: Sorting — O(n log n) Time, O(1) Extra Space

Sort the array first. Duplicates will then be adjacent — a single pass finds them.

def find_duplicates_sorted(nums):
"""Find duplicates by sorting — modifies the input array."""
nums.sort() # in-place sort, O(n log n)
duplicates = []
for i in range(1, len(nums)):
if nums[i] == nums[i - 1]:
# Avoid adding the same duplicate multiple times
if not duplicates or duplicates[-1] != nums[i]:
duplicates.append(nums[i])
return duplicates
numbers = [4, 3, 2, 7, 8, 2, 3, 1]
print(find_duplicates_sorted(numbers))
# Output: [2, 3]

The sorting approach mutates the input. If preserving the original order matters, sort a copy: sorted_nums = sorted(nums).

Approach 3: Using Counter for Frequency Analysis

When you need the count of each element — not just whether it is a duplicate:

from collections import Counter
def find_duplicates_counter(nums):
"""Return a dict of {element: count} for elements appearing more than once."""
freq = Counter(nums)
return {num: count for num, count in freq.items() if count > 1}
numbers = [1, 1, 2, 3, 3, 3, 4]
print(find_duplicates_counter(numbers))
# Output: {1: 2, 3: 3}

Approach 4: Floyd’s Cycle Detection (Constrained Problem)

The most elegant version — given an array of n+1 integers where each value is in the range [1, n], find the duplicate using only O(1) extra space and without modifying the array.

The idea: treat each value as a pointer to the next index. A duplicate creates a cycle, just like a linked list with a loop.

def find_duplicate_floyd(nums):
"""
Find the single duplicate in nums[1..n] with n+1 elements.
Uses Floyd's tortoise-and-hare cycle detection.
O(n) time, O(1) space.
"""
# Phase 1: detect the cycle
slow = nums[0]
fast = nums[0]
while True:
slow = nums[slow] # move one step
fast = nums[nums[fast]] # move two steps
if slow == fast:
break
# Phase 2: find the entry point of the cycle (= the duplicate)
slow = nums[0]
while slow != fast:
slow = nums[slow]
fast = nums[fast]
return slow
nums = [1, 3, 4, 2, 2]
print(find_duplicate_floyd(nums))
# Output: 2

This only works under specific constraints: exactly one duplicate, values in [1, n]. For the general case, use the hash set approach.

Choosing the Right Approach

ApproachTimeExtra SpaceNotes
Hash setO(n)O(n)General purpose, most readable
SortingO(n log n)O(1)**Mutates the array
CounterO(n)O(n)Best when frequency counts matter
Floyd’s cycleO(n)O(1)Constrained problem only

In a standard interview, the hash set solution is the right default answer. Mention Floyd’s only if space complexity is explicitly constrained.