Find Duplicates in a Python Array: Hash Set, Sorting, and Floyd’s Algorithm

Finding duplicates in an array is a classic interview question that comes in several flavours. The simplest version just asks you to identify which values appear more than once. A harder variant constrains you to O(1) extra space. Understanding both is worth your time.

Approach 1: Hash Set — O(n) Time, O(n) Space

Walk through the array once. For each element, check whether you have seen it before. If yes, it is a duplicate; if no, add it to your seen set.

def find_duplicates_set(nums):
    """Return a list of all values that appear more than once."""
    seen = set()
    duplicates = set()

    for num in nums:
        if num in seen:
            duplicates.add(num)  # add to set so we don't report it twice
        else:
            seen.add(num)

    return sorted(duplicates)  # sorted for deterministic output

numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 11, 17, 17, 20, 20]
print(find_duplicates_set(numbers))
# Output: [10, 17, 20]

Both seen and duplicates use a set, so membership checks are O(1) on average. Total time: O(n). Total extra space: O(n).

Approach 2: Sorting — O(n log n) Time, O(1) Extra Space

Sort the array first. Duplicates will then be adjacent — a single pass finds them.

def find_duplicates_sorted(nums):
    """Find duplicates by sorting — modifies the input array."""
    nums.sort()                 # in-place sort, O(n log n)
    duplicates = []

    for i in range(1, len(nums)):
        if nums[i] == nums[i - 1]:
            # Avoid adding the same duplicate multiple times
            if not duplicates or duplicates[-1] != nums[i]:
                duplicates.append(nums[i])

    return duplicates

numbers = [4, 3, 2, 7, 8, 2, 3, 1]
print(find_duplicates_sorted(numbers))
# Output: [2, 3]

The sorting approach mutates the input. If preserving the original order matters, sort a copy: sorted_nums = sorted(nums).

Approach 3: Using Counter for Frequency Analysis

When you need the count of each element — not just whether it is a duplicate:

from collections import Counter

def find_duplicates_counter(nums):
    """Return a dict of {element: count} for elements appearing more than once."""
    freq = Counter(nums)
    return {num: count for num, count in freq.items() if count > 1}

numbers = [1, 1, 2, 3, 3, 3, 4]
print(find_duplicates_counter(numbers))
# Output: {1: 2, 3: 3}

Approach 4: Floyd’s Cycle Detection (Constrained Problem)

The most elegant version — given an array of n+1 integers where each value is in the range [1, n], find the duplicate using only O(1) extra space and without modifying the array.

The idea: treat each value as a pointer to the next index. A duplicate creates a cycle, just like a linked list with a loop.

def find_duplicate_floyd(nums):
    """
    Find the single duplicate in nums[1..n] with n+1 elements.
    Uses Floyd's tortoise-and-hare cycle detection.
    O(n) time, O(1) space.
    """
    # Phase 1: detect the cycle
    slow = nums[0]
    fast = nums[0]

    while True:
        slow = nums[slow]          # move one step
        fast = nums[nums[fast]]    # move two steps
        if slow == fast:
            break

    # Phase 2: find the entry point of the cycle (= the duplicate)
    slow = nums[0]
    while slow != fast:
        slow = nums[slow]
        fast = nums[fast]

    return slow

nums = [1, 3, 4, 2, 2]
print(find_duplicate_floyd(nums))
# Output: 2

This only works under specific constraints: exactly one duplicate, values in [1, n]. For the general case, use the hash set approach.

Choosing the Right Approach

Approach	Time	Extra Space	Notes
Hash set	O(n)	O(n)	General purpose, most readable
Sorting	O(n log n)	O(1)*	*Mutates the array
Counter	O(n)	O(n)	Best when frequency counts matter
Floyd’s cycle	O(n)	O(1)	Constrained problem only

In a standard interview, the hash set solution is the right default answer. Mention Floyd’s only if space complexity is explicitly constrained.

Written by NPBlue Engineering Team — Software & Data Engineers who ships production Python across data, backend, and ML systems.

Reviewed for technical accuracy. Spot an error? Let us know.