Technology  /  Python

🐍 Python 78 guides · updated 2026

From first variable to OOP, generators, and real projects — the language that runs everything from data pipelines to AI agents, taught the practical way.

Python Sets: Fast Membership Testing, Deduplication, and Set Operations

Sets don’t appear in beginner tutorials as often as lists and dictionaries, but they’re the right tool for a specific set of problems — and when they’re right, they’re much better than the alternatives. The two killer features: O(1) membership testing and automatic deduplication.


What a Set Is

A set is an unordered collection of unique, hashable elements. “Unordered” means elements have no index and no guaranteed position. “Unique” means duplicates are automatically discarded. “Hashable” means elements must be immutable (strings, numbers, tuples — not lists or dicts).

# Create with curly braces
skills = {"Python", "SQL", "Git", "Python"} # duplicate "Python" is ignored
print(skills) # {'SQL', 'Git', 'Python'} — order is not guaranteed
# Create from an iterable
numbers = set([3, 1, 4, 1, 5, 9, 2, 6, 5])
print(numbers) # {1, 2, 3, 4, 5, 6, 9}
# Empty set — must use set(), not {} (that's an empty dict)
empty = set()

Why Sets Are Fast for Membership Testing

When you check value in some_list, Python scans the list from the beginning until it finds the value or reaches the end. For a list of 10,000 items, that’s up to 10,000 comparisons — O(n) time.

Sets use a hash table internally. To check if a value is in a set, Python computes its hash (a constant-time operation), looks up that position in the hash table, and checks just that slot. This is O(1) regardless of set size.

import time
large_list = list(range(1_000_000))
large_set = set(large_list)
target = 999_999
# List lookup — scans all 1 million elements
start = time.time()
for _ in range(1000):
target in large_list
list_time = time.time() - start
# Set lookup — hash table lookup
start = time.time()
for _ in range(1000):
target in large_set
set_time = time.time() - start
print(f"List: {list_time:.3f}s, Set: {set_time:.3f}s")
# Set is orders of magnitude faster

If you’re checking membership repeatedly, convert your list to a set first.


Adding and Removing Elements

languages = {"Python", "JavaScript", "Go"}
# Add a single element
languages.add("Rust")
# Add multiple elements
languages.update(["TypeScript", "Python"]) # "Python" already exists, ignored
# Remove elements
languages.remove("Go") # raises KeyError if not found
languages.discard("Fortran") # silent if not found — usually what you want
popped = languages.pop() # removes and returns an arbitrary element
# Clear the set
languages.clear()

Use discard() over remove() unless you want confirmation that an element existed.


Set Operations

This is where sets shine for data analysis and comparison problems:

backend_devs = {"Alice", "Bob", "Charlie", "Diana"}
frontend_devs = {"Charlie", "Diana", "Eve", "Frank"}
# Union — everyone in either set
all_devs = backend_devs | frontend_devs
# or: backend_devs.union(frontend_devs)
print(all_devs) # {'Alice', 'Bob', 'Charlie', 'Diana', 'Eve', 'Frank'}
# Intersection — in both sets
fullstack = backend_devs & frontend_devs
# or: backend_devs.intersection(frontend_devs)
print(fullstack) # {'Charlie', 'Diana'}
# Difference — in first set but not second
backend_only = backend_devs - frontend_devs
# or: backend_devs.difference(frontend_devs)
print(backend_only) # {'Alice', 'Bob'}
# Symmetric difference — in one set but not both
unique_to_one = backend_devs ^ frontend_devs
# or: backend_devs.symmetric_difference(frontend_devs)
print(unique_to_one) # {'Alice', 'Bob', 'Eve', 'Frank'}

Subset and superset tests

admin_permissions = {"read", "write", "delete", "admin"}
user_permissions = {"read", "write"}
print(user_permissions.issubset(admin_permissions)) # True
print(admin_permissions.issuperset(user_permissions)) # True
print(user_permissions.isdisjoint({"delete", "admin"})) # True — no overlap

Frozensets

A frozenset is an immutable set. You can’t add, remove, or modify its elements after creation:

fixed = frozenset(["red", "green", "blue"])
# Frozensets are hashable — can be used as dict keys or in other sets
colour_map = {
frozenset(["red", "blue"]): "purple",
frozenset(["red", "yellow"]): "orange",
}
print(colour_map[frozenset(["red", "blue"])]) # "purple"
# A set of frozensets
nested = {frozenset([1, 2]), frozenset([3, 4])}

Use frozensets when you need set semantics but the set itself must be hashable.


Practical Patterns

Deduplication

raw_emails = [
"alice@example.com",
"bob@example.com",
"alice@example.com", # duplicate
"charlie@example.com",
]
unique_emails = list(set(raw_emails))
# Note: order is not preserved. Use dict.fromkeys() to preserve order:
ordered_unique = list(dict.fromkeys(raw_emails))

Finding common elements between two lists

list_a = [1, 2, 3, 4, 5, 6]
list_b = [4, 5, 6, 7, 8, 9]
common = set(list_a) & set(list_b) # {4, 5, 6}
only_in_a = set(list_a) - set(list_b) # {1, 2, 3}

Filtering against a blocklist

blocked_words = {"spam", "scam", "free", "winner"}
def is_suspicious(subject):
words = set(subject.lower().split())
return bool(words & blocked_words) # intersection is non-empty
print(is_suspicious("Congratulations! You are a winner!")) # True
print(is_suspicious("Meeting agenda for Thursday")) # False

Tracking visited items

def crawl(start_url):
visited = set()
queue = [start_url]
while queue:
url = queue.pop()
if url in visited: # O(1) check
continue
visited.add(url)
# process url, add links to queue

When to Use a Set vs a List

NeedUse
Maintain orderlist
Allow duplicateslist
Frequent membership testingset
Deduplicate dataset
Mathematical set operationsset
Index access (items[3])list
Hashable containerfrozenset

Sets don’t replace lists — they solve a different problem. When your primary operations are “does this value exist?” and “what’s unique?”, a set is the right answer.