Python Sets: Fast Membership Testing, Deduplication, and Set Operations
Sets don’t appear in beginner tutorials as often as lists and dictionaries, but they’re the right tool for a specific set of problems — and when they’re right, they’re much better than the alternatives. The two killer features: O(1) membership testing and automatic deduplication.
What a Set Is
A set is an unordered collection of unique, hashable elements. “Unordered” means elements have no index and no guaranteed position. “Unique” means duplicates are automatically discarded. “Hashable” means elements must be immutable (strings, numbers, tuples — not lists or dicts).
# Create with curly bracesskills = {"Python", "SQL", "Git", "Python"} # duplicate "Python" is ignoredprint(skills) # {'SQL', 'Git', 'Python'} — order is not guaranteed
# Create from an iterablenumbers = set([3, 1, 4, 1, 5, 9, 2, 6, 5])print(numbers) # {1, 2, 3, 4, 5, 6, 9}
# Empty set — must use set(), not {} (that's an empty dict)empty = set()Why Sets Are Fast for Membership Testing
When you check value in some_list, Python scans the list from the beginning until it finds the value or reaches the end. For a list of 10,000 items, that’s up to 10,000 comparisons — O(n) time.
Sets use a hash table internally. To check if a value is in a set, Python computes its hash (a constant-time operation), looks up that position in the hash table, and checks just that slot. This is O(1) regardless of set size.
import time
large_list = list(range(1_000_000))large_set = set(large_list)
target = 999_999
# List lookup — scans all 1 million elementsstart = time.time()for _ in range(1000): target in large_listlist_time = time.time() - start
# Set lookup — hash table lookupstart = time.time()for _ in range(1000): target in large_setset_time = time.time() - start
print(f"List: {list_time:.3f}s, Set: {set_time:.3f}s")# Set is orders of magnitude fasterIf you’re checking membership repeatedly, convert your list to a set first.
Adding and Removing Elements
languages = {"Python", "JavaScript", "Go"}
# Add a single elementlanguages.add("Rust")
# Add multiple elementslanguages.update(["TypeScript", "Python"]) # "Python" already exists, ignored
# Remove elementslanguages.remove("Go") # raises KeyError if not foundlanguages.discard("Fortran") # silent if not found — usually what you wantpopped = languages.pop() # removes and returns an arbitrary element
# Clear the setlanguages.clear()Use discard() over remove() unless you want confirmation that an element existed.
Set Operations
This is where sets shine for data analysis and comparison problems:
backend_devs = {"Alice", "Bob", "Charlie", "Diana"}frontend_devs = {"Charlie", "Diana", "Eve", "Frank"}
# Union — everyone in either setall_devs = backend_devs | frontend_devs# or: backend_devs.union(frontend_devs)print(all_devs) # {'Alice', 'Bob', 'Charlie', 'Diana', 'Eve', 'Frank'}
# Intersection — in both setsfullstack = backend_devs & frontend_devs# or: backend_devs.intersection(frontend_devs)print(fullstack) # {'Charlie', 'Diana'}
# Difference — in first set but not secondbackend_only = backend_devs - frontend_devs# or: backend_devs.difference(frontend_devs)print(backend_only) # {'Alice', 'Bob'}
# Symmetric difference — in one set but not bothunique_to_one = backend_devs ^ frontend_devs# or: backend_devs.symmetric_difference(frontend_devs)print(unique_to_one) # {'Alice', 'Bob', 'Eve', 'Frank'}Subset and superset tests
admin_permissions = {"read", "write", "delete", "admin"}user_permissions = {"read", "write"}
print(user_permissions.issubset(admin_permissions)) # Trueprint(admin_permissions.issuperset(user_permissions)) # Trueprint(user_permissions.isdisjoint({"delete", "admin"})) # True — no overlapFrozensets
A frozenset is an immutable set. You can’t add, remove, or modify its elements after creation:
fixed = frozenset(["red", "green", "blue"])
# Frozensets are hashable — can be used as dict keys or in other setscolour_map = { frozenset(["red", "blue"]): "purple", frozenset(["red", "yellow"]): "orange",}
print(colour_map[frozenset(["red", "blue"])]) # "purple"
# A set of frozensetsnested = {frozenset([1, 2]), frozenset([3, 4])}Use frozensets when you need set semantics but the set itself must be hashable.
Practical Patterns
Deduplication
raw_emails = [ "alice@example.com", "bob@example.com", "alice@example.com", # duplicate "charlie@example.com",]
unique_emails = list(set(raw_emails))# Note: order is not preserved. Use dict.fromkeys() to preserve order:ordered_unique = list(dict.fromkeys(raw_emails))Finding common elements between two lists
list_a = [1, 2, 3, 4, 5, 6]list_b = [4, 5, 6, 7, 8, 9]
common = set(list_a) & set(list_b) # {4, 5, 6}only_in_a = set(list_a) - set(list_b) # {1, 2, 3}Filtering against a blocklist
blocked_words = {"spam", "scam", "free", "winner"}
def is_suspicious(subject): words = set(subject.lower().split()) return bool(words & blocked_words) # intersection is non-empty
print(is_suspicious("Congratulations! You are a winner!")) # Trueprint(is_suspicious("Meeting agenda for Thursday")) # FalseTracking visited items
def crawl(start_url): visited = set() queue = [start_url]
while queue: url = queue.pop() if url in visited: # O(1) check continue visited.add(url) # process url, add links to queueWhen to Use a Set vs a List
| Need | Use |
|---|---|
| Maintain order | list |
| Allow duplicates | list |
| Frequent membership testing | set |
| Deduplicate data | set |
| Mathematical set operations | set |
Index access (items[3]) | list |
| Hashable container | frozenset |
Sets don’t replace lists — they solve a different problem. When your primary operations are “does this value exist?” and “what’s unique?”, a set is the right answer.