Python Set Comprehensions: Creating Unique Collections the Concise Way
Set comprehensions give you the conciseness of list comprehensions combined with the uniqueness guarantee of sets. They’re the right tool when you want to build a collection of unique values from an iterable — and you want to do it in one line.
Syntax
    {expression for item in iterable}

Note the curly braces. This looks like a dict comprehension until you notice there’s no colon: a single expression instead of key: value.
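As a rough mental model, the comprehension behaves like creating an empty set and calling add() in a loop. The sketch below uses made-up sample data, and the names values, squares_loop and squares_comp are illustrative only:

```python
values = [1, 2, 2, 3, 3, 3]  # sample data, chosen for illustration

# Explicit loop: start from an empty set and add each result
squares_loop = set()
for item in values:
    squares_loop.add(item * item)

# Equivalent set comprehension
squares_comp = {item * item for item in values}

print(squares_loop == squares_comp)  # True
print(sorted(squares_comp))          # [1, 4, 9]
```

Both forms produce the same set; the comprehension just says it in one expression.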
Basic Examples
    # Get unique lengths of words
    words = ["apple", "banana", "cherry", "apricot", "blueberry"]
    unique_lengths = {len(word) for word in words}
    print(unique_lengths)  # {5, 6, 7, 9} (each length appears once; display order may vary)

    # Unique first characters
    first_chars = {word[0] for word in words}
    print(first_chars)  # {'a', 'b', 'c'}

    # Unique absolute values
    numbers = [-3, -1, 0, 1, 2, -2, 3]
    absolute_unique = {abs(n) for n in numbers}
    print(absolute_unique)  # {0, 1, 2, 3}

Notice the last example: both -3 and 3 produce 3 via abs(), so the set contains only one 3. The deduplication happens automatically.
With Filter Conditions
Add if to include only elements that pass a test:
    # Unique even numbers
    data = [1, 2, 3, 2, 4, 1, 5, 4, 6]
    unique_evens = {n for n in data if n % 2 == 0}
    print(unique_evens)  # {2, 4, 6}

    # Unique words longer than 4 characters
    text = "the quick brown fox jumps over the lazy dog"
    long_words = {word for word in text.split() if len(word) > 4}
    print(long_words)  # {'quick', 'brown', 'jumps'}
    # 'the', 'fox', 'over', 'lazy' and 'dog' have 4 characters or fewer, so the filter drops them

Expressions in the Comprehension
The expression part can be any valid Python expression, not just the item itself:
    # Normalise strings as you collect them
    tags = ["Python", "python", "PYTHON", "javascript", "JavaScript"]
    normalised_tags = {tag.lower() for tag in tags}
    print(normalised_tags)  # {'python', 'javascript'}

    # Extract domains from email addresses
    emails = [
        "alice@company.com",
        "bob@company.com",
        "charlie@university.edu",
        "diana@university.edu",
    ]
    domains = {email.split("@")[1] for email in emails}
    print(domains)  # {'company.com', 'university.edu'}

The domain extraction example shows why set comprehensions are often better than list comprehensions for this kind of work: a list would give you ["company.com", "company.com", "university.edu", "university.edu"], but you likely want the unique set.
Compared to List Comprehensions
The difference between {expr for x in iterable} and [expr for x in iterable] is:
- List: ordered, allows duplicates, supports indexing
- Set: unordered, unique values only, supports fast membership testing
    source = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]

    as_list = [x for x in source]  # [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
    as_set = {x for x in source}   # {1, 2, 3, 4, 5, 6, 9}

    print(len(as_list))  # 10
    print(len(as_set))   # 7 (duplicates removed)

Choosing between them: if you need to keep duplicates or maintain order, use a list comprehension. If you need unique values and fast lookup, use a set comprehension.
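To illustrate the fast-lookup case, here is a small sketch that builds a set once and then uses it for repeated membership checks. The data and the names raw_blocked, blocked_words, comment and flagged are hypothetical, chosen only for this example:

```python
# Hypothetical data: words to flag, with inconsistent casing and a duplicate
raw_blocked = ["Spam", "scam", "SPAM", "phishing"]

# Build the lookup set once; duplicates collapse after normalisation
blocked_words = {word.lower() for word in raw_blocked}

comment = "This offer is not a scam honestly"

# Each `in` check against a set is a hash lookup rather than a scan of every element
flagged = [word for word in comment.lower().split() if word in blocked_words]

print(len(blocked_words))  # 3
print(flagged)             # ['scam']
```

The result is kept as a list here because the order of the flagged words in the comment is worth preserving; only the lookup table needs to be a set.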
Nested Set Comprehensions
You can loop over nested iterables in a set comprehension:
    # Unique characters from all words
    words = ["hello", "world", "python"]
    all_chars = {char for word in words for char in word}
    print(all_chars)
    # {'h', 'e', 'l', 'o', 'w', 'r', 'd', 'p', 'y', 't', 'n'}

    # Unique pairs from two ranges
    pairs = {(x, y) for x in range(3) for y in range(3) if x != y}
    print(pairs)
    # {(0, 1), (0, 2), (1, 0), (1, 2), (2, 0), (2, 1)}

The if x != y filter excludes pairs whose two elements are equal, such as (1, 1). The set guarantees there are no duplicate tuples, even if the loops were to generate the same pair twice.
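The ranges above never actually produce the same tuple twice, so the deduplication is not visible there. As a sketch of a case where it is, the variation below sorts each pair before adding it, so that (0, 1) and (1, 0) collapse into a single element. The name unordered_pairs is illustrative:

```python
# Sorting each pair makes (1, 0) and (0, 1) the same tuple
unordered_pairs = {
    tuple(sorted((x, y)))
    for x in range(3)
    for y in range(3)
    if x != y
}

print(sorted(unordered_pairs))  # [(0, 1), (0, 2), (1, 2)]
print(len(unordered_pairs))     # 3, although the loops generated 6 pairs
```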
Practical Use Cases
Finding shared items between collections
    user_a_favourites = ["Python", "Go", "Rust", "TypeScript"]
    user_b_favourites = ["JavaScript", "Python", "TypeScript", "Kotlin"]

    # Unique favourites of each user
    set_a = {lang.lower() for lang in user_a_favourites}
    set_b = {lang.lower() for lang in user_b_favourites}
    shared = set_a & set_b
    print(shared)  # {'python', 'typescript'}

Extracting unique values from structured data
    orders = [
        {"id": 1, "status": "shipped", "country": "US"},
        {"id": 2, "status": "pending", "country": "UK"},
        {"id": 3, "status": "shipped", "country": "US"},
        {"id": 4, "status": "delivered", "country": "CA"},
    ]

    active_countries = {order["country"] for order in orders if order["status"] != "delivered"}
    print(active_countries)  # {'US', 'UK'}

Checking for required items
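    required_columns = {"name", "email", "age", "role"}
    csv_headers = ["Name", "Email", "Age"]  # headers as they might appear in a file

    # Normalise the headers while building the set, then take the difference
    missing = required_columns - {col.strip().lower() for col in csv_headers}
    print(missing)  # {'role'}
    if missing:
        raise ValueError(f"Missing required columns: {missing}")

Common Mistakes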
Confusing set comprehensions with dict comprehensions. {x: x for x in range(3)} is a dict comprehension. {x for x in range(3)} is a set comprehension. The colon makes the difference.
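A quick way to confirm which one you have written is to check the type. The sketch below also shows a related trap: empty braces create a dict, so an empty set has to be written as set(). The variable names are illustrative:

```python
as_dict = {x: x for x in range(3)}  # colon present: dict comprehension
as_set = {x for x in range(3)}      # no colon: set comprehension

print(type(as_dict).__name__)  # dict
print(type(as_set).__name__)   # set

# Empty braces are a dict literal, not an empty set
empty_braces = {}
empty_set = set()
print(type(empty_braces).__name__)  # dict
print(type(empty_set).__name__)     # set
```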
Expecting a specific order. Sets are unordered. If you need unique values in a predictable order, convert to a sorted list: sorted({...}).
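Two common ways to get a predictable order, sketched with sample data: sorted() when any consistent ordering will do, and dict.fromkeys() when the order of first appearance matters (dicts preserve insertion order in Python 3.7 and later). The variable names are illustrative:

```python
tags = ["python", "go", "python", "rust", "go"]  # sample data

# Sorted order: deduplicate with a set comprehension, then sort
alphabetical = sorted({tag for tag in tags})
print(alphabetical)  # ['go', 'python', 'rust']

# First-seen order: dict keys are unique and keep insertion order
first_seen = list(dict.fromkeys(tags))
print(first_seen)  # ['python', 'go', 'rust']
```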
Using mutable elements. Elements must be hashable. You can’t have a set of lists:
    # bad = {[1, 2], [3, 4]}  # raises TypeError: unhashable type: 'list'
    good = {(1, 2), (3, 4)}   # tuples are hashable
    also_good = {tuple(pair) for pair in [[1, 2], [3, 4]]}  # convert lists to tuples first

When a Set Comprehension Is the Right Choice
Reach for a set comprehension when you’re building a collection and:
- Duplicates don’t matter (or you actively want them removed)
- You’ll be testing membership frequently
- You want to combine it with set operations (union, intersection, difference)
When order matters, or when you’re aggregating values (summing, grouping), a list comprehension or loop is more appropriate.
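The aggregation caveat is easy to see with a sum: a set silently drops repeated values, which changes the total. A small sketch with made-up amounts:

```python
amounts = [10, 20, 10, 5]  # made-up data; 10 legitimately occurs twice

total_from_list = sum([amount for amount in amounts])
total_from_set = sum({amount for amount in amounts})

print(total_from_list)  # 45
print(total_from_set)   # 35, because the second 10 was discarded
```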