Python Set Comprehensions: Creating Unique Collections the Concise Way

Set comprehensions give you the conciseness of list comprehensions combined with the uniqueness guarantee of sets. They’re the right tool when you want to build a collection of unique values from an iterable — and you want to do it in one line.

Syntax

{expression for item in iterable}

Note the curly braces. This looks like a dict comprehension until you notice there’s no colon — one expression instead of key: value.
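One way to read the syntax is as shorthand for a loop that adds each result to an empty set. The sketch below is only illustrative; the names items, squares_loop, and squares_comp are made up for this example.

```python
# Hypothetical sample data, chosen only for illustration
items = [1, 2, 2, 3, 3, 3]

# Explicit loop: start with an empty set and add each result
squares_loop = set()
for item in items:
    squares_loop.add(item * item)

# The same result written as a set comprehension
squares_comp = {item * item for item in items}

print(squares_loop == squares_comp)   # True
print(squares_comp == {1, 4, 9})      # True
```

Both versions evaluate the expression once per item; the set discards any result it already holds.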

Basic Examples

# Get unique lengths of words
words = ["apple", "banana", "cherry", "apricot", "blueberry"]
unique_lengths = {len(word) for word in words}
print(unique_lengths)   # {5, 6, 7, 9} (display order may vary); 'banana' and 'cherry' share length 6

# Unique first characters
first_chars = {word[0] for word in words}
print(first_chars)   # {'a', 'b', 'c'}

# Unique absolute values
numbers = [-3, -1, 0, 1, 2, -2, 3]
absolute_unique = {abs(n) for n in numbers}
print(absolute_unique)  # {0, 1, 2, 3}

Notice the last example: both -3 and 3 produce 3 via abs(), so the set contains only one 3. The deduplication happens automatically.

With Filter Conditions

Add if to include only elements that pass a test:

# Unique even numbers
data = [1, 2, 3, 2, 4, 1, 5, 4, 6]
unique_evens = {n for n in data if n % 2 == 0}
print(unique_evens)   # {2, 4, 6}

# Unique words longer than 4 characters
text = "the quick brown fox jumps over the lazy dog"
long_words = {word for word in text.split() if len(word) > 4}
print(long_words)   # {'quick', 'brown', 'jumps'}
# 'the', 'fox', 'over', 'lazy', and 'dog' have 4 or fewer characters, so they are filtered out

Expressions in the Comprehension

The expression part can be any valid Python expression, not just the item itself:

# Normalise strings as you collect them
tags = ["Python", "python", "PYTHON", "javascript", "JavaScript"]
normalised_tags = {tag.lower() for tag in tags}
print(normalised_tags)  # {'python', 'javascript'}

# Extract domains from email addresses
emails = [
    "alice@company.com",
    "bob@company.com",
    "charlie@university.edu",
    "diana@university.edu",
]
domains = {email.split("@")[1] for email in emails}
print(domains)   # {'company.com', 'university.edu'}

The domain extraction example shows why set comprehensions are often better than list comprehensions for this kind of work: a list would give you ["company.com", "company.com", "university.edu", "university.edu"], but you likely want the unique set.

Compared to List Comprehensions

The difference between {expr for x in iterable} and [expr for x in iterable] is:

List: ordered, allows duplicates, supports indexing
Set: unordered, unique values only, supports fast membership testing

source = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]

as_list = [x for x in source]    # [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
as_set  = {x for x in source}    # {1, 2, 3, 4, 5, 6, 9}

print(len(as_list))   # 10
print(len(as_set))    # 7 (duplicates removed)

Choosing between them: if you need to keep duplicates or maintain order, use a list comprehension. If you need unique values and fast lookup, use a set comprehension.
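As a small sketch of the "fast lookup" case, the example below builds a set once and then tests membership against it. The data and the names blocked_raw, blocked, signups, and allowed are hypothetical.

```python
# Hypothetical data: raw entries contain duplicates and inconsistent casing
blocked_raw = ["Spam_Bot", "spam_bot", "Troll42", "TROLL42"]
signups = ["alice", "spam_bot", "bob", "troll42", "carol"]

# Build the lookup set once, normalising case as we go
blocked = {name.lower() for name in blocked_raw}

# Each `in` check against a set is a hash lookup (constant time on average),
# whereas the same check against a list scans the elements one by one
allowed = [name for name in signups if name not in blocked]

print(len(blocked))   # 2
print(allowed)        # ['alice', 'bob', 'carol']
```

The result is kept as a list here because the order of signups is worth preserving; only the lookup structure needs to be a set.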

Nested Set Comprehensions

You can chain several for clauses in one set comprehension. They run left to right, exactly like nested for loops:

# Unique characters from all words
words = ["hello", "world", "python"]
all_chars = {char for word in words for char in word}
print(all_chars)
# {'h', 'e', 'l', 'o', 'w', 'r', 'd', 'p', 'y', 't', 'n'}

# Unique pairs from two ranges
pairs = {(x, y) for x in range(3) for y in range(3) if x != y}
print(pairs)
# {(0, 1), (0, 2), (1, 0), (1, 2), (2, 0), (2, 1)}

The if x != y filter excludes pairs whose two elements are equal, such as (1, 1). Because the result is a set, a tuple that was produced more than once would still be stored only once.
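In the pairs example above, each tuple is generated exactly once, so the deduplication isn’t visible. A minimal sketch where it does matter: if the order inside a pair is irrelevant, normalising each tuple makes (0, 1) and (1, 0) collapse into a single element. The name unordered_pairs is introduced here for illustration.

```python
# Sort each pair so that (1, 0) and (0, 1) become the same tuple
unordered_pairs = {
    tuple(sorted((x, y)))
    for x in range(3)
    for y in range(3)
    if x != y
}

print(len(unordered_pairs))      # 3
print(sorted(unordered_pairs))   # [(0, 1), (0, 2), (1, 2)]
```

Six tuples are generated, but only three distinct ones remain in the set.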

Practical Use Cases

Finding shared items between collections

user_a_favourites = ["Python", "Go", "Rust", "TypeScript"]
user_b_favourites = ["JavaScript", "Python", "TypeScript", "Kotlin"]

# Normalise case and deduplicate each user's favourites
set_a = {lang.lower() for lang in user_a_favourites}
set_b = {lang.lower() for lang in user_b_favourites}
shared = set_a & set_b
print(shared)   # {'python', 'typescript'}

Extracting unique values from structured data

orders = [
    {"id": 1, "status": "shipped", "country": "US"},
    {"id": 2, "status": "pending", "country": "UK"},
    {"id": 3, "status": "shipped", "country": "US"},
    {"id": 4, "status": "delivered", "country": "CA"},
]

active_countries = {order["country"] for order in orders if order["status"] != "delivered"}
print(active_countries)   # {'US', 'UK'}

Checking for required items

required_columns = {"name", "email", "age", "role"}
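csv_headers = ["Name", " email ", "AGE"]   # raw headers, e.g. as read from a file

# Normalise each header while building the set, then subtract
missing = required_columns - {col.strip().lower() for col in csv_headers}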
print(missing)   # {'role'}
if missing:
    raise ValueError(f"Missing required columns: {missing}")

Common Mistakes

Confusing set comprehensions with dict comprehensions. {x: x for x in range(3)} is a dict comprehension. {x for x in range(3)} is a set comprehension. The colon makes the difference.
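A quick way to check which one you wrote is to inspect the type of the result. The sketch below also shows a related gotcha: empty braces create a dict, not a set. The names as_dict and as_set are illustrative.

```python
as_dict = {x: x for x in range(3)}   # colon present: dict comprehension
as_set = {x for x in range(3)}       # no colon: set comprehension

print(type(as_dict).__name__)   # dict
print(type(as_set).__name__)    # set

# Empty braces are a dict literal; use set() for an empty set
print(type({}).__name__)        # dict
print(type(set()).__name__)     # set
```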

Expecting a specific order. Sets are unordered. If you need unique values in a predictable order, convert to a sorted list: sorted({...}).
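Two common ways to get unique values in a predictable order are sketched below: sorting the set, or, if first-seen order is what you want, using dict.fromkeys instead of a set. The variable names are illustrative.

```python
source = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]

# Option 1: unique values in sorted order
unique_sorted = sorted({x for x in source})
print(unique_sorted)        # [1, 2, 3, 4, 5, 6, 9]

# Option 2: unique values in first-seen order.
# dict keys are unique and keep insertion order (guaranteed since Python 3.7)
unique_first_seen = list(dict.fromkeys(source))
print(unique_first_seen)    # [3, 1, 4, 5, 9, 2, 6]
```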

Using mutable elements. Elements must be hashable. You can’t have a set of lists:

bad = {[1, 2], [3, 4]}      # TypeError: unhashable type: 'list'
good = {(1, 2), (3, 4)}     # tuples are hashable
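When the source data arrives as lists or sets, one option is to convert each element to a hashable equivalent inside the comprehension. This is a minimal sketch with hypothetical data (rows, groups): tuple stands in for list, and frozenset for set.

```python
# Hypothetical rows; lists are not hashable, so convert each one to a tuple
rows = [[1, 2], [3, 4], [1, 2]]
unique_rows = {tuple(row) for row in rows}
print(sorted(unique_rows))   # [(1, 2), (3, 4)]

# For a set of sets, use frozenset, the immutable (hashable) set type
groups = [{"a", "b"}, {"b", "a"}, {"c"}]
unique_groups = {frozenset(group) for group in groups}
print(len(unique_groups))    # 2
```

Note that a tuple is only hashable if everything inside it is hashable too, so a tuple containing a list would still raise TypeError.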

When a Set Comprehension Is the Right Choice

Reach for a set comprehension when you’re building a collection and:

Duplicates don’t matter (or you actively want them removed)
You’ll be testing membership frequently
You want to combine it with set operations (union, intersection, difference)

When order matters, or when you’re aggregating values (summing, grouping), a list comprehension or loop is more appropriate.
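To make the aggregation caveat concrete, here is a small sketch with made-up data (totals, orders). Summing through a set silently drops repeated values, and grouping needs one collection per key, which a single set comprehension can’t express.

```python
from collections import defaultdict

# Summing: a set drops repeated values, so the total comes out wrong
totals = [10, 10, 5]                    # hypothetical order totals
wrong_sum = sum({t for t in totals})    # 15, one of the 10s is lost
right_sum = sum(t for t in totals)      # 25
print(wrong_sum, right_sum)             # 15 25

# Grouping: a plain loop that builds one set of countries per status
orders = [
    ("shipped", "US"),
    ("pending", "UK"),
    ("shipped", "CA"),
    ("shipped", "US"),
]
countries_by_status = defaultdict(set)
for status, country in orders:
    countries_by_status[status].add(country)

print(sorted(countries_by_status["shipped"]))   # ['CA', 'US']
print(sorted(countries_by_status["pending"]))   # ['UK']
```

Sets still play a role in the grouping case, but as the values of a dict built by a loop rather than as the result of one comprehension.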

Written by NPBlue Engineering Team, Software & Data Engineers who ship production Python across data, backend, and ML systems.

Reviewed for technical accuracy.