Sets - Python Programming - Fundamentals

A set is an unordered collection of unique items. Two key properties:

No duplicates. Adding the same item twice has no effect.
No order. Sets don’t keep items in any particular sequence.

Use a set when you care about membership (“is this in the collection?”) or want to drop duplicates.

Creating a set

tags: set[str] = {"python", "ml", "ai"}
numbers: set[int] = {1, 2, 3, 4, 5}

# from another collection — duplicates are dropped
unique: set[int] = set([1, 2, 2, 3, 3, 3])
print(unique)   # {1, 2, 3}

A set:

Uses curly braces { }.
Separates items with commas.
Cannot contain duplicates.
Requires hashable items (numbers, strings, tuples of hashable items; not lists or dicts).
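
To see the hashability rule in practice, here is a minimal sketch. The names points and error_message are illustrative only.

```python
# Tuples of numbers are hashable, so they can be set items.
points: set[tuple[int, int]] = {(0, 0), (1, 2)}
points.add((1, 2))          # duplicate tuple, no change
print(len(points))          # 2

# A list is mutable and therefore unhashable, so building this set fails.
try:
    bad = {[1, 2], [3, 4]}
    error_message = ""
except TypeError as exc:
    error_message = str(exc)
print(error_message)        # unhashable type: 'list'
```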

The empty set is set(), not {}. The {} syntax creates an empty dictionary (next lesson).
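
A quick check makes the difference visible. This is a small sketch with illustrative variable names.

```python
empty_set: set[str] = set()
empty_dict: dict[str, int] = {}

print(type(empty_set).__name__)    # set
print(type(empty_dict).__name__)   # dict
print(len(empty_set))              # 0
```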

Adding and removing

tags: set[str] = {"python", "ml"}

tags.add("ai")
print(tags)            # {'python', 'ml', 'ai'}  (order may vary)

tags.add("python")     # already there — no change
print(tags)            # {'python', 'ml', 'ai'}

tags.remove("ml")      # raises KeyError if not found
tags.discard("nope")   # safe — does nothing if not found

popped = tags.pop()    # removes an arbitrary item
print(popped, tags)
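
The sketch below shows how the three removal methods behave when the item is missing or the set is empty. The flag names remove_failed and pop_failed are illustrative.

```python
tags: set[str] = {"python"}

tags.discard("rust")        # not present: silently ignored

try:
    tags.remove("rust")     # not present: raises KeyError
    remove_failed = False
except KeyError:
    remove_failed = True
print(remove_failed)        # True

last = tags.pop()           # only one item left, so it must be "python"
print(last, tags)           # python set()

try:
    tags.pop()              # popping from an empty set also raises KeyError
    pop_failed = False
except KeyError:
    pop_failed = True
print(pop_failed)           # True
```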

Sets are unordered

You can’t index a set:

tags[0]   # TypeError: 'set' object is not subscriptable

You also can’t slice. If you need order, use a list instead.
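
One way to get an ordered view, sketched here under the assumption that alphabetical order is acceptable, is to build a sorted list from the set and index or slice that instead:

```python
tags: set[str] = {"python", "ml", "ai"}

ordered: list[str] = sorted(tags)   # a new list, sorted alphabetically
print(ordered[0])                   # ai
print(ordered[:2])                  # ['ai', 'ml']
```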

Membership — what sets are made for

The whole point of a set is fast “is this in here?” checks:

tags: set[str] = {"python", "ml", "ai"}

print("python" in tags)    # True
print("rust" in tags)      # False

For a list of N items, in has to scan up to all N. For a set, in takes roughly the same time on average no matter how big the set is, because sets are built on hash tables. The difference matters at scale: checking membership against a set of one million items is roughly as fast as checking against a set of ten.
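
A rough way to see this is to time both lookups with the standard timeit module. This is a minimal sketch: the collection size and repeat count are arbitrary choices, and the exact timings depend on your machine, so no expected numbers are shown.

```python
import timeit

n = 100_000
as_list: list[int] = list(range(n))
as_set: set[int] = set(as_list)
missing = -1   # not in either collection, so the list must check every item

# Run each lookup 100 times and report the total time in seconds.
list_time = timeit.timeit(lambda: missing in as_list, number=100)
set_time = timeit.timeit(lambda: missing in as_set, number=100)

print(f"list: {list_time:.5f}s  set: {set_time:.5f}s")
```

On a typical machine the set lookup should come out far faster, and the gap should widen as n grows.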

Removing duplicates

The classic use case:

words: list[str] = ["python", "ml", "python", "ai", "ml", "ai"]
unique: list[str] = list(set(words))
print(unique)   # ['python', 'ml', 'ai']  (order may vary)

Two notes:

A set does not preserve the original order.
If you need to preserve order, use list(dict.fromkeys(words)), a Python idiom that relies on dictionaries keeping insertion order (guaranteed since Python 3.7).
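
The order-preserving idiom can be sketched as follows, next to an equivalent loop that uses a set to track what has already been seen. The names unique_in_order, seen and result are illustrative.

```python
words: list[str] = ["python", "ml", "python", "ai", "ml", "ai"]

# Dictionary keys are unique and keep insertion order.
unique_in_order: list[str] = list(dict.fromkeys(words))
print(unique_in_order)   # ['python', 'ml', 'ai']

# The same result with an explicit loop and a "seen" set.
seen: set[str] = set()
result: list[str] = []
for word in words:
    if word not in seen:   # fast membership check
        seen.add(word)
        result.append(word)
print(result)            # ['python', 'ml', 'ai']
```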

Set operations

Sets support the standard mathematical set operations:

a: set[int] = {1, 2, 3, 4}
b: set[int] = {3, 4, 5, 6}

print(a | b)    # union          {1, 2, 3, 4, 5, 6}
print(a & b)    # intersection   {3, 4}
print(a - b)    # difference     {1, 2}
print(a ^ b)    # symmetric difference {1, 2, 5, 6}

The same operations as methods:

print(a.union(b))                # same as a | b
print(a.intersection(b))         # same as a & b
print(a.difference(b))           # same as a - b
print(a.symmetric_difference(b)) # same as a ^ b

The operator form is shorter and reads almost like maths.
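
One practical difference between the two forms: the methods accept any iterable (a list, for example), while the operators require both sides to be sets. A small sketch, with illustrative names:

```python
a: set[int] = {1, 2, 3, 4}

# The method form accepts a list directly.
merged = a.union([4, 5, 6])
print(sorted(merged))        # [1, 2, 3, 4, 5, 6]

# The operator form rejects a list.
try:
    a | [4, 5, 6]
    operator_failed = False
except TypeError:
    operator_failed = True
print(operator_failed)       # True
```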

Subset and superset

small: set[int] = {1, 2}
big: set[int] = {1, 2, 3, 4}

print(small <= big)        # True — small is a subset of big
print(big >= small)        # True — big is a superset of small
print(small < big)         # True — strict subset (small != big)
print(small.isdisjoint({99, 100}))   # True — no items in common
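
These comparisons also exist as methods. The sketch below applies them to a hypothetical permissions check; the sets required and granted are made up for illustration.

```python
required: set[str] = {"read", "write"}
granted: set[str] = {"read", "write", "delete"}

print(required <= granted)             # True
print(required.issubset(granted))      # True, same as <=
print(granted.issuperset(required))    # True, same as >=

# A set is a subset of itself, but not a strict subset.
print(required <= required)            # True
print(required < required)             # False
```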

Looping over a set

for tag in {"python", "ml", "ai"}:
    print(tag)

The order isn’t guaranteed. If you need consistent order, sort first:

for tag in sorted({"python", "ml", "ai"}):
    print(tag)

Frozen sets

If you need an immutable set (one that can’t be changed), use frozenset:

permissions: frozenset[str] = frozenset({"read", "write"})
permissions.add("delete")    # AttributeError

You’ll rarely need this in beginner code. It’s useful when you want to use a set as a key in a dictionary or as an item inside another set (regular sets can’t be used this way because they’re mutable and therefore unhashable).
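
As a sketch of the dictionary-key use case, the hypothetical table below maps a pair of tags to a count. The tag pairs and numbers are made up for illustration.

```python
# Keys are frozensets, so the order of the two tags does not matter.
pair_counts: dict[frozenset[str], int] = {
    frozenset({"python", "ml"}): 12,
    frozenset({"python", "ai"}): 7,
}
print(pair_counts[frozenset({"ml", "python"})])   # 12

# A regular set cannot be a key.
try:
    bad = {{"python", "ml"}: 12}
    set_key_error = ""
except TypeError as exc:
    set_key_error = str(exc)
print(set_key_error)   # unhashable type: 'set'
```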

A practical example — common tags

Find which tags appear in two articles:

article_a: set[str] = {"python", "ai", "tutorial", "beginner"}
article_b: set[str] = {"python", "ml", "advanced", "tutorial"}

common: set[str] = article_a & article_b
print(common)   # {'python', 'tutorial'}

unique_to_a: set[str] = article_a - article_b
print(unique_to_a)   # {'ai', 'beginner'}

This is exactly the kind of work sets are made for.

What’s next

Sets answer “is it in here?” and “what’s in both?” Next, the most important data structure in Python (and in data science): dictionaries.