A generator expression looks just like a list comprehension, with one tiny syntax change — round brackets instead of square. The behaviour is different: instead of building a whole list, it produces values one at a time.
The syntax
# list comprehension — builds a list in memory
squares_list: list[int] = [n * n for n in range(1, 6)]
print(squares_list) # [1, 4, 9, 16, 25]
# generator expression — produces values lazily
squares_gen = (n * n for n in range(1, 6))
print(squares_gen) # <generator object ...>
for s in squares_gen:
    print(s)
1
4
9
16
25
The list version exists fully in memory the moment you create it. The generator only produces values when something asks for them: a for loop, next(), sum(), list().
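To see the laziness directly, you can pull values out by hand with next(); each call computes exactly one value:
gen = (n * n for n in range(1, 4))
print(next(gen))  # 1, computed only when asked for
print(next(gen))  # 4
print(next(gen))  # 9
# a fourth next(gen) would raise StopIteration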
Why use generators?
Two reasons:
- Memory. A generator can iterate over millions of items without ever holding them all at once.
- Composability. You can chain generator expressions together to build a pipeline.
# sum of squares — never builds a list
total: int = sum(n * n for n in range(1, 1_000_000))
print(total)
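You can get a rough feel for the memory difference with sys.getsizeof. The exact numbers vary by Python version, so treat this as an illustration rather than a benchmark:
import sys

big_list = [n * n for n in range(1_000_000)]
big_gen = (n * n for n in range(1_000_000))

print(sys.getsizeof(big_list))  # millions of bytes, and that's only the list's pointer array
print(sys.getsizeof(big_gen))   # around a hundred bytes, regardless of the range size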
When you pass a generator expression as the only argument to a function, you can drop the expression's own parentheses; the call's parentheses are enough:
total = sum(n * n for n in range(1, 1_000_000)) # OK
total = sum((n * n for n in range(1, 1_000_000))) # also OK
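The shortcut only applies when the generator expression is the sole argument. With a second argument in the call, the parentheses are required, otherwise Python raises a SyntaxError:
values = [3, 1, 2]
# sorted(v * v for v in values, reverse=True)        # SyntaxError
ordered = sorted((v * v for v in values), reverse=True)
print(ordered)  # [9, 4, 1]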
Generators are one-shot
Like any iterator, a generator can only be consumed once. After that, it's empty:
gen = (n * n for n in range(1, 4))
print(list(gen)) # [1, 4, 9]
print(list(gen)) # [] — already exhausted
If you need to iterate twice, either build a list or recreate the generator.
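One simple pattern for a fresh pass is to wrap the expression in a small helper function (the name fresh_squares below is just for illustration) and call it each time you need a new generator:
def fresh_squares():
    # a new generator is created on every call
    return (n * n for n in range(1, 4))

print(list(fresh_squares()))  # [1, 4, 9]
print(list(fresh_squares()))  # [1, 4, 9] again, because each call starts fresh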
Pipelines
Generators shine when you stack them:
import sys
# imagine these are coming from a file or network
lines = (line.strip() for line in sys.stdin)
non_empty = (line for line in lines if line)
numbers = (int(line) for line in non_empty if line.isdigit())
total = sum(numbers)
Each layer is a filter or transform. No intermediate list is ever built. Values flow through one at a time.
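If you want to run the pipeline without piping anything into stdin, a small hard-coded list of strings can stand in for the input; the values here are made up purely for illustration:
raw = ["10", "", "abc", "7", " 3 "]   # stand-in for lines read from a file or socket
lines = (line.strip() for line in raw)
non_empty = (line for line in lines if line)
numbers = (int(line) for line in non_empty if line.isdigit())
print(sum(numbers))  # 20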
This is exactly how data pipelines in ML and scientific computing work — read a chunk, filter it, transform it, feed it to the model. NumPy and PyTorch take this further with vectorisation (Section 16), but the generator idea is the same.
Quick comparison
# comprehension — eager, list built in memory
adults = [u for u in users if u.age >= 18]
# generator expression — lazy, no list built
adult_iter = (u for u in users if u.age >= 18)
# both work in a for loop
for u in adults: ...
for u in adult_iter: ...
# but you can iterate adults again
# you cannot iterate adult_iter again
# adults uses memory; adult_iter doesn't
Use a list when
- You need to iterate multiple times
- You need to look up by index
- You need len()
- The data is small and you want simple, debuggable values
Use a generator when
- The data is huge or infinite (see the sketch after this list)
- You only walk it once
- You’re feeding a chain of transformations
- You want to start producing results before all input is ready
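The huge-or-infinite case is easy to sketch with itertools.count, which counts upward forever; islice then takes just the first few values from the stream:
import itertools

squares = (n * n for n in itertools.count(1))    # conceptually infinite stream
first_five = list(itertools.islice(squares, 5))  # pull only what you need
print(first_five)  # [1, 4, 9, 16, 25]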
A practical example — find the first match
next() with a generator is the cleanest way to find the first matching item:
people = [
    {"name": "alice", "active": False},
    {"name": "bob", "active": True},
    {"name": "carol", "active": True},
]
first_active = next((p for p in people if p["active"]), None)
print(first_active) # {'name': 'bob', 'active': True}
The generator doesn’t search the whole list — it stops the moment next() gets one match. For long collections, this is a real speed win.
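The second argument to next() is the default returned when nothing matches; without it, an empty search raises StopIteration. The name "dave" below is just a made-up non-match to show the default kicking in:
missing = next((p for p in people if p["name"] == "dave"), None)
print(missing)  # None, because nobody matched

# without the default, the same search raises StopIteration:
# next(p for p in people if p["name"] == "dave")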
What’s next
A generator expression is great for one-line cases. For anything longer, Python has the yield keyword, which lets you write a full generator function.