A generator expression looks just like a list comprehension, with one tiny syntax change: parentheses instead of square brackets. The behaviour, however, is very different: instead of building a whole list up front, it produces values one at a time, on demand.

The syntax

# list comprehension — builds a list in memory
squares_list: list[int] = [n * n for n in range(1, 6)]
print(squares_list)        # [1, 4, 9, 16, 25]

# generator expression — produces values lazily
squares_gen = (n * n for n in range(1, 6))
print(squares_gen)         # <generator object ...>

for s in squares_gen:
    print(s)               # prints 1, 4, 9, 16, 25, one value per line

The list version exists fully in memory the moment you create it. The generator only produces values when something asks for them: a for loop, next(), sum(), list().
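You can see the on-demand behaviour directly by pulling values out one at a time with next(). A minimal sketch:

```python
# Each next() call computes exactly one value on demand.
squares = (n * n for n in range(1, 6))

print(next(squares))   # 1
print(next(squares))   # 4

# The remaining values are still pending; sum() consumes them.
print(sum(squares))    # 9 + 16 + 25 = 50
```

Nothing is computed until a consumer asks; the generator simply remembers where it left off.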

Why use generators?

Two reasons:

  1. Memory. A generator can iterate over millions of items without ever holding them all at once.
  2. Composability. You can chain generator expressions together to build a pipeline.
# sum of squares — never builds a list
total: int = sum(n * n for n in range(1, 1_000_000))
print(total)
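The memory difference is easy to measure with sys.getsizeof. Exact byte counts vary by Python version and platform, but the shape of the result is always the same: the list grows with the data, the generator object does not.

```python
import sys

big_list = [n * n for n in range(1_000_000)]
big_gen = (n * n for n in range(1_000_000))

print(sys.getsizeof(big_list))  # several megabytes
print(sys.getsizeof(big_gen))   # a couple of hundred bytes, regardless of range size
```

Note that getsizeof reports only the container itself, not the objects inside it, so the true gap is even larger than these numbers suggest.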

When a generator expression is the sole argument to a function call, you can drop its own parentheses; the call's parentheses are enough:

total = sum(n * n for n in range(1, 1_000_000))    # OK — call parens suffice
total = sum((n * n for n in range(1, 1_000_000)))  # also OK — explicit parens
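The shorthand only applies when the generator expression is the sole argument. As soon as the call takes a second argument, the generator's own parentheses become mandatory:

```python
# sum() accepts a start value as a second argument; with it present,
# the generator expression must keep its own parentheses.
total = sum((n * n for n in range(1, 6)), 100)   # start the sum at 100
print(total)   # 155

# sum(n * n for n in range(1, 6), 100)  # SyntaxError without the parentheses
```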

Generators are one-shot

Like any iterator, a generator can be consumed only once. After that, it’s empty:

gen = (n * n for n in range(1, 4))

print(list(gen))   # [1, 4, 9]
print(list(gen))   # [] — already exhausted

If you need to iterate twice, either build a list or recreate the generator.
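One common way to "recreate" a generator is to wrap it in a small function, so each call hands back a fresh one. The function name here is purely illustrative:

```python
def squares():
    # returns a brand-new generator on every call
    return (n * n for n in range(1, 4))

print(list(squares()))   # [1, 4, 9]
print(list(squares()))   # [1, 4, 9], a new generator, not exhausted
```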

Pipelines

Generators shine when you stack them:

import sys

# imagine these are coming from a file or network
lines = (line.strip() for line in sys.stdin)
non_empty = (line for line in lines if line)
numbers = (int(line) for line in non_empty if line.isdigit())
total = sum(numbers)

Each layer is a filter or transform. No intermediate list is ever built. Values flow through one at a time.
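Here is the same pipeline fed from an in-memory list instead of sys.stdin, so it runs anywhere; raw_lines simply stands in for a file or network stream:

```python
raw_lines = ["  10 ", "", "abc", " 7", "3\n"]

lines = (line.strip() for line in raw_lines)            # transform: trim whitespace
non_empty = (line for line in lines if line)            # filter: drop blank lines
numbers = (int(line) for line in non_empty if line.isdigit())  # filter + transform

print(sum(numbers))   # 10 + 7 + 3 = 20
```

Nothing happens when the three generator lines run; all the work is deferred until sum() starts pulling values through the chain.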

This is exactly how data pipelines in ML and scientific computing work — read a chunk, filter it, transform it, feed it to the model. NumPy and PyTorch take this further with vectorisation (Section 16), but the generator idea is the same.

Quick comparison

# comprehension — eager, list built in memory
adults = [u for u in users if u.age >= 18]

# generator expression — lazy, no list built
adult_iter = (u for u in users if u.age >= 18)

# both work in a for loop
for u in adults: ...
for u in adult_iter: ...

# but you can iterate adults again
# you cannot iterate adult_iter again

# adults uses memory; adult_iter doesn't
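To make the comparison concrete, here is a runnable sketch; the User dataclass and the sample data are assumptions for illustration, since the text's users collection isn't defined:

```python
from dataclasses import dataclass

@dataclass
class User:
    name: str
    age: int

users = [User("ana", 17), User("ben", 22), User("cleo", 30)]

adults = [u for u in users if u.age >= 18]
adult_iter = (u for u in users if u.age >= 18)

print(len(adults))                    # 2, because lists support len()
print([u.name for u in adult_iter])   # ['ben', 'cleo']
print([u.name for u in adult_iter])   # [], the generator is exhausted
```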

Use a list when

  • You need to iterate multiple times
  • You need to look up by index
  • You need len()
  • The data is small and you want simple, debuggable values

Use a generator when

  • The data is huge or infinite
  • You only walk it once
  • You’re feeding a chain of transformations
  • You want to start producing results before all input is ready
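The "huge or infinite" case deserves a quick sketch. A generator expression over itertools.count never ends, which would be impossible as a list; itertools.islice takes just the slice you need:

```python
import itertools

# An unbounded stream of squares: could never exist as a list.
squares = (n * n for n in itertools.count(1))

# Take only the first five values.
first_five = list(itertools.islice(squares, 5))
print(first_five)   # [1, 4, 9, 16, 25]
```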

A practical example — find the first match

next() with a generator is the cleanest way to find the first matching item:

people = [
    {"name": "alice", "active": False},
    {"name": "bob", "active": True},
    {"name": "carol", "active": True},
]

first_active = next((p for p in people if p["active"]), None)
print(first_active)   # {'name': 'bob', 'active': True}

The generator doesn’t search the whole list — it stops the moment next() gets one match. For long collections, this is a real speed win.

What’s next

A generator expression is great for one-line cases. For anything longer, Python has the yield keyword — the way to build a generator function.
