Vectorisation is the practice of expressing your computation as operations on whole arrays rather than per-element loops. Done well, it makes NumPy code shorter, clearer, and much faster than the equivalent Python.
This lesson is partly philosophy, partly practical patterns.
The performance difference
A motivating example. Compare adding two arrays element by element using a Python loop versus NumPy:
import numpy as np
import time
n = 1_000_000
a = list(range(n))
b = list(range(n))
# Python loop
start = time.perf_counter()
result = [a[i] + b[i] for i in range(n)]
python_time = time.perf_counter() - start
# NumPy
a_np = np.array(a)
b_np = np.array(b)
start = time.perf_counter()
result_np = a_np + b_np
numpy_time = time.perf_counter() - start
print(f"Python: {python_time:.4f}s")
print(f"NumPy: {numpy_time:.4f}s")
print(f"Speedup: {python_time / numpy_time:.1f}x")
Typical output:
Python: 0.0850s
NumPy: 0.0020s
Speedup: 42.5x
That’s roughly 40× faster for a million elements. The gap grows for more complex operations and larger arrays.
Why it’s faster
Two reasons, both touched on in lesson 2:
- NumPy operations run in compiled C, not interpreted Python.
- Arrays are stored contiguously in memory, so the CPU can read them efficiently.
Vectorising your code means staying inside NumPy for as long as possible. Each time you “drop out” to a Python loop, you pay the interpreter’s per-element overhead all over again.
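A concrete illustration of dropping out versus staying in: the same square root computed once per element through the interpreter, and once for the whole array through NumPy. (A sketch; the exact timings depend on your machine.)

```python
import math
import time

import numpy as np

xs = np.random.default_rng(0).uniform(0.0, 10.0, size=1_000_000)

# dropping out to Python: one interpreted call per element
start = time.perf_counter()
loop_result = np.array([math.sqrt(x) for x in xs])
loop_time = time.perf_counter() - start

# staying inside NumPy: one compiled call for the whole array
start = time.perf_counter()
vec_result = np.sqrt(xs)
vec_time = time.perf_counter() - start

print(f"loop: {loop_time:.4f}s, vectorised: {vec_time:.4f}s")
print(np.allclose(loop_result, vec_result))  # same numbers either way
```

The results are identical; only the route through the machine differs.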
The vectorisation mindset
When solving a problem, ask yourself: can I phrase this as operations on whole arrays? Common patterns:
Replace for loops with arithmetic
# slow — explicit loop
result = []
for x in xs:
    result.append(x * 2 + 1)

# fast — vectorised (xs is a NumPy array)
result = xs * 2 + 1
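One caveat: the vectorised form only means arithmetic when `xs` is a NumPy array. On a plain Python list, `*` and `+` mean repetition and concatenation, so the same expression does something entirely different:

```python
import numpy as np

xs_list = [1, 2, 3]
xs_arr = np.array([1, 2, 3])

print(xs_list * 2)     # [1, 2, 3, 1, 2, 3] — list repetition, not arithmetic
# xs_list * 2 + 1      # TypeError: can only concatenate list (not "int") to list
print(xs_arr * 2 + 1)  # [3 5 7] — element-wise, as intended
```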
Replace conditional loops with np.where
# slow
result = []
for x in xs:
    if x > 0:
        result.append(x)
    else:
        result.append(0)

# fast
result = np.where(xs > 0, xs, 0)
np.where(condition, value_if_true, value_if_false) does element-wise selection.
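When there are more than two branches, nested np.where calls get hard to read; np.select takes a list of conditions and a list of choices instead. A small sketch (the temperature labels are just illustrative):

```python
import numpy as np

temps = np.array([-5, 3, 12, 25, 35])

conditions = [temps < 0, temps < 10, temps < 30]
labels = ["freezing", "cold", "mild"]

# the first matching condition wins; default covers everything else
print(np.select(conditions, labels, default="hot"))
```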
Replace nested loops with broadcasting
# slow — pairwise distance between rows of two matrices
distances = []
for a in A:
    row = []
    for b in B:
        row.append(np.sqrt(((a - b) ** 2).sum()))
    distances.append(row)

# fast — using broadcasting
diff = A[:, None, :] - B[None, :, :]  # shape (n_A, n_B, n_features)
distances = np.sqrt((diff ** 2).sum(axis=2))
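Broadcasting rewrites like this are easy to get subtly wrong, so it is worth checking once against the loop version. A quick sanity check with small random matrices (the shapes here are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 3))  # 4 points in 3-D
B = rng.normal(size=(5, 3))  # 5 points in 3-D

# loop version
loop = np.array([[np.sqrt(((a - b) ** 2).sum()) for b in B] for a in A])

# broadcast version: (4, 1, 3) - (1, 5, 3) -> (4, 5, 3)
diff = A[:, None, :] - B[None, :, :]
broadcast = np.sqrt((diff ** 2).sum(axis=2))

print(broadcast.shape)               # (4, 5): one distance per (a, b) pair
print(np.allclose(loop, broadcast))  # the two versions agree
```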
Useful vectorised functions
A small sample:
import numpy as np
a = np.array([1, -2, 3, -4, 5])
np.abs(a) # [1 2 3 4 5]
np.maximum(a, 0) # [1 0 3 0 5] — element-wise max with 0
np.minimum(a, 0) # [0 -2 0 -4 0] — element-wise min with 0
np.clip(a, 0, 3) # [1 0 3 0 3] — clamp into [0, 3]
np.sign(a) # [ 1 -1 1 -1 1]
np.cumsum(a) # [ 1 -1 2 -2 3] — running sum
np.cumprod(a) # [ 1 -2 -6 24 120]
np.unique(np.array([1, 2, 2, 3, 3, 3])) # [1 2 3]
np.sort(np.array([3, 1, 2])) # [1 2 3]
np.argsort(np.array([3, 1, 2])) # [1 2 0] — indices that sort
Each of these would be a small loop in plain Python.
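These functions also compose. For example, np.maximum is a ufunc, so it has an .accumulate method that gives a running maximum; combined with plain arithmetic that is enough to sketch a “drawdown” calculation (the price data here is made up):

```python
import numpy as np

prices = np.array([100.0, 105.0, 102.0, 110.0, 95.0, 99.0])

peak = np.maximum.accumulate(prices)  # best price seen so far
drawdown = (peak - prices) / peak     # fractional drop from that peak

print(peak)            # [100. 105. 105. 110. 110. 110.]
print(drawdown.max())  # worst drawdown over the series
```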
When to drop back to a loop
Sometimes the operation genuinely needs a loop — for example, an iterative algorithm where each step depends on the previous result. NumPy can’t help much there.
# inherently sequential — a Python loop is fine
balance = 1000.0
for tx in transactions:
    # a fixed fee with a floor at zero: each step depends on the
    # previous step's clamped value, so no single array expression
    # computes it
    balance = max(balance * (1 + tx) - 2.0, 0.0)
Don’t try to vectorise this for the sake of it. Vectorise where it’s natural; loop where you must.
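It helps to know exactly where that boundary sits. Compound growth on its own is just a running product, so np.prod and np.cumprod handle it; it is a per-step condition (a clamp, a fee floor, an if) feeding back into the state that genuinely forces a loop. A sketch with made-up transaction rates:

```python
import numpy as np

transactions = np.array([0.05, -0.02, 0.03, 0.01])

# loop version of plain compounding, for comparison
balance = 1000.0
for tx in transactions:
    balance = balance * (1 + tx)

# the same final balance as one array expression
vectorised = 1000.0 * np.prod(1 + transactions)
print(np.isclose(balance, vectorised))  # they agree

# np.cumprod gives the balance after every step, not just the last
history = 1000.0 * np.cumprod(1 + transactions)
```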
A complete example — Monte Carlo estimate of π
A classic exercise: estimate π by throwing random darts at a square and counting how many land inside the unit circle.
import numpy as np
n = 1_000_000
np.random.seed(42)
x = np.random.uniform(-1, 1, size=n)
y = np.random.uniform(-1, 1, size=n)
inside = (x ** 2 + y ** 2) <= 1.0
pi_estimate = 4 * inside.sum() / n
print(f"π ≈ {pi_estimate}")
π ≈ 3.141908
A million random points, evaluated in a handful of NumPy operations. An equivalent pure-Python loop is well over an order of magnitude slower; NumPy finishes in a fraction of a second.
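If you want to watch the estimate converge, np.cumsum turns one batch of darts into an estimate at every sample count, still with no Python loop. (A sketch using the newer Generator API; the printed errors are random, so only their rough scale is meaningful.)

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000
x = rng.uniform(-1, 1, size=n)
y = rng.uniform(-1, 1, size=n)
inside = (x ** 2 + y ** 2) <= 1.0

# running estimate after 1, 2, ..., n darts
estimates = 4 * np.cumsum(inside) / np.arange(1, n + 1)

print(abs(estimates[99] - np.pi))   # error after 100 darts — typically rough
print(abs(estimates[-1] - np.pi))   # error after 100 000 — much tighter
```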
Summary of Section 16
You can now:
- Create and reshape NumPy arrays
- Perform element-wise arithmetic without loops
- Use broadcasting to combine arrays of different shapes
- Index and slice in multiple dimensions
- Vectorise loops for big speedups
NumPy is the gateway to scientific Python. Pandas builds its columns on NumPy arrays, PyTorch and TensorFlow mirror NumPy’s API on top of GPU code, and scikit-learn accepts and returns NumPy arrays throughout.
Course summary
You’ve come a long way:
- Getting Started — installed Python with uv, set up your editor
- Variables and Types — numbers, strings, booleans, conversion
- Operators — every way to combine values
- Control Flow — if, loops, match
- Functions — typed, documented, flexible
- Data Structures — lists, tuples, sets, dicts, comprehensions
- Functional Tools — map, filter, zip, sorted, any, all
- Iterators and Generators — lazy data pipelines
- Exceptions — handling failure properly
- File Handling — text, CSV, JSON, paths
- Modules and Packages — organising larger code
- OOP — classes, properties, inheritance, dataclasses
- Type System — pyright, generics, Protocols
- Standard Library — the most useful built-in modules
- Debugging and Code Quality — tracebacks, debugger, logging, Ruff, PEP 8
- NumPy Fundamentals — your first step into scientific computing
What you have now is the foundation. The next step depends on where you want to go:
- Data analysis — learn Pandas and Matplotlib.
- Machine learning — scikit-learn, then PyTorch.
- Web backends — FastAPI or Django.
- Automation — requests, playwright, scheduling tools.
Whichever direction, the Python you’ve learned here is the bedrock you’ll build on. Thanks for following along.