Vectorisation is the practice of expressing your computation as operations on whole arrays rather than per-element loops. Done well, it makes NumPy code shorter, clearer, and much faster than the equivalent Python.

This lesson is partly philosophy, partly practical patterns.

The performance difference

A motivating example. Compare adding two arrays element by element using a Python loop versus NumPy:

import numpy as np
import time

n = 1_000_000
a = list(range(n))
b = list(range(n))

# Python loop
start = time.perf_counter()
result = [a[i] + b[i] for i in range(n)]
python_time = time.perf_counter() - start

# NumPy
a_np = np.array(a)
b_np = np.array(b)
start = time.perf_counter()
result_np = a_np + b_np
numpy_time = time.perf_counter() - start

print(f"Python: {python_time:.4f}s")
print(f"NumPy:  {numpy_time:.4f}s")
print(f"Speedup: {python_time / numpy_time:.1f}x")

Typical output:

Python: 0.0850s
NumPy:  0.0020s
Speedup: 42.5x

That’s roughly 40× faster for a million elements. The gap grows for more complex operations and larger arrays.

Why it’s faster

Two reasons, both mentioned back in lesson 2:

  1. NumPy operations run in compiled C, not interpreted Python.
  2. Arrays are stored contiguously in memory, so the CPU can read them efficiently.

Vectorising your code means staying inside NumPy for as long as possible. Each time you “drop out” to a Python loop, you pay the interpreter’s per-element overhead.
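One way to see that overhead directly: sum the same array with a Python-level loop and with the array’s own sum method. A quick sketch (exact timings will vary by machine):

```python
import numpy as np
import time

a = np.arange(1_000_000)

# dropping out: a Python-level loop over the array
start = time.perf_counter()
total = 0
for x in a:          # each element is boxed into a Python object
    total += x
loop_time = time.perf_counter() - start

# staying inside NumPy: one compiled reduction
start = time.perf_counter()
total_np = a.sum()
numpy_time = time.perf_counter() - start

print(f"loop: {loop_time:.4f}s   numpy: {numpy_time:.6f}s")
```

Both compute the same total; the loop spends nearly all its time unboxing elements, not adding them.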

The vectorisation mindset

When solving a problem, ask yourself: can I phrase this as operations on whole arrays? Common patterns:

Replace for loops with arithmetic

# slow — explicit loop
result = []
for x in xs:
    result.append(x * 2 + 1)

# fast — vectorised (assuming xs is a NumPy array)
result = xs * 2 + 1

Replace conditional loops with np.where

# slow
result = []
for x in xs:
    if x > 0:
        result.append(x)
    else:
        result.append(0)

# fast
result = np.where(xs > 0, xs, 0)

np.where(condition, value_if_true, value_if_false) does element-wise selection.
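When there are more than two branches, chaining np.where calls gets awkward; np.select takes a list of conditions and a matching list of values. A small sketch with made-up data:

```python
import numpy as np

xs = np.array([-3, -1, 0, 2, 5])

# three-way selection: negative -> -1, zero -> 0, positive -> +1
labels = np.select(
    [xs < 0, xs == 0],   # conditions, checked in order
    [-1, 0],             # value used where each condition holds
    default=1,           # used where no condition matched
)
print(labels)            # [-1 -1  0  1  1]
```

The first condition that matches wins, so order the conditions from most to least specific.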

Replace nested loops with broadcasting

# slow — pairwise distance between rows of two matrices
distances = []
for a in A:
    row = []
    for b in B:
        row.append(np.sqrt(((a - b) ** 2).sum()))
    distances.append(row)

# fast — using broadcasting
diff = A[:, None, :] - B[None, :, :]   # shape (n_A, n_B, n_features)
distances = np.sqrt((diff ** 2).sum(axis=2))
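To make the broadcasting version concrete, here is a self-contained run with small random matrices (shapes chosen arbitrarily for illustration), spot-checked against the direct formula for one pair:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 3))    # 4 points in 3-D
B = rng.normal(size=(5, 3))    # 5 points in 3-D

diff = A[:, None, :] - B[None, :, :]          # shape (4, 5, 3)
distances = np.sqrt((diff ** 2).sum(axis=2))  # shape (4, 5)

# spot-check one entry against the direct formula
assert np.isclose(distances[2, 3], np.sqrt(((A[2] - B[3]) ** 2).sum()))
print(distances.shape)   # (4, 5)
```

The intermediate diff array uses n_A × n_B × n_features memory, so for very large inputs the loop (or a chunked hybrid) can actually be the better choice.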

Useful vectorised functions

A small sample:

import numpy as np

a = np.array([1, -2, 3, -4, 5])

np.abs(a)            # [1 2 3 4 5]
np.maximum(a, 0)     # [1 0 3 0 5]      — element-wise max with 0
np.minimum(a, 0)     # [0 -2 0 -4 0]    — element-wise min with 0
np.clip(a, 0, 3)     # [1 0 3 0 3]      — clamp into [0, 3]
np.sign(a)           # [ 1 -1  1 -1  1]

np.cumsum(a)         # [ 1 -1  2 -2  3]   — running sum
np.cumprod(a)        # [   1   -2   -6   24  120]

np.unique(np.array([1, 2, 2, 3, 3, 3]))   # [1 2 3]
np.sort(np.array([3, 1, 2]))               # [1 2 3]
np.argsort(np.array([3, 1, 2]))            # [1 2 0]  — indices that sort

Each of these would be a small loop in plain Python.
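A common idiom built from these: use np.argsort on one array to reorder another one the same way. A sketch with made-up data:

```python
import numpy as np

names  = np.array(["carol", "alice", "bob"])
scores = np.array([72, 91, 85])

order = np.argsort(scores)[::-1]       # indices of scores, highest first
print(names[order])                    # ['alice' 'bob' 'carol']
print(scores[order])                   # [91 85 72]
```

Indexing with an integer array like order is itself a vectorised operation — no loop needed to apply the permutation.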

When to drop back to a loop

Sometimes the operation genuinely needs a loop — for example, an iterative algorithm where each step depends on the previous result. Unless the recurrence happens to match a built-in reduction, NumPy can’t help much there.

# inherently sequential — a Python loop is fine
balance = 1000.0
for tx in transactions:
    balance = balance * (1 + tx)

Don’t try to vectorise this for the sake of it. Vectorise where it’s natural; loop where you must.
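As it happens, this particular loop is a running product, which NumPy does provide as a reduction: np.cumprod. A sketch with made-up transaction data (the rng seed and sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
transactions = rng.normal(0.001, 0.01, size=1_000)  # hypothetical returns

# loop version: final balance only
balance = 1000.0
for tx in transactions:
    balance = balance * (1 + tx)

# vectorised: the full running balance via a cumulative product
balances = 1000.0 * np.cumprod(1 + transactions)
assert np.isclose(balance, balances[-1])
```

The general point stands, though: a recurrence whose update is state-dependent (say, a branch on the current balance) has no such reduction, and a plain loop is the right tool.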

A complete example — Monte Carlo estimate of π

A classic exercise: estimate π by throwing random darts at a square and counting how many land inside the unit circle.

import numpy as np

n = 1_000_000
np.random.seed(42)

x = np.random.uniform(-1, 1, size=n)
y = np.random.uniform(-1, 1, size=n)

inside = (x ** 2 + y ** 2) <= 1.0
pi_estimate = 4 * inside.sum() / n

print(f"π ≈ {pi_estimate}")

Output:

π ≈ 3.141908

A million random points, evaluated in a handful of NumPy operations. The equivalent Python loop takes a couple of seconds; NumPy finishes in tens of milliseconds.
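The estimate’s accuracy improves roughly with the square root of the sample size, which is easy to see by rerunning at a few scales. A sketch using the modern Generator API (the seed is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(42)

for n in (1_000, 100_000, 1_000_000):
    x = rng.uniform(-1, 1, size=n)
    y = rng.uniform(-1, 1, size=n)
    inside = (x ** 2 + y ** 2) <= 1.0
    est = 4 * inside.sum() / n
    print(f"n={n:>9,}   π ≈ {est:.5f}   error ≈ {abs(est - np.pi):.5f}")
```

Going from a thousand to a million points costs a thousand times the work but only buys about one extra decimal digit — a useful reminder that vectorisation makes each sample cheap, not the statistics free.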

Summary of Section 16

You can now:

  • Create and reshape NumPy arrays
  • Perform element-wise arithmetic without loops
  • Use broadcasting to combine arrays of different shapes
  • Index and slice in multiple dimensions
  • Vectorise loops for big speedups

NumPy is the gateway to scientific Python. Pandas uses NumPy for everything. PyTorch and TensorFlow mirror NumPy’s API on top of GPU code. Even scikit-learn passes NumPy arrays around.

Course summary

You’ve come a long way:

  1. Getting Started — installed Python with uv, set up your editor
  2. Variables and Types — numbers, strings, booleans, conversion
  3. Operators — every way to combine values
  4. Control Flow — if, loops, match
  5. Functions — typed, documented, flexible
  6. Data Structures — lists, tuples, sets, dicts, comprehensions
  7. Functional Tools — map, filter, zip, sorted, any, all
  8. Iterators and Generators — lazy data pipelines
  9. Exceptions — handling failure properly
  10. File Handling — text, CSV, JSON, paths
  11. Modules and Packages — organising larger code
  12. OOP — classes, properties, inheritance, dataclasses
  13. Type System — pyright, generics, Protocols
  14. Standard Library — the most useful built-in modules
  15. Debugging and Code Quality — tracebacks, debugger, logging, Ruff, PEP 8
  16. NumPy Fundamentals — your first step into scientific computing

What you have now is the foundation. The next step depends on where you want to go:

  • Data analysis — learn Pandas and Matplotlib.
  • Machine learning — scikit-learn, then PyTorch.
  • Web backends — FastAPI or Django.
  • Automation — requests, playwright, scheduling tools.

Whichever direction, the Python you’ve learned here is the bedrock you’ll build on. Thanks for following along.
