Broadcasting is NumPy’s rule for combining arrays of different shapes. It lets you write array + scalar, matrix + row, or column + row without manually expanding either side.
It’s one of the most powerful ideas in NumPy, and it’s well worth learning carefully.
The simplest case — scalar with array
import numpy as np
a = np.array([1, 2, 3])
print(a + 10) # [11 12 13]
NumPy treats the scalar 10 as if it were an array of [10, 10, 10] and adds element by element. This is broadcasting in its simplest form.
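NumPy never actually builds that [10, 10, 10] array in memory; it reuses the single value. But as a quick sketch of the equivalence, you could materialise the expanded array by hand:

expanded = np.full(3, 10)   # [10 10 10]
print(a + expanded)         # [11 12 13], identical to a + 10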
Row with matrix
matrix = np.array([
[1, 2, 3],
[4, 5, 6],
[7, 8, 9],
])
row = np.array([10, 20, 30])
print(matrix + row)
# [[11 22 33]
# [14 25 36]
# [17 28 39]]
NumPy “stretches” the row vector down across every row of the matrix. No loops, no copies.
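If you want to see the “no copies” claim concretely, np.broadcast_to exposes the stretched view directly. A small sketch; note the zero stride, which means every row of the view reads the same underlying memory:

stretched = np.broadcast_to(row, (3, 3))   # read-only view, nothing copied
print(stretched)
# [[10 20 30]
#  [10 20 30]
#  [10 20 30]]
print(stretched.strides)   # (0, 8) with 64-bit ints: stride 0 down the rows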
Column with matrix
matrix = np.array([
[1, 2, 3],
[4, 5, 6],
[7, 8, 9],
])
column = np.array([[100], [200], [300]]) # shape (3, 1)
print(matrix + column)
# [[101 102 103]
# [204 205 206]
# [307 308 309]]
The column vector (shape (3, 1)) is stretched across the three columns of the matrix.
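You rarely need to type the nested brackets by hand. Slicing with None (a trick that comes up again below) turns a flat array into a column; a quick sketch:

flat = np.array([100, 200, 300])   # shape (3,)
column = flat[:, None]             # shape (3, 1), same column as above
print(column.shape)                # (3, 1)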
The rules of broadcasting
NumPy compares the shapes of the two arrays dimension by dimension, starting from the trailing (rightmost) dimension and working left. Two dimensions are compatible when:
- They’re equal, OR
- One of them is 1
If one array has fewer dimensions, it’s treated as if it had extra leading dimensions of size 1. If every dimension pair is compatible, broadcasting works, and each size-1 dimension is “stretched” to match the other side.
Examples:
A.shape: (3, 4)
B.shape: (4,) → B is treated as shape (1, 4), stretched to (3, 4). OK.
A.shape: (3, 4)
B.shape: (3, 1) → B is stretched to (3, 4). OK.
A.shape: (3, 4)
B.shape: (3,) → trailing dims don't match (4 vs 3). FAIL.
If shapes don’t broadcast, NumPy raises:
ValueError: operands could not be broadcast together with shapes (3,4) (3,)
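You can check compatibility without building any arrays. On NumPy 1.20 and later, np.broadcast_shapes applies exactly these rules; a quick sketch:

print(np.broadcast_shapes((3, 4), (4,)))    # (3, 4)
print(np.broadcast_shapes((3, 4), (3, 1)))  # (3, 4)
try:
    np.broadcast_shapes((3, 4), (3,))
except ValueError as err:
    print(err)   # a shape-mismatch ValueError, raised without any arrays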
Why it matters
Many ML operations are naturally expressed as broadcast operations:
# normalise each feature column to zero mean and unit standard deviation
data = np.random.rand(100, 5) # 100 samples, 5 features
normalised = (data - data.mean(axis=0)) / data.std(axis=0)
print(normalised.shape) # (100, 5)
data.mean(axis=0) has shape (5,). Broadcasting stretches it across all 100 rows. No loops, no per-feature code.
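A sanity check worth running: after normalisation, every column should have mean roughly 0 and standard deviation roughly 1, up to floating-point noise:

print(normalised.mean(axis=0).round(6))   # ~[0. 0. 0. 0. 0.]
print(normalised.std(axis=0).round(6))    # ~[1. 1. 1. 1. 1.]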
Or computing distances between every pair of points:
points = np.array([
[0, 0],
[1, 1],
[3, 4],
])
# subtract every point from every other — uses broadcasting
diff = points[:, None, :] - points[None, :, :]
# shape: (3, 3, 2)
dist = np.sqrt((diff ** 2).sum(axis=2))
print(dist.round(3))
# [[0.    1.414 5.   ]
#  [1.414 0.    3.606]
#  [5.    3.606 0.   ]]
The [:, None, :] trick adds a length-1 dimension, which broadcasting then stretches. It’s dense, but once you see this pattern, you’ll meet it constantly in NumPy and PyTorch code.
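None here is just an alias for np.newaxis, and both insert a new length-1 axis at that position. A sketch of the same computation with the explicit name, plus the shapes involved:

left = points[:, np.newaxis, :]    # shape (3, 1, 2)
right = points[np.newaxis, :, :]   # shape (1, 3, 2)
print((left - right).shape)        # (3, 3, 2): both length-1 axes stretched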
When broadcasting goes wrong
The classic surprise: silent broadcasting where you didn’t expect it.
a = np.array([[1], [2], [3]]) # shape (3, 1)
b = np.array([10, 20, 30]) # shape (3,)
print(a + b)
# [[11 21 31]
# [12 22 32]
# [13 23 33]]
You might have thought you were adding two vectors and getting [11, 22, 33]. Instead you got a 3×3 matrix. The fix: make sure shapes are what you think.
Always check array.shape when something looks wrong.
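One way to recover the vector sum you wanted, sketched below: flatten the column so both operands really are 1-D, and assert the shapes match before adding:

a_flat = a.ravel()    # shape (3, 1) becomes (3,)
assert a_flat.shape == b.shape, (a_flat.shape, b.shape)
print(a_flat + b)     # [11 22 33]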
A practical example — feature scaling
A common preprocessing step before training:
import numpy as np
# 5 samples, 3 features
X = np.array([
[10.0, 200.0, 0.1],
[12.0, 180.0, 0.3],
[11.0, 210.0, 0.2],
[15.0, 250.0, 0.4],
[13.0, 190.0, 0.5],
])
# scale each feature to [0, 1]
mins = X.min(axis=0) # shape (3,)
maxs = X.max(axis=0) # shape (3,)
scaled = (X - mins) / (maxs - mins)
print(scaled.round(3))
# [[0.    0.286 0.   ]
#  [0.4   0.    0.5  ]
#  [0.2   0.429 0.25 ]
#  [1.    1.    0.75 ]
#  [0.6   0.143 1.   ]]
Three lines that scale every feature column independently. Broadcasting does the heavy lifting.
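One caveat: if a feature is constant, maxs - mins is zero for that column, and the division produces NaNs plus a runtime warning. A hedged sketch of a guard, which maps constant columns to 0 instead:

ranges = maxs - mins
safe_ranges = np.where(ranges == 0, 1.0, ranges)   # avoid dividing by zero
scaled = (X - mins) / safe_ranges                  # constant columns become 0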
What’s next
You now understand how shapes combine. Next up: indexing and slicing NumPy arrays, which have far more tricks than plain Python lists.