A module is a file. A package is a folder of related modules — a way to group code as your project grows past a handful of files.
What makes a folder a package?
A folder becomes a package when it contains an __init__.py file:
my_project/
├── main.py
└── data_tools/
├── __init__.py
├── load.py
└── clean.py
data_tools/ is now a package. Inside it, load.py and clean.py are modules.
The __init__.py can be empty — its presence alone tells Python “this is a package”.
Importing from a package
from data_tools.load import load_csv
from data_tools.clean import remove_nulls
Or with the dotted form:
import data_tools.load
import data_tools.clean
data_tools.load.load_csv("file.csv")
Or with as:
import data_tools.load as loader
loader.load_csv("file.csv")
The . is how you reach into a package’s modules. data_tools.clean reads as “the clean module inside the data_tools package”.
Nested packages
Packages can contain other packages. Just put an __init__.py at every level:
my_project/
└── data_tools/
├── __init__.py
├── load.py
└── transforms/
├── __init__.py
├── normalise.py
└── encode.py
from data_tools.transforms.normalise import min_max_scale
This is exactly how big libraries (NumPy, PyTorch) are structured — packages inside packages, each with focused submodules.
What goes in init.py?
For small projects: nothing — keep it empty. It’s just a marker.
For larger projects, __init__.py is often used to re-export the package’s public API:
from data_tools.load import load_csv
from data_tools.clean import remove_nulls
__all__ = ["load_csv", "remove_nulls"]
Now callers can write from data_tools import load_csv instead of from data_tools.load import load_csv. The __all__ list documents what’s “public”.
This is convenience, not necessity. Start with empty __init__.py files and only add re-exports when the project is big enough that callers benefit.
Relative imports — inside the same package
Within a package, you can import sibling modules with a leading dot:
from .load import load_csv # same as: from data_tools.load import load_csv
. means “the current package”. .. means “one level up”.
When to use relative vs absolute imports:
- Absolute (
from data_tools.load import ...) — clearer, works from any context. Prefer this in most code. - Relative (
from .load import ...) — shorter, useful when you might rename the package later.
Both work. Pick one style per project.
Running a script inside a package
If your script is part of a package and uses relative imports, you can’t run it directly:
uv run python data_tools/load.py
# ImportError: attempted relative import with no known parent package
Instead, run it as a module from the package’s root:
uv run python -m data_tools.load
The -m flag tells Python “treat this as a module of the package”, not “run this as a script”. Relative imports then work.
A realistic package shape
my_project/
├── pyproject.toml
├── README.md
└── src/
└── my_project/
├── __init__.py
├── main.py
├── data/
│ ├── __init__.py
│ ├── load.py
│ └── clean.py
└── models/
├── __init__.py
└── train.py
This is the src layout — popular in modern Python projects. The package source lives in src/my_project/, separate from project files like README.md and pyproject.toml.
For learning, a flat layout is fine. The src layout pays off when you start publishing your package or running automated tests.
pycache
After importing a package, you’ll see __pycache__/ folders appear. These hold compiled bytecode for faster re-imports. We met them in Section 1 — they’re safe to ignore and should be added to .gitignore.
uv init adds __pycache__/ to your .gitignore for you.
What’s next
A short tour of the standard library — the modules that come with Python and what each is good for.