CSV — comma-separated values — is the simplest tabular file format. Every spreadsheet program reads and writes it; every data tool starts with it.
Python’s csv module reads and writes CSV correctly — handling quoting, commas inside values, and edge cases that a naive split(",") would miss.
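To see the difference, compare a naive split against csv.reader on a quoted value (a hypothetical one-line record):

```python
import csv
import io

line = 'Alice,25,"Chennai, India"\n'

# Naive split breaks the quoted field into two pieces and keeps the quotes.
print(line.split(","))

# csv.reader respects the quoting and returns three clean fields.
print(next(csv.reader(io.StringIO(line))))
```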
A sample file
Save this as users.csv:
name,age,city
Manikandan,30,Chennai
Alice,25,Mumbai
Bob,28,Bangalore
The first line is the header — the column names. Each line after that is a record.
Reading a CSV — csv.reader
import csv
with open("users.csv", encoding="utf-8") as f:
    reader = csv.reader(f)
    for row in reader:
        print(row)
['name', 'age', 'city']
['Manikandan', '30', 'Chennai']
['Alice', '25', 'Mumbai']
['Bob', '28', 'Bangalore']
Each row is a list of strings. Even 30 comes through as "30" — CSV has no concept of types. Convert as needed:
with open("users.csv", encoding="utf-8") as f:
    reader = csv.reader(f)
    next(reader)  # skip the header
    for row in reader:
        name: str = row[0]
        age: int = int(row[1])
        city: str = row[2]
        print(f"{name}, {age}, {city}")
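Real files often contain values that won't convert; int() raises ValueError on anything non-numeric. A minimal sketch that skips bad rows instead of crashing (the file here, with one deliberately malformed age, is an assumption for the demo):

```python
import csv

# Create a small file with one malformed age value (demo assumption).
with open("users.csv", "w", encoding="utf-8", newline="") as f:
    f.write("name,age,city\nManikandan,30,Chennai\nAlice,unknown,Mumbai\n")

good: list[tuple[str, int]] = []
with open("users.csv", encoding="utf-8") as f:
    reader = csv.reader(f)
    next(reader)  # skip the header
    for row in reader:
        try:
            good.append((row[0], int(row[1])))
        except ValueError:
            continue  # skip rows whose age column isn't a number

print(good)
```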
Indexing by position is fragile — if someone reorders the columns, your code silently breaks. Better:
Reading with column names — csv.DictReader
with open("users.csv", encoding="utf-8") as f:
    reader = csv.DictReader(f)
    for row in reader:
        print(row)
{'name': 'Manikandan', 'age': '30', 'city': 'Chennai'}
{'name': 'Alice', 'age': '25', 'city': 'Mumbai'}
{'name': 'Bob', 'age': '28', 'city': 'Bangalore'}
Each row is a dictionary, keyed by the header. Access fields by name — not position:
with open("users.csv", encoding="utf-8") as f:
    reader = csv.DictReader(f)
    for row in reader:
        print(f"{row['name']} is {row['age']} years old")
Use DictReader by default. It’s more readable and survives column reordering.
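DictReader also exposes the header it read, via the fieldnames attribute, which is handy for a quick sanity check before processing. A small sketch (recreating users.csv so it runs on its own):

```python
import csv

# Recreate users.csv so the snippet is self-contained.
with open("users.csv", "w", encoding="utf-8", newline="") as f:
    f.write("name,age,city\nManikandan,30,Chennai\n")

with open("users.csv", encoding="utf-8") as f:
    reader = csv.DictReader(f)
    print(reader.fieldnames)  # the header row, as a list of column names
    for row in reader:
        print(row["name"])
```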
Writing a CSV — csv.writer
import csv
rows: list[list[str | int]] = [
    ["name", "age", "city"],
    ["Manikandan", 30, "Chennai"],
    ["Alice", 25, "Mumbai"],
]

with open("output.csv", "w", encoding="utf-8", newline="") as f:
    writer = csv.writer(f)
    writer.writerows(rows)
Two things to notice:
- The mode is "w" and we pass newline="". Always pass newline="" when opening a CSV file for writing — otherwise Windows adds extra blank lines between rows. This is a Python quirk you should just memorise.
- writerows writes a list of rows in one call; writerow (singular) writes one row.
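writerow is useful when rows arrive one at a time, for example when appending a single record to an existing file (opened in "a" mode). A short sketch:

```python
import csv

# Start a file with just the header row.
with open("output.csv", "w", encoding="utf-8", newline="") as f:
    csv.writer(f).writerow(["name", "age", "city"])

# Later, append one record at a time in "a" (append) mode.
with open("output.csv", "a", encoding="utf-8", newline="") as f:
    csv.writer(f).writerow(["Bob", 28, "Bangalore"])
```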
Writing with column names — csv.DictWriter
records: list[dict[str, str | int]] = [
    {"name": "Manikandan", "age": 30, "city": "Chennai"},
    {"name": "Alice", "age": 25, "city": "Mumbai"},
]

with open("output.csv", "w", encoding="utf-8", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["name", "age", "city"])
    writer.writeheader()
    writer.writerows(records)
DictWriter matches DictReader — easier to read, harder to break by reordering.
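One DictWriter detail worth knowing: if a dict contains a key that isn't in fieldnames, writerow raises ValueError. Passing extrasaction="ignore" drops the extra keys silently instead:

```python
import csv

# This record has a "note" key that isn't in the fieldnames list.
records = [{"name": "Alice", "age": 25, "city": "Mumbai", "note": "extra"}]

with open("output.csv", "w", encoding="utf-8", newline="") as f:
    # extrasaction="ignore" silently drops keys not listed in fieldnames
    writer = csv.DictWriter(f, fieldnames=["name", "age", "city"],
                            extrasaction="ignore")
    writer.writeheader()
    writer.writerows(records)
```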
Different delimiters — TSV and friends
CSV’s “C” stands for comma, but the format works with any separator. The csv module accepts any single-character delimiter:
with open("data.tsv", encoding="utf-8") as f:
    reader = csv.reader(f, delimiter="\t")
    for row in reader:
        print(row)
Use delimiter="\t" for tab-separated files (.tsv), delimiter=";" for European-style CSVs.
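The same delimiter argument works on the writing side. A sketch producing a semicolon-separated file (the filename is an arbitrary choice for the demo):

```python
import csv

rows = [["name", "age"], ["Alice", 25]]

with open("data_semicolon.csv", "w", encoding="utf-8", newline="") as f:
    # delimiter applies to writers exactly as it does to readers
    writer = csv.writer(f, delimiter=";")
    writer.writerows(rows)
```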
When NOT to use the csv module
For serious data work — millions of rows, complex types, missing values — use Pandas:
import pandas as pd
df = pd.read_csv("users.csv")
print(df.head())
Pandas handles types, missing values, dates, and large files much better. We’ll meet it again in a follow-up course on data analysis.
For small files, scripting, or when you don’t want a heavy dependency, the built-in csv module is perfect.
A realistic example — filter rows
Read users, filter by city, write the result:
import csv
with open("users.csv", encoding="utf-8") as src, \
        open("chennai_users.csv", "w", encoding="utf-8", newline="") as dst:
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames or [])
    writer.writeheader()
    for row in reader:
        if row["city"] == "Chennai":
            writer.writerow(row)
This streams the file — never loads everything into memory. Even for a CSV with millions of rows, the memory use stays tiny.
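The same streaming pattern works for aggregation: a count or a sum can be computed in one pass without building any list. A sketch (recreating users.csv so it runs on its own):

```python
import csv

# Recreate users.csv so the snippet is self-contained.
with open("users.csv", "w", encoding="utf-8", newline="") as f:
    f.write("name,age,city\n"
            "Manikandan,30,Chennai\n"
            "Alice,25,Mumbai\n"
            "Bob,28,Bangalore\n")

count = 0
total_age = 0
with open("users.csv", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        count += 1
        total_age += int(row["age"])

print(count, total_age / count)  # number of records and average age
```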
What’s next
CSV handles tables. Next — JSON, the format used by almost every API and config file on the modern internet.