📂 Lesson 4: File I/O & Context Managers
Read and write files, process CSV data, and manage resources cleanly with the with statement — then build your own context managers.
🎯 Learning Objectives
By the end of this lesson, you will be able to:
- Open, read, and write text files using Python's built-in open()
- Use the with statement to guarantee files are properly closed
- Read and write CSV files using the csv module
- Work with file paths using pathlib
- Build custom context managers with both classes and @contextmanager
Estimated Time: 60 minutes
Project: Build a log-file analyzer that reads, filters, and summarizes server logs
🤔 Why File I/O Matters
So far, all of our data has lived in memory — once the program ends, it's gone. File I/O (Input/Output) lets your programs persist data, read configuration, process logs, import datasets, and communicate with other systems.
Almost every real-world Python script reads from or writes to files: web servers write access logs, data pipelines read CSVs, apps load settings from config files, and scripts generate reports.
📖 Key Terms
File handle (file object): The Python object returned by open() — it's your connection to the file on disk.
Context manager: An object that defines setup and teardown actions for a with block — files are the most common example.
Encoding: How text is converted to/from bytes on disk. UTF-8 is the modern standard.
Mode: How the file is opened — read ('r'), write ('w'), append ('a'), etc.
📖 Reading Text Files
The built-in open() function returns a file object. The simplest way to read is with .read(), which loads the entire file into a string:
# Assume we have a file called "greeting.txt" containing:
# Hello, World!
# Welcome to Python file I/O.
f = open("greeting.txt", "r") # "r" = read mode (the default)
content = f.read()
f.close() # Always close when done!
print(content)
print(type(content)) # <class 'str'>
Output:
Hello, World!
Welcome to Python file I/O.
<class 'str'>
Reading Methods
| Method | Returns | Best For |
|---|---|---|
| .read() | Entire file as one string | Small files you need as a single block |
| .readline() | One line at a time | When you need line-by-line control |
| .readlines() | List of all lines (with \n) | When you need random access to lines |
| Iterate the file object | One line per iteration | Large files — memory efficient |
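To make the table concrete, here is a small sketch that runs each approach against the same two-line file. The throwaway directory and the file name sample.txt are assumptions made only so the example is self-contained.

```python
import os
import shutil
import tempfile

# Assumption: a scratch directory and a file named sample.txt,
# created here only so the example can run anywhere.
tmp_dir = tempfile.mkdtemp()
sample = os.path.join(tmp_dir, "sample.txt")

f = open(sample, "w", encoding="utf-8")
f.write("first\nsecond\n")
f.close()

f = open(sample, "r", encoding="utf-8")
whole = f.read()            # entire file as one string
f.close()

f = open(sample, "r", encoding="utf-8")
first_line = f.readline()   # one line, trailing \n included
f.close()

f = open(sample, "r", encoding="utf-8")
all_lines = f.readlines()   # list of lines, each ending in \n
f.close()

f = open(sample, "r", encoding="utf-8")
stripped = [line.rstrip("\n") for line in f]  # iterate the file object
f.close()

shutil.rmtree(tmp_dir)      # remove the scratch directory

print(repr(whole))       # 'first\nsecond\n'
print(repr(first_line))  # 'first\n'
print(all_lines)         # ['first\n', 'second\n']
print(stripped)          # ['first', 'second']
```

Note that every method except the stripped iteration keeps the trailing \n on each line.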
Iterating Line by Line
For most real-world work, you'll iterate over the file object directly. This is memory efficient because Python only loads one line at a time:
f = open("server.log", "r")
error_count = 0
for line in f:
    if "ERROR" in line:
        error_count += 1
        print(line.strip())  # .strip() removes the trailing \n
f.close()
print(f"\nTotal errors: {error_count}")
⚠️ Always Close Your Files
If you forget f.close(), buffered data may never be flushed to disk, file locks may persist, and a long-running program can run out of file descriptors. The with statement (covered later in this lesson) solves this automatically.
Specifying Encoding
Always specify the encoding explicitly for maximum portability:
f = open("data.txt", "r", encoding="utf-8")
content = f.read()
f.close()
🧠 Why UTF-8?
UTF-8 is the universal standard that handles every character from every language — English, Japanese, Arabic, emoji, everything. On some systems (especially Windows), the default encoding might be something else, which can cause UnicodeDecodeError. Always specify encoding="utf-8" to avoid surprises.
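The sketch below illustrates what a mismatched encoding can look like in practice. The file name menu.txt and the two "wrong" encodings (ascii and cp1252) are assumptions chosen for illustration; your system default may differ.

```python
import os
import shutil
import tempfile

tmp_dir = tempfile.mkdtemp()            # assumption: throwaway location
path = os.path.join(tmp_dir, "menu.txt")

f = open(path, "w", encoding="utf-8")
f.write("café")
f.close()

# Wrong guess 1: a codec that cannot decode the bytes at all
f = open(path, "r", encoding="ascii")
try:
    f.read()
    failed = False
except UnicodeDecodeError:
    failed = True
finally:
    f.close()

# Wrong guess 2: a codec that decodes without error, but into the wrong text
f = open(path, "r", encoding="cp1252")
garbled = f.read()
f.close()

# The matching encoding round-trips cleanly
f = open(path, "r", encoding="utf-8")
correct = f.read()
f.close()

shutil.rmtree(tmp_dir)

print(failed)    # True
print(garbled)   # cafÃ©
print(correct)   # café
```

The second case is arguably the more dangerous one: nothing crashes, and the corrupted text flows quietly into the rest of the program.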
✍️ Writing Text Files
Open a file in write mode ("w") or append mode ("a"):
| Mode | Behavior | If File Exists | If File Doesn't Exist |
|---|---|---|---|
| "w" | Write (overwrite) | Erases content | Creates it |
| "a" | Append | Adds to end | Creates it |
| "x" | Exclusive create | Raises FileExistsError | Creates it |
| "r+" | Read + write | Keeps content | Raises FileNotFoundError |
# Write a new file (or overwrite existing)
f = open("output.txt", "w", encoding="utf-8")
f.write("Line 1: Hello!\n")
f.write("Line 2: This is Python.\n")
f.close()
# Append to an existing file
f = open("output.txt", "a", encoding="utf-8")
f.write("Line 3: Appended later.\n")
f.close()
# Read it back to verify
f = open("output.txt", "r", encoding="utf-8")
print(f.read())
f.close()
Output:
Line 1: Hello!
Line 2: This is Python.
Line 3: Appended later.
Writing Multiple Lines
lines = ["Alice,95,A\n", "Bob,87,B+\n", "Carlos,92,A-\n"]
f = open("grades.csv", "w", encoding="utf-8")
f.writelines(lines) # Writes all strings — does NOT add \n for you
f.close()
⚠️ "w" Mode Erases Everything
Opening a file with "w" mode immediately deletes all existing content — even before you write anything. If you want to add to a file, use "a" (append). If you want safety, use "x" (exclusive create) which refuses to overwrite.
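As a quick check of both claims, the sketch below creates a file with "x", shows that a second "x" open is refused, and then shows that merely opening with "w" empties the file. The scratch directory and the name report.txt are assumptions for the example.

```python
import os
import shutil
import tempfile

tmp_dir = tempfile.mkdtemp()            # assumption: throwaway location
path = os.path.join(tmp_dir, "report.txt")

# First "x" open succeeds because the file does not exist yet
f = open(path, "x", encoding="utf-8")
f.write("original\n")
f.close()

# Second "x" open refuses to touch the existing file
try:
    f = open(path, "x", encoding="utf-8")
    f.close()
    refused = False
except FileExistsError:
    refused = True

f = open(path, "r", encoding="utf-8")
kept = f.read()                         # content is still intact
f.close()

# Opening with "w" truncates immediately, before any write() call
f = open(path, "w", encoding="utf-8")
f.close()
size_after_w = os.path.getsize(path)

shutil.rmtree(tmp_dir)

print(refused)        # True
print(repr(kept))     # 'original\n'
print(size_after_w)   # 0
```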
🛡️ The with Statement
The with statement is Python's solution to the "forgot to close the file" problem. It guarantees the file is closed when the block ends — even if an exception occurs:
# ✅ The Pythonic way — always use this
with open("greeting.txt", "r", encoding="utf-8") as f:
    content = f.read()
    print(content)
# f is automatically closed here — no f.close() needed!
print(f.closed) # True
Compare with the manual approach:
# ❌ The manual way: works, but verbose and easy to forget
f = open("greeting.txt", "r", encoding="utf-8")
try:
    content = f.read()
    print(content)
finally:
    f.close()  # Same effect, more boilerplate
The with statement works with any context manager — an object that defines what happens at the start and end of a block. Files are the most common, but you'll also see it with database connections, network sockets, locks, and more.
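To show the same pattern outside of files, here is a minimal sketch using threading.Lock from the standard library. The counter, the add_many helper, and the thread count are made up for the example.

```python
import threading

lock = threading.Lock()
counter = 0

def add_many(n):
    """Hypothetical worker: increments the shared counter n times."""
    global counter
    for _ in range(n):
        with lock:          # acquired on entry, released on exit
            counter += 1

threads = [threading.Thread(target=add_many, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# The lock is released even when the block raises
try:
    with lock:
        raise ValueError("boom")
except ValueError:
    pass
released_after_error = not lock.locked()

print(counter)               # 40000
print(released_after_error)  # True
```

The shape is identical to the file case: setup happens when the block starts, and teardown happens when it ends, whether or not an exception occurred.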
Reading and Writing in One Script
# Read input, process, write output
with open("raw_data.txt", "r", encoding="utf-8") as infile:
    lines = infile.readlines()
# Process: strip whitespace, filter blanks, uppercase
processed = [line.strip().upper() for line in lines if line.strip()]
with open("clean_data.txt", "w", encoding="utf-8") as outfile:
    for line in processed:
        outfile.write(line + "\n")
print(f"Processed {len(processed)} lines.")
✅ Rule of Thumb
Every open() call should be inside a with block. There's almost never a reason to call .close() manually in modern Python.
📊 Working with CSV Files
CSV (Comma-Separated Values) is one of the most common data formats. Python's built-in csv module handles the tricky parts — quoting, escaping, different delimiters — so you don't have to.
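Here is a short sketch of one of those tricky parts: a quoted field that contains a comma. The sample data is invented, and io.StringIO stands in for a real file so the example needs nothing on disk.

```python
import csv
import io

# Invented sample: both fields in the data row contain a comma inside quotes
raw = 'name,city\n"Smith, Jane","Portland, OR"\n'

# Naive approach: split the data row on every comma
naive = raw.splitlines()[1].split(",")

# csv.reader understands the quoting rules
rows = list(csv.reader(io.StringIO(raw)))

print(naive)    # ['"Smith', ' Jane"', '"Portland', ' OR"']
print(rows[1])  # ['Smith, Jane', 'Portland, OR']
```

The naive split produces four broken fragments, while csv.reader returns the two intended fields with the quotes removed.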
Reading CSV with csv.reader
import csv
# Assume students.csv contains:
# name,grade,score
# Alice,A,95
# Bob,B+,87
# Carlos,A-,92
with open("students.csv", "r", encoding="utf-8") as f:
    reader = csv.reader(f)
    header = next(reader)  # Skip the header row
    print(f"Columns: {header}")
    for row in reader:
        name, grade, score = row
        print(f" {name}: {grade} ({score})")
Output:
Columns: ['name', 'grade', 'score']
Alice: A (95)
Bob: B+ (87)
Carlos: A- (92)
Reading CSV with csv.DictReader
DictReader is usually more convenient — it gives you each row as a dictionary keyed by the header names:
import csv
with open("students.csv", "r", encoding="utf-8") as f:
    reader = csv.DictReader(f)
    for row in reader:
        # row is a dict: {'name': 'Alice', 'grade': 'A', 'score': '95'}
        print(f"{row['name']} scored {row['score']} ({row['grade']})")
Output:
Alice scored 95 (A)
Bob scored 87 (B+)
Carlos scored 92 (A-)
Writing CSV Files
import csv
students = [
    {"name": "Alice", "grade": "A", "score": 95},
    {"name": "Bob", "grade": "B+", "score": 87},
    {"name": "Carlos", "grade": "A-", "score": 92},
]
with open("output.csv", "w", encoding="utf-8", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["name", "grade", "score"])
    writer.writeheader()
    writer.writerows(students)
print("CSV written successfully!")
🧠 Why newline=""?
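The csv module writes its own line endings (\r\n by default). If you don't pass newline="", Python's text layer on Windows also translates each \n into \r\n, so every row ends in \r\r\n and shows up with an extra blank line after it. Passing newline="" turns that translation off and leaves line-ending handling to the csv module.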
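The effect can be reproduced on any operating system with a small sketch. Here newline="\r\n" is used as a stand-in for the translation Windows applies by default; the helper name write_and_read_raw and the file names are assumptions for the example.

```python
import csv
import os
import tempfile

ROWS = [["name", "score"], ["Alice", 95]]

def write_and_read_raw(path, newline):
    """Write ROWS with the given newline setting, return the raw bytes."""
    with open(path, "w", encoding="utf-8", newline=newline) as f:
        csv.writer(f).writerows(ROWS)
    with open(path, "rb") as f:   # binary mode: no newline translation
        return f.read()

with tempfile.TemporaryDirectory() as tmp:
    # newline="\r\n" forces the \n -> \r\n translation that Windows
    # applies by default, so the problem is visible on any platform.
    translated = write_and_read_raw(os.path.join(tmp, "bad.csv"), "\r\n")
    untouched = write_and_read_raw(os.path.join(tmp, "good.csv"), "")

print(translated)  # b'name,score\r\r\nAlice,95\r\r\n'
print(untouched)   # b'name,score\r\nAlice,95\r\n'
```

The doubled \r in the first result is what spreadsheet programs and many parsers display as a blank row.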
[Diagram: csv.reader returns lists such as ['Alice', 'A', '95'], accessed by index (row[0], row[1]); csv.DictReader returns dicts such as {'name': 'Alice', 'grade': 'A'}, accessed by name (row['name'], row['grade']).]
🗂️ File Paths with pathlib
Hardcoding file paths as strings is fragile — different operating systems use different separators (/ vs \). Python's pathlib module gives you an object-oriented way to work with paths that's cross-platform and readable:
from pathlib import Path
# Create a path object
data_dir = Path("data")
file_path = data_dir / "students.csv" # The / operator joins paths!
print(file_path) # data/students.csv (or data\students.csv on Windows)
print(file_path.name) # students.csv
print(file_path.stem) # students
print(file_path.suffix) # .csv
print(file_path.parent) # data
Common Path Operations
from pathlib import Path
p = Path("data/reports/q3.csv")
# Check existence
print(p.exists()) # True/False
print(p.is_file()) # True if it's a file
print(p.is_dir()) # True if it's a directory
# Create directories
Path("output/reports").mkdir(parents=True, exist_ok=True)
# List files in a directory
for f in Path("data").iterdir():
    print(f" {f.name} ({'dir' if f.is_dir() else 'file'})")
# Find all CSV files recursively
for csv_file in Path("data").glob("**/*.csv"):
    print(f" Found: {csv_file}")
Using pathlib with open()
Path objects work directly with open(), and they also have their own convenience methods:
from pathlib import Path
p = Path("greeting.txt")
# Option 1: Path object with open()
with open(p, "r", encoding="utf-8") as f:
    content = f.read()
# Option 2: Path's built-in methods (even simpler!)
content = p.read_text(encoding="utf-8") # Read entire file
p.write_text("Hello!\n", encoding="utf-8") # Write entire file
✅ pathlib vs String Paths
pathlib is the modern, Pythonic approach. Use it for any code you plan to share or maintain. The older os.path module still works but is more verbose and less readable.
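For a side-by-side feel of the difference, the sketch below performs the same task both ways: build a path, take its stem, and derive a sibling file with a different extension. The path data/reports/q3.csv is reused from the earlier example; the .json sibling is an invented illustration, and no files are touched.

```python
import os.path
from pathlib import Path

# os.path style: strings in, strings out, functions nested inside functions
old_path = os.path.join("data", "reports", "q3.csv")
old_stem = os.path.splitext(os.path.basename(old_path))[0]
old_sibling = os.path.join(os.path.dirname(old_path), old_stem + ".json")

# pathlib style: one object with attributes and methods
new_path = Path("data") / "reports" / "q3.csv"
new_stem = new_path.stem
new_sibling = new_path.with_suffix(".json")

print(old_stem, new_stem)               # q3 q3
print(old_sibling == str(new_sibling))  # True
```

Both versions produce the same result on every platform; the pathlib version simply reads closer to what it means.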
🔧 Custom Context Managers
You've been using context managers with files. Now let's build our own. There are two approaches: the class-based approach (implementing __enter__ and __exit__) and the generator-based approach (using @contextmanager).
Approach 1: Class-Based
A context manager is any object with __enter__ and __exit__ methods:
import time
class Timer:
    """Measures how long a block of code takes."""

    def __enter__(self):
        self.start = time.time()
        print("⏱️ Timer started...")
        return self  # This becomes the `as` variable

    def __exit__(self, exc_type, exc_val, exc_tb):
        self.elapsed = time.time() - self.start
        print(f"⏱️ Elapsed: {self.elapsed:.4f} seconds")
        return False  # Don't suppress exceptions

# Usage
with Timer() as t:
    # Simulate some work
    total = sum(range(1_000_000))
    print(f"Sum: {total}")
print(f"Stored time: {t.elapsed:.4f}s")
Output:
⏱️ Timer started...
Sum: 499999500000
⏱️ Elapsed: 0.0312 seconds
Stored time: 0.0312s
📖 __exit__ Parameters
The __exit__ method receives three arguments about any exception that occurred inside the with block:
exc_type: The exception class (e.g., ValueError), or None if no exception.
exc_val: The exception instance, or None.
exc_tb: The traceback, or None.
Returning True from __exit__ suppresses the exception. Returning False (or nothing) lets it propagate normally. In most cases, you want to return False.
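The sketch below illustrates the return-value rule with a hypothetical Suppress class that swallows only the exception types it is given. It exists purely for demonstration; the standard library already offers contextlib.suppress for real use.

```python
class Suppress:
    """Hypothetical helper: swallows only the listed exception types."""

    def __init__(self, *exc_types):
        self.exc_types = exc_types
        self.caught = None

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        if exc_type is not None and issubclass(exc_type, self.exc_types):
            self.caught = exc_val
            return True   # suppress: execution continues after the block
        return False      # propagate anything else (or nothing at all)

events = []

with Suppress(ZeroDivisionError) as guard:
    events.append("before")
    1 / 0
    events.append("after")      # skipped: the block stops at the error
events.append("continued")

try:
    with Suppress(ZeroDivisionError):
        int("not a number")     # ValueError is not in the list
except ValueError:
    events.append("ValueError propagated")

print(events)        # ['before', 'continued', 'ValueError propagated']
print(guard.caught)  # division by zero
```

Note that suppression does not resume the block: the line after the failing statement never runs, and execution picks up after the with block.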
Approach 2: Generator-Based with @contextmanager
For simpler cases, the contextlib module provides a decorator that turns a generator into a context manager:
from contextlib import contextmanager
import time
@contextmanager
def timer(label="Block"):
    start = time.time()
    print(f"⏱️ {label} started...")
    try:
        yield  # Everything before yield = __enter__, after = __exit__
    finally:
        elapsed = time.time() - start
        print(f"⏱️ {label} finished in {elapsed:.4f}s")

with timer("Data processing"):
    total = sum(range(1_000_000))
    print(f"Sum: {total}")
Output:
⏱️ Data processing started...
Sum: 499999500000
⏱️ Data processing finished in 0.0298s
Practical Example: Temporary Directory
from contextlib import contextmanager
from pathlib import Path
import shutil
@contextmanager
def temp_directory(name="temp_work"):
    """Create a temporary working directory, clean up when done."""
    path = Path(name)
    path.mkdir(exist_ok=True)  # Caution: an existing directory is reused, then deleted on exit
    print(f"📁 Created {path}/")
    try:
        yield path
    finally:
        shutil.rmtree(path)
        print(f"🗑️ Cleaned up {path}/")

with temp_directory("scratch") as tmp:
    # Write a temp file
    (tmp / "data.txt").write_text("temporary data", encoding="utf-8")
    print(f"Files: {list(tmp.iterdir())}")
# Directory is automatically deleted here
print(f"Still exists? {Path('scratch').exists()}")
Output:
📁 Created scratch/
Files: [PosixPath('scratch/data.txt')]
🗑️ Cleaned up scratch/
Still exists? False
[Diagram: Need a context manager? If it has complex state or is reusable, use the class-based approach (__enter__ / __exit__); otherwise use @contextmanager (a generator with yield).]
🏋️ Hands-on Exercises
🏋️ Exercise 1: Log Analyzer
Objective: Practice reading files, processing text, and writing output.
Requirements:
- Create a sample log file with at least 10 lines, mixing INFO, WARNING, and ERROR levels
- Read the log file and count occurrences of each level
- Extract only the ERROR lines and write them to a separate file
- Print a summary with counts and the error filename
- Use with statements and pathlib throughout
Starter Code:
from pathlib import Path
from collections import Counter
def create_sample_log(path):
    """Create a sample log file for testing."""
    log_lines = [
        "2024-01-15 10:00:01 INFO Server started on port 8080",
        "2024-01-15 10:00:05 INFO Connected to database",
        "2024-01-15 10:01:12 WARNING High memory usage: 85%",
        "2024-01-15 10:02:30 ERROR Failed to process request /api/users",
        "2024-01-15 10:02:31 INFO Retrying request...",
        "2024-01-15 10:02:35 ERROR Retry failed: connection timeout",
        "2024-01-15 10:03:00 INFO Request processed successfully",
        "2024-01-15 10:05:00 WARNING Disk usage above 90%",
        "2024-01-15 10:06:22 ERROR Unhandled exception in /api/orders",
        "2024-01-15 10:07:00 INFO Health check passed",
    ]
    # TODO: Write log_lines to the file
    pass

def analyze_log(log_path, error_output_path):
    """Read a log file, count levels, extract errors."""
    # TODO: Read the log file
    # TODO: Count INFO, WARNING, ERROR occurrences
    # TODO: Write ERROR lines to error_output_path
    # TODO: Return the counts
    pass
# Main
log_file = Path("server.log")
error_file = Path("errors.log")
create_sample_log(log_file)
counts = analyze_log(log_file, error_file)
print("=== Log Summary ===")
for level, count in counts.items():
    print(f" {level}: {count}")
print(f"Errors saved to: {error_file}")
💡 Hint
To extract the log level, split each line and look for the third element (index 2). Use a Counter from collections to tally them up. Filter lines where the level is "ERROR".
✅ Solution
from pathlib import Path
from collections import Counter
def create_sample_log(path):
    log_lines = [
        "2024-01-15 10:00:01 INFO Server started on port 8080\n",
        "2024-01-15 10:00:05 INFO Connected to database\n",
        "2024-01-15 10:01:12 WARNING High memory usage: 85%\n",
        "2024-01-15 10:02:30 ERROR Failed to process request /api/users\n",
        "2024-01-15 10:02:31 INFO Retrying request...\n",
        "2024-01-15 10:02:35 ERROR Retry failed: connection timeout\n",
        "2024-01-15 10:03:00 INFO Request processed successfully\n",
        "2024-01-15 10:05:00 WARNING Disk usage above 90%\n",
        "2024-01-15 10:06:22 ERROR Unhandled exception in /api/orders\n",
        "2024-01-15 10:07:00 INFO Health check passed\n",
    ]
    with open(path, "w", encoding="utf-8") as f:
        f.writelines(log_lines)

def analyze_log(log_path, error_output_path):
    counts = Counter()
    errors = []
    with open(log_path, "r", encoding="utf-8") as f:
        for line in f:
            parts = line.split()
            if len(parts) >= 3:
                level = parts[2]
                counts[level] += 1
                if level == "ERROR":
                    errors.append(line)
    with open(error_output_path, "w", encoding="utf-8") as f:
        f.writelines(errors)
    return counts
log_file = Path("server.log")
error_file = Path("errors.log")
create_sample_log(log_file)
counts = analyze_log(log_file, error_file)
print("=== Log Summary ===")
for level, count in sorted(counts.items()):
    print(f" {level}: {count}")
print(f"Errors saved to: {error_file}")
# Output:
# ERROR: 3
# INFO: 5
# WARNING: 2
# Errors saved to: errors.log
🏋️ Exercise 2: CSV Grade Report
Objective: Practice CSV reading, processing, and writing with DictReader and DictWriter.
Requirements:
- Create a CSV file with columns: name,math,science,english
- Read the CSV and calculate each student's average score
- Assign letter grades: A (90+), B (80–89), C (70–79), D (60–69), F (<60)
- Write a new CSV with the original data plus average and letter_grade columns
- Print a class summary: average of all students, highest scorer, lowest scorer
💡 Hint
Remember that CSV values are strings — convert to float() before calculating averages. For the letter grade, write a helper function that takes a number and returns the grade string.
✅ Solution
import csv
from pathlib import Path
def create_sample_grades(path):
    students = [
        {"name": "Alice", "math": "95", "science": "88", "english": "92"},
        {"name": "Bob", "math": "72", "science": "68", "english": "75"},
        {"name": "Carlos", "math": "88", "science": "91", "english": "85"},
        {"name": "Dana", "math": "64", "science": "70", "english": "58"},
        {"name": "Eve", "math": "97", "science": "99", "english": "94"},
    ]
    with open(path, "w", encoding="utf-8", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["name", "math", "science", "english"])
        writer.writeheader()
        writer.writerows(students)

def letter_grade(avg):
    if avg >= 90: return "A"
    if avg >= 80: return "B"
    if avg >= 70: return "C"
    if avg >= 60: return "D"
    return "F"

def process_grades(input_path, output_path):
    results = []
    with open(input_path, "r", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        for row in reader:
            scores = [float(row["math"]), float(row["science"]), float(row["english"])]
            avg = sum(scores) / len(scores)
            row["average"] = f"{avg:.1f}"
            row["letter_grade"] = letter_grade(avg)
            results.append(row)
    fieldnames = ["name", "math", "science", "english", "average", "letter_grade"]
    with open(output_path, "w", encoding="utf-8", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(results)
    return results
# Run
input_csv = Path("grades.csv")
output_csv = Path("grade_report.csv")
create_sample_grades(input_csv)
results = process_grades(input_csv, output_csv)
# Summary
averages = [float(r["average"]) for r in results]
class_avg = sum(averages) / len(averages)
best = max(results, key=lambda r: float(r["average"]))
worst = min(results, key=lambda r: float(r["average"]))
print("=== Class Report ===")
for r in results:
    print(f" {r['name']}: {r['average']} ({r['letter_grade']})")
print(f"\nClass Average: {class_avg:.1f}")
print(f"Highest: {best['name']} ({best['average']})")
print(f"Lowest: {worst['name']} ({worst['average']})")
print(f"\nFull report saved to: {output_csv}")
🎯 Quick Quiz
Question 1: What happens if you open a file with "w" mode that already exists?
Question 2: What does the with statement guarantee when working with files?
Question 3: What's the advantage of csv.DictReader over csv.reader?
📏 Best Practices
✅ Do's
- Always use with statements for file operations, no exceptions
- Always specify encoding="utf-8"; don't rely on system defaults
- Use pathlib for path manipulation; it's cleaner than string concatenation
- Use csv.DictReader over csv.reader when you have headers; named access is more maintainable
- Iterate over large files line by line instead of using .read(); this keeps memory usage constant
❌ Don'ts
- Don't use "w" mode without thinking: you'll erase the file. Consider "a" or "x" when appropriate.
- Don't parse CSV by splitting on commas: values can contain commas inside quotes. Use the csv module.
- Don't suppress exceptions in __exit__ unless you have a very good reason (return False by default)
- Don't hardcode absolute paths; use pathlib and relative paths for portability
💡 Pro Tips
- Path.read_text() and Path.write_text() are one-liners for simple read/write, great for config files and small data
- For JSON files, combine open() with json.load() / json.dump(); we'll cover this in a later lesson
- The @contextmanager decorator is usually enough for simple context managers. Use the class-based approach when you need to store state or reuse the manager
- tempfile.NamedTemporaryFile() and tempfile.TemporaryDirectory() from the standard library are production-ready context managers for temporary resources
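As a minimal sketch of the last tip, here is the earlier scratch-directory idea rewritten with tempfile.TemporaryDirectory(). The file name data.txt is reused from that example; the variable names are assumptions.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_name:
    tmp = Path(tmp_name)        # the context manager yields a plain string
    (tmp / "data.txt").write_text("temporary data", encoding="utf-8")
    names = [p.name for p in tmp.iterdir()]
    existed_inside = tmp.exists()

# The directory and everything in it are removed when the block ends
exists_after = tmp.exists()

print(names)           # ['data.txt']
print(existed_inside)  # True
print(exists_after)    # False
```

Because the library picks a unique location, there is no risk of colliding with, or deleting, a directory that already existed.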
📝 Summary
🎉 Key Takeaways
- open() returns a file object; read with .read(), .readline(), .readlines(), or iteration
- File modes: "r" (read), "w" (write/overwrite), "a" (append), "x" (exclusive create)
- The with statement guarantees cleanup; use it for every open() call
- csv module: DictReader / DictWriter for named-column access; always pass newline=""
- pathlib.Path is the modern way to handle file paths cross-platform
- Custom context managers: class-based (__enter__ / __exit__) or generator-based (@contextmanager)
| Task | Code Pattern |
|---|---|
| Read entire file | with open(p, "r") as f: text = f.read() |
| Read line by line | with open(p) as f: for line in f: ... |
| Write file | with open(p, "w") as f: f.write(text) |
| Append to file | with open(p, "a") as f: f.write(text) |
| Read CSV (dict) | csv.DictReader(f) |
| Write CSV (dict) | csv.DictWriter(f, fieldnames=[...]) |
| Join paths | Path("dir") / "file.txt" |
| Quick read | Path("f.txt").read_text(encoding="utf-8") |
📚 Additional Resources
- Python Docs — Reading & Writing Files
- Python Docs — csv Module
- Python Docs — pathlib
- Python Docs — contextlib
🚀 What's Next?
In the next lesson, we'll dive into Error Handling in Depth — custom exception hierarchies, the full try/except/else/finally pattern, and how to use Python's logging module instead of print() for debugging.
🎉 Level Up!
Your programs can now read data from the outside world, process it, and write results back to disk. Combined with OOP from Module 1, you're building real tools — not just exercises.