📂 Lesson 4: File I/O & Context Managers

Read and write files, process CSV data, and manage resources cleanly with the with statement — then build your own context managers.

🎯 Learning Objectives

By the end of this lesson, you will be able to:

  • Open, read, and write text files using Python's built-in open()
  • Use the with statement to guarantee files are properly closed
  • Read and write CSV files using the csv module
  • Work with file paths using pathlib
  • Build custom context managers with both classes and @contextmanager

Estimated Time: 60 minutes

Project: Build a log-file analyzer that reads, filters, and summarizes server logs

🤔 Why File I/O Matters

So far, all of our data has lived in memory — once the program ends, it's gone. File I/O (Input/Output) lets your programs persist data, read configuration, process logs, import datasets, and communicate with other systems.

Almost every real-world Python script reads from or writes to files: web servers write access logs, data pipelines read CSVs, apps load settings from config files, and scripts generate reports.

📖 Key Terms

File handle (file object): The Python object returned by open() — it's your connection to the file on disk.

Context manager: An object that defines setup and teardown actions for a with block — files are the most common example.

Encoding: How text is converted to/from bytes on disk. UTF-8 is the modern standard.

Mode: How the file is opened — read ('r'), write ('w'), append ('a'), etc.

Diagram: the Python program calls open('data.txt') to connect to the file on disk, exchanges data with read() and write(), and finishes with close(), which closes the connection.

📖 Reading Text Files

The built-in open() function returns a file object. The simplest way to read is with .read(), which loads the entire file into a string:

# Assume we have a file called "greeting.txt" containing:
# Hello, World!
# Welcome to Python file I/O.

f = open("greeting.txt", "r")  # "r" = read mode (the default)
content = f.read()
f.close()  # Always close when done!

print(content)
print(type(content))  # <class 'str'>

Output:

Hello, World!
Welcome to Python file I/O.

<class 'str'>

Reading Methods

| Method | Returns | Best For |
| --- | --- | --- |
| .read() | Entire file as one string | Small files you need as a single block |
| .readline() | One line at a time | When you need line-by-line control |
| .readlines() | List of all lines (with \n) | When you need random access to lines |
| Iterate the file object | One line per iteration | Large files, because it is memory efficient |
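
The four approaches in the table can be compared side by side. This is a minimal sketch: the file is created in a temporary directory so the example cleans up after itself, and it closes files with the with statement, which this lesson covers in a later section.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Sample file for the demo, written to a throwaway directory.
    sample = Path(tmp) / "greeting.txt"
    sample.write_text("Hello, World!\nWelcome to Python file I/O.\n", encoding="utf-8")

    with open(sample, "r", encoding="utf-8") as f:
        whole = f.read()          # entire file as one string

    with open(sample, "r", encoding="utf-8") as f:
        first = f.readline()      # first line, including its trailing \n
        second = f.readline()     # next line
        at_end = f.readline()     # empty string signals end of file

    with open(sample, "r", encoding="utf-8") as f:
        as_list = f.readlines()   # list of lines, each ending in \n

    with open(sample, "r", encoding="utf-8") as f:
        stripped = [line.rstrip("\n") for line in f]  # one line per iteration

print(repr(whole))
print(repr(first), repr(at_end))
print(as_list)
print(stripped)
```

Note that .readline() returns an empty string at the end of the file, which is how a manual loop knows when to stop.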

Iterating Line by Line

For most real-world work, you'll iterate over the file object directly. This is memory efficient because Python reads the file in small buffered chunks and hands you one line at a time, so the whole file never has to fit in memory:

f = open("server.log", "r")
error_count = 0

for line in f:
    if "ERROR" in line:
        error_count += 1
        print(line.strip())  # .strip() removes the trailing \n

f.close()
print(f"\nTotal errors: {error_count}")

⚠️ Always Close Your Files

If you forget f.close(), three problems can follow: buffered data may never be flushed to disk, file locks may persist, and a long-running program can run out of file descriptors. The with statement (next section) solves this automatically.

Specifying Encoding

Always specify the encoding explicitly for maximum portability:

f = open("data.txt", "r", encoding="utf-8")
content = f.read()
f.close()

🧠 Why UTF-8?

UTF-8 is the universal standard that handles every character from every language — English, Japanese, Arabic, emoji, everything. On some systems (especially Windows), the default encoding might be something else, which can cause UnicodeDecodeError. Always specify encoding="utf-8" to avoid surprises.
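
To see what an encoding mismatch looks like, here is a small sketch. It writes text as UTF-8 and then deliberately reads it back as ASCII to trigger the error. The file name and sample text are made up for illustration.

```python
import tempfile
from pathlib import Path

text = "café ☕ 日本語\n"

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "data.txt"   # hypothetical file name

    # Write the text as UTF-8 bytes.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)

    # Reading with the matching encoding round-trips the text exactly.
    with open(path, "r", encoding="utf-8") as f:
        round_trip = f.read()

    # Reading the same bytes as ASCII fails on the first non-ASCII byte.
    try:
        with open(path, "r", encoding="ascii") as f:
            f.read()
        decode_failed = False
    except UnicodeDecodeError as exc:
        decode_failed = True
        print(f"Could not decode: {exc.reason}")

print(round_trip == text)
```

The same kind of failure can appear when a platform's default encoding differs from the one used to write the file, which is why passing encoding explicitly on both sides is the safer habit.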

✍️ Writing Text Files

Open a file in write mode ("w") or append mode ("a"):

| Mode | Behavior | If File Exists | If File Doesn't Exist |
| --- | --- | --- | --- |
| "w" | Write (overwrite) | Erases content | Creates it |
| "a" | Append | Adds to end | Creates it |
| "x" | Exclusive create | Raises FileExistsError | Creates it |
| "r+" | Read + write | Keeps content | Raises FileNotFoundError |

# Write a new file (or overwrite existing)
f = open("output.txt", "w", encoding="utf-8")
f.write("Line 1: Hello!\n")
f.write("Line 2: This is Python.\n")
f.close()

# Append to an existing file
f = open("output.txt", "a", encoding="utf-8")
f.write("Line 3: Appended later.\n")
f.close()

# Read it back to verify
f = open("output.txt", "r", encoding="utf-8")
print(f.read())
f.close()

Output:

Line 1: Hello!
Line 2: This is Python.
Line 3: Appended later.

Writing Multiple Lines

lines = ["Alice,95,A\n", "Bob,87,B+\n", "Carlos,92,A-\n"]

f = open("grades.csv", "w", encoding="utf-8")
f.writelines(lines)  # Writes all strings — does NOT add \n for you
f.close()

⚠️ "w" Mode Erases Everything

Opening a file with "w" mode immediately deletes all existing content — even before you write anything. If you want to add to a file, use "a" (append). If you want safety, use "x" (exclusive create) which refuses to overwrite.
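
The difference between the modes is easy to check for yourself. The sketch below uses a hypothetical notes.txt in a temporary directory and shows "a" keeping content, "x" refusing to overwrite, and "w" truncating on open.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "notes.txt"  # hypothetical file name

    with open(path, "w", encoding="utf-8") as f:
        f.write("first draft\n")

    # "a" keeps the existing content and adds to the end.
    with open(path, "a", encoding="utf-8") as f:
        f.write("appended line\n")
    after_append = path.read_text(encoding="utf-8")

    # "x" refuses to touch a file that already exists.
    try:
        with open(path, "x", encoding="utf-8") as f:
            f.write("this is never written\n")
        refused = False
    except FileExistsError:
        refused = True

    # "w" truncates as soon as the file is opened, before any write.
    with open(path, "w", encoding="utf-8"):
        pass
    after_truncate = path.read_text(encoding="utf-8")

print(repr(after_append))
print(refused)
print(repr(after_truncate))
```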

🛡️ The with Statement

The with statement is Python's solution to the "forgot to close the file" problem. It guarantees the file is closed when the block ends — even if an exception occurs:

# ✅ The Pythonic way — always use this
with open("greeting.txt", "r", encoding="utf-8") as f:
    content = f.read()
    print(content)

# f is automatically closed here — no f.close() needed!
print(f.closed)  # True

Compare with the manual approach:

# ❌ The manual way: same effect, but verbose and easy to get wrong
f = open("greeting.txt", "r", encoding="utf-8")
try:
    content = f.read()
    print(content)
finally:
    f.close()  # Runs even if read() raises

The with statement works with any context manager — an object that defines what happens at the start and end of a block. Files are the most common, but you'll also see it with database connections, network sockets, locks, and more.
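
The "even if an exception occurs" guarantee can be demonstrated directly. In this sketch, a hypothetical numbers.txt contains one bad line, so the loop raises partway through, and the file is still closed afterward.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "numbers.txt"  # hypothetical file name
    path.write_text("10\nnot-a-number\n30\n", encoding="utf-8")

    total = 0
    try:
        with open(path, "r", encoding="utf-8") as f:
            for line in f:
                total += int(line)   # raises ValueError on the second line
    except ValueError as exc:
        print(f"Stopped early: {exc}")

    # The exception left the with block, and the file was closed anyway.
    closed_after_error = f.closed

print(closed_after_error)
print(total)
```

Only the first line was added to the total before the error, yet no cleanup code had to be written by hand.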

How a with open(...) as f: block runs:

  ① __enter__(): opens the file and returns the file object
  ② Your code: reads, writes, and processes data
  ③ __exit__(): closes the file, even if an error occurs

✅ Cleanup is guaranteed, so nothing leaks.

Reading and Writing in One Script

# Read input, process, write output
with open("raw_data.txt", "r", encoding="utf-8") as infile:
    lines = infile.readlines()

# Process: strip whitespace, filter blanks, uppercase
processed = [line.strip().upper() for line in lines if line.strip()]

with open("clean_data.txt", "w", encoding="utf-8") as outfile:
    for line in processed:
        outfile.write(line + "\n")

print(f"Processed {len(processed)} lines.")

✅ Rule of Thumb

Every open() call should be inside a with block. There's almost never a reason to call .close() manually in modern Python.
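
A single with statement can also manage several files at once. The sketch below is a streaming variant of the read-process-write pattern: it handles one line at a time instead of loading everything with .readlines(). The sample content and the temporary directory are assumptions made for the demo.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    raw = Path(tmp) / "raw_data.txt"
    clean = Path(tmp) / "clean_data.txt"
    raw.write_text("  alpha \n\n beta\n   \ngamma\n", encoding="utf-8")

    kept = 0
    # Both files are opened together and both are closed when the block ends.
    with open(raw, "r", encoding="utf-8") as infile, \
         open(clean, "w", encoding="utf-8") as outfile:
        for line in infile:
            text = line.strip()
            if text:                               # skip blank lines
                outfile.write(text.upper() + "\n")
                kept += 1

    result = clean.read_text(encoding="utf-8")

print(f"Processed {kept} lines.")
print(result, end="")
```

This version suits large inputs, because memory use does not grow with the size of the file.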

📊 Working with CSV Files

CSV (Comma-Separated Values) is one of the most common data formats. Python's built-in csv module handles the tricky parts — quoting, escaping, different delimiters — so you don't have to.
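
To illustrate what "the tricky parts" means, this sketch compares naive comma splitting with csv.reader on a made-up row whose fields contain commas inside quotes. The data lives in an in-memory io.StringIO so no file is needed.

```python
import csv
import io

# A hypothetical row whose fields contain commas inside quotes.
raw = 'name,city,score\n"Nguyen, Kim","Portland, OR",91\n'

# Naive splitting breaks the quoted fields apart.
naive = raw.splitlines()[1].split(",")

# csv.reader understands the quoting rules.
rows = list(csv.reader(io.StringIO(raw, newline="")))

print(len(naive))   # 5 pieces instead of 3 fields
print(rows[1])      # ['Nguyen, Kim', 'Portland, OR', '91']
```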

Reading CSV with csv.reader

import csv

# Assume students.csv contains:
# name,grade,score
# Alice,A,95
# Bob,B+,87
# Carlos,A-,92

with open("students.csv", "r", encoding="utf-8", newline="") as f:
    reader = csv.reader(f)
    header = next(reader)  # Read the header row so the loop sees only data
    print(f"Columns: {header}")

    for row in reader:
        name, grade, score = row
        print(f"  {name}: {grade} ({score})")

Output:

Columns: ['name', 'grade', 'score']
  Alice: A (95)
  Bob: B+ (87)
  Carlos: A- (92)

Reading CSV with csv.DictReader

DictReader is usually more convenient — it gives you each row as a dictionary keyed by the header names:

import csv

with open("students.csv", "r", encoding="utf-8", newline="") as f:
    reader = csv.DictReader(f)

    for row in reader:
        # row is a dict: {'name': 'Alice', 'grade': 'A', 'score': '95'}
        print(f"{row['name']} scored {row['score']} ({row['grade']})")

Output:

Alice scored 95 (A)
Bob scored 87 (B+)
Carlos scored 92 (A-)
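
One detail worth seeing in code: by default every value DictReader returns is a string, so numeric columns need converting before any arithmetic. This sketch reuses the students data from above, held in memory via io.StringIO to stay self-contained.

```python
import csv
import io

# Same rows as students.csv, kept in memory for the demo.
data = "name,grade,score\nAlice,A,95\nBob,B+,87\nCarlos,A-,92\n"

reader = csv.DictReader(io.StringIO(data, newline=""))
rows = list(reader)

print(reader.fieldnames)        # column names taken from the first row
print(type(rows[0]["score"]))   # <class 'str'>

# Convert before doing arithmetic.
scores = [int(row["score"]) for row in rows]
average = sum(scores) / len(scores)
top = max(rows, key=lambda row: int(row["score"]))

print(f"Average: {average:.1f}")     # Average: 91.3
print(f"Top scorer: {top['name']}")  # Top scorer: Alice
```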

Writing CSV Files

import csv

students = [
    {"name": "Alice", "grade": "A", "score": 95},
    {"name": "Bob", "grade": "B+", "score": 87},
    {"name": "Carlos", "grade": "A-", "score": 92},
]

with open("output.csv", "w", encoding="utf-8", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["name", "grade", "score"])
    writer.writeheader()
    writer.writerows(students)

print("CSV written successfully!")

🧠 Why newline=""?

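The csv module writes its own line endings (\r\n by default). If you open the file without newline="", Python's text layer on Windows translates each \n again, producing \r\r\n and extra blank lines between rows. Passing newline="" turns that translation off and leaves line endings to the csv module. The same argument is recommended when reading, so that line breaks inside quoted fields are preserved.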
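
As a quick check, this sketch writes rows that contain a comma, a line break, and double quotes inside fields, then reads them back. With newline="" on both sides, the data should survive the round trip unchanged. The file name and rows are invented for the example.

```python
import csv
import tempfile
from pathlib import Path

rows = [
    ["name", "comment"],
    ["Alice", "Great work,\nkeep going"],   # comma and line break in one field
    ["Bob", 'Said "hello"'],                # embedded double quotes
]

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "comments.csv"   # hypothetical file name

    with open(path, "w", encoding="utf-8", newline="") as f:
        csv.writer(f).writerows(rows)

    with open(path, "r", encoding="utf-8", newline="") as f:
        loaded = list(csv.reader(f))

print(loaded == rows)
```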

Diagram: choosing a CSV reader. csv.reader returns lists such as ['Alice', 'A', '95'], accessed by index (row[0], row[1]). csv.DictReader returns dicts such as {'name': 'Alice', 'grade': 'A'}, accessed by name (row['name'], row['grade']).

🗂️ File Paths with pathlib

Hardcoding file paths as strings is fragile — different operating systems use different separators (/ vs \). Python's pathlib module gives you an object-oriented way to work with paths that's cross-platform and readable:

from pathlib import Path

# Create a path object
data_dir = Path("data")
file_path = data_dir / "students.csv"  # The / operator joins paths!

print(file_path)          # data/students.csv (or data\students.csv on Windows)
print(file_path.name)     # students.csv
print(file_path.stem)     # students
print(file_path.suffix)   # .csv
print(file_path.parent)   # data

Common Path Operations

from pathlib import Path

p = Path("data/reports/q3.csv")

# Check existence
print(p.exists())          # True/False
print(p.is_file())         # True if it's a file
print(p.is_dir())          # True if it's a directory

# Create directories
Path("output/reports").mkdir(parents=True, exist_ok=True)

# List files in a directory
for f in Path("data").iterdir():
    print(f"  {f.name} ({'dir' if f.is_dir() else 'file'})")

# Find all CSV files recursively
for csv_file in Path("data").glob("**/*.csv"):
    print(f"  Found: {csv_file}")

Using pathlib with open()

Path objects work directly with open(), and they also have their own convenience methods:

from pathlib import Path

p = Path("greeting.txt")

# Option 1: Path object with open()
with open(p, "r", encoding="utf-8") as f:
    content = f.read()

# Option 2: Path's built-in methods (even simpler!)
content = p.read_text(encoding="utf-8")     # Read entire file
p.write_text("Hello!\n", encoding="utf-8")  # Write entire file

pathlib vs String Paths

pathlib is the modern, Pythonic approach. Use it for any code you plan to share or maintain. The older os.path module still works but is more verbose and less readable.
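
For comparison, here is a small sketch of the same path operations written both ways. The path itself is only an example and does not need to exist on disk.

```python
import os.path
from pathlib import Path

# os.path works on plain strings and needs one function call per step.
joined = os.path.join("data", "reports", "q3.csv")
base = os.path.basename(joined)
stem, ext = os.path.splitext(base)

# pathlib expresses the same steps as attributes on one object.
p = Path("data") / "reports" / "q3.csv"

print(base == p.name)
print(stem == p.stem)
print(ext == p.suffix)
print(os.path.dirname(joined) == str(p.parent))
print(p.with_suffix(".json").name)   # q3.json
```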

Diagram: for the path data/reports/summary.csv, .parent is data/reports, .stem is summary, and .suffix is .csv.

🔧 Custom Context Managers

You've been using context managers with files. Now let's build our own. There are two approaches: the class-based approach (implementing __enter__ and __exit__) and the generator-based approach (using @contextmanager).

Approach 1: Class-Based

A context manager is any object with __enter__ and __exit__ methods:

import time

class Timer:
    """Measures how long a block of code takes."""

    def __enter__(self):
        self.start = time.perf_counter()  # Monotonic clock intended for timing
        print("⏱️ Timer started...")
        return self  # This becomes the `as` variable

    def __exit__(self, exc_type, exc_val, exc_tb):
        self.elapsed = time.perf_counter() - self.start
        print(f"⏱️ Elapsed: {self.elapsed:.4f} seconds")
        return False  # Don't suppress exceptions


# Usage
with Timer() as t:
    # Simulate some work
    total = sum(range(1_000_000))
    print(f"Sum: {total}")

print(f"Stored time: {t.elapsed:.4f}s")

Output:

⏱️ Timer started...
Sum: 499999500000
⏱️ Elapsed: 0.0312 seconds
Stored time: 0.0312s

📖 __exit__ Parameters

The __exit__ method receives three arguments about any exception that occurred inside the with block:

exc_type: The exception class (e.g., ValueError), or None if no exception.

exc_val: The exception instance, or None.

exc_tb: The traceback, or None.

Returning True from __exit__ suppresses the exception. Returning False (or nothing) lets it propagate normally. In most cases, you want to return False.
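
To make the return-value rule concrete, here is a sketch of a manager that suppresses one chosen exception type and lets everything else through. The Suppress class is hypothetical and written only for illustration. The standard library's contextlib.suppress already does this job in real code.

```python
class Suppress:
    """Hypothetical manager that swallows one exception type."""

    def __init__(self, exc_class):
        self.exc_class = exc_class
        self.caught = None

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        if exc_type is not None and issubclass(exc_type, self.exc_class):
            self.caught = exc_val
            return True    # suppress: execution resumes after the with block
        return False       # no exception, or a different one: let it propagate


with Suppress(ZeroDivisionError) as guard:
    1 / 0
    print("never reached")

print(f"Suppressed: {guard.caught!r}")

propagated = False
try:
    with Suppress(ZeroDivisionError):
        int("oops")        # ValueError is not suppressed
except ValueError:
    propagated = True

print(f"ValueError propagated: {propagated}")
```

Notice that the rest of the with block is skipped once the exception is raised. Suppression only affects what happens after the block.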

Approach 2: Generator-Based with @contextmanager

For simpler cases, the contextlib module provides a decorator that turns a generator into a context manager:

from contextlib import contextmanager
import time

@contextmanager
def timer(label="Block"):
    start = time.perf_counter()  # Monotonic clock intended for timing
    print(f"⏱️ {label} started...")
    try:
        yield  # Everything before yield = __enter__, after = __exit__
    finally:
        elapsed = time.perf_counter() - start
        print(f"⏱️ {label} finished in {elapsed:.4f}s")


with timer("Data processing"):
    total = sum(range(1_000_000))
    print(f"Sum: {total}")

Output:

⏱️ Data processing started...
Sum: 499999500000
⏱️ Data processing finished in 0.0298s
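
A generator-based manager can also hand a value to the with block and still run its cleanup when the block raises. The sketch below records each step in a list so the order is visible. The tracked function and the events list are hypothetical names made up for this example.

```python
from contextlib import contextmanager

events = []

@contextmanager
def tracked(label):
    """Hypothetical manager that records setup and teardown in `events`."""
    events.append(f"enter {label}")
    resource = {"label": label}
    try:
        yield resource                     # the yielded value is bound by `as`
    finally:
        events.append(f"exit {label}")     # runs on success and on error


with tracked("ok") as res:
    events.append(f"using {res['label']}")

try:
    with tracked("failing"):
        raise RuntimeError("boom")
except RuntimeError:
    events.append("error propagated")

for event in events:
    print(event)
```

Without the try/finally around yield, the teardown line would be skipped whenever the block raised, so that pairing is worth keeping in every generator-based manager that must clean up.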

Practical Example: Temporary Directory

from contextlib import contextmanager
from pathlib import Path
import shutil

@contextmanager
def temp_directory(name="temp_work"):
    """Create a temporary working directory, clean up when done."""
    path = Path(name)
    path.mkdir()  # Raises FileExistsError if it already exists, so cleanup never deletes pre-existing files
    print(f"📁 Created {path}/")
    try:
        yield path
    finally:
        shutil.rmtree(path)
        print(f"🗑️ Cleaned up {path}/")


with temp_directory("scratch") as tmp:
    # Write a temp file
    (tmp / "data.txt").write_text("temporary data", encoding="utf-8")
    print(f"Files: {list(tmp.iterdir())}")

# Directory is automatically deleted here
print(f"Still exists? {Path('scratch').exists()}")

Output:

📁 Created scratch/
Files: [PosixPath('scratch/data.txt')]
🗑️ Cleaned up scratch/
Still exists? False

Diagram: choosing an approach. If the context manager needs complex state or will be reused, write a class with __enter__ / __exit__. Otherwise, use a @contextmanager generator with yield.

🏋️ Hands-on Exercises

🏋️ Exercise 1: Log Analyzer

Objective: Practice reading files, processing text, and writing output.

Requirements:

  1. Create a sample log file with at least 10 lines, mixing INFO, WARNING, and ERROR levels
  2. Read the log file and count occurrences of each level
  3. Extract only the ERROR lines and write them to a separate file
  4. Print a summary with counts and the error filename
  5. Use with statements and pathlib throughout

Starter Code:

from pathlib import Path
from collections import Counter

def create_sample_log(path):
    """Create a sample log file for testing."""
    log_lines = [
        "2024-01-15 10:00:01 INFO  Server started on port 8080",
        "2024-01-15 10:00:05 INFO  Connected to database",
        "2024-01-15 10:01:12 WARNING  High memory usage: 85%",
        "2024-01-15 10:02:30 ERROR  Failed to process request /api/users",
        "2024-01-15 10:02:31 INFO  Retrying request...",
        "2024-01-15 10:02:35 ERROR  Retry failed: connection timeout",
        "2024-01-15 10:03:00 INFO  Request processed successfully",
        "2024-01-15 10:05:00 WARNING  Disk usage above 90%",
        "2024-01-15 10:06:22 ERROR  Unhandled exception in /api/orders",
        "2024-01-15 10:07:00 INFO  Health check passed",
    ]
    # TODO: Write log_lines to the file
    pass

def analyze_log(log_path, error_output_path):
    """Read a log file, count levels, extract errors."""
    # TODO: Read the log file
    # TODO: Count INFO, WARNING, ERROR occurrences
    # TODO: Write ERROR lines to error_output_path
    # TODO: Return the counts
    pass


# Main
log_file = Path("server.log")
error_file = Path("errors.log")

create_sample_log(log_file)
counts = analyze_log(log_file, error_file)

print("=== Log Summary ===")
for level, count in counts.items():
    print(f"  {level}: {count}")
print(f"Errors saved to: {error_file}")

💡 Hint

To extract the log level, split each line and look for the third element (index 2). Use a Counter from collections to tally them up. Filter lines where the level is "ERROR".

✅ Solution

from pathlib import Path
from collections import Counter

def create_sample_log(path):
    log_lines = [
        "2024-01-15 10:00:01 INFO  Server started on port 8080\n",
        "2024-01-15 10:00:05 INFO  Connected to database\n",
        "2024-01-15 10:01:12 WARNING  High memory usage: 85%\n",
        "2024-01-15 10:02:30 ERROR  Failed to process request /api/users\n",
        "2024-01-15 10:02:31 INFO  Retrying request...\n",
        "2024-01-15 10:02:35 ERROR  Retry failed: connection timeout\n",
        "2024-01-15 10:03:00 INFO  Request processed successfully\n",
        "2024-01-15 10:05:00 WARNING  Disk usage above 90%\n",
        "2024-01-15 10:06:22 ERROR  Unhandled exception in /api/orders\n",
        "2024-01-15 10:07:00 INFO  Health check passed\n",
    ]
    with open(path, "w", encoding="utf-8") as f:
        f.writelines(log_lines)

def analyze_log(log_path, error_output_path):
    counts = Counter()
    errors = []

    with open(log_path, "r", encoding="utf-8") as f:
        for line in f:
            parts = line.split()
            if len(parts) >= 3:
                level = parts[2]
                counts[level] += 1
                if level == "ERROR":
                    errors.append(line)

    with open(error_output_path, "w", encoding="utf-8") as f:
        f.writelines(errors)

    return counts


log_file = Path("server.log")
error_file = Path("errors.log")

create_sample_log(log_file)
counts = analyze_log(log_file, error_file)

print("=== Log Summary ===")
for level, count in sorted(counts.items()):
    print(f"  {level}: {count}")
print(f"Errors saved to: {error_file}")

# Output:
# === Log Summary ===
#   ERROR: 3
#   INFO: 5
#   WARNING: 2
# Errors saved to: errors.log

🏋️ Exercise 2: CSV Grade Report

Objective: Practice CSV reading, processing, and writing with DictReader and DictWriter.

Requirements:

  1. Create a CSV file with columns: name, math, science, english
  2. Read the CSV and calculate each student's average score
  3. Assign letter grades: A (90+), B (80–89), C (70–79), D (60–69), F (<60)
  4. Write a new CSV with the original data plus average and letter_grade columns
  5. Print a class summary: average of all students, highest scorer, lowest scorer

💡 Hint

Remember that CSV values are strings — convert to float() before calculating averages. For the letter grade, write a helper function that takes a number and returns the grade string.

✅ Solution

import csv
from pathlib import Path

def create_sample_grades(path):
    students = [
        {"name": "Alice", "math": "95", "science": "88", "english": "92"},
        {"name": "Bob", "math": "72", "science": "68", "english": "75"},
        {"name": "Carlos", "math": "88", "science": "91", "english": "85"},
        {"name": "Dana", "math": "64", "science": "70", "english": "58"},
        {"name": "Eve", "math": "97", "science": "99", "english": "94"},
    ]
    with open(path, "w", encoding="utf-8", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["name", "math", "science", "english"])
        writer.writeheader()
        writer.writerows(students)

def letter_grade(avg):
    if avg >= 90: return "A"
    if avg >= 80: return "B"
    if avg >= 70: return "C"
    if avg >= 60: return "D"
    return "F"

def process_grades(input_path, output_path):
    results = []

    with open(input_path, "r", encoding="utf-8", newline="") as f:
        reader = csv.DictReader(f)
        for row in reader:
            scores = [float(row["math"]), float(row["science"]), float(row["english"])]
            avg = sum(scores) / len(scores)
            row["average"] = f"{avg:.1f}"
            row["letter_grade"] = letter_grade(avg)
            results.append(row)

    fieldnames = ["name", "math", "science", "english", "average", "letter_grade"]
    with open(output_path, "w", encoding="utf-8", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(results)

    return results

# Run
input_csv = Path("grades.csv")
output_csv = Path("grade_report.csv")

create_sample_grades(input_csv)
results = process_grades(input_csv, output_csv)

# Summary
averages = [float(r["average"]) for r in results]
class_avg = sum(averages) / len(averages)

best = max(results, key=lambda r: float(r["average"]))
worst = min(results, key=lambda r: float(r["average"]))

print("=== Class Report ===")
for r in results:
    print(f"  {r['name']}: {r['average']} ({r['letter_grade']})")
print(f"\nClass Average: {class_avg:.1f}")
print(f"Highest: {best['name']} ({best['average']})")
print(f"Lowest:  {worst['name']} ({worst['average']})")
print(f"\nFull report saved to: {output_csv}")

🎯 Quick Quiz

Question 1: What happens if you open a file with "w" mode that already exists?

Question 2: What does the with statement guarantee when working with files?

Question 3: What's the advantage of csv.DictReader over csv.reader?

📏 Best Practices

✅ Do's

  • Always use with statements for file operations — no exceptions
  • Always specify encoding="utf-8" — don't rely on system defaults
  • Use pathlib for path manipulation — it's cleaner than string concatenation
  • Use csv.DictReader over csv.reader when you have headers — named access is more maintainable
  • Iterate over large files line by line instead of using .read() — keeps memory usage constant

❌ Don'ts

  • Don't use "w" mode without thinking — you'll erase the file. Consider "a" or "x" when appropriate.
  • Don't parse CSV by splitting on commas — values can contain commas inside quotes. Use the csv module.
  • Don't suppress exceptions in __exit__ unless you have a very good reason (return False by default)
  • Don't hardcode absolute paths — use pathlib and relative paths for portability

💡 Pro Tips

  • Path.read_text() and Path.write_text() are one-liners for simple read/write — great for config files and small data
  • For JSON files, combine open() with json.load() / json.dump() — we'll cover this in a later lesson
  • The @contextmanager decorator is usually enough for simple context managers. Use the class-based approach when you need to store state or reuse the manager
  • tempfile.NamedTemporaryFile() and tempfile.TemporaryDirectory() from the standard library are production-ready context managers for temporary resources
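
As a brief illustration of the last tip, the hand-written temp_directory manager from earlier can be replaced with tempfile.TemporaryDirectory. This is a minimal sketch, and the file name inside the directory is arbitrary.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_name:
    tmp = Path(tmp_name)            # the context manager yields a str path
    (tmp / "data.txt").write_text("temporary data", encoding="utf-8")
    names = [p.name for p in tmp.iterdir()]
    print(f"Files: {names}")

# The directory and everything in it are removed when the block ends.
still_exists = tmp.exists()
print(f"Still exists? {still_exists}")
```

Unlike the hand-written version, this picks a unique directory name in the system's temporary location, so it cannot collide with an existing folder.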

📝 Summary

🎉 Key Takeaways

  • open() returns a file object — read with .read(), .readline(), .readlines(), or iteration
  • File modes: "r" (read), "w" (write/overwrite), "a" (append), "x" (exclusive create)
  • The with statement guarantees cleanup — use it for every open() call
  • csv module: DictReader/DictWriter for named-column access; always pass newline=""
  • pathlib.Path is the modern way to handle file paths cross-platform
  • Custom context managers: class-based (__enter__/__exit__) or generator-based (@contextmanager)

| Task | Code Pattern |
| --- | --- |
| Read entire file | with open(p, "r") as f: text = f.read() |
| Read line by line | with open(p) as f: for line in f: ... |
| Write file | with open(p, "w") as f: f.write(text) |
| Append to file | with open(p, "a") as f: f.write(text) |
| Read CSV (dict) | csv.DictReader(f) |
| Write CSV (dict) | csv.DictWriter(f, fieldnames=[...]) |
| Join paths | Path("dir") / "file.txt" |
| Quick read | Path("f.txt").read_text(encoding="utf-8") |

🚀 What's Next?

In the next lesson, we'll dive into Error Handling in Depth — custom exception hierarchies, the full try/except/else/finally pattern, and how to use Python's logging module instead of print() for debugging.

🎉 Level Up!

Your programs can now read data from the outside world, process it, and write results back to disk. Combined with OOP from Module 1, you're building real tools — not just exercises.