File I/O & Context Managers - Intermediate Python

🤔 Why File I/O Matters

So far, all of our data has lived in memory — once the program ends, it's gone. File I/O (Input/Output) lets your programs persist data, read configuration, process logs, import datasets, and communicate with other systems.

Almost every real-world Python script reads from or writes to files: web servers write access logs, data pipelines read CSVs, apps load settings from config files, and scripts generate reports.

📖 Key Terms

File handle (file object): The Python object returned by open() — it's your connection to the file on disk.

Context manager: An object that defines setup and teardown actions for a with block — files are the most common example.

Encoding: How text is converted to/from bytes on disk. UTF-8 is the modern standard.

Mode: How the file is opened — read ('r'), write ('w'), append ('a'), etc.

📖 Reading Text Files

The built-in open() function returns a file object. The simplest way to read is with .read(), which loads the entire file into a string:

# Assume we have a file called "greeting.txt" containing:
# Hello, World!
# Welcome to Python file I/O.

f = open("greeting.txt", "r")  # "r" = read mode (the default)
content = f.read()
f.close()  # Always close when done!

print(content)
print(type(content))  # <class 'str'>

Output:

Hello, World!
Welcome to Python file I/O.

<class 'str'>

Reading Methods

| Method | Returns | Best For |
| --- | --- | --- |
| `.read()` | Entire file as one string | Small files you need as a single block |
| `.readline()` | One line at a time | When you need line-by-line control |
| `.readlines()` | List of all lines (with `\n`) | When you need random access to lines |
| Iterate the file object | One line per iteration | Large files (memory efficient) |
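To make the table concrete, here is a minimal sketch that reads the same file all four ways. The file name demo.txt and its contents are made up for illustration, and the temporary directory is only there to keep the demo self-contained. It uses the with statement, covered later in this lesson, so each file is closed automatically.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "demo.txt"  # hypothetical sample file
    path.write_text("first\nsecond\nthird\n", encoding="utf-8")

    with open(path, "r", encoding="utf-8") as f:
        whole = f.read()            # entire file as one string

    with open(path, "r", encoding="utf-8") as f:
        first_line = f.readline()   # one line, including its trailing \n
        second_line = f.readline()  # the next call continues where the last one stopped

    with open(path, "r", encoding="utf-8") as f:
        all_lines = f.readlines()   # list of lines, each still ending in \n

    with open(path, "r", encoding="utf-8") as f:
        stripped = [line.rstrip("\n") for line in f]  # iteration yields one line at a time

print(repr(whole))
print(repr(first_line), repr(second_line))
print(all_lines)
print(stripped)
```

Notice that every method except the last one keeps the trailing \n; that is why the examples in this lesson call .strip() before printing a line.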

Iterating Line by Line

For most real-world work, you'll iterate over the file object directly. This is memory efficient because Python only loads one line at a time:

f = open("server.log", "r")
error_count = 0

for line in f:
    if "ERROR" in line:
        error_count += 1
        print(line.strip())  # .strip() removes the trailing \n

f.close()
print(f"\nTotal errors: {error_count}")

⚠️ Always Close Your Files

If you forget f.close(), buffered writes may never be flushed to disk, the file may stay locked (a common problem on Windows), and a long-running program can run out of file descriptors. The with statement (covered later in this lesson) solves this automatically.

Specifying Encoding

Always specify the encoding explicitly for maximum portability:

f = open("data.txt", "r", encoding="utf-8")
content = f.read()
f.close()

🧠 Why UTF-8?

UTF-8 can encode every Unicode character: English, Japanese, Arabic, emoji, everything. On some systems (especially Windows), the default encoding may be a legacy code page instead, which can cause UnicodeDecodeError or silently garbled text. Always specify encoding="utf-8" to avoid surprises.
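Here is a small sketch of what an encoding mismatch looks like in practice. The file name menu.txt is hypothetical, and ASCII stands in for "some other encoding" because it fails loudly; a real mismatch may instead produce garbled characters without any error. The with statement used here is covered later in this lesson.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "menu.txt"  # hypothetical file name

    # Write text containing a non-ASCII character as UTF-8.
    with open(path, "w", encoding="utf-8") as f:
        f.write("café\n")

    # Reading with the matching encoding round-trips the text.
    with open(path, "r", encoding="utf-8") as f:
        good = f.read()

    # Reading with the wrong encoding fails: ASCII has no byte values
    # above 127, so decoding the bytes for "é" raises UnicodeDecodeError.
    try:
        with open(path, "r", encoding="ascii") as f:
            f.read()
        failed = False
    except UnicodeDecodeError as exc:
        failed = True
        print(f"Decode failed: {exc.reason}")

print(good.strip())
```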

✍️ Writing Text Files

Open a file in write mode ("w") or append mode ("a"):

| Mode | Behavior | If File Exists | If File Doesn't Exist |
| --- | --- | --- | --- |
| `"w"` | Write (overwrite) | Erases content | Creates it |
| `"a"` | Append | Adds to end | Creates it |
| `"x"` | Exclusive create | Raises `FileExistsError` | Creates it |
| `"r+"` | Read + write | Keeps content | Raises `FileNotFoundError` |

# Write a new file (or overwrite existing)
f = open("output.txt", "w", encoding="utf-8")
f.write("Line 1: Hello!\n")
f.write("Line 2: This is Python.\n")
f.close()

# Append to an existing file
f = open("output.txt", "a", encoding="utf-8")
f.write("Line 3: Appended later.\n")
f.close()

# Read it back to verify
f = open("output.txt", "r", encoding="utf-8")
print(f.read())
f.close()

Output:

Line 1: Hello!
Line 2: This is Python.
Line 3: Appended later.

Writing Multiple Lines

lines = ["Alice,95,A\n", "Bob,87,B+\n", "Carlos,92,A-\n"]

f = open("grades.csv", "w", encoding="utf-8")
f.writelines(lines)  # Writes all strings — does NOT add \n for you
f.close()
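Because writelines() does not add newlines, it is easy to end up with everything on one line. The sketch below shows the problem and one common fix, using a generator expression to append \n as each item is written. The rows list is sample data, and the with statement is covered later in this lesson.

```python
import tempfile
from pathlib import Path

rows = ["Alice,95,A", "Bob,87,B+", "Carlos,92,A-"]  # note: no trailing newlines

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "grades.csv"

    # Without newlines, writelines() runs everything together.
    with open(path, "w", encoding="utf-8") as f:
        f.writelines(rows)
    with open(path, "r", encoding="utf-8") as f:
        joined = f.read()

    # A generator expression adds the newline to each item as it is written.
    with open(path, "w", encoding="utf-8") as f:
        f.writelines(f"{row}\n" for row in rows)
    with open(path, "r", encoding="utf-8") as f:
        separated = f.read()

print(repr(joined))
print(repr(separated))
```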

⚠️ `"w"` Mode Erases Everything

Opening a file with "w" mode immediately deletes all existing content — even before you write anything. If you want to add to a file, use "a" (append). If you want safety, use "x" (exclusive create) which refuses to overwrite.
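A short sketch of the difference, assuming a made-up file called report.txt: "x" refuses to touch an existing file, while "w" empties it the moment it is opened. The with statement used here is covered in the next section.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "report.txt"  # hypothetical file name

    # First attempt: the file does not exist yet, so "x" creates it.
    with open(path, "x", encoding="utf-8") as f:
        f.write("original contents\n")

    # Second attempt: the file exists, so "x" raises instead of overwriting.
    try:
        with open(path, "x", encoding="utf-8") as f:
            f.write("this would have replaced the report\n")
        refused = False
    except FileExistsError:
        refused = True

    with open(path, "r", encoding="utf-8") as f:
        survived = f.read()

    # "w" truncates on open, even though nothing is written here.
    with open(path, "w", encoding="utf-8"):
        pass
    size_after_w = path.stat().st_size

print(refused)         # True
print(repr(survived))  # 'original contents\n'
print(size_after_w)    # 0
```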

🛡️ The `with` Statement

The with statement is Python's solution to the "forgot to close the file" problem. It guarantees the file is closed when the block ends — even if an exception occurs:

# ✅ The Pythonic way — always use this
with open("greeting.txt", "r", encoding="utf-8") as f:
    content = f.read()
    print(content)

# f is automatically closed here — no f.close() needed!
print(f.closed)  # True

Compare with the manual approach:

# ❌ The verbose way: correct, but easy to get wrong
f = open("greeting.txt", "r", encoding="utf-8")
try:
    content = f.read()
    print(content)
finally:
    f.close()  # Same effect, more boilerplate

The with statement works with any context manager — an object that defines what happens at the start and end of a block. Files are the most common, but you'll also see it with database connections, network sockets, locks, and more.
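As one example beyond files, here is a minimal sketch using a lock from the standard library's threading module. The counter and the add_many helper are invented for this demo; the point is only that with acquires the lock on entry and releases it on exit, the same way it opens and closes a file.

```python
import threading

lock = threading.Lock()
state = {"count": 0}  # shared data that several threads update


def add_many(n):
    """Increment the shared counter n times, holding the lock for each update."""
    for _ in range(n):
        with lock:  # acquired on entry, released on exit (even if an error occurs)
            state["count"] += 1


threads = [threading.Thread(target=add_many, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(state["count"])  # 40000
print(lock.locked())   # False: the lock was released every time
```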

Reading and Writing in One Script

# Read input, process, write output
with open("raw_data.txt", "r", encoding="utf-8") as infile:
    lines = infile.readlines()

# Process: strip whitespace, filter blanks, uppercase
processed = [line.strip().upper() for line in lines if line.strip()]

with open("clean_data.txt", "w", encoding="utf-8") as outfile:
    for line in processed:
        outfile.write(line + "\n")

print(f"Processed {len(processed)} lines.")

✅ Rule of Thumb

Every open() call should be inside a with block. There's almost never a reason to call .close() manually in modern Python.
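If you want to see the guarantee for yourself, the sketch below raises an exception in the middle of a with block and then checks whether the file was closed. The file numbers.txt and its contents are made up, and the temporary directory is only there to keep the demo self-contained.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "numbers.txt"  # hypothetical input file

    with open(path, "w", encoding="utf-8") as f:
        f.write("10\nnot-a-number\n30\n")

    try:
        with open(path, "r", encoding="utf-8") as f:
            total = sum(int(line) for line in f)  # int() fails on the second line
    except ValueError as exc:
        error = str(exc)

    # The with block was left through an exception, yet the file is closed.
    closed_after_error = f.closed

print(error)
print(closed_after_error)  # True
```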

📊 Working with CSV Files

CSV (Comma-Separated Values) is one of the most common data formats. Python's built-in csv module handles the tricky parts — quoting, escaping, different delimiters — so you don't have to.

Reading CSV with `csv.reader`

import csv

# Assume students.csv contains:
# name,grade,score
# Alice,A,95
# Bob,B+,87
# Carlos,A-,92

with open("students.csv", "r", encoding="utf-8", newline="") as f:
    reader = csv.reader(f)
    header = next(reader)  # Skip the header row
    print(f"Columns: {header}")

    for row in reader:
        name, grade, score = row
        print(f"  {name}: {grade} ({score})")

Output:

Columns: ['name', 'grade', 'score']
  Alice: A (95)
  Bob: B+ (87)
  Carlos: A- (92)

Reading CSV with `csv.DictReader`

DictReader is usually more convenient — it gives you each row as a dictionary keyed by the header names:

import csv

with open("students.csv", "r", encoding="utf-8", newline="") as f:
    reader = csv.DictReader(f)

    for row in reader:
        # row is a dict: {'name': 'Alice', 'grade': 'A', 'score': '95'}
        print(f"{row['name']} scored {row['score']} ({row['grade']})")

Output:

Alice scored 95 (A)
Bob scored 87 (B+)
Carlos scored 92 (A-)

Writing CSV Files

import csv

students = [
    {"name": "Alice", "grade": "A", "score": 95},
    {"name": "Bob", "grade": "B+", "score": 87},
    {"name": "Carlos", "grade": "A-", "score": 92},
]

with open("output.csv", "w", encoding="utf-8", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["name", "grade", "score"])
    writer.writeheader()
    writer.writerows(students)

print("CSV written successfully!")

🧠 Why `newline=""`?

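The csv module writes its own line endings (\r\n by default). If you open the file without newline="", Python's text layer on Windows translates the \n in each ending into \r\n again, producing \r\r\n, which shows up as an extra blank line between rows. Passing newline="" turns off that translation and leaves line endings to the csv module. Use it when reading too, so that newlines embedded in quoted fields come back intact.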
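The sketch below round-trips a few rows with newline="" on both the write and the read. The file name and the sample comments are invented; they were chosen to include a comma and an embedded newline, the two cases that most often go wrong.

```python
import csv
import tempfile
from pathlib import Path

rows = [
    ["name", "comment"],
    ["Alice", "Great work, keep going"],  # field contains a comma
    ["Bob", "Line one\nLine two"],        # field contains a newline
]

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "comments.csv"  # hypothetical file name

    with open(path, "w", encoding="utf-8", newline="") as f:
        csv.writer(f).writerows(rows)

    with open(path, "r", encoding="utf-8", newline="") as f:
        round_tripped = list(csv.reader(f))

    raw = path.read_bytes()  # inspect exactly what landed on disk

print(round_tripped == rows)  # True
print(raw)
```

Looking at the raw bytes, each row ends in a single \r\n and the troublesome fields are wrapped in quotes, all handled by the csv module.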

Diagram: Which CSV reader? csv.reader returns lists (['Alice', 'A', '95']), accessed by index (row[0], row[1]). csv.DictReader returns dicts ({'name': 'Alice', 'grade': 'A'}), accessed by name (row['name'], row['grade']).

🗂️ File Paths with `pathlib`

Hardcoding file paths as strings is fragile — different operating systems use different separators (/ vs \). Python's pathlib module gives you an object-oriented way to work with paths that's cross-platform and readable:

from pathlib import Path

# Create a path object
data_dir = Path("data")
file_path = data_dir / "students.csv"  # The / operator joins paths!

print(file_path)          # data/students.csv (or data\students.csv on Windows)
print(file_path.name)     # students.csv
print(file_path.stem)     # students
print(file_path.suffix)   # .csv
print(file_path.parent)   # data

Common `Path` Operations

from pathlib import Path

p = Path("data/reports/q3.csv")

# Check existence
print(p.exists())          # True/False
print(p.is_file())         # True if it's a file
print(p.is_dir())          # True if it's a directory

# Create directories
Path("output/reports").mkdir(parents=True, exist_ok=True)

# List files in a directory
for f in Path("data").iterdir():
    print(f"  {f.name} ({'dir' if f.is_dir() else 'file'})")

# Find all CSV files recursively
for csv_file in Path("data").glob("**/*.csv"):
    print(f"  Found: {csv_file}")

Using `pathlib` with `open()`

Path objects work directly with open(), and they also have their own convenience methods:

from pathlib import Path

p = Path("greeting.txt")

# Option 1: Path object with open()
with open(p, "r", encoding="utf-8") as f:
    content = f.read()

# Option 2: Path's built-in methods (even simpler!)
content = p.read_text(encoding="utf-8")     # Read entire file
p.write_text("Hello!\n", encoding="utf-8")  # Write entire file

✅ `pathlib` vs String Paths

pathlib is the modern, Pythonic approach. Use it for any code you plan to share or maintain. The older os.path module still works but is more verbose and less readable.
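For comparison, here is the same path built and taken apart both ways. This is only a side-by-side sketch; the path data/reports/q3.csv is the sample one used earlier and does not need to exist.

```python
import os.path
from pathlib import Path

# os.path: plain strings passed through module-level functions
old_path = os.path.join("data", "reports", "q3.csv")
old_name = os.path.basename(old_path)
old_stem, old_ext = os.path.splitext(old_name)
old_parent = os.path.dirname(old_path)

# pathlib: one object, with the pieces available as attributes
new_path = Path("data") / "reports" / "q3.csv"

print(old_name, old_stem, old_ext, old_parent)
print(new_path.name, new_path.stem, new_path.suffix, new_path.parent)
```

Both approaches produce the same results; pathlib just keeps everything attached to a single object.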

🔧 Custom Context Managers

You've been using context managers with files. Now let's build our own. There are two approaches: the class-based approach (implementing __enter__ and __exit__) and the generator-based approach (using @contextmanager).

Approach 1: Class-Based

A context manager is any object with __enter__ and __exit__ methods:

import time

class Timer:
    """Measures how long a block of code takes."""

    def __enter__(self):
        self.start = time.perf_counter()  # Monotonic, high-resolution clock for measuring intervals
        print("⏱️ Timer started...")
        return self  # This becomes the `as` variable

    def __exit__(self, exc_type, exc_val, exc_tb):
        self.elapsed = time.perf_counter() - self.start
        print(f"⏱️ Elapsed: {self.elapsed:.4f} seconds")
        return False  # Don't suppress exceptions


# Usage
with Timer() as t:
    # Simulate some work
    total = sum(range(1_000_000))
    print(f"Sum: {total}")

print(f"Stored time: {t.elapsed:.4f}s")

Output:

⏱️ Timer started...
Sum: 499999500000
⏱️ Elapsed: 0.0312 seconds
Stored time: 0.0312s

📖 `__exit__` Parameters

The __exit__ method receives three arguments about any exception that occurred inside the with block:

exc_type: The exception class (e.g., ValueError), or None if no exception.

exc_val: The exception instance, or None.

exc_tb: The traceback, or None.

Returning True from __exit__ suppresses the exception. Returning False (or nothing) lets it propagate normally. In most cases, you want to return False.
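To see suppression in action, here is a sketch of a context manager that swallows one specific exception type and lets everything else through. The class name IgnoreMissingFile and the settings.ini file are hypothetical; for real code, the standard library already offers contextlib.suppress for this job.

```python
import tempfile
from pathlib import Path


class IgnoreMissingFile:
    """Hypothetical context manager that suppresses FileNotFoundError only."""

    def __enter__(self):
        self.suppressed = None
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        if exc_type is not None and issubclass(exc_type, FileNotFoundError):
            self.suppressed = exc_val  # remember what was swallowed
            return True   # suppress: execution continues after the with block
        return False      # no exception, or a different one: propagate normally


with tempfile.TemporaryDirectory() as tmp:
    with IgnoreMissingFile() as guard:
        (Path(tmp) / "settings.ini").read_text(encoding="utf-8")  # never created
    continued = True  # reached because the error was suppressed

print(type(guard.suppressed).__name__)  # FileNotFoundError

try:
    with IgnoreMissingFile():
        int("not a number")  # ValueError is not suppressed
    propagated = False
except ValueError:
    propagated = True

print(propagated)  # True
```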

Approach 2: Generator-Based with `@contextmanager`

For simpler cases, the contextlib module provides a decorator that turns a generator into a context manager:

from contextlib import contextmanager
import time

@contextmanager
def timer(label="Block"):
    start = time.perf_counter()  # Monotonic clock, unaffected by system clock changes
    print(f"⏱️ {label} started...")
    try:
        yield  # Everything before yield = __enter__, after = __exit__
    finally:
        elapsed = time.perf_counter() - start
        print(f"⏱️ {label} finished in {elapsed:.4f}s")


with timer("Data processing"):
    total = sum(range(1_000_000))
    print(f"Sum: {total}")

Output:

⏱️ Data processing started...
Sum: 499999500000
⏱️ Data processing finished in 0.0298s

Practical Example: Temporary Directory

from contextlib import contextmanager
from pathlib import Path
import shutil

@contextmanager
def temp_directory(name="temp_work"):
    """Create a temporary working directory, clean up when done."""
    path = Path(name)
    path.mkdir()  # Raises FileExistsError if it already exists, so cleanup never deletes pre-existing data
    print(f"📁 Created {path}/")
    try:
        yield path
    finally:
        shutil.rmtree(path)
        print(f"🗑️ Cleaned up {path}/")


with temp_directory("scratch") as tmp:
    # Write a temp file
    (tmp / "data.txt").write_text("temporary data", encoding="utf-8")
    print(f"Files: {list(tmp.iterdir())}")

# Directory is automatically deleted here
print(f"Still exists? {Path('scratch').exists()}")

Output:

📁 Created scratch/
Files: [PosixPath('scratch/data.txt')]
🗑️ Cleaned up scratch/
Still exists? False
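The hand-written version above is good practice for learning, but the standard library ships the same idea ready-made as tempfile.TemporaryDirectory. A minimal sketch of the equivalent usage follows; the only difference to watch for is that it gives you the directory name as a string and picks a unique location for you.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_name:
    tmp = Path(tmp_name)  # TemporaryDirectory yields the path as a string
    (tmp / "data.txt").write_text("temporary data", encoding="utf-8")
    names = [p.name for p in tmp.iterdir()]
    existed_inside = tmp.exists()

# The directory and everything in it are deleted when the block ends.
print(names)            # ['data.txt']
print(existed_inside)   # True
print(tmp.exists())     # False
```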

Diagram: Need a context manager? If it has complex state or must be reusable, use a class with __enter__ / __exit__. Otherwise, use a @contextmanager generator with yield.

🏋️ Hands-on Exercises

🏋️ Exercise 1: Log Analyzer

Objective: Practice reading files, processing text, and writing output.

Requirements:

Create a sample log file with at least 10 lines, mixing INFO, WARNING, and ERROR levels
Read the log file and count occurrences of each level
Extract only the ERROR lines and write them to a separate file
Print a summary with counts and the error filename
Use with statements and pathlib throughout

Starter Code:

from pathlib import Path
from collections import Counter

def create_sample_log(path):
    """Create a sample log file for testing."""
    log_lines = [
        "2024-01-15 10:00:01 INFO  Server started on port 8080",
        "2024-01-15 10:00:05 INFO  Connected to database",
        "2024-01-15 10:01:12 WARNING  High memory usage: 85%",
        "2024-01-15 10:02:30 ERROR  Failed to process request /api/users",
        "2024-01-15 10:02:31 INFO  Retrying request...",
        "2024-01-15 10:02:35 ERROR  Retry failed: connection timeout",
        "2024-01-15 10:03:00 INFO  Request processed successfully",
        "2024-01-15 10:05:00 WARNING  Disk usage above 90%",
        "2024-01-15 10:06:22 ERROR  Unhandled exception in /api/orders",
        "2024-01-15 10:07:00 INFO  Health check passed",
    ]
    # TODO: Write log_lines to the file
    pass

def analyze_log(log_path, error_output_path):
    """Read a log file, count levels, extract errors."""
    # TODO: Read the log file
    # TODO: Count INFO, WARNING, ERROR occurrences
    # TODO: Write ERROR lines to error_output_path
    # TODO: Return the counts
    pass


# Main
log_file = Path("server.log")
error_file = Path("errors.log")

create_sample_log(log_file)
counts = analyze_log(log_file, error_file)

print("=== Log Summary ===")
for level, count in counts.items():
    print(f"  {level}: {count}")
print(f"Errors saved to: {error_file}")

💡 Hint

To extract the log level, split each line and look for the third element (index 2). Use a Counter from collections to tally them up. Filter lines where the level is "ERROR".

✅ Solution

from pathlib import Path
from collections import Counter

def create_sample_log(path):
    log_lines = [
        "2024-01-15 10:00:01 INFO  Server started on port 8080\n",
        "2024-01-15 10:00:05 INFO  Connected to database\n",
        "2024-01-15 10:01:12 WARNING  High memory usage: 85%\n",
        "2024-01-15 10:02:30 ERROR  Failed to process request /api/users\n",
        "2024-01-15 10:02:31 INFO  Retrying request...\n",
        "2024-01-15 10:02:35 ERROR  Retry failed: connection timeout\n",
        "2024-01-15 10:03:00 INFO  Request processed successfully\n",
        "2024-01-15 10:05:00 WARNING  Disk usage above 90%\n",
        "2024-01-15 10:06:22 ERROR  Unhandled exception in /api/orders\n",
        "2024-01-15 10:07:00 INFO  Health check passed\n",
    ]
    with open(path, "w", encoding="utf-8") as f:
        f.writelines(log_lines)

def analyze_log(log_path, error_output_path):
    counts = Counter()
    errors = []

    with open(log_path, "r", encoding="utf-8") as f:
        for line in f:
            parts = line.split()
            if len(parts) >= 3:
                level = parts[2]
                counts[level] += 1
                if level == "ERROR":
                    errors.append(line)

    with open(error_output_path, "w", encoding="utf-8") as f:
        f.writelines(errors)

    return counts


log_file = Path("server.log")
error_file = Path("errors.log")

create_sample_log(log_file)
counts = analyze_log(log_file, error_file)

print("=== Log Summary ===")
for level, count in sorted(counts.items()):
    print(f"  {level}: {count}")
print(f"Errors saved to: {error_file}")

# Output:
# === Log Summary ===
#   ERROR: 3
#   INFO: 5
#   WARNING: 2
# Errors saved to: errors.log

🏋️ Exercise 2: CSV Grade Report

Objective: Practice CSV reading, processing, and writing with DictReader and DictWriter.

Requirements:

Create a CSV file with columns: name, math, science, english
Read the CSV and calculate each student's average score
Assign letter grades: A (90+), B (80–89), C (70–79), D (60–69), F (<60)
Write a new CSV with the original data plus average and letter_grade columns
Print a class summary: average of all students, highest scorer, lowest scorer

💡 Hint

Remember that CSV values are strings — convert to float() before calculating averages. For the letter grade, write a helper function that takes a number and returns the grade string.

✅ Solution

import csv
from pathlib import Path

def create_sample_grades(path):
    students = [
        {"name": "Alice", "math": "95", "science": "88", "english": "92"},
        {"name": "Bob", "math": "72", "science": "68", "english": "75"},
        {"name": "Carlos", "math": "88", "science": "91", "english": "85"},
        {"name": "Dana", "math": "64", "science": "70", "english": "58"},
        {"name": "Eve", "math": "97", "science": "99", "english": "94"},
    ]
    with open(path, "w", encoding="utf-8", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["name", "math", "science", "english"])
        writer.writeheader()
        writer.writerows(students)

def letter_grade(avg):
    if avg >= 90: return "A"
    if avg >= 80: return "B"
    if avg >= 70: return "C"
    if avg >= 60: return "D"
    return "F"

def process_grades(input_path, output_path):
    results = []

    with open(input_path, "r", encoding="utf-8", newline="") as f:
        reader = csv.DictReader(f)
        for row in reader:
            scores = [float(row["math"]), float(row["science"]), float(row["english"])]
            avg = sum(scores) / len(scores)
            row["average"] = f"{avg:.1f}"
            row["letter_grade"] = letter_grade(avg)
            results.append(row)

    fieldnames = ["name", "math", "science", "english", "average", "letter_grade"]
    with open(output_path, "w", encoding="utf-8", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(results)

    return results

# Run
input_csv = Path("grades.csv")
output_csv = Path("grade_report.csv")

create_sample_grades(input_csv)
results = process_grades(input_csv, output_csv)

# Summary
averages = [float(r["average"]) for r in results]
class_avg = sum(averages) / len(averages)

best = max(results, key=lambda r: float(r["average"]))
worst = min(results, key=lambda r: float(r["average"]))

print("=== Class Report ===")
for r in results:
    print(f"  {r['name']}: {r['average']} ({r['letter_grade']})")
print(f"\nClass Average: {class_avg:.1f}")
print(f"Highest: {best['name']} ({best['average']})")
print(f"Lowest:  {worst['name']} ({worst['average']})")
print(f"\nFull report saved to: {output_csv}")

🎯 Quick Quiz

Question 1: What happens if you open a file with "w" mode that already exists?

Question 2: What does the with statement guarantee when working with files?

Question 3: What's the advantage of csv.DictReader over csv.reader?

📏 Best Practices

✅ Do's

Always use with statements for file operations — no exceptions
Always specify encoding="utf-8" — don't rely on system defaults
Use pathlib for path manipulation — it's cleaner than string concatenation
Use csv.DictReader over csv.reader when you have headers — named access is more maintainable
Iterate over large files line by line instead of using .read() — keeps memory usage constant

❌ Don'ts

Don't use "w" mode without thinking — you'll erase the file. Consider "a" or "x" when appropriate.
Don't parse CSV by splitting on commas — values can contain commas inside quotes. Use the csv module.
Don't suppress exceptions in __exit__ unless you have a very good reason (return False by default)
Don't hardcode absolute paths — use pathlib and relative paths for portability
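To illustrate the comma-splitting pitfall from the list above, here is a tiny sketch with one made-up line of CSV in which a quoted field contains a comma. io.StringIO stands in for a file so the example needs nothing on disk.

```python
import csv
import io

line = 'Alice,"Portland, OR",95\n'  # sample row: the city field contains a comma

# Naive approach: splits inside the quoted field and keeps the quote characters.
naive = line.strip().split(",")

# csv module: understands quoting and returns three clean fields.
parsed = next(csv.reader(io.StringIO(line)))

print(naive)   # ['Alice', '"Portland', ' OR"', '95']
print(parsed)  # ['Alice', 'Portland, OR', '95']
```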

💡 Pro Tips

Path.read_text() and Path.write_text() are one-liners for simple read/write — great for config files and small data
For JSON files, combine open() with json.load() / json.dump() — we'll cover this in a later lesson
The @contextmanager decorator is usually enough for simple context managers. Use the class-based approach when you need to store state or reuse the manager
tempfile.NamedTemporaryFile() and tempfile.TemporaryDirectory() from the standard library are production-ready context managers for temporary resources

📝 Summary

🎉 Key Takeaways

open() returns a file object — read with .read(), .readline(), .readlines(), or iteration
File modes: "r" (read), "w" (write/overwrite), "a" (append), "x" (exclusive create)
The with statement guarantees cleanup — use it for every open() call
csv module: DictReader/DictWriter for named-column access; always pass newline=""
pathlib.Path is the modern way to handle file paths cross-platform
Custom context managers: class-based (__enter__/__exit__) or generator-based (@contextmanager)

| Task | Code Pattern |
| --- | --- |
| Read entire file | `with open(p, "r", encoding="utf-8") as f: text = f.read()` |
| Read line by line | `with open(p, encoding="utf-8") as f:` then `for line in f: ...` |
| Write file | `with open(p, "w", encoding="utf-8") as f: f.write(text)` |
| Append to file | `with open(p, "a", encoding="utf-8") as f: f.write(text)` |
| Read CSV (dict) | `csv.DictReader(f)` |
| Write CSV (dict) | `csv.DictWriter(f, fieldnames=[...])` |
| Join paths | `Path("dir") / "file.txt"` |
| Quick read | `Path("f.txt").read_text(encoding="utf-8")` |

🚀 What's Next?

In the next lesson, we'll dive into Error Handling in Depth — custom exception hierarchies, the full try/except/else/finally pattern, and how to use Python's logging module instead of print() for debugging.

🎉 Level Up!

Your programs can now read data from the outside world, process it, and write results back to disk. Combined with OOP from Module 1, you're building real tools — not just exercises.
