Skip to article frontmatterSkip to article content
Site not loading correctly?

This may be due to an incorrect BASE_URL configuration. See the MyST Documentation for reference.

The Challenge

Reading a file is one of the most common operations performed in many Python programs.

However, reading files can introduce security issues if not handled carefully.

The goal should be to use a stable method across different operating systems that is resistant to malicious input. In Python, both open() and pathlib are standard tools, but both can leave the door open to path traversal attacks if user input is not strictly controlled.

The Threat

The primary security risk when reading files is a Path Traversal (also known as Directory Traversal) attack. If an attacker provides a path such as ../../etc/shadow or ../../root/.bash_history, they can “climb” out of the intended directory and access sensitive files.

Furthermore, relying on the system’s default encoding (as the first example does) can lead to crashes or data corruption when code is moved from a Linux server to a Windows environment. This can create a denial-of-service (DoS) risk or cause unexpected behaviour.

Vulnerable Code Example

# Vulnerable: No path validation + unpredictable encoding
def get_file(path):
    with open(path, "r") as f:
        return f.read()

Secure Mitigation

To read files more securely, you should resolve the path to its absolute form and verify that it still resides within a designated “safe” directory.

from pathlib import Path

def get_file_safe(user_input: str, safe_directory: str = "data_folder") -> str:
    base = Path(safe_directory).resolve()
    # Join and resolve the path to remove all '..'
    target = (base / user_input).resolve()

    # The Security Check
    if not target.is_relative_to(base):
        raise PermissionError("Access Denied: Path traversal detected.")
    
    if not target.is_file():
        raise FileNotFoundError("Target is not a valid file.")

    return target.read_text(encoding="utf-8")

Discussion

Neither open() nor pathlib automatically “jails” execution to a specific directory.

Overview:

Featureopen(path)Path(path).read_text()
Path Traversal RiskHighHigh
Encoding HandlingImplicit (OS-dependent)Explicit (secure default)
Logic ErrorsEasy to forget f.close()Handles closing automatically
Input SanitisationNoneNone

pathlib.Path is generally considered more secure in a broad sense because it encourages better coding practices that reduce accidental errors—though not solely because of path security. Using pathlib is the modern approach to file handling in Python.

Key advantages include: