Skip to article frontmatterSkip to article content
Site not loading correctly?

This may be due to an incorrect BASE_URL configuration. See the MyST Documentation for reference.

The Python tarfile module makes it possible to read and write tar archives.

Using these methods in Python code can give serious security concerns.

Security concerns

The tarfile module enables reading and writing tar archives. However, TarFile.extract() and TarFile.extractall() are inherently dangerous when handling untrusted archives.

Key risks include:

Tar archives are extremely permissive by design. Malicious tarballs are trivial to craft, and the default extraction behavior offers little protection.

Preventive measures

import tarfile

with tarfile.open("archive.tar.gz") as tar:
    tar.extractall(path="/safe/dir", filter="data")  # Safest built-in filter

Example:

import tarfile

# Vulnerable - classic insecure pattern
with tarfile.open("untrusted.tar.gz") as tar:
    tar.extractall("/tmp/extract")   # Can write anywhere!

# Still risky without proper filter
with tarfile.open("untrusted.tar.gz") as tar:
    tar.extractall(path="/tmp/extract", filter=None)

Discussion

tarfile extraction should trigger in-depth security review in every code audit. Defense-in-depth is essential: combine safe filters, privilege dropping, sandboxing, and (when possible) cryptographic verification of archives before extraction. Even with mitigations, handling untrusted tarballs remains one of the riskiest operations in Python.

More Information