The Python tarfile module makes it possible to read and write tar archives.
Using its extraction methods on untrusted archives raises serious security concerns.
Security concerns¶
TarFile.extract() and TarFile.extractall() are inherently dangerous when handling untrusted archives, because member names, types, and metadata all come from whoever built the archive.
Key risks include:
Path traversal (CWE-22): Malicious members can use absolute paths (/etc/shadow), .. sequences, or symlinks to escape the target directory.
Privilege escalation: Running with elevated privileges allows overwriting system files, SSH keys, or configuration.
Sandbox escape: Breaks out of containers, chroots, or temporary directories.
Metadata tampering: Timestamps and other fields can obscure attacker activity and hinder forensics.
Denial of Service: Historical vulnerabilities (infinite loops in parsing) enabled resource exhaustion.
Tar archives are extremely permissive by design. Malicious tarballs are trivial to craft, and the default extraction behavior offers little protection.
Preventive measures¶
Never extract archives from untrusted sources without inspection and validation.
Always specify the filter argument (Python 3.12+ strongly recommended):
import tarfile

with tarfile.open("archive.tar.gz") as tar:
    tar.extractall(path="/safe/dir", filter="data")  # Safest built-in filter
filter='data' rejects or neutralizes the most dangerous behaviors (absolute paths, .., dangerous symlinks, etc.).
Avoid filter='fully_trusted', filter=None (old default), or filter='tar' for untrusted data.
Run extraction inside a dedicated low-privilege container or VM.
Manually validate members before extraction (reject absolute paths, symlinks to outside targets, devices, etc.).
Consider safer formats when possible (zipfile with validation, or libraries enforcing stricter safety).
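The manual validation step in the list above might look roughly like the following. This is a minimal sketch, not a complete defense: find_unsafe_members and safe_extract are hypothetical names, the path checks assume a POSIX-style filesystem, and every link member is rejected outright rather than resolving its target, which is stricter than necessary but simpler to reason about.

```python
import os
import tarfile
from typing import Iterable


def find_unsafe_members(members: Iterable[tarfile.TarInfo], dest: str) -> list[str]:
    """Return one human-readable reason per member that looks unsafe for `dest`."""
    root = os.path.realpath(dest)
    problems = []
    for member in members:
        # Where the member would land if extracted under the destination.
        target = os.path.realpath(os.path.join(root, member.name))
        if os.path.isabs(member.name):
            problems.append(f"{member.name}: absolute path")
        elif os.path.commonpath([root, target]) != root:
            problems.append(f"{member.name}: escapes destination")
        elif member.issym() or member.islnk():
            problems.append(f"{member.name}: link member")
        elif not (member.isfile() or member.isdir()):
            problems.append(f"{member.name}: special file (device, FIFO, ...)")
    return problems


def safe_extract(archive_path: str, dest: str) -> None:
    """Validate every member first, then extract with the 'data' filter as well."""
    with tarfile.open(archive_path) as tar:
        problems = find_unsafe_members(tar.getmembers(), dest)
        if problems:
            raise ValueError("refusing to extract: " + "; ".join(problems))
        tar.extractall(path=dest, filter="data")
```

Keeping filter="data" in place after the manual check is deliberate: the two layers overlap, so a mistake in one does not immediately become a write outside the destination.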
Example:
import tarfile

# Vulnerable - classic insecure pattern
with tarfile.open("untrusted.tar.gz") as tar:
    tar.extractall("/tmp/extract")  # Can write anywhere!

# Still risky without proper filter
with tarfile.open("untrusted.tar.gz") as tar:
    tar.extractall(path="/tmp/extract", filter=None)
Discussion¶
Any use of tarfile extraction should trigger an in-depth security review during a code audit. Defense-in-depth is essential: combine safe filters, privilege dropping, sandboxing, and (when possible) cryptographic verification of archives before extraction. Even with mitigations, handling untrusted tarballs remains one of the riskiest operations in Python.
