The marshal module in Python’s standard library is designed for internal serialization of Python objects, primarily for reading and writing compiled bytecode (.pyc) files. However, it is not intended to be secure against erroneous or maliciously constructed data and should never be used to unmarshal data from untrusted or unauthenticated sources.
Security Concerns¶
Remote Code Execution (RCE) Vulnerability¶
The most critical risk associated with marshal is its ability to deserialize Python code objects. If an attacker can provide a crafted marshalled file containing malicious code objects and the allow_code=True parameter is enabled, the following attack chain becomes possible:
The attacker supplies a marshalled file with embedded Python code objects
Your application calls
marshal.load()withallow_code=TrueThe malicious code becomes a Python code object in memory
If your application then executes this object via
exec()oreval()— or if the object is implicitly executed as part of a larger structure — the attacker’s code runs with your application’s permissions
This constitutes a classic Remote Code Execution (RCE) vulnerability that can lead to complete system compromise.
Lack of Data Integrity and Authentication¶
marshal provides no mechanisms for:
Data integrity verification: No cryptographic signatures or checksums to detect tampering
Authentication: No way to verify the source or authenticity of the data
Encryption: All data is transmitted or stored in plaintext
Format Instability and Compatibility Issues¶
The marshal format is:
Version-dependent: Data marshalled in one Python version may not be readable in another
Implementation-specific: The format can change between Python releases without notice
Not designed for persistence: Intended only for temporary storage of compiled bytecode
Inadequate Type Safety¶
Unlike more modern serialization formats, marshal:
Does not validate object types before deserialization
Can instantiate arbitrary objects, potentially bypassing security controls
Lacks a safe subset or whitelist mechanism for permitted types
False Sense of Security¶
Developers sometimes assume that marshal is a “lighter” or “safer” alternative to pickle. This is dangerously incorrect. marshal offers even fewer security guarantees and should never be considered for secure data exchange.
Preventive Measures¶
Never Use on Untrusted Data¶
Never call marshal.load() or marshal.loads() on data received from:
External APIs or web services
User uploads or input
Network sockets
Any source you do not fully control and trust
Restrict allow_code Usage¶
When using marshal, always set allow_code=False unless you have an absolute need for code object deserialization:
import marshal
# SECURE: Disallow code objects
with open("data.marshal", "rb") as f:
data = marshal.load(f, allow_code=False) # Raises ValueError if code objects presentUse Secure Alternatives¶
For serialization needs involving untrusted data, prefer:
JSON with schema validation for simple data structures
MessagePack or Protocol Buffers for binary data with defined schemas
PyYAML with
safe_load()for YAML dataXML with proper parsing and entity restrictions
Restrict exec() and eval() Usage¶
If you must use marshal with code objects, ensure that the loaded code is never passed to exec(), eval(), or similar dynamic execution functions:
# DANGEROUS: Never do this
import marshal
with open("untrusted.marshal", "rb") as f:
code_obj = marshal.load(f, allow_code=True)
exec(code_obj) # RCE vulnerability!
# SAFER: Use only for introspection (still risky)
with open("trusted.marshal", "rb") as f:
code_obj = marshal.load(f, allow_code=True)
# Inspect but never execute
print(code_obj.co_name)Implement Input Validation¶
For any deserialized data, implement strict validation:
Validate data types and structure
Sanitize any strings or values
Apply allowlists for permitted values
Reject unexpected or malformed data
Use Sandboxing and Isolation¶
If marshal usage is unavoidable:
Run the deserialization process in an isolated environment (e.g., container, virtual machine)
Apply minimal privileges (least privilege principle)
Consider using
subprocesswith a dedicated, restricted user account
Example¶
The following examples illustrate secure and insecure usage patterns:
import marshal
# INSECURE: Loading untrusted data with code objects enabled
def dangerous_load(filepath):
with open(filepath, "rb") as f:
data = marshal.load(f, allow_code=True) # Vulnerable to RCE
if isinstance(data, types.CodeType):
exec(data) # Attacker's code runs here!
return data
# SECURE: Disallow code objects entirely
def safe_load(filepath):
with open(filepath, "rb") as f:
# This will raise ValueError if any code object is present
return marshal.load(f, allow_code=False)A bit more secure Validate and use safe serialization:
import json
def best_practice_load(filepath):
with open(filepath, "r") as f:
data = json.load(f)
# Validate structure and types
if not isinstance(data, dict):
raise ValueError("Expected dictionary")
# Validate each field
required_fields = {"name": str, "value": int}
for field, expected_type in required_fields.items():
if field not in data or not isinstance(data[field], expected_type):
raise ValueError(f"Invalid or missing field: {field}")
return dataBest Practices¶
When using marshal in legacy code, apply a defensive measures like e.g:
import marshal
import types
def safely_load_marshal(filepath, allowed_types=None):
"""
Load marshal data with type restrictions.
Args:
filepath: Path to marshal file
allowed_types: Tuple of allowed Python types
Returns:
Deserialized data or None if invalid
"""
if allowed_types is None:
# Only allow basic, safe types
allowed_types = (dict, list, str, int, float, bool, tuple)
try:
with open(filepath, "rb") as f:
# Disallow code objects
data = marshal.load(f, allow_code=False)
# Recursively validate types
def validate_types(obj):
if isinstance(obj, allowed_types):
if isinstance(obj, (dict, list, tuple)):
for item in obj:
if isinstance(obj, dict):
validate_types(obj[item])
else:
validate_types(item)
else:
raise TypeError(f"Disallowed type: {type(obj)}")
validate_types(data)
return data
except (ValueError, TypeError, marshal.Error) as e:
# Log the error but don't expose details to users
print(f"Security check failed: {e}")
return NoneDiscussion¶
The marshal module occupies a unique position in Python’s standard library: it’s a low-level serialization tool designed for the interpreter’s internal use, not for general-purpose data exchange or security-sensitive applications.
The allow_code=True Danger¶
The allow_code=True parameter represents a particularly severe risk because it allows deserialization of executable code objects. Consider this scenario:
# Attacker creates and marshals malicious code
import marshal, types
malicious_code = compile('__import__("os").system("rm -rf /")', '<string>', 'exec')
with open("evil.marshal", "wb") as f:
marshal.dump(malicious_code, f)
# Vulnerable application loads and executes it
import marshal
with open("evil.marshal", "rb") as f:
code = marshal.load(f, allow_code=True)
exec(code) # System compromise!When Is marshal Acceptable?¶
The only legitimate use cases for marshal are:
Internal
.pyccompilation: Python itself usesmarshalto write compiled bytecodeTemporary caching in build tools: For performance-critical internal processes
Debugging and introspection: Examining code objects in development environments
Even in these cases, you should:
Never expose
marshalinterfaces to users or external systemsSet
allow_code=Falseunless working directly with code objectsConsider alternatives like
picklewith proper security protocols
