Marshal Statement

The marshal module in Python’s standard library is designed for internal serialization of Python objects, primarily for reading and writing compiled bytecode (.pyc) files. However, it is not intended to be secure against erroneous or maliciously constructed data and should never be used to unmarshal data from untrusted or unauthenticated sources.

Security Concerns

Remote Code Execution (RCE) Vulnerability

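The most critical risk associated with marshal is its ability to deserialize Python code objects. On Python 3.13 and later, marshal.load() and marshal.loads() take an allow_code parameter that defaults to True; earlier versions have no such parameter and always accept code objects. Unless you explicitly pass allow_code=False, an attacker who can supply a crafted marshalled file makes the following attack chain possible: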

The attacker supplies a marshalled file with embedded Python code objects
Your application calls marshal.load() without passing allow_code=False (allow_code=True is the default)
The malicious code becomes a Python code object in memory
If your application then executes this object, directly via exec() or eval() or indirectly (for example, by wrapping it with types.FunctionType and calling the result), the attacker’s code runs with your application’s permissions

This constitutes a classic Remote Code Execution (RCE) vulnerability that can lead to complete system compromise.
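To make the last two steps of the chain concrete, the following sketch marshals a harmless stand-in payload (it only appends to a list) and shows that loading produces a code object without running it, while exec() is the call that actually runs it. The payload string and the namespace dict are illustrative assumptions.

```python
import marshal

# Harmless stand-in for an attacker's payload: it only records that it ran.
payload = compile("executed.append('payload ran')", "<payload>", "exec")
blob = marshal.dumps(payload)

# Loading rebuilds a code object but runs nothing. No allow_code argument is
# passed: it defaults to True on 3.13+ and does not exist on older versions.
code_obj = marshal.loads(blob)
namespace = {"executed": []}
print(type(code_obj).__name__, namespace["executed"])

# Executing the loaded object is the step that actually runs the payload.
exec(code_obj, namespace)
print(namespace["executed"])
```

Loading alone is still not a safety guarantee: the official documentation states that marshal is not secure against erroneous or maliciously constructed data.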

Lack of Data Integrity and Authentication

marshal provides no mechanisms for:

Data integrity verification: No cryptographic signatures or checksums to detect tampering
Authentication: No way to verify the source or authenticity of the data
Encryption: marshal output is an unencrypted binary encoding, so anyone who can read the stored or transmitted bytes can decode them
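Integrity and authentication can be layered on top with an HMAC, so that tampered bytes are rejected before they ever reach marshal. A minimal sketch, assuming a shared secret key managed elsewhere; seal and unseal are hypothetical helper names, while hmac.new, hmac.compare_digest, and hashlib.sha256 are standard-library calls.

```python
import hashlib
import hmac
import marshal

TAG_SIZE = hashlib.sha256().digest_size  # 32 bytes for HMAC-SHA256

def seal(obj, key: bytes) -> bytes:
    """Marshal obj and prepend an HMAC-SHA256 tag (hypothetical helper)."""
    body = marshal.dumps(obj)
    tag = hmac.new(key, body, hashlib.sha256).digest()
    return tag + body

def unseal(blob: bytes, key: bytes):
    """Check the tag before any bytes reach marshal (hypothetical helper)."""
    tag, body = blob[:TAG_SIZE], blob[TAG_SIZE:]
    expected = hmac.new(key, body, hashlib.sha256).digest()
    # compare_digest does a constant-time comparison to avoid timing leaks
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed: data altered or wrong key")
    return marshal.loads(body)

# Example key only; load a real one from a secrets store.
key = b"example-secret-key"
sealed = seal({"user": "alice", "quota": 10}, key)
print(unseal(sealed, key))
```

This only proves the data came from a holder of the key and was not modified. It does not encrypt anything, and anyone with the key can produce valid payloads, so it is appropriate only between components you control.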

Format Instability and Compatibility Issues

The marshal format is:

Version-dependent: Data marshalled in one Python version may not be readable in another
Implementation-specific: The format is intentionally undocumented and may change between Python releases
Not designed for persistence: Intended only for temporary storage of compiled bytecode
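Version dependence can at least be made explicit rather than silent. This sketch assumes it is acceptable to reuse importlib.util.MAGIC_NUMBER (the bytecode magic number that starts every .pyc file) as a version tag; dumps_tagged and loads_tagged are hypothetical names. The magic number changes whenever the bytecode changes, which is more often than the marshal format itself, so the check is deliberately conservative.

```python
import importlib.util
import marshal

# Assumption: reuse CPython's bytecode magic number as a version tag.
HEADER = importlib.util.MAGIC_NUMBER

def dumps_tagged(obj) -> bytes:
    """Prefix marshalled data with the writer's magic number (hypothetical helper)."""
    return HEADER + marshal.dumps(obj)

def loads_tagged(blob: bytes):
    """Refuse payloads written by a different interpreter version."""
    if not blob.startswith(HEADER):
        raise ValueError("payload was written by a different Python version")
    return marshal.loads(blob[len(HEADER):])

print(loads_tagged(dumps_tagged([1, "two", 3.0])))
```

This turns a possible misread into an explicit error; it does not make the format any more stable or suitable for long-term storage.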

Inadequate Type Safety

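Unlike more modern serialization formats, marshal:

Offers no schema: it returns whatever structure the data describes, with no way to declare the types or shape you expect
Can produce code objects (by default on Python 3.13+, and always on earlier versions), not just plain data
Lacks a safe subset or allowlist mechanism beyond the single allow_code switch

marshal is limited to a fixed set of built-in types (such as None, bool, int, float, complex, str, bytes, tuple, list, set, frozenset, dict, and code objects), so unlike pickle it cannot instantiate arbitrary classes. The risk comes from code objects and from malformed data the module was never designed to reject safely.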
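Both halves of that point can be seen directly: marshal refuses a user-defined class, yet accepts any combination of its supported built-ins regardless of what shape the reader expects. The Account class and the sample dict are hypothetical.

```python
import marshal

class Account:  # hypothetical user-defined class
    pass

# marshal refuses types outside its fixed built-in set.
try:
    marshal.dumps(Account())
    refused = False
except ValueError:
    refused = True
print("user-defined class refused:", refused)

# Any mix of supported built-ins is accepted; there is no expected shape.
blob = marshal.dumps({"id": "not-an-int", "tags": {1, 2}, "raw": b"\x00"})
loaded = marshal.loads(blob)
print(loaded)
```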

False Sense of Security

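Developers sometimes assume that marshal is a “lighter” or “safer” alternative to pickle. This is dangerously incorrect. Although marshal does not call arbitrary functions while loading the way pickle can, the official documentation gives it the same warning: it is not secure against erroneous or maliciously constructed data. It also adds format instability and should never be considered for secure data exchange.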

Preventive Measures

Never Use on Untrusted Data

Never call marshal.load() or marshal.loads() on data received from:

External APIs or web services
User uploads or input
Network sockets
Any source you do not fully control and trust

Restrict `allow_code` Usage

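When using marshal on Python 3.13 or later, always pass allow_code=False unless you genuinely need to deserialize code objects. The parameter defaults to True; on earlier versions it does not exist (passing it raises TypeError) and code objects are always accepted: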

import marshal

# SECURE: Disallow code objects
with open("data.marshal", "rb") as f:
    data = marshal.load(f, allow_code=False)  # Raises ValueError if code objects present
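Because allow_code only exists on Python 3.13 and later, code that must also run on older interpreters needs a fallback. A minimal sketch, using the hypothetical helper names loads_without_code and _contains_code: on 3.13+ marshal refuses code objects while parsing; on older versions they can only be rejected after being built (never executed here), which is a weaker guarantee.

```python
import marshal
import sys
import types

def _contains_code(obj) -> bool:
    """Return True if obj is, or contains, a code object."""
    if isinstance(obj, types.CodeType):
        return True
    if isinstance(obj, dict):
        return any(_contains_code(k) or _contains_code(v) for k, v in obj.items())
    if isinstance(obj, (list, tuple, set, frozenset)):
        return any(_contains_code(item) for item in obj)
    return False

def loads_without_code(blob: bytes):
    """Unmarshal blob, refusing code objects (sketch)."""
    if sys.version_info >= (3, 13):
        # marshal itself raises ValueError when it meets a code object
        return marshal.loads(blob, allow_code=False)
    # Before 3.13 there is no allow_code switch: the code object is built
    # first and rejected afterwards. It is never executed here.
    obj = marshal.loads(blob)
    if _contains_code(obj):
        raise ValueError("code objects are not allowed")
    return obj

print(loads_without_code(marshal.dumps({"a": [1, 2]})))
```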

Use Secure Alternatives

For serialization needs involving untrusted data, prefer:

JSON with schema validation for simple data structures
MessagePack or Protocol Buffers for binary data with defined schemas
PyYAML with safe_load() for YAML data
XML with proper parsing and entity restrictions

Restrict `exec()` and `eval()` Usage

If you must use marshal with code objects, ensure that the loaded code is never passed to exec(), eval(), or similar dynamic execution functions:

# DANGEROUS: Never do this
import marshal
with open("untrusted.marshal", "rb") as f:
    code_obj = marshal.load(f, allow_code=True)
    exec(code_obj)  # RCE vulnerability!

# SAFER: Use only for introspection (still risky)
with open("trusted.marshal", "rb") as f:
    code_obj = marshal.load(f, allow_code=True)
    # Inspect but never execute
    print(code_obj.co_name)
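For the introspection-only pattern, standard code-object attributes are enough to see which names a code object refers to without running it. The sketch below (referenced_names is a hypothetical helper) gathers co_names recursively through nested code objects found in co_consts. Treat it as a review aid, not a safety check: names can be hidden behind getattr() and runtime string building.

```python
import marshal
import types

def referenced_names(code: types.CodeType) -> set:
    """Collect global/attribute names used by code and any nested code objects."""
    names = set(code.co_names)
    for const in code.co_consts:
        if isinstance(const, types.CodeType):  # e.g. bodies of nested functions
            names |= referenced_names(const)
    return names

# Hypothetical trusted input, built here so the example is self-contained.
source = "def f():\n    return __import__('os').getcwd()\n"
blob = marshal.dumps(compile(source, "<demo>", "exec"))
code_obj = marshal.loads(blob)
print(code_obj.co_name, sorted(referenced_names(code_obj)))
```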

Implement Input Validation

For any deserialized data, implement strict validation:

Validate data types and structure
Sanitize any strings or values
Apply allowlists for permitted values
Reject unexpected or malformed data
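The allowlist step might look like this for a small settings record. The schema (the keys mode and level and their permitted values) and the validate_settings name are hypothetical; the point is that unknown keys, missing keys, and unlisted values are all rejected rather than tolerated.

```python
# Hypothetical schema: the exact keys allowed and the values each may take.
ALLOWED = {
    "mode": {"fast", "safe"},
    "level": {1, 2, 3},
}

def validate_settings(record):
    """Reject anything outside the allowlist (hypothetical helper)."""
    if not isinstance(record, dict):
        raise ValueError("expected a dict")
    unexpected = set(record) - set(ALLOWED)
    if unexpected:
        raise ValueError(f"unexpected keys: {sorted(map(str, unexpected))}")
    for key, permitted in ALLOWED.items():
        if key not in record:
            raise ValueError(f"missing key: {key}")
        value = record[key]
        # Only hashable scalars can be allowlisted; bool is rejected explicitly
        # because True == 1 would otherwise pass as level 1.
        if isinstance(value, bool) or not isinstance(value, (str, int)) or value not in permitted:
            raise ValueError(f"value not allowed for {key}: {value!r}")
    return record

print(validate_settings({"mode": "safe", "level": 3}))
```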

Use Sandboxing and Isolation

If marshal usage is unavoidable:

Run the deserialization process in an isolated environment (e.g., container, virtual machine)
Apply minimal privileges (least privilege principle)
Consider using subprocess with a dedicated, restricted user account
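One way to apply the subprocess idea is to unmarshal in a separate interpreter and hand back only a JSON copy of the result. This sketch (isolated_loads and its timeout default are assumptions) keeps crashes and hangs away from the parent process, but it is not a sandbox by itself: the child still runs as the same user with the same permissions unless you also launch it under a restricted account or inside a container.

```python
import json
import marshal
import subprocess
import sys

# Child script. Only JSON-compatible results get back to the parent: code
# objects, bytes, and sets make json.dump fail, which counts as rejection.
_CHILD = r"""
import json, marshal, sys
blob = sys.stdin.buffer.read()
if sys.version_info >= (3, 13):
    obj = marshal.loads(blob, allow_code=False)
else:
    obj = marshal.loads(blob)
json.dump(obj, sys.stdout)
"""

def isolated_loads(blob: bytes, timeout: float = 5.0):
    """Unmarshal blob in a child process and return a JSON-safe copy (sketch)."""
    result = subprocess.run(
        # -I (isolated mode) ignores PYTHON* environment variables and user site-packages
        [sys.executable, "-I", "-c", _CHILD],
        input=blob,
        capture_output=True,
        timeout=timeout,  # raises subprocess.TimeoutExpired if the child hangs
    )
    if result.returncode != 0:
        raise ValueError("child process rejected the data")
    return json.loads(result.stdout)

print(isolated_loads(marshal.dumps({"a": (1, 2)})))
```

Because the result travels as JSON, tuples come back as lists and non-string dict keys come back as strings; callers should validate the returned structure as usual.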

Example

The following examples illustrate secure and insecure usage patterns:

import marshal
import types

# INSECURE: Loading untrusted data with code objects enabled
def dangerous_load(filepath):
    with open(filepath, "rb") as f:
        data = marshal.load(f, allow_code=True)  # Vulnerable to RCE
        if isinstance(data, types.CodeType):
            exec(data)  # Attacker's code runs here!
        return data

# SECURE: Disallow code objects entirely
def safe_load(filepath):
    with open(filepath, "rb") as f:
        # This will raise ValueError if any code object is present
        return marshal.load(f, allow_code=False)

A more robust approach is to replace marshal with a safe format such as JSON and validate the result:

import json

def best_practice_load(filepath):
    with open(filepath, "r", encoding="utf-8") as f:
        data = json.load(f)

    # Validate structure and types
    if not isinstance(data, dict):
        raise ValueError("Expected dictionary")

    # Validate each field; an exact type check also rejects bools,
    # which isinstance() would accept as ints
    required_fields = {"name": str, "value": int}
    for field, expected_type in required_fields.items():
        if field not in data or type(data[field]) is not expected_type:
            raise ValueError(f"Invalid or missing field: {field}")
    
    return data
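If existing marshal files were written by your own code, a one-off migration to JSON removes marshal from the read path. The sketch below (both helper names are hypothetical) converts only types that have a JSON equivalent and rejects everything else; tuples become lists.

```python
import json
import marshal

def _to_json_compatible(obj):
    """Convert marshal-loaded data to JSON-compatible types; reject the rest."""
    if obj is None or isinstance(obj, (bool, int, float, str)):
        return obj
    if isinstance(obj, (list, tuple)):
        return [_to_json_compatible(item) for item in obj]
    if isinstance(obj, dict):
        if not all(isinstance(k, str) for k in obj):
            raise TypeError("JSON object keys must be strings")
        return {k: _to_json_compatible(v) for k, v in obj.items()}
    raise TypeError(f"no JSON equivalent for {type(obj).__name__}")

def migrate_marshal_to_json(blob: bytes) -> str:
    # Run this only on data you produced yourself: it still calls marshal.loads.
    return json.dumps(_to_json_compatible(marshal.loads(blob)))

print(migrate_marshal_to_json(marshal.dumps({"a": (1, 2), "b": None})))
```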

Best Practices

When marshal must remain in legacy code, apply defensive measures such as these:

import logging
import marshal

logger = logging.getLogger(__name__)

def safely_load_marshal(filepath, allowed_types=None):
    """
    Load marshal data with type restrictions (requires Python 3.13+ for allow_code).

    Args:
        filepath: Path to marshal file
        allowed_types: Tuple of allowed Python types

    Returns:
        Deserialized data, or None if the data fails validation
    """
    if allowed_types is None:
        # Only allow basic data types
        allowed_types = (dict, list, str, int, float, bool, tuple)

    def validate_types(obj):
        # Recursively check every value, and every dict key, against the allowlist
        if not isinstance(obj, allowed_types):
            raise TypeError(f"Disallowed type: {type(obj).__name__}")
        if isinstance(obj, dict):
            for key, value in obj.items():
                validate_types(key)
                validate_types(value)
        elif isinstance(obj, (list, tuple)):
            for item in obj:
                validate_types(item)

    try:
        with open(filepath, "rb") as f:
            # Reject code objects while parsing (raises ValueError)
            data = marshal.load(f, allow_code=False)
        validate_types(data)
        return data

    except (EOFError, ValueError, TypeError, RecursionError) as e:
        # marshal has no Error class: bad data raises EOFError, ValueError,
        # or TypeError. Log details internally; callers only see None.
        logger.warning("Rejected marshal file %s: %s", filepath, e)
        return None

Discussion

The marshal module occupies a unique position in Python’s standard library: it’s a low-level serialization tool designed for the interpreter’s internal use, not for general-purpose data exchange or security-sensitive applications.

The `allow_code=True` Danger

The allow_code=True parameter represents a particularly severe risk because it allows deserialization of executable code objects. Consider this scenario:

# Attacker creates and marshals a malicious code object
# (a harmless echo stands in for a real payload such as deleting files)
import marshal
malicious_code = compile('__import__("os").system("echo compromised")', '<string>', 'exec')
with open("evil.marshal", "wb") as f:
    marshal.dump(malicious_code, f)

# Vulnerable application loads and executes it
# (allow_code=True is also the default on Python 3.13+)
import marshal
with open("evil.marshal", "rb") as f:
    code = marshal.load(f, allow_code=True)
    exec(code)  # Attacker-chosen command runs here: system compromise!

When Is `marshal` Acceptable?

The only legitimate use cases for marshal are:

Internal .pyc compilation: Python itself uses marshal to write compiled bytecode
Temporary caching in build tools: For performance-critical internal processes
Debugging and introspection: Examining code objects in development environments
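To illustrate the first and last of these use cases together, this sketch compiles a throwaway module with py_compile and unmarshals the resulting .pyc for inspection. It assumes the 16-byte header layout used since Python 3.7 (PEP 552: magic number, flags, then source mtime and size or a source hash); the module name and source are made up for the demo.

```python
import importlib.util
import marshal
import os
import py_compile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "demo_mod.py")
    with open(src, "w", encoding="utf-8") as f:
        f.write("def greet():\n    return 'hi'\n")
    pyc = py_compile.compile(src, cfile=os.path.join(tmp, "demo_mod.pyc"), doraise=True)
    with open(pyc, "rb") as f:
        data = f.read()

# The first 4 bytes are the interpreter's bytecode magic number.
if data[:4] != importlib.util.MAGIC_NUMBER:
    raise ValueError("pyc was written by a different Python version")

# Skip the 16-byte header; the rest is a marshalled code object we just created.
code_obj = marshal.loads(data[16:])
print(code_obj.co_name, code_obj.co_names)
```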

Even in these cases, you should:

Never expose marshal interfaces to users or external systems
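Set allow_code=False (Python 3.13+) unless working directly with code objects
Prefer safe formats such as JSON; if you use pickle between trusted components instead, remember it carries the same untrusted-data warning and authenticate payloads (for example, with an HMAC) before loading them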