Skip to article frontmatterSkip to article content
Site not loading correctly?

This may be due to an incorrect BASE_URL configuration. See the MyST Documentation for reference.

The marshal module in Python’s standard library is designed for internal serialization of Python objects, primarily for reading and writing compiled bytecode (.pyc) files. However, it is not intended to be secure against erroneous or maliciously constructed data and should never be used to unmarshal data from untrusted or unauthenticated sources.

Security Concerns

Remote Code Execution (RCE) Vulnerability

The most critical risk associated with marshal is its ability to deserialize Python code objects. If an attacker can provide a crafted marshalled file containing malicious code objects and the allow_code=True parameter is enabled, the following attack chain becomes possible:

  1. The attacker supplies a marshalled file with embedded Python code objects

  2. Your application calls marshal.load() with allow_code=True

  3. The malicious code becomes a Python code object in memory

  4. If your application then executes this object via exec() or eval() — or if the object is implicitly executed as part of a larger structure — the attacker’s code runs with your application’s permissions

This constitutes a classic Remote Code Execution (RCE) vulnerability that can lead to complete system compromise.

Lack of Data Integrity and Authentication

marshal provides no mechanisms for:

Format Instability and Compatibility Issues

The marshal format is:

Inadequate Type Safety

Unlike more modern serialization formats, marshal:

False Sense of Security

Developers sometimes assume that marshal is a “lighter” or “safer” alternative to pickle. This is dangerously incorrect. marshal offers even fewer security guarantees and should never be considered for secure data exchange.

Preventive Measures

Never Use on Untrusted Data

Never call marshal.load() or marshal.loads() on data received from:

Restrict allow_code Usage

When using marshal, always set allow_code=False unless you have an absolute need for code object deserialization:

import marshal

# SECURE: Disallow code objects
with open("data.marshal", "rb") as f:
    data = marshal.load(f, allow_code=False)  # Raises ValueError if code objects present

Use Secure Alternatives

For serialization needs involving untrusted data, prefer:

Restrict exec() and eval() Usage

If you must use marshal with code objects, ensure that the loaded code is never passed to exec(), eval(), or similar dynamic execution functions:

# DANGEROUS: Never do this
import marshal
with open("untrusted.marshal", "rb") as f:
    code_obj = marshal.load(f, allow_code=True)
    exec(code_obj)  # RCE vulnerability!

# SAFER: Use only for introspection (still risky)
with open("trusted.marshal", "rb") as f:
    code_obj = marshal.load(f, allow_code=True)
    # Inspect but never execute
    print(code_obj.co_name)

Implement Input Validation

For any deserialized data, implement strict validation:

Use Sandboxing and Isolation

If marshal usage is unavoidable:

Example

The following examples illustrate secure and insecure usage patterns:

import marshal

# INSECURE: Loading untrusted data with code objects enabled
def dangerous_load(filepath):
    with open(filepath, "rb") as f:
        data = marshal.load(f, allow_code=True)  # Vulnerable to RCE
        if isinstance(data, types.CodeType):
            exec(data)  # Attacker's code runs here!
        return data

# SECURE: Disallow code objects entirely
def safe_load(filepath):
    with open(filepath, "rb") as f:
        # This will raise ValueError if any code object is present
        return marshal.load(f, allow_code=False)

A bit more secure Validate and use safe serialization:

import json

def best_practice_load(filepath):
    with open(filepath, "r") as f:
        data = json.load(f)
    
    # Validate structure and types
    if not isinstance(data, dict):
        raise ValueError("Expected dictionary")
    
    # Validate each field
    required_fields = {"name": str, "value": int}
    for field, expected_type in required_fields.items():
        if field not in data or not isinstance(data[field], expected_type):
            raise ValueError(f"Invalid or missing field: {field}")
    
    return data

Best Practices

When using marshal in legacy code, apply a defensive measures like e.g:

import marshal
import types

def safely_load_marshal(filepath, allowed_types=None):
    """
    Load marshal data with type restrictions.
    
    Args:
        filepath: Path to marshal file
        allowed_types: Tuple of allowed Python types
    
    Returns:
        Deserialized data or None if invalid
    """
    if allowed_types is None:
        # Only allow basic, safe types
        allowed_types = (dict, list, str, int, float, bool, tuple)
    
    try:
        with open(filepath, "rb") as f:
            # Disallow code objects
            data = marshal.load(f, allow_code=False)
        
        # Recursively validate types
        def validate_types(obj):
            if isinstance(obj, allowed_types):
                if isinstance(obj, (dict, list, tuple)):
                    for item in obj:
                        if isinstance(obj, dict):
                            validate_types(obj[item])
                        else:
                            validate_types(item)
            else:
                raise TypeError(f"Disallowed type: {type(obj)}")
        
        validate_types(data)
        return data
        
    except (ValueError, TypeError, marshal.Error) as e:
        # Log the error but don't expose details to users
        print(f"Security check failed: {e}")
        return None

Discussion

The marshal module occupies a unique position in Python’s standard library: it’s a low-level serialization tool designed for the interpreter’s internal use, not for general-purpose data exchange or security-sensitive applications.

The allow_code=True Danger

The allow_code=True parameter represents a particularly severe risk because it allows deserialization of executable code objects. Consider this scenario:

# Attacker creates and marshals malicious code
import marshal, types
malicious_code = compile('__import__("os").system("rm -rf /")', '<string>', 'exec')
with open("evil.marshal", "wb") as f:
    marshal.dump(malicious_code, f)

# Vulnerable application loads and executes it
import marshal
with open("evil.marshal", "rb") as f:
    code = marshal.load(f, allow_code=True)
    exec(code)  # System compromise!

When Is marshal Acceptable?

The only legitimate use cases for marshal are:

  1. Internal .pyc compilation: Python itself uses marshal to write compiled bytecode

  2. Temporary caching in build tools: For performance-critical internal processes

  3. Debugging and introspection: Examining code objects in development environments

Even in these cases, you should:

More Information