Skip to content

fix: RestrictedUnpickler allowlist contains 30+ modules without audit -- needs minimum-privilege reduction #1502

@AlexFiliakov

Description

@AlexFiliakov

Problem Statement

The RestrictedUnpickler allowlist permits over 20 standard library modules beyond what is required for basic pickle reconstruction. The current allowlist was designed permissively ("allow anything that might be needed") rather than restrictively ("allow only what IS needed"). This broad surface area creates an additive risk: each additional allowed module provides more gadget chain primitives that can be combined to bypass restrictions.

The following modules should be audited and likely removed:

  • "itertools" -- itertools.chain, itertools.repeat can create infinite iterators causing memory exhaustion
  • "math" -- math.inf, math.nan can cause NaN poisoning; math.pow with extreme values causes OverflowError
  • "numbers" -- abstract base classes, not needed for reconstruction
  • "abc" -- abstract base classes, not needed for reconstruction
  • "array" -- array.array provides low-level memory access similar to struct

The core issue is architectural: the allowlist should follow the principle of minimum privilege, listing ONLY modules whose classes are actually present in serialized data.

v1.0 Impact

Yes, this is a v1.0 blocker as a comprehensive hardening issue. The aggregate risk of the broad allowlist exceeds the sum of individual module risks because modules can be combined in unexpected ways.

Affected Code

  • ergodic_insurance/safe_pickle.py:L45-L89 -- entire _ALLOWED_MODULES frozenset

Current Behavior

The allowlist contains 30+ module entries, many of which are not required for deserializing any object in the ergodic_insurance package. No documentation exists explaining WHY each module is in the allowlist or which serialized object types require it.

Expected Behavior

The allowlist should be minimal and documented. Each entry should have a comment explaining which serialized class requires it. The list should be derived from an audit of all safe_dump call sites and the types they serialize.

Recommended minimal allowlist (to be confirmed by audit):

_ALLOWED_MODULES = frozenset({
    "ergodic_insurance",   # Project types
    "numpy",               # numpy arrays (ndarray, dtype)
    "numpy.core",          # numpy internals for array reconstruction
    "pandas",              # DataFrames
    "pandas.core",         # pandas internals
    "collections",         # OrderedDict, defaultdict
    "_collections",        # C accelerator
    "datetime",            # datetime objects
    "_datetime",           # C accelerator
    "decimal",             # Decimal values
    "_decimal",            # C accelerator
    "copyreg",             # pickle infrastructure (_reconstructor)
    "pathlib",             # Path objects
    "dataclasses",         # dataclass reconstruction
    "enum",                # Enum values
})

Alternative Solutions Evaluated

  1. Audit and reduce allowlist: Determine which modules are actually needed by examining all safe_dump call sites. Pros: Precise, documented. Cons: Requires thorough testing.
  2. Switch to per-class allowlisting: Instead of module prefixes, list explicit (module, class) pairs. Pros: Maximum security. Cons: Highest maintenance burden, may miss internal reconstruction helpers.
  3. Add a test that enumerates pickled types: Create a test that serializes and deserializes every type used in production, then verify the minimum required allowlist. Pros: Self-documenting, catches regressions. Cons: Initial effort to create.

Recommended Approach

Option 1 combined with option 3: Audit the allowlist, remove unnecessary modules, and add a test that verifies the allowlist is sufficient for all production serialization patterns. Document each remaining entry with its justification.

Acceptance Criteria

  • Audit of all safe_dump call sites to identify actually-serialized types
  • Remove all modules not required for deserializing those types
  • Each remaining allowlist entry documented with justification
  • Test added that serializes/deserializes all production types and verifies minimum allowlist
  • All existing tests continue to pass
  • Allowlist reduced to fewer than 20 entries (from current 30+)

Metadata

Metadata

Assignees

No one assigned

    Labels

    priority-mediumMedium prioritysecuritySecurity vulnerabilities and hardening

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions