Fitters.py refactoring #26

@LeonidElkin

Refactor: fitters module (architecture, caching boundaries, options, tests)

Problem

The current fitters.py implementation is hard to maintain and blocks evolution of the characteristic graph:

  • Too few fitters and too few controllable behaviors.
  • Low test coverage → low confidence in correctness.
  • No consistent NumPy/array semantics.
  • Caching is conceptually present in the project (ComputationMethod, FittedComputationMethod), but fitter code does not
    separate cacheable “core math” from non-cacheable logic.
  • Implementations are hard to read, rushed, and not optimized.
  • High risk of code duplication as we add more characteristics and conversion paths.

Goal

Bring fitters to a clean, extensible, well-tested state:

  • A clear fitter API and module structure.
  • Array-first NumPy semantics (broadcasting, vectorization) using project types.py aliases.
  • Explicit caching boundaries compatible with ComputationMethod / FittedComputationMethod.
  • Discoverable, structured options (not “hidden kwargs”), with predictable scoping.
  • Solid unit test coverage (correctness + reachability with defaults).

Deliverables

1) Module structure & public API

  • Decide module layout (e.g. pysatl_core/distributions/fitters/ or similar).
  • Introduce a unified fitter abstraction (function-based or class-based) that supports:
    • declared targets and sources,
    • constraints / requirements,
    • option schema / descriptors,
    • an execution entry point operating on arrays.
  • Ensure compatibility with current graph/configuration plumbing (minimal friction to register).
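
A minimal sketch of what a class-based abstraction could look like, assuming a frozen dataclass is acceptable. Every name here (FitterSpec, option_defaults, the "pdf" / "cdf" labels, the "continuous" constraint) is hypothetical and not taken from the existing code base. It only illustrates the four required pieces: declared target and sources, constraints, option defaults, and a single execution entry point.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Mapping

# Hypothetical names throughout; nothing here is existing pysatl_core API.


@dataclass(frozen=True)
class FitterSpec:
    """Declarative description of one fitter plus its entry point."""

    name: str
    target: str                       # characteristic the fitter produces
    sources: tuple[str, ...]          # characteristics it consumes
    run: Callable[..., Any]           # execution entry point
    constraints: frozenset[str] = frozenset()
    option_defaults: Mapping[str, Any] = field(default_factory=dict)

    def __call__(self, *inputs: Any, **options: Any) -> Any:
        # Reject options that the fitter never declared.
        unknown = set(options) - set(self.option_defaults)
        if unknown:
            raise TypeError(f"{self.name}: unknown options {sorted(unknown)}")
        return self.run(*inputs, **{**self.option_defaults, **options})


def _central_difference(cdf, x, step):
    # Uses only arithmetic, so x may be a float or a NumPy array.
    return (cdf(x + step) - cdf(x - step)) / (2.0 * step)


pdf_from_cdf = FitterSpec(
    name="pdf_from_cdf",
    target="pdf",
    sources=("cdf",),
    run=_central_difference,
    constraints=frozenset({"continuous"}),
    option_defaults={"step": 1e-6},
)

value = pdf_from_cdf(lambda t: t * t, 3.0)  # derivative of t**2 at t = 3
```

A function-based variant (a decorator attaching the same metadata to a plain function) would satisfy the same requirements; the dataclass form is used here only because it makes the declared fields easy to inspect.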

2) Matching/registration mechanism (to avoid boilerplate & duplication)

Implement a way to register and select the “best” fitter by:

  • target characteristic,
  • source characteristic(s),
  • constraints (domain restrictions, monotonicity, continuity assumptions, etc.),
  • optional priority/score rules if multiple candidates match.

This should make configuration.py registration possible in a simple loop without bespoke glue for each fitter.
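
One possible shape for such a mechanism is sketched below. FitterEntry, FitterRegistry, the `requires` / `satisfied` vocabulary and the characteristic labels are all assumptions made for illustration. The tie-breaking rule (highest priority, then alphabetical name) is just one way to make selection deterministic.

```python
from dataclasses import dataclass

# Hypothetical sketch; these classes do not exist in the project.


@dataclass(frozen=True)
class FitterEntry:
    name: str
    target: str
    sources: frozenset[str]
    requires: frozenset[str] = frozenset()  # constraints the fitter needs
    priority: int = 0                       # higher wins among matches


class FitterRegistry:
    def __init__(self) -> None:
        self._entries: list[FitterEntry] = []

    def register(self, entry: FitterEntry) -> None:
        self._entries.append(entry)

    def candidates(self, target, sources, satisfied=()):
        sources, satisfied = frozenset(sources), frozenset(satisfied)
        return [
            e for e in self._entries
            if e.target == target
            and e.sources <= sources        # all needed sources are available
            and e.requires <= satisfied     # all needed constraints hold
        ]

    def select(self, target, sources, satisfied=()):
        sources = frozenset(sources)
        matches = self.candidates(target, sources, satisfied)
        if not matches:
            raise LookupError(f"no fitter for {target!r} from {sorted(sources)}")
        # Highest priority first; the name breaks ties deterministically.
        return min(matches, key=lambda e: (-e.priority, e.name))


registry = FitterRegistry()
for entry in (
    FitterEntry("pdf_from_cdf_generic", "pdf", frozenset({"cdf"})),
    FitterEntry("pdf_from_cdf_smooth", "pdf", frozenset({"cdf"}),
                requires=frozenset({"continuous"}), priority=10),
    FitterEntry("ppf_from_cdf", "ppf", frozenset({"cdf"}),
                requires=frozenset({"monotone"})),
):
    registry.register(entry)  # the "simple loop" style of registration

smooth = registry.select("pdf", {"cdf"}, satisfied={"continuous"})
generic = registry.select("pdf", {"cdf"})
```

Here `smooth.name` is `"pdf_from_cdf_smooth"` because its constraint is satisfied and its priority is higher, while `generic.name` falls back to `"pdf_from_cdf_generic"` when no constraints are known to hold.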

3) Options: scoping + discoverability

Current situation: options are passed via Distribution.query_method(..., **options), which is opaque to users and allows option names to collide.

Required improvements:

  • Define structured options (descriptors/metadata) per fitter:
    • name, type, default, description, validation.
  • Define option scoping rules to avoid collisions:
    • options must be unambiguous even if a computation path uses multiple fitters with the same “name”.
    • choose a default scoping strategy for options.
  • Interim allowance (if needed): keep options internal (defaults only) to stabilize the system, but the schema must exist and be testable.
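
As an illustration of the requirements above, the sketch below pairs a per-option descriptor with one possible scoping rule: user-supplied keys must be written as `<fitter>.<option>`. OptionSpec, resolve_options, the dotted-key convention and the fitter names "derivative" and "inverse" are assumptions for this example, not decisions made in this issue.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional

# Hypothetical sketch of an option schema with namespaced keys.


@dataclass(frozen=True)
class OptionSpec:
    name: str
    type: type
    default: Any
    description: str = ""
    validate: Optional[Callable[[Any], bool]] = None

    def check(self, value: Any) -> Any:
        if not isinstance(value, self.type):
            raise TypeError(f"{self.name}: expected {self.type.__name__}")
        if self.validate is not None and not self.validate(value):
            raise ValueError(f"{self.name}: invalid value {value!r}")
        return value


def resolve_options(schemas, user_options):
    """Merge defaults with user options keyed as '<fitter>.<option>'."""
    resolved = {
        fitter: {spec.name: spec.default for spec in specs}
        for fitter, specs in schemas.items()
    }
    for key, value in user_options.items():
        fitter, sep, option = key.partition(".")
        if not sep:
            raise KeyError(f"option {key!r} must be scoped as '<fitter>.<option>'")
        specs = {spec.name: spec for spec in schemas.get(fitter, ())}
        if option not in specs:
            raise KeyError(f"unknown option {key!r}")
        resolved[fitter][option] = specs[option].check(value)
    return resolved


# Two fitters share the option name "step" without colliding.
SCHEMAS = {
    "derivative": [
        OptionSpec("step", float, 1e-5, "finite-difference step", lambda h: h > 0),
    ],
    "inverse": [
        OptionSpec("step", float, 1e-3, "bracket growth step", lambda h: h > 0),
        OptionSpec("max_iter", int, 100, "iteration cap", lambda n: n > 0),
    ],
}

opts = resolve_options(SCHEMAS, {"inverse.step": 0.01})
```

Because the schema is plain data, it can be listed for users and tested even while options stay internal (defaults only).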

4) Caching boundaries

Refactor each fitter implementation so that:

  • cacheable “core math” is separated from non-cacheable steps (input normalization, validation, shape handling, etc.),
  • integration with ComputationMethod / FittedComputationMethod is straightforward,
  • caching keys are stable and do not accidentally include large array payloads unless intended.
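
The split described above might look roughly like the following sketch. It does not use ComputationMethod or FittedComputationMethod, whose interfaces are not shown in this issue; `functools.lru_cache` stands in for the project cache, and `_unit_nodes` and `cdf_from_pdf` are hypothetical names. The point is that the cached core is keyed by one small integer, while array handling stays in the uncached shell.

```python
from functools import lru_cache

import numpy as np
from numpy.polynomial.legendre import leggauss


@lru_cache(maxsize=32)
def _unit_nodes(n_points: int):
    """Cacheable core: Gauss-Legendre nodes and weights on [-1, 1].

    The cache key is the single integer n_points, never an array.
    """
    nodes, weights = leggauss(n_points)
    nodes.setflags(write=False)    # cached arrays are shared, so freeze them
    weights.setflags(write=False)
    return nodes, weights


def cdf_from_pdf(pdf, x, lower, n_points=64):
    """Non-cacheable shell: normalize x, validate, integrate pdf on [lower, x].

    Returns an array with the same shape as x.
    """
    x = np.asarray(x, dtype=float)
    if not np.all(np.isfinite(x)):
        raise ValueError("x must be finite")
    nodes, weights = _unit_nodes(int(n_points))
    half = 0.5 * (x[..., None] - lower)   # shape x.shape + (1,)
    mid = 0.5 * (x[..., None] + lower)
    points = half * nodes + mid           # shape x.shape + (n_points,)
    return np.sum(half * weights * pdf(points), axis=-1)


xs = np.array([[0.5, 1.0], [2.0, 4.0]])
first = cdf_from_pdf(lambda t: np.exp(-t), xs, lower=0.0)
second = cdf_from_pdf(lambda t: np.exp(-t), xs + 1.0, lower=0.0)
info = _unit_nodes.cache_info()  # one miss (first call), one hit (second call)
```

The second call uses different evaluation points yet reuses the cached nodes, which is the behavior the caching tests in section 6 would need to pin down.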

5) Rewrite existing fitters (and keep NumPy semantics)

  • Review every fitter currently in fitters.py.
  • For each:
    • either fully replace it or bring it to an acceptable production-level state,
    • ensure vectorized array implementation (no Python loops where avoidable),
    • define meaningful options where appropriate (with validation and tests),
    • document expected input/output shapes and broadcasting behavior via docstrings.
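
For illustration, a vectorized fitter with documented shapes could resemble the sketch below. `ppf_from_cdf` and its parameters are hypothetical, and plain bisection is used only because it is easy to verify, not because the current code uses it. The only Python loop runs over a fixed number of iterations; every array element is updated at once.

```python
import numpy as np


def ppf_from_cdf(cdf, q, lo, hi, n_iter=60):
    """Invert a monotone CDF by vectorized bisection.

    Parameters
    ----------
    cdf : callable
        Accepts and returns arrays of the same shape.
    q : array_like
        Probabilities in [0, 1], any shape.
    lo, hi : array_like
        Brackets with cdf(lo) <= q <= cdf(hi); broadcast against q.
    n_iter : int
        Number of bisection steps (each halves the bracket).

    Returns
    -------
    ndarray
        Quantiles with the broadcast shape of (q, lo, hi).
    """
    q, lo, hi = np.broadcast_arrays(
        np.asarray(q, dtype=float),
        np.asarray(lo, dtype=float),
        np.asarray(hi, dtype=float),
    )
    for _ in range(n_iter):               # loop over iterations, not elements
        mid = 0.5 * (lo + hi)
        go_right = cdf(mid) < q
        lo = np.where(go_right, mid, lo)  # new arrays; inputs are not mutated
        hi = np.where(go_right, hi, mid)
    return 0.5 * (lo + hi)


probs = np.array([[0.1], [0.5]])          # shape (2, 1)
upper = np.array([50.0, 100.0])           # shape (2,)
quantiles = ppf_from_cdf(lambda x: 1.0 - np.exp(-x), probs, 0.0, upper)
```

With these inputs the result has shape (2, 2), following the usual NumPy broadcasting rules for shapes (2, 1), () and (2,).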

6) Tests (must be comprehensive)

Everything here must be well-tested.

Minimum expectations:

  • Unit tests for each fitter:
    • correctness on scalar and vector inputs,
    • broadcasting behavior,
    • edge cases (tails, boundary values, invalid inputs),
    • option effects (non-default options change behavior predictably).
  • Tests for registration/matching:
    • correct fitter selection given (target, sources, constraints),
    • deterministic selection when multiple candidates exist.
  • Tests for caching boundaries:
    • cached part is reused when expected,
    • non-cacheable part is not cached by mistake,
    • cache keys behave as intended.

7) Performance sanity checks (non-benchmark)

  • Ensure no obvious quadratic behavior on common array sizes.
  • Avoid redundant recomputation where caching or vectorization can help.
  • Add at least a few “sanity” tests that run on moderate arrays to catch accidental slow paths.
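
One way to write such a sanity test without wall-clock timing, which tends to be flaky in CI, is to count how often the source callable is invoked. The sketch below assumes a toy `pdf_from_cdf` fitter and a hypothetical `CallCounter` helper; an accidental per-element loop would make the call count grow with the array size and fail the assertion.

```python
import numpy as np


class CallCounter:
    """Wrap a callable and count how many times it is invoked."""

    def __init__(self, func):
        self.func = func
        self.calls = 0

    def __call__(self, x):
        self.calls += 1
        return self.func(x)


def pdf_from_cdf(cdf, x, step=1e-5):
    """Toy fitter: two vectorized CDF evaluations, regardless of x.size."""
    x = np.asarray(x, dtype=float)
    return (cdf(x + step) - cdf(x - step)) / (2.0 * step)


def test_source_calls_do_not_scale_with_input_size():
    counts = []
    for n in (10, 10_000):
        cdf = CallCounter(lambda x: 1.0 - np.exp(-x))
        result = pdf_from_cdf(cdf, np.linspace(0.0, 5.0, n))
        assert result.shape == (n,)
        counts.append(cdf.calls)
    # Same number of source evaluations for small and moderate arrays.
    assert counts[0] == counts[1] == 2
```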

Acceptance Criteria

  • fitters.py is refactored (or replaced) into a clean module with a consistent API.
  • All fitters operate on arrays with NumPy semantics and use project type aliases.
  • Options are structured and discoverable (schema exists and is validated), with collision-free scoping.
  • Caching boundaries are explicit and compatible with existing caching abstractions.
  • High-quality unit tests exist for fitters, matching/registration, and caching behavior.
  • Code is readable and ready for extending the characteristic graph.

Out of scope

  • Final public exposure of options via Distribution.query_method is out of scope and may be staged later; the option schema itself must be implemented now.

Status

In progress
