Refactor: fitters module (architecture, caching boundaries, options, tests)
Problem
The current fitters.py implementation is not maintainable and blocks graph evolution:
- Too few fitters and too few controllable behaviors.
- Low test coverage → low confidence in correctness.
- No consistent NumPy/array semantics.
- Caching is conceptually present in the project (ComputationMethod, FittedComputationMethod), but fitter code does not separate cacheable “core math” from non-cacheable logic.
- Implementations are hard to read, rushed, and not optimized.
- High risk of code duplication as we add more characteristics and conversion paths.
Goal
Bring fitters to a clean, extensible, well-tested state:
- A clear fitter API and module structure.
- Array-first NumPy semantics (broadcasting, vectorization) using project types.py aliases.
- Explicit caching boundaries compatible with ComputationMethod/FittedComputationMethod.
- Discoverable, structured options (not “hidden kwargs”), with predictable scoping.
- Solid unit test coverage (correctness + reachability with defaults).
Deliverables
1) Module structure & public API
- Decide module layout (e.g. pysatl_core/distributions/fitters/ or similar).
- Introduce a unified fitter abstraction (function-based or class-based), but it must support:
- declared targets and sources,
- constraints / requirements,
- option schema / descriptors,
- an execution entry point operating on arrays.
- Ensure compatibility with current graph/configuration plumbing (minimal friction to register).
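To make the requirements above concrete, here is a minimal class-based sketch. Everything in it is an assumption for illustration: the `Fitter` class, its field names, and the `pdf_from_cdf` example are hypothetical and do not reflect the current fitters.py; the project's types.py aliases would replace the bare `np.ndarray` annotation.

```python
from dataclasses import dataclass
from typing import Any, Callable, Mapping

import numpy as np


@dataclass(frozen=True)
class Fitter:
    """Hypothetical fitter abstraction: declared metadata plus an array entry point."""

    name: str
    target: str                      # characteristic produced, e.g. "pdf"
    sources: tuple[str, ...]         # characteristics consumed, e.g. ("cdf",)
    constraints: frozenset[str]      # assumptions the fitter relies on
    option_defaults: Mapping[str, Any]
    impl: Callable[..., np.ndarray]

    def __call__(self, x, *source_funcs, **options):
        # Reject options that are not part of the declared schema.
        unknown = set(options) - set(self.option_defaults)
        if unknown:
            raise TypeError(f"{self.name}: unknown options {sorted(unknown)}")
        if len(source_funcs) != len(self.sources):
            raise TypeError(f"{self.name}: expected sources {self.sources}")
        merged = {**self.option_defaults, **options}
        # Input normalisation happens once, at the entry point.
        x_arr = np.asarray(x, dtype=np.float64)
        return self.impl(x_arr, *source_funcs, **merged)


def _central_difference(x, cdf, *, step):
    # Vectorised: works for any input shape that `cdf` accepts.
    return (cdf(x + step) - cdf(x - step)) / (2.0 * step)


pdf_from_cdf = Fitter(
    name="pdf_from_cdf_central_difference",
    target="pdf",
    sources=("cdf",),
    constraints=frozenset({"continuous"}),
    option_defaults={"step": 1e-5},
    impl=_central_difference,
)

# Usage: exponential CDF in, approximate exponential PDF out.
values = pdf_from_cdf(np.array([0.5, 1.0, 2.0]), lambda t: 1.0 - np.exp(-t))
```

A function-based variant could carry the same metadata through a decorator; the point of the sketch is only that targets, sources, constraints, and options are data that can be inspected without calling the fitter.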
2) Matching/registration mechanism (to avoid boilerplate & duplication)
Implement a way to register and select the “best” fitter by:
- target characteristic,
- source characteristic(s),
- constraints (domain restrictions, monotonicity, continuity assumptions, etc.),
- optional priority/score rules if multiple candidates match.
This should make configuration.py registration possible in a simple loop without bespoke glue for each fitter.
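One possible shape for such a mechanism is sketched below, using only the standard library. `FitterSpec`, `FitterRegistry`, the constraint labels, and the fitter names are all hypothetical; the tie-breaking rule (highest priority, then name) is just one way to make selection deterministic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FitterSpec:
    """Hypothetical registration record for one fitter."""

    name: str
    target: str
    sources: frozenset
    requires: frozenset = frozenset()   # constraints the fitter depends on
    priority: int = 0


class FitterRegistry:
    def __init__(self):
        self._specs = []

    def register(self, spec):
        self._specs.append(spec)

    def select(self, target, available, guarantees=frozenset()):
        """Return the best fitter whose sources and constraints are satisfied."""
        candidates = [
            spec for spec in self._specs
            if spec.target == target
            and spec.sources <= available       # all inputs are available
            and spec.requires <= guarantees     # all assumptions are met
        ]
        if not candidates:
            raise LookupError(f"no fitter for {target!r} from {sorted(available)}")
        # Highest priority wins; the name breaks ties so the result is deterministic.
        return min(candidates, key=lambda spec: (-spec.priority, spec.name))


# Registration reduces to a plain loop over declarative records.
registry = FitterRegistry()
for spec in (
    FitterSpec("cdf_from_pdf_quadrature", "cdf", frozenset({"pdf"}),
               frozenset({"continuous"}), priority=10),
    FitterSpec("cdf_from_pdf_generic", "cdf", frozenset({"pdf"})),
    FitterSpec("ppf_from_cdf_bisection", "ppf", frozenset({"cdf"}),
               frozenset({"monotone"})),
):
    registry.register(spec)

best = registry.select("cdf", {"pdf"}, {"continuous"})
```

Here `best.name` is "cdf_from_pdf_quadrature" because both CDF fitters match and the quadrature one has the higher priority; without the "continuous" guarantee only the generic fitter qualifies.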
3) Options: scoping + discoverability
Current situation: options are passed via Distribution.query_method(..., **options) which is opaque to users and can collide.
Required improvements:
- Define structured options (descriptors/metadata) per fitter:
- name, type, default, description, validation.
- Define option scoping rules to avoid collisions:
- options must be unambiguous even if a computation path uses multiple fitters with the same “name”.
- choose a default strategy for options.
- Interim allowance (if needed): keep options internal (defaults only) to stabilize the system, but the schema must exist and be testable.
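As an illustration of what a testable schema with collision-free scoping might look like, the sketch below qualifies every user-supplied key with the fitter name ("fitter.option"). `OptionSpec`, `resolve_options`, and the key format are assumptions made for this example, not an existing project API; other scoping strategies are equally possible.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class OptionSpec:
    """Hypothetical descriptor: name, type, default, description, validation."""

    name: str
    type: type
    default: Any
    description: str
    validate: Optional[Callable[[Any], bool]] = None


def resolve_options(fitter_name, schema, user_options):
    """Extract '<fitter_name>.<option>' keys, validate them, and fill in defaults."""
    known = {spec.name: spec for spec in schema}
    resolved = {name: spec.default for name, spec in known.items()}
    prefix = fitter_name + "."
    for key, value in user_options.items():
        if not key.startswith(prefix):
            continue  # addressed to another fitter on the same computation path
        name = key[len(prefix):]
        if name not in known:
            raise KeyError(f"{fitter_name} has no option {name!r}")
        spec = known[name]
        if not isinstance(value, spec.type):
            raise ValueError(f"{key} must be of type {spec.type.__name__}")
        if spec.validate is not None and not spec.validate(value):
            raise ValueError(f"{key}={value!r} failed validation")
        resolved[name] = value
    return resolved


schema = (
    OptionSpec("order", int, 32, "Quadrature order.", lambda v: v > 0),
)

# Two fitters share the option name "order" without colliding.
options = resolve_options(
    "cdf_from_pdf",
    schema,
    {"cdf_from_pdf.order": 64, "ppf_from_cdf.order": 5},
)
```

Because the schema is plain data, it can be listed for documentation and exercised in tests even while user-facing options remain disabled.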
4) Caching boundaries
Refactor each fitter implementation so that:
- cacheable “core math” is separated from non-cacheable steps (input normalization, validation, shape handling, etc.),
- integration with
ComputationMethod/FittedComputationMethodis straightforward, - caching keys are stable and do not accidentally include large array payloads unless intended.
5) Rewrite existing fitters (and keep NumPy semantics)
- Review every fitter currently in fitters.py.
- For each:
- either fully replace it or bring it to an acceptable production-level state,
- ensure vectorized array implementation (no Python loops where avoidable),
- define meaningful options where appropriate (with validation and tests),
- document expected input/output shapes and broadcasting behavior via docstrings.
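As a sketch of what "vectorized with documented shapes" could mean in practice, here is a bisection-based CDF inverter. The function name, its parameters, and the choice of bisection are assumptions for illustration only. The remaining Python loop runs over iterations, not over array elements.

```python
import numpy as np


def ppf_from_cdf(cdf, q, *, lower, upper, iterations=60):
    """Invert a monotone CDF by bisection.

    Parameters
    ----------
    cdf : callable
        Maps an array to an array of the same shape.
    q : array_like
        Probabilities in [0, 1], any shape.
    lower, upper : array_like
        Bracket containing the quantile; broadcast against ``q``.

    Returns
    -------
    ndarray
        Quantiles with the broadcast shape of ``q``, ``lower`` and ``upper``.
    """
    q_arr, lo, hi = np.broadcast_arrays(
        np.asarray(q, dtype=np.float64),
        np.asarray(lower, dtype=np.float64),
        np.asarray(upper, dtype=np.float64),
    )
    if np.any((q_arr < 0.0) | (q_arr > 1.0)):
        raise ValueError("q must lie in [0, 1]")
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        go_right = cdf(mid) < q_arr          # elementwise decision
        lo = np.where(go_right, mid, lo)     # np.where returns new arrays,
        hi = np.where(go_right, hi, mid)     # so the broadcast views are untouched
    return 0.5 * (lo + hi)


# q has shape (3, 1) and the brackets have shape (2,), so the result is (3, 2).
q = np.array([[0.1], [0.5], [0.9]])
quantiles = ppf_from_cdf(
    lambda t: 1.0 - np.exp(-t), q, lower=np.array([0.0, 0.0]), upper=np.array([50.0, 60.0])
)
```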
6) Tests (must be comprehensive)
Everything here must be well-tested.
Minimum expectations:
- Unit tests for each fitter:
- correctness on scalar and vector inputs,
- broadcasting behavior,
- edge cases (tails, boundary values, invalid inputs),
- option effects (non-default options change behavior predictably).
- Tests for registration/matching:
- correct fitter selection given (target, sources, constraints),
- deterministic selection when multiple candidates exist.
- Tests for caching boundaries:
- cached part is reused when expected,
- non-cacheable part is not cached by mistake,
- cache keys behave as intended.
7) Performance sanity checks (non-benchmark)
- Ensure no obvious quadratic behavior on common array sizes.
- Avoid redundant recomputation where caching or vectorization can help.
- Add at least a few “sanity” tests that run on moderate arrays to catch accidental slow paths.
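A sanity test of this kind might look like the sketch below. The fitter (`cdf_on_grid`, a cumulative trapezoid rule), the helper `elapsed_seconds`, the array size, and the one-second budget are all assumptions chosen for illustration; the budget is deliberately generous so the test catches accidental slow paths without acting as a benchmark.

```python
import time

import numpy as np


def cdf_on_grid(pdf_values, grid):
    """Cumulative trapezoid rule: a single O(n) pass with no Python loop."""
    steps = np.diff(grid)
    areas = 0.5 * (pdf_values[1:] + pdf_values[:-1]) * steps
    return np.concatenate(([0.0], np.cumsum(areas)))


def elapsed_seconds(func, *args, repeats=3):
    """Best-of-N wall time, which damps one-off scheduler noise."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        best = min(best, time.perf_counter() - start)
    return best


def test_cdf_on_grid_moderate_array():
    grid = np.linspace(0.0, 20.0, 200_001)
    pdf_values = np.exp(-grid)               # exponential distribution
    result = cdf_on_grid(pdf_values, grid)
    # Correctness first, so a fast but wrong path cannot pass.
    assert np.allclose(result, 1.0 - np.exp(-grid), atol=1e-8)
    # Generous budget: a quadratic implementation would exceed it by far.
    assert elapsed_seconds(cdf_on_grid, pdf_values, grid) < 1.0
```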
Acceptance Criteria
- fitters.py is refactored (or replaced) into a clean module with a consistent API.
- All fitters operate on arrays with NumPy semantics and use project type aliases.
- Options are structured and discoverable (schema exists and is validated), with collision-free scoping.
- Caching boundaries are explicit and compatible with existing caching abstractions.
- High-quality unit tests exist for fitters, matching/registration, and caching behavior.
- Code is readable and ready for extending the characteristic graph.
Out of scope
- The final public exposure of options via Distribution.query_method may be staged, but the option schema must be implemented now.