Consolidate remaining v2 design issues around semiring backend contracts #20
Description
cc @tensor4all-meta
Summary
This issue consolidates the remaining design-v2 questions after the recent placement cleanup. The main unresolved theme is how custom semiring backends should relate to primitive vocabulary, trait contracts, and general execution engines.
Remaining issues
1. Custom semiring backend minimum contract is still inconsistent
Two different contracts are described today.
- primitive-catalog.md treats tensor-structural operations like permute, reshape, broadcast, and diagonal as tensor-layer views / metadata transforms, so the backend-facing strict minimum is close to BatchedGemm + ReduceSum (with diagonal-related behavior depending on whether diagonal stays in the tensor layer).
- tenferro-internal-design.md instead puts Transpose, Reshape, and BroadcastInDim into SemiringOpKind / SemiringOps, which makes them part of the required semiring contract.
These are materially different obligations for custom backends. The design needs one source of truth.
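To make the difference concrete, here is a minimal sketch of the smaller of the two contracts, assuming the primitive-catalog.md reading. Every name in it (`MinimalSemiringBackend`, `NaiveCpu`, `View2`, the method signatures) is hypothetical and chosen for illustration; none of it is taken from the tenferro design documents. The backend implements only a batched GEMM and a reduction, while permute is handled in the tensor layer as a stride swap that never reaches the backend.

```rust
/// Hypothetical algebra trait: a marker type naming an element type and its
/// semiring operations. Illustrative only, not tenferro's actual API.
pub trait Semiring {
    type Elem: Copy;
    fn zero() -> Self::Elem;
    fn add(a: Self::Elem, b: Self::Elem) -> Self::Elem;
    fn mul(a: Self::Elem, b: Self::Elem) -> Self::Elem;
}

/// Max-plus (tropical) algebra over f64: "add" is max, "mul" is +.
pub struct MaxPlus;

impl Semiring for MaxPlus {
    type Elem = f64;
    fn zero() -> f64 { f64::NEG_INFINITY }
    fn add(a: f64, b: f64) -> f64 { a.max(b) }
    fn mul(a: f64, b: f64) -> f64 { a + b }
}

/// Assumed strict-minimum contract: contraction and reduction only.
pub trait MinimalSemiringBackend<Alg: Semiring> {
    /// Row-major (batch, m, k) x (batch, k, n) -> (batch, m, n).
    fn batched_gemm(
        &self, a: &[Alg::Elem], b: &[Alg::Elem],
        batch: usize, m: usize, k: usize, n: usize,
    ) -> Vec<Alg::Elem>;
    /// Reduces the last axis of a row-major (rows, cols) buffer.
    fn reduce_sum(&self, x: &[Alg::Elem], rows: usize, cols: usize) -> Vec<Alg::Elem>;
}

/// Naive reference backend that works for any algebra.
pub struct NaiveCpu;

impl<Alg: Semiring> MinimalSemiringBackend<Alg> for NaiveCpu {
    fn batched_gemm(
        &self, a: &[Alg::Elem], b: &[Alg::Elem],
        batch: usize, m: usize, k: usize, n: usize,
    ) -> Vec<Alg::Elem> {
        let mut out = vec![Alg::zero(); batch * m * n];
        for bi in 0..batch {
            for i in 0..m {
                for j in 0..n {
                    let mut acc = Alg::zero();
                    for p in 0..k {
                        let lhs = a[(bi * m + i) * k + p];
                        let rhs = b[(bi * k + p) * n + j];
                        acc = Alg::add(acc, Alg::mul(lhs, rhs));
                    }
                    out[(bi * m + i) * n + j] = acc;
                }
            }
        }
        out
    }

    fn reduce_sum(&self, x: &[Alg::Elem], rows: usize, cols: usize) -> Vec<Alg::Elem> {
        (0..rows)
            .map(|r| (0..cols).fold(Alg::zero(), |acc, c| Alg::add(acc, x[r * cols + c])))
            .collect()
    }
}

/// Tensor-layer view: permute only rewrites metadata, so no backend op is needed.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct View2 {
    pub shape: [usize; 2],
    pub strides: [usize; 2],
}

impl View2 {
    pub fn row_major(rows: usize, cols: usize) -> Self {
        View2 { shape: [rows, cols], strides: [cols, 1] }
    }
    /// Swaps the two axes without touching the underlying buffer.
    pub fn permuted(self) -> Self {
        View2 {
            shape: [self.shape[1], self.shape[0]],
            strides: [self.strides[1], self.strides[0]],
        }
    }
    pub fn offset(self, i: usize, j: usize) -> usize {
        i * self.strides[0] + j * self.strides[1]
    }
}
```

Under the tenferro-internal-design.md reading, the same trait would additionally need methods for Transpose, Reshape, and BroadcastInDim, which is the extra obligation a custom backend author would inherit.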
2. Compile-cache identity rules are still inconsistent
computegraph-design.md says compiled programs are cached using GlobalValKey-based structure, which includes InputKey.
tensor-api-pseudocode.md says the cache key is based on normalized graph topology and should ignore concrete InputKey / DiffPassId values so repeated differentiate calls hit the same cache entry.
This needs an explicit decision:
- either tenferro owns a second normalized compile-cache layer above computegraph
- or computegraph itself changes its cache contract
Without that, higher-order AD cache behavior is underspecified.
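The two cache contracts can be contrasted with a small sketch. It assumes a toy graph representation; `Node`, `exact_key`, and `normalized_key` are hypothetical names, and `InputKey` here is only a stand-in for the real type. The exact key hashes concrete `InputKey` values, while the normalized key first renumbers inputs by order of first appearance so that structurally identical graphs collide on purpose.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Stand-in for the real InputKey; the u64 payload is an assumption.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct InputKey(pub u64);

/// Toy graph node: an input, or an op whose args index earlier nodes.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub enum Node {
    Input(InputKey),
    Op { name: &'static str, args: Vec<usize> },
}

fn hash_of(nodes: &[Node]) -> u64 {
    let mut hasher = DefaultHasher::new();
    nodes.hash(&mut hasher);
    hasher.finish()
}

/// Identity-sensitive key: concrete InputKey values are part of the hash.
pub fn exact_key(graph: &[Node]) -> u64 {
    hash_of(graph)
}

/// Topology-only key: each distinct InputKey is replaced by the order in
/// which it first appears, so fresh keys on a repeated call do not matter.
pub fn normalized_key(graph: &[Node]) -> u64 {
    let mut renumber: HashMap<InputKey, u64> = HashMap::new();
    let normalized: Vec<Node> = graph
        .iter()
        .map(|node| match node {
            Node::Input(key) => {
                let next = renumber.len() as u64;
                Node::Input(InputKey(*renumber.entry(*key).or_insert(next)))
            }
            other => other.clone(),
        })
        .collect();
    hash_of(&normalized)
}
```

Two graphs that compute `x * y` with different concrete keys get the same `normalized_key` but different `exact_key` values, while `x * x` still normalizes differently from `x * y` because input sharing is preserved. Whichever layer owns the normalization step is the layer that owns the cache contract, which is the decision listed above.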
3. StdTensorOp is not fully a single source of truth yet
There are still mismatches between the vocabulary and the lowering tables. Examples:
- direct StdTensorOp::Add / Mul / DotGeneral style descriptions vs StdTensorOp::Semiring(SemiringOpKind)
- Cholesky described as custom-call in one place and direct stablehlo.cholesky in another
- LuFullPivot / StdTensorOp::CustomCall shown in later sections even though they are not present in the main enum definition
The enum definition, lowering table, and extensibility story should be unified.
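One way to keep the three in sync is to derive the lowering table from the enum with an exhaustive `match`, so that adding a variant without a lowering is a compile error. The sketch below is only an illustration of that mechanism: the variant set, the `Lowering` type, and the `lower` function are assumptions, and the choices it makes (nesting Add / Mul / DotGeneral under `Semiring`, lowering Cholesky directly) are picked arbitrarily from the alternatives listed above, not a proposed resolution.

```rust
/// Assumed subset of semiring op kinds, for illustration only.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum SemiringOpKind {
    Add,
    Mul,
    DotGeneral,
}

/// Assumed shape of the op enum, with the extension point declared in the
/// main definition rather than only in later sections.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum StdTensorOp {
    Semiring(SemiringOpKind),
    Cholesky,
    /// Payload is a hypothetical target name, e.g. "lu_full_pivot".
    CustomCall(&'static str),
}

/// Hypothetical lowering result: a direct StableHLO op or a custom call.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Lowering {
    Direct(&'static str),
    CustomCall(&'static str),
}

/// The lowering table as one exhaustive match with no wildcard arm, so the
/// compiler rejects any new StdTensorOp variant that lacks an entry here.
pub fn lower(op: StdTensorOp) -> Lowering {
    match op {
        StdTensorOp::Semiring(SemiringOpKind::Add) => Lowering::Direct("stablehlo.add"),
        StdTensorOp::Semiring(SemiringOpKind::Mul) => Lowering::Direct("stablehlo.multiply"),
        StdTensorOp::Semiring(SemiringOpKind::DotGeneral) => {
            Lowering::Direct("stablehlo.dot_general")
        }
        // Arbitrary pick between the two descriptions found in the docs.
        StdTensorOp::Cholesky => Lowering::Direct("stablehlo.cholesky"),
        StdTensorOp::CustomCall(target) => Lowering::CustomCall(target),
    }
}
```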
Current idea
A cleaner direction is to separate Tenferro primitives from the traits that execute them.
Proposed layering
- Primitive descriptors
  - Keep a primitive vocabulary crate/module that only describes operations and their planning/execution descriptors.
  - Examples: semiring core descriptors, scalar descriptors, linalg descriptors, transfer descriptors.
- Executor traits
  - Define small family-specific traits that know how to plan/execute those descriptors.
  - Backends implement these traits, not a single giant monolithic backend trait.
- General engines
  - Build generic interpreters / engines that consume primitive descriptors and dispatch through the executor traits.
  - Einsum becomes one such engine. Standard tensor execution could become another.
- Backend implementations
  - CPU / CUDA / custom algebra backends only implement the trait families they actually support.
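The four layers can be sketched end to end with a deliberately tiny scalar family. All names below (`ScalarDesc`, `ScalarExecutor`, `run_program`, `CpuBackend`, `CpuScalarPlan`) are hypothetical and exist only to show where each responsibility would sit; the real descriptor and trait shapes are still open.

```rust
/// Layer 1: primitive descriptor. Pure data, no execution logic.
#[derive(Clone, Debug, PartialEq)]
pub enum ScalarDesc {
    Scale(f64),
    Offset(f64),
}

/// Layer 2: a small family-specific executor trait with plan/execute.
pub trait ScalarExecutor {
    /// Backend-specific plan type, opaque to the engine.
    type Plan;
    fn plan(&self, desc: &ScalarDesc) -> Self::Plan;
    fn execute(&self, plan: &Self::Plan, data: &mut [f64]);
}

/// Layer 3: a general engine. It interprets a list of descriptors and only
/// talks to the backend through the executor trait.
pub fn run_program<E: ScalarExecutor>(exec: &E, program: &[ScalarDesc], data: &mut [f64]) {
    let plans: Vec<E::Plan> = program.iter().map(|desc| exec.plan(desc)).collect();
    for plan in &plans {
        exec.execute(plan, data);
    }
}

/// Layer 4: a backend that implements only the families it supports.
pub struct CpuBackend;

/// Every scalar descriptor is planned as one fused multiply-then-add.
pub struct CpuScalarPlan {
    mul: f64,
    add: f64,
}

impl ScalarExecutor for CpuBackend {
    type Plan = CpuScalarPlan;

    fn plan(&self, desc: &ScalarDesc) -> CpuScalarPlan {
        match desc {
            ScalarDesc::Scale(s) => CpuScalarPlan { mul: *s, add: 0.0 },
            ScalarDesc::Offset(o) => CpuScalarPlan { mul: 1.0, add: *o },
        }
    }

    fn execute(&self, plan: &CpuScalarPlan, data: &mut [f64]) {
        for x in data.iter_mut() {
            *x = *x * plan.mul + plan.add;
        }
    }
}
```

Running the program `[Scale(2.0), Offset(1.0)]` on `[1.0, 2.0]` through `run_program(&CpuBackend, ...)` leaves `[3.0, 5.0]` in the buffer. A second backend would add another `impl ScalarExecutor` and reuse the engine unchanged.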
Why this looks promising
This is close to what origin/main already does for semiring execution:
- Semiring defines the algebra
- TensorSemiringCore<Alg> is the required semiring execution contract
- TensorSemiringFastPath<Alg> is the optional optimization contract
- EinsumBackend<Alg> is just the composition of those traits
- tenferro-einsum is the general engine that interprets einsum plans through those traits
That pattern is attractive because backend authors implement capabilities, while high-level algorithms live in reusable engines.
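As a rough illustration of that composition, the sketch below reuses the trait names from the list above but invents every method signature; it is not a copy of what origin/main defines. The required contract is a single `dot`, the optional contract defaults to "no fast path", `EinsumBackend` is a blanket impl over both, and the hypothetical `contract` function plays the role of the general engine.

```rust
/// The algebra: element type plus semiring operations (signatures assumed).
pub trait Semiring {
    type Elem: Copy;
    fn zero() -> Self::Elem;
    fn add(a: Self::Elem, b: Self::Elem) -> Self::Elem;
    fn mul(a: Self::Elem, b: Self::Elem) -> Self::Elem;
}

/// Min-plus (tropical) algebra over f64.
pub struct MinPlus;

impl Semiring for MinPlus {
    type Elem = f64;
    fn zero() -> f64 { f64::INFINITY }
    fn add(a: f64, b: f64) -> f64 { a.min(b) }
    fn mul(a: f64, b: f64) -> f64 { a + b }
}

/// Required execution contract (a single illustrative method).
pub trait TensorSemiringCore<Alg: Semiring> {
    fn dot(&self, a: &[Alg::Elem], b: &[Alg::Elem]) -> Alg::Elem;
}

/// Optional optimization contract; the default declines every request.
pub trait TensorSemiringFastPath<Alg: Semiring> {
    fn try_dot_fast(&self, _a: &[Alg::Elem], _b: &[Alg::Elem]) -> Option<Alg::Elem> {
        None
    }
}

/// Composition only: anything implementing both traits is an EinsumBackend.
pub trait EinsumBackend<Alg: Semiring>:
    TensorSemiringCore<Alg> + TensorSemiringFastPath<Alg>
{
}

impl<Alg: Semiring, B> EinsumBackend<Alg> for B where
    B: TensorSemiringCore<Alg> + TensorSemiringFastPath<Alg>
{
}

/// Engine side: try the fast path, then fall back to the required core.
pub fn contract<Alg: Semiring, B: EinsumBackend<Alg>>(
    backend: &B,
    a: &[Alg::Elem],
    b: &[Alg::Elem],
) -> Alg::Elem {
    <B as TensorSemiringFastPath<Alg>>::try_dot_fast(backend, a, b)
        .unwrap_or_else(|| <B as TensorSemiringCore<Alg>>::dot(backend, a, b))
}

/// Reference backend: core only, fast path left at its default.
pub struct Reference;

impl<Alg: Semiring> TensorSemiringCore<Alg> for Reference {
    fn dot(&self, a: &[Alg::Elem], b: &[Alg::Elem]) -> Alg::Elem {
        a.iter()
            .zip(b)
            .fold(Alg::zero(), |acc, (&x, &y)| Alg::add(acc, Alg::mul(x, y)))
    }
}

impl<Alg: Semiring> TensorSemiringFastPath<Alg> for Reference {}
```

With these definitions, `contract::<MinPlus, _>(&Reference, &[1.0, 5.0], &[2.0, 0.5])` evaluates to `3.0`, the minimum of `1.0 + 2.0` and `5.0 + 0.5`. The backend author wrote one required method and an empty opt-in impl; the fallback logic lives in the engine.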
Design questions to resolve next
- Should design-v2 explicitly adopt the origin/main descriptor + plan/execute pattern as the reference model?
- For custom semiring backends, what is the true required core: only contraction/reduction primitives, or also tensor-structural view-like ops?
- Should diagonal / trace behavior stay in the semiring core, or be normalized away earlier?
- Should the standard backend story also move from Backend<Op> toward capability-family traits plus one or more general engines?
Suggested resolution order
- Fix the custom semiring minimum contract
- Fix compile-cache identity semantics
- Make StdTensorOp + lowering tables a single source of truth
- Then decide how far the primitive-descriptor / executor-trait / engine split should become the v2 architectural pattern