Merged
35 commits
cd4a28d
refactor: hard cut tenferro to tidu value surface
shinaoka Mar 31, 2026
eceb68d
refactor: rebuild internal ad ops and linalg on linearized runtime
shinaoka Mar 31, 2026
e475585
feat: restore public tensor linalg surface
shinaoka Mar 31, 2026
287815e
chore: close tenferro linearize hard cut surface
shinaoka Mar 31, 2026
16d6855
chore: remove remaining legacy ad docs and tests
shinaoka Mar 31, 2026
1270164
docs: add tenferro public jvp design and plan
shinaoka Mar 31, 2026
dfdf1ab
test: lock tenferro public jvp contract
shinaoka Mar 31, 2026
b3b1d99
test: tighten public jvp qr contract
shinaoka Mar 31, 2026
01f99ca
feat: add tenferro public jvp surface
shinaoka Mar 31, 2026
77c74f3
fix: let tenferro jvp surface run primals first
shinaoka Mar 31, 2026
41ddec7
feat: implement tenferro public jvp transform
shinaoka Mar 31, 2026
e0761bf
fix: reject unsupported jvp tensor ops
shinaoka Mar 31, 2026
547a57c
test: cover optimized jvp seam behavior
shinaoka Mar 31, 2026
ad42552
test: strengthen jvp seam numeric checks
shinaoka Mar 31, 2026
efaa821
test: make qr and svd seam checks invariant
shinaoka Mar 31, 2026
9bfab1b
docs: add public jvp surface guidance
shinaoka Mar 31, 2026
305b531
fix: close public jvp verification regressions
shinaoka Mar 31, 2026
3d52e22
feat: expose full public tensor jvp seam
shinaoka Mar 31, 2026
5c3ca06
docs: add complex linalg ad rollout design
shinaoka Mar 31, 2026
a739c6d
refactor: rename requires_grad builder and clean warnings
shinaoka Mar 31, 2026
304ba81
fix: support complex eigh oracle replay
shinaoka Mar 31, 2026
4896520
docs: add pytorch-aligned ad surface design
shinaoka Mar 31, 2026
a2aa61c
docs: expand pytorch-aligned ad surface sketches
shinaoka Mar 31, 2026
6b0b080
feat: align tensor ad surface with linalg oracles
shinaoka Mar 31, 2026
fea600b
test: add lu oracle replay coverage
shinaoka Mar 31, 2026
41a4682
test: add norm oracle replay subset coverage
shinaoka Mar 31, 2026
8017f31
feat: add lstsq oracle subset replay support
shinaoka Mar 31, 2026
3bd3852
test: add vector norm oracle replay coverage
shinaoka Mar 31, 2026
6e1905c
test: align eig oracle replay and support coverage
shinaoka Apr 1, 2026
8670c29
feat: align lstsq ad with oracle residual summaries
shinaoka Apr 1, 2026
5717e1f
test: close norm oracle support gap
shinaoka Apr 1, 2026
a2125ac
chore: pin tidu to merged linearize-first
shinaoka Apr 1, 2026
3c6b597
Merge origin/main into linearize-hard-cut
shinaoka Apr 1, 2026
fe03bd1
test: fix tenferro coverage blockers
shinaoka Apr 1, 2026
5e95245
test: add focused coverage for ad surface and runtime
shinaoka Apr 1, 2026
5 changes: 5 additions & 0 deletions AGENTS.md
@@ -7,9 +7,14 @@ Before acting, read the vendored shared rules from `template-rs`:
- `ai/vendor/template-rs/common-agent-rules.md`
- `ai/vendor/template-rs/numerical-rust-rules.md`
- `ai/vendor/template-rs/pr-workflow-rules.md`
- `REPOSITORY_RULES.md`

The sections below are tenferro-specific additions and overrides.

Before implementation work, review `REPOSITORY_RULES.md`.
Before creating a PR, review `REPOSITORY_RULES.md` again.
Before touching AD rules, oracle replay, or linearized boundary code, review `REPOSITORY_RULES.md` first.

## Current Implementation Status

The workspace contains active implementations alongside evolving APIs. Implementation work is allowed unless a task explicitly says otherwise.
6 changes: 3 additions & 3 deletions Cargo.toml
@@ -47,9 +47,9 @@ thiserror = "2"
criterion = "0.5"
serde = "1"
serde_json = "1"
chainrules-core = { git = "https://github.com/tensor4all/chainrules-rs", branch = "deferred-hvp-tangents" }
chainrules = { git = "https://github.com/tensor4all/chainrules-rs", branch = "deferred-hvp-tangents" }
tidu = { git = "https://github.com/tensor4all/tidu-rs", rev = "ea504cd" }
chainrules-core = { git = "https://github.com/tensor4all/chainrules-rs", rev = "6cc46775b33653f91df96ca1571ce9905a6224f8" }
chainrules = { git = "https://github.com/tensor4all/chainrules-rs", rev = "6cc46775b33653f91df96ca1571ce9905a6224f8" }
tidu = { git = "https://github.com/tensor4all/tidu-rs.git", rev = "f2f57f75d228fdfa2494fb7d9a6a2e3efa019a0d" }
strided-view = { git = "https://github.com/tensor4all/strided-rs", rev = "ea37986f4d9cb99cfc1f62a1d4aea561cb3a9722" }
strided-traits = { git = "https://github.com/tensor4all/strided-rs", rev = "ea37986f4d9cb99cfc1f62a1d4aea561cb3a9722" }
strided-perm = { git = "https://github.com/tensor4all/strided-rs", rev = "ea37986f4d9cb99cfc1f62a1d4aea561cb3a9722", features = ["parallel"] }
58 changes: 30 additions & 28 deletions README.md
@@ -16,7 +16,7 @@ C / C++ / Fortran / Python / Julia
tenferro-rs
┌─────────────────────────────┐
│ einsum over any algebra * │
full AD (VJP / JVP / HVP)
reverse AD (VJP)
│ extended precision (xprec) │
└─────────────────────────────┘
@@ -35,7 +35,7 @@ extended precision. The same einsum engine and AD machinery work across all of t
### Key strengths

- **Callable from C, C++, Fortran, Python, and Julia** via a stable C FFI — drop it into existing HPC codebases without rewriting anything.
- **AD-compatible across language boundaries** — reverse-mode (VJP), forward-mode (JVP), and Hessian-vector products (HVP) are exposed through the C API so Python and Julia AD systems can interoperate.
- **AD-compatible across language boundaries** — reverse-mode (VJP) is exposed through the C API so Python and Julia AD systems can interoperate with the current frontend.
- **Algebra-parameterized** — einsum, linalg, and AD rules are generic over the algebra, not hardwired to floating-point arithmetic.
- **Extended precision** — planned support for double-double and higher-precision types for numerically demanding simulations.
- **ML bridge** — designed to interoperate with [burn](https://github.com/tracel-ai/burn) for hybrid neural network + tensor network models.
@@ -56,7 +56,7 @@ A general-purpose tensor computation library in Rust with CPU support today and
(`TensorSemiringCore`, `TensorSemiringFastPath`, `TensorScalarPrims`,
`TensorAnalyticPrims`)
- High-level einsum with N-ary contraction tree optimization
- Automatic differentiation (VJP/JVP)
- Reverse-mode automatic differentiation
- C FFI for Julia/Python integration

Extension crates (tropical semiring, burn bridge, ndarray interop) live under `extension/`.
@@ -85,12 +85,12 @@ The workspace now uses three naming buckets:
|-------|----------|
| **`tenferro-tensor-compute`** | You want typed `Tensor<T>` with einsum and linalg — **start here** |
| `tenferro-dynamic-compute` | You want runtime-selected dtypes without automatic differentiation |
| `tenferro` | You need automatic differentiation (VJP/JVP) |
| `tenferro` | You need reverse-mode automatic differentiation |
| `tenferro-tensor` | You only need the data type, no computation (library authors) |

- **Typed path** (`tenferro-tensor-compute`): `Tensor<T>` with a fixed scalar type at compile time. Best when you know the scalar type and do not need automatic gradient tracking.
- **Dynamic primal path** (`tenferro-dynamic-compute`): dynamic scalar type without automatic differentiation. Best when you need runtime dtype selection but no tape/gradient state.
- **Dynamic AD path** (`tenferro`): dynamic scalar type with automatic differentiation (VJP/JVP). Best when you need gradients.
- **Dynamic AD path** (`tenferro`): dynamic scalar type with reverse-mode automatic differentiation. Best when you need gradients.

The quickstart below uses the typed path; `tenferro-dynamic-compute` is the non-AD dynamic alternative, and the [Autodiff quickstart](#autodiff-quickstart) shows the dynamic AD path.
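The split between the typed and dynamic paths above comes down to compile-time generics versus runtime dtype dispatch. A minimal self-contained sketch of the two styles (illustrative only; these are not tenferro's actual types):

```rust
// Typed path style: the scalar type is a compile-time generic parameter,
// so each instantiation is monomorphized for one dtype.
fn typed_sum<T: Copy + std::iter::Sum>(data: &[T]) -> T {
    data.iter().copied().sum()
}

// Dynamic path style: the dtype is a runtime tag, and every operation
// dispatches on it with a match. (Hypothetical enum, not tenferro's.)
enum DynBuffer {
    F32(Vec<f32>),
    F64(Vec<f64>),
}

fn dynamic_sum(buf: &DynBuffer) -> f64 {
    match buf {
        DynBuffer::F32(v) => v.iter().map(|&x| x as f64).sum(),
        DynBuffer::F64(v) => v.iter().sum(),
    }
}

fn main() {
    assert_eq!(typed_sum(&[1.0_f64, 2.0, 3.0]), 6.0);
    assert_eq!(dynamic_sum(&DynBuffer::F64(vec![1.0, 2.0, 3.0])), 6.0);
    println!("both dispatch styles agree");
}
```

The typed style gives the compiler full type information (best when the dtype is known up front); the dynamic style pays a small dispatch cost in exchange for runtime dtype selection.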

@@ -110,15 +110,15 @@ Current implementation homes:
- `tenferro-internal-frontend-core`: shared dynamic tensor substrate and
structured-layout helpers used by both `tenferro-dynamic-compute` and
`tenferro`
- `tenferro-internal-ad-core`: `AdTensor<T>`, homogeneous tape glue, and the
shared AD operation helpers that used to live in `tenferro/src/ops/common.rs`
- `tenferro-internal-ad-surface`: the dynamic AD tensor surface, eager AD
entrypoints (`grad`, `backward`, `forward_ad`), and the builder-style linalg
wrappers used behind `tenferro`
- `tenferro-internal-ad-linalg`: typed linalg AD builders, eager helpers, and
result types used behind `tenferro`
- `tenferro-internal-ad-ops`: typed scalar, reduction, and einsum AD builders,
eager helpers, and pullback helpers used behind `tenferro`
- `tenferro-internal-ad-core`: shared `DynTensor`/`Value` interop and AD
support code used by the public tensor facade
- `tenferro-internal-ad-surface`: the dynamic `Tensor` facade, `grad` /
`backward` entrypoints, checkpoint policy plumbing, and runtime-dispatched
linalg wrappers used behind `tenferro`
- `tenferro-internal-ad-linalg`: `LinearizableOp` / `LinearizedOp`
implementations for linalg operations and their result wrappers
- `tenferro-internal-ad-ops`: `LinearizableOp` / `LinearizedOp`
implementations for scalar, reduction, and einsum operations

## Quickstart

@@ -197,7 +197,7 @@ fn main() {

// 2. Create tensors and enable gradient tracking.
let mut x = Tensor::from_slice(&[1.0_f64, 2.0, 3.0], &[3]).unwrap();
x.set_requires_grad(true).unwrap();
x = x.with_requires_grad(true);

// 3. Forward pass: loss = sum(exp(x))
let loss = x.exp().unwrap().sum().unwrap();
@@ -217,6 +217,10 @@ fn main() {

For more examples, see the crate docs for `tenferro-einsum` and `tenferro-tensor`.

For the current `tenferro` dynamic AD surface, including which `Tensor`
operations are wired into public `jvp(...)`, see
[`tenferro/README.md`](./tenferro/README.md).

### Linear algebra quickstart

Linalg is included in `tenferro-tensor-compute` by default (the `linalg` feature).
@@ -294,17 +298,16 @@ The API and internal architecture are strongly influenced by
- **Plan-based execution** — The primitive family traits keep a
describe-plan-execute contract that follows the cuTENSOR / BLAS pattern used
by PyTorch's GPU backend.
- **Automatic differentiation** — Tape-based reverse mode (VJP) and
dual-number forward mode (JVP) follow PyTorch's autograd and `torch.func`
design, factored into standalone `chainrules-core` / `chainrules` crates
(inspired by Julia's
[ChainRulesCore.jl](https://github.com/JuliaDiff/ChainRulesCore.jl)).
- **Automatic differentiation** — The public `Tensor` facade is backed by
`tidu::Value<DynTensor>`. Reverse-mode frontend helpers are exposed as
`grad` / `backward`, while downstream custom ops integrate through
`LinearizableOp` / `LinearizedOp` and implement `jvp` / `vjp` explicitly.
- **Einsum** — Ported from Julia's
[OMEinsum.jl](https://github.com/under-Peter/OMEinsum.jl); string notation
(`"ij,jk->ik"`) is compatible with `torch.einsum`, with N-ary contraction
tree optimization.
- **Linear algebra** — `tenferro-linalg` mirrors `torch.linalg` (SVD, QR, LU,
eigen, Cholesky, solve) with differentiable decompositions.
eig/eigh, Cholesky, solve) with differentiable decompositions.
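The `LinearizableOp` / `LinearizedOp` flow named in the automatic-differentiation point can be sketched as a linearize-first pattern: run the primal once, then hand back `jvp` and `vjp` closures that reuse the saved primal. The trait and struct names below are illustrative stand-ins, not tenferro's actual signatures:

```rust
// Illustrative linearize-first pattern: linearize runs the primal once and
// returns closures for jvp (pushforward) and vjp (pullback) that reuse the
// saved primal. These names mirror the text but the API is assumed.

struct Linearized {
    jvp: Box<dyn Fn(f64) -> f64>,
    vjp: Box<dyn Fn(f64) -> f64>,
}

trait Linearizable {
    fn linearize(&self, x: f64) -> (f64, Linearized);
}

struct Exp;

impl Linearizable for Exp {
    fn linearize(&self, x: f64) -> (f64, Linearized) {
        let y = x.exp();
        // d exp(x) = exp(x) dx, so both directions reuse the saved primal y.
        let lin = Linearized {
            jvp: Box::new(move |dx| y * dx),
            vjp: Box::new(move |gy| gy * y),
        };
        (y, lin)
    }
}

fn main() {
    let (y, lin) = Exp.linearize(1.0);
    println!("primal   = {:.6}", y);
    println!("jvp(0.5) = {:.6}", (lin.jvp)(0.5)); // 0.5 * e
    println!("vjp(1.0) = {:.6}", (lin.vjp)(1.0)); // e
}
```

A custom op written this way supplies both directions explicitly, which is the integration point the table below calls "custom op integration".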

Key differences from PyTorch: column-major default layout with `(m, n, *)`
batch convention, compile-time generics (`Tensor<T>`) instead of runtime dtype
Expand Down Expand Up @@ -341,12 +344,11 @@ For a detailed feature-by-feature mapping, see

| Area | Status | Notes |
| --- | --- | --- |
| Multi-input backward | Strong | `einsum`, `solve`, `solve_triangular`, `lstsq` |
| Forward mode | Strong | Best supported when only a few inputs carry tangents |
| Multi-input HVP | Partial | Explicitly exposed for `einsum` |
| Higher-order derivatives (non-HVP) | Partial | Low-level `tidu::Tape<Tensor<T>>` flows are available, but the `tenferro` frontend validation depth is still limited |
| Reverse-mode frontend | Strong | `Tensor::with_requires_grad`, `Tensor::grad`, `Tensor::backward`, and top-level `grad` / `backward` helpers |
| Custom op integration | Available | Downstream custom ops should implement `LinearizableOp` + `LinearizedOp` and supply `jvp` / `vjp` |
| Public JVP frontend | Available | `tenferro::jvp(...)` returns primals plus optional output tangents; there is still no public HVP helper |
| Linalg AD surface | Available | Broad op coverage, but validation depth is uneven across ops |
| Complex/real matrices | Strong | Complex `einsum`, complex `solve_triangular`, and real-to-complex `eig` are covered |
| Complex/real matrices | Strong | Public `Tensor` JVP covers complex `einsum`, `solve`, `solve_triangular`, `det`, `inv`, `slogdet`, `cholesky`, `pinv`, `matrix_exp`, `eigh`, and the current `vector_norm` / `matrix_norm` slice; see [`tenferro/README.md`](tenferro/README.md) for the current operation matrix |

## Design

@@ -398,8 +400,8 @@ The shared docs deploy workflow publishes the same `target/docs-site` tree to Gi

`tenferro-linalg` continuously replays the vendored
`third_party/tensor-ad-oracles` database during workspace tests. Supported
families are validated against the published first-order references and, where
available, scalarized HVP payloads. Published families that tenferro does not
families are validated against the published first-order references. Published
families that tenferro does not
yet replay are tracked explicitly in:

- [`docs/generated/tensor-ad-oracles-support.md`](docs/generated/tensor-ad-oracles-support.md)
28 changes: 28 additions & 0 deletions REPOSITORY_RULES.md
@@ -0,0 +1,28 @@
# Repository Rules

## Public Surface Drift

- `README`, rustdoc, and examples must not claim capabilities beyond the current public surface.
- When the public API changes, check for stale names, stale capability claims, and deleted paths in `README`, rustdoc, and examples before considering the work complete.

## Oracle Gate

- Do not add or keep an AD `frule` or `rrule` in the mainline without a corresponding oracle family.
- Prefer oracle families with both Torch reference data and finite-difference checks.
- If a Torch reference is not available, a finite-difference-only oracle is acceptable.
- If no corresponding oracle exists yet, add it to `tensor-ad-oracles` before treating the rule as a supported mainline AD rule.

## Rule Source Of Truth

- Treat `frule` and `rrule` as the semantic source of truth for first-order AD.
- `LinearizedOp::jvp` and `LinearizedOp::vjp` should be thin adapters to the existing `frule` and `rrule` by default.

## Linearized Seam Coverage

- If `LinearizedOp::jvp` or `LinearizedOp::vjp` is not a thin delegation to the existing `frule` or `rrule`, add a focused seam test.
- The seam test must exercise the runtime packaging that the rule-math tests do not cover, such as saved linearization, schema, optional tangents/cotangents, or multi-output packaging.

## No Ad Hoc Fixes

- Do not add ad hoc fixes that violate DRY, KISS, or layering.
- Do not introduce compatibility shims, duplicated logic, or downstream reach-through into lower layers when the correct fix belongs in an existing seam or high-level API.
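The "thin adapter" and "seam coverage" rules above can be illustrated with a toy op: the `frule` owns the math, while the `jvp` wrapper only handles runtime packaging such as optional tangents. Names and signatures here are assumed for illustration, not taken from the codebase:

```rust
// frule for y = x^2 is the semantic source of truth: (primal, tangent).
fn square_frule(x: f64, dx: f64) -> (f64, f64) {
    (x * x, 2.0 * x * dx)
}

// The LinearizedOp-style jvp is a thin adapter: optional tangents are a
// runtime-packaging concern, not rule math, so this seam (the None branch)
// is what a focused seam test would cover.
fn square_jvp(x: f64, dx: Option<f64>) -> (f64, Option<f64>) {
    match dx {
        Some(dx) => {
            let (y, dy) = square_frule(x, dx);
            (y, Some(dy))
        }
        None => (x * x, None), // no tangent in, no tangent out
    }
}

fn main() {
    assert_eq!(square_jvp(3.0, Some(1.0)), (9.0, Some(6.0)));
    assert_eq!(square_jvp(3.0, None), (9.0, None));
    println!("seam ok");
}
```

Because the adapter contains no derivative math of its own, oracle replay against the `frule` plus one seam test on the `None` branch covers the whole surface.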
8 changes: 7 additions & 1 deletion coverage-thresholds.json
@@ -68,11 +68,17 @@
"tenferro-tensor/src/tensor/structural.rs": 64,
"tenferro-tensor/src/structured_tensor/conversion.rs": 75,
"internal/tenferro-internal-ad-core/src/registry.rs": 78,
"internal/tenferro-internal-ad-linalg/src/linearized.rs": 69,
"internal/tenferro-internal-ad-ops/src/linearized.rs": 76,
"internal/tenferro-internal-ad-ops/src/math.rs": 45,
"internal/tenferro-internal-ad-ops/src/ops/einsum/dense_rule.rs": 50,
"internal/tenferro-internal-ad-surface/src/autograd_api.rs": 70,
"internal/tenferro-internal-ad-surface/src/core/dynamic/dyn_ad_tensor/eager_linalg/extra_tensorized.rs": 79,
"internal/tenferro-internal-frontend-core/src/autodiff.rs": 48,
"internal/tenferro-internal-frontend-core/src/structured_einsum.rs": 23,
"internal/tenferro-internal-ad-surface/src/ops/linalg/primal/factorizations.rs": 75,
"internal/tenferro-internal-ad-surface/src/ops/linalg/primal/solve.rs": 78,
"internal/tenferro-internal-runtime/src/dispatch.rs": 65
"internal/tenferro-internal-runtime/src/dispatch.rs": 65,
"extension/tenferro-ext-tropical/src/ad/convert.rs": 65
}
}
123 changes: 35 additions & 88 deletions docs/AD/lstsq.md
@@ -1,128 +1,75 @@
# Least Squares Reverse-Mode Rule (`lstsq_rrule`)
# Least Squares AD Notes (`lstsq`)

## Forward
## Public contract

$$
x = \arg\min_x \|Ax - b\|_2^2, \quad A \in \mathbb{C}^{M \times N},\ b \in \mathbb{C}^M,\ M \geq N
$$

The solution satisfies the normal equations $A^\dagger A x = A^\dagger b$.
Via thin QR ($A = QR$): $x = R^{-1} Q^\dagger b$.

## Reverse rule

**Given:** cotangent $\bar{x} \in \mathbb{C}^N$ of a real scalar loss $\ell$.

**Compute:** $\bar{A} \in \mathbb{C}^{M \times N}$ and $\bar{b} \in \mathbb{C}^M$.

### Step 1: QR decompose $A$

$$
A = QR
$$

where $Q \in \mathbb{C}^{M \times N}$ ($Q^\dagger Q = I_N$) and $R \in \mathbb{C}^{N \times N}$ (upper triangular).

### Step 2: Solve two triangular systems
`lstsq(a, b)` returns

$$
y = R^{-\dagger} \bar{x}, \qquad z = R^{-1} y
$$
- `solution = pinv(a) @ b`
- `residuals = ||a @ solution - b||_F^2` per right-hand side when `m > n` and the solve is full-rank
- `residuals = []` otherwise

Note that $z = (R^\dagger R)^{-1} \bar{x} = (A^\dagger A)^{-1} \bar{x}$.

### Step 3: Compute cotangents

$$
\bar{b} = Q y
$$
The auxiliary metadata `rank` and `singular_values` are not differentiated.

$$
\bar{A} = r \, z^\dagger - \bar{b} \, x^\dagger
$$
## First-order source of truth

where $r = b - Ax$ is the residual.
The first-order rules are expressed in terms of the pseudoinverse:

### Complete formulas
### JVP for the solution

$$
\bar{b} = Q R^{-\dagger} \bar{x}
$$
For `x = pinv(A) b`,

$$
\bar{A} = (b - Ax)(R^{-1} R^{-\dagger} \bar{x})^\dagger - (Q R^{-\dagger} \bar{x}) x^\dagger
dx = d(pinv(A))\, b + pinv(A)\, db
$$

### Derivation
In the implementation, `d(pinv(A))` is provided by `pinv_frule`.

The optimality condition is $A^\dagger(Ax - b) = 0$, i.e. $A^\dagger r = 0$ where $r = b - Ax$.
### JVP for the residual summaries

Differentiating the normal equations $A^\dagger A x = A^\dagger b$:
Let

$$
dA^\dagger A x + A^\dagger dA \, x + A^\dagger A \, dx = dA^\dagger b + A^\dagger db
r = A x - b, \qquad dr = dA\,x - db
$$

Rearranging:
Then, for each right-hand side,

$$
A^\dagger A \, dx = A^\dagger db + dA^\dagger r - A^\dagger dA \, x
d\,\mathrm{residuals} = 2 \sum \mathrm{Re}(r \odot \overline{dr})
$$

$$
dx = (A^\dagger A)^{-1}(A^\dagger db + dA^\dagger r - A^\dagger dA \, x)
$$

For the pullback, let $z = (A^\dagger A)^{-1} \bar{x}$:

$$
\delta\ell = \langle \bar{x}, dx \rangle = \langle z, A^\dagger db + dA^\dagger r - A^\dagger dA \, x \rangle
$$
For the current real-valued `lstsq` AD path, this reduces to

$$
= \langle Az, db \rangle + \langle r z^\dagger, dA \rangle - \langle Az \, x^\dagger, dA \rangle
d\,\mathrm{residuals} = 2 \sum r \odot dr
$$

Reading off the cotangents:
### VJP for the solution

$$
\bar{b} = Az = A (A^\dagger A)^{-1} \bar{x} = Q R R^{-1} R^{-\dagger} \bar{x} = Q R^{-\dagger} \bar{x} = Qy
$$
Let `gx` be the cotangent of `solution`. Since `x = pinv(A) b`,

$$
\bar{A} = r z^\dagger - \bar{b} x^\dagger
$$
- the cotangent for `pinv(A)` is `gx @ b^H`
- the cotangent for `b` from this path is `pinv(A)^H @ gx`
- the cotangent for `A` from this path is given by `pinv_rrule`

## Implementation notes
This is the path used by the implementation.

- Compute QR once and reuse for both triangular solves.
- Never form $(A^\dagger A)^{-1}$ explicitly; always use triangular solves.
- The residual $r = b - Ax$ may already be available from the forward pass.
### VJP for the residual summaries

## Verification

### Forward check
Let `gr` be the cotangent of the summary residual outputs, broadcast per RHS. Then

$$
\|Ax - b\|_2 \text{ is minimized}, \quad A^\dagger(Ax - b) \approx 0
\bar{A}_{res} = 2 (gr \odot r)\, x^H
$$

### Gradient check (backward)

Scalar test function (from BackwardsLinalg.jl):

$$
f(A, b) = x^\dagger \mathrm{op} \, x, \quad x = A \backslash b
\bar{b}_{res} = -2 (gr \odot r)
$$

where $\mathrm{op}$ is a random Hermitian matrix independent of $A$ and $b$.

Two separate gradient checks:
- **$\bar{A}$:** fix $b$, perturb $A$
- **$\bar{b}$:** fix $A$, perturb $b$
The full VJP is the sum of the solution-path and residual-summary-path contributions.

## References
## Verification policy

1. BackwardsLinalg.jl (GiggleLiu), `src/lstsq.jl`.
2. M. B. Giles, "An extended collection of matrix derivative results
for forward and reverse mode automatic differentiation," 2008.
- `frule/rrule` are the semantic source of truth
- oracle replay must exist before the rule is considered mainline
- `LinearizedOp::jvp/vjp` is expected to be a thin adapter over these rules
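The residual-summary JVP above (with `dr = dA·x - db`; the `A·dx` term drops because the optimality condition annihilates it, since `r` is orthogonal to the column space of `A`) can be checked numerically. A hedged, self-contained sketch for a small real problem, using hand-rolled normal equations rather than tenferro's `lstsq`:

```rust
// Finite-difference check of d residuals = 2 * sum(r ⊙ dr), dr = dA·x - db,
// for a real 3x2 least-squares problem solved via the normal equations.
// Everything here is hand-rolled for illustration.

fn matvec(a: &[[f64; 2]; 3], x: &[f64; 2]) -> [f64; 3] {
    let mut y = [0.0; 3];
    for i in 0..3 {
        for j in 0..2 {
            y[i] += a[i][j] * x[j];
        }
    }
    y
}

// Solve min ||A x - b|| via the 2x2 normal equations (A^T A) x = A^T b.
fn lstsq(a: &[[f64; 2]; 3], b: &[f64; 3]) -> [f64; 2] {
    let mut ata = [[0.0; 2]; 2];
    let mut atb = [0.0; 2];
    for i in 0..3 {
        for j in 0..2 {
            atb[j] += a[i][j] * b[i];
            for k in 0..2 {
                ata[j][k] += a[i][j] * a[i][k];
            }
        }
    }
    let det = ata[0][0] * ata[1][1] - ata[0][1] * ata[1][0];
    [
        (ata[1][1] * atb[0] - ata[0][1] * atb[1]) / det,
        (ata[0][0] * atb[1] - ata[1][0] * atb[0]) / det,
    ]
}

fn residual_sq(a: &[[f64; 2]; 3], b: &[f64; 3]) -> f64 {
    let x = lstsq(a, b);
    let ax = matvec(a, &x);
    (0..3).map(|i| (ax[i] - b[i]).powi(2)).sum()
}

fn main() {
    let a = [[1.0, 2.0], [3.0, 4.0], [5.0, 7.0]];
    let b = [1.0, -1.0, 2.0];
    let da = [[0.1, -0.2], [0.05, 0.3], [-0.1, 0.2]];
    let db = [0.2, -0.1, 0.3];

    // JVP prediction: 2 * sum(r ⊙ dr) with dr = dA·x - db
    // (the A·dx contribution vanishes because A^T r = 0 at the optimum).
    let x = lstsq(&a, &b);
    let ax = matvec(&a, &x);
    let dax = matvec(&da, &x);
    let predicted: f64 = (0..3)
        .map(|i| 2.0 * (ax[i] - b[i]) * (dax[i] - db[i]))
        .sum();

    // Central finite difference of residual_sq along the (dA, db) direction.
    let eps = 1e-6;
    let shift = |s: f64| {
        let mut ap = a;
        let mut bp = b;
        for i in 0..3 {
            bp[i] += s * db[i];
            for j in 0..2 {
                ap[i][j] += s * da[i][j];
            }
        }
        residual_sq(&ap, &bp)
    };
    let fd = (shift(eps) - shift(-eps)) / (2.0 * eps);
    println!("predicted = {:.8}, finite diff = {:.8}", predicted, fd);
    assert!((predicted - fd).abs() < 1e-5);
}
```

The same setup extends to the residual-summary VJP by pairing the cotangent formulas with the directional derivative and checking that the two inner products agree.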