Merged
35 commits
cd4a28d
refactor: hard cut tenferro to tidu value surface
shinaoka Mar 31, 2026
eceb68d
refactor: rebuild internal ad ops and linalg on linearized runtime
shinaoka Mar 31, 2026
e475585
feat: restore public tensor linalg surface
shinaoka Mar 31, 2026
287815e
chore: close tenferro linearize hard cut surface
shinaoka Mar 31, 2026
16d6855
chore: remove remaining legacy ad docs and tests
shinaoka Mar 31, 2026
1270164
docs: add tenferro public jvp design and plan
shinaoka Mar 31, 2026
dfdf1ab
test: lock tenferro public jvp contract
shinaoka Mar 31, 2026
b3b1d99
test: tighten public jvp qr contract
shinaoka Mar 31, 2026
01f99ca
feat: add tenferro public jvp surface
shinaoka Mar 31, 2026
77c74f3
fix: let tenferro jvp surface run primals first
shinaoka Mar 31, 2026
41ddec7
feat: implement tenferro public jvp transform
shinaoka Mar 31, 2026
e0761bf
fix: reject unsupported jvp tensor ops
shinaoka Mar 31, 2026
547a57c
test: cover optimized jvp seam behavior
shinaoka Mar 31, 2026
ad42552
test: strengthen jvp seam numeric checks
shinaoka Mar 31, 2026
efaa821
test: make qr and svd seam checks invariant
shinaoka Mar 31, 2026
9bfab1b
docs: add public jvp surface guidance
shinaoka Mar 31, 2026
305b531
fix: close public jvp verification regressions
shinaoka Mar 31, 2026
3d52e22
feat: expose full public tensor jvp seam
shinaoka Mar 31, 2026
5c3ca06
docs: add complex linalg ad rollout design
shinaoka Mar 31, 2026
a739c6d
refactor: rename requires_grad builder and clean warnings
shinaoka Mar 31, 2026
304ba81
fix: support complex eigh oracle replay
shinaoka Mar 31, 2026
4896520
docs: add pytorch-aligned ad surface design
shinaoka Mar 31, 2026
a2aa61c
docs: expand pytorch-aligned ad surface sketches
shinaoka Mar 31, 2026
6b0b080
feat: align tensor ad surface with linalg oracles
shinaoka Mar 31, 2026
fea600b
test: add lu oracle replay coverage
shinaoka Mar 31, 2026
41a4682
test: add norm oracle replay subset coverage
shinaoka Mar 31, 2026
8017f31
feat: add lstsq oracle subset replay support
shinaoka Mar 31, 2026
3bd3852
test: add vector norm oracle replay coverage
shinaoka Mar 31, 2026
6e1905c
test: align eig oracle replay and support coverage
shinaoka Apr 1, 2026
8670c29
feat: align lstsq ad with oracle residual summaries
shinaoka Apr 1, 2026
5717e1f
test: close norm oracle support gap
shinaoka Apr 1, 2026
a2125ac
chore: pin tidu to merged linearize-first
shinaoka Apr 1, 2026
3c6b597
Merge origin/main into linearize-hard-cut
shinaoka Apr 1, 2026
fe03bd1
test: fix tenferro coverage blockers
shinaoka Apr 1, 2026
5e95245
test: add focused coverage for ad surface and runtime
shinaoka Apr 1, 2026
5 changes: 5 additions & 0 deletions AGENTS.md
@@ -7,9 +7,14 @@ Before acting, read the vendored shared rules from `template-rs`:
- `ai/vendor/template-rs/common-agent-rules.md`
- `ai/vendor/template-rs/numerical-rust-rules.md`
- `ai/vendor/template-rs/pr-workflow-rules.md`
- `REPOSITORY_RULES.md`

The sections below are tenferro-specific additions and overrides.

Before implementation work, review `REPOSITORY_RULES.md`.
Before creating a PR, review `REPOSITORY_RULES.md` again.
Before touching AD rules, oracle replay, or linearized boundary code, review `REPOSITORY_RULES.md` first.

## Current Implementation Status

The workspace contains active implementations alongside evolving APIs. Implementation work is allowed unless a task explicitly says otherwise.
6 changes: 3 additions & 3 deletions Cargo.toml
@@ -47,9 +47,9 @@ thiserror = "2"
criterion = "0.5"
serde = "1"
serde_json = "1"
chainrules-core = { git = "https://github.com/tensor4all/chainrules-rs", branch = "deferred-hvp-tangents" }
chainrules = { git = "https://github.com/tensor4all/chainrules-rs", branch = "deferred-hvp-tangents" }
tidu = { git = "https://github.com/tensor4all/tidu-rs", rev = "ea504cd" }
chainrules-core = { git = "https://github.com/tensor4all/chainrules-rs", rev = "6cc46775b33653f91df96ca1571ce9905a6224f8" }
chainrules = { git = "https://github.com/tensor4all/chainrules-rs", rev = "6cc46775b33653f91df96ca1571ce9905a6224f8" }
tidu = { git = "https://github.com/tensor4all/tidu-rs.git", rev = "f2f57f75d228fdfa2494fb7d9a6a2e3efa019a0d" }
strided-view = { git = "https://github.com/tensor4all/strided-rs", rev = "ea37986f4d9cb99cfc1f62a1d4aea561cb3a9722" }
strided-traits = { git = "https://github.com/tensor4all/strided-rs", rev = "ea37986f4d9cb99cfc1f62a1d4aea561cb3a9722" }
strided-perm = { git = "https://github.com/tensor4all/strided-rs", rev = "ea37986f4d9cb99cfc1f62a1d4aea561cb3a9722", features = ["parallel"] }
58 changes: 30 additions & 28 deletions README.md
@@ -16,7 +16,7 @@ C / C++ / Fortran / Python / Julia
tenferro-rs
┌─────────────────────────────┐
│ einsum over any algebra * │
full AD (VJP / JVP / HVP)
reverse AD (VJP)
│ extended precision (xprec) │
└─────────────────────────────┘
@@ -35,7 +35,7 @@ extended precision. The same einsum engine and AD machinery work across all of t
### Key strengths

- **Callable from C, C++, Fortran, Python, and Julia** via a stable C FFI — drop it into existing HPC codebases without rewriting anything.
- **AD-compatible across language boundaries** — reverse-mode (VJP), forward-mode (JVP), and Hessian-vector products (HVP) are exposed through the C API so Python and Julia AD systems can interoperate.
- **AD-compatible across language boundaries** — reverse-mode (VJP) is exposed through the C API so Python and Julia AD systems can interoperate with the current frontend.
- **Algebra-parameterized** — einsum, linalg, and AD rules are generic over the algebra, not hardwired to floating-point arithmetic.
- **Extended precision** — planned support for double-double and higher-precision types for numerically demanding simulations.
- **ML bridge** — designed to interoperate with [burn](https://github.com/tracel-ai/burn) for hybrid neural network + tensor network models.
@@ -56,7 +56,7 @@ A general-purpose tensor computation library in Rust with CPU support today and
(`TensorSemiringCore`, `TensorSemiringFastPath`, `TensorScalarPrims`,
`TensorAnalyticPrims`)
- High-level einsum with N-ary contraction tree optimization
- Automatic differentiation (VJP/JVP)
- Reverse-mode automatic differentiation
- C FFI for Julia/Python integration

Extension crates (tropical semiring, burn bridge, ndarray interop) live under `extension/`.
@@ -85,12 +85,12 @@ The workspace now uses three naming buckets:
|-------|----------|
| **`tenferro-tensor-compute`** | You want typed `Tensor<T>` with einsum and linalg — **start here** |
| `tenferro-dynamic-compute` | You want runtime-selected dtypes without automatic differentiation |
| `tenferro` | You need automatic differentiation (VJP/JVP) |
| `tenferro` | You need reverse-mode automatic differentiation |
| `tenferro-tensor` | You only need the data type, no computation (library authors) |

- **Typed path** (`tenferro-tensor-compute`): `Tensor<T>` with a fixed scalar type at compile time. Best when you know the scalar type and do not need automatic gradient tracking.
- **Dynamic primal path** (`tenferro-dynamic-compute`): dynamic scalar type without automatic differentiation. Best when you need runtime dtype selection but no tape/gradient state.
- **Dynamic AD path** (`tenferro`): dynamic scalar type with automatic differentiation (VJP/JVP). Best when you need gradients.
- **Dynamic AD path** (`tenferro`): dynamic scalar type with reverse-mode automatic differentiation. Best when you need gradients.

The quickstart below uses the typed path; `tenferro-dynamic-compute` is the non-AD dynamic alternative, and the [Autodiff quickstart](#autodiff-quickstart) shows the dynamic AD path.
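The split between the typed and dynamic paths above comes down to compile-time generics versus runtime dtype dispatch. A minimal self-contained sketch of the two styles (illustrative only; these are not tenferro's actual types):

```rust
// Typed path style: the scalar type is a compile-time generic parameter,
// so each instantiation is monomorphized for one dtype.
fn typed_sum<T: Copy + std::iter::Sum>(data: &[T]) -> T {
    data.iter().copied().sum()
}

// Dynamic path style: the dtype is a runtime tag, and every operation
// dispatches on it with a match. (Hypothetical enum, not tenferro's.)
enum DynBuffer {
    F32(Vec<f32>),
    F64(Vec<f64>),
}

fn dynamic_sum(buf: &DynBuffer) -> f64 {
    match buf {
        DynBuffer::F32(v) => v.iter().map(|&x| x as f64).sum(),
        DynBuffer::F64(v) => v.iter().sum(),
    }
}

fn main() {
    assert_eq!(typed_sum(&[1.0_f64, 2.0, 3.0]), 6.0);
    assert_eq!(dynamic_sum(&DynBuffer::F64(vec![1.0, 2.0, 3.0])), 6.0);
    println!("both dispatch styles agree");
}
```

The typed style gives the compiler full type information (best when the dtype is known up front); the dynamic style pays a small dispatch cost in exchange for runtime dtype selection.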

@@ -110,15 +110,15 @@ Current implementation homes:
- `tenferro-internal-frontend-core`: shared dynamic tensor substrate and
structured-layout helpers used by both `tenferro-dynamic-compute` and
`tenferro`
- `tenferro-internal-ad-core`: `AdTensor<T>`, homogeneous tape glue, and the
shared AD operation helpers that used to live in `tenferro/src/ops/common.rs`
- `tenferro-internal-ad-surface`: the dynamic AD tensor surface, eager AD
entrypoints (`grad`, `backward`, `forward_ad`), and the builder-style linalg
wrappers used behind `tenferro`
- `tenferro-internal-ad-linalg`: typed linalg AD builders, eager helpers, and
result types used behind `tenferro`
- `tenferro-internal-ad-ops`: typed scalar, reduction, and einsum AD builders,
eager helpers, and pullback helpers used behind `tenferro`
- `tenferro-internal-ad-core`: shared `DynTensor`/`Value` interop and AD
support code used by the public tensor facade
- `tenferro-internal-ad-surface`: the dynamic `Tensor` facade, `grad` /
`backward` entrypoints, checkpoint policy plumbing, and runtime-dispatched
linalg wrappers used behind `tenferro`
- `tenferro-internal-ad-linalg`: `LinearizableOp` / `LinearizedOp`
implementations for linalg operations and their result wrappers
- `tenferro-internal-ad-ops`: `LinearizableOp` / `LinearizedOp`
implementations for scalar, reduction, and einsum operations

## Quickstart

@@ -197,7 +197,7 @@ fn main() {

// 2. Create tensors and enable gradient tracking.
let mut x = Tensor::from_slice(&[1.0_f64, 2.0, 3.0], &[3]).unwrap();
x.set_requires_grad(true).unwrap();
x = x.with_requires_grad(true);

// 3. Forward pass: loss = sum(exp(x))
let loss = x.exp().unwrap().sum().unwrap();
@@ -217,6 +217,10 @@ fn main() {

For more examples, see the crate docs for `tenferro-einsum` and `tenferro-tensor`.

For the current `tenferro` dynamic AD surface, including which `Tensor`
operations are wired into public `jvp(...)`, see
[`tenferro/README.md`](./tenferro/README.md).

### Linear algebra quickstart

Linalg is included in `tenferro-tensor-compute` by default (the `linalg` feature).
@@ -294,17 +298,16 @@ The API and internal architecture are strongly influenced by
- **Plan-based execution** — The primitive family traits keep a
describe-plan-execute contract that follows the cuTENSOR / BLAS pattern used
by PyTorch's GPU backend.
- **Automatic differentiation** — Tape-based reverse mode (VJP) and
dual-number forward mode (JVP) follow PyTorch's autograd and `torch.func`
design, factored into standalone `chainrules-core` / `chainrules` crates
(inspired by Julia's
[ChainRulesCore.jl](https://github.com/JuliaDiff/ChainRulesCore.jl)).
- **Automatic differentiation** — The public `Tensor` facade is backed by
`tidu::Value<DynTensor>`. Reverse-mode frontend helpers are exposed as
`grad` / `backward`, while downstream custom ops integrate through
`LinearizableOp` / `LinearizedOp` and implement `jvp` / `vjp` explicitly.
- **Einsum** — Ported from Julia's
[OMEinsum.jl](https://github.com/under-Peter/OMEinsum.jl); string notation
(`"ij,jk->ik"`) is compatible with `torch.einsum`, with N-ary contraction
tree optimization.
- **Linear algebra** — `tenferro-linalg` mirrors `torch.linalg` (SVD, QR, LU,
eigen, Cholesky, solve) with differentiable decompositions.
eig/eigh, Cholesky, solve) with differentiable decompositions.
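The `LinearizableOp` / `LinearizedOp` flow named in the automatic-differentiation point can be sketched as a linearize-first pattern: run the primal once, then hand back `jvp` and `vjp` closures that reuse the saved primal. The trait and struct names below are illustrative stand-ins, not tenferro's actual signatures:

```rust
// Illustrative linearize-first pattern: linearize runs the primal once and
// returns closures for jvp (pushforward) and vjp (pullback) that reuse the
// saved primal. These names mirror the text but the API is assumed.

struct Linearized {
    jvp: Box<dyn Fn(f64) -> f64>,
    vjp: Box<dyn Fn(f64) -> f64>,
}

trait Linearizable {
    fn linearize(&self, x: f64) -> (f64, Linearized);
}

struct Exp;

impl Linearizable for Exp {
    fn linearize(&self, x: f64) -> (f64, Linearized) {
        let y = x.exp();
        // d exp(x) = exp(x) dx, so both directions reuse the saved primal y.
        let lin = Linearized {
            jvp: Box::new(move |dx| y * dx),
            vjp: Box::new(move |gy| gy * y),
        };
        (y, lin)
    }
}

fn main() {
    let (y, lin) = Exp.linearize(1.0);
    println!("primal   = {:.6}", y);
    println!("jvp(0.5) = {:.6}", (lin.jvp)(0.5)); // 0.5 * e
    println!("vjp(1.0) = {:.6}", (lin.vjp)(1.0)); // e
}
```

A custom op written this way supplies both directions explicitly, which is the integration point the table below calls "custom op integration".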

Key differences from PyTorch: column-major default layout with `(m, n, *)`
batch convention, compile-time generics (`Tensor<T>`) instead of runtime dtype
Expand Down Expand Up @@ -341,12 +344,11 @@ For a detailed feature-by-feature mapping, see

| Area | Status | Notes |
| --- | --- | --- |
| Multi-input backward | Strong | `einsum`, `solve`, `solve_triangular`, `lstsq` |
| Forward mode | Strong | Best supported when only a few inputs carry tangents |
| Multi-input HVP | Partial | Explicitly exposed for `einsum` |
| Higher-order derivatives (non-HVP) | Partial | Low-level `tidu::Tape<Tensor<T>>` flows are available, but the `tenferro` frontend validation depth is still limited |
| Reverse-mode frontend | Strong | `Tensor::with_requires_grad`, `Tensor::grad`, `Tensor::backward`, and top-level `grad` / `backward` helpers |
| Custom op integration | Available | Downstream custom ops should implement `LinearizableOp` + `LinearizedOp` and supply `jvp` / `vjp` |
| Public JVP frontend | Available | `tenferro::jvp(...)` returns primals plus optional output tangents; there is still no public HVP helper |
| Linalg AD surface | Available | Broad op coverage, but validation depth is uneven across ops |
| Complex/real matrices | Strong | Complex `einsum`, complex `solve_triangular`, and real-to-complex `eig` are covered |
| Complex/real matrices | Strong | Public `Tensor` JVP covers complex `einsum`, `solve`, `solve_triangular`, `det`, `inv`, `slogdet`, `cholesky`, `pinv`, `matrix_exp`, `eigh`, and the current `vector_norm` / `matrix_norm` slice; see [`tenferro/README.md`](tenferro/README.md) for the current operation matrix |

## Design

@@ -398,8 +400,8 @@ The shared docs deploy workflow publishes the same `target/docs-site` tree to Gi

`tenferro-linalg` continuously replays the vendored
`third_party/tensor-ad-oracles` database during workspace tests. Supported
families are validated against the published first-order references and, where
available, scalarized HVP payloads. Published families that tenferro does not
families are validated against the published first-order references. Published
families that tenferro does not
yet replay are tracked explicitly in:

- [`docs/generated/tensor-ad-oracles-support.md`](docs/generated/tensor-ad-oracles-support.md)
28 changes: 28 additions & 0 deletions REPOSITORY_RULES.md
@@ -0,0 +1,28 @@
# Repository Rules

## Public Surface Drift

- `README`, rustdoc, and examples must not claim capabilities beyond the current public surface.
- When the public API changes, check for stale names, stale capability claims, and deleted paths in `README`, rustdoc, and examples before considering the work complete.

## Oracle Gate

- Do not add or keep an AD `frule` or `rrule` in the mainline without a corresponding oracle family.
- Prefer oracle families with both Torch reference data and finite-difference checks.
- If a Torch reference is not available, a finite-difference-only oracle is acceptable.
- If no corresponding oracle exists yet, add it to `tensor-ad-oracles` before treating the rule as a supported mainline AD rule.

## Rule Source Of Truth

- Treat `frule` and `rrule` as the semantic source of truth for first-order AD.
- `LinearizedOp::jvp` and `LinearizedOp::vjp` should be thin adapters to the existing `frule` and `rrule` by default.

## Linearized Seam Coverage

- If `LinearizedOp::jvp` or `LinearizedOp::vjp` is not a thin delegation to the existing `frule` or `rrule`, add a focused seam test.
- The seam test must exercise the runtime packaging that the rule-math tests do not cover, such as saved linearization, schema, optional tangents/cotangents, or multi-output packaging.

## No Ad Hoc Fixes

- Do not add ad hoc fixes that violate DRY, KISS, or layering.
- Do not introduce compatibility shims, duplicated logic, or downstream reach-through into lower layers when the correct fix belongs in an existing seam or high-level API.
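The "thin adapter" and "seam coverage" rules above can be illustrated with a toy op: the `frule` owns the math, while the `jvp` wrapper only handles runtime packaging such as optional tangents. Names and signatures here are assumed for illustration, not taken from the codebase:

```rust
// frule for y = x^2 is the semantic source of truth: (primal, tangent).
fn square_frule(x: f64, dx: f64) -> (f64, f64) {
    (x * x, 2.0 * x * dx)
}

// The LinearizedOp-style jvp is a thin adapter: optional tangents are a
// runtime-packaging concern, not rule math, so this seam (the None branch)
// is what a focused seam test would cover.
fn square_jvp(x: f64, dx: Option<f64>) -> (f64, Option<f64>) {
    match dx {
        Some(dx) => {
            let (y, dy) = square_frule(x, dx);
            (y, Some(dy))
        }
        None => (x * x, None), // no tangent in, no tangent out
    }
}

fn main() {
    assert_eq!(square_jvp(3.0, Some(1.0)), (9.0, Some(6.0)));
    assert_eq!(square_jvp(3.0, None), (9.0, None));
    println!("seam ok");
}
```

Because the adapter contains no derivative math of its own, oracle replay against the `frule` plus one seam test on the `None` branch covers the whole surface.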
8 changes: 7 additions & 1 deletion coverage-thresholds.json
@@ -68,11 +68,17 @@
"tenferro-tensor/src/tensor/structural.rs": 64,
"tenferro-tensor/src/structured_tensor/conversion.rs": 75,
"internal/tenferro-internal-ad-core/src/registry.rs": 78,
"internal/tenferro-internal-ad-linalg/src/linearized.rs": 69,
"internal/tenferro-internal-ad-ops/src/linearized.rs": 76,
"internal/tenferro-internal-ad-ops/src/math.rs": 45,
"internal/tenferro-internal-ad-ops/src/ops/einsum/dense_rule.rs": 50,
"internal/tenferro-internal-ad-surface/src/autograd_api.rs": 70,
"internal/tenferro-internal-ad-surface/src/core/dynamic/dyn_ad_tensor/eager_linalg/extra_tensorized.rs": 79,
"internal/tenferro-internal-frontend-core/src/autodiff.rs": 48,
"internal/tenferro-internal-frontend-core/src/structured_einsum.rs": 23,
"internal/tenferro-internal-ad-surface/src/ops/linalg/primal/factorizations.rs": 75,
"internal/tenferro-internal-ad-surface/src/ops/linalg/primal/solve.rs": 78,
"internal/tenferro-internal-runtime/src/dispatch.rs": 65
"internal/tenferro-internal-runtime/src/dispatch.rs": 65,
"extension/tenferro-ext-tropical/src/ad/convert.rs": 65
}
}
123 changes: 35 additions & 88 deletions docs/AD/lstsq.md
@@ -1,128 +1,75 @@
# Least Squares Reverse-Mode Rule (`lstsq_rrule`)
# Least Squares AD Notes (`lstsq`)

## Forward
## Public contract

$$
x = \arg\min_x \|Ax - b\|_2^2, \quad A \in \mathbb{C}^{M \times N},\ b \in \mathbb{C}^M,\ M \geq N
$$

The solution satisfies the normal equations $A^\dagger A x = A^\dagger b$.
Via thin QR ($A = QR$): $x = R^{-1} Q^\dagger b$.

## Reverse rule

**Given:** cotangent $\bar{x} \in \mathbb{C}^N$ of a real scalar loss $\ell$.

**Compute:** $\bar{A} \in \mathbb{C}^{M \times N}$ and $\bar{b} \in \mathbb{C}^M$.

### Step 1: QR decompose $A$

$$
A = QR
$$

where $Q \in \mathbb{C}^{M \times N}$ ($Q^\dagger Q = I_N$) and $R \in \mathbb{C}^{N \times N}$ (upper triangular).

### Step 2: Solve two triangular systems
`lstsq(a, b)` returns

$$
y = R^{-\dagger} \bar{x}, \qquad z = R^{-1} y
$$
- `solution = pinv(a) @ b`
- `residuals = ||a @ solution - b||_F^2` per right-hand side when `m > n` and the solve is full-rank
- `residuals = []` otherwise

Note that $z = (R^\dagger R)^{-1} \bar{x} = (A^\dagger A)^{-1} \bar{x}$.

### Step 3: Compute cotangents

$$
\bar{b} = Q y
$$
The auxiliary metadata `rank` and `singular_values` are not differentiated.

$$
\bar{A} = r \, z^\dagger - \bar{b} \, x^\dagger
$$
## First-order source of truth

where $r = b - Ax$ is the residual.
The first-order rules are expressed in terms of the pseudoinverse:

### Complete formulas
### JVP for the solution

$$
\bar{b} = Q R^{-\dagger} \bar{x}
$$
For `x = pinv(A) b`,

$$
\bar{A} = (b - Ax)(R^{-1} R^{-\dagger} \bar{x})^\dagger - (Q R^{-\dagger} \bar{x}) x^\dagger
dx = d(pinv(A))\, b + pinv(A)\, db
$$

### Derivation
In the implementation, `d(pinv(A))` is provided by `pinv_frule`.

The optimality condition is $A^\dagger(Ax - b) = 0$, i.e. $A^\dagger r = 0$ where $r = b - Ax$.
### JVP for the residual summaries

Differentiating the normal equations $A^\dagger A x = A^\dagger b$:
Let

$$
dA^\dagger A x + A^\dagger dA \, x + A^\dagger A \, dx = dA^\dagger b + A^\dagger db
r = A x - b, \qquad dr = dA\,x - db
$$

Rearranging:
Then, for each right-hand side,

$$
A^\dagger A \, dx = A^\dagger db + dA^\dagger r - A^\dagger dA \, x
d\,\mathrm{residuals} = 2 \sum \mathrm{Re}(r \odot \overline{dr})
$$

$$
dx = (A^\dagger A)^{-1}(A^\dagger db + dA^\dagger r - A^\dagger dA \, x)
$$

For the pullback, let $z = (A^\dagger A)^{-1} \bar{x}$:

$$
\delta\ell = \langle \bar{x}, dx \rangle = \langle z, A^\dagger db + dA^\dagger r - A^\dagger dA \, x \rangle
$$
For the current real-valued `lstsq` AD path, this reduces to

$$
= \langle Az, db \rangle + \langle r z^\dagger, dA \rangle - \langle Az \, x^\dagger, dA \rangle
d\,\mathrm{residuals} = 2 \sum r \odot dr
$$

Reading off the cotangents:
### VJP for the solution

$$
\bar{b} = Az = A (A^\dagger A)^{-1} \bar{x} = Q R R^{-1} R^{-\dagger} \bar{x} = Q R^{-\dagger} \bar{x} = Qy
$$
Let `gx` be the cotangent of `solution`. Since `x = pinv(A) b`,

$$
\bar{A} = r z^\dagger - \bar{b} x^\dagger
$$
- the cotangent for `pinv(A)` is `gx @ b^H`
- the cotangent for `b` from this path is `pinv(A)^H @ gx`
- the cotangent for `A` from this path is given by `pinv_rrule`

## Implementation notes
This is the path used by the implementation.

- Compute QR once and reuse for both triangular solves.
- Never form $(A^\dagger A)^{-1}$ explicitly; always use triangular solves.
- The residual $r = b - Ax$ may already be available from the forward pass.
### VJP for the residual summaries

## Verification

### Forward check
Let `gr` be the cotangent of the summary residual outputs, broadcast per RHS. Then

$$
\|Ax - b\|_2 \text{ is minimized}, \quad A^\dagger(Ax - b) \approx 0
\bar{A}_{res} = 2 (gr \odot r)\, x^H
$$

### Gradient check (backward)

Scalar test function (from BackwardsLinalg.jl):

$$
f(A, b) = x^\dagger \mathrm{op} \, x, \quad x = A \backslash b
\bar{b}_{res} = -2 (gr \odot r)
$$

where $\mathrm{op}$ is a random Hermitian matrix independent of $A$ and $b$.

Two separate gradient checks:
- **$\bar{A}$:** fix $b$, perturb $A$
- **$\bar{b}$:** fix $A$, perturb $b$
The full VJP is the sum of the solution-path and residual-summary-path contributions.

## References
## Verification policy

1. BackwardsLinalg.jl (GiggleLiu), `src/lstsq.jl`.
2. M. B. Giles, "An extended collection of matrix derivative results
for forward and reverse mode automatic differentiation," 2008.
- `frule/rrule` are the semantic source of truth
- oracle replay must exist before the rule is considered mainline
- `LinearizedOp::jvp/vjp` is expected to be a thin adapter over these rules
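The residual-summary JVP above (with `dr = dA·x - db`; the `A·dx` term drops because the optimality condition annihilates it, since `r` is orthogonal to the column space of `A`) can be checked numerically. A hedged, self-contained sketch for a small real problem, using hand-rolled normal equations rather than tenferro's `lstsq`:

```rust
// Finite-difference check of d residuals = 2 * sum(r ⊙ dr), dr = dA·x - db,
// for a real 3x2 least-squares problem solved via the normal equations.
// Everything here is hand-rolled for illustration.

fn matvec(a: &[[f64; 2]; 3], x: &[f64; 2]) -> [f64; 3] {
    let mut y = [0.0; 3];
    for i in 0..3 {
        for j in 0..2 {
            y[i] += a[i][j] * x[j];
        }
    }
    y
}

// Solve min ||A x - b|| via the 2x2 normal equations (A^T A) x = A^T b.
fn lstsq(a: &[[f64; 2]; 3], b: &[f64; 3]) -> [f64; 2] {
    let mut ata = [[0.0; 2]; 2];
    let mut atb = [0.0; 2];
    for i in 0..3 {
        for j in 0..2 {
            atb[j] += a[i][j] * b[i];
            for k in 0..2 {
                ata[j][k] += a[i][j] * a[i][k];
            }
        }
    }
    let det = ata[0][0] * ata[1][1] - ata[0][1] * ata[1][0];
    [
        (ata[1][1] * atb[0] - ata[0][1] * atb[1]) / det,
        (ata[0][0] * atb[1] - ata[1][0] * atb[0]) / det,
    ]
}

fn residual_sq(a: &[[f64; 2]; 3], b: &[f64; 3]) -> f64 {
    let x = lstsq(a, b);
    let ax = matvec(a, &x);
    (0..3).map(|i| (ax[i] - b[i]).powi(2)).sum()
}

fn main() {
    let a = [[1.0, 2.0], [3.0, 4.0], [5.0, 7.0]];
    let b = [1.0, -1.0, 2.0];
    let da = [[0.1, -0.2], [0.05, 0.3], [-0.1, 0.2]];
    let db = [0.2, -0.1, 0.3];

    // JVP prediction: 2 * sum(r ⊙ dr) with dr = dA·x - db
    // (the A·dx contribution vanishes because A^T r = 0 at the optimum).
    let x = lstsq(&a, &b);
    let ax = matvec(&a, &x);
    let dax = matvec(&da, &x);
    let predicted: f64 = (0..3)
        .map(|i| 2.0 * (ax[i] - b[i]) * (dax[i] - db[i]))
        .sum();

    // Central finite difference of residual_sq along the (dA, db) direction.
    let eps = 1e-6;
    let shift = |s: f64| {
        let mut ap = a;
        let mut bp = b;
        for i in 0..3 {
            bp[i] += s * db[i];
            for j in 0..2 {
                ap[i][j] += s * da[i][j];
            }
        }
        residual_sq(&ap, &bp)
    };
    let fd = (shift(eps) - shift(-eps)) / (2.0 * eps);
    println!("predicted = {:.8}, finite diff = {:.8}", predicted, fd);
    assert!((predicted - fd).abs() < 1e-5);
}
```

The same setup extends to the residual-summary VJP by pairing the cotangent formulas with the directional derivative and checking that the two inner products agree.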