Merged
8 changes: 1 addition & 7 deletions docs/math/cholesky.md
Original file line number Diff line number Diff line change
@@ -132,13 +132,7 @@ $$

This is the adjoint of the JVP map and keeps $\bar{A}$ Hermitian.

## Implementation Correspondence

- `tenferro-rs/docs/AD/cholesky.md` uses the same $\varphi / \varphi^*$ pair to
express both JVP and VJP.
- PyTorch's `cholesky_jvp` and `cholesky_backward` implement the same
triangular-solve sandwich rather than explicit inverses.
- Never form $L^{-1}$ explicitly; use triangular solves on the left and right.
Never form $L^{-1}$ explicitly; use triangular solves on the left and right.
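
As a concrete sketch of that advice (illustrative only, assuming NumPy/SciPy; `chol_vjp` is a hypothetical name, and $\Phi$ is the usual lower-triangle-with-halved-diagonal projection):

```python
import numpy as np
from scipy.linalg import solve_triangular

def chol_vjp(L, L_bar):
    """VJP of A -> cholesky(A) using only triangular solves.

    Sketch under the convention A = L L^H with L lower triangular;
    bar{A} comes out Hermitian, and L^{-1} is never formed explicitly.
    """
    # Phi: keep the lower triangle, halve the diagonal
    P = np.tril(L.conj().T @ L_bar)
    P -= 0.5 * np.diag(np.diag(P))
    S = 0.5 * (P + P.conj().T)  # Hermitian part
    # bar{A} = L^{-H} S L^{-1} via two triangular solves
    T = solve_triangular(L, S, lower=True, trans='C')  # L^{-H} S
    return solve_triangular(L, T.conj().T, lower=True, trans='C').conj().T
```

The scalar case recovers $\partial\sqrt{a}/\partial a = 1/(2\sqrt{a})$.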

## Verification

26 changes: 10 additions & 16 deletions docs/math/det.md
@@ -96,7 +96,7 @@ $$
- complex case:

$$
\bar{A} = \overline{\bar{d} \cdot \det(A)} \cdot A^{-\mathsf{H}}.
\bar{A} = \bar{d} \cdot \overline{\det(A)} \cdot A^{-\mathsf{H}}.
$$
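
A minimal numerical sketch of the complex reverse rule, assuming NumPy and the hypothetical name `det_vjp` (the explicit inverse is for clarity only; a solve is preferable in practice):

```python
import numpy as np

def det_vjp(A, d_bar):
    # Complex reverse rule: bar{A} = d_bar * conj(det(A)) * A^{-H}
    d = np.linalg.det(A)
    return d_bar * np.conj(d) * np.linalg.inv(A).conj().T
```

For real inputs this reduces to $\bar{d}\,\det(A)\,A^{-\mathsf{T}}$, the transposed adjugate scaled by $\bar{d}$.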

## Singular matrix handling
@@ -107,9 +107,9 @@ still makes sense:
- rank $N-1$: the adjugate is rank 1 and can be reconstructed from an SVD
- rank $\le N-2$: the adjugate vanishes

PyTorch's `linalg_det_backward` handles this regime by reconstructing the
leave-one-out singular-value products together with the orientation/phase factor
coming from $U$ and $V^{\mathsf{H}}$.
The rank-$N-1$ adjugate can be reconstructed from the leave-one-out singular
value products together with the orientation/phase factor carried by the
singular vectors.
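
This reconstruction can be sketched as follows (illustrative; `adjugate_via_svd` is a hypothetical helper, not an implementation in this repository):

```python
import numpy as np

def adjugate_via_svd(A):
    # adj(A) = det(U) det(Vh) * V diag(prod_{j != i} s_j) U^H,
    # well defined even when A is singular (the rank N-1 case)
    U, s, Vh = np.linalg.svd(A)
    loo = np.array([np.prod(np.delete(s, i)) for i in range(len(s))])
    phase = np.linalg.det(U) * np.linalg.det(Vh)
    return phase * (Vh.conj().T * loo) @ U.conj().T  # V @ diag(loo) @ U^H
```

For invertible $A$ this agrees with $\det(A)\,A^{-1}$; for rank $N-1$ only one leave-one-out product survives, giving the rank-1 adjugate.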

## 2. `slogdet`

@@ -129,34 +129,28 @@

### Reverse Rule

For the differentiable log-magnitude path:
Given cotangents $\bar{s}$ for the sign output and $\bar{\ell}$ for the
log-magnitude output:

- real case:

$$
\bar{A} = \overline{\operatorname{logabsdet}} \cdot A^{-\mathsf{T}}
\bar{A} = \bar{\ell} \cdot A^{-\mathsf{T}}
$$

- complex case:

$$
\bar{A} = g \cdot A^{-\mathsf{H}},
\qquad
g = \overline{\operatorname{logabsdet}}
- i \operatorname{Im}(\overline{\operatorname{sign}}^* \operatorname{sign}).
g = \bar{\ell} - i \operatorname{Im}(\bar{s}^* s),
$$

where $s$ is the sign output of `slogdet`, i.e. $s = \det(A)/\lvert\det(A)\rvert$.

`slogdet` is not differentiable at singular matrices because
$\operatorname{logabsdet} = -\infty$ there.
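
A sketch of the combined reverse rule, assuming NumPy and the hypothetical name `slogdet_vjp` (explicit inverse for clarity; use a solve in practice):

```python
import numpy as np

def slogdet_vjp(A, s_bar, ell_bar):
    # g = ell_bar - i * Im(conj(s_bar) * s);  bar{A} = g * A^{-H}
    s, _ = np.linalg.slogdet(A)
    g = ell_bar - 1j * np.imag(np.conj(s_bar) * s)
    return g * np.linalg.inv(A).conj().T
```

With $\bar{s} = 0$ and $\bar{\ell} = 1$ this recovers the familiar gradient of $\log\lvert\det A\rvert$, namely $A^{-\mathsf{T}}$ in the real case.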

## Implementation Correspondence

- `tenferro-rs/docs/AD/det.md` keeps both `det` and `slogdet` in one note and
discusses the singular adjugate path explicitly.
- PyTorch's `linalg_det_jvp`, `linalg_det_backward`, `slogdet_jvp`, and
`slogdet_backward` implement the same split and use solves rather than
explicit inverses.

## Verification

- compare primal `det(A)` and `slogdet(A)` with direct evaluation
19 changes: 4 additions & 15 deletions docs/math/eig.md
@@ -150,8 +150,8 @@ $$

### Normalization correction

PyTorch and `tenferro-rs` both normalize eigenvectors to unit norm. Therefore
the raw tangent must be projected back onto that gauge:
If eigenvectors are normalized to unit norm, the raw tangent must be projected
back onto that gauge:

$$
\dot{V} =
@@ -205,26 +205,15 @@ $$
\operatorname{Im}(\operatorname{diag}(V^\dagger \bar{V})) = 0.
$$

PyTorch's `linalg_eig_backward` checks this condition numerically and raises for
ill-defined losses.
Losses that violate this condition are ill-defined for derivatives through the
eigenvector phase gauge.

## Relationship to the Hermitian Case

When $A$ is Hermitian, $V$ is unitary, $V^{-1} = V^\dagger$, and eigenvalues are
real. The formulas simplify to the structured rule documented in
[`eigen.md`](./eigen.md).

## Implementation Correspondence

- `tenferro-rs/docs/AD/eig.md` uses the $V^{-1}\dot{A}V$ and
$V^{-\dagger} G V^\dagger$ formulation with an explicit normalization
correction.
- PyTorch's `linalg_eig_jvp` and `linalg_eig_backward` implement the same rule.
Their comments explicitly note that the uncorrected textbook formulas are
missing the normalization term.
- For real inputs with complex outputs, PyTorch applies the usual
`handle_r_to_c` projection back to the real cotangent domain.

## Verification

### Forward reconstruction
9 changes: 0 additions & 9 deletions docs/math/eigen.md
@@ -168,15 +168,6 @@ $$

with the understanding that the skew-Hermitian gauge is projected away.

## Implementation Correspondence

- `tenferro-rs/docs/AD/eigen.md` writes the reverse rule through the explicit
Hermitian inner matrix $D$; this note keeps that structure.
- PyTorch does not have a separate Hermitian kernel. It calls
`linalg_eig_backward(..., is_hermitian=true)` and
`linalg_eig_jvp(..., is_hermitian=true)`, which reduce to the same formulas
with $V^{-1} = V^\dagger$.

## Verification

### Forward reconstruction
5 changes: 2 additions & 3 deletions docs/math/index.md
@@ -6,9 +6,8 @@
- the machine-readable oracle database

The mathematical notes under `docs/math/` are the human-facing source of truth
for known AD rules in this repository. They are maintained to preserve the full
derivation detail migrated from `tenferro-rs/docs/AD/` while adding explicit
correspondence to PyTorch's manual autograd formulas where relevant.
for known AD rules in this repository. They are maintained to preserve full
derivation detail without collapsing the rules into implementation summaries.

Standalone linalg operations are documented as one note per operation, while
shared scalar and wrapper formulas are grouped where that keeps the corpus
13 changes: 2 additions & 11 deletions docs/math/inv.md
@@ -92,17 +92,8 @@ immediately recovers
- JVP: $\dot{B} = -B\,\dot{A}\,B$
- VJP: $\bar{A} = -B^{\mathsf{H}}\,\bar{B}\,B^{\mathsf{H}}$

The same relationship is used in PyTorch and downstream libraries to avoid
duplicating logic.

## Implementation Correspondence

- `tenferro-rs/docs/AD/inv.md` writes the inverse rule directly and then points
back to solve as the conceptual source.
- PyTorch exposes the inverse derivative via solve-style formulas in
`derivatives.yaml` and related linear-solve kernels.
- For higher-order AD, prefer `solve` over explicit multiplication by a cached
inverse.
For higher-order AD, it is often more stable to treat the inverse as an
implicit linear solve rather than as a primitive cached matrix product.
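
A sketch of the JVP realized through solves rather than a cached inverse (assuming NumPy; `inv_jvp` is a hypothetical name):

```python
import numpy as np

def inv_jvp(A, A_dot):
    # dot{B} = -B A_dot B with B = A^{-1}, realized as two linear solves
    left = np.linalg.solve(A, A_dot)        # B @ A_dot
    return -np.linalg.solve(A.T, left.T).T  # -(B @ A_dot) @ B
```

Right-multiplication by $B$ is expressed as a transposed solve, so no inverse is ever materialized.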

## Verification

11 changes: 0 additions & 11 deletions docs/math/lstsq.md
@@ -179,17 +179,6 @@ $$

Since $A z = Q y$, the formulas for $\bar{b}$ and $\bar{A}$ follow.

## Implementation Correspondence

- `tenferro-rs/docs/AD/lstsq.md` uses the QR-based derivation above, which makes
the residual correction term explicit.
- PyTorch's `linalg_lstsq_solution_jvp` and `linalg_lstsq_backward` currently
route the solution term through `pinv_jvp` / `pinv_backward`, while the
residual term is added directly. The resulting adjoint matches the same
least-squares geometry.
- The residual JVP in PyTorch uses Danskin's theorem, treating the minimizer as
fixed when differentiating the residual objective itself.

## Verification

### Forward check
12 changes: 3 additions & 9 deletions docs/math/lu.md
@@ -272,15 +272,9 @@ P^T
U^{-\dagger}.
$$

## Implementation Correspondence

- `tenferro-rs/docs/AD/lu.md` writes the rule in exactly this block-structured
way, with separate square, wide, and tall cases.
- PyTorch's `linalg_lu_backward` and `linalg_lu_jvp` implement the same three
cases using `tril(-1)` / `triu()` projections and triangular solves rather
than explicit inverses.
- All $L^{-1} X$ and $X U^{-1}$ operations should be implemented as triangular
solves.
All appearances of $L^{-1}X$, $XU^{-1}$, $L^{-\dagger}X$, and $XU^{-\dagger}$
should be interpreted as triangular solves rather than as explicit inverse
formation.
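
For illustration, the two solve patterns can be written with SciPy's `solve_triangular` (the helper names are hypothetical):

```python
import numpy as np
from scipy.linalg import solve_triangular

def left_solve_lower(L, X):
    # L^{-1} X without forming inv(L)
    return solve_triangular(L, X, lower=True)

def right_solve_upper(X, U):
    # X U^{-1} == (U^{-H} X^H)^H, again via a triangular solve
    return solve_triangular(U, X.conj().T, lower=False, trans='C').conj().T
```

The dagger variants follow by passing `trans='C'` on the other side in the same way.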

## Verification

12 changes: 0 additions & 12 deletions docs/math/matrix_exp.md
@@ -112,9 +112,6 @@ f\!\begin{pmatrix} A & E \\ 0 & A \end{pmatrix}
= \begin{pmatrix} f(A) & L_f(A, E) \\ 0 & f(A) \end{pmatrix}.
$$
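
The block identity can be exercised directly for $f = \exp$ (a sketch assuming SciPy; `expm_frechet_block` is a hypothetical name):

```python
import numpy as np
from scipy.linalg import expm

def expm_frechet_block(A, E):
    # L_exp(A, E) read off as the top-right block of exp([[A, E], [0, A]])
    n = A.shape[0]
    M = np.block([[A, E], [np.zeros_like(A), A]])
    return expm(M)[:n, n:]
```

This doubles the matrix dimension, which is why a dedicated Fréchet scaling-and-squaring routine is cheaper in practice.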

PyTorch factors this pattern through the helper
`differential_analytic_matrix_function`.

## Computational cost

| Method | Cost relative to $\exp(A)$ |
@@ -123,15 +120,6 @@ PyTorch factors this pattern through the helper
| Dedicated Fréchet scaling-and-squaring | about $3\times$ |
| Eigendecomposition shortcut | cheaper on paper, but unstable for non-normal $A$ |

## Implementation Correspondence

- `tenferro-rs/docs/AD/matrix_exp.md` uses the block-exponential construction
as the main derivation.
- PyTorch's `differential_analytic_matrix_function` and
`linalg_matrix_exp_differential` implement the same Mathias 1996 identity.
- The block matrix approach is simple but more expensive than a dedicated
scaling-and-squaring Fr\'echet implementation.

## Verification

- compare the block-matrix Fréchet derivative against finite differences
10 changes: 0 additions & 10 deletions docs/math/norm.md
@@ -182,16 +182,6 @@ $$
For multiplicity $k > 1$, the subgradient is the average over the active
singular-vector dyads.
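
A sketch of that averaged subgradient for the spectral norm (assuming NumPy; `spectral_norm_subgrad` is a hypothetical helper, and the tie tolerance is an arbitrary choice):

```python
import numpy as np

def spectral_norm_subgrad(A, tol=1e-9):
    # Subgradient of A -> ||A||_2: average the dyads u_i v_i^H over the
    # k singular values tied with the largest one
    U, s, Vh = np.linalg.svd(A)
    k = int(np.count_nonzero(s >= s[0] - tol))
    return sum(np.outer(U[:, i], Vh[i, :].conj()) for i in range(k)) / k
```

With a unique top singular value this is the usual $u_1 v_1^{\mathsf{H}}$ dyad; at a tie it is the uniform average over the active dyads.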

## Implementation Correspondence

- `tenferro-rs/docs/AD/norm.md` separates vector norms, Frobenius norm, nuclear
norm, and spectral norm explicitly. This note preserves that structure.
- PyTorch's `norm_backward` and `norm_jvp` implement the scalar/vector $p$-norm
cases directly, including the tie-handling for $p = \infty$.
- `linalg_vector_norm_backward` is a thin wrapper around the same formulas.
- Matrix nuclear and spectral norms are implemented in PyTorch by decomposition
into SVD-derived primitives rather than a dedicated manual formula.

## Numerical Notes

- Nonsmooth points, especially zero inputs and repeated top singular values,
11 changes: 2 additions & 9 deletions docs/math/pinv.md
@@ -109,15 +109,8 @@ $$

This is the adjoint counterpart of the same three-term structure.

## Implementation Correspondence

- `tenferro-rs/docs/AD/pinv.md` follows the classical Golub-Pereyra formulas and
makes the projector interpretation explicit.
- PyTorch's `pinv_jvp` and `pinv_backward` implement algebraically equivalent
forms but branch on $M \leq N$ versus $M > N$ to reduce intermediate matrix
sizes.
- The `atol` / `rtol` thresholding used to define the primal pseudoinverse is
treated as fixed metadata, not as a differentiable branch.
The `atol` / `rtol` thresholding used to define the primal pseudoinverse is
treated as fixed metadata, not as a differentiable branch.

## Verification

27 changes: 6 additions & 21 deletions docs/math/qr.md
@@ -1,8 +1,7 @@
# QR AD Notes

This note covers the reduced QR rule that is materialized in the DB and keeps
the transpose-dual LQ formulas from `tenferro-rs/docs/AD/qr.md` so that no
derivation detail is lost in the migration.
This note covers the reduced QR rule materialized in the DB together with the
transpose-dual LQ formulas.

## Conventions

@@ -127,8 +126,7 @@ $$
\end{cases}
$$

This is the adjoint helper appearing in PyTorch's `linalg_qr_backward` for the
$M < N$ case.
This is the adjoint helper for the $M < N$ case.

## Reverse Rule

@@ -163,8 +161,7 @@ $$
\bar{A} = B R^{-\dagger}.
$$

Implementation-wise this is a right solve with $R^\dagger$. PyTorch expresses
the same step as
This is a right solve with $R^\dagger$. An equivalent form is

$$
\bar{A} =
@@ -214,7 +211,7 @@ $$
\bar{A} = \pi^\*(\bar{A}_{\mathrm{lead}}) + Q \bar{R}.
$$

PyTorch's `linalg_qr_backward` implements the same case as
Equivalently,

$$
\bar{A} = Q \bar{R} + \pi^\*\!\left(
@@ -224,8 +221,6 @@ $$

## Forward Rule

PyTorch's `linalg_qr_jvp` uses the same case split.

### Case $M \geq N$

Define $\operatorname{sym}(X) = X + X^\dagger$ and
@@ -282,8 +277,7 @@ $$

## LQ Reverse Rule

The transpose-dual LQ formulas are retained here because the original
`tenferro-rs` note grouped QR and LQ together.
The transpose-dual LQ formulas are included for completeness.

### LQ Forward Definition

@@ -390,15 +384,6 @@ $$

with random Hermitian operators independent of $A$.

## Implementation Correspondence

- `tenferro-rs/docs/AD/qr.md` writes the rule in terms of `copyltu`,
`trilImInvAdjSkew`, and the QR/LQ duality. This note keeps those helpers.
- PyTorch's `linalg_qr_backward` uses the same two reduced-QR cases:
full-rank via `syminvadj(... ) R^{-H}` and wide reduced QR via the
`pi*`-embedded `trilImInvAdjSkew` formula.
- PyTorch's `linalg_qr_jvp` mirrors the same case split in forward mode.

## References

1. M. Seeger, A. Hetzel, Z. Dai, E. Meissner, N. D. Lawrence,