
Commit bebcdca

Merge pull request #12 from tensor4all/feature/math-notes-pytorch-audit-followup
docs: tighten mathematical AD notes
2 parents: aed534a + 63d7bf0

16 files changed: 115 additions & 212 deletions


docs/math/cholesky.md

Lines changed: 1 addition & 7 deletions
@@ -132,13 +132,7 @@ $$
 
 This is the adjoint of the JVP map and keeps $\bar{A}$ Hermitian.
 
-## Implementation Correspondence
-
-- `tenferro-rs/docs/AD/cholesky.md` uses the same $\varphi / \varphi^*$ pair to
-  express both JVP and VJP.
-- PyTorch's `cholesky_jvp` and `cholesky_backward` implement the same
-  triangular-solve sandwich rather than explicit inverses.
-- Never form $L^{-1}$ explicitly; use triangular solves on the left and right.
+Never form $L^{-1}$ explicitly; use triangular solves on the left and right.
 
 ## Verification
 
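A minimal PyTorch sketch of the triangular-solve sandwich, checked against autograd; `phi` and `cholesky_vjp` are illustrative names, not identifiers from the note:

```python
import torch

def phi(X):
    # Entrywise mask: keep the strict lower triangle, halve the diagonal,
    # drop the upper triangle. Self-adjoint, so it serves as both phi and phi*.
    return X.tril(-1) + 0.5 * torch.diag_embed(X.diagonal(dim1=-2, dim2=-1))

def cholesky_vjp(L, L_bar):
    # A_bar = L^{-H} phi(L^H L_bar) L^{-1}, Hermitian-symmetrized.
    # Both inverses are realized as triangular solves.
    W = phi(L.mH @ L_bar)
    T = torch.linalg.solve_triangular(L, W, upper=False, left=False)  # W L^{-1}
    A_bar = torch.linalg.solve_triangular(L.mH, T, upper=True)        # L^{-H} T
    return 0.5 * (A_bar + A_bar.mH)

A = torch.randn(5, 5, dtype=torch.float64)
A = A @ A.T + 5.0 * torch.eye(5, dtype=torch.float64)  # positive definite
A.requires_grad_(True)
L = torch.linalg.cholesky(A)
L_bar = torch.randn_like(L)
(L * L_bar).sum().backward()
print(torch.allclose(A.grad, cholesky_vjp(L.detach(), L_bar)))  # expect True
```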

docs/math/det.md

Lines changed: 10 additions & 16 deletions
@@ -96,7 +96,7 @@ $$
 - complex case:
 
 $$
-\bar{A} = \overline{\bar{d} \cdot \det(A)} \cdot A^{-\mathsf{H}}.
+\bar{A} = \bar{d} \cdot \overline{\det(A)} \cdot A^{-\mathsf{H}}.
 $$
 
 ## Singular matrix handling
@@ -107,9 +107,9 @@ still makes sense:
 - rank $N-1$: the adjugate is rank 1 and can be reconstructed from an SVD
 - rank $\le N-2$: the adjugate vanishes
 
-PyTorch's `linalg_det_backward` handles this regime by reconstructing the
-leave-one-out singular-value products together with the orientation/phase factor
-coming from $U$ and $V^{\mathsf{H}}$.
+The rank-$N-1$ adjugate can be reconstructed from the leave-one-out singular
+value products together with the orientation/phase factor carried by the
+singular vectors.
 
 ## 2. `slogdet`
 
@@ -129,34 +129,28 @@ $$
 
 ### Reverse Rule
 
-For the differentiable log-magnitude path:
+Given cotangents $\bar{s}$ for the sign output and $\bar{\ell}$ for the
+log-magnitude output:
 
 - real case:
 
 $$
-\bar{A} = \overline{\operatorname{logabsdet}} \cdot A^{-\mathsf{T}}
+\bar{A} = \bar{\ell} \cdot A^{-\mathsf{T}}
 $$
 
 - complex case:
 
 $$
 \bar{A} = g \cdot A^{-\mathsf{H}},
 \qquad
-g = \overline{\operatorname{logabsdet}}
-- i \operatorname{Im}(\overline{\operatorname{sign}}^* \operatorname{sign}).
+g = \bar{\ell} - i \operatorname{Im}(\bar{s}^* s),
 $$
 
+where $s = \operatorname{sign}(A)$.
+
 `slogdet` is not differentiable at singular matrices because
 $\operatorname{logabsdet} = -\infty$ there.
 
-## Implementation Correspondence
-
-- `tenferro-rs/docs/AD/det.md` keeps both `det` and `slogdet` in one note and
-  discusses the singular adjugate path explicitly.
-- PyTorch's `linalg_det_jvp`, `linalg_det_backward`, `slogdet_jvp`, and
-  `slogdet_backward` implement the same split and use solves rather than
-  explicit inverses.
-
 ## Verification
 
 - compare primal `det(A)` and `slogdet(A)` with direct evaluation
docs/math/eig.md

Lines changed: 4 additions & 15 deletions
@@ -150,8 +150,8 @@ $$
 
 ### Normalization correction
 
-PyTorch and `tenferro-rs` both normalize eigenvectors to unit norm. Therefore
-the raw tangent must be projected back onto that gauge:
+If eigenvectors are normalized to unit norm, the raw tangent must be projected
+back onto that gauge:
 
 $$
 \dot{V} =
@@ -205,26 +205,15 @@ $$
 \operatorname{Im}(\operatorname{diag}(V^\dagger \bar{V})) = 0.
 $$
 
-PyTorch's `linalg_eig_backward` checks this condition numerically and raises for
-ill-defined losses.
+Losses that violate this condition are ill-defined for derivatives through the
+eigenvector phase gauge.
 
 ## Relationship to the Hermitian Case
 
 When $A$ is Hermitian, $V$ is unitary, $V^{-1} = V^\dagger$, and eigenvalues are
 real. The formulas simplify to the structured rule documented in
 [`eigen.md`](./eigen.md).
 
-## Implementation Correspondence
-
-- `tenferro-rs/docs/AD/eig.md` uses the $V^{-1}\dot{A}V$ and
-  $V^{-\dagger} G V^\dagger$ formulation with an explicit normalization
-  correction.
-- PyTorch's `linalg_eig_jvp` and `linalg_eig_backward` implement the same rule.
-  Their comments explicitly note that the uncorrected textbook formulas are
-  missing the normalization term.
-- For real inputs with complex outputs, PyTorch applies the usual
-  `handle_r_to_c` projection back to the real cotangent domain.
-
 ## Verification
 
 ### Forward reconstruction
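
A minimal sketch of the eigenvalue part of the JVP, $\dot{\lambda} = \operatorname{diag}(V^{-1} \dot{A} V)$, against a forward difference:

```python
import torch

torch.manual_seed(0)
A = torch.randn(5, 5, dtype=torch.complex128)
E = torch.randn(5, 5, dtype=torch.complex128)  # tangent direction A_dot

lam, V = torch.linalg.eig(A)
# Eigenvalue JVP: lam_dot = diag(V^{-1} A_dot V), with V^{-1}(.) as a solve.
lam_dot = torch.diagonal(torch.linalg.solve(V, E @ V))

# Forward difference; perturbed eigenvalues are paired with the originals by
# nearest distance, since eig guarantees no particular ordering.
h = 1e-7
lam_h = torch.linalg.eigvals(A + h * E)
idx = (lam_h.unsqueeze(0) - lam.unsqueeze(1)).abs().argmin(dim=1)
print(torch.allclose((lam_h[idx] - lam) / h, lam_dot, atol=1e-5))
```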

docs/math/eigen.md

Lines changed: 0 additions & 9 deletions
@@ -168,15 +168,6 @@ $$
 
 with the understanding that the skew-Hermitian gauge is projected away.
 
-## Implementation Correspondence
-
-- `tenferro-rs/docs/AD/eigen.md` writes the reverse rule through the explicit
-  Hermitian inner matrix $D$; this note keeps that structure.
-- PyTorch does not have a separate Hermitian kernel. It calls
-  `linalg_eig_backward(..., is_hermitian=true)` and
-  `linalg_eig_jvp(..., is_hermitian=true)`, which reduce to the same formulas
-  with $V^{-1} = V^\dagger$.
-
 ## Verification
 
 ### Forward reconstruction
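
In the Hermitian case the eigenvalue JVP reduces to $\dot{\lambda} = \operatorname{diag}(V^\dagger \dot{A} V)$. A minimal real-symmetric sketch against a central difference:

```python
import torch

torch.manual_seed(1)
A = torch.randn(5, 5, dtype=torch.float64)
A = 0.5 * (A + A.T)  # real symmetric input
E = torch.randn(5, 5, dtype=torch.float64)
E = 0.5 * (E + E.T)  # symmetric tangent direction

lam, V = torch.linalg.eigh(A)
lam_dot = torch.diagonal(V.T @ E @ V)  # V^{-1} = V^T in the real symmetric case

# eigh sorts eigenvalues, so the ordering is stable under a small perturbation.
h = 1e-5
fd = (torch.linalg.eigvalsh(A + h * E) - torch.linalg.eigvalsh(A - h * E)) / (2 * h)
print(torch.allclose(fd, lam_dot, atol=1e-8))
```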

docs/math/index.md

Lines changed: 2 additions & 3 deletions
@@ -6,9 +6,8 @@
 - the machine-readable oracle database
 
 The mathematical notes under `docs/math/` are the human-facing source of truth
-for known AD rules in this repository. They are maintained to preserve the full
-derivation detail migrated from `tenferro-rs/docs/AD/` while adding explicit
-correspondence to PyTorch's manual autograd formulas where relevant.
+for known AD rules in this repository. They are maintained to preserve full
+derivation detail without collapsing the rules into implementation summaries.
 
 Standalone linalg operations are documented as one note per operation, while
 shared scalar and wrapper formulas are grouped where that keeps the corpus

docs/math/inv.md

Lines changed: 2 additions & 11 deletions
@@ -92,17 +92,8 @@ immediately recovers
 - JVP: $\dot{B} = -B\,\dot{A}\,B$
 - VJP: $\bar{A} = -B^{\mathsf{H}}\,\bar{B}\,B^{\mathsf{H}}$
 
-The same relationship is used in PyTorch and downstream libraries to avoid
-duplicating logic.
-
-## Implementation Correspondence
-
-- `tenferro-rs/docs/AD/inv.md` writes the inverse rule directly and then points
-  back to solve as the conceptual source.
-- PyTorch exposes the inverse derivative via solve-style formulas in
-  `derivatives.yaml` and related linear-solve kernels.
-- For higher-order AD, prefer `solve` over explicit multiplication by a cached
-  inverse.
+For higher-order AD, it is often more stable to treat the inverse as an
+implicit linear solve rather than as a primitive cached matrix product.
 
 ## Verification
 
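A minimal sketch of the solve-based VJP checked against autograd; the intermediate `X` is an illustrative name:

```python
import torch

torch.manual_seed(0)
A = torch.randn(4, 4, dtype=torch.float64, requires_grad=True)
B = torch.linalg.inv(A)
B_bar = torch.randn_like(B)
(B * B_bar).sum().backward()

# A_bar = -B^H B_bar B^H = -A^{-H} B_bar A^{-H}, written as two solves
# instead of multiplying by a cached inverse.
Ad = A.detach()
X = torch.linalg.solve(Ad.mH, B_bar)        # X = A^{-H} B_bar
A_bar = -torch.linalg.solve(Ad, X.mH).mH    # X A^{-H} = (A^{-1} X^H)^H
print(torch.allclose(A.grad, A_bar))
```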

docs/math/lstsq.md

Lines changed: 0 additions & 11 deletions
@@ -179,17 +179,6 @@ $$
 
 Since $A z = Q y$, the formulas for $\bar{b}$ and $\bar{A}$ follow.
 
-## Implementation Correspondence
-
-- `tenferro-rs/docs/AD/lstsq.md` uses the QR-based derivation above, which makes
-  the residual correction term explicit.
-- PyTorch's `linalg_lstsq_solution_jvp` and `linalg_lstsq_backward` currently
-  route the solution term through `pinv_jvp` / `pinv_backward`, while the
-  residual term is added directly. The resulting adjoint matches the same
-  least-squares geometry.
-- The residual JVP in PyTorch uses Danskin's theorem, treating the minimizer as
-  fixed when differentiating the residual objective itself.
-
 ## Verification
 
 ### Forward check
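
A minimal full-rank sketch relating the solution-term adjoint with respect to $b$ to the pseudoinverse, plus a check of the adjoint identity $\langle \bar{x}, \dot{x} \rangle = \langle \bar{b}, \dot{b} \rangle$; shapes and tolerances are illustrative:

```python
import torch

torch.manual_seed(0)
m, n = 8, 3  # tall, full column rank
A = torch.randn(m, n, dtype=torch.float64)
b = torch.randn(m, 1, dtype=torch.float64)

x = torch.linalg.lstsq(A, b).solution
x_bar = torch.randn_like(x)

# Solution-term adjoint w.r.t. b for full-rank tall A:
# x = (A^H A)^{-1} A^H b, hence b_bar = A (A^H A)^{-1} x_bar = A^{+H} x_bar.
b_bar = A @ torch.linalg.solve(A.mH @ A, x_bar)
print(torch.allclose(b_bar, torch.linalg.pinv(A).mH @ x_bar))

# Adjoint identity <x_bar, dx> = <b_bar, db>; the map b -> x is linear,
# so a finite difference in direction db is exact up to roundoff.
db = torch.randn_like(b)
h = 1e-7
dx = (torch.linalg.lstsq(A, b + h * db).solution - x) / h
print(torch.allclose((x_bar * dx).sum(), (b_bar * db).sum(), atol=1e-6))
```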

docs/math/lu.md

Lines changed: 3 additions & 9 deletions
@@ -272,15 +272,9 @@ P^T
 U^{-\dagger}.
 $$
 
-## Implementation Correspondence
-
-- `tenferro-rs/docs/AD/lu.md` writes the rule in exactly this block-structured
-  way, with separate square, wide, and tall cases.
-- PyTorch's `linalg_lu_backward` and `linalg_lu_jvp` implement the same three
-  cases using `tril(-1)` / `triu()` projections and triangular solves rather
-  than explicit inverses.
-- All $L^{-1} X$ and $X U^{-1}$ operations should be implemented as triangular
-  solves.
+All appearances of $L^{-1}X$, $XU^{-1}$, $L^{-\dagger}X$, and $XU^{-\dagger}$
+should be interpreted as triangular solves rather than as explicit inverse
+formation.
 
 ## Verification
 
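A minimal sketch of that interpretation using triangular solves (note the unit diagonal of the LU factor $L$):

```python
import torch

torch.manual_seed(0)
A = torch.randn(5, 5, dtype=torch.float64)
P, L, U = torch.linalg.lu(A)
X = torch.randn(5, 5, dtype=torch.float64)

# L^{-1} X as a left triangular solve (L from LU has a unit diagonal)
Y1 = torch.linalg.solve_triangular(L, X, upper=False, unitriangular=True)
# X U^{-1} as a right triangular solve
Y2 = torch.linalg.solve_triangular(U, X, upper=True, left=False)

print(torch.allclose(Y1, torch.linalg.inv(L) @ X))
print(torch.allclose(Y2, X @ torch.linalg.inv(U)))
```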

docs/math/matrix_exp.md

Lines changed: 0 additions & 12 deletions
@@ -112,9 +112,6 @@ f\!\begin{pmatrix} A & E \\ 0 & A \end{pmatrix}
 = \begin{pmatrix} f(A) & L_f(A, E) \\ 0 & f(A) \end{pmatrix}.
 $$
 
-PyTorch factors this pattern through the helper
-`differential_analytic_matrix_function`.
-
 ## Computational cost
 
 | Method | Cost relative to $\exp(A)$ |
@@ -123,15 +120,6 @@ PyTorch factors this pattern through the helper
 | Dedicated Fr\'echet scaling-and-squaring | about $3\times$ |
 | Eigendecomposition shortcut | cheaper on paper, but unstable for non-normal $A$ |
 
-## Implementation Correspondence
-
-- `tenferro-rs/docs/AD/matrix_exp.md` uses the block-exponential construction
-  as the main derivation.
-- PyTorch's `differential_analytic_matrix_function` and
-  `linalg_matrix_exp_differential` implement the same Mathias 1996 identity.
-- The block matrix approach is simple but more expensive than a dedicated
-  scaling-and-squaring Fr\'echet implementation.
-
 ## Verification
 
 - compare the block-matrix Fr\'echet derivative against finite differences
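
A minimal sketch of the block construction, compared against the finite-difference check suggested under Verification:

```python
import torch

torch.manual_seed(0)
n = 4
A = torch.randn(n, n, dtype=torch.float64)
E = torch.randn(n, n, dtype=torch.float64)

# exp([[A, E], [0, A]]) carries the Frechet derivative L_exp(A, E)
# in its top-right block.
M = torch.zeros(2 * n, 2 * n, dtype=torch.float64)
M[:n, :n] = A
M[:n, n:] = E
M[n:, n:] = A
L_AE = torch.linalg.matrix_exp(M)[:n, n:]

# central finite difference in the direction E
h = 1e-6
fd = (torch.linalg.matrix_exp(A + h * E) - torch.linalg.matrix_exp(A - h * E)) / (2 * h)
print(torch.allclose(L_AE, fd, atol=1e-6))
```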

docs/math/norm.md

Lines changed: 0 additions & 10 deletions
@@ -182,16 +182,6 @@ $$
 For multiplicity $k > 1$, the subgradient is the average over the active
 singular-vector dyads.
 
-## Implementation Correspondence
-
-- `tenferro-rs/docs/AD/norm.md` separates vector norms, Frobenius norm, nuclear
-  norm, and spectral norm explicitly. This note preserves that structure.
-- PyTorch's `norm_backward` and `norm_jvp` implement the scalar/vector $p$-norm
-  cases directly, including the tie-handling for $p = \infty$.
-- `linalg_vector_norm_backward` is a thin wrapper around the same formulas.
-- Matrix nuclear and spectral norms are implemented in PyTorch by decomposition
-  into SVD-derived primitives rather than a dedicated manual formula.
-
 ## Numerical Notes
 
 - Nonsmooth points, especially zero inputs and repeated top singular values,
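
For a simple top singular value, the spectral-norm gradient is the single dyad $u_1 v_1^{\mathsf{H}}$. A minimal sketch against autograd:

```python
import torch

torch.manual_seed(0)
A = torch.randn(5, 3, dtype=torch.float64, requires_grad=True)

# Spectral norm with a simple top singular value: the gradient is u_1 v_1^H.
torch.linalg.matrix_norm(A, ord=2).backward()

U, S, Vh = torch.linalg.svd(A.detach(), full_matrices=False)
print(torch.allclose(A.grad, torch.outer(U[:, 0], Vh[0, :])))
```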
