
Commit bebcdca

Merge pull request #12 from tensor4all/feature/math-notes-pytorch-audit-followup
docs: tighten mathematical AD notes
2 parents: aed534a + 63d7bf0

16 files changed: 115 additions & 212 deletions


docs/math/cholesky.md

Lines changed: 1 addition & 7 deletions
@@ -132,13 +132,7 @@ $$
 
 This is the adjoint of the JVP map and keeps $\bar{A}$ Hermitian.
 
-## Implementation Correspondence
-
-- `tenferro-rs/docs/AD/cholesky.md` uses the same $\varphi / \varphi^*$ pair to
-  express both JVP and VJP.
-- PyTorch's `cholesky_jvp` and `cholesky_backward` implement the same
-  triangular-solve sandwich rather than explicit inverses.
-- Never form $L^{-1}$ explicitly; use triangular solves on the left and right.
+Never form $L^{-1}$ explicitly; use triangular solves on the left and right.
 
 ## Verification
 
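A minimal PyTorch sketch of the triangular-solve sandwich, checked against autograd; `phi` and `cholesky_vjp` are illustrative names, not identifiers from the note:

```python
import torch

def phi(X):
    # Entrywise mask: keep the strict lower triangle, halve the diagonal,
    # drop the upper triangle. Self-adjoint, so it serves as both phi and phi*.
    return X.tril(-1) + 0.5 * torch.diag_embed(X.diagonal(dim1=-2, dim2=-1))

def cholesky_vjp(L, L_bar):
    # A_bar = L^{-H} phi(L^H L_bar) L^{-1}, Hermitian-symmetrized.
    # Both inverses are realized as triangular solves.
    W = phi(L.mH @ L_bar)
    T = torch.linalg.solve_triangular(L, W, upper=False, left=False)  # W L^{-1}
    A_bar = torch.linalg.solve_triangular(L.mH, T, upper=True)        # L^{-H} T
    return 0.5 * (A_bar + A_bar.mH)

A = torch.randn(5, 5, dtype=torch.float64)
A = A @ A.T + 5.0 * torch.eye(5, dtype=torch.float64)  # positive definite
A.requires_grad_(True)
L = torch.linalg.cholesky(A)
L_bar = torch.randn_like(L)
(L * L_bar).sum().backward()
print(torch.allclose(A.grad, cholesky_vjp(L.detach(), L_bar)))  # expect True
```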

docs/math/det.md

Lines changed: 10 additions & 16 deletions
@@ -96,7 +96,7 @@ $$
 - complex case:
 
 $$
-\bar{A} = \overline{\bar{d} \cdot \det(A)} \cdot A^{-\mathsf{H}}.
+\bar{A} = \bar{d} \cdot \overline{\det(A)} \cdot A^{-\mathsf{H}}.
 $$
 
 ## Singular matrix handling
@@ -107,9 +107,9 @@ still makes sense:
 - rank $N-1$: the adjugate is rank 1 and can be reconstructed from an SVD
 - rank $\le N-2$: the adjugate vanishes
 
-PyTorch's `linalg_det_backward` handles this regime by reconstructing the
-leave-one-out singular-value products together with the orientation/phase factor
-coming from $U$ and $V^{\mathsf{H}}$.
+The rank-$N-1$ adjugate can be reconstructed from the leave-one-out singular
+value products together with the orientation/phase factor carried by the
+singular vectors.
 
 ## 2. `slogdet`
 
@@ -129,34 +129,28 @@ $$
 
 ### Reverse Rule
 
-For the differentiable log-magnitude path:
+Given cotangents $\bar{s}$ for the sign output and $\bar{\ell}$ for the
+log-magnitude output:
 
 - real case:
 
 $$
-\bar{A} = \overline{\operatorname{logabsdet}} \cdot A^{-\mathsf{T}}
+\bar{A} = \bar{\ell} \cdot A^{-\mathsf{T}}
 $$
 
 - complex case:
 
 $$
 \bar{A} = g \cdot A^{-\mathsf{H}},
 \qquad
-g = \overline{\operatorname{logabsdet}}
-- i \operatorname{Im}(\overline{\operatorname{sign}}^* \operatorname{sign}).
+g = \bar{\ell} - i \operatorname{Im}(\bar{s}^* s),
 $$
 
+where $s = \operatorname{sign}(A)$.
+
 `slogdet` is not differentiable at singular matrices because
 $\operatorname{logabsdet} = -\infty$ there.
 
-## Implementation Correspondence
-
-- `tenferro-rs/docs/AD/det.md` keeps both `det` and `slogdet` in one note and
-  discusses the singular adjugate path explicitly.
-- PyTorch's `linalg_det_jvp`, `linalg_det_backward`, `slogdet_jvp`, and
-  `slogdet_backward` implement the same split and use solves rather than
-  explicit inverses.
-
 ## Verification
 
 - compare primal `det(A)` and `slogdet(A)` with direct evaluation
docs/math/eig.md

Lines changed: 4 additions & 15 deletions
@@ -150,8 +150,8 @@ $$
 
 ### Normalization correction
 
-PyTorch and `tenferro-rs` both normalize eigenvectors to unit norm. Therefore
-the raw tangent must be projected back onto that gauge:
+If eigenvectors are normalized to unit norm, the raw tangent must be projected
+back onto that gauge:
 
 $$
 \dot{V} =
@@ -205,26 +205,15 @@ $$
 \operatorname{Im}(\operatorname{diag}(V^\dagger \bar{V})) = 0.
 $$
 
-PyTorch's `linalg_eig_backward` checks this condition numerically and raises for
-ill-defined losses.
+Losses that violate this condition are ill-defined for derivatives through the
+eigenvector phase gauge.
 
 ## Relationship to the Hermitian Case
 
 When $A$ is Hermitian, $V$ is unitary, $V^{-1} = V^\dagger$, and eigenvalues are
 real. The formulas simplify to the structured rule documented in
 [`eigen.md`](./eigen.md).
 
-## Implementation Correspondence
-
-- `tenferro-rs/docs/AD/eig.md` uses the $V^{-1}\dot{A}V$ and
-  $V^{-\dagger} G V^\dagger$ formulation with an explicit normalization
-  correction.
-- PyTorch's `linalg_eig_jvp` and `linalg_eig_backward` implement the same rule.
-  Their comments explicitly note that the uncorrected textbook formulas are
-  missing the normalization term.
-- For real inputs with complex outputs, PyTorch applies the usual
-  `handle_r_to_c` projection back to the real cotangent domain.
-
 ## Verification
 
 ### Forward reconstruction
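
A minimal sketch of the eigenvalue part of the JVP, $\dot{\lambda} = \operatorname{diag}(V^{-1} \dot{A} V)$, against a forward difference:

```python
import torch

torch.manual_seed(0)
A = torch.randn(5, 5, dtype=torch.complex128)
E = torch.randn(5, 5, dtype=torch.complex128)  # tangent direction A_dot

lam, V = torch.linalg.eig(A)
# Eigenvalue JVP: lam_dot = diag(V^{-1} A_dot V), with V^{-1}(.) as a solve.
lam_dot = torch.diagonal(torch.linalg.solve(V, E @ V))

# Forward difference; perturbed eigenvalues are paired with the originals by
# nearest distance, since eig guarantees no particular ordering.
h = 1e-7
lam_h = torch.linalg.eigvals(A + h * E)
idx = (lam_h.unsqueeze(0) - lam.unsqueeze(1)).abs().argmin(dim=1)
print(torch.allclose((lam_h[idx] - lam) / h, lam_dot, atol=1e-5))
```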

docs/math/eigen.md

Lines changed: 0 additions & 9 deletions
@@ -168,15 +168,6 @@ $$
 
 with the understanding that the skew-Hermitian gauge is projected away.
 
-## Implementation Correspondence
-
-- `tenferro-rs/docs/AD/eigen.md` writes the reverse rule through the explicit
-  Hermitian inner matrix $D$; this note keeps that structure.
-- PyTorch does not have a separate Hermitian kernel. It calls
-  `linalg_eig_backward(..., is_hermitian=true)` and
-  `linalg_eig_jvp(..., is_hermitian=true)`, which reduce to the same formulas
-  with $V^{-1} = V^\dagger$.
-
 ## Verification
 
 ### Forward reconstruction
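
In the Hermitian case the eigenvalue JVP reduces to $\dot{\lambda} = \operatorname{diag}(V^\dagger \dot{A} V)$. A minimal real-symmetric sketch against a central difference:

```python
import torch

torch.manual_seed(1)
A = torch.randn(5, 5, dtype=torch.float64)
A = 0.5 * (A + A.T)  # real symmetric input
E = torch.randn(5, 5, dtype=torch.float64)
E = 0.5 * (E + E.T)  # symmetric tangent direction

lam, V = torch.linalg.eigh(A)
lam_dot = torch.diagonal(V.T @ E @ V)  # V^{-1} = V^T in the real symmetric case

# eigh sorts eigenvalues, so the ordering is stable under a small perturbation.
h = 1e-5
fd = (torch.linalg.eigvalsh(A + h * E) - torch.linalg.eigvalsh(A - h * E)) / (2 * h)
print(torch.allclose(fd, lam_dot, atol=1e-8))
```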

docs/math/index.md

Lines changed: 2 additions & 3 deletions
@@ -6,9 +6,8 @@
 - the machine-readable oracle database
 
 The mathematical notes under `docs/math/` are the human-facing source of truth
-for known AD rules in this repository. They are maintained to preserve the full
-derivation detail migrated from `tenferro-rs/docs/AD/` while adding explicit
-correspondence to PyTorch's manual autograd formulas where relevant.
+for known AD rules in this repository. They are maintained to preserve full
+derivation detail without collapsing the rules into implementation summaries.
 
 Standalone linalg operations are documented as one note per operation, while
 shared scalar and wrapper formulas are grouped where that keeps the corpus

docs/math/inv.md

Lines changed: 2 additions & 11 deletions
@@ -92,17 +92,8 @@ immediately recovers
 - JVP: $\dot{B} = -B\,\dot{A}\,B$
 - VJP: $\bar{A} = -B^{\mathsf{H}}\,\bar{B}\,B^{\mathsf{H}}$
 
-The same relationship is used in PyTorch and downstream libraries to avoid
-duplicating logic.
-
-## Implementation Correspondence
-
-- `tenferro-rs/docs/AD/inv.md` writes the inverse rule directly and then points
-  back to solve as the conceptual source.
-- PyTorch exposes the inverse derivative via solve-style formulas in
-  `derivatives.yaml` and related linear-solve kernels.
-- For higher-order AD, prefer `solve` over explicit multiplication by a cached
-  inverse.
+For higher-order AD, it is often more stable to treat the inverse as an
+implicit linear solve rather than as a primitive cached matrix product.
 
 ## Verification
 
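A minimal sketch of the solve-based VJP checked against autograd; the intermediate `X` is an illustrative name:

```python
import torch

torch.manual_seed(0)
A = torch.randn(4, 4, dtype=torch.float64, requires_grad=True)
B = torch.linalg.inv(A)
B_bar = torch.randn_like(B)
(B * B_bar).sum().backward()

# A_bar = -B^H B_bar B^H = -A^{-H} B_bar A^{-H}, written as two solves
# instead of multiplying by a cached inverse.
Ad = A.detach()
X = torch.linalg.solve(Ad.mH, B_bar)        # X = A^{-H} B_bar
A_bar = -torch.linalg.solve(Ad, X.mH).mH    # X A^{-H} = (A^{-1} X^H)^H
print(torch.allclose(A.grad, A_bar))
```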

docs/math/lstsq.md

Lines changed: 0 additions & 11 deletions
@@ -179,17 +179,6 @@ $$
 
 Since $A z = Q y$, the formulas for $\bar{b}$ and $\bar{A}$ follow.
 
-## Implementation Correspondence
-
-- `tenferro-rs/docs/AD/lstsq.md` uses the QR-based derivation above, which makes
-  the residual correction term explicit.
-- PyTorch's `linalg_lstsq_solution_jvp` and `linalg_lstsq_backward` currently
-  route the solution term through `pinv_jvp` / `pinv_backward`, while the
-  residual term is added directly. The resulting adjoint matches the same
-  least-squares geometry.
-- The residual JVP in PyTorch uses Danskin's theorem, treating the minimizer as
-  fixed when differentiating the residual objective itself.
-
 ## Verification
 
 ### Forward check
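
A minimal full-rank sketch relating the solution-term adjoint with respect to $b$ to the pseudoinverse, plus a check of the adjoint identity $\langle \bar{x}, \dot{x} \rangle = \langle \bar{b}, \dot{b} \rangle$; shapes and tolerances are illustrative:

```python
import torch

torch.manual_seed(0)
m, n = 8, 3  # tall, full column rank
A = torch.randn(m, n, dtype=torch.float64)
b = torch.randn(m, 1, dtype=torch.float64)

x = torch.linalg.lstsq(A, b).solution
x_bar = torch.randn_like(x)

# Solution-term adjoint w.r.t. b for full-rank tall A:
# x = (A^H A)^{-1} A^H b, hence b_bar = A (A^H A)^{-1} x_bar = A^{+H} x_bar.
b_bar = A @ torch.linalg.solve(A.mH @ A, x_bar)
print(torch.allclose(b_bar, torch.linalg.pinv(A).mH @ x_bar))

# Adjoint identity <x_bar, dx> = <b_bar, db>; the map b -> x is linear,
# so a finite difference in direction db is exact up to roundoff.
db = torch.randn_like(b)
h = 1e-7
dx = (torch.linalg.lstsq(A, b + h * db).solution - x) / h
print(torch.allclose((x_bar * dx).sum(), (b_bar * db).sum(), atol=1e-6))
```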

docs/math/lu.md

Lines changed: 3 additions & 9 deletions
@@ -272,15 +272,9 @@ P^T
 U^{-\dagger}.
 $$
 
-## Implementation Correspondence
-
-- `tenferro-rs/docs/AD/lu.md` writes the rule in exactly this block-structured
-  way, with separate square, wide, and tall cases.
-- PyTorch's `linalg_lu_backward` and `linalg_lu_jvp` implement the same three
-  cases using `tril(-1)` / `triu()` projections and triangular solves rather
-  than explicit inverses.
-- All $L^{-1} X$ and $X U^{-1}$ operations should be implemented as triangular
-  solves.
+All appearances of $L^{-1}X$, $XU^{-1}$, $L^{-\dagger}X$, and $XU^{-\dagger}$
+should be interpreted as triangular solves rather than as explicit inverse
+formation.
 
 ## Verification
 
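A minimal sketch of that interpretation using triangular solves (note the unit diagonal of the LU factor $L$):

```python
import torch

torch.manual_seed(0)
A = torch.randn(5, 5, dtype=torch.float64)
P, L, U = torch.linalg.lu(A)
X = torch.randn(5, 5, dtype=torch.float64)

# L^{-1} X as a left triangular solve (L from LU has a unit diagonal)
Y1 = torch.linalg.solve_triangular(L, X, upper=False, unitriangular=True)
# X U^{-1} as a right triangular solve
Y2 = torch.linalg.solve_triangular(U, X, upper=True, left=False)

print(torch.allclose(Y1, torch.linalg.inv(L) @ X))
print(torch.allclose(Y2, X @ torch.linalg.inv(U)))
```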

docs/math/matrix_exp.md

Lines changed: 0 additions & 12 deletions
@@ -112,9 +112,6 @@ f\!\begin{pmatrix} A & E \\ 0 & A \end{pmatrix}
 = \begin{pmatrix} f(A) & L_f(A, E) \\ 0 & f(A) \end{pmatrix}.
 $$
 
-PyTorch factors this pattern through the helper
-`differential_analytic_matrix_function`.
-
 ## Computational cost
 
 | Method | Cost relative to $\exp(A)$ |
@@ -123,15 +120,6 @@ PyTorch factors this pattern through the helper
 | Dedicated Fr\'echet scaling-and-squaring | about $3\times$ |
 | Eigendecomposition shortcut | cheaper on paper, but unstable for non-normal $A$ |
 
-## Implementation Correspondence
-
-- `tenferro-rs/docs/AD/matrix_exp.md` uses the block-exponential construction
-  as the main derivation.
-- PyTorch's `differential_analytic_matrix_function` and
-  `linalg_matrix_exp_differential` implement the same Mathias 1996 identity.
-- The block matrix approach is simple but more expensive than a dedicated
-  scaling-and-squaring Fr\'echet implementation.
-
 ## Verification
 
 - compare the block-matrix Fr\'echet derivative against finite differences
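
A minimal sketch of the block construction, compared against the finite-difference check suggested under Verification:

```python
import torch

torch.manual_seed(0)
n = 4
A = torch.randn(n, n, dtype=torch.float64)
E = torch.randn(n, n, dtype=torch.float64)

# exp([[A, E], [0, A]]) carries the Frechet derivative L_exp(A, E)
# in its top-right block.
M = torch.zeros(2 * n, 2 * n, dtype=torch.float64)
M[:n, :n] = A
M[:n, n:] = E
M[n:, n:] = A
L_AE = torch.linalg.matrix_exp(M)[:n, n:]

# central finite difference in the direction E
h = 1e-6
fd = (torch.linalg.matrix_exp(A + h * E) - torch.linalg.matrix_exp(A - h * E)) / (2 * h)
print(torch.allclose(L_AE, fd, atol=1e-6))
```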

docs/math/norm.md

Lines changed: 0 additions & 10 deletions
@@ -182,16 +182,6 @@ $$
 For multiplicity $k > 1$, the subgradient is the average over the active
 singular-vector dyads.
 
-## Implementation Correspondence
-
-- `tenferro-rs/docs/AD/norm.md` separates vector norms, Frobenius norm, nuclear
-  norm, and spectral norm explicitly. This note preserves that structure.
-- PyTorch's `norm_backward` and `norm_jvp` implement the scalar/vector $p$-norm
-  cases directly, including the tie-handling for $p = \infty$.
-- `linalg_vector_norm_backward` is a thin wrapper around the same formulas.
-- Matrix nuclear and spectral norms are implemented in PyTorch by decomposition
-  into SVD-derived primitives rather than a dedicated manual formula.
-
 ## Numerical Notes
 
 - Nonsmooth points, especially zero inputs and repeated top singular values,
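
For a simple top singular value, the spectral-norm gradient is the single dyad $u_1 v_1^{\mathsf{H}}$. A minimal sketch against autograd:

```python
import torch

torch.manual_seed(0)
A = torch.randn(5, 3, dtype=torch.float64, requires_grad=True)

# Spectral norm with a simple top singular value: the gradient is u_1 v_1^H.
torch.linalg.matrix_norm(A, ord=2).backward()

U, S, Vh = torch.linalg.svd(A.detach(), full_matrices=False)
print(torch.allclose(A.grad, torch.outer(U[:, 0], Vh[0, :])))
```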
