Mooncake requires that if two CoDuals share the same primal object, they also share the same fdata object. This is the "aliasing invariant": aliased primals must have aliased fdatas, so that in-place mutations are tracked consistently.
Stated precisely, the aliasing invariant says: if primal(a) === primal(b) (same object), then fdata(a) === fdata(b) (same gradient storage). Utilities like stop_gradient deliberately break this: for y = stop_gradient(x), primal(y) === x but fdata(y) = _copy(fdata(x)).
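The invariant can be written as a small predicate. The sketch below is a toy model and does not use Mooncake itself: ToyCoDual, toy_primal, toy_fdata, and respects_aliasing are hypothetical names introduced only for illustration, and the plain copy stands in for whatever _copy does internally.

```julia
# Toy stand-in for a CoDual: a primal value paired with its gradient storage.
# Hypothetical type for illustration, not Mooncake's CoDual.
struct ToyCoDual{P,F}
    primal::P
    fdata::F
end

toy_primal(c::ToyCoDual) = c.primal
toy_fdata(c::ToyCoDual) = c.fdata

# The invariant only constrains pairs whose primals are the same object.
function respects_aliasing(a::ToyCoDual, b::ToyCoDual)
    toy_primal(a) === toy_primal(b) || return true
    return toy_fdata(a) === toy_fdata(b)
end

x = [1.0, 3.0]
dx = zeros(2)

a = ToyCoDual(x, dx)
shared = ToyCoDual(x, dx)          # same primal, same fdata
detached = ToyCoDual(x, copy(dx))  # same primal, copied fdata

ok_shared = respects_aliasing(a, shared)      # invariant holds
ok_detached = respects_aliasing(a, detached)  # invariant is violated
```

In this toy setting, the detached pair is the analogue of what stop_gradient produces: identical primal, distinct storage.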
The only scenario where this violation actually bites is when code keeps using both x and y = stop_gradient(x) after the call, with in-place mutations going through y:
function f(x)
    y = stop_gradient(x)  # y === x in primal, but fdata(y) ≠ fdata(x)
    y[1] = 2.0            # mutates x[1] (since y === x), but the tangent goes into fdata(y)
    return x[1] + x[2]    # reads through x's fdata, which is now desynchronised from fdata(y)
end
Mutations to y update fdata(y) (the copy), while reads from x use fdata(x). After the mutation, these two fdatas have diverged, potentially giving wrong gradients.
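The divergence can be reproduced in a deliberately simplified toy model. This is a sketch under stated assumptions, not Mooncake's implementation: ToyPair, toy_stop_gradient, toy_setindex!, and toy_getindex are hypothetical names, and fdata is treated here as a plain per-element derivative slot that a constant write resets to zero.

```julia
# Toy model only: a primal array paired with a per-element derivative slot.
# Hypothetical names; this is not Mooncake's CoDual, fdata, or stop_gradient.
struct ToyPair
    primal::Vector{Float64}
    fdata::Vector{Float64}
end

# Detach: keep the same primal object, but copy the derivative storage.
toy_stop_gradient(c::ToyPair) = ToyPair(c.primal, copy(c.fdata))

# Writing a constant stores the value and zeroes the derivative slot
# of the pair that the write went through.
function toy_setindex!(c::ToyPair, v::Float64, i::Int)
    c.primal[i] = v
    c.fdata[i] = 0.0
    return c
end

# Reading returns the value together with the derivative slot of the
# pair that the read went through.
toy_getindex(c::ToyPair, i::Int) = (c.primal[i], c.fdata[i])

x = ToyPair([1.0, 3.0], [1.0, 1.0])
y = toy_stop_gradient(x)

toy_setindex!(y, 2.0, 1)            # the mutation goes through y

value, deriv = toy_getindex(x, 1)   # the read goes through x
# value reflects the write because the primal is shared, but deriv does not:
# the zero was written to y's copy, so x's slot is stale.
```

Had x and y shared one fdata array, the read through x would have seen the zeroed slot, which is the consistency the aliasing invariant is meant to guarantee.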
But in practice, this scenario is extremely unlikely because:
- stop_gradient is meant to detach a value; users don't normally continue using both x and y = stop_gradient(x) on the same mutable object simultaneously
- The typical use case is stop_gradient(x) used in one branch, with x used in another, not mixed via in-place mutation
So it's more of a theoretical correctness concern about invariant violation than something that shows up in realistic code.