aliasing invariance for primals and tangents

Mooncake requires that if two CoDuals share the same primal object, they also share the same fdata object. This is the "aliasing invariant": aliased primals must have aliased fdatas, so that in-place mutations are tracked consistently.

Stated precisely: if `primal(a) === primal(b)` (same object), then `fdata(a) === fdata(b)` (same gradient storage). Utilities like `stop_gradient` deliberately break this: `primal(y) === x` but `fdata(y) = _copy(fdata(x))`.
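To make the condition concrete, here is a minimal sketch that does not use Mooncake at all. `ToyCoDual` and `satisfies_aliasing` are hypothetical names invented for illustration; they only model "a primal paired with gradient storage" and are not Mooncake's actual types or API.

```julia
# Toy stand-in for a CoDual: a primal paired with its gradient storage.
# This is NOT Mooncake's type, just a model of the idea.
struct ToyCoDual{P,F}
    primal::P
    fdata::F
end

# Hypothetical check: aliased primals must imply aliased fdata.
satisfies_aliasing(a::ToyCoDual, b::ToyCoDual) =
    a.primal !== b.primal || a.fdata === b.fdata

x  = [1.0, 2.0]
dx = zeros(2)

a = ToyCoDual(x, dx)
b = ToyCoDual(x, dx)         # same primal, same fdata: invariant holds
c = ToyCoDual(x, copy(dx))   # same primal, copied fdata: invariant broken

satisfies_aliasing(a, b)     # true
satisfies_aliasing(a, c)     # false
```

The pair `(a, c)` mirrors the situation described for `stop_gradient`: the primal is the same object, but the gradient storage is an independent copy.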
                                                                                                                                                              
The only scenario where this actually bites is when both `x` and `y = stop_gradient(x)` are used on the same mutable object after the call, with in-place mutations going through `y`:

```julia
function f(x)
    y = stop_gradient(x)   # y === x as primals, but fdata(y) !== fdata(x)
    y[1] = 2.0             # mutates x[1] (since y === x), but the tangent update goes into fdata(y)
    return x[1] + x[2]     # reads through fdata(x), which is now desynchronised from fdata(y)
end
```
                                                                                                                                                             
Mutations to `y` update `fdata(y)` (the copy), while reads from `x` use `fdata(x)`. After the mutation, these two `fdata`s have diverged, potentially giving wrong gradients.
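The divergence can be illustrated with a hand-written toy reverse pass for `f`. This is a sketch under simplifying assumptions, not Mooncake's actual rules: `toy_gradient` is a hypothetical function, `dx` plays the role of `fdata(x)`, and `dy` plays the role of `fdata(y)`.

```julia
# Toy reverse pass for `f`; a sketch only, not Mooncake's actual rules.
function toy_gradient(; aliased::Bool)
    x  = [1.0, 3.0]
    dx = zeros(2)                   # gradient storage standing in for fdata(x)
    dy = aliased ? dx : copy(dx)    # shared storage vs. an independent copy

    # Forward pass: `y[1] = 2.0` overwrites x[1], because y and x are the same array.
    x[1] = 2.0

    # Reverse pass, step 1: `x[1] + x[2]` was read through x, so accumulate into dx.
    dx[1] += 1.0
    dx[2] += 1.0

    # Reverse pass, step 2: undo `y[1] = 2.0`. The output no longer depends on
    # the original x[1], so that gradient slot is cleared, but through dy.
    dy[1] = 0.0

    return dx                       # gradient reported for the input x
end

toy_gradient(aliased = true)    # [0.0, 1.0], the gradient of 2.0 + x[2]
toy_gradient(aliased = false)   # [1.0, 1.0], the stale entry in dx survives
```

With shared storage the overwrite is accounted for; with copied storage the clearing step lands in `dy` and never reaches `dx`.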
                                                                                                                                                              
In practice, this scenario is unlikely because:
- `stop_gradient` is meant to detach a value, so users don't normally keep using both `x` and `y = stop_gradient(x)` on the same mutable object at the same time.
- The typical use case is `stop_gradient(x)` in one branch and `x` in another, not the two mixed via in-place mutation.

So this is more a theoretical concern about the invariant being violated than something that shows up in realistic code.

aliasing invariance for primals and tangents #1081
