Conversation
Cobord
commented
Nov 20, 2025
- run many steps of SimpleOptimizer so the optimization_demo example actually gets closer to the target (this can live in tests rather than being pub, since it is just looping steps; see the sketch after this list)
- the multi_constraint example gave false positives for is_in_manifold
- clippy suppressions
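A minimal sketch of that looping helper, assuming burn's `SimpleOptimizer` trait as quoted in the diffs below; `run_steps`, `grad_fn`, and the argument names are illustrative, not the PR's actual test code:

```rust
use burn::optim::SimpleOptimizer;
use burn::prelude::*;
use burn::LearningRate;

// Drive a SimpleOptimizer for many iterations so the demo point
// actually approaches the target instead of taking a single step.
fn run_steps<B: Backend, O: SimpleOptimizer<B>, const D: usize>(
    opt: &O,
    lr: LearningRate,
    mut point: Tensor<B, D>,
    grad_fn: impl Fn(&Tensor<B, D>) -> Tensor<B, D>,
    num_steps: usize,
) -> Tensor<B, D> {
    let mut state = None;
    for _ in 0..num_steps {
        let grad = grad_fn(&point);
        // Each step returns the updated point and the optimizer state
        // to thread into the next iteration.
        let (new_point, new_state) = opt.step(lr, point, grad, state);
        point = new_point;
        state = new_state;
    }
    point
}
```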
- … actually closer to the target, multi_constraint example gave false positives for is_in_manifold
- …grad_remove as well when only using that gradient information once
- …d to make the test pass
- …not strictly tied to other content of the files they were in, expand tests and examples
- …t fail with runtime errors if they are not precisely 1 or 2 depending on the example, an optimizer that is allowed to use the Hessian but just ignores it
I'm not sure if it's finished, but if it is, could you please undraft the PR? That way I'll know.
- … dimensions: if you fix all those coordinates, the remaining tensor is a single point on the manifold
- … dimensions might have some constraints
I am building on top of this in a way that isn't suited to an open PR. The initial work so far is fine, and the earlier commits fix issues that are general enough. But the more specific material doesn't really fit here. So take what is already here as needed; I can continue the other work on my own fork, which doesn't need to merge back if it suits my purposes but not general use.
```rust
const DECAY_STATE_TO_DEVICE: bool = false;
if DECAY_STATE_TO_DEVICE {
    ManifoldRGDState {
        lr_decay: state.lr_decay.to_device(device),
    }
} else {
    state
}
```
Since this constant cannot be defined by the user, what is its purpose?
```rust
fn step<const D: usize, const H: usize>(
    &self,
    lr: LearningRate,
    tensor: Tensor<B, D>,
    grad: Tensor<B, D>,
    hessian: Option<Tensor<B, H>>,
    state: Option<Self::StateWithHessian<D, H>>,
) -> (Tensor<B, D>, Option<Self::StateWithHessian<D, H>>) {
    if let Some(hessian) = hessian {
        self.step_with_hessian(lr, tensor, grad, hessian, state)
    } else {
        let (new_pt, new_state) =
            SimpleOptimizer::step(self, lr, tensor, grad, state.map(Into::into));
        (new_pt, new_state.map(Into::into))
    }
}
```
Why is the Hessian matrix optional? It's a Hessian optimizer; I don't think it should be able to optimize differently.
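A minimal sketch of the change being suggested, assuming the trait could require the Hessian outright (hypothetical, not the crate's actual trait definition):

```rust
// Hypothetical signature with the Hessian required: a Hessian-based
// optimizer then has no first-order fallback path to fall into silently.
fn step<const D: usize, const H: usize>(
    &self,
    lr: LearningRate,
    tensor: Tensor<B, D>,
    grad: Tensor<B, D>,
    hessian: Tensor<B, H>,
    state: Option<Self::StateWithHessian<D, H>>,
) -> (Tensor<B, D>, Option<Self::StateWithHessian<D, H>>) {
    // The only remaining code path is the one that consumes the Hessian.
    self.step_with_hessian(lr, tensor, grad, hessian, state)
}
```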
```rust
/// Given a tensor of shape `a_1 ... a_k x N x N`,
/// create a tensor of shape `a_1 ... a_k x 1 x 1`
/// whose `i_1...i_k,0,0` entry is the trace
/// of the `N x N` matrix with those previous `k`
/// indices fixed but the last two remaining free.
#[allow(dead_code)]
pub(crate) fn trace<B: Backend, const D: usize>(
    t: Tensor<B, D>,
) -> Tensor<B, D> {
    ein_sum(t, D - 2, D - 1)
}
```
Why don't you use the Burn implementation of `trace()`?
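If Burn's own `trace()` doesn't cover the batched `a_1 ... a_k x N x N` case directly, a vectorized alternative is to mask with a broadcast identity and reduce. A sketch assuming `Tensor::eye`, `unsqueeze`, and `sum_dim` behave as in recent Burn releases; `trace_last_two` is a hypothetical name:

```rust
use burn::prelude::*;

/// Batched trace over the last two dims: `a_1 .. a_k x N x N` in,
/// `a_1 .. a_k x 1 x 1` out, without an einsum helper.
pub(crate) fn trace_last_two<B: Backend, const D: usize>(t: Tensor<B, D>) -> Tensor<B, D> {
    let dims: [usize; D] = t.shape().dims();
    let n = dims[D - 1];
    // N x N identity lifted to rank D with leading size-1 dims,
    // so it broadcasts against the batch dims.
    let mask = Tensor::<B, 2>::eye(n, &t.device()).unsqueeze::<D>();
    // Zero the off-diagonal entries, then sum the last two dims;
    // `sum_dim` keeps each reduced dim with size 1.
    (t * mask).sum_dim(D - 1).sum_dim(D - 2)
}
```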
src/manifolds/matrix_groups.rs (outdated)
```rust
fn lie_mul<const D: usize>(points0: Tensor<B, D>, points1: Tensor<B, D>) -> Tensor<B, D> {
    points0.matmul(points1)
}
```
```rust
pub(crate) fn create_test_matrix<TestBackend: Backend>(
```
src/manifolds/utils.rs (outdated)
```rust
/// whose `i_1...i_k,m,n` entry is `f(m)` if `m == n` and `0` otherwise
#[allow(dead_code)]
pub(crate) fn diag_i<B: Backend, const D: usize>(
    example: &Tensor<B, D>,
    diag_fun: impl Fn(usize) -> f32,
```
This works, but the loop with `slice_assign` generates multiple tensor writes and keeps part of the computation on the CPU, which doesn't scale well on GPU backends. Using a broadcasted diagonal mask and vectorized multiplication would achieve the same result more efficiently in Burn.
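A sketch of that suggestion, assuming `Tensor::eye`, `from_floats`, and broadcasting elementwise ops as in recent Burn releases; `diag_i_vectorized` is a hypothetical name, not code from the PR:

```rust
use burn::prelude::*;

/// Like `diag_i`, but the diagonal values are built once on the host and
/// placed with a broadcast identity mask, avoiding a `slice_assign` per entry.
pub(crate) fn diag_i_vectorized<B: Backend, const D: usize>(
    example: &Tensor<B, D>,
    diag_fun: impl Fn(usize) -> f32,
) -> Tensor<B, D> {
    let shape: [usize; D] = example.shape().dims();
    let n = shape[D - 1];
    let device = example.device();
    // f(0), f(1), ..., f(n-1) as a column vector of shape N x 1.
    let values = Tensor::<B, 1>::from_floats(
        (0..n).map(diag_fun).collect::<Vec<f32>>().as_slice(),
        &device,
    )
    .reshape([n, 1]);
    // N x N matrix with f(m) at (m, m) and 0 elsewhere.
    let diag = Tensor::<B, 2>::eye(n, &device) * values;
    // Broadcast up to the full `a_1 .. a_k x N x N` shape.
    example.zeros_like() + diag.unsqueeze::<D>()
}
```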
```rust
pub(crate) fn identity_in_last_two<B: Backend, const D: usize>(
    example: &Tensor<B, D>,
) -> Tensor<B, D> {
    let shape: [usize; D] = example.shape().dims();
    debug_assert!(D >= 2);
    debug_assert_eq!(shape[D - 1], shape[D - 2]);
    let n = shape[D - 1];
    let mut other = example.zeros_like();
    let mut ones_shape = [1usize; D];
    ones_shape[..(D - 2)].copy_from_slice(&shape[..(D - 2)]);
    let ones_patch = Tensor::<B, D>::ones(ones_shape, &example.device());
    for diag in 0..n {
        let ranges: [_; D] = std::array::from_fn(|dim| {
            if dim < D - 2 {
                0..shape[dim]
            } else {
                diag..diag + 1
            }
        });
        other = other.slice_assign(ranges, ones_patch.clone());
    }
    other
}
```
You can use `eye` to generate an identity matrix and `expand` it to make better use of the GPU.
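A sketch of that suggestion, again assuming `Tensor::eye` and `expand` as in recent Burn releases (a hypothetical replacement body, not the PR's final code):

```rust
use burn::prelude::*;

/// Identity in the last two dims, broadcast over the batch dims,
/// with no per-row `slice_assign` loop.
pub(crate) fn identity_in_last_two<B: Backend, const D: usize>(
    example: &Tensor<B, D>,
) -> Tensor<B, D> {
    let shape: [usize; D] = example.shape().dims();
    debug_assert!(D >= 2);
    debug_assert_eq!(shape[D - 1], shape[D - 2]);
    let n = shape[D - 1];
    // N x N identity lifted to rank D, then expanded to the batch shape;
    // on most backends `expand` broadcasts rather than copying.
    Tensor::<B, 2>::eye(n, &example.device())
        .unsqueeze::<D>()
        .expand(shape)
}
```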
Thank you for your contribution! It helps establish solid foundations for manops in Rust. We can discuss the more specific details in issues if you like.