how to return last hidden state

Hey, this is awesome, thank you! It is way faster than the reference implementation.

I had some numerical stability issues training a Mamba model with a chunk size of 16, batch size 8, sequence length 256, dim 32, and state dim 8. I found that adding a small epsilon term of 1e-11 in [this division](https://github.com/MzeroMiko/mamba-mini/blob/8935646bfc40d4a4b9ef855109d3e16a917917ad/test_selective_scan.py#L42) got rid of all the NaNs. The model seemed to train just the same, but I'm not sure what the implications of adding this epsilon are.
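
To make the epsilon question concrete, here is a toy NumPy sketch of what such a term can change. It assumes the linked division is by a cumulative decay factor of the form `exp(cumsum(...))`; that is an assumption, not a reading of the linked file. `scan_reference`, `scan_cumsum`, and the input values are hypothetical names and data, not code from the repository.

```python
import numpy as np

def scan_reference(a_log, b):
    """Sequential recurrence h_t = exp(a_log_t) * h_{t-1} + b_t, with h_0 = 0."""
    h = np.zeros_like(b[0])
    out = []
    for al, bt in zip(a_log, b):
        h = np.exp(al) * h + bt
        out.append(h)
    return np.stack(out)

def scan_cumsum(a_log, b, eps=0.0):
    """Same recurrence in closed form: h_t = P_t * sum_{s<=t} b_s / P_s,
    where P_t = exp(cumsum(a_log)). eps is added to the denominator only."""
    P = np.exp(np.cumsum(a_log, axis=0))
    # Silence divide/invalid warnings so the NaN case can be inspected.
    with np.errstate(divide="ignore", invalid="ignore", over="ignore"):
        return P * np.cumsum(b / (P + eps), axis=0)

L = 16  # one chunk of 16 steps, matching the chunk size mentioned above
b = np.ones(L, dtype=np.float32)

# Mild decay: the closed form agrees with the sequential reference.
a_mild = np.full(L, -0.1, dtype=np.float32)
h_mild_ref = scan_reference(a_mild, b)
h_mild = scan_cumsum(a_mild, b)

# Strong decay (hypothetical values): P underflows to 0 in float32.
a_strong = np.full(L, -10.0, dtype=np.float32)
h_ref = scan_reference(a_strong, b)
h_plain = scan_cumsum(a_strong, b)            # no epsilon
h_eps = scan_cumsum(a_strong, b, eps=1e-11)   # epsilon in the denominator

print("NaNs without eps:", bool(np.isnan(h_plain).any()))
print("all finite with eps:", bool(np.isfinite(h_eps).all()))
print("last state, reference vs eps:", h_ref[-1], h_eps[-1])
```

In this toy setup the epsilon version is finite everywhere, but wherever the cumulative decay factor falls below `eps` the state is pulled toward zero instead of tracking the sequential recurrence. Whether that matters in training depends on how often the real model reaches such strong decay within a chunk.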

How do you obtain the last hidden state, like the one the original reference function returns? Is it just `rearrange(hprefix, 'B G D N -> B')`?
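
One possible answer, sketched in NumPy under the assumption that `hprefix` has shape `(B, G, D, N)` (as the pattern in the question suggests) and holds the state after the most recent chunk. The reference `selective_scan_ref` in the original Mamba code returns a last state of shape `(batch, dim, dstate)`, so the sketch merges the group and per-group channel axes rather than dropping them; with einops that would correspond to the pattern `'B G D N -> B (G D) N'`. As far as I can tell, `'B G D N -> B'` would be rejected by einops `rearrange`, since axes that appear on the left must also appear on the right. The function name `chunked_scan_with_last_state`, the elementwise `decay`/`inp` inputs, and the split of dim 32 into `G=2, D=16` are hypothetical stand-ins, not the repository's API.

```python
import numpy as np

def chunked_scan_with_last_state(decay, inp, chunk_size):
    """Run h_t = decay_t * h_{t-1} + inp_t chunk by chunk.

    decay, inp: arrays of shape (L, B, G, D, N).
    Returns all states (L, B, G, D, N) and the final state (B, G*D, N).
    """
    L = decay.shape[0]
    # State carried across chunk boundaries, shape (B, G, D, N).
    hprefix = np.zeros(decay.shape[1:], dtype=decay.dtype)
    chunks = []
    for start in range(0, L, chunk_size):
        hs = []
        for t in range(start, min(start + chunk_size, L)):
            hprefix = decay[t] * hprefix + inp[t]
            hs.append(hprefix)
        chunks.append(np.stack(hs))
    hs_all = np.concatenate(chunks, axis=0)
    # After the final chunk, hprefix is the last hidden state.
    # Merge G and D so the result is (B, G*D, N).
    B, G, D, N = hprefix.shape
    last_state = hprefix.reshape(B, G * D, N)
    return hs_all, last_state

# Sizes from the question: sequence length 256, batch 8, dim 32, state dim 8.
rng = np.random.default_rng(0)
L, B, G, D, N = 256, 8, 2, 16, 8
decay = rng.uniform(0.5, 1.0, size=(L, B, G, D, N)).astype(np.float32)
inp = rng.standard_normal((L, B, G, D, N)).astype(np.float32)

hs_all, last_state = chunked_scan_with_last_state(decay, inp, chunk_size=16)
print(last_state.shape)
```

The point of the sketch is only that the carried prefix state after the final chunk is the last hidden state, and that it equals the last time step of the full state tensor.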
