how to return last hidden state #1

@theAdamColton

Hey, this is awesome, thank you! It is way faster than the reference implementation.

I had some numerical stability issues training a Mamba model with chunk size 16, batch size 8, sequence length 256, dim 32, and state dim 8. I found that adding a small epsilon term of 1e-11 in this division helped with all the NaNs. The model seemed to train just the same, but I'm not sure what the implications of adding this epsilon are.
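For illustration, here is a minimal NumPy sketch of what an epsilon in a denominator does. It is not the code from this repository: `safe_divide` is a hypothetical helper, the sample values are made up, and it assumes the denominator is non-negative and in float32.

```python
import numpy as np

def safe_divide(num, den, eps=1e-11):
    """Divide with a small positive offset on the denominator.

    Hypothetical helper for illustration; assumes den >= 0.
    """
    return num / (den + np.asarray(eps, dtype=den.dtype))

num = np.array([0.0, 1e-30, 0.5], dtype=np.float32)
den = np.array([0.0, 1e-30, 2.0], dtype=np.float32)

with np.errstate(divide="ignore", invalid="ignore"):
    plain = num / den            # the 0/0 entry becomes NaN

guarded = safe_divide(num, den)  # no NaN; tiny ratios are pulled toward 0

print(plain)
print(guarded)
```

Under these assumptions the epsilon only matters where the denominator is comparable to or smaller than 1e-11. In float32, a denominator near 1 is unchanged by adding 1e-11, so those entries come out identical, while entries with a tiny denominator are biased toward zero instead of producing NaN. If the denominator could be negative, `den + eps` could still hit zero, so this guard would not be enough on its own.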

How do you obtain the last hidden state, like the one in the original reference function? Is it just rearrange(hprefix, 'B G D N -> B')?
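As a rough sketch of the question, the NumPy toy below compares a sequential recurrence with a chunked one. Everything here is an assumption made for illustration: the function names, the simplified recurrence h_t = a_t * h_{t-1} + b_t, and the reading of a `B G D N` tensor as "the state entering each of G chunks". It does not show how `hprefix` is actually defined in this repository.

```python
import numpy as np

def sequential_scan(a, b):
    """Reference recurrence h_t = a_t * h_{t-1} + b_t with a zero initial state.

    a, b: arrays of shape (B, L, D, N). Returns (all states, last state).
    """
    h = np.zeros_like(b[:, 0])
    states = []
    for t in range(b.shape[1]):
        h = a[:, t] * h + b[:, t]
        states.append(h)
    return np.stack(states, axis=1), h

def chunked_scan(a, b, chunk):
    """Same recurrence, processed chunk by chunk.

    Returns (prefixes, last state). prefixes has shape (B, G, D, N) and holds
    the state entering each of the G chunks (a hypothetical layout).
    """
    h = np.zeros_like(b[:, 0])
    prefixes = []
    for start in range(0, b.shape[1], chunk):
        a_c, b_c = a[:, start:start + chunk], b[:, start:start + chunk]
        prefixes.append(h)
        _, local_last = sequential_scan(a_c, b_c)  # chunk run from a zero state
        h = np.prod(a_c, axis=1) * h + local_last  # add the decayed carry-in
    return np.stack(prefixes, axis=1), h

rng = np.random.default_rng(0)
B, L, D, N, chunk = 2, 8, 3, 4, 4
a = rng.uniform(0.5, 1.0, size=(B, L, D, N))
b = rng.normal(size=(B, L, D, N))

states, last_ref = sequential_scan(a, b)
prefixes, last_chunked = chunked_scan(a, b, chunk)

print(last_ref.shape, prefixes.shape)
print(np.allclose(last_ref, last_chunked))
```

In this toy setup the last hidden state has shape `(B, D, N)`, and the final slice of the per-chunk prefixes is the state entering the last chunk, not the state after it. If `hprefix` has that meaning, the last chunk's own contribution would still need to be combined with it. Separately, einops `rearrange` only reorders or reshapes axes, so a pattern that drops `G`, `D`, and `N` would likely need indexing or a reduction instead.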
