
Conversation

@CosmoNaught
Contributor

Resolves #99

Adds a Mamba2 model implementation in JAX/Flax NNX.

Implements the State Space Duality (SSD) algorithm from "Transformers are SSMs" (Dao & Gu, ICML 2024).

Models Added

  • Mamba2Model - Base backbone
  • Mamba2ForCausalLM - Causal language modeling
  • Mamba2Forecaster - Time series forecasting

Reference

  • Dao, T. and Gu, A. "Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality." ICML 2024 (arXiv:2405.21060).

Checklist

  • I have read the Contribution Guidelines and used pre-commit hooks to format this commit.
  • I have added all the necessary unit tests for my change. (run_model.py for model usage, test_outputs.py and/or model_validation_colab.ipynb for quality).
  • (If using an LLM) I have carefully reviewed and removed all superfluous comments or unneeded, commented-out code. Only necessary and functional code remains.
  • I have signed the Contributor License Agreement (CLA).

@google-cla

google-cla bot commented Dec 11, 2025

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@CosmoNaught force-pushed the add-mamba2-model branch 2 times, most recently from 00c7589 to f661341 on December 11, 2025 at 16:13
def segsum(x: jnp.ndarray) -> jnp.ndarray:
    """Stable segment sum calculation. Input: (..., T) -> Output: (..., T, T)."""
    T = x.shape[-1]
    x_rep = repeat(x, "... d -> ... d e", e=T)
Collaborator

Can we use jax.numpy.tile to remove dependence on the einops library?
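For reference, the repeat in question only expands (..., T) to (..., T, T), so plain jax.numpy covers it. A minimal sketch (not necessarily the exact change that landed):

import jax.numpy as jnp

def expand_last_axis(x: jnp.ndarray) -> jnp.ndarray:
    """Equivalent of repeat(x, "... d -> ... d e", e=T) without einops."""
    T = x.shape[-1]
    # jnp.tile pads the reps tuple with leading 1s, so only the new trailing axis is tiled.
    # jnp.broadcast_to(x[..., None], (*x.shape, T)) gives the same result without a physical copy.
    return jnp.tile(x[..., None], (T,))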

Member

+1, although einops is useful at times we want Bonsai to be dependency-free and in pure jax and flax.

Contributor Author

Fixed in 90737bf


# Chunk everything
def chunk_tensor(t):
    return rearrange(t, "b (c l) ... -> b c l ...", l=chunk_size)
Collaborator

Can we use .reshape to remove dependence on the einops library? Could do the following

def chunk_tensor(t):
  b, cl, *remaining = t.shape
  return t.reshape(b, cl // chunk_size, chunk_size, *remaining)
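As a quick sanity check of the reshape semantics (shapes and chunk_size here are just an example):

import jax.numpy as jnp

chunk_size = 4                          # example value
t = jnp.zeros((2, 8, 64))               # (batch, seq_len, features)
b, cl, *remaining = t.shape
chunked = t.reshape(b, cl // chunk_size, chunk_size, *remaining)
assert chunked.shape == (2, 2, 4, 64)   # (batch, num_chunks, chunk_size, features)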

Contributor Author

Fixed in 90737bf

# limitations under the License.

"""Parameter utilities for Mamba2 models."""

Collaborator

Could you add a function to this file which loads mamba2 from pre-trained weights?

Contributor Author

Implemented in 472243a

Contributor Author

Additionally, please see 27f596c: we have now confirmed we can load all pre-trained models of the Mamba2 class architecture.
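For readers, a rough sketch of what such a loader can look like; the repo id and filename below are assumptions for illustration, and mapping the HF parameter names onto the NNX module tree is the part the PR's params utilities actually implement:

import jax.numpy as jnp
from huggingface_hub import hf_hub_download
from safetensors.numpy import load_file

def load_reference_state_dict(repo_id: str = "state-spaces/mamba2-130m",
                              filename: str = "model.safetensors") -> dict:
    """Download a checkpoint and return a flat dict of JAX arrays (illustrative)."""
    path = hf_hub_download(repo_id=repo_id, filename=filename)
    return {k: jnp.asarray(v) for k, v in load_file(path).items()}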


loss, _grads = nnx.value_and_grad(loss_fn)(model, input_ids, labels)
self.assertFalse(jnp.isnan(loss))

Collaborator

Could you add a test that compares the outputs of the bonsai implementation against a pretrained model?

Contributor Author

Implemented in d5ea00e

Member

@jenriver left a comment

Thanks for the Mamba2 implementation! It's super cool to have linear attention in our collection!

Comment on lines 70 to 81
def count_parameters(model: nnx.Module) -> int:
    """Count the total number of trainable parameters in a model.

    Args:
        model: NNX module to count parameters for.

    Returns:
        Total number of parameters.
    """
    _graphdef, state = nnx.split(model)
    params = state.filter(nnx.Param)
    return sum(p.size for p in jax.tree.leaves(params))
Member

Could we remove this? It's not critical path for code and quality test guarantees parameter is successfully loaded.

Contributor Author

Removed in aa2cbe4

def segsum(x: jnp.ndarray) -> jnp.ndarray:
    """Stable segment sum calculation. Input: (..., T) -> Output: (..., T, T)."""
    T = x.shape[-1]
    x_rep = repeat(x, "... d -> ... d e", e=T)
Member

Current use of repeat will create a huge physical copy, leading to potential OOMs. Instead, please use broadcasting, something like this:

x_cumsum = jnp.cumsum(x, axis=-1)
x_segsum = x_cumsum[..., :, None] - x_cumsum[..., None, :]
mask = jnp.tril(jnp.ones((T, T), dtype=bool), k=0)
x_segsum = jnp.where(mask, x_segsum, -jnp.inf)
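Put together, a self-contained version of this broadcast-based segsum (a sketch, not necessarily the exact code that landed) is:

import jax.numpy as jnp

def segsum(x: jnp.ndarray) -> jnp.ndarray:
    """Segment sum via broadcasting: (..., T) -> (..., T, T), lower-triangular."""
    T = x.shape[-1]
    x_cumsum = jnp.cumsum(x, axis=-1)
    # x_segsum[..., i, j] = x[..., j+1] + ... + x[..., i]; no (..., T, T) copy of x is materialized first.
    x_segsum = x_cumsum[..., :, None] - x_cumsum[..., None, :]
    mask = jnp.tril(jnp.ones((T, T), dtype=bool), k=0)
    return jnp.where(mask, x_segsum, -jnp.inf)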

Contributor Author

Good catch! Fixed in 494c166

@@ -0,0 +1,353 @@
# Copyright 2025 The JAX Authors.
Member

Could you add a quality test that checks your implementation's output against the original reference model?
Currently the tests only check shapes/dtypes, not the actual output, so we can't tell whether the quality of this model is correct.

ex: https://github.com/jax-ml/bonsai/blob/main/bonsai/models/vit/tests/test_outputs_vit.py

Contributor Author

Implemented in d5ea00e

@@ -0,0 +1,135 @@
# Copyright 2025 The JAX Authors.
Member

Nice work. Some quick notes:

  • Can we add some example questions and outputs to the logs? It helps verify that the generation "looks right" at a glance. (example)

  • Let's remove the error tolerance check from this file; we should handle quality/numerical parity in test_outputs.py instead to keep this as a pure smoke test.

  • Also, let's remove most of the print statements so the output focuses only on the model input/output.

Contributor Author

Thanks! See refactor under ec4611c


# Tolerances account for Triton vs JAX numerical differences across 24 layers
self.assertLess(max_diff, 1e-1, f"Max diff {max_diff:.2e} exceeds tolerance")
self.assertLess(mean_diff, 1e-3, f"Mean diff {mean_diff:.2e} exceeds tolerance")
Member

Thanks Cosmo -- could you compare with rtol instead? atol is usually model-dependent and gives less signal.

A good rule of thumb is 1e-5 for float32 and 1e-3 for bfloat16, although these can vary based on rng seed and model.

(i.e. via np.testing.assert_allclose or torch.testing.assert_close)
https://github.com/jax-ml/bonsai/blob/main/bonsai/models/vit/tests/test_outputs_vit.py

Contributor Author

Thanks Jen, good to know!

I've now fixed this in 706b62d

I updated the golden parity checks to use np.testing.assert_allclose with rtol as the primary signal (fp32 = 1e-5, bf16 = 1e-3), per your comment and the ViT tests.

I also pinned jax_default_matmul_precision to "highest" to match the golden generator; that removed the extra backend drift and made the rtol thresholds stable.
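The shape of such a parity check, roughly (the golden file name and model call are placeholders for what test_outputs.py actually does):

import jax
import numpy as np

def check_logits_parity(model, input_ids, golden_path="golden_logits.npy", rtol=1e-5):
    """Compare model logits against precomputed golden logits using rtol (illustrative)."""
    golden = np.load(golden_path)
    # Pin matmul precision so backend-dependent drift does not eat into the rtol budget.
    with jax.default_matmul_precision("highest"):
        logits = model(input_ids)
    np.testing.assert_allclose(np.asarray(logits), golden, rtol=rtol)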

last_hidden = outputs["last_hidden_state"][:, -1, :]
out = self.output_proj(last_hidden)
return out.reshape(x.shape[0], self.forecast_horizon, self.output_dim)

Collaborator

Could improve inference performance with caching.
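The cache in question is the SSM recurrence state rather than a transformer KV cache. A minimal sketch of the per-step update such a cache would carry (shapes and names are illustrative and independent of this PR's API):

import jax.numpy as jnp

def ssm_step(h, a_log, dt, B_t, C_t, x_t):
    """One decode step of a diagonal SSM head: h_t = exp(dt*A)*h + dt*B_t*x_t, y_t = h_t @ C_t.

    h: (d_head, d_state) carried state -- this is the cache.
    a_log, dt, x_t: (d_head,); B_t, C_t: (d_state,).
    """
    decay = jnp.exp(dt * -jnp.exp(a_log))                        # A = -exp(a_log), as in Mamba-style models
    h = decay[:, None] * h + (dt * x_t)[:, None] * B_t[None, :]  # update carried state
    y_t = h @ C_t                                                # read out this step's output
    return h, y_t

Carrying h across decode steps (e.g. through jax.lax.scan) avoids re-running the full prefix at every generation step.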

Member

@jenriver left a comment

Hello Cosmo, thanks for the implementation! This looks great, especially with the proper golden logits test. :)

I left the KV cache as a TODO item in #99. With this, I hope we can see the full performance benefits of linear attention!

@jenriver merged commit a907b75 into jax-ml:main on Dec 22, 2025
3 checks passed