JAX support #157
Replies: 11 comments · 44 replies
-
Hello Teddy, it's great to hear from you! That's the idea! I have not started working towards it yet, as the work on the torch integration is not yet finished. If/when we start work on JAX bindings, is there a project you have in mind where you would be interested in integrating it?
-
Is there any update on the JAX integration? It would be nice to also use it with mace-jax. I also plan to integrate these models for simulation in JAX-MD, and having these kernels would be great!
-
Hi @abhijeetgangan and @teddykoker - a note that we'll have JAX support merged shortly and would love some testing (@asglover is doing some on his end, but we want to stress-test). Since the packages aren't published to PyPI yet and the diff is not on main, the installation steps are below. The README on the jax_support branch gives a usage example, but the API is the same as OEQ PyTorch, so it shouldn't be too much of a problem. Let us know if there are any issues (and include the error and your JAX version if you can).
-
Update: JAX support is merged; installation is slightly easier now with `pip install openequivariance[jax]` followed by `pip install openequivariance_extjax --no-build-isolation`.
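A quick way to confirm the install worked is to import both packages and check that JAX can see a GPU. This is only a minimal sanity check and assumes the importable module names match the pip package names; it does not exercise any kernels.

```python
# Minimal post-install sanity check (assumption: module names mirror the pip
# package names; adjust the imports if your install differs).
import jax
import openequivariance          # core package
import openequivariance_extjax   # JAX extension; an ImportError here means the second install failed

print(jax.__version__)
print(jax.devices())  # should list at least one GPU for the kernels to be useful
```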
-
I was able to integrate the kernels with the nequix JAX models and also with the simulation engines (ASE and jax-md). Tests comparing the integrations are also present. It would be nice to have a new release. The speed seems good; below is a result of a simulation with and without the kernels in the two simulators (the script is available on the above fork):
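For anyone wiring a JAX potential into ASE in a similar way, the calculator side is mostly boilerplate. Below is a minimal sketch (not the actual nequix integration): it wraps a user-supplied `energy_fn(positions, cell)` in an ASE `Calculator` and derives forces with `jax.value_and_grad`; neighbor lists, periodic shifts, and stress are left out.

```python
import jax
import jax.numpy as jnp
import numpy as np
from ase.calculators.calculator import Calculator, all_changes


class JAXCalculator(Calculator):
    """Minimal ASE calculator wrapping a JAX energy function (illustrative sketch)."""

    implemented_properties = ["energy", "forces"]

    def __init__(self, energy_fn, **kwargs):
        super().__init__(**kwargs)
        value_and_grad = jax.value_and_grad(energy_fn, argnums=0)

        @jax.jit
        def energy_and_forces(positions, cell):
            e, de_dr = value_and_grad(positions, cell)
            return e, -de_dr  # forces are the negative gradient of the energy

        self._energy_and_forces = energy_and_forces

    def calculate(self, atoms=None, properties=("energy",), system_changes=all_changes):
        super().calculate(atoms, properties, system_changes)
        positions = jnp.asarray(atoms.get_positions())
        cell = jnp.asarray(np.array(atoms.get_cell()))
        e, f = self._energy_and_forces(positions, cell)
        self.results = {"energy": float(e), "forces": np.asarray(f)}
```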
-
Here is a demo for PFT (using @abhijeetgangan's fork). Right now it gets an error, but once higher-order derivatives are supported this should be a good demonstration of the speed/memory improvements.

```python
from pathlib import Path
import time
import urllib.request

import ase
import equinox as eqx
import jax
import jax.numpy as jnp
import matplotlib.pyplot as plt
import numpy as np
import optax
import phonopy
from nequix.data import atomic_numbers_to_indices, preprocess_graph
from nequix.model import Nequix
from tqdm import tqdm


def train(model, graph, ref_hessian, n_epochs=50, lr=0.003, label="HVP"):
    """Train against reference Hessian columns via Hessian-vector products."""
    x = graph["positions"]
    n_atoms, n_dof = x.shape[0], x.size

    def energy_fn(model, pos_flat):
        pos = pos_flat.reshape(n_atoms, 3)
        disp = pos[graph["senders"]] - pos[graph["receivers"]] + graph["shifts"] @ graph["cell"]
        return model.node_energies(
            disp, graph["species"], graph["senders"], graph["receivers"]
        ).sum()

    grad_fn = jax.grad(energy_fn, argnums=1)

    def hvp_fn(model, x, v):
        # Hessian-vector product: forward-mode JVP of the gradient (forward-over-reverse)
        return jax.jvp(lambda pos: grad_fn(model, pos), (x,), (v,))[1]

    optimizer = optax.adam(lr)
    opt_state = optimizer.init(eqx.filter(model, eqx.is_array))

    @eqx.filter_jit
    def train_step_hvp(model, opt_state, x_flat, idx):
        def loss_fn(model):
            # MAE between one predicted Hessian column and the reference force constants
            v = jnp.zeros(n_dof, dtype=x.dtype).at[idx].set(1.0)
            hvp = hvp_fn(model, x_flat, v)
            return jnp.abs(hvp - ref_hessian[:, idx]).mean()

        loss, grads = eqx.filter_value_and_grad(loss_fn)(model)
        updates, opt_state_new = optimizer.update(grads, opt_state, eqx.filter(model, eqx.is_array))
        model = eqx.apply_updates(model, updates)
        return model, opt_state_new, loss

    x_flat = x.flatten()
    device = jax.devices()[0]
    loss_history = []
    step_times = []
    warmup_steps = 5
    rng_key = jax.random.key(42)
    for epoch in tqdm(range(n_epochs), desc=label):
        jax.block_until_ready(model)
        step_start = time.perf_counter()
        rng_key, subkey = jax.random.split(rng_key)
        idx = jax.random.randint(subkey, (), 0, n_dof)
        model, opt_state, loss = train_step_hvp(model, opt_state, x_flat, idx)
        jax.block_until_ready(model)
        step_time = time.perf_counter() - step_start
        if epoch >= warmup_steps:
            step_times.append(step_time)
        loss_history.append(float(loss))

    avg_step_time = sum(step_times) / len(step_times) if step_times else 0
    mem_stats = device.memory_stats()
    peak_mem = mem_stats.get("peak_bytes_in_use", 0) / 1024**3 if mem_stats else 0
    return loss_history, avg_step_time, peak_mem


def pft():
    n_epochs = 1000
    cutoff = 5.0
    model_kwargs = dict(
        n_species=1,
        cutoff=cutoff,
        hidden_irreps="32x0e + 32x1o + 32x2e",
        n_layers=3,
        radial_basis_size=8,
        radial_mlp_size=64,
        radial_mlp_layers=2,
    )

    data_path = Path("mp-149.yaml")
    if not data_path.exists():
        url = "https://github.com/teddykoker/nequix-examples/raw/refs/heads/main/phonon/mp-149.yaml"
        print(f"downloading {url}...")
        urllib.request.urlretrieve(url, data_path)

    ph_ref = phonopy.load(data_path)
    ph_ref.produce_force_constants()
    atoms = ase.Atoms(
        symbols=ph_ref.supercell.symbols,
        positions=ph_ref.supercell.positions,
        cell=ph_ref.supercell.cell,
        pbc=True,
    )
    atom_indices = atomic_numbers_to_indices(set(atoms.get_atomic_numbers()))
    g = preprocess_graph(atoms, atom_indices, cutoff, targets=False)
    graph = {
        k: jnp.array(v) for k, v in g.items() if v is not None and k not in ("n_node", "n_edge")
    }
    graph["species"] = graph["species"].astype(jnp.int32)
    graph["senders"] = graph["senders"].astype(jnp.int32)
    graph["receivers"] = graph["receivers"].astype(jnp.int32)
    n_atoms = g["n_node"][0]

    # (n, n, 3, 3) -> (3n, 3n)
    ref_hessian = (
        jnp.array(ph_ref.force_constants, dtype=jnp.float32)
        .swapaxes(1, 2)
        .reshape(n_atoms * 3, n_atoms * 3)
    )

    key = jax.random.key(0)
    model_kernel = Nequix(key=key, kernel=True, **model_kwargs)
    loss_kernel, avg_step_kernel, mem_kernel = train(
        model_kernel, graph, ref_hessian, n_epochs=n_epochs, label="HVP (kernel)"
    )

    key = jax.random.key(0)
    model_no_kernel = Nequix(key=key, kernel=False, **model_kwargs)
    n_params = sum(x.size for x in jax.tree_util.tree_leaves(eqx.filter(model_no_kernel, eqx.is_array)))
    loss_no_kernel, avg_step_no_kernel, mem_no_kernel = train(
        model_no_kernel, graph, ref_hessian, n_epochs=n_epochs, label="HVP (no kernel)"
    )

    print(f"With Kernel: {avg_step_kernel * 1000:.1f}ms/step, {mem_kernel:.2f}GB, final loss={loss_kernel[-1]:.2e}")
    print(f"No Kernel: {avg_step_no_kernel * 1000:.1f}ms/step, {mem_no_kernel:.2f}GB, final loss={loss_no_kernel[-1]:.2e}")
    speedup = avg_step_no_kernel / avg_step_kernel if avg_step_kernel > 0 else 0
    mem_ratio = mem_no_kernel / mem_kernel if mem_kernel > 0 else 0
    print(f"Speedup: {speedup:.1f}x, Memory: {mem_ratio:.1f}x")

    steps = np.arange(len(loss_no_kernel))
    fig, ax = plt.subplots(figsize=(5, 4))
    ax.plot(
        steps,
        loss_no_kernel,
        "b-",
        lw=2,
        label=f"No Kernel ({avg_step_no_kernel * 1000:.1f}ms/step, {mem_no_kernel:.2f}GB)",
    )
    ax.plot(
        steps,
        loss_kernel,
        "r--",
        lw=2,
        label=f"Kernel ({avg_step_kernel * 1000:.1f}ms/step, {mem_kernel:.2f}GB)",
    )
    ax.set(
        xlabel="Step",
        ylabel=r"Hessian MAE [meV/Å$^2$/atom]",
        title=f"Nequix {n_params // 1000}K Hessian Training (mp-149)",
    )
    ax.legend()
    ax.grid(True, alpha=0.3)
    plt.tight_layout()
    plt.savefig("loss_comparison.png", dpi=300, bbox_inches="tight")
    plt.close()


if __name__ == "__main__":
    pft()
```
-
#181 adds JVP support; we verified that the original JVP script works. This is not yet released to PyPI in case further changes are required, but it is merged to main. You'll need to reinstall both oeq and openequivariance_extjax. Happy hunting, and let us know what the final speedup is (JVP loss curve below):
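If you want a quick, model-independent check that forward-over-reverse derivatives behave before plugging the kernels in, a toy comparison like the one below works (pure JAX, no OEQ calls; `energy` is just a stand-in scalar function):

```python
import jax
import jax.numpy as jnp


def energy(x):
    # stand-in "energy": any smooth scalar function of the coordinates
    return jnp.sum(jnp.sin(x) ** 2) + 0.5 * jnp.sum(x ** 4)


def hvp(f, x, v):
    # Hessian-vector product as a JVP of the gradient (forward-over-reverse)
    return jax.jvp(jax.grad(f), (x,), (v,))[1]


x = jnp.linspace(-1.0, 1.0, 12)
v = jnp.zeros_like(x).at[3].set(1.0)  # picks out one Hessian column

column_via_hvp = hvp(energy, x, v)
column_dense = jax.hessian(energy)(x)[:, 3]  # dense Hessian, for comparison only

print(jnp.max(jnp.abs(column_via_hvp - column_dense)))  # should be at float32 noise level
```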
-
Working on the PR for Nequix. Here are the preliminary speedups we are seeing.
These are on a single A100 GPU. MPtrj uses a batch size of 64 with an energy/force/stress loss. PFT uses a batch size of 16 (although with much bigger supercell inputs) with an energy/force/stress/HVP loss. For multi-GPU training, we use … Would it be feasible to implement this? We'd be happy to provide a reproducer if helpful.
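For reference, the energy/force part of such a combined loss is compact in JAX. The sketch below is generic (stress and HVP terms omitted; `model_energy` and its parameters are placeholders, not the actual Nequix training code):

```python
import jax
import jax.numpy as jnp


def model_energy(params, positions):
    # placeholder for the actual model; returns a scalar potential energy
    return jnp.sum(params["k"] * positions ** 2)


def ef_loss(params, positions, e_ref, f_ref, w_e=1.0, w_f=10.0):
    # energy and forces from a single value_and_grad call; forces = -dE/dR
    energy_pred, grad_e = jax.value_and_grad(model_energy, argnums=1)(params, positions)
    forces_pred = -grad_e
    return w_e * (energy_pred - e_ref) ** 2 + w_f * jnp.mean((forces_pred - f_ref) ** 2)


params = {"k": jnp.float32(0.5)}
positions = jnp.ones((8, 3))
loss, grads = jax.value_and_grad(ef_loss)(params, positions, e_ref=10.0, f_ref=jnp.zeros((8, 3)))
print(loss, grads)
```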
-
Inference speedups are even greater, especially combined with jax-md (the MD overhead is more apparent once the kernels are in). For this ~2000 atom system of water molecules + NaCl I am getting almost an 18x speedup, from 0.73 steps/second to 13.0 steps/second! It can be run with nvt.py. Using the ASE calculator, we get a similar speedup to the torch version, about 8-10x.
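For anyone who wants to set up the same kind of run, the jax-md side of an NVT loop is short. Below is a rough sketch with a Lennard-Jones stand-in instead of the ML potential; `box`, `dt`, and `kT` are illustrative values, not the settings used for the water + NaCl system:

```python
import jax
import jax.numpy as jnp
from jax_md import space, energy, simulate

box, dt, kT = 20.0, 1e-3, 1.0

displacement_fn, shift_fn = space.periodic(box)
energy_fn = energy.lennard_jones_pair(displacement_fn)  # swap in the ML potential here

# start from a simple cubic lattice so the stand-in potential doesn't blow up
n_per_side = 8
grid = jnp.arange(n_per_side) * (box / n_per_side)
positions = jnp.stack(jnp.meshgrid(grid, grid, grid, indexing="ij"), axis=-1).reshape(-1, 3)

init_fn, apply_fn = simulate.nvt_nose_hoover(energy_fn, shift_fn, dt=dt, kT=kT)
state = init_fn(jax.random.PRNGKey(0), positions)

@jax.jit
def run_chunk(state):
    # 100 MD steps per host-side call to amortize dispatch overhead
    return jax.lax.fori_loop(0, 100, lambda i, s: apply_fn(s), state)

for _ in range(10):
    state = run_chunk(state)
print(energy_fn(state.position))
```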
-
Released!
-
The README mentions "future frontend support outside of torch"; are there any plans for integration with JAX, e.g. using the foreign function interface?