@angeloskath
Member

This update adds the following:

  • Large buffers for JACCL that fix the bandwidth (macOS >= 26.3, h/t Jack Beasly)
  • A JACCL ring for an arbitrarily large number of connected Macs, with higher bandwidth (albeit with slightly higher latency for small sizes)
  • The backend and environment variables in the hostfile, so we don't have to define them when calling mlx.launch
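
For readers unfamiliar with the ring trade-off mentioned above, here is a minimal, single-process sketch of a textbook ring all-reduce (reduce-scatter followed by all-gather). It is not the JACCL implementation: the function name `ring_all_reduce` and the list-of-lists "ranks" are hypothetical stand-ins chosen for illustration. It shows why a ring scales to any number of nodes (each rank only talks to its neighbor and moves one chunk per step) and why small messages pay extra latency (the exchange takes 2 * (n - 1) sequential steps).

```python
def ring_all_reduce(buffers):
    """Simulate a sum all-reduce over a ring of len(buffers) ranks.

    buffers[r] is the local vector of rank r. Every vector must have the
    same length, divisible by the number of ranks. Returns the per-rank
    vectors after the all-reduce (all identical). Illustrative only.
    """
    n = len(buffers)
    length = len(buffers[0])
    assert all(len(b) == length for b in buffers), "ranks must agree on size"
    assert length % n == 0, "sketch assumes the length divides evenly"
    chunk = length // n
    data = [list(b) for b in buffers]  # copy so the inputs stay untouched

    def span(c):
        # Index range covered by chunk c.
        return range(c * chunk, (c + 1) * chunk)

    def exchange(first_chunk_offset, combine):
        # n - 1 steps; in each step every rank sends one chunk to rank + 1.
        for step in range(n - 1):
            # Snapshot outgoing chunks so all sends in a step are simultaneous.
            outgoing = []
            for r in range(n):
                c = (r + first_chunk_offset - step) % n
                outgoing.append((c, [data[r][i] for i in span(c)]))
            for r, (c, values) in enumerate(outgoing):
                dst = (r + 1) % n
                for i, v in zip(span(c), values):
                    data[dst][i] = combine(data[dst][i], v)

    # Reduce-scatter: afterwards rank r holds the full sum of chunk (r + 1) % n.
    exchange(0, lambda mine, incoming: mine + incoming)
    # All-gather: circulate the finished chunks, overwriting partial sums.
    exchange(1, lambda mine, incoming: incoming)
    return data


if __name__ == "__main__":
    ranks = [[rank * 10 + i for i in range(8)] for rank in range(4)]
    result = ring_all_reduce(ranks)
    expected = [sum(column) for column in zip(*ranks)]
    print(all(vector == expected for vector in result))
```

In this scheme each rank sends 2 * (n - 1) / n times the buffer size in total, regardless of how many ranks are connected, at the cost of 2 * (n - 1) dependent steps.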

The following shows the achieved bandwidth with the ring and mesh across 4 M3 Ultras.
[Figure: jaccl-bw]

The code definitely needs a cleanup pass, but I think it is in a mergeable state. The main issue is that I am reusing a lot of logic with slight changes, which may be possible to simplify (or may not, if simplifying costs performance).

@blightbow

blightbow commented Feb 4, 2026

Has send/recv been working well for you in live testing? While implementing P2P file transfers between M3 Ultras using JACCL, I found that tight synchronization between send_like and recv_like was an absolute must to avoid silent data corruption. Did your test suite validate that the data received matched what was originally sent? all_reduce is an inherently synchronized operation, so I would recommend additional testing if you have not confirmed this already.
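
A minimal sketch of the kind of validation described above: the sender transmits a deterministic payload together with its SHA-256 digest, and the receiver recomputes the digest before accepting the data. This is an assumption-laden illustration, not MLX or JACCL code. A local `socket.socketpair()` stands in for the real transport, and `make_payload`, `send_checked`, `recv_exact`, and `recv_checked` are hypothetical helper names.

```python
import hashlib
import socket
import struct

DIGEST_SIZE = hashlib.sha256().digest_size  # 32 bytes
HEADER_SIZE = 8 + DIGEST_SIZE               # payload length + digest


def make_payload(seed: int, nbytes: int) -> bytes:
    """Deterministic pseudo-random bytes, so both sides can regenerate them."""
    out = bytearray()
    counter = 0
    while len(out) < nbytes:
        out += hashlib.sha256(struct.pack("<QQ", seed, counter)).digest()
        counter += 1
    return bytes(out[:nbytes])


def send_checked(sock: socket.socket, payload: bytes) -> None:
    """Send length, SHA-256 digest, then the payload itself."""
    header = struct.pack("<Q", len(payload)) + hashlib.sha256(payload).digest()
    sock.sendall(header + payload)


def recv_exact(sock: socket.socket, nbytes: int) -> bytes:
    """Read exactly nbytes, looping because recv may return short reads."""
    buf = bytearray()
    while len(buf) < nbytes:
        part = sock.recv(nbytes - len(buf))
        if not part:
            raise ConnectionError("peer closed before the full message arrived")
        buf += part
    return bytes(buf)


def recv_checked(sock: socket.socket) -> bytes:
    """Receive one message and raise ValueError if the digest does not match."""
    header = recv_exact(sock, HEADER_SIZE)
    (length,) = struct.unpack("<Q", header[:8])
    expected_digest = header[8:]
    payload = recv_exact(sock, length)
    if hashlib.sha256(payload).digest() != expected_digest:
        raise ValueError("digest mismatch: payload was corrupted in transit")
    return payload


if __name__ == "__main__":
    left, right = socket.socketpair()  # stand-in for the real link
    sent = make_payload(seed=7, nbytes=1024)
    send_checked(left, sent)
    received = recv_checked(right)
    print("match" if received == sent else "MISMATCH")
    left.close()
    right.close()
```

Because the payload is derived from a seed, a receiver in a real multi-node test could also regenerate the expected bytes locally from (sender rank, iteration) and compare directly, without trusting anything that travelled over the link.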
