4D parallel training for models using MLX. Based on Picotron.
very minimal implementation; for now it will probably only support the Llama architecture.
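to make the "4D" concrete, here is a hypothetical sketch of the four parallelism dimensions (data, tensor, pipeline, context) expressed as a config. the field names and the `ParallelConfig` class are assumptions for illustration, not this repo's actual API:

```python
# hypothetical sketch of a 4D parallelism layout; names are illustrative only.
from dataclasses import dataclass

@dataclass
class ParallelConfig:
    dp: int = 2   # data parallel: replicas, each sees a different shard of the batch
    tp: int = 2   # tensor parallel: split individual weight matrices across devices
    pp: int = 2   # pipeline parallel: split the layer stack into sequential stages
    cp: int = 1   # context parallel: split the sequence dimension across devices

    @property
    def world_size(self) -> int:
        # total number of devices (Macs) needed = product of the four dimensions
        return self.dp * self.tp * self.pp * self.cp

cfg = ParallelConfig()
print(cfg.world_size)  # 8 devices for a 2x2x2x1 layout
```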
as Mac users we mostly operate in the GPU-poor case, but with enough Macs together some real power kicks in.
read the blog at stefpi.net/blog/ to learn along with me.
the benefit of training across multiple Macs (aside from the largest RAM capacity available on consumer hardware) is that each GPU in the training network is attached to a significant amount of storage and CPU power. this lets us skip many communication/broadcast steps, since each device can keep the full dataset locally and pull only the samples it needs into unified memory, as sketched below.
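a minimal sketch of that rank-local loading idea, assuming each Mac already has the tokenized dataset on disk as a single `.npy` file. the file layout, the `local_batch` helper, and the use of `mx.distributed` here are illustrative assumptions, not this repo's actual loader:

```python
# sketch: each rank memory-maps its local copy of the dataset and materializes
# only its own slice of the global batch, so no rank-0 broadcast is needed.
import numpy as np
import mlx.core as mx

def local_batch(path: str, step: int, batch_size: int, seq_len: int) -> mx.array:
    group = mx.distributed.init()          # distributed group across the Macs
    rank, world = group.rank(), group.size()

    # memory-map the local file: nothing is pulled into unified memory yet
    tokens = np.load(path, mmap_mode="r")  # assumed shape: (num_samples, seq_len)

    # each rank owns a disjoint slice of the global batch for this step
    start = step * batch_size * world + rank * batch_size
    samples = np.asarray(tokens[start : start + batch_size, :seq_len])

    # only these samples are actually read from disk and handed to MLX
    return mx.array(samples)
```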