vLLM is difficult to use on Mac and requires significant workarounds. MLX is Apple's native framework that operates closer to the hardware than PyTorch.
MLX requires explicit management of Streams and Memory, similar to CUDA, but within Python. This provides fine-grained control over computation and memory operations.
This project uses uv for package management and requires Python 3.12+.
-
Install
uvif you haven't already:curl -LsSf https://astral.sh/uv/install.sh | sh -
Install dependencies:
uv sync
This will create a virtual environment (.venv/) and install MLX and its dependencies.
MLX examples:
uv run main.py # Matrix multiplication benchmark
uv run generate.py # Interactive text generationModal examples (NVIDIA GPU):
uv run modal run modal/get_started.pySee docs/modal.md for more details.
When you need NVIDIA GPUs, use Modal instead of RunPod:
- Function-based: Decorate functions with
@app.function(gpu="A10g")and run instantly - Pay-per-second: Only pay for execution time, no VM management
- No setup overhead: Write code locally, execute on cloud immediately