Write once, run anywhere.
`ezpz` makes distributed PyTorch launches portable across NVIDIA, AMD, Intel,
MPS, and CPU, with zero code changes and guardrails for HPC schedulers.
It provides:

- 🧰 CLI: `ezpz`, with utilities for launching distributed jobs
- 🐍 Python library: `ezpz`, for writing hardware-agnostic, distributed PyTorch code
- 📦 Pre-built examples
All of which:
- Use modern distributed PyTorch features (FSDP, TP, HF Trainer)
- Can be run anywhere (e.g. NVIDIA, AMD, Intel, MPS, CPU)
Check out the 📚 Docs for more information!
1. Setup Python environment:

    To use `ezpz`, we first need a Python environment (preferably virtual) that has `torch` and `mpi4py` installed.

    > Already have one? Skip to (2.) below!

    Otherwise, we can use the provided src/ezpz/bin/utils.sh[^1] to set up our environment:

    ```bash
    source <(curl -LsSf https://bit.ly/ezpz-utils) && ezpz_setup_env
    ```
    > **Note**: This is technically optional, but recommended.
    > Especially if you happen to be running behind a job scheduler (e.g. PBS/Slurm) at any of {ALCF, OLCF, NERSC}, this will automatically load the appropriate modules and use them to bootstrap a virtual environment. However, if you already have a Python environment with {`torch`, `mpi4py`} installed and would prefer to use that, skip directly to (2.) installing `ezpz` below.
2. Install `ezpz`[^2]:

    ```bash
    uv pip install "git+https://github.com/saforem2/ezpz"
    ```

    > **Need PyTorch or `mpi4py`?**
    > If you don't already have PyTorch or `mpi4py` installed, you can specify these as additional dependencies:
    >
    > ```bash
    > uv pip install --no-cache --link-mode=copy "git+https://github.com/saforem2/ezpz[torch,mpi]"
    > ```

    > **... or try without installing!**
    > If you already have a Python environment with {`torch`, `mpi4py`} installed, you can try `ezpz` without installing it:
    >
    > ```bash
    > # pip install uv first, if needed
    > uv run --with "git+https://github.com/saforem2/ezpz" ezpz doctor
    >
    > TMPDIR=$(pwd) uv run --with "git+https://github.com/saforem2/ezpz" \
    >     --python=$(which python3) \
    >     ezpz test
    >
    > TMPDIR=$(pwd) uv run --with "git+https://github.com/saforem2/ezpz" \
    >     --python=$(which python3) \
    >     ezpz launch \
    >     python3 -m ezpz.examples.fsdp_tp
    > ```
3. Distributed Smoke Test:

    Train a simple MLP on MNIST with PyTorch + DDP:

    ```bash
    ezpz test
    ```

    See: [📊 ezpz test | W&B Report] for sample output and details of metric tracking.
At its core, ezpz is a Python library designed to make writing distributed
PyTorch code easy and portable across different hardware backends.
See 🐍 Python Library for more information.
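For example, a minimal hardware-agnostic script might look like the sketch below. It leans on the two `ezpz` calls named in this README (`ezpz.setup_torch()` and `ezpz.get_torch_device()`); the exact return values shown (an integer rank, a device string) are assumptions for illustration.

```python
# Minimal sketch of a hardware-agnostic ezpz script.
# Assumption: setup_torch() returns the global rank, and
# get_torch_device() returns a device string like "cuda" / "xpu" / "mps" / "cpu".
import torch
import ezpz

rank = ezpz.setup_torch()          # initialize torch.distributed with the right backend
device = ezpz.get_torch_device()   # detect the available accelerator

x = torch.rand((2, 2)).to(device)  # identical code on NVIDIA, AMD, Intel, MPS, or CPU
print(f"[rank {rank}] x = {x}")
```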
See 📖 Quickstart for a detailed walk-through of `ezpz` features.
🪄 Automatic:

- Accelerator detection: `ezpz.get_torch_device()`, across {`cuda`, `xpu`, `mps`, `cpu`}
- Distributed initialization: `ezpz.setup_torch()`, to pick the right device + backend combo
- Metric handling and utilities for {tracking, recording, plotting}: `ezpz.History()`, with Weights & Biases support
- Integration with native job scheduler(s) (PBS, Slurm)
  - with safe fall-backs when no scheduler is detected
- Single-process logging with filtering for distributed runs
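To make the metric-tracking piece concrete, here is a rough sketch of recording a per-step loss with `ezpz.History()`. Only the `ezpz.History` name comes from this README; the `update()` method and its dict-based signature are assumptions for illustration.

```python
# Hypothetical sketch of metric tracking with ezpz.History();
# the update() call shape is assumed, not confirmed API.
import ezpz

history = ezpz.History()
for step in range(10):
    loss = 1.0 / (step + 1)  # stand-in for a real training loss
    history.update({"train/step": step, "train/loss": loss})
```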
> **Note**
> See Examples for ready-to-go examples that can be used as templates or starting points for your own distributed PyTorch workloads.
Once installed, ezpz provides a CLI with a few useful utilities
to help with distributed launches and environment validation.
Explicitly, these are:
```bash
ezpz doctor  # environment validation and health-check
ezpz test    # distributed smoke test
ezpz launch  # general purpose, scheduler-aware launching
```

To see the list of available commands, run:

```bash
ezpz --help
```

> **Note**
> Check out 🧰 CLI for additional information.
Health-check your environment and ensure that `ezpz` is installed correctly:

```bash
ezpz doctor
ezpz doctor --json  # machine-friendly output for CI
```

Checks MPI, scheduler detection, Torch import + accelerators, and wandb readiness, returning non-zero on errors.

See: 🩺 Doctor for more information.
Run the bundled test suite (great for first-time validation):
```bash
ezpz test
```

Or, try without installing:

```bash
TMPDIR=$(pwd) uv run \
    --python=$(which python3) \
    --with "git+https://github.com/saforem2/ezpz" \
    ezpz test
```

See ✅ Test for more information.
Single entry point for distributed jobs.
ezpz detects PBS/Slurm automatically and falls back to mpirun, forwarding
useful environment variables so your script behaves the same on laptops and
clusters.
Add your own args to any command (--config, --batch-size, etc.) and ezpz
will propagate them through the detected launcher.
Use the provided

```bash
ezpz launch <launch flags> -- <cmd> <cmd flags>
```

to automatically launch `<cmd>` across all available[^3] accelerators.
Use it to launch:
- Arbitrary command(s):

  ```bash
  ezpz launch hostname
  ```

- Arbitrary Python string:

  ```bash
  ezpz launch python3 -c 'import ezpz; ezpz.setup_torch()'
  ```

- One of the ready-to-go examples:

  ```bash
  ezpz launch python3 -m ezpz.test_dist --profile
  ezpz launch -n 8 -- python3 -m ezpz.examples.fsdp_tp --tp 4
  ```

- Your own distributed training script (a minimal sketch of such a script follows this list):

  ```bash
  ezpz launch -n 16 -ppn 8 -- python3 -m your_app.train --config configs/your_config.yaml
  ```

  to launch `your_app.train` across 16 processes, 8 per node.
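As promised above, here is a minimal sketch of what such a `your_app/train.py`-style script might look like. Only `ezpz.setup_torch()` and `ezpz.get_torch_device()` come from this README; the toy model, DDP wrapping, and training loop are illustrative PyTorch, not part of `ezpz` itself.

```python
# Hypothetical your_app/train.py skeleton for use with `ezpz launch`.
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

import ezpz


def main() -> None:
    rank = ezpz.setup_torch()          # assumed to return the global rank
    device = ezpz.get_torch_device()   # "cuda" | "xpu" | "mps" | "cpu"

    model = nn.Linear(128, 128).to(device)
    if dist.is_initialized():          # wrap with DDP when running distributed
        model = DDP(model)
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

    for step in range(10):
        x = torch.rand((8, 128), device=device)
        loss = model(x).square().mean()  # toy objective
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        if rank == 0:
            print(f"step={step} loss={loss.item():.4f}")


if __name__ == "__main__":
    main()
```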
See 🚀 Launch for more information.
See 📝 Examples for complete example scripts covering:
- Train MLP with DDP on MNIST
- Train CNN with FSDP on MNIST
- Train ViT with FSDP on MNIST
- Train Transformer with FSDP and TP on HF Datasets
- Train Diffusion LLM with FSDP on HF Datasets
- Train or Fine-Tune an LLM with FSDP and HF Trainer on HF Datasets
Additional configuration can be done through environment variables, including:
- The colorized logging output can be toggled via the `NO_COLOR` environment var, e.g. to turn off colors:

  ```bash
  NO_COLOR=1 ezpz launch python3 -m your_app.train
  ```

- Forcing a specific torch device (useful on GPU hosts when you want CPU-only):

  ```bash
  TORCH_DEVICE=cpu ezpz test
  ```

- Changing the plot marker used in the text-based plots:

  ```bash
  # highest resolution, may not be supported in all terminals
  EZPZ_TPLOT_MARKER="braille" ezpz launch python3 -m your_app.train

  # next-best resolution, more widely supported
  EZPZ_TPLOT_MARKER="fhd" ezpz launch python3 -m your_app.train
  ```
- Examples live under `ezpz.examples.*`; copy them or extend them for your workloads.
- Stuck? Check the docs, or run `ezpz doctor` for actionable hints.
- See my recent talk on: *LLMs on Aurora: Hands On with `ezpz`* for a detailed walk-through containing examples and use cases.
Footnotes

[^1]: The https://bit.ly/ezpz-utils URL is just a short link for convenience that actually points to https://raw.githubusercontent.com/saforem2/ezpz/main/src/ezpz/bin/utils.sh

[^2]: If you don't have `uv` installed, you can install it via `pip install uv`. See the uv documentation for more details.

[^3]: By default, this will detect if we're running behind a job scheduler (e.g. PBS or Slurm). If so, we automatically determine the specifics of the currently active job; explicitly, this will determine:
    - The number of available nodes
    - How many GPUs are present on each of these nodes
    - How many GPUs we have total

    It will then use this information to automatically construct the appropriate {`mpiexec`, `srun`} command to launch, and finally, execute the launch cmd.
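    As a schematic illustration of that logic (emphatically *not* `ezpz`'s actual implementation), the construction might look something like:

    ```python
    # Schematic sketch of scheduler-aware launch-command construction.
    # The gpus_per_node constant and env-var checks are simplifications;
    # ezpz detects these values per-machine.
    import os
    import shutil


    def build_launch_cmd(cmd: str) -> str:
        if os.environ.get("PBS_NODEFILE"):  # running inside a PBS job
            with open(os.environ["PBS_NODEFILE"]) as f:
                num_nodes = len(set(f.read().split()))
            gpus_per_node = 4  # stand-in for per-node GPU detection
            total = num_nodes * gpus_per_node
            return f"mpiexec -n {total} -ppn {gpus_per_node} {cmd}"
        if os.environ.get("SLURM_JOB_ID"):  # running inside a Slurm job
            return f"srun {cmd}"
        return f"{shutil.which('mpirun') or 'mpirun'} {cmd}"  # safe fall-back
    ```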