Zero to Hero Scripts
The scripts/zero_to_hero/ directory contains minimal bash wrapper scripts for running E2SAR network performance tests on HPC systems, with specific support for Perlmutter at NERSC and the ESnet load balancer.
All scripts run the `ibaldin/e2sar` container image via `podman-hpc`; to use a different image, edit the image name in the scripts. Artifacts (log files and the `INSTANCE_URI` file) are always written to the current working directory, so the scripts can be run from any directory.
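Because every run writes its `INSTANCE_URI` and logs to the current directory, two manual runs started in the same directory would likely overwrite each other's files. A minimal sketch of one way to keep runs apart, assuming a `runs/manual_<timestamp>_<random>` layout. That naming scheme is hypothetical and only mirrors the `runs/slurm_job_<JOBID>/` directories the SLURM scripts create; `make_run_dir` is likewise a hypothetical helper, not part of E2SAR.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not shipped with E2SAR): create a fresh, uniquely named
# directory for one manual run and print its path.
make_run_dir() {
  local base="${1:-runs}"
  mkdir -p "$base" || return 1
  # mktemp guarantees a unique name even for two runs started in the same second.
  mktemp -d "$base/manual_$(date +%Y%m%d_%H%M%S)_XXXXXX"
}
```

For example, `cd "$(make_run_dir)"` before calling `minimal_reserve.sh` keeps that run's `INSTANCE_URI` and logs together.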
Tests follow a fixed three-step sequence:
1. Reserve: allocate a load balancer session
2. Run: start the sender and/or receiver
3. Free: release the reservation
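Free releases the reservation, so it is worth running even when the Run step fails. A minimal sketch that scripts the sequence and always attempts Free once Reserve has succeeded. It assumes the `minimal_*.sh` scripts listed in the table below are on `$PATH` (for example after sourcing `setup_env.sh`), that `EJFAT_URI` is already exported, and that starting the receiver a few seconds before the sender is sensible; `run_lb_cycle` and `RECV_STARTUP_DELAY` are hypothetical names introduced here.

```bash
#!/usr/bin/env bash
# Sketch only, not part of E2SAR: one Reserve -> Run -> Free cycle that always
# attempts Free once Reserve has succeeded.
# Assumptions: minimal_*.sh are on $PATH, EJFAT_URI is exported, and the receiver
# should start a few seconds before the sender. run_lb_cycle and
# RECV_STARTUP_DELAY are hypothetical names.
run_lb_cycle() {
  # 1. Reserve: if this fails there is nothing to free.
  minimal_reserve.sh || return 1

  # 2. Run: receiver in the background, sender in the foreground. The subshell
  #    collects a failure from either process without skipping step 3.
  local status=0
  (
    minimal_receiver.sh --duration 60 &
    recv_pid=$!
    sleep "${RECV_STARTUP_DELAY:-5}"
    send_status=0
    minimal_sender.sh --rate 5 --num 1000 || send_status=$?
    wait "$recv_pid" || exit $?
    exit "$send_status"
  ) || status=$?

  # 3. Free: always release the reservation; report a failure here too.
  if ! minimal_free.sh && [[ "$status" -eq 0 ]]; then
    status=1
  fi
  return "$status"
}
```

The function returns a non-zero status if any step failed, so it can be chained or checked in a larger script; adjust the sender and receiver options to your test.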
| Script | Purpose |
|---|---|
| `setup_env.sh` | Source once to add the `zero_to_hero` directory to `$PATH`, enabling scripts to be called from anywhere |
| `minimal_reserve.sh` | Reserves a load balancer session using the admin `EJFAT_URI`; writes an `INSTANCE_URI` file consumed by the other scripts |
| `minimal_sender.sh` | Sends events through the load balancer; configurable rate, event size, MTU, and count; includes automatic memory monitoring |
| `minimal_receiver.sh` | Receives events from the load balancer; configurable port, duration, thread count, and buffer size |
| `minimal_free.sh` | Releases the load balancer reservation using the `INSTANCE_URI` file |
| `monitor_memory.sh` | Standalone memory monitor for all `e2sar_perf` processes; logs RSS/VSZ in CSV format at a configurable interval |
| `perlmutter_slurm.sh` | SLURM batch script for Perlmutter: allocates 2 nodes (Node 0 = receiver, Node 1 = sender), creates its own fresh LB reservation per job, and cleans up on completion |
| `perlmutter_multi_slurm.sh` | SLURM batch script supporting multiple concurrent senders and receivers co-located across a shared node pool; senders and receivers can share nodes |
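Since `minimal_sender.sh`, `minimal_receiver.sh`, and `minimal_free.sh` all consume the `INSTANCE_URI` file written by `minimal_reserve.sh`, a small guard can catch a missing reservation before any container starts. A sketch, assuming the file is literally named `INSTANCE_URI` in the working directory; `require_instance_uri` is a hypothetical helper, not part of E2SAR.

```bash
#!/usr/bin/env bash
# Hypothetical guard (not shipped with E2SAR): succeed only if the given directory
# (default: the current one) holds a non-empty INSTANCE_URI file.
require_instance_uri() {
  local dir="${1:-.}"
  if [[ ! -s "$dir/INSTANCE_URI" ]]; then
    echo "No INSTANCE_URI in $dir; run minimal_reserve.sh there first." >&2
    return 1
  fi
}
```

For example: `require_instance_uri && minimal_sender.sh --rate 5 --num 1000`.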
A typical manual session:

```bash
# Optional one-time setup
source /path/to/zero_to_hero/setup_env.sh

# 1. Reserve
EJFAT_URI="ejfat://token@host:port/lb/1?sync=..." minimal_reserve.sh

# 2. Run (in separate terminals or in the background)
minimal_sender.sh --rate 5 --num 1000
minimal_receiver.sh --duration 60

# 3. Free
minimal_free.sh
```

Pass `-v` to `minimal_sender.sh`, `minimal_receiver.sh`, and `minimal_free.sh` when the load balancer control plane SSL certificate has expired. Do not pass `-v` to `minimal_reserve.sh`.
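If you script these calls, the `-v` rule can be encoded once. A sketch, assuming `-v` may be placed before the other options; `lb_run` and the `LB_CERT_EXPIRED` switch are hypothetical names, not part of E2SAR.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper (not shipped with E2SAR): run one of the zero_to_hero
# scripts, adding -v only for sender, receiver, and free, and only when the
# hypothetical LB_CERT_EXPIRED variable is set to 1. minimal_reserve.sh never gets -v.
lb_run() {
  local script="$1"
  shift
  local add_v=0
  if [[ "${LB_CERT_EXPIRED:-0}" == "1" ]]; then
    case "$script" in
      minimal_sender.sh | minimal_receiver.sh | minimal_free.sh) add_v=1 ;;
    esac
  fi
  if (( add_v )); then
    "$script" -v "$@"
  else
    "$script" "$@"
  fi
}
```

With `LB_CERT_EXPIRED=1` exported, `lb_run minimal_free.sh` runs `minimal_free.sh -v`, while `lb_run minimal_reserve.sh` still runs without `-v`.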
Both SLURM scripts require E2SAR_SCRIPTS_DIR and the admin EJFAT_URI to be exported before submission. Each job creates an isolated working directory under runs/slurm_job_<JOBID>/ and manages its own LB reservation lifecycle.
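By default `sbatch` passes the submitting shell's exported environment to the job, so a variable that is set but not exported never reaches the batch script. A minimal pre-submission check that a wrapper could run before the `sbatch` calls, using `printenv` so that only exported values count; `check_slurm_env` is a hypothetical helper, not part of E2SAR.

```bash
#!/usr/bin/env bash
# Hypothetical pre-submission check (not shipped with E2SAR): fail if either
# variable the SLURM scripts need is missing from the exported environment.
check_slurm_env() {
  local status=0 var dir
  for var in E2SAR_SCRIPTS_DIR EJFAT_URI; do
    # printenv only sees exported variables, which is what sbatch forwards.
    if [[ -z "$(printenv "$var")" ]]; then
      echo "error: $var is not exported (or is empty)" >&2
      status=1
    fi
  done
  # Also catch a mistyped scripts path.
  dir="$(printenv E2SAR_SCRIPTS_DIR || true)"
  if [[ -n "$dir" && ! -d "$dir" ]]; then
    echo "error: E2SAR_SCRIPTS_DIR ($dir) is not a directory" >&2
    status=1
  fi
  return "$status"
}
```

For example: `check_slurm_env && sbatch -A <project> perlmutter_slurm.sh --rate 10 --num 5000`.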
```bash
export E2SAR_SCRIPTS_DIR=/path/to/E2SAR/scripts/zero_to_hero
export EJFAT_URI="ejfat://token@host:port/lb/1?sync=..."

# Single sender + receiver across 2 nodes
sbatch -A <project> perlmutter_slurm.sh --rate 10 --num 5000

# Multiple senders + receivers
sbatch -N 2 -A <project> perlmutter_multi_slurm.sh \
    --receivers 4 --senders 4 --receivers-per-node 2 --senders-per-node 2 --rate 1
```

For full usage details, all available options, troubleshooting, and step-by-step tutorials, see `scripts/zero_to_hero/README.md` and the guides in the `docs/` subdirectory.