
Zero to Hero Scripts

Ilya Baldin edited this page Feb 24, 2026 · 3 revisions

Overview

The scripts/zero_to_hero/ directory contains minimal bash wrapper scripts for running E2SAR network performance tests on HPC systems, with specific support for Perlmutter at NERSC and the ESnet load balancer.

All scripts run via podman-hpc using the ibaldin/e2sar container image; to use a different image, edit the image name in the scripts. Artifacts (log files, INSTANCE_URI) are always created in the current working directory, so you can run the scripts from any directory.
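Because artifacts land in the current working directory, one way to keep runs apart is to give each run its own directory. The sketch below is only an illustration: the `new_run_dir` helper and the `runs/<timestamp>` layout are hypothetical and not part of the zero_to_hero scripts. It also assumes the later steps should run from the same directory, since that is where the INSTANCE_URI file is written.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of zero_to_hero): create a fresh directory
# for one test run and print its path.
new_run_dir() {
    local base="${1:-runs}"      # parent directory; "runs" is an arbitrary default
    local dir
    dir="${base}/$(date +%Y%m%d_%H%M%S)"
    mkdir -p "$dir"
    printf '%s\n' "$dir"
}

# Usage (commented out so that sourcing this file has no side effects):
#   cd "$(new_run_dir)"
#   minimal_reserve.sh    # INSTANCE_URI and logs are then created in this directory
```

Running reserve, run, and free from the same directory keeps the INSTANCE_URI file and the logs of one test together.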

Required Workflow

Tests follow a fixed three-step sequence:

  1. Reserve — allocate a load balancer session
  2. Run — start sender and/or receiver
  3. Free — release the reservation

Scripts

Script | Purpose
setup_env.sh | Source once to add the zero_to_hero directory to $PATH, enabling scripts to be called from anywhere
minimal_reserve.sh | Reserves a load balancer session using the admin EJFAT_URI; writes an INSTANCE_URI file consumed by the other scripts
minimal_sender.sh | Sends events through the load balancer; configurable rate, event size, MTU, and count; includes automatic memory monitoring
minimal_receiver.sh | Receives events from the load balancer; configurable port, duration, thread count, and buffer size
minimal_free.sh | Releases the load balancer reservation using the INSTANCE_URI file
monitor_memory.sh | Standalone memory monitor for all e2sar_perf processes; logs RSS/VSZ in CSV format at a configurable interval
perlmutter_slurm.sh | SLURM batch script for Perlmutter: allocates 2 nodes (Node 0 = receiver, Node 1 = sender), creates its own fresh LB reservation per job, and cleans up on completion
perlmutter_multi_slurm.sh | SLURM batch script supporting multiple concurrent senders and receivers co-located across a shared node pool; senders and receivers can share nodes

Quick Example

# Optional one-time setup
source /path/to/zero_to_hero/setup_env.sh

# 1. Reserve
EJFAT_URI="ejfat://token@host:port/lb/1?sync=..." minimal_reserve.sh

# 2. Run (in separate terminals or background)
minimal_sender.sh --rate 5 --num 1000
minimal_receiver.sh --duration 60

# 3. Free
minimal_free.sh

Pass -v to minimal_sender.sh, minimal_receiver.sh, and minimal_free.sh when the load balancer control plane SSL certificate has expired. Do not pass -v to minimal_reserve.sh.
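The reserve/run/free sequence can be wrapped so that the reservation is released even when the workload fails. This is a minimal sketch, not part of the repository: `with_reservation` and the `RESERVE_CMD` / `FREE_CMD` override variables are hypothetical names, with defaults set to the script names used on this page. It assumes bash and that EJFAT_URI is already exported for the reserve step.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: reserve, run the given command, always free.
# The function body is a subshell, so the EXIT trap fires when the
# workload finishes or fails, without affecting the calling shell.
with_reservation() (
    reserve_cmd="${RESERVE_CMD:-minimal_reserve.sh}"
    free_cmd="${FREE_CMD:-minimal_free.sh}"

    # Step 1: reserve. If this fails there is nothing to free.
    "$reserve_cmd" || exit 1

    # Step 3 is registered before step 2 starts. The command name is
    # expanded now, on purpose, so the trap does not depend on variables.
    trap "$free_cmd" EXIT

    # Step 2: run the workload passed as arguments; its exit status
    # becomes the status of with_reservation.
    "$@"
)

# Usage (commented out; requires the zero_to_hero scripts on $PATH):
#   with_reservation minimal_receiver.sh --duration 60
```

A sender would still need to be started separately while the receiver is running. If the certificate workaround is needed, one option is to set `FREE_CMD="minimal_free.sh -v"`, since the trap string is parsed as a command line.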

SLURM (Perlmutter)

Both SLURM scripts require E2SAR_SCRIPTS_DIR and the admin EJFAT_URI to be exported before submission. Each job creates an isolated working directory under runs/slurm_job_<JOBID>/ and manages its own LB reservation lifecycle.

export E2SAR_SCRIPTS_DIR=/path/to/E2SAR/scripts/zero_to_hero
export EJFAT_URI="ejfat://token@host:port/lb/1?sync=..."

# Single sender + receiver across 2 nodes
sbatch -A <project> perlmutter_slurm.sh --rate 10 --num 5000

# Multiple senders + receivers
sbatch -N 2 -A <project> perlmutter_multi_slurm.sh \
    --receivers 4 --senders 4 --receivers-per-node 2 --senders-per-node 2 --rate 1
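Since both variables must be exported before submission, a small pre-flight check can catch a missing one before the job is queued. The helpers below are hypothetical and not part of the repository. `require_exported` relies on `printenv`, which only sees exported variables. `job_dir_for` builds the per-job directory name from the job ID that `sbatch --parsable` prints, which may carry a `;cluster` suffix. Where `runs/` is created relative to the submission directory is an assumption here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if every named variable is exported
# and non-empty. printenv prints nothing for unexported variables.
require_exported() {
    local name missing=0
    for name in "$@"; do
        if [ -z "$(printenv "$name")" ]; then
            echo "error: $name must be exported and non-empty" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Hypothetical helper: map "sbatch --parsable" output ("<jobid>" or
# "<jobid>;<cluster>") to the working directory name used by the jobs.
job_dir_for() {
    printf 'runs/slurm_job_%s\n' "${1%%;*}"
}

# Usage (commented out; needs SLURM and a real project name):
#   require_exported E2SAR_SCRIPTS_DIR EJFAT_URI &&
#     jobid="$(sbatch --parsable -A <project> perlmutter_slurm.sh --rate 10 --num 5000)" &&
#     echo "artifacts expected under $(job_dir_for "$jobid")"
```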

Further Reading

For full usage details, all available options, troubleshooting, and step-by-step tutorials, see scripts/zero_to_hero/README.md and the guides in the docs/ subdirectory.
