diff --git a/.claude/skills/delete-releases/SKILL.md b/.claude/skills/delete-releases/SKILL.md
new file mode 100644
index 0000000..51eab75
--- /dev/null
+++ b/.claude/skills/delete-releases/SKILL.md
@@ -0,0 +1,33 @@
+---
+name: delete-releases
+description: Delete all GitHub releases for a given E2SAR version
+disable-model-invocation: true
+user-invocable: true
+argument-hint: <version>
+allowed-tools: Bash
+---
+
+Delete all GitHub releases matching version `$ARGUMENTS`.
+
+Releases follow the naming pattern `E2SAR-<version>-<branch>-<os>-<osver>` (for example, `E2SAR-0.3.1-main-ubuntu-24.04`), where `<os>` is ubuntu, rocky, or debian.
+
+Steps:
+
+1. First list all releases matching the version to show the user what will be deleted:
+   ```
+   gh release list --limit 100 | grep "E2SAR-$ARGUMENTS-"
+   ```
+
+2. Show the list to the user and ask for confirmation before deleting.
+
+3. If the user confirms, delete each matching release:
+   ```
+   gh release list --limit 100 | grep "E2SAR-$ARGUMENTS-" | awk '{print $1}' | xargs -I {} gh release delete {} --yes
+   ```
+
+4. Verify deletion by listing releases again to confirm they are gone.
+
+Important:
+- Always show the user what will be deleted BEFORE deleting
+- Never clean up the associated tags
+- If no releases match, inform the user and do nothing
diff --git a/.claude/skills/review-pr/SKILL.md b/.claude/skills/review-pr/SKILL.md
new file mode 100644
index 0000000..e4d126f
--- /dev/null
+++ b/.claude/skills/review-pr/SKILL.md
@@ -0,0 +1,221 @@
+---
+name: review-pr
+description: Perform a structured code review of an E2SAR GitHub PR and post inline
+  and summary comments
+user-invocable: true
+argument-hint: <PR number>
+allowed-tools: Bash, Read, Glob, Grep
+---
+
+Perform a thorough, structured code review of E2SAR pull request `$ARGUMENTS` and post actionable GitHub comments.
+
+## Steps
+
+### 1. Fetch PR metadata and diff
+
+```bash
+gh pr view $ARGUMENTS --json number,title,body,author,baseRefName,headRefName
+gh repo view --json nameWithOwner --jq '.nameWithOwner'   # capture as REPO
+gh api repos/REPO/pulls/$ARGUMENTS/files --jq '.[].filename'
+```
+
+Display a summary to the user: PR title, author, base branch, files changed.
+
+To read the actual changed lines for a file, use the per-file patch from the API — **do not parse `gh pr diff` with awk**, as range patterns silently drop content when the next `diff --git` header is adjacent:
+
+```bash
+# All patches at once (good for small PRs)
+gh api repos/REPO/pulls/$ARGUMENTS/files --jq '.[] | {path: .filename, patch: .patch}'
+
+# One file at a time
+gh api repos/REPO/pulls/$ARGUMENTS/files \
+  --jq '.[] | select(.filename == "path/to/file.cpp") | .patch'
+```
+
+### 2. Categorize changed files
+
+Group the changed files by type and apply the matching checklist in Step 3:
+
+| Pattern | Checklist |
+|---------|-----------|
+| `include/*.hpp`, `src/*.cpp` | C++ style + docs |
+| `src/pybind/py_*.cpp` | pybind11 bindings |
+| `test/py_test/test_*.py` | pytest conventions |
+| `test/*.cpp` | Boost.Test conventions |
+| `meson.build` | Build system |
+| `RELEASE-NOTES.md`, `VERSION.txt` | Version hygiene |
+
+For context on project conventions, read the relevant reference files as needed:
+- `include/e2sarError.hpp` — `E2SARErrorc` enum and `E2SARErrorInfo`
+- `include/e2sarURI.hpp` — naming/docs conventions
+- `include/e2sarDPSegmenter.hpp` — Flags struct, thread state, atomics
+- `include/e2sarCP.hpp` — gRPC `result` pattern
+- `src/pybind/py_e2sarDP.cpp` — pybind11 patterns
+- `test/py_test/test_b2b_DP.py` — pytest helper/marker conventions
+- `test/py_test/pytest.ini` — official list of valid pytest markers
+
+### 3. Apply review checklists
+
+#### C++ checklist (`include/*.hpp`, `src/*.cpp`)
+
+**Naming conventions**
+- Classes: PascalCase (`Segmenter`, `LBManager`)
+- Methods: camelCase (`openAndStart()`, `sendEvent()`); private helpers prefixed with `_` (`_open()`, `_threadBody()`)
+- Getters: `get_` prefix; setters: `set_` prefix; boolean queries: `has_` / `is` prefix
+- Variables: camelCase; constants/constexpr: UPPER_SNAKE_CASE
+- Type aliases: PascalCase with `_t` suffix (`EventNum_t`, `UnixTimeNano_t`)
+
+**Documentation**
+- All public class methods in headers must have a Doxygen `/** ... */` block with `@param` per parameter and `@return`
+- Constructors must document every parameter
+- `noexcept` methods must be declared as such in the signature
+- Deleted copy constructors/assignment operators must have a brief `/** Don't want to copy... */` comment
+- Structs with multiple fields should have a descriptive block listing each field
+
+**Error handling**
+- Constructors throw `E2SARException` on failure — must not silently swallow errors
+- Methods returning `result` must be marked `noexcept`
+- Callers of `result` must check `has_error()` before `.value()`; direct `.value()` without check is a bug
+- New error codes must be added to the `E2SARErrorc` enum in `include/e2sarError.hpp`; never return raw `-1`
+
+**Memory and thread safety**
+- Shared mutable state accessed from multiple threads must use `std::atomic<>` or a mutex
+- New exclusively-owned class members should use `std::unique_ptr`; shared ownership uses `std::shared_ptr`
+- Classes with thread ownership must delete copy constructor and copy assignment
+- Large objects passed into threads should use `std::move`; never pass by copy if avoidable
+- Raw `new` without a corresponding `delete` or RAII wrapper is a memory leak; use `std::unique_ptr` or a pool allocator
+
+**Header structure**
+- Header guards: `#ifndef E2SARHPP / #define E2SARHPP`
+- Include order: standard library → third-party (Boost, gRPC) → project headers
+- `using namespace` declarations inside `namespace e2sar {}` only; never at global scope in headers
+- `"string"s` literals require `using namespace std::string_literals;` in scope
+
+#### pybind11 checklist (`src/pybind/py_*.cpp`)
+
+- Each bound method must have a short docstring as the last string argument to `.def()`
+- Python-side names use snake_case (`get_instance_token`, `set_instance_token`)
+- Enum values use lowercase (`.value("admin", ...)`)
+- Any binding that calls a Python callable from a C++ thread must use GIL management (`py::gil_scoped_acquire`)
+- Buffer/numpy methods must use `py::buffer_info` and cast through `buf_info.ptr`
+
+#### pytest checklist (`test/py_test/test_*.py`)
+
+- Every test function must start with a `"""Test ."""` docstring
+- Each test must be decorated with exactly one marker from `pytest.ini`: `@pytest.mark.unit`, `@pytest.mark.b2b`, `@pytest.mark.cp`, or `@pytest.mark.lb-b2b`
+- `result` return values must be checked: `assert res.has_error() is False, f"{res.error().message}"`
+- b2b tests using ports must account for TIME_WAIT; use different port numbers across tests or add socket options
+- Helper functions (not tests) must not have a `test_` prefix
+
+#### Boost.Test checklist (`test/*.cpp`)
+
+- Each test case registered with `BOOST_AUTO_TEST_CASE` should test one logical thing
+- Failures must use `BOOST_CHECK_*` / `BOOST_REQUIRE_*` macros, not raw `assert`
+- New test executables must be registered in `test/meson.build` with `suite: 'unit'` or `suite: 'live'`
+
+#### Meson build checklist (`meson.build`)
+
+- New source files must be listed in the appropriate `meson.build` `sources:` array (silently excluded otherwise)
+- New test executables must be registered with `test('Name', exe, suite: 'unit|live')`
+- New dependencies must follow the `dependency('name', version: '>=x.y.z')` pattern
+- Optional features (liburing, NUMA) must use `compiler.has_header()` / `compiler.compiles()` guards
+- `link_args: linker_flags` must be included for all executables/libraries that link e2sar
+
+#### Version hygiene (`RELEASE-NOTES.md`, `VERSION.txt`)
+
+- `VERSION.txt` must be updated if the PR introduces a new release
+- The `RELEASE-NOTES.md` entry must match the version in `VERSION.txt`
+- Alpha/beta suffixes (`a1`, `b1`) must be removed before a final release commit
+
+#### Secrets and usernames
+
+- Files must not contain secrets that look real; random-looking strings should be flagged as potential secrets.
+- Strings that look like usernames should also be flagged.
+- Pay special attention to strings that look like 'ejfats://token@host:port/...' or 'ejfat://token@host:port/...' or mention EJFAT_URI. It is allowed to use the word 'token' or 'randomstring' (or some variation) to denote the secret token; however, strings that look random likely represent real secrets and must be flagged.
+
+### 4. Post inline GitHub review comments
+
+For each issue found, post an inline **brief** comment using the GitHub API. Determine the commit SHA and file path from the diff.
+
+First, get the latest commit SHA on the PR head:
+```bash
+gh pr view $ARGUMENTS --json headRefOid --jq '.headRefOid'
+```
+
+Then post each inline comment, using the `REPO` name captured in Step 1:
+```bash
+gh api repos/REPO/pulls/$ARGUMENTS/comments \
+  --method POST \
+  --field body='COMMENT_TEXT' \
+  --field commit_id='HEAD_SHA' \
+  --field path='FILE_PATH' \
+  --field line=LINE_NUMBER \
+  --field side='RIGHT'
+```
+
+Use these comment templates (adapt the specific details):
+
+**C++ naming (private helper):**
+> `parseFromString` should be `_parseFromString` — private helpers use the `_` prefix. See `Segmenter::_open()` for the convention.
+
+**Missing Doxygen:**
+> Public method `computeChecksum()` is missing a `/** ... */` Doxygen block. Add `@param` for each argument and `@return` describing the result.
+
+**result misuse:**
+> `res.value()` is called without checking `res.has_error()` first. This will throw if the call fails. Wrap in `if (!res.has_error()) { ... }` or assert.
+
+**noexcept missing:**
+> This method returns `result` but isn't marked `noexcept`. All `result`-returning methods should be `noexcept` — exceptions are encoded in the result type.
+
+**Thread safety:**
+> This counter is incremented from multiple threads without synchronization. Use `std::atomic` as done in `Segmenter::AtomicStats::errCnt`.
+
+**Missing test marker:**
+> This test function has no pytest marker. Add `@pytest.mark.unit` (or the appropriate category) so it's included in the right test suite.
+
+**Binding docstring missing:**
+> `.def("methodName", ...)` is missing a docstring. Add a brief description as the last string argument.
+
+**Build file omission:**
+> `new_module.cpp` is not listed in `src/meson.build`'s source list. It will be silently excluded from the build.
+
+**Error code missing:**
+> This failure path returns a raw value instead of `E2SARErrorInfo{E2SARErrorc::SocketError, "..."}`. Use the proper error type.
+
+**Memory leak risk:**
+> Raw `new` without a corresponding `delete` or RAII wrapper. Use `std::unique_ptr` or the existing pool allocator.
+
+### 5. Post a summary comment
+
+After all inline comments, post one top-level PR comment with the overall assessment.
+Use a heredoc to avoid quoting failures when the body contains single quotes or backticks:
+
+```bash
+gh pr comment $ARGUMENTS --body "$(cat <<'EOF'
+SUMMARY
+EOF
+)"
+```
+
+The summary must include:
+- **Overall verdict**: Approve / Request Changes / Comment
+- **Blocking issues** (must fix before merge): numbered list, **briefly** describing each problem.
+- **Non-blocking suggestions** (style/docs): bulleted list **briefly** describing each suggestion
+- **What looks good**: **brief** callouts of well-done sections
+- Footer: `> Review generated by /review-pr skill — verify all inline comments before merging.`
+
+## Important notes
+
+- Only report issues that are actually present in the diff — do not flag pre-existing code outside the changed lines
+- Distinguish blocking issues (correctness, memory safety, build breakage) from non-blocking ones (style, docs)
+- If no issues are found in a category, state that explicitly in the summary rather than inventing feedback
+- Always fetch the repo name dynamically with `gh repo view` — never hardcode it
+- If `$ARGUMENTS` is empty or not a valid PR number, ask the user for the PR number before proceeding
+- Be **brief**: provide only enough information to explain the problem or suggestion. Do not be conversational; use **brief**, to-the-point suggestions only.
diff --git a/.github/workflows/pr-review.yml b/.github/workflows/pr-review.yml
new file mode 100644
index 0000000..4c6b904
--- /dev/null
+++ b/.github/workflows/pr-review.yml
@@ -0,0 +1,39 @@
+name: Automated PR Review
+
+on:
+  pull_request:
+    types: [opened]  # fires exactly once — when the PR is first opened
+  workflow_dispatch:
+    inputs:
+      prnum_in:
+        description: 'PR Number'
+        type: string
+        required: true
+
+jobs:
+  review:
+    runs-on: ubuntu-latest
+    permissions:
+      contents: read
+      pull-requests: write  # needed so Claude can post inline + summary comments
+      id-token: write
+    steps:
+      - name: Resolve PR number
+        id: resolve_pr
+        run: |
+          if [ "${{ github.event_name }}" = "workflow_dispatch" ]; then
+            echo "PR_NUMBER=${{ inputs.prnum_in }}" >> $GITHUB_ENV
+          else
+            echo "PR_NUMBER=${{ github.event.pull_request.number }}" >> $GITHUB_ENV
+          fi
+
+      - name: Checkout code
+        uses: actions/checkout@v4
+
+      - name: Run /review-pr skill
+        uses: anthropics/claude-code-action@beta
+        with:
+          anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
+          direct_prompt: "/review-pr ${{ env.PR_NUMBER }}"
+        env:
+          GH_TOKEN: ${{ secrets.CLASSIC_REPO_PAT }}  # authenticates gh CLI calls inside the skill
diff --git a/Dockerfile.cli b/Dockerfile.cli
index 1795684..5e64a9f 100644
--- a/Dockerfile.cli
+++ b/Dockerfile.cli
@@ -5,7 +5,7 @@
 # This is a multi-stage build optimized for minimal image size
 
 # Global build arguments (available to all stages)
-ARG DEPS_VER=0.3.0a1
+ARG DEPS_VER=0.3.1
 ARG E2SAR_DEPS_DEB=E2SAR-${DEPS_VER}-main-ubuntu-24.04/e2sar-deps_${DEPS_VER}_amd64.deb
 ARG E2SAR_DEPS_DEB_URL=https://github.com/JeffersonLab/E2SAR/releases/download/${E2SAR_DEPS_DEB}
 ARG E2SARINSTALL=/e2sar-install
diff --git a/Doxyfile b/Doxyfile
index 8706085..c42187e 100644
--- a/Doxyfile
+++ b/Doxyfile
@@ -48,7 +48,7 @@ PROJECT_NAME = "E2SAR"
 # could be handy for archiving the generated documentation or if some version
 # control system is used.
 
-PROJECT_NUMBER = 0.2.0
+PROJECT_NUMBER = 0.3.0
 
 # Using the PROJECT_BRIEF tag one can provide an optional one line description
 # for a project that appears at the top of each page and should give viewer a
diff --git a/README.md b/README.md
index 291568f..d9237da 100644
--- a/README.md
+++ b/README.md
@@ -267,6 +267,8 @@ E2SAR code comes with a set of tests under [test/](test/) folder. It relies on B
 
 There is a [Jupyter notebook](scripts/notebooks/EJFAT/LBCP-tester.ipynb) which runs all the tests on FABRIC testbed.
 
+To check for memory leaks, use [scripts/perf-valgrind-loopback.sh](scripts/perf-valgrind-loopback.sh): run it from `scripts/` as `./perf-valgrind-loopback.sh build /tmp/valgrind-reports`, where `build` is the meson build directory and `/tmp/valgrind-reports` is the directory where the script places all the logs. Then run the XML report parser to get a more concise report: `scripts/parse-valgrind-xml.py /tmp/valgrind-reports`.
+
 ### Python
 
 The code can be tested using pytest
diff --git a/RELEASE-NOTES.md b/RELEASE-NOTES.md
index 33569de..ceb0a18 100644
--- a/RELEASE-NOTES.md
+++ b/RELEASE-NOTES.md
@@ -2,6 +2,13 @@
 API Details can always be found in the [wiki](https://github.com/JeffersonLab/E2SAR/wiki) and in the [Doxygen site](https://jeffersonlab.github.io/E2SAR-doc/annotated.html).
 
+## v0.3.1
+
+- Reversed the order of removing senders and shutting down send threads in e2sar_perf to ensure no losses occur due to premature de-listing of the sender IP in LB
+- Fixed multiple memory leaks in segmenter and reassembler
+- Fixed proper handling of `-v/--novalidate` flag in lbadm
+- Added 'Zero-to-Hero' scripts that show how to do performance testing on Perlmutter (located in scripts/zero_to_hero). The scripts exercise the wide variety of E2SAR tools and show how they can be integrated into Slurm workflows.
+
 ## v0.3.0
 
 - Dependencies change - Boost 1.89.0 and gRPC 1.74.1
diff --git a/VERSION.txt b/VERSION.txt
index 0d91a54..9e11b32 100644
--- a/VERSION.txt
+++ b/VERSION.txt
@@ -1 +1 @@
-0.3.0
+0.3.1
diff --git a/bin/e2sar_perf.cpp b/bin/e2sar_perf.cpp
index 57344c8..d44ea81 100644
--- a/bin/e2sar_perf.cpp
+++ b/bin/e2sar_perf.cpp
@@ -46,6 +46,9 @@ void shutDown()
     boost::chrono::milliseconds duration(1000);
     boost::this_thread::sleep_for(duration);
     if (segPtr != nullptr) {
+        // d-tor will stop the threads - it is important to do that before removing sender
+        // to make sure outstanding data is sent out
+        delete segPtr;
         if (lbmPtr != nullptr) {
             std::cout << "Removing senders: ";
             if (senders.size() > 0)
@@ -64,8 +67,6 @@
                     std::cerr << "Unable to remove auto-detected sender from list on exit: " << rmres.error().message() << std::endl;
             }
         }
-        // d-tor will stop the threads
-        delete segPtr;
     }
 
     if (reasPtr != nullptr) {
diff --git a/bin/lbadm.cpp b/bin/lbadm.cpp
index 302689b..d6caaef 100644
--- a/bin/lbadm.cpp
+++ b/bin/lbadm.cpp
@@ -935,7 +935,7 @@ int main(int argc, char **argv)
         preferV6 = true;
     }
 
-    novalidate = not vm["novalidate"].as<bool>();
+    novalidate = vm["novalidate"].as<bool>();
 
     // if ipv4 or ipv6 requested explicitly
     bool preferHostAddr = false;
@@ -968,11 +968,11 @@
                 throw std::runtime_error("Unable to read server root certificate file");
             return LBManager(uri, true, preferHostAddr, opts_res.value());
         }
-
+
         if (novalidate)
             std::cerr << "Skipping server certificate validation" << std::endl;
 
-        return LBManager(uri, !novalidate, preferHostAddr);
+        return LBManager(uri, !novalidate, preferHostAddr);
     };
diff --git a/bin/meson.build b/bin/meson.build
index 2098627..5b89ba8 100644
--- a/bin/meson.build
+++ b/bin/meson.build
@@ -30,4 +30,4 @@ executable('e2sar_udp_relay', 'e2sar_udp_relay.cpp',
   link_with: libe2sar,
   install: true,
   link_args: linker_flags,
-  dependencies: [boost_dep, thread_dep])
+  dependencies: [boost_dep, thread_dep, grpc_dep, protobuf_dep])
diff --git a/docs b/docs
index 414b9a6..8e2e7bb 160000
--- a/docs
+++ b/docs
@@ -1 +1 @@
-Subproject commit 414b9a660c96562ce3a05045301a29d8b1cff576
+Subproject commit 8e2e7bbe9de6a5bae6dace37cf848a5bc12b0c5e
diff --git a/include/e2sarDPReassembler.hpp b/include/e2sarDPReassembler.hpp
index 8c470e3..87aed6a 100644
--- a/include/e2sarDPReassembler.hpp
+++ b/include/e2sarDPReassembler.hpp
@@ -665,7 +665,26 @@
             for(auto i = recvThreadState.begin(); i != recvThreadState.end(); ++i)
                 i->threadObj.join();
 
+            // drain event queue
+            EventQueueItem* item{nullptr};
+            bool a{false};
+            do {
+                a = eventQueue.pop(item);
+                if (a)
+                {
+                    if (item->event != nullptr)
+                        delete[] item->event;
+                    delete item;
+                }
+            } while (a);
+
             gcThreadState.threadObj.join();
+
+            // drain lost events queue - tuples are heap-allocated in logLostEvent
+            // and only freed by get_LostEvent(); if the caller never calls it they leak
+            boost::tuple* evtPtr{nullptr};
+            while (recvStats.lostEventsQueue.pop(evtPtr))
+                delete evtPtr;
         }
     }
 protected:
diff --git a/include/e2sarNetUtil.hpp b/include/e2sarNetUtil.hpp
index 15e1586..3a7b6fb 100644
--- a/include/e2sarNetUtil.hpp
+++ b/include/e2sarNetUtil.hpp
@@ -52,6 +52,7 @@
         static result> getInterfaceAndMTU(const ip::address &addr);
 #endif
 
+        static result getSocketOutstandingBytes(int sockfd) noexcept;
     };
 }
 #endif
diff --git a/meson.build b/meson.build
index 90932d1..716da29 100644
--- a/meson.build
+++ b/meson.build
@@ -76,6 +76,26 @@ if compiler.compiles(mmsgcode, name: 'sendmmsg check')
   add_project_arguments('-DSENDMMSG_AVAILABLE', language: ['cpp'])
 endif
 
+outqcode = '''
+#include <sys/ioctl.h>
+void f() {
+    ioctl(0, TIOCOUTQ, nullptr);
+}
+'''
+if compiler.compiles(outqcode, name: 'ioctl TIOCOUTQ check')
+  add_project_arguments('-DSIOCOUTQ_AVAILABLE', language: ['cpp'])
+endif
+
+sockoptnwritecode = '''
+#include <sys/socket.h>
+void f() {
+    getsockopt(0, SOL_SOCKET, SO_NWRITE, nullptr, nullptr);
+}
+'''
+if compiler.compiles(sockoptnwritecode, name: 'getsockopt SO_NWRITE check')
+  add_project_arguments('-DSO_NWRITE_AVAILABLE', language: ['cpp'])
+endif
+
 add_project_arguments(f'-DE2SAR_VERSION="' + meson.project_version() + '"', language:['cpp'])
 
 # -Wall
diff --git a/scripts/parse-valgrind-xml.py b/scripts/parse-valgrind-xml.py
new file mode 100755
index 0000000..8a01fd7
--- /dev/null
+++ b/scripts/parse-valgrind-xml.py
@@ -0,0 +1,28 @@
+#!/usr/bin/env python3
+"""Parse Valgrind XML reports and print definite leaks."""
+import sys
+import glob
+import xml.etree.ElementTree as ET
+
+report_dir = sys.argv[1] if len(sys.argv) > 1 else "/tmp/valgrind-reports"
+
+for xml_file in sorted(glob.glob(f"{report_dir}/*.xml")):
+    print(xml_file)
+    try:
+        tree = ET.parse(xml_file)
+    except ET.ParseError:
+        continue
+    root = tree.getroot()
+    leaks = [e for e in root.findall(".//error")
+             if e.findtext("kind", "") in ("Leak_DefinitelyLost", "Leak_IndirectlyLost")]
+    if not leaks:
+        continue
+    print(f"\n{'='*60}")
+    print(f"File: {xml_file}")
+    for e in leaks:
+        kind = e.findtext("kind")
+        what = e.findtext("xwhat/text") or e.findtext("what", "")
+        frames = [f.findtext("fn", "?") for f in e.findall(".//frame")][:5]
+        print(f"  [{kind}] {what}")
+        print(f"  Stack: {' -> '.join(frames)}")
diff --git a/scripts/perf-valgrind-loopback.sh b/scripts/perf-valgrind-loopback.sh
new file mode 100755
index 0000000..8395d00
--- /dev/null
+++ b/scripts/perf-valgrind-loopback.sh
@@ -0,0 +1,70 @@
+#!/usr/bin/env bash
+# Run e2sar_perf loopback under Valgrind. Collects separate XML for sender/receiver.
+# Usage: perf-valgrind-loopback.sh [BUILD_DIR] [REPORT_DIR] [SUPP_FILE]
+set -euo pipefail
+REPO_ROOT="$(cd "$(dirname "$0")/.." && pwd)"
+BUILD_DIR="$REPO_ROOT/${1:-build-valgrind}"
+REPORT_DIR="${2:-/tmp/valgrind-reports}"
+SUPP="${3:-$REPO_ROOT/scripts/valgrind.supp}"
+PERF="$BUILD_DIR/bin/e2sar_perf"
+mkdir -p "$REPORT_DIR"
+
+if [[ ! -x "$PERF" ]]; then
+    echo "ERROR: $PERF not found." >&2
+    exit 1
+fi
+
+BASE_PORT=$(( 22000 + (RANDOM % 3000) ))
+URI="ejfat://token@127.0.0.1:18020/lb/1?data=127.0.0.1:${BASE_PORT}"
+
+VALGRIND_COMMON=(
+    --tool=memcheck
+    --leak-check=full
+    --show-leak-kinds=definite,indirect
+    --track-origins=yes
+    --trace-children=yes
+    --fair-sched=yes
+    --num-callers=30
+    --suppressions="$SUPP"
+    --error-exitcode=1
+)
+
+echo "--- Receiver under Valgrind (port $BASE_PORT) ---"
+valgrind "${VALGRIND_COMMON[@]}" \
+    --xml=yes \
+    --xml-file="$REPORT_DIR/perf_recv.xml" \
+    "$PERF" -r \
+    --ip 127.0.0.1 \
+    --port "$BASE_PORT" \
+    --duration 60 \
+    --timeout 2000 \
+    --quiet \
+    -u "$URI" \
+    >"$REPORT_DIR/perf_recv.log" 2>&1 &
+RECV_PID=$!
+
+# Valgrind startup is slower — give it more time
+sleep 3
+
+echo "--- Sender under Valgrind ---"
+# capture the exit code without tripping `set -e`
+SEND_EXIT=0
+valgrind "${VALGRIND_COMMON[@]}" \
+    --xml=yes \
+    --xml-file="$REPORT_DIR/perf_send.xml" \
+    "$PERF" -s \
+    --ip 127.0.0.1 \
+    -n 1000 \
+    --length 1000000 \
+    --mtu 9000 \
+    --rate 10 \
+    -u "$URI" \
+    >"$REPORT_DIR/perf_send.log" 2>&1 || SEND_EXIT=$?
+
+RECV_EXIT=0
+wait "$RECV_PID" || RECV_EXIT=$?
+
+echo "Sender:   exit=$SEND_EXIT xml=$REPORT_DIR/perf_send.xml"
+echo "Receiver: exit=$RECV_EXIT xml=$REPORT_DIR/perf_recv.xml"
+
+[[ $SEND_EXIT -eq 0 && $RECV_EXIT -eq 0 ]] || exit 1
diff --git a/scripts/zero_to_hero/.gitignore b/scripts/zero_to_hero/.gitignore
new file mode 100644
index 0000000..37b9168
--- /dev/null
+++ b/scripts/zero_to_hero/.gitignore
@@ -0,0 +1,3 @@
+INSTANCE_URI
+*.log
+runs/
diff --git a/scripts/zero_to_hero/CLAUDE.md b/scripts/zero_to_hero/CLAUDE.md
new file mode 100644
index 0000000..e76feec
--- /dev/null
+++ b/scripts/zero_to_hero/CLAUDE.md
@@ -0,0 +1,450 @@
+# CLAUDE.md
+
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+## Project Overview
+
+This is an E2SAR (Enhanced End-to-End Send And Receive) network performance testing framework designed for high-performance computing environments. The system uses containerized bash scripts to orchestrate network performance measurements between sender and receiver nodes through a load balancer.
+
+## Core Architecture
+
+The framework follows a reservation-based workflow using four main components:
+
+1. **Load Balancer (LB) Reservation**: `minimal_reserve.sh` creates network resource reservations
+2. **Sender**: `minimal_sender.sh` sends network traffic for performance testing
+3. **Receiver**: `minimal_receiver.sh` receives and measures network traffic
+4. **Cleanup**: `minimal_free.sh` releases reserved network resources
+
+All operations use the `ibaldin/e2sar:0.3.1a3` container image via `podman-hpc`.
+
+## Required Workflow Sequence
+
+**Always follow this sequence:**
+
+1. **Reserve resources first** (requires admin `EJFAT_URI` in environment; no `-v` flag):
+   ```bash
+   EJFAT_URI="ejfat://token@host:port/lb/1?sync=..." ./minimal_reserve.sh
+   ```
+
+2. **Run sender and/or receiver** (can run simultaneously):
+   ```bash
+   ./minimal_sender.sh [OPTIONS]    # pass -v if SSL cert is expired
+   ./minimal_receiver.sh [OPTIONS]  # pass -v if SSL cert is expired
+   ```
+
+3. **Free resources when done** (pass `-v` if SSL cert is expired):
+   ```bash
+   ./minimal_free.sh [-v]
+   ```
+
+The reservation creates an `INSTANCE_URI` file that contains the session `EJFAT_URI` needed
+by sender and receiver scripts.
+
+**Note on `-v` (skip SSL cert validation):** Pass `-v` to `minimal_sender.sh`,
+`minimal_receiver.sh`, and `minimal_free.sh` when the LB control plane SSL certificate has
+expired. Do NOT pass `-v` to `minimal_reserve.sh` — reserve uses the admin token which skips
+cert validation unconditionally, and passing `--novalidate` to lbadm reserve causes failures.
+
+## Setup and Environment Configuration
+
+### Directory-Independent Operation
+
+The scripts can be run from any directory. All artifacts (INSTANCE_URI, log files) are created in the current working directory, not the script directory.
+
+### One-Time Setup (Optional)
+
+Add the scripts to your PATH for easy access:
+
+```bash
+# Option 1: Source directly (temporary, current shell only)
+source /path/to/zero_to_hero/setup_env.sh
+
+# Option 2: Add to shell config (permanent)
+echo 'source /path/to/zero_to_hero/setup_env.sh' >> ~/.bashrc  # or ~/.zshrc
+```
+
+After sourcing the setup script:
+
+```bash
+# Create your working directory
+mkdir -p ~/my_tests && cd ~/my_tests
+
+# Scripts are now in your PATH - run from anywhere
+minimal_reserve.sh
+minimal_sender.sh --rate 5
+minimal_receiver.sh --duration 60
+
+# All artifacts are created in the current directory
+ls  # Shows: INSTANCE_URI, minimal_sender.log, minimal_receiver.log, etc.
+```
+
+### Running Without Setup Script
+
+You can also invoke scripts with full paths:
+
+```bash
+cd /tmp/my_test
+EJFAT_URI="..." /path/to/zero_to_hero/minimal_reserve.sh
+/path/to/zero_to_hero/minimal_sender.sh --rate 5
+# Artifacts are still created in /tmp/my_test
+```
+
+## Common Commands
+
+### Reservation Management
+```bash
+# Create reservation (required first step)
+EJFAT_URI="ejfat://..." ./minimal_reserve.sh
+
+# Create reservation with custom name
+EJFAT_URI="ejfat://..." ./minimal_reserve.sh --lbname my_test
+
+# Check reservation status
+podman-hpc run -e EJFAT_URI="$EJFAT_URI" --rm --network host ibaldin/e2sar:0.3.1a3 lbadm --overview
+
+# Free reservation (cleanup)
+./minimal_free.sh
+```
+
+### Sender Operations
+```bash
+# Basic sender (1 Gbps, 100 events, 1MB buffers)
+# Includes automatic memory monitoring
+./minimal_sender.sh
+
+# High-rate sender with custom parameters
+./minimal_sender.sh --rate 10 --length 2097152 --num 1000
+
+# IPv6 sender
+./minimal_sender.sh --ipv6 --rate 5
+
+# Custom MTU (e.g., jumbo frames)
+./minimal_sender.sh --mtu 9000 --rate 10
+
+# Disable memory monitoring (for pure performance benchmarking)
+./minimal_sender.sh --no-monitor --rate 10
+
+# Skip SSL certificate validation (for testing/dev environments)
+./minimal_sender.sh -v --rate 5
+```
+
+### Receiver Operations
+```bash
+# Basic receiver (indefinite duration)
+./minimal_receiver.sh
+
+# Receiver with time limit
+./minimal_receiver.sh --duration 60
+
+# Custom port receiver
+./minimal_receiver.sh --port 20000 --duration 120
+
+# IPv6 receiver
+./minimal_receiver.sh --ipv6 --duration 60
+
+# High-throughput receiver (more threads and buffer)
+./minimal_receiver.sh --threads 32 --deq 32 --bufsize 268435456
+
+# Skip SSL certificate validation (for testing/dev environments)
+./minimal_receiver.sh -v --duration 60
+```
+
+## Configuration System
+
+### Environment Variables
+- `EJFAT_URI`: The primary configuration URI containing authentication token and load balancer details
+- `E2SAR_IMAGE`: Container image override (default: `ibaldin/e2sar:0.3.1a3`)
+- `LB_NAME`: Load balancer reservation name (default: `e2sar_test`). Can also be set via the `--lbname` option in `minimal_reserve.sh`
+- `E2SAR_SCRIPTS_DIR`: Required for SLURM scripts (`perlmutter_slurm.sh`, `perlmutter_multi_slurm.sh`). Must point to the `zero_to_hero` directory. Example: `export E2SAR_SCRIPTS_DIR=/path/to/E2SAR/scripts/zero_to_hero`
+
+### Container Pre-Installation (Perlmutter)
+
+Pre-install the container image to avoid re-downloading on each compute node:
+
+```bash
+podman-hpc pull ibaldin/e2sar:latest
+```
+
+See QuickStartMinimalScripts.md for details.
+
+### State Management
+- `INSTANCE_URI`: File containing reservation details shared between scripts
+- `minimal_sender.log`: Detailed sender execution log with timestamps and exit codes
+- `minimal_sender_memory.log`: Automatic memory usage monitoring data (CSV format)
+- `minimal_receiver.log`: Detailed receiver execution log with timestamps and exit codes
+
+### Network Auto-Detection
+The system automatically detects appropriate IP addresses by:
+1. Extracting the LB hostname from `EJFAT_URI`
+2. Resolving the hostname to an IP (IPv4/IPv6)
+3. Using `ip route get` to find the local source IP for that destination
+
+This ensures correct network interface selection in multi-homed systems.
+
+## Key Performance Parameters
+
+### Sender Parameters
+- `--rate`: Sending rate in Gbps (default: 1)
+- `--length`: Event buffer size in bytes (default: 1048576 = 1MB)
+- `--num`: Number of events to send (default: 100)
+- `--mtu`: MTU size in bytes (default: 9000)
+- `--optimize`: Optimization mode (sendmsg, sendmmsg, liburing_send). Default: sendmmsg. Affects memory usage and performance.
+- `--ipv6`: Use IPv6 instead of IPv4
+- `-v`: Skip SSL certificate validation (default: disabled)
+- `--no-monitor`: Disable automatic memory monitoring
+- `--image`: Override container image
+
+### Receiver Parameters
+- `--port`: Data receiving port (default: 10000)
+- `--duration`: Run duration in seconds (default: 0 = indefinite)
+- `--threads`: Number of receive threads (default: 16)
+- `--deq`: Number of dequeue threads (default: 16)
+- `--bufsize`: Socket buffer size in bytes (default: 134217728 = 128MB)
+- `--ipv6`: Use IPv6 instead of IPv4
+- `-v`: Skip SSL certificate validation (default: disabled)
+- `--image`: Override container image
+
+### Container Optimizations
+Both scripts use these optimizations:
+- `--network host`: Direct host networking for performance
+- `MALLOC_ARENA_MAX=32`: Memory allocation tuning
+- `--mtu=9000`: Jumbo frame support
+- `--bufsize=134217728`: 128MB buffer size
+- `--optimize=sendmmsg`: Sender-side syscall optimization
+
+## Monitoring and Debugging
+
+### Log Analysis
+Check log files for performance metrics and troubleshooting:
+```bash
+# View sender results
+tail -f minimal_sender.log
+
+# Check receiver performance
+tail -f minimal_receiver.log
+
+# View memory usage summary
+tail minimal_sender_memory.log
+
+# Analyze timestamps and exit codes
+grep -E "(START_TIME|END_TIME|EXIT_CODE)" *.log
+```
+
+### Memory Monitoring
+The sender automatically monitors memory usage and logs to `minimal_sender_memory.log`:
+```bash
+# View memory summary (at end of file)
+tail minimal_sender_memory.log
+# Shows: Peak RSS, Min RSS, Growth
+
+# Track memory in real-time (during test)
+tail -f minimal_sender_memory.log | grep -v '^#'
+
+# Disable monitoring if not needed
+./minimal_sender.sh --no-monitor
+```
+
+### Standalone Memory Monitor
+For detailed memory tracking across all e2sar_perf processes:
+```bash
+# Start monitoring in separate terminal (1 second interval)
+./monitor_memory.sh 1
+
+# View collected data
+tail -f memory_monitor.log
+
+# Analyze CSV data: TIMESTAMP, PID, RSS_KB, VSZ_KB, %MEM, %CPU, ELAPSED_TIME, COMMAND
+grep -v '^#' memory_monitor.log
+```
+
+### Validation Commands
+```bash
+# Verify reservation is active
+source INSTANCE_URI && podman-hpc run -e EJFAT_URI="$EJFAT_URI" --rm --network host ibaldin/e2sar:0.3.1a3 lbadm --overview
+
+# Test container connectivity
+podman-hpc run --rm --network host ibaldin/e2sar:0.3.1a3 e2sar_perf --help
+```
+
+## Error Recovery
+
+### Common Issues
+1. **Missing INSTANCE_URI**: Always run `minimal_reserve.sh` first
+2. **Invalid reservation**: Re-run `minimal_reserve.sh` to create new reservation
+3. **Network detection failure**: Check connectivity to load balancer hostname
+4. **Container issues**: Verify `podman-hpc` and image availability
+5. **SSL certificate expired**: The LB control plane certificate may expire. Pass `-v` to
+   sender/receiver and `minimal_free.sh` scripts to skip validation. Note that `-v` is NOT
+   accepted by `minimal_reserve.sh` — reserve uses the admin token which skips cert validation
+   unconditionally; passing `--novalidate` to lbadm reserve breaks this and causes failures.
+   If `minimal_free.sh -v` still fails (lbadm's `--novalidate` does not fully bypass gRPC-level
+   SSL), use the admin token method below to free reservations directly.
+
+### Freeing Reservations with the Admin Token
+
+When `minimal_free.sh` fails due to an expired SSL certificate, use the admin `EJFAT_URI`
+(set in your environment) with `lbadm --free --lbid` to free reservations directly. The admin
+token code path in `lbadm` skips certificate validation unconditionally.
+
+```bash
+# View all active reservations and their LB IDs
+podman-hpc run -e EJFAT_URI="$EJFAT_URI" --rm --network host ibaldin/e2sar:0.3.1a3 \
+  lbadm --overview
+
+# Free a specific reservation by LB ID (replace 302 with the actual ID)
+podman-hpc run -e EJFAT_URI="$EJFAT_URI" --rm --network host ibaldin/e2sar:0.3.1a3 \
+  lbadm --free --lbid 302
+
+# Free multiple orphaned reservations at once
+for lbid in 301 302 303; do
+  podman-hpc run -e EJFAT_URI="$EJFAT_URI" --rm --network host ibaldin/e2sar:0.3.1a3 \
+    lbadm --free --lbid $lbid
+done
+```
+
+The LB ID for a specific job's reservation can be found in:
+- The `INSTANCE_URI` file: the number after `/lb/` in the URI
+- The SLURM job output (`slurm-<jobid>.out`): logged during Phase 1 reservation
+- `lbadm --overview` output: lists all active reservations with their IDs
+
+### Cleanup and Reset
+```bash
+# Force cleanup if minimal_free.sh fails
+rm -f INSTANCE_URI
+rm -f *.log
+
+# Start fresh
+EJFAT_URI="..." ./minimal_reserve.sh
+```
+
+## SLURM Batch Processing (Perlmutter)
+
+The `perlmutter_slurm.sh` script orchestrates distributed tests on the Perlmutter HPC system.
+Each job creates its own fresh LB reservation on startup and frees it on completion.
+
+**Prerequisites:**
+- `E2SAR_SCRIPTS_DIR` must be set: `export E2SAR_SCRIPTS_DIR=/path/to/E2SAR/scripts/zero_to_hero`
+- `EJFAT_URI` must be set to the admin URI (not a session token from INSTANCE_URI)
+
+```bash
+# Basic SLURM submission
+sbatch -A <account> perlmutter_slurm.sh
+
+# With custom test parameters
+sbatch -A <account> perlmutter_slurm.sh --rate 10 --num 5000 --length 2097152
+
+# Override SLURM parameters
+sbatch -A <account> -q debug -t 00:30:00 perlmutter_slurm.sh --rate 20 --mtu 9000
+
+# Skip SSL certificate validation for sender/receiver/free (not needed for reserve)
+sbatch -A <account> perlmutter_slurm.sh -v --rate 10
+```
+
+**Important:** `EJFAT_URI` must be the admin URI already set in your shell environment. 
+Do NOT source an `INSTANCE_URI` file before submitting — that would overwrite the admin
+`EJFAT_URI` with a session token, which cannot create new reservations.
+
+**Key features:**
+- Designed specifically for Perlmutter at NERSC
+- Uses exactly 2 nodes (Node 0: receiver, Node 1: sender)
+- Creates a fresh LB reservation per job in an isolated working directory: `runs/slurm_job_<jobid>/`
+- Handles reservation, execution, and cleanup automatically
+- Collects all logs in the job-specific directory
+
+### Multi-Instance SLURM Testing (Perlmutter)
+
+The `perlmutter_multi_slurm.sh` script enables testing with multiple concurrent senders and
+receivers. Senders and receivers share the same node pool and can be co-located on the same
+nodes. Like `perlmutter_slurm.sh`, each job creates its own fresh LB reservation.
+
+**Prerequisites:** Same as `perlmutter_slurm.sh` - requires `E2SAR_SCRIPTS_DIR` and admin `EJFAT_URI`.
+
+```bash
+# 4 receivers + 4 senders co-located on 2 nodes (2 of each per node)
+sbatch -N 2 -A <account> perlmutter_multi_slurm.sh \
+  --receivers 4 --receivers-per-node 2 \
+  --senders 4 --senders-per-node 2 \
+  --rate 1 --num 10000
+
+# Single node running all instances
+sbatch -N 1 -A <account> perlmutter_multi_slurm.sh \
+  --receivers 2 --receivers-per-node 2 \
+  --senders 2 --senders-per-node 2 \
+  --rate 1 --num 1000
+
+# 2 receivers + 2 senders, one per node (co-located: 1 sender + 1 receiver per node)
+sbatch -N 2 -A <account> perlmutter_multi_slurm.sh \
+  --receivers 2 --senders 2 --rate 1 --num 10000
+
+# Skip SSL certificate validation
+sbatch -N 2 -A <account> perlmutter_multi_slurm.sh \
+  --receivers 2 --senders 2 --rate 10 --num 1000 -v
+```
+
+**Multi-instance options:**
+- `--receivers N`: Total number of receiver instances (default: 1)
+- `--senders M`: Total number of sender instances (default: 1)
+- `--receivers-per-node K`: Receiver instances per node (default: 1)
+- `--senders-per-node K`: Sender instances per node (default: 1)
+- `--threads N`: Receive threads per receiver instance; also sets port stride (default: 16)
+- `--base-port PORT`: Starting port for receiver 0 (default: 10000)
+- `--receiver-delay SEC`: Wait time after starting receivers (default: 10)
+
+**Node allocation formula:**
+```
+Receiver nodes = ceil(receivers / receivers-per-node)
+Sender nodes = ceil(senders / senders-per-node)
+Total nodes = max(Receiver nodes, Sender nodes)
+```
+
+**Port assignment:**
+
+Each receiver uses `--threads` consecutive ports. Receiver `i` gets ports `base_port + i * threads` through `base_port + i * threads + threads - 1`. With defaults (`--base-port 10000 --threads 16`):
+- Receiver 0: 10000–10015
+- Receiver 1: 10016–10031
+- Receiver 2: 10032–10047
+
+**Key features:**
+- Senders and receivers share the same node pool (co-location supported)
+- Each instance runs in an isolated subdirectory with its own logs
+- All senders and receivers start in parallel
+- Script waits for all senders to complete before shutting down receivers
+- Graceful receiver shutdown with SIGTERM then SIGKILL
+- Comprehensive summary report with all exit codes
+
+**Log structure:**
+```
+runs/slurm_job_<jobid>/
+├── receiver_0/
+│   ├── minimal_receiver.log
+│   └── receiver_srun.log
+├── receiver_1/
+│   ├── minimal_receiver.log
+│   └── receiver_srun.log
+├── sender_0/
+│   ├── minimal_sender.log
+│   ├── minimal_sender_memory.log
+│   └── sender_srun.log
+├── sender_1/
+│   ├── minimal_sender.log
+│   ├── minimal_sender_memory.log
+│   └── sender_srun.log
+└── INSTANCE_URI
+```
+
+## Additional Scripts
+
+### monitor_memory.sh
+Standalone memory monitoring for all e2sar_perf processes:
+- Logs to `memory_monitor.log` in CSV format
+- Samples at configurable interval (default: 1 second)
+- Tracks: RSS, VSZ, %MEM, %CPU, elapsed time
+- Use in parallel with tests for detailed memory analysis
+
+## Development Notes
+
+- All scripts use `set -euo pipefail` for strict error handling
+- IP detection logic is shared between sender and receiver scripts
+- Container commands are built as 
arrays to handle complex parameter passing +- Signal trapping ensures proper log completion even on interruption +- The framework supports both IPv4 and IPv6 operation modes +- Memory monitoring is automatic in sender (disable with `--no-monitor`) \ No newline at end of file diff --git a/scripts/zero_to_hero/README.md b/scripts/zero_to_hero/README.md new file mode 100644 index 0000000..0f98957 --- /dev/null +++ b/scripts/zero_to_hero/README.md @@ -0,0 +1,42 @@ +# E2SAR Zero to Hero + +This directory contains minimal wrapper scripts for running E2SAR network performance tests on HPC systems, specifically designed for use with Perlmutter and the ESnet load balancer. + +## Getting Started + +Please see the documentation in the `docs/` directory: + +- **[docs/ZeroToHeroStart.md](docs/ZeroToHeroStart.md)** - First tutorial: Simple loopback testing on your laptop +- **[docs/QuickStartMinimalScripts.md](docs/QuickStartMinimalScripts.md)** - Second tutorial: Running traffic on Perlmutter +- **[docs/RunningMinimalScripts.md](docs/RunningMinimalScripts.md)** - Comprehensive guide with detailed examples +- **[docs/RunningSlurmOnPerlmutter.md](docs/RunningSlurmOnPerlmutter.md)** - Third tutorial: Running SLURM batch jobs on Perlmutter (single and multi-instance) + +## Setup (Optional) + +Add scripts to your PATH for easy access from any directory: + +```bash +# Temporary (current shell only) +source /path/to/zero_to_hero/setup_env.sh + +# Permanent (add to ~/.bashrc or ~/.zshrc) +echo 'source /path/to/zero_to_hero/setup_env.sh' >> ~/.bashrc +``` + +After setup, run scripts from any directory - artifacts are created in your current directory. + +## Quick Start + +```bash +# 1. Reserve load balancer +EJFAT_URI="ejfat://token@host:port/lb/1?sync=..." ./minimal_reserve.sh + +# 2. Run sender and receiver +./minimal_sender.sh --rate 5 --num 1000 +./minimal_receiver.sh --duration 60 + +# 3. 
Cleanup
+./minimal_free.sh
+```
+
+**Note:** All commands work from any directory if you've run `setup_env.sh`, or use full paths otherwise.
diff --git a/scripts/zero_to_hero/docs/QuickStartMinimalScripts.md b/scripts/zero_to_hero/docs/QuickStartMinimalScripts.md
new file mode 100644
index 0000000..c3f76d5
--- /dev/null
+++ b/scripts/zero_to_hero/docs/QuickStartMinimalScripts.md
@@ -0,0 +1,286 @@
+# E2SAR Zero to Hero - Network Performance Testing Framework
+
+This repository contains a containerized E2SAR (Enhanced End-to-End Send And Receive) framework for high-performance network testing in HPC environments.
+
+## Optional: Setup for Easy Access
+
+Add scripts to your PATH to run them from any directory:
+
+```bash
+# Temporary (current shell)
+source /path/to/zero_to_hero/setup_env.sh
+
+# Permanent (add to ~/.bashrc or ~/.zshrc)
+echo 'source /path/to/zero_to_hero/setup_env.sh' >> ~/.bashrc
+```
+
+After setup, create working directories and run scripts from anywhere - all artifacts are created in your current directory.
+
+```bash
+# Example: Create a custom directory for your test runs
+mkdir -p ~/my_runs_dir
+cd ~/my_runs_dir
+
+# Run scripts - all logs and INSTANCE_URI will be created here
+minimal_reserve.sh
+minimal_sender.sh --rate 5
+minimal_receiver.sh --duration 60
+
+# For runs on login nodes: all artifacts are in your current directory
+ls # Shows: INSTANCE_URI, minimal_sender.log, minimal_receiver.log, minimal_sender_memory.log
+
+# For SLURM jobs: artifacts are organized in runs/slurm_job_<jobid>/
+# - Single instance: runs/slurm_job_12345/{INSTANCE_URI, minimal_sender.log, minimal_receiver.log, ...}
+# - Multi-instance: runs/slurm_job_12345/{INSTANCE_URI, sender_0/, sender_1/, receiver_0/, receiver_1/, ...}
+```
+
+## Optional: Pre-Installing Containers on Perlmutter
+
+On Perlmutter, you can pre-install container images to avoid re-downloading them on each compute node. This significantly reduces job startup time and network overhead. 
+ +### Pull and cache the container image + +```bash +# On a login node or in an interactive session +podman-hpc pull ibaldin/e2sar:latest +``` + +This downloads the image once and stores it in your `$HOME/.local/share/containers` directory, which is accessible from all compute nodes via the shared filesystem. + +### Verify the image is cached + +```bash +podman-hpc images +``` + +You should see `ibaldin/e2sar` with tag `latest` in the list. + +### Benefits + +- **Faster job startup**: No download time when jobs start on compute nodes +- **Reduced network traffic**: Image downloaded once instead of per-node +- **Consistent image version**: All jobs use the same cached image + +### Using custom or updated images + +If you need to use a different image version: + +```bash +# Pull the new image +podman-hpc pull ibaldin/e2sar:0.3.2 + +# Override in scripts with --image flag +./minimal_sender.sh --image ibaldin/e2sar:0.3.2 --rate 5 + +# Or set environment variable +export E2SAR_IMAGE=ibaldin/e2sar:0.3.2 +./minimal_sender.sh --rate 5 +``` + +## Quick Start + +### 1. Create Reservation +```bash +EJFAT_URI="ejfat://token@host:port/lb/1?sync=..." ./minimal_reserve.sh +``` + +### 2. Run Sender +```bash +./minimal_sender.sh --rate 5 --num 1000 +``` + +### 3. Run Receiver (in another terminal/node) +```bash +./minimal_receiver.sh --duration 60 +``` + +### 4. 
Clean Up +```bash +./minimal_free.sh +``` + +## Documentation + +- **[CLAUDE.md](../CLAUDE.md)** - Project overview and core architecture +- **[RunningMinimalScripts.md](RunningMinimalScripts.md)** - Detailed usage guide + +## Scripts + +### Reservation Management +| Script | Purpose | +|--------|---------| +| `minimal_reserve.sh` | Create LB reservation (required first step) | +| `minimal_free.sh` | Release reservation and cleanup | + +### Network Testing +| Script | Purpose | +|--------|---------| +| `minimal_sender.sh` | Sender with flexible options and memory monitoring | +| `minimal_receiver.sh` | Receiver with configurable duration | +| `perlmutter_slurm.sh` | SLURM batch job template (single sender/receiver) | +| `perlmutter_multi_slurm.sh` | SLURM batch job for multiple concurrent senders/receivers | + +### Monitoring +| Script | Purpose | +|--------|---------| +| `monitor_memory.sh` | Real-time memory usage monitoring | + +## Memory-Efficient Usage + +### Why Memory Matters + +E2SAR's send mode can accumulate memory depending on configuration. The analysis identified three key factors: + +1. **Optimization mode** - `liburing_send` defers cleanup, `sendmmsg` is synchronous +2. **Buffer allocation** - `--realmalloc` allocates before queue checks +3. 
**Rate limiting** - Applied after queueing, not before
+
+### Recommended Configurations
+
+**Lowest memory (safest):**
+```bash
+./minimal_sender.sh --rate 5 --optimize sendmmsg
+# Uses: sendmmsg optimization (synchronous, lower memory), reusable buffers
+```
+
+**Balanced performance:**
+```bash
+./minimal_sender.sh --rate 10 --optimize sendmmsg
+```
+
+**Avoid these combinations:**
+- `--optimize liburing_send` + large events (high accumulation)
+- `--realmalloc` + high rate (pre-allocation issues)
+- `--smooth` mode (increases thread pool pressure)
+
+## Monitoring Memory Usage
+
+### Start monitoring before running tests
+```bash
+# Terminal 1: Monitor memory
+./monitor_memory.sh 1
+
+# Terminal 2: Run sender
+./minimal_sender.sh --rate 5 --num 10000
+```
+
+## Typical Workflow
+
+### Local Testing
+```bash
+# 1. Reserve resources
+EJFAT_URI="ejfat://..." ./minimal_reserve.sh
+
+# 2. Monitor memory (optional)
+./monitor_memory.sh 1 &
+
+# 3. Run sender
+./minimal_sender.sh --rate 5 --num 1000
+
+# 4. Check logs
+tail -f minimal_sender.log
+
+# 5. Clean up
+./minimal_free.sh
+```
+
+### HPC Cluster (SLURM)
+```bash
+# Submit batch job
+sbatch perlmutter_slurm.sh
+
+# Monitor job
+squeue -u $USER
+tail -f slurm-*.out
+```
+
+## Troubleshooting
+
+### Reservation Issues
+**Problem:** Missing INSTANCE_URI file
+- **Solution:** Run `minimal_reserve.sh` first
+
+**Problem:** Connection refused or timeout
+- **Solution:** Verify EJFAT_URI is valid and LB is accessible
+
+### Network Detection Issues
+**Problem:** Cannot determine source IP
+- **Solution:** Check connectivity to LB hostname: `ping <lb-hostname>`
+
+### Exit Code Issues
+**Problem:** Exit code 141 (SIGPIPE) in sender or receiver logs
+- **Note:** This was a known issue caused by the `tee` pipeline receiving SIGPIPE when the container process exited before all output was flushed. It has been fixed in the current scripts using `|| true` guards and `PIPESTATUS[0]` to capture the container's actual exit code. 
If you see exit code 141 in older script versions, update to the latest scripts.
+
+### SSL Certificate Validation Issues
+**Problem:** SSL certificate validation errors when connecting to the load balancer
+
+**Solution:** Use the `-v` flag to skip SSL certificate validation:
+
+```bash
+# Skip validation for sender and receiver
+./minimal_sender.sh -v --rate 5
+./minimal_receiver.sh -v --duration 60
+
+# Skip validation when freeing reservation
+./minimal_free.sh -v
+```
+
+**Important notes:**
+- **Do NOT** pass `-v` to `minimal_reserve.sh` — the reserve operation uses the admin token which skips certificate validation unconditionally
+- The `-v` flag is particularly useful when the LB control plane SSL certificate has expired
+- For SLURM jobs, pass `-v` to the SLURM script and it will propagate to sender/receiver/free operations:
+  ```bash
+  sbatch -A <account> perlmutter_slurm.sh -v --rate 10
+  ```
+
+**If `minimal_free.sh -v` still fails:** Use the admin `EJFAT_URI` to free reservations directly:
+
+```bash
+# View active reservations
+podman-hpc run -e EJFAT_URI="$EJFAT_URI" --rm --network host ibaldin/e2sar:0.3.1a3 lbadm --overview
+
+# Free specific reservation by LB ID (replace 302 with actual ID)
+podman-hpc run -e EJFAT_URI="$EJFAT_URI" --rm --network host ibaldin/e2sar:0.3.1a3 lbadm --free --lbid 302
+```
+
+## Performance Parameters
+
+### Sender Key Options
+- `--rate GBPS` - Target sending rate (default: 1)
+- `--num EVENTS` - Number of events to send (default: 100)
+- `--length BYTES` - Event buffer size (default: 1048576 = 1MB)
+- `--optimize MODE` - sendmsg, sendmmsg (default, recommended), liburing_send (high risk)
+- `--realmalloc` - Allocate buffers per event (increases memory, avoid if possible)
+
+### Receiver Key Options
+- `--port PORT` - Data receiving port (default: 10000)
+- `--duration SECONDS` - Run time in seconds (default: 0 = indefinite)
+
+## Log Files
+
+| File | Contains |
+|------|----------|
+| `minimal_sender.log` | Sender execution 
log with timestamps | +| `minimal_sender_memory.log` | Automatic memory usage monitoring (CSV format) | +| `minimal_receiver.log` | Receiver execution log with timestamps | +| `memory_monitor.log` | Real-time memory usage data from monitor_memory.sh | + +## Container Configuration + +**Image:** `ibaldin/e2sar:0.3.1a3` (configurable via `E2SAR_IMAGE`) + +**Optimizations:** +- `--network host` - Direct host networking +- `MALLOC_ARENA_MAX=32` - Memory allocation tuning +- `--mtu=9000` - Jumbo frame support +- `--bufsize=134217728` - 128MB buffer size + +## Additional Resources + +- [E2SAR GitHub Repository](https://github.com/JeffersonLab/E2SAR) +- [E2SAR Documentation](https://jeffersonlab.github.io/E2SAR/) +- EJFAT Project Documentation (contact your site administrator) + +## License + +This framework follows E2SAR's licensing. See the [E2SAR repository](https://github.com/JeffersonLab/E2SAR) for details. diff --git a/scripts/zero_to_hero/docs/RunningMinimalScripts.md b/scripts/zero_to_hero/docs/RunningMinimalScripts.md new file mode 100644 index 0000000..1c4f838 --- /dev/null +++ b/scripts/zero_to_hero/docs/RunningMinimalScripts.md @@ -0,0 +1,786 @@ +# Running the Minimal Scripts: From Zero to Hero + +## Introduction + +Welcome to the E2SAR Minimal Scripts tutorial! This guide will walk you through using the streamlined shell scripts that make network performance testing with E2SAR straightforward and accessible. + +**CAUTION:** Perlmutter login nodes are only for lightweight workflow setup and orientation exercises. When testing these scripts on a login node, keep the data rate below 5Gbps, and do not send more than 60 seconds worth of traffic. + +The minimal scripts are a set of bash wrappers around the E2SAR containerized tools. 
They simplify the process of: + +- Creating load balancer reservations +- Sending network traffic for performance measurements +- Receiving and analyzing network data +- Running distributed tests on HPC systems like Perlmutter + +By the end of this guide, you'll be able to run your own network performance tests and understand how to tune parameters for different scenarios. + +### What You'll Need + +Before starting, make sure you have: + +1. **An EJFAT_URI**: This is your authentication token and load balancer connection string (format: `ejfat://token@hostname:port/lb/1?sync=...`) +2. **podman-hpc**: The containerized environment manager +3. **Network access**: Connectivity to the EJFAT load balancer +4. **The minimal scripts**: All scripts should be in the same directory + +### Optional: Setup for Easy Access + +You can add the scripts to your PATH for convenient access from any directory: + +```bash +# Option 1: Temporary setup (current shell only) +source /path/to/zero_to_hero/setup_env.sh + +# Option 2: Permanent setup (add to ~/.bashrc or ~/.zshrc) +echo 'source /path/to/zero_to_hero/setup_env.sh' >> ~/.bashrc +# Then start a new shell or: source ~/.bashrc +``` + +**After setup:** +- Run scripts from any directory without full paths +- All artifacts (INSTANCE_URI, logs) are created in your current directory +- Example: + ```bash + mkdir -p ~/e2sar_tests && cd ~/e2sar_tests + minimal_reserve.sh # Works! 
Artifacts created here + ``` + +**Without setup:** +- Use full paths: `/path/to/minimal_sender.sh` +- Or run from the script directory: `cd /path/to/zero_to_hero && ./minimal_sender.sh` +- Artifacts are still created in your current working directory + +### Quick Overview of the Workflow + +The E2SAR testing workflow follows a simple four-step pattern: + +``` +Reserve → Run Receiver/Sender → Analyze Results → Free Resources +``` + +Each step has a dedicated script: + +| Script | Purpose | +|--------|---------| +| `minimal_reserve.sh` | Create load balancer reservations | +| `minimal_receiver.sh` | Receive and measure network traffic | +| `minimal_sender.sh` | Send network traffic for testing | +| `minimal_free.sh` | Release reserved resources | +| `perlmutter_slurm.sh` | SLURM batch script for HPC environments | + +--- + +## Part 1: Hello World - Your First Test + +Let's start with the simplest possible test: sending 100 events at 1 Gbps. + +### Step 1: Create a Reservation + +First, you need to reserve resources on the load balancer. This creates a session that both senders and receivers will use. + +```bash +EJFAT_URI="ejfat://your_token@lb.hostname:19522/lb/1?sync=..." ./minimal_reserve.sh +``` + +**What happens:** +- The script validates your EJFAT_URI +- Checks if a valid reservation already exists +- If needed, creates a new reservation and saves it to `INSTANCE_URI` + +**Expected output:** +``` +Checking for existing reservation... +Creating new reservation... +Reservation created and saved to INSTANCE_URI +EJFAT_URI=ejfat://... +``` + +**Important:** The `INSTANCE_URI` file is created in your current directory. This file is required by the sender and receiver scripts, so keep it safe! + +### Step 2: Start the Receiver + +In a terminal window, start the receiver. This will wait indefinitely for data: + +```bash +./minimal_receiver.sh +``` + +**Expected output:** +``` +Loading EJFAT_URI from INSTANCE_URI... +Starting E2SAR receiver... 
+Auto-detecting receiver IP... +LB Host: lb.hostname +LB IP: 192.168.1.100 +Receiver IP: 10.0.0.50 +Data Port: 10000 +Receive Threads: 16 +Dequeue Threads: 16 +Buffer Size: 134217728 + +Running: podman-hpc run --rm --network host ... +START_TIME (UTC): 2026-02-17 14:30:00 + +Waiting for data... +``` + +The receiver is now ready to accept traffic. Leave this terminal open and running. + +### Step 3: Run the Sender + +In a new terminal (same directory), start the sender: + +```bash +./minimal_sender.sh +``` + +**Expected output:** +``` +Loading EJFAT_URI from INSTANCE_URI... +Starting E2SAR sender... +Auto-detecting sender IP... +Sender IP: 10.0.0.51 +Rate: 1 Gbps +Event Length: 1048576 bytes +Number of Events: 100 + +Running: podman-hpc run --rm --network host ... +START_TIME (UTC): 2026-02-17 14:31:00 + +Sending 100 events at 1 Gbps... +Progress: 100/100 events sent +Performance: 1.02 Gbps, 0.95 Gbps payload + +END_TIME (UTC): 2026-02-17 14:31:15 +EXIT_CODE: 0 +``` + +The sender will complete and exit. Check back in the receiver terminal - you should see it received the events! + +### Step 4: Stop the Receiver + +Go back to the receiver terminal and press `Ctrl+C` to stop it. You'll see final statistics: + +``` +Received 100 events +Total data: 104857600 bytes (100 MB) +Average rate: 1.01 Gbps +Packet loss: 0% + +END_TIME (UTC): 2026-02-17 14:31:20 +EXIT_CODE: 0 +``` + +### Step 5: Free the Reservation + +When you're done testing, release the load balancer resources: + +```bash +./minimal_free.sh +``` + +**Expected output:** +``` +Found INSTANCE_URI +Freeing load balancer reservation... +Reservation freed successfully +Removed INSTANCE_URI +``` + +**Congratulations!** You've just completed your first E2SAR network performance test. + +--- + +## Part 2: Exploring the Scripts + +Now that you've run a basic test, let's dive deeper into what each script can do. + +### 2.1 minimal_reserve.sh - Reservation Management + +The reservation script is your entry point. 
It creates a session on the load balancer that coordinates traffic between senders and receivers. + +**Basic usage:** +```bash +EJFAT_URI="ejfat://..." ./minimal_reserve.sh +``` + +**Smart behavior:** +- If `INSTANCE_URI` already exists and is valid, the script does nothing (idempotent) +- If the reservation expired, it automatically creates a new one +- The reservation details are saved to `INSTANCE_URI` for other scripts to use + +**Checking reservation status:** + +You can manually verify your reservation is active: + +```bash +source INSTANCE_URI +podman-hpc run -e EJFAT_URI="$EJFAT_URI" --rm --network host ibaldin/e2sar:0.3.1a3 lbadm --overview +``` + +This shows detailed load balancer status including registered receivers. + +### 2.2 minimal_sender.sh - Sending Traffic + +The sender script transmits network events through the load balancer to registered receivers. + +**Command-line options:** + +| Option | Description | Default | +|--------|-------------|---------| +| `--rate RATE` | Sending rate in Gbps | 1 | +| `--length LENGTH` | Event buffer size in bytes | 1048576 (1 MB) | +| `--num COUNT` | Number of events to send | 100 | +| `--mtu MTU` | MTU size in bytes | 9000 | +| `--ipv6` | Use IPv6 instead of IPv4 | false | +| `-v` | Skip SSL certificate validation | disabled | +| `--no-monitor` | Disable memory monitoring | false (monitoring enabled) | +| `--image IMAGE` | Container image override | ibaldin/e2sar:0.3.1a3 | +| `--help` | Show help message | - | + +**Example: High-rate test with larger buffers** + +```bash +./minimal_sender.sh --rate 10 --length 2097152 --num 1000 +``` + +This sends 1000 events at 10 Gbps with 2 MB buffers (total: ~2 GB of data). + +**Example: Custom MTU for jumbo frames** + +```bash +./minimal_sender.sh --rate 5 --mtu 9000 +``` + +Uses 9000-byte MTU (jumbo frames) for improved efficiency on networks that support it. 
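+
+Before launching a long test, a quick back-of-the-envelope sketch can estimate the total payload and the minimum wall-clock send time implied by `--num`, `--length`, and `--rate`. This is pure arithmetic, so no container or reservation is needed:
+
+```bash
+# Estimate payload volume and minimum transfer time for a sender run.
+NUM=1000          # --num
+LENGTH=2097152    # --length in bytes (2 MB)
+RATE=10           # --rate in Gbps
+
+TOTAL_BYTES=$((NUM * LENGTH))
+# Minimum seconds at line rate: bytes * 8 bits / (Gbps * 1e9)
+MIN_SECONDS=$(awk -v b="$TOTAL_BYTES" -v r="$RATE" 'BEGIN { printf "%.2f", b * 8 / (r * 1e9) }')
+
+echo "Total payload: $TOTAL_BYTES bytes"
+echo "Minimum time at ${RATE} Gbps: ${MIN_SECONDS} s"
+```
+
+With these values the sender moves about 2 GB, so a run that takes much longer than roughly 1.7 seconds of send time points at a bottleneck rather than the configured rate.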
+ +**Example: Skip SSL certificate validation (testing/dev)** + +```bash +./minimal_sender.sh -v --rate 5 +``` + +Skips SSL certificate validation - useful for testing environments with self-signed certificates. + +**What the sender does:** + +1. Loads `EJFAT_URI` from the `INSTANCE_URI` file +2. Auto-detects the appropriate sender IP address by: + - Extracting the load balancer hostname from `EJFAT_URI` + - Resolving it to an IP address + - Using `ip route get` to find the correct source IP +3. Runs the containerized `e2sar_perf --send` command +4. Logs all output to `minimal_sender.log` with timestamps + +**Performance tuning:** + +The sender uses several optimizations: +- `--optimize=sendmmsg`: Batched send syscalls for efficiency +- `--mtu=9000`: Jumbo frames (configurable via `--mtu` option) +- `--bufsize=134217728`: 128 MB socket buffer +- `MALLOC_ARENA_MAX=32`: Memory allocator tuning + +### 2.3 minimal_receiver.sh - Receiving Traffic + +The receiver script waits for and processes incoming network events. + +**Command-line options:** + +| Option | Description | Default | +|--------|-------------|---------| +| `--port PORT` | Data receiving port | 10000 | +| `--duration SEC` | Run duration in seconds (0 = indefinite) | 0 | +| `--threads NUM` | Number of receive threads | 16 | +| `--deq NUM` | Number of dequeue threads | 16 | +| `--bufsize SIZE` | Socket buffer size in bytes | 134217728 | +| `--ipv6` | Use IPv6 instead of IPv4 | false | +| `-v` | Skip SSL certificate validation | disabled | +| `--image IMAGE` | Container image override | ibaldin/e2sar:0.3.1a3 | +| `--help` | Show help message | - | + +**Example: Time-limited receiver** + +```bash +./minimal_receiver.sh --duration 60 +``` + +The receiver will automatically stop after 60 seconds. + +**Example: Custom port** + +```bash +./minimal_receiver.sh --port 20000 +``` + +This uses a custom data port for receiving. 
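+
+A receiver binds `--threads` consecutive ports starting at `--port` (see Part 3.1 on multiple receivers), so when picking custom ports it helps to compute the full range each instance will occupy. A small sketch of the spacing arithmetic:
+
+```bash
+# Compute the port range receiver instance I binds, given base port and threads.
+BASE_PORT=10000   # --port of receiver 0
+THREADS=16        # --threads (also the port stride between receivers)
+I=1               # receiver index (0-based)
+
+FIRST=$((BASE_PORT + I * THREADS))
+LAST=$((FIRST + THREADS - 1))
+echo "Receiver ${I} binds ports ${FIRST}-${LAST}"
+```
+
+With the defaults this reproduces the spacing used elsewhere in this guide: receiver 1 occupies ports 10016–10031.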
+ +**Example: High-throughput configuration** + +```bash +./minimal_receiver.sh --threads 32 --deq 32 --bufsize 268435456 +``` + +Doubles the thread count and socket buffer size for maximum throughput. + +**Example: Skip SSL certificate validation (testing/dev)** + +```bash +./minimal_receiver.sh -v --duration 60 +``` + +Skips SSL certificate validation - useful for testing environments with self-signed certificates. + +**Understanding receiver behavior:** + +- With `--duration 0` (default), the receiver runs indefinitely until you press Ctrl+C +- The receiver registers itself with the load balancer automatically +- It uses a 3000ms timeout - if no packets arrive for 3 seconds, it reports this but continues waiting +- All output is logged to `minimal_receiver.log` with timestamps + +**IP address auto-detection:** + +Like the sender, the receiver automatically detects the correct IP address: +1. Extracts LB hostname from `EJFAT_URI` +2. Resolves to an IP address +3. Uses `ip route get` to find the source IP for that route + +This ensures correct operation on multi-homed systems. + +### 2.4 minimal_free.sh - Cleanup + +The cleanup script releases your load balancer reservation. + +**Basic usage:** +```bash +./minimal_free.sh +``` + +**What it does:** +1. Reads `EJFAT_URI` from the `INSTANCE_URI` file +2. Calls `lbadm --free` to release the reservation +3. Removes the `INSTANCE_URI` file + +**When to use it:** +- Always run this when you're done testing +- If a reservation expires, just create a new one with `minimal_reserve.sh` +- If `minimal_free.sh` fails (network issue), you can manually delete `INSTANCE_URI` and create a fresh reservation + +--- + +## Part 3: Advanced Scenarios + +### 3.1 Multiple Receivers + +You can run multiple receivers simultaneously to test load distribution. + +Each receiver uses `--threads` consecutive ports (default: 16). When running multiple receivers, their base ports must be separated by at least the thread count to avoid conflicts. 
+ +**Terminal 1 - Receiver A (ports 10000–10015):** +```bash +./minimal_receiver.sh --port 10000 --duration 120 +``` + +**Terminal 2 - Receiver B (ports 10016–10031):** +```bash +./minimal_receiver.sh --port 10016 --duration 120 +``` + +**Terminal 3 - Sender:** +```bash +./minimal_sender.sh --rate 5 --num 5000 +``` + +The load balancer will distribute events across both receivers. Check the logs to see the distribution! + +If you change `--threads`, adjust the port spacing accordingly. For example, with `--threads 8`, use ports 10000 and 10008. + +### 3.2 Sequential Testing + +For repeated tests, the reservation script makes things easy: + +```bash +# Create reservation once +EJFAT_URI="ejfat://..." ./minimal_reserve.sh + +# Run test 1 +./minimal_receiver.sh --duration 60 & +sleep 5 +./minimal_sender.sh --rate 1 --num 100 +wait + +# Run test 2 (same reservation!) +./minimal_receiver.sh --duration 60 & +sleep 5 +./minimal_sender.sh --rate 10 --num 1000 +wait + +# Cleanup +./minimal_free.sh +``` + +The reservation persists across multiple sender/receiver invocations. 
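+
+The same pattern extends naturally to parameter sweeps. The sketch below only prints the plan for a three-rate sweep over one reservation; remove the `echo` wrappers to execute the steps for real (it assumes `minimal_reserve.sh` has already created `INSTANCE_URI`):
+
+```bash
+# Dry-run sketch: print the plan for a rate sweep over a single reservation.
+PLAN=$(
+for RATE in 1 5 10; do
+    echo "./minimal_receiver.sh --duration 60 &"
+    echo "sleep 5"
+    echo "./minimal_sender.sh --rate ${RATE} --num 1000"
+    echo "wait"
+done
+)
+echo "$PLAN"
+```
+
+Copy `minimal_sender.log` to a per-rate filename between iterations if you want to keep each run's results, since the scripts overwrite their logs.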
+
+### 3.3 Log Analysis
+
+All scripts generate detailed logs:
+
+**Sender log analysis:**
+```bash
+# View performance summary
+tail -20 minimal_sender.log
+
+# Extract timing information
+grep -E "(START_TIME|END_TIME|EXIT_CODE)" minimal_sender.log
+
+# Check for errors
+grep -i error minimal_sender.log
+```
+
+**Receiver log analysis:**
+```bash
+# View final statistics
+tail -20 minimal_receiver.log
+
+# Check data reception rate
+grep -iE "rate|throughput" minimal_receiver.log
+```
+
+### 3.4 Environment Variable Configuration
+
+You can override the container image for all scripts:
+
+```bash
+export E2SAR_IMAGE="ibaldin/e2sar:0.4.0"
+
+./minimal_reserve.sh
+./minimal_sender.sh --rate 5
+```
+
+Or per-command:
+```bash
+./minimal_sender.sh --image "ibaldin/e2sar:0.4.0" --rate 5
+```
+
+---
+
+## Appendix: Tips and Troubleshooting
+
+### Common Issues and Solutions
+
+#### Problem: "ERROR: EJFAT_URI is required"
+
+**Cause:** The `EJFAT_URI` environment variable is not set.
+
+**Solution:**
+```bash
+EJFAT_URI="ejfat://your_token@hostname:port/lb/1?sync=..." ./minimal_reserve.sh
+```
+
+Make sure to include the full URI string in quotes.
+
+---
+
+#### Problem: "ERROR: INSTANCE_URI not found"
+
+**Cause:** You're trying to run the sender/receiver without creating a reservation first.
+
+**Solution:**
+```bash
+# Create reservation first
+EJFAT_URI="ejfat://..." ./minimal_reserve.sh
+
+# Then run sender/receiver
+./minimal_sender.sh
+```
+
+---
+
+#### Problem: "ERROR: Failed to detect sender/receiver IP"
+
+**Cause:** Network routing issue or load balancer hostname not resolvable.
+
+**Solution:**
+1. Verify LB hostname resolves:
+   ```bash
+   getent ahosts lb.hostname.net
+   ```
+
+2. Check network connectivity:
+   ```bash
+   ping lb.hostname.net
+   ```
+
+3. Verify routing (use the IP resolved in step 1):
+   ```bash
+   ip route get <LB_IP>
+   ```
+
+---
+
+#### Problem: Receiver shows "Timeout waiting for packets"
+
+**Cause:** No traffic arriving, or sender/receiver not using the same reservation.
+
+**Solution:**
+1. Verify both use the same `INSTANCE_URI` file
+2. Check that the receiver registered with the LB:
+   ```bash
+   source INSTANCE_URI
+   podman-hpc run -e EJFAT_URI="$EJFAT_URI" --rm --network host ibaldin/e2sar:0.3.1a3 lbadm --overview
+   ```
+3. Ensure the sender ran successfully (check `minimal_sender.log`)
+
+---
+
+#### Problem: "ERROR: Existing reservation is invalid"
+
+**Cause:** Reservation expired or was freed on the load balancer.
+
+**Solution:** This is handled automatically: the script creates a new reservation. Just verify that a new `INSTANCE_URI` is created.
+
+---
+
+#### Problem: Exit code 141 (SIGPIPE) in sender or receiver logs
+
+**Cause:** This was a known issue in earlier versions of the scripts. The `tee` pipeline could receive a SIGPIPE signal when the container process exited before all output was flushed, causing the script to report exit code 141 instead of the container's actual exit code.
+
+**Status:** Fixed. The current scripts use `|| true` guards on pipe commands and capture the container's real exit code via `PIPESTATUS[0]`. If you encounter exit code 141, update to the latest version of the scripts.
+
+---
+
+#### Problem: SLURM job fails with reservation error
+
+**Cause:** Load balancer reservation failed after the job started, wasting queue time.
+
+**Solution:** Pre-create the reservation on the login node to avoid waiting for job allocation:
+```bash
+# On login node
+EJFAT_URI="ejfat://..." ./minimal_reserve.sh
+
+# Then submit the job (EJFAT_URI is still needed in the environment)
+EJFAT_URI="ejfat://..." sbatch -A <project> perlmutter_slurm.sh
+```
+
+---
+
+### Memory Monitoring
+
+The `minimal_sender.sh` script includes automatic memory monitoring to help track resource usage during tests.
+
+#### How It Works
+
+When the sender runs, it automatically:
+1. Starts a background memory monitor process
+2. Samples memory usage every second
+3. Logs data to `minimal_sender_memory.log`
+4.
Generates a summary when the test completes + +#### Memory Log Location + +- **Local runs:** `minimal_sender_memory.log` in the current directory +- **SLURM runs:** `runs/slurm_job_${SLURM_JOB_ID}/minimal_sender_memory.log` + +#### Viewing Memory Results + +After a test completes, check the memory log: + +```bash +# View the summary (at the end of the file) +tail minimal_sender_memory.log + +# Example output: +# Memory Summary: +# Peak RSS: 245 MB (251392 KB) +# Min RSS: 89 MB (91136 KB) +# Growth: 156 MB +``` + +#### Analyzing Memory Data + +The log file is CSV format with these columns: +- `TIMESTAMP` - ISO8601 timestamp +- `PID` - Process ID +- `RSS_KB` - Resident Set Size in KB +- `VSZ_KB` - Virtual memory Size in KB +- `%MEM` - Memory percentage +- `%CPU` - CPU percentage +- `ELAPSED_TIME` - Process elapsed time +- `COMMAND` - Full command line + +You can analyze it with standard tools: + +```bash +# Plot memory over time (requires gnuplot) +grep -v '^#' minimal_sender_memory.log | \ + awk -F', ' '{print NR, $3/1024}' | \ + gnuplot -e "set terminal dumb; plot '-' with lines title 'RSS (MB)'" + +# Find peak memory usage +grep -v '^#' minimal_sender_memory.log | \ + awk -F', ' '{print $3}' | \ + sort -n | \ + tail -1 | \ + awk '{print $1/1024 " MB"}' +``` + +#### Disabling Memory Monitoring + +If you don't need memory tracking (e.g., for benchmarking), disable it: + +```bash +./minimal_sender.sh --no-monitor --rate 10 --num 1000 +``` + +#### Why Monitor Memory? 
+ +Memory monitoring helps you: +- Detect memory leaks or accumulation issues +- Optimize buffer sizes and event counts +- Compare different optimization modes (sendmsg vs sendmmsg vs liburing) +- Validate memory usage stays within system limits + +--- + +### Performance Tuning Tips + +#### Maximizing Throughput + +For high-rate tests (>10 Gbps): + +**Sender:** +```bash +./minimal_sender.sh --rate 25 --length 8388608 --num 10000 +``` + +**Receiver:** +```bash +./minimal_receiver.sh --threads 32 --deq 32 --bufsize 268435456 +``` + +**Key parameters:** +- Larger `--length` (event size) reduces packet overhead +- More `--threads` and `--deq` increase parallelism +- Larger `--bufsize` prevents socket buffer overflow + +#### Testing Packet Loss + +To test under stress conditions: + +```bash +# Receiver with limited resources +./minimal_receiver.sh --threads 4 --deq 2 --bufsize 33554432 & + +# High-rate sender +./minimal_sender.sh --rate 40 --num 100000 --length 1048576 +``` + +Check receiver logs for dropped packets. + +--- + +### Log File Reference + +#### minimal_sender.log Format + +``` +START_TIME (UTC): 2026-02-17 14:30:00 + +[Container initialization output] +Sending 100 events at 1 Gbps... +Progress: 100/100 events sent +Performance: 1.02 Gbps, 0.95 Gbps payload + +END_TIME (UTC): 2026-02-17 14:30:15 +EXIT_CODE: 0 +``` + +**Key metrics:** +- **Performance**: Raw rate (including headers) vs payload rate +- **EXIT_CODE**: 0 = success, non-zero = error + +#### minimal_receiver.log Format + +``` +START_TIME (UTC): 2026-02-17 14:30:00 + +[Container initialization output] +Waiting for data... 
+Received 100 events +Total data: 104857600 bytes (100 MB) +Average rate: 1.01 Gbps +Packet loss: 0% + +END_TIME (UTC): 2026-02-17 14:30:20 +EXIT_CODE: 0 +``` + +**Key metrics:** +- **Total data**: Bytes received (payload) +- **Average rate**: Sustained receive rate +- **Packet loss**: Percentage of events dropped + +--- + +### Quick Reference Commands + +**Complete test sequence:** +```bash +# Setup +EJFAT_URI="ejfat://..." ./minimal_reserve.sh + +# Test +./minimal_receiver.sh --duration 60 & +sleep 5 +./minimal_sender.sh --rate 5 --num 1000 + +# Cleanup +./minimal_free.sh +``` + +**Check reservation status:** +```bash +source INSTANCE_URI +podman-hpc run -e EJFAT_URI="$EJFAT_URI" --rm --network host ibaldin/e2sar:0.3.1a3 lbadm --overview +``` + +**View help for any script:** +```bash +./minimal_sender.sh --help +./minimal_receiver.sh --help +``` + +**Emergency cleanup:** +```bash +# If minimal_free.sh fails +rm -f INSTANCE_URI +rm -f *.log + +# Start fresh +EJFAT_URI="ejfat://..." ./minimal_reserve.sh +``` + +--- + +## Conclusion + +You now have all the tools you need to run E2SAR network performance tests! + +**Remember the basic workflow:** +1. **Reserve** resources with `minimal_reserve.sh` +2. **Start receiver** with `minimal_receiver.sh` +3. **Run sender** with `minimal_sender.sh` +4. **Analyze logs** to see performance results +5. **Free resources** with `minimal_free.sh` + +For HPC environments with SLURM, see **[RunningSlurmOnPerlmutter.md](RunningSlurmOnPerlmutter.md)** for batch job submission. + +**Key takeaways:** +- Always create a reservation first +- The `INSTANCE_URI` file is shared between all scripts +- Logs contain detailed timestamps and performance metrics +- On HPC systems, pre-create reservations on login nodes +- Use `--help` on any script to see all options + +Happy testing! 
diff --git a/scripts/zero_to_hero/docs/RunningSlurmOnPerlmutter.md b/scripts/zero_to_hero/docs/RunningSlurmOnPerlmutter.md
new file mode 100644
index 0000000..1668709
--- /dev/null
+++ b/scripts/zero_to_hero/docs/RunningSlurmOnPerlmutter.md
@@ -0,0 +1,341 @@
+# Running SLURM Jobs on Perlmutter
+
+The `perlmutter_slurm.sh` script orchestrates distributed E2SAR tests on HPC systems using the SLURM workload manager.
+
+## Understanding the SLURM Script
+
+The SLURM script automates a complete sender/receiver test across two compute nodes:
+
+**What it does:**
+1. Reserves load balancer resources (or uses a pre-created reservation)
+2. Starts receiver on Node 0 (background)
+3. Waits 10 seconds for receiver registration
+4. Starts sender on Node 1 (foreground)
+5. Terminates receiver after sender completes
+6. Frees load balancer reservation
+7. Collects all logs into a job-specific directory
+
+**Key features:**
+- Uses exactly 2 nodes (configurable via SLURM options)
+- Creates isolated working directory: `runs/slurm_job_<jobid>/`
+- Generates comprehensive logs for debugging
+- Handles cleanup automatically
+
+## Pre-Creating Reservations (Recommended)
+
+Create your reservation on the login node before submitting the SLURM job; this avoids waiting for your job to start only to have the reservation fail:
+
+**On the login node:**
+```bash
+cd /path/to/minimal_scripts
+
+# Create reservation
+EJFAT_URI="ejfat://your_token@lb.hostname:19522/lb/1?sync=..." ./minimal_reserve.sh
+
+# Verify it was created
+ls -l INSTANCE_URI
+cat INSTANCE_URI
+```
+
+This creates an `INSTANCE_URI` file that the SLURM script will use.
+
+## Submitting Jobs
+
+### Finding Your Project Allocation
+
+Before submitting jobs, you need to know your NERSC project allocation.
To find your available projects:
+
+```bash
+# View all your project allocations and remaining hours
+iris
+
+# Alternative: check your default project
+sacctmgr show user $USER format=account%20,defaultaccount
+```
+
+The `iris` command shows all your active allocations with remaining compute hours. Common project formats are:
+- Repository allocations: `m####` (e.g., `m1234`)
+- Startup allocations: `m####_g`
+- ALCC/INCITE allocations: Named projects
+
+Use the project name with the `-A` flag when submitting SLURM jobs.
+
+### Basic Job Submission
+
+```bash
+EJFAT_URI="ejfat://..." sbatch -A <project> perlmutter_slurm.sh
+```
+
+**With custom parameters:**
+
+```bash
+EJFAT_URI="ejfat://..." sbatch -A <project> perlmutter_slurm.sh --rate 10 --num 5000 --length 2097152
+```
+
+**With custom MTU:**
+
+```bash
+EJFAT_URI="ejfat://..." sbatch -A <project> perlmutter_slurm.sh --rate 20 --mtu 9000
+```
+
+**SLURM options you can override:**
+
+```bash
+# -N: number of nodes (this script expects exactly 2)
+# -q: queue (debug or regular)
+# -t: time limit
+sbatch -A <project> \
+  -N 2 \
+  -q debug \
+  -t 00:30:00 \
+  perlmutter_slurm.sh --rate 5
+```
+
+**All available test options:**
+
+| Option | Description | Default |
+|--------|-------------|---------|
+| `--rate RATE` | Sending rate in Gbps | 1 |
+| `--length LENGTH` | Event buffer size in bytes | 1048576 |
+| `--num COUNT` | Number of events | 100 |
+| `--mtu MTU` | MTU size in bytes | 9000 |
+| `--port PORT` | Receiver data port | 10000 |
+| `--image IMAGE` | Container image | ibaldin/e2sar:0.3.1a3 |
+
+## Monitoring Jobs
+
+**Check job status:**
+```bash
+squeue -u $USER
+```
+
+**Watch live output:**
+```bash
+tail -f slurm-<jobid>.out
+```
+
+**After job completes:**
+```bash
+# View SLURM output
+cat slurm-<jobid>.out
+
+# Navigate to job directory
+cd runs/slurm_job_<jobid>/
+
+# Check individual logs
+cat minimal_sender.log
+cat minimal_receiver.log
+cat sender_srun.log
+cat receiver_srun.log
+```
+
+## Example SLURM Workflow
+
+**Complete example: High-rate test on
Perlmutter** + +```bash +# 1. Login to Perlmutter +ssh username@perlmutter-p1.nersc.gov + +# 2. Navigate to scripts directory +cd /global/homes/u/username/e2sar/zero_to_hero + +# 3. Create reservation on login node +EJFAT_URI="ejfat://token@lb.es.net:19522/lb/1?sync=..." ./minimal_reserve.sh + +# 4. Submit job (replace m1234 with your project - use 'iris' to find it) +sbatch -A m1234 -q regular -t 01:00:00 perlmutter_slurm.sh \ + --rate 20 \ + --num 10000 \ + --length 4194304 \ + --mtu 9000 + +# 5. Monitor +squeue -u $USER +tail -f slurm-*.out + +# 6. After completion, analyze results +cd runs/slurm_job_*/ +grep -E "Performance|rate|throughput" minimal_sender.log minimal_receiver.log +``` + +## Multi-Instance Testing with `perlmutter_multi_slurm.sh` + +The `perlmutter_multi_slurm.sh` script extends the single sender/receiver model to support multiple concurrent senders and receivers. Senders and receivers share the same node pool and may be **co-located** on the same nodes, enabling flexible configurations from a single node running everything to many nodes each running multiple instances. + +### How It Works + +1. Reserves a fresh load balancer instance for the job +2. Starts all receiver instances in parallel (co-located on shared nodes) +3. Waits a configurable delay for receivers to register with the LB +4. Starts all sender instances in parallel (co-located on the same shared nodes) +5. Waits for all senders to complete +6. Gracefully terminates all receivers (SIGTERM, then SIGKILL after 5s) +7. Frees load balancer reservation +8. 
Prints a summary report with all exit codes
+
+### Multi-Instance Parameters
+
+| Option | Description | Default |
+|--------|-------------|---------|
+| `--receivers N` | Total number of receiver instances | 1 |
+| `--senders M` | Total number of sender instances | 1 |
+| `--receivers-per-node K` | Receiver instances per node | 1 |
+| `--senders-per-node K` | Sender instances per node | 1 |
+| `--threads N` | Receive threads per receiver (also sets port stride) | 16 |
+| `--base-port PORT` | Starting port for receiver 0 | 10000 |
+| `--receiver-delay SEC` | Seconds to wait after starting receivers | 10 |
+
+All test options (`--rate`, `--length`, `--num`, `--mtu`, `--image`, `--ipv6`, `-v`) are supported and passed through to the underlying scripts.
+
+### Node Calculation Formula
+
+Senders and receivers share the same pool of nodes. The script requires:
+
+```
+Receiver nodes = ceil(receivers / receivers-per-node)
+Sender nodes = ceil(senders / senders-per-node)
+Total nodes = max(Receiver nodes, Sender nodes)
+```
+
+Request exactly this many nodes via `sbatch -N <total_nodes>`.
+
+### Port Assignment
+
+Each receiver instance uses `--threads` consecutive ports (default: 16). Receiver ports are assigned as:
+
+```
+Receiver 0: base_port + 0 * threads → base_port to base_port + threads - 1
+Receiver 1: base_port + 1 * threads → ...
+Receiver i: base_port + i * threads
+```
+
+With the defaults (`--base-port 10000 --threads 16`):
+- Receiver 0: ports 10000–10015
+- Receiver 1: ports 10016–10031
+- Receiver 2: ports 10032–10047
+
+If you change `--threads`, the port stride updates automatically to match.
+
+### Example Submissions
+
+**4 receivers and 4 senders co-located on 2 nodes (2 of each per node):**
+```bash
+EJFAT_URI="ejfat://..." sbatch -N 2 -A <project> perlmutter_multi_slurm.sh \
+    --receivers 4 --receivers-per-node 2 \
+    --senders 4 --senders-per-node 2 \
+    --rate 1 --num 10000
+```
+
+**Single node running all instances:**
+```bash
+EJFAT_URI="ejfat://..."
sbatch -N 1 -A <project> perlmutter_multi_slurm.sh \
+    --receivers 2 --receivers-per-node 2 \
+    --senders 2 --senders-per-node 2 \
+    --rate 1 --num 1000
+```
+
+**2 receivers + 2 senders, one per node (no co-location):**
+```bash
+EJFAT_URI="ejfat://..." sbatch -N 2 -A <project> perlmutter_multi_slurm.sh \
+    --receivers 2 --senders 2 --rate 1 --num 10000
+```
+
+**Custom thread count (port stride updates automatically):**
+```bash
+EJFAT_URI="ejfat://..." sbatch -N 2 -A <project> perlmutter_multi_slurm.sh \
+    --receivers 4 --receivers-per-node 2 \
+    --senders 4 --senders-per-node 2 \
+    --threads 8 --rate 1 --num 5000
+# Receiver ports: 10000-10007, 10008-10015, 10016-10023, 10024-10031
+```
+
+### Output Directory Structure
+
+Each job creates an isolated directory under `runs/`:
+
+```
+runs/slurm_job_<jobid>/
+├── INSTANCE_URI
+├── receiver_0/
+│   ├── INSTANCE_URI
+│   ├── minimal_receiver.log
+│   └── receiver_srun.log
+├── receiver_1/
+│   ├── INSTANCE_URI
+│   ├── minimal_receiver.log
+│   └── receiver_srun.log
+├── sender_0/
+│   ├── INSTANCE_URI
+│   ├── minimal_sender.log
+│   ├── minimal_sender_memory.log
+│   └── sender_srun.log
+└── sender_1/
+    ├── INSTANCE_URI
+    ├── minimal_sender.log
+    ├── minimal_sender_memory.log
+    └── sender_srun.log
+```
+
+Receivers terminated by the script (after senders complete) will show exit codes 137 (SIGKILL) or 143 (SIGTERM) in the SLURM accounting; both are expected and reported as such in the summary.
+
+---
+
+## Understanding SLURM Output
+
+The SLURM script produces detailed output with clear phase markers:
+
+```
+=========================================
+EJFAT Minimal Test - SLURM Job 12345678
+=========================================
+Start time: 2026-02-17 10:30:00 UTC
+
+EJFAT_URI: ejfat://...
+Job nodes: nid[003201-003202] +Job ID: 12345678 + +Receiver node (Node 0): nid003201 +Sender node (Node 1): nid003202 + +========================================= +Phase 1: Reserve Load Balancer +========================================= +Found existing INSTANCE_URI in submit directory... +Reservation ready + +========================================= +Phase 2: Start Receiver on nid003201 +========================================= +Receiver started (PID: 54321) +Waiting 10 seconds for receiver to register... + +========================================= +Phase 3: Start Sender on nid003202 +========================================= +[sender output...] +Sender completed (exit code: 0) + +========================================= +Phase 4: Shutdown Receiver +========================================= +Sending SIGKILL to receiver... +Receiver terminated successfully + +========================================= +Phase 5: Free Load Balancer +========================================= +Reservation freed successfully + +========================================= +Test Summary +========================================= +Job ID: 12345678 +Job directory: /path/to/runs/slurm_job_12345678 +Sender exit code: 0 +Receiver exit code: 0 + +Logs available at: + - Sender log: /path/to/runs/slurm_job_12345678/minimal_sender.log + - Receiver log: /path/to/runs/slurm_job_12345678/minimal_receiver.log + ... +``` diff --git a/scripts/zero_to_hero/docs/ZeroToHeroStart.md b/scripts/zero_to_hero/docs/ZeroToHeroStart.md new file mode 100644 index 0000000..463c69a --- /dev/null +++ b/scripts/zero_to_hero/docs/ZeroToHeroStart.md @@ -0,0 +1,258 @@ +# EJFAT - From Zero to Hero + +## Introduction + +The following set of learning activities can be pursued in parallel: + +1. "Hello world". Using e2sar_perf to send and receive packets in back to back mode on a single node or laptop. This does not need a load balancer and assumes 1 sender and 1 receiver. + +2. 
Extending "Hello world" by running it on Perlmutter with a real load balancer and more than one sender and/or receiver
+
+3. Replacing e2sar_perf with a real application. This is done with the E2SAR development library, and can be done in Python (recommended) or C++ (steeper learning curve).
+
+4. Extending the transmission model to send traffic from SLAC to Perlmutter. This will depend on how quickly we can debug the network path. It does not limit progress on 1, 2 and 3 above.
+
+## 1. Hello World
+
+Study the documentation here to get oriented with EJFAT:
+
+[https://github.com/JeffersonLab/E2SAR/wiki/Integration](https://github.com/JeffersonLab/E2SAR/wiki/Integration)
+
+To get started we need the e2sar_perf binaries. There are many ways to get these, including conda install, pre-built docker versions, and building from source. For the very first pass we recommend using docker:
+
+[https://github.com/JeffersonLab/E2SAR/wiki/Code-and-Binaries](https://github.com/JeffersonLab/E2SAR/wiki/Code-and-Binaries)
+
+```bash
+docker run --rm --network host ibaldin/e2sar:latest e2sar_perf --help -v
+```
+
+Example output:
+
+```
+(base)yak@yk-macbook:ejfat_tests(main)$ docker run --rm --network host ibaldin/e2sar:latest e2sar_perf --help -v
+
+Unable to find image 'ibaldin/e2sar:latest' locally
+latest: Pulling from ibaldin/e2sar
+a3be5d4ce401: Downloading [==========================>                        ]  15.87MB/29.54MB
+8820264101df: Downloading [=>                                                 ]  9.174MB/240.2MB
+60235472d435: Download complete
+cfe54ce66ff6: Download complete
+570caf699672: Waiting
+92ad791d8720: Waiting
+c50bbcbc011a: Waiting
+a3d4a61339e0: Waiting
+6d3c5af0f2be: Waiting
+
+E2SAR Version: 0.2.2
+E2SAR Available Optimizations: none,sendmmsg
+Command-line options:
+  -h [ --help ]                  show this help message
+  -s [ --send ]                  send traffic
+  -r [ --recv ]                  receive traffic
+  -l [ --length ] arg (=1048576) event buffer length (defaults to 1024^2) [s]
+  -u [ --uri ] arg               specify EJFAT_URI on the command-line instead
+                                 of the
environment variable + -n [ --num ] arg (=10) number of event buffers to send (defaults to + 10) [s] + -e [ --enum ] arg (=0) starting event number (defaults to 0) [s] + -m [ --mtu ] arg (=1500) MTU (default 1500) [s] + --src arg (=1234) Event source (default 1234) [s] + --dataid arg (=4321) Data id (default 4321) [s] + --threads arg (=1) number of receive threads (defaults to 1) [r] + --sockets arg (=4) number of send sockets (defaults to 4) [r] + --rate arg (=1) send rate in Gbps (defaults to 1.0, negative + value means no limit) + -p [ --period ] arg (=1000) receive side reporting thread sleep period in + ms (defaults to 1000) [r] + -b [ --bufsize ] arg (=3145728) send or receive socket buffer size (default + to 3MB) + -d [ --duration ] arg (=0) duration for receiver to run for (defaults to + 0 - until Ctrl-C is pressed)[s] + -c [ --withcp ] enable control plane interactions + -i [ --ini ] arg INI file to initialize SegmenterFlags [s] or + ReassemblerFlags [r]. Values found in the + file override --withcp, --mtu, --sockets, + --novalidate, --ip[46] and --bufsize + --ip arg IP address (IPv4 or IPv6) from which sender + sends from or on which receiver listens + (conflicts with --autoip) [s,r] + --port arg (=10000) Starting UDP port number on which receiver + listens. Defaults to 10000. 
[r] + -6 [ --ipv6 ] force using IPv6 control plane address if URI + specifies hostname (disables cert validation) + [s,r] + -4 [ --ipv4 ] force using IPv4 control plane address if URI + specifies hostname (disables cert validation) + [s,r] + -v [ --novalidate ] don't validate server certificate [s,r] + --autoip auto-detect dataplane outgoing ip address + (conflicts with --ip; doesn't work for + reassembler in back-to-back testing) [s,r] + --deq arg (=1) number of event dequeue threads in receiver + (defaults to 1) [r] + --cores arg optional list of cores to bind sender or + receiver threads to; number of receiver + threads is equal to the number of cores [s,r] + -o [ --optimize ] arg a list of optimizations to turn on [s] + --numa arg (=-1) bind all memory allocation to this NUMA node + (if >= 0) [s,r] + --multiport use consecutive destination ports instead of + one port [s] + --smooth use smooth shaping in the sender (only works + without optimizations and at low sub 3-5Gbps + rates!) [s] + --timeout arg (=500) event timeout on reassembly in MS [r] +``` + +A trivial loopback invocation sending 10 1MB events at 1Gbps looks like this (start the receiver first, stop it with Ctrl-C when done): + +**Receiver:** +```bash +e2sar_perf --ip '127.0.0.1' -r -u 'ejfat://token@127.0.0.1:18020/lb/36?data=127.0.0.1:10000' +``` + +**Sender:** +```bash +e2sar_perf --ip '127.0.0.1' -s -u 'ejfat://token@127.0.0.1:18020/lb/36?data=127.0.0.1:10000' --rate 1 +``` + +At the very bottom of the help page is an easter egg for new users. The exact commands you need to run a back to back test. + +Start a receiver and then a transmitter and see if they connect to each other. + +On Mac OS you **might** see some event loss with this first docker based test. Ignore those for now, on Mac OS we will switch to using conda based native binaries for e2sar_perf in the next step. On Linux based systems, we will continue with the docker based workflow. 
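On Linux, the same loopback pair can be run through docker directly (a sketch; `--network host` keeps both containers on the host loopback, and the URI is the illustrative back-to-back one from above, not a real token):

```shell
# Terminal 1 - receiver (stop with Ctrl-C when done)
docker run --rm --network host ibaldin/e2sar:latest \
    e2sar_perf --ip '127.0.0.1' -r -u 'ejfat://token@127.0.0.1:18020/lb/36?data=127.0.0.1:10000'

# Terminal 2 - sender
docker run --rm --network host ibaldin/e2sar:latest \
    e2sar_perf --ip '127.0.0.1' -s -u 'ejfat://token@127.0.0.1:18020/lb/36?data=127.0.0.1:10000' --rate 1
```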
+
+## 1.2 Moving to conda based installs
+
+Now that we have had first success with e2sar_perf, follow the instructions on this page to install E2SAR into a conda env. This will improve performance on Mac OS and set the stage for further development when we want to integrate E2SAR into a custom application.
+
+[https://github.com/JeffersonLab/E2SAR/wiki/Code-and-Binaries](https://github.com/JeffersonLab/E2SAR/wiki/Code-and-Binaries)
+
+Once e2sar_perf is installed with conda on your laptop, it is possible to invoke e2sar_perf directly. The install also provides Python packages that we can import into our own custom senders and receivers at a later step.
+
+## 2. Extending hello world and running it on Perlmutter
+
+### 2.1 Setting up Load Balancer Access on Perlmutter
+
+Log into Perlmutter and repeat the docker based exercise to run e2sar_perf on the Perlmutter login node. NERSC has comprehensive documentation on how to run docker containers in their environment. They use podman-hpc rather than docker; for our purposes it is functionally the same.
+
+[https://docs.nersc.gov/development/containers/podman-hpc/overview/](https://docs.nersc.gov/development/containers/podman-hpc/overview/)
+
+NERSC provides shifter and podman-hpc as two alternatives for running containers. We have been working with podman-hpc.
+
+Make sure that you have set your environment variable EJFAT_URI based on the token provided to you by ESnet. It should look something like the following; the port and load balancer hostname might differ.
+ +```bash +export EJFAT_URI='ejfats://nuYr-------JlZ1@ejfat-lb.es.net:18008/lb/' +``` + +Now test to see if you can contact the ESnet load balancer: + +```bash +podman-hpc run -e EJFAT_URI=$EJFAT_URI --rm --network host ibaldin/e2sar:latest lbadm --overview +``` + +If you are successful go ahead and reserve a load balancer: + +```bash +podman-hpc run -e EJFAT_URI=$EJFAT_URI --rm --network host ibaldin/e2sar:latest lbadm --reserve --lbname "yk_test" --export +``` + +Example output: +```bash +export EJFAT_URI='ejfats://cc----5c21@ejfat-lb.es.net:18008/lb/272?sync=192.188.29.6:19010&data=192.188.29.10&data=[2001:400:a300::10]' +``` + +The reserve command will return a new URI, conveniently formatted as an export statement. Cut and paste the export command so that your EJFAT_URI is now pointing to this more specific instance URI. You can see that it has additional information about your load balancer instance, such as the dataplane address that you will be sending to. + +You can run overview again to see the status of this freshly minted load balancer: + +```bash +podman-hpc run -e EJFAT_URI=$EJFAT_URI --rm --network host ibaldin/e2sar:latest lbadm --overview +``` + +And you can free this instance using lbadm --free: + +```bash +podman-hpc run -e EJFAT_URI=$EJFAT_URI --rm --network host ibaldin/e2sar:latest lbadm --free +``` + +Congratulations... sit back and bask in the joy that you are now ready to start streaming to and from the live load balancer hosted inside ESnet. + +### 2.2 Running Sender and Receiver Jobs on Perlmutter + +Now that you have successfully connected to the ESnet load balancer, you're ready to run actual performance tests on Perlmutter. 
+ +To make testing easier, we provide a set of minimal wrapper scripts that handle: +- Load balancer reservation management +- Automatic IP address detection for multi-homed systems +- Containerized sender and receiver execution +- Memory usage monitoring +- SLURM batch job orchestration for distributed testing + +**For complete instructions on running sender and receiver jobs on Perlmutter, please see:** + +- **[QuickStartMinimalScripts.md](QuickStartMinimalScripts.md)** - Quick start guide and script overview +- **[RunningMinimalScripts.md](RunningMinimalScripts.md)** - Comprehensive tutorial covering: + - Basic sender/receiver workflows + - Performance tuning parameters + - SLURM batch job submission + - Memory monitoring and optimization + - Troubleshooting common issues + +These scripts provide a production-ready workflow for network performance testing on Perlmutter, handling all the containerization and networking complexity automatically. + +--- + +## Appendix - Useful Stuff to Know + +### On Mac OS: Running Docker with Colima + +To run Docker with Colima, you need to install both the docker CLI and colima, start the Colima virtual machine (VM), and then use standard docker commands. Colima provides a lightweight Linux VM on your macOS (or Linux) machine where the Docker daemon runs. + +#### Prerequisites + +You will need Homebrew installed to manage packages on macOS. + +#### Setup and Usage + +Follow these steps to set up and run Docker with Colima: + +1. **Install Colima and Docker CLI:** Open your terminal and run the following commands to install Colima and the Docker client, Docker Compose, and Buildx via Homebrew: + + ```bash + brew install colima docker docker-compose docker-buildx + ``` + +2. **Start Colima:** Initiate the Colima VM. 
By default, Colima uses the Docker runtime: + + ```bash + colima start + ``` + + You can customize the VM's resources (CPU, memory, disk) on startup if needed: + + ```bash + colima start --cpu 4 --memory 8 --disk 60 + ``` + + If you have Apple Silicon (M1/M2/M3), you can also specify the architecture to ensure compatibility with certain images: + + ```bash + colima start --arch x86_64 + ``` + +3. **Verify Status:** Check that the Colima instance is running and using the Docker runtime: + + ```bash + colima status + ``` + +4. **Use Docker Commands:** Once Colima is running, you can use standard Docker commands as usual. The Docker client on your host machine will automatically connect to the Docker daemon running inside the Colima VM: + + ```bash + docker run hello-world + docker pull ubuntu + docker images + docker ps + ``` diff --git a/scripts/zero_to_hero/minimal_free.sh b/scripts/zero_to_hero/minimal_free.sh new file mode 100755 index 0000000..0f11956 --- /dev/null +++ b/scripts/zero_to_hero/minimal_free.sh @@ -0,0 +1,74 @@ +#!/bin/bash +# Minimal E2SAR load balancer free script +# +# Usage: +# ./minimal_free.sh [-v] +# +# Options: +# -v Skip SSL certificate validation +# +# Frees the load balancer reservation using INSTANCE_URI file + +set -euo pipefail + +# Script location (for finding sibling scripts if needed) +SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" + +# Artifacts are created in the current working directory (not script directory) + +SKIP_SSL_VERIFY="false" +E2SAR_IMAGE="${E2SAR_IMAGE:-ibaldin/e2sar:0.3.1a3}" + +while [[ $# -gt 0 ]]; do + case $1 in + -v) + SKIP_SSL_VERIFY="true" + shift + ;; + *) + echo "Unknown option: $1" + exit 1 + ;; + esac +done + +INSTANCE_URI_FILE="INSTANCE_URI" + +# Check if INSTANCE_URI file exists +if [[ ! 
-f "$INSTANCE_URI_FILE" ]]; then + echo "ERROR: $INSTANCE_URI_FILE not found" + echo "No reservation to free" + exit 1 +fi + +echo "Found $INSTANCE_URI_FILE" + +# Extract EJFAT_URI safely without sourcing the entire file +EJFAT_URI=$(grep -E '^export EJFAT_URI=' "$INSTANCE_URI_FILE" | head -1 | sed "s/^export EJFAT_URI=//; s/^['\"]//; s/['\"]$//") + +# Validate EJFAT_URI was set +if [[ -z "${EJFAT_URI:-}" ]]; then + echo "ERROR: EJFAT_URI not found in $INSTANCE_URI_FILE" + exit 1 +fi + +echo "Freeing load balancer reservation..." +EJFAT_URI_REDACTED=$(echo "$EJFAT_URI" | sed -E 's|(://)(.{4})[^@]*(.{4})@|\1\2---\3@|') +echo "EJFAT_URI: $EJFAT_URI_REDACTED" + +# Run lbadm --free +LBADM_CMD=(lbadm) +[[ "$SKIP_SSL_VERIFY" == "true" ]] && LBADM_CMD+=(--novalidate) +LBADM_CMD+=(--free) + +export EJFAT_URI +if podman-hpc run --env EJFAT_URI --rm --network host "$E2SAR_IMAGE" "${LBADM_CMD[@]}"; then + echo "Reservation freed successfully" + + # Remove the INSTANCE_URI file + rm -f "$INSTANCE_URI_FILE" + echo "Removed $INSTANCE_URI_FILE" +else + echo "ERROR: Failed to free reservation" + exit 1 +fi diff --git a/scripts/zero_to_hero/minimal_receiver.sh b/scripts/zero_to_hero/minimal_receiver.sh new file mode 100755 index 0000000..da77ee2 --- /dev/null +++ b/scripts/zero_to_hero/minimal_receiver.sh @@ -0,0 +1,203 @@ +#!/bin/bash +# Minimal E2SAR receiver startup script +# +# Usage: +# ./minimal_reserve.sh # First, create a reservation +# ./minimal_receiver.sh [OPTIONS] +# +# Requires INSTANCE_URI file (created by minimal_reserve.sh) +# +# Options: +# --image IMAGE Container image (default: ibaldin/e2sar:0.3.1a3) +# --port PORT Data port (default: 10000) +# --duration SEC Run duration in seconds (default: 0 = indefinite) +# --ipv6 Use IPv6 (default: false) +# -v Skip SSL certificate validation (default: disabled) +# --threads NUM Number of receive threads (default: 16) +# --deq NUM Number of dequeue threads (default: 16) +# --bufsize SIZE Socket buffer size in bytes (default: 
134217728) +# --help Show this help message + +set -euo pipefail + +# Script location (for finding sibling scripts if needed) +SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" + +# Artifacts are created in the current working directory (not script directory) + +# Default values +EJFAT_URI="${EJFAT_URI:-}" +E2SAR_IMAGE="${E2SAR_IMAGE:-ibaldin/e2sar:0.3.1a3}" +DATA_PORT="10000" +DURATION="0" +USE_IPV6="false" +SKIP_SSL_VERIFY="false" +RECV_THREADS="16" +DEQUEUE_THREADS="16" +BUFFER_SIZE="134217728" + +# Parse command line arguments +while [[ $# -gt 0 ]]; do + case $1 in + --image) + E2SAR_IMAGE="$2" + shift 2 + ;; + --port) + DATA_PORT="$2" + shift 2 + ;; + --duration) + DURATION="$2" + shift 2 + ;; + --ipv6) + USE_IPV6="true" + shift + ;; + -v) + SKIP_SSL_VERIFY="true" + shift + ;; + --threads) + RECV_THREADS="$2" + shift 2 + ;; + --deq) + DEQUEUE_THREADS="$2" + shift 2 + ;; + --bufsize) + BUFFER_SIZE="$2" + shift 2 + ;; + --help) + sed -n '2,/^$/p' "$0" | sed 's/^# \?//' + exit 0 + ;; + *) + echo "Unknown option: $1" + echo "Use --help for usage information" + exit 1 + ;; + esac +done + +# Load EJFAT_URI from INSTANCE_URI file +INSTANCE_URI_FILE="INSTANCE_URI" +if [[ ! -f "$INSTANCE_URI_FILE" ]]; then + echo "ERROR: $INSTANCE_URI_FILE not found" + echo "Run minimal_reserve.sh first to create a reservation" + exit 1 +fi + +echo "Loading EJFAT_URI from $INSTANCE_URI_FILE..." +# Extract EJFAT_URI safely without sourcing the entire file +EJFAT_URI=$(grep -E '^export EJFAT_URI=' "$INSTANCE_URI_FILE" | head -1 | sed "s/^export EJFAT_URI=//; s/^['\"]//; s/['\"]$//") + +# Validate EJFAT_URI was loaded +if [[ -z "$EJFAT_URI" ]]; then + echo "ERROR: EJFAT_URI not found in $INSTANCE_URI_FILE" + exit 1 +fi + +echo "Starting E2SAR receiver..." 
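The quote-tolerant extraction of `EJFAT_URI` from the `INSTANCE_URI` file can be exercised in isolation. A minimal sketch — the file contents below are invented, stand-ins for whatever `lbadm --export` actually writes:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical INSTANCE_URI contents (illustrative only)
tmpfile=$(mktemp)
cat > "$tmpfile" <<'EOF'
export EJFAT_URI='ejfat://sampletoken@lb.example.org:18008/lb/1?sync=abc'
export SOMETHING_ELSE=ignored
EOF

# Same pipeline as the scripts: first matching export line, prefix
# stripped, surrounding single or double quotes removed
EJFAT_URI=$(grep -E '^export EJFAT_URI=' "$tmpfile" | head -1 \
    | sed "s/^export EJFAT_URI=//; s/^['\"]//; s/['\"]$//")

echo "$EJFAT_URI"
rm -f "$tmpfile"
```

Grepping for the one line rather than sourcing the file means stray commands in a corrupted `INSTANCE_URI` are never executed.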
+EJFAT_URI_REDACTED=$(echo "$EJFAT_URI" | sed -E 's|(://)(.{4})[^@]*(.{4})@|\1\2---\3@|') +echo "EJFAT_URI: $EJFAT_URI_REDACTED" +echo "Container Image: $E2SAR_IMAGE" + +# Auto-detect receiver IP +echo "Auto-detecting receiver IP..." + +# Extract LB hostname from EJFAT_URI +# Format: ejfat://token@hostname:port/lb/1?sync=... +LB_HOST=$(echo "$EJFAT_URI" | sed 's|.*@\([^:]*\):.*|\1|') +echo "LB Host: $LB_HOST" + +# Resolve LB hostname to IP +if [[ "$USE_IPV6" == "true" ]]; then + LB_IP=$(getent ahostsv6 "$LB_HOST" | head -1 | awk '{print $1}') +else + LB_IP=$(getent ahostsv4 "$LB_HOST" | head -1 | awk '{print $1}') +fi + +if [[ -z "$LB_IP" ]]; then + echo "ERROR: Failed to resolve LB host: $LB_HOST" + exit 1 +fi +echo "LB IP: $LB_IP" + +# Find source IP for route to LB +RECEIVER_IP=$(ip route get "$LB_IP" | head -1 | sed 's/^.*src//' | awk '{print $1}') + +if [[ -z "$RECEIVER_IP" ]]; then + echo "ERROR: Failed to detect receiver IP" + exit 1 +fi + +echo "Receiver IP: $RECEIVER_IP" +echo "Data Port: $DATA_PORT" +echo "Receive Threads: $RECV_THREADS" +echo "Dequeue Threads: $DEQUEUE_THREADS" +echo "Buffer Size: $BUFFER_SIZE" + +# Build podman-hpc command + +# Export EJFAT_URI so it can be passed to container without exposing in process list +export EJFAT_URI + +CMD=( + podman-hpc + run + --rm + --network host + --env EJFAT_URI + -e "MALLOC_ARENA_MAX=32" + "$E2SAR_IMAGE" + e2sar_perf + --recv + --withcp + --ip="$RECEIVER_IP" + --port="$DATA_PORT" + --duration="$DURATION" + --timeout=3000 + --threads="$RECV_THREADS" + --deq="$DEQUEUE_THREADS" + --bufsize="$BUFFER_SIZE" +) + +# Add -v flag if SSL verification should be skipped +if [[ "$SKIP_SSL_VERIFY" == "true" ]]; then + CMD+=(-v) +fi + +echo "" +echo "Running: ${CMD[*]}" +echo "" + +# Function to write end timestamp +write_end_time() { + local exit_code=$? 
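The token redaction and LB-hostname extraction used throughout these scripts can be checked against a made-up URI of the same shape (note the hostname `sed` assumes a plain host, not a bracketed IPv6 literal):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Invented URI; real tokens are much longer
EJFAT_URI='ejfat://abcdef123456@lb.example.org:18020/lb/1?sync=192.0.2.1:19020'

# Keep the first/last 4 token characters, collapse the middle to ---
EJFAT_URI_REDACTED=$(echo "$EJFAT_URI" | sed -E 's|(://)(.{4})[^@]*(.{4})@|\1\2---\3@|')

# Hostname between the last @ and the following :
LB_HOST=$(echo "$EJFAT_URI" | sed 's|.*@\([^:]*\):.*|\1|')

echo "$EJFAT_URI_REDACTED"
echo "$LB_HOST"
```

The greedy `[^@]*` backtracks just enough to leave four characters before the `@`, so the redacted form is `ejfat://abcd---3456@...`.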
+ echo "" >> minimal_receiver.log + echo "END_TIME (UTC): $(date -u '+%Y-%m-%d %H:%M:%S')" >> minimal_receiver.log + echo "EXIT_CODE: $exit_code" >> minimal_receiver.log + return $exit_code +} + +# Trap signals to ensure END_TIME is written even on interrupt/termination +trap 'write_end_time' EXIT INT TERM + +# Run podman-hpc with timestamps logged to file +{ + echo "START_TIME (UTC): $(date -u '+%Y-%m-%d %H:%M:%S')" + echo "" +} | tee minimal_receiver.log || true + +# Run the container and append output +# Use || true to prevent pipefail from failing on SIGPIPE in tee +# Use PIPESTATUS[0] to capture container's exit code (not tee's) +"${CMD[@]}" 2>&1 | tee -a minimal_receiver.log || true +CONTAINER_EXIT_CODE=${PIPESTATUS[0]} + +# Exit with container's exit code (trap will handle END_TIME/EXIT_CODE logging) +exit $CONTAINER_EXIT_CODE diff --git a/scripts/zero_to_hero/minimal_reserve.sh b/scripts/zero_to_hero/minimal_reserve.sh new file mode 100755 index 0000000..a50b7bb --- /dev/null +++ b/scripts/zero_to_hero/minimal_reserve.sh @@ -0,0 +1,84 @@ +#!/bin/bash +# Minimal E2SAR load balancer reservation script +# +# Usage: +# EJFAT_URI="ejfat://token@host:port/lb/1?sync=..." ./minimal_reserve.sh +# +# Creates/updates INSTANCE_URI file with reservation details +# Note: lbadm --reserve skips SSL cert validation internally; no -v flag needed. 
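One subtlety when reusing the `tee` exit-code pattern above: `${PIPESTATUS[0]}` must be read before *any* other command runs, because even the `true` in `... || true` is itself a one-command pipeline and overwrites `PIPESTATUS`. A runnable sketch of the two behaviors:

```shell
#!/usr/bin/env bash
set -euo pipefail

# With `cmd | tee || true`, the `true` resets PIPESTATUS to (0)
clobbered=$(bash -c 'set -o pipefail; false | tee /dev/null || true; echo "${PIPESTATUS[0]}"')

# Toggling -e off around the pipeline preserves the real exit code
preserved=$(bash -c 'set -euo pipefail; set +e; false | tee /dev/null; ec=${PIPESTATUS[0]}; set -e; echo "$ec"')

echo "clobbered=$clobbered preserved=$preserved"
```

With the `set +e` guard, the failing command's status (1 from `false` here) survives into `ec`; with `|| true` it is lost.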
+ +set -euo pipefail + +# Script location (for finding sibling scripts if needed) +SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" + +# Artifacts are created in the current working directory (not script directory) + +INSTANCE_URI_FILE="INSTANCE_URI" + +# Default configuration +LB_NAME="${LB_NAME:-e2sar_test}" +E2SAR_IMAGE="${E2SAR_IMAGE:-ibaldin/e2sar:0.3.1a3}" + +# Parse command-line arguments +while [[ $# -gt 0 ]]; do + case "$1" in + --lbname) + LB_NAME="$2" + shift 2 + ;; + --image) + E2SAR_IMAGE="$2" + shift 2 + ;; + *) + echo "ERROR: Unknown argument: $1" + echo "Usage: $0 [--lbname NAME] [--image IMAGE]" + exit 1 + ;; + esac +done + +# Validate EJFAT_URI +if [[ -z "${EJFAT_URI:-}" ]]; then + echo "ERROR: EJFAT_URI is required" + echo "Set via EJFAT_URI environment variable" + exit 1 +fi + +echo "Checking for existing reservation..." + +# Check if INSTANCE_URI file exists and is valid +if [[ -f "$INSTANCE_URI_FILE" ]]; then + echo "Found $INSTANCE_URI_FILE, validating..." + + # Save original admin URI + ORIGINAL_EJFAT_URI="$EJFAT_URI" + + # Use the instance URI (not the admin URI) to check session validity + INSTANCE_EJFAT_URI=$(grep -E '^export EJFAT_URI=' "$INSTANCE_URI_FILE" | head -1 | sed "s/^export EJFAT_URI=//; s/^['\"]//; s/['\"]$//") + + # Temporarily use instance URI for validation + EJFAT_URI="$INSTANCE_EJFAT_URI" + export EJFAT_URI + if podman-hpc run --env EJFAT_URI --rm --network host "${E2SAR_IMAGE:-ibaldin/e2sar:0.3.1a3}" lbadm --overview &>/dev/null; then + echo "Existing reservation is valid, skipping reserve" + exit 0 + else + echo "Existing reservation is invalid, will create new reservation" + fi + + # Restore admin URI for reservation creation below + EJFAT_URI="$ORIGINAL_EJFAT_URI" +fi + +echo "Creating new reservation..." 
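The `while`/`case`/`shift` option loop these scripts all share boils down to a small reusable pattern — options that take a value `shift 2`, bare flags `shift 1`. A reduced sketch with two hypothetical options:

```shell
#!/usr/bin/env bash
set -euo pipefail

DATA_PORT="10000"
USE_IPV6="false"

parse_args() {
    while [[ $# -gt 0 ]]; do
        case $1 in
            --port)
                DATA_PORT="$2"
                shift 2          # consume option and its value
                ;;
            --ipv6)
                USE_IPV6="true"
                shift            # bare flag
                ;;
            *)
                echo "Unknown option: $1" >&2
                return 1
                ;;
        esac
    done
}

parse_args --port 12000 --ipv6
echo "$DATA_PORT $USE_IPV6"
```

The `*)` catch-all keeps typos from being silently ignored, which matters when a mistyped flag would otherwise change test parameters.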
+ +# Run lbadm --reserve and save output to INSTANCE_URI +# Note: lbadm --reserve skips SSL cert validation internally regardless of --novalidate; +# passing --novalidate interferes with this and causes failures, so it is intentionally omitted. +export EJFAT_URI +podman-hpc run --env EJFAT_URI --rm --network host "$E2SAR_IMAGE" lbadm --reserve --lbname "$LB_NAME" --export > "$INSTANCE_URI_FILE" + +echo "Reservation created and saved to $INSTANCE_URI_FILE" +cat "$INSTANCE_URI_FILE" diff --git a/scripts/zero_to_hero/minimal_sender.sh b/scripts/zero_to_hero/minimal_sender.sh new file mode 100755 index 0000000..3eebe6f --- /dev/null +++ b/scripts/zero_to_hero/minimal_sender.sh @@ -0,0 +1,300 @@ +#!/bin/bash +# Minimal E2SAR sender startup script +# +# Usage: +# ./minimal_reserve.sh # First, create a reservation +# ./minimal_sender.sh [OPTIONS] +# +# Requires INSTANCE_URI file (created by minimal_reserve.sh) +# +# Options: +# --image IMAGE Container image (default: ibaldin/e2sar:0.3.1a3) +# --rate RATE Sending rate in Gbps (default: 1) +# --length LENGTH Event buffer length in bytes (default: 1048576) +# --num COUNT Number of events to send (default: 100) +# --mtu MTU MTU size in bytes (default: 9000) +# --ipv6 Use IPv6 (default: false) +# -v Skip SSL certificate validation (default: disabled) +# --no-monitor Disable memory monitoring (default: enabled) +# --help Show this help message + +set -euo pipefail + +# Script location (for finding sibling scripts if needed) +SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" + +# Artifacts are created in the current working directory (not script directory) + +# Memory monitoring process ID +MEMORY_MONITOR_PID="" + +# Default values +EJFAT_URI="${EJFAT_URI:-}" +E2SAR_IMAGE="${E2SAR_IMAGE:-ibaldin/e2sar:0.3.1a3}" +RATE="1" +LENGTH="1048576" +NUM="100" +MTU="9000" +USE_IPV6="false" +SKIP_SSL_VERIFY="false" +ENABLE_MONITOR="true" + +# Parse command line arguments +while [[ $# -gt 0 ]]; do + case $1 in + --image) + 
E2SAR_IMAGE="$2" + shift 2 + ;; + --rate) + RATE="$2" + shift 2 + ;; + --length) + LENGTH="$2" + shift 2 + ;; + --num) + NUM="$2" + shift 2 + ;; + --mtu) + MTU="$2" + shift 2 + ;; + --ipv6) + USE_IPV6="true" + shift + ;; + -v) + SKIP_SSL_VERIFY="true" + shift + ;; + --no-monitor) + ENABLE_MONITOR="false" + shift + ;; + --help) + sed -n '2,/^$/p' "$0" | sed 's/^# \?//' + exit 0 + ;; + *) + echo "Unknown option: $1" + echo "Use --help for usage information" + exit 1 + ;; + esac +done + +# Load EJFAT_URI from INSTANCE_URI file +INSTANCE_URI_FILE="INSTANCE_URI" +if [[ ! -f "$INSTANCE_URI_FILE" ]]; then + echo "ERROR: $INSTANCE_URI_FILE not found" + echo "Run minimal_reserve.sh first to create a reservation" + exit 1 +fi + +echo "Loading EJFAT_URI from $INSTANCE_URI_FILE..." +# Extract EJFAT_URI safely without sourcing the entire file +EJFAT_URI=$(grep -E '^export EJFAT_URI=' "$INSTANCE_URI_FILE" | head -1 | sed "s/^export EJFAT_URI=//; s/^['\"]//; s/['\"]$//") + +# Validate EJFAT_URI was loaded +if [[ -z "$EJFAT_URI" ]]; then + echo "ERROR: EJFAT_URI not found in $INSTANCE_URI_FILE" + exit 1 +fi + +echo "Starting E2SAR sender..." +EJFAT_URI_REDACTED=$(echo "$EJFAT_URI" | sed -E 's|(://)(.{4})[^@]*(.{4})@|\1\2---\3@|') +echo "EJFAT_URI: $EJFAT_URI_REDACTED" +echo "Container Image: $E2SAR_IMAGE" + +# Auto-detect sender IP +echo "Auto-detecting sender IP..." + +# Extract LB hostname from EJFAT_URI +# Format: ejfat://token@hostname:port/lb/1?sync=... 
+LB_HOST=$(echo "$EJFAT_URI" | sed 's|.*@\([^:]*\):.*|\1|') +echo "LB Host: $LB_HOST" + +# Resolve LB hostname to IP +if [[ "$USE_IPV6" == "true" ]]; then + LB_IP=$(getent ahostsv6 "$LB_HOST" | head -1 | awk '{print $1}') +else + LB_IP=$(getent ahostsv4 "$LB_HOST" | head -1 | awk '{print $1}') +fi + +if [[ -z "$LB_IP" ]]; then + echo "ERROR: Failed to resolve LB host: $LB_HOST" + exit 1 +fi +echo "LB IP: $LB_IP" + +# Find source IP for route to LB +SENDER_IP=$(ip route get "$LB_IP" | head -1 | sed 's/^.*src//' | awk '{print $1}') + +if [[ -z "$SENDER_IP" ]]; then + echo "ERROR: Failed to detect sender IP" + exit 1 +fi + +echo "Sender IP: $SENDER_IP" +echo "Rate: $RATE Gbps" +echo "Event Length: $LENGTH bytes" +echo "Number of Events: $NUM" +echo "MTU: $MTU bytes" +echo "" + +#============================================================================= +# Memory monitoring function +#============================================================================= + +start_memory_monitor() { + local log_file="$1" + local interval="${2:-1}" + + # Create log file with header + { + echo "# E2SAR Memory Monitor" + echo "# Started: $(date -Iseconds)" + echo "# Interval: ${interval} second(s)" + echo "#" + echo "# Columns: TIMESTAMP, PID, RSS_KB, VSZ_KB, %MEM, %CPU, ELAPSED_TIME, COMMAND" + } > "$log_file" + + # Start monitoring loop in background + ( + while true; do + # Find all e2sar_perf processes + PIDS=$(pgrep -f "e2sar_perf.*--send" 2>/dev/null || true) + + if [ -n "$PIDS" ]; then + for PID in $PIDS; do + # Get process info + PS_INFO=$(ps -p "$PID" -o pid=,rss=,vsz=,%mem=,%cpu=,etime=,args= 2>/dev/null || true) + + if [ -n "$PS_INFO" ]; then + # Parse and log + read -r P_PID RSS VSZ MEM CPU ETIME ARGS <<< "$PS_INFO" + echo "$(date -Iseconds), $P_PID, $RSS, $VSZ, $MEM, $CPU, $ETIME, $ARGS" >> "$log_file" + fi + done + fi + + sleep "$interval" + done + ) >/dev/null 2>&1 & + + # Return the PID of the monitoring process + echo $! 
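The source-IP detection above parses the first line of `ip route get`. Its `sed`/`awk` stage can be verified offline against a representative line (the route fields below are invented):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Representative `ip route get <dst>` first line
route_line='192.0.2.10 via 10.0.0.1 dev eth0 src 10.0.0.5 uid 1000'

# Same extraction as the scripts: drop everything through "src",
# then take the first remaining field
SRC_IP=$(echo "$route_line" | head -1 | sed 's/^.*src//' | awk '{print $1}')

echo "$SRC_IP"
```

This picks out the kernel's chosen source address for the route to the LB, which is exactly the address the receiver/sender must bind.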
+} + +stop_memory_monitor() { + local monitor_pid="$1" + local log_file="$2" + + if [ -n "$monitor_pid" ] && kill -0 "$monitor_pid" 2>/dev/null; then + kill "$monitor_pid" 2>/dev/null || true + wait "$monitor_pid" 2>/dev/null || true + + # Add footer to log + { + echo "#" + echo "# Stopped: $(date -Iseconds)" + } >> "$log_file" + + # Generate summary if log has data + if [ -f "$log_file" ] && [ -s "$log_file" ]; then + MAX_RSS=$(grep -v '^#' "$log_file" | awk -F', ' '{print $3}' | sort -n | tail -1) + MIN_RSS=$(grep -v '^#' "$log_file" | awk -F', ' '{print $3}' | sort -n | head -1) + + if [ -n "$MAX_RSS" ] && [ -n "$MIN_RSS" ]; then + { + echo "#" + echo "# Memory Summary:" + echo "# Peak RSS: $((MAX_RSS / 1024)) MB ($MAX_RSS KB)" + echo "# Min RSS: $((MIN_RSS / 1024)) MB ($MIN_RSS KB)" + echo "# Growth: $(((MAX_RSS - MIN_RSS) / 1024)) MB" + } >> "$log_file" + fi + fi + fi +} + +#============================================================================= +# Build podman-hpc command +#============================================================================= + +# Export EJFAT_URI so it can be passed to container without exposing in process list +export EJFAT_URI + +CMD=( + podman-hpc + run + --rm + --network host + --env EJFAT_URI + -e "MALLOC_ARENA_MAX=32" + "$E2SAR_IMAGE" + e2sar_perf + --send + --withcp + --optimize=sendmmsg + --sockets=16 + --ip="$SENDER_IP" + --rate="$RATE" + --length="$LENGTH" + --num="$NUM" + --mtu="$MTU" + --bufsize=134217728 +) + +# Add -v flag if SSL verification should be skipped +if [[ "$SKIP_SSL_VERIFY" == "true" ]]; then + CMD+=(-v) +fi + +echo "" +echo "Running: ${CMD[*]}" +echo "" + +# Function to write end timestamp and stop monitoring +write_end_time() { + local exit_code=$? 
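The peak/min RSS summary in `stop_memory_monitor` is a plain `awk`-on-CSV pipeline and can be tested with a tiny synthetic log (the RSS numbers below are invented):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Tiny monitor log in the same CSV shape; column 3 is RSS in KB
log=$(mktemp)
cat > "$log" <<'EOF'
# E2SAR Memory Monitor
2024-01-01T00:00:00+00:00, 4242, 102400, 204800, 0.5, 12.0, 00:01, e2sar_perf --send
2024-01-01T00:00:01+00:00, 4242, 307200, 204800, 0.9, 15.0, 00:02, e2sar_perf --send
2024-01-01T00:00:02+00:00, 4242, 204800, 204800, 0.7, 14.0, 00:03, e2sar_perf --send
EOF

# Same summary pipeline as the script
MAX_RSS=$(grep -v '^#' "$log" | awk -F', ' '{print $3}' | sort -n | tail -1)
MIN_RSS=$(grep -v '^#' "$log" | awk -F', ' '{print $3}' | sort -n | head -1)
GROWTH_MB=$(( (MAX_RSS - MIN_RSS) / 1024 ))

echo "peak=${MAX_RSS}KB min=${MIN_RSS}KB growth=${GROWTH_MB}MB"
rm -f "$log"
```

`sort -n` rather than lexicographic sort is what makes the comparison correct once RSS crosses a digit boundary (e.g. 99999 vs 102400).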
+ + # Stop memory monitor if running + if [[ "$ENABLE_MONITOR" == "true" ]] && [[ -n "$MEMORY_MONITOR_PID" ]]; then + stop_memory_monitor "$MEMORY_MONITOR_PID" "minimal_sender_memory.log" + fi + + echo "" >> minimal_sender.log + echo "END_TIME (UTC): $(date -u '+%Y-%m-%d %H:%M:%S')" >> minimal_sender.log + echo "EXIT_CODE: $exit_code" >> minimal_sender.log + return $exit_code +} + +# Trap signals to ensure END_TIME is written even on interrupt/termination +trap 'write_end_time' EXIT INT TERM + +# Run podman-hpc with timestamps logged to file +{ + echo "START_TIME (UTC): $(date -u '+%Y-%m-%d %H:%M:%S')" + echo "" +} | tee minimal_sender.log || true + +# Start memory monitoring if enabled +if [[ "$ENABLE_MONITOR" == "true" ]]; then + echo "Starting memory monitor (logging to minimal_sender_memory.log)..." + MEMORY_MONITOR_PID=$(start_memory_monitor "minimal_sender_memory.log" 1) + echo "Memory monitor started (PID: $MEMORY_MONITOR_PID)" + echo "" +fi + +# Run the container and append output +# Use || true to prevent pipefail from failing on SIGPIPE in tee +# Use PIPESTATUS[0] to capture container's exit code (not tee's) +"${CMD[@]}" 2>&1 | tee -a minimal_sender.log || true +CONTAINER_EXIT_CODE=${PIPESTATUS[0]} + +# Exit with container's exit code (trap will handle END_TIME/EXIT_CODE logging) +exit $CONTAINER_EXIT_CODE diff --git a/scripts/zero_to_hero/monitor_memory.sh b/scripts/zero_to_hero/monitor_memory.sh new file mode 100755 index 0000000..42187b9 --- /dev/null +++ b/scripts/zero_to_hero/monitor_memory.sh @@ -0,0 +1,68 @@ +#!/bin/bash +# +# monitor_memory.sh - Monitor e2sar_perf memory usage during testing +# +# Usage: +# ./monitor_memory.sh [interval_seconds] +# +# Default interval: 1 second +# + +set -euo pipefail + +# Script location (for finding sibling scripts if needed) +SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" + +# Artifacts are created in the current working directory (not script directory) + +INTERVAL="${1:-1}" 
+LOG_FILE="memory_monitor.log"
+
+echo "E2SAR Memory Monitor"
+echo "===================="
+echo "Monitoring e2sar_perf processes every ${INTERVAL} second(s)"
+echo "Logging to: ${LOG_FILE}"
+echo "Press Ctrl+C to stop"
+echo ""
+
+# Create log file with header
+cat > "${LOG_FILE}" <<EOF
+# E2SAR Memory Monitor
+# Started: $(date -Iseconds)
+# Interval: ${INTERVAL} second(s)
+#
+# Columns: TIMESTAMP, PID, RSS_KB, VSZ_KB, %MEM, %CPU, ELAPSED_TIME, COMMAND
+EOF
+
+# Monitoring loop
+while true; do
+    # Find all e2sar_perf processes
+    PIDS=$(pgrep -f "e2sar_perf" 2>/dev/null || true)
+
+    if [ -n "$PIDS" ]; then
+        for PID in $PIDS; do
+            # Get process info
+            PS_INFO=$(ps -p "$PID" -o pid=,rss=,vsz=,%mem=,%cpu=,etime=,args= 2>/dev/null || true)
+
+            if [ -n "$PS_INFO" ]; then
+                # Parse values
+                read -r P_PID RSS VSZ MEM CPU ETIME ARGS <<< "$PS_INFO"
+
+                # Log to file
+                echo "$(date -Iseconds), $P_PID, $RSS, $VSZ, $MEM, $CPU, $ETIME, $ARGS" >> "${LOG_FILE}"
+
+                # Display current values
+                RSS_MB=$((RSS / 1024))
+                VSZ_MB=$((VSZ / 1024))
+                echo -ne "\r[$(date +%H:%M:%S)] PID:$P_PID RSS:${RSS_MB}MB VSZ:${VSZ_MB}MB MEM:${MEM}% CPU:${CPU}% TIME:${ETIME} "
+            fi
+        done
+    fi
+
+    sleep "$INTERVAL"
+done
diff --git a/scripts/zero_to_hero/perlmutter_multi_slurm.sh b/scripts/zero_to_hero/perlmutter_multi_slurm.sh
new file mode 100755
index 0000000..8707ffb
--- /dev/null
+++ b/scripts/zero_to_hero/perlmutter_multi_slurm.sh
@@ -0,0 +1,574 @@
+#!/bin/bash
+# Perlmutter SLURM batch script for E2SAR multi-sender/receiver tests
+#
+# Usage:
+#   EJFAT_URI="ejfat://..." sbatch -N <nodes> perlmutter_multi_slurm.sh [OPTIONS]
+#
+# SLURM Options (must be specified via sbatch):
+#   -N <nodes>             Total nodes (receivers + senders)
+#   -C cpu                 CPU nodes
+#   -q debug               Queue (debug or regular)
+#   -t 00:30:00            Time limit
+#   -A <account>           Project allocation
+#
+# Multi-Instance Options:
+#   --receivers N          Total number of receiver instances (default: 1)
+#   --senders M            Number of sender instances (default: 1)
+#   --receivers-per-node K Receivers per node (default: 1)
+#   --senders-per-node K   Senders per node (default: 1)
+#   --base-port PORT       Starting port for receivers (default: 10000)
+#   --receiver-delay SEC   Delay in seconds after receivers start (default: 10)
+#
+# Senders and receivers share the same node pool and may be co-located.
+# Required nodes = max(ceil(receivers/receivers-per-node), ceil(senders/senders-per-node)) +# +# Test Options (passed to minimal_sender.sh/minimal_receiver.sh): +# --rate RATE Sending rate in Gbps (default: 1) +# --length LENGTH Event buffer length in bytes (default: 1048576) +# --num COUNT Number of events to send (default: 100) +# --mtu MTU MTU size in bytes (default: 9000) +# --threads N Receive threads per receiver instance (default: 16) +# Also determines port stride: each receiver occupies N consecutive ports +# --image IMAGE Container image (default: ibaldin/e2sar:0.3.1a3) +# --ipv6 Use IPv6 (default: false) +# -v Skip SSL certificate validation (default: disabled) +# +# Note: Receivers always run with --duration 0 (indefinite) and are terminated +# with SIGTERM/SIGKILL after all senders complete. +# +# Environment Variables: +# EJFAT_URI Required: EJFAT admin URI (not an INSTANCE_URI) +# +# Note: A fresh LB reservation is created for each job and freed on completion. +# +# Example (4 receivers and 4 senders co-located on 2 nodes): +# EJFAT_URI="ejfat://..." sbatch -N 2 -A perlmutter_multi_slurm.sh \ +# --receivers 4 --receivers-per-node 2 --senders 4 --senders-per-node 2 --rate 1 --num 100 +# +# Example (single node running everything): +# EJFAT_URI="ejfat://..." 
sbatch -N 1 -A perlmutter_multi_slurm.sh \ +# --receivers 2 --receivers-per-node 2 --senders 2 --senders-per-node 2 --rate 1 --num 100 +# + +#SBATCH -C cpu +#SBATCH -q debug +#SBATCH -t 00:30:00 +#SBATCH --ntasks-per-node=128 +#SBATCH -J ejfat_multi +#SBATCH -o ./slurm-%j.out +#SBATCH -e ./slurm-%j.err +#SBATCH --mail-type=BEGIN,END,FAIL +#SBATCH --mail-user=$USER@nersc.gov + +set -euo pipefail + +#============================================================================= +# Parse command-line arguments +#============================================================================= + +# Multi-instance configuration +NUM_RECEIVERS=1 +NUM_SENDERS=1 +RECEIVERS_PER_NODE=1 +SENDERS_PER_NODE=1 +BASE_PORT=10000 +RECEIVER_DELAY=10 + +# Test parameters +RATE="" +LENGTH="" +NUM="" +MTU="" +RECV_THREADS=16 +IMAGE="" +IPV6_FLAG="" +V_FLAG="" + +while [[ $# -gt 0 ]]; do + case $1 in + --receivers) + NUM_RECEIVERS="$2" + shift 2 + ;; + --senders) + NUM_SENDERS="$2" + shift 2 + ;; + --receivers-per-node) + RECEIVERS_PER_NODE="$2" + shift 2 + ;; + --senders-per-node) + SENDERS_PER_NODE="$2" + shift 2 + ;; + --base-port) + BASE_PORT="$2" + shift 2 + ;; + --receiver-delay) + RECEIVER_DELAY="$2" + shift 2 + ;; + --rate) + RATE="$2" + shift 2 + ;; + --length) + LENGTH="$2" + shift 2 + ;; + --num) + NUM="$2" + shift 2 + ;; + --mtu) + MTU="$2" + shift 2 + ;; + --threads) + RECV_THREADS="$2" + shift 2 + ;; + --image) + IMAGE="$2" + shift 2 + ;; + --ipv6) + IPV6_FLAG="--ipv6" + shift + ;; + -v) + V_FLAG="-v" + shift + ;; + --help) + sed -n '2,/^$/p' "$0" | sed 's/^# \?//' + exit 0 + ;; + *) + echo "Unknown option: $1" + echo "Use --help for usage information" + exit 1 + ;; + esac +done + +#============================================================================= +# Environment setup +#============================================================================= + +echo "=========================================" +echo "EJFAT Multi-Instance Test - SLURM Job $SLURM_JOB_ID" +echo 
"=========================================" +echo "Start time: $(date -u '+%Y-%m-%d %H:%M:%S UTC')" +echo "" + +# Validate EJFAT_URI +if [[ -z "${EJFAT_URI:-}" ]]; then + echo "ERROR: EJFAT_URI is required" + echo "Set via: EJFAT_URI='ejfat://...' sbatch $0" + exit 1 +fi + +EJFAT_URI_REDACTED=$(echo "$EJFAT_URI" | sed -E 's|(://)(.{4})[^@]*(.{4})@|\1\2---\3@|') +echo "EJFAT_URI: $EJFAT_URI_REDACTED" +echo "Job nodes: $SLURM_JOB_NODELIST" +echo "Job ID: $SLURM_JOB_ID" +echo "" + +# Get script directory (where minimal_*.sh scripts are located) +# Require E2SAR_SCRIPTS_DIR to be set to avoid hardcoded paths +if [[ -z "${E2SAR_SCRIPTS_DIR:-}" ]]; then + echo "ERROR: E2SAR_SCRIPTS_DIR must be set to the zero_to_hero directory" + echo " export E2SAR_SCRIPTS_DIR=/path/to/E2SAR/scripts/zero_to_hero" + exit 1 +fi +SCRIPT_DIR="$E2SAR_SCRIPTS_DIR" +echo "Script directory: $SCRIPT_DIR" + +# Create runs directory if it doesn't exist +RUNS_DIR="${SLURM_SUBMIT_DIR}/runs" +mkdir -p "$RUNS_DIR" +echo "Runs directory: $RUNS_DIR" + +# Create job-specific working directory for logs and INSTANCE_URI +JOB_DIR="${RUNS_DIR}/slurm_job_${SLURM_JOB_ID}" +mkdir -p "$JOB_DIR" +cd "$JOB_DIR" +echo "Working directory: $JOB_DIR" +echo "" + +# Parse node list +NODE_ARRAY=($(scontrol show hostname $SLURM_JOB_NODELIST)) + +# Port stride matches receive thread count: each receiver binds RECV_THREADS consecutive ports +PORT_STRIDE=$RECV_THREADS + +echo "Configuration:" +echo " Receivers: $NUM_RECEIVERS ($RECEIVERS_PER_NODE per node)" +echo " Senders: $NUM_SENDERS ($SENDERS_PER_NODE per node)" +echo " Base port: $BASE_PORT (stride: $PORT_STRIDE)" +echo "" + +# Calculate required nodes: senders and receivers share the same node pool +NUM_RECEIVER_NODES=$(( (NUM_RECEIVERS + RECEIVERS_PER_NODE - 1) / RECEIVERS_PER_NODE )) +NUM_SENDER_NODES=$(( (NUM_SENDERS + SENDERS_PER_NODE - 1) / SENDERS_PER_NODE )) +REQUIRED_NODES=$(( NUM_RECEIVER_NODES > NUM_SENDER_NODES ? 
NUM_RECEIVER_NODES : NUM_SENDER_NODES )) + +echo "Node allocation:" +echo " Receiver nodes needed: $NUM_RECEIVER_NODES" +echo " Sender nodes needed: $NUM_SENDER_NODES" +echo " Total nodes required: $REQUIRED_NODES" +echo " Nodes allocated: ${#NODE_ARRAY[@]}" +echo "" + +# Validate node count +if [[ ${#NODE_ARRAY[@]} -lt $REQUIRED_NODES ]]; then + echo "ERROR: Insufficient nodes allocated" + echo " Need $REQUIRED_NODES nodes (max of $NUM_RECEIVER_NODES receiver nodes, $NUM_SENDER_NODES sender nodes)" + echo " Got ${#NODE_ARRAY[@]} nodes" + echo " Use: sbatch -N $REQUIRED_NODES -A $0 --receivers $NUM_RECEIVERS --receivers-per-node $RECEIVERS_PER_NODE --senders $NUM_SENDERS --senders-per-node $SENDERS_PER_NODE" + exit 1 +fi + +# Display node assignments (combined per-node view) +echo "Node assignments:" +for ((node_idx=0; node_idx}" +echo "Receiver arguments (per instance): ${RECEIVER_ARGS[*]:-}" +echo "" + +#============================================================================= +# Install cleanup trap to free reservation on early exit or job cancellation +#============================================================================= + +CLEANUP_DONE=false + +cleanup() { + # Only run cleanup once, and only if INSTANCE_URI exists + if [[ "$CLEANUP_DONE" == "false" && -f "$JOB_DIR/INSTANCE_URI" ]]; then + CLEANUP_DONE=true + echo "" + echo "=========================================" + echo "Cleanup: Freeing Load Balancer" + echo "=========================================" + cd "$JOB_DIR" || return + "$SCRIPT_DIR/minimal_free.sh" ${V_FLAG:+"$V_FLAG"} 2>/dev/null || echo "WARNING: Failed to free load balancer reservation" + fi +} + +trap cleanup EXIT + +#============================================================================= +# Phase 1: Reserve Load Balancer (fresh reservation per job) +#============================================================================= + +echo "=========================================" +echo "Phase 1: Reserve Load Balancer" +echo 
"=========================================" + +export EJFAT_URI + +# Always create a fresh reservation in the job directory so each job +# has its own isolated INSTANCE_URI and LB reservation. +echo "Creating new LB reservation for job $SLURM_JOB_ID..." +if ! "$SCRIPT_DIR/minimal_reserve.sh"; then + echo "ERROR: Failed to reserve load balancer" + exit 1 +fi + +# Verify INSTANCE_URI file exists +if [[ ! -f "INSTANCE_URI" ]]; then + echo "ERROR: INSTANCE_URI file not found after reservation" + exit 1 +fi + +echo "" +echo "Reservation ready" +echo "" + +#============================================================================= +# Phase 2: Start All Receivers (parallel, background processes) +#============================================================================= + +echo "=========================================" +echo "Phase 2: Start Receivers" +echo "=========================================" + +# Arrays to track receiver state +declare -a RECEIVER_PIDS=() +declare -a RECEIVER_NODES=() +declare -a RECEIVER_PORTS=() +declare -a RECEIVER_EXIT_CODES=() + +# Start all receivers in parallel +RECV_INDEX=0 +for ((node_idx=0; node_idx "$RECV_DIR/receiver_srun.log" 2>&1 & + + RECEIVER_PIDS+=($!) + RECEIVER_NODES+=("$RECV_NODE") + RECEIVER_PORTS+=($RECV_PORT) + + RECV_INDEX=$((RECV_INDEX + 1)) + done +done + +echo "" +echo "Started $NUM_RECEIVERS receivers" +for ((i=0; i "$SEND_DIR/sender_srun.log" 2>&1 & + + SENDER_PIDS+=($!) + SENDER_NODES+=("$SEND_NODE") +done + +echo "" +echo "Started $NUM_SENDERS senders" +for ((i=0; i/dev/null; then + kill -TERM ${RECEIVER_PIDS[$i]} 2>/dev/null || true + fi +done + +# Grace period for graceful shutdown +echo "Waiting 5 seconds for graceful shutdown..." +sleep 5 + +# Force kill any remaining receivers with SIGKILL +echo "Sending SIGKILL to any remaining receivers..." 
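The node-count and port-stride arithmetic used earlier in this script can be worked through with concrete (invented) counts; the per-node slot-to-port mapping shown is illustrative of what a non-overlapping layout looks like:

```shell
#!/usr/bin/env bash
set -euo pipefail

NUM_RECEIVERS=4; RECEIVERS_PER_NODE=3
NUM_SENDERS=4;   SENDERS_PER_NODE=2
BASE_PORT=10000
RECV_THREADS=16
PORT_STRIDE=$RECV_THREADS   # each receiver binds RECV_THREADS consecutive ports

# Ceiling division via (a + b - 1) / b, then max of the two node counts
NUM_RECEIVER_NODES=$(( (NUM_RECEIVERS + RECEIVERS_PER_NODE - 1) / RECEIVERS_PER_NODE ))
NUM_SENDER_NODES=$(( (NUM_SENDERS + SENDERS_PER_NODE - 1) / SENDERS_PER_NODE ))
REQUIRED_NODES=$(( NUM_RECEIVER_NODES > NUM_SENDER_NODES ? NUM_RECEIVER_NODES : NUM_SENDER_NODES ))

# Illustrative per-node layout: receiver slot k starts at
# BASE_PORT + k * PORT_STRIDE, so port ranges never overlap
for slot in 0 1; do
    first=$(( BASE_PORT + slot * PORT_STRIDE ))
    last=$(( first + RECV_THREADS - 1 ))
    echo "receiver slot $slot: ports $first-$last"
done

echo "nodes: $NUM_RECEIVER_NODES recv, $NUM_SENDER_NODES send, $REQUIRED_NODES required"
```

With these counts, ceil(4/3)=2 receiver nodes and ceil(4/2)=2 sender nodes, so 2 shared nodes suffice; slots 0 and 1 claim ports 10000-10015 and 10016-10031.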
+for ((i=0; i/dev/null; then + kill -9 ${RECEIVER_PIDS[$i]} 2>/dev/null || true + fi +done + +# Collect receiver exit codes +set +e +for ((i=0; i/dev/null || true + RECEIVER_EXIT_CODES[$i]=$? +done +set -e + +echo "All receivers terminated" +echo "" + +#============================================================================= +# Phase 6: Free Load Balancer +#============================================================================= + +echo "=========================================" +echo "Phase 6: Free Load Balancer" +echo "=========================================" + +# Mark cleanup as done and run it explicitly +CLEANUP_DONE=true +if ! "$SCRIPT_DIR/minimal_free.sh" ${V_FLAG:+"$V_FLAG"}; then + echo "WARNING: Failed to free load balancer reservation" +fi + +echo "" + +#============================================================================= +# Summary Report +#============================================================================= + +echo "=========================================" +echo "Multi-Instance Test Summary" +echo "=========================================" +echo "Job ID: $SLURM_JOB_ID" +echo "Job directory: $JOB_DIR" +echo "Configuration: $NUM_RECEIVERS receivers ($RECEIVERS_PER_NODE per node), $NUM_SENDERS senders ($SENDERS_PER_NODE per node)" +echo "Nodes used: $REQUIRED_NODES" +echo "" + +echo "Sender Results:" +ALL_SENDERS_SUCCESS=true +for ((i=0; i Project allocation +# +# Test Options (passed to minimal_sender.sh/minimal_receiver.sh): +# --rate RATE Sending rate in Gbps (default: 1) +# --length LENGTH Event buffer length in bytes (default: 1048576) +# --num COUNT Number of events to send (default: 100) +# --mtu MTU MTU size in bytes (default: 9000) +# --port PORT Receiver data port (default: 10000) +# --image IMAGE Container image (default: ibaldin/e2sar:0.3.1a3) +# --ipv6 Use IPv6 (default: false) +# -v Skip SSL certificate validation (default: disabled) +# +# Note: Receivers always run with --duration 0 (indefinite) and are 
terminated +# with SIGKILL after the sender completes. +# +# Environment Variables: +# EJFAT_URI Required: EJFAT load balancer URI +# +# Example: +# EJFAT_URI="ejfat://..." sbatch -A perlmutter_slurm.sh --rate 2 --num 1000 +# +# Note: A fresh LB reservation is created for each job and freed on completion. +# EJFAT_URI must be the admin URI (not an INSTANCE_URI). + +#SBATCH -N 2 +#SBATCH -C cpu +#SBATCH -q debug +#SBATCH -t 00:30:00 +#SBATCH -J ejfat_minimal +#SBATCH -o ./slurm-%j.out +#SBATCH -e ./slurm-%j.err +#SBATCH --mail-type=BEGIN,END,FAIL +#SBATCH --mail-user=$USER@nersc.gov + +set -euo pipefail + +#============================================================================= +# Parse command-line arguments +#============================================================================= + +RATE="" +LENGTH="" +NUM="" +MTU="" +PORT="" +IMAGE="" +IPV6_FLAG="" +V_FLAG="" + +while [[ $# -gt 0 ]]; do + case $1 in + --rate) + RATE="$2" + shift 2 + ;; + --length) + LENGTH="$2" + shift 2 + ;; + --num) + NUM="$2" + shift 2 + ;; + --mtu) + MTU="$2" + shift 2 + ;; + --port) + PORT="$2" + shift 2 + ;; + --image) + IMAGE="$2" + shift 2 + ;; + --ipv6) + IPV6_FLAG="--ipv6" + shift + ;; + -v) + V_FLAG="-v" + shift + ;; + --help) + sed -n '2,/^$/p' "$0" | sed 's/^# \?//' + exit 0 + ;; + *) + echo "Unknown option: $1" + echo "Use --help for usage information" + exit 1 + ;; + esac +done + +#============================================================================= +# Environment setup +#============================================================================= + +echo "=========================================" +echo "EJFAT Minimal Test - SLURM Job $SLURM_JOB_ID" +echo "=========================================" +echo "Start time: $(date -u '+%Y-%m-%d %H:%M:%S UTC')" +echo "" + +# Validate EJFAT_URI +if [[ -z "${EJFAT_URI:-}" ]]; then + echo "ERROR: EJFAT_URI is required" + echo "Set via: EJFAT_URI='ejfat://...' 
sbatch $0" + exit 1 +fi + +EJFAT_URI_REDACTED=$(echo "$EJFAT_URI" | sed -E 's|(://)(.{4})[^@]*(.{4})@|\1\2---\3@|') +echo "EJFAT_URI: $EJFAT_URI_REDACTED" +echo "Job nodes: $SLURM_JOB_NODELIST" +echo "Job ID: $SLURM_JOB_ID" +echo "" + +# Get script directory (where minimal_*.sh scripts are located) +# Require E2SAR_SCRIPTS_DIR to be set to avoid hardcoded paths +if [[ -z "${E2SAR_SCRIPTS_DIR:-}" ]]; then + echo "ERROR: E2SAR_SCRIPTS_DIR must be set to the zero_to_hero directory" + echo " export E2SAR_SCRIPTS_DIR=/path/to/E2SAR/scripts/zero_to_hero" + exit 1 +fi +SCRIPT_DIR="$E2SAR_SCRIPTS_DIR" +echo "Script directory: $SCRIPT_DIR" + +# Create runs directory if it doesn't exist +RUNS_DIR="${SLURM_SUBMIT_DIR}/runs" +mkdir -p "$RUNS_DIR" +echo "Runs directory: $RUNS_DIR" + +# Create job-specific working directory for logs and INSTANCE_URI +JOB_DIR="${RUNS_DIR}/slurm_job_${SLURM_JOB_ID}" +mkdir -p "$JOB_DIR" +cd "$JOB_DIR" +echo "Working directory: $JOB_DIR" +echo "" + +# Parse node list - get first two nodes +# SLURM_JOB_NODELIST format examples: "nid[001-002]", "nid001,nid002" +NODE_ARRAY=($(scontrol show hostname $SLURM_JOB_NODELIST)) + +if [[ ${#NODE_ARRAY[@]} -lt 2 ]]; then + echo "ERROR: Need at least 2 nodes, got ${#NODE_ARRAY[@]}" + exit 1 +fi + +NODE0="${NODE_ARRAY[0]}" +NODE1="${NODE_ARRAY[1]}" + +echo "Receiver node (Node 0): $NODE0" +echo "Sender node (Node 1): $NODE1" +echo "" + +#============================================================================= +# Build command arguments for sender and receiver +#============================================================================= + +SENDER_ARGS=() +RECEIVER_ARGS=() + +# Force receiver to run indefinitely (duration 0) - will be terminated after sender completes +RECEIVER_ARGS+=(--duration 0) + +[[ -n "$IMAGE" ]] && SENDER_ARGS+=(--image "$IMAGE") && RECEIVER_ARGS+=(--image "$IMAGE") +[[ -n "$RATE" ]] && SENDER_ARGS+=(--rate "$RATE") +[[ -n "$LENGTH" ]] && SENDER_ARGS+=(--length "$LENGTH") +[[ -n 
"$NUM" ]] && SENDER_ARGS+=(--num "$NUM") +[[ -n "$MTU" ]] && SENDER_ARGS+=(--mtu "$MTU") +[[ -n "$PORT" ]] && RECEIVER_ARGS+=(--port "$PORT") +[[ -n "$IPV6_FLAG" ]] && SENDER_ARGS+=("$IPV6_FLAG") && RECEIVER_ARGS+=("$IPV6_FLAG") +[[ -n "$V_FLAG" ]] && SENDER_ARGS+=("$V_FLAG") && RECEIVER_ARGS+=("$V_FLAG") + +echo "Sender arguments: ${SENDER_ARGS[*]:-}" +echo "Receiver arguments: ${RECEIVER_ARGS[*]:-}" +echo "" + +#============================================================================= +# Install cleanup trap to free reservation on early exit or job cancellation +#============================================================================= + +CLEANUP_DONE=false + +cleanup() { + # Only run cleanup once, and only if INSTANCE_URI exists + if [[ "$CLEANUP_DONE" == "false" && -f "$JOB_DIR/INSTANCE_URI" ]]; then + CLEANUP_DONE=true + echo "" + echo "=========================================" + echo "Cleanup: Freeing Load Balancer" + echo "=========================================" + cd "$JOB_DIR" || return + "$SCRIPT_DIR/minimal_free.sh" ${V_FLAG:+"$V_FLAG"} 2>/dev/null || echo "WARNING: Failed to free load balancer reservation" + fi +} + +trap cleanup EXIT + +#============================================================================= +# Phase 1: Reserve Load Balancer (fresh reservation per job) +#============================================================================= + +echo "=========================================" +echo "Phase 1: Reserve Load Balancer" +echo "=========================================" + +export EJFAT_URI + +# Create a fresh reservation for this job +echo "Creating new LB reservation for job $SLURM_JOB_ID..." +if ! "$SCRIPT_DIR/minimal_reserve.sh"; then + echo "ERROR: Failed to reserve load balancer" + exit 1 +fi + +# Verify INSTANCE_URI file exists +if [[ ! 
-f "INSTANCE_URI" ]]; then + echo "ERROR: INSTANCE_URI file not found after reservation" + exit 1 +fi + +echo "" +echo "Reservation ready" +echo "" + +#============================================================================= +# Phase 2: Start Receiver (background on Node 0) +#============================================================================= + +echo "=========================================" +echo "Phase 2: Start Receiver on $NODE0" +echo "=========================================" + +# Start receiver in background +srun --nodes=1 --ntasks=1 --nodelist="$NODE0" \ + bash -c "cd '$JOB_DIR' && '$SCRIPT_DIR/minimal_receiver.sh' ${RECEIVER_ARGS[*]:-}" \ + > receiver_srun.log 2>&1 & + +RECEIVER_PID=$! +echo "Receiver started (PID: $RECEIVER_PID)" +echo "" + +# Brief delay to allow receiver to register with load balancer +echo "Waiting 10 seconds for receiver to register..." +sleep 10 +echo "" + +#============================================================================= +# Phase 3: Start Sender (foreground on Node 1) +#============================================================================= + +echo "=========================================" +echo "Phase 3: Start Sender on $NODE1" +echo "=========================================" + +# Start sender in foreground (wait for completion) +set +e # Don't exit on error, we want to capture exit code +srun --nodes=1 --ntasks=1 --nodelist="$NODE1" \ + bash -c "cd '$JOB_DIR' && '$SCRIPT_DIR/minimal_sender.sh' ${SENDER_ARGS[*]:-}" \ + > sender_srun.log 2>&1 + +SENDER_EXIT_CODE=$? 
+set -e + +echo "" +echo "Sender completed (exit code: $SENDER_EXIT_CODE)" +echo "" + +#============================================================================= +# Phase 4: Shutdown Receiver +#============================================================================= + +echo "=========================================" +echo "Phase 4: Shutdown Receiver" +echo "=========================================" + +# Receiver is running with duration=0 (indefinite), terminate it with SIGKILL +if kill -0 $RECEIVER_PID 2>/dev/null; then + echo "Sending SIGKILL to receiver (PID: $RECEIVER_PID)..." + kill -9 $RECEIVER_PID 2>/dev/null || true + sleep 1 + + # Wait for receiver to finish + RECEIVER_EXIT_CODE=0 + if wait $RECEIVER_PID 2>/dev/null; then + echo "Receiver terminated successfully" + else + RECEIVER_EXIT_CODE=$? + echo "Receiver terminated (exit code: $RECEIVER_EXIT_CODE)" + fi +else + echo "Receiver already stopped" + RECEIVER_EXIT_CODE=0 +fi + +echo "" + +#============================================================================= +# Phase 5: Free Load Balancer +#============================================================================= + +echo "=========================================" +echo "Phase 5: Free Load Balancer" +echo "=========================================" + +# Mark cleanup as done and run it explicitly +CLEANUP_DONE=true +if ! 
"$SCRIPT_DIR/minimal_free.sh" ${V_FLAG:+"$V_FLAG"}; then + echo "WARNING: Failed to free load balancer reservation" +fi + +echo "" + +#============================================================================= +# Summary and Log Collection +#============================================================================= + +echo "=========================================" +echo "Test Summary" +echo "=========================================" +echo "Job ID: $SLURM_JOB_ID" +echo "Job directory: $JOB_DIR" +echo "Sender exit code: $SENDER_EXIT_CODE" +echo "Receiver exit code: $RECEIVER_EXIT_CODE" +echo "" + +echo "Logs available at:" +echo " - Sender log: $JOB_DIR/minimal_sender.log" +echo " - Sender memory log: $JOB_DIR/minimal_sender_memory.log" +echo " - Receiver log: $JOB_DIR/minimal_receiver.log" +echo " - Sender srun log: $JOB_DIR/sender_srun.log" +echo " - Receiver srun log: $JOB_DIR/receiver_srun.log" +echo "" + +# Display log excerpts if they exist +if [[ -f minimal_sender.log ]]; then + echo "--- Sender Log (last 20 lines) ---" + tail -n 20 minimal_sender.log + echo "" +fi + +if [[ -f minimal_sender_memory.log ]]; then + echo "--- Memory Summary ---" + grep "^# Memory Summary:" -A 3 minimal_sender_memory.log || echo "Memory monitoring data available in minimal_sender_memory.log" + echo "" +fi + +if [[ -f minimal_receiver.log ]]; then + echo "--- Receiver Log (last 20 lines) ---" + tail -n 20 minimal_receiver.log + echo "" +fi + +echo "=========================================" +echo "End time: $(date -u '+%Y-%m-%d %H:%M:%S UTC')" +echo "=========================================" + +# Exit with sender's exit code (most important for test success) +exit $SENDER_EXIT_CODE diff --git a/scripts/zero_to_hero/setup_env.sh b/scripts/zero_to_hero/setup_env.sh new file mode 100755 index 0000000..1f5ee22 --- /dev/null +++ b/scripts/zero_to_hero/setup_env.sh @@ -0,0 +1,31 @@ +#!/bin/bash +# E2SAR Zero to Hero Environment Setup +# +# Usage: +# source 
/path/to/zero_to_hero/setup_env.sh +# +# Or add to ~/.bashrc or ~/.zshrc: +# source /path/to/zero_to_hero/setup_env.sh +# + +# Determine the directory containing this script +if [[ -n "${BASH_SOURCE[0]}" ]]; then + E2SAR_SCRIPTS_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" +elif [[ -n "${(%):-%x}" ]]; then + # zsh compatibility + E2SAR_SCRIPTS_DIR="$(cd "$(dirname "${(%):-%x}")" && pwd)" +else + echo "Warning: Could not determine E2SAR scripts directory" + return 1 2>/dev/null || exit 1 +fi + +# Add scripts directory to PATH if not already present +if [[ ":$PATH:" != *":$E2SAR_SCRIPTS_DIR:"* ]]; then + export PATH="$E2SAR_SCRIPTS_DIR:$PATH" +fi + +# Export for reference +export E2SAR_SCRIPTS_DIR + +echo "E2SAR Zero to Hero scripts available at: $E2SAR_SCRIPTS_DIR" +echo "Available commands: minimal_reserve.sh, minimal_sender.sh, minimal_receiver.sh, minimal_free.sh, perlmutter_slurm.sh, perlmutter_multi_slurm.sh, monitor_memory.sh" diff --git a/src/e2sarDPReassembler.cpp b/src/e2sarDPReassembler.cpp index 81d203d..07f0997 100644 --- a/src/e2sarDPReassembler.cpp +++ b/src/e2sarDPReassembler.cpp @@ -277,6 +277,17 @@ namespace e2sar auto until = nowT + boost::chrono::milliseconds(eventTimeout_ms); boost::this_thread::sleep_until(until); } + // drain in progress queues in threads + for(auto i = reas.recvThreadState.begin(); i != reas.recvThreadState.end(); ++i) + { + for (auto it = i->eventsInProgress.begin(); it != i->eventsInProgress.end(); ) { + if (it->second->event != nullptr) { + delete[] it->second->event; + it->second.reset(); + } + it = i->eventsInProgress.erase(it); + } + } } void Reassembler::RecvThreadState::_threadBody() diff --git a/src/e2sarDPSegmenter.cpp b/src/e2sarDPSegmenter.cpp index bbd5941..3ddd328 100644 --- a/src/e2sarDPSegmenter.cpp +++ b/src/e2sarDPSegmenter.cpp @@ -11,6 +11,7 @@ #include "e2sarDPSegmenter.hpp" #include "e2sarUtil.hpp" +#include "e2sarNetUtil.hpp" #include "e2sarAffinity.hpp" @@ -450,6 +451,10 @@ namespace e2sar } } // 
not doing .join or stop for threadpool as it's d-tor will do it + + // drain all pending lambdas before the destructor calls stop() which + // would abandon queued-but-not-started handlers and leak their items + threadPool.join(); #ifdef LIBURING_AVAILABLE // reap the remaining CQEs while(seg.outstandingSends > 0) @@ -869,14 +874,12 @@ namespace e2sar // otherwise just close result Segmenter::SendThreadState::_waitAndCloseFd(int fd) { - int outq = 0; bool stop{false}; // busy wait while the socket has outstanding data while(!stop) { - if (ioctl(fd, TIOCOUTQ, &outq) == 0) - stop = (outq == 0); - else + auto res = NetUtil::getSocketOutstandingBytes(fd); + if (res.has_error() || res.value() <= 0) // error, drained (0), or unsupported (-1) stop = true; } close(fd); @@ -938,8 +941,10 @@ namespace e2sar //sendThreadCond.notify_one(); if (res) return 0; - else + else { + delete item; return E2SARErrorInfo{E2SARErrorc::MemoryError, "Send queue is temporarily full, try again later"}; + } } result Segmenter::SegmenterFlags::getFromINI(const std::string &iniFile) noexcept diff --git a/src/e2sarNetUtil.cpp b/src/e2sarNetUtil.cpp index 76c390c..067ee72 100644 --- a/src/e2sarNetUtil.cpp +++ b/src/e2sarNetUtil.cpp @@ -154,4 +154,22 @@ namespace e2sar return E2SARErrorInfo{E2SARErrorc::SocketError, "Unrecoverable NETLINK error"}; } #endif + result<int> NetUtil::getSocketOutstandingBytes(int sockfd) noexcept { + int outstanding = 0; + int res = 0; + #if defined(SIOCOUTQ_AVAILABLE) + res = ioctl(sockfd, TIOCOUTQ, &outstanding); + if (res < 0) + return E2SARErrorInfo{E2SARErrorc::SocketError, strerror(errno)}; + #elif defined(SO_NWRITE_AVAILABLE) + socklen_t len = sizeof(outstanding); + res = getsockopt(sockfd, SOL_SOCKET, SO_NWRITE, &outstanding, &len); + if (res < 0) + return E2SARErrorInfo{E2SARErrorc::SocketError, strerror(errno)}; + #else + #warning "No send buffer query support on this platform" + outstanding = -1; // unsupported + #endif + return outstanding; + } } diff --git a/test/meson.build 
b/test/meson.build index ed70eba..dadc23a 100644 --- a/test/meson.build +++ b/test/meson.build @@ -74,7 +74,7 @@ e2sar_opt_test = executable('e2sar_opt_test', 'e2sar_opt_test.cpp', include_directories: inc, link_with: libe2sar, link_args: linker_flags, - dependencies: [boost_dep]) + dependencies: [boost_dep, thread_dep, grpc_dep, protobuf_dep]) # these tests have conditional compilation and may be NOOPs on non-linux platforms e2sar_netutil_test = executable('e2sar_netutil_test', 'e2sar_netutil_test.cpp', diff --git a/test/py_test/test_CPToken.py b/test/py_test/test_CPToken.py new file mode 100644 index 0000000..ae80a9d --- /dev/null +++ b/test/py_test/test_CPToken.py @@ -0,0 +1,320 @@ +""" +Pytest suite for the C++ class e2sar::LBManager token management methods. + +Tests the newly added token management functionality including: +- createToken() +- listTokenPermissions() +- listChildTokens() +- revokeToken() + +To make the compiled module importable, append the directory containing "e2sar_py.*.so" to sys.path, e.g., +# import sys +# sys.path.append( +# '/build/src/pybind') + +Or add this path to PYTHONPATH, e.g., +# export PYTHONPATH=/build/src/pybind + +Run with: pytest -m cp -v test_CPToken.py +""" + +import pytest +import os + +# Make sure the compiled module is added to your path +import e2sar_py + +lbm = e2sar_py.ControlPlane.LBManager +ej_uri = e2sar_py.EjfatURI +TokenPermission = e2sar_py.ControlPlane.TokenPermission +TokenDetails = e2sar_py.ControlPlane.TokenDetails + +# URI string - should have admin token in EJFAT_URI environment variable +# or use a test URI string here +URI_STR = os.getenv( + "EJFAT_URI", + "ejfats://udplbd@192.168.0.3:18347/lb/1?data=127.0.0.1&sync=192.168.88.199:1234" +) + + +@pytest.mark.cp +def test_create_and_revoke_token_default_permissions(): + """ + Test case (a): Create a new instance token with default permissions + and immediately revoke it, verifying success of both operations. 
+ """ + # Create URI with admin token + uri = ej_uri(URI_STR, ej_uri.TokenType.admin) + assert isinstance(uri, ej_uri), "EjfatURI creation failed!" + + # Create LBManager instance + lb_manager = lbm(uri, False) # False = skip server certificate validation + assert isinstance(lb_manager, lbm), "LBManager creation failed!" + + # Create a new token with default permissions (read-only on all resources) + token_name = "test_token_default_perms" + default_perms = [ + TokenPermission( + ej_uri.TokenType.admin, + "", # empty resource ID = wildcard + ej_uri.TokenPermission._read_only_ + ) + ] + + result_create = lb_manager.create_token(token_name, default_perms) + assert not result_create.has_error(), f"Token creation failed: {result_create.error().message() if result_create.has_error() else ''}" + assert result_create.has_value(), "Token creation returned no value!" + + created_token = result_create.value() + assert isinstance(created_token, str), "Created token is not a string!" + assert len(created_token) > 0, "Created token is empty!" + print(f"Created token: {created_token[:20]}... (length: {len(created_token)})") + + # Revoke the token using the token string + result_revoke = lb_manager.revoke_token_by_string(created_token) + assert not result_revoke.has_error(), f"Token revocation failed: {result_revoke.error().message() if result_revoke.has_error() else ''}" + assert result_revoke.has_value(), "Token revocation returned no value!" + assert result_revoke.value() == 0, f"Token revocation returned non-zero: {result_revoke.value()}" + + print(f"Successfully created and revoked token: {token_name}") + + +@pytest.mark.cp +def test_create_list_validate_revoke_token_with_permissions(): + """ + Test case (b): Create a new instance token with reserve and update permissions, + call listTokenPermissions() and validate the created token and its permissions, + delete the token, verifying success of each operation. 
+ """ + # Create URI with admin token + uri = ej_uri(URI_STR, ej_uri.TokenType.admin) + assert isinstance(uri, ej_uri), "EjfatURI creation failed!" + + # Create LBManager instance + lb_manager = lbm(uri, False) + assert isinstance(lb_manager, lbm), "LBManager creation failed!" + + # Create a new token with reserve and update permissions + token_name = "test_token_reserve_update" + permissions = [ + TokenPermission( + ej_uri.TokenType.instance, + "", # empty resource ID = wildcard for all instances + ej_uri.TokenPermission._reserve_ + ), + TokenPermission( + ej_uri.TokenType.instance, + "", + ej_uri.TokenPermission._update_ + ) + ] + + # Create the token + result_create = lb_manager.create_token(token_name, permissions) + assert not result_create.has_error(), f"Token creation failed: {result_create.error().message() if result_create.has_error() else ''}" + assert result_create.has_value(), "Token creation returned no value!" + + created_token = result_create.value() + assert isinstance(created_token, str), "Created token is not a string!" + assert len(created_token) > 0, "Created token is empty!" + print(f"Created token with reserve/update perms: {created_token[:20]}...") + + # List token permissions using the token string + result_list = lb_manager.list_token_permissions_by_string(created_token) + assert not result_list.has_error(), f"List permissions failed: {result_list.error().message() if result_list.has_error() else ''}" + assert result_list.has_value(), "List permissions returned no value!" + + token_details = result_list.value() + assert isinstance(token_details, TokenDetails), "Token details is not TokenDetails type!" 
+ + # Validate the token details + assert token_details.name == token_name, f"Token name mismatch: expected '{token_name}', got '{token_details.name}'" + assert len(token_details.permissions) == 2, f"Expected 2 permissions, got {len(token_details.permissions)}" + assert token_details.id > 0, f"Token ID should be positive, got {token_details.id}" + assert len(token_details.created_at) > 0, "Token created_at is empty!" + + print(f"Token details: name={token_details.name}, id={token_details.id}, created_at={token_details.created_at}") + print(f"Token has {len(token_details.permissions)} permissions:") + + # Validate permissions + perm_types = [] + for perm in token_details.permissions: + assert isinstance(perm, TokenPermission), "Permission is not TokenPermission type!" + perm_types.append(perm.permission) + print(f" - Resource Type: {perm.resourceType}, Resource ID: '{perm.resourceId}', Permission: {perm.permission}") + + # Check that we have reserve and update permissions + assert ej_uri.TokenPermission._reserve_ in perm_types, "Reserve permission not found in token!" + assert ej_uri.TokenPermission._update_ in perm_types, "Update permission not found in token!" + + # Revoke the token + result_revoke = lb_manager.revoke_token_by_string(created_token) + assert not result_revoke.has_error(), f"Token revocation failed: {result_revoke.error().message() if result_revoke.has_error() else ''}" + assert result_revoke.has_value(), "Token revocation returned no value!" + assert result_revoke.value() == 0, f"Token revocation returned non-zero: {result_revoke.value()}" + + print(f"Successfully created, validated, and revoked token: {token_name}") + + +@pytest.mark.cp +def test_create_list_child_tokens_and_revoke(): + """ + Test case (c): Using admin token create two instance tokens, + call listChildTokens() using admin token and verify the two new tokens + are its children, revoke both new tokens, verifying success of each operation. 
+ """ + # Create URI with admin token + uri = ej_uri(URI_STR, ej_uri.TokenType.admin) + assert isinstance(uri, ej_uri), "EjfatURI creation failed!" + + # Create LBManager instance + lb_manager = lbm(uri, False) + assert isinstance(lb_manager, lbm), "LBManager creation failed!" + + # Get the admin token from the URI + admin_token_result = uri.get_admin_token() + assert not admin_token_result.has_error(), "Failed to get admin token from URI!" + admin_token = admin_token_result.value() + assert len(admin_token) > 0, "Admin token is empty!" + + # Get initial count of child tokens for baseline + result_initial_children = lb_manager.list_child_tokens_by_string(admin_token) + initial_child_count = 0 + if not result_initial_children.has_error(): + initial_child_count = len(result_initial_children.value()) + print(f"Initial child token count: {initial_child_count}") + + # Create first token + token_name_1 = "test_child_token_1" + permissions_1 = [ + TokenPermission( + ej_uri.TokenType.instance, + "", + ej_uri.TokenPermission._register_ + ) + ] + + result_create_1 = lb_manager.create_token(token_name_1, permissions_1) + assert not result_create_1.has_error(), f"First token creation failed: {result_create_1.error().message() if result_create_1.has_error() else ''}" + created_token_1 = result_create_1.value() + assert len(created_token_1) > 0, "First created token is empty!" + print(f"Created first child token: {created_token_1[:20]}...") + + # Create second token + token_name_2 = "test_child_token_2" + permissions_2 = [ + TokenPermission( + ej_uri.TokenType.session, + "", + ej_uri.TokenPermission._read_only_ + ) + ] + + result_create_2 = lb_manager.create_token(token_name_2, permissions_2) + assert not result_create_2.has_error(), f"Second token creation failed: {result_create_2.error().message() if result_create_2.has_error() else ''}" + created_token_2 = result_create_2.value() + assert len(created_token_2) > 0, "Second created token is empty!" 
+ print(f"Created second child token: {created_token_2[:20]}...") + + # List child tokens using admin token + result_list_children = lb_manager.list_child_tokens_by_string(admin_token) + assert not result_list_children.has_error(), f"List child tokens failed: {result_list_children.error().message() if result_list_children.has_error() else ''}" + assert result_list_children.has_value(), "List child tokens returned no value!" + + child_tokens = result_list_children.value() + assert isinstance(child_tokens, list), "Child tokens is not a list!" + assert len(child_tokens) >= initial_child_count + 2, f"Expected at least {initial_child_count + 2} child tokens, got {len(child_tokens)}" + + print(f"Total child tokens: {len(child_tokens)}") + + # Verify both new tokens are in the children list + child_token_names = [child.name for child in child_tokens] + assert token_name_1 in child_token_names, f"First token '{token_name_1}' not found in child tokens!" + assert token_name_2 in child_token_names, f"Second token '{token_name_2}' not found in child tokens!" 
+ + print(f"Verified both tokens are children of admin token:") + for child in child_tokens: + if child.name in [token_name_1, token_name_2]: + print(f" - {child.name} (id={child.id}, created_at={child.created_at})") + + # Revoke first token + result_revoke_1 = lb_manager.revoke_token_by_string(created_token_1) + assert not result_revoke_1.has_error(), f"First token revocation failed: {result_revoke_1.error().message() if result_revoke_1.has_error() else ''}" + assert result_revoke_1.value() == 0, f"First token revocation returned non-zero: {result_revoke_1.value()}" + print(f"Successfully revoked first token: {token_name_1}") + + # Revoke second token + result_revoke_2 = lb_manager.revoke_token_by_string(created_token_2) + assert not result_revoke_2.has_error(), f"Second token revocation failed: {result_revoke_2.error().message() if result_revoke_2.has_error() else ''}" + assert result_revoke_2.value() == 0, f"Second token revocation returned non-zero: {result_revoke_2.value()}" + print(f"Successfully revoked second token: {token_name_2}") + + # Verify child tokens are removed + result_final_children = lb_manager.list_child_tokens_by_string(admin_token) + if not result_final_children.has_error(): + final_child_tokens = result_final_children.value() + final_child_names = [child.name for child in final_child_tokens] + assert token_name_1 not in final_child_names, f"First token '{token_name_1}' still in child list after revocation!" + assert token_name_2 not in final_child_names, f"Second token '{token_name_2}' still in child list after revocation!" + print(f"Verified both tokens removed from child list. Final count: {len(final_child_tokens)}") + + print("Successfully created, verified, and revoked two child tokens!") + + +@pytest.mark.cp +def test_list_token_permissions_by_id(): + """ + Bonus test: Test listing token permissions using token ID instead of string. 
+ """ + # Create URI with admin token + uri = ej_uri(URI_STR, ej_uri.TokenType.admin) + assert isinstance(uri, ej_uri), "EjfatURI creation failed!" + + # Create LBManager instance + lb_manager = lbm(uri, False) + assert isinstance(lb_manager, lbm), "LBManager creation failed!" + + # Create a test token + token_name = "test_token_by_id" + permissions = [ + TokenPermission( + ej_uri.TokenType.admin, + "", + ej_uri.TokenPermission._read_only_ + ) + ] + + result_create = lb_manager.create_token(token_name, permissions) + assert not result_create.has_error(), "Token creation failed!" + created_token = result_create.value() + + # First, get the token details to extract the ID + result_list_by_string = lb_manager.list_token_permissions_by_string(created_token) + assert not result_list_by_string.has_error(), "List permissions by string failed!" + token_details_from_string = result_list_by_string.value() + token_id = token_details_from_string.id + + print(f"Token ID extracted: {token_id}") + + # Now test listing by ID + result_list_by_id = lb_manager.list_token_permissions_by_id(token_id) + assert not result_list_by_id.has_error(), f"List permissions by ID failed: {result_list_by_id.error().message() if result_list_by_id.has_error() else ''}" + token_details_from_id = result_list_by_id.value() + + # Verify the details match + assert token_details_from_id.id == token_id, "Token ID mismatch!" + assert token_details_from_id.name == token_name, "Token name mismatch!" + assert len(token_details_from_id.permissions) == len(token_details_from_string.permissions), "Permission count mismatch!" + + print(f"Successfully retrieved token details by ID: {token_id}") + + # Clean up - revoke by ID + result_revoke = lb_manager.revoke_token_by_id(token_id) + assert not result_revoke.has_error(), "Token revocation by ID failed!" + assert result_revoke.value() == 0, "Token revocation returned non-zero!" 
+ + print(f"Successfully revoked token by ID: {token_id}") + + +if __name__ == "__main__": + pytest.main(["-v", __file__]) diff --git a/wiki b/wiki index ee4e720..53dd808 160000 --- a/wiki +++ b/wiki @@ -1 +1 @@ -Subproject commit ee4e72018b30b46e15355a14eb1428726c3fa4cd +Subproject commit 53dd808303d485b7f5d83da439372bfe17cbbac4