[Draft] 355_benchmark #570

base: main
```diff
@@ -264,7 +264,9 @@ def build_megatron_helper(primus_path: Path, patch_args: Path, backend_path: str
     emerging_optimizers_path = primus_path / "third_party/Emerging-Optimizers"
     log_info(f"Building Emerging Optimizers in {emerging_optimizers_path}")
-    ret = subprocess.run(["pip", "install", "-e", str(emerging_optimizers_path)], check=True)
+    ret = subprocess.run(
+        ["pip", "install", "--no-build-isolation", "-e", str(emerging_optimizers_path)], check=True
+    )
     if ret.returncode != 0:
         log_error_and_exit("Building Emerging Optimizers failed.")
```
Comment on lines +267 to 271
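A side note on the hunk above: with `check=True`, `subprocess.run` raises `subprocess.CalledProcessError` on a nonzero exit status, so the `if ret.returncode != 0:` branch afterwards can never fire. A minimal sketch of the two styles (standard library only; `false` is just a convenient always-failing command):

```python
import subprocess

# Style 1: check=True raises on failure, so no returncode test is needed.
try:
    subprocess.run(["false"], check=True)
except subprocess.CalledProcessError as exc:
    print(f"command failed with exit code {exc.returncode}")

# Style 2: inspect returncode manually when check=True is NOT passed.
ret = subprocess.run(["false"])
if ret.returncode != 0:
    print("manual check caught the failure")
```

Either style works on its own; mixing them, as the diff does, leaves dead code.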
```diff
@@ -113,7 +113,7 @@ echo "ENV_ARGS: ${ENV_ARGS[*]}"
 HOSTNAME=$(hostname)
 ARGS=("$@")

-VOLUME_ARGS=(-v "$PRIMUS_PATH":"$PRIMUS_PATH" -v "$DATA_PATH":"$DATA_PATH")
+VOLUME_ARGS=(-v "$PRIMUS_PATH":"$PRIMUS_PATH" -v "$DATA_PATH":"$DATA_PATH" -v "/shared_aig/c4:/shared_aig/c4")
```

Suggested change:

```suggestion
VOLUME_ARGS=(-v "$PRIMUS_PATH":"$PRIMUS_PATH" -v "$DATA_PATH":"$DATA_PATH")
# Optional extra volume mounts: set PRIMUS_EXTRA_MOUNTS to a string like:
# '-v /shared_aig/c4:/shared_aig/c4 -v /other/path:/other/path:ro'
if [[ -n "${PRIMUS_EXTRA_MOUNTS:-}" ]]; then
    # Intentional word splitting to allow multiple -v arguments.
    VOLUME_ARGS+=(${PRIMUS_EXTRA_MOUNTS})
elif [[ -d "/shared_aig/c4" ]]; then
    # Backwards-compatible default: only mount /shared_aig/c4 if it exists.
    VOLUME_ARGS+=(-v "/shared_aig/c4:/shared_aig/c4")
fi
```
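The "intentional word splitting" in that suggestion relies on the mounts string containing no spaces inside paths. A quick Python sketch of how such a string decomposes into separate `-v` arguments (the example value is hypothetical):

```python
import shlex

# Hypothetical value of PRIMUS_EXTRA_MOUNTS from the suggestion above.
extra_mounts = "-v /shared_aig/c4:/shared_aig/c4 -v /other/path:/other/path:ro"

# Plain whitespace splitting: what bash's unquoted ${PRIMUS_EXTRA_MOUNTS}
# expansion effectively does.
args = extra_mounts.split()
print(args)

# A shell-aware split additionally honors quotes, so a path containing a
# space could be written as '-v "/mnt/my data:/data"'.
quoted = shlex.split('-v "/mnt/my data:/data"')
print(quoted)
```

If paths with spaces are a concern, an array-valued variable or a delimiter other than whitespace would be more robust than relying on word splitting.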
```diff
@@ -38,10 +38,14 @@ export LOG_DIR=${LOG_DIR:-"./output"}
 LOG_FILE="${LOG_DIR}/log_slurm_pretrain.txt"
 mkdir -p "$LOG_DIR"

+# --nodelist="uswslocpm2m-106-[273,297,310,319,687,732,836,892]" \
 srun -N "${NNODES}" \
     --exclusive \
     --export ALL \
     --ntasks-per-node=1 \
+    --time="${SLURM_TIME:-07:00:00}" \
+    --nodelist="${SLURM_NODELIST:-}" \
+    --partition="${SLURM_PARTITION:-amd-aig}" \
     --cpus-per-task="${CPUS_PER_TASK:-128}" \
     bash -c "
     readarray -t node_array < <(scontrol show hostnames \"\$SLURM_JOB_NODELIST\")
```

Comment on lines +46 to +48
```diff
@@ -0,0 +1,144 @@
+#!/bin/bash
+###############################################################################
+# Prepare C4 English dataset for Megatron training with DeepSeek V3
+#
+# This script:
+#   1. Downloads C4-en data from HuggingFace (configurable amount)
+#      GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/datasets/allenai/c4
+#      cd c4
+#      git lfs pull --include "en/*"
+#   2. Converts to JSONL format
+#   3. Tokenizes into Megatron .bin/.idx format using DeepSeekV3Tokenizer
+#
+# Usage:
+#   bash prepare_c4_data.sh [--num_shards N] [--data_dir /path/to/data]
+#
+# By default downloads 1 shard (~350MB compressed, ~3M documents) for testing.
+# Full C4-en has 1024 shards. Adjust --num_shards for more data.
+###############################################################################
+
+set -e
+
+# ======================== Configuration ========================
+NUM_SHARDS=${NUM_SHARDS:-200}  # Number of C4 shards to download (1-1024)
```

Suggested change:

```suggestion
NUM_SHARDS=${NUM_SHARDS:-1}  # Number of C4 shards to download (1-1024)
```
Copilot AI (Feb 27, 2026):

PRIMUS_PATH defaults to a user-specific absolute path (/shared/john/Primus), which makes this script non-portable and likely to fail for other users/environments. Consider requiring PRIMUS_PATH to be provided (and exit with a clear message if unset) or deriving it relative to the repo root.
Suggested change:

```diff
-NUM_SHARDS=${NUM_SHARDS:-200}  # Number of C4 shards to download (1-1024)
-DATA_DIR=${DATA_DIR:-"/shared/c4"}
-PRIMUS_PATH=${PRIMUS_PATH:-"/shared/john/Primus"}
+SCRIPT_DIR="$(cd -- "$(dirname "${BASH_SOURCE[0]}")" >/dev/null 2>&1 && pwd)"
+NUM_SHARDS=${NUM_SHARDS:-200}  # Number of C4 shards to download (1-1024)
+DATA_DIR=${DATA_DIR:-"/shared/c4"}
+PRIMUS_PATH=${PRIMUS_PATH:-"${SCRIPT_DIR}/../Primus"}
+if [[ ! -d "$PRIMUS_PATH" ]]; then
+    echo "Error: PRIMUS_PATH is not set to a valid directory: '$PRIMUS_PATH'" >&2
+    echo "Please set PRIMUS_PATH explicitly, for example:" >&2
+    echo "  export PRIMUS_PATH=/path/to/Primus" >&2
+    exit 1
+fi
```
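The script-relative default in that suggestion can be mirrored with `pathlib` for readers more familiar with Python. A sketch, assuming a hypothetical layout where Primus sits next to the script's directory (the paths below are illustrative):

```python
from pathlib import Path
from typing import Optional

def resolve_primus_path(script_file: str, env_value: Optional[str]) -> Path:
    # An explicit env var wins; otherwise derive a repo-relative default,
    # mirroring SCRIPT_DIR/../Primus from the shell suggestion.
    if env_value:
        return Path(env_value)
    script_dir = Path(script_file).resolve().parent
    return (script_dir / ".." / "Primus").resolve()

print(resolve_primus_path("/repo/scripts/prepare_c4_data.sh", None))
print(resolve_primus_path("/repo/scripts/prepare_c4_data.sh", "/opt/Primus"))
```

The key property in both versions is the same: no hard-coded user home path, and an explicit override always takes precedence.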
Copilot AI (Feb 27, 2026):

The header comment says the script "downloads" C4 shards and that the default is 1 shard, but the implementation explicitly skips downloading and defaults NUM_SHARDS to 200. Please align the documentation and defaults with the actual behavior (either implement download, or update comments and set the NUM_SHARDS default accordingly).
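The gap between the documented and actual defaults is substantial in byte terms; a quick back-of-the-envelope using the header's ~350 MB-per-shard estimate:

```python
MB_PER_SHARD = 350  # approx compressed size per C4-en shard, per the script header

def approx_gb(num_shards: int) -> float:
    # Rough compressed download size in GB for a given shard count.
    return num_shards * MB_PER_SHARD / 1024

print(f"1 shard (documented default)  ~ {approx_gb(1):.2f} GB")
print(f"200 shards (actual default)   ~ {approx_gb(200):.1f} GB")
print(f"1024 shards (full C4-en)      ~ {approx_gb(1024):.0f} GB")
```

So the code's default pulls roughly 68 GB compressed where the documentation promises a ~0.3 GB test download, which is worth reconciling before merge.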
Copilot AI (Feb 27, 2026):

This merge step appends directly into the final JSONL_FILE with '>>'. If the script is interrupted or a shard is corrupt, you'll end up with a partial JSONL that then causes future runs to skip merging because the file exists. Write to a temporary file and atomically move it into place on success (and/or validate the output) to avoid leaving a bad cached artifact.
Suggested change:

```diff
-for i in $(seq 0 $((NUM_SHARDS - 1))); do
-    SHARD_NAME=$(printf "c4-train.%05d-of-01024.json.gz" "$i")
-    SHARD_PATH="${RAW_DIR}/${SHARD_NAME}"
-    echo "  [${i}/${NUM_SHARDS}] Decompressing ${SHARD_NAME} ..."
-    zcat "${SHARD_PATH}" >> "${JSONL_FILE}"
-done
+# Write to a temporary file first to avoid leaving a corrupted final JSONL
+TMP_JSONL_FILE="$(mktemp "${JSONL_DIR}/c4_en_train.jsonl.tmp.XXXXXX")"
+# Ensure the temporary file is cleaned up on failure or interruption
+cleanup_tmp() {
+    if [ -n "${TMP_JSONL_FILE:-}" ] && [ -f "${TMP_JSONL_FILE}" ]; then
+        rm -f "${TMP_JSONL_FILE}"
+    fi
+}
+trap cleanup_tmp EXIT INT TERM
+for i in $(seq 0 $((NUM_SHARDS - 1))); do
+    SHARD_NAME=$(printf "c4-train.%05d-of-01024.json.gz" "$i")
+    SHARD_PATH="${RAW_DIR}/${SHARD_NAME}"
+    echo "  [${i}/${NUM_SHARDS}] Decompressing ${SHARD_NAME} ..."
+    zcat "${SHARD_PATH}" >> "${TMP_JSONL_FILE}"
+done
+# Basic validation: ensure the merged file is non-empty before finalizing
+if [ ! -s "${TMP_JSONL_FILE}" ]; then
+    echo "ERROR: Merged JSONL is empty; aborting."
+    cleanup_tmp
+    exit 1
+fi
+# Move the completed temp file into place atomically
+mv "${TMP_JSONL_FILE}" "${JSONL_FILE}"
+# Prevent trap from deleting the now-final JSONL file
+TMP_JSONL_FILE=""
+trap - EXIT INT TERM
```
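The same write-to-temp-then-rename pattern, sketched in Python with only the standard library for reference (`os.replace` is atomic when source and destination are on the same filesystem, which is why the temp file is created in the destination directory):

```python
import os
import tempfile

def merge_shards(shard_texts, final_path):
    # Create the temp file in the SAME directory as the final file so the
    # rename cannot cross a filesystem boundary (which would not be atomic).
    directory = os.path.dirname(os.path.abspath(final_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp:
            for text in shard_texts:
                tmp.write(text)
        # Basic validation before publishing, mirroring the shell suggestion.
        if os.path.getsize(tmp_path) == 0:
            raise ValueError("merged file is empty; refusing to finalize")
        os.replace(tmp_path, final_path)  # atomic publish
    except BaseException:
        # On any failure, remove the partial temp file rather than leaving
        # a bad cached artifact behind for future runs to skip over.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

An interrupted run leaves either no final file or the previous complete one, never a partial merge.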
```diff
@@ -193,12 +193,26 @@ def inject(
     local_rank = torch.cuda.current_device()
     r_total, r_used, r_free = get_rocm_smi_mem_info(local_rank)
     r_ratio = r_used / r_total

+    # get the max rocm_mem_usage
+    usage_tensor = torch.tensor([r_used], device="cuda", dtype=torch.float32)
```

Suggested change:

```suggestion
usage_tensor = torch.tensor([r_used], device="cuda", dtype=torch.int64)
```
Copilot AI (Feb 27, 2026):

The ROCm SMI logging now depends on torch.tensor and torch.distributed.* inside the same try block as get_rocm_smi_mem_info. In unit tests this module monkeypatches 'torch' with a minimal fake that lacks tensor/distributed, so this will swallow the exception and drop the "rocm mem usage/free/total" segment entirely, breaking existing assertions. Consider constructing the local ROCm SMI string first, then optionally (in a separate guarded block) doing distributed collectives only when torch.distributed is available and initialized, so local stats remain logged even without distributed.
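One way to realize this suggestion is to build the local string unconditionally and gate the collective behind availability checks. A sketch under stated assumptions: the function name and message format are illustrative, and the `dist` handle is injected so the local path needs no working torch.distributed at all (which is exactly what keeps the monkeypatched unit tests passing):

```python
def format_rocm_mem(r_used, r_free, r_total, dist=None):
    # Local stats are always formatted first, even when distributed is absent.
    ratio = r_used / r_total
    msg = f"rocm mem usage/free/total: {r_used}/{r_free}/{r_total} ({ratio:.1%})"
    # Collectives run only when a backend is both available AND initialized,
    # so a minimal fake torch in unit tests cannot suppress local logging.
    if dist is not None and dist.is_available() and dist.is_initialized():
        # e.g. all_reduce a usage tensor here to append the global max
        pass
    return msg

print(format_rocm_mem(100, 92, 192))
```

Separating the two concerns also narrows the try block back to just the SMI query, where exceptions are actually expected.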
Copilot AI (Mar 3, 2026):

max_usage / r_total uses the local rank's r_total when computing the max-usage ratio, which can be incorrect if ranks have different total HBM sizes (or if totals differ for any reason). To make this accurate, gather each rank's r_total (or gather precomputed r_used/r_total ratios) and compute the max ratio corresponding to max_rank.
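The discrepancy this comment describes is easy to see with plain numbers. A sketch with hypothetical per-rank (used, total) values on heterogeneous HBM sizes:

```python
# Hypothetical (used_MiB, total_MiB) per rank; rank 1 has a larger device.
ranks = [(100, 192), (150, 256)]

used = [u for u, _ in ranks]
max_rank = used.index(max(used))  # rank with the highest absolute usage

# Buggy: divide the global max usage by the LOCAL rank's total (rank 0 here),
# mixing a remote numerator with a local denominator.
buggy_ratio = max(used) / ranks[0][1]

# Correct: pair the max usage with the owning rank's own total.
correct_ratio = ranks[max_rank][0] / ranks[max_rank][1]

print(f"buggy: {buggy_ratio:.3f}, correct: {correct_ratio:.3f}")
```

Here the buggy form reports roughly 0.78 where the true ratio on the max-usage rank is about 0.59, which is why gathering totals (or precomputed ratios) alongside usage matters.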
```diff
@@ -0,0 +1,51 @@
+#!/bin/bash
+
+export HF_TOKEN="your_hf_token"            # make it your own hf token
+export WANDB_API_KEY="your_wandb_api_key"  # make it your own wandb api key
```

Comment on lines +3 to +4

Suggested change:

```suggestion
: "${HF_TOKEN:?Environment variable HF_TOKEN must be set}"
: "${WANDB_API_KEY:?Environment variable WANDB_API_KEY must be set}"
export HF_TOKEN
export WANDB_API_KEY
```
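The `${VAR:?message}` idiom in that suggestion makes the script fail fast instead of shipping placeholder credentials. An equivalent guard in Python, for reference (the variable name matches the script; the dummy value is only for demonstration):

```python
import os

def require_env(name: str) -> str:
    # Mirrors bash's ': "${NAME:?message}"' semantics: an empty value
    # counts as unset, and the error names the missing variable.
    value = os.environ.get(name, "")
    if not value:
        raise RuntimeError(f"Environment variable {name} must be set")
    return value

os.environ["HF_TOKEN"] = "dummy-token"  # simulate a configured environment
print(require_env("HF_TOKEN"))
```

Either way, the failure happens before any work is done, rather than after a long run fails on an invalid token.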
Copilot AI (Feb 27, 2026):

CLEAN_DOCKER_CONTAINER=1 will cause examples/run_local_pretrain.sh to remove all containers on the host (it runs "docker/podman ps -aq" and rm -f each). That's a risky default for shared nodes; consider defaulting this to 0 and only enabling it explicitly when you're sure it's safe.
Suggested change:

```diff
-export CLEAN_DOCKER_CONTAINER=1
+# Set to 1 to allow run_slurm_pretrain.sh to clean up all Docker/Podman containers on the host.
+# Use 1 only on dedicated/non-shared nodes where this is safe.
+export CLEAN_DOCKER_CONTAINER=0
```
This config changes the default `pipeline_model_parallel_size` from the common default of 1 to 8, which is inconsistent with the MI355X DeepSeek-V3 FP8 config and most other configs. To avoid surprising users (and breaking single-node runs), consider keeping the default at 1 and relying on `PRIMUS_PP` to opt into PP>1.
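The opt-in pattern this comment suggests can be sketched in Python; only `PRIMUS_PP` comes from the comment itself, and the function name is illustrative:

```python
import os

def pipeline_parallel_size(default: int = 1) -> int:
    # Default stays at 1 so single-node runs work out of the box;
    # PP > 1 must be requested explicitly via the PRIMUS_PP env var.
    return int(os.environ.get("PRIMUS_PP", default))

os.environ.pop("PRIMUS_PP", None)
print(pipeline_parallel_size())  # no override: default of 1

os.environ["PRIMUS_PP"] = "8"
print(pipeline_parallel_size())  # explicit opt-in to PP=8
```

This keeps the surprising value out of the config file entirely and makes any PP>1 run traceable to an explicit environment setting.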