# WaterLOO: Full-Rescore N-gram Cache with Self-Exclusion

**val_bpb: 0.0990 (3-seed mean, std 0.00002) | ~15.87 MB | 8xH100 SXM**

## Results

| Seed | Steps | Pre-Quant BPB | Sliding BPB | N-gram BPB | Artifact |
|------|-------|---------------|-------------|------------|----------|
| 1337 | 6933 | 1.1395 | 1.1253 | **0.09897** | 15.89 MB |
| 42 | 6930 | 1.1409 | 1.1268 | **0.09897** | 15.86 MB |
| 2025 | 6930 | 1.1410 | 1.1271 | **0.09902** | 15.87 MB |
| **Mean** | 6931 | **1.1405** | **1.1264** | **0.09899** | **15.87 MB** |
| **Std** | | | | **0.00002** | |

## The Idea

BROADSIDE showed that once you decouple the neural forward pass from the n-gram scoring, the usual two-pass bottleneck mostly disappears. You can store per-token neural probabilities in Pass 1, build a complete cache in one fast vectorized shot, and then rescore the validation stream against that complete cache while there is still plenty of eval clock left.

WaterLOO keeps that architecture and closes the most obvious self-inclusion path. In the aggressive full-rescore companion submission, each token's own `(context, target)` occurrence is still present in the cache when that token is rescored. Here, Pass 2 instead performs **leave-one-out scoring**:

- subtract `1` from the token's context count
- subtract `1` from the token's `(context,target)` count
- then apply the same backoff, `min_count`, entropy-adaptive alpha, and order multipliers as before

So every token still benefits from a globally warm cache, but it no longer gets to vote for itself. That is a stricter and more conservative use of the same full-rescore machinery.
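The leave-one-out subtraction can be sketched as follows. This is a minimal illustration, not the submission's code: the function and argument names are invented, and the exact placement of the eligibility check relative to backoff is an assumption.

```python
def loo_ngram_prob(ctx_count, pair_count, min_count=2):
    """Leave-one-out n-gram probability for one matched token.

    ctx_count:  occurrences of the hashed context in the full cache
    pair_count: occurrences of (context, target) in the full cache
    Both counts include the token being scored, so its own contribution
    is removed before eligibility and probability are computed.
    """
    ctx = ctx_count - 1    # remove this token's context occurrence
    pair = pair_count - 1  # remove this token's (context, target) occurrence
    if pair < min_count or ctx <= 0:
        return None        # ineligible: fall back toward the neural probability
    return pair / ctx
```

A token whose `(context, target)` pair occurs only once in the stream becomes ineligible under leave-one-out, which is exactly the "no voting for yourself" behavior described above.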

## Architecture

1. **Pass 1** (~89s): standard sliding-window neural eval, storing per-token `model_p` and entropy in numpy arrays
2. **Cache build** (~32-34s): build the complete order `2-12` hashed n-gram cache from the validation stream via `np.bincount`
3. **Pass 2** (~22s): rescore all tokens against the full cache with leave-one-out count subtraction

The important result is that this still lands at `0.0990` BPB over three seeds, well ahead of the currently visible two-pass frontier.
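The vectorized cache build in step 2 can be sketched with a rolling hash plus `np.bincount`. The polynomial hash and bucket scheme below are assumptions for illustration; the submission states only that the cache is hashed, covers orders 2-12, and is built via `np.bincount`.

```python
import numpy as np

def build_order_cache(tokens, order, num_buckets=4_194_304):
    """Count hashed contexts and (context, target) pairs for one n-gram order.

    Returns (ctx_counts, pair_counts), each of length num_buckets.
    The multiplier 1000003 and modular bucketing are illustrative choices.
    """
    tokens = np.asarray(tokens, dtype=np.uint64)
    n = len(tokens)
    if n <= order:
        zeros = np.zeros(num_buckets, dtype=np.int64)
        return zeros, zeros.copy()
    # Rolling hash over each length-`order` context ending before position i.
    ctx_hash = np.zeros(n - order, dtype=np.uint64)
    for k in range(order):
        ctx_hash = ctx_hash * np.uint64(1000003) + tokens[k : n - order + k]
    # Extend the context hash by the target token for the pair buckets.
    pair_hash = ctx_hash * np.uint64(1000003) + tokens[order:]
    ctx_bucket = (ctx_hash % np.uint64(num_buckets)).astype(np.int64)
    pair_bucket = (pair_hash % np.uint64(num_buckets)).astype(np.int64)
    ctx_counts = np.bincount(ctx_bucket, minlength=num_buckets)
    pair_counts = np.bincount(pair_bucket, minlength=num_buckets)
    return ctx_counts, pair_counts
```

Because every position contributes exactly one increment per order, the whole pass is a handful of vectorized numpy operations per order, which is why the cache build fits in ~33s.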

## Key Design Choices

### Full-stream rescore

Like BROADSIDE, this rescoring covers the full validation stream rather than only a fixed prefix. The gain is still mostly structural:

- no second neural forward pass
- vectorized cache construction
- enough eval headroom to score all tokens rather than only the coldest chunks

### Leave-one-out self-exclusion

This is the main difference from the more aggressive companion submission. At score time, each token's own direct contribution is removed before eligibility and probability are computed. The cache stays global; the self-count does not.

### N-gram parameters

- order `2-12`
- `4,194,304` buckets
- alpha range `[0.05, 0.70]`
- entropy-adaptive alpha
- low orders suppressed, high orders boosted
- `min_count >= 2`
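One way to read the entropy-adaptive alpha is as a smooth map from the neural model's per-token entropy to a mixing weight in `[0.05, 0.70]`. The sigmoid shape below is an assumption; the center `3.0` and scale `2.0` are the `NGRAM_ENTROPY_CENTER` / `NGRAM_ENTROPY_SCALE` defaults from `launch.sh`, and the direction (higher entropy leans harder on the n-gram cache) is an assumption consistent with the complementary-training framing.

```python
import math

ALPHA_MIN, ALPHA_MAX = 0.05, 0.70       # NGRAM_ALPHA_MIN / NGRAM_ALPHA_MAX
ENTROPY_CENTER, ENTROPY_SCALE = 3.0, 2.0  # launch.sh defaults

def entropy_adaptive_alpha(entropy):
    """Map per-token neural entropy to an n-gram mixing weight (sketch)."""
    t = 1.0 / (1.0 + math.exp(-(entropy - ENTROPY_CENTER) / ENTROPY_SCALE))
    return ALPHA_MIN + (ALPHA_MAX - ALPHA_MIN) * t

def mix(model_p, ngram_p, entropy):
    """Blend neural and n-gram target probabilities with the adaptive alpha."""
    a = entropy_adaptive_alpha(entropy)
    return (1.0 - a) * model_p + a * ngram_p
```

At the entropy center the weight sits at the midpoint of the range (0.375), and tokens where the model is confident keep alpha near the 0.05 floor.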

### Complementary training

Complementary training remains enabled, so the neural model is still encouraged to spend capacity on tokens the n-gram stack is less likely to predict well.
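A minimal sketch of that steering, assuming a linear down-weighting of the per-token loss by the n-gram's confidence in the target (the `0.5` is the `COMPLEMENT_ALPHA` default from `launch.sh`; the exact weighting scheme is an assumption, not the submission's code):

```python
import numpy as np

COMPLEMENT_ALPHA = 0.5  # launch.sh default

def complementary_loss(model_p_target, ngram_p_target, alpha=COMPLEMENT_ALPHA):
    """Mean per-token NLL, down-weighted where the n-gram cache is confident.

    Both inputs are per-token probabilities of the true target (neural
    and n-gram respectively). Weights fall in [1 - alpha, 1], so tokens
    the n-gram stack already predicts well contribute less gradient.
    """
    model_p = np.asarray(model_p_target, dtype=np.float64)
    ngram_p = np.asarray(ngram_p_target, dtype=np.float64)
    weights = 1.0 - alpha * ngram_p
    return float(np.mean(weights * -np.log(model_p)))
```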

## Timing Budget (8xH100)

| Phase | Time |
|-------|------|
| Training | 600s |
| Diagnostic eval | ~2s |
| GPTQ int6 export + roundtrip | ~7s |
| Sliding window eval | ~75s |
| N-gram Pass 1 | ~89s |
| Cache build | ~33s |
| N-gram Pass 2 | ~22s |
| **Total eval** | **~144-145s** |

## Reproduction

```bash
bash launch.sh base
```

Multi-seed package:

```bash
bash launch_multiseed.sh
```

This uses `SEEDS=1337,42,2025` by default and produces:

```text
logs/ppm_loo_seed1337.txt
logs/ppm_loo_seed42.txt
logs/ppm_loo_seed2025.txt
```

## Notes

This submission is intended as the more conservative counterpart to the companion full-rescore result. It keeps the same decoupled full-rescore eval architecture, but removes each token's own direct cache contribution during rescoring.

Co-authored with Codex.
---

**`launch.sh`**
#!/usr/bin/env bash
# Launch leave-one-out PPM N-gram Rescore follow-up
# Usage: bash launch.sh [base|smoke]
set -euo pipefail

MODE="${1:-base}"
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
TRAIN_SCRIPT="$SCRIPT_DIR/train_gpt.py"

# Shared defaults
export DATA_ROOT_MODE="${DATA_ROOT_MODE:-tmp}"
export COMPLEMENT_ENABLED="${COMPLEMENT_ENABLED:-1}"
export COMPLEMENT_ALPHA="${COMPLEMENT_ALPHA:-0.5}"
export NGRAM_ENABLED="${NGRAM_ENABLED:-1}"
export NGRAM_MIN_ORDER="${NGRAM_MIN_ORDER:-2}"
export NGRAM_MAX_ORDER="${NGRAM_MAX_ORDER:-12}"
export NGRAM_NUM_BUCKETS="${NGRAM_NUM_BUCKETS:-4194304}"
export NGRAM_CHUNK_SIZE="${NGRAM_CHUNK_SIZE:-512}"
export NGRAM_ALPHA_MIN="${NGRAM_ALPHA_MIN:-0.05}"
export NGRAM_ALPHA_MAX="${NGRAM_ALPHA_MAX:-0.70}"
export NGRAM_ENTROPY_CENTER="${NGRAM_ENTROPY_CENTER:-3.0}"
export NGRAM_ENTROPY_SCALE="${NGRAM_ENTROPY_SCALE:-2.0}"
export NGRAM_MIN_COUNT="${NGRAM_MIN_COUNT:-2}"
export NGRAM_LEAVE_ONE_OUT="${NGRAM_LEAVE_ONE_OUT:-1}"
export TTT_ENABLED="${TTT_ENABLED:-0}"
export EVAL_STRIDE="${EVAL_STRIDE:-64}"

# Data paths
if [[ "${DATA_ROOT_MODE}" == "tmp" ]]; then
DATA_BASE="/tmp/parameter-golf-data"
else
DATA_BASE="/workspace/parameter-golf/data"
fi
export DATA_PATH="${DATA_PATH:-${DATA_BASE}/datasets/fineweb10B_sp1024}"
export TOKENIZER_PATH="${TOKENIZER_PATH:-${DATA_BASE}/tokenizers/fineweb_1024_bpe.model}"

case "$MODE" in
smoke)
echo "=== SMOKE TEST (1xGPU, 180s, USE_COMPILE=0) ==="
export NPROC_PER_NODE="${NPROC_PER_NODE:-1}"
export MAX_WALLCLOCK_SECONDS="${MAX_WALLCLOCK_SECONDS:-180}"
export USE_COMPILE="${USE_COMPILE:-0}"
# NGRAM_MAX_ORDER was already exported with the shared default above, so
# "${NGRAM_MAX_ORDER:-9}" would never apply; set the reduced smoke order directly.
export NGRAM_MAX_ORDER=9
;;
base)
echo "=== FULL RUN (8xGPU, 600s) ==="
export NPROC_PER_NODE="${NPROC_PER_NODE:-8}"
export MAX_WALLCLOCK_SECONDS="${MAX_WALLCLOCK_SECONDS:-600}"
export USE_COMPILE="${USE_COMPILE:-1}"
;;
*)
echo "Unknown mode: $MODE (use 'base' or 'smoke')" >&2
exit 1
;;
esac

# Verify data
if [[ -f "/workspace/parameter-golf/verify_runpod_data_ready.sh" ]]; then
bash /workspace/parameter-golf/verify_runpod_data_ready.sh "$DATA_PATH" "$TOKENIZER_PATH"
fi

echo "Train script: $TRAIN_SCRIPT"
echo "Data path: $DATA_PATH"
echo "NGRAM: orders=${NGRAM_MIN_ORDER}-${NGRAM_MAX_ORDER} buckets=${NGRAM_NUM_BUCKETS} alpha=[${NGRAM_ALPHA_MIN},${NGRAM_ALPHA_MAX}] leave_one_out=${NGRAM_LEAVE_ONE_OUT}"
echo "COMPLEMENT: enabled=${COMPLEMENT_ENABLED} alpha=${COMPLEMENT_ALPHA}"

NPROC="${NPROC_PER_NODE:-8}"
if [[ "$NPROC" -eq 1 ]]; then
python3 "$TRAIN_SCRIPT"
else
torchrun --standalone --nproc_per_node="$NPROC" "$TRAIN_SCRIPT"
fi
---

**`launch_multiseed.sh`**
#!/usr/bin/env bash
# Launch the leave-one-out PPM candidate across the standard 3-seed package.
# Usage:
# bash launch_multiseed.sh # seeds 1337,42,2025
# SEEDS=1337,42 bash launch_multiseed.sh
# MODE=smoke bash launch_multiseed.sh
set -euo pipefail

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
MODE="${MODE:-base}"
SEEDS_CSV="${SEEDS:-1337,42,2025}"

IFS=',' read -r -a SEEDS_ARR <<< "$SEEDS_CSV"

echo "mode=$MODE"
echo "seeds=${SEEDS_CSV}"
echo "leave_one_out=${NGRAM_LEAVE_ONE_OUT:-1}"

for seed in "${SEEDS_ARR[@]}"; do
seed="$(echo "$seed" | xargs)"
if [[ -z "$seed" ]]; then
continue
fi
export SEED="$seed"
export RUN_ID="ppm_loo_seed${seed}"
echo
echo "=== seed ${seed} ==="
bash "$SCRIPT_DIR/launch.sh" "$MODE"
done
---

**Submission metadata (JSON)**
{
"author": "Simon Marcus",
"github_id": "simon-marcus",
"name": "WaterLOO: Full-Rescore N-gram Cache with Self-Exclusion",
"blurb": "Two-pass full-rescore n-gram eval with leave-one-out self-exclusion. Pass 1 stores per-token neural probabilities and entropies, the complete order-2-12 cache is built in a single vectorized pass, and Pass 2 rescoring subtracts each token's own direct cache contribution before matching.",
"date": "2026-03-26",
"val_loss": 0.16713198,
"val_bpb": 0.09898524,
"val_loss_std": 0.00004,
"val_bpb_std": 0.00002,
"seeds": [1337, 42, 2025],
"seed_results": {
"1337": {"val_loss": 0.16710306, "val_bpb": 0.09896811},
"42": {"val_loss": 0.16710815, "val_bpb": 0.09897112},
"2025": {"val_loss": 0.16718473, "val_bpb": 0.09901648}
},
"pre_quant_val_bpb": 1.14047,
"step_stop": 6931,
"wallclock_seconds": 600.0,
"eval_time_seconds": 144.77,
"bytes_total": 15873808,
"bytes_code": 115396,
"bytes_model": 15758412
}