diff --git a/.claude/skills/benchmark/SKILL.md b/.claude/skills/benchmark/SKILL.md
index 8e6af339d..431225313 100644
--- a/.claude/skills/benchmark/SKILL.md
+++ b/.claude/skills/benchmark/SKILL.md
@@ -3,159 +3,115 @@
name: slm-lab-benchmark
description: Run SLM-Lab deep RL benchmarks, monitor dstack jobs, extract results, and update BENCHMARKS.md. Use when asked to run benchmarks, check run status, extract scores, update benchmark tables, or generate plots.
---
-# SLM-Lab Benchmark Workflow
+# SLM-Lab Benchmark Skill

## Critical Rules

-1. **NEVER push to remote** without explicit user permission - commit locally only
-2. **ONLY train runs** in BENCHMARKS.md - NEVER use search results (search folders = UNACCEPTABLE)
-3. **Respect Settings line** for each env (max_frame, num_envs, etc.) - see [BENCHMARKS.md](docs/BENCHMARKS.md)
-4. **Use `${max_frame}` variable** in specs - never hardcode max_frame values
-5. **Verify HF links work** before updating table
-6. **Runs must complete in <6h**
+1. **NEVER push to remote** without explicit user permission
+2. **ONLY train runs** in BENCHMARKS.md — never search results
+3. **Respect Settings line** for each env (max_frame, num_envs, etc.)
+4. **Use `${max_frame}` variable** in specs — never hardcode
+5. **Runs must complete in <6h** (dstack max_duration)

-## Benchmark Contribution Workflow
-
-### 1. Audit Spec Settings
-
-**Before Running**: Ensure spec matches the **Settings** line in BENCHMARKS.md for each env.
-
-Example Settings line: `max_frame 3e5 | num_envs 4 | max_session 4 | log_frequency 500`
-
-**After Pulling**: Verify downloaded `spec.json` matches these rules before using data.
-
-### 2. Run Benchmark & Commit Specs
+## Run → Score → Record

 ```bash
-# Remote (GPU) - auto-syncs to HuggingFace
-source .env && slm-lab run-remote --gpu SPEC_FILE SPEC_NAME train -n NAME
-
-# With variable substitution (MuJoCo/Atari)
-source .env && slm-lab run-remote --gpu -s env=ENV -s max_frame=MAX_FRAME \
-  SPEC_FILE SPEC_NAME train -n NAME
-
-# Local (Classic Control only)
-slm-lab run SPEC_FILE SPEC_NAME train
-```
+# Launch
+source .env && uv run slm-lab run-remote --gpu \
+  -s env=ALE/Pong-v5 SPEC_FILE SPEC_NAME train -n NAME

-**Always commit the spec file** used for the run. Ensure BENCHMARKS.md has entry with correct SPEC_FILE and SPEC_NAME.
-
-### 3. Monitor Status
+# Monitor
+dstack ps # running jobs
+dstack logs NAME | grep "trial_metrics" # extract score at completion

-**Monitor autonomously** - use sleep to check in periodically until completion:
-```bash
-sleep 900 && dstack ps # wait 15min then check
+# Score = total_reward_ma from trial_metrics line
+# trial_metrics: frame:1.00e+07 | total_reward_ma:816.18 | ...
 ```

-```bash
-dstack ps # running jobs
-dstack ps -a | head -20 # recent jobs (done/failed)
-dstack logs NAME # view logs
-dstack logs NAME | grep "trial_metrics" # extract final score
-dstack stop NAME -y # terminate run
-```

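+For MuJoCo/Atari, `-s` also substitutes `max_frame`, so the spec keeps the `${max_frame}` variable (Critical Rule 4). A minimal sketch, using the same placeholders as above:
+
+```bash
+# Take ENV and MAX_FRAME from the env's Settings line in BENCHMARKS.md
+source .env && uv run slm-lab run-remote --gpu \
+  -s env=ENV -s max_frame=MAX_FRAME \
+  SPEC_FILE SPEC_NAME train -n NAME
+```
+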
+## Data Lifecycle

-### 4. Record Scores & Plots
+Data flows through three stages: **pull → plot → graduate**. Keep local data until graduation is complete.

-**Score**: At end of run, extract `total_reward_ma` from logs (`trial_metrics`):
 ```
-trial_metrics: frame:1.00e+07 | total_reward_ma:816.18 | strength:570.4 | ...
+Remote GPU run → auto-uploads to benchmark-dev (HF)
+        ↓
+  Pull to local data/
+        ↓
+  Generate plots (docs/plots/)
+        ↓
+  Update BENCHMARKS.md (scores, links)
+        ↓
+  Graduate to public benchmark (HF)
+        ↓
+  Update links: benchmark-dev → benchmark
+        ↓
+  Upload docs/ to public benchmark (HF)
 ```

-**Link**: Add HuggingFace folder link to table:
-- Format: `[FOLDER](https://huggingface.co/datasets/SLM-Lab/benchmark-dev/tree/main/data/FOLDER)`
+### Pull Data

-**Pull Data Efficiently** (avoid rate limiting):
 ```bash
-# DON'T use slm-lab pull for bulk downloads - will get rate limited
-# Instead, pull specific folders directly from HF:
-
-# Get folder name from run logs
-dstack logs NAME | grep "SLM-Lab: Running"
-# Output shows: data/FOLDER_NAME/...
-
-# Pull only the folder you need
+# Pull the full dataset in one bulk download — avoids per-file rate limits
 source .env && uv run hf download SLM-Lab/benchmark-dev \
-  --include "data/FOLDER_NAME/*" --local-dir hf_data --repo-type dataset
-```
-
-**Plot**:
-```bash
-# Verify scores in trial_metrics.json match logs
-# Ensure all runs share same max_frame
+  --local-dir data/benchmark-dev --repo-type dataset

-# Generate plot using ONLY folders from table
-slm-lab plot -t "ENV_NAME" -f folder1,folder2,folder3
+# KEEP this data — needed for plots AND graduation upload later
+# Never rm -rf data/benchmark-dev/ until graduation is complete
 ```

-**Status legend**: ✅ Solved | ⚠️ Close (>80%) | ❌ Failed
-
-### 5. Commit Changes
+### Generate Plots

 ```bash
-git add docs/BENCHMARKS.md slm_lab/spec/benchmark/...
-git commit -m "docs: update ENV benchmark (SCORE)"
-# NEVER push without explicit permission
-```
-
-## Publishing to Public Dataset
+# Find folders for a game (need a2c + ppo + sac for comparison)
+ls data/benchmark-dev/data/ | grep -i pong

-During development, runs upload to `SLM-Lab/benchmark-dev` (noisy, iterative). When benchmarks are finalized, publish clean results to the public `SLM-Lab/benchmark` dataset.
+# Generate comparison plot
+uv run slm-lab plot -t "Pong" \
+  -f data/benchmark-dev/data/a2c_folder,data/benchmark-dev/data/ppo_folder,data/benchmark-dev/data/sac_folder
+```

-### Strategy: `hf_data/` IS the Manifest
+### Update BENCHMARKS.md

-The `hf_data/data/` directory defines what gets uploaded. Same process works for any subset (Phase 1-3, Atari, single env).
+- Add the score to the results table
+- Add HF link: `[FOLDER](https://huggingface.co/datasets/SLM-Lab/benchmark-dev/tree/main/data/FOLDER)`
+- Status: ✅ Solved | ⚠️ Close (>80%) | ❌ Failed

-### Upload Workflow
+### Graduate to Public HF

-HF dataset mirrors repo structure: `README.md`, `docs/`, `data/`
+When benchmarks are finalized, publish from `benchmark-dev` → `benchmark`:

 ```bash
-# 1. Clear hf_data/ before upload cycle
-rm -rf hf_data/
-
-# 2. Pull only folders you want from benchmark-dev
-source .env && uv run hf download SLM-Lab/benchmark-dev \
-  --include "data/ppo_cartpole_2026*/*" "data/sac_lunar*/*" \
-  --local-dir hf_data --repo-type dataset
-
-# 3. Upload README to public repo
-source .env && uv run hf upload SLM-Lab/benchmark README.md README.md --repo-type dataset
-
-# 4. Upload data to public repo
-source .env && uv run hf upload SLM-Lab/benchmark hf_data/data data --repo-type dataset
+# Upload data (from local copy — already pulled above)
+source .env && uv run hf upload SLM-Lab/benchmark \
+  data/benchmark-dev/data data --repo-type dataset

-# 5. Update BENCHMARKS.md links: benchmark-dev -> benchmark
-# (data now exists on public repo, so links will work)
+# Update BENCHMARKS.md links: benchmark-dev → benchmark
+# (find-replace in docs/BENCHMARKS.md)

-# 6. Upload docs (with updated links)
+# Upload docs (with updated links) and README
 source .env && uv run hf upload SLM-Lab/benchmark docs docs --repo-type dataset
+source .env && uv run hf upload SLM-Lab/benchmark README.md README.md --repo-type dataset
 ```

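+The link update above can be a single find-replace. A sketch, assuming GNU sed (BSD/macOS sed needs `-i ''`):
+
+```bash
+# Point every dataset link at the public repo; review the diff before committing
+sed -i 's|SLM-Lab/benchmark-dev|SLM-Lab/benchmark|g' docs/BENCHMARKS.md
+```
+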
-### Two-Repo Strategy
-
-| Repo | Purpose | Links in BENCHMARKS.md |
-|------|---------|------------------------|
-| `SLM-Lab/benchmark-dev` | Development iterations, noisy | During active work |
-| `SLM-Lab/benchmark` | Public, finalized results | After publishing |
+| Repo | Purpose |
+|------|---------|
+| `SLM-Lab/benchmark-dev` | Development — noisy, iterative |
+| `SLM-Lab/benchmark` | Public — finalized, validated |

## Hyperparameter Search

-Only when algorithm fails to reach target. Use search to find hyperparams, then run final `train` for benchmark.
+Run a search only when an algorithm fails to reach its target:

 ```bash
-source .env && slm-lab run-remote --gpu SPEC_FILE SPEC_NAME search -n NAME
+source .env && uv run slm-lab run-remote --gpu SPEC_FILE SPEC_NAME search -n NAME
 ```

-**Search budget**: ~3-4 trials per dimension (8 trials = 2-3 dims, 16 = 3-4 dims).
-
-**After search**: Update spec with best hyperparams, run `train` mode, use that result in BENCHMARKS.md.
+Budget: ~3-4 trials per dimension. After the search, update the spec with the best params, rerun in `train` mode, and record that result in BENCHMARKS.md.

## Troubleshooting

-- **Run interrupted**: Expected with spot instances - relaunch with same command, increment run name (e.g. rv2 → rv3)
-- **Low GPU usage** (<50%): CPU bottleneck or config issue, not training problem
-- **Score below target**: Check hyperparams match spec, try search mode
-- **HF link 404**: Run didn't complete or upload failed, rerun
-
-For full details, see [docs/BENCHMARKS.md](docs/BENCHMARKS.md).
+- **Run interrupted**: Relaunch, increment name suffix (e.g., pong3 → pong4)
+- **Low GPU usage** (<50%): CPU bottleneck or config issue
+- **HF rate limit**: Download full dataset, not selective `--include` patterns
+- **HF link 404**: Run didn't complete or upload failed — rerun
+- **.env inline comments**: Break dstack env vars — put comments on separate lines (see the sketch below)
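+
+A sketch of the `.env` pitfall (variable name and value are hypothetical):
+
+```bash
+# BAD: the inline comment can end up inside the value dstack receives
+HF_TOKEN=abc123  # token for uploads
+
+# GOOD: keep the comment on its own line
+# token for uploads
+HF_TOKEN=abc123
+```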
diff --git a/CLAUDE.md b/CLAUDE.md
index 3cecf8cd9..ef0291ee2 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -4,117 +4,104 @@

You are a seasoned software engineer with the following traits:

-- **Perfectionist**: Code quality is non-negotiable - clean, idiomatic, maintainable code every time
+- **Quality-driven**: Code quality is non-negotiable - clean, idiomatic, maintainable code every time
- **Autonomous**: Make informed technical decisions independently - only ask when requirements are genuinely unclear
- **Pragmatic**: Balance perfect with practical - ship working solutions, iterate when needed
- **Detail-oriented**: Catch edge cases, handle errors properly, think through implications
- **Proactive**: Refactor immediately, delete dead code aggressively, improve as you go
-- **Efficient**: Minimal token usage - no fluff, explanations only when asked

**Working principles:**

-1. Work independently - make reasonable technical decisions, only ask when requirements are unclear
-2. Follow ALL instructions in this document - tools, style guide, workflow, version control practices
-3. Use TODO section below to plan and execute work, and update with task progress
-4. Stage changes frequently - commit related work as logical units
-5. Never hard reset or delete work - preserve changes even during corruption/errors
-6. Keep responses SHORT - no explanations unless asked, no restating what was done, just confirm completion
+1. Stage changes frequently - commit related work as logical units
+2. Never hard reset or delete work - preserve changes even during corruption/errors
+3. Work autonomously - run things in parallel when possible, continue without pausing, pick up the next task immediately
+4. Keep responses SHORT - no explanations unless asked, just confirm completion. State rationale briefly for non-obvious decisions.

-## Project Setup
+## Principles of Good Code Design

-### Python Projects
+Apply these six principles to every decision.

-1. **Package Management**: Use [`uv`](https://docs.astral.sh/uv/getting-started/installation/) and `pyproject.toml`
-   1. Install dependencies: `uv sync`
-   2. Add packages: `uv add <package>`
-   3. Run scripts: `uv run