
docs: SAC Atari benchmarks - all 58 games #531

Merged: kengz merged 1 commit into master from feat/sac-atari on Feb 14, 2026

@kengz kengz commented Feb 14, 2026

Summary

  • Complete SAC Atari benchmark across all 58 Atari games (2M frames, 4 seeds each)
  • Single universal spec (sac_atari.json): training_iter=3, Categorical, AdamW lr=3e-4
  • A2C+PPO+SAC comparison plots for all 58 games
  • Results graduated to public SLM-Lab/benchmark HF dataset
  • Streamlined CLAUDE.md and benchmark skill with data lifecycle docs
  • Removed stale SAC PER and sac_pong specs

Results

SAC generally underperforms PPO on Atari: it wins roughly 10 of the 58 games. The benchmark is still useful as a negative result.

Best SAC games: CrazyClimber 81839, Atlantis 64097, VideoPinball 22541
Worst SAC games: Tennis -374, FishingDerby -77, DoubleDunk -44, Enduro 0, Freeway 0
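
As an illustration of how a per-game win count like the "~10/58" figure can be tallied, here is a minimal sketch. It assumes a "win" means SAC's final score is strictly higher than PPO's on the same game; the PR does not state the exact criterion. The function name `count_wins` and the scores below are hypothetical placeholders, not benchmark data.

```python
from typing import Dict, List


def count_wins(sac: Dict[str, float], ppo: Dict[str, float]) -> List[str]:
    """Return the games (sorted by name) where SAC's score strictly exceeds PPO's.

    Only games present in both score tables are compared.
    """
    shared = sorted(set(sac) & set(ppo))
    return [game for game in shared if sac[game] > ppo[game]]


# Placeholder scores for illustration only; these are NOT benchmark results.
sac_scores = {"GameA": 120.0, "GameB": -5.0, "GameC": 30.0}
ppo_scores = {"GameA": 100.0, "GameB": 10.0, "GameC": 30.0}

wins = count_wins(sac_scores, ppo_scores)
print(f"SAC wins {len(wins)}/{len(sac_scores)} games: {wins}")
```

Ties count as non-wins under this definition; a different rule (for example, a tolerance band around equal scores) would change the tally.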

Test plan

  • All 58 scores extracted from trial_metrics and recorded in BENCHMARKS.md
  • All 58 HF data links verified pointing to public SLM-Lab/benchmark
  • All 58 comparison plots generated in docs/plots/
  • Audit passed (0 critical issues)
  • No code changes — docs, specs, and plots only
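
A coverage check such as "all 58 comparison plots generated" could be scripted along the following lines. This is a sketch under assumptions: the helper `missing_plots` is hypothetical, and the `<game>.png` file naming is a guess for illustration, since the PR does not describe how files in docs/plots/ are named.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List


def missing_plots(games: Iterable[str], plot_dir: Path, suffix: str = ".png") -> List[str]:
    """Return the games that have no file named <game><suffix> in plot_dir."""
    return [game for game in games if not (plot_dir / f"{game}{suffix}").is_file()]


# Demo against a temporary directory with placeholder game names.
with tempfile.TemporaryDirectory() as tmp:
    plot_dir = Path(tmp)
    (plot_dir / "GameA.png").touch()  # pretend one plot was generated
    missing = missing_plots(["GameA", "GameB"], plot_dir)

print(missing)
```

An empty list from the real game list and plot directory would indicate that every game has a plot file under the assumed naming scheme.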

🤖 Generated with Claude Code

Complete SAC Atari benchmark across all 58 games (2M frames, 4 seeds).
Single universal spec (sac_atari.json): training_iter=3, Categorical, AdamW lr=3e-4.
A2C+PPO+SAC comparison plots and HF data graduated to public repo.
Streamlined CLAUDE.md and benchmark skill. Removed stale SAC PER specs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@kengz kengz merged commit 1124545 into master Feb 14, 2026
3 checks passed
@kengz kengz deleted the feat/sac-atari branch February 14, 2026 18:37