
Eval: loss-only path + compute receipt #27

Open
StanByriukov02 wants to merge 1 commit into 1x-technologies:main from StanByriukov02:perf/loss-only-receipt

Conversation

@StanByriukov02

I was evaluating GENIE-style checkpoints and kept hitting the same issue: sometimes I only need the CE loss (from teacher-forced logits), but the current eval loop still runs MaskGIT refinement plus the decode/LPIPS/accuracy work.

This PR adds a clean fast path for that case.

What changed

  • --loss_only: compute CE loss from temporally teacher-forced logits and skip MaskGIT sampling (no refinement loop), decoding/LPIPS, and the accuracy computation.
  • --skip_decode: keep sampling/logits, but skip decoding + LPIPS (useful if you still want generation metrics without the heavy decode pipeline).
  • --device {cpu|cuda}: makes it easier to reproduce runs on CPU-only machines too.
  • Receipt: prints compute_logits_calls and compute_logits_calls_per_frame, so you can see how many full forward-logits passes actually happened.
  • xformers becomes optional: if it’s not installed (or XFORMERS_DISABLED=true), the code falls back to the basic attention path.
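In case it helps review, here's a rough sketch of the control flow. The `model` call, shapes, and `eval_step` name are illustrative placeholders, not the actual `evaluate.py` code; only the flag semantics come from this PR:

```python
import torch
import torch.nn.functional as F

def eval_step(model, tokens, loss_only=False, skip_decode=False, maskgit_steps=2):
    """Illustrative sketch of the fast path, not the actual repo code."""
    stats = {"compute_logits_calls": 0}

    # Teacher-forced logits: a single full forward pass is enough for CE loss.
    logits = model(tokens[:, :-1])                  # (B, T-1, vocab), hypothetical
    stats["compute_logits_calls"] += 1
    loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                           tokens[:, 1:].reshape(-1))
    if loss_only:
        # --loss_only: skip MaskGIT refinement, decoding, LPIPS, and accuracy.
        return loss, None, stats

    # Otherwise run the remaining MaskGIT refinement passes (~maskgit_steps total).
    for _ in range(maskgit_steps - 1):
        logits = model(tokens[:, :-1])              # placeholder for re-masked sampling
        stats["compute_logits_calls"] += 1

    frames = None if skip_decode else "decode + LPIPS would run here"
    return loss, frames, stats
```

With `loss_only=True` the receipt counter increments once per step; otherwise it increments ~`maskgit_steps` times, which is exactly the gap the receipt is meant to surface.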

Guarantees / scaling / what I’m not claiming

  • Guaranteed GPU-compute reduction for --loss_only: per generated frame, full forward-logits passes drop from ~maskgit_steps to 1. For maskgit_steps=2 that’s ~2× on the forward-pass-heavy part.
  • Scales with quality mode: if someone runs maskgit_steps=K, the compute ceiling is ~K× (so ~5–10× at K = 5–10).
  • CPU side matters too: skipping decode/LPIPS/acc avoids a big chunk of CPU + pipeline overhead when you only care about loss, and the new receipt makes it obvious you’re not doing extra work.
  • Not promised: end-to-end Joules/frame for the full eval script (depends on dataset IO, decode, etc). I did run a small forward-pass microbench on an H100 to sanity check direction.

H100 sanity check (forward-pass microbench)

  • maskgit_steps=2, baseline does 2 compute_logits calls per timestep, loss_only does 1
  • measured ratio_J ≈ 1.83, i.e. ~1.83× lower energy (Joules) for the forward-pass segment (same model config, loop repeated to smooth power sampling)
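For reference, the microbench shape was roughly: run the forward pass in a loop, sample power during it, and multiply mean power by elapsed time. The sketch below is not code from this PR; `read_power_watts` is a placeholder for a real power reader (e.g. `pynvml.nvmlDeviceGetPowerUsage(handle) / 1000` on NVIDIA GPUs):

```python
import time

def measure_energy_joules(fn, n_iters, read_power_watts):
    """Energy ≈ mean sampled power (W) × elapsed time (s).

    fn: one forward pass; read_power_watts: callable returning instantaneous
    power. Repeating the loop smooths out power-sampling noise."""
    samples = []
    t0 = time.perf_counter()
    for _ in range(n_iters):
        fn()
        samples.append(read_power_watts())
    elapsed = time.perf_counter() - t0
    return (sum(samples) / len(samples)) * elapsed
```

ratio_J above is then the baseline energy divided by the loss-only energy over the same forward-pass segment.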

Repro (receipt)

  • baseline:
    python genie/evaluate.py --checkpoint_dir --maskgit_steps 2
  • loss-only:
    python genie/evaluate.py --checkpoint_dir --maskgit_steps 2 --loss_only

You should see compute_logits_calls_per_frame drop from ~2.00 to ~1.00 with maskgit_steps=2.
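The per-frame receipt is simple division (illustrative helper, matching the numbers above; the frame count is hypothetical):

```python
def calls_per_frame(compute_logits_calls, generated_frames):
    # Receipt metric: total full forward-logits passes / frames generated.
    return compute_logits_calls / generated_frames

frames = 16                                       # hypothetical frame count
baseline = calls_per_frame(2 * frames, frames)    # maskgit_steps=2 -> 2.0
loss_only = calls_per_frame(1 * frames, frames)   # teacher-forced only -> 1.0
```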

This adds a fast path for CE loss evaluation that skips MaskGIT refinement sampling and the decode/LPIPS pipeline when you only care about loss.

Also prints compute_logits_calls (and per-frame) so you can see exactly how many full forward logits passes happened.

Notes:

- Guaranteed: with --loss_only, compute_logits calls per generated frame drop from ~maskgit_steps to 1 (so maskgit_steps=2 is ~2× on the GPU-compute-heavy part).

- Scales: if someone runs maskgit_steps=K for quality/sampling, the compute ceiling is ~K×.

- Not promised: end-to-end Joules/frame depends on your full eval setup; I only measured a small forward-pass microbench on H100 as a sanity check.
