Max generated tests per problem (keep low for training speed)

- `PISTON_TIMEOUT` (optional, default `30`): seconds before a Piston request times out
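As a sketch (assuming these settings are read as environment variables, which the names suggest), the timeout could be loaded like this:

```python
import os

# Hypothetical config loading; the variable name follows the table above.
# Falls back to the documented default of 30 seconds when unset.
PISTON_TIMEOUT = float(os.environ.get("PISTON_TIMEOUT", "30"))
```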
Integration
Option A: Drop into your existing verl_reward_func
```python
# In train_grpo_limr_zero3.py, add to verl_reward_func:
elif data_source == "open-r1/codeforces":
    from codeforces_reward import compute_score_codeforces
    return compute_score_codeforces(solution_str, ground_truth, extra_info)
```
Option B: Use the bundled dispatcher
```python
from codeforces_reward import verl_reward_func  # handles all data_sources, including codeforces
```
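For reference, the dispatch pattern the two options describe amounts to the following sketch. The stub bodies are placeholders; the real scorers live in `codeforces_reward.py`:

```python
# Sketch of the data_source dispatch described above (assumed signatures;
# the real functions are implemented in codeforces_reward.py).

def compute_score_codeforces(solution_str, ground_truth, extra_info=None):
    """Stub standing in for the real Codeforces scorer."""
    return 1.0  # placeholder value

def default_score(solution_str, ground_truth):
    """Stub for whatever scorer handles the remaining data_sources."""
    return 0.0  # placeholder value

def verl_reward_func(data_source, solution_str, ground_truth, extra_info=None):
    # Route on data_source: Option A adds this branch inline,
    # Option B imports a dispatcher that already contains it.
    if data_source == "open-r1/codeforces":
        return compute_score_codeforces(solution_str, ground_truth, extra_info)
    return default_score(solution_str, ground_truth)
```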
How It Works
````
Model output               Ground truth (from dataset builder)
─────────────              ───────────────────────────────────
## Problem 1               gt_list[0] = {
<reasoning>                    "id": "1234_A",
```python                      "tests": [{input, output}, ...],
<code>                         "generated_checker": "...",
```                            "time_limit": 2.0,
                               ...
## Problem 2               }
...                        gt_list[1] = { ... }
````
1. **Parse:** split the output on `## Problem K` headers and extract the last code block in each section.
2. **Load tests:** official tests come from the `ground_truth` dict; generated tests come from the parquet file (lazy-cached).
3. **Execute:** send each solution to Piston with the test input, expected output, checker, and limits.
4. **Score:**
   - Binary per problem (default): `1.0` only if ALL tests pass for that problem.
   - Partial credit per problem: uncomment the alternative return in `_score_single_cf_problem`.
   - Overall: always returns the fraction of problems fully solved.
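The parse step can be sketched as follows. This is illustrative only; the regexes, the `## Problem K` section format, and the helper name are assumptions based on the description above:

```python
import re

def split_problems(model_output: str) -> dict:
    """Split a multi-problem completion on '## Problem K' headers and keep
    the LAST fenced code block of each section (illustrative sketch)."""
    sections = re.split(r"^## Problem (\d+)\s*$", model_output, flags=re.M)
    # re.split yields: [preamble, "1", body1, "2", body2, ...]
    result = {}
    for k, body in zip(sections[1::2], sections[2::2]):
        blocks = re.findall(r"```(?:python)?\n(.*?)```", body, flags=re.S)
        if blocks:
            result[int(k)] = blocks[-1].strip()  # last code block wins
    return result
```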
Notes
- `fail_fast=True` stops testing a problem on its first failure (faster during training).
- Generated tests are cached in memory per contest after the first load.
- The function handles both single-problem and multi-problem formats automatically.
- Custom checkers are passed to Piston as `checker.py` when present.
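For context, here is a minimal sketch of one Piston execute call using only the standard library. The endpoint URL and language choice are assumptions; the request fields follow Piston's public v2 API:

```python
import json
import urllib.request

PISTON_URL = "http://localhost:2000"  # assumed local Piston endpoint

def build_piston_payload(code: str, stdin: str, timeout_s: float = 30.0) -> dict:
    """Request body for Piston's v2 execute API."""
    return {
        "language": "python",
        "version": "*",  # let Piston pick any installed version
        "files": [{"name": "main.py", "content": code}],
        "stdin": stdin,
        "run_timeout": int(timeout_s * 1000),  # Piston expects milliseconds
    }

def run_on_piston(code: str, stdin: str, timeout_s: float = 30.0) -> str:
    """POST one solution plus one test input; return its stdout."""
    req = urllib.request.Request(
        f"{PISTON_URL}/api/v2/execute",
        data=json.dumps(build_piston_payload(code, stdin, timeout_s)).encode(),
        headers={"Content-Type": "application/json"},
    )
    # Client-side timeout, analogous to the PISTON_TIMEOUT setting above.
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        return json.loads(resp.read())["run"]["stdout"]
```

Scoring then compares the returned stdout against the expected output (or hands both to the custom checker when one is present).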
About
Minimal framework for LLMs to build persistent reasoning modules, enabling modularity, credit assignment, and the emergence of higher-order reasoning protocols.