CodeForces Multi-Problem Reward Function

Setup

pip install requests pyarrow pandas filelock

# Download generated tests (~110GB) — optional but recommended
pip install -U "huggingface_hub[cli,hf_xet]"
huggingface-cli download open-r1/codeforces \
    --repo-type=dataset \
    --include='generated_tests/*.parquet' \
    --max-workers=8 \
    --local-dir /path/to/cf_data

Environment Variables

| Variable | Required | Default | Description |
|---|---|---|---|
| PISTON_ENDPOINT | Yes | http://localhost:2000 | Piston sandbox URL |
| CF_GENERATED_TESTS | No | "" | Path to dir containing generated_tests/*.parquet |
| CF_MAX_GENERATED_TESTS | No | 3 | Max generated tests per problem (keep low for training speed) |
| PISTON_TIMEOUT | No | 30 | Seconds before a Piston request times out |
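In Python, these variables can be read with the documented defaults. This is a sketch of how the configuration might be loaded; the actual module may read them differently:

```python
import os

# Read the sandbox configuration, falling back to the documented defaults.
PISTON_ENDPOINT = os.environ.get("PISTON_ENDPOINT", "http://localhost:2000")
CF_GENERATED_TESTS = os.environ.get("CF_GENERATED_TESTS", "")  # dir with generated_tests/*.parquet
CF_MAX_GENERATED_TESTS = int(os.environ.get("CF_MAX_GENERATED_TESTS", "3"))
PISTON_TIMEOUT = float(os.environ.get("PISTON_TIMEOUT", "30"))
```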

Integration

Option A: Drop into your existing verl_reward_func

# In train_grpo_limr_zero3.py, add to verl_reward_func:
elif data_source == "open-r1/codeforces":
    from codeforces_reward import compute_score_codeforces
    return compute_score_codeforces(solution_str, ground_truth, extra_info)

Option B: Use the bundled dispatcher

from codeforces_reward import verl_reward_func
# handles all data_sources including codeforces

How It Works

Model output                    Ground truth (from dataset builder)
─────────────                   ───────────────────────────────────
## Problem 1                    gt_list[0] = {
<reasoning>                       "id": "1234_A",
```python                         "tests": [{input, output}, ...],
<code>                            "generated_checker": "...",
```                               "time_limit": 2.0,
                                  ...
## Problem 2                    }
...                             gt_list[1] = { ... }
  1. Parse: Split output on ## Problem K headers → extract last code block per section
  2. Load tests: Official tests from ground_truth dict + generated tests from parquet (lazy-cached)
  3. Execute: Send each solution to Piston with test input, expected output, checker, and limits
  4. Score: num_problems_correct / total_problems → reward ∈ [0, 1]
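Step 1 can be sketched roughly as follows. This is an illustrative parser, not the exact implementation in codeforces_reward; `split_problems` and the regexes are hypothetical:

```python
import re

def split_problems(output: str) -> list:
    """Split a multi-problem completion on '## Problem K' headers and
    return the last fenced code block in each section (None if missing)."""
    sections = re.split(r"^## Problem \d+\s*$", output, flags=re.MULTILINE)
    solutions = []
    for section in sections[1:]:  # sections[0] is any preamble before the first header
        blocks = re.findall(r"```(?:python)?\n(.*?)```", section, flags=re.DOTALL)
        solutions.append(blocks[-1].strip() if blocks else None)
    return solutions
```

Taking the last code block per section means a model that revises its solution mid-section is graded on the final version.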

Scoring Modes

  • Binary per-problem (default): 1.0 only if ALL tests pass for that problem
  • Partial credit per-problem: Uncomment the alternative return in _score_single_cf_problem
  • Overall: Always returns fraction of problems fully solved
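The overall reward described above reduces to a simple fraction. A minimal sketch, assuming each problem has already been reduced to a pass/fail boolean:

```python
def overall_reward(per_problem_pass: list) -> float:
    """Overall reward: fraction of problems fully solved (binary per problem)."""
    if not per_problem_pass:
        return 0.0
    return sum(per_problem_pass) / len(per_problem_pass)
```

For example, `overall_reward([True, False, True, True])` yields 0.75.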

Notes

  • fail_fast=True stops testing a problem on first failure (faster during training)
  • Generated tests are cached in memory per-contest after first load
  • The function handles both single-problem and multi-problem formats automatically
  • Custom checkers are passed to Piston as checker.py when present
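The request sent to Piston (step 3 above, including the checker.py attachment) might be assembled like this. The payload shape follows the public Piston `/api/v2/execute` schema; `build_piston_payload` is a hypothetical helper, not the repo's actual function:

```python
def build_piston_payload(code: str, test_input: str, checker, time_limit: float) -> dict:
    """Build a Piston execute request body (assumed shape based on the
    public Piston API; the repo's exact payload may differ)."""
    files = [{"name": "main.py", "content": code}]
    if checker:  # custom checker shipped alongside the solution
        files.append({"name": "checker.py", "content": checker})
    return {
        "language": "python",
        "version": "*",
        "files": files,
        "stdin": test_input,
        "run_timeout": int(time_limit * 1000),  # Piston expects milliseconds
    }
```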
