Multi-Task Evaluation Example

Configuring `multi_task.yaml`

`eval.defaults` defines the inference parameters shared by every dataset entry. To change a parameter for a single dataset, override it inside that dataset's block.
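The override rule can be pictured as a shallow merge in which keys set in a dataset block win over `eval.defaults`. The sketch below is illustrative only: `resolve_dataset_config` is a hypothetical helper, the parameter names `temperature` and `top_p` are assumed examples, and slime's actual merge logic may differ (for instance, in how nested values are handled).

```python
from copy import deepcopy
from typing import Any, Dict


def resolve_dataset_config(defaults: Dict[str, Any], dataset: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: shallow-merge shared defaults with one dataset block (dataset wins)."""
    merged = deepcopy(defaults)       # copy so eval.defaults is never mutated
    merged.update(deepcopy(dataset))  # dataset-level keys replace same-named defaults
    return merged


# Hypothetical values: key names and numbers are for illustration only.
defaults = {"temperature": 0.7, "top_p": 0.95}
ifbench_block = {
    "name": "ifbench",
    "path": "/path/to/ifbench.jsonl",
    "rm_type": "ifbench",
    "n_samples_per_eval_prompt": 1,
    "temperature": 0.0,  # override for this dataset only
}

resolved = resolve_dataset_config(defaults, ifbench_block)
print(resolved["temperature"])  # 0.0  (overridden in the dataset block)
print(resolved["top_p"])        # 0.95 (inherited from eval.defaults)
```

Because the merge is shallow, a nested value set in a dataset block would replace the whole nested default rather than being merged key by key.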
`eval.datasets` lists the datasets to evaluate. Each entry should specify:
- `name`: a short identifier that appears in logs and dashboards.
- `path`: the path to the dataset's JSONL file.
- `rm_type`: the reward function used to score completions.
- `n_samples_per_eval_prompt`: the number of candidate completions to generate per prompt.
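A quick way to catch a malformed entry before launching an evaluation is to check that the four fields above are present. The sketch below is an assumption-laden illustration, not part of slime: `DatasetEntry`, `from_dict`, and `total_completions` are hypothetical names, and the example path and `rm_type` value are placeholders. It also shows that the number of generations per dataset is the prompt count multiplied by `n_samples_per_eval_prompt`.

```python
from dataclasses import dataclass
from typing import Any, Dict

# Fields every eval.datasets entry should carry, per the list above.
REQUIRED_KEYS = ("name", "path", "rm_type", "n_samples_per_eval_prompt")


@dataclass(frozen=True)
class DatasetEntry:
    """Hypothetical typed view of one eval.datasets entry."""
    name: str
    path: str
    rm_type: str
    n_samples_per_eval_prompt: int

    @classmethod
    def from_dict(cls, entry: Dict[str, Any]) -> "DatasetEntry":
        missing = [key for key in REQUIRED_KEYS if key not in entry]
        if missing:
            raise ValueError(f"dataset entry is missing keys: {missing}")
        n = int(entry["n_samples_per_eval_prompt"])
        if n < 1:
            raise ValueError("n_samples_per_eval_prompt must be >= 1")
        # Extra keys (e.g. per-dataset inference overrides) are ignored here.
        return cls(entry["name"], entry["path"], entry["rm_type"], n)

    def total_completions(self, num_prompts: int) -> int:
        # Every prompt in the JSONL file gets n_samples_per_eval_prompt candidates.
        return num_prompts * self.n_samples_per_eval_prompt


entry = DatasetEntry.from_dict(
    {"name": "ifbench", "path": "/path/to/ifbench.jsonl",
     "rm_type": "ifbench", "n_samples_per_eval_prompt": 4}
)
print(entry.total_completions(300))  # 1200
```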

When ifbench is used, `slime/rollout/rm_hub/ifbench.py` automatically prepares the scoring environment, so no manual setup is required beyond providing the dataset path.