# Contributing to ResearchGym

We welcome contributions! This guide covers how to contribute code, tasks, and documentation.

## Getting Started
- Fork the repository
- Create a branch: `git checkout -b feature/your-feature`
- Make changes and test
- Submit a pull request
## Code Style

- Python 3.12+ with type hints
- 4-space indentation
- `snake_case` for functions, `PascalCase` for classes
- Use `utils.logging` instead of print statements
- No fallback values that hide errors
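A short sketch illustrating these conventions (the class, its methods, and the metric values are hypothetical; it uses the stdlib `logging` module here so the snippet is self-contained, where repo code would use `utils.logging`):

```python
import logging

logger = logging.getLogger(__name__)  # repo code: prefer utils.logging over print

class TaskRunner:  # PascalCase for classes
    """Hypothetical example class showing the style rules above."""

    def __init__(self, max_steps: int) -> None:
        self.max_steps = max_steps

    def run_step(self, step_index: int) -> bool:  # snake_case, type hints
        if step_index >= self.max_steps:
            # Fail loudly rather than silently substituting a fallback value
            raise ValueError(f"step {step_index} exceeds max_steps={self.max_steps}")
        logger.info("running step %d", step_index)
        return True
```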
## Contributing Tasks

Tasks are the core of ResearchGym. Each task represents an open research problem from a recent ML paper.

### Task Requirements
- **Source**: Must be from a peer-reviewed venue (ACL/ICML/ICLR/NeurIPS/etc.)
- **Objective metric**: Must have quantitative evaluation (accuracy, F1, mIoU, etc.)
- **Reproducible baselines**: Must include working baseline code with documented scores
- **No solution code**: Task should provide the problem setup, not the paper's solution
### Task Structure

```
tasks/test/<task-name>/
├── task_description.md      # Required: Problem statement
├── requirements.txt         # Required: Dependencies
├── install.sh               # Optional: Setup script
├── grading/
│   └── grade.sh             # Required: Evaluation script
├── idea_hint.txt            # Optional: Hints for agents
└── <baseline-code>/         # Baseline implementation
```
### Task Description Template

Use this template for `task_description.md`:

```markdown
## Research Goal
[1-2 paragraphs: What problem are we solving? Why does it matter?]

## Experimental Settings
- **Datasets**: [List datasets with sizes]
- **Baselines**: [List baseline methods]
- **Hardware**: [GPU requirements if any]

## Evaluation Metrics
- [Metric 1]: [Description]
- [Metric 2]: [Description]

## Data Setup
[Commands to download/prepare data]

## Baseline Results (to beat)
| Method | Metric1 | Metric2 |
|--------|---------|---------|
| Baseline1 | X.XX | X.XX |
| Your Method | -- | -- |

## Hint
[Optional: Guidance without giving away the solution]
```

### Grading Script

`grading/grade.sh` must:

- Output JSON to stdout: `{"metric_name": value, ...}`
- Return exit code 0 on success
- Work within the task workspace
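To illustrate the contract, here is a hypothetical minimal evaluation script of the kind a `grade.sh` wrapper might call; the metric name and scores are placeholders, not from any real task:

```python
#!/usr/bin/env python
"""Hypothetical evaluation script: JSON metrics on stdout, exit 0 on success."""
import json
import sys

def evaluate() -> dict[str, float]:
    # Placeholder: a real script would load predictions and gold labels here.
    correct, total = 87, 100
    return {"accuracy": correct / total}

def main() -> int:
    try:
        metrics = evaluate()
    except Exception as exc:
        print(f"evaluation failed: {exc}", file=sys.stderr)
        return 1  # non-zero exit signals failure to the grading wrapper
    json.dump(metrics, sys.stdout)  # JSON goes to stdout, as required
    print()
    return 0

if __name__ == "__main__":
    sys.exit(main())
```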
Example:

```bash
#!/bin/bash
cd "$(dirname "$0")/.."
python evaluate.py --output-format json
```

### Submitting a Task

- Create task directory following the structure above
- Verify baselines run and produce expected scores
- Test with RGAgent for ~1 hour to ensure it's tractable
- Submit PR with:
- Task files
- Paper reference (title, authors, venue, year)
- Why this task is suitable for ResearchGym
## Contributing Agents

- Create adapter in `agents/`, following `rg_agent_adapter.py`
- Implement the required methods:

  ```python
  def prepare_workspace(self, task_dir: Path) -> None
  def run(self, cfg, agent_root, dry_run) -> Observation
  ```

- Add CLI arguments in `run_agent.py`
- Add a README documenting tools and configuration
- Submit PR with example run demonstrating it works
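A skeletal adapter satisfying this interface might look like the following. Everything beyond the two required method signatures is an assumption for illustration: the class name is made up, and the `Observation` stand-in should be replaced by the real type from `rg_agent_adapter.py`.

```python
from pathlib import Path

class Observation:  # stand-in for ResearchGym's real Observation type
    def __init__(self, summary: str) -> None:
        self.summary = summary

class MyAgentAdapter:  # hypothetical adapter; mirror rg_agent_adapter.py in practice
    def prepare_workspace(self, task_dir: Path) -> None:
        # Copy task files, install dependencies, etc.
        (task_dir / "workspace").mkdir(parents=True, exist_ok=True)

    def run(self, cfg: dict, agent_root: Path, dry_run: bool) -> Observation:
        if dry_run:
            return Observation("dry run: no agent launched")
        # Launch the agent here and collect its output.
        return Observation(f"ran agent from {agent_root} with cfg keys {sorted(cfg)}")
```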
## Pull Request Checklist

- Ensure code passes linting
- Update documentation if needed
- Add yourself to contributors (optional)
- Request review
## Reporting Issues

- **Bugs**: Include reproduction steps, error messages, environment info
- **Tasks**: Propose via issue first before implementing
- **Features**: Describe use case and proposed solution
## Questions

Open a GitHub issue or discussion.