# Contributing to ResearchGym

We welcome contributions! This guide covers how to contribute code, tasks, and documentation.

## Getting Started
- Fork the repository
- Create a branch: `git checkout -b feature/your-feature`
- Make changes and test
- Submit a pull request
## Code Style

- Python 3.12+ with type hints
- 4-space indentation
- `snake_case` for functions, `PascalCase` for classes
- Use `utils.logging` instead of print statements
- No fallback values that hide errors
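A short sketch illustrating these conventions (the class, its methods, and the metric values are hypothetical; it uses the stdlib `logging` module here so the snippet is self-contained, where repo code would use `utils.logging`):

```python
import logging

logger = logging.getLogger(__name__)  # repo code: prefer utils.logging over print

class TaskRunner:  # PascalCase for classes
    """Hypothetical example class showing the style rules above."""

    def __init__(self, max_steps: int) -> None:
        self.max_steps = max_steps

    def run_step(self, step_index: int) -> bool:  # snake_case, type hints
        if step_index >= self.max_steps:
            # Fail loudly rather than silently substituting a fallback value
            raise ValueError(f"step {step_index} exceeds max_steps={self.max_steps}")
        logger.info("running step %d", step_index)
        return True
```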
## Contributing Tasks

Tasks are the core of ResearchGym. Each task represents an open research problem from a recent ML paper.

### Task Requirements
- **Source**: Must be from a peer-reviewed venue (ACL/ICML/ICLR/NeurIPS/etc.)
- **Objective metric**: Must have quantitative evaluation (accuracy, F1, mIoU, etc.)
- **Reproducible baselines**: Must include working baseline code with documented scores
- **No solution code**: Task should provide the problem setup, not the paper's solution
### Task Structure

```
tasks/test/<task-name>/
├── task_description.md      # Required: Problem statement
├── requirements.txt         # Required: Dependencies
├── install.sh               # Optional: Setup script
├── grading/
│   └── grade.sh             # Required: Evaluation script
├── idea_hint.txt            # Optional: Hints for agents
└── <baseline-code>/         # Baseline implementation
```
### Task Description Template

Use this template for `task_description.md`:

```markdown
## Research Goal
[1-2 paragraphs: What problem are we solving? Why does it matter?]

## Experimental Settings
- **Datasets**: [List datasets with sizes]
- **Baselines**: [List baseline methods]
- **Hardware**: [GPU requirements if any]

## Evaluation Metrics
- [Metric 1]: [Description]
- [Metric 2]: [Description]

## Data Setup
[Commands to download/prepare data]

## Baseline Results (to beat)
| Method | Metric1 | Metric2 |
|--------|---------|---------|
| Baseline1 | X.XX | X.XX |
| Your Method | -- | -- |

## Hint
[Optional: Guidance without giving away the solution]
```

### Grading Script

`grading/grade.sh` must:

- Output JSON to stdout: `{"metric_name": value, ...}`
- Return exit code 0 on success
- Work within the task workspace
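To illustrate the contract, here is a hypothetical minimal evaluation script of the kind a `grade.sh` wrapper might call; the metric name and scores are placeholders, not from any real task:

```python
#!/usr/bin/env python
"""Hypothetical evaluation script: JSON metrics on stdout, exit 0 on success."""
import json
import sys

def evaluate() -> dict[str, float]:
    # Placeholder: a real script would load predictions and gold labels here.
    correct, total = 87, 100
    return {"accuracy": correct / total}

def main() -> int:
    try:
        metrics = evaluate()
    except Exception as exc:
        print(f"evaluation failed: {exc}", file=sys.stderr)
        return 1  # non-zero exit signals failure to the grading wrapper
    json.dump(metrics, sys.stdout)  # JSON goes to stdout, as required
    print()
    return 0

if __name__ == "__main__":
    sys.exit(main())
```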
Example:

```bash
#!/bin/bash
cd "$(dirname "$0")/.."
python evaluate.py --output-format json
```

### Submitting a Task

- Create task directory following the structure above
- Verify baselines run and produce expected scores
- Test with RGAgent for ~1 hour to ensure it's tractable
- Submit PR with:
- Task files
- Paper reference (title, authors, venue, year)
- Why this task is suitable for ResearchGym
## Contributing Agents

- Create adapter in `agents/`, following `rg_agent_adapter.py`
- Implement the required methods:

  ```python
  def prepare_workspace(self, task_dir: Path) -> None
  def run(self, cfg, agent_root, dry_run) -> Observation
  ```

- Add CLI arguments in `run_agent.py`
- Add a README documenting tools and configuration
- Submit PR with example run demonstrating it works
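A skeletal adapter satisfying this interface might look like the following. Everything beyond the two required method signatures is an assumption for illustration: the class name is made up, and the `Observation` stand-in should be replaced by the real type from `rg_agent_adapter.py`.

```python
from pathlib import Path

class Observation:  # stand-in for ResearchGym's real Observation type
    def __init__(self, summary: str) -> None:
        self.summary = summary

class MyAgentAdapter:  # hypothetical adapter; mirror rg_agent_adapter.py in practice
    def prepare_workspace(self, task_dir: Path) -> None:
        # Copy task files, install dependencies, etc.
        (task_dir / "workspace").mkdir(parents=True, exist_ok=True)

    def run(self, cfg: dict, agent_root: Path, dry_run: bool) -> Observation:
        if dry_run:
            return Observation("dry run: no agent launched")
        # Launch the agent here and collect its output.
        return Observation(f"ran agent from {agent_root} with cfg keys {sorted(cfg)}")
```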
## Pull Request Checklist

- Ensure code passes linting
- Update documentation if needed
- Add yourself to contributors (optional)
- Request review
## Reporting Issues

- **Bugs**: Include reproduction steps, error messages, environment info
- **Tasks**: Propose via issue first before implementing
- **Features**: Describe use case and proposed solution
## Questions

Open a GitHub issue or discussion.