32 changes: 32 additions & 0 deletions .github/workflows/python-package.yml
@@ -0,0 +1,32 @@
name: Python package

on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]

jobs:
  build:

    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        python-version: ["3.8", "3.9", "3.10", "3.11", "3.12"]

    steps:
    - uses: actions/checkout@v4
    - name: Set up Python ${{ matrix.python-version }}
      uses: actions/setup-python@v5
      with:
        python-version: ${{ matrix.python-version }}
    - name: Install dependencies
      run: |
        python -m pip install --upgrade pip
        pip install pytest pytest-asyncio
        if [ -f requirements.txt ]; then pip install -r requirements.txt; fi
    - name: Test with pytest
      run: |
        export PYTHONPATH=$PYTHONPATH:$(pwd)/imobench-pylib/src
        pytest imobench-pylib/tests/test_loader.py
17 changes: 13 additions & 4 deletions README.md
@@ -1,14 +1,21 @@
# Superhuman Reasoning

This repository hosts projects and datasets created by Google DeepMind's
Superhuman Reasoning team.
Superhuman Reasoning team, led by Thang Luong.

## Projects
### AlphaGeometry
Nature [paper](https://www.nature.com/articles/s41586-023-06747-5).
See https://github.com/google-deepmind/alphageometry.

### AlphaGeometry2
[2024 IMO-silver achievement](https://deepmind.google/blog/ai-solves-imo-problems-at-silver-medal-level/).
See https://github.com/google-deepmind/alphageometry2.

### [IMO Bench](imobench/README.md)
A suite of advanced benchmarks designed to evaluate robust mathematical
reasoning in AI. Following our
[2025 IMO gold medal achievement](https://deepmind.google/discover/blog/advanced-version-of-gemini-with-deep-think-officially-achieves-gold-medal-standard-at-the-international-mathematical-olympiad/),
[2025 IMO-gold achievement](https://deepmind.google/discover/blog/advanced-version-of-gemini-with-deep-think-officially-achieves-gold-medal-standard-at-the-international-mathematical-olympiad/),
this release includes:

* *IMO-AnswerBench*: 400 challenging short-answer problems.
@@ -18,9 +25,11 @@ reasoning in AI. Following our
* *IMO-GradingBench*: A dataset of 1000 human gradings to advance automatic
evaluation.



## [Aletheia](aletheia/README.md)
A reasoning agent powered by Gemini Deep Think that can iteratively generate,
verify, and revise solutions.
A math research agent, powered by Gemini Deep Think, that can iteratively
generate, verify, and revise solutions. See [paper](aletheia/Aletheia.pdf).

This release includes prompts and outputs from Aletheia on research-level math
problems.
Binary file added aletheia/Aletheia.pdf
Binary file not shown.
137 changes: 137 additions & 0 deletions imobench-pylib/.gitignore
@@ -0,0 +1,137 @@
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
pip-wheel-metadata/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
.python-version

# pipenv
Pipfile.lock

# PEP 582
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# IDEs
.vscode/
.idea/
*.swp
*.swo
*~

# macOS
.DS_Store

# Project specific
*.csv.gz
*.csv.bz2
178 changes: 178 additions & 0 deletions imobench-pylib/CONTRIBUTING.md
@@ -0,0 +1,178 @@
# Contributing to IMO Bench Python Library

Thank you for your interest in contributing to the IMO Bench Python library!

## Repository Structure

This library (`imobench-pylib`) is part of the larger [Superhuman Reasoning](https://github.com/google-deepmind/superhuman) repository by Google DeepMind.

## Types of Contributions

### Bug Reports

If you find a bug, please open an issue with:
- Clear description of the problem
- Minimal reproducible example
- Expected vs actual behavior
- Python version and environment details
- Relevant error messages and stack traces

### Feature Requests

For new features:
- Describe the use case
- Explain why it would benefit users
- Provide example API usage if possible

### Code Contributions

1. **Fork and Clone**
   ```bash
   git clone https://github.com/YOUR-USERNAME/superhuman.git
   cd superhuman/imobench-pylib
   ```

2. **Set Up Development Environment**
   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   pip install -e ".[dev]"
   ```

3. **Create a Branch**
   ```bash
   git checkout -b feature/your-feature-name
   ```

4. **Make Changes**
   - Follow existing code style
   - Use type hints
   - Add docstrings
   - Keep functions focused and testable

5. **Write Tests**
   ```bash
   # Add tests in tests/
   pytest tests/
   ```

6. **Check Code Quality**
   ```bash
   # Format code
   black src/ tests/

   # Type checking
   mypy src/

   # Linting
   ruff check src/ tests/
   ```

7. **Run All Tests**
   ```bash
   pytest tests/ -v --cov=imobench
   ```

8. **Commit and Push**
   ```bash
   git add .
   git commit -m "Add: brief description of changes"
   git push origin feature/your-feature-name
   ```

9. **Open a Pull Request**
   - Describe your changes clearly
   - Link any related issues
   - Ensure CI passes

## Development Guidelines

### Code Style

- Follow PEP 8
- Use type hints for all functions
- Maximum line length: 100 characters
- Use descriptive variable names

### Testing

- Write tests for new functionality
- Maintain or improve test coverage
- Test edge cases and error conditions
- Use pytest fixtures for common setup
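
The testing guidelines above can be illustrated with a minimal sketch. `parse_scores` is a hypothetical helper invented for this example and is not part of `imobench`. The tests use plain `assert` statements so the snippet runs without extra dependencies; pytest collects any function named `test_*`.

```python
from typing import List


def parse_scores(raw: str) -> List[int]:
    """Hypothetical helper: parse a comma-separated string of integer scores."""
    if not raw.strip():
        return []
    return [int(part) for part in raw.split(",")]


def test_parse_scores_typical() -> None:
    # Typical input: surrounding whitespace is tolerated by int().
    assert parse_scores("7, 0, 3") == [7, 0, 3]


def test_parse_scores_empty_input() -> None:
    # Edge case: blank input yields an empty list rather than an error.
    assert parse_scores("   ") == []


def test_parse_scores_rejects_non_numeric() -> None:
    # Error condition: int() raises ValueError on non-numeric text.
    try:
        parse_scores("7, seven")
    except ValueError:
        return
    raise AssertionError("expected ValueError")
```

In an actual test module, `pytest.raises(ValueError)` would likely replace the `try`/`except` block, and shared inputs would move into a fixture.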

### Documentation

- Add docstrings to all public functions/classes
- Update README if adding new features
- Add examples for new functionality
- Keep docstrings clear and concise

### Type Hints

```python
from typing import Optional, List

def load_data(
    category: Optional[str] = None,
    validate: bool = True
) -> List[Problem]:
    """Load problems with optional filtering.

    Args:
        category: Filter by category
        validate: Enable validation

    Returns:
        List of Problem objects
    """
    pass
```
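
For a version of this pattern that runs on its own, here is a minimal sketch. The `Problem` dataclass, its `problem_id` and `category` fields, and the `filter_problems` function are assumptions made for illustration; the library's real types and loader may look different.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Problem:
    """Hypothetical stand-in for the library's problem type."""

    problem_id: str
    category: str


def filter_problems(
    problems: Iterable[Problem],
    category: Optional[str] = None,
    validate: bool = True,
) -> List[Problem]:
    """Return problems, optionally restricted to one category.

    Args:
        problems: Problems to filter.
        category: Keep only this category; None keeps everything.
        validate: If True, reject problems with an empty problem_id.

    Returns:
        List of matching Problem objects, in input order.

    Raises:
        ValueError: If validation is enabled and a problem_id is empty.
    """
    selected: List[Problem] = []
    for problem in problems:
        if validate and not problem.problem_id:
            raise ValueError("problem_id must be non-empty")
        if category is None or problem.category == category:
            selected.append(problem)
    return selected


sample = [Problem("p1", "Algebra"), Problem("p2", "Geometry")]
print([p.problem_id for p in filter_problems(sample, category="Algebra")])  # prints ['p1']
```

The signature, the `Args`/`Returns` docstring layout, and the `Optional` default mirror the template above.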

## Project Structure

```
imobench-pylib/
├── src/imobench/ # Source code
│ ├── __init__.py # Public API
│ ├── types.py # Type definitions
│ ├── loader.py # Data loading
│ ├── validators.py # Validation logic
│ └── exceptions.py # Custom exceptions
├── tests/ # Test suite
├── examples/ # Usage examples
├── docs/ # Documentation
└── setup.py # Package configuration
```

## Commit Message Guidelines

Use clear, descriptive commit messages:

- `Add: new feature or functionality`
- `Fix: bug fix`
- `Update: modify existing functionality`
- `Refactor: code restructuring`
- `Docs: documentation changes`
- `Test: add or modify tests`
- `Chore: maintenance tasks`

Example:
```
Add: lazy loading support for gradingbench

- Implement iterator-based loading
- Add lazy parameter to load_gradingbench()
- Update tests and documentation
```
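
The prefix convention above could be checked automatically. The following is only a sketch: `is_valid_subject` is a hypothetical helper, not something this repository provides, and the rule it enforces (an allowed prefix, a colon, one space, then text) is an assumption drawn from the examples in this section.

```python
import re

# Prefixes copied from the commit message guidelines above.
ALLOWED_PREFIXES = ("Add", "Fix", "Update", "Refactor", "Docs", "Test", "Chore")

# Subject line: "<Prefix>: <non-empty description>".
_SUBJECT_RE = re.compile(r"^(?:%s): \S.*$" % "|".join(ALLOWED_PREFIXES))


def is_valid_subject(message: str) -> bool:
    """Return True if the first line of `message` follows the prefix convention."""
    lines = message.splitlines()
    return bool(lines) and _SUBJECT_RE.match(lines[0]) is not None
```

A check like this could run in a local `commit-msg` hook, though whether to enforce the convention mechanically is a project decision.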

## Questions?

For questions about:
- **Library usage**: Open a GitHub issue
- **Dataset content**: See the main repository
- **Research paper**: Check the IMO Bench website

## License

By contributing, you agree that your contributions will be licensed under the Apache License 2.0.