google-deepmind · Ashutosh0x · Feb 6, 2026 · Feb 11, 2026 · Feb 12, 2026 · Feb 12, 2026
diff --git a/.github/workflows/python-package.yml b/.github/workflows/python-package.yml
@@ -0,0 +1,32 @@
+name: Python package
+
+on:
+  push:
+    branches: [ main ]
+  pull_request:
+    branches: [ main ]
+
+jobs:
+  build:
+
+    runs-on: ubuntu-latest
+    strategy:
+      fail-fast: false
+      matrix:
+        python-version: ["3.8", "3.9", "3.10", "3.11", "3.12"]
+
+    steps:
+    - uses: actions/checkout@v4
+    - name: Set up Python ${{ matrix.python-version }}
+      uses: actions/setup-python@v5
+      with:
+        python-version: ${{ matrix.python-version }}
+    - name: Install dependencies
+      run: |
+        python -m pip install --upgrade pip
+        pip install pytest pytest-asyncio
+        if [ -f requirements.txt ]; then pip install -r requirements.txt; fi
+    - name: Test with pytest
+      run: |
+        export PYTHONPATH=$PYTHONPATH:$(pwd)/imobench-pylib/src
+        pytest imobench-pylib/tests/test_loader.py
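The test step above does not install the package; it makes `imobench-pylib/src` importable by appending it to `PYTHONPATH`. A rough Python equivalent, for example in a local test bootstrap, might look like the sketch below. The helper name `add_src_to_path` is hypothetical and not part of this PR, and unlike the workflow it prepends the directory so the checkout takes precedence over any installed copy.

```python
import sys
from pathlib import Path


def add_src_to_path(repo_root: Path) -> Path:
    """Make <repo_root>/imobench-pylib/src importable and return that path.

    Hypothetical helper mirroring the PYTHONPATH export in the workflow.
    """
    src = (repo_root / "imobench-pylib" / "src").resolve()
    if str(src) not in sys.path:
        # Prepend (the workflow appends) so the local checkout wins
        # over any installed copy of the package.
        sys.path.insert(0, str(src))
    return src
```

Calling `add_src_to_path(Path.cwd())` from the repository root before importing the package would give roughly the same import behavior as the CI step.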
diff --git a/README.md b/README.md
@@ -1,14 +1,21 @@
 # Superhuman Reasoning
 
 This repository hosts projects and datasets created by Google DeepMind's
-Superhuman Reasoning team.
+Superhuman Reasoning team, led by Thang Luong.
 
 ## Projects
+### AlphaGeometry
+Nature [paper](https://www.nature.com/articles/s41586-023-06747-5).
+See https://github.com/google-deepmind/alphageometry.
+
+### AlphaGeometry2
+[2024 IMO-silver achievement](https://deepmind.google/blog/ai-solves-imo-problems-at-silver-medal-level/).
+See https://github.com/google-deepmind/alphageometry2.
 
 ### [IMO Bench](imobench/README.md)
 A suite of advanced benchmarks designed to evaluate robust mathematical
 reasoning in AI. Following our
-[2025 IMO gold medal achievement](https://deepmind.google/discover/blog/advanced-version-of-gemini-with-deep-think-officially-achieves-gold-medal-standard-at-the-international-mathematical-olympiad/),
+[2025 IMO-gold achievement](https://deepmind.google/discover/blog/advanced-version-of-gemini-with-deep-think-officially-achieves-gold-medal-standard-at-the-international-mathematical-olympiad/),
  this release includes:
 
 * *IMO-AnswerBench*: 400 challenging short-answer problems.
@@ -18,9 +25,11 @@ reasoning in AI. Following our
 * *IMO-GradingBench*: A dataset of 1000 human gradings to advance automatic
 evaluation.
 
+
+
 ## [Aletheia](aletheia/README.md)
-A reasoning agent powered by Gemini Deep Think that can iteratively generate,
-verify, and revise solutions.
+A math research agent, powered by Gemini Deep Think, that can iteratively
+generate, verify, and revise solutions. See [paper](aletheia/Aletheia.pdf).
 
 This release includes prompts and outputs from Aletheia on research level math
 problems.

diff --git a/aletheia/Aletheia.pdf b/aletheia/Aletheia.pdf
diff --git a/imobench-pylib/.gitignore b/imobench-pylib/.gitignore
@@ -0,0 +1,137 @@
+# Byte-compiled / optimized / DLL files
+__pycache__/
+*.py[cod]
+*$py.class
+
+# C extensions
+*.so
+
+# Distribution / packaging
+.Python
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+wheels/
+pip-wheel-metadata/
+share/python-wheels/
+*.egg-info/
+.installed.cfg
+*.egg
+MANIFEST
+
+# PyInstaller
+*.manifest
+*.spec
+
+# Installer logs
+pip-log.txt
+pip-delete-this-directory.txt
+
+# Unit test / coverage reports
+htmlcov/
+.tox/
+.nox/
+.coverage
+.coverage.*
+.cache
+nosetests.xml
+coverage.xml
+*.cover
+*.py,cover
+.hypothesis/
+.pytest_cache/
+
+# Translations
+*.mo
+*.pot
+
+# Django stuff:
+*.log
+local_settings.py
+db.sqlite3
+db.sqlite3-journal
+
+# Flask stuff:
+instance/
+.webassets-cache
+
+# Scrapy stuff:
+.scrapy
+
+# Sphinx documentation
+docs/_build/
+
+# PyBuilder
+target/
+
+# Jupyter Notebook
+.ipynb_checkpoints
+
+# IPython
+profile_default/
+ipython_config.py
+
+# pyenv
+.python-version
+
+# pipenv
+Pipfile.lock
+
+# PEP 582
+__pypackages__/
+
+# Celery stuff
+celerybeat-schedule
+celerybeat.pid
+
+# SageMath parsed files
+*.sage.py
+
+# Environments
+.env
+.venv
+env/
+venv/
+ENV/
+env.bak/
+venv.bak/
+
+# Spyder project settings
+.spyderproject
+.spyproject
+
+# Rope project settings
+.ropeproject
+
+# mkdocs documentation
+/site
+
+# mypy
+.mypy_cache/
+.dmypy.json
+dmypy.json
+
+# Pyre type checker
+.pyre/
+
+# IDEs
+.vscode/
+.idea/
+*.swp
+*.swo
+*~
+
+# macOS
+.DS_Store
+
+# Project specific
+*.csv.gz
+*.csv.bz2
diff --git a/imobench-pylib/CONTRIBUTING.md b/imobench-pylib/CONTRIBUTING.md
@@ -0,0 +1,178 @@
+# Contributing to IMO Bench Python Library
+
+Thank you for your interest in contributing to the IMO Bench Python library!
+
+## Repository Structure
+
+This library (`imobench-pylib`) is part of the larger [Superhuman Reasoning](https://github.com/google-deepmind/superhuman) repository by Google DeepMind.
+
+## Types of Contributions
+
+### Bug Reports
+
+If you find a bug, please open an issue with:
+- Clear description of the problem
+- Minimal reproducible example
+- Expected vs actual behavior
+- Python version and environment details
+- Relevant error messages and stack traces
+
+### Feature Requests
+
+For new features:
+- Describe the use case
+- Explain why it would benefit users
+- Provide example API usage if possible
+
+### Code Contributions
+
+1. **Fork and Clone**
+   ```bash
+   git clone https://github.com/YOUR-USERNAME/superhuman.git
+   cd superhuman/imobench-pylib
+   ```
+
+2. **Set Up Development Environment**
+   ```bash
+   python -m venv venv
+   source venv/bin/activate  # On Windows: venv\Scripts\activate
+   pip install -e ".[dev]"
+   ```
+
+3. **Create a Branch**
+   ```bash
+   git checkout -b feature/your-feature-name
+   ```
+
+4. **Make Changes**
+   - Follow existing code style
+   - Use type hints
+   - Add docstrings
+   - Keep functions focused and testable
+
+5. **Write Tests**
+   ```bash
+   # Add tests in tests/
+   pytest tests/
+   ```
+
+6. **Check Code Quality**
+   ```bash
+   # Format code
+   black src/ tests/
+
+   # Type checking
+   mypy src/
+
+   # Linting
+   ruff check src/ tests/
+   ```
+
+7. **Run All Tests**
+   ```bash
+   pytest tests/ -v --cov=imobench
+   ```
+
+8. **Commit and Push**
+   ```bash
+   git add .
+   git commit -m "Add: brief description of changes"
+   git push origin feature/your-feature-name
+   ```
+
+9. **Open a Pull Request**
+   - Describe your changes clearly
+   - Link any related issues
+   - Ensure CI passes
+
+## Development Guidelines
+
+### Code Style
+
+- Follow PEP 8
+- Use type hints for all functions
+- Maximum line length: 100 characters
+- Use descriptive variable names
+
+### Testing
+
+- Write tests for new functionality
+- Maintain or improve test coverage
+- Test edge cases and error conditions
+- Use pytest fixtures for common setup
+
+### Documentation
+
+- Add docstrings to all public functions/classes
+- Update README if adding new features
+- Add examples for new functionality
+- Keep docstrings clear and concise
+
+### Type Hints
+
+```python
+from typing import Optional, List
+
+def load_data(
+    category: Optional[str] = None,
+    validate: bool = True
+) -> List["Problem"]:  # forward reference to the Problem type
+    """Load problems with optional filtering.
+
+    Args:
+        category: Filter by category
+        validate: Enable validation
+
+    Returns:
+        List of Problem objects
+    """
+    pass
+```
+
+## Project Structure
+
+```
+imobench-pylib/
+├── src/imobench/          # Source code
+│   ├── __init__.py        # Public API
+│   ├── types.py           # Type definitions
+│   ├── loader.py          # Data loading
+│   ├── validators.py      # Validation logic
+│   └── exceptions.py      # Custom exceptions
+├── tests/                 # Test suite
+├── examples/              # Usage examples
+├── docs/                  # Documentation
+└── setup.py               # Package configuration
+```
+
+## Commit Message Guidelines
+
+Use clear, descriptive commit messages:
+
+- `Add: new feature or functionality`
+- `Fix: bug fix`
+- `Update: modify existing functionality`
+- `Refactor: code restructuring`
+- `Docs: documentation changes`
+- `Test: add or modify tests`
+- `Chore: maintenance tasks`
+
+Example:
+```
+Add: lazy loading support for gradingbench
+
+- Implement iterator-based loading
+- Add lazy parameter to load_gradingbench()
+- Update tests and documentation
+```
+
+## Questions?
+
+For questions about:
+- **Library usage**: Open a GitHub issue
+- **Dataset content**: See the main repository
+- **Research paper**: Check the IMO Bench website
+
+## License
+
+By contributing, you agree that your contributions will be licensed under the Apache License 2.0.
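The commit message convention in the CONTRIBUTING.md diff above (a prefix such as `Add:` or `Fix:`, then a short description) is simple enough to check mechanically. The sketch below is one way this could be done; `is_valid_subject` and `ALLOWED_PREFIXES` are hypothetical names, not part of the PR, and the rule that exactly one space follows the colon is an assumption read off the listed examples.

```python
import re

# Prefixes copied from the "Commit Message Guidelines" list.
ALLOWED_PREFIXES = ("Add", "Fix", "Update", "Refactor", "Docs", "Test", "Chore")

# Subject line: "<Prefix>: <non-empty description>".
_SUBJECT_RE = re.compile(r"^(%s): \S.*$" % "|".join(ALLOWED_PREFIXES))


def is_valid_subject(message: str) -> bool:
    """Return True if the first line of `message` follows the convention."""
    lines = message.splitlines()
    return bool(lines) and _SUBJECT_RE.match(lines[0]) is not None
```

Only the first line is inspected, so a multi-line message like the `Add: lazy loading support for gradingbench` example passes regardless of its bullet-list body.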