@Mtrya commented Jan 19, 2026

Status

Draft PR for ongoing development on the dev branch.

Overview

This branch implements the foundational infrastructure for AstroReason-Bench, a
benchmark suite for evaluating LLM agents on space mission planning problems.

Current Changes (ready for review/testing)

  • ✅ Completed: SPOT5 benchmark (CNES 2001 satellite photography scheduling)
    • Dataset with problem instances
    • Standalone verifier for validation + scoring
  • ✅ Completed: SatNet benchmark (2021 NASA/JPL DSN scheduling)
    • Dataset with problems and metadata
    • Standalone verifier for validation + scoring
    • Reference agentic baseline
  • ✅ Refactored: Repository structure
    • Migrated to standalone benchmarks/ model (no inter-dependencies)
    • Deprecated the abstraction layers (toolkits/, engines/) in favor of using
      established libraries directly
  • 🛠️ Environment: Migrated from uv → pixi (enables tudatpy integration)
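To illustrate what "validation + scoring" means for a SPOT5-style problem, here is a minimal sketch under a simplified, hypothetical model. Each candidate photograph has a weight and a set of allowed camera values (0 means "not selected"). Constraints forbid certain value combinations over two or three photographs, and an optional capacity bounds the total size of the selected photographs. A solution is valid only if it breaks no rule, and its score is the summed weight of the selected photographs. The `Photo`, `Constraint`, and `verify` names are illustrative only. They do not reflect the benchmark's actual data format or verifier API.

```python
from dataclasses import dataclass


# Hypothetical, simplified SPOT5-style model; NOT the benchmark's real format.
@dataclass
class Photo:
    weight: int
    domain: set  # allowed camera values; 0 always means "not selected"


@dataclass
class Constraint:
    photos: tuple     # ids of the 2 or 3 photos the constraint covers
    forbidden: set    # value tuples that may not occur together


def verify(photos, constraints, assignment, capacity=None, sizes=None):
    """Return (is_valid, score) for an assignment {photo_id: value}.

    Photos missing from the assignment are treated as unselected (value 0).
    `sizes` must be provided whenever `capacity` is set.
    """
    # 1. Every assigned value must be 0 or inside the photo's domain.
    for pid, value in assignment.items():
        if pid not in photos:
            return False, 0
        if value != 0 and value not in photos[pid].domain:
            return False, 0

    # 2. No constraint may see one of its forbidden value combinations.
    for c in constraints:
        values = tuple(assignment.get(pid, 0) for pid in c.photos)
        if values in c.forbidden:
            return False, 0

    selected = [pid for pid, v in assignment.items() if v != 0]

    # 3. Optional capacity check on the total size of selected photos.
    if capacity is not None:
        if sum(sizes[pid] for pid in selected) > capacity:
            return False, 0

    # Score = total weight of the selected photos.
    return True, sum(photos[pid].weight for pid in selected)
```

For example, with a hypothetical rule that photos 1 and 2 may not use the same camera:

```python
photos = {1: Photo(10, {1, 2, 3}), 2: Photo(5, {1, 2, 3}), 3: Photo(8, {2})}
constraints = [Constraint((1, 2), {(1, 1), (2, 2), (3, 3)})]
verify(photos, constraints, {1: 1, 2: 2, 3: 2})  # (True, 23)
verify(photos, constraints, {1: 1, 2: 1})        # (False, 0): same camera
```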

Roadmap Context

Phase 1 (current): Legacy benchmarks

  • ✅ spot5 (satellite photography scheduling)
  • ✅ satnet (NASA/JPL DSN scheduling)
  • 🔲 aeosbench (BUAA Earth observation)

Future phases: LEO constellations → Deep space → Rocket trajectories (if time permits)

PR Purpose

This is a draft PR to track development progress. The main branch remains a
stable snapshot.
