Claude/headless phase4 019c e ru putfwx pma3 db m5f th #2
Open
Conversation
This document outlines the work required to create a headless version of Julius that can be used with OpenAI Gymnasium for reinforcement learning research.

Key findings:
- Julius has clean separation between game logic and rendering
- Game state is centralized in the city_data struct
- Existing APIs can be wrapped for RL actions
- Estimated 7-12 weeks for MVP implementation

The scope covers:
- Technical architecture analysis
- Five implementation phases (headless engine, observation API, action API, Python bindings, testing)
- Example API designs for C and Python
- Challenges and solutions
- Timeline and effort estimates

Military scenarios excluded as requested.
Add HEADLESS_BUILD option to CMakeLists.txt to build Julius without SDL2/SDL2_mixer dependencies. This is the first step toward creating a gymnasium-compatible interface.

Changes:
- Added HEADLESS_BUILD option and conditional SDL2 dependency
- Created src/platform/headless.c: minimal main loop without rendering
- Created src/platform/headless_stubs.c: stub implementations for platform/screen and sound functions
- Modified src/platform/arguments.c: use standard C string functions instead of SDL equivalents when in headless mode
- Modified src/platform/log.c: use printf/fprintf instead of SDL logging when in headless mode

The headless build successfully compiles and creates a working executable that can initialize the game engine without graphics or sound.

Build with: cmake -DHEADLESS_BUILD=ON ..

Part of Phase 1 from HEADLESS_GYMNASIUM_SCOPE.md
Add comprehensive observation API to extract game state for reinforcement learning. This provides structured access to all observable game metrics without requiring direct access to internal city_data structures.

Changes:
- Created src/gymnasium/observation.h: defines the gymnasium_observation_t structure containing all observable game state (ratings, finance, population, labor, resources, buildings, time, victory)
- Created src/gymnasium/observation.c: implements state extraction using existing public city_* APIs. Extracts 40+ different metrics including:
  * Ratings (culture, prosperity, peace, favor)
  * Finance (treasury, tax rate, income/expenses)
  * Population (total, working age, sentiment)
  * Labor (workers, unemployment, wages)
  * Resources (food stocks, types, supply months)
  * Buildings (aggregated counts by category)
  * Migration (newcomers)
  * Culture coverage (entertainment, education, health)
  * Time (year, month)
  * Victory goals and status
- Updated CMakeLists.txt: added GYMNASIUM_FILES group to the build
- Created test/gymnasium/test_observation.c: unit tests for the observation API, including:
  * Structure clearing
  * NULL pointer handling
  * Basic observation extraction
  * Range validation for all fields
- Updated test/CMakeLists.txt: added test_observation executable and test registration

All tests pass (3/3). The observation API successfully extracts game state even without Caesar III data files loaded.

Note: some fields are set to 0 where public APIs don't exist:
- immigration/emigration amounts (internal to city_data)
- housing capacity (no public accessor)
- food consumed/produced last month (not exposed)
- average religion coverage (per-god only)

Part of Phase 2 from HEADLESS_GYMNASIUM_SCOPE.md
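Since Phase 4 wraps this struct from Python via ctypes, it may help to see how the mapping could look on that side. A minimal sketch, with illustrative field names and types only; the real layout would have to match src/gymnasium/observation.h exactly:

```python
# Sketch of a ctypes mirror of gymnasium_observation_t.
# Field names, types, and ordering are illustrative assumptions.
import ctypes

class JuliusObservation(ctypes.Structure):
    _fields_ = [
        # Ratings
        ("culture", ctypes.c_int32),
        ("prosperity", ctypes.c_int32),
        ("peace", ctypes.c_int32),
        ("favor", ctypes.c_int32),
        # Finance
        ("treasury", ctypes.c_int32),
        ("tax_rate", ctypes.c_int32),
        # Population and labor
        ("population", ctypes.c_int32),
        ("unemployment_pct", ctypes.c_int32),
        # Time
        ("year", ctypes.c_int32),
        ("month", ctypes.c_int32),
    ]

def observation_to_dict(obs: JuliusObservation) -> dict:
    """Convert the C struct into a plain dict, suitable for a gymnasium Dict space."""
    return {name: getattr(obs, name) for name, _ in obs._fields_}
```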
Add comprehensive action API to control the game programmatically for reinforcement learning. This provides 55 different actions organized into administrative, building, and destruction categories.

Changes:
- Created src/gymnasium/action.h: defines action types, structures, and result codes, including:
  * 55 action types (5 administrative, 49 building, 1 demolition)
  * gymnasium_action_t structure for specifying actions
  * gymnasium_action_result_t for execution feedback
  * Error codes for detailed failure reasons
- Created src/gymnasium/action.c: implements action execution and validation. Features:
  * Action type to building type mapping
  * Administrative actions (adjust tax/wages)
  * Building construction via building_construction_place_building()
  * Land clearing via building_construction_clear_land()
  * Comprehensive validation (coordinates, action type, etc.)
  * Human-readable action names for debugging
- Updated CMakeLists.txt: added action.c to GYMNASIUM_FILES
- Created test/gymnasium/test_action.c: comprehensive unit tests including:
  * Action validation (valid/invalid types, coordinates)
  * Action name mapping
  * Administrative action execution (tax, wages)
  * Building action execution
  * Clear land action execution
  * Invalid execution handling
  * NULL pointer handling
- Updated test/CMakeLists.txt: added test_action executable

All 8 tests pass (8/8). The action API successfully validates and executes actions even without a fully initialized game state.

Supported actions include:
- Administrative: tax up/down, wages up/down, wait
- Buildings: housing, roads, services (health, education, entertainment, religion), infrastructure (markets, granaries, warehouses), production (farms, workshops, raw materials)
- Destruction: clear land

Part of Phase 3 from HEADLESS_GYMNASIUM_SCOPE.md
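For illustration, the Python side could encode these actions as a small helper around the action type plus map coordinates that gymnasium_action_t carries. A minimal sketch; the enum values and field layout are assumptions that would have to match src/gymnasium/action.h:

```python
# Sketch of Python-side action encoding. The numeric values and the
# {"type", "x", "y"} layout are illustrative, not the real definitions.
from enum import IntEnum

class ActionType(IntEnum):
    WAIT = 0            # administrative: do nothing this step
    TAX_UP = 1          # administrative: raise the tax rate
    TAX_DOWN = 2        # administrative: lower the tax rate
    BUILD_HOUSE = 5     # building: place a housing plot at (x, y)
    BUILD_ROAD = 6      # building: place a road tile at (x, y)
    CLEAR_LAND = 54     # destruction: clear the tile at (x, y)

def make_action(action_type: ActionType, x: int = 0, y: int = 0) -> dict:
    """Package an action; coordinates are ignored for administrative actions."""
    return {"type": int(action_type), "x": x, "y": y}

# Example: request a house at map tile (40, 40)
action = make_action(ActionType.BUILD_HOUSE, 40, 40)
```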
This commit adds complete Python bindings for the Julius gymnasium API.

C Library API:
- Added src/gymnasium/gymnasium.h: public C API for environment control
- Added src/gymnasium/gymnasium.c: environment lifecycle, step execution, reward calculation
- Implements create, destroy, reset, step, and get_observation functions
- Reward system based on rating improvements and city performance
- Victory/defeat detection with appropriate rewards/penalties

CMake Build System:
- Added BUILD_GYMNASIUM_LIB option to build libjulius_gym.so
- Shared library includes all game code for Python to access
- Proper library versioning (1.8.0)
- Installs headers to include/julius_gym/

Python Package:
- Created python/julius_gym/ package with a gymnasium.Env implementation
- JuliusEnv class wraps the C library using ctypes
- Complete observation space with 40+ metrics (ratings, finance, population, etc.)
- Discrete action space with 55 actions (administrative, building, destruction)
- Automatic library discovery and loading
- Full type hints and docstrings

Examples and Documentation:
- python/examples/random_agent.py: basic usage with random actions
- python/examples/train_ppo.py: training with stable-baselines3 PPO
- python/README.md: comprehensive documentation with usage examples
- python/setup.py: package installation configuration
- python/test_import.py: import verification script

Updated .gitignore:
- Added Python-specific patterns (__pycache__, *.pyc, etc.)

The library compiles successfully and is ready for RL training once gymnasium is installed and Caesar III data files are provided.
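A condensed usage sketch in the spirit of python/examples/random_agent.py, assuming the environment can be constructed with default arguments once libjulius_gym.so and the Caesar III data files are in place, and that the wrapper follows the standard gymnasium reset/step signatures (the import path is an assumption; see python/README.md):

```python
# Random-agent sketch against the ctypes-backed JuliusEnv wrapper.
from julius_gym import JuliusEnv  # import path assumed

env = JuliusEnv()                       # assumes the shared library is discoverable
obs, info = env.reset()

for _ in range(100):
    action = env.action_space.sample()  # pick one of the 55 discrete actions
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()

env.close()
```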
This commit adds comprehensive RL training examples optimized for 4GB GPU memory constraints.

Agent Examples:
- train_a2c.py: A2C agent (500MB-1GB GPU, fastest training)
- train_dqn.py: DQN agent (1-2GB GPU, good exploration)
- train_ppo_optimized.py: PPO agent (1.5-2.5GB GPU, best performance)

All agents use small networks ([128, 128]) and memory-efficient hyperparameters while maintaining effectiveness.

Environment Wrappers (julius_gym/wrappers.py):
- SimplifyObservation: reduces the observation space from 40+ to ~10 metrics
- NormalizeObservation: normalizes values to the [-1, 1] range
- FlattenObservation: flattens the dict observation to an array for DQN
- RewardShaping: adds shaped rewards for faster learning
- make_efficient_env(): helper to create a wrapped environment

Training Utilities (examples/train_utils.py):
- TensorboardCallback: logs additional metrics
- ProgressCallback: prints training progress
- EarlyStoppingCallback: stops if there is no improvement
- evaluate_agent(): detailed evaluation with statistics
- compare_agents(): compares multiple trained agents

Documentation (examples/README.md):
- Comprehensive guide for all algorithms
- Memory usage comparison table
- Training tips and troubleshooting
- Hyperparameter tuning guide
- Advanced usage examples

Key Features:
- All scripts support GPU/CPU training
- Automatic checkpointing every 10k steps
- TensorBoard logging
- Parallel environment support
- Resume training from saved models
- Detailed evaluation after training

Memory Efficiency:
- A2C: ~500MB-1GB (recommended for 4GB GPU)
- DQN: ~1-2GB (small replay buffer)
- PPO: ~1.5-2.5GB (small batches, fewer epochs)

All agents are ready to train on Julius with Caesar III data.
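A condensed training sketch in the spirit of train_a2c.py, using the [128, 128] network noted above. The hyperparameter values and the policy choice are illustrative assumptions rather than the exact settings in the example script, and make_efficient_env() is assumed to be callable without required arguments:

```python
# A2C training sketch with stable-baselines3, sized for a 4GB GPU.
from stable_baselines3 import A2C
from julius_gym.wrappers import make_efficient_env

env = make_efficient_env()  # simplified, normalized, reward-shaped env (assumed signature)

model = A2C(
    "MlpPolicy",            # use "MultiInputPolicy" if the wrapped env still returns a dict
    env,
    learning_rate=7e-4,
    n_steps=5,
    policy_kwargs=dict(net_arch=[128, 128]),  # small network to stay within memory limits
    tensorboard_log="./tb_julius",
    verbose=1,
)
model.learn(total_timesteps=100_000)
model.save("a2c_julius")
```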