Claude/headless phase4 019c e ru putfwx pma3 db m5f th #2
Open
Conversation
This document outlines the work required to create a headless version of Julius that can be used with OpenAI Gymnasium for reinforcement learning research.

Key findings:
- Julius has clean separation between game logic and rendering
- Game state is centralized in the city_data struct
- Existing APIs can be wrapped for RL actions
- Estimated 7-12 weeks for MVP implementation

The scope covers:
- Technical architecture analysis
- Five implementation phases (headless engine, observation API, action API, Python bindings, testing)
- Example API designs for C and Python
- Challenges and solutions
- Timeline and effort estimates

Military scenarios excluded as requested.
Add HEADLESS_BUILD option to CMakeLists.txt to build Julius without SDL2/SDL2_mixer dependencies. This is the first step toward creating a gymnasium-compatible interface.

Changes:
- Added HEADLESS_BUILD option and conditional SDL2 dependency
- Created src/platform/headless.c: minimal main loop without rendering
- Created src/platform/headless_stubs.c: stub implementations for platform/screen and sound functions
- Modified src/platform/arguments.c: use standard C string functions instead of SDL equivalents when in headless mode
- Modified src/platform/log.c: use printf/fprintf instead of SDL logging when in headless mode

The headless build successfully compiles and creates a working executable that can initialize the game engine without graphics or sound.

Build with: cmake -DHEADLESS_BUILD=ON ..

Part of Phase 1 from HEADLESS_GYMNASIUM_SCOPE.md
Add comprehensive observation API to extract game state for reinforcement learning. This provides structured access to all observable game metrics without requiring direct access to internal city_data structures.

Changes:
- Created src/gymnasium/observation.h: defines the gymnasium_observation_t structure containing all observable game state (ratings, finance, population, labor, resources, buildings, time, victory)
- Created src/gymnasium/observation.c: implements state extraction using existing public city_* APIs. Extracts 40+ different metrics including:
  * Ratings (culture, prosperity, peace, favor)
  * Finance (treasury, tax rate, income/expenses)
  * Population (total, working age, sentiment)
  * Labor (workers, unemployment, wages)
  * Resources (food stocks, types, supply months)
  * Buildings (aggregated counts by category)
  * Migration (newcomers)
  * Culture coverage (entertainment, education, health)
  * Time (year, month)
  * Victory goals and status
- Updated CMakeLists.txt: added GYMNASIUM_FILES group to the build
- Created test/gymnasium/test_observation.c: unit tests for the observation API, including:
  * Structure clearing
  * NULL pointer handling
  * Basic observation extraction
  * Range validation for all fields
- Updated test/CMakeLists.txt: added test_observation executable and test registration

All tests pass (3/3). The observation API successfully extracts game state even without Caesar III data files loaded.

Note: some fields are set to 0 where public APIs don't exist:
- immigration/emigration amounts (internal to city_data)
- housing capacity (no public accessor)
- food consumed/produced last month (not exposed)
- average religion coverage (per-god only)

Part of Phase 2 from HEADLESS_GYMNASIUM_SCOPE.md
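Since Phase 4 wraps this struct from Python via ctypes, it may help to see how the mapping could look on that side. A minimal sketch, with illustrative field names and types only; the real layout would have to match src/gymnasium/observation.h exactly:

```python
# Sketch of a ctypes mirror of gymnasium_observation_t.
# Field names, types, and ordering are illustrative assumptions.
import ctypes

class JuliusObservation(ctypes.Structure):
    _fields_ = [
        # Ratings
        ("culture", ctypes.c_int32),
        ("prosperity", ctypes.c_int32),
        ("peace", ctypes.c_int32),
        ("favor", ctypes.c_int32),
        # Finance
        ("treasury", ctypes.c_int32),
        ("tax_rate", ctypes.c_int32),
        # Population and labor
        ("population", ctypes.c_int32),
        ("unemployment_pct", ctypes.c_int32),
        # Time
        ("year", ctypes.c_int32),
        ("month", ctypes.c_int32),
    ]

def observation_to_dict(obs: JuliusObservation) -> dict:
    """Convert the C struct into a plain dict, suitable for a gymnasium Dict space."""
    return {name: getattr(obs, name) for name, _ in obs._fields_}
```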
Add comprehensive action API to control the game programmatically for reinforcement learning. This provides 55 different actions organized into administrative, building, and destruction categories.

Changes:
- Created src/gymnasium/action.h: defines action types, structures, and result codes, including:
  * 55 action types (5 administrative, 49 building, 1 demolition)
  * gymnasium_action_t structure for specifying actions
  * gymnasium_action_result_t for execution feedback
  * Error codes for detailed failure reasons
- Created src/gymnasium/action.c: implements action execution and validation. Features:
  * Action type to building type mapping
  * Administrative actions (adjust tax/wages)
  * Building construction via building_construction_place_building()
  * Land clearing via building_construction_clear_land()
  * Comprehensive validation (coordinates, action type, etc.)
  * Human-readable action names for debugging
- Updated CMakeLists.txt: added action.c to GYMNASIUM_FILES
- Created test/gymnasium/test_action.c: comprehensive unit tests including:
  * Action validation (valid/invalid types, coordinates)
  * Action name mapping
  * Administrative action execution (tax, wages)
  * Building action execution
  * Clear land action execution
  * Invalid execution handling
  * NULL pointer handling
- Updated test/CMakeLists.txt: added test_action executable

All 8 tests pass (8/8). The action API successfully validates and executes actions even without a fully initialized game state.

Supported actions include:
- Administrative: tax up/down, wages up/down, wait
- Buildings: housing, roads, services (health, education, entertainment, religion), infrastructure (markets, granaries, warehouses), production (farms, workshops, raw materials)
- Destruction: clear land

Part of Phase 3 from HEADLESS_GYMNASIUM_SCOPE.md
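For illustration, the Python side could encode these actions as a small helper around the action type plus map coordinates that gymnasium_action_t carries. A minimal sketch; the enum values and field layout are assumptions that would have to match src/gymnasium/action.h:

```python
# Sketch of Python-side action encoding. The numeric values and the
# {"type", "x", "y"} layout are illustrative, not the real definitions.
from enum import IntEnum

class ActionType(IntEnum):
    WAIT = 0            # administrative: do nothing this step
    TAX_UP = 1          # administrative: raise the tax rate
    TAX_DOWN = 2        # administrative: lower the tax rate
    BUILD_HOUSE = 5     # building: place a housing plot at (x, y)
    BUILD_ROAD = 6      # building: place a road tile at (x, y)
    CLEAR_LAND = 54     # destruction: clear the tile at (x, y)

def make_action(action_type: ActionType, x: int = 0, y: int = 0) -> dict:
    """Package an action; coordinates are ignored for administrative actions."""
    return {"type": int(action_type), "x": x, "y": y}

# Example: request a house at map tile (40, 40)
action = make_action(ActionType.BUILD_HOUSE, 40, 40)
```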
This commit adds complete Python bindings for the Julius gymnasium API.

C Library API:
- Added src/gymnasium/gymnasium.h: public C API for environment control
- Added src/gymnasium/gymnasium.c: environment lifecycle, step execution, reward calculation
- Implements create, destroy, reset, step, and get_observation functions
- Reward system based on rating improvements and city performance
- Victory/defeat detection with appropriate rewards/penalties

CMake Build System:
- Added BUILD_GYMNASIUM_LIB option to build libjulius_gym.so
- Shared library includes all game code for Python to access
- Proper library versioning (1.8.0)
- Installs headers to include/julius_gym/

Python Package:
- Created python/julius_gym/ package with a gymnasium.Env implementation
- JuliusEnv class wraps the C library using ctypes
- Complete observation space with 40+ metrics (ratings, finance, population, etc.)
- Discrete action space with 55 actions (administrative, building, destruction)
- Automatic library discovery and loading
- Full type hints and docstrings

Examples and Documentation:
- python/examples/random_agent.py: basic usage with random actions
- python/examples/train_ppo.py: training with stable-baselines3 PPO
- python/README.md: comprehensive documentation with usage examples
- python/setup.py: package installation configuration
- python/test_import.py: import verification script

Updated .gitignore:
- Added Python-specific patterns (__pycache__, *.pyc, etc.)

The library compiles successfully and is ready for RL training once gymnasium is installed and Caesar III data files are provided.
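A condensed usage sketch in the spirit of python/examples/random_agent.py, assuming the environment can be constructed with default arguments once libjulius_gym.so and the Caesar III data files are in place, and that the wrapper follows the standard gymnasium reset/step signatures (the import path is an assumption; see python/README.md):

```python
# Random-agent sketch against the ctypes-backed JuliusEnv wrapper.
from julius_gym import JuliusEnv  # import path assumed

env = JuliusEnv()                       # assumes the shared library is discoverable
obs, info = env.reset()

for _ in range(100):
    action = env.action_space.sample()  # pick one of the 55 discrete actions
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()

env.close()
```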
This commit adds comprehensive RL training examples optimized for 4GB GPU memory constraints.

Agent Examples:
- train_a2c.py: A2C agent (500MB-1GB GPU, fastest training)
- train_dqn.py: DQN agent (1-2GB GPU, good exploration)
- train_ppo_optimized.py: PPO agent (1.5-2.5GB GPU, best performance)

All agents use small networks ([128, 128]) and memory-efficient hyperparameters while maintaining effectiveness.

Environment Wrappers (julius_gym/wrappers.py):
- SimplifyObservation: reduces the observation space from 40+ to ~10 metrics
- NormalizeObservation: normalizes values to the [-1, 1] range
- FlattenObservation: flattens the dict observation to an array for DQN
- RewardShaping: adds shaped rewards for faster learning
- make_efficient_env(): helper to create a wrapped environment

Training Utilities (examples/train_utils.py):
- TensorboardCallback: logs additional metrics
- ProgressCallback: prints training progress
- EarlyStoppingCallback: stops if there is no improvement
- evaluate_agent(): detailed evaluation with statistics
- compare_agents(): compares multiple trained agents

Documentation (examples/README.md):
- Comprehensive guide for all algorithms
- Memory usage comparison table
- Training tips and troubleshooting
- Hyperparameter tuning guide
- Advanced usage examples

Key Features:
- All scripts support GPU/CPU training
- Automatic checkpointing every 10k steps
- TensorBoard logging
- Parallel environment support
- Resume training from saved models
- Detailed evaluation after training

Memory Efficiency:
- A2C: ~500MB-1GB (recommended for 4GB GPU)
- DQN: ~1-2GB (small replay buffer)
- PPO: ~1.5-2.5GB (small batches, fewer epochs)

All agents are ready to train on Julius with Caesar III data.
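A condensed training sketch in the spirit of train_a2c.py, using the [128, 128] network noted above. The hyperparameter values and the policy choice are illustrative assumptions rather than the exact settings in the example script, and make_efficient_env() is assumed to be callable without required arguments:

```python
# A2C training sketch with stable-baselines3, sized for a 4GB GPU.
from stable_baselines3 import A2C
from julius_gym.wrappers import make_efficient_env

env = make_efficient_env()  # simplified, normalized, reward-shaped env (assumed signature)

model = A2C(
    "MlpPolicy",            # use "MultiInputPolicy" if the wrapped env still returns a dict
    env,
    learning_rate=7e-4,
    n_steps=5,
    policy_kwargs=dict(net_arch=[128, 128]),  # small network to stay within memory limits
    tensorboard_log="./tb_julius",
    verbose=1,
)
model.learn(total_timesteps=100_000)
model.save("a2c_julius")
```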