Claude/headless phase4 019c e ru putfwx pma3 db m5f th #2

Open

Iain-S wants to merge 14 commits into master from
claude/headless-phase4-019cERuPUTFWXPma3DbM5fTH

Conversation


Iain-S (Owner) commented Nov 21, 2025

No description provided.

claude and others added 14 commits on November 17, 2025 at 12:02

This document outlines the work required to create a headless version
of Julius that can be used with OpenAI Gymnasium for reinforcement
learning research.

Key findings:
- Julius has clean separation between game logic and rendering
- Game state is centralized in the city_data struct
- Existing APIs can be wrapped for RL actions
- Estimated 7-12 weeks for MVP implementation

The scope covers:
- Technical architecture analysis
- Five implementation phases (headless engine, observation API,
  action API, Python bindings, testing)
- Example API designs for C and Python
- Challenges and solutions
- Timeline and effort estimates

Military scenarios excluded as requested.

Add HEADLESS_BUILD option to CMakeLists.txt to build Julius without
SDL2/SDL2_mixer dependencies. This is the first step toward creating
a gymnasium-compatible interface.

Changes:
- Added HEADLESS_BUILD option and conditional SDL2 dependency
- Created src/platform/headless.c: minimal main loop without rendering
- Created src/platform/headless_stubs.c: stub implementations for
  platform/screen and sound functions
- Modified src/platform/arguments.c: use standard C string functions
  instead of SDL equivalents when in headless mode
- Modified src/platform/log.c: use printf/fprintf instead of SDL
  logging when in headless mode

The headless build successfully compiles and creates a working
executable that can initialize the game engine without graphics or
sound.

Build with: cmake -DHEADLESS_BUILD=ON ..

Part of Phase 1 from HEADLESS_GYMNASIUM_SCOPE.md

Add comprehensive observation API to extract game state for
reinforcement learning. This provides structured access to all
observable game metrics without requiring direct access to
internal city_data structures.

Changes:
- Created src/gymnasium/observation.h: Defines gymnasium_observation_t
  structure containing all observable game state (ratings, finance,
  population, labor, resources, buildings, time, victory)

- Created src/gymnasium/observation.c: Implements state extraction
  using existing public city_* APIs. Extracts 40+ distinct metrics,
  including:
  * Ratings (culture, prosperity, peace, favor)
  * Finance (treasury, tax rate, income/expenses)
  * Population (total, working age, sentiment)
  * Labor (workers, unemployment, wages)
  * Resources (food stocks, types, supply months)
  * Buildings (aggregated counts by category)
  * Migration (newcomers)
  * Culture coverage (entertainment, education, health)
  * Time (year, month)
  * Victory goals and status

- Updated CMakeLists.txt: Added GYMNASIUM_FILES group to build

- Created test/gymnasium/test_observation.c: Unit tests for
  observation API including:
  * Structure clearing
  * NULL pointer handling
  * Basic observation extraction
  * Range validation for all fields

- Updated test/CMakeLists.txt: Added test_observation executable
  and test registration

All tests pass (3/3). The observation API successfully extracts
game state even without Caesar III data files loaded.

Note: Some fields set to 0 where public APIs don't exist:
- immigration/emigration amounts (internal to city_data)
- housing capacity (no public accessor)
- food consumed/produced last month (not exposed)
- average religion coverage (per-god only)

Part of Phase 2 from HEADLESS_GYMNASIUM_SCOPE.md
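
For orientation, here is a minimal sketch of how the extracted metrics could map onto a gymnasium observation space. The keys and bounds below are illustrative assumptions mirroring the metric categories above, not the actual layout of gymnasium_observation_t:

```python
# Illustrative sketch only: keys and bounds are assumptions, not the
# real gymnasium_observation_t layout from src/gymnasium/observation.h.
import numpy as np
from gymnasium import spaces

observation_space = spaces.Dict({
    # Ratings are 0-100 in Caesar III
    "culture":    spaces.Box(0, 100, shape=(1,), dtype=np.int32),
    "prosperity": spaces.Box(0, 100, shape=(1,), dtype=np.int32),
    "peace":      spaces.Box(0, 100, shape=(1,), dtype=np.int32),
    "favor":      spaces.Box(0, 100, shape=(1,), dtype=np.int32),
    # Finance
    "treasury":   spaces.Box(-50_000, 100_000, shape=(1,), dtype=np.int32),
    "tax_rate":   spaces.Box(0, 25, shape=(1,), dtype=np.int32),
    # Population and labor
    "population":   spaces.Box(0, 100_000, shape=(1,), dtype=np.int32),
    "unemployment": spaces.Box(0, 100, shape=(1,), dtype=np.int32),
    # ... remaining resource/building/time/victory metrics omitted
})
```
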
Add comprehensive action API to control the game programmatically
for reinforcement learning. This provides 55 different actions
organized into administrative, building, and destruction categories.

Changes:
- Created src/gymnasium/action.h: Defines action types, structures,
  and result codes. Includes:
  * 55 action types (5 administrative, 49 building, 1 demolition)
  * gymnasium_action_t structure for specifying actions
  * gymnasium_action_result_t for execution feedback
  * Error codes for detailed failure reasons

- Created src/gymnasium/action.c: Implements action execution
  and validation. Features:
  * Action type to building type mapping
  * Administrative actions (adjust tax/wages)
  * Building construction via building_construction_place_building()
  * Land clearing via building_construction_clear_land()
  * Comprehensive validation (coordinates, action type, etc.)
  * Human-readable action names for debugging

- Updated CMakeLists.txt: Added action.c to GYMNASIUM_FILES

- Created test/gymnasium/test_action.c: Comprehensive unit tests
  including:
  * Action validation (valid/invalid types, coordinates)
  * Action name mapping
  * Administrative action execution (tax, wages)
  * Building action execution
  * Clear land action execution
  * Invalid execution handling
  * NULL pointer handling

- Updated test/CMakeLists.txt: Added test_action executable

All tests pass (8/8). The action API successfully validates
and executes actions even without a fully initialized game state.

Supported actions include:
- Administrative: tax up/down, wages up/down, wait
- Buildings: housing, roads, services (health, education,
  entertainment, religion), infrastructure (markets, granaries,
  warehouses), production (farms, workshops, raw materials)
- Destruction: clear land

Part of Phase 3 from HEADLESS_GYMNASIUM_SCOPE.md
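
To make the shape of the C interface concrete, here is a hedged sketch of how the two structures described above might be mirrored for ctypes. Field names and types are assumptions based on this commit message; the authoritative definitions live in src/gymnasium/action.h:

```python
import ctypes

# Assumed ctypes mirror of the structures described above; the real
# definitions live in src/gymnasium/action.h.
class GymnasiumAction(ctypes.Structure):
    _fields_ = [
        ("action_type", ctypes.c_int),  # one of the 55 action ids
        ("x", ctypes.c_int),            # map tile coordinates, used by
        ("y", ctypes.c_int),            # building and clear-land actions
    ]

class GymnasiumActionResult(ctypes.Structure):
    _fields_ = [
        ("success", ctypes.c_int),      # nonzero on success
        ("error_code", ctypes.c_int),   # detailed failure reason
    ]
```
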
This commit adds complete Python bindings for the Julius gymnasium API:

C Library API:
- Added src/gymnasium/gymnasium.h: Public C API for environment control
- Added src/gymnasium/gymnasium.c: Environment lifecycle, step execution, reward calculation
- Implements create, destroy, reset, step, and get_observation functions
- Reward system based on rating improvements and city performance
- Victory/defeat detection with appropriate rewards/penalties

CMake Build System:
- Added BUILD_GYMNASIUM_LIB option to build libjulius_gym.so
- Shared library includes all game code for Python to access
- Proper library versioning (1.8.0)
- Installs headers to include/julius_gym/

Python Package:
- Created python/julius_gym/ package with gymnasium.Env implementation
- JuliusEnv class wraps C library using ctypes
- Complete observation space with 40+ metrics (ratings, finance, population, etc.)
- Discrete action space with 55 actions (administrative, building, destruction)
- Automatic library discovery and loading
- Full type hints and docstrings

Examples and Documentation:
- python/examples/random_agent.py: Basic usage with random actions
- python/examples/train_ppo.py: Training with stable-baselines3 PPO
- python/README.md: Comprehensive documentation with usage examples
- python/setup.py: Package installation configuration
- python/test_import.py: Import verification script

Updated .gitignore:
- Added Python-specific patterns (__pycache__, *.pyc, etc.)

The library compiles successfully and is ready for RL training once
gymnasium is installed and Caesar III data files are provided.
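
In spirit, python/examples/random_agent.py boils down to the standard gymnasium loop below. The JuliusEnv constructor arguments are assumed; everything else is the stock gymnasium.Env protocol:

```python
# Sketch of the random-agent flow; JuliusEnv kwargs are assumptions.
from julius_gym import JuliusEnv

env = JuliusEnv()  # locates and loads libjulius_gym.so via ctypes
obs, info = env.reset()
terminated = truncated = False
total_reward = 0.0
while not (terminated or truncated):
    action = env.action_space.sample()  # random policy
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward
print(f"episode reward: {total_reward}")
env.close()
```
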
This commit adds comprehensive RL training examples optimized for
4GB GPU memory constraints:

Agent Examples:
- train_a2c.py: A2C agent (500MB-1GB GPU, fastest training)
- train_dqn.py: DQN agent (1-2GB GPU, good exploration)
- train_ppo_optimized.py: PPO agent (1.5-2.5GB GPU, best performance)

All agents use small networks ([128, 128]) and memory-efficient
hyperparameters while maintaining effectiveness.

Environment Wrappers (julius_gym/wrappers.py):
- SimplifyObservation: Reduces observation space from 40+ to ~10 metrics
- NormalizeObservation: Normalizes values to the [-1, 1] range
  (general shape sketched after this list)
- FlattenObservation: Flattens dict to array for DQN
- RewardShaping: Adds shaped rewards for faster learning
- make_efficient_env(): Helper to create wrapped environment
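
The real implementations live in julius_gym/wrappers.py; the sketch below shows the general shape a normalizing wrapper like the one named above takes, assuming a flat Box observation (e.g. post-flattening):

```python
import gymnasium as gym
import numpy as np

# General shape of a normalizing wrapper; assumes the observation is
# already a flat, bounded Box. Not the actual julius_gym implementation.
class NormalizeToUnitRange(gym.ObservationWrapper):
    """Rescale a bounded Box observation into [-1, 1]."""

    def __init__(self, env):
        super().__init__(env)
        box = env.observation_space
        self._low = box.low.astype(np.float32)
        self._high = box.high.astype(np.float32)
        self.observation_space = gym.spaces.Box(
            low=-1.0, high=1.0, shape=box.shape, dtype=np.float32)

    def observation(self, obs):
        span = np.maximum(self._high - self._low, 1e-8)  # avoid div by 0
        return (2.0 * (obs - self._low) / span - 1.0).astype(np.float32)
```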

Training Utilities (examples/train_utils.py):
- TensorboardCallback: Logs additional metrics
- ProgressCallback: Prints training progress
- EarlyStoppingCallback: Stops if no improvement
- evaluate_agent(): Detailed evaluation with statistics
- compare_agents(): Compare multiple trained agents

Documentation (examples/README.md):
- Comprehensive guide for all algorithms
- Memory usage comparison table
- Training tips and troubleshooting
- Hyperparameter tuning guide
- Advanced usage examples

Key Features:
- All scripts support GPU/CPU training
- Automatic checkpointing every 10k steps
- TensorBoard logging
- Parallel environment support
- Resume training from saved models
- Detailed evaluation after training

Memory Efficiency:
- A2C: ~500MB-1GB (recommended for 4GB GPU)
- DQN: ~1-2GB (small replay buffer)
- PPO: ~1.5-2.5GB (small batches, fewer epochs)

All agents are ready to train on Julius with Caesar III data.
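
Putting the pieces together, a run along the lines of train_a2c.py might look like the following. make_efficient_env is the helper named in this commit, the A2C call is stable-baselines3's real API, and the policy choice assumes the wrappers yield a flat observation:

```python
# Sketch of an A2C run as described above; make_efficient_env defaults
# and the save path are assumptions.
from stable_baselines3 import A2C
from julius_gym.wrappers import make_efficient_env

env = make_efficient_env()  # simplified, normalized, reward-shaped env
model = A2C(
    "MlpPolicy",            # assumes a flattened Box observation
    env,
    policy_kwargs=dict(net_arch=[128, 128]),  # small net for 4 GB GPUs
    verbose=1,
    tensorboard_log="./logs",
)
model.learn(total_timesteps=100_000)
model.save("a2c_julius")
```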