Alien Soldier (J) source code disassembly project. The source assembles with the AS Macro Assembler.
AS Assembler: http://john.ccac.rwth-aachen.de:8000/as/index.html
Inspired by: https://github.com/lab313ru/quackshot_src/
git clone <repo>
cd alien_soldier_src
# Place original ROM in project root:
# Alien Soldier (J) [!].bin
make init # Extract data, build ROM, create reference
make compare # Verify build matches reference

alien_soldier_src/
├── bin/ # Assembler tools
│ ├── asw.exe # AS Macro Assembler 1.42 Beta
│ ├── p2bin.exe # Object to binary converter
│ └── *.msg # Assembler message catalogs
├── data/ # Binary data (generated by make init)
│ ├── artcomp/ # Compressed tiles, LZSS (~122 files)
│ ├── artunc/ # Uncompressed tiles and sprites (~289 files)
│ ├── mappings/ # Tile mappings (~72 files)
│ ├── other/ # Palettes, tables, misc data (~67 files)
│ ├── sound/ # PCM samples and sound data (~29 files)
│ └── data_addrs.txt # Addresses and categories for ROM extraction
├── scripts/ # Python tooling (see Scripts section)
├── src/ # Include files
│ ├── equals.inc # Constants and equates
│ ├── macros.inc # Assembler macros
│ ├── ports.inc # Hardware I/O port definitions
│ └── ram_addrs.inc # RAM address definitions (~1014 lines)
├── logs/ # Trace and analysis output
├── movies/ # TAS movie files for automated testing
│ ├── dammit,truncated-aliensoldier.gmv # TAS speedrun
│ ├── alien_soldier_j_longplay.gmv # Full playthrough
│ └── alien_soldier_j_menus.gmv # Menu navigation
├── workflow/ # Documentation workflow state
├── gens_automation/ # Modified Gens emulator (auto-cloned)
├── alien_soldier_j.s # Main disassembled source (~120k lines)
├── Makefile # Build and automation system
└── README.md
| Script | Purpose |
|---|---|
| analyze_procedures.py | Automated procedure analysis with emulator |
| bintrace_parser.py | Parses binary traces, generates Graphviz/story logs |
| build_rom.py | Orchestrates ROM assembly (AS → p2bin) |
| clean_project.py | Cross-platform cleanup of build artifacts |
| compare_roms.py | Binary comparison of built vs original ROM |
| compare_states.py | Compares emulator state dumps (RAM, VRAM, registers) |
| compare_traces.py | Compares two CPU traces for divergence |
| debug_pointers.py | Binary search for pointer issues (24 parallel workers) |
| extract_data_addrs.py | Parses listing to extract binclude addresses → data_addrs.txt |
| extract_symbols.py | Extracts ~7000 symbols from AS listing file |
| find_unnamed_procedures.py | Lists procedures still named sub_*, loc_* |
| find_unreferenced_labels.py | Finds labels with no references (dead code) |
| generate_analysis_report.py | Generates HTML report from analysis data |
| init_project.py | Full project initialization (split → build → reference) |
| prepare_batch.py | Prepares batch of procedures for documentation |
| rename_procedures.py | Applies rename CSV to source file |
| report_pointers.py | Generates report from pointer debugging session |
| split_data_from_listing.py | Extracts data sections from AS listing |
| split_data_from_rom.py | Extracts and decompresses tile data from ROM |
| unpack_data.py | Decompresses LZSS data from artcomp/ to uncompressed/ |
| validate_movie_descriptions.py | Validates TAS movie file integrity |
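
To illustrate what a byte-level check such as the one listed for compare_roms.py involves, here is a minimal sketch. It is not the project's script: the function names `first_differences` and `roms_match` are hypothetical, and the real tool may report differences in another form.

```python
def first_differences(built: bytes, reference: bytes, limit: int = 10):
    """Return up to `limit` (offset, built_byte, reference_byte) tuples.

    Only the overlapping part of the two images is compared; a length
    mismatch is handled separately in roms_match().
    """
    diffs = []
    for offset, (b, r) in enumerate(zip(built, reference)):
        if b != r:
            diffs.append((offset, b, r))
            if len(diffs) >= limit:
                break
    return diffs


def roms_match(built: bytes, reference: bytes) -> bool:
    """True only if both images have the same length and identical bytes."""
    return len(built) == len(reference) and not first_differences(built, reference, 1)
```

Reading both files with `open(path, "rb").read()` and passing the results to `roms_match` is enough for a 2 MB image, since both fit comfortably in memory.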
make init # Initialize project (requires original ROM)
make build # Assemble source → asbuilt.bin (2MB)
make compare # Compare asbuilt.bin with reference ROM
make split # Re-extract binary data from original ROM
make clean # Remove all build artifacts
make symbols # Extract symbols from listing file
make build-gens # Build modified Gens emulator (requires VS2022)

Semi-automated procedure naming with Claude AI assistance:
# 1. Set movie type for the session
make set-movie MOVIE=tas
# 2. Prepare batch of unnamed procedures
make prepare-batch COUNT=40
# → Creates workflow/batch_procedures.txt with context
# 3. [Claude reads batch, creates workflow/rename_batch.csv]
# 4. Apply renames to source
make rename
# 5. Verify changes
make build && make compare
# 6. Commit changes
git commit -am "Document N procedures"

Automated procedure analysis using emulator screenshots:
# 1. Generate reference screenshots (required once)
make reference MOVIE=tas
# 2. Find procedures that need analysis
make find-unanalyzed
# 3. Run automated analysis
make analyze MOVIE=tas
# 4. Generate HTML report
make report MOVIE=tas

Find which code change breaks the game:
# 1. Generate reference screenshots
make reference MOVIE=tas
# 2. Modify ROM and rebuild
make build
# 3. Run visual comparison (collects 10 differences)
make debug MOVIE=tas

Binary search for problematic ROM regions:
# 1. Generate reference (required)
make reference MOVIE=tas
# 2. Run pointer debugger (24 parallel workers)
make debug-pointers MOVIE=tas START=1BD000 END=100000
# → Tests by inserting padding at different addresses
# → Works backwards from END to minimize displacement
# → Stops when first visual difference found
# → Saves state dumps and screenshots for analysis

Detailed execution analysis with binary traces:
# 1. Capture trace for frame range
make trace-frames MOVIE=tas START=10500 END=10600
# → Creates logs/trace_10500_10600.btrc (~8MB for 100 frames)
# 2. Generate human-readable story log
make trace-story LOG=logs/trace_10500_10600.btrc
# → Shows: [Function+$offset] Read/Write/DMA operations
# 3. Generate Graphviz diagrams
make trace-graph LOG=logs/trace_10500_10600.btrc
# → Creates 3 DOT files + PNG: pointers, dma, callers
# 4. View statistics
make trace-stats LOG=logs/trace_10500_10600.btrc

make help # Show all available targets
make show-movie # Display current movie setting
make stop # Kill all running Gens emulator instances

- scripts/build_rom.py - Python build script that orchestrates the build
- asw.exe - Macro Assembler (AS 1.42 Beta) assembles alien_soldier_j.s → alien_soldier_j.p
- p2bin.exe - Converts .p object file → asbuilt.bin (2 MB)
- ✅ Build succeeds with byte-accurate ROM output (no warnings)
- ✅ Checksum validation disabled in source (original ROM had incorrect checksum)
- 📁 alien_soldier_j.bin is the reference ROM with these fixes applied
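
For context on the checksum note above, the following sketch recomputes a header checksum so it can be compared with the stored value. It assumes the conventional Mega Drive header layout (a 16-bit checksum at offset $18E, covering every big-endian word from $200 to the end of the ROM); the function names are hypothetical and not part of this project's scripts.

```python
def megadrive_checksum(rom: bytes) -> int:
    """Sum all big-endian 16-bit words from $200 to the end, modulo 0x10000."""
    total = 0
    for i in range(0x200, len(rom) - 1, 2):
        total = (total + ((rom[i] << 8) | rom[i + 1])) & 0xFFFF
    return total


def header_checksum(rom: bytes) -> int:
    """Read the checksum word stored in the ROM header at offset $18E."""
    return int.from_bytes(rom[0x18E:0x190], "big")
```

If `megadrive_checksum(rom)` and `header_checksum(rom)` disagree for a given image, the header value is incorrect for that image, which is the situation the note above describes for the original ROM.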
A custom binary tracing system was added to the Gens-automation emulator for debugging ROM issues.
- Compact binary format: ~20 bytes per event vs ~250 bytes text
- Memory aggregation: Sequential reads/writes merged into blocks
- DMA tracking: Captures all DMA transfers with source/destination
- Pointer detection: Heuristic flagging of pointer table loads
- Symbol support: Shows function names like Reset+$4 instead of $000204
- Graphviz output: Visual diagrams of data flow and pointer tables
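
The memory aggregation feature can be pictured with a short sketch. This is one plausible reading of "sequential accesses merged into blocks", not the emulator's actual code: the function name `merge_sequential` and the (address, size) tuple shape are assumptions made for illustration.

```python
def merge_sequential(accesses):
    """Merge runs of accesses where each one starts where the previous ended.

    `accesses` is a list of (address, size) tuples in trace order.
    Returns a list of (start_address, total_length) blocks.
    """
    blocks = []
    for addr, size in accesses:
        if blocks and blocks[-1][0] + blocks[-1][1] == addr:
            # Contiguous with the previous block: extend it.
            start, length = blocks[-1]
            blocks[-1] = (start, length + size)
        else:
            # Gap or jump backwards: start a new block.
            blocks.append((addr, size))
    return blocks
```

Three word or longword reads at $FFA400, $FFA402 and $FFA404 would collapse into a single 8-byte block under this rule, which is where most of the size saving over a per-access text log would come from.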
| Type | Description |
|---|---|
| FRAME | Frame boundary marker |
| READ/WRITE | Memory access (1/2/4 bytes) |
| READ_BLOCK/WRITE_BLOCK | Aggregated sequential accesses |
| VRAM_W/VRAM_R | Video RAM access |
| CRAM_W | Color RAM (palette) access |
| VSRAM_W | Vertical scroll RAM access |
| DMA | DMA transfer (source, dest, length, type) |
- Story log: Human-readable event log with symbols
- Pointers graph: Shows pointer table → target relationships
- DMA graph: Shows ROM/RAM → VRAM data flow
- Callers graph: Shows which functions trigger DMA transfers
- System: Sys_VBlankHandler, Sys_MainGameLoop, Sys_InitGameMode
- Graphics: Gfx_VBlankDMATransfer, Gfx_FadePaletteTransition, Gfx_InitVideoMode
- Sprites: Sprite_ProcessDMAQueue, Sprite_PrepareOAM, Sprite_CalculatePosition
- Animation: Anim_UpdateFrame, Anim_AdvanceFrameOffset
- Input: Input_ReadController, Input_ProcessButtons
- Sound: Sound_UpdateDriver, Sound_WriteYM2612, Sound_KeyOffAllChannels
- Bosses: Boss_* prefixed labels for boss state machines
- Enemies: Enemy_* prefixed labels
- UI: UI_InitializeSEGAScreen, UI_HandlePasswordInput
- Stages: Stage_* prefixed labels for level-specific code
- ~1014 RAM addresses defined in src/ram_addrs.inc
- Includes sprite tables, DMA queues, game state, player data
Problem: Adding padding after org $E8000 causes graphics corruption around frame 10500 in TAS playback.
Root Cause Analysis (via binary tracing):
- Pointer Storage in RAM: The game stores ROM pointers in entity structures at runtime
  - Example: dword_FFA408 stores a pointer to sprite animation data
  - Code writes move.l #word_E86AA,8(a5) during entity initialization
- Data Shift Effect: When padding is added after org $E8000:
  - All labels after the padding shift by the padding size (e.g., 32 bytes)
  - Assembler correctly updates all code references to use new addresses
  - BUT: Pointers already stored in RAM from earlier frames are now stale
- Trace Evidence (frame 10500):
  - Broken ROM: $FFA408 = $000E86CA (shifted address)
  - Working ROM: $FFA408 = $000E86AA (original address)
  - Difference: 0x20 = 32 bytes (padding size)
- DMA Transfer Mismatch:
  - Broken: DMA $0F7EE0 -> VRAM (shifted source)
  - Working: DMA $0F7EC0 -> VRAM (correct source)
Why This Happens:
- Entity pointers are written to RAM during level/boss initialization (before frame 10500)
- The TAS movie was recorded with the original ROM layout
- When ROM changes, game state diverges, but old pointers remain in RAM
- Result: DMA reads from wrong ROM addresses → corrupted graphics
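
The address arithmetic behind the trace evidence can be checked with a few lines. This is only a worked example of the shift described above; the function name `shifted` is hypothetical, and the values are the ones quoted from the traces.

```python
def shifted(addr: int, pad_at: int, padding: int) -> int:
    """Address of a label after `padding` bytes are inserted at `pad_at`.

    Labels located before the insertion point do not move.
    """
    return addr + padding if addr >= pad_at else addr


PAD_AT = 0x0E8000   # padding inserted after org $E8000
PADDING = 0x20      # 32 bytes, as in the trace

print(hex(shifted(0x0E86AA, PAD_AT, PADDING)))  # pointer seen at $FFA408
print(hex(shifted(0x0F7EC0, PAD_AT, PADDING)))  # DMA source address
```

Both the RAM pointer and the DMA source differ between the two builds by exactly the padding size, which is consistent with a label shift rather than random corruption.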
Potential Solutions (not implemented):
- Find all pointer table initializations and verify they use labels (not hardcoded)
- Check for any tables in ROM that contain absolute addresses to data after $E8000
- Ensure no code caches ROM addresses in RAM across level transitions
- The issue may be inherent to how the game's entity system works
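
One way to approach the second item (tables in ROM holding absolute addresses) is to scan the image for longwords whose value falls inside the affected range. The sketch below is a heuristic only: the function name is hypothetical, it assumes pointers are stored as word-aligned big-endian longwords, and it will report false positives wherever ordinary data happens to look like an address.

```python
def find_candidate_pointers(rom: bytes, lo: int, hi: int):
    """Return (offset, value) for each word-aligned longword with lo <= value < hi."""
    hits = []
    for off in range(0, len(rom) - 3, 2):
        value = int.from_bytes(rom[off:off + 4], "big")
        if lo <= value < hi:
            hits.append((off, value))
    return hits
```

Calling it with `lo=0x0E8000` and an upper bound at the end of the affected data region yields a list of offsets to cross-check against the listing, to see whether each one is emitted from a label or from a hardcoded constant.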
The game uses a custom LZSS-based compression algorithm for tile graphics.
Format:
- Header: 16-bit decompressed size
- Data: Variable length compressed stream
Compression modes (determined by control byte):
Bit 7 = 1: LZSS backreference
- Bits 6-2: Length (0-31) + 1
- Bits 1-0 + next byte: Window offset (0-1023) + 1
Bit 7 = 0:
Bit 6 = 1, Bit 5 = 1: RLE pairs (alternating)
Bit 6 = 1, Bit 5 = 0: RLE pairs (2-byte pattern)
Bit 6 = 0, Bit 5 = 1: RLE single byte
Bit 6 = 0, Bit 5 = 0: Literal data
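
The mode selection above can be sketched as a small classifier. This is not the game's decompressor. It assumes the length field is five bits wide (bits 6-2, as the stated 0-31 range implies) and that bits 1-0 form the high bits of the 10-bit window offset. How the remaining bits encode counts in the RLE and literal modes is not described here, so those modes are only named, not decoded. The function and mode names are hypothetical.

```python
def classify_control(control: int, next_byte: int = 0):
    """Classify one control byte according to the mode table above."""
    if control & 0x80:
        # Bit 7 set: LZSS backreference.
        length = ((control >> 2) & 0x1F) + 1                 # 1..32
        offset = (((control & 0x03) << 8) | next_byte) + 1   # 1..1024
        return ("backref", length, offset)
    # Bit 7 clear: bits 6 and 5 select the mode.
    mode = (control >> 5) & 0x03
    names = {
        0b11: "rle_pairs_alternating",
        0b10: "rle_pairs_2byte",
        0b01: "rle_single",
        0b00: "literal",
    }
    return (names[mode],)
```

A full decoder would loop over the stream, call a classifier like this for each control byte, and stop once the decompressed size from the 16-bit header has been produced.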
- Located in data/ directory
- ~120+ compressed tile sets extracted
- File naming: tiles_<ROM_ADDR>.bin
The game uses a complex sprite mapping system found around $E8000-$E9000 in ROM.
dc.w $8XX ; Sprite count + flags (bit 15 = end marker)
dc.l <longword_data> ; 32-bit data (tile + attributes)
dc.w <Y_offset> ; Y position offset
dc.w $8XX ; Next sprite data...

IDA disassembled these as: dc.l byte_FXXXX+$Y000000 where:
- byte_FXXXX - Points to tile data (address ~$F0000-$F5000)
- +$Y000000 - Large offset encoding attributes
These values are NOT ROM addresses but VDP tile indices with extended attributes (palette, priority, flip flags).
- Byte-accurate ROM assembly
- Basic code documentation (~200+ functions labeled)
- RAM address definitions (~1014 addresses)
- Tile decompression algorithm documented
- Binary trace system for debugging
- Symbol extraction from assembler output
- Documentation workflow with batch processing
- Visual debugging with parallel workers
- Pointer issue debugging tools
- Fix $E8000-$121932 pointer caching issue
- Create tile compressor (reverse of decompressor)
- Complete entity system documentation
- Document all jump tables and pointer tables
- Separate data into structured include files
- Full gameplay testing beyond TAS verification
- Disassembly work using IDA Pro
- AS Macro Assembler by Alfred Arnold
- Gens-automation emulator (https://github.com/oranguthang/gens_automation)
- Inspired by lab313ru's Quackshot disassembly
- Original game by Treasure (1995)
This is a work of reverse engineering for educational and preservation purposes. The original game is copyright Treasure and Sega.