oranguthang/alien_soldier_src

Alien Soldier (J) Disassembly

Alien Soldier (J) source code disassembly project. The source assembles with the AS Macro Assembler.

AS Assembler: http://john.ccac.rwth-aachen.de:8000/as/index.html

Inspired by: https://github.com/lab313ru/quackshot_src/

Quick Start

git clone <repo>
cd alien_soldier_src

# Place original ROM in project root:
#   Alien Soldier (J) [!].bin

make init           # Extract data, build ROM, create reference
make compare        # Verify build matches reference

Project Structure

alien_soldier_src/
├── bin/                           # Assembler tools
│   ├── asw.exe                    # AS Macro Assembler 1.42 Beta
│   ├── p2bin.exe                  # Object to binary converter
│   └── *.msg                      # Assembler message catalogs
├── data/                          # Binary data (generated by make init)
│   ├── artcomp/                   # Compressed tiles, LZSS (~122 files)
│   ├── artunc/                    # Uncompressed tiles and sprites (~289 files)
│   ├── mappings/                  # Tile mappings (~72 files)
│   ├── other/                     # Palettes, tables, misc data (~67 files)
│   ├── sound/                     # PCM samples and sound data (~29 files)
│   └── data_addrs.txt             # Addresses and categories for ROM extraction
├── scripts/                       # Python tooling (see Scripts section)
├── src/                           # Include files
│   ├── equals.inc                 # Constants and equates
│   ├── macros.inc                 # Assembler macros
│   ├── ports.inc                  # Hardware I/O port definitions
│   └── ram_addrs.inc              # RAM address definitions (~1014 lines)
├── logs/                          # Trace and analysis output
├── movies/                        # TAS movie files for automated testing
│   ├── dammit,truncated-aliensoldier.gmv  # TAS speedrun
│   ├── alien_soldier_j_longplay.gmv       # Full playthrough
│   └── alien_soldier_j_menus.gmv          # Menu navigation
├── workflow/                      # Documentation workflow state
├── gens_automation/               # Modified Gens emulator (auto-cloned)
├── alien_soldier_j.s              # Main disassembled source (~120k lines)
├── Makefile                       # Build and automation system
└── README.md

Scripts

Script                          Purpose
analyze_procedures.py           Automated procedure analysis with emulator
bintrace_parser.py              Parses binary traces, generates Graphviz/story logs
build_rom.py                    Orchestrates ROM assembly (AS → p2bin)
clean_project.py                Cross-platform cleanup of build artifacts
compare_roms.py                 Binary comparison of built vs original ROM
compare_states.py               Compares emulator state dumps (RAM, VRAM, registers)
compare_traces.py               Compares two CPU traces for divergence
debug_pointers.py               Binary search for pointer issues (24 parallel workers)
extract_data_addrs.py           Parses listing to extract binclude addresses → data_addrs.txt
extract_symbols.py              Extracts ~7000 symbols from AS listing file
find_unnamed_procedures.py      Lists procedures still named sub_*, loc_*
find_unreferenced_labels.py     Finds labels with no references (dead code)
generate_analysis_report.py     Generates HTML report from analysis data
init_project.py                 Full project initialization (split → build → reference)
prepare_batch.py                Prepares batch of procedures for documentation
rename_procedures.py            Applies rename CSV to source file
report_pointers.py              Generates report from pointer debugging session
split_data_from_listing.py      Extracts data sections from AS listing
split_data_from_rom.py          Extracts and decompresses tile data from ROM
unpack_data.py                  Decompresses LZSS data from artcomp/ to uncompressed/
validate_movie_descriptions.py  Validates TAS movie file integrity
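
As an illustration of what a script like find_unnamed_procedures.py has to do, here is a minimal sketch that scans source text for auto-generated labels. It is not the project's implementation: the function name, the regular expression, and the assumption that labels start in column 0 and end with a colon are all hypothetical.

```python
import re

# Assumption: labels start in column 0 and end with a colon, e.g. "sub_1A2B:".
UNNAMED_LABEL = re.compile(r"^((?:sub|loc)_[0-9A-Fa-f]+):", re.MULTILINE)


def find_unnamed_labels(source_text: str) -> list:
    """Return auto-generated labels (sub_*/loc_*) in order of appearance."""
    return UNNAMED_LABEL.findall(source_text)


# Hypothetical source fragment, for demonstration only.
sample = """\
Sys_MainGameLoop:
    jsr     sub_1A2B
sub_1A2B:
    rts
loc_1A30:
    bra.s   loc_1A30
"""
unnamed = find_unnamed_labels(sample)
print(unnamed)
```

Only label definitions are matched; references such as the operand of jsr are ignored because they neither start in column 0 nor end with a colon.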

Makefile Workflows

Build Workflow

make init               # Initialize project (requires original ROM)
make build              # Assemble source → asbuilt.bin (2MB)
make compare            # Compare asbuilt.bin with reference ROM
make split              # Re-extract binary data from original ROM
make clean              # Remove all build artifacts
make symbols            # Extract symbols from listing file
make build-gens         # Build modified Gens emulator (requires VS2022)
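
Conceptually, the compare step is a byte-for-byte check of the built image against the reference. The sketch below shows that idea on in-memory data; it is an assumption about the approach, not the code in compare_roms.py, and the function name is hypothetical.

```python
from typing import Optional


def first_difference(built: bytes, reference: bytes) -> Optional[int]:
    """Return the offset of the first differing byte, or None if identical."""
    for offset, (a, b) in enumerate(zip(built, reference)):
        if a != b:
            return offset
    if len(built) != len(reference):
        # One image is a prefix of the other: they diverge where the shorter ends.
        return min(len(built), len(reference))
    return None


# Tiny stand-ins for asbuilt.bin and the reference ROM.
reference_rom = bytes([0x00, 0xFF, 0x4E, 0x75])
matching_build = bytes([0x00, 0xFF, 0x4E, 0x75])
broken_build = bytes([0x00, 0xFF, 0x4E, 0x71])

print(first_difference(matching_build, reference_rom))
print(first_difference(broken_build, reference_rom))
```

With real files, the two byte strings could be read with pathlib.Path(...).read_bytes(); reporting the first differing offset makes it easier to map a mismatch back to a label in the listing.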

Documentation Workflow

Semi-automated procedure naming with Claude AI assistance:

# 1. Set movie type for the session
make set-movie MOVIE=tas

# 2. Prepare batch of unnamed procedures
make prepare-batch COUNT=40
# → Creates workflow/batch_procedures.txt with context

# 3. [Claude reads batch, creates workflow/rename_batch.csv]

# 4. Apply renames to source
make rename

# 5. Verify changes
make build && make compare

# 6. Commit changes
git commit -am "Document N procedures"
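
Step 4 applies a rename list to the source. A minimal sketch of that idea follows, assuming a two-column CSV of old,new names with no header row; the CSV layout, the function name, and the sample pairing of labels are hypothetical and may differ from what rename_procedures.py actually expects.

```python
import csv
import io
import re


def apply_renames(source_text: str, rename_csv: str) -> str:
    """Replace each old label with its new name, matching whole identifiers only."""
    rows = csv.reader(io.StringIO(rename_csv))
    renames = {row[0].strip(): row[1].strip() for row in rows if len(row) == 2}
    if not renames:
        return source_text
    # Longest names first so the alternation prefers the most specific match.
    names = sorted(renames, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(n) for n in names) + r")\b")
    return pattern.sub(lambda m: renames[m.group(1)], source_text)


# Hypothetical batch and source fragment, for demonstration only.
rename_csv = "sub_1A2B,Anim_UpdateFrame\nloc_1A30,Anim_UpdateFrame_Loop\n"
source = "sub_1A2B:\n    bsr.s   sub_1A2B0\nloc_1A30:\n    bra.s   loc_1A30\n"
renamed = apply_renames(source, rename_csv)
print(renamed)
```

The word-boundary match matters: sub_1A2B0 is left untouched even though it begins with sub_1A2B. Because renaming labels does not change the assembled bytes, the build and compare in step 5 should still pass afterwards.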

Analysis Workflow

Automated procedure analysis using emulator screenshots:

# 1. Generate reference screenshots (required once)
make reference MOVIE=tas

# 2. Find procedures that need analysis
make find-unanalyzed

# 3. Run automated analysis
make analyze MOVIE=tas

# 4. Generate HTML report
make report MOVIE=tas

Debugging Workflow (Visual)

Find which code change breaks the game:

# 1. Generate reference screenshots
make reference MOVIE=tas

# 2. Modify ROM and rebuild
make build

# 3. Run visual comparison (collects 10 differences)
make debug MOVIE=tas

Debugging Workflow (Pointer Issues)

Binary search for problematic ROM regions:

# 1. Generate reference (required)
make reference MOVIE=tas

# 2. Run pointer debugger (24 parallel workers)
make debug-pointers MOVIE=tas START=1BD000 END=100000
# → Tests by inserting padding at different addresses
# → Works backwards from END to minimize displacement
# → Stops when first visual difference found
# → Saves state dumps and screenshots for analysis

CPU Tracing Workflow

Detailed execution analysis with binary traces:

# 1. Capture trace for frame range
make trace-frames MOVIE=tas START=10500 END=10600
# → Creates logs/trace_10500_10600.btrc (~8MB for 100 frames)

# 2. Generate human-readable story log
make trace-story LOG=logs/trace_10500_10600.btrc
# → Shows: [Function+$offset] Read/Write/DMA operations

# 3. Generate Graphviz diagrams
make trace-graph LOG=logs/trace_10500_10600.btrc
# → Creates 3 DOT files + PNG: pointers, dma, callers

# 4. View statistics
make trace-stats LOG=logs/trace_10500_10600.btrc

Utility Commands

make help               # Show all available targets
make show-movie         # Display current movie setting
make stop               # Kill all running Gens emulator instances

Build System

Build Process

  1. scripts/build_rom.py - Python build script that orchestrates the build
  2. asw.exe - Macro Assembler (AS 1.42 Beta) assembles alien_soldier_j.s → alien_soldier_j.p
  3. p2bin.exe - Converts .p object file → asbuilt.bin (2 MB)

Build Status

  • ✅ Build succeeds with byte-accurate ROM output (no warnings)
  • ✅ Checksum validation disabled in source (original ROM had incorrect checksum)
  • 📁 alien_soldier_j.bin is the reference ROM with these fixes applied

Binary Trace System

A custom binary tracing system was added to the Gens-automation emulator for debugging ROM issues.

Trace Features

  • Compact binary format: ~20 bytes per event vs ~250 bytes text
  • Memory aggregation: Sequential reads/writes merged into blocks
  • DMA tracking: Captures all DMA transfers with source/destination
  • Pointer detection: Heuristic flagging of pointer table loads
  • Symbol support: Shows function names like Reset+$4 instead of $000204
  • Graphviz output: Visual diagrams of data flow and pointer tables
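
The symbol support feature can be pictured as a nearest-symbol lookup. The sketch below is a hedged illustration, not the emulator's or parser's code: the function name is hypothetical, and the symbol table is made up except for Reset at $000200, which follows from the Reset+$4 = $000204 example above.

```python
import bisect


def symbolize(address: int, symbols: dict) -> str:
    """Format an address as Name+$offset using the nearest symbol at or below it."""
    ordered = sorted((addr, name) for name, addr in symbols.items())
    starts = [addr for addr, _ in ordered]
    index = bisect.bisect_right(starts, address) - 1
    if index < 0:
        return f"${address:06X}"  # No symbol precedes this address.
    base, name = ordered[index]
    offset = address - base
    return name if offset == 0 else f"{name}+${offset:X}"


# Reset at $200 is implied by the example; the second entry is hypothetical.
symbols = {"Reset": 0x000200, "Sys_VBlankHandler": 0x000300}

print(symbolize(0x000204, symbols))
print(symbolize(0x000300, symbols))
print(symbolize(0x000100, symbols))
```

A real implementation would sort the table once rather than on every call; the per-call sort here only keeps the sketch short.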

Trace Event Types

Type                    Description
FRAME                   Frame boundary marker
READ/WRITE              Memory access (1/2/4 bytes)
READ_BLOCK/WRITE_BLOCK  Aggregated sequential accesses
VRAM_W/VRAM_R           Video RAM access
CRAM_W                  Color RAM (palette) access
VSRAM_W                 Vertical scroll RAM access
DMA                     DMA transfer (source, dest, length, type)
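
The READ_BLOCK/WRITE_BLOCK events come from merging sequential accesses. One way such aggregation could work is sketched below; the (address, size) tuple representation and the function name are assumptions made for illustration, not the emulator's actual data structures.

```python
def merge_sequential(accesses):
    """Merge contiguous (address, size) accesses into (start, length) blocks."""
    blocks = []
    for address, size in accesses:
        if blocks and blocks[-1][0] + blocks[-1][1] == address:
            # This access begins exactly where the previous block ends.
            start, length = blocks[-1]
            blocks[-1] = (start, length + size)
        else:
            blocks.append((address, size))
    return blocks


# Hypothetical write sequence: three contiguous accesses, then a gap.
writes = [(0xFFA400, 2), (0xFFA402, 2), (0xFFA404, 4), (0xFFA410, 2)]
blocks = merge_sequential(writes)
for start, length in blocks:
    print(f"WRITE_BLOCK ${start:06X} len={length}")
```

Here four individual writes collapse into two blocks, which is the kind of saving that keeps the trace compact.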

Output Formats

  • Story log: Human-readable event log with symbols
  • Pointers graph: Shows pointer table → target relationships
  • DMA graph: Shows ROM/RAM → VRAM data flow
  • Callers graph: Shows which functions trigger DMA transfers

Code Documentation Progress

Documented Functions (200+ labels renamed)

  • System: Sys_VBlankHandler, Sys_MainGameLoop, Sys_InitGameMode
  • Graphics: Gfx_VBlankDMATransfer, Gfx_FadePaletteTransition, Gfx_InitVideoMode
  • Sprites: Sprite_ProcessDMAQueue, Sprite_PrepareOAM, Sprite_CalculatePosition
  • Animation: Anim_UpdateFrame, Anim_AdvanceFrameOffset
  • Input: Input_ReadController, Input_ProcessButtons
  • Sound: Sound_UpdateDriver, Sound_WriteYM2612, Sound_KeyOffAllChannels
  • Bosses: Boss_* prefixed labels for boss state machines
  • Enemies: Enemy_* prefixed labels
  • UI: UI_InitializeSEGAScreen, UI_HandlePasswordInput
  • Stages: Stage_* prefixed labels for level-specific code

RAM Addresses Documented

  • ~1014 RAM addresses defined in src/ram_addrs.inc
  • Includes sprite tables, DMA queues, game state, player data

Known Issues

$E8000-$121932 Graphics Corruption

Problem: Adding padding after org $E8000 causes graphics corruption around frame 10500 in TAS playback.

Root Cause Analysis (via binary tracing):

  1. Pointer Storage in RAM: The game stores ROM pointers in entity structures at runtime

    • Example: dword_FFA408 stores pointer to sprite animation data
    • Code writes move.l #word_E86AA,8(a5) during entity initialization
  2. Data Shift Effect: When padding is added after org $E8000:

    • All labels after the padding shift by the padding size (e.g., 32 bytes)
    • Assembler correctly updates all code references to use new addresses
    • BUT: Pointers already stored in RAM from earlier frames are now stale
  3. Trace Evidence (frame 10500):

    Broken ROM: $FFA408 = $000E86CA  (shifted address)
    Working ROM: $FFA408 = $000E86AA  (original address)
    Difference: 0x20 = 32 bytes (padding size)
    
  4. DMA Transfer Mismatch:

    Broken:  DMA $0F7EE0 -> VRAM (shifted source)
    Working: DMA $0F7EC0 -> VRAM (correct source)
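
The arithmetic behind the trace evidence can be checked directly. This short sketch uses only the four addresses quoted above and confirms that both the stored pointer and the DMA source differ by exactly the padding size; the variable names are illustrative.

```python
PADDING = 0x20  # 32 bytes, the padding size quoted in the trace evidence

# (broken ROM value, working ROM value) pairs taken from the trace above.
observations = {
    "entity pointer at $FFA408": (0x000E86CA, 0x000E86AA),
    "DMA source": (0x0F7EE0, 0x0F7EC0),
}

deltas = {name: broken - working for name, (broken, working) in observations.items()}
for name, delta in deltas.items():
    print(f"{name}: shifted by ${delta:X} ({delta} bytes)")
```

Both deltas equal the padding, so every address seen in the trace moved by the same amount as the data after the padding.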
    

Why This Happens:

  • Entity pointers are written to RAM during level/boss initialization (before frame 10500)
  • The TAS movie was recorded with the original ROM layout
  • When the ROM layout changes, the game state diverges, but old pointers remain in RAM
  • Result: DMA reads from the wrong ROM addresses → corrupted graphics

Potential Solutions (not implemented):

  1. Find all pointer table initializations and verify they use labels (not hardcoded addresses)
  2. Check for any tables in ROM that contain absolute addresses to data after $E8000
  3. Ensure no code caches ROM addresses in RAM across level transitions
  4. The issue may be inherent to how the game's entity system works

Tile Compression

The game uses a custom LZSS-based compression algorithm for tile graphics.

Decompression Algorithm

Format:

  • Header: 16-bit decompressed size
  • Data: Variable length compressed stream

Compression modes (determined by control byte):

Bit 7 = 1: LZSS backreference
  - Bits 6-2: Length (0-31) + 1
  - Bits 1-0 + next byte: Window offset (0-1023) + 1

Bit 7 = 0:
  Bit 6 = 1, Bit 5 = 1: RLE pairs (alternating)
  Bit 6 = 1, Bit 5 = 0: RLE pairs (2-byte pattern)
  Bit 6 = 0, Bit 5 = 1: RLE single byte
  Bit 6 = 0, Bit 5 = 0: Literal data
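
The control-byte layout above can be expressed as a small classifier. This is a sketch under stated assumptions, not the game's decompressor: it assumes the 5-bit length field (0-31) occupies bits 6-2, that bits 1-0 of the control byte are the high bits of the 10-bit window offset with the next byte as the low bits, and it ignores how counts are encoded in the non-LZSS modes, which is not described here. The function and mode names are hypothetical.

```python
def classify_control(control: int, next_byte: int = 0) -> dict:
    """Classify one control byte according to the mode table above."""
    if control & 0x80:
        # LZSS backreference: 5-bit length and 10-bit offset, both stored minus 1.
        length = ((control >> 2) & 0x1F) + 1                # 1..32
        offset = (((control & 0x03) << 8) | next_byte) + 1  # 1..1024
        return {"mode": "backreference", "length": length, "offset": offset}
    modes = {
        0b11: "rle_pairs_alternating",
        0b10: "rle_pairs_2byte_pattern",
        0b01: "rle_single_byte",
        0b00: "literal",
    }
    # Bits 6 and 5 select the mode when bit 7 is clear.
    return {"mode": modes[(control >> 5) & 0x03]}


print(classify_control(0xFF, 0xFF))
print(classify_control(0x80, 0x00))
print(classify_control(0x20))
```

The flag bit, the five length bits, and the two offset bits account for all eight bits of the control byte, which is why the length field is placed at bits 6-2 in this sketch.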

Tile Data

  • Located in the data/ directory (compressed sets in data/artcomp/)
  • 120+ compressed tile sets extracted
  • File naming: tiles_<ROM_ADDR>.bin

Sprite Mapping Data Structure

The game uses a complex sprite mapping system found around $E8000-$E9000 in ROM.

Structure Format (per sprite frame)

dc.w $8XX               ; Sprite count + flags (bit 15 = end marker)
dc.l <longword_data>    ; 32-bit data (tile + attributes)
dc.w <Y_offset>         ; Y position offset
dc.w $8XX               ; Next sprite data...

The 32-bit Longword Format

IDA disassembled these as: dc.l byte_FXXXX+$Y000000 where:

  • byte_FXXXX - Points to tile data (address ~$F0000-$F5000)
  • +$Y000000 - Large offset encoding attributes

These values are NOT ROM addresses but VDP tile indices with extended attributes (palette, priority, flip flags).
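
To see why IDA renders these longwords as a label plus a large offset, the sketch below splits a 32-bit value into its low 24 bits (which IDA treats as an address) and its top byte (which becomes the +$Y000000 part). The sample value and function name are hypothetical, and the sketch does not attempt to decode the actual attribute fields, whose layout is not specified here.

```python
def split_ida_operand(longword: int) -> tuple:
    """Split a 32-bit value into (24-bit 'address' part, top-byte 'offset' part)."""
    base = longword & 0x00FFFFFF  # Rendered by IDA as byte_FXXXX
    high = longword & 0xFF000000  # Rendered by IDA as +$Y000000
    return base, high


# Hypothetical mapping longword, for demonstration only.
value = 0x020F1234
base, high = split_ida_operand(value)
ida_text = f"dc.l byte_{base:X}+${high:X}"
print(ida_text)
```

Because the low 24 bits happen to fall inside the ROM's address range, the disassembler turns them into a data reference even though the value is not a pointer.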

Project Status

Completed

  • Byte-accurate ROM assembly
  • Basic code documentation (200+ functions labeled)
  • RAM address definitions (~1014 addresses)
  • Tile decompression algorithm documented
  • Binary trace system for debugging
  • Symbol extraction from assembler output
  • Documentation workflow with batch processing
  • Visual debugging with parallel workers
  • Pointer issue debugging tools

Future Work

  • Fix $E8000-$121932 pointer caching issue
  • Create tile compressor (reverse of decompressor)
  • Complete entity system documentation
  • Document all jump tables and pointer tables
  • Separate data into structured include files
  • Full gameplay testing beyond TAS verification

Credits

License

This is a work of reverse engineering for educational and preservation purposes. The original game is copyright Treasure and Sega.
