Contributor

@kami922 kami922 commented Jan 2, 2026

Summary

This PR refactors moduledata parsing for Go versions 1.16-1.24 to eliminate repetitive version-specific switch statements by using a data-driven approach with field offset tables.

Problem

The current codebase has ~430 lines of duplicate code for parsing moduledata across different Go versions. Each version requires nearly identical parsing logic, making the code difficult to maintain and extend.

Before this PR:

  • Go 1.16-1.24 moduledata parsing: ~430 lines of switch statements
  • Adding Go 1.25 support: ~100 lines of duplicate code
  • Maintenance burden: High (need to update all version cases consistently)

Solution

Implemented a generic field offset layout system that:

  1. Defines field positions in data tables (not code)
  2. Uses a single generic parser for all versions
  3. Separates data (layouts) from logic (parsing)
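
As a rough illustration of the idea above, the sketch below keeps field positions in a table and walks that table with one parser. The type names (`fieldInfo`, `layout`), the version keys (`vA`, `vB`), and every offset are hypothetical placeholders, not the PR's actual definitions or real moduledata offsets; it also assumes little-endian input for brevity.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// fieldInfo is a hypothetical, simplified stand-in for the PR's FieldInfo.
type fieldInfo struct {
	name   string
	offset int // byte offset from the start of the structure
}

// layout is a hypothetical stand-in for ModuleDataLayout.
type layout struct {
	ptrSize int
	fields  []fieldInfo
}

// Illustrative offsets only; these are NOT real moduledata offsets.
var layouts = map[string]layout{
	"vA": {ptrSize: 8, fields: []fieldInfo{{"Text", 0}, {"Types", 8}}},
	"vB": {ptrSize: 8, fields: []fieldInfo{{"Text", 8}, {"Types", 24}}},
}

// parseGeneric reads every field named in the layout as a little-endian,
// pointer-sized integer. The same code serves every table entry.
func parseGeneric(data []byte, l layout) (map[string]uint64, error) {
	out := make(map[string]uint64, len(l.fields))
	for _, f := range l.fields {
		end := f.offset + l.ptrSize
		if f.offset < 0 || end > len(data) {
			return nil, fmt.Errorf("field %s at offset %d is out of bounds", f.name, f.offset)
		}
		switch l.ptrSize {
		case 4:
			out[f.name] = uint64(binary.LittleEndian.Uint32(data[f.offset:end]))
		case 8:
			out[f.name] = binary.LittleEndian.Uint64(data[f.offset:end])
		default:
			return nil, fmt.Errorf("unsupported pointer size %d", l.ptrSize)
		}
	}
	return out, nil
}

func main() {
	raw := make([]byte, 32)
	binary.LittleEndian.PutUint64(raw[8:], 0x401000)
	binary.LittleEndian.PutUint64(raw[24:], 0x4a0000)

	md, err := parseGeneric(raw, layouts["vB"])
	if err != nil {
		panic(err)
	}
	fmt.Printf("Text=%#x Types=%#x\n", md["Text"], md["Types"])
}
```

In this shape, supporting another version means adding one more entry to `layouts`; `parseGeneric` stays untouched.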

After this PR:

  • Go 1.16-1.24 moduledata parsing: ~62 lines of generic code
  • Adding Go 1.25 support: ~8 lines of layout data
  • Maintenance burden: Low (changes isolated to data tables)

Code reduction: 85% (430 lines → 62 lines)

Changes

New Files

  1. objfile/layouts.go (394 lines)

    • FieldInfo and ModuleDataLayout structs for defining binary layouts
    • Field offset tables for Go versions 1.16, 1.18, 1.20, 1.21-1.24
    • parseModuleDataGeneric() - version-agnostic parsing function
    • validateAndConvertModuleData() - validation for Go 1.18+
    • validateAndConvertModuleData_116() - validation for Go 1.16-1.17
    • Helper functions: readPointer(), readSlice(), getFieldOffset()
  2. objfile/layouts_test.go (380 lines)

    • TestLayoutOffsets_Match_StructDefinitions - Verifies layout offsets match actual structs
    • TestParseModuleDataGeneric_BackwardCompatibility - Ensures identical output
    • TestVersionMapping - Tests version alias handling
    • TestReadPointer - Tests pointer reading with different endianness
    • TestReadSlice - Tests Go slice structure parsing
    • TestModuleDataIntermediate_FieldTypes - Verifies struct types
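
The helpers listed above are not shown in this conversation, so the following is only a guess at what pointer and slice reading with endianness handling might look like. The signatures (in particular the `bigEndian bool` parameter) and the `sliceHeader` type are assumptions for illustration; the only fact relied on is that a Go slice is stored as three pointer-sized words (data, length, capacity).

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// sliceHeader mirrors the in-memory form of a Go slice: data, len, cap.
type sliceHeader struct {
	Data, Len, Cap uint64
}

// order picks the byte order for the target binary.
func order(bigEndian bool) binary.ByteOrder {
	if bigEndian {
		return binary.BigEndian
	}
	return binary.LittleEndian
}

// readPointer decodes one pointer-sized integer at off (hypothetical signature).
func readPointer(buf []byte, off, ptrSize int, bigEndian bool) (uint64, error) {
	if ptrSize != 4 && ptrSize != 8 {
		return 0, fmt.Errorf("unsupported pointer size %d", ptrSize)
	}
	if off < 0 || off+ptrSize > len(buf) {
		return 0, errors.New("read past end of buffer")
	}
	if ptrSize == 4 {
		return uint64(order(bigEndian).Uint32(buf[off:])), nil
	}
	return order(bigEndian).Uint64(buf[off:]), nil
}

// readSlice decodes three consecutive pointer-sized words as a slice header.
func readSlice(buf []byte, off, ptrSize int, bigEndian bool) (sliceHeader, error) {
	var words [3]uint64
	for i := range words {
		w, err := readPointer(buf, off+i*ptrSize, ptrSize, bigEndian)
		if err != nil {
			return sliceHeader{}, err
		}
		words[i] = w
	}
	return sliceHeader{Data: words[0], Len: words[1], Cap: words[2]}, nil
}

func main() {
	// A 32-bit big-endian slice header: data=0x401000, len=2, cap=4.
	buf := []byte{0x00, 0x40, 0x10, 0x00, 0, 0, 0, 2, 0, 0, 0, 4}
	s, err := readSlice(buf, 0, 4, true)
	if err != nil {
		panic(err)
	}
	fmt.Printf("data=%#x len=%d cap=%d\n", s.Data, s.Len, s.Cap)
}
```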

Modified Files

objfile/objfile.go

  • Replaced lines 289-715 (~430 lines of version switches)
  • Now uses generic parser with layout tables (~62 lines)
  • Net reduction: 368 lines

Testing

All tests passing

  1. Unit Tests

    • Layout offset verification for all supported versions
    • Backward compatibility tests ensure identical output
    • Endianness handling (little-endian and big-endian)
    • Version mapping (1.22/1.23/1.24 → 1.21 layout)
  2. Integration Testing

    • Tested against real Go 1.17.2 binary (testproject)
    • Output matches original implementation exactly
    • All existing objfile tests pass
  3. Test Coverage

    $ go test ./objfile -v
    === RUN   TestLayoutOffsets_Match_StructDefinitions
    === RUN   TestLayoutOffsets_Match_StructDefinitions/ModuleData121_64
    === RUN   TestLayoutOffsets_Match_StructDefinitions/ModuleData121_32
    === RUN   TestLayoutOffsets_Match_StructDefinitions/ModuleData120_64
    === RUN   TestLayoutOffsets_Match_StructDefinitions/ModuleData118_64
    === RUN   TestLayoutOffsets_Match_StructDefinitions/ModuleData116_64
    --- PASS: TestLayoutOffsets_Match_StructDefinitions (0.00s)
    ...
    PASS
    ok      github.com/mandiant/GoReSym/objfile     0.094s
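
The layout offset verification mentioned above could plausibly be done by comparing table entries against `unsafe.Offsetof` on the existing struct definitions. The sketch below shows that comparison on a made-up three-field struct; `demoModuleData`, `demoLayout`, and `layoutMismatches` are hypothetical names, and the real tests may be organized differently.

```go
package main

import (
	"fmt"
	"sort"
	"unsafe"
)

// demoModuleData is a hypothetical miniature struct, not a real runtime layout.
// The slice field is modeled as three 64-bit words (data, len, cap).
type demoModuleData struct {
	PcHeader    uint64
	Funcnametab [3]uint64
	Text        uint64
}

// demoLayout is the hand-written offset table that must agree with the struct.
var demoLayout = map[string]uintptr{
	"PcHeader":    0,
	"Funcnametab": 8,
	"Text":        32,
}

// layoutMismatches returns one message per table entry that disagrees with
// the compiler-computed offset of the corresponding struct field.
func layoutMismatches(table map[string]uintptr) []string {
	var m demoModuleData
	actual := map[string]uintptr{
		"PcHeader":    unsafe.Offsetof(m.PcHeader),
		"Funcnametab": unsafe.Offsetof(m.Funcnametab),
		"Text":        unsafe.Offsetof(m.Text),
	}
	var out []string
	for name, want := range table {
		got, ok := actual[name]
		if !ok || got != want {
			out = append(out, fmt.Sprintf("%s: table=%d struct=%d", name, want, got))
		}
	}
	sort.Strings(out) // map iteration order is random; keep output stable
	return out
}

func main() {
	fmt.Println("mismatches:", layoutMismatches(demoLayout))
}
```

A check like this catches a mistyped offset at test time, before it turns into a silent misparse of a real binary.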
    

Extensibility

Adding Go 1.25 support:

  • Before: Copy-paste ~100 lines, modify version numbers
  • After: Add 8 lines of field offsets to layout table
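
To make the extensibility claim concrete, here is a minimal sketch of version resolution, assuming layouts are looked up by a version string and that later versions can alias an earlier table. The 1.22/1.23/1.24 to 1.21 mapping comes from the Testing section; the names `knownLayouts`, `layoutAlias`, and `resolveLayout` are hypothetical, and whether Go 1.25 can reuse an existing layout is not established in this conversation.

```go
package main

import "fmt"

// knownLayouts lists the versions that have their own offset table
// (per the description: 1.16, 1.18, 1.20, and 1.21).
var knownLayouts = map[string]bool{
	"1.16": true,
	"1.18": true,
	"1.20": true,
	"1.21": true,
}

// layoutAlias maps versions that share a layout onto the table they reuse.
var layoutAlias = map[string]string{
	"1.22": "1.21",
	"1.23": "1.21",
	"1.24": "1.21",
}

// resolveLayout returns the table key for a runtime version and whether a
// table exists for it.
func resolveLayout(version string) (string, bool) {
	if alias, ok := layoutAlias[version]; ok {
		version = alias
	}
	return version, knownLayouts[version]
}

func main() {
	for _, v := range []string{"1.21", "1.24", "1.25"} {
		key, ok := resolveLayout(v)
		fmt.Printf("%s -> %s (supported: %t)\n", v, key, ok)
	}
}
```

Under this scheme a new release needs either one alias line (if the structure is unchanged) or one new table entry (if fields moved).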

Future Work

This refactoring establishes a pattern that can be extended to:

  1. Older Go versions (1.2-1.15) - same approach
  2. Type parsing (lines 1180-1700 in objfile.go) - similar switches
  3. Other version-specific structures throughout the codebase

Estimated additional savings: ~500-600 more lines could be eliminated by applying this pattern to type parsing.


This commit refactors moduledata parsing for Go versions 1.16-1.24 to use
a data-driven approach with field offset tables instead of repetitive
version-specific switch statements.

Changes:
- Add objfile/layouts.go with generic field offset layout system
- Add objfile/layouts_test.go with comprehensive test coverage
- Refactor objfile/objfile.go to use generic parser for Go 1.16-1.24
- Reduce code from ~430 lines of switches to ~62 lines of generic code

Benefits:
- 85% code reduction for supported versions (430 → 62 lines)
- Adding new Go versions requires only ~8 lines of layout data
- Single parsing logic eliminates duplicate code
- Comprehensive test suite ensures correctness

Testing:
- All existing tests pass
- New tests verify layout offsets match struct definitions
- Backward compatibility tests ensure identical output
- Tested against real Go 1.17.2 binary (testproject)

Fixes mandiant#55
@kami922 kami922 force-pushed the refactor/moduledata-parsing branch from bb7034e to 47c5458 Compare January 2, 2026 13:55
Contributor Author

kami922 commented Jan 2, 2026

Hello, I made a few mistakes while committing: I had pushed commits to the wrong branch earlier and then pushed the wrong commits to this branch as well. I have fixed it now.

Collaborator

stevemk14ebr commented Jan 2, 2026

This is good work! It passes all the tests and looks reasonable to me. Before merging, I'd like it to be extended to cover the additional cases in the objfile ModuleDataTable routine and the type parsing, as you suggest.

For testing, you can run the build_test_files.sh script, which uses Docker to generate many test binaries for different Go runtime versions. Once those are generated, you can run go test, which runs GoReSym's main_test.go against all versions and architectures of the generated test files.

Extends the layout table approach to older Go versions (1.5-1.15),
eliminating 228 lines of repetitive switch statements.

Changes:
- Add layout definitions for Go 1.5-1.6, 1.7, and 1.8-1.15 to layouts.go
- Add validateAndConvertModuleData_Legacy() for Go 1.7-1.15
- Add validateAndConvertModuleData_Legacy_NoTypes() for Go 1.5-1.6
- Refactor objfile.go case "1.2" block (228 lines → 75 lines)
- Add comprehensive tests for legacy versions (17 new test cases)

Benefits:
- 67% code reduction in legacy version handling
- Consistent parsing approach across Go 1.5-1.24 (20 versions)
- Improved maintainability and extensibility
- Single source of truth for moduledata layouts

Testing:
- All existing unit tests pass
- Added TestLayoutOffsets_Legacy_Versions (6 subtests)
- Added TestVersionMapping_Legacy (11 subtests)
- Total: 17 new test cases validating legacy version support

Code Metrics:
- objfile/objfile.go: 1,783 → 1,628 lines (-155 lines)
- objfile/layouts.go: 436 → 564 lines (+128 lines infrastructure)
- objfile/layouts_test.go: 380 → 590 lines (+210 lines tests)
- Net: -155 lines of duplicate code eliminated

Addresses maintainer feedback on PR mandiant#78 (moduledata portion).
Type parsing refactoring (Phase 2) to follow in next commit.

Refactors type parsing (ParseType_impl) to use the layout table approach,
eliminating 158 lines of repetitive version-specific switch statements.

Changes:
- Add RtypeLayout system to layouts.go for Go 1.5-1.24
- Add parseRtypeGeneric() for generic type parsing
- Add getRtypeFieldOffset() helper function
- Refactor ParseType_impl() to use generic parser (208 lines → 50 lines)
- Add comprehensive tests for Rtype layouts (23 test cases)

Benefits:
- 76% code reduction in type parsing (208 lines → 50 lines)
- Consistent parsing approach for all Go 1.5-1.24 type structures
- Single source of truth for Rtype layouts
- Improved maintainability

Testing:
- All existing tests pass
- Added TestRtypeLayoutOffsets (3 subtests)
- Added TestRtypeVersionMapping (20 subtests)
- Total: 23 new test cases validating Rtype support

Code Metrics:
- objfile/objfile.go: 1,628 → 1,470 lines (-158 lines)
- objfile/layouts.go: 564 → 786 lines (+222 lines infrastructure)
- objfile/layouts_test.go: 590 → 748 lines (+158 lines tests)

Combined with Phase 1:
- Total code reduction: 313 lines from objfile.go
- ModuleData + Type parsing: ~650 lines → ~125 lines (81% reduction)

Completes maintainer feedback on PR mandiant#78 (type parsing portion).
Contributor Author

kami922 commented Jan 3, 2026

@stevemk14ebr Hello and good day to you. I ran build_test_files.sh to generate binaries for Go 1.15-1.22, then ran go test, and all tests passed in my terminal.


// FieldInfo describes a single field's location and type in a binary structure
type FieldInfo struct {
	Name string // Field name (e.g., "Text", "Types")
Collaborator

@stevemk14ebr stevemk14ebr Jan 4, 2026


Let's make the name and type enumerations instead of strings. This should be a bit faster and add more structure.
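
One possible shape for that change, sketched with `iota`-based enumerations and an array indexed by field name. The constant names, the `Present` flag, and the offsets are all hypothetical; the follow-up commit may have chosen different names and structure.

```go
package main

import "fmt"

// FieldName enumerates the fields a layout can describe (hypothetical set).
type FieldName uint8

const (
	FieldText FieldName = iota
	FieldTypes
	FieldETypes
	numFields // sentinel: number of enumerated fields
)

var fieldNames = [numFields]string{"Text", "Types", "ETypes"}

// String keeps log and test output readable after dropping string names.
func (f FieldName) String() string {
	if f < numFields {
		return fieldNames[f]
	}
	return fmt.Sprintf("FieldName(%d)", uint8(f))
}

// FieldType enumerates how a field is decoded.
type FieldType uint8

const (
	TypePointer FieldType = iota
	TypeSlice
)

// FieldInfo no longer carries strings; the array index is the name.
type FieldInfo struct {
	Type    FieldType
	Offset  int
	Present bool // false when this version has no such field
}

// Layout is indexed directly by FieldName, so lookups avoid string hashing.
type Layout [numFields]FieldInfo

// Illustrative offsets only.
var exampleLayout = Layout{
	FieldText:  {Type: TypePointer, Offset: 0, Present: true},
	FieldTypes: {Type: TypePointer, Offset: 8, Present: true},
}

// Offset reports the byte offset of f, if the layout defines it.
func (l Layout) Offset(f FieldName) (int, bool) {
	if f >= numFields || !l[f].Present {
		return 0, false
	}
	return l[f].Offset, true
}

func main() {
	off, ok := exampleLayout.Offset(FieldTypes)
	fmt.Println(FieldTypes, off, ok)
}
```

Besides the constant-time lookup, a misspelled field becomes a compile error rather than a lookup miss at run time.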

Collaborator

stevemk14ebr commented Jan 4, 2026

The large switch on runtime version in the Interface case of ParseType_impl could use the same kind of offset map for the methods list. We want to turn as many of these large version switches into offset maps as we can; most of the memory-reading logic is already offset-based rather than marshaled for these cases.

case Interface:
		// type interfaceType struct {
		// 	rtype
		// 	pkgPath name      // import path (pointer)
		// 	methods []imethod // sorted by hash
		// }
		switch runtimeVersion {
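
A small sketch of how the methods list offset could come from a table rather than a switch, following the struct comment above: `methods` sits after the embedded `rtype` and the pointer-sized `pkgPath`. The `layoutKey` type, the `"example"` version key, and the rtype sizes (48 and 32 bytes) are assumptions for illustration and should be checked against the real per-version definitions.

```go
package main

import "fmt"

// layoutKey selects a layout by runtime version and pointer size.
type layoutKey struct {
	version string
	ptrSize int
}

// rtypeSizes holds the size of the embedded rtype per key.
// ASSUMPTION: illustrative values, not verified per Go version.
var rtypeSizes = map[layoutKey]int{
	{"example", 8}: 48,
	{"example", 4}: 32,
}

// methodsOffset returns where the methods slice header starts inside
// interfaceType: after rtype and the pointer-sized pkgPath.
func methodsOffset(k layoutKey) (int, bool) {
	size, ok := rtypeSizes[k]
	if !ok {
		return 0, false
	}
	return size + k.ptrSize, true
}

func main() {
	off, ok := methodsOffset(layoutKey{"example", 8})
	fmt.Println(off, ok)
}
```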

…adata with enums; generic parsers updated; tests adjusted; add InterfaceLayout scaffolding
Collaborator

This looks great to me, thanks for cleaning this up! It passes all my unit tests as well as my manual testing.

@stevemk14ebr stevemk14ebr merged commit a3ebe90 into mandiant:master Jan 13, 2026
1 check passed