@vaderkos
Contributor

@vaderkos vaderkos commented Dec 11, 2025

This PR introduces Gi, a compact, immutable, read-only in-memory representation of GameInfo with an identical API.

Why:

The built-in GameInfo eagerly materializes Lua tables per row, leading to high memory usage, GC pressure, and slow iteration on large tables. Since the underlying database is static, a more aggressive and optimized representation is both safe and practical.

What improved:

  • ~8x lower static memory usage across all GameInfo tables
  • Order-of-magnitude faster iteration and filtering on large tables
  • O(1) indexed access by primary key (numeric or string)
  • Strict schema validation to surface silent bugs early
  • More details below 👇

How:

Each table is assembled once by analyzing column statistics and selecting the cheapest possible storage strategy per column (constants, bit-packed booleans, packed integer ranges, dictionaries, or sparse maps).
All row data is encoded into flat arrays, primary-key indices are built upfront, and rows are exposed as immutable views that decode values lazily on access.

Future direction:

Gi is intended to progressively replace the built-in GameInfo where possible,
serving as a safer and significantly more performant alternative with compatible access patterns.

Low-level design notes (for reviewers)

Gi internal design

Assumptions:

  • Database is static (no schema or row mutations after load)
  • Rows are immutable
  • Schema is stable

Assembly pipeline:

  • Fetch schema and rows from SQLite
  • Analyze column statistics (cardinality, dominance, ranges)
  • Select optimal encoding per column
  • Build primary key indices
  • Encode all rows into flat arrays
  • Expose a read-only table and row API

Column encodings:

  • Each column is encoded independently using the cheapest viable representation:
    • CONST – single value for entire column
    • BOOL – 1 bit per row
    • PINT – packed integer range
    • DICT – dictionary-encoded values, bit-packed
    • SPARSE – dominant default + sparse overrides
    • SCALAR – one Lua slot per row (fallback)
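
The selection step can be pictured roughly as follows. This is a minimal sketch, not the code in this PR: `chooseEncoding`, the `NIL` sentinel object, the 0.9 dominance threshold, and the 256-entry dictionary cutoff are all hypothetical and only illustrate how column statistics might map onto the encodings listed above.

```lua
-- Hypothetical sentinel standing in for SQL NULL so arrays have no holes.
local NIL = {}

-- Pick an encoding name from simple column statistics.
-- `values` is an array with one entry per row (NIL for missing values).
-- All thresholds here are illustrative assumptions.
local function chooseEncoding(values)
  local n = #values
  local counts, distinct, top = {}, 0, 0
  local allBool, allInt = true, true

  for i = 1, n do
    local v = values[i]
    if counts[v] == nil then
      counts[v] = 0
      distinct = distinct + 1
    end
    counts[v] = counts[v] + 1
    if counts[v] > top then top = counts[v] end
    if type(v) ~= "boolean" then allBool = false end
    if type(v) ~= "number" or v % 1 ~= 0 then allInt = false end
  end

  if distinct <= 1 then return "CONST" end        -- one value for the whole column
  if allBool then return "BOOL" end               -- 1 bit per row
  if allInt then return "PINT" end                -- packed integer range
  if top / n >= 0.9 then return "SPARSE" end      -- dominant default + overrides
  if distinct <= 256 then return "DICT" end       -- small dictionary
  return "SCALAR"                                 -- fallback: one slot per row
end
```

A real implementation would presumably compare the estimated size of each candidate encoding rather than use a fixed priority order as this sketch does.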

Memory optimizations:

  • Column-wise storage instead of per-row tables
  • Bit-packing for booleans and small integers
  • Dictionary interning shared across all tables
  • Lazy row materialization with weak caching
  • NIL sentinel to allow uniform encoding and analysis
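
To illustrate the bit-packing idea, here is a minimal sketch that stores booleans 32 per Lua number. It uses plain arithmetic so it runs on stock Lua 5.1 without a bit library. `packBools` and `getBool` are hypothetical names, and the layout used in this PR may differ.

```lua
local BITS = 32  -- flags stored per number; an assumption for this sketch

-- Pack an array of booleans into an array of numbers, BITS flags per number.
local function packBools(flags)
  local words = {}
  for i = 1, #flags do
    local w = math.floor((i - 1) / BITS) + 1   -- which word holds flag i
    local bit = (i - 1) % BITS                 -- position inside that word
    words[w] = words[w] or 0
    if flags[i] then
      words[w] = words[w] + 2 ^ bit            -- set the bit by addition
    end
  end
  return words
end

-- Read flag number `i` (1-based) back out of the packed words.
local function getBool(words, i)
  local w = math.floor((i - 1) / BITS) + 1
  local bit = (i - 1) % BITS
  return math.floor(words[w] / 2 ^ bit) % 2 == 1
end
```

With this layout, 70 boolean rows occupy three numbers instead of 70 table slots.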

Access model:

  • Rows are addressed by base offsets into flat storage
  • Primary key lookup is O(1)
  • Iteration avoids allocations
  • Row values are decoded on demand
  • Strict accessors throw on schema violations
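
The access model can be sketched with a metatable-based row view. This is an assumption-laden illustration, not the PR's implementation: it uses a single row-major array for brevity (the PR stores columns separately with per-column encodings), and `schema`, `flat`, `baseOf`, and `rowView` are hypothetical names.

```lua
-- Hypothetical flat storage: two columns per row, stored back to back.
local schema = { ID = 1, Type = 2 }   -- column name -> slot within a row
local stride = 2
local flat = { 0, "POLICY_A", 1, "POLICY_B", 2, "POLICY_C" }

-- Base offsets live in a weak-keyed side table, so views stay empty proxies
-- and are collected as soon as nothing else references them.
local baseOf = setmetatable({}, { __mode = "k" })

local rowMeta = {
  -- Values are looked up only when a column is actually read.
  __index = function(view, name)
    local slot = schema[name]
    if slot == nil then
      error("unknown column: " .. tostring(name), 2)  -- strict accessor
    end
    return flat[baseOf[view] + slot]
  end,
  -- Rows are immutable views.
  __newindex = function()
    error("rows are read-only", 2)
  end,
}

-- Create a view for a 1-based row number.
local function rowView(rowIndex)
  local view = setmetatable({}, rowMeta)
  baseOf[view] = (rowIndex - 1) * stride
  return view
end
```

Because the view holds no data of its own, creating one costs a single empty table, and reading a misspelled column fails immediately instead of silently returning nil.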

Condensed benchmark results

| Scenario | GameInfo (OLD) | Gi (NEW) | NEW vs OLD | Notes |
| --- | --- | --- | --- | --- |
| Total memory | 🔴 ~25 MB | 🟢 ~3 MB | 🟢 ~8x less | Steady-state memory after loading all tables |
| Policies() | 🔴 ~1031 µs/call; ~734 B retained/call | 🟢 ~8.6 µs/call; ~0.8 B retained/call | 🟢 ~120x faster | |
| Policies('PortraitIndex > 50') | 🔴 ~549 µs/call | 🟢 ~97 µs/call | 🟢 ~5.7x faster | |
| Policies({ CultureCost = 10 }) | 🔴 ~960 µs/call | 🟢 ~345 µs/call | 🟢 ~2.8x faster | Structured filters benefit from new iterator model |
| Column access | 🟢 ~0.4 µs | 🔴 ~13.6 µs | 🔴 ~34x slower | ABI validation/decoding + safety checks + depends on the data |
| Column existence check | 🔴 N/A | 🟢 ~1 µs | 🟢 NEW only | Explicit, safe column presence validation |
| Net impact (typical usage) | 🔴 High GC + CPU | 🟢 Low GC + CPU | 🟢 Strong win | Query-heavy paths dominate overall execution time, so the large gains in row iteration and query performance outweigh the slower per-column access, making its impact negligible in real workloads. |

Test methodology summary:

  1. Benchmarks were run with the GC fully stabilized (three consecutive collectgarbage calls) before and after each measurement.
  2. Benchmarks were run on the Policies table because it is one of the largest and most used tables.
  3. Each benchmark executes 5,000 iterations per scenario; totals are measured once and per-call metrics are derived from the totals.
  4. Timing was measured via os.clock(); memory was measured via collectgarbage("count").
  5. Source code used for benchmarking is available here.
  6. Memory is split into:
     • Allocated: heap growth before GC (approximate allocation pressure)
     • Retained: heap delta after forced GC (leak / long-lived memory)
  7. GameInfo vs Gi correctness was validated by row-by-row value comparison across all tables.
  8. Retained memory in the Gi implementation was re-checked by re-running the benchmarks to confirm it is not a memory leak.
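
The measurement procedure described above can be approximated with a small harness like the one below. It is a hedged sketch rather than the benchmark source referenced in the list: `settle` and `bench` are hypothetical names, and "allocated" is only approximate here because the collector may run during the loop.

```lua
-- Stabilize the heap with three consecutive full collections.
local function settle()
  collectgarbage("collect")
  collectgarbage("collect")
  collectgarbage("collect")
end

-- Run `fn` `iterations` times and derive per-call metrics from the totals.
local function bench(fn, iterations)
  settle()
  local memBefore = collectgarbage("count")   -- heap size in KB
  local t0 = os.clock()
  for _ = 1, iterations do
    fn()
  end
  local elapsed = os.clock() - t0             -- CPU seconds for the whole run
  local memPeak = collectgarbage("count")     -- heap before the forced GC
  settle()
  local memAfter = collectgarbage("count")    -- heap after the forced GC

  return {
    usPerCall = elapsed * 1e6 / iterations,
    allocatedBytesPerCall = (memPeak - memBefore) * 1024 / iterations,
    retainedBytesPerCall = (memAfter - memBefore) * 1024 / iterations,
  }
end
```

A call such as `bench(function() for _ in Gi.Policies() do end end, 5000)` would correspond to the Policies() scenario, assuming `Gi` is loaded in the environment.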

Benchmark results / actual numbers

| Implementation | Time/ms (Total) | Time/µs (Per call) | Retained/MB (Total) | Retained/bytes (Per call) | Allocated/MB (Total) | Allocated/bytes (Per call) | Note |
| --- | --- | --- | --- | --- | --- | --- | --- |
| OLD column access | 2.000 | 0.400 | 0.0003 | 0.053 | 0.0003 | 0.061 | |
| NEW column access | 68.000 | 13.600 | 0 | 0 | 0 | 0.010 | |
| NEW column existence check | 5.000 | 1.000 | 0 | 0 | 0 | 0.010 | |
| OLD.Policies() | 5156.000 | 1031.200 | 3.4983 | 733.651 | 4.3185 | 905.651 | |
| NEW.Policies() | 43.000 | 8.600 | 0.0039 | 0.819 | 0.7910 | 165.885 | Tested. Retained memory in that case is not a memory leak. |
| OLD.Policies('PortraitIndex > 50') | 2745.000 | 549.000 | 0 | 0 | 0.8202 | 172.000 | Allocated memory could be significantly higher in practice, since the SQL execution is handled inside the engine rather than the Lua runtime, making this metric unrepresentative of the actual allocated memory consumption in this case. |
| NEW.Policies('PortraitIndex > 50') | 483.000 | 96.600 | 0 | 0 | 6.1670 | 1293.303 | |
| NEW.Policies('PortraitIndex > ?', 50) | 485.000 | 97.000 | 0 | 0 | 6.5398 | 1371.505 | |
| OLD.Policies({ CultureCost = 10 }) | 4801.000 | 960.200 | 0 | 0 | 1.1254 | 236.011 | |
| NEW.Policies({ CultureCost = 10 }) | 1727.000 | 345.400 | 0 | 0 | 1.4538 | 304.893 | |

🛠️ Fixes and improvements

  • 🛠️ Direct access for single primary-key tables

    • If a table has exactly one primary key and it is a string or number, Gi allows direct indexed access.
    • Example:
      Gi.Defines["START_YEAR"]
    • This was not possible with the built-in GameInfo.
  • 🛠️ Boolean values supported in table-based filters

    • Boolean values are now accepted and mapped explicitly:
      Gi.CitySpecializations({ MustBeCoastal = true })     -- MustBeCoastal = 1
      Gi.CitySpecializations({ MustBeCoastal = false })    -- MustBeCoastal IS NULL OR 0
    • The legacy filtering syntax still works, but is now faster.
  • 🛠️ Parameterized SQL filters supported

    • SQL filters can now use parameters:
      Gi.Technology_Flavors("FlavorType = ?", "FLAVOR_GROWTH")
      Gi.CitySpecializations("MustBeCoastal = ?", false)   -- MustBeCoastal = 0
    • NOTE: Parameterized boolean filters do not match NULL values.
    • The legacy SQL filter syntax remains supported and is now faster.
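
The boolean mapping for table-based filters can be expressed as a small predicate. This is a sketch under the assumption that rows expose raw SQLite values (1, 0, or nil) for boolean columns; `matches` is a hypothetical helper and not part of the Gi API.

```lua
-- Return true if `row` satisfies every column/value pair in `filter`.
-- Assumed mapping: true  -> column = 1
--                  false -> column IS NULL OR column = 0
local function matches(row, filter)
  for column, expected in pairs(filter) do
    local actual = row[column]
    if expected == true then
      if actual ~= 1 then return false end
    elseif expected == false then
      if actual ~= nil and actual ~= 0 then return false end
    elseif actual ~= expected then
      return false
    end
  end
  return true
end
```

This also shows why `{ MustBeCoastal = false }` and the parameterized form `"MustBeCoastal = ?", false` can return different rows: only the table form treats NULL as false.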

⚠️ Breaking changes


These changes are intentional and designed to surface bugs that were previously silent
(e.g. typos, invalid indexes, or nonexistent schema elements).

  • ⚠️ Invalid indexes now throw
    GameInfo.Defines[0] -- returns nil
    Gi.Defines[0]       -- throws
  • ⚠️ Accessing nonexistent columns throws
    Gi.Specialists[0].Nonexistent   -- throws
    NOTE: No error is thrown if the column exists but the row value is nil.
  • ⚠️ Accessing nonexistent tables throws
    Gi.TableThatDoesNotExist -- throws
  • ⚠️ Filtering by nonexistent columns throws
    Gi.Defines({ Unknown = 10 }) -- throws

💡 New features

  • 💡 Row call without arguments returns a detached row copy
    • Calling a row with no arguments produces a fully populated, modifiable table:
      local row = Gi.Specialists[0]()
      The returned table is:
      • Fully detached from internal state
      • A simple hash map: column_name -> value
      • Safe to modify and reuse for any purpose
  • 💡 Row call with arguments performs column existence checks
    • Calling a row with column names returns booleans indicating whether each column is
      defined in the table schema, not whether the value in the specific row is non-nil:
      Gi.Specialists[0]("nonexistent", "CulturePerTurn", "ID")
      -- false, true, true
    • Existence vs value example:
      -- Column exists, but value in this row is nil
      Gi.Specialists[0]("Civilopedia")
      -- true
      
      -- Column does not exist in schema
      Gi.Specialists[0]("DefinitelyNotAColumn")
      -- false
  • 💡 Calling Gi with arguments returns existing tables
    • Multiple tables can be retrieved safely in one call:
      Units, Policies, Nonexistent = Gi("Units", "Policies", "Nonexistent")
      -- table, table, nil
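
One way such callable rows could be built is with the `__call` metamethod. The sketch below is an illustration only: `schema`, `known`, and `makeRow` are hypothetical, and the row values are held in a plain table rather than Gi's encoded storage.

```lua
local unpack = table.unpack or unpack   -- Lua 5.1 and 5.2+ compatibility

local schema = { "ID", "Type", "Civilopedia" }   -- hypothetical column list
local known = {}
for _, name in ipairs(schema) do known[name] = true end

local function makeRow(values)
  return setmetatable({}, {
    __index = values,
    __call = function(_, ...)
      local n = select("#", ...)
      if n == 0 then
        -- No arguments: return a detached, freely modifiable copy.
        local copy = {}
        for _, name in ipairs(schema) do copy[name] = values[name] end
        return copy
      end
      -- With arguments: report whether each name is a schema column,
      -- regardless of whether this row's value is nil.
      local out = {}
      for i = 1, n do
        out[i] = known[(select(i, ...))] == true
      end
      return unpack(out, 1, n)
    end,
  })
end
```

Here `makeRow({ ID = 0, Type = "SPECIALIST_X" })("Civilopedia")` returns true even though that row has no Civilopedia value, which mirrors the existence-versus-value distinction described above.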

P.S.
Special thanks to @schnetziomi5, @azum4roll, @axatin

@vaderkos vaderkos force-pushed the gi branch 2 times, most recently from 6eece6a to 4944ea2 Compare December 12, 2025 17:10
@vaderkos vaderkos force-pushed the gi branch 2 times, most recently from f0d940d to 64b7a7e Compare January 4, 2026 12:52
@vaderkos vaderkos changed the title Add CPK.DB.Gi - GameInfo implementation Add CPK.DB.Gi - GameInfo fast and memory-optimized subtitute Jan 4, 2026
@vaderkos vaderkos marked this pull request as ready for review January 4, 2026 22:54
@azum4roll
Collaborator

  • ⚠️ Invalid indexes now throw
    GameInfo.Defines[0] -- returns nil
    Gi.Defines[0]       -- throws

I didn't expect this one to throw.

@vaderkos
Contributor Author

vaderkos commented Jan 6, 2026

  • ⚠️ Invalid indexes now throw
    GameInfo.Defines[0] -- returns nil
    Gi.Defines[0]       -- throws

I didn't expect this one to throw.

@azum4roll The Defines table has no numeric primary key, only a string primary key, so indexing it with 0 is invalid.

@azum4roll
Collaborator

So only when the table is missing the ID column, not when you try to get a nil row.

@vaderkos
Contributor Author

vaderkos commented Jan 7, 2026

So only when the table is missing the ID column, not when you try to get a nil row.

@azum4roll Exactly, though it applies to any string or numeric primary-key column, not just ID; the column doesn't have to be named ID.

For example, the Defines table has a string primary-key column Name, so Defines[string] is allowed, but indexing it with anything other than a string throws.

The logic goes like this:

  • A single numeric primary-key column: Table[integer] is allowed; anything else is prohibited.
  • A single numeric primary-key column plus a string column Type: Table[integer] and Table[string] are allowed; anything else is prohibited.
  • A single string primary-key column: Table[string] is allowed; anything else is prohibited.
  • No primary-key column, or multiple primary-key columns: Table[anything] is prohibited; use Table(filter?) instead.
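
These rules can be summarized as a key-type check performed before any lookup. The following is a minimal sketch, assuming a hypothetical `pk` descriptor per table; it checks `type(key)` only, so it does not distinguish integers from other numbers as the real implementation may.

```lua
-- `pk` describes the table's primary-key shape (hypothetical structure):
--   pk.numeric : exactly one primary-key column, and it is numeric
--   pk.string  : exactly one primary-key column, and it is a string
--   pk.hasType : the table also has a string column named Type
-- Tables with no primary key or a composite key set none of these.
local function allowedKeyTypes(pk)
  if pk.numeric and pk.hasType then
    return { number = true, string = true }
  end
  if pk.numeric then return { number = true } end
  if pk.string then return { string = true } end
  return {}   -- direct indexing is not available; use Table(filter)
end

-- Throw on a key type the table cannot be indexed with.
local function checkIndex(pk, key)
  if not allowedKeyTypes(pk)[type(key)] then
    error("invalid index of type " .. type(key), 2)
  end
  return true
end
```

Under this model `Gi.Defines[0]` fails the type check and throws, while `Gi.Defines["SOME_MISSING_NAME"]` passes it and, per the discussion above, is simply a lookup that finds no row.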
