Merged
2 changes: 2 additions & 0 deletions pyproject.toml
Original file line number Diff line number Diff line change
@@ -15,6 +15,7 @@ dev = ["pyrefly>=0.25.0", "pytest>=8.3.4", "ruff>=0.11.8"]
[tool.uv.workspace]
members = [
"src/lab/azure-document-intelligence-lab",
"src/private/app/factorio-cycle-calculator",
"src/private/app/git-commit-heatmap",
"src/private/app/html-sm-processor",
"src/private/app/llm-text-splitter",
@@ -33,6 +34,7 @@ nbgv-python = { workspace = true }
[tool.pyrefly]
project-includes = [
"src/lab/azure-document-intelligence-lab",
"src/private/app/factorio-cycle-calculator",
"src/private/app/git-commit-heatmap",
"src/private/app/html-sm-processor",
"src/private/app/llm-text-splitter",
@@ -0,0 +1,106 @@
# Factorio data-raw-dump.json analysis

Date: 2026-02-11

## Scope and sources

- Analyzed file: `/mnt/c/Users/zhang/AppData/Roaming/Factorio/script-output/data-raw-dump.json`
- File size: ~27.8 MB
- Reference docs:
  - Factorio data.raw schema project (README): https://github.com/jacquev6/factorio-data-raw-json-schema
  - Factorio Lua API (Data::raw / AnyPrototype / Data lifecycle): https://lua-api.factorio.com/latest/types/Data.html#raw
  - Factorio modding overview and data.raw listing: https://wiki.factorio.com/Modding
  - data.raw listing of built-in prototypes: https://wiki.factorio.com/Data.raw

Note: The raw full JSON schema file is available at the URL below, but the fetch attempt could not extract content in this environment (likely due to size). Use it if you want schema validation.

- https://raw.githubusercontent.com/jacquev6/factorio-data-raw-json-schema/refs/heads/main/factorio-data-raw-json-schema.full.json

## High-level structure (matches Lua API documentation)

The dump is the JSON serialization of `data.raw`, which is documented as:

```lua
raw :: dictionary[string -> dictionary[string -> AnyPrototype]]
```

In practice, the file is a dictionary whose keys are prototype type names (e.g., `item`, `recipe`, `technology`). Each value is a dictionary from prototype name to prototype object.

### Structural checks

- Top-level categories: **251**
- Total prototype objects: **5137**
- All top-level values are JSON objects.
- Every prototype has both `name` and `type` fields.
- The `name` field matches its dictionary key.
- The `type` field matches its category key.

This consistency indicates a clean dump and makes it safe to treat `(category, name)` as a unique identifier.
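
The structural checks above can be reproduced with a short script. This is a minimal sketch, not the code that produced the numbers in this note; the function names `load_dump` and `check_structure` are hypothetical.

```python
import json
from pathlib import Path


def load_dump(path: str) -> dict:
    """Load data-raw-dump.json from a caller-supplied path."""
    with Path(path).open(encoding="utf-8") as handle:
        return json.load(handle)


def check_structure(data_raw: dict) -> list[str]:
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    for category, prototypes in data_raw.items():
        # Each top-level value must be a JSON object (name -> prototype).
        if not isinstance(prototypes, dict):
            problems.append(f"{category}: value is not a JSON object")
            continue
        for key, proto in prototypes.items():
            if not isinstance(proto, dict):
                problems.append(f"{category}/{key}: prototype is not a JSON object")
                continue
            # `name` must match the dictionary key, `type` the category key.
            if proto.get("name") != key:
                problems.append(f"{category}/{key}: name field is {proto.get('name')!r}")
            if proto.get("type") != category:
                problems.append(f"{category}/{key}: type field is {proto.get('type')!r}")
    return problems


def summarize(data_raw: dict) -> tuple[int, int]:
    """Return (number of categories, total number of prototype objects)."""
    total = sum(len(p) for p in data_raw.values() if isinstance(p, dict))
    return len(data_raw), total
```

Running `check_structure(load_dump(path))` on a clean dump should return an empty list, and `summarize` should return the category and prototype totals listed above.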

## Distribution of prototypes

Top 20 categories by number of prototypes:

| Rank | Prototype type | Count |
| ---: | ---------------------- | ----: |
| 1 | optimized-particle | 845 |
| 2 | recipe | 659 |
| 3 | noise-expression | 504 |
| 4 | technology | 275 |
| 5 | item | 241 |
| 6 | explosion | 225 |
| 7 | corpse | 177 |
| 8 | optimized-decorative | 160 |
| 9 | virtual-signal | 155 |
| 10 | tile | 150 |
| 11 | item-subgroup | 136 |
| 12 | smoke-with-trigger | 101 |
| 13 | delayed-active-trigger | 100 |
| 14 | ambient-sound | 95 |
| 15 | tips-and-tricks-item | 81 |
| 16 | trivial-smoke | 67 |
| 17 | segment | 60 |
| 18 | noise-function | 48 |
| 19 | sprite | 44 |
| 20 | simple-entity | 41 |

### Singleton categories

There are **119** categories with exactly one prototype. Examples include:
`accumulator`, `achievement`, `beacon`, `character`, `character-corpse`, `map-settings`, `map-gen-presets`, `rocket-silo`, `space-platform-hub`, `surface`, `utility-constants`.

This is normal for “global” or “singleton” systems (map settings, GUI style, utility constants, etc.).
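
The distribution table and the singleton count can be derived from the loaded dump with `collections.Counter`. The sketch below assumes `data_raw` is the parsed JSON dictionary; both helper names are hypothetical.

```python
from collections import Counter


def category_counts(data_raw: dict) -> Counter:
    """Count prototypes per top-level category (prototype type)."""
    return Counter({category: len(prototypes) for category, prototypes in data_raw.items()})


def singleton_categories(data_raw: dict) -> list[str]:
    """Return the sorted names of categories holding exactly one prototype."""
    return sorted(category for category, prototypes in data_raw.items() if len(prototypes) == 1)
```

`category_counts(data_raw).most_common(20)` gives the rows of a top-20 table, and `len(singleton_categories(data_raw))` gives the singleton count.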

## Example prototype names (samples)

A few representative names by category:

- `item`: accumulator, active-provider-chest, advanced-circuit, agricultural-tower, assembling-machine-1, assembling-machine-2
- `recipe`: accumulator, accumulator-recycling, acid-neutralisation, advanced-circuit, advanced-oil-processing
- `technology`: advanced-asteroid-processing, advanced-circuit, advanced-material-processing, agriculture, artillery
- `fluid`: ammonia, ammoniacal-solution, crude-oil, fluoroketone-cold, heavy-oil
- `tile`: acid-refined-concrete, ammoniacal-ocean, artificial-jellynut-soil, brash-ice, concrete

## Observations and domain notes

1. **Space Age content is present.** Categories and names such as `space-platform-hub`, `space-connection`, `planet`, and `quality` indicate the Space Age mod is active in this dump (consistent with the wiki’s data.raw listing for 2.0.65 + Space Age).

2. **Very large “content” categories.** Particles, recipes, noise expressions, and technologies dominate the prototype count (see the distribution table). Tools should expect these categories to be among the biggest memory/time drivers.

3. **Type system is stable but dynamic.** The schema project notes that `data-raw-dump.json` is large, uses dynamic typing, and includes quirks such as empty arrays serialized as `{}`. It also recommends lenient number handling (integers can be floats in practice) and allowing additional properties for forward compatibility.

4. **data.raw is data-stage only.** The `data` table is populated during the prototype stage (data.lua, data-updates.lua, data-final-fixes.lua) and then frozen. This dump is a snapshot after the data stage has completed.

## Practical implications for tooling

- **Treat `(type, name)` as a stable key.** It is consistent in this dump.
- **Plan for scale.** Thousands of entries and deep nested objects are normal.
- **Be lenient with numeric types.** Many fields documented as integers appear as floats in practice.
- **Allow unknown properties.** The schema project explicitly allows additional properties for compatibility.
- **Handle array/object quirks.** Some arrays may appear as `{}` in JSON output; tooling should normalize these to empty arrays when needed.
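
The numeric and array quirks can be absorbed by two small coercion helpers applied where a field is read. This is a sketch under the assumption that the caller knows which fields are meant to be arrays or integers; `as_list` and `as_int` are hypothetical names, not part of the schema project.

```python
def as_list(value):
    """Coerce a field that should be an array into a Python list.

    The dump may serialize an empty array as `{}`, so an empty dict
    (and a missing value) is treated as an empty list.
    """
    if value is None or value == {}:
        return []
    if isinstance(value, list):
        return value
    raise TypeError(f"expected an array, got {type(value).__name__}")


def as_int(value) -> int:
    """Coerce a field documented as an integer, accepting floats like 5.0."""
    if isinstance(value, bool):
        raise TypeError("expected a number, got bool")
    if isinstance(value, int):
        return value
    if isinstance(value, float) and value.is_integer():
        return int(value)
    raise ValueError(f"expected an integral number, got {value!r}")
```

Keeping the coercion at the read site, instead of rewriting the whole dump, avoids guessing which `{}` values are genuinely empty objects.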

## Follow-up ideas

- Validate against the full JSON schema or generate a partial schema for specific domains (e.g., items/recipes only) to simplify downstream typing.
- Build a per-category “field histogram” (top properties and type variability) to identify dynamic fields.
- Normalize known quirks (empty arrays as `{}`) before processing.
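
As one possible starting point for the field histogram idea, the sketch below records which Python types appear for each property within a single category. The function name `field_histogram` is hypothetical, and the type names are those of Python's `json` module output (`dict`, `list`, `str`, `int`, `float`, `bool`, `NoneType`).

```python
from collections import Counter, defaultdict


def field_histogram(prototypes: dict) -> dict[str, Counter]:
    """Map each property name to a Counter of the value types observed for it."""
    histogram: dict[str, Counter] = defaultdict(Counter)
    for proto in prototypes.values():
        for field, value in proto.items():
            histogram[field][type(value).__name__] += 1
    return dict(histogram)


def dynamic_fields(histogram: dict[str, Counter]) -> list[str]:
    """Return the sorted property names that were seen with more than one type."""
    return sorted(field for field, types in histogram.items() if len(types) > 1)
```

For example, `dynamic_fields(field_histogram(data_raw["recipe"]))` would list the recipe properties whose type varies between prototypes.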
@@ -0,0 +1,90 @@
# Addendum: machine speed, 900 petroleum gas/min example, and test script

Date: 2026-02-11

## Machine speed impact (same recipe, different machines)

Given a recipe with `energy_required` and a machine with `crafting_speed`, the per-machine throughput is:

- `effective_crafting_speed = crafting_speed * (1 + speed_bonus)`
- `cycle_seconds = (energy_required or 0.5) / effective_crafting_speed`
- `output_rate = result_amount / cycle_seconds`
- `input_rate = ingredient_amount / cycle_seconds`

Productivity applies only when:

- the recipe allows it (`allow_productivity == true`),
- the machine allows it (`allowed_effects` includes `productivity`), and
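- the result's `ignored_by_productivity` amount does not cover the whole output (in the 2.0 prototype docs this field is an amount excluded from bonus production, not a boolean flag).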

This is why the same recipe can yield different rates across different machines: the difference is strictly due to `crafting_speed` and module/base effects.

## Example: 900 petroleum gas per minute

Target: 900 petroleum gas/min = 15 petroleum gas/s.

Using only the advanced oil processing chain (no coal liquefaction), with the dump values:

- `advanced-oil-processing` (oil refinery): 5 s → heavy-oil 25, light-oil 45, petroleum-gas 55
  - per refinery: 5 HO/s, 9 LO/s, 11 PG/s
- `heavy-oil-cracking` (chemical plant/biochamber): 2 s → light-oil 30
  - per plant: consumes 20 HO/s, produces 15 LO/s
- `light-oil-cracking` (chemical plant/biochamber): 2 s → petroleum-gas 20
  - per plant: consumes 15 LO/s, produces 10 PG/s

Let A = number of refineries, H = number of heavy-oil cracking plants, L = number of light-oil cracking plants.

Steady-state with no leftover fluids:

- HO balance: `5A - 20H = 0` → `H = 0.25A`
- LO balance: `9A + 15H - 15L = 0` → `L = 0.6A + H = 0.85A`
- PG rate: `PG = 11A + 10L = 19.5A`

To reach 15 PG/s:

- `A = 15 / 19.5 = 0.76923`
- `H = 0.19231`
- `L = 0.65385`
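
The balance equations can be checked with exact rational arithmetic. The sketch below hard-codes the per-machine rates quoted above and solves for the machine counts; `machine_counts` is a hypothetical helper name.

```python
from fractions import Fraction


def machine_counts(target_pg_per_s) -> tuple[Fraction, Fraction, Fraction]:
    """Return (A, H, L) for a petroleum gas target with no leftover fluids."""
    # HO balance: 5A - 20H = 0  ->  H = A / 4
    h_per_a = Fraction(5, 20)
    # LO balance: 9A + 15H - 15L = 0  ->  L = (9 + 15 * H/A) / 15 * A
    l_per_a = (9 + 15 * h_per_a) / 15
    # PG rate per refinery, including its share of light-oil cracking
    pg_per_a = 11 + 10 * l_per_a
    a = Fraction(target_pg_per_s) / pg_per_a
    return a, h_per_a * a, l_per_a * a


a, h, l = machine_counts(15)
print(a, h, l)                      # exact fractions
print(float(a), float(h), float(l))  # decimal approximations
```

Using `Fraction` keeps the ratios exact, which is convenient when the same numbers later feed an integer program.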

This is the fractional solution. In an integer program, you can:

1. keep the balance constraints as inequalities (allow leftovers), or
2. enforce exact balance and allow overproduction with a penalty, or
3. relax integer constraints for early planning, then round and re-optimize.

Example integer reference:

- `A = 1` (no cracking) gives 11 PG/s = 660 PG/min (short of target).
- `A = 2` (no cracking) gives 22 PG/s = 1320 PG/min (over target).

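If you enforce zero leftovers and integer counts, you must scale to the ratio 20:5:17 (refineries : heavy-oil cracking : light-oil cracking, matching the wiki), which produces 390 PG/s and far exceeds the 15 PG/s target. Otherwise, accept overproduction with a penalty term.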
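
The 20:5:17 ratio follows from the per-refinery proportions derived earlier (A : H : L = 1 : 0.25 : 0.85). As a sketch, the smallest exact integer ratio can be recovered by clearing denominators; `smallest_integer_ratio` is a hypothetical helper and needs Python 3.9+ for the multi-argument `math.gcd` and `math.lcm`.

```python
from fractions import Fraction
from math import gcd, lcm


def smallest_integer_ratio(parts: list[Fraction]) -> tuple[int, ...]:
    """Scale a list of rational proportions to the smallest integer ratio."""
    # Multiply by the least common denominator to clear all fractions.
    scale = lcm(*(part.denominator for part in parts))
    integers = [int(part * scale) for part in parts]
    # Divide out any common factor that remains.
    common = gcd(*integers)
    return tuple(value // common for value in integers)


ratio = smallest_integer_ratio([Fraction(1), Fraction(1, 4), Fraction(17, 20)])
print(ratio)
# Petroleum gas rate at that scale: 11 PG/s per refinery + 10 PG/s per light cracker
print(11 * ratio[0] + 10 * ratio[2])
```

The exact-balance layout is therefore far larger than the 900 PG/min target needs, which is why the inequality or penalty formulations are the practical choice for small targets.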

## Icon + localization test script

Script path:

- `src/private/app/factorio-cycle-calculator/.AGENT/scripts/check_icons_and_locale.py`

It validates:

- icon paths for selected recipes, fluids, items, and machines
- PNG dimensions (if available)
- localization strings from `data/<mod>/locale/<lang>/*.cfg`

Usage (example):

- `python check_icons_and_locale.py --data-dir /mnt/c/Program\ Files/Factorio/data`
- `python check_icons_and_locale.py --data-raw /mnt/c/Users/zhang/AppData/Roaming/Factorio/script-output/data-raw-dump.json --data-dir /mnt/c/Program\ Files/Factorio/data --locale en`

## Notes on missing item/entity localization and subgroup icons

The missing locale entries reported by the script are expected and consistent
with how Factorio data is organized:

- Placeable buildings often only have `entity-name` localization entries. The
corresponding `item-name` can be missing (for example, `oil-refinery`,
`chemical-plant`, `biochamber`). For UI labels, prefer `item-name` and fall
back to `entity-name` when the item key is not present.
- `item-subgroup` entries are internal categorization metadata. They frequently
have no localization entry and no icon. If you need a label or icon, prefer
the parent `item-group` or fall back to the raw subgroup name.
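
The `item-name` to `entity-name` fallback can be sketched as follows. This is not the logic of `check_icons_and_locale.py`; it assumes locale `.cfg` files use `[section]` headers followed by `key=value` lines, and both function names are hypothetical.

```python
def parse_locale_cfg(text: str) -> dict[str, dict[str, str]]:
    """Parse locale text into {section: {key: value}} (assumed INI-like layout)."""
    sections: dict[str, dict[str, str]] = {}
    current = sections.setdefault("", {})  # keys that appear before any section
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith((";", "#")):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            current = sections.setdefault(line[1:-1], {})
        elif "=" in line:
            key, _, value = line.partition("=")
            current[key.strip()] = value
    return sections


def display_name(name: str, locale: dict[str, dict[str, str]]) -> str:
    """Prefer item-name, fall back to entity-name, then to the raw prototype name."""
    for section in ("item-name", "entity-name"):
        label = locale.get(section, {}).get(name)
        if label:
            return label
    return name
```

A subgroup label could follow the same pattern, trying the parent `item-group` section before falling back to the raw subgroup name.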