Calculates the optimal mass for each stage of a multistage rocket given a deltaV requirement.
Finds the lowest total mass for the rocket by optimizing the weight distribution between all stages without altering the rocket's total deltaV.
The rocket and its engines are specified in JSON files.
Sample files for each are in the `config` folder.
The mathematical background, which was also the source for this project, can be found here:
http://www.projectrho.com/public_html/rocket/multistage.php
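In brief (paraphrasing the linked material, not quoting it): each stage obeys the ideal rocket equation and stage deltaVs add up, so the optimizer's job is to pick the per-stage split of the required total deltaV that minimizes lift-off mass:

$$
\Delta v_i = I_{sp,i}\, g_0 \ln\frac{m_{0,i}}{m_{f,i}}, \qquad \Delta v_\text{total} = \sum_i \Delta v_i
$$

Here $m_{0,i}$ is the mass of stage $i$ plus everything above it before the burn, and $m_{f,i}$ is the same stack after stage $i$'s propellant is spent.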
Options can be found in config/config.json.
- `useMultiCore`: whether the program should utilize all available cores.
- `fullGridMode`: how the full-grid solver iterates distributions: `materialize` (default) or `streaming` (generate on the fly, avoids a giant `distributions` table; useful groundwork for GPU work but can be slower on CPU).
- `backend`: compute backend: `auto` (default), `cpu`, or `cuda`. `auto` tries to load the CUDA plugin and falls back to CPU automatically if unavailable.
- `backendPath`: optional backend plugin path (file or directory). If not set, the search order is: `MSO_BACKEND_PATH`, `<exe_dir>/backends/`, `<exe_dir>/`.
- `gpuDevice`: optional GPU device id (default: `0`; `-1` lets the backend pick a default).
- `precision`: the number of discretization steps used to split total deltaV across stages (1..255).
  - Example (2 stages, precision 150): stage split candidates include `1/150 + 149/150`, `2/150 + 148/150`, ... (a small enumeration sketch follows after this list).
- `maxRAM`: maximum allowed RAM usage in bytes. Supports human-readable values like `16GB` (base 1024).
- `verbose`: debugging output (slow; only coherent in single-threaded mode).
- `enginesPath`: path to the JSON file where engines are defined.
- `rocketPath`: path to the JSON file where the rocket is defined.
- `maxCombinations`: deterministic work budget. If set, the program chooses the largest precision that keeps the number of evaluated distributions `<= maxCombinations` (and within `maxRAM`).
- `maxCombinationsCpu` / `maxCombinationsCuda`: optional per-backend overrides (when present, they take precedence over `maxCombinations`).
  - With `zoom.enabled=true`, `maxCombinations` applies to the total work across both passes (coarse + refined). The current default split is ~70% coarse, remainder refined (the refinement radius may be reduced to stay within budget).
- `targetSeconds`: best-effort runtime target. If set (and `maxCombinations` is not set), the program runs a short calibration and derives a `maxCombinations` budget from it.
- `targetSecondsCpu` / `targetSecondsCuda`: optional per-backend overrides (when present, they take precedence over `targetSeconds`).
  - Note: `targetSeconds` is not deterministic across different machines (and can vary slightly even on the same machine).
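To make the `precision` discretization concrete, here is a tiny illustrative loop (a sketch, not the project's code; the total deltaV figure is made up) that enumerates the 2-stage split candidates from the `precision` example above:

```cpp
#include <cstdio>

int main() {
    const int precision = 150;       // discretization steps from the config
    const double totalDv = 9000.0;   // m/s, made-up example figure

    // Candidate k gives stage 1 exactly k units and stage 2 the remaining precision - k units.
    for (int k = 1; k < precision; ++k) {
        const double dv1 = totalDv * k / precision;
        const double dv2 = totalDv * (precision - k) / precision;
        std::printf("stage1 = %7.1f m/s, stage2 = %7.1f m/s\n", dv1, dv2);
    }
    return 0;
}
```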
Enable a fast coarse search and then refine only around the best candidates at a higher precision:
- `zoom.enabled`: enable zoom mode.
- `zoom.fine_precision`: final precision (defaults to `precision`).
- `zoom.coarse_precision`: coarse pass precision (defaults to `fine_precision / 2`).
- `zoom.window_coarse_steps`: neighborhood radius around the coarse winner(s), expressed in coarse steps (converted to fine units internally).
- `zoom.top_k`: refine the best K coarse candidates (reduces risk of missing the true optimum).
When zoom is enabled, the program prints `# Combinations (coarse)` and `# Combinations (refined)`. (There is no single distribution index for the refined search, so it won't print `kg at Distribution number ...`.)
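A minimal, self-contained sketch of the coarse-then-refine idea on a toy 2-stage problem (my own illustration, not the optimizer's real code; `massOf` is a placeholder cost function and `top_k` is omitted for brevity):

```cpp
#include <algorithm>
#include <cstdio>

// Stand-in for the real per-distribution rocket-mass evaluation (placeholder only).
static double massOf(double stage1Fraction) {
    return 100.0 + 500.0 * (stage1Fraction - 0.37) * (stage1Fraction - 0.37);
}

int main() {
    const int coarse = 60, fine = 120, windowCoarseSteps = 2;

    // Pass 1: brute-force the coarse grid and remember the winner.
    int bestK = 1;
    double bestMass = massOf(1.0 / coarse);
    for (int k = 2; k < coarse; ++k) {
        const double m = massOf(static_cast<double>(k) / coarse);
        if (m < bestMass) { bestMass = m; bestK = k; }
    }

    // Pass 2: refine only inside a small window around the coarse winner, at fine precision.
    const int center = bestK * fine / coarse;              // coarse index -> fine index
    const int radius = windowCoarseSteps * fine / coarse;  // coarse steps -> fine steps
    for (int k = std::max(1, center - radius); k <= std::min(fine - 1, center + radius); ++k) {
        const double m = massOf(static_cast<double>(k) / fine);
        if (m < bestMass) bestMass = m;
    }
    std::printf("best mass ~= %.3f\n", bestMass);
    return 0;
}
```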
`reporting.min_stage_dv_mps`: warn if a stage contributes less than this amount of deltaV (suggesting to remove/merge that stage). This does not affect optimization; it only prints warnings.
```json
{
"useMultiCore": true,
"precision": 100,
"maxRAM": "16GB",
"enginesPath": "config/engines.json",
"rocketPath": "config/rocket_4stage.json",
"maxCombinations": 5000000,
"zoom": {
"enabled": true,
"fine_precision": 120,
"coarse_precision": 60,
"window_coarse_steps": 2,
"top_k": 3
},
"reporting": {
"min_stage_dv_mps": 200
}
}
```

Run a benchmark and emit JSON with the git head for easy comparisons:
```
MultistageOptimizer.exe --benchmark --benchmark-config config/benchmark_ci.json --benchmark-threads 1 --benchmark-iterations 3
```

You can point `--benchmark-config` at any config file (including ones that enable zoom, `maxCombinations`, etc.). CI uses `config/benchmark_ci.json` as a stable performance regression check.
Benchmark JSON output includes:
- `min_mass`
- `best_distribution_units` (discretized deltaV units per stage)
- `best_distribution_dv_mps` (derived deltaV per stage)
- `memory_peak_working_set_bytes` / `memory_peak_private_bytes` (host RAM usage of the process)
- `gpu_memory_peak_used_bytes` / `gpu_memory_total_bytes` (best-effort GPU VRAM delta during the run; requires NVIDIA NVML, otherwise `gpu_memory_available=false`)
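For orientation only, a hand-written illustration of the kind of fields involved (not captured from a real run; the actual output also contains timing and git information, and its exact structure and value types may differ):

```json
{
  "min_mass": 184523.4,
  "best_distribution_units": [40, 30, 20, 10],
  "best_distribution_dv_mps": [3600.0, 2700.0, 1800.0, 900.0],
  "memory_peak_working_set_bytes": 1073741824,
  "memory_peak_private_bytes": 914358272,
  "gpu_memory_available": false
}
```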
Run multiple strategies derived from the same config (full-grid baseline, zoom, budget variants) and print a combined JSON report:
```
MultistageOptimizer.exe --benchmark-compare --benchmark-config config/benchmark.json --benchmark-threads 1 --benchmark-iterations 3
```

For a human-friendly side-by-side view, use:

```
MultistageOptimizer.exe --benchmark-compare --benchmark-compare-format table --benchmark-config config/benchmark.json --benchmark-threads 1 --benchmark-iterations 3
```
Strategies
The compare wrapper derives strategies from the same config:
- `full_grid` (baseline): brute-force evaluate all distributions at the baseline precision, with `zoom` and runtime budgets removed.
  - If `zoom.enabled=true`, the baseline precision is `zoom.fine_precision` (not `precision`) so it compares against the final-resolution result; this can exceed `maxRAM` for many-stage rockets.
- `zoom`: two-pass search: full-grid at `zoom.coarse_precision`, then refine around the best `zoom.top_k` coarse candidates in a window of `zoom.window_coarse_steps` (converted to fine units) at `zoom.fine_precision`. Budgets are removed.
- `budget_full`: full-grid search with runtime budgets enabled (`maxCombinations` or `targetSeconds`), and `zoom` removed.
  - The program chooses the largest precision that fits the budget (and within `maxRAM`), which can be higher than `precision`.
- `budget_zoom`: zoom search with runtime budgets enabled.
  - The coarse pass uses ~70% of the budget; the refinement radius may be reduced to stay within the remaining budget.
If the baseline `full_grid` fails (usually due to RAM), comparisons are omitted and the output includes a note explaining why.
Use `--benchmark-max-seconds <sec>` to fail the run if the average total time exceeds the limit (this is what CI uses).
Benchmark runs also append a CSV row to `benchmark_results.csv` (override with `--benchmark-csv <path>`).
- Make (wraps CMake):
  - `make build`
  - `make test`
  - `make distclean`
  - Options: `make BUILD_DIR=build-ninja CONFIG=Release` (plus `GENERATOR=...` / `ARCH=...` if needed).
- CMake (recommended):
  - Visual Studio generator:
    - `cmake -S . -B build -G "Visual Studio 17 2022" -A x64`
    - `cmake --build build --config Release`
    - tests: `ctest --test-dir build -C Release --output-on-failure`
    - exe: `build/bin/MultistageOptimizer.exe`
  - Ninja (faster incremental builds):
    - `cmake -S . -B build -G Ninja -DCMAKE_BUILD_TYPE=Release`
    - `cmake --build build`
    - tests: `ctest --test-dir build --output-on-failure`
    - exe: `build/bin/MultistageOptimizer.exe`
- CUDA plugin (optional):
  - Configure with `-DMSO_BUILD_CUDA_BACKEND=ON` (default) and ensure `mso_backend_cuda` ends up next to the executable (or in `backends/`).
  - Control codegen via `-DMSO_CUDA_ARCHITECTURES=native` (or e.g. `86;90`, `all-major`).
- Manual build (no CMake):
  - clang++ (quick build, no IDE):
    - `clang++ -std=c++17 -O2 -DNDEBUG -I. MultistageOptimizer.cpp -o MultistageOptimizer.exe`
    - tests: `clang++ -std=c++17 -O2 -DNDEBUG -I. -Itests tests/*.cpp -o MultistageOptimizerTests.exe`
GitHub Actions publishes rolling releases for every commit to main/master:
- Tag format: `b<commit-count>` zero-padded to 5 digits (for example `b00123`).
- Assets: `MultistageOptimizer-windows-x64-b00123.zip` containing `MultistageOptimizer.exe` + sample `config/`.
With six or more stages and higher precisions, the amount of RAM required quickly becomes infeasible, because the number of distinct distributions is given by n choose r, where n is the precision - 1 and r is the number of stages - 1.
To mitigate this, you can lower the precision.
You can check how many distributions exist for a given combination here: https://www.calculatorsoup.com/calculators/discretemathematics/combinations.php
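If you prefer to check locally rather than on the website, a small stand-alone helper like this (illustrative, not part of the project) computes the same figure:

```cpp
#include <cstdint>
#include <cstdio>

// Number of distributions = C(precision - 1, stages - 1).
static std::uint64_t nChooseR(std::uint64_t n, std::uint64_t r) {
    if (r > n) return 0;
    if (r > n - r) r = n - r;              // use symmetry to keep intermediates small
    std::uint64_t result = 1;
    for (std::uint64_t i = 1; i <= r; ++i) {
        result = result * (n - r + i) / i; // division is exact at every step
    }
    return result;
}

int main() {
    // Example: 6 stages at precision 150 -> C(149, 5) = 571,880,029 distributions.
    std::printf("%llu\n", (unsigned long long)nChooseR(149, 5));
    return 0;
}
```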
The performance hotspot is evaluating the rocket mass for each deltaV distribution, which is embarrassingly parallel. The CUDA backend avoids materializing/transferring the full distributions table by enumerating distributions on the device (index → composition unranking), evaluating mass in-kernel, and reducing to the best candidate.
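To make "index → composition unranking" concrete, here is a small host-side C++ sketch (illustrative only; the actual kernel's ordering and data layout may differ) that maps a linear index straight to the idx-th way of splitting the deltaV units across stages, without ever building a table:

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

static std::uint64_t nChooseR(std::uint64_t n, std::uint64_t r) {
    if (r > n) return 0;
    if (r > n - r) r = n - r;
    std::uint64_t result = 1;
    for (std::uint64_t i = 1; i <= r; ++i) result = result * (n - r + i) / i;
    return result;
}

// Compositions of `total` into `parts` positive integers number C(total - 1, parts - 1).
// Unrank: walk stage by stage, skipping whole blocks of compositions until idx falls inside one.
static std::vector<int> unrankComposition(std::uint64_t idx, int total, int parts) {
    std::vector<int> out;
    for (int p = parts; p > 1; --p) {
        int value = 1;
        for (;; ++value) {
            const std::uint64_t block = nChooseR(total - value - 1, p - 2); // splits of the remainder
            if (idx < block) break;
            idx -= block;
        }
        out.push_back(value);
        total -= value;
    }
    out.push_back(total); // last stage takes whatever is left
    return out;
}

int main() {
    const int precision = 10, stages = 3; // 10 units across 3 stages -> C(9, 2) = 36 distributions
    for (std::uint64_t i = 0; i < nChooseR(precision - 1, stages - 1); ++i) {
        for (int v : unrankComposition(i, precision, stages)) std::printf("%d ", v);
        std::printf("\n");
    }
    return 0;
}
```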
The main executable is CPU-only and can load optional GPU plugins at runtime.
Currently implemented:
- `mso_backend_cuda` (NVIDIA CUDA)
Select via config (`backend`) or CLI (`--backend auto|cpu|cuda`). `auto` tries CUDA and falls back to CPU if the plugin can't be loaded or no CUDA device is present.
Plugin search order:
1. `backendPath` / `--backend-path` (file or directory)
2. `MSO_BACKEND_PATH`
3. `<exe_dir>/backends/`
4. `<exe_dir>/`
Note: the CUDA backend is currently used only for non-zoom full-grid runs.
For CPU execution, materializing the distribution table can still be faster (it's mostly a big sequential memory write, and it keeps per-evaluation overhead low). The codebase includes both approaches: `optimizer::FullGridMode::Materialize` (default) and `optimizer::FullGridMode::Streaming` (on-the-fly generation, useful as a starting point for a GPU backend).
If you want to go down that route, OpenCL is the most portable single-backend option (works across NVIDIA/AMD/Intel), while CUDA/ROCm can be added as vendor-specific backends later.
Currently the program assumes that the engines always work at their vacuum efficiency. I tried using the sea-level Isp of the engines for the first stage only, but that always leads to a tiny first stage, because the first stage is then so "inefficient" that, according to the program, it should not be very big. As a result, the first-stage engine burns for only a very short time, causing the second stage to also burn near the surface.
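For illustration, a minimal sketch of a per-stage mass calculation under this vacuum-Isp assumption (my own toy model, not the project's code; the simple tank model with a fixed dry-mass fraction and all numeric values are assumptions):

```cpp
#include <cmath>
#include <cstdio>

int main() {
    const double g0 = 9.80665;        // m/s^2
    const double ispVac = 350.0;      // s, vacuum Isp (the program always uses vacuum Isp)
    const double dv = 3400.0;         // m/s assigned to this stage by the distribution
    const double massAbove = 12000.0; // kg: payload plus all upper stages
    const double engineMass = 1200.0; // kg (assumed figure)
    const double tankFraction = 0.08; // kg of tank dry mass per kg of propellant (assumed model)

    // Ideal rocket equation: required wet/dry mass ratio for this stage's deltaV.
    const double R = std::exp(dv / (ispVac * g0));
    const double denom = 1.0 - (R - 1.0) * tankFraction;
    if (denom <= 0.0) {
        std::puts("stage cannot reach this deltaV with that tank fraction");
        return 1;
    }
    // Solve m0/mf = R with mf = massAbove + engineMass + tankFraction * propellant.
    const double propellant = (R - 1.0) * (massAbove + engineMass) / denom;
    const double stageWet = propellant * (1.0 + tankFraction) + engineMass;
    std::printf("propellant = %.1f kg, stage wet mass = %.1f kg, stack = %.1f kg\n",
                propellant, stageWet, massAbove + stageWet);
    return 0;
}
```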
To use this easily and read out all the necessary information in KSP, the following mods are recommended:
- Kerbal Engineer Redux -> read out total mass and deltaV
- Real Fuels -> dry and wet mass of tank
- Procedural Parts -> easily and very finely adjust the size of tanks