
Fusion

Fusion is a cross-platform Go CLI for model and kernel optimization workflows. The long-term goal is a single CLI that can plan GPU-specific optimizations, generate Triton or CUDA kernel candidates, run them on local or remote Linux GPU machines, benchmark performance before and after each change, and keep the winning variants.

Today Fusion already gives you a useful foundation:

  • ModelsLab-backed chat sessions and ModelsLab-only authentication
  • a browser-based fusion login flow that hands off from modelslab.com back to the local CLI
  • an embedded optimization knowledge base with GPU profiles, strategies, skills, examples, and source references
  • a public Markdown-first knowledgebase/ corpus that compiles into the shipped SQLite index
  • a packed SQLite BM25 search index generated from the curated knowledge files
  • host capability detection with explicit warnings on unsupported setups
  • target management for local, ssh, and sim targets
  • benchmark and profile execution against those targets
  • persisted artifacts for before/after comparisons
  • target-aware optimization planning
  • optimization sessions that persist retrieved context, backend candidates, and stage artifacts
  • CuTe DSL, Triton, and CUDA workspace scaffolding with build, verify, and benchmark flows

Fusion can run on macOS for planning, artifact management, ModelsLab setup, and SSH orchestration. Real CUDA compilation, profiling, and authoritative kernel performance validation still need a Linux machine with NVIDIA tooling.

Status

What works today:

  • fusion and fusion chat as a chat-first agent entry point
  • fusion env detect|doctor
  • fusion generate keychain to create SSH keys for remote targets
  • fusion optimize plan with a curated GPU and optimization knowledge base
  • fusion kb list|search|show|context backed by an embedded SQLite BM25 index
  • fusion update kb to rebuild a local Markdown-backed knowledge snapshot and SQLite index
  • fusion optimize session create|list|show
  • fusion optimize cute init|build|verify|benchmark
  • fusion optimize triton init|build|verify|benchmark
  • fusion optimize cuda init|build|verify|benchmark
  • fusion target add|list|show|remove|default
  • fusion target exec and fusion target copy
  • fusion benchmark run and fusion benchmark compare
  • fusion profile run
  • release packaging for Linux, macOS, and Windows

What is not implemented yet:

  • ModelsLab-backed Triton/CUDA/CuTe code generation
  • automatic optimization loops that generate, run, score, and retain winning kernels

Install

Prebuilt binaries

Linux and macOS:

curl -fsSL https://raw.githubusercontent.com/ModelsLab/fusion/main/scripts/install.sh | sh

Pin a specific release or install into a custom directory:

curl -fsSL https://raw.githubusercontent.com/ModelsLab/fusion/main/scripts/install.sh | \
  FUSION_VERSION=v0.2.1 INSTALL_DIR="$HOME/.local/bin" sh

Windows PowerShell:

irm https://raw.githubusercontent.com/ModelsLab/fusion/main/scripts/install.ps1 | iex

From source

go install github.com/ModelsLab/fusion/cmd/fusion@latest

Local build

make build
./bin/fusion version

GitHub release workflow

Push a version tag and GitHub Actions will publish tar.gz and .zip assets for:

  • Linux amd64, arm64
  • macOS amd64, arm64
  • Windows amd64, arm64

For example:

git tag v0.2.1
git push origin v0.2.1

The release workflow uploads matching archives plus checksums.txt.

Quick Start

Connect Fusion to ModelsLab in the browser:

fusion login

Or configure it manually for CI or headless environments:

fusion auth set \
  --token "$MODELSLAB_API_KEY" \
  --model openai-gpt-5.4-pro

Store Hugging Face and GitHub tokens for model and private-repo workflows:

fusion hf login --token "$HF_TOKEN"
fusion github login --token "$GITHUB_TOKEN"

Validate them:

fusion hf whoami
fusion github whoami

Shell commands that Fusion runs automatically receive these environment variables:

  • HF_TOKEN, HUGGING_FACE_HUB_TOKEN
  • GITHUB_TOKEN, GH_TOKEN

This lets the agent download models from Hugging Face and work with private GitHub repositories. For private HTTPS git operations, prefer gh commands, or git with an Authorization header built from $GITHUB_TOKEN, instead of embedding secrets in URLs.

Start the interactive agent shell:

fusion

By default, fusion resumes the latest chat session for the current working directory. Start a fresh one explicitly when you want a clean thread:

fusion chat --new

Resume a saved project session directly:

fusion chat --session latest
fusion chat --session 20260307-120501-my-project

Run a single natural-language turn:

fusion chat "optimize qwen2.5-72b for 4090 decode latency and compare AWQ vs Triton"

Inside chat, Fusion can use tools for:

  • listing, reading, writing, replacing, and deleting files
  • running shell commands locally or on configured targets
  • searching the knowledge base
  • creating optimization sessions and retrieving skill/context packets
  • building optimization plans
  • running benchmark and profile workflows
  • scaffolding and running CuTe DSL, Triton, and CUDA workspaces

Chat-local commands:

/help
/history 12
/sessions
/resume latest
/new
/model gpt-5
/cd ~/projects/my-model
/save
/tools
/session
/exit

The local slash commands are for session control only. The model still does the real engineering work through Fusion tools.

Common Commands

fusion -h
fusion version
fusion login
fusion hf login --token "$HF_TOKEN"
fusion github login --token "$GITHUB_TOKEN"
fusion env doctor --backend all --fix-script
fusion kb search "blackwell attention"
fusion optimize plan --gpu h100 --workload decode --operator attention

Inspect the current host:

fusion env detect
fusion gpu detect

Search the embedded optimization corpus:

fusion kb search "paged attention"
fusion kb show --kind gpu --id rtx4090
fusion kb context --gpu b200 --workload decode --operators attention,kv-cache --precision fp8 --runtime vllm

Rebuild a private local knowledge base from Markdown docs:

fusion update kb

This bootstraps ~/.config/fusion/knowledgebase/ if needed, rebuilds the SQLite index under ~/.config/fusion/knowledge/, and makes future Fusion runs prefer that rebuilt local knowledge base.
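The shipped index ranks kb search results with BM25. For intuition about how a query such as "paged attention" gets scored, the sketch below implements Okapi BM25 over an in-memory list of documents with k1 = 1.2 and b = 0.75 (the defaults SQLite FTS5's bm25() uses) and a Lucene-style IDF that stays positive. It illustrates the ranking function only; Fusion's real index lives in SQLite, and its tokenizer, IDF variant, and field weights may differ.

```go
package main

import (
	"fmt"
	"math"
	"sort"
	"strings"
)

// Common BM25 parameters.
const (
	k1 = 1.2  // term-frequency saturation
	b  = 0.75 // strength of document-length normalization
)

// bm25Rank returns document indices ordered by descending BM25 score for
// query. Tokenization is a plain lowercase whitespace split.
func bm25Rank(docs []string, query string) []int {
	if len(docs) == 0 {
		return nil
	}
	tokens := make([][]string, len(docs))
	docFreq := map[string]int{} // number of documents containing each term
	totalLen := 0
	for i, d := range docs {
		tokens[i] = strings.Fields(strings.ToLower(d))
		totalLen += len(tokens[i])
		seen := map[string]bool{}
		for _, t := range tokens[i] {
			if !seen[t] {
				seen[t] = true
				docFreq[t]++
			}
		}
	}
	n := float64(len(docs))
	avgLen := float64(totalLen) / n
	if avgLen == 0 {
		avgLen = 1 // all documents empty; avoid dividing by zero
	}

	scores := make([]float64, len(docs))
	for _, q := range strings.Fields(strings.ToLower(query)) {
		df := float64(docFreq[q])
		idf := math.Log((n-df+0.5)/(df+0.5) + 1) // "+1" keeps IDF positive
		for i, toks := range tokens {
			tf := 0.0
			for _, t := range toks {
				if t == q {
					tf++
				}
			}
			lenNorm := 1 - b + b*float64(len(toks))/avgLen
			scores[i] += idf * tf * (k1 + 1) / (tf + k1*lenNorm)
		}
	}

	order := make([]int, len(docs))
	for i := range order {
		order[i] = i
	}
	sort.SliceStable(order, func(x, y int) bool { return scores[order[x]] > scores[order[y]] })
	return order
}

func main() {
	docs := []string{
		"fp8 gemm tiling for hopper",
		"paged attention kv-cache layout for decode",
		"flash attention tiling",
	}
	fmt.Println(bm25Rank(docs, "paged attention")) // [1 2 0]
}
```

The rarer term "paged" carries a higher IDF than "attention", so the document containing both ranks first, and the document with neither scores zero.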

Session Workflow

Fusion chat sessions are stored under ~/.config/fusion/sessions/.

  • fusion auto-resumes the latest session for the current working directory.
  • fusion chat --new starts a clean thread in the same directory.
  • fusion chat --session latest resumes the newest session for the current directory.
  • /sessions lists recent sessions and marks the current one with *.
  • /resume <id> switches sessions without leaving the shell.

Target Workflow

See what the current machine can and cannot do:

fusion env detect
fusion env doctor --backend all --fix-script

To register a remote Ubuntu target over SSH, first generate a key pair:

fusion generate keychain --name gpulab

Paste the printed public key into your GPU provider, then register the target, replacing ~/.ssh/id_ed25519 below with the path of the generated private key:

fusion target add \
  --name lab-4090 \
  --mode ssh \
  --host 203.0.113.10 \
  --user ubuntu \
  --gpu rtx4090 \
  --key ~/.ssh/id_ed25519 \
  --remote-dir ~/fusion \
  --default

Register a sim target that uses the h100 profile while running on an RTX 4090 proxy (its results are non-authoritative):

fusion target add \
  --name sim-h100-on-4090 \
  --mode sim \
  --gpu h100 \
  --proxy-gpu rtx4090

List configured targets:

fusion target list
fusion target show --name lab-4090

Run a command directly on a target:

fusion target exec --name lab-4090 --command "nvidia-smi"
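Conceptually, running a command on an ssh target amounts to invoking the ssh client with the target's user, host, and key. The sketch below builds such an invocation with the standard OpenSSH options -i and -o BatchMode=yes; the Target struct and sshCommand function are hypothetical and do not describe Fusion's internal/runner implementation.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// Target (hypothetical) holds a subset of the fields given to
// fusion target add --mode ssh.
type Target struct {
	Host, User, Key string
}

// sshCommand prepares, but does not start, an ssh invocation of remoteCmd.
func sshCommand(t Target, remoteCmd string) *exec.Cmd {
	args := []string{
		"-i", t.Key,
		"-o", "BatchMode=yes", // fail instead of prompting for a password
		fmt.Sprintf("%s@%s", t.User, t.Host),
		remoteCmd, // interpreted by the remote user's shell
	}
	return exec.Command("ssh", args...)
}

func main() {
	t := Target{Host: "203.0.113.10", User: "ubuntu", Key: os.ExpandEnv("$HOME/.ssh/id_ed25519")}
	cmd := sshCommand(t, "nvidia-smi")
	fmt.Println(cmd.Args) // print the argv instead of connecting
}
```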

Copy files to a remote target:

fusion target copy \
  --name lab-4090 \
  --src ./kernels \
  --dst ~/fusion/kernels \
  --recursive

Plan optimizations for a configured target:

fusion optimize plan \
  --target lab-4090 \
  --model llama-3.1-8b \
  --workload decode \
  --operator attention \
  --operator kv-cache \
  --precision bf16

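Create a CuTe DSL workspace, then build, verify, and benchmark it on a target. The workspace is initialized for sm90 (Hopper), while the later commands pass --gpu-arch sm89 to match the RTX 4090 on lab-4090: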

fusion optimize cute init \
  --name cute-add-one \
  --output ./cute-add-one \
  --gpu-arch sm90

fusion optimize cute build \
  --workspace ./cute-add-one \
  --target lab-4090 \
  --gpu-arch sm89

fusion optimize cute verify \
  --workspace ./cute-add-one \
  --target lab-4090 \
  --gpu-arch sm89

fusion optimize cute benchmark \
  --workspace ./cute-add-one \
  --target lab-4090 \
  --gpu-arch sm89
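The --gpu-arch values correspond to CUDA compute capabilities: the RTX 4090 (Ada Lovelace) is 8.9, the H100 (Hopper) is 9.0, and the B200 (Blackwell) is 10.0. The lookup below, with a hypothetical name and a deliberately small table, shows how GPU profile IDs like those passed to --gpu could map to the arch strings used above; it is not Fusion's own table.

```go
package main

import "fmt"

// smArch (hypothetical) maps a few GPU profile IDs to CUDA arch strings.
var smArch = map[string]string{
	"rtx4090": "sm89",  // Ada Lovelace, compute capability 8.9
	"h100":    "sm90",  // Hopper, compute capability 9.0
	"b200":    "sm100", // Blackwell, compute capability 10.0
}

// archFor returns the arch string for gpu, or an error for unknown IDs.
func archFor(gpu string) (string, error) {
	arch, ok := smArch[gpu]
	if !ok {
		return "", fmt.Errorf("unknown GPU profile %q", gpu)
	}
	return arch, nil
}

func main() {
	arch, err := archFor("rtx4090")
	fmt.Println(arch, err) // sm89 <nil>
}
```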

Create an optimization session, then initialize, build, and verify Triton or CUDA candidates inside it:

fusion optimize session create \
  --name qwen-b200 \
  --gpu b200 \
  --model qwen2.5-72b \
  --workload decode \
  --operator attention \
  --operator kv-cache \
  --precision fp8 \
  --runtime vllm \
  --query "optimize qwen decode attention on b200"

fusion optimize triton init \
  --session <session-id> \
  --name attention-triton

fusion optimize cuda init \
  --session <session-id> \
  --name attention-cuda

fusion optimize triton build \
  --session <session-id> \
  --candidate triton-attention-triton

fusion optimize triton verify \
  --session <session-id> \
  --candidate triton-attention-triton

fusion optimize session show --id <session-id>

The same session flow now works for CuTe candidates:

fusion optimize cute init \
  --session <session-id> \
  --name attention-cute

fusion optimize cute benchmark \
  --session <session-id> \
  --candidate cute-dsl-attention-cute

Or plan for a GPU profile directly, without a configured target:

fusion optimize plan \
  --gpu rtx4090 \
  --model llama-3.1-8b \
  --workload decode \
  --operator attention \
  --operator kv-cache \
  --precision bf16

Run a benchmark and compare before/after artifacts (the artifact paths below are the macOS defaults):

fusion benchmark run \
  --target lab-4090 \
  --name before \
  --command "python benchmark.py"

fusion benchmark run \
  --target lab-4090 \
  --name after \
  --command "python benchmark_optimized.py"

fusion benchmark compare \
  --before ~/Library/Application\ Support/fusion/artifacts/benchmarks/<before>.json \
  --after ~/Library/Application\ Support/fusion/artifacts/benchmarks/<after>.json

Pass metrics explicitly when your benchmark command does not print them:

fusion benchmark run \
  --target lab-4090 \
  --name before \
  --command "python benchmark.py >/tmp/bench.log" \
  --metrics "tokens_per_sec=142.5 latency_ms=7.9"

Run a profile command on a remote or local target:

fusion profile run \
  --target lab-4090 \
  --tool ncu \
  --command "ncu --set full python benchmark.py"

Target Modes

Fusion supports three execution modes:

  • local: run on the current machine
  • ssh: run on a remote Linux machine over SSH
  • sim: use a proxy machine or proxy GPU while targeting another GPU profile

Recommended usage:

  • use local when the current machine actually has the intended NVIDIA stack
  • use ssh for real Ubuntu GPU boxes
  • use sim for rough iteration, compatibility work, and non-authoritative proxy runs

sim mode is intentionally explicit about its limitations. It does not emulate the performance of an H100, B200, or any other GPU on different hardware. It is useful for:

  • iterating against a target GPU profile
  • validating command and artifact flow
  • rough proxy benchmarking with warnings

Authoritative performance numbers still require the real target GPU.

Host Limitations

Fusion reports host limitations with fusion env detect.

On macOS, expect:

  • planning and artifact workflows to work
  • SSH orchestration to work
  • local CUDA compilation to be unavailable unless the host actually has a supported NVIDIA stack
  • local Nsight profiling to be unavailable on typical current macOS setups

In practice, macOS is best treated as a control plane for:

  • planning
  • ModelsLab login and session setup
  • target registration
  • remote execution over SSH
  • comparing benchmark and profile artifacts

Benchmark And Profile Artifacts

Fusion stores artifacts under the user config directory. On macOS this is typically:

~/Library/Application Support/fusion/artifacts

Current artifact types:

  • benchmarks/*.json
  • profiles/*.json

Benchmark metrics are parsed from:

  • JSON printed to stdout, for example {"tokens_per_sec": 125, "latency_ms": 8}
  • key/value lines, for example tokens_per_sec=125
  • the optional --metrics flag
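A minimal sketch of that parsing, assuming a JSON object is printed on a single line, key/value pairs are whitespace-separated key=value tokens with numeric values, and --metrics values override anything parsed from stdout. parseMetrics, addKeyValues, and the precedence rule are assumptions for illustration, not Fusion's internal/artifacts code.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
	"strings"
)

// parseMetrics (hypothetical) extracts numeric metrics from benchmark stdout,
// then applies explicit overrides in the same key=value form as --metrics.
func parseMetrics(stdout, overrides string) map[string]float64 {
	metrics := map[string]float64{}
	for _, line := range strings.Split(stdout, "\n") {
		line = strings.TrimSpace(line)
		// Case 1: a one-line JSON object, e.g. {"tokens_per_sec": 125}.
		if strings.HasPrefix(line, "{") {
			var obj map[string]any
			if json.Unmarshal([]byte(line), &obj) == nil {
				for k, v := range obj {
					if f, ok := v.(float64); ok { // keep numeric values only
						metrics[k] = f
					}
				}
				continue
			}
		}
		// Case 2: key=value tokens, e.g. tokens_per_sec=125 latency_ms=8.
		addKeyValues(metrics, line)
	}
	addKeyValues(metrics, overrides) // assumed: explicit --metrics wins
	return metrics
}

// addKeyValues parses whitespace-separated key=value pairs with numeric values.
func addKeyValues(dst map[string]float64, s string) {
	for _, field := range strings.Fields(s) {
		key, val, ok := strings.Cut(field, "=")
		if !ok || key == "" {
			continue
		}
		if f, err := strconv.ParseFloat(val, 64); err == nil {
			dst[key] = f
		}
	}
}

func main() {
	out := "warmup done\n{\"tokens_per_sec\": 125, \"latency_ms\": 8}\n"
	fmt.Println(parseMetrics(out, "latency_ms=7.9")) // map[latency_ms:7.9 tokens_per_sec:125]
}
```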

fusion benchmark compare compares wall time plus any common metric keys found in both artifacts.
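The comparison can be pictured as computing the relative change of wall time and of every metric key present in both artifacts. The Artifact struct and its field names below are assumptions for illustration, since the JSON schema of Fusion's artifacts is not documented here. Whether a change is an improvement depends on the metric (higher tokens_per_sec is better, lower latency_ms is better), which this sketch does not decide.

```go
package main

import (
	"fmt"
	"sort"
)

// Artifact is a hypothetical in-memory view of one benchmark result.
type Artifact struct {
	WallSeconds float64
	Metrics     map[string]float64
}

// Delta is the before/after change for one measurement.
type Delta struct {
	Key           string
	Before, After float64
	PctChange     float64 // (after-before)/before*100; 0 when before is 0
}

// compare reports wall time first, then metrics common to both artifacts
// in sorted key order. Keys present in only one artifact are skipped.
func compare(before, after Artifact) []Delta {
	deltas := []Delta{newDelta("wall_seconds", before.WallSeconds, after.WallSeconds)}
	keys := make([]string, 0, len(before.Metrics))
	for k := range before.Metrics {
		if _, ok := after.Metrics[k]; ok {
			keys = append(keys, k)
		}
	}
	sort.Strings(keys)
	for _, k := range keys {
		deltas = append(deltas, newDelta(k, before.Metrics[k], after.Metrics[k]))
	}
	return deltas
}

func newDelta(key string, before, after float64) Delta {
	d := Delta{Key: key, Before: before, After: after}
	if before != 0 {
		d.PctChange = (after - before) / before * 100
	}
	return d
}

func main() {
	before := Artifact{WallSeconds: 10, Metrics: map[string]float64{"tokens_per_sec": 100, "latency_ms": 8}}
	after := Artifact{WallSeconds: 8, Metrics: map[string]float64{"tokens_per_sec": 125, "mem_gb": 20}}
	for _, d := range compare(before, after) {
		fmt.Printf("%-16s %8.2f -> %8.2f (%+.1f%%)\n", d.Key, d.Before, d.After, d.PctChange)
	}
}
```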

Command Summary

Core commands:

  • fusion login
  • fusion auth login|show|set|logout
  • fusion env detect
  • fusion gpu detect|normalize
  • fusion kb list|search|show|context
  • fusion optimize plan
  • fusion target add|list|show|remove|default|exec|copy
  • fusion benchmark run|compare
  • fusion profile run

Testing

Run the full test suite:

go test ./...

Current tests cover:

  • knowledge-base loading and search
  • optimization planner scoring
  • artifact metric parsing
  • target validation
  • target resolution
  • local and sim execution behavior
  • local file and directory copy behavior
  • benchmark comparison helper logic

Run gofmt before opening a PR:

gofmt -w .

Repository Layout

  • cmd/fusion: CLI entrypoint
  • internal/config: local config and ModelsLab token storage
  • internal/modelslab: ModelsLab API and browser-login constants
  • internal/system: host and toolchain detection
  • internal/targets: target validation and execution semantics
  • internal/runner: local and SSH command/copy execution
  • internal/artifacts: benchmark and profile artifact storage
  • internal/kb: embedded knowledge base loader, SQLite BM25 search, and context packet compiler
  • internal/optimize: optimization planner and recommendation engine
  • knowledge: source-backed GPU, strategy, skill, example, and search-index assets embedded into the binary
  • scripts: install helpers and knowledge-index generation
  • .github/workflows: CI and tagged release pipelines
  • .goreleaser.yaml: cross-platform packaging config
  • docs: architecture and roadmap notes

Current Gaps

Fusion is not yet a full autonomous kernel writer. The missing pieces are important:

  • ModelsLab-backed Triton/CUDA generation
  • kernel correctness verification
  • Triton/CUDA compile pipelines
  • session-oriented optimization loops
  • promotion logic for winning kernels per GPU family and workload shape

Those pieces should build on the current target, benchmark, profile, and artifact foundation instead of bypassing it.

Roadmap

  1. ModelsLab-backed Triton/CUDA kernel generation inside the CLI
  2. Correctness verification commands for generated kernels
  3. First-class compile commands for Triton and CUDA C++
  4. Structured optimization sessions that chain plan, generate, benchmark, profile, and compare
