# tokensmith: Stop Paying the Token Tax
tokensmith is a Rust CLI that detects your hardware, recommends a local model/runtime configuration, manages model downloads, starts local serving, and exposes OpenAI-compatible APIs. It focuses on safe defaults, OOM avoidance, and operator controls for monitoring and safe stop.
Running local LLMs is usually manual and brittle: model choice is unclear, memory limits are easy to exceed, and runtime operations are fragmented. tokensmith orchestrates this workflow:

- Hardware profiling (`doctor`)
- Explainable model/config recommendation (`recommend`)
- Model artifact management (`pull`)
- Runtime/server lifecycle (`up`, `stop`, `ps`, `logs`)
- OpenAI-compatible serving (`/v1/chat/completions`, `/v1/completions`, SSE streaming)
- Resource monitoring with warning thresholds (`status`, `monitor`)
## Quickstart

```sh
tokensmith doctor
tokensmith recommend --task code --mode balanced
tokensmith pull <id>
tokensmith up --task code --mode balanced --detach
tokensmith status
tokensmith monitor --watch
tokensmith stop
```

## CLI reference

```sh
tokensmith doctor
tokensmith recommend --task code|chat [--mode fast|balanced|quality]
tokensmith pull <model_id>
tokensmith up --task code|chat [--mode fast|balanced|quality] [--ctx 4096] [--port 8000] [--host 127.0.0.1] [--detach]
tokensmith status
tokensmith monitor [--interval 1s] [--watch] [--json] [--warn-mem 80%] [--warn-cpu 300%]
tokensmith throttle --mode fast|balanced|quality
tokensmith stop [--force-after 5s]
tokensmith kill
tokensmith ps
tokensmith logs [--follow]
```
## Monitoring

Use `tokensmith monitor` to inspect:

- RSS memory (MB)
- CPU %
- thread count
- uptime
- system total/free memory when available
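The `--warn-mem`/`--warn-cpu` thresholds amount to simple comparisons against these metrics. The sketch below is purely illustrative (a hypothetical Python helper, not tokensmith's actual Rust code; the function name and signature are invented):

```python
def threshold_warnings(rss_mb: float, total_mb: float, cpu_pct: float,
                       warn_mem_pct: float = 80.0,
                       warn_cpu_pct: float = 300.0) -> list[str]:
    """Return human-readable warnings when usage crosses the configured limits."""
    warnings = []
    mem_pct = 100.0 * rss_mb / total_mb
    if mem_pct >= warn_mem_pct:
        warnings.append(f"memory {mem_pct:.0f}% >= {warn_mem_pct:.0f}%")
    # CPU% is summed across cores, so it can exceed 100; hence defaults like 300%.
    if cpu_pct >= warn_cpu_pct:
        warnings.append(f"cpu {cpu_pct:.0f}% >= {warn_cpu_pct:.0f}%")
    return warnings
```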
Threshold warnings:

```sh
tokensmith monitor --watch --warn-mem 80% --warn-cpu 300%
```

When warnings trigger, use:

```sh
tokensmith throttle --mode fast
tokensmith stop
```

Safe stop behavior:
- Send SIGTERM
- Wait `--force-after` (default `5s`)
- Escalate to SIGKILL if still alive
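The SIGTERM-then-SIGKILL escalation above can be sketched in a few lines. This is an illustrative stand-in written in Python (not tokensmith's Rust internals), with `force_after` playing the role of `--force-after`:

```python
import signal
import subprocess
import sys
import time

def stop_gracefully(proc: subprocess.Popen, force_after: float = 5.0) -> None:
    """Send SIGTERM, wait up to force_after seconds, then escalate to SIGKILL."""
    proc.send_signal(signal.SIGTERM)
    try:
        proc.wait(timeout=force_after)
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL: the process ignored or outlived SIGTERM
        proc.wait()

# Demo: a child that ignores SIGTERM, forcing the SIGKILL escalation.
stubborn = subprocess.Popen([
    sys.executable, "-c",
    "import signal, time; signal.signal(signal.SIGTERM, signal.SIG_IGN); time.sleep(60)",
])
time.sleep(0.5)  # give the child time to install its SIGTERM handler
stop_gracefully(stubborn, force_after=1.0)
```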
## Use with OpenAI SDKs

```sh
export OPENAI_BASE_URL=http://127.0.0.1:8000/v1
export OPENAI_API_KEY=local
```

Then use your normal OpenAI SDK pointing at `OPENAI_BASE_URL`.
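With those variables set, any OpenAI-compatible client works. Here is a minimal stdlib-only Python sketch (no SDK) against the `/v1/chat/completions` endpoint; the payload builder is the portable part, while the HTTP call assumes a server started by `tokensmith up` is listening, and the model id is just an example:

```python
import json
import os
import urllib.request

def build_chat_request(model: str, messages: list[dict]) -> bytes:
    # Standard OpenAI chat-completions payload; works against any
    # OpenAI-compatible endpoint, including tokensmith's.
    return json.dumps({"model": model, "messages": messages}).encode("utf-8")

def chat(prompt: str, model: str = "qwen2.5-3b-instruct") -> str:
    base = os.environ.get("OPENAI_BASE_URL", "http://127.0.0.1:8000/v1")
    req = urllib.request.Request(
        f"{base}/chat/completions",
        data=build_chat_request(model, [{"role": "user", "content": prompt}]),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ.get('OPENAI_API_KEY', 'local')}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```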
## VS Code integration (Continue)

Use a dedicated VS Code profile/workspace so this does not affect your main cloud OpenAI setup.
- Start tokensmith:

```sh
tokensmith up --task code --mode fast --ctx 4096 --port 8000 --host 127.0.0.1 --detach
```

- Point Continue at tokensmith (OpenAI-compatible endpoint):
  - `apiBase`: `http://127.0.0.1:8000/v1`
  - `apiKey`: `local`
  - `model`: your loaded local model id (for example `qwen2.5-3b-instruct`)
Example Continue config snippet:
```yaml
models:
  - title: Local Qwen (tokensmith)
    provider: openai
    model: qwen2.5-3b-instruct
    apiBase: http://127.0.0.1:8000/v1
    apiKey: local
```

- Validate quickly:

```sh
tokensmith logs --calls --follow
```

You should see `model_call proxied` / `model_call proxied_stream` lines during Continue requests.
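Streamed responses use standard OpenAI-style SSE: the body is a series of `data: {...}` lines terminated by `data: [DONE]`. A minimal client-side parser sketch (illustrative only; assumes the standard chunk shape with `choices[0].delta.content`):

```python
import json
from typing import Iterable, Iterator

def sse_text_chunks(lines: Iterable[str]) -> Iterator[str]:
    """Yield text deltas from an OpenAI-style chat-completions SSE stream."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and non-data fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        delta = json.loads(data)["choices"][0].get("delta", {})
        if "content" in delta:
            yield delta["content"]
```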
## Troubleshooting

- `502 ... exceed_context_size_error`: Request context (chat history + attached files + instructions) is larger than runtime `n_ctx`. Fix by lowering context in Continue, starting a new chat, or increasing startup context with `--ctx`.
- `502 ... 503 Loading model`: Runtime is still loading. Wait and retry.
- `502 ... error sending request for url (http://127.0.0.1:8001...)`: `llama-server` backend is not reachable. Check startup logs and runtime process.
- `llama-server exited early ... model is corrupted or incomplete`: Re-pull the model artifact.
## Platform notes

- macOS Metal path (MVP) is designed for `llama.cpp` (`llama-server`) first.
- Binary search order: `~/.tokensmith/bin/llama-server`, then `PATH`.
- If `llama-server` is missing, `doctor` provides guidance and the adapter can still respond with placeholder behavior for local testing.
## Roadmap

- CUDA backend support
- Better Windows process/metrics parity
- LAN sharing mode
- Larger curated model registry
- Multi-process management (`tokensmith ps` for more than one active server)
## Development

```sh
cargo fmt
cargo test
cargo run -- recommend --task code --mode balanced
```