So you want to run OpenClaw with local LLMs via Ollama on a consumer NVIDIA GPU. Noble goal. We spent an unreasonable amount of time so you don't have to. This repo has model comparisons, benchmark data, setup automation, and a painfully honest record of every dumb mistake we made along the way.
Tested on RTX 5070 Ti 16GB -- OpenClaw 2026.3.13 -- Ollama 0.17.7
Because every "best local LLM" article on the internet tests models for chat -- you know, "write me a poem about cats" stuff. Nobody tests for agent tool calling, which is the part that actually matters for OpenClaw. Its system prompt is 12,000+ tokens of tool schemas and workspace files, which turns out to be a fantastic way to make most small models completely fall apart. We tested 13 models. Only 2 survived. The other 11 died in increasingly creative ways.
| Model | Size | Search | Single Op | Compound Op | Verdict |
|---|---|---|---|---|---|
| qwen3-vl:8b-instruct | 6.1 GB | ✅ | ✅ | ✅ | Recommended |
| mistral-small3.1:24b | 15 GB | ✅ | ✅ | ✅ | Stable but slow (~10 tok/s) |
| qwen2.5:14b-instruct | 9.0 GB | ✅ | ✅ | — | Works but verbose/hesitant |
| qwen3:4b | 2.5 GB | ✅ | ✅ | ❌ | Compound ops fail |
| gpt-oss:20b | 13 GB | ❌ | ✅/❌ | — | Extremely unstable |
| phi4:14b | 9.1 GB | — | — | — | No tools API support |
| llama3.2-vision:11b | 7.8 GB | — | — | — | No tools API support |
| llama3.1:8b | 4.9 GB | ❌ | ❌ | — | Echoes text, no tool calls |
| + 5 more models | — | — | — | — | All failed |
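Want to reproduce the pass/fail columns yourself? Each tier boils down to one question: does the model emit a structured tool call, or does it just talk about one? Here's a minimal probe, assuming Ollama's tools support on `/api/chat` (0.3+); the `read_file` schema and the prompt are illustrative stand-ins, not OpenClaw's actual 12k-token schema set:

```python
# probe_tools.py -- minimal tool-calling probe against Ollama's /api/chat.
# The read_file schema below is a toy example, not OpenClaw's real schema.
import json
import requests

OLLAMA = "http://localhost:11434"

# One deliberately simple tool schema; real OpenClaw prompts carry dozens.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "read_file",
        "description": "Read a file from the workspace and return its contents.",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string", "description": "Workspace-relative path"}},
            "required": ["path"],
        },
    },
}]

def probe(model: str, prompt: str) -> bool:
    """Return True if the model answers with a structured tool call."""
    resp = requests.post(f"{OLLAMA}/api/chat", json={
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "tools": TOOLS,
        "stream": False,
    }, timeout=120)
    resp.raise_for_status()
    msg = resp.json().get("message", {})
    calls = msg.get("tool_calls") or []
    print(json.dumps(calls, indent=2) if calls else f"no tool call: {msg.get('content', '')[:120]}")
    return bool(calls)

if __name__ == "__main__":
    # "Single Op": one obvious tool use. Models that echo prose here failed our table.
    probe("qwen3-vl:8b-instruct", "Open README.md and tell me what it says.")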
The thing nobody tells you: `AGENTS.md` instruction quality matters more than model size. Turns out you can make an 8B model behave if you just ask nicely (and very specifically). See the setup guide for the details that took us way too long to figure out.
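To make "ask nicely and very specifically" concrete, here is the shape of instruction we mean (a hypothetical excerpt, not our literal file; the setup guide has the real one):

```markdown
<!-- Hypothetical AGENTS.md excerpt -- illustrative only -->
## Tool rules
- When a task needs a file or a search, CALL THE TOOL. Do not describe what you would do.
- One tool call per step. Wait for the result before planning the next step.
- Never invent file paths. If unsure, list the directory first.
- Answer in plain text only after all tool results are in.
```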
| Metric | Ollama | vLLM |
|---|---|---|
| Single request tok/s | 130 | 5-11 |
| Install complexity | One-liner | Docker build (Blackwell needs a special image) |
| WSL2 support | Native | pin_memory limitation, major perf hit |
| Multi-user concurrency | Sequential | PagedAttention batching (vLLM advantage) |
vLLM is genuinely great at concurrent serving. But for a single-user Telegram bot? Ollama wins by 10-25x, and you get to keep your sanity during installation.
```
Telegram / Browser
        │ HTTP / WebSocket
        ▼
OpenClaw (Mac, npm global) — Telegram bot, Agent UI
        │                                   │
        │ Ollama native API                 │ http://PC_LAN_IP:8080
        │ (http://PC_LAN_IP:11434)          ▼
        ▼                             SearXNG (PC, Docker) — local search, no API key needed
Ollama (PC, Linux systemd)
        └── qwen3-vl:8b-instruct (recommended, vision + tools, no thinking)
```
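You can sanity-check the SearXNG leg before OpenClaw ever touches it. One caveat: SearXNG only serves JSON if `json` is enabled under `search.formats` in its `settings.yml`, otherwise you get a 403. A quick sketch, with `PC_LAN_IP` as the same placeholder used in the diagram:

```python
# searx_check.py -- quick SearXNG sanity check from any machine on the LAN.
# Assumes "json" is listed under search.formats in SearXNG's settings.yml.
import requests

SEARX = "http://PC_LAN_IP:8080"  # substitute your PC's LAN address

r = requests.get(f"{SEARX}/search", params={"q": "openclaw ollama", "format": "json"}, timeout=30)
r.raise_for_status()
for hit in r.json().get("results", [])[:5]:
    print(f"{hit.get('title')}\n  {hit.get('url')}")
```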
| Document | Description |
|---|---|
| Setup Guide | Full installation tutorial with Claude Code automation prompts |
| Model Benchmark | Detailed tok/s, TTFT, VRAM data across context windows (metric collection sketched below) |
| Pitfalls | 24 battle-tested pitfalls with root causes and fixes |
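For context on how those benchmark numbers are collected: Ollama reports its own timing metadata on every response, so tok/s and a TTFT estimate fall out of a single `/api/generate` call. A minimal sketch, assuming the standard response fields (all durations in nanoseconds):

```python
# bench_one.py -- derive tok/s and a TTFT estimate from one /api/generate call.
# Field names are Ollama's documented response metadata; durations are nanoseconds.
import requests

r = requests.post("http://localhost:11434/api/generate", json={
    "model": "qwen3-vl:8b-instruct",
    "prompt": "Summarize what a systemd unit file is in two sentences.",
    "stream": False,
}, timeout=300)
d = r.json()

toks_per_s = d["eval_count"] / (d["eval_duration"] / 1e9)
# Rough TTFT proxy: model load + prompt processing, i.e. everything before decoding.
ttft_s = (d.get("load_duration", 0) + d.get("prompt_eval_duration", 0)) / 1e9

print(f"{toks_per_s:.1f} tok/s, ~{ttft_s:.2f}s to first token")
```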
This project uses skills from kc_ai_skills:
| Skill | Description |
|---|---|
| llm-benchmark | Automated Ollama model benchmark with CPU offload detection (check sketched below) |
| searxng | OpenClaw local search integration via SearXNG |
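The CPU offload detection mentioned above amounts to one check: after a model loads, does `ollama ps` report anything other than 100% GPU? A best-effort sketch (it parses the human-readable CLI output, which is not a stable API):

```python
# offload_check.py -- flag models that spill out of VRAM onto the CPU.
# Parses the human-readable `ollama ps` output, which is not a stable API,
# so treat this as best-effort.
import subprocess

out = subprocess.run(["ollama", "ps"], capture_output=True, text=True, check=True).stdout
for line in out.splitlines()[1:]:  # skip the header row
    if not line.strip():
        continue
    name = line.split()[0]
    if "CPU" in line:  # PROCESSOR column, e.g. "48%/52% CPU/GPU"
        print(f"WARNING: {name} is partially offloaded to CPU -- expect a big tok/s hit")
    else:
        print(f"{name}: fully on GPU")
```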
See the Setup Guide for the full walkthrough. But if you just want the cliff notes:
- PC: Install Ollama + SearXNG, pull `qwen3-vl:8b-instruct`
- Mac: Install OpenClaw, configure `openclaw.json`, write `AGENTS.md` (this part is more important than you think)
- Test: Search + tool calling via Telegram or Web UI (pre-flight script sketched below), then breathe a sigh of relief when it actually works
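If you'd rather not debug through Telegram, here's a dumb pre-flight script that checks both backends before you blame OpenClaw. A sketch, using the same `PC_LAN_IP` placeholder as above:

```python
# preflight.py -- verify both backends respond before testing through OpenClaw.
import requests

PC = "http://PC_LAN_IP"  # substitute your PC's LAN address

# 1. Ollama up, and the recommended model actually pulled?
tags = requests.get(f"{PC}:11434/api/tags", timeout=10).json()
models = [m["name"] for m in tags.get("models", [])]
assert any(m.startswith("qwen3-vl:8b-instruct") for m in models), f"model not pulled: {models}"

# 2. SearXNG answering (JSON format must be enabled in settings.yml)?
r = requests.get(f"{PC}:8080/search", params={"q": "test", "format": "json"}, timeout=15)
assert r.ok and r.json().get("results") is not None, "SearXNG not responding with JSON"

print("preflight OK -- point OpenClaw at these endpoints")
```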