A small, security-focused CLI wrapper around Ollama for one job:
read local file evidence -> produce a constrained answer -> log everything needed to audit the run
This project is intentionally narrow. It is not a general autonomous agent framework. It is an evidence-gated runner that tries to make hallucination and unsafe file access expensive or impossible by default.
Contents:
- Operator quick reference
- What this project is and is not
- Why this exists
- End-to-end architecture
- Execution flow (`chat` vs `ask`)
- Evidence and admissibility model
- Security model for file reads
- Model routing and token budget behavior
- Output quality controls (format + footer)
- Run logging and auditability
- Setup and quickstart
- CLI usage and recipes
- Phase 2 indexing and query
- Configuration reference
- Error codes and troubleshooting
- Testing and verification
- Extending safely
- Practical limitations

For day-to-day commands and failure triage, use the operator runbook: `OPERATOR_QUICKREF.md`.
What it is:
- A deterministic orchestrator around one model call (`chat`) or two model calls (`ask`).
- A strict tool-call protocol plus evidence validation.
- A sandboxed local file reader with typed failure modes.
- A reproducible run logger (`runs/<run_id>/run.json`).
What it is not:
- Not LangChain, not a planner/executor loop, not an unbounded tool agent.
- Not a generic distributed retrieval framework; retrieval is local, deterministic, and evidence-first.
- Not a "trust the model by default" UX.
Common failure patterns in LLM + tool systems are well-known:
- The model claims it read a file that it never read.
- Tool-call JSON is malformed or mixed with prose and silently ignored.
- File access is too broad (path traversal, hidden files, absolute path reads).
- Partial reads are treated as full coverage.
- Documents contain prompt injection content and the model follows it.
`local-agent` turns these into explicit contracts:
- strict tool-call parsing
- fail-closed evidence gates
- sandboxed file access policy
- typed error codes
- auditable run logs with redaction
Core modules:
- `agent/__main__.py`
  - CLI parsing (`chat`, `ask`)
  - model selection (fast/big/default)
  - ask state machine
  - evidence validation and fail-closed behavior
  - second-pass output checks and retry logic
  - run logging
- `agent/tools.py`
  - `ToolSpec`, `ToolError`, `TOOLS`
  - `read_text_file` implementation
  - sandbox policy initialization and path validation
- `agent/protocol.py`
  - strict + robust tool-call parsing
  - supports prefix JSON tool-call extraction and trailing text capture
- `configs/default.yaml` - model defaults, token/time budgets, security policy
- `tests/test_tools_security.py` - sandbox and resolution behavior regression tests
- `SECURITY.md` - manual security verification checklist
`chat` is a single model call:
- Send user prompt.
- Print model response.
- Log sanitized raw response and metadata.
No tool use, no evidence gates.
`ask` is a two-pass control flow with one model-requested tool call:
- Pass 1 (tool-selection prompt):
- Model must either:
- answer directly, or
- emit `{"type":"tool_call","name":"...","args":{...}}`
- Runner parses tool call:
- strict parse first
- prefix JSON parse fallback if response starts with tool-call JSON and contains trailing text
- If a tool call is emitted:
- execute tool
- validate evidence
- optional auto re-read for full-evidence questions when first read was truncated
- Pass 2 (answer-only prompt):
- tools forbidden
- output quality checks enforced
- If formatting violations:
- one retry with stricter prompt
- if still invalid -> typed failure
Important:
- If question semantics require file evidence and admissible evidence is not acquired, runner returns typed failure and does not ask model to guess.
- The runner may perform one additional `read_text_file` call itself for full-evidence rereads. This is not a model-requested second tool choice; it is runner-side evidence completion logic.
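The parsing contract above (strict parse first, then a prefix-JSON fallback with trailing text capture) can be sketched as follows. This is an illustrative reimplementation, not the actual `agent/protocol.py` code; everything beyond the documented `{"type":"tool_call",...}` envelope is an assumption.

```python
import json


def parse_tool_call(raw: str):
    """Parse a model response into (tool_call, trailing_text).

    Illustrative sketch: strict whole-response parse first; if that
    fails, accept a leading tool-call JSON object followed by prose.
    Returns (None, raw) when no tool call is present.
    """
    raw = raw.strip()
    # Strict: the entire response must be exactly one JSON object.
    try:
        obj = json.loads(raw)
        if isinstance(obj, dict) and obj.get("type") == "tool_call":
            return obj, ""
        return None, raw
    except json.JSONDecodeError:
        pass
    # Fallback: response starts with tool-call JSON, trailing text follows.
    decoder = json.JSONDecoder()
    try:
        obj, end = decoder.raw_decode(raw)
    except json.JSONDecodeError:
        return None, raw
    if isinstance(obj, dict) and obj.get("type") == "tool_call":
        return obj, raw[end:].strip()
    return None, raw
```

Anything that does not match either shape is treated as a plain answer, which is what lets the runner fail closed instead of guessing.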
`read_text_file` evidence contract:
- `path` (absolute)
- `sha256` (hash of full text)
- `chars_full` (full length)
- `chars_returned` (returned text length)
- `truncated` (bool)
- `text` (possibly truncated content)
Evidence is rejected when:
- required fields are missing
- field types are wrong
- char counts are inconsistent
- file is empty for summary-style tasks
- tool returned error
If evidence is invalid/missing when required:
- run fails closed
- returns typed JSON failure
- no second-pass "best effort" answer
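The admissibility rules above can be sketched as a fail-closed validator. This is a minimal illustration, not the runner's actual validation code; field names come from the documented contract, the helper name is invented.

```python
# Required fields of the read_text_file evidence contract and their types.
REQUIRED_FIELDS = {
    "path": str,
    "sha256": str,
    "chars_full": int,
    "chars_returned": int,
    "truncated": bool,
    "text": str,
}


def validate_evidence(ev: dict) -> tuple[bool, str]:
    """Illustrative fail-closed check mirroring the rules above."""
    for field, ftype in REQUIRED_FIELDS.items():
        if field not in ev:
            return False, f"missing field: {field}"
        if not isinstance(ev[field], ftype):
            return False, f"wrong type for: {field}"
    # bool is a subclass of int in Python; reject it for char counts.
    if isinstance(ev["chars_full"], bool) or isinstance(ev["chars_returned"], bool):
        return False, "wrong type for char count"
    if ev["chars_returned"] != len(ev["text"]):
        return False, "chars_returned inconsistent with text length"
    if ev["truncated"] and ev["chars_returned"] >= ev["chars_full"]:
        return False, "truncated flag inconsistent with char counts"
    if not ev["truncated"] and ev["chars_returned"] != ev["chars_full"]:
        return False, "untruncated read must return full length"
    return True, "ok"
```

Any `False` result means the run fails closed with a typed error rather than letting pass 2 answer from inadmissible evidence.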
Security policy is configured at startup from configs/default.yaml.
Controls:
- allowlisted roots (`allowed_roots`)
- allowlisted extensions (`allowed_exts`)
- deny absolute/anchored paths (`deny_absolute_paths`)
- deny hidden path segments (`deny_hidden_paths`)
- optional emergency bypass (`allow_any_path`, default false)
- root validation behavior (`auto_create_allowed_roots`, `roots_must_be_within_security_root`)
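Given the key names above, the `security:` block of `configs/default.yaml` plausibly looks like the sketch below. Values shown match the conservative defaults this README describes; the shipped file is authoritative and may differ.

```yaml
# Sketch of the security policy block; key names from this README.
security:
  allowed_roots: [allowed/, runs/]
  allowed_exts: [.md, .txt, .json]
  deny_absolute_paths: true
  deny_hidden_paths: true
  allow_any_path: false            # emergency bypass, keep false
  auto_create_allowed_roots: true
  roots_must_be_within_security_root: true
```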
Path request styles:
- Bare filename (no slash/backslash)
  - Example: `note.md`
  - Searched across allowlisted roots in order
  - Exactly one match -> allowed
  - Multiple matches -> `AMBIGUOUS_PATH`
  - None -> `FILE_NOT_FOUND` (if search path was valid but file missing)
- Explicit subpath (contains slash/backslash)
  - Example: `allowed/corpus/project/note.md`
  - Treated as `security_root`-relative (same anchor as `workroot` when configured)
  - Must still fall within an allowlisted root
Additional protections:
- lexical containment checks before existence checks
- strict resolve checks for existing paths (symlink escape defense)
- allowlisted roots are validated after `resolve(strict=True)` when containment is enabled
- extension and hidden-path checks before content read
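The "lexical containment before existence checks, strict resolve for symlink defense" ordering can be sketched like this. It is illustrative only; the real policy in `agent/tools.py` also enforces extensions, hidden segments, and absolute-path denial.

```python
from pathlib import Path


def check_containment(candidate: Path, allowed_roots: list[Path]) -> bool:
    """Illustrative containment check: lexical first, then strict resolve.

    The strict resolve follows symlinks and fails for missing paths,
    which defeats symlink escapes out of the allowlisted roots.
    """
    # Lexical check before touching the filesystem.
    lexical = candidate.absolute()
    if not any(lexical.is_relative_to(root) for root in allowed_roots):
        return False
    # Strict resolve: raises for missing paths, follows symlinks.
    try:
        real = candidate.resolve(strict=True)
    except OSError:
        return False
    # Re-check containment against the resolved roots.
    return any(real.is_relative_to(root.resolve()) for root in allowed_roots)
```

Doing the lexical check first means a denied path is rejected without ever probing the filesystem, so path probing cannot be used to enumerate files outside the sandbox.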
Model selection supports default and split-model operation:
- Legacy/default: only `model` configured -> both passes use `model`
- Split mode:
  - pass 1 defaults to `model_fast` when `prefer_fast` is true
  - pass 2 may upgrade to `model_big` when question matches `big_triggers`

CLI overrides:
- `--fast`: force fast model for both passes
- `--big`: force big model for answer pass
- `--full`: force full evidence read attempt when tool used

Budget controls:
- `max_tokens` and `timeout_s` base values
- `max_tokens_big_second` and `timeout_s_big_second` for large answer pass
- `max_chars_full_read` cap for runner-side rereads
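The routing rules above can be condensed into a small decision function. This is a sketch under the config key names documented here (`model`, `model_fast`, `model_big`, `prefer_fast`, `big_triggers`); the function itself is hypothetical, not the CLI's actual implementation.

```python
def pick_models(cfg: dict, question: str,
                force_fast: bool = False, force_big: bool = False):
    """Illustrative routing: returns (pass1_model, pass2_model)."""
    base = cfg["model"]
    fast = cfg.get("model_fast", base)
    big = cfg.get("model_big", base)
    if force_fast:  # --fast: fast model for both passes
        return fast, fast
    first = fast if cfg.get("prefer_fast", False) else base
    # --big, or a big_triggers keyword match, upgrades the answer pass.
    wants_big = force_big or any(
        t.lower() in question.lower() for t in cfg.get("big_triggers", [])
    )
    return first, (big if wants_big else first)
```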
Pass 2 includes explicit constraints:
- no tool calls
- no tool-call JSON envelopes
- no markdown tables (bullet/paragraph style preferred)
- no claims beyond provided evidence
- include canonical evidence scope footer
Validation checks:
- table detector heuristic
- tool-call detector on pass 2 output
- exact scope-footer last-line check
Retry behavior:
- one retry for format violations
- fast-path optimization: if only missing scope footer, append locally and skip retry
- if retry still violates format -> `SECOND_PASS_FORMAT_VIOLATION`
Scope footer format:

```
Scope: full evidence from read_text_file (5159/5159), sha256=14e424b8f1f06f8c2e2f43867f52f37f6ffb95f8434f743f2a94f367a7d2c999
```
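The exact last-line footer check, and the fast-path fix-up for a missing footer, can be sketched as below. This is illustrative; the helper name and return shape are assumptions, only the footer format comes from the README.

```python
def check_scope_footer(answer: str, chars_returned: int,
                       chars_full: int, sha256: str) -> tuple[str, bool]:
    """Illustrative check: is the canonical footer the exact last line?

    Returns (possibly fixed answer, footer_was_present).  When the
    footer is the only problem, appending it locally avoids a retry.
    """
    footer = (
        f"Scope: full evidence from read_text_file "
        f"({chars_returned}/{chars_full}), sha256={sha256}"
    )
    lines = answer.rstrip().splitlines()
    if lines and lines[-1] == footer:
        return answer, True
    # Fast path: append the footer locally instead of re-prompting.
    return answer.rstrip() + "\n\n" + footer, False
```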
Each invocation writes:
`runs/<run_id>/run.json`
Key logged fields:
- run metadata (`mode`, question/prompt, timings)
- model selection (`raw_first_model`, `raw_second_model`)
- raw model responses with `message.thinking` stripped
- tool trace
- evidence status (`required`, `status`, truncation, char counts)
- retry metadata (if used)
- final assistant text

Redaction rule:
- for file-read results, logs keep metadata + `text_preview` only (first 800 chars)
- full file text is not logged by default
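The redaction rule amounts to a small transform over the tool result before logging. A minimal sketch, assuming the field names documented in the evidence contract; the helper itself is hypothetical.

```python
def redact_tool_result(result: dict, preview_chars: int = 800) -> dict:
    """Illustrative log-safe copy of a file-read result.

    Keeps metadata (path, sha256, char counts, truncated), drops the
    full text, and stores a bounded text_preview instead.
    """
    redacted = {k: v for k, v in result.items() if k != "text"}
    redacted["text_preview"] = result.get("text", "")[:preview_chars]
    return redacted
```

Because `sha256` and the char counts survive redaction, a run log still pins down exactly which file content was read without storing that content.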
Requirements:
- Python 3.11+
- Ollama reachable from the runtime environment (default `http://127.0.0.1:11434`)
- repo config available at `configs/default.yaml` (always used; see Config location below)
- default dependency set in `requirements.txt` includes Torch + embedding stack for Phase 3 torch-first operation
Ollama host selection:
- Effective precedence for the Ollama base URL is: `--ollama-base-url`, otherwise `LOCAL_AGENT_OLLAMA_BASE_URL`, otherwise `OLLAMA_BASE_URL`, otherwise repo config `ollama_base_url`, otherwise the built-in default `http://127.0.0.1:11434`.
- `LOCAL_AGENT_WORKROOT` and `--workroot` only change the external data root. They do not change which Ollama host is used.
- Example local default:

  ```
  python -m agent doctor --json
  ```

- Example LAN-hosted Ollama on a second PC:

  ```
  export LOCAL_AGENT_OLLAMA_BASE_URL=http://192.168.1.25:11434
  python -m agent doctor --json
  python -m agent chat "ping"
  python -m agent ask "Summarize indexed evidence."
  ```

Install (editable):

```
python -m venv .venv
.\.venv\Scripts\activate
pip install -e .
```

On Linux/macOS, use:

```
source .venv/bin/activate
pip install -e .
```

Install from requirements (torch-first default environment):

```
pip install -r requirements.txt
```

If you want a lean environment without Torch, skip `requirements.txt` and install only the core dependencies explicitly, for example:

```
pip install requests PyYAML
pip install -e .
```

Note: `phase3.embed.provider: torch` will fail unless Torch + embedding dependencies are installed.
This repo now includes a minimal .devcontainer/devcontainer.json for Python 3.11 development. It is intentionally small: it installs the package in editable mode with the optional dev extra so you can run the test suite and general CLI commands, but it does not change the runtime architecture or assume Ollama is running inside the container.
Open the repository in a devcontainer or GitHub Codespace, then use:

```
python -m unittest discover -s tests -v
python -m agent doctor --no-ollama
```

The devcontainer installs:

```
python -m pip install -e ".[dev]"
```

If you also want the optional Torch embedding stack in the container, install it explicitly:

```
python -m pip install -e ".[dev,torch-embed]"
```

Ollama remains external. From a Codespace or any other remote Linux dev environment, point the CLI at a reachable Ollama host with:

```
export LOCAL_AGENT_OLLAMA_BASE_URL=http://<reachable-host>:11434
```

Use this for a remote Linux box, a forwarded tunnel, or another host that exposes the Ollama API. Do not assume `http://127.0.0.1:11434` inside the devcontainer unless you have explicitly arranged that network path yourself.
Workroot also remains external to the repo. Provide it explicitly instead of storing live data under the checkout:
```
mkdir -p /workspaces/local-agent-workroot/allowed/corpus
mkdir -p /workspaces/local-agent-workroot/allowed/scratch
mkdir -p /workspaces/local-agent-workroot/runs
export LOCAL_AGENT_WORKROOT=/workspaces/local-agent-workroot
```

Depending on your environment, that workroot may be a mounted volume, a copied dataset, or a separately provisioned directory. The repo still expects config in `configs/default.yaml` and data in the external workroot.
Config location (important):
- Runtime always loads config from the repo file: `local-agent/configs/default.yaml`.
- Launch directory does not change which config file is selected.
- Root semantics: `config_root` comes from the loaded config path, `package_root` from the installed code location, optional `workroot` comes from `--workroot` / `LOCAL_AGENT_WORKROOT` / config `workroot`, and `security_root` is the path anchor used for tool security and run logs.
- Effective Ollama endpoint precedence is `--ollama-base-url`, then `LOCAL_AGENT_OLLAMA_BASE_URL`, then `OLLAMA_BASE_URL`, then config `ollama_base_url`, then the built-in default `http://127.0.0.1:11434`.

Optional devcontainer:
- `.devcontainer/devcontainer.json` is intentionally minimal.
- It mounts and exports only `LOCAL_AGENT_WORKROOT`; it does not automatically configure an Ollama host.
- If your environment exposes the host machine at `host.docker.internal`, set `LOCAL_AGENT_OLLAMA_BASE_URL=http://host.docker.internal:11434` yourself.
Split repo/workroot setup (no workroot config required):
- Keep your single live config in repo: `local-agent/configs/default.yaml`.
- The shipped config sets `security.allowed_roots` to `allowed/` and `runs/`, which are resolved relative to `security_root` (and typically share the same directory as `workroot`).
- With the default layout in this repo, set `workroot` (and thus `security_root`) to the sibling data root `../local-agent-workroot/`, so the effective allowed roots become:
  - `../local-agent-workroot/allowed/`
  - `../local-agent-workroot/runs/`
- Phase 2 source roots stay under that external workroot:
  - `allowed/corpus/`
  - `allowed/scratch/`
- Keep `security.roots_must_be_within_security_root: true` and ensure `workroot` points at the desired data root (default in this repo: `../local-agent-workroot/`).
- Ensure allowlisted dirs exist (or keep `auto_create_allowed_roots: true`):
  - `allowed/`
  - `runs/`
Smoke test:

```
.venv\Scripts\python -m agent chat "ping"
.venv\Scripts\python -m agent ask "Read allowed/corpus/secret.md and summarize it."
local-agent ask "Read allowed/corpus/secret.md and summarize it."
local-agent --workroot ../local-agent-workroot ask "Read allowed/corpus/secret.md and summarize it."
local-agent --ollama-base-url http://127.0.0.1:11434 doctor
```

Remote/devcontainer note:
- Remote Ollama changes the trust boundary from loopback-only to "whoever serves that URL". Use only endpoints you control on localhost, a private LAN, or a private dev environment.
- `--ollama-base-url` / `LOCAL_AGENT_OLLAMA_BASE_URL` accept only `scheme://host[:port]`. Paths, query strings, fragments, and embedded credentials are rejected.
- Do not expose Ollama (`11434`) publicly from Codespaces/devcontainers. Keep forwarding opt-in and private; use `python -m agent doctor --no-ollama` when you only need offline checks.
Basic:

```
python -m agent chat "<prompt>"
python -m agent ask "<question>"
python -m agent doctor
python -m agent doctor --no-ollama
python -m agent --ollama-base-url http://host.docker.internal:11434 doctor
local-agent chat "<prompt>"
local-agent ask "<question>"
local-agent doctor
local-agent --workroot ../local-agent-workroot ask "<question>"
```

ask flags: `--big`, `--fast`, `--full`

Common patterns:
- Summarize a file in `allowed/corpus/`:

  ```
  python -m agent ask "Read allowed/corpus/test1a.md and summarize it in 5 bullets."
  ```

- Disambiguate duplicate names:

  ```
  python -m agent ask "Read allowed/corpus/test1a.md and summarize it."
  ```

- Request high-depth synthesis:

  ```
  python -m agent ask --big "Read allowed/corpus/test1a.md and give a thorough synthesis."
  ```

Phase 2 introduces retrieval-ready markdown indexing with a "two sources, one index" model:
- sources are document categories (for example `corpus` and `scratch`)
- index is one unified SQLite DB containing documents, chunks, provenance, and typedness metadata
Important behavior:
- `ask` is now grounded by retrieval evidence (lexical + vector)
- no vault note YAML is modified
- typed/untyped classification is stored in index metadata, not in note frontmatter
- missing metadata is explicit:
  - `metadata=absent` when frontmatter is missing
  - `metadata=unknown` when frontmatter exists but parse/typedness is indeterminate

Commands:

```
local-agent index
local-agent index --rebuild
local-agent query "coherence" --limit 5
local-agent embed --json
local-agent memory list --json
local-agent doctor
local-agent doctor --no-ollama
local-agent doctor --require-phase3 --json
```

Phase 3 adds embeddings, retrieval fusion, and durable memory stores with explicit provenance invariants.
Phase 3 now defaults to `phase3.embed.provider: torch`.

Install optional embedding dependencies:

```
pip install -e ".[torch-embed]"
```

No silent downloads are allowed during `local-agent embed`. You must either:
- set `phase3.embed.torch.local_model_path` to a local model directory, or
- pre-populate a local cache and set `phase3.embed.torch.cache_dir`.

If model files are unavailable locally, embed fails closed with `PHASE3_EMBED_ERROR`.

Embed corpus chunks from the phase2 index:

```
local-agent embed [--model <id>] [--rebuild] [--batch-size N] [--limit N] [--dry-run] [--no-prune] [--json]
```

By default, `local-agent embed` prunes orphan embeddings (rows not present in current phase2 chunk keys). To disable pruning for a run, use `local-agent embed --no-prune`.

Doctor phase3 readiness (strict mode):

```
local-agent doctor --require-phase3 --json
```

Durable memory commands:

```
local-agent memory add --type preference --source manual --content "..."
local-agent memory list --json
local-agent memory delete <memory_id>
local-agent memory export memory/export.json
```

Citation hygiene options:
- `phase3.ask.citation_validation.require_in_snapshot: true` enforces that cited chunk keys must come from the retrieved evidence snapshot used for that run.
- Recommended for fail-closed behavior: combine with `phase3.ask.citation_validation.strict: true`.
- `phase3.ask.citation_validation.heading_match` controls heading comparison (`exact` | `prefix` | `ignore`); the default `prefix` avoids brittle failures when citations reference a parent heading.
- `phase3.ask.citation_validation.normalize_heading: true` normalizes whitespace and trailing punctuation (for example `H1: Freeform Journaling:` and `H1: Freeform Journaling`).
- `phase3.ask.evidence.top_n` controls the snapshot/prompt evidence bandwidth (default `8`).
- If strict snapshot checks are too tight, raise `top_n` modestly (for example `8 -> 12` or `16`); the tradeoff is a larger prompt and a larger evidence logging payload before caps.
Top-level:
- `model`, `model_fast`, `model_big`
- `prefer_fast`
- `big_triggers`
- `max_tokens`, `max_tokens_big_second`
- `timeout_s`, `timeout_s_big_second`
- `read_full_on_thorough`
- `max_chars_full_read`
- `full_evidence_triggers`
- `temperature`
- `ollama_base_url`
  - runtime overrides: `--ollama-base-url`, then `LOCAL_AGENT_OLLAMA_BASE_URL`, then `OLLAMA_BASE_URL`
- `phase2` (`index_db_path`, `sources`, `chunking.max_chars`, `chunking.overlap`)
- `phase3`
  - `embeddings_db_path`
  - `embed`
    - `provider` (`torch` default, `ollama` optional)
    - `model_id`
    - `preprocess`, `chunk_preprocess_sig`, `query_preprocess_sig`
    - `batch_size`
    - `torch.local_model_path`
    - `torch.cache_dir`
    - `torch.device`, `torch.dtype`
    - `torch.batch_size`, `torch.max_length`
    - `torch.pooling`, `torch.normalize`
    - `torch.trust_remote_code`, `torch.offline_only`
  - `retrieve` (`lexical_k`, `vector_k`, `vector_fetch_k`, `rel_path_prefix`, `fusion`)
  - `ask.evidence` (`top_n`)
  - `ask.citation_validation` (`enabled`, `strict`, `require_in_snapshot`, `heading_match`, `normalize_heading`)
- `runs` (`log_evidence_excerpts`, `max_total_evidence_chars`, `max_excerpt_chars`)
- `memory` (`durable_db_path`, `enabled`)

Security (`security:`):
- `allowed_roots`
- `allowed_exts`
- `deny_absolute_paths`
- `deny_hidden_paths`
- `allow_any_path`
- `auto_create_allowed_roots`
- `roots_must_be_within_security_root`
Current defaults in this repo are intentionally conservative:
- only `.md`, `.txt`, `.json` reads
- allowlisted roots limited to `allowed/` and `runs/` under the active `security_root` (derived from the configured workroot)
- phase2 source roots default to `allowed/corpus/` and `allowed/scratch/` under that `security_root`
- absolute/hidden path denial enabled

Ollama endpoint resolution:
- Precedence: `--ollama-base-url` flag > `LOCAL_AGENT_OLLAMA_BASE_URL` env > `OLLAMA_BASE_URL` env > config `ollama_base_url` > built-in default `http://127.0.0.1:11434`.
- Values must include `http://` or `https://` and a host (optionally `:port`); a trailing slash is trimmed and invalid values fail fast.
- Local default: `http://127.0.0.1:11434`.
- Remote/LAN: set `LOCAL_AGENT_OLLAMA_BASE_URL=http://<lan-host>:11434` (or use `--ollama-base-url`) and the same resolved host is used by Ollama-backed doctor/embed, ask/chat, and retrieval smokes. Offline doctor (`--no-ollama`) and torch-backed embed do not validate unrelated Ollama URL settings at startup.
- Devcontainer/Codespaces: the container does not run Ollama; point `LOCAL_AGENT_OLLAMA_BASE_URL` at a host you control on the LAN/VPN. Do not expose Ollama to the public internet; keep it firewalled.

Devcontainer details:
- A minimal `.devcontainer/devcontainer.json` is provided for Python 3.11. It mounts a persistent volume at `/workspaces/local-agent-workroot` and exports `LOCAL_AGENT_WORKROOT` there (workroot stays outside the repo).
- `postCreateCommand` installs the project in editable mode with dev extras (`pip install -e ".[dev]"`) and creates the expected workroot subdirectories.
- Codespaces/devcontainer sessions should point at a remote/LAN Ollama host via `--ollama-base-url`, `LOCAL_AGENT_OLLAMA_BASE_URL`, or `OLLAMA_BASE_URL`; the devcontainer does not configure an Ollama host by default.
Typed failure format:

```
{"ok": false, "error_code": "...", "error_message": "..."}
```

Frequent codes and first checks:
- `CONFIG_ERROR`
  - verify `security.allowed_roots` resolve to valid directories
- `PATH_DENIED`
  - check extension allowlist, hidden segments, traversal/absolute path use
- `FILE_NOT_FOUND`
  - file not found under allowlisted roots
- `AMBIGUOUS_PATH`
  - duplicate bare filename; use explicit subpath
- `EVIDENCE_NOT_ACQUIRED`
  - model did not produce admissible tool call when evidence required
- `FILE_EMPTY`
  - source file empty for summarize request
- `EVIDENCE_TRUNCATED`
  - full evidence required but read remained partial
- `UNEXPECTED_TOOL_CALL_SECOND_PASS`
  - model violated answer-only phase
- `SECOND_PASS_FORMAT_VIOLATION`
  - output still violated format after one retry
- `DOCTOR_INDEX_DB_MISSING`
  - preflight found no index DB at configured `phase2.index_db_path`
  - run `python -m agent index --rebuild --json`
- `DOCTOR_CHUNKER_SIG_MISMATCH`
  - preflight found stale chunking fingerprint vs configured phase2 chunking
  - run `python -m agent index --scheme obsidian_v1 --rebuild --json` (or your configured scheme)
- `DOCTOR_EMBED_OUTDATED_REQUIRE_PHASE3`
  - preflight found embedding rows that do not match current phase3 model/preprocess/chunk hashes
  - run `python -m agent embed --json` (or `--rebuild --json`)
- `DOCTOR_EMBED_RUNTIME_FINGERPRINT_MISMATCH`
  - embedding provider/runtime fingerprint changed since embeddings were written
  - run `python -m agent embed --rebuild --json`
- `DOCTOR_PHASE3_EMBEDDINGS_DB_MISSING`
  - phase3-required preflight found no embeddings DB
  - run `python -m agent embed --json`
- `DOCTOR_MEMORY_DANGLING_EVIDENCE`
  - durable memory references chunk keys that are no longer present in the phase2 index
  - delete or repair dangling memory records
- `DOCTOR_PHASE3_RETRIEVAL_NOT_READY`
  - embeddings metadata looked valid but the retrieval readiness smoke test failed
  - verify embed provider runtime availability, then run `python -m agent embed --rebuild --json` and re-run doctor
Debug tip:
- open the latest `runs/<run_id>/run.json`
- inspect `resolved_config_path`, `config_root`, `package_root`, `workroot`, and `security_root` first
- inspect `tool_trace`, `evidence_status`, `raw_first`, `raw_second`, and retry fields
Run unit tests:

```
python -m unittest discover -s tests -v
```

Coverage includes:
- allowlisted read success
- explicit subpath success
- explicit subpath `security_root` anchoring (independent of process CWD)
- ambiguous bare filename rejection
- extension and hidden path denial (including `.env`)
- traversal/absolute path denial
- `security_root` top-level file rejection when not allowlisted
- fail-closed misconfiguration behavior
- symlink escape denial (POSIX test)

Manual security checklist:
- see `SECURITY.md`

Doctor tip:
- use `python -m agent doctor --no-ollama` to skip only Ollama network checks.
- with `phase3.embed.provider: torch`, retrieval smoke still runs under `--no-ollama`.
Create a clean, shareable zip (without `.venv/`, `.git/`, caches, or run logs):

```
python scripts/make_release_zip.py
python scripts/make_release_zip.py --dry-run
python scripts/make_release_zip.py --include-workroot
```

`--include-workroot` adds only a curated subset (local-agent-workroot top-level boot/docs files plus `allowed/.gitkeep` and `allowed/sample/**` when present), and always excludes `local-agent-workroot/runs/**`.

Optional local cleanup helper:

```
python scripts/clean_artifacts.py --dry-run
python scripts/clean_artifacts.py
```

If you add tools:
- Add a new `ToolSpec` in `agent/tools.py`.
- Decide if output is admissible evidence.
- If admissible, add an explicit validator in runner logic.
- Keep pass boundaries strict:
  - pass 1: tool decision
  - pass 2: answer only from provided tool output
- Add tests for security and contract behavior.
Intentional limits:
- single model-requested tool call per ask run
- bounded read/token budgets
- strict formatting and protocol checks can produce "hard fails" rather than graceful-but-risky answers
Non-goals:
- broad autonomous task execution
- unrestricted filesystem exploration
- hidden-file or arbitrary-extension access by default
This runner is built around three constraints:
- Finitude: bounded resources are explicit, not hidden.
- Integrity: only typed evidence is admissible for evidence-required asks.
- Scope discipline: partial coverage must be disclosed mechanically.
Mental model:
- a small "epistemic linter" around local-file Q&A, optimized for correctness and auditability over flexibility.