
local-agent

A small, security-focused CLI wrapper around Ollama for one job:

read local file evidence -> produce a constrained answer -> log everything needed to audit the run

This project is intentionally narrow. It is not a general autonomous agent framework. It is an evidence-gated runner that tries to make hallucination and unsafe file access expensive or impossible by default.

Quick operator runbook: OPERATOR_QUICKREF.md

Table of contents

  • Operator quick reference
  • What this project is and is not
  • Why this exists
  • End-to-end architecture
  • Execution flow (chat vs ask)
  • Evidence and admissibility model
  • Security model for file reads
  • Model routing and token budget behavior
  • Output quality controls (format + footer)
  • Run logging and auditability
  • Setup and quickstart
  • CLI usage and recipes
  • Phase 2 indexing and query
  • Configuration reference
  • Error codes and troubleshooting
  • Testing and verification
  • Release zip
  • Extending safely
  • Practical limitations
  • Design philosophy

Operator quick reference

For day-to-day commands and failure triage, use OPERATOR_QUICKREF.md.

What this project is and is not

What it is:

  • A deterministic orchestrator around one model call (chat) or two model calls (ask).
  • A strict tool-call protocol plus evidence validation.
  • A sandboxed local file reader with typed failure modes.
  • A reproducible run logger (runs/<run_id>/run.json).

What it is not:

  • Not LangChain, not a planner/executor loop, not an unbounded tool agent.
  • Not a generic distributed retrieval framework; retrieval is local, deterministic, and evidence-first.
  • Not a "trust the model by default" UX.

Why this exists

Common failure patterns in LLM + tool systems are well-known:

  • The model claims it read a file that it never read.
  • Tool-call JSON is malformed or mixed with prose and silently ignored.
  • File access is too broad (path traversal, hidden files, absolute path reads).
  • Partial reads are treated as full coverage.
  • Documents contain prompt injection content and the model follows it.

local-agent turns these into explicit contracts:

  • strict tool-call parsing
  • fail-closed evidence gates
  • sandboxed file access policy
  • typed error codes
  • auditable run logs with redaction

End-to-end architecture

Core modules:

  • agent/__main__.py
    • CLI parsing (chat, ask)
    • model selection (fast/big/default)
    • ask state machine
    • evidence validation and fail-closed behavior
    • second-pass output checks and retry logic
    • run logging
  • agent/tools.py
    • ToolSpec, ToolError, TOOLS
    • read_text_file implementation
    • sandbox policy initialization and path validation
  • agent/protocol.py
    • strict + robust tool-call parsing
    • supports prefix JSON tool-call extraction and trailing text capture
  • configs/default.yaml
    • model defaults, token/time budgets, security policy
  • tests/test_tools_security.py
    • sandbox and resolution behavior regression tests
  • SECURITY.md
    • manual security verification checklist

Execution flow

chat mode

Single model call:

  1. Send user prompt.
  2. Print model response.
  3. Log sanitized raw response and metadata.

No tool use, no evidence gates.

ask mode

Two-pass control flow with one model-requested tool call:

  1. Pass 1 (tool-selection prompt):
    • Model must either:
      • answer directly, or
      • emit {"type":"tool_call","name":"...","args":{...}}
  2. Runner parses tool call:
    • strict parse first
    • prefix JSON parse fallback if the response starts with tool-call JSON and contains trailing text (see the parsing sketch below)
  3. If a tool call is emitted:
    • execute tool
    • validate evidence
    • optional auto re-read for full-evidence questions when first read was truncated
  4. Pass 2 (answer-only prompt):
    • tools forbidden
    • output quality checks enforced
  5. If formatting violations:
    • one retry with stricter prompt
    • if still invalid -> typed failure

Important:

  • If question semantics require file evidence and admissible evidence is not acquired, the runner returns a typed failure and does not ask the model to guess.
  • The runner may perform one additional read_text_file call itself for full-evidence rereads. This is not a model-requested second tool choice; it is runner-side evidence completion logic.
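
The pass-1 parsing rule (strict envelope first, then prefix-JSON extraction with trailing text capture) can be pictured with a short sketch. This is illustrative only; the real logic lives in agent/protocol.py and may differ:

import json

def parse_tool_call(text: str):
    # Strict pass: the entire response must be one tool-call envelope.
    text = text.strip()
    try:
        obj = json.loads(text)
        if isinstance(obj, dict) and obj.get("type") == "tool_call":
            return obj, ""
    except json.JSONDecodeError:
        pass
    # Prefix fallback: decode a leading JSON object, keep trailing prose.
    try:
        obj, end = json.JSONDecoder().raw_decode(text)
    except json.JSONDecodeError:
        return None, text
    if isinstance(obj, dict) and obj.get("type") == "tool_call":
        return obj, text[end:].strip()
    return None, text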

Evidence and admissibility model

read_text_file evidence contract:

  • path (absolute)
  • sha256 (hash of full text)
  • chars_full (full length)
  • chars_returned (returned text length)
  • truncated (bool)
  • text (possibly truncated content)
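
For concreteness, an admissible full-read payload might look like this (sha256 and char counts reuse the scope-footer example later in this README; path and text are placeholders):

{
  "path": "<security_root>/allowed/corpus/note.md",
  "sha256": "14e424b8f1f06f8c2e2f43867f52f37f6ffb95f8434f743f2a94f367a7d2c999",
  "chars_full": 5159,
  "chars_returned": 5159,
  "truncated": false,
  "text": "<full or truncated file content>"
}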

Evidence is rejected when:

  • required fields are missing
  • field types are wrong
  • char counts are inconsistent
  • file is empty for summary-style tasks
  • tool returned error

If evidence is invalid/missing when required:

  • run fails closed
  • returns typed JSON failure
  • no second-pass "best effort" answer
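
A minimal sketch of the admissibility gate (the real validator in agent/__main__.py returns typed error codes rather than a bool; field names follow the contract above):

def evidence_admissible(ev: dict, require_nonempty: bool = True) -> bool:
    required = {"path": str, "sha256": str, "chars_full": int,
                "chars_returned": int, "truncated": bool, "text": str}
    for field, ftype in required.items():
        if field not in ev or not isinstance(ev[field], ftype):
            return False  # missing field or wrong type
    if len(ev["text"]) != ev["chars_returned"]:
        return False  # returned text disagrees with its declared length
    if ev["chars_returned"] > ev["chars_full"]:
        return False  # inconsistent char counts
    if ev["truncated"] != (ev["chars_returned"] < ev["chars_full"]):
        return False  # truncation flag disagrees with the counts
    if require_nonempty and ev["chars_full"] == 0:
        return False  # empty file for a summary-style task
    return True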

Security model for read_text_file

Security policy is configured at startup from configs/default.yaml.

Controls:

  • allowlisted roots (allowed_roots)
  • allowlisted extensions (allowed_exts)
  • deny absolute/anchored paths (deny_absolute_paths)
  • deny hidden path segments (deny_hidden_paths)
  • optional emergency bypass (allow_any_path, default false)
  • root validation behavior (auto_create_allowed_roots, roots_must_be_within_security_root)

Path request styles:

  1. Bare filename (no slash/backslash)
  • Example: note.md
  • Searched across allowlisted roots in order
  • Exactly one match -> allowed
  • Multiple matches -> AMBIGUOUS_PATH
  • None -> FILE_NOT_FOUND (if search path was valid but file missing)
  2. Explicit subpath (contains slash/backslash)
  • Example: allowed/corpus/project/note.md
  • Treated as security_root-relative (same anchor as workroot when configured)
  • Must still fall within an allowlisted root
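
A sketch of the bare-filename rule (illustrative; the real implementation in agent/tools.py raises ToolError with the typed codes above, for which ValueError and FileNotFoundError stand in here):

from pathlib import Path

def resolve_bare_filename(name: str, allowed_roots: list[Path]) -> Path:
    # Search allowlisted roots in configured order.
    matches = [root / name for root in allowed_roots if (root / name).is_file()]
    if len(matches) > 1:
        raise ValueError("AMBIGUOUS_PATH")        # duplicate bare filename
    if not matches:
        raise FileNotFoundError("FILE_NOT_FOUND") # no match under any root
    return matches[0]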

Additional protections:

  • lexical containment checks before existence checks
  • strict resolve checks for existing paths (symlink escape defense)
  • allowlisted roots are validated after resolve(strict=True) when containment is enabled
  • extension and hidden-path checks before content read
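
The symlink defense can be sketched as follows (illustrative; the real policy in agent/tools.py also layers extension, hidden-segment, and absolute-path checks):

from pathlib import Path

def contained_after_resolve(candidate: Path, allowed_roots: list[Path]) -> bool:
    # Lexical containment first: reject traversal before touching the filesystem.
    if ".." in candidate.parts:
        return False
    try:
        # Strict resolve fails for nonexistent paths and follows symlinks,
        # so a link pointing outside an allowlisted root is caught here.
        real = candidate.resolve(strict=True)
        return any(real.is_relative_to(root.resolve(strict=True))
                   for root in allowed_roots)
    except OSError:
        return False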

Model routing and budget behavior

Model selection supports default and split-model operation (see the routing sketch at the end of this section):

  • Legacy/default: only model configured -> both passes use model
  • Split mode:
    • pass 1 defaults to model_fast when prefer_fast is true
    • pass 2 may upgrade to model_big when question matches big_triggers

CLI overrides:

  • --fast: force fast model for both passes
  • --big: force big model for answer pass
  • --full: force full evidence read attempt when tool used

Budget controls:

  • max_tokens and timeout_s base values
  • max_tokens_big_second and timeout_s_big_second for large answer pass
  • max_chars_full_read cap for runner-side rereads
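
Putting the routing rules together, a sketch of the selection logic (key names mirror the configuration reference; the lowercase-substring trigger match shown here is an assumption):

def pick_models(cfg: dict, question: str,
                force_fast: bool = False, force_big: bool = False):
    default = cfg["model"]
    fast = cfg.get("model_fast", default)
    big = cfg.get("model_big", default)
    if force_fast:
        return fast, fast  # --fast: fast model for both passes
    first = fast if cfg.get("prefer_fast") else default
    triggers = cfg.get("big_triggers", [])
    wants_big = force_big or any(t in question.lower() for t in triggers)
    return first, (big if wants_big else first)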

Output quality controls

Pass 2 includes explicit constraints:

  • no tool calls
  • no tool-call JSON envelopes
  • no markdown tables (bullet/paragraph style preferred)
  • no claims beyond provided evidence
  • include canonical evidence scope footer

Validation checks:

  • table detector heuristic
  • tool-call detector on pass 2 output
  • exact scope-footer last-line check

Retry behavior:

  • one retry for format violations
  • fast-path optimization: if only missing scope footer, append locally and skip retry
  • if retry still violates format -> SECOND_PASS_FORMAT_VIOLATION

Scope footer format:

Scope: full evidence from read_text_file (5159/5159), sha256=14e424b8f1f06f8c2e2f43867f52f37f6ffb95f8434f743f2a94f367a7d2c999

Run logging and auditability

Each invocation writes:

runs/<run_id>/run.json

Key logged fields:

  • run metadata (mode, question/prompt, timings)
  • model selection (raw_first_model, raw_second_model)
  • raw model responses with message.thinking stripped
  • tool trace
  • evidence status (required, status, truncation, char counts)
  • retry metadata (if used)
  • final assistant text

Redaction rule:

  • for file-read results, logs keep metadata + text_preview only (first 800 chars)
  • full file text is not logged by default
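
An illustrative redacted tool-result entry (key names follow the evidence contract; the exact log layout may differ):

{
  "name": "read_text_file",
  "path": "<security_root>/allowed/corpus/note.md",
  "sha256": "14e424b8f1f06f8c2e2f43867f52f37f6ffb95f8434f743f2a94f367a7d2c999",
  "chars_full": 5159,
  "chars_returned": 5159,
  "truncated": false,
  "text_preview": "<first 800 chars of the file>"
}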

Setup and quickstart

Requirements:

  • Python 3.11+
  • Ollama reachable from the runtime environment (default http://127.0.0.1:11434)
  • repo config available at configs/default.yaml (always used; see Config location below)
  • default dependency set in requirements.txt includes Torch + embedding stack for Phase 3 torch-first operation

Ollama host selection:

  • Effective precedence for the Ollama base URL is: --ollama-base-url, otherwise LOCAL_AGENT_OLLAMA_BASE_URL, otherwise OLLAMA_BASE_URL, otherwise repo config ollama_base_url, otherwise the built-in default http://127.0.0.1:11434.
  • LOCAL_AGENT_WORKROOT and --workroot only change the external data root. They do not change which Ollama host is used.
  • Example local default:
python -m agent doctor --json
  • Example LAN-hosted Ollama on a second PC:
export LOCAL_AGENT_OLLAMA_BASE_URL=http://192.168.1.25:11434
python -m agent doctor --json
python -m agent chat "ping"
python -m agent ask "Summarize indexed evidence."

Install (editable):

python -m venv .venv
.\.venv\Scripts\activate
pip install -e .

On Linux/macOS, use:

source .venv/bin/activate
pip install -e .

Install from requirements (torch-first default environment):

pip install -r requirements.txt

Lean install (no torch)

If you want a lean environment without Torch, install only the core dependencies explicitly instead of installing from requirements.txt, for example:

pip install requests PyYAML
pip install -e .

With phase3.embed.provider: torch, embedding will fail unless Torch + embedding dependencies are installed.

Optional devcontainer / Codespaces development

This repo now includes a minimal .devcontainer/devcontainer.json for Python 3.11 development. It is intentionally small: it installs the package in editable mode with the optional dev extra so you can run the test suite and general CLI commands, but it does not change the runtime architecture or assume Ollama is running inside the container.

Open the repository in a devcontainer or GitHub Codespace, then use:

python -m unittest discover -s tests -v
python -m agent doctor --no-ollama

The devcontainer installs:

python -m pip install -e ".[dev]"

If you also want the optional Torch embedding stack in the container, install it explicitly:

python -m pip install -e ".[dev,torch-embed]"

Ollama remains external. From a Codespace or any other remote Linux dev environment, point the CLI at a reachable Ollama host with:

export LOCAL_AGENT_OLLAMA_BASE_URL=http://<reachable-host>:11434

Use this for a remote Linux box, a forwarded tunnel, or another host that exposes the Ollama API. Do not assume http://127.0.0.1:11434 inside the devcontainer unless you have explicitly arranged that network path yourself.

Workroot also remains external to the repo. Provide it explicitly instead of storing live data under the checkout:

mkdir -p /workspaces/local-agent-workroot/allowed/corpus
mkdir -p /workspaces/local-agent-workroot/allowed/scratch
mkdir -p /workspaces/local-agent-workroot/runs
export LOCAL_AGENT_WORKROOT=/workspaces/local-agent-workroot

Depending on your environment, that workroot may be a mounted volume, a copied dataset, or a separately provisioned directory. The repo still expects config in configs/default.yaml and data in the external workroot.

Config location (important):

  • Runtime always loads config from the repo file: local-agent/configs/default.yaml.
  • Launch directory does not change which config file is selected.
  • Root semantics: config_root comes from the loaded config path, package_root from installed code location, optional workroot comes from --workroot / LOCAL_AGENT_WORKROOT / config workroot, and security_root is the path anchor used for tool security and run logs.
  • Effective Ollama endpoint precedence is --ollama-base-url, then LOCAL_AGENT_OLLAMA_BASE_URL, then OLLAMA_BASE_URL, then config ollama_base_url, then the built-in default http://127.0.0.1:11434.

Optional devcontainer:

  • .devcontainer/devcontainer.json is intentionally minimal.
  • It mounts and exports only LOCAL_AGENT_WORKROOT; it does not automatically configure an Ollama host.
  • If your environment exposes the host machine at host.docker.internal, set LOCAL_AGENT_OLLAMA_BASE_URL=http://host.docker.internal:11434 yourself.

Split repo/workroot setup (no workroot config required):

  • Keep your single live config in repo: local-agent/configs/default.yaml.
  • The shipped config sets security.allowed_roots to allowed/ and runs/, which are resolved relative to security_root (and typically share the same directory as workroot).
  • With the default layout in this repo, set workroot (and thus security_root) to the sibling data root ../local-agent-workroot/, so the effective allowed roots become:
    • ../local-agent-workroot/allowed/
    • ../local-agent-workroot/runs/
  • Phase 2 source roots stay under that external workroot:
    • allowed/corpus/
    • allowed/scratch/
  • Keep security.roots_must_be_within_security_root: true and ensure workroot points at the desired data root (default in this repo: ../local-agent-workroot/). Ensure allowlisted dirs exist (or keep auto_create_allowed_roots: true):
allowed/
runs/

Smoke test:

.venv\Scripts\python -m agent chat "ping"
.venv\Scripts\python -m agent ask "Read allowed/corpus/secret.md and summarize it."
local-agent ask "Read allowed/corpus/secret.md and summarize it."
local-agent --workroot ../local-agent-workroot ask "Read allowed/corpus/secret.md and summarize it."
local-agent --ollama-base-url http://127.0.0.1:11434 doctor

Remote/devcontainer note:

  • Remote Ollama changes the trust boundary from loopback-only to “whoever serves that URL”. Use only endpoints you control on localhost, a private LAN, or a private dev environment.
  • --ollama-base-url / LOCAL_AGENT_OLLAMA_BASE_URL accept only scheme://host[:port]. Paths, query strings, fragments, and embedded credentials are rejected.
  • Do not expose Ollama (11434) publicly from Codespaces/devcontainers. Keep forwarding opt-in and private; use python -m agent doctor --no-ollama when you only need offline checks.
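
The URL shape rule can be sketched as a small validator (illustrative; the real check may differ in details):

from urllib.parse import urlsplit

def validate_base_url(value: str) -> str:
    parts = urlsplit(value.rstrip("/"))  # trailing slash is trimmed
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError(f"invalid Ollama base URL: {value!r}")
    if parts.path or parts.query or parts.fragment or parts.username or parts.password:
        raise ValueError("only scheme://host[:port] is accepted")
    return f"{parts.scheme}://{parts.netloc}"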

CLI usage and recipes

Basic:

python -m agent chat "<prompt>"
python -m agent ask "<question>"
python -m agent doctor
python -m agent doctor --no-ollama
python -m agent --ollama-base-url http://host.docker.internal:11434 doctor
local-agent chat "<prompt>"
local-agent ask "<question>"
local-agent doctor
local-agent --workroot ../local-agent-workroot ask "<question>"

ask flags:

  • --big
  • --fast
  • --full

Common patterns:

  1. Summarize a file in allowed/corpus/:
python -m agent ask "Read allowed/corpus/test1a.md and summarize it in 5 bullets."
  2. Disambiguate duplicate bare filenames with an explicit subpath:
python -m agent ask "Read allowed/corpus/test1a.md and summarize it."
  3. Request high-depth synthesis:
python -m agent ask --big "Read allowed/corpus/test1a.md and give a thorough synthesis."

Phase 2 indexing and query

Phase 2 introduces retrieval-ready markdown indexing with a "two sources, one index" model:

  • sources are document categories (for example corpus and scratch)
  • index is one unified SQLite DB containing documents, chunks, provenance, and typedness metadata

Important behavior:

  • ask is now grounded by retrieval evidence (lexical + vector)
  • no vault note YAML is modified
  • typed/untyped classification is stored in index metadata, not in note frontmatter
  • missing metadata is explicit:
    • metadata=absent when frontmatter is missing
    • metadata=unknown when frontmatter exists but parse/typedness is indeterminate

Commands:

local-agent index
local-agent index --rebuild
local-agent query "coherence" --limit 5
local-agent embed --json
local-agent memory list --json
local-agent doctor
local-agent doctor --no-ollama
local-agent doctor --require-phase3 --json

Phase 3 adds embeddings, retrieval fusion, and durable memory stores with explicit provenance invariants.

Torch-first embedding setup (offline)

Phase 3 now defaults to phase3.embed.provider: torch.

Install optional embedding dependencies:

pip install -e ".[torch-embed]"

No silent downloads are allowed during local-agent embed. You must either:

  • set phase3.embed.torch.local_model_path to a local model directory, or
  • pre-populate local cache and set phase3.embed.torch.cache_dir.

If model files are unavailable locally, embed fails closed with PHASE3_EMBED_ERROR.
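
A sketch of the two offline options as configs/default.yaml keys (key names match the configuration reference below; paths are illustrative):

phase3:
  embed:
    provider: torch
    torch:
      # Option 1: point at a local model directory (no downloads).
      local_model_path: /models/my-embedding-model
      # Option 2: pre-populate a local cache and point at it instead.
      # cache_dir: /models/hf-cache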

Phase 3 command reference

Embed corpus chunks from phase2 index:

local-agent embed [--model <id>] [--rebuild] [--batch-size N] [--limit N] [--dry-run] [--no-prune] [--json]

By default, local-agent embed prunes orphan embeddings (rows not present in current phase2 chunk keys). To disable pruning for a run, use local-agent embed --no-prune.

Doctor phase3 readiness (strict mode):

local-agent doctor --require-phase3 --json

Durable memory commands:

local-agent memory add --type preference --source manual --content "..."
local-agent memory list --json
local-agent memory delete <memory_id>
local-agent memory export memory/export.json

Citation hygiene option:

  • phase3.ask.citation_validation.require_in_snapshot: true enforces that cited chunk keys must come from the retrieved evidence snapshot used for that run.
  • Recommended for fail-closed behavior: combine with phase3.ask.citation_validation.strict: true.
  • phase3.ask.citation_validation.heading_match controls heading comparison (exact|prefix|ignore); default prefix avoids brittle failures when citations reference a parent heading.
  • phase3.ask.citation_validation.normalize_heading: true normalizes whitespace and trailing punctuation (so that, for example, H1: Freeform Journaling: and H1: Freeform Journaling compare equal).
  • phase3.ask.evidence.top_n controls the snapshot/prompt evidence bandwidth (default 8).
  • If strict snapshot checks are too tight, raise top_n modestly (for example 8 -> 12 or 16); tradeoff is larger prompt and larger evidence logging payload before caps.
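
A fail-closed configuration sketch consistent with the recommendations above (nesting is illustrative; key names match the configuration reference below):

phase3:
  ask:
    evidence:
      top_n: 8
    citation_validation:
      enabled: true
      strict: true
      require_in_snapshot: true
      heading_match: prefix
      normalize_heading: true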

Configuration reference

Top-level:

  • model, model_fast, model_big
  • prefer_fast
  • big_triggers
  • max_tokens, max_tokens_big_second
  • timeout_s, timeout_s_big_second
  • read_full_on_thorough
  • max_chars_full_read
  • full_evidence_triggers
  • temperature
  • ollama_base_url
  • runtime overrides: --ollama-base-url, then LOCAL_AGENT_OLLAMA_BASE_URL, then OLLAMA_BASE_URL
  • phase2 (index_db_path, sources, chunking.max_chars, chunking.overlap)
  • phase3
    • embeddings_db_path
    • embed
      • provider (torch default, ollama optional)
      • model_id
      • preprocess, chunk_preprocess_sig, query_preprocess_sig
      • batch_size
      • torch.local_model_path
      • torch.cache_dir
      • torch.device, torch.dtype
      • torch.batch_size, torch.max_length
      • torch.pooling, torch.normalize
      • torch.trust_remote_code, torch.offline_only
    • retrieve (lexical_k, vector_k, vector_fetch_k, rel_path_prefix, fusion)
    • ask.evidence (top_n)
    • ask.citation_validation (enabled, strict, require_in_snapshot, heading_match, normalize_heading)
    • runs (log_evidence_excerpts, max_total_evidence_chars, max_excerpt_chars)
    • memory (durable_db_path, enabled)

Security (security:):

  • allowed_roots
  • allowed_exts
  • deny_absolute_paths
  • deny_hidden_paths
  • allow_any_path
  • auto_create_allowed_roots
  • roots_must_be_within_security_root

Current defaults in this repo are intentionally conservative:

  • only .md, .txt, .json reads
  • allowlisted roots limited to allowed/ and runs/ under the active security_root (derived from the configured workroot)
  • phase2 source roots default to allowed/corpus/ and allowed/scratch/ under that security_root
  • absolute/hidden path denial enabled
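
Put together, a security block matching those defaults looks roughly like this (configs/default.yaml is authoritative):

security:
  allowed_roots:
    - allowed
    - runs
  allowed_exts:
    - .md
    - .txt
    - .json
  deny_absolute_paths: true
  deny_hidden_paths: true
  allow_any_path: false
  auto_create_allowed_roots: true
  roots_must_be_within_security_root: true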

Ollama host selection (local vs remote)

  • Precedence: --ollama-base-url flag > LOCAL_AGENT_OLLAMA_BASE_URL env > OLLAMA_BASE_URL env > config ollama_base_url > built-in default http://127.0.0.1:11434.
  • Values must include http:// or https:// and a host (optionally :port); trailing slash is trimmed and invalid values fail fast.
  • Local default: http://127.0.0.1:11434.
  • Remote/LAN: set LOCAL_AGENT_OLLAMA_BASE_URL=http://<lan-host>:11434 (or use --ollama-base-url); the same resolved host is then used by Ollama-backed doctor/embed, ask/chat, and retrieval smoke tests. Offline doctor (--no-ollama) and torch-backed embed do not validate unrelated Ollama URL settings at startup.
  • Devcontainer/Codespaces: the container does not run Ollama; point LOCAL_AGENT_OLLAMA_BASE_URL at a host you control on the LAN/VPN. Do not expose Ollama to the public internet; keep it firewalled.

Optional devcontainer / Codespaces

  • A minimal .devcontainer/devcontainer.json is provided for Python 3.11. It mounts a persistent volume at /workspaces/local-agent-workroot and exports LOCAL_AGENT_WORKROOT there (workroot stays outside the repo).
  • postCreateCommand installs the project in editable mode with dev extras (pip install -e ".[dev]") and creates the expected workroot subdirectories.
  • Codespaces/devcontainer sessions should point at a remote/LAN Ollama host via --ollama-base-url, LOCAL_AGENT_OLLAMA_BASE_URL, or OLLAMA_BASE_URL; the devcontainer does not configure an Ollama host by default.

Error codes and troubleshooting

Typed failure format:

{"ok": false, "error_code": "...", "error_message": "..."}

Frequent codes and first checks:

  • CONFIG_ERROR
    • verify security.allowed_roots resolve to valid directories
  • PATH_DENIED
    • check extension allowlist, hidden segments, traversal/absolute path use
  • FILE_NOT_FOUND
    • file not found under allowlisted roots
  • AMBIGUOUS_PATH
    • duplicate bare filename; use explicit subpath
  • EVIDENCE_NOT_ACQUIRED
    • model did not produce admissible tool call when evidence required
  • FILE_EMPTY
    • source file empty for summarize request
  • EVIDENCE_TRUNCATED
    • full evidence required but read remained partial
  • UNEXPECTED_TOOL_CALL_SECOND_PASS
    • model violated answer-only phase
  • SECOND_PASS_FORMAT_VIOLATION
    • output still violated format after one retry
  • DOCTOR_INDEX_DB_MISSING
    • preflight found no index DB at configured phase2.index_db_path
    • run python -m agent index --rebuild --json
  • DOCTOR_CHUNKER_SIG_MISMATCH
    • preflight found stale chunking fingerprint vs configured phase2 chunking
    • run python -m agent index --scheme obsidian_v1 --rebuild --json (or your configured scheme)
  • DOCTOR_EMBED_OUTDATED_REQUIRE_PHASE3
    • preflight found embedding rows that do not match current phase3 model/preprocess/chunk hashes
    • run python -m agent embed --json (or --rebuild --json)
  • DOCTOR_EMBED_RUNTIME_FINGERPRINT_MISMATCH
    • embedding provider/runtime fingerprint changed since embeddings were written
    • run python -m agent embed --rebuild --json
  • DOCTOR_PHASE3_EMBEDDINGS_DB_MISSING
    • phase3-required preflight found no embeddings DB
    • run python -m agent embed --json
  • DOCTOR_MEMORY_DANGLING_EVIDENCE
    • durable memory references chunk keys that are no longer present in phase2 index
    • delete or repair dangling memory records
  • DOCTOR_PHASE3_RETRIEVAL_NOT_READY
    • embeddings metadata looked valid but retrieval readiness smoke test failed
    • verify embed provider runtime availability, then run python -m agent embed --rebuild --json and re-run doctor

Debug tip:

  • open latest runs/<run_id>/run.json
  • inspect resolved_config_path, config_root, package_root, workroot, and security_root first
  • inspect tool_trace, evidence_status, raw_first, raw_second, and retry fields
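
A quick way to pull the gate fields from a run log (assumes the top-level field names listed above):

python -c "import json,sys;r=json.load(open(sys.argv[1]));print(r.get('evidence_status'));print(r.get('tool_trace'))" runs/<run_id>/run.json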

Testing and verification

Run unit tests:

python -m unittest discover -s tests -v

Coverage includes:

  • allowlisted read success
  • explicit subpath success
  • explicit subpath security_root anchoring (independent of process CWD)
  • ambiguous bare filename rejection
  • extension and hidden path denial (including .env)
  • traversal/absolute path denial
  • security_root top-level file rejection when not allowlisted
  • fail-closed misconfiguration behavior
  • symlink escape denial (POSIX test)

Manual security checklist:

  • see SECURITY.md

Doctor tip:

  • use python -m agent doctor --no-ollama to skip only Ollama network checks.
  • with phase3.embed.provider: torch, the retrieval smoke test still runs under --no-ollama.

Release zip

Create a clean, shareable zip (without .venv/, .git/, caches, or run logs):

python scripts/make_release_zip.py
python scripts/make_release_zip.py --dry-run
python scripts/make_release_zip.py --include-workroot

--include-workroot adds only a curated subset (local-agent-workroot top-level boot/docs files plus allowed/.gitkeep and allowed/sample/** when present), and always excludes local-agent-workroot/runs/**.

Optional local cleanup helper:

python scripts/clean_artifacts.py --dry-run
python scripts/clean_artifacts.py

Extending safely

If you add tools:

  1. Add new ToolSpec in agent/tools.py.
  2. Decide if output is admissible evidence.
  3. If admissible, add explicit validator in runner logic.
  4. Keep pass boundaries strict:
    • pass 1: tool decision
    • pass 2: answer only from provided tool output
  5. Add tests for security and contract behavior.
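
A hypothetical shape for step 1 (the real ToolSpec and TOOLS definitions live in agent/tools.py and may differ; stat_file is an invented example):

from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class ToolSpec:
    name: str
    run: Callable[[dict], dict]
    admissible_evidence: bool  # step 2: decide admissibility up front

def stat_file(args: dict) -> dict:
    # Example new tool body: validate args against the sandbox policy
    # first, then return a typed result or raise a typed error.
    raise NotImplementedError

TOOLS: dict[str, ToolSpec] = {
    "stat_file": ToolSpec("stat_file", stat_file, admissible_evidence=False),
}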

Practical limitations

Intentional limits:

  • single model-requested tool call per ask run
  • bounded read/token budgets
  • strict formatting and protocol checks can produce "hard fails" rather than graceful-but-risky answers

Non-goals:

  • broad autonomous task execution
  • unrestricted filesystem exploration
  • hidden-file or arbitrary-extension access by default

Design philosophy

This runner is built around three constraints:

  • Finitude: bounded resources are explicit, not hidden.
  • Integrity: only typed evidence is admissible for evidence-required asks.
  • Scope discipline: partial coverage must be disclosed mechanically.

Mental model:

  • a small "epistemic linter" around local-file Q&A, optimized for correctness and auditability over flexibility.
