
local-agent

A small, security-focused CLI wrapper around Ollama for one job:

read local file evidence -> produce a constrained answer -> log everything needed to audit the run

This project is intentionally narrow. It is not a general autonomous agent framework. It is an evidence-gated runner that tries to make hallucination and unsafe file access expensive or impossible by default.

Quick operator runbook: OPERATOR_QUICKREF.md

Table of contents

  • Operator quick reference
  • What this project is and is not
  • Why this exists
  • End-to-end architecture
  • Execution flow (chat vs ask)
  • Evidence and admissibility model
  • Security model for file reads
  • Model routing and token budget behavior
  • Output quality controls (format + footer)
  • Run logging and auditability
  • Setup and quickstart
  • CLI usage and recipes
  • Phase 2 indexing and query
  • Configuration reference
  • Error codes and troubleshooting
  • Testing and verification
  • Release zip
  • Extending safely
  • Practical limitations
  • Design philosophy

Operator quick reference

For day-to-day commands and failure triage, use OPERATOR_QUICKREF.md.

What this project is and is not

What it is:

  • A deterministic orchestrator around one model call (chat) or two model calls (ask).
  • A strict tool-call protocol plus evidence validation.
  • A sandboxed local file reader with typed failure modes.
  • A reproducible run logger (runs/<run_id>/run.json).

What it is not:

  • Not LangChain, not a planner/executor loop, not an unbounded tool agent.
  • Not a generic distributed retrieval framework; retrieval is local, deterministic, and evidence-first.
  • Not a "trust the model by default" UX.

Why this exists

Common failure patterns in LLM + tool systems are well-known:

  • The model claims it read a file that it never read.
  • Tool-call JSON is malformed or mixed with prose and silently ignored.
  • File access is too broad (path traversal, hidden files, absolute path reads).
  • Partial reads are treated as full coverage.
  • Documents contain prompt injection content and the model follows it.

local-agent turns these into explicit contracts:

  • strict tool-call parsing
  • fail-closed evidence gates
  • sandboxed file access policy
  • typed error codes
  • auditable run logs with redaction

End-to-end architecture

Core modules:

  • agent/__main__.py
    • CLI parsing (chat, ask)
    • model selection (fast/big/default)
    • ask state machine
    • evidence validation and fail-closed behavior
    • second-pass output checks and retry logic
    • run logging
  • agent/tools.py
    • ToolSpec, ToolError, TOOLS
    • read_text_file implementation
    • sandbox policy initialization and path validation
  • agent/protocol.py
    • strict + robust tool-call parsing
    • supports prefix JSON tool-call extraction and trailing text capture
  • configs/default.yaml
    • model defaults, token/time budgets, security policy
  • tests/test_tools_security.py
    • sandbox and resolution behavior regression tests
  • SECURITY.md
    • manual security verification checklist

Execution flow

chat mode

Single model call:

  1. Send user prompt.
  2. Print model response.
  3. Log sanitized raw response and metadata.

No tool use, no evidence gates.

ask mode

Two-pass control flow with one model-requested tool call:

  1. Pass 1 (tool-selection prompt):
    • Model must either:
      • answer directly, or
      • emit {"type":"tool_call","name":"...","args":{...}}
  2. Runner parses tool call:
    • strict parse first
    • prefix JSON parse fallback if the response starts with tool-call JSON and contains trailing text (see the parsing sketch below)
  3. If a tool call is emitted:
    • execute tool
    • validate evidence
    • optional auto re-read for full-evidence questions when first read was truncated
  4. Pass 2 (answer-only prompt):
    • tools forbidden
    • output quality checks enforced
  5. If formatting violations:
    • one retry with stricter prompt
    • if still invalid -> typed failure

Important:

  • If question semantics require file evidence and admissible evidence is not acquired, the runner returns a typed failure and does not ask the model to guess.
  • The runner may perform one additional read_text_file call itself for full-evidence rereads. This is not a model-requested second tool choice; it is runner-side evidence completion logic.
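
The pass-1 parsing rule (strict envelope first, then prefix-JSON extraction with trailing text capture) can be pictured with a short sketch. This is illustrative only; the real logic lives in agent/protocol.py and may differ:

import json

def parse_tool_call(text: str):
    # Strict pass: the entire response must be one tool-call envelope.
    text = text.strip()
    try:
        obj = json.loads(text)
        if isinstance(obj, dict) and obj.get("type") == "tool_call":
            return obj, ""
    except json.JSONDecodeError:
        pass
    # Prefix fallback: decode a leading JSON object, keep trailing prose.
    try:
        obj, end = json.JSONDecoder().raw_decode(text)
    except json.JSONDecodeError:
        return None, text
    if isinstance(obj, dict) and obj.get("type") == "tool_call":
        return obj, text[end:].strip()
    return None, text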

Evidence and admissibility model

read_text_file evidence contract:

  • path (absolute)
  • sha256 (hash of full text)
  • chars_full (full length)
  • chars_returned (returned text length)
  • truncated (bool)
  • text (possibly truncated content)
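
For concreteness, an admissible full-read payload might look like this (sha256 and char counts reuse the scope-footer example later in this README; path and text are placeholders):

{
  "path": "<security_root>/allowed/corpus/note.md",
  "sha256": "14e424b8f1f06f8c2e2f43867f52f37f6ffb95f8434f743f2a94f367a7d2c999",
  "chars_full": 5159,
  "chars_returned": 5159,
  "truncated": false,
  "text": "<full or truncated file content>"
}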

Evidence is rejected when:

  • required fields are missing
  • field types are wrong
  • char counts are inconsistent
  • file is empty for summary-style tasks
  • tool returned error

If evidence is invalid/missing when required:

  • run fails closed
  • returns typed JSON failure
  • no second-pass "best effort" answer
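
A minimal sketch of the admissibility gate (the real validator in agent/__main__.py returns typed error codes rather than a bool; field names follow the contract above):

def evidence_admissible(ev: dict, require_nonempty: bool = True) -> bool:
    required = {"path": str, "sha256": str, "chars_full": int,
                "chars_returned": int, "truncated": bool, "text": str}
    for field, ftype in required.items():
        if field not in ev or not isinstance(ev[field], ftype):
            return False  # missing field or wrong type
    if len(ev["text"]) != ev["chars_returned"]:
        return False  # returned text disagrees with its declared length
    if ev["chars_returned"] > ev["chars_full"]:
        return False  # inconsistent char counts
    if ev["truncated"] != (ev["chars_returned"] < ev["chars_full"]):
        return False  # truncation flag disagrees with the counts
    if require_nonempty and ev["chars_full"] == 0:
        return False  # empty file for a summary-style task
    return True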

Security model for read_text_file

Security policy is configured at startup from configs/default.yaml.

Controls:

  • allowlisted roots (allowed_roots)
  • allowlisted extensions (allowed_exts)
  • deny absolute/anchored paths (deny_absolute_paths)
  • deny hidden path segments (deny_hidden_paths)
  • optional emergency bypass (allow_any_path, default false)
  • root validation behavior (auto_create_allowed_roots, roots_must_be_within_security_root)

Path request styles:

  1. Bare filename (no slash/backslash)
  • Example: note.md
  • Searched across allowlisted roots in order
  • Exactly one match -> allowed
  • Multiple matches -> AMBIGUOUS_PATH
  • None -> FILE_NOT_FOUND (if search path was valid but file missing)
  2. Explicit subpath (contains slash/backslash)
  • Example: allowed/corpus/project/note.md
  • Treated as security_root-relative (same anchor as workroot when configured)
  • Must still fall within an allowlisted root
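
A sketch of the bare-filename rule (illustrative; the real implementation in agent/tools.py raises ToolError with the typed codes above, for which ValueError and FileNotFoundError stand in here):

from pathlib import Path

def resolve_bare_filename(name: str, allowed_roots: list[Path]) -> Path:
    # Search allowlisted roots in configured order.
    matches = [root / name for root in allowed_roots if (root / name).is_file()]
    if len(matches) > 1:
        raise ValueError("AMBIGUOUS_PATH")        # duplicate bare filename
    if not matches:
        raise FileNotFoundError("FILE_NOT_FOUND") # no match under any root
    return matches[0]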

Additional protections:

  • lexical containment checks before existence checks
  • strict resolve checks for existing paths (symlink escape defense)
  • allowlisted roots are validated after resolve(strict=True) when containment is enabled
  • extension and hidden-path checks before content read
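
The symlink defense can be sketched as follows (illustrative; the real policy in agent/tools.py also layers extension, hidden-segment, and absolute-path checks):

from pathlib import Path

def contained_after_resolve(candidate: Path, allowed_roots: list[Path]) -> bool:
    # Lexical containment first: reject traversal before touching the filesystem.
    if ".." in candidate.parts:
        return False
    try:
        # Strict resolve fails for nonexistent paths and follows symlinks,
        # so a link pointing outside an allowlisted root is caught here.
        real = candidate.resolve(strict=True)
        return any(real.is_relative_to(root.resolve(strict=True))
                   for root in allowed_roots)
    except OSError:
        return False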

Model routing and budget behavior

Model selection supports default and split-model operation (see the routing sketch at the end of this section):

  • Legacy/default: only model configured -> both passes use model
  • Split mode:
    • pass 1 defaults to model_fast when prefer_fast is true
    • pass 2 may upgrade to model_big when question matches big_triggers

CLI overrides:

  • --fast: force fast model for both passes
  • --big: force big model for answer pass
  • --full: force full evidence read attempt when tool used

Budget controls:

  • max_tokens and timeout_s base values
  • max_tokens_big_second and timeout_s_big_second for large answer pass
  • max_chars_full_read cap for runner-side rereads
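
Putting the routing rules together, a sketch of the selection logic (key names mirror the configuration reference; the lowercase-substring trigger match shown here is an assumption):

def pick_models(cfg: dict, question: str,
                force_fast: bool = False, force_big: bool = False):
    default = cfg["model"]
    fast = cfg.get("model_fast", default)
    big = cfg.get("model_big", default)
    if force_fast:
        return fast, fast  # --fast: fast model for both passes
    first = fast if cfg.get("prefer_fast") else default
    triggers = cfg.get("big_triggers", [])
    wants_big = force_big or any(t in question.lower() for t in triggers)
    return first, (big if wants_big else first)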

Output quality controls

Pass 2 includes explicit constraints:

  • no tool calls
  • no tool-call JSON envelopes
  • no markdown tables (bullet/paragraph style preferred)
  • no claims beyond provided evidence
  • include canonical evidence scope footer

Validation checks:

  • table detector heuristic
  • tool-call detector on pass 2 output
  • exact scope-footer last-line check

Retry behavior:

  • one retry for format violations
  • fast-path optimization: if only missing scope footer, append locally and skip retry
  • if retry still violates format -> SECOND_PASS_FORMAT_VIOLATION

Scope footer format:

Scope: full evidence from read_text_file (5159/5159), sha256=14e424b8f1f06f8c2e2f43867f52f37f6ffb95f8434f743f2a94f367a7d2c999

Run logging and auditability

Each invocation writes:

runs/<run_id>/run.json

Key logged fields:

  • run metadata (mode, question/prompt, timings)
  • model selection (raw_first_model, raw_second_model)
  • raw model responses with message.thinking stripped
  • tool trace
  • evidence status (required, status, truncation, char counts)
  • retry metadata (if used)
  • final assistant text

Redaction rule:

  • for file-read results, logs keep metadata + text_preview only (first 800 chars)
  • full file text is not logged by default
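
An illustrative redacted tool-result entry (key names follow the evidence contract; the exact log layout may differ):

{
  "name": "read_text_file",
  "path": "<security_root>/allowed/corpus/note.md",
  "sha256": "14e424b8f1f06f8c2e2f43867f52f37f6ffb95f8434f743f2a94f367a7d2c999",
  "chars_full": 5159,
  "chars_returned": 5159,
  "truncated": false,
  "text_preview": "<first 800 chars of the file>"
}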

Setup and quickstart

Requirements:

  • Python 3.11+
  • Ollama reachable from the runtime environment (default http://127.0.0.1:11434)
  • repo config available at configs/default.yaml (always used; see Config location below)
  • default dependency set in requirements.txt includes Torch + embedding stack for Phase 3 torch-first operation

Ollama host selection:

  • Effective precedence for the Ollama base URL is: --ollama-base-url, otherwise LOCAL_AGENT_OLLAMA_BASE_URL, otherwise OLLAMA_BASE_URL, otherwise repo config ollama_base_url, otherwise the built-in default http://127.0.0.1:11434.
  • LOCAL_AGENT_WORKROOT and --workroot only change the external data root. They do not change which Ollama host is used.
  • Example local default:
python -m agent doctor --json
  • Example LAN-hosted Ollama on a second PC:
export LOCAL_AGENT_OLLAMA_BASE_URL=http://192.168.1.25:11434
python -m agent doctor --json
python -m agent chat "ping"
python -m agent ask "Summarize indexed evidence."

Install (editable):

python -m venv .venv
.\.venv\Scripts\activate
pip install -e .

On Linux/macOS, use:

source .venv/bin/activate
pip install -e .

Install from requirements (torch-first default environment):

pip install -r requirements.txt

Lean install (no torch)

If you want a lean environment without Torch, install only the core dependencies explicitly instead of installing from requirements.txt, for example:

pip install requests PyYAML
pip install -e .

With phase3.embed.provider: torch, embedding will fail unless Torch + embedding dependencies are installed.

Optional devcontainer / Codespaces development

This repo now includes a minimal .devcontainer/devcontainer.json for Python 3.11 development. It is intentionally small: it installs the package in editable mode with the optional dev extra so you can run the test suite and general CLI commands, but it does not change the runtime architecture or assume Ollama is running inside the container.

Open the repository in a devcontainer or GitHub Codespace, then use:

python -m unittest discover -s tests -v
python -m agent doctor --no-ollama

The devcontainer installs:

python -m pip install -e ".[dev]"

If you also want the optional Torch embedding stack in the container, install it explicitly:

python -m pip install -e ".[dev,torch-embed]"

Ollama remains external. From a Codespace or any other remote Linux dev environment, point the CLI at a reachable Ollama host with:

export LOCAL_AGENT_OLLAMA_BASE_URL=http://<reachable-host>:11434

Use this for a remote Linux box, a forwarded tunnel, or another host that exposes the Ollama API. Do not assume http://127.0.0.1:11434 inside the devcontainer unless you have explicitly arranged that network path yourself.

Workroot also remains external to the repo. Provide it explicitly instead of storing live data under the checkout:

mkdir -p /workspaces/local-agent-workroot/allowed/corpus
mkdir -p /workspaces/local-agent-workroot/allowed/scratch
mkdir -p /workspaces/local-agent-workroot/runs
export LOCAL_AGENT_WORKROOT=/workspaces/local-agent-workroot

Depending on your environment, that workroot may be a mounted volume, a copied dataset, or a separately provisioned directory. The repo still expects config in configs/default.yaml and data in the external workroot.

Config location (important):

  • Runtime always loads config from the repo file: local-agent/configs/default.yaml.
  • Launch directory does not change which config file is selected.
  • Root semantics: config_root comes from the loaded config path, package_root from installed code location, optional workroot comes from --workroot / LOCAL_AGENT_WORKROOT / config workroot, and security_root is the path anchor used for tool security and run logs.
  • Effective Ollama endpoint precedence is --ollama-base-url, then LOCAL_AGENT_OLLAMA_BASE_URL, then OLLAMA_BASE_URL, then config ollama_base_url, then the built-in default http://127.0.0.1:11434.

Optional devcontainer:

  • .devcontainer/devcontainer.json is intentionally minimal.
  • It mounts and exports only LOCAL_AGENT_WORKROOT; it does not automatically configure an Ollama host.
  • If your environment exposes the host machine at host.docker.internal, set LOCAL_AGENT_OLLAMA_BASE_URL=http://host.docker.internal:11434 yourself.

Split repo/workroot setup (no workroot config required):

  • Keep your single live config in repo: local-agent/configs/default.yaml.
  • The shipped config sets security.allowed_roots to allowed/ and runs/, which are resolved relative to security_root (and typically share the same directory as workroot).
  • With the default layout in this repo, set workroot (and thus security_root) to the sibling data root ../local-agent-workroot/, so the effective allowed roots become:
    • ../local-agent-workroot/allowed/
    • ../local-agent-workroot/runs/
  • Phase 2 source roots stay under that external workroot:
    • allowed/corpus/
    • allowed/scratch/
  • Keep security.roots_must_be_within_security_root: true and ensure workroot points at the desired data root (default in this repo: ../local-agent-workroot/). Ensure allowlisted dirs exist (or keep auto_create_allowed_roots: true):
allowed/
runs/

Smoke test:

.venv\Scripts\python -m agent chat "ping"
.venv\Scripts\python -m agent ask "Read allowed/corpus/secret.md and summarize it."
local-agent ask "Read allowed/corpus/secret.md and summarize it."
local-agent --workroot ../local-agent-workroot ask "Read allowed/corpus/secret.md and summarize it."
local-agent --ollama-base-url http://127.0.0.1:11434 doctor

Remote/devcontainer note:

  • Remote Ollama changes the trust boundary from loopback-only to “whoever serves that URL”. Use only endpoints you control on localhost, a private LAN, or a private dev environment.
  • --ollama-base-url / LOCAL_AGENT_OLLAMA_BASE_URL accept only scheme://host[:port]. Paths, query strings, fragments, and embedded credentials are rejected.
  • Do not expose Ollama (11434) publicly from Codespaces/devcontainers. Keep forwarding opt-in and private; use python -m agent doctor --no-ollama when you only need offline checks.
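
The URL shape rule can be sketched as a small validator (illustrative; the real check may differ in details):

from urllib.parse import urlsplit

def validate_base_url(value: str) -> str:
    parts = urlsplit(value.rstrip("/"))  # trailing slash is trimmed
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError(f"invalid Ollama base URL: {value!r}")
    if parts.path or parts.query or parts.fragment or parts.username or parts.password:
        raise ValueError("only scheme://host[:port] is accepted")
    return f"{parts.scheme}://{parts.netloc}"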

CLI usage and recipes

Basic:

python -m agent chat "<prompt>"
python -m agent ask "<question>"
python -m agent doctor
python -m agent doctor --no-ollama
python -m agent --ollama-base-url http://host.docker.internal:11434 doctor
local-agent chat "<prompt>"
local-agent ask "<question>"
local-agent doctor
local-agent --workroot ../local-agent-workroot ask "<question>"

ask flags:

  • --big
  • --fast
  • --full

Common patterns:

  1. Summarize a file in allowed/corpus/:
python -m agent ask "Read allowed/corpus/test1a.md and summarize it in 5 bullets."
  2. Disambiguate duplicate bare filenames with an explicit subpath:
python -m agent ask "Read allowed/corpus/test1a.md and summarize it."
  3. Request high-depth synthesis:
python -m agent ask --big "Read allowed/corpus/test1a.md and give a thorough synthesis."

Phase 2 indexing and query

Phase 2 introduces retrieval-ready markdown indexing with a "two sources, one index" model:

  • sources are document categories (for example corpus and scratch)
  • index is one unified SQLite DB containing documents, chunks, provenance, and typedness metadata

Important behavior:

  • ask is now grounded by retrieval evidence (lexical + vector)
  • no vault note YAML is modified
  • typed/untyped classification is stored in index metadata, not in note frontmatter
  • missing metadata is explicit:
    • metadata=absent when frontmatter is missing
    • metadata=unknown when frontmatter exists but parse/typedness is indeterminate

Commands:

local-agent index
local-agent index --rebuild
local-agent query "coherence" --limit 5
local-agent embed --json
local-agent memory list --json
local-agent doctor
local-agent doctor --no-ollama
local-agent doctor --require-phase3 --json

Phase 3 adds embeddings, retrieval fusion, and durable memory stores with explicit provenance invariants.

Torch-first embedding setup (offline)

Phase 3 now defaults to phase3.embed.provider: torch.

Install optional embedding dependencies:

pip install -e ".[torch-embed]"

No silent downloads are allowed during local-agent embed. You must either:

  • set phase3.embed.torch.local_model_path to a local model directory, or
  • pre-populate local cache and set phase3.embed.torch.cache_dir.

If model files are unavailable locally, embed fails closed with PHASE3_EMBED_ERROR.
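
A sketch of the two offline options as configs/default.yaml keys (key names match the configuration reference below; paths are illustrative):

phase3:
  embed:
    provider: torch
    torch:
      # Option 1: point at a local model directory (no downloads).
      local_model_path: /models/my-embedding-model
      # Option 2: pre-populate a local cache and point at it instead.
      # cache_dir: /models/hf-cache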

Phase 3 command reference

Embed corpus chunks from phase2 index:

local-agent embed [--model <id>] [--rebuild] [--batch-size N] [--limit N] [--dry-run] [--no-prune] [--json]

By default, local-agent embed prunes orphan embeddings (rows not present in current phase2 chunk keys). To disable pruning for a run, use local-agent embed --no-prune.

Doctor phase3 readiness (strict mode):

local-agent doctor --require-phase3 --json

Durable memory commands:

local-agent memory add --type preference --source manual --content "..."
local-agent memory list --json
local-agent memory delete <memory_id>
local-agent memory export memory/export.json

Citation hygiene option:

  • phase3.ask.citation_validation.require_in_snapshot: true enforces that cited chunk keys must come from the retrieved evidence snapshot used for that run.
  • Recommended for fail-closed behavior: combine with phase3.ask.citation_validation.strict: true.
  • phase3.ask.citation_validation.heading_match controls heading comparison (exact|prefix|ignore); default prefix avoids brittle failures when citations reference a parent heading.
  • phase3.ask.citation_validation.normalize_heading: true normalizes whitespace and trailing punctuation (so that, for example, H1: Freeform Journaling: and H1: Freeform Journaling compare equal).
  • phase3.ask.evidence.top_n controls the snapshot/prompt evidence bandwidth (default 8).
  • If strict snapshot checks are too tight, raise top_n modestly (for example 8 -> 12 or 16); tradeoff is larger prompt and larger evidence logging payload before caps.
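
A fail-closed configuration sketch consistent with the recommendations above (nesting is illustrative; key names match the configuration reference below):

phase3:
  ask:
    evidence:
      top_n: 8
    citation_validation:
      enabled: true
      strict: true
      require_in_snapshot: true
      heading_match: prefix
      normalize_heading: true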

Configuration reference

Top-level:

  • model, model_fast, model_big
  • prefer_fast
  • big_triggers
  • max_tokens, max_tokens_big_second
  • timeout_s, timeout_s_big_second
  • read_full_on_thorough
  • max_chars_full_read
  • full_evidence_triggers
  • temperature
  • ollama_base_url
  • runtime overrides: --ollama-base-url, then LOCAL_AGENT_OLLAMA_BASE_URL, then OLLAMA_BASE_URL
  • phase2 (index_db_path, sources, chunking.max_chars, chunking.overlap)
  • phase3
    • embeddings_db_path
    • embed
      • provider (torch default, ollama optional)
      • model_id
      • preprocess, chunk_preprocess_sig, query_preprocess_sig
      • batch_size
      • torch.local_model_path
      • torch.cache_dir
      • torch.device, torch.dtype
      • torch.batch_size, torch.max_length
      • torch.pooling, torch.normalize
      • torch.trust_remote_code, torch.offline_only
    • retrieve (lexical_k, vector_k, vector_fetch_k, rel_path_prefix, fusion)
    • ask.evidence (top_n)
    • ask.citation_validation (enabled, strict, require_in_snapshot, heading_match, normalize_heading)
    • runs (log_evidence_excerpts, max_total_evidence_chars, max_excerpt_chars)
    • memory (durable_db_path, enabled)

Security (security:):

  • allowed_roots
  • allowed_exts
  • deny_absolute_paths
  • deny_hidden_paths
  • allow_any_path
  • auto_create_allowed_roots
  • roots_must_be_within_security_root

Current defaults in this repo are intentionally conservative:

  • only .md, .txt, .json reads
  • allowlisted roots limited to allowed/ and runs/ under the active security_root (derived from the configured workroot)
  • phase2 source roots default to allowed/corpus/ and allowed/scratch/ under that security_root
  • absolute/hidden path denial enabled
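
Put together, a security block matching those defaults looks roughly like this (configs/default.yaml is authoritative):

security:
  allowed_roots:
    - allowed
    - runs
  allowed_exts:
    - .md
    - .txt
    - .json
  deny_absolute_paths: true
  deny_hidden_paths: true
  allow_any_path: false
  auto_create_allowed_roots: true
  roots_must_be_within_security_root: true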

Ollama host selection (local vs remote)

  • Precedence: --ollama-base-url flag > LOCAL_AGENT_OLLAMA_BASE_URL env > OLLAMA_BASE_URL env > config ollama_base_url > built-in default http://127.0.0.1:11434.
  • Values must include http:// or https:// and a host (optionally :port); trailing slash is trimmed and invalid values fail fast.
  • Local default: http://127.0.0.1:11434.
  • Remote/LAN: set LOCAL_AGENT_OLLAMA_BASE_URL=http://<lan-host>:11434 (or use --ollama-base-url); the same resolved host is then used by Ollama-backed doctor/embed, ask/chat, and retrieval smoke tests. Offline doctor (--no-ollama) and torch-backed embed do not validate unrelated Ollama URL settings at startup.
  • Devcontainer/Codespaces: the container does not run Ollama; point LOCAL_AGENT_OLLAMA_BASE_URL at a host you control on the LAN/VPN. Do not expose Ollama to the public internet; keep it firewalled.

Optional devcontainer / Codespaces

  • A minimal .devcontainer/devcontainer.json is provided for Python 3.11. It mounts a persistent volume at /workspaces/local-agent-workroot and exports LOCAL_AGENT_WORKROOT there (workroot stays outside the repo).
  • postCreateCommand installs the project in editable mode with dev extras (pip install -e ".[dev]") and creates the expected workroot subdirectories.
  • Codespaces/devcontainer sessions should point at a remote/LAN Ollama host via --ollama-base-url, LOCAL_AGENT_OLLAMA_BASE_URL, or OLLAMA_BASE_URL; the devcontainer does not configure an Ollama host by default.

Error codes and troubleshooting

Typed failure format:

{"ok": false, "error_code": "...", "error_message": "..."}

Frequent codes and first checks:

  • CONFIG_ERROR
    • verify security.allowed_roots resolve to valid directories
  • PATH_DENIED
    • check extension allowlist, hidden segments, traversal/absolute path use
  • FILE_NOT_FOUND
    • file not found under allowlisted roots
  • AMBIGUOUS_PATH
    • duplicate bare filename; use explicit subpath
  • EVIDENCE_NOT_ACQUIRED
    • model did not produce admissible tool call when evidence required
  • FILE_EMPTY
    • source file empty for summarize request
  • EVIDENCE_TRUNCATED
    • full evidence required but read remained partial
  • UNEXPECTED_TOOL_CALL_SECOND_PASS
    • model violated answer-only phase
  • SECOND_PASS_FORMAT_VIOLATION
    • output still violated format after one retry
  • DOCTOR_INDEX_DB_MISSING
    • preflight found no index DB at configured phase2.index_db_path
    • run python -m agent index --rebuild --json
  • DOCTOR_CHUNKER_SIG_MISMATCH
    • preflight found stale chunking fingerprint vs configured phase2 chunking
    • run python -m agent index --scheme obsidian_v1 --rebuild --json (or your configured scheme)
  • DOCTOR_EMBED_OUTDATED_REQUIRE_PHASE3
    • preflight found embedding rows that do not match current phase3 model/preprocess/chunk hashes
    • run python -m agent embed --json (or --rebuild --json)
  • DOCTOR_EMBED_RUNTIME_FINGERPRINT_MISMATCH
    • embedding provider/runtime fingerprint changed since embeddings were written
    • run python -m agent embed --rebuild --json
  • DOCTOR_PHASE3_EMBEDDINGS_DB_MISSING
    • phase3-required preflight found no embeddings DB
    • run python -m agent embed --json
  • DOCTOR_MEMORY_DANGLING_EVIDENCE
    • durable memory references chunk keys that are no longer present in phase2 index
    • delete or repair dangling memory records
  • DOCTOR_PHASE3_RETRIEVAL_NOT_READY
    • embeddings metadata looked valid but retrieval readiness smoke test failed
    • verify embed provider runtime availability, then run python -m agent embed --rebuild --json and re-run doctor

Debug tip:

  • open latest runs/<run_id>/run.json
  • inspect resolved_config_path, config_root, package_root, workroot, and security_root first
  • inspect tool_trace, evidence_status, raw_first, raw_second, and retry fields
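
A quick way to pull the gate fields from a run log (assumes the top-level field names listed above):

python -c "import json,sys;r=json.load(open(sys.argv[1]));print(r.get('evidence_status'));print(r.get('tool_trace'))" runs/<run_id>/run.json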

Testing and verification

Run unit tests:

python -m unittest discover -s tests -v

Coverage includes:

  • allowlisted read success
  • explicit subpath success
  • explicit subpath security_root anchoring (independent of process CWD)
  • ambiguous bare filename rejection
  • extension and hidden path denial (including .env)
  • traversal/absolute path denial
  • security_root top-level file rejection when not allowlisted
  • fail-closed misconfiguration behavior
  • symlink escape denial (POSIX test)

Manual security checklist:

  • see SECURITY.md

Doctor tip:

  • use python -m agent doctor --no-ollama to skip only Ollama network checks.
  • with phase3.embed.provider: torch, the retrieval smoke test still runs under --no-ollama.

Release zip

Create a clean, shareable zip (without .venv/, .git/, caches, or run logs):

python scripts/make_release_zip.py
python scripts/make_release_zip.py --dry-run
python scripts/make_release_zip.py --include-workroot

--include-workroot adds only a curated subset (local-agent-workroot top-level boot/docs files plus allowed/.gitkeep and allowed/sample/** when present), and always excludes local-agent-workroot/runs/**.

Optional local cleanup helper:

python scripts/clean_artifacts.py --dry-run
python scripts/clean_artifacts.py

Extending safely

If you add tools:

  1. Add new ToolSpec in agent/tools.py.
  2. Decide if output is admissible evidence.
  3. If admissible, add explicit validator in runner logic.
  4. Keep pass boundaries strict:
    • pass 1: tool decision
    • pass 2: answer only from provided tool output
  5. Add tests for security and contract behavior.
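
A hypothetical shape for step 1 (the real ToolSpec and TOOLS definitions live in agent/tools.py and may differ; stat_file is an invented example):

from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class ToolSpec:
    name: str
    run: Callable[[dict], dict]
    admissible_evidence: bool  # step 2: decide admissibility up front

def stat_file(args: dict) -> dict:
    # Example new tool body: validate args against the sandbox policy
    # first, then return a typed result or raise a typed error.
    raise NotImplementedError

TOOLS: dict[str, ToolSpec] = {
    "stat_file": ToolSpec("stat_file", stat_file, admissible_evidence=False),
}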

Practical limitations

Intentional limits:

  • single model-requested tool call per ask run
  • bounded read/token budgets
  • strict formatting and protocol checks can produce "hard fails" rather than graceful-but-risky answers

Non-goals:

  • broad autonomous task execution
  • unrestricted filesystem exploration
  • hidden-file or arbitrary-extension access by default

Design philosophy

This runner is built around three constraints:

  • Finitude: bounded resources are explicit, not hidden.
  • Integrity: only typed evidence is admissible for evidence-required asks.
  • Scope discipline: partial coverage must be disclosed mechanically.

Mental model:

  • a small "epistemic linter" around local-file Q&A, optimized for correctness and auditability over flexibility.
