7 changes: 5 additions & 2 deletions README.md
@@ -91,8 +91,8 @@ For the shortest next route by intent:
- if you need playbook meaning, activation doctrine, or authored execution bundles, go to [`aoa-playbooks`](https://github.com/8Dionysus/aoa-playbooks)
- if you need the Windows host and WSL bridge workflow, read [docs/WINDOWS_BRIDGE](docs/WINDOWS_BRIDGE.md), [docs/WINDOWS_SETUP](docs/WINDOWS_SETUP.md), and [docs/WINDOWS_PERFORMANCE](docs/WINDOWS_PERFORMANCE.md)
- if you need runtime benchmark ownership, storage, and manifest rules, read [docs/RUNTIME_BENCH_POLICY](docs/RUNTIME_BENCH_POLICY.md)
- if you need the bounded llama.cpp A/B runtime pilot next to the validated Ollama path, read [docs/LLAMACPP_PILOT](docs/LLAMACPP_PILOT.md)
- if you need bounded local-model trial contracts, W4 supervised edits, or the promoted W5/W6 local-worker path, read [docs/LOCAL_AI_TRIALS](docs/LOCAL_AI_TRIALS.md)
- if you need the promoted local Qwen runtime path on `5403`, the retained Ollama control path on `5401`, or the bounded `llama.cpp` comparison and promotion lineage, read [docs/LLAMACPP_PILOT](docs/LLAMACPP_PILOT.md)
- if you need bounded local-model trial contracts, the adopted LangGraph execution posture, or the promoted W5/W6 local-worker path, read [docs/LOCAL_AI_TRIALS](docs/LOCAL_AI_TRIALS.md)
- if you need normative host posture or machine-readable host-facts capture, read [docs/REFERENCE_PLATFORM](docs/REFERENCE_PLATFORM.md) and [docs/REFERENCE_PLATFORM_SPEC](docs/REFERENCE_PLATFORM_SPEC.md)
- if you need to tune the runtime to the current machine, confirm driver freshness, or decide which preset the host should prefer, read [docs/MACHINE_FIT_POLICY](docs/MACHINE_FIT_POLICY.md)
- if you need a compact record of platform-specific quirks, adaptations, and portability notes, read [docs/PLATFORM_ADAPTATION_POLICY](docs/PLATFORM_ADAPTATION_POLICY.md)
@@ -185,6 +185,9 @@ The repository now includes:
## Current status

`abyss-stack` is now a live multi-service runtime with stateful storage, local and Intel-aware inference paths, monitoring, host-facts capture, machine-fit capture, platform-adaptation logging, and landed federation advisory seams for sibling AoA repositories.
The current bounded local-worker posture is `llama.cpp`-first on `5403`, with Ollama retained on `5401` as the control and rollback path.
`LangGraph` is now the adopted execution layer for bounded long-horizon and autonomy-focused local-worker flows, while the earlier W0-W4 runner lineage remains available as the historical baseline.
The current Intel embeddings posture still uses OVMS; any move from OpenVINO serving to OpenVINO GenAI should be treated as a separate reviewed stack change.
The first live consumer step has now landed in `langchain-api` through opt-in `POST /run/federated`, which can consume advisory playbook and memo seams without changing the default `POST /run` path.
The next large step is no longer bootstrap or mirror landing, nor whether the live runtime should consume those seams at all; it is deciding how broadly and how deeply the runtime loop should rely on the already-landed seams.

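As a quick illustration of that opt-in seam, a caller can keep the default path untouched and probe the federated route separately; the endpoint is the documented one, while the request body below is a placeholder rather than the real contract:

```bash
# Hedged sketch: exercise the opt-in federated route without changing the
# default POST /run path. The JSON body is illustrative only, not the
# actual request schema.
curl -s -X POST http://127.0.0.1:5401/run/federated \
  -H 'Content-Type: application/json' \
  -d '{"input": "smoke test"}'   # hypothetical payload shape
```
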
9 changes: 6 additions & 3 deletions docs/ARCHITECTURE.md
@@ -18,18 +18,21 @@ Persistent state and retrieval substrate:

Workflow coordination and pipeline surfaces:
- n8n
- LangGraph for bounded local-worker execution, pause/resume, and milestone-gated recovery flows

### 3. Inference layer

Local and accelerator-aware model serving:
- Ollama
- OVMS and Intel-oriented model serving
- llama.cpp as the promoted local GGUF-serving path for bounded local-worker flows
- Ollama as the retained control and rollback path
- OVMS as the current Intel/OpenVINO-oriented serving path for embeddings
- a future OpenVINO GenAI migration as a separate stack change, not part of the current promoted path

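As a minimal sketch of what the promoted serving path exposes, a direct probe can bypass the agent API layer entirely; the port follows the service catalog, the OpenAI-compatible route assumes a stock `llama.cpp` server, and the model name is illustrative:

```bash
# Hedged sketch: direct smoke test of the llama.cpp serving surface.
# Assumes the stock OpenAI-compatible route; the model name is a placeholder.
curl -s http://127.0.0.1:11435/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model": "qwen", "messages": [{"role": "user", "content": "ping"}]}'
```
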
### 4. Gateway and agent API layer

Model routing and agent-facing runtime APIs:
- LiteLLM
- LangChain API or successor service modules
- LangChain API service modules, including the control-path `langchain-api` surface and the promoted local-worker `langchain-api-llamacpp` surface

This layer may also host the runtime return wrapper that rebuilds context from a last valid anchor rather than continuing under drift.

14 changes: 10 additions & 4 deletions docs/LANGGRAPH_PILOT.md
@@ -2,18 +2,18 @@

## Purpose

This document defines the bounded LangGraph sidecar pilot for `abyss-stack`.
This document defines the bounded LangGraph sidecar pilot for `abyss-stack` and records the execution-layer decision that came out of it.

It is not a new service and not a migration of `aoa-local-ai-trials`.
It is a comparison layer for one W4-shaped supervised edit flow.
It began as a comparison layer for one W4-shaped supervised edit flow and now serves as the origin surface for the adopted bounded execution layer used by `W5` and `W6`.

## Current pilot

Program id:
- `langgraph-sidecar-pilot-v1`
- `langgraph-sidecar-llamacpp-v1` for the disposable backend-promotion fixture gate

Current runtime path:
Current origin runtime path:
- `intel-full -> langchain-api /run -> ollama-native`

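A refresh of the fixture gate against the promoted worker path looks roughly like this; the flags mirror the invocation documented in LOCAL_AI_TRIALS, and the pairing of program id and URL is a sketch rather than a fixed recipe:

```bash
# Hedged sketch: materialize the pilot surface, then run the
# backend-promotion fixture gate against the promoted 5403 worker path.
scripts/aoa-langgraph-pilot materialize
scripts/aoa-langgraph-pilot \
  --url http://127.0.0.1:5403/run \
  --program-id langgraph-sidecar-llamacpp-v1
```
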
Current cases:
@@ -63,6 +63,11 @@ The sidecar pilot does not:
- replace `langchain-api /run`
- widen W4 into autonomous long-horizon execution

Current adopted role:
- `LangGraph` is the preferred bounded execution layer for `W5`, `W6`, and follow-on local-worker flows
- `aoa-local-ai-trials` remains the historical baseline for `W0` through `W4`
- `aoa-langgraph-pilot` remains the W4-shaped comparison and fixture surface

## Artifacts

Runtime truth:
@@ -93,4 +98,5 @@ The sidecar should answer a narrow question:
- does LangGraph improve pause/resume and recovery clarity for a bounded supervised edit flow
- without reducing W4 safety, scope discipline, or reportability

Until that answer is positive, the existing runner remains the execution baseline.
That answer is now positive for bounded local-worker flows.
Keep the sidecar pilot as the comparison and origin surface, and use the `W5` and `W6` contracts for the adopted execution posture.
40 changes: 25 additions & 15 deletions docs/LLAMACPP_PILOT.md
@@ -2,12 +2,14 @@

## Purpose

This document defines the bounded `llama.cpp` sidecar pilot for `abyss-stack`.
This document defines the bounded `llama.cpp` sidecar pilot for `abyss-stack` and records the promoted runtime posture that came out of it.

It exists to answer a narrow question:
The pilot originally existed to answer a narrow question:

**does a `llama.cpp` sidecar improve the local Qwen runtime posture on this machine without replacing the validated canonical Ollama path yet?**

That question is now answered positively for the current bounded local-worker path.

## Boundary

The pilot is:
@@ -19,15 +21,22 @@ The pilot is:
The pilot is not:
- a silent replacement for the canonical local runtime
- a proof-layer quality verdict
- a claim that `llama.cpp` is already promoted into machine-fit canon
- a claim that every service in the stack should immediately move off the retained control path

## Current promoted posture

The current preferred bounded local-worker path is:

## Current default posture
`intel-full -> langchain-api-llamacpp /run -> llama.cpp + route-api`

The validated canonical path remains:
The retained control and rollback path remains:

`intel-full -> langchain-api /run -> litellm/ollama + route-api`
`intel-full -> langchain-api /run -> ollama-native + route-api`

The `llama.cpp` pilot is intentionally separate from that path until a reviewed promotion decision says otherwise.
The pilot script remains intentionally useful after promotion:
- to refresh bounded backend comparisons
- to verify the promoted sidecar posture
- to keep the control path honest without making it the default worker substrate

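That post-promotion refresh reuses the operator entry point documented in the bench policy; a minimal invocation is:

```bash
# Hedged sketch: rerun the bounded A/B comparison after promotion. Per the
# bench policy, this writes a comparison packet under
# ${AOA_STACK_ROOT}/Logs/runtime-benchmarks/comparisons/.
scripts/aoa-llamacpp-pilot run --preset intel-full
```
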
## What the pilot reuses

@@ -44,19 +53,19 @@ This keeps the pilot honest:
- same quantized resident artifact
- different serving runtime

## Pilot services
## Promoted and control services

When the pilot is active, it adds two localhost-only services:
When the promoted path is active, it uses two localhost-only services:

- `llama-cpp` -> `http://127.0.0.1:11435`
- `langchain-api-llamacpp` -> `http://127.0.0.1:5403/health`

The canonical services stay in place:
The control-path services stay in place:

- `ollama` -> `http://127.0.0.1:11434`
- `langchain-api` -> `http://127.0.0.1:5401/health`

That separation preserves honest A/B comparison.
That separation preserves honest A/B comparison, rollback, and future challenger evaluation.

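A quick liveness pass over both pairs before an A/B run can look like this; the agent-API `/health` routes are the documented ones, while the raw backend probes assume stock `llama.cpp` and Ollama route names:

```bash
# Hedged sketch: confirm all four localhost endpoints answer.
curl -sf http://127.0.0.1:11435/health        # llama.cpp sidecar (stock route assumed)
curl -sf http://127.0.0.1:11434/api/version   # Ollama backend (stock route assumed)
curl -sf http://127.0.0.1:5403/health         # langchain-api-llamacpp
curl -sf http://127.0.0.1:5401/health         # langchain-api control path
```
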
## Operator commands

@@ -189,11 +198,12 @@ Promotion packets stay runtime-local too and capture:

A green or promising pilot does not automatically change the machine-fit record.

Promotion requires:
Promotion required:
- reviewed comparison output
- a clear recommendation that the sidecar is better for the intended bounded path
- an explicit update to machine-fit and the validated runtime docs

Until then:
- Ollama remains the validated preferred path
- `llama.cpp` remains an optional pilot substrate
Current result:
- `llama.cpp` is the preferred bounded local-worker path
- Ollama remains the validated control and rollback path
- any OpenVINO-side shift to OpenVINO GenAI should be reviewed separately from the `llama.cpp` promotion decision
14 changes: 9 additions & 5 deletions docs/LOCAL_AI_TRIALS.md
@@ -31,7 +31,7 @@ Control baseline:
Promoted bounded-worker path:
- runtime path: `http://127.0.0.1:5403/run`
- backend: `llama.cpp`
- orchestration: `LangGraph` for `W5` and `W6`
- orchestration: `LangGraph` for `W5`, `W6`, and the current bounded local-worker posture

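Before a trial wave, both surfaces can be checked in one line; the `/health` routes are the ones documented for the two agent APIs:

```bash
# Hedged sketch: the promoted worker and the control baseline should both
# answer before any bounded trial run.
curl -sf http://127.0.0.1:5403/health && curl -sf http://127.0.0.1:5401/health
```
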
Durable program roots now in use:
- `qwen-local-pilot-v1`
@@ -126,11 +126,9 @@ What it does not do:
- it does not upgrade runtime success into portable proof wording
- it does not collapse `W4` into a silent monolithic mutator

## LangGraph sidecar pilot
## LangGraph sidecar origin and promoted role

The current trial runner remains the execution baseline.

An optional comparison layer now also exists:
The original comparison layer still exists:

```bash
scripts/aoa-langgraph-pilot materialize
```
@@ -146,6 +144,12 @@ scripts/aoa-langgraph-pilot --url http://127.0.0.1:5403/run --program-id langgra

Use [LANGGRAPH_PILOT](LANGGRAPH_PILOT.md) for the sidecar contract.

That sidecar surface established the now-adopted execution posture:

- `aoa-local-ai-trials` remains the historical baseline for `W0` through `W4`
- `LangGraph` is now the primary orchestration layer for `W5`, `W6`, and the current bounded local-worker path
- `aoa-langgraph-pilot` remains the W4-shaped comparison and fixture surface rather than the full execution baseline

## W5 long-horizon pilot

The next bounded scenario layer lives beside the earlier waves:
11 changes: 8 additions & 3 deletions docs/MACHINE_FIT_POLICY.md
@@ -29,7 +29,7 @@ Use this layer for:
- preferred preset or profile selection for the current host
- current driver posture for visible accelerators
- package freshness for the host packages that matter to the runtime path
- validated local runtime settings such as bounded Ollama thread or batch posture
- validated local runtime settings such as bounded `llama.cpp` serving posture or control-path fallback settings
- warnings about noisy host envelopes that can distort latency-sensitive work
- compact refs to host facts, benchmark evidence, and adaptation records

@@ -140,5 +140,10 @@

It does not own the global meaning of sibling AoA layers, and it does not replace runtime benchmarks or proof artifacts.

An optional runtime sidecar pilot, such as a bounded `llama.cpp` comparison, does not change the preferred machine-fit posture by itself.
Only a reviewed promotion decision should move a pilot path into the validated preferred runtime path.
A bounded runtime comparison by itself does not change the preferred machine-fit posture.
Only a reviewed promotion decision should move a candidate path into the validated preferred runtime path.

The current reviewed posture is:
- `llama.cpp` as the preferred bounded local-worker path on `5403`
- Ollama as the retained control and rollback path on `5401`
- the Intel embeddings path still on OVMS, with any OpenVINO GenAI migration handled as a separate reviewed change
9 changes: 6 additions & 3 deletions docs/RUNTIME_BENCH_POLICY.md
@@ -125,6 +125,9 @@ scripts/aoa-qwen-bench --preset intel-full
This runner stays on the intended `langchain-api /run` path and writes machine-local evidence under `${AOA_STACK_ROOT}/Logs/runtime-benchmarks/runs/`.
It performs one uncounted warmup call per case before measured repeats so warm-latency reads stay warm by definition instead of by accident.

The default helper posture now targets the promoted local-worker path on `5403`.
Use an explicit `--url`, `--backend-label`, `--runtime-variant`, and `--target-label` when you want to refresh the retained Ollama control path on `5401`.

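A control-path refresh pinned explicitly can look like this; the flags are the ones named above, while the label values are illustrative placeholders:

```bash
# Hedged sketch: bench the retained Ollama control path instead of the
# 5403 default. The label values are placeholders, not canonical names.
scripts/aoa-qwen-bench --preset intel-full \
  --url http://127.0.0.1:5401/run \
  --backend-label ollama-native \
  --runtime-variant control \
  --target-label qwen-ollama-control
```
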
Refresh the durable catalog after new runs:

```bash
# catalog-refresh command collapsed in the diff view
```

@@ -175,14 +178,14 @@ That helper may reuse runtime benchmark artifacts as evidence inside case packet

## Optional backend-parity pilot

For a bounded `llama.cpp` versus Ollama comparison on the same host and the same `langchain-api /run` contract, use:
For a bounded refresh of the promoted `llama.cpp` path against the retained Ollama control path on the same host and the same `langchain-api /run` contract, use:

```bash
scripts/aoa-llamacpp-pilot run --preset intel-full
```

That pilot runs a fresh Ollama baseline on `5401`, a fresh `llama.cpp` sidecar bench on `5403`, and writes a comparison packet under `${AOA_STACK_ROOT}/Logs/runtime-benchmarks/comparisons/`.
It is a runtime-parity aid, not a promotion decision by itself.
That pilot runs a fresh Ollama control bench on `5401`, a fresh `llama.cpp` sidecar bench on `5403`, and writes a comparison packet under `${AOA_STACK_ROOT}/Logs/runtime-benchmarks/comparisons/`.
It remains a runtime-parity and challenger-evaluation aid even after the current `llama.cpp` promotion.

Use the catalog layer to answer:
- what the latest baseline run was for a target label
25 changes: 17 additions & 8 deletions docs/SERVICE_CATALOG.md
@@ -15,24 +15,26 @@ This file maps the first migrated runtime modules to their intended services.

## `30-local-inference.yml`

- `ollama` — local LLM and embedding serving
- `ollama` — retained local control and rollback serving surface for Qwen chat and fallback embeddings

## `31-intel-inference.yml`

- `ovms` — Intel and OpenVINO oriented model serving
- `ovms` — current Intel and OpenVINO oriented model serving surface for embeddings
- any migration from OVMS/OpenVINO serving to OpenVINO GenAI is a separate reviewed stack change

## `32-llamacpp-inference.yml`

- `llama-cpp` — optional OpenAI-compatible GGUF serving sidecar for bounded backend-parity work
- reuses a resolved local GGUF model file rather than changing the canonical validated Ollama path
- `llama-cpp` — promoted OpenAI-compatible GGUF serving surface for bounded local-worker flows
- reuses a resolved local GGUF model file and now backs the preferred local Qwen worker path on `5403`
- keeps Ollama in place as the control and rollback path

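For intuition, the serving surface behaves roughly like a stock `llama.cpp` server container; the image tag, flags, and model path below are assumptions for illustration, and the actual `32-llamacpp-inference.yml` stays canonical:

```bash
# Hypothetical stand-in for the llama-cpp service; image, flags, and model
# path are assumptions, and the compose module is the source of truth.
docker run --rm -p 127.0.0.1:11435:8080 \
  -v "${AOA_MODELS_DIR:?}/qwen.gguf:/models/qwen.gguf:ro" \
  ghcr.io/ggml-org/llama.cpp:server \
  -m /models/qwen.gguf --host 0.0.0.0 --port 8080
```
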
## `40-llm-gateway.yml`

- `litellm` — model gateway and routing facade

## `41-agent-api.yml`

- `langchain-api` — base agent-facing runtime API
- `langchain-api` — base control-path agent-facing runtime API on `5401`
- default embeddings path — Ollama-first
- may consume a public-safe return policy file and emit runtime return events
- now also exposes opt-in `POST /run/federated` for live advisory consumption of `route-api` playbook and memo seams
@@ -45,9 +47,16 @@

## `44-llamacpp-agent-sidecar.yml`

- `langchain-api-llamacpp` — optional sidecar agent API bound to a `llama.cpp` backend on a separate host port
- preserves the canonical `langchain-api` service and `5401` path for honest A/B comparison
- keeps embeddings on OVMS for Intel-aware pilot runs
- `langchain-api-llamacpp` — promoted bounded local-worker API bound to a `llama.cpp` backend on `5403`
- is the preferred local Qwen worker path for the current promoted `W5/W6` substrate
- preserves the base `langchain-api` service and `5401` path as the control and rollback surface
- keeps embeddings on OVMS for the current Intel-aware posture

## Execution layer

- `LangGraph` is now the adopted bounded execution layer for the `W5` and `W6` local-worker flows
- it remains a CLI-side execution surface rather than a long-running network service
- the original `aoa-langgraph-pilot` remains useful as the W4-shaped comparison and fixture surface

## `43-federation-router.yml`

21 changes: 14 additions & 7 deletions docs/machine-fit/machine-fit.public.json.example
@@ -126,12 +126,19 @@
"tools",
"observability"
],
"preferred_runtime_path": "intel-full -> langchain-api /run -> litellm/ollama + route-api",
"validated_acceleration_posture": "OVMS embeddings on Intel GPU; Qwen chat via Ollama; Intel NPU is visible but not yet part of the validated canonical path.",
"preferred_runtime_path": "intel-full -> langchain-api-llamacpp /run -> llama.cpp + route-api",
"validated_acceleration_posture": "OVMS embeddings on Intel GPU; Qwen chat on the promoted llama.cpp path; Ollama remains the validated control path; Intel NPU is visible but not yet part of the validated canonical path.",
"validated_settings": {
"LC_OLLAMA_NUM_THREAD": "6",
"LC_OLLAMA_NUM_BATCH": "32",
"LC_OLLAMA_THINK": "false"
"AOA_LLAMACPP_DEVICE": "none",
"AOA_LLAMACPP_NO_OP_OFFLOAD": "1",
"AOA_LLAMACPP_THREADS": "4",
"AOA_LLAMACPP_THREADS_BATCH": "4",
"AOA_LLAMACPP_THREADS_HTTP": "2",
"AOA_LLAMACPP_CTX_SIZE": "4096",
"AOA_LLAMACPP_BATCH_SIZE": "512",
"AOA_LLAMACPP_UBATCH_SIZE": "128",
"AOA_LLAMACPP_REASONING": "off",
"AOA_LLAMACPP_THINK": "none"
},
"recommended_overlays": [],
"current_overlays": [],
@@ -148,7 +155,7 @@
},
"fit_verdict": {
"status": "qualified",
"summary": "Preferred preset is intel-full. Qwen chat should stay on langchain-api /run through the validated local path. Relevant host packages are current in the configured Fedora repositories.",
"summary": "Preferred preset is intel-full. Qwen chat should stay on the promoted llama.cpp path, with Ollama retained as the control path. Relevant host packages are current in the configured Fedora repositories.",
"next_actions": [
"Run scripts/aoa-doctor --preset intel-full before launch.",
"Refresh host facts when the host or kernel changes.",
@@ -158,7 +165,7 @@
"kernel update",
"linux-firmware update",
"mesa or Intel runtime update",
"Ollama or langchain-api runtime change",
"llama.cpp, Ollama control-path, or langchain-api runtime change",
"host load envelope change before latency-sensitive trials"
]
},