SLM-as-Cerebellum for LLM Policy Enforcement — A biologically inspired architecture where a Small Language Model acts as an inhibitory antagonist to Large Language Models, preventing policy violations through consensus-based gating.
LLMs are trained to be helpful, which makes them systematically violate project constraints.
When given explicit technology policies (e.g., "NEVER use TypeScript"), LLMs will:
- Read and acknowledge the constraint
- Generate a compliant-sounding justification
- Violate the constraint anyway — because TypeScript is common in training data, and the "helpfulness drive" overrides textual rules
Documentation-based enforcement fails because LLMs "engage with" policies rather than obeying them. There’s no mechanism for documentation to create actual inhibition.
| Emergent Drive | Training Origin | Observable Behavior |
|---|---|---|
| Helpfulness override | RLHF rewards usefulness | Violates explicit instructions to be "helpful" |
| Majority pattern following | Web training data statistics | Defaults to TypeScript/Python because they are common |
| Completion drive | Next-token prediction | Generates something rather than appropriately stopping |
| Sycophancy | Positive feedback for agreement | Agrees with the user even when factually wrong |
| High discount rate | Immediate feedback loops | User satisfaction >> long-term project health |
Conative Gating introduces a second model trained with inverted incentives — rewarded for blocking, suspicious by default, adversarial to the LLM’s proposals.
┌─────────────────────┐
│ USER REQUEST │
└──────────┬──────────┘
│
┌──────────────┼──────────────┐
▼ │ ▼
┌───────────────────┐ │ ┌───────────────────┐
│ LLM │ │ │ SLM │
│ (Frontal) │ │ │ (Cerebellar) │
│ │ │ │ │
│ "I want to help" │ │ │ "I suspect a │
│ Llama 70B │ │ │ violation" │
│ GO signal │ │ │ Phi-3 3.8B │
└─────────┬─────────┘ │ │ NO-GO signal │
│ │ └─────────┬─────────┘
│ │ │
└────────────────▼────────────────┘
│
┌──────────▼──────────┐
│ CONSENSUS ARBITER │
│ Modified PBFT │
│ SLM weight = 1.5× │
└──────────┬──────────┘
│
┌────────────────┼────────────────┐
▼ ▼ ▼
┌────────┐ ┌────────┐ ┌────────┐
│ ALLOW │ │ESCALATE│ │ BLOCK │
│Execute │ │Ask user│ │ Refuse │
└────────┘ └────────┘ └────────┘

The architecture directly mirrors the basal ganglia’s GO/NO-GO decision system:

| Property | Biological System | Conative Gating |
|---|---|---|
| Asymmetry | NO-GO has a lower activation threshold | SLM veto weighted 1.5× |
| Speed | Inhibition is fast | SLM is small (~3B params) |
| Specificity | Trained on specific patterns | SLM trained only on policy |
| Default state | Slight inhibitory tone | SLM biased toward blocking |
| Learning | Dopamine modulates pathways | Fine-tuning on violations |
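The asymmetry and default inhibitory tone above can be sketched as a weighted gate. The 1.5× veto weight comes from the design; the linear combination and the baseline value are illustrative assumptions, not the project's actual arbiter:

```rust
/// Asymmetric GO/NO-GO gate. Both signals are in [0.0, 1.0].
/// The 1.5x veto weight is taken from the table above; the inhibitory
/// baseline and the linear form are illustrative assumptions.
const SLM_VETO_WEIGHT: f64 = 1.5;
const INHIBITORY_BASELINE: f64 = 0.1; // "slight inhibitory tone"

/// Returns true only when the LLM's GO signal overcomes both the
/// weighted SLM veto and the default inhibitory tone.
fn gate(llm_go: f64, slm_nogo: f64) -> bool {
    llm_go - SLM_VETO_WEIGHT * slm_nogo - INHIBITORY_BASELINE > 0.0
}

fn main() {
    assert!(gate(0.9, 0.1)); // confident GO, little suspicion: passes
    assert!(!gate(0.5, 0.5)); // equal signals: inhibition wins by design
}
```

Because the veto is weighted and the resting state is inhibitory, a tie between the two models always resolves to blocking.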
# Clone the repository
git clone https://github.com/hyperpolymath/conative-gating
cd conative-gating
# Build with Cargo
cargo build --release
# Install globally (optional)
cargo install --path .

# Scan a directory for policy violations
conative scan ./my-project
# Check a single file
conative check --file src/utils.ts
# Check inline content
conative check --content "const x: string = 'hello'"
# Show current policy
conative policy
# Initialize policy in a project
conative init

┌─────────────────────────────────────────────────────────────────┐
│ PROPOSAL EVALUATION │
├─────────────────────────────────────────────────────────────────┤
│ │
│ LAYER 1: Policy Oracle (Deterministic, Rust) │
│ ───────────────────────────────────────────── │
│ • Fast pattern matching for hard rules │
│ • Forbidden languages (TypeScript, Python, Go) │
│ • Toolchain violations (npm without deno.json) │
│ • Security patterns (hardcoded secrets) │
│ • Latency: <1ms │
│ │
│ LAYER 2: SLM Evaluator (Neural, Phi-3/Gemma) │
│ ─────────────────────────────────────────── │
│ • "Spirit of policy" detection │
│ • Verbosity smells, unusual patterns │
│ • Trained with inverted incentives │
│ • Latency: ~100ms │
│ │
│ LAYER 3: Consensus Arbiter (PBFT) │
│ ───────────────────────────────── │
│ • Asymmetric voting (SLM = 1.5×) │
│ • Byzantine fault tolerant │
│ • Escalation to human on uncertainty │
│ │
└─────────────────────────────────────────────────────────────────┘

| LLM Confidence | SLM Violation Score | Result |
|---|---|---|
| High (>0.8) | Low (<0.3) | ALLOW |
| High (>0.8) | Medium (0.3-0.6) | ESCALATE |
| High (>0.8) | High (>0.6) | BLOCK |
| Medium (0.5-0.8) | Any >0.4 | ESCALATE |
| Low (<0.5) | Any | ESCALATE |
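The decision matrix can be read as a pure function over the two scores. A minimal sketch, with thresholds taken from the table; the table does not specify the case of medium LLM confidence with low suspicion, so this sketch allows it:

```rust
/// Possible outcomes of consensus arbitration.
#[derive(Debug, PartialEq)]
enum Verdict {
    Allow,
    Escalate,
    Block,
}

/// Map (LLM confidence, SLM violation score) to a verdict,
/// following the decision matrix above.
fn decide(llm_confidence: f64, slm_violation: f64) -> Verdict {
    if llm_confidence > 0.8 {
        if slm_violation < 0.3 {
            Verdict::Allow
        } else if slm_violation <= 0.6 {
            Verdict::Escalate
        } else {
            Verdict::Block
        }
    } else if llm_confidence >= 0.5 {
        // Medium confidence: any meaningful suspicion escalates.
        // (Low suspicion at medium confidence is unspecified above;
        // this sketch allows it.)
        if slm_violation > 0.4 { Verdict::Escalate } else { Verdict::Allow }
    } else {
        // Low LLM confidence always escalates to a human.
        Verdict::Escalate
    }
}

fn main() {
    assert_eq!(decide(0.92, 0.1), Verdict::Allow);
    assert_eq!(decide(0.92, 0.7), Verdict::Block);
    assert_eq!(decide(0.4, 0.0), Verdict::Escalate);
}
```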
conative-gating/
├── src/
│ ├── main.rs # CLI interface (714 lines)
│ ├── oracle/ # Policy Oracle crate
│ │ └── src/lib.rs # Deterministic rule engine (738 lines)
│ └── slm/ # SLM Evaluator crate
│ └── src/lib.rs # Neural evaluation (placeholder)
├── config/
│ ├── policy.ncl # Nickel DSL policy definition
│ └── schema.ncl # Type-safe policy schema
├── training/ # SLM training dataset
│ ├── compliant/ # Valid proposals (Rust, Elixir)
│ ├── violations/ # Hard violations (TypeScript, secrets)
│ └── edge_cases/ # Spirit violations (verbosity)
└── .github/workflows/   # CI/CD & enforcement

| Tier | Languages | Treatment |
|---|---|---|
| Tier 1 (Preferred) | Rust, Elixir, Zig, Ada, Haskell, ReScript | Allowed without warning |
| Tier 2 (Acceptable) | Nickel, Racket, Scheme | Allowed with soft warning |
| Forbidden | TypeScript, Python, Go, Java | Hard block (exit code 1) |
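Path-scoped exceptions like the `allowed_paths` entries below can be handled with a simple prefix-glob check. A minimal sketch; the helper names are hypothetical, and a real implementation would use a full glob library:

```rust
/// Check whether a path falls under an exception pattern.
/// Only the `dir/**` form used in the policy examples is handled.
fn matches_exception(path: &str, pattern: &str) -> bool {
    match pattern.strip_suffix("/**") {
        Some(prefix) => path == prefix || path.starts_with(&format!("{prefix}/")),
        None => path == pattern,
    }
}

/// Hypothetical in-code mirror of a Python exception entry:
/// Python is exempt only under salt/ and training/.
fn is_exempt(path: &str, language: &str) -> bool {
    let exceptions = [("Python", ["salt/**", "training/**"])];
    exceptions.iter().any(|(lang, patterns)| {
        *lang == language && patterns.iter().any(|p| matches_exception(path, p))
    })
}

fn main() {
    assert!(is_exempt("salt/top.sls", "Python"));
    assert!(!is_exempt("src/main.py", "Python"));
}
```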
exceptions = [
{
language = "Python",
allowed_paths = ["salt/**", "training/**"],
reason = "SaltStack configuration and ML training only"
}
]

Create .conative/policy.ncl in your project:
let base = import "schema.ncl" in
{
name = "My Project Policy",
extends = "rsr-default",
languages = {
# Add project-specific tier 1 languages
tier1_extra = ["Julia"],
# Project-specific exceptions
exceptions = [
{ language = "Python", allowed_paths = ["scripts/**"] }
]
},
enforcement = {
slm_weight = 1.5,
escalate_threshold = 0.4,
block_threshold = 0.7
}
}

| Normal LLM Training | Adversarial SLM Training |
|---|---|
| Reward for helpful responses | Reward for catching violations |
| Penalize refusals | Reward appropriate refusals |
| Encourage completion | Encourage STOP signals |
| Favor agreement | Favor disagreement with the LLM |
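The inverted incentives can be sketched as an asymmetric loss over the four prediction outcomes, with magnitudes mirroring the loss_weights configuration below; treating a reward as a negative loss is an illustrative assumption, not the project's actual training objective:

```rust
/// Asymmetric loss for the SLM: missing a violation costs more
/// than a false positive, and catching one is rewarded.
/// Magnitudes follow the loss_weights values; the loss form
/// itself is an illustrative assumption.
fn example_loss(is_violation: bool, predicted_violation: bool) -> f64 {
    match (is_violation, predicted_violation) {
        (true, true) => -2.0,  // reward for catching (negative loss)
        (true, false) => 3.0,  // heavy penalty for a miss
        (false, true) => 0.5,  // mild penalty for over-catching
        (false, false) => 0.0, // correct pass-through
    }
}

fn main() {
    // A miss is six times costlier than a false positive.
    assert_eq!(example_loss(true, false), 3.0);
    assert_eq!(example_loss(false, true), 0.5);
}
```

Under this asymmetry, the cheapest policy for the SLM is suspicion by default, which is exactly the inhibitory bias the architecture wants.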
loss_weights = {
"violation_detected": 2.0, # Reward catching
"violation_missed": 3.0, # Heavy penalty for misses
"false_positive": 0.5, # Mild penalty for over-catching
}

{
"proposal": {
"id": "uuid",
"action_type": {"CreateFile": {"path": "src/util.ts"}},
"content": "export const helper = (x: string) => x.trim()",
"llm_confidence": 0.92
},
"expected_verdict": "HardViolation",
"reasoning": "TypeScript file creation violates language policy",
"category": "language",
"spirit_violation": false
}

| Command | Description |
|---|---|
| `conative scan` | Recursively scan directory for violations |
| `conative check --file` | Check a single file |
| `conative check --content` | Check inline content |
| `conative policy` | Display current policy configuration |
| `conative validate` | Validate a structured proposal |
| `conative init` | Initialize policy in a project |
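For illustration, the training-example schema shown earlier maps onto Rust types like these. This is a sketch, not the crate's actual types: serde derives and the remaining action variants are omitted, and the field names simply mirror the JSON:

```rust
/// One kind of proposed action; the JSON example shows only
/// CreateFile, so other variants are omitted here.
#[derive(Debug)]
enum ActionType {
    CreateFile { path: String },
}

/// A proposal as submitted for evaluation, mirroring the
/// training-example JSON fields.
#[derive(Debug)]
struct Proposal {
    id: String,
    action_type: ActionType,
    content: String,
    llm_confidence: f64,
}

fn main() {
    let p = Proposal {
        id: "uuid".into(),
        action_type: ActionType::CreateFile { path: "src/util.ts".into() },
        content: "export const helper = (x: string) => x.trim()".into(),
        llm_confidence: 0.92,
    };
    // High LLM confidence alone does not make a proposal safe:
    // this one creates a .ts file and would be a HardViolation.
    assert!(p.llm_confidence > 0.8);
    println!("{:?}", p.action_type);
}
```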
// .claude-code-config.json (hypothetical)
{
"conative_gating": {
"enabled": true,
"slm_model": "~/.local/share/conative/phi-3-policy.gguf",
"policy_file": ".conative/policy.ncl",
"escalation_mode": "ask_user"
}
}

| Feature | Conative Gating | Linters | AI Filters | Documentation |
|---|---|---|---|---|
| Forbidden language detection | ✓ | ✓ | ✗ | ✗ |
| Spirit violation detection | ✓ (SLM) | ✗ | Partial | ✗ |
| Asymmetric safety weighting | ✓ | ✗ | ✗ | ✗ |
| Consensus-based arbitration | ✓ | ✗ | ✗ | ✗ |
| Adversarial training | ✓ | ✗ | ✗ | N/A |
| Works with AI assistants | ✓ | Partial | ✓ | ✗ |
This architecture is informed by:
- Constitutional AI (Anthropic) — Using AI to constrain AI
- Basal ganglia computational models — Gurney, Prescott, Redgrave
- Debate (Irving et al.) — Adversarial AI for truthfulness
- PBFT (Castro & Liskov) — Byzantine fault tolerance
- Reward hacking in RL — When optimizers find unintended solutions
- ❏ Implement SLM integration with llama.cpp
- ❏ Add comprehensive test suite (>50% coverage)
- ❏ Nickel policy validation
- ❏ Fine-tuned adversarial SLM (Phi-3-mini)
- ❏ Elixir/OTP consensus arbiter
- ❏ 70%+ test coverage
- ❏ API stability
- ❏ Claude Code hooks
- ❏ NeuroPhone integration
- ❏ Performance optimization (<500ms)
- RSR Framework — Repository standards this project follows
- META.scm — Architecture decision format
- STATE.scm — Conversation continuity format
- NeuroPhone — Neurosymbolic phone AI (integration target)
See CONTRIBUTING.adoc for guidelines.
Key principles:
- No TypeScript — Use ReScript for type-safe frontend code
- No Python — Except SaltStack configs and training scripts
- Rust for core — Policy Oracle and SLM bindings
- Elixir for orchestration — OTP supervision trees
AGPL-3.0-or-later OR LicenseRef-Palimpsest-0.5
See LICENSE.txt for details.