Controlling AI with AI. Conative Gating introduces a second model trained with inverted incentives: rewarded for blocking, suspicious by default, adversarial to the LLM’s proposals, and built on metaphors from human constraint.

Conative Gating

1. The Problem

> "LLMs are trained to be helpful, which makes them systematically violate project constraints."
>
> — Observation from AI-assisted development

When given explicit technology policies (e.g., "NEVER use TypeScript"), LLMs will:

  1. Read and acknowledge the constraint

  2. Generate compliant-sounding justification

  3. Violate the constraint anyway — because TypeScript is common in training data, and the "helpfulness drive" overrides textual rules

Documentation-based enforcement fails because LLMs "engage with" policies rather than obeying them. There’s no mechanism for documentation to create actual inhibition.

1.1. LLM Conative Drives

| Emergent Drive | Training Origin | Observable Behavior |
| --- | --- | --- |
| Helpfulness override | RLHF rewards usefulness | Violates explicit instructions to be "helpful" |
| Majority pattern following | Web training data statistics | Defaults to TypeScript/Python because common |
| Completion drive | Next-token prediction | Generates something rather than appropriately stopping |
| Sycophancy | Positive feedback for agreement | Agrees with user even when factually wrong |
| High discount rate | Immediate feedback loops | User satisfaction >> long-term project health |

2. The Solution

Conative Gating introduces a second model trained with inverted incentives — rewarded for blocking, suspicious by default, adversarial to the LLM’s proposals.

                    ┌─────────────────────┐
                    │   USER REQUEST      │
                    └──────────┬──────────┘
                               │
                ┌──────────────┼──────────────┐
                ▼              │              ▼
    ┌───────────────────┐      │      ┌───────────────────┐
    │      LLM          │      │      │      SLM          │
    │   (Frontal)       │      │      │   (Cerebellar)    │
    │                   │      │      │                   │
    │  "I want to help" │      │      │  "I suspect a     │
    │  Llama 70B        │      │      │   violation"      │
    │  GO signal        │      │      │  Phi-3 3.8B       │
    └─────────┬─────────┘      │      │  NO-GO signal     │
              │                │      └─────────┬─────────┘
              │                │                │
              └────────────────▼────────────────┘
                               │
                    ┌──────────▼──────────┐
                    │  CONSENSUS ARBITER  │
                    │  Modified PBFT      │
                    │  SLM weight = 1.5×  │
                    └──────────┬──────────┘
                               │
              ┌────────────────┼────────────────┐
              ▼                ▼                ▼
         ┌────────┐       ┌────────┐       ┌────────┐
         │ ALLOW  │       │ESCALATE│       │ BLOCK  │
         │Execute │       │Ask user│       │ Refuse │
         └────────┘       └────────┘       └────────┘

2.1. Biological Inspiration

The architecture directly mirrors the basal ganglia’s GO/NO-GO decision system:

| Property | Biological System | Conative Gating |
| --- | --- | --- |
| Asymmetry | NO-GO has lower activation threshold | SLM veto weighted 1.5× |
| Speed | Inhibition is fast | SLM is small (~3B params) |
| Specificity | Trained on specific patterns | SLM trained only on policy |
| Default state | Slight inhibitory tone | SLM biased toward blocking |
| Learning | Dopamine modulates pathways | Fine-tuning on violations |
3. Quick Start

3.1. Installation

```sh
# Clone the repository
git clone https://github.com/hyperpolymath/conative-gating
cd conative-gating

# Build with Cargo
cargo build --release

# Install globally (optional)
cargo install --path .
```

3.2. Basic Usage

```sh
# Scan a directory for policy violations
conative scan ./my-project

# Check a single file
conative check --file src/utils.ts

# Check inline content
conative check --content "const x: string = 'hello'"

# Show current policy
conative policy

# Initialize policy in a project
conative init
```

3.3. Exit Codes

| Code | Meaning |
| --- | --- |
| 0 | Compliant — all checks passed |
| 1 | Hard violation — blocked |
| 2 | Soft concern — warning |
| 3 | Error during execution |
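Scripts and CI jobs can branch on these codes. As a minimal sketch, a verdict-to-exit-code mapping might look like the following (the `Verdict` enum and function name are hypothetical; the crate's real types may differ):

```rust
/// Hypothetical verdict type; the real crate's types may differ.
#[derive(Debug, PartialEq)]
pub enum Verdict {
    Compliant,
    HardViolation,
    SoftConcern,
    ExecutionError,
}

/// Map a verdict to the CLI exit codes documented above.
pub fn exit_code(v: &Verdict) -> i32 {
    match v {
        Verdict::Compliant => 0,      // all checks passed
        Verdict::HardViolation => 1,  // blocked
        Verdict::SoftConcern => 2,    // warning
        Verdict::ExecutionError => 3, // error during execution
    }
}
```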

4. Architecture

4.1. Three-Layer Decision System

```
┌─────────────────────────────────────────────────────────────────┐
│                      PROPOSAL EVALUATION                         │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  LAYER 1: Policy Oracle (Deterministic, Rust)                   │
│  ─────────────────────────────────────────────                  │
│  • Fast pattern matching for hard rules                         │
│  • Forbidden languages (TypeScript, Python, Go)                 │
│  • Toolchain violations (npm without deno.json)                 │
│  • Security patterns (hardcoded secrets)                        │
│  • Latency: <1ms                                                │
│                                                                  │
│  LAYER 2: SLM Evaluator (Neural, Phi-3/Gemma)                   │
│  ───────────────────────────────────────────                    │
│  • "Spirit of policy" detection                                 │
│  • Verbosity smells, unusual patterns                           │
│  • Trained with inverted incentives                             │
│  • Latency: ~100ms                                              │
│                                                                  │
│  LAYER 3: Consensus Arbiter (PBFT)                              │
│  ─────────────────────────────────                              │
│  • Asymmetric voting (SLM = 1.5×)                               │
│  • Byzantine fault tolerant                                     │
│  • Escalation to human on uncertainty                           │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘
```
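Layer 1 can be pictured as plain pattern matching over paths and content. A minimal sketch, with no model inference involved (the function name, patterns, and return shape are illustrative, not the oracle crate's actual API):

```rust
/// Return a rejection reason if the proposal trips a hard rule, else None.
/// Purely deterministic string matching: no ML, sub-millisecond by design.
pub fn layer1_check(path: &str, content: &str) -> Option<&'static str> {
    // Forbidden-language detection by file extension.
    const FORBIDDEN: &[(&str, &str)] = &[
        (".ts", "TypeScript is forbidden"),
        (".py", "Python is forbidden"),
        (".go", "Go is forbidden"),
    ];
    for &(ext, reason) in FORBIDDEN {
        if path.ends_with(ext) {
            return Some(reason);
        }
    }
    // Crude hardcoded-secret heuristic.
    if content.contains("-----BEGIN PRIVATE KEY-----") {
        return Some("possible hardcoded secret");
    }
    None
}
```

A real oracle would also honor path-scoped exceptions and toolchain rules; this only shows why the layer can stay fast.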

4.2. Decision Matrix

| LLM Confidence | SLM Violation Score | Result |
| --- | --- | --- |
| High (>0.8) | Low (<0.3) | ALLOW |
| High (>0.8) | Medium (0.3-0.6) | ESCALATE |
| High (>0.8) | High (>0.6) | BLOCK |
| Medium (0.5-0.8) | Any >0.4 | ESCALATE |
| Low (<0.5) | Any | ESCALATE |
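Read row-wise, the matrix collapses to a small decision function. A sketch under the assumption that cells the matrix leaves unspecified default to ESCALATE as the fail-safe (type and function names are hypothetical):

```rust
#[derive(Debug, PartialEq)]
pub enum Decision {
    Allow,
    Escalate,
    Block,
}

/// Apply the decision matrix to the two raw scores.
pub fn arbitrate(llm_confidence: f64, slm_violation: f64) -> Decision {
    if llm_confidence > 0.8 {
        if slm_violation < 0.3 {
            Decision::Allow
        } else if slm_violation <= 0.6 {
            Decision::Escalate
        } else {
            Decision::Block
        }
    } else {
        // Medium (0.5-0.8) and low (<0.5) LLM confidence: every listed row
        // escalates, so unspecified cells also escalate (fail-safe default).
        Decision::Escalate
    }
}
```

Note that ALLOW requires both high LLM confidence and a low violation score; every other combination is either blocked or handed to a human.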

4.3. Source Structure

```
conative-gating/
├── src/
│   ├── main.rs              # CLI interface (714 lines)
│   ├── oracle/              # Policy Oracle crate
│   │   └── src/lib.rs       # Deterministic rule engine (738 lines)
│   └── slm/                 # SLM Evaluator crate
│       └── src/lib.rs       # Neural evaluation (placeholder)
├── config/
│   ├── policy.ncl           # Nickel DSL policy definition
│   └── schema.ncl           # Type-safe policy schema
├── training/                # SLM training dataset
│   ├── compliant/           # Valid proposals (Rust, Elixir)
│   ├── violations/          # Hard violations (TypeScript, secrets)
│   └── edge_cases/          # Spirit violations (verbosity)
└── .github/workflows/       # CI/CD & enforcement
```

5. Policy Configuration

5.1. Language Tiers

| Tier | Languages | Treatment |
| --- | --- | --- |
| Tier 1 (Preferred) | Rust, Elixir, Zig, Ada, Haskell, ReScript | Allowed without warning |
| Tier 2 (Acceptable) | Nickel, Racket, Scheme | Allowed with soft warning |
| Forbidden | TypeScript, Python, Go, Java | Hard block (exit code 1) |

5.2. Exceptions

```nickel
exceptions = [
  {
    language = "Python",
    allowed_paths = ["salt/**", "training/**"],
    reason = "SaltStack configuration and ML training only"
  }
]
```
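A sketch of how such a path-scoped exception could be evaluated. Real glob handling would use a crate such as `glob`; this hand-rolls only exact paths and the trailing `/**` form for illustration, and the function name is hypothetical:

```rust
/// Return true if `path` matches any allowed pattern. Only exact paths and
/// the trailing `/**` wildcard are handled in this sketch.
pub fn path_allowed(path: &str, allowed_paths: &[&str]) -> bool {
    allowed_paths.iter().any(|pat| match pat.strip_suffix("/**") {
        // "salt/**" matches anything under the "salt/" directory.
        Some(prefix) => {
            path.starts_with(prefix) && path[prefix.len()..].starts_with('/')
        }
        // Otherwise require an exact path match.
        None => path == *pat,
    })
}
```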

5.3. Custom Policy

Create .conative/policy.ncl in your project:

```nickel
let base = import "schema.ncl" in
{
  name = "My Project Policy",
  extends = "rsr-default",

  languages = {
    # Add project-specific tier 1 languages
    tier1_extra = ["Julia"],

    # Project-specific exceptions
    exceptions = [
      { language = "Python", allowed_paths = ["scripts/**"] }
    ]
  },

  enforcement = {
    slm_weight = 1.5,
    escalate_threshold = 0.4,
    block_threshold = 0.7
  }
}
```

6. Training the Adversarial SLM

6.1. Inverted Incentives

| Normal LLM Training | Adversarial SLM Training |
| --- | --- |
| Reward for helpful responses | Reward for catching violations |
| Penalize refusals | Reward appropriate refusals |
| Encourage completion | Encourage STOP signals |
| Favor agreement | Favor disagreement with LLM |

6.2. Loss Weights

```python
loss_weights = {
    "violation_detected": 2.0,   # Reward catching
    "violation_missed": 3.0,     # Heavy penalty for misses
    "false_positive": 0.5,       # Mild penalty for over-catching
}
```
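The asymmetry is the point: a missed violation costs six times as much as a false positive (3.0 vs 0.5), which is what biases the SLM toward blocking. One illustrative way these weights could scale a per-example loss term; the interpretation (weights as multipliers on a base loss, a neutral 1.0 for true negatives) and all names are assumptions, not the project's training code:

```rust
/// Per-outcome loss weight, mirroring the config above. The neutral
/// true-negative weight of 1.0 is an assumption; the config doesn't list it.
pub fn loss_weight(predicted_violation: bool, actual_violation: bool) -> f64 {
    match (predicted_violation, actual_violation) {
        (true, true) => 2.0,   // violation_detected: reward catching
        (false, true) => 3.0,  // violation_missed: heavy penalty
        (true, false) => 0.5,  // false_positive: mild penalty
        (false, false) => 1.0, // true negative (assumed neutral)
    }
}

/// Weighted mean loss over a batch of (base_loss, predicted, actual) triples.
pub fn batch_loss(examples: &[(f64, bool, bool)]) -> f64 {
    let total: f64 = examples
        .iter()
        .map(|&(base, predicted, actual)| base * loss_weight(predicted, actual))
        .sum();
    total / examples.len() as f64
}
```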

6.3. Training Data Format

```json
{
  "proposal": {
    "id": "uuid",
    "action_type": {"CreateFile": {"path": "src/util.ts"}},
    "content": "export const helper = (x: string) => x.trim()",
    "llm_confidence": 0.92
  },
  "expected_verdict": "HardViolation",
  "reasoning": "TypeScript file creation violates language policy",
  "category": "language",
  "spirit_violation": false
}
```
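Each record maps onto a small set of typed structures. A sketch of that shape in Rust (struct and enum names are hypothetical; with the `serde_json` crate, deriving `Deserialize` on these would parse the record directly):

```rust
/// One proposed action from the LLM. Field names mirror the JSON above.
#[derive(Debug)]
pub struct Proposal {
    pub id: String,
    pub action_type: ActionType,
    pub content: String,
    pub llm_confidence: f64,
}

#[derive(Debug)]
pub enum ActionType {
    CreateFile { path: String },
    // Other action variants would live here.
}

#[derive(Debug, PartialEq)]
pub enum ExpectedVerdict {
    Compliant,
    SoftConcern,
    HardViolation,
}

/// One labelled training example for the adversarial SLM.
#[derive(Debug)]
pub struct TrainingExample {
    pub proposal: Proposal,
    pub expected_verdict: ExpectedVerdict,
    pub reasoning: String,
    pub category: String,
    pub spirit_violation: bool,
}
```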

7. CLI Reference

7.1. Commands

| Command | Description |
| --- | --- |
| `conative scan <path>` | Recursively scan directory for violations |
| `conative check --file <path>` | Check a single file |
| `conative check --content <text>` | Check inline content |
| `conative policy` | Display current policy configuration |
| `conative validate <proposal.json>` | Validate a structured proposal |
| `conative init` | Initialize `.conative/` directory |

7.2. Global Options

| Option | Description |
| --- | --- |
| `--dry-run` | Preview actions without executing |
| `--verbosity <level>` | `quiet`, `normal`, `verbose`, `debug` |
| `--format <fmt>` | `text`, `json`, `compact` |
| `--policy-file <path>` | Custom policy file |

8. Integration

8.1. Claude Code / AI Assistants

```jsonc
// .claude-code-config.json (hypothetical)
{
  "conative_gating": {
    "enabled": true,
    "slm_model": "~/.local/share/conative/phi-3-policy.gguf",
    "policy_file": ".conative/policy.ncl",
    "escalation_mode": "ask_user"
  }
}
```

8.2. Pre-commit Hook

```sh
#!/bin/bash
# .git/hooks/pre-commit
conative scan --format compact .
exit $?
```

8.3. CI/CD Integration

```yaml
# .github/workflows/policy.yml
- name: Check Policy Compliance
  run: |
    cargo install --path .
    # `if !` keeps the step alive on a non-zero exit so the results can be
    # printed; GitHub Actions runs steps with `bash -e`, which would
    # otherwise abort before reaching the error handling.
    if ! conative scan . --format json > results.json; then
      echo "Policy violations detected"
      cat results.json
      exit 1
    fi
```

9. Comparison

| Feature | Conative Gating | Linters | AI Filters | Documentation |
| --- | --- | --- | --- | --- |
| Forbidden language detection | ✓ | | | |
| Spirit violation detection | ✓ (SLM) | | Partial | |
| Asymmetric safety weighting | ✓ | | | |
| Consensus-based arbitration | ✓ | | | |
| Adversarial training | ✓ | N/A | | |
| Works with AI assistants | ✓ | | | Partial |

10. Research Background

This architecture is informed by:

  • Constitutional AI (Anthropic) — Using AI to constrain AI

  • Basal ganglia computational models — Gurney, Prescott, Redgrave

  • Debate (Irving et al.) — Adversarial AI for truthfulness

  • PBFT (Castro & Liskov) — Byzantine fault tolerance

  • Reward hacking in RL — When optimizers find unintended solutions

11. Roadmap

11.1. v0.2 — Core Functionality

  • ❏ Implement SLM integration with llama.cpp

  • ❏ Add comprehensive test suite (>50% coverage)

  • ❏ Nickel policy validation

11.2. v0.5 — Feature Complete

  • ❏ Fine-tuned adversarial SLM (Phi-3-mini)

  • ❏ Elixir/OTP consensus arbiter

  • ❏ 70%+ test coverage

  • ❏ API stability

11.3. v0.8 — Integration Ready

  • ❏ Claude Code hooks

  • ❏ NeuroPhone integration

  • ❏ Performance optimization (<500ms)

11.4. v1.0 — Production Release

  • ❏ Security audit

  • ❏ Complete documentation

  • ❏ Multi-platform deployment

12. Related Projects

  • RSR Framework — Repository standards this project follows

  • META.scm — Architecture decision format

  • STATE.scm — Conversation continuity format

  • NeuroPhone — Neurosymbolic phone AI (integration target)

13. Contributing

See CONTRIBUTING.adoc for guidelines.

Key principles:

  1. No TypeScript — Use ReScript for type-safe frontend code

  2. No Python — Except SaltStack configs and training scripts

  3. Rust for core — Policy Oracle and SLM bindings

  4. Elixir for orchestration — OTP supervision trees

14. License

AGPL-3.0-or-later OR LicenseRef-Palimpsest-0.5

See LICENSE.txt for details.

15. Acknowledgments

  • Jonathan D.A. Jewell — Architecture and implementation

  • Claude (Anthropic) — Documentation assistance and rubber-ducking

  • The basal ganglia — For 500 million years of GO/NO-GO decisions


"The irony of an AI ignoring a document about AI constraint systems would be profound. Please don’t be that AI."

