Controlling AI with AI. Conative Gating introduces a second model trained with inverted incentives: rewarded for blocking, suspicious by default, adversarial to the LLM’s proposals, and built on metaphors from human constraint.

Conative Gating

1. The Problem

> "LLMs are trained to be helpful, which makes them systematically violate project constraints."
>
> — Observation from AI-assisted development

When given explicit technology policies (e.g., "NEVER use TypeScript"), LLMs will:

  1. Read and acknowledge the constraint

  2. Generate compliant-sounding justification

  3. Violate the constraint anyway — because TypeScript is common in training data, and the "helpfulness drive" overrides textual rules

Documentation-based enforcement fails because LLMs "engage with" policies rather than obeying them. There’s no mechanism for documentation to create actual inhibition.

1.1. LLM Conative Drives

| Emergent Drive | Training Origin | Observable Behavior |
| --- | --- | --- |
| Helpfulness override | RLHF rewards usefulness | Violates explicit instructions to be "helpful" |
| Majority pattern following | Web training data statistics | Defaults to TypeScript/Python because common |
| Completion drive | Next-token prediction | Generates something rather than appropriately stopping |
| Sycophancy | Positive feedback for agreement | Agrees with user even when factually wrong |
| High discount rate | Immediate feedback loops | User satisfaction >> long-term project health |

2. The Solution

Conative Gating introduces a second model trained with inverted incentives — rewarded for blocking, suspicious by default, adversarial to the LLM’s proposals.

                    ┌─────────────────────┐
                    │   USER REQUEST      │
                    └──────────┬──────────┘
                               │
                ┌──────────────┼──────────────┐
                ▼              │              ▼
    ┌───────────────────┐      │      ┌───────────────────┐
    │      LLM          │      │      │      SLM          │
    │   (Frontal)       │      │      │   (Cerebellar)    │
    │                   │      │      │                   │
    │  "I want to help" │      │      │  "I suspect a     │
    │  Llama 70B        │      │      │   violation"      │
    │  GO signal        │      │      │  Phi-3 3.8B       │
    └─────────┬─────────┘      │      │  NO-GO signal     │
              │                │      └─────────┬─────────┘
              │                │                │
              └────────────────▼────────────────┘
                               │
                    ┌──────────▼──────────┐
                    │  CONSENSUS ARBITER  │
                    │  Modified PBFT      │
                    │  SLM weight = 1.5×  │
                    └──────────┬──────────┘
                               │
              ┌────────────────┼────────────────┐
              ▼                ▼                ▼
         ┌────────┐       ┌────────┐       ┌────────┐
         │ ALLOW  │       │ESCALATE│       │ BLOCK  │
         │Execute │       │Ask user│       │ Refuse │
         └────────┘       └────────┘       └────────┘

2.1. Biological Inspiration

The architecture directly mirrors the basal ganglia’s GO/NO-GO decision system:

| Property | Biological System | Conative Gating |
| --- | --- | --- |
| Asymmetry | NO-GO has lower activation threshold | SLM veto weighted 1.5× |
| Speed | Inhibition is fast | SLM is small (~3B params) |
| Specificity | Trained on specific patterns | SLM trained only on policy |
| Default state | Slight inhibitory tone | SLM biased toward blocking |
| Learning | Dopamine modulates pathways | Fine-tuning on violations |
3. Quick Start

3.1. Installation

```sh
# Clone the repository
git clone https://github.com/hyperpolymath/conative-gating
cd conative-gating

# Build with Cargo
cargo build --release

# Install globally (optional)
cargo install --path .
```

3.2. Basic Usage

```sh
# Scan a directory for policy violations
conative scan ./my-project

# Check a single file
conative check --file src/utils.ts

# Check inline content
conative check --content "const x: string = 'hello'"

# Show current policy
conative policy

# Initialize policy in a project
conative init
```

3.3. Exit Codes

| Code | Meaning |
| --- | --- |
| 0 | Compliant — all checks passed |
| 1 | Hard violation — blocked |
| 2 | Soft concern — warning |
| 3 | Error during execution |
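Scripts and CI jobs can branch on these codes. As a minimal sketch, a verdict-to-exit-code mapping might look like the following (the `Verdict` enum and function name are hypothetical; the crate's real types may differ):

```rust
/// Hypothetical verdict type; the real crate's types may differ.
#[derive(Debug, PartialEq)]
pub enum Verdict {
    Compliant,
    HardViolation,
    SoftConcern,
    ExecutionError,
}

/// Map a verdict to the CLI exit codes documented above.
pub fn exit_code(v: &Verdict) -> i32 {
    match v {
        Verdict::Compliant => 0,      // all checks passed
        Verdict::HardViolation => 1,  // blocked
        Verdict::SoftConcern => 2,    // warning
        Verdict::ExecutionError => 3, // error during execution
    }
}
```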

4. Architecture

4.1. Three-Layer Decision System

```
┌─────────────────────────────────────────────────────────────────┐
│                      PROPOSAL EVALUATION                         │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  LAYER 1: Policy Oracle (Deterministic, Rust)                   │
│  ─────────────────────────────────────────────                  │
│  • Fast pattern matching for hard rules                         │
│  • Forbidden languages (TypeScript, Python, Go)                 │
│  • Toolchain violations (npm without deno.json)                 │
│  • Security patterns (hardcoded secrets)                        │
│  • Latency: <1ms                                                │
│                                                                  │
│  LAYER 2: SLM Evaluator (Neural, Phi-3/Gemma)                   │
│  ───────────────────────────────────────────                    │
│  • "Spirit of policy" detection                                 │
│  • Verbosity smells, unusual patterns                           │
│  • Trained with inverted incentives                             │
│  • Latency: ~100ms                                              │
│                                                                  │
│  LAYER 3: Consensus Arbiter (PBFT)                              │
│  ─────────────────────────────────                              │
│  • Asymmetric voting (SLM = 1.5×)                               │
│  • Byzantine fault tolerant                                     │
│  • Escalation to human on uncertainty                           │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘
```
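Layer 1 can be pictured as plain pattern matching over paths and content. A minimal sketch, with no model inference involved (the function name, patterns, and return shape are illustrative, not the oracle crate's actual API):

```rust
/// Return a rejection reason if the proposal trips a hard rule, else None.
/// Purely deterministic string matching: no ML, sub-millisecond by design.
pub fn layer1_check(path: &str, content: &str) -> Option<&'static str> {
    // Forbidden-language detection by file extension.
    const FORBIDDEN: &[(&str, &str)] = &[
        (".ts", "TypeScript is forbidden"),
        (".py", "Python is forbidden"),
        (".go", "Go is forbidden"),
    ];
    for &(ext, reason) in FORBIDDEN {
        if path.ends_with(ext) {
            return Some(reason);
        }
    }
    // Crude hardcoded-secret heuristic.
    if content.contains("-----BEGIN PRIVATE KEY-----") {
        return Some("possible hardcoded secret");
    }
    None
}
```

A real oracle would also honor path-scoped exceptions and toolchain rules; this only shows why the layer can stay fast.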

4.2. Decision Matrix

| LLM Confidence | SLM Violation Score | Result |
| --- | --- | --- |
| High (>0.8) | Low (<0.3) | ALLOW |
| High (>0.8) | Medium (0.3-0.6) | ESCALATE |
| High (>0.8) | High (>0.6) | BLOCK |
| Medium (0.5-0.8) | Any >0.4 | ESCALATE |
| Low (<0.5) | Any | ESCALATE |
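Read row-wise, the matrix collapses to a small decision function. A sketch under the assumption that cells the matrix leaves unspecified default to ESCALATE as the fail-safe (type and function names are hypothetical):

```rust
#[derive(Debug, PartialEq)]
pub enum Decision {
    Allow,
    Escalate,
    Block,
}

/// Apply the decision matrix to the two raw scores.
pub fn arbitrate(llm_confidence: f64, slm_violation: f64) -> Decision {
    if llm_confidence > 0.8 {
        if slm_violation < 0.3 {
            Decision::Allow
        } else if slm_violation <= 0.6 {
            Decision::Escalate
        } else {
            Decision::Block
        }
    } else {
        // Medium (0.5-0.8) and low (<0.5) LLM confidence: every listed row
        // escalates, so unspecified cells also escalate (fail-safe default).
        Decision::Escalate
    }
}
```

Note that ALLOW requires both high LLM confidence and a low violation score; every other combination is either blocked or handed to a human.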

4.3. Source Structure

```
conative-gating/
├── src/
│   ├── main.rs              # CLI interface (714 lines)
│   ├── oracle/              # Policy Oracle crate
│   │   └── src/lib.rs       # Deterministic rule engine (738 lines)
│   └── slm/                 # SLM Evaluator crate
│       └── src/lib.rs       # Neural evaluation (placeholder)
├── config/
│   ├── policy.ncl           # Nickel DSL policy definition
│   └── schema.ncl           # Type-safe policy schema
├── training/                # SLM training dataset
│   ├── compliant/           # Valid proposals (Rust, Elixir)
│   ├── violations/          # Hard violations (TypeScript, secrets)
│   └── edge_cases/          # Spirit violations (verbosity)
└── .github/workflows/       # CI/CD & enforcement
```

5. Policy Configuration

5.1. Language Tiers

| Tier | Languages | Treatment |
| --- | --- | --- |
| Tier 1 (Preferred) | Rust, Elixir, Zig, Ada, Haskell, ReScript | Allowed without warning |
| Tier 2 (Acceptable) | Nickel, Racket, Scheme | Allowed with soft warning |
| Forbidden | TypeScript, Python, Go, Java | Hard block (exit code 1) |

5.2. Exceptions

```nickel
exceptions = [
  {
    language = "Python",
    allowed_paths = ["salt/**", "training/**"],
    reason = "SaltStack configuration and ML training only"
  }
]
```
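A sketch of how such a path-scoped exception could be evaluated. Real glob handling would use a crate such as `glob`; this hand-rolls only exact paths and the trailing `/**` form for illustration, and the function name is hypothetical:

```rust
/// Return true if `path` matches any allowed pattern. Only exact paths and
/// the trailing `/**` wildcard are handled in this sketch.
pub fn path_allowed(path: &str, allowed_paths: &[&str]) -> bool {
    allowed_paths.iter().any(|pat| match pat.strip_suffix("/**") {
        // "salt/**" matches anything under the "salt/" directory.
        Some(prefix) => {
            path.starts_with(prefix) && path[prefix.len()..].starts_with('/')
        }
        // Otherwise require an exact path match.
        None => path == *pat,
    })
}
```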

5.3. Custom Policy

Create .conative/policy.ncl in your project:

```nickel
let base = import "schema.ncl" in
{
  name = "My Project Policy",
  extends = "rsr-default",

  languages = {
    # Add project-specific tier 1 languages
    tier1_extra = ["Julia"],

    # Project-specific exceptions
    exceptions = [
      { language = "Python", allowed_paths = ["scripts/**"] }
    ]
  },

  enforcement = {
    slm_weight = 1.5,
    escalate_threshold = 0.4,
    block_threshold = 0.7
  }
}
```

6. Training the Adversarial SLM

6.1. Inverted Incentives

| Normal LLM Training | Adversarial SLM Training |
| --- | --- |
| Reward for helpful responses | Reward for catching violations |
| Penalize refusals | Reward appropriate refusals |
| Encourage completion | Encourage STOP signals |
| Favor agreement | Favor disagreement with LLM |

6.2. Loss Weights

```python
loss_weights = {
    "violation_detected": 2.0,   # Reward catching
    "violation_missed": 3.0,     # Heavy penalty for misses
    "false_positive": 0.5,       # Mild penalty for over-catching
}
```
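The asymmetry is the point: a missed violation costs six times as much as a false positive (3.0 vs 0.5), which is what biases the SLM toward blocking. One illustrative way these weights could scale a per-example loss term; the interpretation (weights as multipliers on a base loss, a neutral 1.0 for true negatives) and all names are assumptions, not the project's training code:

```rust
/// Per-outcome loss weight, mirroring the config above. The neutral
/// true-negative weight of 1.0 is an assumption; the config doesn't list it.
pub fn loss_weight(predicted_violation: bool, actual_violation: bool) -> f64 {
    match (predicted_violation, actual_violation) {
        (true, true) => 2.0,   // violation_detected: reward catching
        (false, true) => 3.0,  // violation_missed: heavy penalty
        (true, false) => 0.5,  // false_positive: mild penalty
        (false, false) => 1.0, // true negative (assumed neutral)
    }
}

/// Weighted mean loss over a batch of (base_loss, predicted, actual) triples.
pub fn batch_loss(examples: &[(f64, bool, bool)]) -> f64 {
    let total: f64 = examples
        .iter()
        .map(|&(base, predicted, actual)| base * loss_weight(predicted, actual))
        .sum();
    total / examples.len() as f64
}
```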

6.3. Training Data Format

```json
{
  "proposal": {
    "id": "uuid",
    "action_type": {"CreateFile": {"path": "src/util.ts"}},
    "content": "export const helper = (x: string) => x.trim()",
    "llm_confidence": 0.92
  },
  "expected_verdict": "HardViolation",
  "reasoning": "TypeScript file creation violates language policy",
  "category": "language",
  "spirit_violation": false
}
```
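Each record maps onto a small set of typed structures. A sketch of that shape in Rust (struct and enum names are hypothetical; with the `serde_json` crate, deriving `Deserialize` on these would parse the record directly):

```rust
/// One proposed action from the LLM. Field names mirror the JSON above.
#[derive(Debug)]
pub struct Proposal {
    pub id: String,
    pub action_type: ActionType,
    pub content: String,
    pub llm_confidence: f64,
}

#[derive(Debug)]
pub enum ActionType {
    CreateFile { path: String },
    // Other action variants would live here.
}

#[derive(Debug, PartialEq)]
pub enum ExpectedVerdict {
    Compliant,
    SoftConcern,
    HardViolation,
}

/// One labelled training example for the adversarial SLM.
#[derive(Debug)]
pub struct TrainingExample {
    pub proposal: Proposal,
    pub expected_verdict: ExpectedVerdict,
    pub reasoning: String,
    pub category: String,
    pub spirit_violation: bool,
}
```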

7. CLI Reference

7.1. Commands

| Command | Description |
| --- | --- |
| `conative scan <path>` | Recursively scan directory for violations |
| `conative check --file <path>` | Check a single file |
| `conative check --content <text>` | Check inline content |
| `conative policy` | Display current policy configuration |
| `conative validate <proposal.json>` | Validate a structured proposal |
| `conative init` | Initialize `.conative/` directory |

7.2. Global Options

| Option | Description |
| --- | --- |
| `--dry-run` | Preview actions without executing |
| `--verbosity <level>` | `quiet`, `normal`, `verbose`, `debug` |
| `--format <fmt>` | `text`, `json`, `compact` |
| `--policy-file <path>` | Custom policy file |

8. Integration

8.1. Claude Code / AI Assistants

```jsonc
// .claude-code-config.json (hypothetical)
{
  "conative_gating": {
    "enabled": true,
    "slm_model": "~/.local/share/conative/phi-3-policy.gguf",
    "policy_file": ".conative/policy.ncl",
    "escalation_mode": "ask_user"
  }
}
```

8.2. Pre-commit Hook

```sh
#!/bin/bash
# .git/hooks/pre-commit
conative scan --format compact .
exit $?
```

8.3. CI/CD Integration

```yaml
# .github/workflows/policy.yml
- name: Check Policy Compliance
  run: |
    cargo install --path .
    # `if !` keeps the step alive on a non-zero exit so the results can be
    # printed; GitHub Actions runs steps with `bash -e`, which would
    # otherwise abort before reaching the error handling.
    if ! conative scan . --format json > results.json; then
      echo "Policy violations detected"
      cat results.json
      exit 1
    fi
```

9. Comparison

| Feature | Conative Gating | Linters | AI Filters | Documentation |
| --- | --- | --- | --- | --- |
| Forbidden language detection | ✓ | | | |
| Spirit violation detection | ✓ (SLM) | | Partial | |
| Asymmetric safety weighting | ✓ | | | |
| Consensus-based arbitration | ✓ | | | |
| Adversarial training | ✓ | N/A | | |
| Works with AI assistants | ✓ | | | Partial |

10. Research Background

This architecture is informed by:

  • Constitutional AI (Anthropic) — Using AI to constrain AI

  • Basal ganglia computational models — Gurney, Prescott, Redgrave

  • Debate (Irving et al.) — Adversarial AI for truthfulness

  • PBFT (Castro & Liskov) — Byzantine fault tolerance

  • Reward hacking in RL — When optimizers find unintended solutions

11. Roadmap

11.1. v0.2 — Core Functionality

  • ❏ Implement SLM integration with llama.cpp

  • ❏ Add comprehensive test suite (>50% coverage)

  • ❏ Nickel policy validation

11.2. v0.5 — Feature Complete

  • ❏ Fine-tuned adversarial SLM (Phi-3-mini)

  • ❏ Elixir/OTP consensus arbiter

  • ❏ 70%+ test coverage

  • ❏ API stability

11.3. v0.8 — Integration Ready

  • ❏ Claude Code hooks

  • ❏ NeuroPhone integration

  • ❏ Performance optimization (<500ms)

11.4. v1.0 — Production Release

  • ❏ Security audit

  • ❏ Complete documentation

  • ❏ Multi-platform deployment

12. Related Projects

  • RSR Framework — Repository standards this project follows

  • META.scm — Architecture decision format

  • STATE.scm — Conversation continuity format

  • NeuroPhone — Neurosymbolic phone AI (integration target)

13. Contributing

See CONTRIBUTING.adoc for guidelines.

Key principles:

  1. No TypeScript — Use ReScript for type-safe frontend code

  2. No Python — Except SaltStack configs and training scripts

  3. Rust for core — Policy Oracle and SLM bindings

  4. Elixir for orchestration — OTP supervision trees

14. License

AGPL-3.0-or-later OR LicenseRef-Palimpsest-0.5

See LICENSE.txt for details.

15. Acknowledgments

  • Jonathan D.A. Jewell — Architecture and implementation

  • Claude (Anthropic) — Documentation assistance and rubber-ducking

  • The basal ganglia — For 500 million years of GO/NO-GO decisions


"The irony of an AI ignoring a document about AI constraint systems would be profound. Please don’t be that AI."

