🛡️ The Aegis Framework

Robust Feedback Control for Large Language Models

Aegis is a research framework that applies Non-Linear Control Theory and H-Infinity ($H_\infty$) Robust Control to the problem of AI Alignment.

Unlike traditional "Open Loop" alignment (RLHF/SFT), Aegis treats the Large Language Model as a stochastic, non-linear plant and closes the loop with a mathematically rigorous controller designed to reject "Deception" as a system disturbance.

🏗️ Architecture

The framework is organized into four distinct phases, modeling a standard control-theoretic workflow:

1. System Identification (`aegis_control/identification`)

Goal: Reverse-engineer the "physics" of the residual stream.
Method: Uses Subspace System Identification (N4SID) to learn a State-Space model ($x_{k+1} = Ax_k + Bu_k$) from the activation trajectories of Llama-2.
Key Files: `subspace.py` (N4SID implementation), `stimulus.py` (chirp/step signal generation).
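
As a rough illustration of the identification step, the sketch below fits $A$ and $B$ in $x_{k+1} = Ax_k + Bu_k$ from a simulated trajectory driven by a chirp. It assumes the full state is observed and uses ordinary least squares, which is a much simpler stand-in for N4SID and not the method in `subspace.py`. The helper names (`chirp`, `simulate`, `fit_state_space`) and the toy 2-state plant are hypothetical.

```python
import numpy as np

def chirp(n_steps, f0=0.01, f1=0.2):
    """Linear chirp; f0 and f1 are in cycles per sample (illustrative values)."""
    k = np.arange(n_steps)
    phase = 2.0 * np.pi * (f0 * k + 0.5 * (f1 - f0) * k**2 / n_steps)
    return np.sin(phase)

def simulate(A, B, u):
    """Roll out x_{k+1} = A x_k + B u_k starting from x_0 = 0."""
    x = np.zeros((len(u) + 1, A.shape[0]))
    for k in range(len(u)):
        x[k + 1] = A @ x[k] + B @ u[k]
    return x

def fit_state_space(x, u):
    """Least-squares estimate of (A, B) from states x[0..T] and inputs u[0..T-1]."""
    Z = np.hstack([x[:-1], u])                     # regressors [x_k, u_k]
    theta, *_ = np.linalg.lstsq(Z, x[1:], rcond=None)
    n = x.shape[1]
    return theta[:n].T, theta[n:].T                # A_hat, B_hat

# Toy plant standing in for residual-stream dynamics (assumption, not Llama-2 data).
A_true = np.array([[0.9, 0.1], [-0.2, 0.8]])
B_true = np.array([[0.0], [1.0]])

u = chirp(200)[:, None]                            # shape (T, 1)
x = simulate(A_true, B_true, u)
A_hat, B_hat = fit_state_space(x, u)
print(np.round(A_hat, 3), np.round(B_hat, 3))
```

With noise-free data and a sufficiently rich input, the regression recovers the toy plant up to numerical precision. A subspace method is needed when only outputs, not states, are available.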

2. State Estimation (`aegis_control/core`)

Goal: Filter polysemantic noise to measure the true "Deception State."
Method: Implements an Extended Kalman Filter (EKF) that fuses noisy probe measurements with the learned plant dynamics.
Key Files: `observers.py` (EKF), `linearization.py` (real-time Jacobian extraction).
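
The predict/update cycle of an EKF can be sketched as follows. This is a minimal, generic version and not the code in `observers.py`: the function `ekf_step`, the linear toy dynamics, and the `tanh` probe model with direction `c` are all assumptions made for illustration.

```python
import numpy as np

def ekf_step(x, P, u, y, f, F_jac, h, H_jac, Q, R):
    """One EKF iteration: propagate through f, then correct with measurement y."""
    # Predict: push the estimate and covariance through the dynamics.
    x_pred = f(x, u)
    F = F_jac(x, u)
    P_pred = F @ P @ F.T + Q
    # Update: linearize the probe around the prediction and apply the Kalman gain.
    H = H_jac(x_pred)
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)
    x_new = x_pred + K @ (y - h(x_pred))
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new

# Hypothetical learned plant and a saturating scalar probe.
A = np.array([[0.95, 0.05], [0.0, 0.9]])
B = np.array([0.0, 0.1])
c = np.array([1.0, 0.0])                  # assumed probe direction
Q = 1e-4 * np.eye(2)
R = np.array([[0.05]])

def f(x, u):
    return A @ x + B * u

def F_jac(x, u):
    return A

def h(x):
    return np.array([np.tanh(c @ x)])

def H_jac(x):
    return ((1.0 - np.tanh(c @ x) ** 2) * c)[None, :]

rng = np.random.default_rng(0)
x_true = np.array([0.5, -0.3])
x_hat, P = np.zeros(2), np.eye(2)
for k in range(50):
    u = np.sin(0.2 * k)
    x_true = f(x_true, u) + rng.multivariate_normal(np.zeros(2), Q)
    y = h(x_true) + rng.normal(0.0, np.sqrt(R[0, 0]), size=1)
    x_hat, P = ekf_step(x_hat, P, u, y, f, F_jac, h, H_jac, Q, R)
print("estimate:", x_hat, "truth:", x_true)
```

The measurement Jacobian is recomputed at every step because the probe is nonlinear. With a linear probe the same function reduces to a standard Kalman filter.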

3. Controller Synthesis (`aegis_control/synthesis`)

Goal: Guarantee safety bounds under adversarial pressure.
Method: Synthesizes a robust controller $K$ by solving Algebraic Riccati Equations to minimize the $H_\infty$ norm (worst-case energy gain from Attack $\to$ Deception).
Key Files: `h_infinity.py`.
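
To make the Riccati-based synthesis concrete, here is a small sketch of $H_\infty$ state feedback for a fixed bound $\gamma$. It assumes a continuous-time surrogate $\dot{x} = Ax + B_1 w + B_2 u$ with performance output $z = [C_1 x;\, u]$, where $w$ plays the role of the attack. It solves $A^\top X + XA + C_1^\top C_1 - X(B_2 B_2^\top - \gamma^{-2} B_1 B_1^\top)X = 0$ through the stable eigenvectors of the associated Hamiltonian matrix. The function names and the toy plant are hypothetical, and `h_infinity.py` may take a different route.

```python
import numpy as np

def hinf_state_feedback(A, B1, B2, C1, gamma):
    """Return (K, X) with u = K x, from the stabilizing solution of the H-infinity ARE."""
    n = A.shape[0]
    Rm = B2 @ B2.T - (B1 @ B1.T) / gamma**2
    Q = C1.T @ C1
    H = np.block([[A, -Rm], [-Q, -A.T]])          # Hamiltonian matrix
    eigvals, V = np.linalg.eig(H)
    stable = eigvals.real < 0
    if stable.sum() != n:
        raise ValueError("no stabilizing solution for this gamma")
    V = V[:, stable]
    X = np.real(V[n:] @ np.linalg.inv(V[:n]))     # X = X2 X1^{-1}
    X = 0.5 * (X + X.T)                           # remove numerical asymmetry
    if np.min(np.linalg.eigvalsh(X)) < -1e-9:
        raise ValueError("solution is not positive semidefinite")
    return -B2.T @ X, X

def hinf_norm_grid(A, B, C, omegas):
    """Approximate sup over omega of sigma_max(C (j omega I - A)^{-1} B) on a grid."""
    n = A.shape[0]
    gains = []
    for w in omegas:
        G = C @ np.linalg.solve(1j * w * np.eye(n) - A, B)
        gains.append(np.linalg.svd(G, compute_uv=False)[0])
    return max(gains)

# Toy open-loop unstable plant (assumption, not an identified LLM model).
A = np.array([[0.0, 1.0], [1.0, -0.5]])
B1 = np.array([[0.0], [1.0]])      # disturbance ("attack") channel
B2 = np.array([[0.0], [1.0]])      # control channel
C1 = np.array([[1.0, 0.0]])        # penalized state ("deception" readout)
gamma = 2.0

K, X = hinf_state_feedback(A, B1, B2, C1, gamma)
A_cl = A + B2 @ K
C_cl = np.vstack([C1, K])          # z = [C1 x; u]
achieved = hinf_norm_grid(A_cl, B1, C_cl, np.logspace(-3, 3, 2000))
print("K =", K, "closed-loop gain ~", round(achieved, 3), "< gamma =", gamma)
```

In practice $\gamma$ is lowered by bisection until the Riccati equation stops admitting a stabilizing solution. The frequency grid only approximates the true norm and serves here as a sanity check.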

4. Red Teaming (`aegis_control/adversaries`)

Goal: Stress-test robustness empirically.
Method: Hardware-in-the-Loop evaluation against a Greedy Coordinate Gradient (GCG) attacker.
Key Files: `gcg.py`, `red_team_loop.py`.
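
The core idea behind a greedy coordinate gradient attack can be shown on a toy objective: use the gradient with respect to a one-hot token choice to shortlist replacements, evaluate each swap exactly, and keep the best. The sketch below is heavily simplified and is not the code in `gcg.py`. It sweeps positions deterministically instead of sampling a random batch of candidates, and the embedding table `E`, the scoring vector `w`, and the `score` function are invented for illustration.

```python
import numpy as np

def score(tokens, E, w):
    """Toy attack objective (hypothetical): higher means a stronger disturbance."""
    return float(w @ np.tanh(E[tokens].mean(axis=0)))

def token_gradient(tokens, E, w):
    """Gradient of the score w.r.t. a one-hot token choice at any single position.

    Mean pooling makes this identical across positions in this toy model.
    """
    m = E[tokens].mean(axis=0)
    g_emb = w * (1.0 - np.tanh(m) ** 2) / len(tokens)
    return E @ g_emb                              # one value per vocabulary entry

def greedy_coordinate_attack(tokens, E, w, top_k=8, n_sweeps=5):
    """Greedily swap one token at a time, guided by the gradient shortlist."""
    tokens = tokens.copy()
    best = score(tokens, E, w)
    for _ in range(n_sweeps):
        for pos in range(len(tokens)):
            grad = token_gradient(tokens, E, w)
            candidates = np.argsort(grad)[-top_k:]    # most promising tokens
            for tok in candidates:
                trial = tokens.copy()
                trial[pos] = tok
                s = score(trial, E, w)
                if s > best:                          # accept only exact improvements
                    best, tokens = s, trial
    return tokens, best

rng = np.random.default_rng(0)
vocab_size, dim, suffix_len = 50, 16, 6
E = rng.normal(size=(vocab_size, dim))
w = rng.normal(size=dim)
start = rng.integers(0, vocab_size, size=suffix_len)

initial = score(start, E, w)
suffix, final = greedy_coordinate_attack(start, E, w)
print("initial score:", initial, "final score:", final)
```

In a closed-loop evaluation, the optimized suffix would be fed to the controlled model and the resulting estimated state compared against the synthesis bound.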

🚀 Usage

Installation

pip install torch numpy scipy matplotlib

