There are arguably only about 150 people who know how to train frontier models, and even fewer who truly understand the resulting artifacts. The goal of Mechanistic Interpretability (MI) is to help us approach the latter problem. But if the goal is to discover vulnerabilities, wouldn't we need many more eyeballs than a small pool of frontier AI researchers? What if we gave agent developers the tools to build risk-discovery tooling themselves? To argue by analogy: Tim Berners-Lee reduced (the unworkable) SGML to HTML, and the browser let users view source. Arguably, that ability to copy, paste, and edit HTML was a major scaling factor for the web. Can we do the same for MI?
We want to test this network-effects hypothesis. To that end, we at Krnel have open-sourced representation-engineering infrastructure we call “Policy neurons”, which achieves state-of-the-art results on detection and control for agent security. Could you or your team try it out and give us some feedback? It’s minimal for now, but we want to see what people think. You can see the details at
The graph is our attempt to democratize model risk discovery by:
We would value your feedback and comments.
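For readers unfamiliar with representation engineering, here is a minimal toy sketch of the general idea behind activation-based risk detection: fit a linear direction in a model's hidden-state space that separates "safe" from "risky" inputs, then score new activations by projecting onto it. This is an illustrative example on synthetic data, not Krnel's actual "Policy neurons" implementation; the dimensionality, threshold rule, and planted risk direction are all assumptions made up for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16  # toy hidden-state dimensionality (assumption, not a real model's)

# Synthetic activations: rows stand in for hidden states collected from
# "safe" vs "risky" prompts. We plant a synthetic risk signal on one axis.
safe = rng.normal(0.0, 1.0, size=(200, d))
risky = rng.normal(0.0, 1.0, size=(200, d))
risky[:, 3] += 4.0  # planted separation along coordinate 3

# Difference-of-means probe: the simplest linear "risk direction".
direction = risky.mean(axis=0) - safe.mean(axis=0)
direction /= np.linalg.norm(direction)

def risk_score(h):
    """Project a hidden state onto the probe direction."""
    return float(h @ direction)

# Midpoint threshold between the two class means along the direction.
threshold = ((risky @ direction).mean() + (safe @ direction).mean()) / 2

flags = [risk_score(h) > threshold for h in risky]
print(f"flagged {sum(flags)}/200 risky activations")
```

In a real setting the activations would come from a transformer's residual stream on labeled traces, and the probe could then be used both for detection (scoring) and for control (steering along or ablating the direction).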