# activation-steering

Here are 4 public repositories matching this topic...

IBM / activation-steering

[ICLR 2025] General-purpose activation steering library

alignment steering refusal representation-engineering activation-steering llm-steering

Updated Sep 18, 2025
Python

MaxBelitsky / cache-steering

KV Cache Steering for Inducing Reasoning in Small Language Models

reasoning kv-cache large-language-models llm representation-engineering activation-steering reasoning-language-models cache-steering

Updated Jul 24, 2025
Python

JoschkaCBraun / adaptive-text-steering

Research Project using Activation Engineering for Topical Summarization.

topic summarization steering-behaviors language-model llm activation-steering

Updated Jul 11, 2025
Python

ashioyajotham / eval-awareness-research

Mechanistic interpretability experiments detecting "Evaluation Awareness" in LLMs - identifying if models internally represent being monitored

evaluations language-modelling ai-safety linear-probing mechanistic-interpretability llama3 activation-steering alignment-research

Updated Dec 31, 2025
Jupyter Notebook
