Instead of retraining an AI to sound like a customer service agent, I tried to find a simple "politeness dial" hidden inside its brain.
The Big Question
Is "customer service speak" a single, clean switch I can flip, or is it tangled up with everything else?
- The hype says prompt engineering works like magic.
- My hypothesis: it's just a clumsy way of nudging this hidden dial.
How I Tested It
I collected examples of that overly polite chatbot speak ("I'd be happy to assist you with that!") and examples of normal, direct speech, then calculated the difference between their "brain states" (hidden activations): like finding which lights blink differently when someone's being professionally nice.
This gave me a direction vector: a recipe for "customer service mode."
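Here's a minimal sketch of that difference-of-means step, assuming a Hugging Face causal LM; the model name, example sentences, and layer index are illustrative placeholders, not the exact ones used in this project.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "gpt2"  # placeholder model for illustration
LAYER = 6       # layer whose hidden states we read (see the layer sweep below)

tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, output_hidden_states=True)
model.eval()

polite = ["I'd be happy to assist you with that!",
          "Thank you so much for reaching out to us today!"]
direct = ["Fix the bug before Friday.",
          "The meeting is at 3pm."]

def mean_hidden(texts, layer):
    """Average the hidden state at `layer` over tokens, then over examples."""
    vecs = []
    for text in texts:
        ids = tok(text, return_tensors="pt")
        with torch.no_grad():
            out = model(**ids)
        # out.hidden_states[layer]: (1, seq_len, d_model) -> (d_model,)
        vecs.append(out.hidden_states[layer].mean(dim=1).squeeze(0))
    return torch.stack(vecs).mean(dim=0)

# The "customer service mode" direction: polite mean minus direct mean
steering_vector = mean_hidden(polite, LAYER) - mean_hidden(direct, LAYER)
```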
The Crucial Trick
At first, the recipe was too strong and broke the AI's brain, producing nonsense. I had to scale it down, like turning a volume knob to 0.3 instead of 11, to keep the AI working normally.
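One way to implement that knob, continuing the sketch above, is to normalize the direction and rescale it to a fraction of a typical activation norm; this norm-relative scheme is my own framing of the "0.3 instead of 11" idea, and the exact coefficient is a dial to tune, not a constant.

```python
ALPHA = 0.3  # "volume knob": full strength overwhelmed the model and produced nonsense

unit_direction = steering_vector / steering_vector.norm()
# Match the typical residual-stream magnitude at this layer, then dial it down
typical_norm = mean_hidden(direct, LAYER).norm()
scaled_vector = ALPHA * typical_norm * unit_direction
```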
Finding the Sweet Spot
Where you inject this vector matters:
- Layers 0-3: breaks grammar ("We apologies for the inconvenient")
- Layers 9-12: does nothing (too late)
- Layer 6: perfect. This is where tone has formed but hasn't become specific words yet: the semantic bottleneck. (A hook sketch follows this list.)
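Here's a sketch of the injection itself: add the scaled vector to one transformer block's output with a forward hook. The `model.transformer.h[i]` path is GPT-2-specific; other architectures name their layer stack differently, and the right layer index is whatever your own sweep finds.

```python
def make_steering_hook(vec):
    """Return a forward hook that adds `vec` to the block's hidden states."""
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        hidden = hidden + vec.to(hidden.dtype)  # broadcasts over batch and tokens
        if isinstance(output, tuple):
            return (hidden,) + output[1:]
        return hidden
    return hook

# Attach at the "semantic bottleneck" layer found by the sweep
handle = model.transformer.h[LAYER].register_forward_hook(
    make_steering_hook(scaled_vector)
)
```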
Proof It Actually Works
I injected the "customer service dial" into completely neutral prompts like "The login isn't working."
The AI reliably responded with "I sincerely apologize for the inconvenience you're experiencing with the login process. Let me help you resolve that right away."
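As a usage sketch (continuing the code above), generate from the neutral prompt with the hook attached, then remove it to get the stock model back:

```python
prompt = "The login isn't working."
ids = tok(prompt, return_tensors="pt")
with torch.no_grad():
    out_ids = model.generate(**ids, max_new_tokens=40, do_sample=False)
print(tok.decode(out_ids[0], skip_special_tokens=True))

handle.remove()  # detach the hook to restore the unmodified model
```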
- That's causal evidence: I can isolate and mechanically control a high-level concept without changing the model itself.