Research Question
We study how massive activations (activations orders of magnitude larger than the median) emerge in LLMs, what function they serve within a sequence, and how they influence attention and bias terms. Specifically, we seek to answer:
How do massive activations influence the internal representations of LLMs?
How do massive activations impact text generation?
What causes massive activations to emerge in LLMs?
Owners
Tenghai, Shivam
Current results
We have systematically examined massive activations across multiple state-of-the-art models (LLaMA-2-7B, LLaMA-3.2-3B, DeepSeek-R1-Distill-Llama-8B) and found:
Massive activations predominantly occur at sequence boundaries and persist across multiple layers
Function words and punctuation show greater sensitivity to activation clipping than content words
Zeroing out these activations increases perplexity by 110-139x, particularly disrupting sentence boundaries
Massive activations serve as "attention anchors" that help maintain coherent text generation across context transitions
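As a concrete illustration of the identification step, a minimal detector can flag values far above the layer's median magnitude. This is a sketch in NumPy on a synthetic activation matrix; the 100x threshold and the `(seq_len, d_model)` layout are illustrative assumptions, not the project's exact criteria.

```python
import numpy as np

def find_massive_activations(hidden, ratio=100.0):
    """Flag activations whose magnitude exceeds `ratio` times the median
    absolute activation in the layer (hypothetical threshold).

    hidden: array of shape (seq_len, d_model) for one layer.
    Returns a list of (token_idx, dim_idx, value) tuples.
    """
    mags = np.abs(hidden)
    median = np.median(mags)
    idx = np.argwhere(mags > ratio * median)
    return [(int(t), int(d), float(hidden[t, d])) for t, d in idx]

# Synthetic example: one huge activation at the sequence start,
# mimicking the boundary-token pattern described above.
rng = np.random.default_rng(0)
h = rng.normal(0, 1, size=(8, 16))
h[0, 3] = 500.0  # planted massive activation
print(find_massive_activations(h))  # → [(0, 3, 500.0)]
```

In a real run, `hidden` would come from the model's per-layer hidden states, and the scan would repeat over layers to track where the massive values persist.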
We are currently preparing a research paper detailing these findings
Project Status
We have completed initial identification, intervention, and clipping experiments. Our work reveals that massive activations encode critical structural information that guides model reasoning, particularly at sequence boundaries. We are currently expanding the analysis to quantify emergence patterns during training and exploring probing of massive activations.
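The intervention and clipping experiments can be sketched as a single ablation function. The `ratio` and `cap_ratio` knobs are hypothetical; in practice, the ablated hidden states would be fed back through the remaining layers to measure the resulting perplexity change (the 110-139x increase reported above corresponds to the zeroing mode).

```python
import numpy as np

def ablate_massive(hidden, mode="zero", ratio=100.0, cap_ratio=10.0):
    """Ablate massive activations in one layer's hidden states (sketch).

    mode="zero": set massive values to 0 (the intervention experiment).
    mode="clip": cap them at cap_ratio * median magnitude (the clipping
    experiment). Thresholds here are illustrative assumptions.
    """
    out = hidden.copy()
    mags = np.abs(out)
    median = np.median(mags)
    mask = mags > ratio * median  # same criterion as identification
    if mode == "zero":
        out[mask] = 0.0
    elif mode == "clip":
        out[mask] = np.sign(out[mask]) * cap_ratio * median
    return out

# Synthetic check: only the planted massive value is touched.
rng = np.random.default_rng(1)
h = rng.normal(0, 1, size=(8, 16))
h[0, 0] = -800.0
zeroed = ablate_massive(h, mode="zero")
clipped = ablate_massive(h, mode="clip")
```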
Further Research
Apply probing techniques to massive activations: Implement linear and non-linear probes to identify what specific linguistic or structural features are encoded within massive activations. This could reveal whether these activations encode grammatical transitions, discourse markers, semantic boundaries, or other structural information. Probing results could also help develop interpretable metrics for measuring the functional role of these activations across different model architectures.
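A minimal linear probe along these lines could be trained directly on activation vectors to predict a binary structural label (e.g. "is this token a sentence boundary?"). The sketch below uses plain logistic regression via gradient descent; the label, data, and hyperparameters are illustrative assumptions.

```python
import numpy as np

def train_linear_probe(X, y, lr=0.1, epochs=500):
    """Logistic-regression probe (sketch): learn to predict a binary
    structural label y from activation vectors X of shape (n, d)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # sigmoid
        g = p - y                                # gradient of log loss
        w -= lr * (X.T @ g) / len(y)
        b -= lr * g.mean()
    return w, b

def probe_accuracy(w, b, X, y):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    return float(((p > 0.5) == y).mean())

# Toy sanity check: the label is determined by one dimension, standing in
# for a massive-activation dimension that encodes a boundary feature.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
y = (X[:, 2] > 0).astype(float)
w, b = train_linear_probe(X, y)
acc = probe_accuracy(w, b, X, y)
```

A non-linear probe (e.g. a small MLP) would follow the same interface; comparing linear vs. non-linear probe accuracy indicates whether the encoded feature is linearly accessible.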
Investigate emergence patterns in nanoGPT during training: Conduct a controlled study tracking the formation and evolution of massive activations throughout the training process of nanoGPT. By capturing activation statistics at regular checkpoints (e.g., every 1000 steps), we can identify precisely when and how these extreme activation values first emerge, how they evolve in magnitude and location, and what training dynamics might trigger their development. This controlled environment would allow us to correlate emergence patterns with improvements in specific capabilities like maintaining coherence across sentence boundaries.
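The per-checkpoint statistics could be as simple as the max-to-median magnitude ratio per layer, logged on a fixed eval batch. A sketch (the nanoGPT hook shown in the comment is hypothetical):

```python
import numpy as np

def activation_stats(hidden):
    """Summary stats to log at each training checkpoint (sketch):
    max magnitude, median magnitude, their ratio, and the location of
    the top activation. hidden: (seq_len, d_model) array for one layer."""
    mags = np.abs(hidden)
    med = float(np.median(mags))
    mx = float(mags.max())
    tok, dim = np.unravel_index(mags.argmax(), mags.shape)
    return {
        "max": mx,
        "median": med,
        "ratio": mx / med if med > 0 else float("inf"),
        "top_token": int(tok),
        "top_dim": int(dim),
    }

# Hypothetical use inside a nanoGPT training loop:
# if step % 1000 == 0:
#     stats = {layer: activation_stats(h) for layer, h in layer_outputs.items()}
#     history[step] = stats  # a climbing ratio signals emergence

# Sanity check on a synthetic layer.
h = np.ones((4, 4))
h[2, 1] = 50.0
s = activation_stats(h)
```

Plotting `ratio` against training step for each layer would show when the values first separate from the bulk distribution and whether their location stabilizes.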