Research Question
We study how massive activations (activations orders of magnitude larger than the median) emerge in LLMs, what function they serve within a sequence, and how they influence attention and bias terms. Specifically, we seek to answer:
How do massive activations influence the internal representations of LLMs?
How do massive activations impact text generation?
What causes massive activations to emerge in LLMs?
Owners
Tenghai, Shivam
Current results
We have systematically examined massive activations across multiple state-of-the-art models (LLaMA-2-7B, LLaMA-3.2-3B, DeepSeek-R1-Distill-Llama-8B) and found:
Massive activations predominantly occur at sequence boundaries and persist across multiple layers
Function words and punctuation show greater sensitivity to activation clipping than content words
Zeroing out these activations increases perplexity by 110-139x, particularly disrupting sentence boundaries
Massive activations serve as "attention anchors" that help maintain coherent text generation across context transitions
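As a concrete illustration of the identification step, a minimal detector can flag values far above the layer's median magnitude. This is a sketch in NumPy on a synthetic activation matrix; the 100x threshold and the `(seq_len, d_model)` layout are illustrative assumptions, not the project's exact criteria.

```python
import numpy as np

def find_massive_activations(hidden, ratio=100.0):
    """Flag activations whose magnitude exceeds `ratio` times the median
    absolute activation in the layer (hypothetical threshold).

    hidden: array of shape (seq_len, d_model) for one layer.
    Returns a list of (token_idx, dim_idx, value) tuples.
    """
    mags = np.abs(hidden)
    median = np.median(mags)
    idx = np.argwhere(mags > ratio * median)
    return [(int(t), int(d), float(hidden[t, d])) for t, d in idx]

# Synthetic example: one huge activation at the sequence start,
# mimicking the boundary-token pattern described above.
rng = np.random.default_rng(0)
h = rng.normal(0, 1, size=(8, 16))
h[0, 3] = 500.0  # planted massive activation
print(find_massive_activations(h))  # → [(0, 3, 500.0)]
```

In a real run, `hidden` would come from the model's per-layer hidden states, and the scan would repeat over layers to track where the massive values persist.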
We are currently preparing a research paper detailing these findings
Project Status
We have completed initial identification, intervention, and clipping experiments. Our work reveals that massive activations encode critical structural information that guides model reasoning, particularly at sequence boundaries. We are currently expanding the analysis to quantify emergence patterns during training and exploring probing of massive activations.
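The intervention and clipping experiments can be sketched as a single ablation function. The `ratio` and `cap_ratio` knobs are hypothetical; in practice, the ablated hidden states would be fed back through the remaining layers to measure the resulting perplexity change (the 110-139x increase reported above corresponds to the zeroing mode).

```python
import numpy as np

def ablate_massive(hidden, mode="zero", ratio=100.0, cap_ratio=10.0):
    """Ablate massive activations in one layer's hidden states (sketch).

    mode="zero": set massive values to 0 (the intervention experiment).
    mode="clip": cap them at cap_ratio * median magnitude (the clipping
    experiment). Thresholds here are illustrative assumptions.
    """
    out = hidden.copy()
    mags = np.abs(out)
    median = np.median(mags)
    mask = mags > ratio * median  # same criterion as identification
    if mode == "zero":
        out[mask] = 0.0
    elif mode == "clip":
        out[mask] = np.sign(out[mask]) * cap_ratio * median
    return out

# Synthetic check: only the planted massive value is touched.
rng = np.random.default_rng(1)
h = rng.normal(0, 1, size=(8, 16))
h[0, 0] = -800.0
zeroed = ablate_massive(h, mode="zero")
clipped = ablate_massive(h, mode="clip")
```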
Further Research
Apply probing techniques to massive activations: Implement linear and non-linear probes to identify what specific linguistic or structural features are encoded within massive activations. This could reveal whether these activations encode grammatical transitions, discourse markers, semantic boundaries, or other structural information. Probing results could also help develop interpretable metrics for measuring the functional role of these activations across different model architectures.
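A minimal linear probe along these lines could be trained directly on activation vectors to predict a binary structural label (e.g. "is this token a sentence boundary?"). The sketch below uses plain logistic regression via gradient descent; the label, data, and hyperparameters are illustrative assumptions.

```python
import numpy as np

def train_linear_probe(X, y, lr=0.1, epochs=500):
    """Logistic-regression probe (sketch): learn to predict a binary
    structural label y from activation vectors X of shape (n, d)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # sigmoid
        g = p - y                                # gradient of log loss
        w -= lr * (X.T @ g) / len(y)
        b -= lr * g.mean()
    return w, b

def probe_accuracy(w, b, X, y):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    return float(((p > 0.5) == y).mean())

# Toy sanity check: the label is determined by one dimension, standing in
# for a massive-activation dimension that encodes a boundary feature.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
y = (X[:, 2] > 0).astype(float)
w, b = train_linear_probe(X, y)
acc = probe_accuracy(w, b, X, y)
```

A non-linear probe (e.g. a small MLP) would follow the same interface; comparing linear vs. non-linear probe accuracy indicates whether the encoded feature is linearly accessible.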
Investigate emergence patterns in nanoGPT during training: Conduct a controlled study tracking the formation and evolution of massive activations throughout the training process of nanoGPT. By capturing activation statistics at regular checkpoints (e.g., every 1000 steps), we can identify precisely when and how these extreme activation values first emerge, how they evolve in magnitude and location, and what training dynamics might trigger their development. This controlled environment would allow us to correlate emergence patterns with improvements in specific capabilities like maintaining coherence across sentence boundaries.
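The per-checkpoint statistics could be as simple as the max-to-median magnitude ratio per layer, logged on a fixed eval batch. A sketch (the nanoGPT hook shown in the comment is hypothetical):

```python
import numpy as np

def activation_stats(hidden):
    """Summary stats to log at each training checkpoint (sketch):
    max magnitude, median magnitude, their ratio, and the location of
    the top activation. hidden: (seq_len, d_model) array for one layer."""
    mags = np.abs(hidden)
    med = float(np.median(mags))
    mx = float(mags.max())
    tok, dim = np.unravel_index(mags.argmax(), mags.shape)
    return {
        "max": mx,
        "median": med,
        "ratio": mx / med if med > 0 else float("inf"),
        "top_token": int(tok),
        "top_dim": int(dim),
    }

# Hypothetical use inside a nanoGPT training loop:
# if step % 1000 == 0:
#     stats = {layer: activation_stats(h) for layer, h in layer_outputs.items()}
#     history[step] = stats  # a climbing ratio signals emergence

# Sanity check on a synthetic layer.
h = np.ones((4, 4))
h[2, 1] = 50.0
s = activation_stats(h)
```

Plotting `ratio` against training step for each layer would show when the values first separate from the bulk distribution and whether their location stabilizes.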