Hi Developers,
Recently, while applying SmoothQuant on my side (Qwen3-1.7B), I found that FC2 (down_proj in the Qwen-style definition) is not included in the smoothed layers. However, I observed that the static per-tensor scaling factor for this layer's input can become extremely large when no smoothing is applied:
Layer 0: {'q_proj_input': 0.009227362204724409, 'o_proj_input': 0.021776574803149606, 'gate_input': 0.010765255905511811, 'down_input': 0.15748031496062992}
Layer 1: {'q_proj_input': 0.008427657480314961, 'o_proj_input': 0.011441929133858268, 'gate_input': 0.015071358267716535, 'down_input': 1.236220472440945}
Layer 2: {'q_proj_input': 0.009781003937007874, 'o_proj_input': 0.018331692913385825, 'gate_input': 0.023375984251968504, 'down_input': 133.03937007874015}
...
Layer 26: {'q_proj_input': 0.03297244094488189, 'o_proj_input': 2.031496062992126, 'gate_input': 0.022637795275590553, 'down_input': 11.21259842519685}
Layer 27: {'q_proj_input': 0.03641732283464567, 'o_proj_input': 3.0078740157480315, 'gate_input': 0.035679133858267716, 'down_input': 23.433070866141733}
As you can see, down_input here refers to the per-tensor scale of the down_proj input (the counterpart of fc2 in OPT). Across the layers, this scale grows extremely large, which means the activation outliers at this input explode. Unsurprisingly, the final perplexity then becomes unacceptable (from ~16 for the original model to ~90 after quantization).
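For reference, this kind of per-tensor statistic can be collected with forward pre-hooks, e.g. as in the minimal sketch below. This is not my exact calibration script: the module names are taken from the HF Qwen3 implementation, the calibration texts are placeholders, and it assumes the common absmax / 127 convention for a static int8 per-tensor scale.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-1.7B"
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16).cuda().eval()
tokenizer = AutoTokenizer.from_pretrained(model_id)

absmax = {}  # running absolute max of each down_proj input over the calibration set

def make_hook(name):
    def hook(module, inputs):
        x = inputs[0].detach().float()
        absmax[name] = max(absmax.get(name, 0.0), x.abs().max().item())
    return hook

handles = [
    layer.mlp.down_proj.register_forward_pre_hook(make_hook(f"layer{i}.down_input"))
    for i, layer in enumerate(model.model.layers)
]

calib_texts = ["placeholder calibration sentence 1", "placeholder calibration sentence 2"]
with torch.no_grad():
    for text in calib_texts:
        ids = tokenizer(text, return_tensors="pt").input_ids.to(model.device)
        model(ids)

for h in handles:
    h.remove()

# static per-tensor int8 scale, assuming the absmax / 127 convention
scales = {name: v / 127.0 for name, v in absmax.items()}
print(scales)
```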
If smoothing could be applied to this layer as well, I believe the result would improve a lot. May I know whether you have tried implementing that? Many thanks!
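For concreteness, below is a minimal sketch of what I have in mind, not a definitive implementation. It assumes per-channel activation absmax statistics for the down_proj input have already been collected (per channel, unlike the per-tensor hook above), uses the HF Qwen3 module names, and applies the usual SmoothQuant formula with alpha = 0.5. Because down_proj's input is up_proj(x) * act_fn(gate_proj(x)), dividing the activation by s per channel can be folded exactly into up_proj's output rows, so no extra runtime op is needed.

```python
import torch

@torch.no_grad()
def smooth_down_proj(up_proj, down_proj, act_absmax, alpha=0.5, eps=1e-5):
    # act_absmax: [intermediate_size] per-channel absmax of down_proj's input,
    # on the same device as the weights.
    w_absmax = down_proj.weight.abs().max(dim=0).values.float().clamp(min=eps)  # per input channel
    s = (act_absmax.float().clamp(min=eps).pow(alpha) / w_absmax.pow(1.0 - alpha)).clamp(min=eps)
    s = s.to(down_proj.weight.dtype)
    # X' = X / s is absorbed into up_proj, whose output feeds the elementwise product.
    up_proj.weight.div_(s.view(-1, 1))
    if up_proj.bias is not None:
        up_proj.bias.div_(s)
    # W' = W * s per input channel, so that X' @ W'.T == X @ W.T exactly.
    down_proj.weight.mul_(s.view(1, -1))

# usage (act_absmax_per_layer collected per channel during calibration):
# for i, layer in enumerate(model.model.layers):
#     smooth_down_proj(layer.mlp.up_proj, layer.mlp.down_proj, act_absmax_per_layer[i])
```

The transform is mathematically exact before quantization; it only migrates the outlier magnitude from the down_proj input into the up_proj weights, which is the same idea SmoothQuant already uses for the other linear layers.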