p.grad.data = self.grad_quant(p.grad.data * self.grad_scaling)
In OptimLP, the gradient is multiplied by the scaling factor before it is quantized. However, grad scaling is meant to prevent possible underflow of the low-precision quantized gradient values, and I think the current implementation cannot prevent that underflow: if grad_scaling is the reciprocal of a loss scale, the gradient is shrunk back to its original small magnitude before quantization, so small entries can still round to zero.
Maybe the correct implementation is to multiply by the scaling factor after quantization:
p.grad.data = self.grad_quant(p.grad.data) * self.grad_scaling
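
Here is a toy numerical sketch of what I mean (my own example, not QPyTorch code). It assumes grad_scaling is set to the reciprocal of a loss scale applied before backward(), and it uses a made-up fixed-point rounding function, toy_fixed_point_quant, in place of the real gradient quantizer. With the current order the small gradient entries round to zero; quantizing first and unscaling afterwards keeps them.

import torch

def toy_fixed_point_quant(x, frac_bits=8):
    # Made-up stand-in for the gradient quantizer: round-to-nearest
    # fixed point with `frac_bits` fractional bits, so anything below
    # 2 ** -(frac_bits + 1) in magnitude rounds to zero.
    scale = 2.0 ** frac_bits
    return torch.round(x * scale) / scale

loss_scale = 1024.0              # loss was multiplied by this before backward()
grad_scaling = 1.0 / loss_scale  # factor meant to undo the loss scale

true_grad = torch.tensor([1e-4, 3e-4])  # the "real" gradient values
scaled_grad = true_grad * loss_scale    # what backward() actually produces

# Current order: unscale first, then quantize -> entries underflow to zero.
current = toy_fixed_point_quant(scaled_grad * grad_scaling)

# Proposed order: quantize the still-large scaled gradient, then unscale.
proposed = toy_fixed_point_quant(scaled_grad) * grad_scaling

print(current)   # tensor([0., 0.])
print(proposed)  # roughly [1e-4, 3e-4], close to true_grad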