MX Quantization About Subnorm #35

Hi, great work! I have a question about how private_exp is clipped. Subnormal and normal values should use different quantization scales, so why is private_exp clipped to min_exp? I think it should be clipped to 1.0 instead.

As shown in the figure:
alpha = 2**(shared_exp - emax), where alpha is the scaling factor
private_exp = floor(log2(abs(A / alpha))).clip(min=1 or min_exp), where A is the input tensor
quantize_scale = 2**(private_exp - m)


        if exp_bits != 0:
            private_exp = torch.floor(torch.log2(torch.abs(A) + (A == 0).type(A.dtype)))

            # Original clipping (commented out):
            # The minimum representable exponent for 8 exp bits is -126
            # min_exp = -(2 ** (exp_bits - 1)) + 2
            # private_exp = private_exp.clip(min=min_exp)

            # Proposed clipping: the subnormal and normal parts have different scales
            # private_exp >= 1: normal scale
            # private_exp < 1: subnormal scale
            private_exp = private_exp.clip(min=1.0)
        else:
            private_exp = None
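
To make the two clipping choices easier to compare, here is a minimal pure-Python sketch of the element rounding described by the formulas above. It is not the library's implementation: `quantize_elem` is a hypothetical helper, the FP4 E2M1 parameters (exp_bits = 2, m = 1) are assumed for illustration, `m` is taken to mean the number of explicit mantissa bits, rounding is assumed to be half away from zero, and saturation to the largest normal value is left out.

```python
import math

def quantize_elem(a, m, clip_exp):
    """Round `a` (already divided by alpha) onto a grid with step 2**(private_exp - m).

    Hypothetical helper for illustration; `m` is assumed to be the number of
    explicit mantissa bits and `clip_exp` the lower bound applied to private_exp.
    """
    if a == 0:
        return 0.0
    private_exp = max(math.floor(math.log2(abs(a))), clip_exp)
    quantize_scale = 2.0 ** (private_exp - m)
    # Round half away from zero (an assumption made for this sketch).
    magnitude = math.floor(abs(a) / quantize_scale + 0.5) * quantize_scale
    return math.copysign(magnitude, a)

# Assumed element format: FP4 E2M1.
exp_bits, m = 2, 1
min_exp = -(2 ** (exp_bits - 1)) + 2  # same formula as the commented-out lines; 0 here

# Print each input with the result for clip(min=min_exp) and for clip(min=1).
for a in (0.3, 0.7, 1.3, 2.6):
    print(a, quantize_elem(a, m, min_exp), quantize_elem(a, m, 1))
```

Under these assumptions, 1.3 rounds to 1.5 when private_exp is clipped at min_exp (step 0.5) and to 1.0 when it is clipped at 1 (step 1.0). Inputs with magnitude 2 or more get the same result either way, so the two choices only differ for abs(A / alpha) < 2.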

![Image](https://github.com/user-attachments/assets/383e40fd-fe81-4825-a55c-077d61f7cfad)

[code](https://github.com/microsoft/microxcaling/blob/main/mx/elemwise_ops.py#L140)
[image](https://arxiv.org/abs/2310.16836)
