Hi team! I'm trying to replicate the experiments in [1], so that once I have that set up, I can run some of my own to try out some ideas with MX formats. First of all, I have a couple of operational questions:
- Let's say my configuration is MXFP8 (E4M3) for activations and weights, and BF16 for accumulation and elementwise ops. I'm going to try the PTQ approach: finetune a bit in MX, then evaluate my test model (BERT). In that case, if I start from a pre-trained model from Hugging Face, should I keep the model in fp32 and let the library run the BF16 and MX quantization steps, or do I have to (or can I) use bf16 natively, as `transformers` allows in its `TrainingArguments` (see the first sketch below)?
- Same question if I were to evaluate direct-cast: should I start from the pretrained model, finetune in FP32, and then evaluate with MX? (See the second sketch below.)
- Before picking microxcaling as my modelling library, I found that there's some work in pytorch/ao to add MX support (see [2], and the third sketch below). What's your take on that? I get the feeling that, rather than modelling MX with numerical accuracy, it's more inclined toward production support, i.e. running MX on specific hardware (like Blackwell's tensor cores).
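
For the first question, here's roughly the setup I have in mind; a minimal sketch where the `mx_specs` keys follow the microxcaling README, and the exact spec values are my guess for the config above:

```python
from transformers import TrainingArguments
from mx import mx_mapping, finalize_mx_specs

# Intended config: MXFP8 (E4M3) for weights and activations,
# BF16 for elementwise ops and accumulation.
mx_specs = finalize_mx_specs({
    'w_elem_format': 'fp8_e4m3',
    'a_elem_format': 'fp8_e4m3',
    'block_size': 32,
    'bfloat': 16,  # BF16 vector/elementwise ops
})

# Replace torch ops with their MX-simulating counterparts
# before the model is built and finetuned.
mx_mapping.inject_pyt_ops(mx_specs)

# Option A: leave the HF checkpoint in fp32 and let microxcaling
# handle the BF16/MX casts internally.
args_fp32 = TrainingArguments(output_dir='out')

# Option B: native bf16 through transformers.
args_bf16 = TrainingArguments(output_dir='out', bf16=True)
```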
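
And for the direct-cast variant, what I'd do is skip the MX finetuning step entirely; again a sketch using the same assumed spec keys:

```python
from transformers import AutoModelForSequenceClassification
from mx import mx_mapping, finalize_mx_specs

mx_specs = finalize_mx_specs({
    'w_elem_format': 'fp8_e4m3',
    'a_elem_format': 'fp8_e4m3',
    'block_size': 32,
    'bfloat': 16,
})

# Inject before instantiating the model so its ops are the
# MX-simulating versions.
mx_mapping.inject_pyt_ops(mx_specs)

# Direct-cast: load the pretrained (or FP32-finetuned) checkpoint
# and go straight to evaluation, with no finetuning in MX.
model = AutoModelForSequenceClassification.from_pretrained('bert-base-uncased')
model.eval()
```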
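
For context on the pytorch/ao side, the usage I'm looking at is the tensor-level path in the prototype; this is sketched from its README as I read it, so `MXTensor.to_mx` / `to_dtype` and their arguments are my assumption about the current API:

```python
import torch
from torchao.prototype.mx_formats.mx_tensor import MXTensor

x = torch.randn(128, 128, device='cuda', dtype=torch.bfloat16)

# Quantize to MXFP8 (E4M3) with block size 32, then dequantize
# to look at the simulated numerics.
x_mx = MXTensor.to_mx(x, torch.float8_e4m3fn, 32)
x_hp = x_mx.to_dtype(torch.bfloat16)
```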
Thanks! Great work, and the papers are super good reads.
[1] B. D. Rouhani et al., "Microscaling Data Formats for Deep Learning," arXiv:2310.10537, Oct. 2023, doi: 10.48550/arXiv.2310.10537.
[2] https://github.com/pytorch/ao/tree/main/torchao/prototype/mx_formats