Is using the fp32 data format to simulate the MX data format not equivalent? #6

@jackiechen19940708

It's nice work, but I have a question.
The kernel uses the template parameter T=float (see the following code), so fp32 is used to:
(1) represent values in the MX data format, and
(2) simulate MX arithmetic operations.
So I think this may not be equivalent to a real MX data format representation and real MX operations. Have you used an FPGA to evaluate how much error there is between the fp32 simulation and the real MX data format?
template<typename T>
__global__ void quantize_mx_cuda_kernel(
    const T* __restrict__ input,
    const int scale_bits,
    const int elem_ebits,
    const int elem_mbits,
    const float elem_max_norm,
    const float* __restrict__ max_values,
    const long total_size,
    const int axis_size,
    const int post_axis_size,
    const bool flush_fp32_subnorms,
    const RoundingMode rounding_mode,
    T* __restrict__ output
) {
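To make the question concrete, here is a minimal host-side sketch of what I understand by "fp32 simulation": each value is rounded onto the element grid and then stored back in an ordinary float. This is my own illustration, not the repository's kernel. The helpers `quantize_elem` and `quantize_block`, the `elem_emax` parameter, and the scale rule (floor(log2(max)) minus the element's largest exponent) are assumptions, and only round-to-nearest-even is shown.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical helper (not from the repository): round x onto a float-like
// grid with `ebits` exponent bits and `mbits` mantissa bits, saturating at
// max_norm. The result is held in a plain fp32.
float quantize_elem(float x, int ebits, int mbits, float max_norm) {
    if (std::isnan(x) || x == 0.0f) return x;
    if (std::isinf(x)) return std::copysign(max_norm, x);
    const int bias = (1 << (ebits - 1)) - 1;
    const int min_exp = 1 - bias;            // smallest normal exponent
    int e = 0;
    std::frexp(x, &e);                       // x = m * 2^e with 0.5 <= |m| < 1
    const int exp = std::max(e - 1, min_exp);        // clamp so subnormals share one spacing
    const float step = std::ldexp(1.0f, exp - mbits); // grid spacing at this exponent
    // nearbyint uses the current rounding mode (ties-to-even by default).
    const float q = std::nearbyint(x / step) * step;
    return std::min(std::max(q, -max_norm), max_norm);
}

// Hypothetical helper: quantize n values that share one power-of-two scale.
// Assumed scale rule: 2^(floor(log2(max|x|)) - elem_emax).
void quantize_block(const float* in, float* out, int n,
                    int ebits, int mbits, int elem_emax, float max_norm) {
    float amax = 0.0f;
    for (int i = 0; i < n; ++i) amax = std::max(amax, std::fabs(in[i]));
    if (amax == 0.0f) {
        for (int i = 0; i < n; ++i) out[i] = 0.0f;
        return;
    }
    int e = 0;
    std::frexp(amax, &e);                    // floor(log2(amax)) == e - 1
    const float scale = std::ldexp(1.0f, (e - 1) - elem_emax);
    // Dividing and multiplying by a power of two is exact in fp32
    // (barring overflow or underflow), so only quantize_elem rounds.
    for (int i = 0; i < n; ++i)
        out[i] = quantize_elem(in[i] / scale, ebits, mbits, max_norm) * scale;
}
```

For an E2M1-like element (ebits=2, mbits=1, elem_emax=2, max_norm=6), the inputs {0.3, -1.3, 2.4, 5.1} come out as {0.5, -1.5, 2, 6}, which are all values on the element grid held in fp32. This sketch only covers the representation side, part (1) of my question. It says nothing about how products are accumulated on real hardware, which is part (2).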
