Unclear raw input behavior for representable values #5

@kippesp

MAX_FLOAT (the largest finite float32 value) is exactly representable in both fp32 and fp64. In floathex form it is 0x1.fffffep+127; its raw bits are 0x7f7fffff as fp32 and 0x47efffffe0000000 as fp64. BiNums correctly converts either raw form to base 10 as 3.40282346638528859811704e+38.
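
The bit patterns quoted above can be cross-checked independently of BiNums. This is a minimal sketch using only Python's standard library; the helper names `f32_bits` and `f64_bits` are made up for illustration and are not part of BiNums.

```python
import struct

def f32_bits(x: float) -> int:
    """Return the IEEE 754 binary32 bit pattern of x as an integer."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def f64_bits(x: float) -> int:
    """Return the IEEE 754 binary64 bit pattern of x as an integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]

# Parse the floathex literal from the issue into a Python float (binary64).
max_float = float.fromhex("0x1.fffffep+127")

print(f"fp32 bits: 0x{f32_bits(max_float):08x}")
print(f"fp64 bits: 0x{f64_bits(max_float):016x}")
print(f"decimal:   {max_float:.23e}")
```

Because the value fits in 24 significant bits, packing it as binary32 loses nothing, which is why both widths print the same decimal.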

With the input command binums.exe 0x1.fffffep127, binums outputs the correct raw bits representations for fp64/32/16/etc. But the "As number" representation is (probably) correct only for fp64. Having floathex input imply fp64 seems fine, but it makes it difficult to see a floathex value represented as float32. It currently takes two steps: first floathex --> hex_fp32, then hex_fp32 --> fp32.

Should binums use each type's own raw bits as the input for that type's "As number" output?

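(bfloat16 also saturates differently from float16 in the example below: float16 becomes infinity, 0x7C00, while bfloat16 stays at its largest finite value, 0x7F7F. Subnormal handling is probably best discussed separately from the case where the value is representable in the different types.)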
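
The two bfloat16 patterns in question, 0x7F7F and 0x7F80, are what you get from the float32 bits by truncating versus rounding to nearest even. The sketch below only illustrates that arithmetic; whether binums actually narrows this way is an assumption, and the function names are hypothetical.

```python
import struct

def f32_bits(x: float) -> int:
    """Return the IEEE 754 binary32 bit pattern of x as an integer."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def bf16_truncate(bits32: int) -> int:
    """Narrow float32 bits to bfloat16 by dropping the low 16 bits."""
    return bits32 >> 16

def bf16_round_nearest_even(bits32: int) -> int:
    """Narrow float32 bits to bfloat16 with round-to-nearest-even.

    NaN inputs are not handled specially in this sketch.
    """
    bias = 0x7FFF + ((bits32 >> 16) & 1)
    return ((bits32 + bias) >> 16) & 0xFFFF

bits32 = f32_bits(float.fromhex("0x1.fffffep+127"))

print(f"truncate:           0x{bf16_truncate(bits32):04X}")
print(f"round-nearest-even: 0x{bf16_round_nearest_even(bits32):04X}")
```

With truncation the result stays at the largest finite bfloat16; with rounding it carries into the exponent and becomes infinity, which matches the float16 behavior.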

Current output (trimmed):

> Release\bin\binums.exe 0x1.fffffep127
Representations:
          type float64
       decimal 3.40282346638528859811704e+38
      floathex 0x1.fffffe0000000p+127
       raw hex 0x47EFFFFFE0000000
       raw oct 0o
       raw bin 0b0100011111101111111111111111111111100000000000000000000000000000
    fields bin frac:0b1111111111111111111111100000000000000000000000000000 exp:0b10001111110 sign:0b0

As raw bits:
       float16 0x7C00
      bfloat16 0x7F7F
       float32 0x7F7FFFFF
 ->    float64 0x47EFFFFFE0000000

As number:
       float16 0
      bfloat16 0
       float32 -36893488147419103232
 ->    float64 3.40282346638528859811704e+38
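
One possible explanation for the "As number" values above, offered as a guess rather than something confirmed from the binums source: they match what you get by reinterpreting the low 32 or low 16 bits of the 64-bit raw pattern in the narrower format. A short check under that assumption:

```python
import struct

raw64 = 0x47EFFFFFE0000000  # float64 raw bits from the output above

low32 = raw64 & 0xFFFFFFFF  # 0xE0000000
low16 = raw64 & 0xFFFF      # 0x0000

# Reinterpret the truncated bit patterns in each narrower format.
as_f32 = struct.unpack("<f", struct.pack("<I", low32))[0]
as_f16 = struct.unpack("<e", struct.pack("<H", low16))[0]
# bfloat16 is the top half of a float32, so pad with 16 zero bits.
as_bf16 = struct.unpack("<f", struct.pack("<I", low16 << 16))[0]

print(f"float16  {as_f16:.0f}")
print(f"bfloat16 {as_bf16:.0f}")
print(f"float32  {as_f32:.0f}")
```

Here 0xE0000000 decodes as a float32 with the sign bit set, a biased exponent of 192, and a zero fraction, which is -2^65.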

Proposed output (trimmed):

> Release\bin\binums.exe 0x1.fffffep127
Representations:
          type float64
       decimal 3.40282346638528859811704e+38
      floathex 0x1.fffffe0000000p+127
       raw hex 0x47EFFFFFE0000000
       raw oct 0o
       raw bin 0b0100011111101111111111111111111111100000000000000000000000000000
    fields bin frac:0b1111111111111111111111100000000000000000000000000000 exp:0b10001111110 sign:0b0

As raw bits:
       float16 0x7C00
      bfloat16 0x7F80
       float32 0x7F7FFFFF
 ->    float64 0x47EFFFFFE0000000

As number:
       float16 inf
      bfloat16 inf
       float32 3.40282346638528859811704e+38
 ->    float64 3.40282346638528859811704e+38
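
The proposed "As number" column is what results from decoding each type's own raw bits. The following is a rough sketch of that idea in Python, not binums code; the `decode_*` helpers and the `raw_bits` table are illustrative names only.

```python
import struct

def decode_f16(bits: int) -> float:
    return struct.unpack("<e", struct.pack("<H", bits))[0]

def decode_bf16(bits: int) -> float:
    # bfloat16 is the upper 16 bits of a float32 pattern.
    return struct.unpack("<f", struct.pack("<I", bits << 16))[0]

def decode_f32(bits: int) -> float:
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def decode_f64(bits: int) -> float:
    return struct.unpack("<d", struct.pack("<Q", bits))[0]

# Raw bits taken from the proposed "As raw bits" section.
raw_bits = {
    "float16": (decode_f16, 0x7C00),
    "bfloat16": (decode_bf16, 0x7F80),
    "float32": (decode_f32, 0x7F7FFFFF),
    "float64": (decode_f64, 0x47EFFFFFE0000000),
}

as_number = {name: decode(bits) for name, (decode, bits) in raw_bits.items()}

for name, value in as_number.items():
    print(f"{name:>10} {value:.23e}")
```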
