Description
Opening this issue to track the discussion of the cast operator in PR #478.
(raised by @wacky6 in Chromium CL-5056249 review)
Casting between fp <-> int or fp <-> fp is well understood. But what about the casting behavior for fp / int <-> uint?
Would -1.23 fp32 cast to 0 or 255 uint8?
The spec PR says cast is implementation-defined, which isn't ideal. We should at least describe what callers should expect.
> what about the casting behavior for fp / int <-> uint?
That's a trickier case for conformance: casting from a wider range into a narrower one, where different hardware gives different results. On CPU, using SSE vs the classic FPU could return different results. On GPU, you could get different results depending on whether your GPU supports typed UAVs or native structured UAVs. On NPU, I don't even know yet.
> Would -1.23 fp32 cast to 0 or 255 uint8?
I can say that locally, for -1.0f -> uint8, I get 255 on CPU via C++ static_cast<uint8_t>, but 0 on my GPU: -1.0f is mapped to int32_t -1, which then clamps to [0, 255] when written out to the typed UAV (since uint8_t is not a natively supported type within HLSL). For -1.0f -> uint16_t it's a similar story: 0xFFFF on CPU but 0 on GPU. Though if I tried this on a GPU with D3D12_FEATURE_DATA_D3D12_OPTIONS4::Native16BitShaderOpsSupported true, I might well get 0xFFFF instead.
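To make the two paths concrete, here is a small C++ sketch that reproduces both behaviors on the host. The GPU clamp-on-store is emulated with std::clamp; the direct cast is formally undefined behavior in C++ for out-of-range inputs, so the 255 is an observed x86 result, not a guarantee:

```cpp
#include <algorithm>
#include <cstdint>
#include <iostream>

int main() {
    float v = -1.0f;

    // Direct narrowing conversion, as on the CPU path above. Formally UB in
    // C++ for out-of-range values; on x86 the compiler typically converts
    // through int32 and keeps the low byte, yielding the 255 observed above.
    uint8_t cpu_style = static_cast<uint8_t>(v);

    // The GPU path described above: convert to int32 first, then clamp to
    // [0, 255] on store to the typed UAV (emulated here with std::clamp,
    // since uint8 is not a native HLSL type).
    int32_t wide = static_cast<int32_t>(v);                            // -1
    uint8_t gpu_style =
        static_cast<uint8_t>(std::clamp<int32_t>(wide, 0, 255));      // 0

    std::cout << +cpu_style << " vs " << +gpu_style << "\n";  // e.g. "255 vs 0"
}
```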
So, if we require an always-consistent answer in the spec for negative float -> uint cases, then we'd need some intermediate casts; a sketch of that approach follows below. Surprisingly though, this issue evidently hasn't come up so far in the DML EP. Want to open an issue for it?
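For illustration, a minimal sketch of what such an intermediate-cast (saturating) definition could look like. The helper cast_f32_to_u8 and its NaN policy are hypothetical, not anything the spec currently requires:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <iostream>

// Hypothetical helper illustrating the "intermediate casts" idea: clamp into
// the destination range (and pick a fixed NaN policy) before truncating, so
// every backend produces the same answer.
uint8_t cast_f32_to_u8(float v) {
    if (std::isnan(v)) return 0;       // assumed NaN policy; the spec would decide
    v = std::clamp(v, 0.0f, 255.0f);   // saturate into the representable range
    return static_cast<uint8_t>(v);    // in-range conversion is well defined
}

int main() {
    std::cout << +cast_f32_to_u8(-1.23f) << "\n";  // 0, on every backend
    std::cout << +cast_f32_to_u8(300.5f) << "\n";  // 255
}
```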
Type casting from floating-point to signed/unsigned integer is a complex process, and one without a clear industry standard, because the cast is both lossy and range-dependent. It is generally treated as "undefined" behavior whose outcome may depend on a number of runtime factors, including hardware support. This is part of the reason why no other framework to date attempts to define it concretely; they all leave part of it implementation-dependent. The same applies to WebNN.
Another open question: if the behavior depends on hardware internals, would it cause fingerprinting issues?