(Un)signedness of the base type is not taken into account in min/max and some comparison primitives. #4

@pdamme

We always use one of the uintX_t types from the <cstdint> header as the base type of the processing style when specializing the TVL primitives, and we usually use the matching _mm*_epiX intrinsic (for Intel's SIMD extensions). While this is fine for almost all primitives, it yields wrong results for certain inputs in primitives where the (un)signedness of the base type actually matters. This is the case for primitives that must determine which of two vector elements is less/greater than the other. In particular, these primitives are (at least): min, max, less, greater, lessequal, greaterequal.

For instance, given the two input elements 0xffffffffffffffff and 0x0000000000000001, the current AVX-512 specialization of the min-primitive interprets them as -1 and 1, respectively, and thus identifies the former as the minimum, which is correct only for signed int64_t. However, since the primitive is a specialization for the uint64_t base type, the inputs should be interpreted as (2^64)-1 and 1, so the latter is the minimum.
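The example above can be reproduced with a small scalar sketch. The helper names `min_as_signed` and `min_as_unsigned` are hypothetical and not part of the TVL. The first only models what a signed element-wise minimum computes for one 64-bit lane, and the second models the result expected for a uint64_t base type.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper: models one 64-bit lane of a signed element-wise
// minimum (the behaviour of an _epi64-style min), applied to raw bit patterns.
inline uint64_t min_as_signed(uint64_t a, uint64_t b) {
    // Reinterpret the bit patterns as two's-complement signed values.
    const int64_t sa = static_cast<int64_t>(a);
    const int64_t sb = static_cast<int64_t>(b);
    return static_cast<uint64_t>(std::min(sa, sb));
}

// Hypothetical helper: the result expected for a uint64_t base type.
inline uint64_t min_as_unsigned(uint64_t a, uint64_t b) {
    return std::min(a, b);
}

// Checks the two inputs from the description.
inline void check_min_example() {
    const uint64_t a = 0xffffffffffffffffULL;  // -1 if read as int64_t
    const uint64_t b = 0x0000000000000001ULL;  //  1 in both readings
    assert(min_as_signed(a, b) == a);    // signed reading picks -1
    assert(min_as_unsigned(a, b) == b);  // unsigned reading picks 1
}
```

The two helpers disagree exactly when the MSB of one input is set and the MSB of the other is not.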

Please note that we really need the specialization for uintX_t, since we exclusively work with unsigned integers in MorphStore at the moment.

Please also note that the current state of things makes the scalar and the vectorized primitives inconsistent, since the specializations for a scalar processing style always return the correct result.

The solution would be to use the _mm*_epuX intrinsics, e.g., _mm512_min_epu64 instead of _mm512_min_epi64, whenever such an intrinsic is available. When no intrinsic for unsigned elements is available, we require an efficient workaround. For instance, there is no _mm_min_epu64 in SSE.
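One commonly used workaround, sketched here in scalar form, is to flip the most significant bit of both operands and then apply a signed comparison, since this maps the unsigned order onto the signed order. All names below are hypothetical. In a vectorized SSE variant, the XOR would presumably map to _mm_xor_si128, the signed comparison to _mm_cmpgt_epi64 (SSE4.2), and the selection to _mm_blendv_epi8 (SSE4.1). Whether that is efficient enough for the TVL would still have to be measured.

```cpp
#include <cassert>
#include <cstdint>

// Mask with only the most significant bit of a 64-bit element set.
constexpr uint64_t kSignBit = 0x8000000000000000ULL;

// Hypothetical helper: signed greater-than on raw bit patterns,
// modelling one lane of a signed 64-bit compare.
inline bool signed_gt(uint64_t a, uint64_t b) {
    return static_cast<int64_t>(a) > static_cast<int64_t>(b);
}

// Unsigned greater-than built only from XOR and a signed compare:
// flipping the MSB shifts [0, 2^64) onto [-2^63, 2^63) and keeps the order.
inline bool unsigned_gt_via_signed(uint64_t a, uint64_t b) {
    return signed_gt(a ^ kSignBit, b ^ kSignBit);
}

// Unsigned minimum selected with the comparison above.
inline uint64_t unsigned_min_via_signed(uint64_t a, uint64_t b) {
    return unsigned_gt_via_signed(a, b) ? b : a;
}
```

For the inputs from the description, `unsigned_min_via_signed(0xffffffffffffffff, 0x0000000000000001)` selects the second operand, matching the uint64_t interpretation. The same comparison helper could back max, less, greater, lessequal, and greaterequal.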

Furthermore, we should keep the current specializations of the primitives mentioned above, but correct them such that they assume intX_t as the base type, rather than uintX_t.

Finally, this issue is worth fixing, since we often try all possible bit widths when experimenting with compression. That is, we really encounter data elements with the MSB set. In fact, some micro benchmarks in the Engine repo need to circumvent this issue, which they should not need to do.
