Skip to content

n bit integer values #4

@AaronGoldman

Description

@AaronGoldman

Proposal int0 = -1 and int1 = {-1, 0}

As we encode integers in different formats it makes sence to have types that can restrict the domain to a fixed number of bits
These subsets of integers can be referred to as the n bit integers and come in signed and unsigned varieties. Some examples of integers sizes that may make sence are the powers of 2, the vint sizes, and the utf-8 sizes.

raw
| unsigned integer                  | signed integer                                    |
| name | min |                  max | name |                  min |                 max |
| u0   |   0 |                    0 | i0   |                 -1.0 |                -1.0 |
| u1   |   0 |                    1 | i1   |                   -1 |                   0 |
| u2   |   0 |                    3 | i2   |                   -2 |                   1 |
| u4   |   0 |                   15 | i4   |                   -8 |                   7 |
| u8   |   0 |                  255 | i8   |                 -128 |                 127 |
| u16  |   0 |                65535 | i16  |               -32768 |               32767 |
| u32  |   0 |           4294967295 | i32  |          -2147483648 |          2147483647 |
| u64  |   0 | 18446744073709551615 | i64  | -9223372036854775808 | 9223372036854775807 |

vint
| unsigned integer                  | signed integer                                    |
| name | min |                  max | name |                  min |                 max |
| u7   |   0 |                  127 | i7   |                  -64 |                  63 |
| u14  |   0 |                16383 | i14  |                -8192 |                8191 |
| u21  |   0 |              2097151 | i21  |             -1048576 |             1048575 |
| u28  |   0 |            268435455 | i28  |           -134217728 |           134217727 |
| u35  |   0 |          34359738367 | i35  |         -17179869184 |         17179869183 |
| u42  |   0 |        4398046511103 | i42  |       -2199023255552 |       2199023255551 |
| u49  |   0 |      562949953421311 | i49  |     -281474976710656 |     281474976710655 |
| u56  |   0 |    72057594037927935 | i56  |   -36028797018963968 |   36028797018963967 |
| u64  |   0 | 18446744073709551615 | i64  | -9223372036854775808 | 9223372036854775807 |

utf8
| unsigned integer                  | signed integer                                    |
| name | min |                  max | name |                  min |                 max |
| u7   |   0 |                  127 | i7   |                  -64 |                  63 |
| u11  |   0 |                 2047 | i11  |                -1024 |                1023 |
| u16  |   0 |                65535 | i16  |               -32768 |               32767 |
| u21  |   0 |              2097151 | i21  |             -1048576 |             1048575 |
| u26  |   0 |             67108863 | i26  |            -33554432 |            33554431 |
| u32  |   0 |           4294967295 | i32  |          -2147483648 |          2147483647 |

The table of n bit integers can be derived formulaically.
for unsigned ints the min is 0 and the max is (2 ^ n) - 1
we steal one of the positive numbers to have a representation for 0
leaving us with the range [0, (2 ^ n) - 1]

for the signed integers we first split the numbers in half leaving just 2 ^ (n-1) numbers for each sign.
once again we steal one of the positive numbers for the zero
leaving us with the range [-(2 ^ (n - 1)), (2 ^ (n - 1)) - 1]

where this gets interesting is the case of i1 witch can only 2 values once we divide the space in two there is only a single positive value and we steal it to make the 0 leaving us with the range [-1, 0]
even weirder is the case of the stateless zero bit integers
for u0 it is straightforward 2^0-1 = 0 so the only value is 0 and the range is [0, 0]
for the i0 the naive formula would render the i0max as 1/2 and the min at -1/2 theas are not integers and not equal thus not acceptable answers for the value of i0 so we extended the formula with a integer deviation by one just to force range back into the integers giving us the range of [-1, -1] witch is a single value needing no memory and is a signed integer thuse a acceptable value for i0. As a benofit it fits with the intuition of splitting the range between the positive and negative numbers then failing to steal the zero as there is no positive value to take.

#practical concerns

The fact that the. formula is not in and of itself a compelling reson for i1 to be [-1, 0] but in the hardwere we used the fact that the first bit of a signed 2s complement number is set to test that the number is negative. By violating this property for i1 we risk making it a special case that not only the language designer but also the users must deal with whenever that have a signed number that they wish to test the sign of.
The case is less compelling for i0 being -1 as it has no first bit as it is a no memory type and would be used as part of constant propagation. That said I still think it might make generics a little cleaner to have u0 and i0 not be equal as a signed and unsigned value are only equal when the bit patterns match and and the first bit is not set on either number. As there is no first bit the first bits can't be 0 and the number cannot be equal.

#union types
one nisity we get from i0 = -1 = True and u0 = 0 = False is that i1 is i0u0
In general if we have a iN and a uN we need to go up a bit to represent is to i(N+1) this symatery is only preserved if i0 and u0 are different values.

This way all signed ints. and unsigned int can

i0 = -1 = True
u0 = 0 = False
i1 = {-1, 0} = bool

Python code for the table

iname = lambda n: 'i' + str(n)
imin  = lambda n: -(2**(n - 1)) // 1
imax  = lambda n: (2**(n - 1) - 1) // 1
uname = lambda n: 'u' + str(n)
umin  = lambda n: 0
umax  = lambda n: (2**n) - 1


line  = lambda n: '| {:4} | {:3} | {:20} | {:4} | {:20} | {:19} |'.format(
    uname(n), umin(n), umax(n), iname(n), imin(n), imax(n))
header = """| unsigned integer                  | signed integer                                    |
| name | min |                  max | name |                  min |                 max |
"""

print('raw')
# 0 byte  i0 
# 1 byte  i1  0000 000x
# 1 byte  i2  0000 00xx
# 1 byte  i4  0000 xxxx
# 1 byte  i8  xxxx xxxx
# 2 byte  i16 xxxx xxxx  xxxx xxxx
# 4 byte  i32 xxxx xxxx  xxxx xxxx  xxxx xxxx  xxxx xxxx
# 8 byte  i64 xxxx xxxx  xxxx xxxx  xxxx xxxx  xxxx xxxx  xxxx xxxx  xxxx xxxx  xxxx xxxx  xxxx xxxx
body = '\n'.join(line(n) for n in [0, 1, 2, 4, 8, 16, 32, 64])
print(header + body)
print()

print('vint')
#               Lo        Hi      LoHi
# print(2**16 - 2 ** 10 - 2**10 + 2**20) # 1,112,064 max Unicode code point
# 3 bytes of vint for any Unicode code point compaired to 4 byte utf-8
# 1 byte  i7  0xxx xxxx
# 2 byte  i14 10xx xxxx  xxxx xxxx
# 3 byte  i21 110x xxxx  xxxx xxxx  xxxx xxxx
# 4 byte  i28 1110 xxxx  xxxx xxxx  xxxx xxxx  xxxx xxxx
# 5 byte  i35 1111 0xxx  xxxx xxxx  xxxx xxxx  xxxx xxxx  xxxx xxxx
# 6 byte  i42 1111 10xx  xxxx xxxx  xxxx xxxx  xxxx xxxx  xxxx xxxx  xxxx xxxx
# 7 byte  i49 1111 110x  xxxx xxxx  xxxx xxxx  xxxx xxxx  xxxx xxxx  xxxx xxxx  xxxx xxxx
# 8 byte  i56 1111 1110  xxxx xxxx  xxxx xxxx  xxxx xxxx  xxxx xxxx  xxxx xxxx  xxxx xxxx  xxxx xxxx
# 9 byte  i64 1111 1111  xxxx xxxx  xxxx xxxx  xxxx xxxx  xxxx xxxx  xxxx xxxx  xxxx xxxx  xxxx xxxx  xxxx xxxx
body = '\n'.join(line(n) for n in [7, 14, 21, 28, 35, 42, 49, 56, 64])
print(header + body)
print()

print('utf8')
# Unicode code space [0,1114111] or [0x000000, 0x10FFFF] < 21 bits
#  1 byte i7  0xxx xxxx
#  2 byte i11 110x xxxx  10xx xxxx
#  3 byte i16 1110 xxxx  10xx xxxx  10xx xxxx
#  4 byte i21 1111 0xxx  10xx xxxx  10xx xxxx  10xx xxxx  # Unicode 
#  5 byte i26 1111 10xx  10xx xxxx  10xx xxxx  10xx xxxx  10xx xxxx
#  6 byte i32 1111 11xx  10xx xxxx  10xx xxxx  10xx xxxx  10xx xxxx  10xx xxxx
body = '\n'.join(line(n) for n in [7, 11, 16, 21, 26, 32])
print(header + body)

Metadata

Metadata

Assignees

Labels

enhancementNew feature or request

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions