Literals
This document describes the bytecode for generating heap-allocated
literals in the SML/NJ compiler. It replaces the previous bytecode
and was introduced as part of the support for future improvements,
such as 64-bit support, Real32, and better Int64 and IntInf
integration.
The compiler extracts the literal values from the CPS IR and generates a program for building a record of literal values. Direct references to literal values in the CPS code are replaced by references to components of the literal record. The binary data representing the literal-construction program is packaged as part of the Binfile generated by the compiler.
Multiple byte quantities are represented in big-endian form (most-significant byte first).
The first four 32-bit words of the literal representation correspond to the following C struct:
struct literal_header {
    uint32_t magic;
    uint32_t maxstk;
    uint32_t wordsz;
    uint32_t maxsaved;
};

where

- magic contains the version ID (which should be 0x20171031),
- maxstk is the maximum stack depth required,
- wordsz is the size of an ML value (32 or 64), and
- maxsaved is the number of saved literals (used for sharing).
Note that Version 1 files have the version ID 0x19981022 and contain only
the first two header fields; they lack the wordsz and maxsaved fields.
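As a rough illustration of the header layout described above, the following C sketch reads the header from a byte buffer, assembling each field from big-endian bytes. The names `read_be32`, `parse_literal_header`, and the `LIT_MAGIC_*` macros are hypothetical and are not taken from the SML/NJ runtime. Treating a Version 1 header as 8 bytes and zeroing its missing fields is an assumption based on the note above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LIT_MAGIC_V1 0x19981022u   /* Version 1 ID, from the text above */
#define LIT_MAGIC_V2 0x20171031u   /* current version ID */

struct literal_header {
    uint32_t magic;
    uint32_t maxstk;
    uint32_t wordsz;
    uint32_t maxsaved;
};

/* Read one big-endian 32-bit word starting at p. */
static uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Parse the header at the start of buf.  Returns the number of header
 * bytes consumed (8 for Version 1, 16 for the current version), or 0 if
 * the buffer is too short, the magic number is unknown, or wordsz is
 * neither 32 nor 64.  For Version 1, wordsz and maxsaved are set to 0
 * (a choice made for this sketch). */
static size_t parse_literal_header(const uint8_t *buf, size_t len,
                                   struct literal_header *hdr)
{
    assert(buf != NULL && hdr != NULL);
    if (len < 8)
        return 0;
    hdr->magic = read_be32(buf);
    hdr->maxstk = read_be32(buf + 4);
    hdr->wordsz = 0;
    hdr->maxsaved = 0;
    if (hdr->magic == LIT_MAGIC_V1)
        return 8;
    if (hdr->magic != LIT_MAGIC_V2 || len < 16)
        return 0;
    hdr->wordsz = read_be32(buf + 8);
    hdr->maxsaved = read_be32(buf + 12);
    if (hdr->wordsz != 32 && hdr->wordsz != 64)
        return 0;
    return 16;
}
```

A caller would pass the start of the literal data and then begin interpreting opcodes at the returned offset.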
The following is a list of the symbolic opcodes used in the interpreter. We describe the instruction encoding below.
- INT(n) literal value in the default (tagged) integer or word type (Int.int or Word.word). The value n should be in the range -2^w-1^ to 2^w^-1 when encoded as a w-bit 2's complement integer. The width w will be 31 or 63 depending on the host architecture.
- INT32(n) 32-bit literal value for either the type Int32.int or Word32.word.
- INT64(n) 64-bit literal value for either the type Int64.int or Word64.word.
- BIGINT(n) arbitrary precision integer literal (currently not used).
- IVEC8(n, b1, ..., bn) packed vector of 8-bit integers for either the type Int8Vector.vector or Word8Vector.vector.
- IVEC16(n, h1, ..., hn) packed vector of 16-bit integers for either the type Int16Vector.vector or Word16Vector.vector (currently not used).
- IVEC32(n, w1, ..., wn) packed vector of 32-bit integers for either the type Int32Vector.vector or Word32Vector.vector (currently not used).
- IVEC64(n, lw1, ..., lwn) packed vector of 64-bit integers for either the type Int64Vector.vector or Word64Vector.vector (currently not used).
- REAL32(f) 32-bit floating-point literal for the type Real32.real (currently not used).
- REAL64(F) 64-bit floating-point literal for the type Real64.real.
- RVEC32(n, f1, ..., fn) packed vector of 32-bit floating-point literals for the type Real32Vector.vector (currently not used).
- RVEC64(n, F1, ..., Fn) packed vector of 64-bit floating-point literals for the type Real64Vector.vector.
- STR8(s) string literal (8-bit characters)
- RECORD(n) construct a record from the topmost n literal values
- VECTOR(n) construct a vector from the topmost n literal values
- RAW8(n) raw sequence of bytes. This literal does not have an SML type.
- RAW16(n) raw sequence of 16-bit values. This literal does not have an SML type.
- RAW32(n) raw sequence of 32-bit values. This literal does not have an SML type.
- RAW64(n) raw sequence of 64-bit values. This literal does not have an SML type.
- CONCAT(n) pop n records/vectors from the stack and concatenate them into a single record/vector. This operation allows the implementation to avoid excessively large stacks when building very large record/vector literals.
- SAVE(i) save the top of the stack in the i^th^ save slot, which allows it to be shared by some subsequent aggregate literal.
- LOAD(i) push the i^th^ saved literal onto the stack.
- RETURN signals the end of the program; the stack depth should be one, and that value is popped and returned as the result.
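The stack discipline implied by these opcodes can be sketched with a small depth calculator, which shows how a value like maxstk could be derived. This is only an illustration under stated assumptions: the types `lit_kind` and `lit_op` and the function `max_stack_depth` are hypothetical, RECORD(n), VECTOR(n), and CONCAT(n) are assumed to pop n values and push one result, and SAVE is left out because the text does not say whether it pops its operand.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical classification of opcodes by their stack effect. */
enum lit_kind {
    LIT_PUSH,    /* pushes one value: INT, INT32, STR8, REAL64, LOAD, ... */
    LIT_BUILD,   /* RECORD(n), VECTOR(n), CONCAT(n): assumed pop n, push 1 */
    LIT_RETURN   /* end of program; depth should be exactly one */
};

struct lit_op {
    enum lit_kind kind;
    unsigned n;          /* operand count; used only by LIT_BUILD */
};

/* Returns the maximum stack depth reached by prog, or -1 if the program
 * underflows the stack, lacks a final RETURN, or does not finish with
 * exactly one value on the stack. */
static long max_stack_depth(const struct lit_op *prog, size_t len)
{
    assert(prog != NULL);
    long depth = 0, max = 0;
    for (size_t i = 0; i < len; i++) {
        switch (prog[i].kind) {
        case LIT_PUSH:
            depth++;
            break;
        case LIT_BUILD:
            if ((long)prog[i].n > depth)
                return -1;                   /* not enough operands */
            depth -= (long)prog[i].n - 1;    /* pop n, push the aggregate */
            break;
        case LIT_RETURN:
            return (depth == 1 && i + 1 == len) ? max : -1;
        }
        if (depth > max)
            max = depth;
    }
    return -1;   /* fell off the end without RETURN */
}
```

For a literal such as ((1, 2), "a"), the sequence push, push, RECORD(2), push, RECORD(2), RETURN never holds more than two values at once under these assumptions.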
There are a number of additional features that we might want to support, which we list here.
- support for 32-bit string literals for the type WideString.string
- support for array literals (like vectors, but mutable)
In the encoding below, we use the following conventions:
- b represents a signed 8-bit integer.
- ub represents an unsigned 8-bit integer.
- c represents an 8-bit character.
- h represents a signed 16-bit integer.
- w represents a signed 32-bit integer.
- lw represents a signed 64-bit integer.
- n represents a 32-bit integer length (usually unsigned).
- d represents a bignum digit whose size will be the default word size.
- f represents a 32-bit floating-point literal.
- F represents a 64-bit floating-point literal.
- i represents a tagged default int or word literal (e.g., Int.int or Word.word).
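To make the operand conventions concrete, here is a minimal sketch of how the signed b, h, and w operands might be read from a byte stream, given that multi-byte quantities are big-endian. The function names `read_b`, `read_h`, and `read_w` are hypothetical. Sign extension is done arithmetically so that the code does not rely on implementation-defined narrowing conversions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* b: signed 8-bit integer (one byte). */
static int8_t read_b(const uint8_t *p)
{
    assert(p != NULL);
    return (int8_t)(p[0] < 0x80 ? p[0] : p[0] - 0x100);
}

/* h: signed 16-bit integer, most-significant byte first. */
static int16_t read_h(const uint8_t *p)
{
    assert(p != NULL);
    int32_t u = ((int32_t)p[0] << 8) | p[1];            /* 0..65535 */
    return (int16_t)(u < 0x8000 ? u : u - 0x10000);
}

/* w: signed 32-bit integer, most-significant byte first. */
static int32_t read_w(const uint8_t *p)
{
    assert(p != NULL);
    int64_t u = ((int64_t)p[0] << 24) | ((int64_t)p[1] << 16)
              | ((int64_t)p[2] << 8)  |  (int64_t)p[3]; /* 0..2^32-1 */
    return (int32_t)(u < 0x80000000LL ? u : u - 0x100000000LL);
}
```

An lw reader would follow the same pattern over eight bytes.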
- 00000000 (0x00) INT(0) default tagged literal value 0.
- 00000001 (0x01) INT(1) default tagged literal value 1.
- 00000010 (0x02) INT(2) default tagged literal value 2.
- 00000011 (0x03) INT(3) default tagged literal value 3.
- 00000100 (0x04) INT(4) default tagged literal value 4.
- 00000101 (0x05) INT(5) default tagged literal value 5.
- 00000110 (0x06) INT(6) default tagged literal value 6.
- 00000111 (0x07) INT(7) default tagged literal value 7.
- 00001000 (0x08) INT(8) default tagged literal value 8.
- 00001001 (0x09) INT(9) default tagged literal value 9.
- 00001010 (0x0A) INT(10) default tagged literal value 10.
- 00001011 (0x0B) INT(-1) default tagged literal value -1.
- 00001100 (0x0C) INT(-2) default tagged literal value -2.
- 00001101 (0x0D) INT(-3) default tagged literal value -3.
- 00001110 (0x0E) INT(-4) default tagged literal value -4.
- 00001111 (0x0F) INT(-5) default tagged literal value -5.
- 00010000 (0x10 b) INT(b) --- for tagged integer literals in the range -128..127.
- 00010001 (0x11 h) INT(h) --- for tagged integer literals in the range -32768..32767.
- 00010010 (0x12 w) INT(w) --- for tagged integer literals in the range -2147483648..2147483647.
- 00010011 (0x13 lw) INT(lw) --- for all other tagged integer literals (64-bit target only).
- 00010100 (0x14 b) INT32(b) --- for 32-bit integer literals in the range -128..127.
- 00010101 (0x15 h) INT32(h) --- for 32-bit integer literals in the range -32768..32767.
- 00010110 (0x16 w) INT32(w) --- for all other 32-bit integer literals.
- 00010111 (0x17 b) INT64(b) --- for 64-bit integer literals in the range -128..127.
- 00011000 (0x18 h) INT64(h) --- for 64-bit integer literals in the range -32768..32767.
- 00011001 (0x19 w) INT64(w) --- for 64-bit integer literals in the range -2147483648..2147483647.
- 00011010 (0x1A lw) INT64(lw) --- for all other 64-bit integer literals.
- 00011011 (0x1B n d1 ... d|n|) BIGINT(i) --- where i = sign(n) (b^|n|-1^ d|n| + ... + b^0^ d1). I.e., the absolute value of n is the number of digits, and if n is negative, then i is negative. The digits follow n in least-significant to most-significant order. If n is zero, then i is zero. The base b and size of the digits will depend on the target word size.
- 00011100 (0x1C ub i1 ... iub) IVEC(ub, i1, ..., iub) --- short int vector (up to 255 elements).
- 00011101 (0x1D n i1 ... in) IVEC(n, i1, ..., in)
- 00011110 (0x1E ub b1 ... bub) IVEC8(ub, b1, ..., bub) --- short bytevectors (up to 255 elements).
- 00011111 (0x1F n b1 ... bn) IVEC8(n, b1, ..., bn)
- 00100000 (0x20 ub h1 ... hub) IVEC16(ub, h1, ..., hub) --- short 16-bit integer vectors (up to 255 elements).
- 00100001 (0x21 n h1 ... hn) IVEC16(n, h1, ..., hn)
- 00100010 (0x22 ub w1 ... wub) IVEC32(ub, w1, ..., wub) --- short 32-bit integer vectors (up to 255 elements).
- 00100011 (0x23 n w1 ... wn) IVEC32(n, w1, ..., wn)
- 00100100 (0x24 ub lw1 ... lwub) IVEC64(ub, lw1, ..., lwub) --- short 64-bit integer vectors (up to 255 elements).
- 00100101 (0x25 n lw1 ... lwn) IVEC64(n, lw1, ..., lwn)
- 00100110 (0x26 f) REAL32(f)
- 00100111 (0x27 F) REAL64(F)
- 00101000 (0x28 ub f1 ... fub) RVEC32(ub, f1, ..., fub) --- short 32-bit real vectors (up to 255 elements).
- 00101001 (0x29 n f1 ... fn) RVEC32(n, f1, ..., fn)
- 00101010 (0x2A ub F1 ... Fub) RVEC64(ub, F1, ..., Fub) --- short 64-bit real vectors (up to 255 elements).
- 00101011 (0x2B n F1 ... Fn) RVEC64(n, F1, ..., Fn)
- 00101100 (0x2C ub c1 ... cub) STR8(s) --- where size(s) = ub and c1, ..., cub are the characters of s.
- 00101101 (0x2D n c1 ... cn) STR8(s) --- where size(s) = n and c1, ..., cn are the characters of s.
- 00101110 (0x2E) reserved for STR32
- 00101111 (0x2F) reserved for STR32
- 00110000 (0x30) RECORD(1)
- 00110001 (0x31) RECORD(2)
- 00110010 (0x32) RECORD(3)
- 00110011 (0x33) RECORD(4)
- 00110100 (0x34) RECORD(5)
- 00110101 (0x35) RECORD(6)
- 00110110 (0x36) RECORD(7)
- 00110111 (0x37 ub) RECORD(ub)
- 00111000 (0x38 h) RECORD(h)
- 00111001 (0x39 ub) VECTOR(ub)
- 00111010 (0x3A h) VECTOR(h)
- 00111011 (0x3B ub) RAW8(ub)
- 00111100 (0x3C h) RAW8(h)
- 00111101 (0x3D ub) RAW16(ub)
- 00111110 (0x3E h) RAW16(h)
- 00111111 (0x3F ub) RAW32(ub)
- 01000000 (0x40 h) RAW32(h)
- 01000001 (0x41 ub) RAW64(ub)
- 01000010 (0x42 h) RAW64(h)
- 01000011 (0x43 h) CONCAT(h)
- 01000100 (0x44 ub) SAVE(ub)
- 01000101 (0x45 h) SAVE(h)
- 01000110 (0x46 ub) LOAD(ub)
- 01000111 (0x47 h) LOAD(h)
- 01001000--11111110 (0x48--0xFE) unused
- 11111111 (0xFF) RETURN
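The tagged INT encodings listed above (0x00 through 0x13) suggest that an encoder picks the shortest form that can hold the value. The following C sketch shows one way that selection could look. The names `put_be` and `encode_tagged_int` are hypothetical, and the sketch does not check that the value fits the 31-bit or 63-bit tagged range of the target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write the low nbytes bytes of v to out, most-significant byte first. */
static void put_be(uint8_t *out, uint64_t v, int nbytes)
{
    for (int i = 0; i < nbytes; i++)
        out[i] = (uint8_t)(v >> (8 * (nbytes - 1 - i)));
}

/* Encode INT(v) using the shortest opcode form from the table above.
 * out must have room for 9 bytes.  Returns the number of bytes written. */
static size_t encode_tagged_int(int64_t v, uint8_t *out)
{
    assert(out != NULL);
    if (v >= 0 && v <= 10) {             /* 0x00..0x0A: INT(0)..INT(10) */
        out[0] = (uint8_t)v;
        return 1;
    }
    if (v >= -5 && v <= -1) {            /* 0x0B..0x0F: INT(-1)..INT(-5) */
        out[0] = (uint8_t)(0x0A - v);
        return 1;
    }
    if (v >= -128 && v <= 127) {         /* 0x10 b */
        out[0] = 0x10;
        put_be(out + 1, (uint64_t)v, 1);
        return 2;
    }
    if (v >= -32768 && v <= 32767) {     /* 0x11 h */
        out[0] = 0x11;
        put_be(out + 1, (uint64_t)v, 2);
        return 3;
    }
    if (v >= INT32_MIN && v <= INT32_MAX) {   /* 0x12 w */
        out[0] = 0x12;
        put_be(out + 1, (uint64_t)v, 4);
        return 5;
    }
    out[0] = 0x13;                       /* 0x13 lw (64-bit target only) */
    put_be(out + 1, (uint64_t)v, 8);
    return 9;
}
```

Under this scheme INT(100) takes two bytes (0x10 0x64), while INT(70000) needs the five-byte 0x12 form.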