Language Reference
I maintain this document as the evolving language reference for L0.
Status: bootstrap phase. I describe the subset currently implemented and enforced by `l0c verify`, plus the intended direction.
- Low-level, typed SSA source representation
- Deterministic canonical text format
- Numeric identity (`tN`, `fN`, `bN`, `vN`, etc.)
- Defined semantics by default (no implicit UB contracts)
Section order is fixed and required:
`ver`, `types`, `consts`, `extern`, `globals`, `fns`
Bootstrap type-table requirement currently enforced:
- `types { }` is valid (empty table).
- non-empty `types` entries must use contiguous canonical ids: `t0`, `t1`, `t2`, ...
- bootstrap type RHS token set currently supports:
  - `i1`, `i8`, `i16`, `i32`, `i64`
  - `u8`, `u16`, `u32`, `u64`
  - `p0<i8>`
  - `s{tA,tB,...}` (struct with one-or-more `tN` fields)
  - `aN<tA>` (fixed array with `N > 0`)
  - `fn(tA,...)->tR` (function type with zero-or-more args and a `tN` return)
- for `s{}`, `aN<>`, and `fn()->`, referenced `tN` ids are validated and forward/self references are rejected in the bootstrap type-table parser.
Example skeleton:
```
ver 1
types { }
consts { }
extern { }
globals { }
fns {
}
```
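A hypothetical non-empty table exercising the supported RHS tokens (the `tN = <rhs>` entry form shown here is an assumption inferred from the RHS rules above, not a shipped example) might read:

```
types {
  t0 = i64
  t1 = i8
  t2 = p0<i8>
  t3 = s{t0,t1}
  t4 = a4<t0>
  t5 = fn(t0,t0)->t0
}
```

Note that every referenced `tN` points backward, consistent with the forward/self-reference rejection.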
- Types: `t0`, `t1`, ...
- Constants: `k0`, `k1`, ...
- Globals: `g0`, `g1`, ...
- Functions: `f0`, `f1`, ...
- Blocks: `b0`, `b1`, ...
- SSA values: `v0`, `v1`, ...
Canonical function header shape currently enforced:
```
fn fN (arg_types)->tM {
```
Where:
- `arg_types` is either empty `()` or comma-separated `tN` values, e.g. `(t0,t1)`.
- return type is `tM`.
- each referenced `tN` must exist in the parsed module `types` table.
Function body requirements currently enforced:
- at least one block
- first block must be `b0:`
- `b0:` must be unique inside a function
- block labels use `bN:`
- every block label must be unique inside a function
- block labels must be contiguous in canonical order (`b0`, `b1`, `b2`, ...)
- instruction lines are indented with two spaces
- each block must terminate before the next block or function close
- no instruction is allowed after a terminator within the same block
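A minimal function sketch satisfying these header and body rules (illustrative; it assumes `t0` exists in the module's `types` table, and the exact label layout is inferred from the rules above):

```
fn f0 (t0)->t0 {
b0:
  v0 = arg 0 : t0
  ret v0
}
```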
Function ordering requirement currently enforced:
- function ids must be contiguous in canonical order (`f0`, `f1`, `f2`, ...)
- `br` and `cbr` targets must reference blocks declared in the same function
- `cbr` condition value must be typed as `i1`
Terminator forms currently accepted: `ret`, `ret vN`, `br bN`, `cbr vN bT bF`
Non-terminators must follow canonical assignment form:
```
vN = OP args... : tM
```
Current bootstrap checks enforce structural shape:
- `vN =` assignment prefix
- non-empty opcode token (restricted tokenizer subset)
- non-empty args payload
- explicit type suffix `: tM`
- value result type suffix `tM` must exist in the parsed module `types` table.
Current bootstrap opcode-aware checks:
- unknown opcodes are rejected in the current bootstrap subset
- `arg` requires a numeric index operand
- `arg` index must be within the function argument count
- `arg` result type must match the declared type of function argument index `N`
- `const` requires a decimal literal operand (`N` or `-N`)
- `call` requires args in canonical shape: `fN` followed by zero-or-more `vN` operands
- `call` target `fN` must reference a declared function in the module
- `call` result type suffix must match the declared return type of target `fN`
- `call` argument count must match the declared arity of target `fN`
- `icmp.eq` requires `vN vN` operands, an `i1` result type suffix, and matching operand value types
- `ld` requires `vN` operand shape and enforces `p0<i8>` pointer typing on the operand
- `gep` requires `vN <signed_decimal>` operand shape and enforces `p0<i8>` pointer typing on operand and result
- `alloca` requires `tN, N` operand shape and enforces `p0<i8>` result typing
- `malloc` requires `vN` operand shape, enforces non-pointer typing on `vN`, and enforces `p0<i8>` result typing
- `st` is accepted as a canonical non-value instruction (`st vPtr vVal`) with def-before-use and `p0<i8>` pointer typing checks on `vPtr`
- `free` is accepted as a canonical non-value instruction (`free vPtr`) with def-before-use and `p0<i8>` pointer typing checks on `vPtr`
- `exit` is accepted as a canonical non-value instruction (`exit vCode`) with def-before-use checks and non-pointer typing checks on `vCode`
- `write` is accepted as a canonical non-value instruction (`write vPtr vLen`) with def-before-use checks, `p0<i8>` pointer typing on `vPtr`, and non-pointer typing checks on `vLen`
- `trace` is accepted as a canonical non-value instruction (`trace N vA vB ...`) with decimal trace-id `N` and def-before-use checks on each traced value
- binary ops (`add.wrap`, `add.trap`, `sub.wrap`, `sub.trap`, `mul.wrap`, `mul.trap`, `and`, `or`, `xor`, `shl`, `shr`) require `vN vN` operands
- binary ops require both operand value types to match the explicit result type suffix
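As an illustrative sketch (not a shipped test case), assuming a module whose `types` table defines `t0 = i64`, `t1 = i1`, and `t2 = p0<i8>`, a block combining several of these canonical forms could read:

```
b0:
  v0 = arg 0 : t0
  v1 = arg 1 : t0
  v2 = add.wrap v0 v1 : t0
  v3 = icmp.eq v2 v0 : t1
  v4 = alloca t0, 1 : t2
  st v4 v2
  v5 = ld v4 : t0
  ret v5
```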
Current bootstrap SSA check:
- each SSA value id (`vN`) may be assigned once per function
- def-before-use is enforced for:
  - `ret vN` (with return-type compatibility check)
  - `cbr vN bT bF` condition value
  - `call fN vA vB ...` value operands (`vA`, `vB`, ...)
  - `ld`, `gep`, and `st` value operands
  - `malloc` and `free` value operands
  - `exit` value operands
  - `write` value operands
  - `trace` value operands
  - bootstrap binary operands (`vN vN`)
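For example, the single-assignment rule rejects a block like the following sketch, because `v0` is defined twice in the same function:

```
b0:
  v0 = const 1 : t0
  v0 = const 2 : t0
  ret v0
```

The second `const` line fails the once-per-function assignment check regardless of any later use.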
Note: full opcode semantics/type-checking are still being added incrementally.

- `verify` rejects non-canonical structure for the implemented subset.
- `canon` currently validates and echoes canonical input.
- full canonical rewrite mode is planned as a later pass.
Supported CLI invocations:

- `l0c canon <input.l0>`
- `l0c canon <input.l0> -o <out.l0>`
- `l0c verify <input.l0>`
- `l0c build <input.l0> <out.l0img>`
- `l0c build <input.l0> -o <out.l0img>`
- `l0c build <input.l0> <out.l0img> --trace-schema <out.bin>`
- `l0c build <input.l0> <out.l0img> --debug-map <out.bin>`
- `l0c build <input.l0> <out.l0img> --trace-schema <out.bin> --debug-map <out.bin>`
- `l0c build-elf <input.l0> <out.o>`
- `l0c imgcheck <out.l0img>`
- `l0c imgmeta <out.l0img>`
- `l0c run <out.l0img> [u64_a] [u64_b] [u64_c] [u64_d] [u64_e] [u64_f]`
- `l0c tracecat <trace.bin>`
- `l0c mapcat <debug_map.bin>`
- `l0c schemacat <trace_schema.bin>`
- `l0c tracejoin <trace.bin> <debug_map.bin>`
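A typical end-to-end workflow chaining these subcommands (file names here are placeholders, not shipped examples) might look like:

```sh
l0c verify add.l0
l0c build add.l0 add.l0img --trace-schema schema.bin --debug-map map.bin
l0c imgcheck add.l0img
l0c imgmeta add.l0img
l0c run add.l0img 2 3
```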
imgcheck currently validates:
- header magic `L0IM`
- version `1`
- header size `80`
- flags `0` (reserved for future use)
- source section bounds/size consistency
- code/debug section pair consistency (both zero or both valid in-bounds ranges)
- debug schema consistency for non-zero debug section (`L0IX` magic/version, kernel kind range, code-size match, trace schema/version constants)
imgmeta currently prints selected validated image metadata fields:
`version`, `src_size`, `code_size`, `fn_count`, `type_count`, `kernel_kind`, `trace_schema_ver`, `trace_record_size`
Before printing, imgmeta now also rejects bootstrap debug-index schema mismatches:
- out-of-range kernel kind ids
- debug `code_size` mismatch vs image header
- unexpected trace schema/version constants
run currently executes the image code section with a minimal syscall-only loader path:
- I validate core image header fields and code section bounds.
- I allocate executable memory with `mmap` and copy code bytes into it.
- I invoke code as `fn(u64,u64,u64,u64,u64,u64)->u64` using optional decimal CLI args (`u64_a`..`u64_f`) as inputs mapped to SysV integer arg registers (`rdi`, `rsi`, `rdx`, `rcx`, `r8`, `r9`).
- I print the returned value as unsigned decimal with a newline.
- I reject invalid numeric arguments.
tracecat currently decodes binary trace records as fixed 16-byte tuples:
- `u64 trace_id`
- `u64 traced_value`
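The record layout above is simple enough to sketch a decoder for. A minimal Python sketch that mirrors `tracecat`'s decoding (little-endian byte order is an assumption of this sketch; the format description does not state endianness):

```python
import struct

def decode_trace(payload: bytes):
    """Decode fixed 16-byte trace records (u64 trace_id, u64 traced_value).

    Little-endian encoding is assumed here, not stated by the format."""
    if len(payload) % 16 != 0:
        raise ValueError("truncated or non-16-byte-aligned trace payload")
    return [struct.unpack_from("<QQ", payload, off)
            for off in range(0, len(payload), 16)]

# Two synthetic records, printed in tracecat's "id ... val ..." style.
payload = struct.pack("<QQ", 1, 42) + struct.pack("<QQ", 2, 7)
for trace_id, value in decode_trace(payload):
    print(f"id {trace_id} val {value}")
```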
I can emit a matching schema file during build with `--trace-schema <out.bin>`.
Current bootstrap schema payload is 32 bytes:
- magic `L0TS`
- version `1`
- record size `16`
- field count `2`
I can emit a minimal debug map file during build with `--debug-map <out.bin>`.
Current bootstrap debug map payload is variable-size:
- magic `L0DM`
- version `2`
- instruction entry count `N`
- code size (`code_size` from the built image)
- entry array with triplets: `inst_id`, `start`, `end`
- current bootstrap emits kernel-kind-specific deterministic ranges:
  - fallback/const kernels use one full-range entry
  - canonical lowered kernels use fixed opcode-boundary splits per kernel family
  - current `trace` kernel emits two entries: `inst_id 1` for trace record emission bytes and `inst_id 2` for trailing return path bytes
  - unknown future kernel kinds fall back to deterministic synthetic partitions
mapcat decodes this bootstrap debug map format and prints:
- `entries <count>`
- `code_size <bytes>`
- then one `inst_id`/`start`/`end` triplet per entry
- it rejects entries with `inst_id = 0`
- it rejects non-increasing `inst_id` order
- it rejects invalid ranges (`start > end` or `end > code_size`)
- it rejects overlapping/non-monotonic ranges (`start < previous_end`)
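The mapcat entry rules above can be restated as a short validation sketch. This is an illustrative Python re-statement of the rules, not l0c's own code:

```python
def check_debug_map_entries(entries, code_size):
    """Apply the mapcat rejection rules to (inst_id, start, end) triplets:
    nonzero strictly-increasing inst_id, start <= end <= code_size, and
    no overlap with the previous entry (start >= previous end)."""
    prev_id = 0
    prev_end = 0
    for inst_id, start, end in entries:
        if inst_id == 0:
            raise ValueError("inst_id = 0")
        if inst_id <= prev_id:
            raise ValueError("non-increasing inst_id order")
        if start > end or end > code_size:
            raise ValueError("invalid range")
        if start < prev_end:
            raise ValueError("overlapping/non-monotonic range")
        prev_id, prev_end = inst_id, end

# A well-formed two-entry map over a 9-byte code section passes:
check_debug_map_entries([(1, 0, 4), (2, 4, 9)], code_size=9)
```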
schemacat decodes the bootstrap trace-schema format and prints:
- `version <n>`
- `record_size <bytes>`
- `fields <count>`
- it now rejects schema payloads where bootstrap constants do not match (`version != 1`, `record_size != 16`, or `fields != 2`).
tracejoin decodes trace records and debug-map entries, joins by inst_id, and prints:
- `id <trace_id> val <value> start <offset> end <offset>`
- it rejects invalid debug-map entries (`inst_id = 0`, non-increasing `inst_id`, or ranges outside/overlapping `code_size`)
- it rejects trace records whose `trace_id` has no matching debug-map entry
- it rejects truncated/non-16-byte-aligned trace payloads
- it treats empty trace payloads as valid and emits no output
I print decoded output in deterministic text lines:
`id <trace_id> val <traced_value>`
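The join step itself is a simple id lookup. An illustrative Python sketch (entry validation is assumed done beforehand) that renders tracejoin-style output lines:

```python
def trace_join(records, entries):
    """Join decoded (trace_id, value) trace records to (inst_id, start, end)
    debug-map entries by id, rejecting records with no matching entry."""
    spans = {inst_id: (start, end) for inst_id, start, end in entries}
    lines = []
    for trace_id, value in records:
        if trace_id not in spans:
            raise ValueError(f"trace_id {trace_id} has no matching debug-map entry")
        start, end = spans[trace_id]
        lines.append(f"id {trace_id} val {value} start {start} end {end}")
    return lines
```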
Bootstrap build output currently also includes a compact 64-byte debug semantic index section:
- I currently emit one of two bootstrap code payloads:
  - canonical lowered kernel payloads for:
    - `add.wrap`, `add.trap`, `sub.wrap`, `sub.trap`, `mul.wrap`, `mul.trap`, `and`, `or`, `xor`, `shl`, `shr`
    - commutative binary kernels in that set also accept canonical swapped operand order (`v1 v0`) during bootstrap lowering
    - non-commutative `sub.wrap` also accepts canonical swapped operand order (`v1 v0`) during bootstrap lowering by selecting a reverse-sub payload
    - binary kernel templates also accept nonzero result value ids when `ret` references the same value (`vN = <op> ...`, `ret vN`)
    - before binary kernel selection, I run a normalization pass that strips canonical dead `const` value lines; this lets me lower binary kernels even when dead const defs are interleaved in the block
    - in that normalization pass, I now strip only dead const defs (not live const defs), and I scope dead-const detection to the current function so same numeric value ids in other functions do not interfere
    - `icmp.eq` compare kernel (`i64` args, `i1` result)
    - canonical `icmp.eq + cbr` select kernel (`i64` args, `i64` result)
    - before compare/select kernel selection, I run normalization that strips dead `const` and dead `icmp.eq` value lines, so canonical interleaved dead defs do not block `icmp.eq` or `icmp.eq + cbr` lowering
    - both kernels also accept swapped compare operand order (`icmp.eq v1 v0`) in bootstrap lowering
    - `icmp.eq` compare kernels also accept nonzero result ids when `ret` references the same compare result value id
    - `icmp.eq + cbr` kernels also accept nonzero compare-result ids when `cbr` references the same compare result value id
    - `icmp.eq + cbr` kernels now also lower deterministic reverse return mappings where `b1` returns arg1 and `b2` returns arg0 (including normalized dead-const variants)
    - `icmp.eq + cbr` kernels now also tolerate extra dead pure value lines (`const` or `icmp.eq`) in `b0`, `b1`, and `b2` when compare id/dataflow and branch-return mapping still match supported selector shapes
    - canonical memory roundtrip kernel (`alloca` + `st` + `ld`)
    - before memory-roundtrip kernel selection, I run the same dead-const normalization pass, so canonical interleaved dead `const` defs do not block lowering
    - memory-roundtrip kernel also accepts canonical nonzero ids across arg/alloca/st/ld/ret when ids/dataflow match
    - memory-roundtrip kernel also accepts canonical arg-return form (`ret vArg`) when the stored value is that same arg
    - memory-roundtrip kernel also accepts canonical nonzero `alloca` element counts (`alloca t0, N`, `N > 0`)
    - memory-roundtrip kernel also accepts either canonical arg/alloca definition order (`arg` then `alloca`, or `alloca` then `arg`)
    - canonical `gep` memory roundtrip kernel (`alloca` + `st` + `gep` + `ld`)
    - before memory-gep-roundtrip kernel selection, I run the same dead-const normalization pass, so canonical interleaved dead `const` defs do not block lowering
    - memory-gep-roundtrip kernel also accepts canonical nonzero ids across arg/alloca/st/gep/ld/ret when ids/dataflow match
    - memory-gep-roundtrip kernel also accepts canonical arg-return form (`ret vArg`) when the stored value is that same arg
    - memory-gep-roundtrip kernel also accepts canonical nonzero `alloca` element counts (`alloca t0, N`, `N > 0`)
    - memory-gep-roundtrip kernel also accepts either canonical arg/alloca definition order (`arg` then `alloca`, or `alloca` then `arg`)
    - canonical intrinsic kernels (`malloc` syscall-backed allocator, `free` no-op, `exit` syscall, `write` syscall; canonical newline test returns `0`, `trace` currently lowers to fixed 16-byte binary stderr emission)
    - before `malloc` and `exit` kernel selection, I run generalized dead pure-line normalization, so canonical interleaved dead `const` and dead `icmp.eq` defs do not block lowering for those const-independent intrinsic shapes
    - `malloc` intrinsic kernel also accepts canonical nonzero arg/result ids when ids/dataflow match (`vN = arg ...`, `vM = malloc vN`, `ret vM`)
    - `free` intrinsic kernel also accepts canonical nonzero arg/const-ret ids when ids/dataflow match (`vN = arg ...`, `free vN`, `vM = const 0`, `ret vM`)
    - `exit` intrinsic kernel also accepts canonical nonzero arg/return ids when ids/dataflow match (`vN = arg ...`, `exit vN`, `ret vN`)
    - bootstrap newline `write` intrinsic kernel also accepts canonical nonzero ids across alloca/const/store/write/ret when ids/dataflow match
    - bootstrap newline `write` intrinsic kernel also accepts canonical nonzero `alloca` element counts (`alloca t0, N`, `N > 0`)
    - `trace` intrinsic kernel also accepts canonical nonzero traced-arg id and const/return id when ids/dataflow match (`trace 1 vN` and `ret vM` where `vM` is the const-def id)
    - for const-dependent kernels (`free`, `write`, `trace`), I now run generalized dead pure-line normalization before selector matching and lower valid dead-const/dead-icmp-injected canonical variants (including nonzero-id, multi-dead-const, and cross-function value-id-reuse cases), while preserving intentional write guardrail fallback for `alloca ... , 0` shapes
    - intrinsic generalized coverage now also includes multi-dead-pure (`const` + `icmp.eq`) variants for `malloc`/`free`/`write`/`trace`/`exit`, plus cross-function dead-icmp id-reuse variants for `free` and `trace`, with write `alloca 0` cross-function guardrails preserved
    - in the current build selector chain, these const-dependent intrinsic families are routed through generalized normalized selector paths only (legacy direct fallback stages are removed)
    - I now apply the same generalized-only routing to all current generalized families, including const-return (`exit`, `malloc`, `call`, memory roundtrip families, compare/select, binary, and const-return)
    - canonical two-function call kernels (`f0` calling `f1` with `add.wrap`/`sub.wrap`/`mul.wrap`/`and`/`or`)
    - before call-kernel selection, I run generalized dead pure-line normalization, so canonical interleaved dead `const` and dead `icmp.eq` defs in `f0`/`f1` do not block call lowering
    - call->commutative targets (`add.wrap`, `mul.wrap`, `and`, `or`, `xor`) also accept swapped call-arg order in `f0` (`call f1 v1 v0`) in bootstrap lowering
    - call->non-commutative targets (`shl`, `shr`) now lower both semantic call-arg mappings:
      - canonical arg0->arg1 mapping
      - reverse arg1->arg0 mapping via dedicated reverse-shift payloads
    - for call->`sub.wrap`, I now lower canonical semantic arg0->arg1 shapes and one deterministic reverse-mapping shape where `f0` provides arg1->arg0 mapping while `f1` remains canonical
    - non-matching non-commutative call shapes remain intentionally unlowered guardrails in current bootstrap lowering
    - call-kernel templates also accept either canonical arg-definition order in `f0` (`arg 0` then `arg 1`, or `arg 1` then `arg 0`)
    - for call->`sub.wrap`, I keep non-commutative guardrails by lowering canonical and explicit reverse-mapping forms under supported `f0`/`f1` mapping combinations
    - call-kernel templates also accept either canonical arg-definition order in `f1` (`arg 0` then `arg 1`, or `arg 1` then `arg 0`)
    - for call->`sub.wrap`, I now also lower reverse `f1` mapping variants (including argdef-order-swapped and dead-const-normalized forms); non-matching structural mismatch shapes remain guardrailed
    - call-kernel templates also accept nonzero call-result ids in `f0` when `ret` references the same value id
    - mismatch trace-id/dataflow shapes remain intentionally unlowered in the current bootstrap selector and are regression-tested
    - mismatch malloc result-id/dataflow shapes remain intentionally unlowered in the current bootstrap selector and are regression-tested
    - mismatch free-noop const/return-id dataflow shapes remain intentionally unlowered in the current bootstrap selector and are regression-tested
    - non-returning `exit` shapes now lower when the `exit` operand matches the canonical arg id, even when trailing return-path lines are unreachable
    - mismatch write-newline const/return-id dataflow shapes remain intentionally unlowered in the current bootstrap selector and are regression-tested
    - mismatch memory-roundtrip load/return-id dataflow shapes remain intentionally unlowered in the current bootstrap selector and are regression-tested
    - mismatch memory-gep-roundtrip load/return-id dataflow shapes remain intentionally unlowered in the current bootstrap selector and are regression-tested
    - canonical branch-identity multi-block modules (`cbr vN b1 b2` with both branches returning `vN`) are now directly lowered in bootstrap instead of falling back to the single-byte `ret` stub
    - canonical branch-const-select multi-block modules (`cbr vN b1 b2` with branch-local `const` returns) are now directly lowered in bootstrap, including dead-const-normalized variants, while unsupported branch-return mappings remain strict fallback guardrails
    - canonical merge-memory-select multi-block modules (`cbr` -> branch-local `const` + `st` -> `b3` join `ld` + `ret`) are now directly lowered in bootstrap, including dead-const-normalized variants, while unsupported join-return mappings remain strict fallback guardrails
    - canonical spill/reload stress modules (multi-op value chains lowered through explicit stack spill/reload style backend paths) are now directly lowered in bootstrap, including dead-const-normalized variants, while unsupported return-mapping shapes remain strict fallback guardrails
    - const-return kernel (`const N` or `const -N` -> `ret v0`)
    - const-return kernels also accept nonzero const-def value ids when the same id is returned (`vN = const ...`, `ret vN`)
  - fallback payload for other verified modules: single-byte `ret` (`0xC3`)

The 64-byte debug semantic index section records:

- magic `L0IX`
- version `1`
- function count
- type count
- kernel kind id
- code size
- trace schema version
- trace record size
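For instance, a minimal module body that selects the const-return kernel (illustrative sketch, assuming `t0` is a defined integer type) would be:

```
fn f0 ()->t0 {
b0:
  v0 = const 7 : t0
  ret v0
}
```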
- General CFG lowering beyond current template/kernel selectors.
- SSA merge/join lowering for broader branch convergence shapes.
- Register allocation generalization and spill stress coverage.
- ABI/output-path expansion (including object emission milestones).
I use this consolidated matrix when I need a single place that states ability, type contract, and lowering status.
| Family | Ops | Type/Shape Contract | Current Lowering Behavior |
|---|---|---|---|
| arithmetic | `add.wrap`, `add.trap`, `sub.wrap`, `sub.trap`, `mul.wrap`, `mul.trap` | binary `vA vB`, explicit result type equals operand types | canonical lowered templates, otherwise fallback |
| bitwise/shift | `and`, `or`, `xor`, `shl`, `shr` | binary `vA vB`, explicit result type equals operand types | canonical lowered templates, otherwise fallback |
| compare | `icmp.eq` | binary operands same type, explicit result type `i1` | canonical lowered templates, otherwise fallback |
| calls | `call fN ...` | callee exists, arity/types match signature, result type matches return type | canonical two-function lowered families, otherwise fallback |
| memory | `alloca`, `st`, `ld`, `gep` | pointer paths use `p0<i8>` bootstrap contract | canonical memory lowered families, otherwise fallback |
| intrinsics | `malloc`, `free`, `write`, `exit`, `trace` | intrinsic-specific pointer/non-pointer checks | canonical intrinsic lowered families, otherwise fallback |
| control flow | `br`, `cbr`, `ret` | target existence + `cbr` condition `i1` + return type compatibility | canonical CFG templates lowered for supported families, otherwise fallback |
I keep unsupported shapes verifier-valid where possible, but codegen may intentionally choose deterministic fallback (`kernel_kind 0`).