From cd667362afc27da8c036182585dc13cae6b3eabb Mon Sep 17 00:00:00 2001
From: Faras Siddiqui
Date: Fri, 13 Mar 2026 04:58:45 +0500
Subject: [PATCH] chore: upgrade CFLAGS from -O2 to -O3 -ffast-math

-O3 enables more aggressive inlining and loop opts. -ffast-math allows
float reordering, safe here because:
- software FP16 uses integer bit manipulation, unaffected
- online softmax exponents are always <= 0 by construction
- model weights are 4-bit quantized, ULP differences irrelevant

Deliberately omitted:
  -funroll-loops (would bloat ~80KB binary toward 200-400KB)
  -flto (could OOM on Pi Zero during on-device compilation)

Binary size: 87784 -> 87736 bytes (-48 bytes).

Tested on Apple M4 Pro, TinyLlama 1.1B Q4_K_M, -t 0 greedy:
  -n 20:  23.9 -> 26.6 tok/s (+11%)
  -n 100: 20.9 -> 22.2 tok/s (+6%)

Output character-identical to baseline.

Closes #16
---
 picolm/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/picolm/Makefile b/picolm/Makefile
index 4fd3c7a..b7764a3 100644
--- a/picolm/Makefile
+++ b/picolm/Makefile
@@ -1,5 +1,5 @@
 CC = gcc
-CFLAGS = -O2 -std=c11 -D_GNU_SOURCE -Wall -Wextra -Wpedantic
+CFLAGS = -O3 -std=c11 -D_GNU_SOURCE -Wall -Wextra -Wpedantic -ffast-math
 LDFLAGS = -lm -lpthread
 SRCS = picolm.c model.c tensor.c quant.c tokenizer.c sampler.c grammar.c
 TARGET = picolm