chore: upgrade CFLAGS from -O2 to -O3 -ffast-math #23
Open
siddiquifaras wants to merge 1 commit into RightNow-AI:main
-O3 enables more aggressive inlining and loop opts. -ffast-math allows float reordering, safe here because:
- software FP16 uses integer bit manipulation, unaffected
- online softmax exponents are always <= 0 by construction
- model weights are 4-bit quantized, ULP differences irrelevant

Deliberately omitted:
- -funroll-loops (would bloat ~80KB binary toward 200-400KB)
- -flto (could OOM on Pi Zero during on-device compilation)

Binary size: 87784 -> 87736 bytes (-48 bytes).

Tested on Apple M4 Pro, TinyLlama 1.1B Q4_K_M, -t 0 greedy:
- -n 20: 23.9 -> 26.6 tok/s (+11%)
- -n 100: 20.9 -> 22.2 tok/s (+6%)

Output character-identical to baseline.

Closes RightNow-AI#16
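For reviewers, the first safety argument (software FP16 is pure integer work) can be illustrated with a minimal sketch. This is not the project's `fp16_to_fp32`; the helper names `fp16_bits_to_fp32_bits` and `sketch_fp16_to_fp32` are hypothetical and chosen only for illustration. It widens an IEEE 754 half-precision bit pattern to single precision using shifts and masks only, so there is no floating-point arithmetic for `-ffast-math` to reorder:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch, NOT the project's implementation: convert a
 * binary16 bit pattern to a binary32 bit pattern with integer ops only. */
static uint32_t fp16_bits_to_fp32_bits(uint16_t h)
{
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16; /* move sign to bit 31 */
    uint32_t exp  = (uint32_t)(h >> 10) & 0x1Fu;   /* 5-bit exponent      */
    uint32_t man  = (uint32_t)h & 0x3FFu;          /* 10-bit mantissa     */

    if (exp == 0) {
        if (man == 0)
            return sign;                           /* signed zero */

        /* Subnormal half: shift until the implicit leading 1 appears. */
        int shift = 0;
        while ((man & 0x400u) == 0) {
            man <<= 1;
            shift++;
        }
        man &= 0x3FFu;                             /* drop the implicit bit */
        /* Half subnormals scale by 2^-14; rebias to float: 127 - 14 - shift. */
        return sign | ((uint32_t)(113 - shift) << 23) | (man << 13);
    }

    if (exp == 0x1Fu)                              /* Inf or NaN */
        return sign | 0x7F800000u | (man << 13);

    /* Normal number: rebias exponent from 15 to 127 (add 112). */
    return sign | ((exp + 112u) << 23) | (man << 13);
}

/* Reinterpret the bits as a float; memcpy is a bit copy, not FP arithmetic. */
static float sketch_fp16_to_fp32(uint16_t h)
{
    uint32_t bits = fp16_bits_to_fp32_bits(h);
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

Note that `-ffast-math` also implies finite-math-only assumptions in GCC and Clang, so a sketch like this still emits Inf/NaN bit patterns exactly, but any floating-point code consuming them downstream may be compiled as if they never occur.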
What does this PR do?
Upgrades Makefile CFLAGS from `-O2` to `-O3 -ffast-math` for all Makefile-based builds (Linux, macOS, Pi, RISC-V). The Windows MSVC build is unaffected since it uses hardcoded flags in `build.bat` / CI.

Type of change
Why these flags are safe
`-O3` enables more aggressive inlining and loop optimization with a modest binary size impact. `-ffast-math` allows float reordering and assumes no NaN/Inf in float operations. Safe here because:
- Software FP16 (`fp16_to_fp32` / `fp32_to_fp16`) uses integer bit manipulation, completely unaffected by `-ffast-math`

What I deliberately left out
- `-funroll-loops`: would bloat the binary from ~80KB toward 200-400KB, breaking the project's advertised binary size
- `-flto`: requires holding full program IR at link time, could OOM on Pi Zero / LicheeRV Nano (512MB / 256MB RAM) during on-device compilation

Testing
Test command:
Output:
Output is character-identical to the baseline `-O2` build at all context lengths tested (`-n 20`, `-n 100`, `-n 256`).

Results
Checklist
- `make native`
- `--json` mode
- `--cache` round-trip

Closes #16