- `batch.cpp` (primary batch processing):
  - Multiple `#pragma omp parallel for schedule(static) if(count > 500)` directives.
  - Targets: visibility culling, distance checks, animation updates, physics batches.
  - Functions: batch visibility, distance culling, animation/position updates.
- `particle_physics.cpp`:
  - `#pragma omp parallel for schedule(static)` on the particle processing loop (line ~76).
  - Handles multi-threaded particle updates when OpenMP is available.
- `bootstrap_loader.cpp`: Comments mention OpenMP for heightmap generation and shader warmup; uses a pthread master plus OMP.
- `mesh_deformation.cpp`: Defines no-op macros (`#ifndef _OPENMP`); ready for vertex deformation loops.
- `physics.cpp`: No-op defines; potential for collision/obstacle loops.
- `omp.h`: Local copy of the OpenMP header (LLVM-based).
- `libomp.a`: Static library in `emscripten/` for linking.
- `build.sh`:
  - Flags: `-fopenmp -pthread -matomics -mbulk-memory -msimd128 -ffast-math`
  - Linking: `-L$SCRIPT_DIR -lomp`
- Emscripten supports OMP via pthreads (multi-threaded WASM).
- Fallback: Single-threaded mode (common in browsers) degrades gracefully: a compiler without OpenMP enabled ignores the pragmas, and the no-op defines cover `omp_get_*` calls.
- Multi-threaded: 2-8x speedup on loops >500 iterations (e.g., 10k particles).
- Overhead: `if(count > 500)` prevents thread spawn on small batches.
- Tested: Works in dev/prod builds; verified via `verify_build.js`.
High-impact loops (add `#pragma omp parallel for schedule(static) if(N > 500)`):
- `animation_batch.cpp`: Batch animation calcs (sway, bounce, wobble).
- `fluid.cpp`: Fluid sim steps (velocity/pressure updates).
- `mesh_deformation.cpp`: Vertex deformation loops (wave/jiggle).
- `physics.cpp`: Collision detection, obstacle queries.
- `math.cpp`: Batched noise/FBM/invSqrt (if vectorized).
- `bootstrap_loader.cpp`: Heightmap gen chunks.
Priorities:
- High: Particle/fluid/physics batches (>10k elements).
- Medium: Animation/mesh deformation (per-frame).
- Low: Math utils (SIMD-first).
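As a rough illustration of what adding the pragma to one of these loops could look like, here is a minimal sketch of a batch sway update. The function name `updateSwayBatch`, its signature, and the sway formula are hypothetical and not taken from `animation_batch.cpp`; only the pragma itself follows the pattern described above.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Hypothetical batch sway update; the name, signature, and formula are
// illustrative assumptions, not the project's actual code.
void updateSwayBatch(std::vector<float>& offsets,
                     const std::vector<float>& phases,
                     float time, float amplitude) {
    // Process only the range covered by both arrays.
    const int count = static_cast<int>(std::min(offsets.size(), phases.size()));

    // Each iteration writes only offsets[i], so iterations are independent
    // and can be split statically across threads. Without OpenMP enabled the
    // pragma is ignored and the loop runs serially.
    #pragma omp parallel for schedule(static) if(count > 500)
    for (int i = 0; i < count; ++i) {
        offsets[i] = amplitude * std::sin(time + phases[i]);
    }
}
```

The loop index is a plain `int` and the body touches no shared state other than its own output slot, which is what makes the static schedule a safe default here.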
#include "omp.h" // Local copy in emscripten/
// Always add no-op defines for ST fallback:
#ifndef _OPENMP
#define omp_get_thread_num() 0
#define omp_get_num_threads() 1
#endif
- Include only in files with pragmas (reduces binary size).
- Place no-ops early (before any `omp_get_*` calls).
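One possible variant of the fallback is sketched below, assuming the header is only pulled in when `_OPENMP` is defined. It uses the system `<omp.h>` rather than the local copy so that it compiles standalone, and `currentWorkerIndex` is a hypothetical helper added purely for illustration.

```cpp
// Include the OpenMP runtime header only when the compiler has OpenMP enabled.
#ifdef _OPENMP
#include <omp.h>
#else
// Single-threaded fallback. These macros must be defined before the first
// omp_get_* call in the file, and never before an OpenMP header is included
// (they would otherwise collide with the header's own declarations).
#define omp_get_thread_num() 0
#define omp_get_num_threads() 1
#endif

// Hypothetical helper: index of the thread executing the caller.
// Outside a parallel region this is 0 with or without OpenMP.
int currentWorkerIndex() {
    return omp_get_thread_num();
}
```

Guarding the include this way means the same source builds whether or not `-fopenmp` is passed, at the cost of one extra preprocessor branch per file.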
#pragma omp parallel for \
schedule(static) if(count > 500) \
reduction(+:sum) // If needed
- schedule(static): Best for uniform work (e.g., particles).
- Threshold: `if(N > 500)`. Emscripten pthread spawn cost is roughly equal to 100-500 iterations of loop work.
- Reductions: Use for scalars (e.g., `visibleCount`).
- No dynamic/shared: Avoid unless profiled.
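The clauses above can be combined as in the following sketch of a distance cull that counts visible entities with a scalar reduction. `Vec3`, `cullByDistance`, and the parameter names are assumptions made for this example; only `visibleCount` and the pragma clauses come from the guidelines.

```cpp
#include <vector>

struct Vec3 { float x, y, z; };

// Hypothetical distance cull: flags every position within maxDistance of the
// camera and returns how many passed. Flags are stored as bytes rather than
// std::vector<bool>, whose packed bits are not safe to write from several threads.
int cullByDistance(const std::vector<Vec3>& positions, const Vec3& camera,
                   float maxDistance, std::vector<unsigned char>& visible) {
    const int count = static_cast<int>(positions.size());
    visible.assign(positions.size(), 0);
    const float maxDistSq = maxDistance * maxDistance;
    int visibleCount = 0;

    // reduction(+:visibleCount) gives each thread a private counter and sums
    // them when the loop ends; the if clause keeps small batches serial.
    #pragma omp parallel for schedule(static) if(count > 500) reduction(+:visibleCount)
    for (int i = 0; i < count; ++i) {
        const float dx = positions[i].x - camera.x;
        const float dy = positions[i].y - camera.y;
        const float dz = positions[i].z - camera.z;
        if (dx * dx + dy * dy + dz * dz <= maxDistSq) {
            visible[i] = 1;
            ++visibleCount;
        }
    }
    return visibleCount;
}
```

Comparing squared distances avoids a square root per element, which keeps the per-iteration work uniform and suits `schedule(static)`.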
COMPILE_FLAGS="-O2 -msimd128 -ffast-math -fwasm-exceptions -fno-rtti -funroll-loops -mbulk-memory -fopenmp -pthread -matomics -I."
LINK_FLAGS="... -fopenmp -pthread -L$SCRIPT_DIR -lomp"
- MANDATORY: `-fopenmp -pthread` (enables pthreads).
- OPT: `-msimd128` (combine with OMP for hybrid speedup).
- DEBUG: `CANDY_DEBUG=1` enables assertions.
- Multi-thread: Requires SharedArrayBuffer (COOP/COEP headers via Vite).
- ST Fallback: Pragmas become no-ops; no crash.
- Pthread Pool: OMP uses Emscripten pthreads (max 2048 threads).
- No GPU OMP: CPU-only; use WebGPU compute for shaders.
- Verify: `npm run verify:emcc` checks exports/pragmas.
# Build with OMP
npm run build:emcc
# Verify exports & perf
npm run test:integration
node verify_build.js # Checks OMP symbols
# Profile (Chrome DevTools → Performance → WASM)
- Benchmark: Compare OMP vs. serial on loops >1k iters.
- Threshold Tune: Adjust `500` based on perf (e.g., 1000 for mobile).
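For the serial-versus-OMP comparison, a small timing harness along these lines may be enough to get a first number before reaching for DevTools. All names here (`halfSqrtSerial`, `halfSqrtParallel`, `averageMs`) and the kernel itself are hypothetical stand-ins for a real project loop.

```cpp
#include <chrono>
#include <cmath>
#include <vector>

// Serial reference kernel (illustrative workload only).
void halfSqrtSerial(const std::vector<float>& in, std::vector<float>& out) {
    out.resize(in.size());
    const int count = static_cast<int>(in.size());
    for (int i = 0; i < count; ++i) {
        out[i] = std::sqrt(in[i]) * 0.5f;
    }
}

// Same kernel with the pragma applied; identical results are expected because
// each element is computed independently.
void halfSqrtParallel(const std::vector<float>& in, std::vector<float>& out) {
    out.resize(in.size());
    const int count = static_cast<int>(in.size());
    #pragma omp parallel for schedule(static) if(count > 500)
    for (int i = 0; i < count; ++i) {
        out[i] = std::sqrt(in[i]) * 0.5f;
    }
}

// Runs fn `repeats` times and returns the mean wall-clock time in milliseconds.
template <typename Fn>
double averageMs(Fn&& fn, int repeats) {
    using Clock = std::chrono::steady_clock;
    const auto start = Clock::now();
    for (int r = 0; r < repeats; ++r) {
        fn();
    }
    const auto stop = Clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count() / repeats;
}
```

A typical use would time both kernels on the same input, for example `averageMs([&] { halfSqrtParallel(in, out); }, 100)`, and compare the two means at several batch sizes around the threshold. Writing results into `out` keeps the compiler from discarding the loop.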
- Add no-op defines to new files.
- Profile serial loop → Add pragma if >2x speedup.
- Update `AGENTS.md` & `PERFORMANCE_MIGRATION_STRATEGY.md`.
- Commit with `PERF: +OMP batch-xyz`.
Goal: 20-50% perf uplift in ST/multi-thread; maintain JS fallback compatibility.