just b # Build (Debug, 30 threads, Ninja + Clang)
just r # Run MoerEditor
just br # Build + Run
just gbr # Generate + Build + Run (full rebuild)
just b Release # Release build
just clean # Remove build/ and target/
- CMake generator: Ninja, compiler: Clang (C++20)
- Shader compiler: DXC with `-O3 -spirv -fspv-target-env=vulkan1.3 -fvk-use-dx-layout -all-resources-bound`, targets `ps_6_0`/`cs_6_0`
- Shaders are compiled at runtime, cached in `asset/shader_cache/{platform}.sdc`
- Output: `target/bin/{Config}/MoerEditor.exe`
- When an agent is debugging, it should validate the result immediately after making the change. Run `just b` first; if `just` is unavailable, use the equivalent build command such as `cmake --build build`.
- If the build succeeds and the change affects runtime behavior, run `just r` next; if `just` is unavailable, launch `target/bin/{Config}/MoerEditor.exe` directly.
- Before building or running, check the root `MoerEngine.toml`. This file is copied to the executable directory on every build and controls the default scene path and the default renderer.
- Set `default_render_method = "Raster"` when working on the raster pipeline, and set `default_render_method = "Raytracing"` when working on the ray tracing pipeline.
- After starting the editor with `just r`, allow up to one minute for the engine to finish loading so startup errors, validation errors, and obvious runtime regressions have time to surface.
source/
editor/ # ImGui editor app
main.cpp, Editor.cpp/h # Entry point, editor lifecycle
EditorUI.cpp/h # Main UI layout
raster_ui/ # Raster pipeline UI controls (RasterUI.cpp/h)
raytracing_ui/ # RT pipeline UI controls
runtime/
core/ # Math, containers, platform, logging, task graph
include/math/ # Vector, Matrix (Base.h, Matrix.h)
include/misc/ # Traits.h (C++ ↔ HLSL type aliases: float3, uint4, float4x4)
Engine.cpp/h # Engine init/loop
render/
renderer/
Renderer.cpp/h # Top-level renderer (selects raster or RT)
raster/ # ★ RASTER PIPELINE (most frequently modified)
RasterRenderer.cpp/h # Frame loop: calls each pass in order
RasterResource.h # Per-frame resources (textures, buffers, CSM data)
RasterTextures.h # GBuffer / RT texture definitions
RasterConfig.h # UI-driven config struct (shadow, AO, AA settings)
DirectionalShadowMaskPass.cpp/h # Shadow Mask pass
ShadowDepthPass.cpp/h # CSM depth rendering (cascade setup, frustum split)
LightingPass.h # Deferred lighting pass
GeometryPass.h # Geometry / GBuffer pass
AoPass.h # SSAO / RTAO / SSDO
AaPass.h # SMAA / FXAA
SkyboxPass.cpp/h # Skybox rendering
TonemappingPass.h # Tonemapping + bloom
raytracing/ # RT pipeline (ReSTIR DI, path tracing)
common/ui/ # ImGui renderer integration
rhi/ # ★ RHI ABSTRACTION LAYER
RHI.h, RHIResource.h # Core types (BufferRef, TextureRef, EBufferUsageFlags)
RHICommon.h # Enums (EBufferUsageFlags, EPixelFormat, etc.)
RHICommand.h # Command list / draw dispatching
vulkan/ # Vulkan backend
VulkanDevice.cpp/h # Pipeline creation, descriptor set layout, reflection
VulkanQueue.cpp/h # Command submission, draw/dispatch recording
VulkanRHIResource.cpp/h # Buffer/Texture creation, enum translation
VulkanDescriptor.cpp/h # Descriptor pool & set management
d3d12/ # D3D12 backend (partial)
shader/
ShaderPipeline.h # ★ Pipeline class macros (DEFINE_SHADER_CONSTANT_STRUCT, etc.)
ShaderCompiler.cpp/h # DXC invocation, SPIR-V generation
ShaderManager.cpp # PSO caching, hot-reload
shaderheaders/shared/ # ★ C++/HLSL SHARED HEADERS (dual-language via #ifdef __cplusplus)
raster/
ShaderParameters.h # Raster shared params entry (includes sub-headers)
SharedEnum.h # Shared enums (EShadowMapMode, ERtaoSampleMode, etc.)
lighting_pass/ShaderParameters.h # ★ LightingData, pass param structs
scene/ # Scene graph, GPU scene, camera, lights
GpuScene.cpp/h # GPU-side scene buffers (instances, materials, lights)
resources/ # Vertex factories, mesh resources
shaders/ # HLSL shader sources
core/
common/
Bindless.hlsl # ★ Bindless heap definition (ArrayBuffer, TextureHandle, BINDLESS_BINDINGS)
Common.hlsl # Shared utilities (WorldPosFromDepth, packing, etc.)
materials/
Brdf.hlsli # PBR BRDF (GGX, multi-scatter)
Material.hlsli # Material fetching
pipelines/
RasterCommon.hlsli # Raster shared helpers
raster/deferred/
geometry/ # GBuffer shaders
lighting/
Lighting.hlsli # Light accumulation
RasterLightingPass.frag.hlsl # Deferred lighting entry point
shadows/ # ★ SHADOW SYSTEM
ShadowMask.frag.hlsl # Shadow mask entry (full-screen pass)
Shadows.hlsli # Shadow dispatch (CSM / point)
CSM.hlsli # Cascade selection, blend ratio
PCSS.hlsli # PCSS blocker search + penumbra
PCF.hlsli # PCF filtering
ShadowCore.hlsli # Bias, blocker stats
ShadowSampling.hlsli # Poisson disk, rotation, sampling utils
env_and_atmo/ # Skybox shaders
postprocess/
lighting_effects/ # AO (SSAO, RTAO, SSDO), SSR
aa/ # SMAA, FXAA
denoise/ # Bilateral, RTAO denoiser
color/ # Tonemapping, bloom
common/ # Upsample, copy pass
All resources go through a unified bindless heap. Shaders declare `BINDLESS_BINDINGS(BufferSpace, TextureSpace, SamplerSpace, AccelSpace)` and access resources by `uint` handles via `ArrayBuffer(handle).Load<T>()` and `TextureHandle(handle).Sample2D<T>()`.
Files in `shaderheaders/shared/` use `#ifdef __cplusplus` guards:

- C++ side: `namespace Moer::Render`, types from `Traits.h` (`float3` = `Vector3f`, `float4x4` = `Matrix4x4f`)
- HLSL side: `namespace Moer`, native HLSL types
- Both sides see the same struct layout. Alignment must be manually kept in sync (see pitfalls below).
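
The guard pattern described above can be sketched as follows. This is a minimal illustration, not the engine's actual header: `DemoVector3f`, `DemoMatrix4x4f`, the `Demo` namespaces and `DemoLightParams` are hypothetical stand-ins for the real `Traits.h` types and shared structs. The `static_assert` lines show one way the C++ side could catch layout drift at compile time.

```cpp
#ifdef __cplusplus
#include <cstddef>   // offsetof
#include <cstdint>

// Hypothetical stand-ins for the aliases that Traits.h provides.
struct DemoVector3f   { float x, y, z; };
struct DemoMatrix4x4f { float m[16]; };

namespace Demo::Render {
using float3   = DemoVector3f;
using float4x4 = DemoMatrix4x4f;
using uint     = std::uint32_t;
#else
// HLSL side: float3, float4x4 and uint are native types.
namespace Demo {
#endif

// Shared declaration, written once and compiled by both languages.
struct DemoLightParams {
    float4x4 world_to_light;  // C++ offset 0, 64 bytes
    float3   direction;       // C++ offset 64, 12 bytes
    uint     light_count;     // C++ offset 76, 4 bytes
};

}  // closes Demo::Render (C++) or Demo (HLSL)

#ifdef __cplusplus
// Compile-time guards: a C++-side edit cannot silently change the layout.
static_assert(sizeof(Demo::Render::DemoLightParams) == 80, "size drifted");
static_assert(offsetof(Demo::Render::DemoLightParams, direction) == 64,
              "direction offset drifted");
static_assert(offsetof(Demo::Render::DemoLightParams, light_count) == 76,
              "light_count offset drifted");
#endif
```

The asserts only pin down the C++ layout; the expected offsets still have to be worked out by hand from the GPU-side packing rules.
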
class MyPipeline : public RasterPipeline {
DEFINE_RASTER_PIPELINE_CLASS(MyPipeline);
DEFINE_SHADER_BUFFER(lighting_data); // [[vk::binding(N, S)]] ConstantBuffer / StructuredBuffer
DEFINE_SHADER_CONSTANT_STRUCT(MyParam, param); // [[vk::push_constant]]
DEFINE_SHADER_BINDLESS_ARRAY(bdls); // Bindless heap
DEFINE_SHADER_ARGS(lighting_data, bdls, param); // Arg order must match Gfx() call order
};

Pass code calls `cmd_list.Gfx(pipeline, bufferRef, bdls, paramStruct)`; the argument order matches `DEFINE_SHADER_ARGS`.

`RasterRenderer::Render()` calls each pass's `Process()` in order. Each pass owns its pipeline and fills its param struct from `RasterContext`.
- CSM: up to `MAX_CSM_CASCADES` (currently 4) cascades, 4096² shadow maps
- PCSS: 16 blocker + 16 PCF samples, `[unroll]` loops, Poisson disk sampling
- Shadow mask is a separate full-screen pass (`DirectionalShadowMaskPass`) writing to a single-channel texture, read later by `RasterLightingPass`
- `LightingData` is bound as a UBO (`ConstantBuffer`) to the shadow mask shader for hardware-assisted scalarization (fields only enter registers when accessed, not all at once)
Problem: `ByteAddressBuffer.Load<LargeStruct>()` loads the ENTIRE struct into vector registers (VGPRs) at once. With `LightingData` (~600 bytes), this consumed ~150 live registers (600 B at 4 B per register), causing 98% register-limited stalls.

Root cause: Dynamic array indexing (e.g., `world2shadow_clip[cascade_index]`) forces the compiler to keep all array elements alive simultaneously.

Solution: Use `ConstantBuffer<T>` (UBO) instead of `ByteAddressBuffer` for uniform data. The GPU has a dedicated constant cache (separate from registers); fields are fetched via scalar loads (`s_buffer_load`) and shared across all lanes without per-lane register cost.
Problem: In a `ConstantBuffer` (std140 layout), each element of a scalar array (e.g., `float[4]`, `uint[4]`) is padded to 16 bytes. A C++ struct member `float cascade_split_ratios[4]` occupies 16B, but the GPU sees 64B.

Solution: Use vector types instead: `float4 cascade_split_ratios`, `uint4 cascade_shadow_map`. HLSL supports indexing a `float4` with `[i]`, so shader code doesn't change.
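
The size mismatch can be reproduced on the C++ side alone. The sketch below assumes the std140 rule that every array element is rounded up to a 16-byte stride; `Std140Scalar`, `Float4` and the three `*View` structs are hypothetical names introduced only for this illustration.

```cpp
#include <cstddef>

// Imitates a std140 array element: one scalar occupying a 16-byte slot.
template <typename T>
struct alignas(16) Std140Scalar { T value; };

// Stand-in for the shared float4 alias.
struct Float4 { float x, y, z, w; };

// What the C++ compiler lays out for a plain scalar array.
struct CppView   { float cascade_split_ratios[4]; };
// What a std140 constant buffer effectively sees for the same declaration.
struct GpuView   { Std140Scalar<float> cascade_split_ratios[4]; };
// The fix: a single vector, which has the same size on both sides.
struct FixedView { Float4 cascade_split_ratios; };

static_assert(sizeof(Std140Scalar<float>) == 16, "one 16-byte slot per element");
static_assert(sizeof(CppView)   == 16, "C++: four tightly packed floats");
static_assert(sizeof(GpuView)   == 64, "std140: four 16-byte slots");
static_assert(sizeof(FixedView) == 16, "float4: no per-element padding");
```

Every field declared after such an array would be read from the wrong offset on the GPU, which is why the mismatch tends to show up as corrupted values rather than as a compile error.
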
- Sky pixel early-out: `if (depth < 1e-6) return 1.0` (reverse-Z, sky = 0)
- Full-blocker early-out in PCSS: skip PCF if all 16 blocker samples are in shadow
- Cascade blend early-out: only sample next cascade when `blend_ratio > 0`
- Redundant texture sample removal in `get_single_shadow()`
- `tan(acos(x))` → `sqrt(1-x²)/x` in slope-scaled bias
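
The last substitution rests on the identity tan(acos(x)) = sqrt(1 - x²) / x for x in (0, 1]. The following C++ sketch checks it numerically in single precision. The function names are hypothetical, and the input is assumed to be a cosine such as N·L that has already been clamped away from zero.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Reference form: two transcendental calls.
inline float slope_tan_reference(float n_dot_l) {
    assert(n_dot_l > 0.0f && n_dot_l <= 1.0f);
    return std::tan(std::acos(n_dot_l));
}

// Rewritten form: one multiply, one subtract, one sqrt, one divide.
inline float slope_tan_fast(float n_dot_l) {
    assert(n_dot_l > 0.0f && n_dot_l <= 1.0f);
    return std::sqrt(1.0f - n_dot_l * n_dot_l) / n_dot_l;
}

// Largest difference between the two forms over `steps` evenly spaced
// inputs in (0, 1], measured relative to max(1, |reference|).
inline float max_relative_error(int steps) {
    float worst = 0.0f;
    for (int i = 1; i <= steps; ++i) {
        const float x    = static_cast<float>(i) / static_cast<float>(steps);
        const float ref  = slope_tan_reference(x);
        const float fast = slope_tan_fast(x);
        const float norm = std::max(1.0f, std::fabs(ref));
        worst = std::max(worst, std::fabs(ref - fast) / norm);
    }
    return worst;
}
```

Calling `max_relative_error(100)` compares the two forms at x = 0.01, 0.02, ..., 1.0. Both forms grow without bound as x approaches 0, so the clamp on the input matters for either version.
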
Do NOT convert `[unroll]` to `[loop]` when the loop body indexes a `static const` array (e.g., Poisson disk). With `[loop]`, the array must be dynamically indexed, which pushes it into registers or local memory, so register usage increases instead of decreasing.

`D32_FLOAT` is sufficient (no need for `D32_FLOAT_S8_UINT`); stencil is unused by the shadow system. Saves VRAM and bandwidth.
Vulkan only guarantees 128 bytes of push constants (the required minimum for `maxPushConstantsSize`), and some hardware exposes no more than that. Keep pass param structs small: remove fields that can be loaded from buffers instead.
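
One way to enforce the push-constant budget is a compile-time size check next to each param struct. The sketch below is an assumption about how such a guard could look, not existing engine code: `DemoShadowMaskParam`, `DemoOversizedParam` and `fits_in_push_constants` are hypothetical. The multiple-of-4 condition reflects Vulkan's requirement that a push constant range size be a multiple of 4.

```cpp
#include <cstddef>
#include <cstdint>

// Minimum maxPushConstantsSize that every Vulkan implementation must support.
constexpr std::size_t kGuaranteedPushConstantBytes = 128;

// True when T fits the guaranteed budget and has a size that is a multiple of 4.
template <typename T>
constexpr bool fits_in_push_constants() {
    return sizeof(T) <= kGuaranteedPushConstantBytes && sizeof(T) % 4 == 0;
}

// Hypothetical pass params: bindless handles plus a few scalars (32 bytes).
struct DemoShadowMaskParam {
    std::uint32_t depth_texture;     // bindless handle
    std::uint32_t shadow_map;        // bindless handle
    std::uint32_t lighting_buffer;   // bindless handle
    std::uint32_t frame_index;
    float         inv_resolution[2];
    float         bias_scale;
    float         reserved;          // explicit pad keeps the size obvious
};

// Hypothetical counter-example: two matrices alone already use the whole
// 128-byte budget, so any extra field overflows it (136 bytes here).
struct DemoOversizedParam {
    float view_proj[16];
    float prev_view_proj[16];
    float jitter[2];
};

static_assert(fits_in_push_constants<DemoShadowMaskParam>(),
              "pass params must fit the guaranteed push constant budget");
static_assert(!fits_in_push_constants<DemoOversizedParam>(),
              "matrices belong in a buffer, not in push constants");
```

In the oversized case, the matrices would move into a buffer and the param struct would carry only a handle to it.
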
| HLSL Type | C++ Type | Size | Align |
|---|---|---|---|
| `float3` | `Vector3f` (12B, union of 3 floats) | 12 | 4 |
| `float4` | `Vector4f` (16B) | 16 | 4 |
| `float4x4` | `Matrix4x4f` (64B, union of 4×Vector4f) | 64 | 4 |
| `uint` | `uint32_t` | 4 | 4 |
| `uint4` | `Vector4ui` | 16 | 4 |
Note: The C++ side has no implicit padding (all types are 4-byte aligned). HLSL `ConstantBuffer` (std140), however, adds padding to arrays of scalars, so always use vector types for arrays in shared structs.
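
The sizes and alignments in the table can be pinned down with `static_assert`. The unions below are hypothetical stand-ins that only mirror the shapes the table describes; they are not the engine's real `Vector3f`, `Vector4f`, `Vector4ui` or `Matrix4x4f` definitions. The last two structs illustrate the consequence of 4-byte alignment: C++ inserts no padding after a `float3`, whereas std140 starts a following `float4` on a 16-byte boundary, so a shared struct needs an explicit pad field to agree on both sides.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins shaped like the table rows.
union DemoVector3f   { float data[3]; struct { float x, y, z; } xyz; };
union DemoVector4f   { float data[4]; struct { float x, y, z, w; } xyzw; };
union DemoVector4ui  { std::uint32_t data[4];
                       struct { std::uint32_t x, y, z, w; } xyzw; };
union DemoMatrix4x4f { DemoVector4f rows[4]; float data[16]; };

static_assert(sizeof(DemoVector3f)   == 12 && alignof(DemoVector3f)   == 4, "float3");
static_assert(sizeof(DemoVector4f)   == 16 && alignof(DemoVector4f)   == 4, "float4");
static_assert(sizeof(DemoMatrix4x4f) == 64 && alignof(DemoMatrix4x4f) == 4, "float4x4");
static_assert(sizeof(DemoVector4ui)  == 16 && alignof(DemoVector4ui)  == 4, "uint4");

// C++ packs the float4 directly behind the float3, at offset 12.
struct DemoPacked {
    DemoVector3f normal;
    DemoVector4f color;
};
static_assert(offsetof(DemoPacked, color) == 12, "no implicit padding in C++");
static_assert(sizeof(DemoPacked) == 28, "12 + 16 bytes");

// An explicit pad moves the float4 to offset 16, where std140 expects it.
struct DemoPadded {
    DemoVector3f normal;
    float        pad0;
    DemoVector4f color;
};
static_assert(offsetof(DemoPadded, color) == 16, "matches the 16-byte boundary");
static_assert(sizeof(DemoPadded) == 32, "12 + 4 + 16 bytes");
```
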