[EPIC] ReSTIR — Reservoir-based Spatiotemporal Importance Resampling on Intel iGPU via SYCL #64
Description
Purpose
Replace the current uniform emissive triangle sampling in the path tracer with a reservoir-based resampling system that identifies which lights actually contribute to each pixel. The goal is to produce significantly cleaner renders at the same sample budget by having pixels share light-finding work spatially across the image and, once temporal reuse is added, across frames. ReSTIR is a leading technique for direct lighting in real-time path tracing and, to our knowledge, no open SYCL implementation targets Intel integrated GPUs.
Issue: Define and allocate the Reservoir buffer
The reservoir is the core data structure of the entire system. Each pixel in the framebuffer owns one reservoir that tracks the best light candidate found so far, the cumulative weight of all candidates evaluated, how many candidates were considered, and the final unbiased contribution weight used at shade time.
This issue covers defining the struct, ensuring its memory layout is suitable for coalesced access on iGPU (including whether a structure-of-arrays layout is preferable to an array of structs), allocating a device buffer of width × height reservoirs using USM, and verifying that the buffer lifecycle integrates cleanly with the existing SYCL queue and memory management in RenderScene.
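As a starting point for discussion, the four quantities listed above could be packed as follows. This is a minimal host-compilable sketch, not the final layout: the field names and the helper functions `reservoirCount`, `reservoirBytes`, and `reservoirIndex` are hypothetical, and an array-of-structs layout is assumed purely for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical layout; field names are illustrative, not taken from the codebase.
struct Reservoir {
    std::uint32_t lightIndex; // index of the currently selected emissive triangle
    float         weightSum;  // running sum of candidate weights
    std::uint32_t M;          // number of candidates considered so far
    float         W;          // contribution weight used at shade time
};

// 16 bytes means one reservoir fits a single 128-bit load or store.
static_assert(sizeof(Reservoir) == 16, "Reservoir should stay 16 bytes");
static_assert(std::is_trivially_copyable<Reservoir>::value,
              "device memory is copied bytewise, so the struct must be trivially copyable");

// Number of reservoirs needed for a width x height framebuffer.
constexpr std::size_t reservoirCount(std::size_t width, std::size_t height) {
    return width * height;
}

// Size in bytes of the reservoir buffer for a width x height framebuffer.
constexpr std::size_t reservoirBytes(std::size_t width, std::size_t height) {
    return reservoirCount(width, height) * sizeof(Reservoir);
}

// Row-major pixel -> reservoir index (assumes the framebuffer is row-major).
constexpr std::size_t reservoirIndex(std::size_t x, std::size_t y, std::size_t width) {
    return y * width + x;
}
```

With SYCL 2020 USM, a buffer of `reservoirCount(width, height)` elements would typically be obtained with `sycl::malloc_device<Reservoir>(count, queue)` and released with `sycl::free(ptr, queue)`. If profiling shows that passes touch only one or two fields at a time, splitting the struct into four parallel arrays (SoA) is the alternative to evaluate.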
Issue: Implement weighted reservoir sampling (WRS) as a device function
Weighted reservoir sampling processes candidates as a stream using a specific update rule: given a new candidate with weight w, accept it as the current selection with probability w / (weightSum + w), where weightSum is the sum of the weights seen before this candidate, then add w to weightSum and increment the candidate count. This selects one item from the stream with probability proportional to its weight without storing all candidates simultaneously.
This issue covers implementing this as a fgt_device_gpu inline function that operates on a Reservoir struct, writing unit tests that verify the selection distribution matches the expected probabilities over many trials, and confirming it compiles correctly under the Intel SYCL compiler.
Issue: Implement light candidate weighting function
For each light candidate, the system needs a scalar weight representing how much that light is expected to contribute to a given surface point. This is the target function that ReSTIR optimizes toward. A correct weight accounts for the BRDF response at the hit point for the light direction, the cosine of the angle between surface normal and light direction, the emissive intensity of the triangle, and the inverse square distance to the light.
Visibility is intentionally excluded from this weight at this stage for performance reasons. This issue covers implementing the weighting function as a device function, verifying it produces sensible relative weights for simple test configurations, and documenting clearly why visibility is deferred.
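A minimal sketch of such a weighting function, combining the four terms listed above, is shown here. It treats the light as a single point on the emissive triangle and takes the BRDF response as an already-evaluated scalar, because the signature of the renderer's BRDF function is not assumed; `Vec3`, `targetWeight`, and all parameter names are hypothetical.

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

inline Vec3 operator-(Vec3 a, Vec3 b) { return Vec3{a.x - b.x, a.y - b.y, a.z - b.z}; }
inline float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Unshadowed target function: BRDF * cosine * emitted intensity / distance^2.
// `normal` is assumed to be unit length. `brdfValue` stands in for the scalar
// BRDF response toward the light, evaluated by the caller.
// Visibility is deliberately NOT tested here.
inline float targetWeight(Vec3 hitPos, Vec3 normal, Vec3 lightPos,
                          float brdfValue, float emissiveIntensity) {
    const Vec3 toLight = lightPos - hitPos;
    const float dist2 = dot(toLight, toLight);
    if (!(dist2 > 0.0f)) return 0.0f;           // light coincides with the hit point
    const float cosTheta = dot(normal, toLight) / std::sqrt(dist2);
    if (!(cosTheta > 0.0f)) return 0.0f;        // light is below the surface horizon
    return brdfValue * cosTheta * emissiveIntensity / dist2;
}
```

A simple test configuration is a light directly above the surface: doubling its distance should reduce the weight by a factor of four, and moving it below the surface should give zero.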
Issue: Pass 1 — Initial candidate sampling kernel
This is the first of three per-frame kernel dispatches. Each pixel independently generates a fixed number of light candidates (target: 32), evaluates the weight of each using the function from the previous issue, and runs weighted reservoir sampling to select one. The result is stored in the pixel's reservoir.
This pass is embarrassingly parallel. No pixel reads data from any other pixel. This issue covers writing the SYCL kernel, wiring it into the render loop after BVH traversal produces hit data, and verifying that reservoir weights look correct by rendering the weight field as a debug visualization before any spatial resampling is applied.
Issue: Pass 2 — Spatial resampling kernel
This is the technically difficult pass. Each pixel examines K randomly chosen neighboring pixels (target: 4-8 neighbors), retrieves each neighbor's selected light candidate, evaluates what that candidate's weight would be at the current pixel's surface point, and attempts to merge it into the current pixel's reservoir using WRS.
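The merge must compensate for the geometry difference between the neighbor's surface and the current pixel's surface, otherwise the result is biased. When a candidate is stored as a point on the light, re-evaluating the target function at the current pixel already accounts for the change in BRDF, cosine, and distance; an explicit Jacobian factor is needed when candidates are reused in solid-angle measure, and the reservoir normalization must discount neighbors for which the selected candidate has zero target value. This issue covers implementing the spatial resampling kernel, implementing the bias correction (Jacobian and normalization as applicable), and producing a side-by-side comparison render showing the noise reduction from spatial resampling alone relative to Pass 1 output.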
Issue: Pass 3 — Shading with reservoir output
Replace the current sampleEmissiveLight call in the path tracer with a lookup into the reservoir buffer. The reservoir's selected light index and final weight W are used to call evaluateCookTorrance once per pixel with a well-chosen light direction rather than a randomly chosen one. Because visibility was excluded from the candidate weight, the selected light still needs a shadow ray at this stage.
This issue covers integrating the reservoir output into the existing shading path in pathTracer_CookTorrance, removing the old uniform emissive sampling for the direct light term while keeping indirect bounce sampling unchanged, and verifying the output is unbiased by checking that a converged ReSTIR render matches a reference render produced with high sample count uniform sampling.
Issue: Temporal resampling — reuse reservoirs from the previous frame
Spatial resampling shares information between pixels in the same frame. Temporal resampling extends this across frames by reusing each pixel's reservoir from the previous frame as an additional candidate in the current frame. For static or slowly moving scenes this dramatically accelerates convergence because good candidates found in previous frames continue contributing.
This requires maintaining two reservoir buffers (current frame and previous frame), a reprojection step that maps the current pixel's world position back to its screen position in the previous frame using the camera matrices, and a validity check that rejects previous-frame reservoirs where the surface has changed significantly (different triangle, large normal deviation).
This issue covers implementing double-buffered reservoirs, camera matrix reprojection, the validity heuristic, and measuring convergence rate with and without temporal reuse.
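The reprojection and validity check could be sketched as below. The conventions are assumptions that need to be matched to the renderer: a row-major 4x4 matrix applied to column vectors, normalized device coordinates in [-1, 1], and pixel rows increasing downward. `Mat4`, `PixelCoord`, `SurfaceInfo`, `reproject`, `temporalReservoirValid`, and the 0.9 normal threshold are all hypothetical.

```cpp
#include <cmath>
#include <cstdint>

struct Vec3 { float x, y, z; };
struct Mat4 { float m[4][4]; };              // assumed row-major, clip = M * [x y z 1]^T
struct PixelCoord { int x; int y; bool valid; };

// Map a world-space position to its pixel in the previous frame.
inline PixelCoord reproject(const Mat4& prevViewProj, Vec3 p, int width, int height) {
    float c[4];
    for (int r = 0; r < 4; ++r)
        c[r] = prevViewProj.m[r][0] * p.x + prevViewProj.m[r][1] * p.y +
               prevViewProj.m[r][2] * p.z + prevViewProj.m[r][3];
    if (!(c[3] > 0.0f)) return PixelCoord{0, 0, false};   // behind the previous camera
    const float ndcX = c[0] / c[3];
    const float ndcY = c[1] / c[3];
    const int px = static_cast<int>(std::floor((ndcX * 0.5f + 0.5f) * static_cast<float>(width)));
    const int py = static_cast<int>(std::floor((0.5f - ndcY * 0.5f) * static_cast<float>(height)));
    const bool inside = px >= 0 && px < width && py >= 0 && py < height;
    return PixelCoord{px, py, inside};
}

struct SurfaceInfo {
    std::uint32_t triangleId;
    Vec3          normal;      // assumed unit length
};

// Reject the previous-frame reservoir if the surface changed significantly.
inline bool temporalReservoirValid(const SurfaceInfo& current, const SurfaceInfo& previous,
                                   float minNormalDot = 0.9f) {
    if (current.triangleId != previous.triangleId) return false;
    const float d = current.normal.x * previous.normal.x +
                    current.normal.y * previous.normal.y +
                    current.normal.z * previous.normal.z;
    return d >= minNormalDot;
}
```

For the double buffering itself, keeping two device pointers and exchanging them with `std::swap` at the end of each frame avoids any copy: the buffer written this frame simply becomes the previous-frame buffer for the next one.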
Issue: Benchmark — uniform NEE vs ReSTIR on Intel iGPU
Run the same scene under three configurations: the current uniform emissive triangle sampling, ReSTIR with spatial resampling only, and ReSTIR with spatial and temporal resampling. Measure equal-time image quality using MSE against a ground truth reference, frame time broken down by pass, and GPU memory bandwidth consumed per frame using Intel VTune.
The specific hypothesis to test is whether spatial resampling's random neighbor access pattern creates measurable bandwidth pressure on iGPU, where DRAM is shared with the CPU, and whether this limits the scalability of the neighbor count K. To our knowledge this has not been quantified for memory-constrained hardware, so a rigorous measurement would be a useful contribution.