Description
Motivation
When you submit work to the GPU (like rendering a frame), it continues running asynchronously. If the CPU immediately tries to delete textures or reuse buffers, it could corrupt memory or crash. The wait_for API provides this synchronization by blocking until the GPU reaches a specific synchronization point, preventing the CPU from destroying or reusing GPU resources while they're still in use. In other words, it is a safe way to wait until the GPU has finished with a specific batch of work, identified by a sync point.
Currently, it returns a simple boolean that loses critical error information such as a timeout, device loss, or other backend-specific failures. Every failure looks the same to the caller: a false value.
Details
The CommandDevice trait defines the contract for GPU synchronization. It includes:
submit() which returns a SyncPoint representing GPU work completion
wait_for() which blocks until that work finishes or times out
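As a rough sketch of that contract (the `MockDevice` below is purely illustrative and not part of blade's actual API), it could be modeled like this:

```rust
/// A minimal sketch of the CommandDevice contract. The mock device is a
/// hypothetical stand-in for a real backend: it never blocks, it just
/// reports whether the "GPU" has reached the sync point.
type SyncPoint = u64;

trait CommandDevice {
    /// Submits recorded work and returns a sync point identifying it.
    fn submit(&mut self) -> SyncPoint;
    /// Blocks until `sp` is reached or `timeout_ms` elapses.
    /// Returns `true` on completion, `false` on timeout or any error.
    fn wait_for(&self, sp: SyncPoint, timeout_ms: u32) -> bool;
}

struct MockDevice {
    submitted: SyncPoint,
    progress: SyncPoint, // everything up to this point has "completed"
}

impl CommandDevice for MockDevice {
    fn submit(&mut self) -> SyncPoint {
        self.submitted += 1;
        self.submitted
    }
    fn wait_for(&self, sp: SyncPoint, _timeout_ms: u32) -> bool {
        // Real backends block on the driver; the mock just compares counters.
        sp <= self.progress
    }
}

fn main() {
    let mut dev = MockDevice { submitted: 0, progress: 0 };
    let sp = dev.submit();
    assert!(!dev.wait_for(sp, 0)); // the "GPU" hasn't reached it yet
    dev.progress = sp;             // pretend the GPU finished
    assert!(dev.wait_for(sp, 0));
    println!("sync point {sp} reached");
}
```

Note how the boolean in `wait_for` is already lossy at the trait level: the mock cannot even express "the device died" separately from "still running".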
Each backend implements this differently:
Vulkan: Uses timeline semaphores with nanosecond precision
GLES/WebGL: Uses GL sync objects with millisecond precision
Metal: Polls command buffer status in a loop
The current boolean return forces all callers to treat any failure identically, preventing proper error handling and recovery strategies.
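One possible direction, sketched here with a hypothetical `WaitError` type and `recover` helper (neither exists in blade today), is returning a `Result` so callers can pick a recovery strategy per failure mode:

```rust
/// Hypothetical error type; the real variants would need to mirror what
/// each backend can actually report.
#[derive(Debug, PartialEq)]
enum WaitError {
    Timeout,
    DeviceLost,
}

/// Illustrative caller-side branching once `wait_for` returns a Result
/// instead of a bool.
fn recover(outcome: Result<(), WaitError>) -> &'static str {
    match outcome {
        Ok(()) => "proceed",
        Err(WaitError::Timeout) => "retry later, device still healthy",
        Err(WaitError::DeviceLost) => "rebuild the device and resources",
    }
}

fn main() {
    // With a bool, all three of these would collapse into true/false.
    assert_eq!(recover(Ok(())), "proceed");
    assert_eq!(recover(Err(WaitError::Timeout)), "retry later, device still healthy");
    assert_eq!(recover(Err(WaitError::DeviceLost)), "rebuild the device and resources");
}
```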
Backend-specific implementations
Details
Vulkan Backend Implementation
The Vulkan backend implements wait_for using timeline semaphores. It locks the queue to access the timeline semaphore, creates a wait info structure with the sync point's progress value, and calls the Vulkan driver. The implementation maps the timeout from milliseconds to nanoseconds and converts the Vulkan Result to a boolean using .is_ok(), which discards the specific error type.
The key issue is that different error conditions require different handling strategies:
TIMEOUT: The operation timed out but the device is still functional
DEVICE_LOST: The GPU was lost and needs recovery
OUT_OF_DATE: The surface is out of date (common during resize)
...
By converting all these to false, the API forces callers to treat every failure identically, limiting robust error recovery.
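To illustrate, compare the current `.is_ok()` collapse with a mapping that preserves the failure mode. The `VkResult` enum below is a stand-in for ash's `vk::Result`, and `WaitError` is hypothetical; this is a sketch, not the backend's actual code:

```rust
/// Stand-in for Vulkan's result codes (mirroring the names above).
#[derive(Debug, Clone, Copy, PartialEq)]
enum VkResult {
    Success,
    Timeout,
    ErrorDeviceLost,
    ErrorOutOfDateKhr,
}

/// Hypothetical richer error type.
#[derive(Debug, PartialEq)]
enum WaitError {
    Timeout,
    DeviceLost,
    OutOfDate,
}

/// What the current implementation effectively does: `.is_ok()`.
fn current_mapping(r: VkResult) -> bool {
    r == VkResult::Success
}

/// A mapping that keeps the error conditions distinguishable.
fn proposed_mapping(r: VkResult) -> Result<(), WaitError> {
    match r {
        VkResult::Success => Ok(()),
        VkResult::Timeout => Err(WaitError::Timeout),
        VkResult::ErrorDeviceLost => Err(WaitError::DeviceLost),
        VkResult::ErrorOutOfDateKhr => Err(WaitError::OutOfDate),
    }
}

fn main() {
    // Today a timeout and a lost device are indistinguishable...
    assert_eq!(current_mapping(VkResult::Timeout), false);
    assert_eq!(current_mapping(VkResult::ErrorDeviceLost), false);
    // ...while the Result variant keeps them apart.
    assert_eq!(proposed_mapping(VkResult::Timeout), Err(WaitError::Timeout));
    assert_eq!(proposed_mapping(VkResult::ErrorDeviceLost), Err(WaitError::DeviceLost));
}
```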
GLES/WebGL Backend Implementation
The GLES implementation uses OpenGL's sync objects (gl.client_wait_sync) to block until GPU work completes. It converts the millisecond timeout to nanoseconds, with special handling for WebGL's 1-second timeout limit. The function returns true only when the GPU signals completion (ALREADY_SIGNALED or CONDITION_SATISFIED) and false for timeouts or any other error conditions.
Key behavior: A zero timeout enables non-blocking polling (useful for checking if resources are available), while !0 (max u32) creates an indefinite block (used when you must wait before proceeding). The current boolean return collapses all error types into a simple success/failure, which is why there's interest in migrating to a Result type for better error handling.
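The timeout conversion described above might look roughly like this. This is a sketch: the exact way the backend handles WebGL's 1-second limit is an assumption (shown here as a simple cap), and the function name is illustrative:

```rust
/// Sentinel matching the `!0` convention: wait indefinitely.
const WAIT_FOREVER: u32 = !0;

/// Converts a millisecond timeout to the nanoseconds expected by
/// `gl.client_wait_sync`, capping at 1 second on WebGL, which rejects
/// longer client-side waits. Illustrative sketch, not the real code.
fn timeout_ms_to_ns(timeout_ms: u32, is_webgl: bool) -> u64 {
    let ns = if timeout_ms == WAIT_FOREVER {
        u64::MAX
    } else {
        timeout_ms as u64 * 1_000_000
    };
    if is_webgl {
        ns.min(1_000_000_000) // assumed handling of WebGL's 1s limit
    } else {
        ns
    }
}

fn main() {
    assert_eq!(timeout_ms_to_ns(0, false), 0); // zero = non-blocking poll
    assert_eq!(timeout_ms_to_ns(5, false), 5_000_000);
    assert_eq!(timeout_ms_to_ns(5_000, true), 1_000_000_000); // capped on WebGL
    assert_eq!(timeout_ms_to_ns(WAIT_FOREVER, false), u64::MAX); // indefinite
}
```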
Metal Backend Implementation
The Metal implementation uses a simple polling loop because Metal's command buffers don't support efficient blocking waits like Vulkan's timeline semaphores. It records the start time, then continuously checks the command buffer status. When the status is "Completed", it returns true. The key limitation is that error states are silently ignored - if the command buffer fails, the loop continues until timeout, then returns false just like a timeout condition.
The polling approach with 1ms sleeps is inefficient but necessary given Metal's API constraints. This design loses valuable error information that could help applications distinguish between timeouts, device loss, or actual command buffer errors.
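The polling strategy can be sketched as follows. The `Status` enum stands in for Metal's command buffer status and the function is illustrative; the point is that an error status and a timeout produce the same `false`:

```rust
use std::time::{Duration, Instant};

/// Stand-in for Metal's command buffer status values.
#[derive(Clone, Copy, PartialEq)]
enum Status {
    Enqueued,
    Completed,
    Error,
}

/// Sketch of the polling loop: check status every 1ms until completion
/// or timeout. An `Error` status is silently ignored, so the caller
/// cannot tell a failed command buffer from a slow one.
fn wait_by_polling(mut poll: impl FnMut() -> Status, timeout: Duration) -> bool {
    let start = Instant::now();
    loop {
        match poll() {
            Status::Completed => return true,
            // Errors are not surfaced; the loop just keeps polling.
            Status::Enqueued | Status::Error => {}
        }
        if start.elapsed() >= timeout {
            return false; // timeout and error look identical here
        }
        std::thread::sleep(Duration::from_millis(1));
    }
}

fn main() {
    // A buffer that completes after three polls.
    let mut polls = 0;
    let done = wait_by_polling(
        || {
            polls += 1;
            if polls >= 3 { Status::Completed } else { Status::Enqueued }
        },
        Duration::from_millis(100),
    );
    assert!(done);

    // A buffer stuck in an error state: returns false, same as a timeout.
    let failed = wait_by_polling(|| Status::Error, Duration::from_millis(5));
    assert!(!failed);
}
```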
Existing usage of wait_for
Details
FramePacer
The FramePacer uses wait_for to enforce a strict "one frame at a time" execution model, guaranteeing that the previous frame's GPU work completes before the next frame begins and before any temporary resources are recycled.
The FramePacer maintains a sync point from the previous frame and blocks indefinitely (!0 timeout) until GPU work completes. This blocking wait happens at three critical points:
Frame start - wait_for_previous_frame() blocks until the previous frame finishes
Frame end - Called automatically after submitting the current frame
Cleanup - Ensures all GPU work is done before destroying the FramePacer
After the wait succeeds, the code safely destroys buffers and acceleration structures from the previous frame, knowing the GPU no longer accesses them. The current implementation assumes the wait always succeeds, which is why it returns void rather than handling errors - a design choice that needs reconsideration for robust error handling.
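A minimal sketch of that pattern (all names illustrative; the mock `Gpu` reports completion instantly instead of blocking, and `Vec<String>` stands in for the recycled buffers and acceleration structures):

```rust
type SyncPoint = u64;

/// Mock GPU context: everything up to `completed` has finished.
struct Gpu {
    completed: SyncPoint,
}

impl Gpu {
    fn wait_for(&self, sp: SyncPoint, _timeout_ms: u32) -> bool {
        sp <= self.completed
    }
}

/// Sketch of the one-frame-in-flight pattern used by FramePacer.
struct FramePacer {
    prev_frame: Option<SyncPoint>,
    recycled: Vec<String>, // stand-in for last frame's temporaries
}

impl FramePacer {
    fn wait_for_previous_frame(&mut self, gpu: &Gpu) {
        if let Some(sp) = self.prev_frame.take() {
            // Block indefinitely (!0). The current API returns a bool,
            // and this code path has no way to report a failure upward.
            let ok = gpu.wait_for(sp, !0);
            assert!(ok, "failure is currently unrepresentable here");
            // Now it is safe to destroy last frame's temporary resources:
            // the GPU no longer accesses them.
            self.recycled.clear();
        }
    }
}

fn main() {
    let gpu = Gpu { completed: 1 };
    let mut pacer = FramePacer {
        prev_frame: Some(1),
        recycled: vec!["temp buffer".to_string()],
    };
    pacer.wait_for_previous_frame(&gpu);
    assert!(pacer.recycled.is_empty()); // temporaries safely recycled
}
```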
BufferBelt
GPU operations are asynchronous - when you submit work to the GPU, it continues executing long after your CPU code returns. This creates a critical problem: how do you safely reuse GPU resources like buffers without corrupting data that's still being used? The BufferBelt solves this by tracking when the GPU finishes with each buffer chunk through sync points, enabling efficient resource recycling.
The BufferBelt maintains two pools: active buffers currently being filled, and buffers waiting for GPU completion. When allocating space:
- First it tries to fit your request in an active buffer - this is fastest as no GPU synchronization is needed
- If that fails, it searches the recycled pool for buffers the GPU has finished with. The key check is `gpu.wait_for(sp, 0)` - a non-blocking poll that asks "is the GPU done?"
- If no recycled buffers are ready, it creates a new chunk from the GPU
The zero timeout is crucial - it means "check and return immediately", preventing the CPU from stalling while waiting for the GPU. This design enables high-throughput applications to continuously submit work without blocking.
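The three-step allocation strategy above can be sketched like this (mock types throughout; `alloc` returns which path was taken purely for demonstration, which the real BufferBelt of course does not do):

```rust
type SyncPoint = u64;

/// Mock GPU context: everything up to `completed` has finished.
struct Gpu {
    completed: SyncPoint,
}

impl Gpu {
    /// Zero timeout = non-blocking poll: "is the GPU done with `sp`?"
    fn wait_for(&self, sp: SyncPoint, _timeout_ms: u32) -> bool {
        sp <= self.completed
    }
}

/// A chunk of GPU memory; `offset` tracks how much is already used.
struct Chunk {
    size: u64,
    offset: u64,
}

/// Sketch of the BufferBelt allocation strategy.
struct BufferBelt {
    active: Vec<Chunk>,
    recycled: Vec<(SyncPoint, Chunk)>,
}

impl BufferBelt {
    fn alloc(&mut self, gpu: &Gpu, size: u64) -> &'static str {
        // 1. Try to fit in an active buffer: fastest, no GPU sync needed.
        if let Some(chunk) = self.active.iter_mut().find(|c| c.offset + size <= c.size) {
            chunk.offset += size;
            return "active";
        }
        // 2. Poll recycled chunks with a zero timeout: never blocks.
        if let Some(i) = self
            .recycled
            .iter()
            .position(|(sp, c)| c.size >= size && gpu.wait_for(*sp, 0))
        {
            let (_, mut chunk) = self.recycled.remove(i);
            chunk.offset = size;
            self.active.push(chunk);
            return "recycled";
        }
        // 3. Nothing ready: allocate a fresh chunk from the GPU.
        self.active.push(Chunk { size: size.max(1 << 20), offset: size });
        "new"
    }
}

fn main() {
    let gpu = Gpu { completed: 5 };
    let mut belt = BufferBelt {
        active: vec![],
        recycled: vec![(3, Chunk { size: 1024, offset: 1024 })], // GPU is past sp 3
    };
    assert_eq!(belt.alloc(&gpu, 256), "recycled"); // poll succeeds, chunk reused
    assert_eq!(belt.alloc(&gpu, 256), "active");   // fits in the reused chunk
}
```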
Texture Cleanup in EGUI
When EGUI renders UI elements, it creates GPU textures for fonts and images. These textures have a lifecycle: they're created, used for rendering, then eventually become obsolete when fonts change or images are updated. The critical problem is that the GPU might still be reading from a texture when the CPU tries to delete it, which would cause crashes or visual corruption.
The texture deletion system in EGUI works as a two-phase cleanup:
- Mark for deletion: When textures become obsolete, they're added to `textures_to_delete` with their associated GPU sync point
- Safe deletion check: The `triage_deletions()` function periodically checks whether the GPU has finished using each texture by calling `context.wait_for(sp, 0)` with a zero timeout. This is a non-blocking poll:
  - If `wait_for` returns true, the GPU has finished and the texture can be safely destroyed
  - If `wait_for` returns false, the GPU is still using the texture
- Actual deletion: Only textures whose GPU work has completed are destroyed through `destroy_texture_view()` and `destroy_texture()` calls.
This approach ensures GPU safety without blocking the rendering thread, as the zero-timeout wait never stalls execution.
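A sketch of the two-phase cleanup, with a mock `Context` and `String` standing in for texture handles (the real code destroys GPU objects instead of collecting names):

```rust
type SyncPoint = u64;

/// Mock GPU context: everything up to `completed` has finished.
struct Context {
    completed: SyncPoint,
}

impl Context {
    fn wait_for(&self, sp: SyncPoint, _timeout_ms: u32) -> bool {
        sp <= self.completed
    }
}

/// Sketch of the two-phase texture cleanup: textures are queued with the
/// sync point of their last use, then destroyed once the GPU is past it.
struct TexturePool {
    textures_to_delete: Vec<(SyncPoint, String)>,
    destroyed: Vec<String>, // stands in for actual destroy_texture() calls
}

impl TexturePool {
    fn triage_deletions(&mut self, context: &Context) {
        let mut still_in_use = Vec::new();
        for (sp, tex) in self.textures_to_delete.drain(..) {
            if context.wait_for(sp, 0) {
                // Zero-timeout poll succeeded: GPU is done, safe to destroy.
                self.destroyed.push(tex);
            } else {
                still_in_use.push((sp, tex)); // try again next frame
            }
        }
        self.textures_to_delete = still_in_use;
    }
}

fn main() {
    let ctx = Context { completed: 10 };
    let mut pool = TexturePool {
        textures_to_delete: vec![(8, "font".into()), (12, "image".into())],
        destroyed: vec![],
    };
    pool.triage_deletions(&ctx);
    assert_eq!(pool.destroyed, vec!["font".to_string()]); // sp 8 <= 10
    assert_eq!(pool.textures_to_delete.len(), 1);         // sp 12 still pending
}
```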
Shader Hot Reload
When shaders are hot-reloaded during development, the GPU might still be executing graphics commands that reference the old shader code. Destroying shaders while they're in use would cause undefined behavior and crashes. The renderer needs to wait for all GPU work to complete before replacing shaders, ensuring no GPU commands reference outdated resources.
The shader hot reload system blocks indefinitely until the GPU finishes processing the current frame. This happens through gpu.wait_for(sync_point, !0) where !0 represents an infinite timeout. The sync point tracks when all previously submitted GPU commands have completed.
Once the wait succeeds, the renderer joins any background shader compilation tasks and proceeds to update the shader pipelines.
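Sketched minimally (illustrative names; the mock `Gpu` reports completion instantly instead of blocking):

```rust
type SyncPoint = u64;

/// Mock GPU context: everything up to `completed` has finished.
struct Gpu {
    completed: SyncPoint,
}

impl Gpu {
    fn wait_for(&self, sp: SyncPoint, _timeout_ms: u32) -> bool {
        sp <= self.completed
    }
}

/// Sketch of the hot-reload ordering: drain the GPU, then swap shaders.
fn hot_reload(gpu: &Gpu, sync_point: SyncPoint, rebuild: impl FnOnce()) {
    // !0 = infinite timeout: block until all submitted commands finish.
    let ok = gpu.wait_for(sync_point, !0);
    assert!(ok, "the boolean return gives no recovery path here");
    // Safe now: no in-flight GPU command references the old shaders.
    rebuild(); // join compilation tasks, update the pipelines
}

fn main() {
    let gpu = Gpu { completed: 7 };
    let mut rebuilt = false;
    hot_reload(&gpu, 7, || rebuilt = true);
    assert!(rebuilt); // pipelines only replaced after the wait succeeded
}
```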
P.S. This is a follow-up to #248, inspired by zed-industries/zed#43070