
Improve wait_for API to return Result #285

@Vanuan

Description

Motivation

When you submit work to the GPU (like rendering a frame), it keeps running asynchronously after your CPU code returns. If the CPU immediately deletes textures or reuses buffers, you can corrupt memory or crash. The wait_for API provides the necessary synchronization by blocking until the GPU reaches a specific synchronization point, giving the CPU a safe way to know that a particular batch of work, identified by that sync point, has finished before destroying or reusing the resources it touched.

Currently, it returns a plain boolean that loses critical error information such as a timeout, device loss, or other backend-specific failures. Every failure looks the same to the caller: a false value.

Details

The CommandDevice trait defines the contract for GPU synchronization. It includes:

submit() which returns a SyncPoint representing GPU work completion
wait_for() which blocks until that work finishes or times out
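
A simplified sketch of that contract, based only on the description above (the real trait carries more associated types and methods):

```rust
/// Simplified sketch of the synchronization contract described above;
/// the actual trait has more associated types and methods.
pub trait CommandDevice {
    type CommandEncoder;
    /// Opaque token identifying a submitted batch of GPU work.
    type SyncPoint;

    /// Submit recorded work and get back a sync point for it.
    fn submit(&self, encoder: &mut Self::CommandEncoder) -> Self::SyncPoint;

    /// Block until the sync point is reached or `timeout_ms` elapses.
    /// Today this returns a bare `bool`: `true` on completion,
    /// `false` for a timeout *or any other failure*.
    fn wait_for(&self, sp: &Self::SyncPoint, timeout_ms: u32) -> bool;
}
```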

Each backend implements this differently:

Vulkan: Uses timeline semaphores with nanosecond precision
GLES/WebGL: Uses GL sync objects with millisecond precision
Metal: Polls command buffer status in a loop

The current boolean return forces all callers to treat any failure identically, preventing proper error handling and recovery strategies.
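
One hypothetical shape for the improved contract; the error type and its variant names below are illustrative only, not a committed design:

```rust
/// Hypothetical error type for a Result-returning wait_for;
/// the variants and names are placeholders for discussion.
#[derive(Debug)]
pub enum WaitError {
    /// The timeout elapsed; the device is still functional and a retry is valid.
    Timeout,
    /// The device was lost; the caller needs a full recovery path.
    DeviceLost,
    /// Any other backend-specific failure, preserved for diagnostics.
    Other(String),
}

// The trait method could then become:
// fn wait_for(&self, sp: &Self::SyncPoint, timeout_ms: u32) -> Result<(), WaitError>;
```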

Backend-specific implementations


Vulkan Backend Implementation

The Vulkan backend implements wait_for using timeline semaphores. It locks the queue to access the timeline semaphore, creates a wait info structure with the sync point's progress value, and calls the Vulkan driver. The implementation maps the timeout from milliseconds to nanoseconds and converts the Vulkan Result to a boolean using .is_ok(), which discards the specific error type.
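
A rough sketch of that flow, assuming a recent version of the ash bindings; queue locking and the surrounding state are elided:

```rust
use ash::vk;

// Rough sketch of the current Vulkan path, assuming recent ash bindings;
// queue locking and surrounding state are elided.
fn wait_for(device: &ash::Device, timeline: vk::Semaphore, progress: u64, timeout_ms: u32) -> bool {
    let semaphores = [timeline];
    let values = [progress];
    let wait_info = vk::SemaphoreWaitInfo::default()
        .semaphores(&semaphores)
        .values(&values);
    // Milliseconds -> nanoseconds, as described above.
    let timeout_ns = timeout_ms as u64 * 1_000_000;
    // `wait_semaphores` returns a VkResult; `.is_ok()` throws away whether the
    // failure was TIMEOUT, ERROR_DEVICE_LOST, or something else entirely.
    unsafe { device.wait_semaphores(&wait_info, timeout_ns) }.is_ok()
}
```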

The key issue is that different error conditions require different handling strategies:

TIMEOUT: The operation timed out but the device is still functional
DEVICE_LOST: The GPU was lost and needs recovery
OUT_OF_DATE: The surface is out of date (common during resize)
...

By converting all these to false, the API forces callers to treat every failure identically, limiting robust error recovery.
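
Tying this to the hypothetical WaitError sketched above, the backend could preserve those distinctions instead of flattening them (again assuming the ash vk::Result constants):

```rust
use ash::vk;

// Illustrative mapping from Vulkan results to the hypothetical WaitError
// sketched earlier; OUT_OF_DATE and other cases would land in `Other` here.
fn map_vk_result(res: vk::Result) -> Result<(), WaitError> {
    match res {
        vk::Result::SUCCESS => Ok(()),
        vk::Result::TIMEOUT => Err(WaitError::Timeout),
        vk::Result::ERROR_DEVICE_LOST => Err(WaitError::DeviceLost),
        other => Err(WaitError::Other(format!("{other:?}"))),
    }
}
```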

GLES/WebGL Backend Implementation

The GLES implementation uses OpenGL's sync objects (gl.client_wait_sync) to block until GPU work completes. It converts the millisecond timeout to nanoseconds, with special handling for WebGL's 1-second timeout limit. The function returns true only when the GPU signals completion (ALREADY_SIGNALED or CONDITION_SATISFIED) and false for timeouts or any other error conditions.

Key behavior: A zero timeout enables non-blocking polling (useful for checking if resources are available), while !0 (max u32) creates an indefinite block (used when you must wait before proceeding). The current boolean return collapses all error types into a simple success/failure, which is why there's interest in migrating to a Result type for better error handling.
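
A sketch of how the status from gl.client_wait_sync collapses into a boolean today, assuming the glow crate's GL constants; the call itself and the WebGL timeout clamping are elided:

```rust
// Sketch of interpreting the status returned by gl.client_wait_sync,
// assuming the glow crate's GL constants.
fn sync_status_to_bool(status: u32) -> bool {
    match status {
        // The GPU had already passed, or just reached, the sync point.
        glow::ALREADY_SIGNALED | glow::CONDITION_SATISFIED => true,
        // TIMEOUT_EXPIRED, WAIT_FAILED, and anything unexpected all collapse
        // into `false`, which is exactly the information loss this issue is about.
        _ => false,
    }
}
```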

Metal Backend Implementation

The Metal implementation uses a simple polling loop because Metal's command buffers don't support efficient blocking waits like Vulkan's timeline semaphores. It records the start time, then continuously checks the command buffer status. When the status is "Completed", it returns true. The key limitation is that error states are silently ignored - if the command buffer fails, the loop continues until timeout, then returns false just like a timeout condition.

The polling approach with 1ms sleeps is inefficient but necessary given Metal's API constraints. This design loses valuable error information that could help applications distinguish between timeouts, device loss, or actual command buffer errors.
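
A minimal sketch of that loop, assuming the metal crate's command buffer API:

```rust
use std::time::{Duration, Instant};

// Minimal sketch of the polling wait described above, assuming the metal
// crate; as in the current implementation, error statuses are not distinguished.
fn wait_for(cmd_buf: &metal::CommandBufferRef, timeout_ms: u32) -> bool {
    let start = Instant::now();
    loop {
        match cmd_buf.status() {
            metal::MTLCommandBufferStatus::Completed => return true,
            // An `Error` status falls through here, so a failed command buffer
            // ends up looking identical to a plain timeout.
            _ => {}
        }
        if start.elapsed() >= Duration::from_millis(timeout_ms as u64) {
            return false;
        }
        std::thread::sleep(Duration::from_millis(1));
    }
}
```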

Existing usage of wait_for


FramePacer

The FramePacer uses wait_for to enforce a strict "one frame at a time" execution model, guaranteeing that the previous frame's GPU work completes before the next frame begins and before any temporary resources are recycled.

The FramePacer maintains a sync point from the previous frame and blocks indefinitely (!0 timeout) until GPU work completes. This blocking wait happens at three critical points:

Frame start - wait_for_previous_frame() blocks until the previous frame finishes
Frame end - Called automatically after submitting the current frame
Cleanup - Ensures all GPU work is done before destroying the FramePacer

After the wait succeeds, the code safely destroys buffers and acceleration structures from the previous frame, knowing the GPU no longer accesses them. The current implementation assumes the wait always succeeds, which is why it returns void rather than handling errors - a design choice that needs reconsideration for robust error handling.
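
A condensed sketch of that pattern, written against the simplified trait above; the closure is a placeholder for the buffer and acceleration-structure destruction described in the text:

```rust
// Condensed sketch of the FramePacer's blocking wait, using the simplified
// trait above; resource destruction is represented by a placeholder closure.
fn wait_for_previous_frame<D: CommandDevice>(
    gpu: &D,
    prev: &mut Option<D::SyncPoint>,
    recycle_previous_frame_resources: impl FnOnce(),
) {
    if let Some(sp) = prev.take() {
        // Block indefinitely; the current API gives the caller no way to
        // observe a failure here, so success is simply assumed.
        let _finished = gpu.wait_for(&sp, !0);
        // Only after this point is it safe to destroy last frame's temporaries.
        recycle_previous_frame_resources();
    }
}
```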

BufferBelt

GPU operations are asynchronous - when you submit work to the GPU, it continues executing long after your CPU code returns. This creates a critical problem: how do you safely reuse GPU resources like buffers without corrupting data that's still being used? The BufferBelt solves this by tracking when the GPU finishes with each buffer chunk through sync points, enabling efficient resource recycling.

The BufferBelt maintains two pools: active buffers currently being filled, and buffers waiting for GPU completion. When allocating space:

  1. First it tries to fit your request in an active buffer - this is fastest as no GPU synchronization is needed
  2. If that fails, it searches the recycled pool for buffers the GPU has finished with
  3. The key check is gpu.wait_for(sp, 0) - a non-blocking poll that asks "is the GPU done?"
  4. If no recycled buffers are ready, it allocates a brand-new chunk from the GPU

The zero timeout is crucial - it means "check and return immediately", preventing the CPU from stalling while waiting for the GPU. This design enables high-throughput applications to continuously submit work without blocking.
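
A sketch of the recycling check, again against the simplified trait; the real BufferBelt also tracks chunk sizes and offsets:

```rust
// Sketch of the recycling poll described above, using the simplified trait;
// `B` stands in for a buffer chunk, and size/offset bookkeeping is omitted.
fn try_reclaim<D: CommandDevice, B>(gpu: &D, pending: &mut Vec<(B, D::SyncPoint)>) -> Option<B> {
    // Non-blocking poll: a zero timeout means "answer immediately".
    if let Some(idx) = pending.iter().position(|(_, sp)| gpu.wait_for(sp, 0)) {
        return Some(pending.swap_remove(idx).0);
    }
    None // the caller falls through to allocating a fresh chunk
}
```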

Texture Cleanup in EGUI

When EGUI renders UI elements, it creates GPU textures for fonts and images. These textures have a lifecycle: they're created, used for rendering, then eventually become obsolete when fonts change or images are updated. The critical problem is that the GPU might still be reading from a texture when the CPU tries to delete it, which would cause crashes or visual corruption.

The texture deletion system in EGUI works as a two-phase cleanup:

  1. Mark for deletion: When textures become obsolete, they're added to textures_to_delete with their associated GPU sync point

  2. Safe deletion check: The triage_deletions() function periodically checks if the GPU has finished using each texture by calling context.wait_for(sp, 0) with a zero timeout. This is a non-blocking poll:

    If wait_for returns true, the GPU has finished and the texture can be safely destroyed
    If wait_for returns false, the GPU is still using the texture and it must be kept for a later pass

  3. Actual deletion: Only textures whose GPU work has completed are destroyed through destroy_texture_view() and destroy_texture() calls.

This approach ensures GPU safety without blocking the rendering thread, as the zero-timeout wait never stalls execution.
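
A sketch of that two-phase cleanup against the simplified trait; T and the destroy closure stand in for the real texture handle type and the destroy_texture_view/destroy_texture calls:

```rust
// Sketch of the two-phase texture cleanup described above, using the
// simplified trait; `T` and `destroy` stand in for the real texture types
// and destruction calls.
fn triage_deletions<D: CommandDevice, T>(
    gpu: &D,
    textures_to_delete: &mut Vec<(T, D::SyncPoint)>,
    destroy: impl Fn(T),
) {
    let mut still_in_use = Vec::new();
    for (texture, sp) in textures_to_delete.drain(..) {
        if gpu.wait_for(&sp, 0) {
            destroy(texture); // GPU has finished with it
        } else {
            still_in_use.push((texture, sp)); // check again next frame
        }
    }
    *textures_to_delete = still_in_use;
}
```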

Shader Hot Reload

When shaders are hot-reloaded during development, the GPU might still be executing graphics commands that reference the old shader code. Destroying shaders while they're in use would cause undefined behavior and crashes. The renderer needs to wait for all GPU work to complete before replacing shaders, ensuring no GPU commands reference outdated resources.

The shader hot reload system blocks indefinitely until the GPU finishes processing the current frame. This happens through gpu.wait_for(sync_point, !0) where !0 represents an infinite timeout. The sync point tracks when all previously submitted GPU commands have completed.

Once the wait succeeds, the renderer joins any background shader compilation tasks and proceeds to update the shader pipelines.
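
A short sketch of that barrier against the simplified trait; joining the compilation tasks and rebuilding the pipelines are elided:

```rust
// Short sketch of the hot-reload barrier described above, using the
// simplified trait; joining compile tasks and rebuilding pipelines is elided.
fn before_shader_swap<D: CommandDevice>(gpu: &D, last_submission: &D::SyncPoint) {
    // Block until all previously submitted GPU commands complete, so nothing
    // in flight still references the pipelines that are about to be replaced.
    let _ = gpu.wait_for(last_submission, !0);
    // ...join background shader compilation tasks and update pipelines...
}
```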


P.S. This is a follow-up to #248, inspired by zed-industries/zed#43070
