allow >16 dynamic buffer slots #55

@koubaa

Description

The 4-slot limit is an artificial restriction in the Slang layer, not a hardware constraint. Let me break down the layers:

Hardware reality

Push constants/root constants have a fixed hardware size:

| Platform | Guaranteed minimum | Typical desktop |
|---|---|---|
| Vulkan | 128 bytes | 256 bytes |
| DX12 | 64 DWORDs (256 bytes) | 256 bytes |
| Metal | ~4 KB (function constants) | 4 KB |

At 4 bytes per u32 index, 256 bytes gives you 64 slots. Goldy's backends already allocate 16 (MAX_PUSH_CONSTANT_INDICES), using only 64 bytes. The hardware can comfortably handle this.

Where the artificial limit comes from

The GoldyDynamicSlots struct was written for simple ping-pong demos (swap 2 buffers each frame), not for real compute pipelines:

public struct GoldyDynamicSlots {
    uint slots[4];  // up to 4 dynamic buffer indices -- the artificial cap
};

The DX12 path makes it even more restrictive by packing into uint4 and using a switch statement that only handles 4 cases. This is just underdesigned.

Why a fixed array at all?

The deeper question. Push constants are the fastest way to communicate per-dispatch data to a shader — they're loaded directly into shader registers with zero indirection, no memory fetch, no cache miss. But they're small and fixed-size. The alternatives form a trade-off spectrum:

  1. Push constants (current): Zero latency, no indirection, hardware size cap (~64 indices practical). Optimal for the common case.

  2. Indirection buffer: Store an array of resource indices in a GPU buffer, pass a single push constant pointing to the offset. Unlimited slots, but adds one memory read before you can access any resource. This is how production bindless engines handle large material/draw-data tables.

  3. Buffer Device Address: Pass raw 64-bit GPU pointers via push constants. 8 bytes each → half as many as u32 indices in the same space. More powerful but more limited in count.

The right design for Goldy

Given Goldy's philosophy of "abstract meaning, expose cost," the current design is wrong in two ways:

First, the 4-slot shader-side limit should simply be raised to match the 16-slot backend limit (MAX_PUSH_CONSTANT_INDICES). That mismatch is just a bug.

Second, even 16 shouldn't be a hard ceiling. For use cases beyond 16, the established pattern is to pack indices into a buffer:

// Shader: one push constant slot points to a buffer of resource indices
uint tableSlot = goldy_dyn_scattered<uint>(0);
RWStructuredBuffer<uint> indexTable = goldy_scattered<uint>(tableSlot);
// Then read as many indices as you need from that buffer
uint actualResourceIdx = indexTable[bindingNumber];
RWStructuredBuffer<MyType> resource = goldy_scattered<MyType>(actualResourceIdx);

This is exactly how production engines do it — push constants for the fast path, buffer indirection for unbounded bindings. Goldy could formalize this as a goldy_indirect_binding pattern.

The philosophical takeaway: there's no reason for a general-purpose GPU library to impose an arbitrary binding count limit below what the hardware supports. The limit should be the hardware's limit, clearly documented, with an escape hatch (buffer indirection) for anything beyond that.
