Study Next Gen Graphics API

Yaochuang edited this page Apr 11, 2021 · 14 revisions
  • Metal explicitly divides GPU work into three categories: Render, Compute, and Copy. This is a better fit for modern GPU architectures.
  • How about querying render and compute state? Are there explicit APIs for that?
  • How about geometry shaders in Metal? There is no geometry stage; Metal suggests doing that work in a compute shader.
  • Metal splits rendering into two categories, Vertex and Pixel, much like modern GPU hardware does: vertex-space work and screen-space work.
  • Can Metal make compute easy to use alongside rendering? That has to be discussed case by case.

Resources

  • MTLBuffer is used to store unformatted data accessible to the GPU.
  • Metal creates a buffer with a storage option: Shared, Managed, or Private. There is NO buffer usage flag in Metal. How does Metal exploit a buffer's performance characteristics without a usage hint? Does this mean an MTLBuffer supports all access patterns, so you can read, write, and update it at will? I think those hints can be found in the Metal Shading Language.
  • There is NO so-called "resource view" in Metal; Metal binds buffers directly to GPU slots. This is simpler than in D3D and Vulkan.
    • Why does Microsoft not choose this way? Does it have a performance overhead?
    • How to understand an MTLBuffer address? Is there any way to map it directly to a "VGA"? Or is there a "VGA" in Metal at all?
    • How about constant buffers in Metal? There is no explicit API for them. Does this mean a performance loss? Immutable vs. mutable is a performance concern.
    • Constant buffers and vertex buffers share the same binding API.
  • The buffer-creation APIs can allocate new storage, reuse existing storage, or copy data into new storage. This is more flexible than D3D11 and OpenGL.
    • A texture can also be created that shares a buffer's data.
  • Buffer updates are easy in Metal; there is no explicit map and unmap.
    • Suppose there is a big MTLBuffer and one part of it is used in a draw call. Can we update the other part from another CPU thread at the same time?
  • How about "structured buffers" and "byte address buffers" in Metal? These types are explicitly supplied in D3D.
  • How about the "unordered access view" used in D3D11 compute shaders? Does that mean every MTLBuffer supports out-of-order access?
  • Both MTLHeap and MTLDevice have a newBufferWithLength() interface.
    • Creating an MTLHeap from a device requires a descriptor; there is no such requirement for an MTLBuffer.
    • What is the difference between them?
      • An MTLHeap targets only a certain GPU and can be cached by the CPU, so it seems accessible by both GPU and CPU.
      • An MTLHeap is used to allocate buffers and textures, so it is focused on block memory management.
    • An MTLHeap is much like a D3D11 buffer: accessible by GPU and/or CPU depending on its memory usage.
    • An MTLHeap can improve memory efficiency.
    • MTLResourceOptions in Metal is like memory Usage in D3D11.
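The concurrent-update question above is usually answered with a frames-in-flight scheme. The following is a hypothetical helper, not a Metal API: one large shared buffer is split into three fixed regions, and while the GPU reads the region for frame N, the CPU fills the region for frame N+1, so the two sides never touch the same bytes and no map/unmap is needed.

```cpp
#include <cstddef>
#include <cstdint>

// Classic triple-buffering offsets for sub-allocating one big shared buffer.
// kFramesInFlight and regionOffset are illustrative names, not Metal symbols.
constexpr std::size_t kFramesInFlight = 3;

// Byte offset of the region that a given frame may write.
std::size_t regionOffset(std::uint64_t frameIndex, std::size_t regionSize) {
    return static_cast<std::size_t>(frameIndex % kFramesInFlight) * regionSize;
}
```

With a 64 KiB region size, frame 7 writes at offset 65536 while the GPU still reads frame 6's region at offset 0; a command-buffer completion callback (e.g. addCompletedHandler) tells the CPU when a region is safe to reuse.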

MTLTexture

  • MTLTexture stores formatted image data accessible to the GPU. The following is a good comment from the Metal code:

    Each image in a texture is a 1D, 2D, 2DMultisample, or 3D image. The texture contains one or more images arranged in a mipmap stack. If there are multiple mipmap stacks, each one is referred to as a slice of the texture. 1D, 2D, 2DMultisample, and 3D textures have a single slice. In 1DArray and 2DArray textures, every slice is an array element. A Cube texture always has 6 slices, one for each face. In a CubeArray texture, each set of six slices is one element in the array.

    • Create with an MTLTextureDescriptor; creation allocates new storage.
    • An MTLTexture can be backed by an MTLBuffer object; we can create a texture view over a buffer.
    • A new MTLTexture object can be created that shares storage but uses a different pixel format.
    • Buffer sharing across address spaces is supported. Is that possible in D3D11 or D3D12?
    • Framebuffer-only textures (obtained from CAMetalDrawables).
      • This only exists in Metal; there is NO such concept in D3D.
      • It can only be used as a render target.
    • MTLTextureUsage determines how the resource is used: read in a shader, written in a shader, or used as a render target.
    • For CPU reads and writes of an MTLTexture, are map and unmap operations needed?
    • In Metal, a blit command does texture copies; in D3D11 it is a single API call.
      • supported by SCG in NV hardware
    • In Metal, creating a texture from an image on disk is done with MTKTextureLoader.
      • So is MTKTextureLoader a required part of the Metal framework?
      • How about using another external library for texture handling?
  • 2DMultisample textures have counterparts in D3D.
    • An MSAA texture must be resolved before copying it into a non-MSAA texture. How does Metal resolve MSAA? By API? By custom code?
    • IOSurface compatible
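The slice rules quoted from the Metal headers above can be restated as a small function. This is a sketch, not a Metal API; the enum names are illustrative:

```cpp
#include <cstdint>

// Slice count per texture type, following the quoted header comment:
// 1D/2D/2DMultisample/3D have a single slice; array textures have one slice
// per element; a cube always has 6 slices; a cube array has 6 per element.
enum class TextureType { T1D, T1DArray, T2D, T2DArray, T2DMultisample,
                         TCube, TCubeArray, T3D };

std::uint32_t sliceCount(TextureType t, std::uint32_t arrayLength) {
    switch (t) {
        case TextureType::T1DArray:
        case TextureType::T2DArray:   return arrayLength;
        case TextureType::TCube:      return 6;
        case TextureType::TCubeArray: return 6 * arrayLength;
        default:                      return 1; // 1D, 2D, 2DMultisample, 3D
    }
}
```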

Metal Shading Language

  • The same type definitions can be shared between Metal CPU code and the Metal Shading Language (GPU code).
  • How is MSL compiled and linked?
    • Metal provides no explicit compile API for this; instead, it uses external tools to compile shaders and archive them as a shader library.
    • These tools are integrated into Xcode. After archiving, create a default library from the device with the newDefaultLibrary API.
    • Then pick shader functions from within it.
    • You can also create a shader library from source code directly; shader compilation is implicit in that case.
  • Compared with OpenGL and D3D11, Metal gives less control here. Is that a good thing?
  • MTLFunction
    • A handle to compiled intermediate Metal shading code.
    • It contains information such as the intermediate byte code, function type, and vertex attributes.
    • The message newArgumentEncoderWithBufferIndex:
      • Used to create an argument buffer encoder that encodes arguments into an MTLBuffer at a certain binding point; the layout of the encoded buffer is identical to the one in the Metal shading code.
    • MTLFunction is something like a bridge between Metal shading code and Objective-C.
    • There is NO such mechanism in D3D11.
    • MTLArgumentEncoder: this type may be tightly coupled with MTLFunction.
      • It encodes resource bindings and stores them in an argument buffer (an MTLBuffer).
      • Supported bindings: buffers, textures, samplers, render pipeline states, constant data, ICBs.
  • MTLFunctionConstantValues
    • Controls a function in a dynamic way: the CPU supplies constant values that specialize the GPU code.

Indirect Command Buffer

  • MTLIndirectCommandBuffer
    • It is a GPU device resource, distinct from both MTLBuffer and MTLTexture.
    • It stores GPU commands encoded by the CPU or the GPU.
      • They are indirect commands, represented by the Metal class MTLIndirectRenderCommand.
      • These commands can be encoded from a compute shader, so the GPU can issue draw calls by itself this way.
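The data a compute shader writes for an indirect draw has a fixed layout. The struct below mirrors Metal's MTLDrawPrimitivesIndirectArguments (four 32-bit fields, per the Metal headers); D3D12's D3D12_DRAW_ARGUMENTS has the same shape. The name here is a plain C++ restatement, not the actual framework type:

```cpp
#include <cstdint>
#include <cstddef>

// Per-draw arguments a compute kernel can write into an indirect buffer,
// letting the GPU issue its own draw calls without a CPU round trip.
struct DrawPrimitivesIndirectArguments {
    std::uint32_t vertexCount;
    std::uint32_t instanceCount;
    std::uint32_t vertexStart;
    std::uint32_t baseInstance;
};
```

A GPU-culling kernel would, for example, write vertexCount = 0 to skip an object and a nonzero count to draw it; the CPU never sees the final draw parameters.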

New Resource Binding through buffer

  • Argument Buffer

    • An MTLBuffer is used to reduce the CPU overhead of binding resources through separate API calls.
    • It stores resource bindings of buffers, textures, samplers, and inlined constant data, written by an MTLArgumentEncoder.
  • Benefits of this "resource binding":

    The main benefit of using argument buffers is to reduce the overhead incurred by assigning the same multiple resources to individual indices of the same function argument table. This is particularly beneficial for resources that do not change from frame to frame, because they can be assigned to an argument buffer once and reused many times.

    Finally, argument buffers allow resources to be indexed dynamically at function execution time by greatly increasing the limit on the number of resources that can be placed inside them.


GPU Work Submission

  • Command Queue
    • It is a little like the device context in D3D11.
    • It is thread-safe, a good fit for multi-threaded environments.
    • A command buffer can be filled by different encoders serially, so may a command buffer contain render commands, compute commands, and blit commands?
  • Command Buffer
    • It packs "GPU commands" encoded explicitly by a "command encoder".
    • Each thread can have its own command buffer and operate on it.
    • It has an addCompletedHandler interface to register a callback for when the command buffer completes execution.
    • Can a command buffer be reused after it is committed? It seems not.
  • Command Encoder
    • It is used to encode GPU commands into the command buffer.
    • A render command encoder is created from a command buffer with a render pass descriptor.
    • Compute and blit command encoders have no such constraint, because they are independent of rendering.
    • A render command encoder maps 1:1 to "a render pass".
    • Render command encoders can only be created serially, except for the parallel render command encoder. That is, before ending one encoder you cannot create a new one.
      • create encoder
      • encode GPU commands
      • endEncoding
    • A render command encoder contains an implicit "clear color" command.
  • MTLRenderPassDescriptor
    • It is used to create a "render command encoder" from a command buffer via renderCommandEncoderWithDescriptor.
    • It describes the output of a render pass. There is NO counterpart in D3D11.
    • What role does it play in the rendering process?
      • It contains a color attachment array, a depth attachment, and a stencil attachment.
      • Sample-position operations for multisampling: it gives a way to do custom MSAA. Does D3D provide APIs for these things?
      • Layered rendering, tile shading.
      • The visibility result buffer. How is this buffer written and queried?
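The create → encode → endEncoding rule above can be modeled as a tiny state machine. This is a hypothetical model class, not a Metal type: the command buffer hands out one active encoder at a time, and creating a second encoder before ending the first is an error (parallel render command encoders being the stated exception):

```cpp
#include <stdexcept>

// Models the serial-encoder rule: endEncoding() must be called before the
// next makeEncoder(). Encoder IDs are just sequence numbers for illustration.
class CommandBufferModel {
public:
    int makeEncoder() {
        if (encoderActive_)
            throw std::logic_error("previous encoder not ended");
        encoderActive_ = true;
        return ++encoderId_;
    }
    void endEncoding() { encoderActive_ = false; }
private:
    bool encoderActive_ = false;
    int  encoderId_ = 0;
};
```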

Render Pipeline State (PSO) in Metal

  • MTLRenderPipelineDescriptor is used to create an MTLRenderPipelineState object from the device. It configures the render pipeline state used in a render pass.

    • Call newRenderPipelineStateWithDescriptor to create the MTLRenderPipelineState .
    • Vertex function, fragment function. What about the tessellation and geometry stages?
      • D3D11 compiles source code into byte code, then generates shader objects from the byte code.
      • D3D11 binds different shader objects to different stages at run time.
    • Vertex attribute layout.
    • Rasterizer sample count, for MSAA.
    • Color attachment description array, depth format, stencil format.
      • D3D11 uses resource views: RTV, DSV, SRV / CBV / UAV.
    • Tessellator state settings.
    • MTLPipelineBufferDescriptorArray
      • An array of pipeline buffer descriptor objects that specifies the mutability of the buffers used in the pipeline.
      • Maybe it is a performance hint.
  • MTLDepthStencilDescriptor && MTLDepthStencilState

    • Create an MTLDepthStencilState object from the device with newDepthStencilStateWithDescriptor .
      • depthWriteEnabled
      • depthCompareFunction
    • Why is MTLDepthStencilState not included in MTLRenderPipelineState ?
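Because a pipeline state object is immutable and expensive to build, applications commonly hash the descriptor and cache the compiled state, creating each unique combination once. This is a sketch of that pattern, not a Metal or D3D API; the descriptor fields are a deliberately tiny subset:

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>

// Minimal pipeline descriptor: shader entry points plus MSAA sample count.
struct PipelineDesc {
    std::string vertexFn, fragmentFn;
    std::uint32_t sampleCount = 1;
    bool operator==(const PipelineDesc& o) const {
        return vertexFn == o.vertexFn && fragmentFn == o.fragmentFn &&
               sampleCount == o.sampleCount;
    }
};

struct PipelineDescHash {
    std::size_t operator()(const PipelineDesc& d) const {
        return std::hash<std::string>{}(d.vertexFn) ^
               (std::hash<std::string>{}(d.fragmentFn) << 1) ^
               (std::hash<std::uint32_t>{}(d.sampleCount) << 2);
    }
};

// An int stands in for the real compiled pipeline state object.
class PipelineCache {
public:
    int getOrCreate(const PipelineDesc& d) {
        auto it = cache_.find(d);
        if (it != cache_.end()) return it->second; // cache hit: no recompile
        int state = nextId_++;                     // stands in for the PSO build
        cache_.emplace(d, state);
        return state;
    }
    std::size_t size() const { return cache_.size(); }
private:
    std::unordered_map<PipelineDesc, int, PipelineDescHash> cache_;
    int nextId_ = 0;
};
```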

Present Rendering Result in Metal

  • How to present rendering results in Metal? Get a drawable from a CAMetalLayer, render into its texture, then call presentDrawable: on the command buffer before committing it.

D3D12

Work submission

Command Queue

  • ID3D12CommandQueue "provides methods for submitting command lists, synchronizing command list execution, instrumenting the command queue, and updating resource tile mappings" (Microsoft docs).
  • Command queues of all types (3D, compute, and copy) share the same interface and are all command-list based.
    • The command list type determines the command queue type.
  • A command queue is an abstraction of a GPU engine and affects how the engine is scheduled.
    • So the synchronization interfaces such as Wait() and Signal() are all handled by the command queue.
  • Command queue synchronization is done through fences (ID3D12Fence).
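Queue synchronization can be pictured with a simple fence model. This is not the D3D12 API itself, only its semantics: a queue's Signal(fence, N) sets the fence to N when the GPU reaches that point, and a Wait(fence, N) on another queue (or the CPU) is released once the completed value reaches N; fence values only grow:

```cpp
#include <algorithm>
#include <cstdint>

// Semantics of an ID3D12Fence-style monotonic fence.
struct FenceModel {
    std::uint64_t completed = 0;
    void signal(std::uint64_t value) {            // GPU reached this point
        completed = std::max(completed, value);
    }
    bool waitSatisfied(std::uint64_t value) const { // would a Wait release?
        return completed >= value;
    }
};
```

For example, a copy queue signals 1 after an upload, and the 3D queue waits on 1 before drawing; the wait blocks until the fence's completed value reaches 1.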

Command List

  • ID3D12CommandList is created by CreateCommandList().
    • It is the GP-Buffer on NV GPUs, containing an ordered set of commands that the GPU executes.
    • So its backing storage is a kind of GPU resource; the ID3D12CommandAllocator interface serves this purpose.
    • ID3D12CommandAllocator

GPU Resources & Binding Model


  • Resources in D3D12 are created in 3 categories: committed, placed, and reserved. GPU resource creation in D3D12 is more complicated than in Metal; resource states are one example.
    • A committed resource has both a VA and a PA. It is the common kind, dating back to DX9.
    • A placed resource points into a certain region of an existing heap.
    • A reserved resource has its own unique GPU virtual address range. It is the hardest to understand.

Resource descriptor & descriptor Heap

  • A descriptor is a small block of data that fully describes an object to the GPU, in a GPU-specific opaque format.
    • SRVs, UAVs, CBVs, and samplers are all descriptors.
    • It resembles the resource view objects of older APIs such as DX11.
  • A descriptor handle is the unique address of a descriptor: similar to a pointer, but opaque, as its implementation is hardware-specific.
    • Handles are unique across descriptor heaps. There are both a GPU handle and a CPU handle, used for API setting and root parameter setting.
  • Null descriptors and Default descriptors, See descriptors overview
  • A descriptor heap encompasses the bulk of the memory allocation required for storing the descriptors of the object types that shaders reference, for as large a window of rendering as possible.
  • A descriptor heap is stored in CPU-writable memory, typically GPU memory or system memory.
  • CBV, UAV, and SRV entries can live in the same descriptor heap, but sampler entries cannot share a heap with them. That is because most GPUs handle texture sampling and other memory accesses differently in hardware. These descriptor handles are all shader-visible resources.
  • D3D12 requires the use of descriptor heaps; there is no option to put descriptors anywhere else in memory.
  • Descriptor heaps can only be edited immediately by the CPU; there is no option to edit a descriptor heap from the GPU.
  • On some hardware, switching descriptor heaps is an expensive operation: it requires a GPU stall to flush all work that depends on the currently bound descriptor heap.
  • Descriptors are of varying size; the size per heap type is queried with GetDescriptorHandleIncrementSize().
  • Some descriptors are bound to the pipeline by recording them directly into the command list through APIs, not through the root signature. These are always shader-invisible. They are:
    • Render Target Views (RTVs)
    • Depth Stencil Views (DSVs)
    • Stream Output Views (SOVs)
    • Index Buffer Views (IBVs)
    • Vertex Buffer Views (VBVs)
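The arithmetic behind descriptor handles is simple even though the values are opaque. In real D3D12 code the start handle comes from ID3D12DescriptorHeap::GetCPUDescriptorHandleForHeapStart() and the increment from ID3D12Device::GetDescriptorHandleIncrementSize(); the sketch below only restates the offset math with stand-in types, since the increment varies per hardware and heap type:

```cpp
#include <cstdint>

// Stand-in for D3D12_CPU_DESCRIPTOR_HANDLE: an opaque address-like value.
struct CpuDescriptorHandle { std::uint64_t ptr; };

// Handle of the index-th descriptor in a heap: start + index * increment.
CpuDescriptorHandle descriptorAt(CpuDescriptorHandle heapStart,
                                 std::uint32_t index,
                                 std::uint32_t incrementSize) {
    return { heapStart.ptr + std::uint64_t(index) * incrementSize };
}
```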

Root Signature

Resource State Barriers & Sync up



Blogs for Other Graphics APIs

