Commit 8d0a016

Merge pull request #26 from LandonTClipp/LandonTClipp-patch-1
Revise IOMMU and PCIe device interaction details
2 parents 857452b + f2aaf70 commit 8d0a016

1 file changed: +81 -81 lines changed

docs/blog/posts/2026-02-15-pcie-mmio.md

Lines changed: 81 additions & 81 deletions
@@ -92,33 +92,33 @@ Allowing PCIe devices to write arbitrary data to arbitrary locations in memory i
 The Input-Output Memory Management Unit (IOMMU) is a component of the root complex that maintains a number of page tables specifically designed for IO devices. It's used when a device wants to read or write data into the host memory and needs to translate its IO Virtual Address (IOVA) to a Host Physical Address (HPA). The IOMMU enforces translations per domain (a context). Devices are assigned to a domain, and the domain points to the page tables the IOMMU walks. An abstracted diagram of what this is doing is shown below:

 ```title="IOMMU" linenums="1"
-                        IOMMU
-┌───────────────────────────────────────────────────┐
-│                                                   │
-│        IOVA                          HPA          │
-│ ┌────────────────┐            ┌────────────────┐  │
-│ │ 0x00-0xF0      ├───────────►│ 0x5000         │  │
-│ ├────────────────┤            ├────────────────┤  │
-│ │                │            │                │  │
-│ │ 0xF1-0xFF      │            ├────────────────┤  │
-│ │                ├──────┐     │                │  │
-│ │                │      │     ├────────────────┤  │
-│ ├────────────────┤      │     │                │  │
-│ │                │      │     ├────────────────┤  │
-│ │                │      └────►│ 0xF000         │  │
-│ │                │            ├────────────────┤  │
-│ │ 0x100-0xFFF    │            │                │  │
-│ │                ├──────┐     ├────────────────┤  │
-│ │                │      │     │                │  │
-│ │                │      │     ├────────────────┤  │
-│ ├────────────────┤      └────►│ 0xBF00         │  │
-│ │                │            ├────────────────┤  │
-│ │                │      ┌────►│ 0xDEADC0DE     │  │
-│ │ 0x1000-0x10F0  │      │     ├────────────────┤  │
-│ │                ├──────┘     │                │  │
-│ └────────────────┘            └────────────────┘  │
-│                                                   │
-└───────────────────────────────────────────────────┘
+                        IOMMU
++---------------------------------------------------+
+|                                                   |
+|        IOVA                          HPA          |
+| +----------------+            +----------------+  |
+| | 0x00-0xF0      +----------->| 0x5000         |  |
+| +----------------+            +----------------+  |
+| |                |            |                |  |
+| | 0xF1-0xFF      |            +----------------+  |
+| |                +------+     |                |  |
+| |                |      |     +----------------+  |
+| +----------------+      |     |                |  |
+| |                |      |     +----------------+  |
+| |                |      +---->| 0xF000         |  |
+| |                |            +----------------+  |
+| | 0x100-0xFFF    |            |                |  |
+| |                +------+     +----------------+  |
+| |                |      |     |                |  |
+| |                |      |     +----------------+  |
+| +----------------+      +---->| 0xBF00         |  |
+| |                |            +----------------+  |
+| |                |      +---->| 0xDEADC0DE     |  |
+| | 0x1000-0x10F0  |      |     +----------------+  |
+| |                +------+     |                |  |
+| +----------------+            +----------------+  |
+|                                                   |
++---------------------------------------------------+
 ```

 When a PCIe device performs a memory read or write, it provides an IOVA to the IOMMU which gets translated to the host physical address. The transaction is then forwarded to the memory controller (for RAM) or back into the root complex (for MMIO). This provides three crucial functions to devices:
@@ -130,26 +130,26 @@ When a PCIe device performs a memory read or write, it provides an IOVA to the I
 Number 2 is particularly crucial when it comes to device virtualization. When doing what's called direct passthrough to a virtual machine, the CPU will program the IOMMU such that a device is physically restricted to DMA to the memory allocated for the guest. This provides a hard level of hardware memory isolation. The components interact with each other like this:

 ```title="" linenums="1"
-                           ┌─────────────────┐
-                           │                 │
-                           │                 │
-                           │       CPU       │
-                           │                 │
-                           │                 │
-                           │                 │
-                           └────────┬────────┘
-                                    │
-                                    │programs
-                                    │
-                                    ▼
-┌─────────────────┐        ┌─────────────────┐        ┌─────────────────┐
-│                 │        │                 │        │                 │
-│                 │        │                 │        │                 │
-│                 │  IOVA  │                 │  HPA   │                 │
-│   PCIe Device   ├───────►│      IOMMU      ├───────►│ RAM Controller  │
-│                 │        │                 │        │                 │
-│                 │        │                 │        │                 │
-└─────────────────┘        └─────────────────┘        └─────────────────┘
+                           +-----------------+
+                           |                 |
+                           |                 |
+                           |       CPU       |
+                           |                 |
+                           |                 |
+                           |                 |
+                           +--------+--------+
+                                    |
+                                    |programs
+                                    |
+                                    v
++-----------------+        +-----------------+        +-----------------+
+|                 |        |                 |        |                 |
+|                 |        |                 |        |                 |
+|                 |  IOVA  |                 |  HPA   |                 |
+|   PCIe Device   +------->|      IOMMU      +------->| RAM Controller  |
+|                 |        |                 |        |                 |
+|                 |        |                 |        |                 |
++-----------------+        +-----------------+        +-----------------+
 ```

 Just as the CPU MMU translates process virtual addresses to physical memory, the IOMMU translates device virtual addresses (IOVAs) to physical memory.
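
The per-domain translation described in the two hunks above can be modelled in a few lines of Python. This is only a sketch under stated assumptions: the `Iommu` class, its `attach`, `map`, and `translate` methods, the 4 KiB page size, and the example HPA `0x7F000` are all hypothetical and chosen for illustration. Real IOMMUs walk multi-level page tables in hardware, and nothing here mirrors an actual driver API.

```python
PAGE_SIZE = 4096  # assumed page size for this sketch


class IommuFault(Exception):
    """Raised when a device touches an IOVA its domain has not mapped."""


class Iommu:
    def __init__(self):
        self.domains = {}  # domain id -> {IOVA page number: HPA page number}
        self.devices = {}  # device BDF -> domain id

    def attach(self, bdf, domain):
        """Assign a device to a domain (a translation context)."""
        self.domains.setdefault(domain, {})
        self.devices[bdf] = domain

    def map(self, domain, iova, hpa, size):
        """Map a page-aligned IOVA range onto a page-aligned HPA range."""
        if iova % PAGE_SIZE or hpa % PAGE_SIZE or size % PAGE_SIZE:
            raise ValueError("mappings must be page aligned")
        table = self.domains.setdefault(domain, {})
        for offset in range(0, size, PAGE_SIZE):
            table[(iova + offset) // PAGE_SIZE] = (hpa + offset) // PAGE_SIZE

    def translate(self, bdf, iova):
        """Translate a device's IOVA to an HPA, or fault if unmapped."""
        domain = self.devices.get(bdf)
        if domain is None:
            raise IommuFault(f"{bdf} is not attached to any domain")
        page, offset = divmod(iova, PAGE_SIZE)
        table = self.domains[domain]
        if page not in table:
            raise IommuFault(f"{bdf}: IOVA {iova:#x} is not mapped")
        return table[page] * PAGE_SIZE + offset


# Usage: the device sees only the two pages its domain maps.
iommu = Iommu()
iommu.attach("0000:12:00.0", domain=1)
iommu.map(domain=1, iova=0x1000, hpa=0x7F000, size=2 * PAGE_SIZE)
print(hex(iommu.translate("0000:12:00.0", 0x1234)))  # 0x7f234
```

Any access outside the mapped range raises `IommuFault`, which is the software analogue of the isolation the passthrough paragraph describes: the device cannot reach memory its domain does not map.
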
@@ -310,40 +310,40 @@ If the address is permitted, the transaction is routed back down the tree to GPU
 Let's first illustrate an example scenario.

 ```title="" linenums="1"
-        ┌──────────────────────────────────┐
-        │           Root Complex           │
-        │                                  │
-        │         3: 0x4100     ┌───────┐  │
-        │     ┌─────────────────►       │  │
-        │     │      4: 0x4100  │ IOMMU │  │
-        │     │         ┌───────┼       │  │
-        │     │         │       └───────┘  │
-        │    ┌┴─────────▼─┐                │
-        │    │ Root Port  │                │
-        └────│────────────│────────────────┘
-             │            │
-             └─────▲────┬─┘
-                   │    │
-         2: 0x4100 │    │5: 0x4100
-                   │    │
-           ┌───────┴────▼─────┐
-           │                  │
-           │                  │
-           │      Switch      │
-           │                  │
-           │                  │
-           └─────▲─────────┬──┘
-                 │         │
-       1: 0x4100 │         │6: 0x4100
-                 │         │
-┌────────────────┴───┐ ┌───▼────────────────┐
-│                    │ │                    │
-│       GPU A        │ │       GPU B        │
-│                    │ │                    │
-│ BDF: 0000:12:00.0  │ │ BDF: 0000:13:00.0  │
-│ BAR: 0x4000-0x40FF │ │ BAR: 0x4100-0x41FF │
-│                    │ │                    │
-└────────────────────┘ └────────────────────┘
+        +----------------------------------+
+        |           Root Complex           |
+        |                                  |
+        |         3: 0x4100     +-------+  |
+        |     +----------------->       |  |
+        |     |      4: 0x4100  | IOMMU |  |
+        |     |         +-------+       |  |
+        |     |         |       +-------+  |
+        |    ++---------v-+                |
+        |    | Root Port  |                |
+        +----|------------|----------------+
+             |            |
+             +-----^----+-+
+                   |    |
+         2: 0x4100 |    |5: 0x4100
+                   |    |
+           +-------+----v-----+
+           |                  |
+           |                  |
+           |      Switch      |
+           |                  |
+           |                  |
+           +-----^---------+--+
+                 |         |
+       1: 0x4100 |         |6: 0x4100
+                 |         |
++----------------+---+ +---v----------------+
+|                    | |                    |
+|       GPU A        | |       GPU B        |
+|                    | |                    |
+| BDF: 0000:12:00.0  | | BDF: 0000:13:00.0  |
+| BAR: 0x4000-0x40FF | | BAR: 0x4100-0x41FF |
+|                    | |                    |
++--------------------+ +--------------------+
 ```

 We go through each step of the process:
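
The six numbered hops in the diagram can also be traced with a small Python model. This is a sketch under assumptions, not a description of real hardware behaviour: it supposes that GPU A's write is forced upstream through the switch and root port to the IOMMU (as the diagram draws it), that the IOMMU applies an identity mapping so `0x4100` stays `0x4100`, and that the result is then routed by BAR range. The names `BARS`, `ALLOWED`, `iommu_translate`, `bar_owner`, and `route_write` are hypothetical.

```python
# BAR windows taken from the diagram: BDF -> (first address, last address).
BARS = {
    "0000:12:00.0": (0x4000, 0x40FF),  # GPU A
    "0000:13:00.0": (0x4100, 0x41FF),  # GPU B
}

# Assumption: GPU A's IOMMU domain identity-maps GPU B's BAR window.
ALLOWED = {"0000:12:00.0": [(0x4100, 0x41FF)]}


def iommu_translate(source_bdf, addr):
    """Return the translated address, or refuse if the range is not permitted."""
    for first, last in ALLOWED.get(source_bdf, []):
        if first <= addr <= last:
            return addr  # identity mapping in this sketch
    raise PermissionError(f"{source_bdf} may not access {addr:#x}")


def bar_owner(addr):
    """Find the device whose BAR window contains addr, if any."""
    for bdf, (first, last) in BARS.items():
        if first <= addr <= last:
            return bdf
    return None


def route_write(source_bdf, addr):
    """Trace a write upstream to the IOMMU and back down to the target BAR."""
    hops = [
        f"1: {source_bdf} -> switch ({addr:#x})",
        f"2: switch -> root port ({addr:#x})",
        f"3: root port -> IOMMU ({addr:#x})",
    ]
    translated = iommu_translate(source_bdf, addr)
    hops.append(f"4: IOMMU -> root port ({translated:#x})")
    target = bar_owner(translated)
    if target is None:
        # RAM-bound traffic is outside the scope of this sketch.
        raise LookupError(f"{translated:#x} is not inside any modelled BAR")
    hops.append(f"5: root port -> switch ({translated:#x})")
    hops.append(f"6: switch -> {target} ({translated:#x})")
    return target, hops


target, hops = route_write("0000:12:00.0", 0x4100)
for hop in hops:
    print(hop)
```

A write to an address outside the permitted window raises `PermissionError` at hop 3, before anything is routed back down the tree.
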
