Description
I have built v1.2.0 RC2 from source. After running
sudo build/gatekeeper
the following error is shown :
cannot add vfio group to container, error 22 (invalid argument)
and I'm unable to start Gatekeeper. While troubleshooting, I found the error message below :
sudo dmesg | grep -i vfio
Firmware has requested this device have a 1:1 IOMMU mapping, rejecting configuring the device without a 1:1 mapping. Contact your platform vendor.
From a preliminary investigation, I think the issue is more likely related to the hardware / BIOS. However, I can't find a concrete solution. I'm trying my luck here to see if anyone with a deeper understanding of Intel VT-d, IOMMU, and vfio-pci can offer any ideas.
The same error occurred on both GT and GK. Below is the specification of the testbed :
Bare-metal deployment, isolated lab environment.
GK :
OS : Ubuntu 24.04 LTS
Server : HPE ProLiant
RAM : 256GB
CPU : Intel Xeon E5-2665 2.4GHz, 32 cores
NUMA : 2 NUMA nodes
NIC : Intel I350 1G, both front and back (confirmed that DPDK is supported). The server also has an Intel 82599ES 10G interface that supports DPDK, but we don't have a 10G uplink router available at the moment, so we didn't use it for the testbed.
GT :
OS : Ubuntu 24.04 LTS
Server : HPE ProLiant
RAM : 256GB
CPU : Intel Xeon E5-2640 2.6GHz, 32 cores
NUMA : 2 NUMA nodes
NIC : Intel I350 1G, front
Solutions tried on GT (which didn't work):
Adding vfio_iommu_type1.allow_unsafe_interrupts=1 to GRUB_CMDLINE_LINUX_DEFAULT
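To rule out the possibility that the parameter never took effect (for example if GRUB was not regenerated before rebooting), a small check like the one below could be used. This is only a sketch: it assumes the standard Linux locations /proc/cmdline and /sys/module/<module>/parameters/<param>, and the helper names `parse_cmdline` and `read_module_param` are made up for illustration.

```python
import shlex
from pathlib import Path


def parse_cmdline(text):
    """Split a kernel command line into a {name: value} dict.

    Flags without '=' (for example 'quiet') map to an empty string.
    """
    params = {}
    for token in shlex.split(text):
        name, _, value = token.partition("=")
        params[name] = value
    return params


def read_module_param(module, param):
    """Return the live value of a module parameter, or None if it is not exposed."""
    path = Path("/sys/module") / module / "parameters" / param
    try:
        return path.read_text().strip()
    except OSError:
        # Module not loaded, or the parameter is not readable through sysfs.
        return None


if __name__ == "__main__":
    key = "vfio_iommu_type1.allow_unsafe_interrupts"
    cmdline = parse_cmdline(Path("/proc/cmdline").read_text())
    print("on kernel command line:", cmdline.get(key, "<absent>"))
    print("live module value:",
          read_module_param("vfio_iommu_type1", "allow_unsafe_interrupts"))
```

If the parameter is absent from the command line, the change to GRUB_CMDLINE_LINUX_DEFAULT probably has not been applied yet. If it is present and the error persists, the parameter is likely unrelated to the 1:1 mapping rejection.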
Current thoughts :
- In https://www.kernel.org/doc/Documentation/devicetree/bindings/iommu/iommu.txt , it is mentioned that "The device tree node of the IOMMU device's parent bus must contain a valid "dma-ranges" property that describes how the physical address space of the IOMMU maps to memory. An empty "dma-ranges" property means that there is a 1:1 mapping from IOMMU to memory.".
Is there a way to verify the "dma-ranges" property? If so, I could at least find out what is causing the non-1:1 mapping, and probably trace the root cause from there.
- The "Contact your platform vendor" in the error message raised my suspicion of a BIOS incompatibility. If these machines can't do the job, which machine specifications / brands can?
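On the "dma-ranges" question: that property belongs to device-tree platforms, while x86 servers describe their hardware through ACPI, so it may not be the right thing to look for here. A possible alternative, sketched below, is to read the reserved regions that the kernel exposes per IOMMU group under /sys/kernel/iommu_groups/<N>/reserved_regions. The assumptions are that each line there has the form "start end type", and that regions of type "direct" (and possibly "direct-relaxable") are the firmware-requested 1:1 mappings behind the error message. The function names are hypothetical.

```python
from pathlib import Path

GROUPS_ROOT = Path("/sys/kernel/iommu_groups")


def parse_reserved_regions(text):
    """Parse 'start end type' lines into (start, end, type) tuples."""
    regions = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or unexpected lines
        start, end, kind = fields
        regions.append((int(start, 16), int(end, 16), kind))
    return regions


def direct_regions(regions):
    """Keep only regions whose type starts with 'direct' (assumed 1:1 requests)."""
    return [r for r in regions if r[2].startswith("direct")]


def scan(root=GROUPS_ROOT):
    """Return (group, devices, direct regions) for every group that has any."""
    results = []
    if not root.is_dir():
        return results
    # Sort group directories numerically without assuming the names parse as int.
    for group in sorted(root.iterdir(), key=lambda p: (len(p.name), p.name)):
        try:
            text = (group / "reserved_regions").read_text()
        except OSError:
            continue
        hits = direct_regions(parse_reserved_regions(text))
        if not hits:
            continue
        dev_dir = group / "devices"
        devices = sorted(d.name for d in dev_dir.iterdir()) if dev_dir.is_dir() else []
        results.append((group.name, devices, hits))
    return results


if __name__ == "__main__":
    for name, devices, hits in scan():
        print(f"group {name}: {', '.join(devices) or '<no devices>'}")
        for start, end, kind in hits:
            print(f"  {start:#018x}-{end:#018x} {kind}")
```

If the PCI address of an I350 port shows up in a group with a "direct" region, that would be consistent with the dmesg message above and would point at the firmware tables rather than at Gatekeeper or DPDK.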
Some links that are probably relevant, but whose content I can't fully understand due to a lack of relevant experience :
https://github.com/kiler129/relax-intel-rmrr/blob/master/deep-dive.md#what-vendors-did-wrong
https://lore.kernel.org/linux-iommu/BN9PR11MB5276E84229B5BD952D78E9598C639@BN9PR11MB5276.namprd11.prod.outlook.com/
https://lore.kernel.org/linux-iommu/BN9PR11MB52768ACA721898D5C43CBE9B8C27A@BN9PR11MB5276.namprd11.prod.outlook.com/t/
https://lore.kernel.org/linux-iommu/ZEKQwVjJrMTUlPUR@nvidia.com/
https://community.hpe.com/t5/proliant-servers-ml-dl-sl/proliant-dl360-gen9-getting-error-quot-rejecting-configuring-the/td-p/7220298
https://www.reddit.com/r/VFIO/comments/1gi95zf/rejecting_configuring_the_device_without_a_11/?rdt=37975
https://forum.proxmox.com/threads/qemu-exited-with-code-1-pcie-passthrough-not-working.146297/