
IOMMU related issues #718

@hzc12321


I built v1.2.0 RC2 from source. Running
sudo build/gatekeeper
fails with the following error:
cannot add vfio group to container, error 22 (invalid argument)

As a result, Gatekeeper cannot start. While troubleshooting, I found the following kernel message:
sudo dmesg | grep -i vfio
Firmware has requested this device have a 1:1 IOMMU mapping, rejecting configuring the device without a 1:1 mapping. Contact your platform vendor.

From my preliminary investigation, the issue seems to be related to the hardware or BIOS rather than to Gatekeeper itself, but I have not found a concrete fix. I'm posting here in case anyone with a deeper understanding of Intel VT-d, IOMMU, and vfio-pci can suggest something.

The same error occurred on both GT and GK. The testbed specification is below:
Bare-metal deployment, isolated lab environment.
GK :
OS : Ubuntu 24.04 LTS
Server : HPE ProLiant
RAM : 256GB
CPU : Intel Xeon E5-2665 2.4GHz, 32 cores
NUMA : 2 NUMA nodes
NIC : Intel I350 1G, both front and back (confirmed that DPDK is supported). The server also has an Intel 82599ES 10G interface that supports DPDK, but we do not have a 10G uplink router available at the moment, so we did not use it for the testbed.

GT :
OS : Ubuntu 24.04 LTS
Server : HPE ProLiant
RAM : 256GB
CPU : Intel Xeon E5-2640 2.6GHz, 32 cores
NUMA : 2 NUMA nodes
NIC : Intel I350 1G, front

Solutions tried on GT (none of which worked):
Adding vfio_iommu_type1.allow_unsafe_interrupts=1 to GRUB_CMDLINE_LINUX_DEFAULT
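
Before ruling the parameter out, it may be worth confirming that it actually reached the running kernel (on Ubuntu, a change to GRUB_CMDLINE_LINUX_DEFAULT normally needs sudo update-grub and a reboot). The sketch below is only an illustration: has_cmdline_param is a hypothetical helper name, and the sysfs path assumes the vfio_iommu_type1 module is loaded.

```shell
#!/bin/sh
# has_cmdline_param is a hypothetical helper, not part of Gatekeeper or DPDK.
# $1: exact parameter to look for, e.g. foo=1
# $2: file holding the kernel command line (defaults to /proc/cmdline)
# Returns 0 if present, 1 if absent, 2 if the file cannot be read.
has_cmdline_param() {
    file=${2:-/proc/cmdline}
    [ -r "$file" ] || return 2
    line=''
    read -r line < "$file" || [ -n "$line" ] || return 1
    # Pad with spaces so only whole, space-separated tokens match.
    case " $line " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

param=vfio_iommu_type1.allow_unsafe_interrupts=1
if has_cmdline_param "$param"; then
    echo "$param is on the running kernel's command line"
else
    echo "$param was not found on the running kernel's command line"
fi

# Assumption: this file only exists once vfio_iommu_type1 is loaded.
param_file=/sys/module/vfio_iommu_type1/parameters/allow_unsafe_interrupts
if [ -r "$param_file" ]; then
    echo "module parameter value: $(cat "$param_file")"
fi
```

Note that this parameter concerns interrupt remapping, so even when it is active it is not expected to change the 1:1 mapping check reported in dmesg.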

Current thoughts :

  1. https://www.kernel.org/doc/Documentation/devicetree/bindings/iommu/iommu.txt states: "The device tree node of the IOMMU device's parent bus must contain a valid "dma-ranges" property that describes how the physical address space of the IOMMU maps to memory. An empty "dma-ranges" property means that there is a 1:1 mapping from IOMMU to memory."
    Is there a way to inspect the "dma-ranges" property? If so, I could at least find out why the mapping is not 1:1 and perhaps trace the root cause from there.
  2. The "Contact your platform vendor" part of the error message makes me suspect a BIOS incompatibility. If these machines cannot do the job, which specifications or brands of machine can?
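
Regarding the first point: "dma-ranges" is a device-tree property, whereas on an ACPI-based x86 server the comparable information appears to come from the RMRR entries of the DMAR table, which the kernel exposes per IOMMU group in the sysfs file reserved_regions. The sketch below lists groups that carry a firmware-requested identity ("direct") region together with their devices. The function name list_direct_regions is hypothetical, and the sketch assumes a kernel recent enough to provide reserved_regions.

```shell
#!/bin/sh
# list_direct_regions is a hypothetical helper for illustration only.
# $1: directory holding the IOMMU groups (defaults to /sys/kernel/iommu_groups)
# Prints one line per "direct" or "direct-relaxable" reserved region.
list_direct_regions() {
    root=${1:-/sys/kernel/iommu_groups}
    for group in "$root"/*; do
        regions=$group/reserved_regions
        [ -r "$regions" ] || continue

        # Collect the PCI addresses of the devices in this group.
        devs=''
        for d in "$group"/devices/*; do
            if [ -e "$d" ]; then
                devs="$devs ${d##*/}"
            fi
        done

        # Each line of reserved_regions reads "<start> <end> <type>".
        while read -r start end type; do
            case $type in
                direct|direct-relaxable)
                    echo "group ${group##*/}: $start-$end ($type) devices:$devs" ;;
            esac
        done < "$regions"
    done
}

list_direct_regions
```

If the I350 ports show up in a group with a direct region, that would be consistent with the kernel message above; comparing the listed PCI addresses against lspci output is one way to check.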

Some links that are probably relevant, but whose content I cannot fully follow because I lack the relevant experience:
https://github.com/kiler129/relax-intel-rmrr/blob/master/deep-dive.md#what-vendors-did-wrong
https://lore.kernel.org/linux-iommu/BN9PR11MB5276E84229B5BD952D78E9598C639@BN9PR11MB5276.namprd11.prod.outlook.com/
https://lore.kernel.org/linux-iommu/BN9PR11MB52768ACA721898D5C43CBE9B8C27A@BN9PR11MB5276.namprd11.prod.outlook.com/t/
https://lore.kernel.org/linux-iommu/ZEKQwVjJrMTUlPUR@nvidia.com/
https://community.hpe.com/t5/proliant-servers-ml-dl-sl/proliant-dl360-gen9-getting-error-quot-rejecting-configuring-the/td-p/7220298
https://www.reddit.com/r/VFIO/comments/1gi95zf/rejecting_configuring_the_device_without_a_11/?rdt=37975
https://forum.proxmox.com/threads/qemu-exited-with-code-1-pcie-passthrough-not-working.146297/
