Description
I've been exploring the purpose of nvidia-modprobe recently, and the implications for anyone using a dual-gpu setup and occasionally needing to blacklist the nvidia drivers. I'm using Wayland exclusively.
It's my understanding that nvidia-modprobe is provided as a fallback mechanism to ensure the nvidia driver is initialised with root privileges (should it not already be properly initialised). The mechanism for calling nvidia-modprobe appears to be triggered by the nvidia libraries themselves when they are invoked by the relevant ICD, e.g.:
libnvidia-egl-gbm.so
libGLX_nvidia.so
libnvidia-egl-wayland.so
I've found that even when the nvidia drivers themselves are blacklisted, any program that tries to invoke or interrogate the ICDs for available devices causes nvidia-modprobe to be called (which, in turn, attempts to modprobe nvidia as root).
Unfortunately, modprobe isn't the quickest in town and it takes a while for it to fail when the nvidia drivers are blacklisted (close to 1 second in my testing).
The problem is compounded by diagnostic tools such as inxi.
For example, inxi -Fxz will repeatedly poll the ICD layer (approximately 33 times), which in turn loads the nvidia shared libraries (33 times), which triggers nvidia-modprobe (33 times).
This chain of events takes approximately 30 seconds to complete (roughly 33 calls at close to 1 second each), while my journal logs show (correctly) that Module nvidia is blacklisted (33 times).
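For anyone wanting to reproduce the numbers above, here is a minimal sketch of how the timing and the message count could be gathered. The helper names are hypothetical, and the message string is copied from the description above; the exact wording in a given journal may differ.

```python
import subprocess
import time

# Message text as quoted in this report; the exact journal wording may differ.
BLACKLIST_MESSAGE = "Module nvidia is blacklisted"


def time_command(argv):
    """Run argv, discard its output, and return (exit_code, elapsed_seconds)."""
    start = time.monotonic()
    result = subprocess.run(
        argv,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        check=False,
    )
    return result.returncode, time.monotonic() - start


def count_blacklist_messages(journal_text):
    """Count the lines of journal text that contain the blacklist message."""
    return sum(
        1 for line in journal_text.splitlines() if BLACKLIST_MESSAGE in line
    )
```

Calling time_command(["inxi", "-Fxz"]) and then feeding the journal output for the same period to count_blacklist_messages should show whether the elapsed time scales with the number of failed modprobe attempts.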
This isn't the end of the world, though I've tried to mitigate the issue as follows:
Workaround
It's been suggested that I should be able to move nvidia-modprobe out of the way, short-circuiting this chain of events somewhat. This does have the desired effect when the nvidia drivers are blacklisted.
Problem
This has a side effect when the nVidia drivers are not blacklisted.
Specifically, despite the nvidia module being present and accounted for (via lsmod) it seems the appropriate device files have not been created (or the driver otherwise not fully initialised).
This is evidenced by the likes of eglinfo / vulkaninfo not showing the nVidia device whatsoever.
This can be rectified by one of the following approaches:
- Manually run the renamed nvidia-modprobe
- Run vulkaninfo as root
- Run nvidia-debugdump --list as root
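The symptom described above (module loaded, but device files apparently missing) could be checked with something like the following sketch. It assumes the usual /dev/nvidiactl and /dev/nvidia0 nodes are the ones that matter; I have not confirmed that these are the only missing pieces, and the function names are purely illustrative.

```python
import os

# Assumed device nodes; the exact set depends on driver version and GPU count.
ASSUMED_DEVICE_NODES = ("/dev/nvidiactl", "/dev/nvidia0")


def module_is_loaded(name, proc_modules_text):
    """Return True if `name` is listed as a module in /proc/modules content."""
    for line in proc_modules_text.splitlines():
        fields = line.split()
        # The first field of each /proc/modules line is the module name.
        if fields and fields[0] == name:
            return True
    return False


def missing_device_nodes(paths=ASSUMED_DEVICE_NODES):
    """Return the subset of `paths` that does not exist on this system."""
    return [path for path in paths if not os.path.exists(path)]


def diagnose(proc_modules_text, paths=ASSUMED_DEVICE_NODES):
    """Summarise the module and device node state as a short string."""
    if not module_is_loaded("nvidia", proc_modules_text):
        return "module not loaded"
    missing = missing_device_nodes(paths)
    if missing:
        return "module loaded, device nodes missing: " + ", ".join(missing)
    return "module loaded, device nodes present"
```

Passing the contents of /proc/modules to diagnose() before and after one of the approaches above should show whether the device nodes are what changes.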
Theory
I believe that this isn't an issue for X11 users, as the Xorg service runs as root and thus has no trouble when the nvidia shared libraries are instantiated (thus, the driver fully initialises without need for the nvidia-modprobe fallback mechanism).
For GDM and Wayland users, this isn't the case. Since these services do not run with superuser privileges, the nvidia drivers will ultimately be loaded without special privileges and will try to initiate the fallback mechanism by default. That obviously does not work if nvidia-modprobe cannot be found.
So, to restate the problem (with the above taken into account)...
A Linux system running Wayland without nvidia-modprobe will be unable to initialise the nVidia device without user intervention.
Potential paths forward
1. Accept that when the nvidia device is blacklisted, nvidia-modprobe will trigger a modprobe any time a userspace application tries to query or use the ICDs available, and that this may not be immediate.
2. Accept that the removal of nvidia-modprobe will prevent proper initialisation of nVidia devices under Wayland.

Or we could consider a check within nvidia-modprobe (or indeed the shared libraries/drivers themselves) such that:

3. Have nvidia-modprobe proactively check if the nvidia drivers are blacklisted before calling out to /sbin/modprobe, and fail fast if that is the case.
4. Have the nvidia shared libraries proactively check if the nvidia drivers are blacklisted before attempting the fallback nvidia-modprobe mechanism.
Option 1 is a minor irritation (it drove me to research this issue).
Option 2 could be scripted around via user code or udev rules, but doesn't help the wider community.
Perhaps option 3 or 4 could be considered, if it doesn't introduce too much complexity?
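To illustrate how cheap the fail-fast check proposed above might be, here is a rough sketch of the logic in Python (a real change would presumably live in C inside nvidia-modprobe or the libraries). It assumes the blacklist is expressed either through the modprobe.blacklist= or module_blacklist= kernel parameters, or through "blacklist" lines in modprobe.d-style configuration. The function names are hypothetical, and other ways of disabling a module (such as install overrides) are not covered.

```python
def _normalise(name):
    # kmod treats '-' and '_' in module names as equivalent.
    return name.strip().replace("-", "_")


def blacklisted_on_cmdline(module, cmdline_text):
    """Check the kernel command line (e.g. the contents of /proc/cmdline)."""
    wanted = _normalise(module)
    for token in cmdline_text.split():
        key, sep, value = token.partition("=")
        if sep and key in ("modprobe.blacklist", "module_blacklist"):
            # Both parameters take a comma-separated list of module names.
            if wanted in (_normalise(entry) for entry in value.split(",")):
                return True
    return False


def blacklisted_in_config(module, config_text):
    """Check 'blacklist <module>' lines in modprobe.d-style configuration."""
    wanted = _normalise(module)
    for line in config_text.splitlines():
        # Drop trailing comments, then split into whitespace-separated fields.
        fields = line.split("#", 1)[0].split()
        if (
            len(fields) == 2
            and fields[0] == "blacklist"
            and _normalise(fields[1]) == wanted
        ):
            return True
    return False
```

Both checks are plain text scans with no process spawn, so if either returns True the caller could skip the modprobe attempt entirely rather than waiting for it to fail.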
Background:
My specific setup includes a GTX 960 with drivers 545.29.06
I've been testing across both Arch Linux and Fedora Linux (same drivers + kernel). It's worth noting that on Arch I'm using regular kernel modules, while Fedora uses akmods. I do not observe any difference in behaviour between the two.
I'm also running with an AMD RX 580
For development purposes, I frequently switch between nvidia, nouveau and amdgpu drivers using boot time kernel parameters to blacklist as appropriate.