Contributor

@hippogr hippogr commented Dec 17, 2025

Set `privileged` to true for user jobs so that containers do not lose access to NVIDIA GPUs.

Copilot AI review requested due to automatic review settings December 17, 2025 07:22
@hippogr hippogr requested review from abuccts and zhogu December 17, 2025 07:22

Copilot AI left a comment

Pull request overview

This pull request attempts to fix an issue where containers lose access to NVIDIA GPUs by unconditionally setting the privileged flag to true for all user job containers. Previously, privileged mode was only enabled when the Docker-in-Docker (dind) runtime plugin was active. The change removes the conditional logic and makes all job containers run in privileged mode.

Key Changes:

  • Removed conditional privileged mode based on dindMode flag
  • Set privileged: true unconditionally for all user job containers in Kubernetes pod specifications
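
The difference between the two forms can be sketched as follows. This is a minimal illustration, not code from the PR: `buildSecurityContext` and `buildSecurityContextUnconditional` are hypothetical helpers, and only the `dindMode` flag and the `privileged` key come from the diff.

```javascript
// Hypothetical helpers for illustration only; not part of the PR's codebase.

// Previous behavior: `privileged` is added only when dindMode is truthy.
// Spreading a falsy value (false, undefined) into an object literal adds no keys.
function buildSecurityContext(dindMode) {
  return {
    ...(dindMode && { privileged: true }),
  };
}

// Behavior after this PR: every user job container is privileged.
function buildSecurityContextUnconditional() {
  return {
    privileged: true,
  };
}

const withDind = buildSecurityContext(true);     // contains privileged: true
const withoutDind = buildSecurityContext(false); // empty object, no privileged key
const afterChange = buildSecurityContextUnconditional();

console.log(withDind, withoutDind, afterChange);
```

With the conditional spread, a job that does not use dind gets a security context with no `privileged` key at all, so the container runs with the runtime's default (unprivileged) settings.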

],
securityContext: {
- ...((dindMode) && { privileged: true }),
+ privileged: true,

Copilot AI Dec 17, 2025

Setting privileged to true unconditionally for all user jobs poses a significant security risk. Privileged containers have access to all host devices and can bypass most container security features, essentially granting root-level access to the host system. The previous implementation only enabled privileged mode when dindMode (Docker-in-Docker) was active, which was a more secure approach.

While privileged mode may enable GPU access, it's important to verify that this is truly necessary. NVIDIA GPUs can typically be accessed in containers without privileged mode when the NVIDIA Container Runtime is properly configured. Consider investigating whether the issue is actually with the container runtime configuration, device plugin setup, or other aspects of the GPU passthrough mechanism rather than requiring privileged containers.
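
As a rough sketch of that alternative: the container can request GPUs through the `nvidia.com/gpu` extended resource while staying unprivileged. This assumes the NVIDIA device plugin is installed on the cluster. `buildContainerSpec` and its parameters (`name`, `image`, `gpuCount`) are hypothetical names for illustration, not this repository's actual API, and whether this resolves the GPU loss reported in this PR has not been verified.

```javascript
// Hypothetical sketch, assuming the NVIDIA device plugin exposes `nvidia.com/gpu`.
// Function and parameter names are illustrative, not taken from the PR.
function buildContainerSpec({ name, image, gpuCount = 0, dindMode = false }) {
  return {
    name,
    image,
    resources: {
      limits: {
        // Request GPUs via the extended resource instead of privileged mode.
        ...(gpuCount > 0 && { 'nvidia.com/gpu': gpuCount }),
      },
    },
    securityContext: {
      // Keep privileged mode limited to Docker-in-Docker jobs.
      ...(dindMode && { privileged: true }),
    },
  };
}

const gpuJob = buildContainerSpec({
  name: 'user-job',
  image: 'example/image:latest', // placeholder image name
  gpuCount: 2,
});

console.log(JSON.stringify(gpuJob, null, 2));
```

Here the GPU job carries a `nvidia.com/gpu` limit of 2 and no `privileged` key, so the security posture of non-dind jobs is unchanged.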

Suggested change
- privileged: true,
+ ...(launcherConfig.dindMode && { privileged: true }),
