Commits
- Add comprehensive troubleshooting section to README with 6 common issues
- Change .gitmodules to use HTTPS URLs instead of SSH (fixes auth issues)
- Fix setup script to create diffusion_outputs directory automatically
- Add SUBMODULE_FIXES.md documenting required changes for submodules
- Implement gvm-docker-daemon for automatic GPU resource control
- Add comprehensive documentation and examples
- Test with Stable Diffusion workload (50 images, 8GB limit, priority 7)
- Support memory limits and compute priority via environment variables
Pull Request: GVM Docker Integration
Summary
This PR adds Docker integration for GVM, enabling GPU resource control (memory limits and compute priority) for containerized workloads through a monitoring daemon.
Motivation
Modern GPU workloads increasingly run in containers, but existing container orchestration lacks fine-grained GPU resource management. This integration brings GVM's GPU virtualization capabilities to Docker, enabling fine-grained control of GPU memory and compute priority for individual containers.
Implementation
Architecture
The integration uses a daemon-based approach rather than a runtime wrapper:
- Daemon (`gvm-docker-daemon.go`) - Monitors Docker containers and applies GVM controls
- Configuration via `GVM_*` environment variables
- Controls applied through `/sys/kernel/debug/nvidia-uvm/processes/`

Why daemon instead of runtime wrapper?
- Works with standard workflows such as `docker run -d`

Key Features
✅ Automatic GPU process discovery - Finds GPU processes associated with containers
✅ Environment-based configuration - Simple `GVM_MEMORY_LIMIT` and `GVM_COMPUTE_PRIORITY` env vars

✅ Retry logic - Handles processes that appear after container startup
✅ Non-intrusive - No Docker daemon modifications required
✅ Production-tested - Successfully ran 50-image diffusion workload with GVM controls
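The environment-based configuration described above boils down to scanning a container's `KEY=VALUE` environment pairs for the two `GVM_*` variables. A minimal sketch of that step, assuming illustrative names (`gvmConfig`, `parseGVMEnv` are not the daemon's actual API):

```go
package main

import (
	"fmt"
	"strings"
)

// gvmConfig holds the GVM settings found in a container's environment.
// Struct and function names here are illustrative, not the daemon's API.
type gvmConfig struct {
	MemoryLimit     string // value of GVM_MEMORY_LIMIT, e.g. "8G"
	ComputePriority string // value of GVM_COMPUTE_PRIORITY, e.g. "7"
}

// parseGVMEnv scans KEY=VALUE pairs (as reported by `docker inspect`
// under .Config.Env) and extracts the GVM_* variables the daemon acts on.
func parseGVMEnv(env []string) gvmConfig {
	var cfg gvmConfig
	for _, kv := range env {
		if v, ok := strings.CutPrefix(kv, "GVM_MEMORY_LIMIT="); ok {
			cfg.MemoryLimit = v
		}
		if v, ok := strings.CutPrefix(kv, "GVM_COMPUTE_PRIORITY="); ok {
			cfg.ComputePriority = v
		}
	}
	return cfg
}

func main() {
	env := []string{"PATH=/usr/local/bin", "GVM_MEMORY_LIMIT=8G", "GVM_COMPUTE_PRIORITY=7"}
	cfg := parseGVMEnv(env)
	fmt.Printf("memory=%s priority=%s\n", cfg.MemoryLimit, cfg.ComputePriority)
}
```

Containers without `GVM_*` variables simply yield a zero-valued config, which the daemon can treat as "no controls requested".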
Files Added
Core Implementation
- `gvm-docker/gvm-docker-daemon.go` - Main daemon implementation (250 lines)
- `gvm-docker/DOCKER_INTEGRATION.md` - Comprehensive documentation
- `gvm-docker/PR_DESCRIPTION.md` - This PR description

Examples
- `gvm-docker/examples/diffusion/Dockerfile` - Example GPU workload container
- `gvm-docker/examples/test-colocation.sh` - Test script for workload colocation

Documentation
- `gvm-docker/README.md` - Quick start guide (updated)

Testing
Test Environment
Test Results
Single Container Test:
Results:
Daemon Logs:
Usage Example
1. Start the daemon
2. Run a GPU container with GVM controls
3. Verify controls are applied
Use Cases
1. Workload Colocation

Run an inference server and a batch training job on the same GPU.

2. Multi-Tenant GPU Sharing

Isolate GPU resources between tenants with per-container memory limits and compute priorities.

3. Development/Testing

Prevent runaway processes from consuming all GPU memory.
Compatibility
Requirements
- Docker with GPU support (`--gpus` flag)
- Access to the NVIDIA UVM debugfs interface (`/sys/kernel/debug/nvidia-uvm/`)

Limitations
Future Work
Documentation
Comprehensive documentation added, including `gvm-docker/DOCKER_INTEGRATION.md` and the updated `gvm-docker/README.md`.
Breaking Changes
None - this is a new feature addition.
Checklist
Related Issues
This PR addresses the need for containerized GPU workload management mentioned in discussions about GVM use cases for cloud environments and multi-tenant scenarios.
Acknowledgments
Thanks to the GVM team for the excellent GPU virtualization framework that made this integration possible.