Implement non-root user for backend Python container #91

@davidamacey

Summary

Implement a non-root user for the backend Python container to follow Docker security best practices and comply with industry standards for container security. Currently, both backend/Dockerfile.prod and the production deployment run the container as root, which poses security risks.

Progress Update (2025-10-14)

Implementation Complete - Ready for Testing

All code changes have been implemented and documented. Testing is the remaining work.

Completed Tasks:

  • ✅ Updated backend/Dockerfile.prod with multi-stage build and non-root user
  • ✅ Updated docker-compose.yml volume mappings for development
  • ✅ Updated docker-compose.prod.yml volume mappings for production
  • ✅ Created scripts/fix-model-permissions.sh migration script
  • ✅ Updated CLAUDE.md with security documentation
  • ✅ Updated scripts/README.md with migration guide
  • ✅ All changes pushed to fix/backend-non-root branch

Next Steps:

  • ⏳ Test in development environment
  • ⏳ Verify GPU access and model caching
  • ⏳ Test migration script on existing installation
  • ⏳ Build and push new Docker images to DockerHub

Background

Current State:

  • backend/Dockerfile.prod runs as root user (no USER directive)
  • backend/Dockerfile.prod.optimized already has non-root implementation (user appuser with UID 1000)
  • Volume mappings in docker-compose files use /root/.cache/* paths
  • Model cache directories are mounted to root-owned paths

Security Concerns:

  1. Running as root violates the principle of least privilege
  2. A container escape vulnerability could lead to root compromise of the host
  3. Files written to mounted volumes are owned by root, which causes permission issues when they are accessed from the host
  4. Security scanning tools (Trivy, Snyk, etc.) flag containers that run as root

Objectives

Implement non-root user configuration that:

  1. ✅ Follows Python official Docker image best practices
  2. ✅ Maintains compatibility with GPU access (NVIDIA runtime)
  3. ✅ Preserves model caching functionality (HuggingFace, PyTorch)
  4. ✅ Ensures proper permissions for temp directories and file uploads
  5. ✅ Works with both development and production environments
  6. ✅ Compatible with Celery worker processes
  7. ✅ No breaking changes to existing deployments

Implementation Summary

1. Updated Dockerfile.prod ✅

Converted to multi-stage build with non-root user:

  • Added builder stage for package installation
  • Created appuser (UID 1000, GID 1000) and added it to the video group
  • Updated cache directories to /home/appuser/.cache/*
  • Set environment variables for HuggingFace and PyTorch
  • Added health check for container orchestration
  • All files copied with proper ownership
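
As a rough illustration of the cache-directory and environment-variable items above, the following shell sketch sets the cache locations and checks that the current user can write to them. The issue does not name the variables used in Dockerfile.prod; HF_HOME and TORCH_HOME are assumed here because they are the variables the Hugging Face libraries and PyTorch Hub read. The function names set_cache_env and check_cache_writable are hypothetical.

```shell
# Hypothetical helper: export cache locations under the given home directory.
# HF_HOME and TORCH_HOME are assumptions; the issue does not list the variables.
set_cache_env() {
    local app_home="${1:-/home/appuser}"
    export HF_HOME="$app_home/.cache/huggingface"
    export TORCH_HOME="$app_home/.cache/torch"
}

# Hypothetical helper: fail with a message if a cache directory is missing
# or not writable by the user running this check.
check_cache_writable() {
    local dir
    for dir in "$HF_HOME" "$TORCH_HOME"; do
        if [ ! -d "$dir" ] || [ ! -w "$dir" ]; then
            echo "cache directory not writable by UID $(id -u): $dir" >&2
            return 1
        fi
    done
}
```

Calling set_cache_env followed by check_cache_writable at container start would surface a wrong volume owner before the first model download fails.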

2. Updated Docker Compose Files ✅

Modified Files:

  • docker-compose.yml (development)
  • docker-compose.prod.yml (production)

Services Updated:

  • backend
  • celery-worker
  • flower

Volume Mappings Changed:

# Old (root user)
- ${MODEL_CACHE_DIR}/huggingface:/root/.cache/huggingface
- ${MODEL_CACHE_DIR}/torch:/root/.cache/torch

# New (non-root user)
- ${MODEL_CACHE_DIR}/huggingface:/home/appuser/.cache/huggingface
- ${MODEL_CACHE_DIR}/torch:/home/appuser/.cache/torch
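
One way to confirm that no compose file still mounts a cache under /root/.cache is a plain grep. This is a minimal sketch; the function name find_root_cache_mounts is hypothetical and not part of the repository.

```shell
# Hypothetical helper: print every line that still references /root/.cache
# and return non-zero if any is found in the given files.
find_root_cache_mounts() {
    if grep -n '/root/\.cache' "$@"; then
        return 1
    fi
    return 0
}
```

For example, `find_root_cache_mounts docker-compose.yml docker-compose.prod.yml` prints nothing and succeeds once both files use the /home/appuser/.cache/* paths.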

3. Created Migration Script ✅

File: scripts/fix-model-permissions.sh

Automated permission fixer for existing deployments:

  • Reads MODEL_CACHE_DIR from .env file
  • Fixes ownership to UID:GID 1000:1000
  • Supports Docker and sudo methods
  • Sets correct permissions (755 for dirs, 644 for files)
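
The steps listed above could look roughly like the sketch below. It is not the contents of scripts/fix-model-permissions.sh; the function names are hypothetical, only the sudo method is shown, and the .env parsing assumes a simple `MODEL_CACHE_DIR=value` line.

```shell
# Hypothetical helper: read MODEL_CACHE_DIR from a .env file
# (last assignment wins, surrounding quotes stripped).
read_model_cache_dir() {
    local env_file="$1"
    sed -n 's/^MODEL_CACHE_DIR=//p' "$env_file" | tail -n 1 | tr -d "\"'"
}

# Hypothetical helper: 755 on directories, 644 on regular files.
fix_modes() {
    local dir="$1"
    find "$dir" -type d -exec chmod 755 {} +
    find "$dir" -type f -exec chmod 644 {} +
}

# Hypothetical helper: hand the tree to UID:GID 1000:1000.
# Changing ownership needs root, so fall back to sudo when not root.
fix_owner() {
    local dir="$1"
    if [ "$(id -u)" -eq 0 ]; then
        chown -R 1000:1000 "$dir"
    else
        sudo chown -R 1000:1000 "$dir"
    fi
}
```

A caller would run `dir=$(read_model_cache_dir .env)`, then `fix_owner "$dir"` and `fix_modes "$dir"`.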

Usage:

./scripts/fix-model-permissions.sh

4. Updated Documentation ✅

CLAUDE.md:

  • Added "Security Features" section
  • Documented non-root container user
  • Included migration instructions
  • Updated volume mapping examples

scripts/README.md:

  • Added "Model Cache Permission Fixer" section
  • Documented script usage and verification
  • Linked to security documentation

Testing Checklist

Development Environment Tests

  • ./opentr.sh start dev starts without errors
  • Backend container runs as non-root user (docker exec backend whoami)
  • Model downloads work (HuggingFace, PyTorch)
  • File uploads to MinIO succeed
  • Transcription tasks complete successfully
  • Celery worker processes tasks correctly
  • Flower dashboard accessible

Production Environment Tests

  • Docker image builds successfully
  • Container starts without permission errors
  • GPU access works (NVIDIA runtime)
  • Model cache persists between restarts
  • Multi-user scenarios work correctly
  • Health checks pass
  • Log files are accessible

Security Verification

  • Container runs as UID 1000 (verify with docker exec -it <container> id -u, which prints the numeric UID; whoami prints only the user name)
  • No root processes inside container
  • Security scanners (Trivy, Snyk) pass
  • File permissions are correct (755 for dirs, 644 for files)
  • Volume mounts have proper ownership
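
The first two checks in this list can be scripted. The sketch below is one possible approach, with hypothetical function names: it compares the output of `id -u` inside the container with the expected UID, and scans the first column of `docker top` (which uses `ps -ef` style output by default) for root-owned processes. Depending on the runtime configuration, `docker top` may show host user names instead of numeric UIDs, so treat the second check as a heuristic.

```shell
# Hypothetical helper: succeed when the container's default user has the
# expected numeric UID.
container_runs_as_uid() {
    local container="$1"
    local expected="$2"
    local actual
    actual=$(docker exec "$container" id -u)
    [ "$actual" = "$expected" ]
}

# Hypothetical helper: succeed when `docker top` lists no process whose
# owner column is root (or UID 0).
container_has_no_root_processes() {
    local container="$1"
    ! docker top "$container" |
        awk 'NR > 1 && ($1 == "root" || $1 == "0") { found = 1 } END { exit !found }'
}
```

Usage: `container_runs_as_uid backend 1000 && container_has_no_root_processes backend`, where backend stands for the actual container name.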

GPU and AI Model Tests

  • WhisperX transcription works with GPU
  • PyAnnote diarization works
  • Model downloads to correct cache location
  • Models persist after container restart
  • CUDA libraries accessible to non-root user
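
A quick smoke test for the CUDA and group items might look like the following. It assumes that `python` is on the PATH inside the container and that PyTorch is installed there; the function names are hypothetical.

```shell
# Hypothetical helper: succeed when PyTorch inside the container reports a
# usable CUDA device.
container_sees_cuda() {
    local container="$1"
    docker exec "$container" python -c \
        'import sys, torch; sys.exit(0 if torch.cuda.is_available() else 1)'
}

# Hypothetical helper: succeed when the container user belongs to the given
# group (for example, video).
container_user_in_group() {
    local container="$1"
    local group="$2"
    docker exec "$container" id -Gn | tr ' ' '\n' | grep -qx "$group"
}
```

This only confirms that the non-root user can reach the GPU; it does not replace running an actual WhisperX or PyAnnote job.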

Migration Path for Existing Deployments

For Users with Existing Deployments:

  1. Pull latest changes (this also fetches the migration script):

    git pull origin main
  2. Run the migration script:

    ./scripts/fix-model-permissions.sh
  3. Rebuild and restart containers:

    docker compose down
    docker compose build
    docker compose up -d
  4. Verify migration:

    # Check container user
    docker compose exec backend whoami
    # Should output: appuser (not root)
    
    # Verify model cache accessibility
    docker compose exec backend ls -la /home/appuser/.cache/huggingface

Documentation Updates

  • Update CLAUDE.md with new volume paths ✅
  • Update scripts/README.md with migration script documentation ✅
  • Update main README.md with security best practices section
  • Update setup script setup-opentranscribe.sh to set correct permissions
  • Add troubleshooting guide for permission issues
  • Document GPU access requirements for non-root users ✅

Acceptance Criteria

  • Dockerfile.prod.optimized already implements non-root pattern (use as reference) ✅
  • Dockerfile.prod updated to match optimized version ✅
  • All docker-compose files use /home/appuser/.cache/* paths ✅
  • Development and production environments both work ⏳
  • GPU access verified on NVIDIA systems ⏳
  • Model caching works without permission errors ⏳
  • File uploads and temp directory writes succeed ⏳
  • Security scanners pass without root user warnings ⏳
  • Existing deployments can migrate without data loss (script provided) ✅
  • Documentation updated with migration guide ✅

References

Industry Standards

Best Practices

  • Use numeric UID for better compatibility across systems
  • Avoid UID 0 (root) and UIDs below 1000 (system users)
  • Use named volumes for better permission management
  • Add user to necessary groups (video for GPU access)
  • Set proper file permissions (755 for directories and executables, 644 for regular files)

Related Issues

Priority

High - Security best practice, required for production deployments

Labels

security, docker, backend, enhancement, production
