New derivative image bundling eight creative AI applications with a GPU-accelerated KDE Plasma desktop (Selkies WebRTC streaming + VNC):
- ComfyUI, SD Forge, Wan2GP, ACE Step, Voicebox, Whisper WebUI, Ostris AI Toolkit, Unsloth Studio
- KDE Plasma desktop with Blender, Chrome, LibreOffice, VLC
- Desktop runs as single supervisor service (start/stop atomically)
- NVIDIA display drivers installed on-demand at first desktop start
- Per-app isolated venvs sharing torch via .pth + copied dist-info
- Based on multi-torch base image (2.10.0, 2.9.1, 2.7.1)
- SUPERVISOR_AUTOSTART env var for boot-time service selection
- Blackwell GPU fix for ACE Step (nanovllm CUDA graph capture)
- UV_TORCH_BACKEND=cu128 to prevent CUDA 13 wheel resolution
- Agent API image scaffold (external/vllm/derivatives/agent-api)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Selkies resets X display resolution when its GStreamer pipeline reinitializes (on startup and client connections). Replace one-shot resize attempts with a background loop that checks every 5 seconds and re-applies the target resolution when it drifts. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
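The drift check described above could look roughly like the sketch below. The helper names (`current_resolution`, `has_drifted`, `resize_loop`) and the default target are hypothetical; the real desktop.sh may detect and apply the resolution differently, for example through Selkies' own resize helper. The loop is only defined here, not started, so the sketch terminates when run on its own.

```shell
# Hypothetical default; the real script derives its target elsewhere.
TARGET="${TARGET:-1920x1080}"

# Print the current X resolution, or nothing if xrandr is unavailable.
current_resolution() {
    command -v xrandr >/dev/null 2>&1 || return 0
    xrandr --current 2>/dev/null | awk '/\*/ { print $1; exit }'
}

# Succeed only when the resolution is known and differs from the target.
has_drifted() {
    [ -n "$1" ] && [ "$1" != "$2" ]
}

# Re-apply the target every 5 seconds whenever it has drifted.
resize_loop() {
    while :; do
        if has_drifted "$(current_resolution)" "$TARGET"; then
            xrandr -s "$TARGET" && echo "[desktop] Display resized to $TARGET"
        fi
        sleep 5
    done
}

# A service script would start the loop in the background: resize_loop &
# Here only the drift check is exercised.
if has_drifted "1280x720" "$TARGET"; then
    echo "drift detected: 1280x720 -> $TARGET"
fi
```

Treating an empty reading as "no drift" keeps the loop quiet while the X server is still starting.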
The background+wait pattern broke because pty is a shell function that doesn't propagate to subshells. The direct exec via pty works reliably. Orphan training process cleanup will be addressed separately. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Dockerfile.base: system deps, multi-torch venvs, create_app_venv helper, full desktop stack (KDE, Selkies, VNC, VirtualGL, Blender, Chrome). Built as robatvastai/aio-studio:base; changes rarely.
Dockerfile: apps only (ComfyUI, Forge, Voicebox, Ostris, Wan2GP, Unsloth, Whisper, ACE Step) + ROOT copy + env-hash. Derives from aio-studio:base for fast iteration on app versions.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ROOT_BASE: desktop infrastructure (supervisor scripts, PAM, polkit, D-Bus, KDE config, Chrome wrapper, NVIDIA driver installer, VGL patcher, SUPERVISOR_AUTOSTART). Copied in Dockerfile.base.
ROOT: app configs and scripts only. Copied in Dockerfile.
Blender moved from base to app Dockerfile because its version iterates faster than the desktop infrastructure.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Chrome install moved from Dockerfile.base to Dockerfile (stays current)
- Unsloth llama.cpp: use ai-dock pre-built CUDA binaries instead of compiling from source (saves several minutes per build)
- Desktop supervisor startsecs reduced from 60 to 10
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Create /tmp/.X11-unix as root before starting Xvfb as user (sticky bit on /tmp prevents non-root from creating directories)
- Set __EGL_VENDOR_LIBRARY_FILENAMES to Mesa software EGL to prevent NVIDIA EGL/GBM segfault in virtual framebuffer context (matches standalone desktop image behavior)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
__EGL_VENDOR_LIBRARY_FILENAMES was exported globally, forcing all processes (including VirtualGL/Blender) to use software rendering. Now passed via env prefix to Xvfb only, so GPU-accelerated apps use NVIDIA EGL as intended. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
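The scoping difference can be illustrated with a small sketch. The child command below is a stand-in for Xvfb, and the Mesa vendor file path is an assumption (its location varies by distribution); only the variable name comes from the commit.

```shell
# Assumed path to the Mesa EGL vendor file; adjust for the actual image.
MESA_EGL=/usr/share/glvnd/egl_vendor.d/50_mesa.json

# Stand-in for a child process: report what it sees in its environment.
CHILD='echo "${__EGL_VENDOR_LIBRARY_FILENAMES:-unset}"'

unset __EGL_VENDOR_LIBRARY_FILENAMES

# Scoped: only this one child (Xvfb in the real script) gets the variable.
scoped=$(env __EGL_VENDOR_LIBRARY_FILENAMES="$MESA_EGL" sh -c "$CHILD")

# Children started later (VirtualGL, Blender, ...) do not inherit it.
later=$(sh -c "$CHILD")

echo "Xvfb-like child: $scoped"
echo "later child:     $later"
```

Using `env VAR=value command` rather than `export` keeps the override out of the parent shell, so nothing started afterwards is forced onto software rendering.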
Suppresses "bitsandbytes not installed" warning and enables 8-bit AdamW optimizer for training. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Run the Node process in background with wait so cleanup_generic.sh trap fires on TERM and kills the full process tree, including run.py training jobs that escape the supervisor process group. Dropped unbuffer/pty for the backgrounded process — unbuffer -p exits immediately when backgrounded without a proper TTY. The log-tee handler still captures output. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
AI Toolkit's Node worker spawns run.py with start_new_session=True, which detaches from the process group and survives supervisor stop. Added targeted pkill trap matching "ai-toolkit/run.py" to ensure training jobs and their VRAM are released on service stop. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The pkill trap fires on TERM whether pty execs or not, so we can keep pty for proper TTY output while still killing detached training jobs on stop. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ai-dock CUDA binaries now install to /opt/llama.cpp-cuda (immutable) with a symlink from the unsloth data dir. Workspace sync was overwriting them with stale CPU-only binaries from prior runs. Boot script 38-unsloth-symlinks.sh now restores the symlink after workspace sync to ensure CUDA inference works on every boot. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
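A minimal sketch of the restore step, assuming a hypothetical helper `restore_symlink`; the actual contents of 38-unsloth-symlinks.sh are not shown in this conversation, and the demo uses throwaway paths under a temp directory rather than the real install locations.

```shell
# Ensure LINK is a symlink to TARGET, replacing any stale synced copy.
restore_symlink() {
    target=$1
    link=$2
    # Nothing to restore if the immutable copy was never installed.
    [ -d "$target" ] || return 0
    # Already correct: leave it alone.
    if [ -L "$link" ] && [ "$(readlink "$link")" = "$target" ]; then
        return 0
    fi
    # Workspace sync may have left a real directory here; remove it first.
    rm -rf "$link"
    ln -s "$target" "$link"
}

# Demo with throwaway paths.
demo=$(mktemp -d)
mkdir -p "$demo/opt/llama-gpu" "$demo/workspace/llama.cpp"
echo cuda-build > "$demo/opt/llama-gpu/marker"
echo stale-cpu  > "$demo/workspace/llama.cpp/marker"   # stale synced copy

restore_symlink "$demo/opt/llama-gpu" "$demo/workspace/llama.cpp"
cat "$demo/workspace/llama.cpp/marker"
```

The helper is idempotent, so running it on every boot is harmless when the symlink is already in place.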
unsloth studio setup downloads its own CPU-only prebuilt llama.cpp (nvidia-smi not available at build time). Our ai-dock CUDA binaries must be installed after to replace them. The rm + symlink ensures the CUDA backend is used for inference. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
libmtmd.so.0 and other llama.cpp shared libs at /opt/llama.cpp-cuda need to be registered after boot since the symlink paths change during workspace sync. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The ldconfig conf created at build time may not survive across image iterations. Write /etc/ld.so.conf.d/llama-cpp.conf and run ldconfig on every boot when /opt/llama.cpp-cuda exists, regardless of workspace sync state. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
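The boot-time registration could be sketched as follows. `register_libs` is a hypothetical helper, not the boot script's actual code; in the image it would be called with /opt/llama.cpp-cuda and /etc/ld.so.conf.d/llama-cpp.conf, while the demo below writes to a temp directory so it can run unprivileged.

```shell
# Write an ld.so.conf.d entry for LIB_DIR and refresh the linker cache.
register_libs() {
    lib_dir=$1
    conf_file=$2
    # Nothing to register if the binaries are not installed.
    [ -d "$lib_dir" ] || return 0
    # Rewrite unconditionally so a missing or stale conf is always repaired.
    printf '%s\n' "$lib_dir" > "$conf_file" || return 1
    # ldconfig only reads the system conf dir, so skip it for dry runs.
    case "$conf_file" in
        /etc/ld.so.conf.d/*) ldconfig ;;
    esac
}

# Dry run against a temp directory.
demo=$(mktemp -d)
mkdir -p "$demo/lib"
register_libs "$demo/lib" "$demo/llama-cpp.conf"
cat "$demo/llama-cpp.conf"
```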
05-configure-cuda.sh removes any ldconfig conf containing "cuda" in its content. Renamed /opt/llama.cpp-cuda to /opt/llama-cpp-gpu so the ldconfig entry survives the boot-time CUDA cleanup. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
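To see why the rename helps, here is a sketch of a content-based filter. The exact matching logic in 05-configure-cuda.sh is not shown here, so the plain `grep` for "cuda" is an assumption based on the description above, and `matches_cuda_cleanup` is a hypothetical name.

```shell
# Assumed filter: a conf file is removed if its content mentions "cuda".
matches_cuda_cleanup() {
    grep -q 'cuda' "$1"
}

tmp=$(mktemp -d)
printf '%s\n' /opt/llama.cpp-cuda > "$tmp/old.conf"
printf '%s\n' /opt/llama-cpp-gpu  > "$tmp/new.conf"

for conf in "$tmp/old.conf" "$tmp/new.conf"; do
    if matches_cuda_cleanup "$conf"; then
        echo "$(cat "$conf"): would be removed"
    else
        echo "$(cat "$conf"): survives"
    fi
done
```

Because the filter looks at the file's content rather than its name, renaming the conf file alone would not have been enough; the directory path written inside it had to change.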
build-aio-studio-base.yml: manual trigger only, builds Dockerfile.base (system deps + desktop). Rarely needed.
build-aio-studio.yml: bi-weekly schedule (1st + 15th) + manual trigger. Resolves latest release tags (ComfyUI, Voicebox, Whisper, ACE Step) and latest commits (Forge, AI Toolkit, Wan2GP). Always builds on schedule, with no age gating, since bi-weekly guarantees upstream changes.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ai-dock pre-built binaries (b8606) caused model behavior differences vs Unsloth's pinned version (b8595). Revert to UNSLOTH_LLAMA_FORCE_COMPILE which builds the exact version Unsloth expects with CUDA support. Binaries are still copied to /opt/llama-cpp-gpu to survive workspace sync, with the boot script restoring symlinks. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Pull request overview
This PR introduces a new AIO Studio PyTorch-derived image that bundles multiple creative AI applications plus a GPU-accelerated remote desktop stack, along with CI workflows to build/publish the base + app images. It also adds an initial architecture/design doc for a separate “agent-api” derivative image.
Changes:
- Add AIO Studio Dockerfiles, Supervisor services/scripts, and boot-time configuration for desktop + 8 apps.
- Add desktop enablement infrastructure (D-Bus configs, polkit/PAM rules, KDE defaults, VirtualGL patching, NVIDIA display driver libs installer).
- Add GitHub Actions workflows to build/push aio-studio base and aio-studio app images.
Reviewed changes
Copilot reviewed 46 out of 48 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| external/vllm/derivatives/agent-api/CLAUDE.md | Architecture/design notes for a headless multimodal “agent-api” image. |
| derivatives/pytorch/derivatives/aio-studio/ROOT/provisioning.yaml | Adds ComfyUI provisioner extension configuration. |
| derivatives/pytorch/derivatives/aio-studio/ROOT/opt/supervisor-scripts/whisper-webui.sh | Supervisor start script for Whisper WebUI. |
| derivatives/pytorch/derivatives/aio-studio/ROOT/opt/supervisor-scripts/wan2gp.sh | Supervisor start script for Wan2GP. |
| derivatives/pytorch/derivatives/aio-studio/ROOT/opt/supervisor-scripts/voicebox.sh | Supervisor start script for Voicebox. |
| derivatives/pytorch/derivatives/aio-studio/ROOT/opt/supervisor-scripts/unsloth-studio.sh | Supervisor start script for Unsloth Studio. |
| derivatives/pytorch/derivatives/aio-studio/ROOT/opt/supervisor-scripts/forge.sh | Supervisor start script for SD Forge. |
| derivatives/pytorch/derivatives/aio-studio/ROOT/opt/supervisor-scripts/comfyui.sh | Supervisor start script for ComfyUI (+ requirements refresh behavior). |
| derivatives/pytorch/derivatives/aio-studio/ROOT/opt/supervisor-scripts/ai-toolkit.sh | Supervisor start script for Ostris AI Toolkit (+ cleanup override). |
| derivatives/pytorch/derivatives/aio-studio/ROOT/opt/supervisor-scripts/ace-step.sh | Supervisor start script for ACE Step (API+UI orchestration). |
| derivatives/pytorch/derivatives/aio-studio/ROOT/etc/vast_boot.d/38-unsloth-symlinks.sh | Boot-time fixes for Unsloth symlinks + llama.cpp ldconfig. |
| derivatives/pytorch/derivatives/aio-studio/ROOT/etc/vast_boot.d/05-aio-studio-env.sh | AIO Studio portal defaults + UV torch backend env. |
| derivatives/pytorch/derivatives/aio-studio/ROOT/etc/supervisor/conf.d/whisper-webui.conf | Supervisor program config for Whisper WebUI. |
| derivatives/pytorch/derivatives/aio-studio/ROOT/etc/supervisor/conf.d/wan2gp.conf | Supervisor program config for Wan2GP. |
| derivatives/pytorch/derivatives/aio-studio/ROOT/etc/supervisor/conf.d/voicebox.conf | Supervisor program config for Voicebox. |
| derivatives/pytorch/derivatives/aio-studio/ROOT/etc/supervisor/conf.d/unsloth-studio.conf | Supervisor program config for Unsloth Studio. |
| derivatives/pytorch/derivatives/aio-studio/ROOT/etc/supervisor/conf.d/forge.conf | Supervisor program config for SD Forge. |
| derivatives/pytorch/derivatives/aio-studio/ROOT/etc/supervisor/conf.d/comfyui.conf | Supervisor program config for ComfyUI. |
| derivatives/pytorch/derivatives/aio-studio/ROOT/etc/supervisor/conf.d/ai-toolkit.conf | Supervisor program config for AI Toolkit. |
| derivatives/pytorch/derivatives/aio-studio/ROOT/etc/supervisor/conf.d/ace-step.conf | Supervisor program config for ACE Step. |
| derivatives/pytorch/derivatives/aio-studio/ROOT_BASE/usr/local/bin/google-chrome | Adds Chrome wrapper (forces --no-sandbox). |
| derivatives/pytorch/derivatives/aio-studio/ROOT_BASE/opt/supervisor-scripts/vgl-desktop-patcher.sh | Patches .desktop Exec lines to run via vglrun and reverts on exit. |
| derivatives/pytorch/derivatives/aio-studio/ROOT_BASE/opt/supervisor-scripts/nvidia-display-drivers.sh | Downloads/extracts matching NVIDIA .run to supply missing GL/Vulkan libs. |
| derivatives/pytorch/derivatives/aio-studio/ROOT_BASE/opt/supervisor-scripts/desktop.sh | Unified Supervisor-managed desktop stack (dbus, Xvfb, PipeWire, KDE, VNC, TURN, Selkies). |
| derivatives/pytorch/derivatives/aio-studio/ROOT_BASE/opt/instance-tools/bin/export_env.sh | Adds a replacement env export parser script. |
| derivatives/pytorch/derivatives/aio-studio/ROOT_BASE/etc/xdg/kscreenlockerrc | Disables KDE screen locking. |
| derivatives/pytorch/derivatives/aio-studio/ROOT_BASE/etc/xdg/kdeglobals | KDE defaults (single click, lock restrictions). |
| derivatives/pytorch/derivatives/aio-studio/ROOT_BASE/etc/vast_boot.d/60-supervisor-autostart.sh | Enables autostart for selected Supervisor services via env var. |
| derivatives/pytorch/derivatives/aio-studio/ROOT_BASE/etc/supervisor/conf.d/desktop.conf | Supervisor program config for unified desktop service. |
| derivatives/pytorch/derivatives/aio-studio/ROOT_BASE/etc/polkit-1/polkit-1/rules.d/49-nopasswd_global.rules | Polkit rule (currently unconditional allow-all). |
| derivatives/pytorch/derivatives/aio-studio/ROOT_BASE/etc/polkit-1/polkit-1/rules.d/02-unrestricted.rules | Polkit rule allowing broad action prefixes. |
| derivatives/pytorch/derivatives/aio-studio/ROOT_BASE/etc/polkit-1/polkit-1/rules.d/01-nopassword.rules | Polkit admin rule for all unix users. |
| derivatives/pytorch/derivatives/aio-studio/ROOT_BASE/etc/polkit-1/polkit-1/localauthority/50-local.d/00-allow-all.pkla | Legacy pkla allow-all policy. |
| derivatives/pytorch/derivatives/aio-studio/ROOT_BASE/etc/pam.d/sudo | PAM config (currently unconditional permit). |
| derivatives/pytorch/derivatives/aio-studio/ROOT_BASE/etc/pam.d/polkit-1 | PAM config (currently unconditional permit). |
| derivatives/pytorch/derivatives/aio-studio/ROOT_BASE/etc/pam.d/login | PAM config (currently unconditional permit). |
| derivatives/pytorch/derivatives/aio-studio/ROOT_BASE/etc/pam.d/common-auth | PAM config (currently unconditional permit). |
| derivatives/pytorch/derivatives/aio-studio/ROOT_BASE/etc/dbus-1/container-system.conf | Container system bus config with power management restrictions. |
| derivatives/pytorch/derivatives/aio-studio/ROOT_BASE/etc/dbus-1/container-session.conf | Container session bus config with permissive policy + high limits. |
| derivatives/pytorch/derivatives/aio-studio/ROOT_BASE/etc/apt/preferences.d/nosnap.pref | Pins snapd to prevent installation. |
| derivatives/pytorch/derivatives/aio-studio/ROOT_BASE/etc/apt/preferences.d/firefox.pref | Pins snap Firefox packages to prevent installation. |
| derivatives/pytorch/derivatives/aio-studio/README.template.md | Vast template-facing documentation for AIO Studio. |
| derivatives/pytorch/derivatives/aio-studio/README.md | Repository documentation for the AIO Studio image. |
| derivatives/pytorch/derivatives/aio-studio/Dockerfile.base | Base layer: system deps, desktop stack deps, helper venv tooling. |
| derivatives/pytorch/derivatives/aio-studio/Dockerfile | App layer: installs ComfyUI/Forge/Voicebox/etc + Chrome + Blender. |
| derivatives/pytorch/derivatives/ace-step/Dockerfile | Adds UV backend env and a Blackwell workaround patch. |
| .github/workflows/build-aio-studio.yml | CI workflow to resolve refs and build/push AIO Studio app image. |
| .github/workflows/build-aio-studio-base.yml | CI workflow to build/push AIO Studio base image. |

    polkit.addRule(function(action, subject) {
        return polkit.Result.YES;
    });

The polkit rule unconditionally returns polkit.Result.YES for every action and subject, which effectively disables authorization checks system-wide inside the container. If this image is ever exposed beyond a fully trusted local desktop session, this is a critical privilege-escalation surface. Consider narrowing this rule to the intended user/session (e.g., only uid 1001, only active local session) and/or only a minimal allowlist of actions required for the desktop experience, rather than Action=* semantics.

    auth sufficient pam_permit.so
    account sufficient pam_permit.so
    password sufficient pam_permit.so
    session sufficient pam_permit.so

These PAM configs use pam_permit.so for all stages, which effectively makes sudo authentication/authorization unconditional. This is extremely permissive and bypasses all password / policy checks. If the goal is passwordless sudo for a specific non-root user, prefer using /etc/sudoers.d/* with NOPASSWD and leave PAM intact; otherwise you risk unexpected privilege escalation paths and breaking security assumptions in other tooling.

    # --- 4. VirtualGL desktop patcher ---
    if [[ "${DISABLE_VGL,,}" != "true" ]] && nvidia-smi --query-gpu=uuid --format=csv,noheader 2>/dev/null | head -n1 | grep -q .; then
        log "Starting VirtualGL desktop patcher..."
        runuser -u user -- /opt/supervisor-scripts/vgl-desktop-patcher.sh 2>&1 | sed -u 's/^/[vgl] /' &
    else

vgl-desktop-patcher.sh is started in the background but its PID is not added to the PIDS array (unlike other background services). As a result, stopping the desktop supervisor program may leave the patcher running and leave .desktop files in an unpredictable patched or unpatched state. Start it via run_bg_user (or otherwise track and terminate it in cleanup_desktop).

            && echo "[desktop] Display resized to $TARGET"
        fi
        sleep 5
      done
    ) &

The persistent resize loop is launched in the background but not tracked for cleanup. On service stop/restart, this can leave orphaned loops repeatedly calling xrandr and selkies-gstreamer-resize. Consider launching it with run_bg_user/PID tracking (or store its PID and terminate it in cleanup_desktop).
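The tracking pattern both comments ask for can be sketched as below. `run_bg` and `cleanup` are simplified stand-ins for the `run_bg_user` and `cleanup_desktop` functions named in the review, whose real implementations are not shown here; the sketch also uses a space-separated string instead of a bash array so it runs under plain POSIX sh, and `sleep` stands in for the actual services.

```shell
PIDS=""

# Start a command in the background and remember its PID.
run_bg() {
    "$@" &
    PIDS="$PIDS $!"
}

# Terminate every tracked process, then reap it.
cleanup() {
    for pid in $PIDS; do
        kill "$pid" 2>/dev/null || true
    done
    for pid in $PIDS; do
        wait "$pid" 2>/dev/null || true
    done
    PIDS=""
}

# Run cleanup on normal exit and when the supervisor sends TERM/INT.
trap cleanup EXIT
trap 'exit 143' TERM
trap 'exit 130' INT

run_bg sleep 300    # stand-in for the VirtualGL desktop patcher
run_bg sleep 300    # stand-in for the persistent resize loop
echo "tracking:$PIDS"
```

One caveat for the real script: after `cmd | sed ... &`, `$!` is the PID of the last pipeline element (`sed`), so tracking a piped background job may need extra care to reach the first command as well.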

    repository: Comfy-Org/ComfyUI
    age-threshold-seconds: "999999999"
    github-token: ${{ secrets.GITHUB_TOKEN }}
    trigger: ${{ inputs.COMFYUI_REF && 'manual' || 'manual' }}
    manual-ref: ${{ inputs.COMFYUI_REF }}
    default-ref: master

This workflow passes trigger as 'manual' unconditionally. In check-github-release, any trigger other than schedule skips the release API check (action.yml:63-68), so scheduled runs here will never resolve the latest release tag and will fall back to default-ref. Align with the established pattern used in .github/workflows/build-comfyui.yml (trigger = inputs.* && github.event_name || 'schedule') so scheduled builds actually poll releases.

    schedule:
      # Bi-weekly: 1st and 15th of each month at midnight UTC
      - cron: '0 0 1,15 * *'

This workflow deviates from the repository CI/CD conventions documented in .github/AGENTS.md (e.g., required preflight -> build -> collect-tags -> notify job naming/order and the standard schedule pattern). Since the doc explicitly says “Do not deviate from these conventions” (.github/AGENTS.md:4,55-58), consider restructuring this workflow to match the standard pipeline so future maintenance and shared tooling (tag collection, notifications, gating) behave consistently.

    notify:
      needs: build
      if: always()
      uses: ./.github/workflows/notify-slack.yml
      with:
        build-result: ${{ needs.build.result }}
        image-name: "AIO Studio Base"
        image-ref: "base"
        image-tags: '["${{ needs.build.outputs.FULL_IMAGE || '' }}"]'
        trigger: ${{ github.event_name }}
        run-url: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
      secrets:

The notify job references needs.build.outputs.FULL_IMAGE, but the build job does not define any outputs (it only writes to $GITHUB_ENV). This will evaluate to empty and produce incorrect Slack notifications. Either expose FULL_IMAGE via outputs on the build job, or follow the repo’s standard pattern of uploading a built-tag artifact and using a collect-tags job to feed notify.
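For the first option, the step-level half of the change is a single line in the build step's shell script, sketched below. The tag value is illustrative, and the temp-file fallback exists only so the snippet runs outside an Actions runner.

```shell
# GITHUB_OUTPUT is provided by the Actions runner; fall back to a temp
# file so this sketch also runs locally.
GITHUB_OUTPUT="${GITHUB_OUTPUT:-$(mktemp)}"

# Illustrative value; the real workflow computes its own tag.
FULL_IMAGE="robatvastai/aio-studio:base"

# Values written to $GITHUB_ENV are visible only to later steps of the
# same job. Step outputs go to $GITHUB_OUTPUT instead.
echo "FULL_IMAGE=${FULL_IMAGE}" >> "$GITHUB_OUTPUT"

cat "$GITHUB_OUTPUT"
```

The step output alone is not enough: the build job also needs a job-level `outputs:` mapping that points at this step's `FULL_IMAGE` output before `needs.build.outputs.FULL_IMAGE` resolves in the notify job.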
desktop.sh: track vgl-desktop-patcher and persistent resize loop in PIDS[] so cleanup_desktop actually terminates them on supervisor stop.
build-aio-studio.yml: the trigger ternaries on every check-github-release call were 'manual' on both branches, so scheduled cron runs always skipped the release API path and fell back to default-ref. Use github.event_name on the non-manual branch.
build-aio-studio-base.yml: build job did not declare outputs, so needs.build.outputs.FULL_IMAGE in notify was always empty. Expose it via $GITHUB_OUTPUT and tighten the image-tags expression to fall back to '[]' instead of producing literal '[""]'.
Not addressed: Copilot also flagged the polkit allow-all rules and PAM pam_permit.so configs as privilege-escalation surfaces. These are intentional for this image: it is a single-user GPU desktop sandbox running on ephemeral Vast.ai instances where the user already owns the container. Scoping polkit/PAM would add complexity without changing the threat model. Revisit if the image is ever used in a multi-tenant context.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
These were internal planning notes accidentally committed in #143 and not intended for publication. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>