Aio studio desktop #143

Merged
robballantyne merged 21 commits into main from aio-studio-desktop
Apr 9, 2026

Conversation

@robballantyne
Collaborator

No description provided.

robballantyne and others added 19 commits March 31, 2026 15:15
New derivative image bundling eight creative AI applications with a
GPU-accelerated KDE Plasma desktop (Selkies WebRTC streaming + VNC):

- ComfyUI, SD Forge, Wan2GP, ACE Step, Voicebox, Whisper WebUI,
  Ostris AI Toolkit, Unsloth Studio
- KDE Plasma desktop with Blender, Chrome, LibreOffice, VLC
- Desktop runs as single supervisor service (start/stop atomically)
- NVIDIA display drivers installed on-demand at first desktop start
- Per-app isolated venvs sharing torch via .pth + copied dist-info
- Based on multi-torch base image (2.10.0, 2.9.1, 2.7.1)
- SUPERVISOR_AUTOSTART env var for boot-time service selection
- Blackwell GPU fix for ACE Step (nanovllm CUDA graph capture)
- UV_TORCH_BACKEND=cu128 to prevent CUDA 13 wheel resolution
- Agent API image scaffold (external/vllm/derivatives/agent-api)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Selkies resets X display resolution when its GStreamer pipeline
reinitializes (on startup and client connections). Replace one-shot
resize attempts with a background loop that checks every 5 seconds
and re-applies the target resolution when it drifts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
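
The drift check described in the commit above could be sketched roughly as follows. This is an illustration, not the code in desktop.sh: `TARGET`, `current_resolution`, `has_drifted`, and `check_once` are hypothetical names, and re-applying the mode with `xrandr -s` is an assumption about how the resize is done.

```shell
#!/usr/bin/env bash
# Sketch only: all names below are illustrative, not taken from desktop.sh.

TARGET="${TARGET:-1920x1080}"

# Read `xrandr` output on stdin and print the active mode (the line whose
# refresh rate is marked with "*"), for example "1920x1080".
current_resolution() {
    awk '/\*/ { print $1; exit }'
}

# Succeed (exit 0) when the current mode differs from the target.
has_drifted() {
    [[ "$1" != "$2" ]]
}

# One iteration of the watchdog: re-apply the target only after a drift.
check_once() {
    local current
    current="$(xrandr 2>/dev/null | current_resolution)"
    if has_drifted "$current" "$TARGET"; then
        xrandr -s "$TARGET" && echo "[desktop] Display resized to $TARGET"
    fi
}

# In a service script the check would run in a background loop:
#   ( while true; do check_once; sleep 5; done ) &
```

Splitting the parsing from the loop keeps the comparison easy to check in isolation; the loop itself only adds the 5-second sleep.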
The background+wait pattern broke because pty is a shell function
that doesn't propagate to subshells. The direct exec via pty works
reliably. Orphan training process cleanup will be addressed separately.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Dockerfile.base: system deps, multi-torch venvs, create_app_venv helper,
full desktop stack (KDE, Selkies, VNC, VirtualGL, Blender, Chrome).
Built as robatvastai/aio-studio:base — changes rarely.

Dockerfile: apps only (ComfyUI, Forge, Voicebox, Ostris, Wan2GP,
Unsloth, Whisper, ACE Step) + ROOT copy + env-hash.
Derives from aio-studio:base — fast iteration on app versions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ROOT_BASE: desktop infrastructure (supervisor scripts, PAM, polkit,
D-Bus, KDE config, Chrome wrapper, NVIDIA driver installer, VGL patcher,
SUPERVISOR_AUTOSTART). Copied in Dockerfile.base.

ROOT: app configs and scripts only. Copied in Dockerfile.

Blender moved from base to app Dockerfile — version iterates faster
than desktop infrastructure.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Chrome install moved from Dockerfile.base to Dockerfile (stays current)
- Unsloth llama.cpp: use ai-dock pre-built CUDA binaries instead of
  compiling from source (saves several minutes per build)
- Desktop supervisor startsecs reduced from 60 to 10

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Create /tmp/.X11-unix as root before starting Xvfb as user
  (sticky bit on /tmp prevents non-root from creating directories)
- Set __EGL_VENDOR_LIBRARY_FILENAMES to Mesa software EGL to prevent
  NVIDIA EGL/GBM segfault in virtual framebuffer context
  (matches standalone desktop image behavior)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
__EGL_VENDOR_LIBRARY_FILENAMES was exported globally, forcing all
processes (including VirtualGL/Blender) to use software rendering.
Now passed via env prefix to Xvfb only, so GPU-accelerated apps
use NVIDIA EGL as intended.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
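
The difference between a global export and a per-command prefix, as described in the commit above, can be illustrated with a small sketch. The Mesa vendor file path is an assumption (it varies by distribution), and `sh -c` stands in for Xvfb and for the GPU-accelerated apps so the snippet runs anywhere.

```shell
#!/usr/bin/env bash
# Assumed location of Mesa's glvnd vendor file; adjust for your distribution.
MESA_EGL="/usr/share/glvnd/egl_vendor.d/50_mesa.json"

# Command that prints the variable as a freshly started child process sees it.
show='echo "${__EGL_VENDOR_LIBRARY_FILENAMES:-unset}"'

unset __EGL_VENDOR_LIBRARY_FILENAMES

# Prefix form: the assignment applies to this one command only.
scoped="$(__EGL_VENDOR_LIBRARY_FILENAMES="$MESA_EGL" sh -c "$show")"

# Children started later do not inherit it.
later="$(sh -c "$show")"

echo "scoped=$scoped"
echo "later=$later"

# The same prefix form applied to the X server would look like:
#   __EGL_VENDOR_LIBRARY_FILENAMES="$MESA_EGL" Xvfb :0 -screen 0 1920x1080x24 &
```

With `export`, both commands would print the Mesa path, which is the behaviour the commit removes.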
Suppresses "bitsandbytes not installed" warning and enables 8-bit
AdamW optimizer for training.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Run the Node process in background with wait so cleanup_generic.sh
trap fires on TERM and kills the full process tree, including run.py
training jobs that escape the supervisor process group.

Dropped unbuffer/pty for the backgrounded process — unbuffer -p exits
immediately when backgrounded without a proper TTY. The log-tee
handler still captures output.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
AI Toolkit's Node worker spawns run.py with start_new_session=True,
which detaches from the process group and survives supervisor stop.
Added targeted pkill trap matching "ai-toolkit/run.py" to ensure
training jobs and their VRAM are released on service stop.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The pkill trap fires on TERM whether pty execs or not, so we can
keep pty for proper TTY output while still killing detached training
jobs on stop.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
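
A minimal sketch of the stop-time cleanup discussed in the last three commits, assuming a background-and-wait launch so the shell stays alive to receive TERM. `TRAINING_PATTERN`, `kill_detached_training`, and `run_with_cleanup` are hypothetical names; the actual scripts use their own `pty` and `cleanup_generic.sh` helpers, which are not reproduced here.

```shell
#!/usr/bin/env bash
# Sketch only: names are illustrative and the real start script may differ.
TRAINING_PATTERN="ai-toolkit/run.py"

# Signal processes whose full command line matches the pattern, even if they
# left the service's process group by starting a new session.
kill_detached_training() {
    pkill -TERM -f "$TRAINING_PATTERN" || true
}

# Run a command in the background and wait on it, so this shell can handle
# TERM and run the cleanup before exiting.
run_with_cleanup() {
    trap 'kill_detached_training; kill "$child" 2>/dev/null; exit 0' TERM
    "$@" &
    child=$!
    wait "$child"
}

# Usage inside a supervisor start script (the command is a placeholder):
#   run_with_cleanup node server.js
```

Matching on the command line with `pkill -f` is what reaches a job started with `start_new_session=True`, since signalling the process group alone would miss it.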
ai-dock CUDA binaries now install to /opt/llama.cpp-cuda (immutable)
with a symlink from the unsloth data dir. Workspace sync was
overwriting them with stale CPU-only binaries from prior runs.

Boot script 38-unsloth-symlinks.sh now restores the symlink after
workspace sync to ensure CUDA inference works on every boot.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
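
The symlink restore performed at boot might look roughly like the sketch below. `restore_symlink` is a hypothetical helper, and the link location in the usage comment is a placeholder because the real Unsloth data path is not shown in this PR conversation.

```shell
#!/usr/bin/env bash
# Sketch only: restore_symlink and the example paths are illustrative.

# Ensure LINK is a symlink to TARGET, replacing whatever a sync left behind.
restore_symlink() {
    local target="$1" link="$2"
    [[ -d "$target" ]] || return 0          # nothing to link to
    if [[ "$(readlink "$link" 2>/dev/null)" != "$target" ]]; then
        rm -rf "$link"                      # drop a stale copy or wrong link
        mkdir -p "$(dirname "$link")"
        ln -s "$target" "$link"
    fi
}

# Example call; UNSLOTH_DATA_DIR is a placeholder variable:
#   restore_symlink /opt/llama.cpp-cuda "$UNSLOTH_DATA_DIR/llama.cpp"
```

The function is idempotent, so running it on every boot is harmless when the link is already correct.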
unsloth studio setup downloads its own CPU-only prebuilt llama.cpp
(nvidia-smi not available at build time). Our ai-dock CUDA binaries
must be installed after to replace them. The rm + symlink ensures
the CUDA backend is used for inference.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
libmtmd.so.0 and other llama.cpp shared libs at /opt/llama.cpp-cuda
need to be registered after boot since the symlink paths change
during workspace sync.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The ldconfig conf created at build time may not survive across image
iterations. Write /etc/ld.so.conf.d/llama-cpp.conf and run ldconfig
on every boot when /opt/llama.cpp-cuda exists, regardless of
workspace sync state.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
05-configure-cuda.sh removes any ldconfig conf containing "cuda"
in its content. Renamed /opt/llama.cpp-cuda to /opt/llama-cpp-gpu
so the ldconfig entry survives the boot-time CUDA cleanup.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
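
The boot-time linker registration from the previous three commits could be sketched as follows. `register_llama_libs` and its parameters are hypothetical; only the directory `/opt/llama-cpp-gpu` and the conf file name come from the commit messages.

```shell
#!/usr/bin/env bash
# Sketch only: the function name and its parameters are illustrative.

register_llama_libs() {
    local lib_dir="${1:-/opt/llama-cpp-gpu}"
    local conf="${2:-/etc/ld.so.conf.d/llama-cpp.conf}"
    [[ -d "$lib_dir" ]] || return 0
    # The conf file holds only the directory path. With the renamed directory
    # its content no longer contains the string "cuda".
    printf '%s\n' "$lib_dir" > "$conf"
    # Rebuild the linker cache only when writing to the real system location.
    if [[ "$conf" == /etc/ld.so.conf.d/* ]]; then
        ldconfig
    fi
}

# At boot (as root) the defaults would be used:
#   register_llama_libs
```

The optional second argument exists only so the function can be exercised against a scratch directory without touching the system linker cache.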
build-aio-studio-base.yml: manual trigger only, builds Dockerfile.base
(system deps + desktop). Rarely needed.

build-aio-studio.yml: bi-weekly schedule (1st + 15th) + manual trigger.
Resolves latest release tags (ComfyUI, Voicebox, Whisper, ACE Step) and
latest commits (Forge, AI Toolkit, Wan2GP). Always builds on schedule —
no age gating since bi-weekly guarantees upstream changes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ai-dock pre-built binaries (b8606) caused model behavior differences
vs Unsloth's pinned version (b8595). Revert to UNSLOTH_LLAMA_FORCE_COMPILE
which builds the exact version Unsloth expects with CUDA support.

Binaries are still copied to /opt/llama-cpp-gpu to survive workspace
sync, with the boot script restoring symlinks.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Contributor

Copilot AI left a comment

Pull request overview

This PR introduces a new AIO Studio PyTorch-derived image that bundles multiple creative AI applications plus a GPU-accelerated remote desktop stack, along with CI workflows to build/publish the base + app images. It also adds an initial architecture/design doc for a separate “agent-api” derivative image.

Changes:

  • Add AIO Studio Dockerfiles, Supervisor services/scripts, and boot-time configuration for desktop + 8 apps.
  • Add desktop enablement infrastructure (D-Bus configs, polkit/PAM rules, KDE defaults, VirtualGL patching, NVIDIA display driver libs installer).
  • Add GitHub Actions workflows to build/push aio-studio base and aio-studio app images.

Reviewed changes

Copilot reviewed 46 out of 48 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| external/vllm/derivatives/agent-api/CLAUDE.md | Architecture/design notes for a headless multimodal “agent-api” image. |
| derivatives/pytorch/derivatives/aio-studio/ROOT/provisioning.yaml | Adds ComfyUI provisioner extension configuration. |
| derivatives/pytorch/derivatives/aio-studio/ROOT/opt/supervisor-scripts/whisper-webui.sh | Supervisor start script for Whisper WebUI. |
| derivatives/pytorch/derivatives/aio-studio/ROOT/opt/supervisor-scripts/wan2gp.sh | Supervisor start script for Wan2GP. |
| derivatives/pytorch/derivatives/aio-studio/ROOT/opt/supervisor-scripts/voicebox.sh | Supervisor start script for Voicebox. |
| derivatives/pytorch/derivatives/aio-studio/ROOT/opt/supervisor-scripts/unsloth-studio.sh | Supervisor start script for Unsloth Studio. |
| derivatives/pytorch/derivatives/aio-studio/ROOT/opt/supervisor-scripts/forge.sh | Supervisor start script for SD Forge. |
| derivatives/pytorch/derivatives/aio-studio/ROOT/opt/supervisor-scripts/comfyui.sh | Supervisor start script for ComfyUI (+ requirements refresh behavior). |
| derivatives/pytorch/derivatives/aio-studio/ROOT/opt/supervisor-scripts/ai-toolkit.sh | Supervisor start script for Ostris AI Toolkit (+ cleanup override). |
| derivatives/pytorch/derivatives/aio-studio/ROOT/opt/supervisor-scripts/ace-step.sh | Supervisor start script for ACE Step (API+UI orchestration). |
| derivatives/pytorch/derivatives/aio-studio/ROOT/etc/vast_boot.d/38-unsloth-symlinks.sh | Boot-time fixes for Unsloth symlinks + llama.cpp ldconfig. |
| derivatives/pytorch/derivatives/aio-studio/ROOT/etc/vast_boot.d/05-aio-studio-env.sh | AIO Studio portal defaults + UV torch backend env. |
| derivatives/pytorch/derivatives/aio-studio/ROOT/etc/supervisor/conf.d/whisper-webui.conf | Supervisor program config for Whisper WebUI. |
| derivatives/pytorch/derivatives/aio-studio/ROOT/etc/supervisor/conf.d/wan2gp.conf | Supervisor program config for Wan2GP. |
| derivatives/pytorch/derivatives/aio-studio/ROOT/etc/supervisor/conf.d/voicebox.conf | Supervisor program config for Voicebox. |
| derivatives/pytorch/derivatives/aio-studio/ROOT/etc/supervisor/conf.d/unsloth-studio.conf | Supervisor program config for Unsloth Studio. |
| derivatives/pytorch/derivatives/aio-studio/ROOT/etc/supervisor/conf.d/forge.conf | Supervisor program config for SD Forge. |
| derivatives/pytorch/derivatives/aio-studio/ROOT/etc/supervisor/conf.d/comfyui.conf | Supervisor program config for ComfyUI. |
| derivatives/pytorch/derivatives/aio-studio/ROOT/etc/supervisor/conf.d/ai-toolkit.conf | Supervisor program config for AI Toolkit. |
| derivatives/pytorch/derivatives/aio-studio/ROOT/etc/supervisor/conf.d/ace-step.conf | Supervisor program config for ACE Step. |
| derivatives/pytorch/derivatives/aio-studio/ROOT_BASE/usr/local/bin/google-chrome | Adds Chrome wrapper (forces --no-sandbox). |
| derivatives/pytorch/derivatives/aio-studio/ROOT_BASE/opt/supervisor-scripts/vgl-desktop-patcher.sh | Patches .desktop Exec lines to run via vglrun and reverts on exit. |
| derivatives/pytorch/derivatives/aio-studio/ROOT_BASE/opt/supervisor-scripts/nvidia-display-drivers.sh | Downloads/extracts matching NVIDIA .run to supply missing GL/Vulkan libs. |
| derivatives/pytorch/derivatives/aio-studio/ROOT_BASE/opt/supervisor-scripts/desktop.sh | Unified Supervisor-managed desktop stack (dbus, Xvfb, PipeWire, KDE, VNC, TURN, Selkies). |
| derivatives/pytorch/derivatives/aio-studio/ROOT_BASE/opt/instance-tools/bin/export_env.sh | Adds a replacement env export parser script. |
| derivatives/pytorch/derivatives/aio-studio/ROOT_BASE/etc/xdg/kscreenlockerrc | Disables KDE screen locking. |
| derivatives/pytorch/derivatives/aio-studio/ROOT_BASE/etc/xdg/kdeglobals | KDE defaults (single click, lock restrictions). |
| derivatives/pytorch/derivatives/aio-studio/ROOT_BASE/etc/vast_boot.d/60-supervisor-autostart.sh | Enables autostart for selected Supervisor services via env var. |
| derivatives/pytorch/derivatives/aio-studio/ROOT_BASE/etc/supervisor/conf.d/desktop.conf | Supervisor program config for unified desktop service. |
| derivatives/pytorch/derivatives/aio-studio/ROOT_BASE/etc/polkit-1/polkit-1/rules.d/49-nopasswd_global.rules | Polkit rule (currently unconditional allow-all). |
| derivatives/pytorch/derivatives/aio-studio/ROOT_BASE/etc/polkit-1/polkit-1/rules.d/02-unrestricted.rules | Polkit rule allowing broad action prefixes. |
| derivatives/pytorch/derivatives/aio-studio/ROOT_BASE/etc/polkit-1/polkit-1/rules.d/01-nopassword.rules | Polkit admin rule for all unix users. |
| derivatives/pytorch/derivatives/aio-studio/ROOT_BASE/etc/polkit-1/polkit-1/localauthority/50-local.d/00-allow-all.pkla | Legacy pkla allow-all policy. |
| derivatives/pytorch/derivatives/aio-studio/ROOT_BASE/etc/pam.d/sudo | PAM config (currently unconditional permit). |
| derivatives/pytorch/derivatives/aio-studio/ROOT_BASE/etc/pam.d/polkit-1 | PAM config (currently unconditional permit). |
| derivatives/pytorch/derivatives/aio-studio/ROOT_BASE/etc/pam.d/login | PAM config (currently unconditional permit). |
| derivatives/pytorch/derivatives/aio-studio/ROOT_BASE/etc/pam.d/common-auth | PAM config (currently unconditional permit). |
| derivatives/pytorch/derivatives/aio-studio/ROOT_BASE/etc/dbus-1/container-system.conf | Container system bus config with power management restrictions. |
| derivatives/pytorch/derivatives/aio-studio/ROOT_BASE/etc/dbus-1/container-session.conf | Container session bus config with permissive policy + high limits. |
| derivatives/pytorch/derivatives/aio-studio/ROOT_BASE/etc/apt/preferences.d/nosnap.pref | Pins snapd to prevent installation. |
| derivatives/pytorch/derivatives/aio-studio/ROOT_BASE/etc/apt/preferences.d/firefox.pref | Pins snap Firefox packages to prevent installation. |
| derivatives/pytorch/derivatives/aio-studio/README.template.md | Vast template-facing documentation for AIO Studio. |
| derivatives/pytorch/derivatives/aio-studio/README.md | Repository documentation for the AIO Studio image. |
| derivatives/pytorch/derivatives/aio-studio/Dockerfile.base | Base layer: system deps, desktop stack deps, helper venv tooling. |
| derivatives/pytorch/derivatives/aio-studio/Dockerfile | App layer: installs ComfyUI/Forge/Voicebox/etc + Chrome + Blender. |
| derivatives/pytorch/derivatives/ace-step/Dockerfile | Adds UV backend env and a Blackwell workaround patch. |
| .github/workflows/build-aio-studio.yml | CI workflow to resolve refs and build/push AIO Studio app image. |
| .github/workflows/build-aio-studio-base.yml | CI workflow to build/push AIO Studio base image. |


Comment on lines +1 to +3
polkit.addRule(function(action, subject) {
    return polkit.Result.YES;
});

Copilot AI Apr 2, 2026

The polkit rule unconditionally returns polkit.Result.YES for every action and subject, which effectively disables authorization checks system-wide inside the container. If this image is ever exposed beyond a fully trusted local desktop session, this is a critical privilege-escalation surface. Consider narrowing this rule to the intended user/session (e.g., only uid 1001, only active local session) and/or only a minimal allowlist of actions required for the desktop experience, rather than Action=* semantics.

Comment on lines +1 to +4
auth sufficient pam_permit.so
account sufficient pam_permit.so
password sufficient pam_permit.so
session sufficient pam_permit.so

Copilot AI Apr 2, 2026

These PAM configs use pam_permit.so for all stages, which effectively makes sudo authentication/authorization unconditional. This is extremely permissive and bypasses all password / policy checks. If the goal is passwordless sudo for a specific non-root user, prefer using /etc/sudoers.d/* with NOPASSWD and leave PAM intact; otherwise you risk unexpected privilege escalation paths and breaking security assumptions in other tooling.

Comment on lines +147 to +151
# --- 4. VirtualGL desktop patcher ---
if [[ "${DISABLE_VGL,,}" != "true" ]] && nvidia-smi --query-gpu=uuid --format=csv,noheader 2>/dev/null | head -n1 | grep -q .; then
    log "Starting VirtualGL desktop patcher..."
    runuser -u user -- /opt/supervisor-scripts/vgl-desktop-patcher.sh 2>&1 | sed -u 's/^/[vgl] /' &
else

Copilot AI Apr 2, 2026

vgl-desktop-patcher.sh is started in the background but its PID is not added to the PIDS array (unlike other background services). As a result, stopping the desktop supervisor program may leave the patcher running and .desktop files remain patched/unpatched unpredictably. Start it via run_bg_user (or otherwise track and terminate it in cleanup_desktop).
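
The PID tracking this comment asks for could be sketched as below. `PIDS` and `cleanup_desktop` are named in the review, but the bodies shown here are assumptions, and `track_bg` is a hypothetical stand-in for the script's own `run_bg_user` helper, whose signature is not shown in this conversation.

```shell
#!/usr/bin/env bash
# Sketch only: not the implementation in desktop.sh.
PIDS=()

# Start a command in the background and remember its PID.
track_bg() {
    "$@" &
    PIDS+=("$!")
}

# Terminate every tracked process, then reap them.
cleanup_desktop() {
    local pid
    for pid in "${PIDS[@]}"; do
        kill -TERM "$pid" 2>/dev/null || true
    done
    for pid in "${PIDS[@]}"; do
        wait "$pid" 2>/dev/null || true
    done
    PIDS=()
}

# A backgrounded subshell loop can be tracked the same way:
#   ( while true; do sleep 5; done ) &
#   PIDS+=("$!")
```

For a backgrounded pipeline, `$!` is the PID of the last command in the pipeline, so tracking a piped job this way records the final stage rather than the first.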

Comment on lines +238 to +242
&& echo "[desktop] Display resized to $TARGET"
fi
sleep 5
done
) &

Copilot AI Apr 2, 2026

The persistent resize loop is launched in the background but not tracked for cleanup. On service stop/restart, this can leave orphaned loops repeatedly calling xrandr and selkies-gstreamer-resize. Consider launching it with run_bg_user/PID tracking (or store its PID and terminate it in cleanup_desktop).

Comment on lines +92 to +97
repository: Comfy-Org/ComfyUI
age-threshold-seconds: "999999999"
github-token: ${{ secrets.GITHUB_TOKEN }}
trigger: ${{ inputs.COMFYUI_REF && 'manual' || 'manual' }}
manual-ref: ${{ inputs.COMFYUI_REF }}
default-ref: master

Copilot AI Apr 2, 2026

This workflow passes trigger as 'manual' unconditionally. In check-github-release, any trigger other than schedule skips the release API check (action.yml:63-68), so scheduled runs here will never resolve the latest release tag and will fall back to default-ref. Align with the established pattern used in .github/workflows/build-comfyui.yml (trigger = inputs.* && github.event_name || 'schedule') so scheduled builds actually poll releases.

Comment on lines +13 to +16
schedule:
  # Bi-weekly: 1st and 15th of each month at midnight UTC
  - cron: '0 0 1,15 * *'


Copilot AI Apr 2, 2026

This workflow deviates from the repository CI/CD conventions documented in .github/AGENTS.md (e.g., required preflight -> build -> collect-tags -> notify job naming/order and the standard schedule pattern). Since the doc explicitly says “Do not deviate from these conventions” (.github/AGENTS.md:4,55-58), consider restructuring this workflow to match the standard pipeline so future maintenance and shared tooling (tag collection, notifications, gating) behave consistently.

Suggested change
schedule:
  # Bi-weekly: 1st and 15th of each month at midnight UTC
  - cron: '0 0 1,15 * *'

Comment on lines +81 to +92
notify:
  needs: build
  if: always()
  uses: ./.github/workflows/notify-slack.yml
  with:
    build-result: ${{ needs.build.result }}
    image-name: "AIO Studio Base"
    image-ref: "base"
    image-tags: '["${{ needs.build.outputs.FULL_IMAGE || '' }}"]'
    trigger: ${{ github.event_name }}
    run-url: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
  secrets:

Copilot AI Apr 2, 2026

The notify job references needs.build.outputs.FULL_IMAGE, but the build job does not define any outputs (it only writes to $GITHUB_ENV). This will evaluate to empty and produce incorrect Slack notifications. Either expose FULL_IMAGE via outputs on the build job, or follow the repo’s standard pattern of uploading a built-tag artifact and using a collect-tags job to feed notify.
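
As a rough illustration of the first option, the shell side of a build step that exposes the value might look like this. The image reference is an example value, and the fallback to a temporary file exists only so the snippet can be tried outside a runner.

```shell
#!/usr/bin/env bash
# Sketch only: the image reference below is an example value.
FULL_IMAGE="robatvastai/aio-studio:base"

# GITHUB_OUTPUT is provided by the Actions runner; fall back to a temp file
# so the snippet can be run locally.
GITHUB_OUTPUT="${GITHUB_OUTPUT:-$(mktemp)}"

# Step outputs are written as name=value lines appended to that file.
# Values written to $GITHUB_ENV are visible only to later steps in the
# same job, not to other jobs.
echo "FULL_IMAGE=${FULL_IMAGE}" >> "$GITHUB_OUTPUT"
```

For `needs.build.outputs.FULL_IMAGE` to resolve, the step also needs an `id`, and the build job needs an `outputs:` entry that maps `FULL_IMAGE` to that step's output.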

robballantyne and others added 2 commits April 3, 2026 12:47
desktop.sh: track vgl-desktop-patcher and persistent resize loop in
PIDS[] so cleanup_desktop actually terminates them on supervisor stop.

build-aio-studio.yml: the trigger ternaries on every check-github-release
call were 'manual' on both branches, so scheduled cron runs always
skipped the release API path and fell back to default-ref. Use
github.event_name on the non-manual branch.

build-aio-studio-base.yml: build job did not declare outputs, so
needs.build.outputs.FULL_IMAGE in notify was always empty. Expose it via
$GITHUB_OUTPUT and tighten the image-tags expression to fall back to
'[]' instead of producing literal '[""]'.

Not addressed: Copilot also flagged the polkit allow-all rules and PAM
pam_permit.so configs as privilege-escalation surfaces. These are
intentional for this image — it is a single-user GPU desktop sandbox
running on ephemeral Vast.ai instances where the user already owns the
container. Scoping polkit/PAM would add complexity without changing the
threat model. Revisit if the image is ever used in a multi-tenant
context.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@robballantyne robballantyne merged commit 99bd679 into main Apr 9, 2026
@robballantyne robballantyne deleted the aio-studio-desktop branch April 9, 2026 12:29
robballantyne added a commit that referenced this pull request Apr 9, 2026
These were internal planning notes accidentally committed in #143
and not intended for publication.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2 participants