
feat: add browser-based WebUI for CorridorKey #164

Open

JamesNyeVRGuy wants to merge 9 commits into nikopueringer:main from JamesNyeVRGuy:feat/webui

Conversation

@JamesNyeVRGuy
Contributor

@JamesNyeVRGuy JamesNyeVRGuy commented Mar 14, 2026

Full-stack web interface served via FastAPI + SvelteKit, accessible at localhost:3000 via docker compose --profile web up -d --build.

Backend (web/api/):

  • FastAPI app with lifespan management, SPA fallback routing
  • Clip scanning, detail, deletion, and move-between-projects endpoints
  • Job submission with full pipeline chaining (extract → alpha → inference)
  • Parallel worker pool: CPU jobs (extraction) run alongside GPU jobs, configurable VRAM limit for GPU job concurrency
  • Video upload with auto frame extraction, zip frame/alpha/mask upload
  • Preview: single-frame PNG (EXR converted on-the-fly), ffmpeg-stitched MP4 video with caching and encode-lock, ZIP download per pass
  • Project CRUD (create, rename, delete, list with nested clips)
  • System endpoints: device detection, system-wide VRAM via nvidia-smi, model weight download from HuggingFace, VRAM limit control
  • WebSocket for real-time job progress, status, VRAM updates

Frontend (web/frontend/):

  • SvelteKit SPA with Svelte 5 runes, adapter-static build
  • Corridor Digital branding: signature yellow (#fff203) on black, cinematic dark theme, Outfit + JetBrains Mono typography
  • Project-grouped clip browser with collapsible sections
  • Drag-and-drop upload (videos + zipped frames), drag clips between projects
  • Right-click context menus: rename project, process all, move/delete clips
  • Clip detail: inference form with slider + number input, output config, full pipeline button, individual step buttons, alpha/mask upload
  • Frame viewer: scrub, play (ffmpeg MP4), A/B comparison mode, per-pass download, keyboard shortcuts (Space, arrows, Home/End, ?)
  • Job queue: real-time progress bars with ETA + fps, expandable error logs
  • Settings: weight management with download buttons, auto-extract toggle, VRAM limit slider, default inference params
  • Toast notification system replacing all alert() calls
  • Global activity bar showing current job progress
  • VRAM meter + connection status in sidebar

Infrastructure:

  • Multi-stage Dockerfile (Node build → Python runtime with CUDA 12.8)
  • docker-compose.yml web profile with GPU passthrough + weight volumes
  • pyproject.toml: web optional dependency group (fastapi, uvicorn, python-multipart)
  • .dockerignore/.gitignore updates for frontend artifacts + Projects dir

Existing code changes:

  • backend/service.py: fix total_mem → total_memory (PyTorch API change)
  • gvm_core/wrapper.py: add progress_callback parameter to process_sequence

What does this change?

Adds a persistent WebUI that wraps the existing backend/ service layer. Users can manage clips, configure inference, monitor jobs, and preview results from a browser instead of using the CLI wizard. The WebUI is entirely optional — no changes to the CLI workflow. All new code lives in web/ except two small fixes to existing files:

  1. backend/service.py — props.total_mem → props.total_memory (attribute renamed in recent PyTorch versions, caused silent VRAM query failures)
  2. gvm_core/wrapper.py — added progress_callback parameter to process_sequence() so the WebUI can report per-batch GVM progress via WebSocket

How was it tested?

  • Built and ran via docker compose --profile web up -d --build on Linux with an NVIDIA RTX 5090 (CUDA 12.8)
  • Uploaded green screen video clips, ran full pipeline (extract → GVM alpha → inference), verified output passes (FG, Matte, Comp, Processed)
  • Tested video playback, A/B comparison, per-pass ZIP download
  • Tested project creation, clip moving between projects, deletion
  • Tested weight download (CorridorKey, GVM, VideoMaMa) from Settings page
  • Verified real-time WebSocket progress updates during jobs
  • uv run ruff check web/ passes
  • uv run ruff format --check web/ passes

Checklist

  • uv run pytest passes
  • uv run ruff check passes
  • uv run ruff format --check passes


@shezmic
Contributor

shezmic commented Mar 14, 2026

Code Review — Community Contribution

Hey @JamesNyeVRGuy, this is a feature-rich PR. The WebUI concept is great for accessibility. I've focused my review on the changes to existing files since those affect all users, not just WebUI users.


Changes to existing files

1. backend/service.py — total_mem → total_memory (line 196)

This is a legitimate fix. PyTorch's cuda.get_device_properties() returns a _CudaDeviceProperties object where the attribute is total_memory, not total_mem. The current code on main would raise AttributeError when get_vram_info() is called on a CUDA system. Good catch — this should probably be a separate small PR so it can be merged independently without waiting for the full WebUI review.
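
For illustration, here is a minimal reproduction of the rename, using a stand-in object rather than a live CUDA device (on real hardware the object comes from `torch.cuda.get_device_properties()`; the helper name is hypothetical):

```python
from types import SimpleNamespace

# Stand-in for torch.cuda.get_device_properties(0); the real object exposes
# `total_memory` in bytes -- `total_mem` does not exist and raises
# AttributeError, which is exactly the bug on main.
props = SimpleNamespace(total_memory=34_359_738_368)  # e.g. a 32 GB GPU

def vram_total_gb(props) -> float:
    # Fixed attribute name, as in the patch described above.
    return props.total_memory / 1024**3

print(vram_total_gb(props))  # 32.0
```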

2. gvm_core/wrapper.py — added progress_callback parameter

This is an additive, backwards-compatible change (default=None, no behavior change when not passed). The callback fires at the batch level with (batch_id + 1, total_batches), which is the right granularity. The whitespace-only fix on the function signature is fine. One consideration: gvm_core/ is third-party upstream code (BSD-2-Clause) — modifying it means the project carries a fork delta that would need to be re-applied on upstream GVM updates.
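
The pattern is easy to sketch. The helper below is illustrative only — the function shape and batching logic are assumptions, not the actual gvm_core code; the key property is the `None` default, so existing callers see no behavior change:

```python
from typing import Callable, Optional

def process_sequence(
    frames: list,
    batch_size: int = 4,
    progress_callback: Optional[Callable[[int, int], None]] = None,
) -> list:
    # Illustrative sketch, not the real gvm_core implementation.
    total_batches = (len(frames) + batch_size - 1) // batch_size
    out = []
    for batch_id in range(total_batches):
        batch = frames[batch_id * batch_size : (batch_id + 1) * batch_size]
        out.extend(batch)  # stand-in for per-batch GVM processing
        if progress_callback is not None:
            progress_callback(batch_id + 1, total_batches)  # 1-based, as noted
    return out
```

A WebSocket reporter can then be passed as `lambda done, total: ...` without touching any other call site.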

3. pyproject.toml — web optional dependency group

Clean — fastapi, uvicorn[standard], python-multipart are the right minimal set. Scoped under [project.optional-dependencies] as web, so it doesn't affect default installs.

4. docker-compose.yml — web profile

Uses a separate Dockerfile.web and the web profile, so it doesn't interfere with the existing corridorkey service. gpus via environment variable with all default is sensible.

5. .gitignore / .dockerignore — Projects/ and frontend artifacts

Appropriate additions.

General observations on the WebUI code (new web/ directory)

Since the entire WebUI is new code under web/, it doesn't risk breaking existing functionality — the CLI and existing backend paths are unchanged. A few things worth noting for the maintainer:

  • The WebUI is a significant maintenance surface area (~5000+ lines of new code including SvelteKit frontend). The maintainer should weigh whether they want to own this long-term.
  • Docker-based deployment is the right call — keeps the web stack isolated from the Python ML environment.
  • The parallel worker pool (CPU + GPU jobs) is a solid architectural choice for the web context.

Recommendation

The total_mem → total_memory fix is a genuine bug fix that benefits all users. Consider splitting that into a standalone 1-line PR so it can merge quickly. The progress_callback addition to GVM wrapper is clean and backwards-compatible. The bulk of the PR (the WebUI itself) is fundamentally a maintainer decision about scope and long-term ownership.

@JamesNyeVRGuy
Contributor Author

#165
#166

@shezmic
Contributor

shezmic commented Mar 14, 2026

Follow-up review (condensed):

I reviewed the existing-file edits in this PR.

  1. backend/service.py
  • props.total_mem → props.total_memory is a correct fix.
  • This is independently useful and could be split into a tiny standalone PR for faster merge.
  2. gvm_core/wrapper.py
  • progress_callback addition is backward-compatible (None default).
  • No objection technically.
  3. Scope note
  • The rest of this PR is a large new WebUI surface area.
  • That’s mainly a maintainer scope/ownership decision, not a correctness objection.

Net: the total_memory fix is valid; WebUI merge decision is about project direction and maintenance cost.

@shezmic left a comment

Reviewed the changes to existing files and the web/api/ backend. Frontend (Svelte) components not reviewed in depth.


NOTE — Depends on #165 and #166 to function

Two bugs exist on main that affect the WebUI:

  1. backend/service.py line 196: props.total_mem → props.total_memory (fixes VRAM query always returning {} on CUDA; PR #165 addresses this)
  2. gvm_core/wrapper.py: process_sequence() doesn't accept progress_callback yet, which causes TypeError on every service.run_gvm() call (PR #166 addresses this)

Both are tracked separately — worth coordinating merge order so the WebUI doesn't land on a broken base.


NOTE — Projects/ and ClipsForInference/ are not the same root

web/api/routes/clips.py uses Projects/ as the clip storage root. The CLI wizard uses ClipsForInference/. A clip set up via the WebUI won't be visible to the CLI wizard and vice versa — they scan different directories. This isn't necessarily wrong (isolated modes can be valid), but it isn't documented anywhere in the PR or the README additions. Users who switch between CLI and WebUI for the same footage will get confused.

Fix: Add a note to the WebUI section of the README clarifying that the WebUI manages clips under Projects/ and the CLI uses ClipsForInference/, and that the two directories are independent.


SUGGESTION — Mac/MLX path not validated

The PR was tested on Linux with NVIDIA GPU (CUDA 12.8) only. The uvicorn non-Docker path (uv run uvicorn web.api.app:create_app --factory --port 3000) should work on Mac, but nothing in the submission confirms it. Specifically: nvidia-smi returns nothing on Mac, which the /api/system/vram endpoint handles correctly (falls back to VRAMResponse(available=False)), but the actual inference path through MLX is untested.

Not a blocker for merging with appropriate documentation, but the README should note that Mac support for the WebUI is not yet validated.


SUGGESTION — web/api/worker.py VRAM check only applies to CUDA

_get_free_vram_gb() in web/api/worker.py returns None if CUDA is not available, and _can_start_gpu_job() returns True when None is returned:

if free is None:
    return True  # can't check, allow it

On Mac with MLX, this means the VRAM concurrency gate never activates — multiple GPU jobs can be submitted simultaneously with no throttle. For a single-user local tool this is probably fine, but it silently disables the feature for the non-CUDA case. A comment in _can_start_gpu_job() noting that MLX VRAM checking is not yet implemented would prevent confusion.


No no-go zones touched. The web/ code goes through backend.service — it doesn't call inference engines directly.

@nikopueringer
Owner

First off, this looks amazing, and would work great with how our computers are set up for the team at Corridor.

Merging this into the project now, however, would mean having to also maintain and review all future bug fixes and feature tweaks on the WebUI.

I think the best approach would be to run this as a separate project or fork, and if there are any tweaks we need to make to the main branch in order to make it plug-and-play with the WebUI, we can do so. It would make sense to give people the option to download and implement the WebUI as an add-on or a separate version. Are there any efficient changes we would need to make to the main repo here to facilitate that?

@JamesNyeVRGuy
Contributor Author

I hear you on maintenance scope; that's a fair concern.

Both of the small PRs I needed for this to run smoothly have already been merged though (thank youuuu), so the WebUI now works against main as-is with zero changes to existing code.
It's entirely self-contained in web/ and wraps backend/ as a read-only consumer. Changes to the inference engine, color pipeline, or CLI don't require WebUI updates.

The practical concern with a separate repo is user friction. Right now someone can clone CorridorKey and run docker compose --profile web up -d --build ... done. With a separate repo they'd need to clone two repos, make sure versions match, and manually merge directories. For a tool aimed at artists and VFX teams, that's a significant barrier.

Ideally a GitHub Container Registry image can be published so users don't even need to build.

Just pull and run:

# docker-compose.yml — no clone or build required
services:
  corridorkey-web:
    image: ghcr.io/nikopueringer/corridorkey-web:latest
    ports: ["3000:3000"]
    gpus: all
    volumes:
      - ./Projects:/app/Projects
      - ./CorridorKeyModule/checkpoints:/app/CorridorKeyModule/checkpoints
      - ./gvm_core/weights:/app/gvm_core/weights
      - ./VideoMaMaInferenceModule/checkpoints:/app/VideoMaMaInferenceModule/checkpoints
      # These can point to local paths or network drives (NAS, NFS, SMB, etc.)
      # e.g. //server/vfx/projects:/app/Projects

A CI workflow could auto-publish that image on each release; zero friction for end users. Instant deployment and updates to production.
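
As a sketch, such a release workflow might look like the following (file path, action versions, and image name are assumptions, not existing repo files):

```yaml
# .github/workflows/publish-web.yml — hypothetical sketch
name: publish-webui-image
on:
  release:
    types: [published]
jobs:
  build-push:
    runs-on: ubuntu-latest
    permissions:
      packages: write
      contents: read
    steps:
      - uses: actions/checkout@v4
      - uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - uses: docker/build-push-action@v6
        with:
          context: .
          file: Dockerfile.web
          push: true
          tags: ghcr.io/${{ github.repository_owner }}/corridorkey-web:latest
```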

A middle ground: keep it in-tree under web/ but mark it as community-maintained in the README. Bug reports and PRs touching web/ go through community contributors (happy to own that). You'd only need to review changes that touch files outside web/. The Docker Compose profile already isolates it, so users who don't want it never see it.

If keeping it in-tree doesn't work for you, happy to set it up as its own repo; just let me know.

Glad to hear it matches your setup at Corridor! Coming from the film industry myself this is how I would see people using it. If you have any other feature requests or needs please let me know.

@nikopueringer
Owner

Thanks for the additional thoughts and suggestions! What you’re saying makes sense, though I am a bit inexperienced with GitHub’s process so I’ll be pulling in another person or two to help guide me on the best way to implement this. It sounds like you laid out a good approach for maintaining this hand-in-hand with the main repo. Let’s give this some discussion so we can strategize on the best way to get this out to users while keeping the main repo focused on core inference and functionality.

@JamesNyeVRGuy
Contributor Author

Agreed and understood! I found a small bug in job queuing on a single GPU, so I've fixed that as well. More than happy to talk through discussions as need be. Will try to keep this PR up to date with main.

I will note I set up a GitHub Container Registry image on my fork that depends on this feature branch:
https://github.com/JamesNyeVRGuy/CorridorKey/blob/feat/distributed-nodes/deploy/docker-compose.web.yml

This allows it (and the future nodes improvement) to be run on any machine without needing CLI or batch files.

@JamesNyeVRGuy
Contributor Author

Updated the PR with fixes from testing and community feedback.

Bug fixes:

  • GPU workers limited to 1 per physical GPU. Prevents model thrashing when multiple jobs are queued (GVM + inference fighting over VRAM).
  • Multi-job queue. The queue tracked a single running job, so when a second job started the first disappeared from the UI. Now supports multiple concurrent running jobs.
  • Auto-extract on video upload was broken. _clips_dir was captured as empty string at import time, so the post-upload scan found nothing and the extraction job never queued.
  • Docker permissions error on bind-mounted Projects directory. Container's appuser couldn't write to host-owned dirs. Added user: "${UID:-1000}:${GID:-1000}" to compose and mkdir + chown in Dockerfile.
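
The per-GPU cap in the first fix above can be sketched like this (hypothetical helper; names are illustrative, not the actual worker-pool code):

```python
def detect_max_gpu_workers() -> int:
    # One worker per physical GPU so jobs on the same card run strictly
    # sequentially and a second job can't force-unload a ~23 GB model
    # mid-run.
    try:
        import torch
        if torch.cuda.is_available():
            return max(1, torch.cuda.device_count())
    except ImportError:
        pass
    return 1  # no CUDA available: still serialize GPU-class jobs
```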

Improvements:

  • All frontend settings (inference params, output config, auto-extract toggle) persist to localStorage across page refreshes
  • Clip detail page auto-refreshes when a job completes via WebSocket. No more navigating away and back to see new outputs.
  • Batch delete: right-click a project > "Delete All Clips"
  • GZip middleware compresses HTTP responses >1KB
  • Browser push notifications when jobs complete or fail (only fires when tab is in background)

On the backend/job_queue.py change:

The internal _current_job single-slot attribute is replaced with a _running_jobs list. This is the only change outside web/ that touches actual logic. It's backward compatible:

  • The current_job property still exists, returns the first running job. Any code using the old interface gets the same behavior.
  • The CLI (clip_manager.py, corridorkey_cli.py) doesn't reference current_job at all. It uses next_job()/start_job()/complete_job() which all work the same.
  • No external code accessed _current_job directly, only through the property.
  • complete_job(), fail_job(), mark_cancelled() all work with the list using identity checks (if job in self._running_jobs), same semantics as before.
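
The described migration can be sketched as a simplified model (not the actual backend/job_queue.py — fields and method names beyond those cited above are assumptions):

```python
from dataclasses import dataclass

@dataclass
class Job:
    job_id: str

class GPUJobQueue:
    def __init__(self) -> None:
        self._running_jobs: list[Job] = []  # was: self._current_job = None

    @property
    def current_job(self) -> "Job | None":
        # Backward-compatible shim: old callers get the first running job,
        # which matches single-slot behavior when only one job is running.
        return self._running_jobs[0] if self._running_jobs else None

    def start_job(self, job: Job) -> None:
        self._running_jobs.append(job)

    def complete_job(self, job: Job) -> None:
        if job in self._running_jobs:  # membership check, as described
            self._running_jobs.remove(job)
```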

Other files outside web/: config (docker-compose.yml, pyproject.toml, .dockerignore, .gitignore) and docs (README.md). No changes to the inference engine, CLI, or model code. Rebased onto latest main.

…ment

- README: document that WebUI uses Projects/ while CLI uses
  ClipsForInference/, and how to unify via CK_CLIPS_DIR
- README: note that Mac/MLX WebUI support is not yet validated
- worker.py: clarify that VRAM concurrency gate is CUDA-only,
  MLX unified memory checking not yet implemented

max_gpu_workers defaulted to 2, allowing two GPU jobs (e.g. inference +
GVM) to run simultaneously on a single GPU. Since CorridorKey needs
~23GB VRAM and models can't coexist, the second job would force-unload
the first mid-processing.

Now auto-detects GPU count and sets max_gpu_workers = 1 per GPU. On a
single GPU system, GPU jobs run strictly sequentially — the second job
queues until the first completes.

The job queue tracked only a single _current_job, so when a second job
was claimed (by a remote node or parallel worker), the first disappeared
from the UI. Now:

- GPUJobQueue uses _running_jobs list instead of _current_job singleton
- API returns running: list[Job] alongside current (backward compat)
- Frontend runningJobs store tracks all running jobs
- Jobs page "RUNNING" section shows all active jobs with count badge
- Activity bar shows a progress bar for each running job
- Deduplication checks all running jobs, not just one
- cancel_all cancels all running jobs
- find_job_by_id searches all running jobs

- Dockerfile.web: create /app/Projects owned by appuser before
  switching to non-root user
- docker-compose.yml: add user: "${UID:-1000}:${GID:-1000}" to
  web service so bind-mounted volumes match host permissions

Auto-refresh:
- Clip detail page subscribes to the clips store and re-fetches
  when the clip's state changes (e.g. job completes via WebSocket)
- No more navigating away and back to see new outputs

Batch delete:
- Right-click a project → "Delete All Clips" removes all clips
  in the project with confirmation
- Clip thumbnails already working (first frame from comp or input)

GZip middleware compresses all HTTP responses > 1KB. Significantly
reduces transfer time for PNG frame sequences to remote nodes.

Browser push notifications fire when jobs complete or fail, but only
when the tab is in the background (document.hidden). Permission
requested on first visit.