Simple docker compose for getting ComfyUI and sd-webui (Forge) up and running with minimal modification on the main system.
- docker-compose
- AMD GPU (PRs for other cards are welcome)
- checkpoints are saved into data/checkpoints other model files in their respective subfolder, for example data/comfyui/models or data/sd-webui/models
- clone this repo
- open repo directory in terminal
- start up the docker container by typing:
docker-compose up- It will take a while to download all python libraries
- wait a while until you see the text:
To see the GUI go to: http://0.0.0.0:31488 - After getting that message start ComfyUI by open browser with the following link: http://localhost:31488
- or WebUI by going to http://localhost:31489
You can also run without containers using the helper script:
./run-local.sh
- On first run it will create a Python virtual environment and set up ComfyUI, ROCm PyTorch and Flash Attention.
- It creates a marker file in the data directory to skip repeated heavy setup on subsequent runs.
- By default it prompts whether to reinstall ROCm Torch and Flash Attention. To automate, use environment variables:
NON_INTERACTIVE=1to disable promptsREINSTALL_ROCM=yesto reinstall ROCm Torch in non-interactive modeREINSTALL_FLASH=yesto reinstall Flash Attention in non-interactive mode
You can customize behavior via a .env file at the repo root (auto-loaded) or by exporting variables before running:
PYTHON_VERSION(default3.12)ROCM_VERSION(nightly,release, orcpuonly; defaultnightlyfor local run)ROOT_DIR(default${PWD}/data/home-local)COMFYUI_PORT(default31490)
Example non-interactive run on CPU-only:
ROCM_VERSION=cpuonly NON_INTERACTIVE=1 ./run-local.sh
Important: The provided .env in this repo is tailored for Docker and sets ROOT_DIR=/root. When running locally as a non-root user, run-local.sh will ignore ROOT_DIR=/root and use ${PWD}/data/home-local instead. To explicitly control this, set ROOT_DIR to a writable directory when invoking the script or in your own .env.local file.
Examples:
ROOT_DIR="$PWD/data/home-local" ./run-local.sh
# or
export ROOT_DIR="$PWD/data/home-local"; ./run-local.sh
To make the docker created files accessable from your own user run these commands
sudo setfacl -d -m "u:${USER}:rwX" data/
sudo setfacl -R -m "u:${USER}:rwX" data/
You can run in CPU-only mode by setting ROCM_VERSION=cpuonly and remove the lines with /dev/kfd in docker-compose.yml
You can enable an optional profiling mode for ComfyUI when running locally. It will:
- Continuously sample and log GPU memory usage (allocated, reserved, peak) and process RSS to a CSV file.
- Optionally run a PyTorch profiler (CPU + ROCm/CUDA) and export a Chrome trace JSON.
How to use:
- Enable profiling for a run by setting
PROFILING=1:
PROFILING=1 ./run-local.sh
- Outputs will be stored under a timestamped folder:
${ROOT_DIR}/profiles/<YYYYMMDD-HHMMSS>/
- memory_log.csv # periodic memory samples
- torch_trace.json # Chrome trace from torch profiler
Environment variables (optional):
PROFILING_SAMPLING_INTERVAL(default2): seconds between memory samples.PROFILING_TORCH_PROFILER(default1): set to0to disable the torch profiler and only collect memory samples.PROFILING_EXPORT_GZIP(default1): when enabled, also writes a compressed*.json.gzversion of every exported trace (recommended for Perfetto UI).PROFILING_RECORD_SHAPES(default0): enable to record input shapes; increases trace size.PROFILING_WITH_STACK(default0): enable to capture Python stack traces; increases trace size.
How to get torch_trace.json (step-by-step):
- Start ComfyUI with profiling enabled:
PROFILING=1 ./run-local.sh- Optional: leave the default
PROFILING_TORCH_PROFILER=1so the trace is recorded.
- Use ComfyUI as usual to reproduce the behavior you want to profile.
- Stop ComfyUI (Ctrl+C in the terminal). On shutdown, the profiler writes the trace to:
${ROOT_DIR}/profiles/<YYYYMMDD-HHMMSS>/torch_trace.json
- Open the trace in one of the following tools:
- Chrome: open
chrome://tracingand load the file - Perfetto UI: https://ui.perfetto.dev (you can load
.jsonor the compressed.json.gzdirectly) - Note: Firefox Profiler expects a different profile format and typically cannot import Chrome Trace JSON. Use Perfetto or Chrome for these traces.
- Chrome: open
Viewing and correlating data:
torch_trace.json(andtorch_trace.json.gz) show detailed operator timelines (CPU and, if available, ROCm GPU activity through the CUDA flag).memory_log.csvprovides a coarse timeline of process RSS and GPU memory usage to correlate with actions in ComfyUI.
Notes:
- Profiling is opt-in and does not change the default behavior if
PROFILINGis not set. - On ROCm builds, the torch profiler uses the CUDA activity flag internally but maps to HIP; it works if
torch.cuda.is_available()returns true. - The trace is written when the app shuts down (Ctrl+C). On shutdown you should see a message like:
[profiler] Exported Chrome trace to: .../torch_trace.json. - On-demand export without stopping: send SIGUSR1 to the running process (Linux/macOS). Example:
pkill -USR1 -f profiler_runner.py. Each signal writes a timestamped snapshot (e.g.,torch_trace-YYYYMMDD-HHMMSS-<n>.json) and automatically restarts the profiler to continue recording; on final shutdown it writestorch_trace.json.
- If Perfetto (or your browser) complains about memory or 2 GB ArrayBuffer limits, load the compressed
.json.gzfile instead of the raw.json. - To reduce trace size further, keep
PROFILING_RECORD_SHAPES=0andPROFILING_WITH_STACK=0(defaults). Enable them only when necessary. - Firefox Profiler generally cannot import Chrome Trace JSON. Use Chrome
chrome://tracingor Perfetto UI.
Big thanks for all Open Source gangs that have made this possible.
- ComfyUI: https://github.com/comfyanonymous/ComfyUI
- Automatic1111: https://github.com/AUTOMATIC1111/stable-diffusion-webui
- WebUI Forge: https://github.com/lllyasviel/stable-diffusion-webui-forge