A web dashboard that summarizes GPU utilization across SSH-accessible nodes by running nvidia-smi on each host.
- The machine running the dashboard must be able to SSH directly to each node
~/.ssh/id_ed25519_hakonemust exist (or editsrc/config.pyto changeSSH_IDENTITY_FILE)- SSH options (User, IdentityFile) are passed explicitly;
~/.ssh/configis optional
cd baulab-dashboard # after: git clone git@github.com:wendlerc/baulab-dashboard.git
uv sync
uv run python -m src.mainTo access from outside your network:
# Terminal 1: start the dashboard
uv run python -m src.main
# Terminal 2: start the tunnel
./run-tunnel.shThe tunnel prints a URL like https://random-name.trycloudflare.com — open it in a browser. No Cloudflare account required.
To run the dashboard automatically on boot with Flask + cloudflared:
1. Configure the env file
mkdir -p ~/.config/baulab-dashboard
nano ~/.config/baulab-dashboard/envOption A: Cloudflare named tunnel (recommended) — no rate limits, stable URL:
CLOUDFLARE_TUNNEL_TOKEN=eyJ...
CLOUDFLARE_TUNNEL_URL=https://your-tunnel.example.com
Option B: Quick tunnel — no config needed; URL is printed to the log. Use ?api=URL when visiting to point the frontend at the tunnel.
2. Install the systemd user service
cd /path/to/baulab-dashboard
mkdir -p ~/.config/systemd/user
cp baulab-dashboard.service ~/.config/systemd/user/
# Edit the service file: update WorkingDirectory, ExecStart, and EnvironmentFile
# to match your paths (e.g. ~/code/baulab-dashboard)
nano ~/.config/systemd/user/baulab-dashboard.service
systemctl --user daemon-reload
systemctl --user enable baulab-dashboard
systemctl --user start baulab-dashboard3. Enable user services at boot (without login)
loginctl enable-linger4. Verify
systemctl --user status baulab-dashboard
journalctl --user -u baulab-dashboard -fStartup logs are also written to /tmp/baulab-dashboard-startup.log.
- A background thread runs
nvidia-smion each node every 60 seconds via SSH - The web UI polls
/api/statusevery 30 seconds - Nodes are queried in parallel; connection timeout is 15 seconds per node