Run massive LLMs across multiple machines by pooling VRAM over your local network.
Overview • Features • Architecture • Quick Start • Dashboard • Build • Tech Stack
RPC Manager is a lightweight orchestration platform designed to manage distributed llama.cpp RPC nodes.
It allows multiple machines to combine their GPU VRAM and compute power, enabling you to run large language models that would normally exceed a single GPU's capacity.
The system automatically discovers nodes, deploys binaries, monitors hardware, and launches cluster inference, all from a clean web interface.
This turns your local machines into a personal AI compute cluster.
Uses mDNS / Zeroconf to automatically discover nodes on the local network.
No IP configuration required.
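Discovery can be sketched as follows: each agent advertises a service record that the orchestrator browses for on the LAN. The service type `_rpcagent._tcp.local.` and the record fields below are illustrative assumptions; the actual agent would register something equivalent through the `zeroconf` library's `ServiceInfo` and `Zeroconf` classes.

```python
import json
import socket

# Hypothetical service type; the real agent registers a similar record
# via python-zeroconf (ServiceInfo + Zeroconf.register_service).
SERVICE_TYPE = "_rpcagent._tcp.local."

def build_service_record(port: int = 50052) -> dict:
    """Describe this node for mDNS advertisement (sketch)."""
    hostname = socket.gethostname()
    return {
        "type": SERVICE_TYPE,
        "name": f"{hostname}.{SERVICE_TYPE}",
        "port": port,
        # TXT-record style properties the orchestrator can read
        # without opening a connection to the node.
        "properties": {"host": hostname, "role": "worker"},
    }

print(json.dumps(build_service_record(), indent=2))
```

Because the record carries the host and port, the orchestrator never needs a statically configured IP list: a new worker appears as soon as its advertisement is seen.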
Download and deploy llama.cpp builds directly from GitHub releases.
Supports:
- CUDA builds
- dependency downloads (DLLs)
- remote installation on nodes
Monitor all nodes in real time:
- CPU usage
- system RAM
- GPU temperature
- GPU VRAM usage
Powered by psutil and pynvml.
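In the real agent the raw readings come from `psutil.cpu_percent()`, `psutil.virtual_memory()`, and pynvml's `nvmlDeviceGetTemperature` / `nvmlDeviceGetMemoryInfo`. The sketch below takes those readings as plain arguments so the packing logic stands alone; the payload field names are assumptions, not the project's actual wire format.

```python
import json
import time

def build_telemetry(cpu_pct: float, ram_used_mb: int, ram_total_mb: int,
                    gpu_temp_c: int, vram_used_mb: int, vram_total_mb: int) -> dict:
    """Pack raw hardware readings into one telemetry message (sketch)."""
    return {
        "ts": time.time(),
        "cpu_percent": round(cpu_pct, 1),
        "ram": {"used_mb": ram_used_mb, "total_mb": ram_total_mb},
        "gpu": {
            "temp_c": gpu_temp_c,
            "vram_used_mb": vram_used_mb,
            "vram_total_mb": vram_total_mb,
            # Percentage is derived here so the dashboard can plot it directly.
            "vram_percent": round(100 * vram_used_mb / vram_total_mb, 1),
        },
    }

print(json.dumps(build_telemetry(37.5, 8192, 32768, 61, 7000, 24576)))
```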
Enable or disable nodes in the UI.
The orchestrator automatically builds the correct RPC endpoint configuration and launches the cluster.
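llama.cpp accepts a comma-separated list of RPC backends via its `--rpc` flag. A minimal sketch of how the orchestrator might assemble the launch command from the enabled nodes; the binary name, node dictionary shape, and defaults are illustrative assumptions:

```python
def build_launch_command(model_path: str, nodes: list, ctx_size: int = 4096,
                         gpu_layers: int = 99) -> list:
    """Assemble a llama.cpp launch command from the enabled nodes (sketch)."""
    enabled = [n for n in nodes if n.get("enabled")]
    rpc = ",".join(f"{n['host']}:{n['port']}" for n in enabled)
    cmd = ["llama-server", "-m", model_path,
           "-c", str(ctx_size), "-ngl", str(gpu_layers)]
    if rpc:  # only add --rpc when at least one worker is enabled
        cmd += ["--rpc", rpc]
    return cmd

nodes = [
    {"host": "192.168.1.10", "port": 50052, "enabled": True},
    {"host": "192.168.1.11", "port": 50052, "enabled": False},  # disabled in the UI
    {"host": "192.168.1.12", "port": 50052, "enabled": True},
]
print(" ".join(build_launch_command("models/llama-70b.gguf", nodes)))
```

Disabling a node in the UI simply drops it from the `--rpc` list before the next launch.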
Quickly manage your models:
- scan directories for `.gguf` files
- store launch presets
- switch models instantly
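The directory scan reduces to a recursive glob for the `.gguf` extension. A minimal sketch, assuming the helper name and recursive behavior (the actual implementation may scan only the top level):

```python
from pathlib import Path

def scan_models(directory: str) -> list:
    """Recursively find .gguf model files under a directory (sketch)."""
    root = Path(directory)
    if not root.is_dir():
        return []  # missing or invalid directory -> empty model list
    return sorted(str(p) for p in root.rglob("*.gguf"))
```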
Both components can be compiled into standalone executables.
✅ No Python installation required ✅ Easy deployment across machines
Below is the main control panel of RPC Manager.
It provides full control over your distributed llama.cpp cluster:
- monitor node hardware in real time
- deploy binaries
- manage models
- configure launch parameters
- control cluster execution
Each connected node displays:
- CPU usage
- system RAM usage
- GPU model
- GPU VRAM usage
- GPU temperature
- live usage graphs
- RPC server status
You can also:
- deploy llama.cpp builds
- start / stop RPC servers
- manage individual nodes
Central configuration panel where you can:
- choose `.gguf` models
- scan model directories
- configure context size
- configure GPU layers
- enable Flash Attention
- configure KV cache types
- save reusable presets
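These settings map onto llama.cpp command-line flags. A sketch of turning a saved preset into flags; the preset keys are hypothetical, and flag spellings (`-c`, `-ngl`, `-fa`, `--cache-type-k/-v`) may vary between llama.cpp versions:

```python
def preset_to_flags(preset: dict) -> list:
    """Translate a saved preset into llama.cpp command-line flags (sketch)."""
    flags = ["-c", str(preset.get("ctx_size", 4096)),      # context size
             "-ngl", str(preset.get("gpu_layers", 0))]     # layers offloaded to GPU
    if preset.get("flash_attention"):
        flags.append("-fa")                                # enable Flash Attention
    if "kv_cache_type" in preset:
        t = preset["kv_cache_type"]
        flags += ["--cache-type-k", t, "--cache-type-v", t]
    return flags

print(preset_to_flags({"ctx_size": 8192, "gpu_layers": 99,
                       "flash_attention": True, "kv_cache_type": "q8_0"}))
```

Saving the preset as a dictionary like this is what makes "switch models instantly" cheap: only the flag list changes between launches.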
A live terminal showing the llama.cpp runtime logs.
From here you can:
- monitor model loading
- debug RPC connections
- see layer distribution across GPUs
- start or stop the entire cluster
The system consists of two applications.
                 +---------------------+
                 |      RPC Server     |
                 |    (Orchestrator)   |
                 |                     |
                 |     Flask Web UI    |
                 |   Cluster Control   |
                 +----------+----------+
                            |
                     RPC / WebSocket
                            |
          +-----------------+-----------------+
          |                 |                 |
  +---------------+ +---------------+ +---------------+
  |   RPC Agent   | |   RPC Agent   | |   RPC Agent   |
  |  Worker Node  | |  Worker Node  | |  Worker Node  |
  |  GPU Machine  | |  GPU Machine  | |  GPU Machine  |
  +---------------+ +---------------+ +---------------+
Runs on the main machine.
Responsibilities:
- cluster orchestration
- Web UI
- launching llama.cpp
- node management
- model selection
Runs on worker machines.
Responsibilities:
- telemetry reporting
- downloading binaries
- running RPC server
- responding to orchestration commands
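The agent's command handling can be sketched as a small dispatch table: the orchestrator sends JSON messages, and each action maps to a handler. The action names and payload fields below are assumptions, not the project's actual protocol.

```python
import json

# Hypothetical handlers; the real agent would spawn/kill the rpc-server process.
def handle_start(payload: dict) -> dict:
    return {"ok": True, "detail": f"rpc-server started on port {payload['port']}"}

def handle_stop(payload: dict) -> dict:
    return {"ok": True, "detail": "rpc-server stopped"}

HANDLERS = {"start_rpc": handle_start, "stop_rpc": handle_stop}

def dispatch(raw: str) -> dict:
    """Route one incoming command message to its handler (sketch)."""
    msg = json.loads(raw)
    handler = HANDLERS.get(msg.get("action"))
    if handler is None:
        return {"ok": False, "detail": f"unknown action: {msg.get('action')}"}
    return handler(msg.get("payload", {}))

print(dispatch('{"action": "start_rpc", "payload": {"port": 50052}}'))
```

A table-driven dispatcher keeps the agent dumb on purpose: all cluster-level decisions stay in the orchestrator, and adding a command is one new entry in the table.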
Download the latest binaries from the releases page:
https://github.com/arseniy0924/rpc_manager/releases
No Python installation is required.
Run the executables directly on your machines.
The easiest way to get started is using precompiled binaries.
Download:
RPC_Server.exe
from
[GitHub Releases](https://github.com/arseniy0924/rpc_manager/releases)
Run it:
RPC_Server.exe
Open your browser:
http://localhost:5000
On every worker PC:
Download:
RPC_Agent.exe
Run it.
The node will automatically appear in the dashboard.
In the Web UI:
- Select a llama.cpp build
- Click Apply
- Set your models directory
- Click Scan
- Choose a model
- Press Start Cluster
Alternatively, run both components directly from source with Python:
python server/app.py
python client/main.py
The project uses PyInstaller.
pyinstaller --clean --noconfirm --onefile --console --name "RPC_Server" \
--paths . \
--hidden-import "server" \
--collect-all "server" \
--collect-all "zeroconf" \
--collect-all "engineio" \
--collect-all "socketio" \
--collect-data "certifi" \
--add-data "server/templates;server/templates" \
--add-data "server/static;server/static" \
server/app.py
pyinstaller --noconfirm --onefile --console --name "RPC_Agent" \
--collect-all "zeroconf" \
client/main.py
- Python
- Flask
- Flask-SocketIO
- HTML5
- Vanilla JavaScript
- TailwindCSS
- Chart.js
- Zeroconf (mDNS)
- WebSockets
- psutil
- pynvml
- PyInstaller
Contributions are welcome.
If you want to improve the project:
- Fork the repository
- Create a feature branch
- Submit a pull request
You can also open an issue for bugs or feature requests.
This project is licensed under the MIT License.
See the LICENSE file for details.
If you find this project useful:
⭐ Star the repository • 🐛 Report issues • 💡 Suggest improvements
