A lightweight, distributed cluster management system built in Go for managing 100+ Ubuntu/Debian VMs in Proxmox environments. Features real-time node monitoring, file distribution across nodes, and a professional dark-mode dashboard accessible from any node.
- Zero-Configuration Join: Bootstrap cluster on one node, get a join token for all others
- Distributed Architecture: Built on etcd for reliable cluster state management
- Node Management: Join/leave cluster, real-time health monitoring with heartbeats
- File Distribution: Upload, download, and manage files across all nodes (perfect for certificate distribution)
- Web Dashboard: Professional dark-mode UI accessible from any node on port 8080
- CLI Tools: Comprehensive command-line interface for all operations
- Labels & Tagging: Organize nodes with custom labels (e.g., role=database,env=production)
# Install etcd
ETCD_VER=v3.6.5
wget https://github.com/etcd-io/etcd/releases/download/${ETCD_VER}/etcd-${ETCD_VER}-linux-amd64.tar.gz
tar xzvf etcd-${ETCD_VER}-linux-amd64.tar.gz
sudo mv etcd-${ETCD_VER}-linux-amd64/etcd* /usr/local/bin/
# Start etcd
etcd --listen-client-urls http://0.0.0.0:2379 \
--advertise-client-urls http://YOUR_IP:2379
# Build the agent
go build -o bin/cluster-agent ./cmd/agent
# Or use Make
make build
# Copy to all nodes
scp bin/cluster-agent root@node:/usr/local/bin/
# On your first node (e.g., your cert management VM)
cluster-agent init --etcd http://etcd-server-ip:2379
This will output something like:
╔════════════════════════════════════════════════════════════════╗
║               Cluster Initialized Successfully!                ║
╚════════════════════════════════════════════════════════════════╝

Node Information:
  ID:        abc123...
  Name:      cert-manager
  IP:        192.168.1.100
  Port:      8080
  Dashboard: http://192.168.1.100:8080

╔════════════════════════════════════════════════════════════════╗
║      Run this command on other nodes to join the cluster:      ║
╚════════════════════════════════════════════════════════════════╝

cluster-agent daemon eyJldGNkX2VuZHBvaW50cyI6WyIxOTIuMTY4LjEuMToyMzc5Il0sImNsdXN0ZXJfaWQiOiJjbHVzdGVyLWFiYzEyMyJ9
Copy the command from step 3 and run it on each of your other 99 nodes:
# On nodes 2-100
cluster-agent daemon eyJldGNkX2VuZHBvaW50cyI6WyIxOTIuMTY4LjEuMToyMzc5Il0sImNsdXN0ZXJfaWQiOiJjbHVzdGVyLWFiYzEyMyJ9
That's it! All nodes are now in the cluster with the dashboard running.
List all nodes:
cluster-agent list --token <your-join-token>
Leave cluster:
cluster-agent leave --token <your-join-token>
Upload a file (e.g., an SSL certificate):
cluster-agent file upload /path/to/cert.pem node1:8080
List files in the cluster:
cluster-agent file list --token <your-join-token>
Download a file from a node:
cluster-agent file download <file-id> node1:8080 /local/path/cert.pem
# On your certificate management VM
cluster-agent file upload /etc/ssl/certs/my-app.crt 192.168.1.100:8080
# The file is now tracked in the cluster
cluster-agent file list --token <token>
# On each target node, download the certificate
cluster-agent file download <file-id> 192.168.1.100:8080 /etc/ssl/certs/my-app.crt
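The file commands wrap the agent's REST API (described below), so the same upload can be scripted directly. Here is a minimal Go sketch; the multipart form field name "file" is an assumption, so check the handlers in pkg/api for the actual field name.

```go
// upload.go - sketch of uploading a certificate via POST /api/files.
// Assumption: the multipart field is named "file"; verify in pkg/api.
package main

import (
	"bytes"
	"fmt"
	"io"
	"mime/multipart"
	"net/http"
	"os"
)

func upload(agent, path string) error {
	f, err := os.Open(path)
	if err != nil {
		return err
	}
	defer f.Close()

	// Build the multipart/form-data body in memory.
	var body bytes.Buffer
	w := multipart.NewWriter(&body)
	part, err := w.CreateFormFile("file", path) // assumed field name
	if err != nil {
		return err
	}
	if _, err := io.Copy(part, f); err != nil {
		return err
	}
	w.Close()

	resp, err := http.Post("http://"+agent+"/api/files", w.FormDataContentType(), &body)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	fmt.Println("upload status:", resp.Status)
	return nil
}

func main() {
	if err := upload("192.168.1.100:8080", "/etc/ssl/certs/my-app.crt"); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```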
Access the dashboard from any node at http://node-ip:8080
Features:
- Real-time node status (updates every 5 seconds)
- Online/offline nodes visualization
- Node details (IP, labels, last seen)
- Professional dark mode interface
- No external dependencies
Create /etc/systemd/system/cluster-agent.service:
[Unit]
Description=Cluster Management Agent
After=network.target
[Service]
Type=simple
User=root
Environment="CLUSTER_TOKEN=eyJldGNkX2VuZHBvaW50cyI6WyIxOTIuMTY4LjEuMToyMzc5Il0sImNsdXN0ZXJfaWQiOiJjbHVzdGVyLWFiYzEyMyJ9"
ExecStart=/usr/local/bin/cluster-agent daemon $CLUSTER_TOKEN --labels role=app,env=prod
Restart=always
RestartSec=10
[Install]
WantedBy=multi-user.target
Enable and start:
sudo systemctl daemon-reload
sudo systemctl enable cluster-agent
sudo systemctl start cluster-agent
sudo systemctl status cluster-agent

┌─────────────────────────────────────────────────────────┐
│                      etcd Cluster                       │
│                (Distributed State Store)                │
└─────────────────────────────────────────────────────────┘
                     ▲       ▲       ▲
                     │       │       │
       ┌─────────────┘       │       └─────────────┐
       │                     │                     │
       ▼                     ▼                     ▼
┌─────────────┐       ┌─────────────┐       ┌─────────────┐
│   Node 1    │       │   Node 2    │       │   Node N    │
│ (init node) │       │             │       │             │
│    Agent    │       │    Agent    │       │    Agent    │
│    + API    │       │    + API    │       │    + API    │
│ + Dashboard │       │ + Dashboard │       │ + Dashboard │
│    :8080    │       │    :8080    │       │    :8080    │
└─────────────┘       └─────────────┘       └─────────────┘
- etcd: Distributed key-value store (3 nodes recommended for HA)
- Agent: Runs on each node, manages cluster membership and serves the dashboard (see the sketch below)
- Dashboard: Web UI for monitoring (embedded in agent)
- File Manager: Handles file distribution across the cluster
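Membership and heartbeats map naturally onto etcd leases: a node's record lives under a short-TTL lease, and each renewal is a heartbeat. Below is a minimal sketch of that pattern, assuming the go.etcd.io/etcd/client/v3 package; the /cluster/nodes/<id> key layout and 10-second TTL are illustrative, not necessarily what cluster-agent uses internally.

```go
// heartbeat.go - sketch of lease-based node registration in etcd.
// Assumptions: clientv3 package; /cluster/nodes/<id> key layout.
package main

import (
	"context"
	"log"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"http://192.168.1.1:2379"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	ctx := context.Background()

	// A 10s lease: if heartbeats stop, the key expires and the node
	// shows as offline (consistent with the ~10s recovery window
	// mentioned in the troubleshooting notes later).
	lease, err := cli.Grant(ctx, 10)
	if err != nil {
		log.Fatal(err)
	}

	// Register this node under the lease.
	_, err = cli.Put(ctx, "/cluster/nodes/abc123",
		`{"name":"cert-manager","ip":"192.168.1.100","port":8080}`,
		clientv3.WithLease(lease.ID))
	if err != nil {
		log.Fatal(err)
	}

	// KeepAlive renews the lease in the background; each renewal
	// acts as a heartbeat.
	ch, err := cli.KeepAlive(ctx, lease.ID)
	if err != nil {
		log.Fatal(err)
	}
	for range ch { // drain renewal acknowledgements
	}
}
```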
The join token is a base64-encoded JSON object containing:
- etcd endpoints
- Cluster ID
Example decoded token:
{
"etcd_endpoints": ["192.168.1.1:2379"],
"cluster_id": "cluster-abc123"
}
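Because the token shown above is plain unpadded base64 over this JSON, it can be inspected in a few lines of Go (a sketch; the struct fields mirror the example above):

```go
// decode_token.go - inspect a join token passed as the first argument.
// The example token in this README decodes with unpadded (raw)
// standard base64.
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
	"log"
	"os"
)

type joinToken struct {
	EtcdEndpoints []string `json:"etcd_endpoints"`
	ClusterID     string   `json:"cluster_id"`
}

func main() {
	raw, err := base64.RawStdEncoding.DecodeString(os.Args[1])
	if err != nil {
		log.Fatal(err)
	}
	var t joinToken
	if err := json.Unmarshal(raw, &t); err != nil {
		log.Fatal(err)
	}
	fmt.Printf("cluster %s via %v\n", t.ClusterID, t.EtcdEndpoints)
}
```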
The agent exposes a REST API on port 8080:
- GET /api/nodes - List all nodes
- GET /api/nodes/{id} - Get node details
- GET /api/health - Health check
- GET /api/files - List all files
- POST /api/files - Upload a file (multipart/form-data)
- GET /api/files/{id} - Get file metadata
- GET /api/files/{id}/download - Download file
- DELETE /api/files/{id} - Delete file
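Any of these endpoints can be scripted directly. Here is a sketch of listing nodes via GET /api/nodes; the response fields are assumptions based on what the dashboard displays, so adjust them to the actual payload.

```go
// list_nodes.go - sketch of calling GET /api/nodes on any agent.
// Assumption: the response is a JSON array with fields like those
// shown on the dashboard (id, name, ip, labels).
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"net/http"
)

type nodeInfo struct {
	ID     string            `json:"id"`
	Name   string            `json:"name"`
	IP     string            `json:"ip"`
	Labels map[string]string `json:"labels"`
}

func main() {
	resp, err := http.Get("http://192.168.1.100:8080/api/nodes")
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	var nodes []nodeInfo
	if err := json.NewDecoder(resp.Body).Decode(&nodes); err != nil {
		log.Fatal(err)
	}
	for _, n := range nodes {
		fmt.Printf("%-12s %-15s %v\n", n.Name, n.IP, n.Labels)
	}
}
```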
For production, run etcd in cluster mode (3 or 5 nodes):
# Node 1
etcd --name node1 \
--initial-advertise-peer-urls http://10.0.0.1:2380 \
--listen-peer-urls http://0.0.0.0:2380 \
--listen-client-urls http://0.0.0.0:2379 \
--advertise-client-urls http://10.0.0.1:2379 \
--initial-cluster node1=http://10.0.0.1:2380,node2=http://10.0.0.2:2380,node3=http://10.0.0.3:2380
Then bootstrap with all etcd nodes:
cluster-agent init --etcd http://10.0.0.1:2379,http://10.0.0.2:2379,http://10.0.0.3:2379
# Add labels during init
cluster-agent init --labels role=cert-manager,tier=management,env=prod
# Add labels when joining
cluster-agent daemon <token> --labels role=database,tier=data,env=prod
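The role=database,tier=data,env=prod syntax is plain comma-separated key=value pairs. A minimal sketch of how such a flag value might be parsed into a map (illustrative only; the agent's real parser may differ):

```go
// labels.go - sketch of parsing a --labels flag value into a map.
// Illustrative only; not necessarily the agent's actual parser.
package main

import (
	"fmt"
	"strings"
)

func parseLabels(s string) (map[string]string, error) {
	labels := make(map[string]string)
	for _, pair := range strings.Split(s, ",") {
		k, v, ok := strings.Cut(pair, "=")
		if !ok || k == "" {
			return nil, fmt.Errorf("invalid label %q, want key=value", pair)
		}
		labels[k] = v
	}
	return labels, nil
}

func main() {
	l, err := parseLabels("role=database,tier=data,env=prod")
	if err != nil {
		panic(err)
	}
	fmt.Println(l) // map[env:prod role:database tier:data]
}
```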
# Use a different port (default is 8080)
cluster-agent init --port 9090
cluster-agent daemon <token> --port 9090
Agent won't start:
- Check etcd is running: curl http://etcd-server:2379/health
- Verify the token is correct
- Check firewall allows ports 8080 and 2379
- Check logs: journalctl -u cluster-agent -f
Node shows offline:
- Heartbeat might have failed (recovers automatically in 10s)
- Check network connectivity to etcd
- Verify the node is still running: systemctl status cluster-agent
File transfer fails:
- Ensure source node is online
- Check target node can reach source node on port 8080
- Verify file exists on source node
Invalid join token:
- Ensure you copied the entire token
- Token is case-sensitive
- Re-run init to generate a new token if it was lost
.
├── cmd/
│ ├── agent/ # Agent CLI
│ └── server/ # Standalone server (optional)
├── pkg/
│ ├── api/ # REST API handlers
│ ├── cluster/ # Cluster management logic
│ │ └── token.go # Join token generation
│ ├── filemanager/ # File distribution
│ ├── node/ # Node models
│ └── store/ # etcd abstraction
├── web/
│ └── dashboard/ # Web UI (HTML/CSS/JS)
├── Makefile # Build automation
└── README.md
# Build agent
make agent
# Build server
make server
# Build both
make build
# Clean
make clean
# Run tests
make test
For manual deployment to multiple nodes:
# Create a nodes list file
cat > nodes.txt <<EOF
192.168.1.10
192.168.1.11
192.168.1.12
EOF
# Get your join token from init
TOKEN="your-token-here"
# Deploy to all nodes
while read node; do
echo "Deploying to $node..."
scp bin/cluster-agent root@$node:/usr/local/bin/
ssh root@$node "nohup cluster-agent daemon $TOKEN >/var/log/cluster-agent.log 2>&1 &"  # nohup so the agent survives the ssh session ending
done < nodes.txt
Proprietary - CodeCreation Labs
For issues or questions, contact your cluster administrator.
Pro Tip: Save your join token! You'll need it to add nodes, list nodes, and for administrative tasks. Store it in a password manager or environment variable.