A lightweight, distributed cluster management system built in Go for managing 100+ Ubuntu/Debian VMs in Proxmox environments. Features real-time node monitoring, file distribution across nodes, and a professional dark-mode dashboard accessible from any node.
- Zero-Configuration Join: Bootstrap cluster on one node, get a join token for all others
- Distributed Architecture: Built on etcd for reliable cluster state management
- Node Management: Join/leave cluster, real-time health monitoring with heartbeats
- File Distribution: Upload, download, and manage files across all nodes (perfect for certificate distribution)
- Web Dashboard: Professional dark-mode UI accessible from any node on port 8080
- CLI Tools: Comprehensive command-line interface for all operations
- Labels & Tagging: Organize nodes with custom labels (e.g., role=database,env=production)
# Install etcd
ETCD_VER=v3.6.5
wget https://github.com/etcd-io/etcd/releases/download/${ETCD_VER}/etcd-${ETCD_VER}-linux-amd64.tar.gz
tar xzvf etcd-${ETCD_VER}-linux-amd64.tar.gz
sudo mv etcd-${ETCD_VER}-linux-amd64/etcd* /usr/local/bin/
# Start etcd
etcd --listen-client-urls http://0.0.0.0:2379 \
--advertise-client-urls http://YOUR_IP:2379
# Build the agent
go build -o bin/cluster-agent ./cmd/agent
# Or use Make
make build
# Copy to all nodes
scp bin/cluster-agent root@node:/usr/local/bin/
# On your first node (e.g., your cert management VM)
cluster-agent init --etcd http://etcd-server-ip:2379
This will output something like:
╔════════════════════════════════════════════════════════════════╗
║               Cluster Initialized Successfully!                ║
╚════════════════════════════════════════════════════════════════╝

Node Information:
  ID:        abc123...
  Name:      cert-manager
  IP:        192.168.1.100
  Port:      8080
  Dashboard: http://192.168.1.100:8080

╔════════════════════════════════════════════════════════════════╗
║      Run this command on other nodes to join the cluster:      ║
╚════════════════════════════════════════════════════════════════╝

cluster-agent daemon eyJldGNkX2VuZHBvaW50cyI6WyIxOTIuMTY4LjEuMToyMzc5Il0sImNsdXN0ZXJfaWQiOiJjbHVzdGVyLWFiYzEyMyJ9
Copy the command from step 3 and run it on each of your other 99 nodes:
# On nodes 2-100
cluster-agent daemon eyJldGNkX2VuZHBvaW50cyI6WyIxOTIuMTY4LjEuMToyMzc5Il0sImNsdXN0ZXJfaWQiOiJjbHVzdGVyLWFiYzEyMyJ9
That's it! All nodes are now in the cluster with the dashboard running.
List all nodes:
cluster-agent list --token <your-join-token>
Leave cluster:
cluster-agent leave --token <your-join-token>
Upload a file (e.g., an SSL certificate):
cluster-agent file upload /path/to/cert.pem node1:8080
List files in the cluster:
cluster-agent file list --token <your-join-token>
Download a file from a node:
cluster-agent file download <file-id> node1:8080 /local/path/cert.pem
# On your certificate management VM
cluster-agent file upload /etc/ssl/certs/my-app.crt 192.168.1.100:8080
# The file is now tracked in the cluster
cluster-agent file list --token <token>
# On each target node, download the certificate
cluster-agent file download <file-id> 192.168.1.100:8080 /etc/ssl/certs/my-app.crt
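The file commands wrap the agent's REST API (described below), so the same upload can be scripted directly. Here is a minimal Go sketch; the multipart form field name "file" is an assumption, so check the handlers in pkg/api for the actual field name.

```go
// upload.go - sketch of uploading a certificate via POST /api/files.
// Assumption: the multipart field is named "file"; verify in pkg/api.
package main

import (
	"bytes"
	"fmt"
	"io"
	"mime/multipart"
	"net/http"
	"os"
)

func upload(agent, path string) error {
	f, err := os.Open(path)
	if err != nil {
		return err
	}
	defer f.Close()

	// Build the multipart/form-data body in memory.
	var body bytes.Buffer
	w := multipart.NewWriter(&body)
	part, err := w.CreateFormFile("file", path) // assumed field name
	if err != nil {
		return err
	}
	if _, err := io.Copy(part, f); err != nil {
		return err
	}
	w.Close()

	resp, err := http.Post("http://"+agent+"/api/files", w.FormDataContentType(), &body)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	fmt.Println("upload status:", resp.Status)
	return nil
}

func main() {
	if err := upload("192.168.1.100:8080", "/etc/ssl/certs/my-app.crt"); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```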
Access the dashboard from any node at http://node-ip:8080
Features:
- Real-time node status (updates every 5 seconds)
- Online/offline nodes visualization
- Node details (IP, labels, last seen)
- Professional dark mode interface
- No external dependencies
Create /etc/systemd/system/cluster-agent.service:
[Unit]
Description=Cluster Management Agent
After=network.target
[Service]
Type=simple
User=root
Environment="CLUSTER_TOKEN=eyJldGNkX2VuZHBvaW50cyI6WyIxOTIuMTY4LjEuMToyMzc5Il0sImNsdXN0ZXJfaWQiOiJjbHVzdGVyLWFiYzEyMyJ9"
ExecStart=/usr/local/bin/cluster-agent daemon $CLUSTER_TOKEN --labels role=app,env=prod
Restart=always
RestartSec=10
[Install]
WantedBy=multi-user.target
Enable and start:
sudo systemctl daemon-reload
sudo systemctl enable cluster-agent
sudo systemctl start cluster-agent
sudo systemctl status cluster-agent

┌─────────────────────────────────────────────────────────┐
│                      etcd Cluster                       │
│                (Distributed State Store)                │
└─────────────────────────────────────────────────────────┘
                     ▲       ▲       ▲
                     │       │       │
       ┌─────────────┘       │       └─────────────┐
       │                     │                     │
       ▼                     ▼                     ▼
┌─────────────┐       ┌─────────────┐       ┌─────────────┐
│   Node 1    │       │   Node 2    │       │   Node N    │
│ (init node) │       │             │       │             │
│    Agent    │       │    Agent    │       │    Agent    │
│    + API    │       │    + API    │       │    + API    │
│ + Dashboard │       │ + Dashboard │       │ + Dashboard │
│    :8080    │       │    :8080    │       │    :8080    │
└─────────────┘       └─────────────┘       └─────────────┘
- etcd: Distributed key-value store (3 nodes recommended for HA)
- Agent: Runs on each node, manages cluster membership and serves the dashboard (see the sketch below)
- Dashboard: Web UI for monitoring (embedded in agent)
- File Manager: Handles file distribution across the cluster
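Membership and heartbeats map naturally onto etcd leases: a node's record lives under a short-TTL lease, and each renewal is a heartbeat. Below is a minimal sketch of that pattern, assuming the go.etcd.io/etcd/client/v3 package; the /cluster/nodes/<id> key layout and 10-second TTL are illustrative, not necessarily what cluster-agent uses internally.

```go
// heartbeat.go - sketch of lease-based node registration in etcd.
// Assumptions: clientv3 package; /cluster/nodes/<id> key layout.
package main

import (
	"context"
	"log"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"http://192.168.1.1:2379"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	ctx := context.Background()

	// A 10s lease: if heartbeats stop, the key expires and the node
	// shows as offline (consistent with the ~10s recovery window
	// mentioned in the troubleshooting notes later).
	lease, err := cli.Grant(ctx, 10)
	if err != nil {
		log.Fatal(err)
	}

	// Register this node under the lease.
	_, err = cli.Put(ctx, "/cluster/nodes/abc123",
		`{"name":"cert-manager","ip":"192.168.1.100","port":8080}`,
		clientv3.WithLease(lease.ID))
	if err != nil {
		log.Fatal(err)
	}

	// KeepAlive renews the lease in the background; each renewal
	// acts as a heartbeat.
	ch, err := cli.KeepAlive(ctx, lease.ID)
	if err != nil {
		log.Fatal(err)
	}
	for range ch { // drain renewal acknowledgements
	}
}
```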
The join token is a base64-encoded JSON object containing:
- etcd endpoints
- Cluster ID
Example decoded token:
{
"etcd_endpoints": ["192.168.1.1:2379"],
"cluster_id": "cluster-abc123"
}
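Because the token shown above is plain unpadded base64 over this JSON, it can be inspected in a few lines of Go (a sketch; the struct fields mirror the example above):

```go
// decode_token.go - inspect a join token passed as the first argument.
// The example token in this README decodes with unpadded (raw)
// standard base64.
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
	"log"
	"os"
)

type joinToken struct {
	EtcdEndpoints []string `json:"etcd_endpoints"`
	ClusterID     string   `json:"cluster_id"`
}

func main() {
	raw, err := base64.RawStdEncoding.DecodeString(os.Args[1])
	if err != nil {
		log.Fatal(err)
	}
	var t joinToken
	if err := json.Unmarshal(raw, &t); err != nil {
		log.Fatal(err)
	}
	fmt.Printf("cluster %s via %v\n", t.ClusterID, t.EtcdEndpoints)
}
```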
The agent exposes a REST API on port 8080:
- GET /api/nodes - List all nodes
- GET /api/nodes/{id} - Get node details
- GET /api/health - Health check
- GET /api/files - List all files
- POST /api/files - Upload a file (multipart/form-data)
- GET /api/files/{id} - Get file metadata
- GET /api/files/{id}/download - Download file
- DELETE /api/files/{id} - Delete file
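Any of these endpoints can be scripted directly. Here is a sketch of listing nodes via GET /api/nodes; the response fields are assumptions based on what the dashboard displays, so adjust them to the actual payload.

```go
// list_nodes.go - sketch of calling GET /api/nodes on any agent.
// Assumption: the response is a JSON array with fields like those
// shown on the dashboard (id, name, ip, labels).
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"net/http"
)

type nodeInfo struct {
	ID     string            `json:"id"`
	Name   string            `json:"name"`
	IP     string            `json:"ip"`
	Labels map[string]string `json:"labels"`
}

func main() {
	resp, err := http.Get("http://192.168.1.100:8080/api/nodes")
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	var nodes []nodeInfo
	if err := json.NewDecoder(resp.Body).Decode(&nodes); err != nil {
		log.Fatal(err)
	}
	for _, n := range nodes {
		fmt.Printf("%-12s %-15s %v\n", n.Name, n.IP, n.Labels)
	}
}
```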
For production, run etcd in cluster mode (3 or 5 nodes):
# Node 1
etcd --name node1 \
--initial-advertise-peer-urls http://10.0.0.1:2380 \
--listen-peer-urls http://0.0.0.0:2380 \
--listen-client-urls http://0.0.0.0:2379 \
--advertise-client-urls http://10.0.0.1:2379 \
--initial-cluster node1=http://10.0.0.1:2380,node2=http://10.0.0.2:2380,node3=http://10.0.0.3:2380
Then bootstrap with all etcd nodes:
cluster-agent init --etcd http://10.0.0.1:2379,http://10.0.0.2:2379,http://10.0.0.3:2379
# Add labels during init
cluster-agent init --labels role=cert-manager,tier=management,env=prod
# Add labels when joining
cluster-agent daemon <token> --labels role=database,tier=data,env=prod
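The role=database,tier=data,env=prod syntax is plain comma-separated key=value pairs. A minimal sketch of how such a flag value might be parsed into a map (illustrative only; the agent's real parser may differ):

```go
// labels.go - sketch of parsing a --labels flag value into a map.
// Illustrative only; not necessarily the agent's actual parser.
package main

import (
	"fmt"
	"strings"
)

func parseLabels(s string) (map[string]string, error) {
	labels := make(map[string]string)
	for _, pair := range strings.Split(s, ",") {
		k, v, ok := strings.Cut(pair, "=")
		if !ok || k == "" {
			return nil, fmt.Errorf("invalid label %q, want key=value", pair)
		}
		labels[k] = v
	}
	return labels, nil
}

func main() {
	l, err := parseLabels("role=database,tier=data,env=prod")
	if err != nil {
		panic(err)
	}
	fmt.Println(l) // map[env:prod role:database tier:data]
}
```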
# Use a different port (default is 8080)
cluster-agent init --port 9090
cluster-agent daemon <token> --port 9090
Agent won't start:
- Check etcd is running: curl http://etcd-server:2379/health
- Verify the token is correct
- Check firewall allows ports 8080 and 2379
- Check logs: journalctl -u cluster-agent -f
Node shows offline:
- Heartbeat might have failed (recovers automatically in 10s)
- Check network connectivity to etcd
- Verify the node is still running: systemctl status cluster-agent
File transfer fails:
- Ensure source node is online
- Check target node can reach source node on port 8080
- Verify file exists on source node
Invalid join token:
- Ensure you copied the entire token
- Token is case-sensitive
- Re-run init to generate a new token if it was lost
.
├── cmd/
│ ├── agent/ # Agent CLI
│ └── server/ # Standalone server (optional)
├── pkg/
│ ├── api/ # REST API handlers
│ ├── cluster/ # Cluster management logic
│ │ └── token.go # Join token generation
│ ├── filemanager/ # File distribution
│ ├── node/ # Node models
│ └── store/ # etcd abstraction
├── web/
│ └── dashboard/ # Web UI (HTML/CSS/JS)
├── Makefile # Build automation
└── README.md
# Build agent
make agent
# Build server
make server
# Build both
make build
# Clean
make clean
# Run tests
make test
For manual deployment to multiple nodes:
# Create a nodes list file
cat > nodes.txt <<EOF
192.168.1.10
192.168.1.11
192.168.1.12
EOF
# Get your join token from init
TOKEN="your-token-here"
# Deploy to all nodes
while read node; do
echo "Deploying to $node..."
scp bin/cluster-agent root@$node:/usr/local/bin/
ssh root@$node "nohup cluster-agent daemon $TOKEN >/var/log/cluster-agent.log 2>&1 &"  # nohup so the agent survives the ssh session ending
done < nodes.txt
Proprietary - CodeCreation Labs
For issues or questions, contact your cluster administrator.
Pro Tip: Save your join token! You'll need it to add nodes, list nodes, and for administrative tasks. Store it in a password manager or environment variable.