Cloud-native control plane for distributed network device management
Lightweight gRPC-based control plane for managing distributed network devices (routers, access points, IoT gateways). Provides device registration, health monitoring, configuration management, and real-time telemetry collection.
graph TB
subgraph "Control Plane"
A[gRPC API Server]
B[Device Registry]
C[Health Check Manager]
D[Configuration Manager]
E[Telemetry Collector]
F[REST Gateway]
end
subgraph "Devices"
G[Router #1]
H[Access Point #1]
I[Gateway #1]
J[Switch #1]
K[IoT Device #1]
end
A --> B
A --> C
A --> D
A --> E
B --> F
D --> F
E --> F
G --> A
H --> A
I --> A
J --> A
K --> A
Features:
- gRPC-based device communication
- Real-time health monitoring
- Device registry with metadata
- Configuration distribution and management
- Telemetry collection and aggregation
- REST API for queries and monitoring
- Prometheus metrics
- Docker-based deployment
- Device command & control
- Health status tracking (online/offline/degraded)
- Fleet statistics and monitoring
Prerequisites:
- Go 1.21+
- Docker & Docker Compose
- protoc (Protocol Buffers compiler)
# Clone the repository
git clone <repository-url>
cd network-control-plane
# Install dependencies
make dev-setup
# Build the project
make build
# Or run with Docker Compose
make demo
# Terminal 1: Start the control plane server
make run-server
# Terminal 2: Start a simulated device
make run-device
# Build and start the full stack
make demo
# View logs
make docker-logs
# Query devices via REST API
make query
The control plane exposes a gRPC API for device communication:
- Register(RegisterRequest) returns (RegisterResponse): Device registration with metadata
- Heartbeat(HeartbeatRequest) returns (HeartbeatResponse): Periodic health status updates
- StreamTelemetry(stream TelemetryData) returns (TelemetryResponse): Real-time metrics and events
- GetConfiguration(ConfigRequest) returns (stream Configuration): Configuration distribution
- ExecuteCommand(Command) returns (CommandResponse): Remote command execution
- Deregister(DeregisterRequest) returns (DeregisterResponse): Device cleanup
The control plane provides a REST API for monitoring and management:
- GET /api/v1/devices - List all devices with status and metrics
- GET /api/v1/devices/{id} - Get specific device details and history
- GET /api/v1/stats - Fleet statistics by status and type
- GET /metrics - Prometheus metrics endpoint
- GET /health - Control plane health check
gRPC API Server:
- gRPC server for device communication
- Device registration and authentication
- Configuration distribution
- Telemetry collection
- Command/control interface
Device Registry:
- In-memory store for device state
- Device metadata management
- Query interface by status, type, etc.
- Health history tracking
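The registry's query interface can be sketched as a map guarded by a sync.RWMutex, so concurrent REST reads do not block one another. This is a hypothetical sketch: the `Registry` type, its `Upsert` and `Query` methods, and the status string values are assumptions, not the code in pkg/registry/.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Status mirrors the states named in this README; string values are assumed.
type Status string

const (
	Online   Status = "ONLINE"
	Offline  Status = "OFFLINE"
	Degraded Status = "DEGRADED"
	Unknown  Status = "UNKNOWN"
)

// Device is a hypothetical, trimmed-down registry record.
type Device struct {
	ID     string
	Type   string
	Status Status
}

// Registry is an in-memory store; readers share the RWMutex read lock.
type Registry struct {
	mu      sync.RWMutex
	devices map[string]Device
}

func NewRegistry() *Registry { return &Registry{devices: make(map[string]Device)} }

// Upsert inserts or replaces a device record.
func (r *Registry) Upsert(d Device) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.devices[d.ID] = d
}

// Query returns devices matching every non-empty filter, sorted by ID so
// results are stable (Go map iteration order is randomized).
func (r *Registry) Query(status Status, devType string) []Device {
	r.mu.RLock()
	defer r.mu.RUnlock()
	var out []Device
	for _, d := range r.devices {
		if status != "" && d.Status != status {
			continue
		}
		if devType != "" && d.Type != devType {
			continue
		}
		out = append(out, d)
	}
	sort.Slice(out, func(i, j int) bool { return out[i].ID < out[j].ID })
	return out
}

func main() {
	reg := NewRegistry()
	reg.Upsert(Device{ID: "r2", Type: "router", Status: Online})
	reg.Upsert(Device{ID: "r1", Type: "router", Status: Online})
	reg.Upsert(Device{ID: "ap1", Type: "access_point", Status: Offline})
	for _, d := range reg.Query(Online, "router") {
		fmt.Println(d.ID) // prints r1, then r2
	}
}
```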
Health Check Manager:
- Periodic health checks (configurable interval)
- Device status tracking (online/offline/degraded/unknown)
- Failure detection and alerting
- Heartbeat monitoring with timeout detection
Configuration Manager:
- Configuration distribution to devices
- Version tracking for configurations
- Bulk configuration updates by device type/tag
- Configuration rollback capability
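Version tracking and rollback fit together naturally if every pushed configuration is kept. In this sketch (hypothetical `ConfigStore` type, in-memory only) a rollback does not delete newer versions; it re-publishes the old body as a new version, so history stays append-only and version numbers only increase. Bulk updates by type or tag would call `Push` for each device returned by a registry query.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ConfigStore keeps every configuration version per device.
type ConfigStore struct {
	mu       sync.Mutex
	versions map[string][]string // device ID -> bodies; index i holds version i+1
}

func NewConfigStore() *ConfigStore { return &ConfigStore{versions: map[string][]string{}} }

// Push stores a new configuration and returns its version number.
func (s *ConfigStore) Push(deviceID, body string) int {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.versions[deviceID] = append(s.versions[deviceID], body)
	return len(s.versions[deviceID])
}

// Current returns the latest configuration and its version (0 if none).
func (s *ConfigStore) Current(deviceID string) (string, int) {
	s.mu.Lock()
	defer s.mu.Unlock()
	v := s.versions[deviceID]
	if len(v) == 0 {
		return "", 0
	}
	return v[len(v)-1], len(v)
}

// Rollback re-publishes version `to` as a new version, preserving history.
func (s *ConfigStore) Rollback(deviceID string, to int) (int, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	v := s.versions[deviceID]
	if to < 1 || to > len(v) {
		return 0, errors.New("unknown config version")
	}
	s.versions[deviceID] = append(v, v[to-1])
	return len(s.versions[deviceID]), nil
}

func main() {
	s := NewConfigStore()
	s.Push("router-1", "mtu=1500")
	s.Push("router-1", "mtu=9000")
	v, err := s.Rollback("router-1", 1)
	if err != nil {
		fmt.Println("rollback failed:", err)
		return
	}
	body, cur := s.Current("router-1")
	fmt.Println(v, cur, body) // prints: 3 3 mtu=1500
}
```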
Telemetry Collector:
- Receive and process telemetry streams from devices
- Time-series buffer for recent telemetry data
- Aggregation and filtering capabilities
- Integration with Prometheus metrics
REST Gateway:
- HTTP REST interface for queries
- Prometheus metrics endpoint
- Admin operations
- Demo/visualization support
The server accepts the following command-line flags:
./server --port 50051 --rest-port 8080
- --port: gRPC server port (default: 50051)
- --rest-port: REST API server port (default: 8080)
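Both flags map directly onto Go's standard flag package, which accepts both `-port` and `--port` spellings. A sketch of the parsing, using a dedicated FlagSet so it can be exercised with arbitrary arguments; the `serverConfig` type and the check that the two ports differ are assumptions, not documented behavior.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// serverConfig holds the two documented flags (hypothetical type).
type serverConfig struct {
	GRPCPort int
	RESTPort int
}

// parseFlags parses args with the documented defaults and rejects
// a configuration where both servers would bind the same port.
func parseFlags(args []string) (serverConfig, error) {
	var cfg serverConfig
	fs := flag.NewFlagSet("server", flag.ContinueOnError)
	fs.IntVar(&cfg.GRPCPort, "port", 50051, "gRPC server port")
	fs.IntVar(&cfg.RESTPort, "rest-port", 8080, "REST API server port")
	if err := fs.Parse(args); err != nil {
		return serverConfig{}, err
	}
	if cfg.GRPCPort == cfg.RESTPort {
		return serverConfig{}, fmt.Errorf("--port and --rest-port must differ (both %d)", cfg.GRPCPort)
	}
	return cfg, nil
}

func main() {
	cfg, err := parseFlags(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Printf("gRPC on :%d, REST on :%d\n", cfg.GRPCPort, cfg.RESTPort)
}
```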
The control plane exposes Prometheus metrics at /metrics. Key metrics include:
- control_plane_devices_total - Number of registered devices by status and type
- control_plane_heartbeats_total - Total heartbeats received by device type
- control_plane_registrations_total - Total device registrations
- control_plane_grpc_duration_seconds - gRPC request duration histogram
Use cases:
- IoT Device Fleet Management - Manage thousands of edge devices
- Network Infrastructure - Monitor router/AP mesh networks
- Edge Computing - Coordinate edge computing nodes
- Firmware Updates - Orchestrate device updates across fleet
- Health Monitoring - Real-time device status and performance tracking
Performance:
- Supports 1000+ concurrent device connections
- Sub-50ms health check processing
- 10K events/sec telemetry ingestion
- Sub-100ms API query response (p95)
Functional requirements:
- Devices register with the control plane on startup (FR-1.1)
- Provide device metadata (type, MAC, IP, firmware version, capabilities) (FR-1.2)
- Control plane assigns unique device ID (FR-1.3)
- Support device re-registration (handle restarts) (FR-1.4)
- Deregister devices gracefully on shutdown (FR-1.5)
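Re-registration (FR-1.4) needs a stable identity key so that a restarted device receives the same ID. This sketch assumes the MAC address is that key and generates IDs (FR-1.3) as 128-bit random hex strings with crypto/rand; the `Registrar` type and both choices are hypothetical.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"strings"
	"sync"
)

// Registrar maps a normalized MAC address to an assigned device ID.
type Registrar struct {
	mu    sync.Mutex
	byMAC map[string]string
}

func NewRegistrar() *Registrar { return &Registrar{byMAC: map[string]string{}} }

// normalizeMAC makes "AA:BB:.." and " aa:bb:.. " compare equal.
func normalizeMAC(mac string) string { return strings.ToLower(strings.TrimSpace(mac)) }

// newID returns a random 128-bit identifier as 32 hex characters.
func newID() (string, error) {
	b := make([]byte, 16)
	if _, err := rand.Read(b); err != nil {
		return "", err
	}
	return hex.EncodeToString(b), nil
}

// Register returns the device ID and whether it was a re-registration.
func (r *Registrar) Register(mac string) (string, bool, error) {
	key := normalizeMAC(mac)
	r.mu.Lock()
	defer r.mu.Unlock()
	if id, ok := r.byMAC[key]; ok {
		return id, true, nil // known device, e.g. after a restart
	}
	id, err := newID()
	if err != nil {
		return "", false, err
	}
	r.byMAC[key] = id
	return id, false, nil
}

// Deregister forgets the device (FR-1.5); a later Register issues a new ID.
func (r *Registrar) Deregister(mac string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	delete(r.byMAC, normalizeMAC(mac))
}

func main() {
	r := NewRegistrar()
	id1, _, err := r.Register("AA:BB:CC:00:11:22")
	if err != nil {
		fmt.Println("register failed:", err)
		return
	}
	id2, again, _ := r.Register("aa:bb:cc:00:11:22") // same device after a restart
	fmt.Println(id1 == id2, again)                    // prints: true true
}
```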
- Periodic health checks (configurable interval, default 30s) (FR-2.1)
- Track device status: ONLINE, OFFLINE, DEGRADED, UNKNOWN (FR-2.2)
- Mark devices that have not responded within 90s (three missed checks at the default 30s interval) as OFFLINE (FR-2.3)
- Collect device metrics (CPU, memory, uptime, bandwidth) (FR-2.4)
- Maintain health history (last 100 health checks) (FR-2.5)
- Push configuration updates to devices (FR-3.1)
- Devices acknowledge configuration receipt (FR-3.2)
- Support configuration rollback (FR-3.3)
- Version tracking for configurations (FR-3.4)
- Bulk configuration updates by device type/tag (FR-3.5)
- Devices stream telemetry data (metrics, logs, events) (FR-4.1)
- Aggregate telemetry at control plane (FR-4.2)
- Expose metrics via Prometheus endpoint (FR-4.3)
- Store recent telemetry (time-series buffer) (FR-4.4)
- REST API to list all devices (FR-5.1)
- Query devices by status, type, location, tags (FR-5.2)
- Get individual device details (FR-5.3)
- Get aggregated fleet statistics (FR-5.4)
- Search devices by MAC/IP/hostname (FR-5.5)
- Send commands to devices (reboot, update firmware, run diagnostics) (FR-6.1)
- Track command execution status (FR-6.2)
- Command timeout and retry logic (FR-6.3)
- Command audit logging (FR-6.4)
network-control-plane/
├── api/
│ └── proto/ # Protocol buffer definitions
├── cmd/
│ ├── server/ # Control plane server entry point
│ ├── device/ # Device simulator
│ └── cli/ # CLI tool for admin operations
├── pkg/
│ ├── server/ # Server implementation
│ ├── registry/ # Device registry
│ ├── health/ # Health monitoring
│ ├── config/ # Configuration management
│ ├── telemetry/ # Telemetry collection
│ └── device/ # Device-related utilities
├── internal/
│ ├── auth/ # Authentication logic
│ ├── metrics/ # Prometheus metrics
│ └── logger/ # Logging setup
├── test/
│ ├── integration/ # End-to-end tests
│ └── load/ # Load testing
├── deploy/
│ ├── kubernetes/ # Kubernetes deployment configs
│ └── prometheus/ # Prometheus configuration
├── examples/ # Example configurations
├── scripts/ # Utility scripts
└── Makefile # Build and deployment commands
# Generate protobuf code
make proto
# Build binaries
make build
# Run tests
make test
# Run integration tests
make test-integration
The project includes:
- Unit tests for core components
- Integration tests for component interactions
- Load tests for performance validation
Run all tests:
make test
The demo includes:
- 1 Control Plane server
- 4 simulated devices (2 routers, 1 access point, 1 gateway)
- Prometheus monitoring
- All connected via a virtual network
Start the demo:
make demo
Use docker-compose.yml for local development and testing with multiple devices.
Kubernetes deployment configs for production are available in deploy/kubernetes/.
- Persistent storage (etcd/PostgreSQL)
- Firmware update orchestration
- Authentication and authorization (mTLS)
- Device command audit trails
- Advanced configuration templating
- Real-time alerting system
License: MIT