A comprehensive streaming test and power monitoring stack for analyzing energy consumption during video transcoding. Features high-performance Go exporters, VictoriaMetrics for production-grade telemetry, and distributed compute capabilities for scaling workloads across multiple nodes.

Production deployment uses a master-agent architecture (no Docker required). Docker Compose is available for local development only.
This project is organized into three main directories for clarity:
- master/ - Master node components (orchestration, monitoring, visualization)
- worker/ - Worker node components (transcoding, hardware metrics)
- shared/ - Shared libraries, scripts, and documentation
See ARCHITECTURE.md for system architecture and design.
For local testing, use the automated script to run both master and agent on your machine:
# One-command setup: builds, runs, and verifies everything
./scripts/run_local_stack.sh
This will compile all binaries, start the master and agent, and display helpful commands. See docs/LOCAL_STACK_GUIDE.md for details.
The recommended way to deploy for production workloads is Distributed Compute Mode with master and agent nodes.
- Go 1.21+ (for building binaries)
- Python 3.10+ (for agent analysis scripts)
- FFmpeg (for transcoding)
- Linux with kernel 4.15+ (for RAPL power monitoring)
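Before building, a quick sanity check like the following (illustrative; adjust to your environment) confirms the prerequisites are in place:

```bash
# Verify toolchain versions
go version          # expect go1.21 or newer
python3 --version   # expect Python 3.10+
ffmpeg -version | head -n 1
# RAPL power monitoring requires the intel-rapl powercap interface
ls /sys/class/powercap/ | grep intel-rapl || echo "RAPL not available on this machine"
```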
# Clone, build and run the required parts of the stack
git clone https://github.com/psantana5/ffmpeg-rtmp.git
cd ffmpeg-rtmp
docker compose up -d nginx-rtmp
make build-master
# Set API key (required for production)
export MASTER_API_KEY=$(openssl rand -base64 32)
# Start master service with production defaults
# TLS enabled (auto-generates cert)
# SQLite persistence (master.db)
# Job retry (3 attempts)
# Prometheus metrics (:9090)
./bin/master --port 8080 &
# Start monitoring stack (VictoriaMetrics + Grafana)
make vm-up-build
# On compute node(s)
git clone https://github.com/psantana5/ffmpeg-rtmp.git
cd ffmpeg-rtmp
make build-agent
# Set same API key as master
export MASTER_API_KEY="<same-key-as-master>"
# Generate a test video file
ffmpeg -y -f lavfi -i testsrc2=size=3840x2160:rate=60 -t 30 -c:v libx264 -preset veryfast -crf 18 /tmp/test_input.mp4
# Register and start agent (uses HTTPS with TLS)
./bin/agent --register --master https://MASTER_IP:8080 --api-key "$MASTER_API_KEY"
# Submit job to master (requires API key)
curl -X POST https://MASTER_IP:8080/jobs \
-H "Authorization: Bearer $MASTER_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"scenario": "1080p-test",
"confidence": "auto",
"parameters": {"duration": 300, "bitrate": "5000k"}
}'
# Agent automatically picks up and executes job
# Failed jobs auto-retry up to 3 times
- Grafana: http://MASTER_IP:3000 (admin/admin)
- VictoriaMetrics: http://MASTER_IP:8428
- Master API: https://MASTER_IP:8080/nodes (view registered nodes)
- Prometheus Metrics: http://MASTER_IP:9090/metrics
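A quick end-to-end check of the deployment (illustrative, not part of the official tooling; `-k` is needed because the auto-generated certificate is self-signed):

```bash
# List registered agents and jobs (API key required for all endpoints)
curl -k -H "Authorization: Bearer $MASTER_API_KEY" https://MASTER_IP:8080/nodes
curl -k -H "Authorization: Bearer $MASTER_API_KEY" https://MASTER_IP:8080/jobs
```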
See deployment/README.md for systemd service templates and production setup.
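As a rough sketch of what such a service template might look like (hypothetical paths and options; use the actual templates from deployment/README.md):

```ini
# /etc/systemd/system/ffmpeg-master.service (illustrative sketch)
[Unit]
Description=ffmpeg-rtmp master service
After=network-online.target

[Service]
Environment=MASTER_API_KEY=<your-key>
ExecStart=/usr/local/bin/master --port 8080
Restart=on-failure

[Install]
WantedBy=multi-user.target
```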
For development and local testing only, you can use Docker Compose to run all components on a single machine.
Important: Docker Compose mode is NOT recommended for production. Use Distributed Mode above for production workloads.
- Docker 20.10+ and Docker Compose 2.0+
- Python 3.10+
- FFmpeg
# Clone repository
git clone https://github.com/psantana5/ffmpeg-rtmp.git
cd ffmpeg-rtmp
# Start all services
make up-build
# Build the CLI tool first
go build -o bin/ffrtmp ./cmd/ffrtmp
# Run a simple transcoding job
./bin/ffrtmp jobs submit --scenario "test1" --bitrate 2000k --duration 60
# View dashboards at http://localhost:3000
See shared/docs/DEPLOYMENT_MODES.md for detailed comparison and setup instructions.
For running exporters without Docker, see:
- Exporters Quick Reference - Quick commands and setup
- Master Exporters Guide - Detailed Python exporter deployment
- Worker Exporters Guide - Detailed Go exporter deployment
Distributed mode is now production-ready with enterprise features:
- TLS/HTTPS - Enabled by default with auto-generated certificates
- API Authentication - Required via MASTER_API_KEY environment variable
- SQLite Persistence - Default storage, survives restarts
- Automatic Job Retry - Failed jobs retry up to 3 times
- Prometheus Metrics - Built-in metrics endpoint on port 9090
- Structured Logging - Production-grade logging support
See shared/docs/PRODUCTION_FEATURES.md for complete feature guide.
Production-ready reliability features for mission-critical workloads:
- Node Failure Detection - Identifies dead nodes based on heartbeat timeout (2min default)
- Automatic Job Reassignment - Jobs from failed nodes automatically reassigned to healthy workers
- Transient Failure Retry - Smart retry for connection errors, timeouts, network issues
- Configurable Max Retries - Default 3 attempts with exponential backoff
- Stale Job Detection - Batch jobs timeout after 30min, live jobs after 5min inactivity
- Multi-Level Priorities - Live > High > Medium > Low > Batch
- Queue-Based Scheduling - live, default, and batch queues with different SLAs
- FIFO Within Priority - Fair scheduling for same-priority jobs
- Smart Job Selection - Automatic priority-based job assignment
- Distributed Tracing - OpenTelemetry integration for end-to-end visibility
- Prometheus Metrics - Comprehensive metrics for jobs, nodes, and system health
- Structured Logging - Production-grade logging with context
- Rate Limiting - Built-in per-IP rate limiting (100 req/s default)
- TLS/mTLS - Mutual TLS authentication between master and workers
- API Key Authentication - Required for all API operations
- Certificate Management - Auto-generation and rotation support
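Two quick checks of these features (the exact metric names depend on the build, so treat the `grep` pattern as an assumption):

```bash
# Scrape the built-in Prometheus endpoint (port 9090, see above)
curl -s http://MASTER_IP:9090/metrics | grep -i job
# API key authentication: an unauthenticated request should be rejected
curl -k https://MASTER_IP:8080/jobs   # expect an auth error without the Bearer token
```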
# Submit high-priority live stream job
./bin/ffrtmp jobs submit \
--scenario live-4k \
--queue live \
--priority high \
--duration 3600
# Configure fault tolerance
./bin/master \
--max-retries 5 \
--scheduler-interval 10s \
--heartbeat-interval 30s
See docs/PRODUCTION.md for the complete production deployment guide.
Choose the best transcoding engine for your workload:
- FFmpeg (default) - Versatile, mature, excellent for file transcoding
- GStreamer - Optimized for low-latency live streaming
- Intelligent Auto-Selection - System picks the best engine automatically
- Hardware Acceleration - NVIDIA NVENC, Intel QSV/VAAPI support for both engines
# Auto-select best engine (default)
ffrtmp jobs submit --scenario live-stream --engine auto
# Force specific engine
ffrtmp jobs submit --scenario transcode --engine ffmpeg
ffrtmp jobs submit --scenario live-rtmp --engine gstreamer
Auto-selection logic:
- LIVE queue → GStreamer (low latency)
- FILE/batch → FFmpeg (better for offline)
- RTMP streaming → GStreamer
- GPU+NVENC+streaming → GStreamer
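For example, routing a job through the live queue should make auto-selection pick GStreamer per the rules above (the scenario name is illustrative):

```bash
ffrtmp jobs submit --scenario live-webinar --queue live --engine auto
```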
See docs/DUAL_ENGINE_SUPPORT.md for complete documentation.
This project helps you:
- Run FFmpeg streaming tests with various configurations (bitrate, resolution, codec)
- Monitor power consumption in real-time using Intel RAPL (see the example after this list)
- Collect system metrics (CPU, memory, network, Docker overhead)
- Analyze energy efficiency and get recommendations for optimal transcoding settings
- Visualize results in Grafana dashboards
- Set up alerts for power thresholds
- Scale workloads across multiple compute nodes (NEW in v2.1)
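As a concrete illustration of the RAPL-based power monitoring, the package energy counter can be sampled directly from sysfs (the standard Linux powercap interface; reading it may require root):

```bash
# Sample the CPU package energy counter (microjoules) twice and derive
# average power; ignores counter wraparound for simplicity.
E1=$(cat /sys/class/powercap/intel-rapl:0/energy_uj)
sleep 5
E2=$(cat /sys/class/powercap/intel-rapl:0/energy_uj)
echo "Average package power: $(( (E2 - E1) / 5 / 1000000 )) W"
```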
The system supports two deployment modes:
Master-agent architecture for scaling across multiple nodes:
- Master Node: Job orchestration, metrics aggregation, dashboards
- Master Service (Go HTTP API)
- VictoriaMetrics (TSDB with 30-day retention)
- Grafana (visualization)
- Compute Agents: Execute transcoding workloads
- Hardware auto-detection
- Job polling and execution
- Local metrics collection
- Results reporting
Docker Compose stack on single machine:
- Nginx RTMP: Streaming server
- VictoriaMetrics: Time-series database
- Grafana: Dashboards
- Go Exporters: CPU (RAPL), GPU (NVML), FFmpeg stats
- Python Exporters: QoE metrics, cost analysis, results tracking
- Alertmanager: Alert routing
Local Testing mode is for development only. Use Distributed Compute mode for production.
See shared/docs/DEPLOYMENT_MODES.md for detailed comparison and architecture diagrams.
Documentation is organized by topic:
- Dual Engine Support - ⚡ NEW: FFmpeg + GStreamer engine selection guide
- Production Features - Production-ready features guide (TLS, auth, retry, metrics)
- Deployment Modes - Production vs development deployment guide
- Internal Architecture - Complete runtime model and operations reference
- Distributed Architecture - Distributed compute details
- Production Deployment - Systemd service templates and setup
- Getting Started Guide - Initial setup walkthrough
- Running Tests - Test scenarios and batch execution
- Go Exporters Quick Start - One-command Go exporter deployment
- Troubleshooting - Common issues and solutions
- Architecture Overview - System design and data flow
- Exporters Quick Reference - Quick commands for deploying exporters without Docker
- Exporters Overview - Master exporters (results, qoe, cost)
- Master Exporters Manual Deployment - Running master exporters without Docker
- Worker Exporters - Worker exporters (CPU, GPU, FFmpeg)
- Worker Exporters Manual Deployment - Running worker exporters without Docker
- Energy Advisor - ML models and efficiency scoring
- Documentation Index - All technical documentation
# Build binaries
make build-master # Build master node binary
make build-agent # Build compute agent binary
make build-distributed # Build both
# Run services
./bin/master --port 8080 # Start master
./bin/agent --register --master http://MASTER_IP:8080 # Start agent
# Production with systemd
sudo systemctl start ffmpeg-master # Start master service
sudo systemctl start ffmpeg-agent # Start agent service
sudo systemctl status ffmpeg-master # Check status
# Monitor
curl http://localhost:8080/nodes # List registered agents
curl http://localhost:8080/jobs # List jobs
journalctl -u ffmpeg-master -f # View master logs
journalctl -u ffmpeg-agent -f # View agent logs
# Stack management
make up-build # Start Docker Compose stack
make down # Stop stack
make ps # Show container status
make logs SERVICE=victoriametrics # View specific service logs
# Testing (local mode)
make test-single # Run single stream test
make test-batch # Run batch test matrix
make run-benchmarks # Run automated benchmark suite
make analyze # Analyze latest results
# Development
make lint # Run code linting
make format # Format code
make test # Run test suite
Run long-duration benchmarks across multiple compute nodes:
# Submit multiple jobs to master
curl -X POST http://master:8080/jobs -H "Content-Type: application/json" -d '{
"scenario": "4K-h265", "confidence": "auto",
"parameters": {"duration": 3600, "bitrate": "15000k"}
}'
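Several such jobs can be submitted in one go with a loop like the following (scenario names are illustrative):

```bash
for s in 4K-h265 1080p-h264 720p-h264; do
  curl -X POST http://master:8080/jobs -H "Content-Type: application/json" \
    -d "{\"scenario\": \"$s\", \"confidence\": \"auto\", \"parameters\": {\"duration\": 3600}}"
done
```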
# Agents automatically pick up and execute jobs in parallel
# View results in Grafana at http://master:3000
Use local testing mode to iterate quickly:
# Start local stack
make up-build
# Submit multiple test jobs with different configurations
ffrtmp jobs submit --scenario "4K60-h264" --bitrate 10M --duration 120
ffrtmp jobs submit --scenario "1080p60-h265" --bitrate 5M --duration 60
ffrtmp jobs submit --scenario "720p30-h264" --bitrate 2M --duration 60
# Analyze results and get recommendations
python3 scripts/analyze_results.py
The analyzer ranks configurations by energy efficiency and recommends optimal settings.
Submit jobs to test different codecs:
# H.264 tests
ffrtmp jobs submit --scenario "4K60-h264" --bitrate 10M --duration 120
ffrtmp jobs submit --scenario "1080p60-h264" --bitrate 5M --duration 60
# H.265 tests
ffrtmp jobs submit --scenario "4K60-h265" --bitrate 10M --duration 120
ffrtmp jobs submit --scenario "1080p60-h265" --bitrate 5M --duration 60
# Compare results in Grafana dashboards
Deploy distributed mode with agents on your build servers:
# CI/CD pipeline submits jobs to master after each release
curl -X POST http://master:8080/jobs -d @benchmark_config.json
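Here, benchmark_config.json would follow the same job schema used throughout this README, for example:

```json
{
  "scenario": "4K-h265",
  "confidence": "auto",
  "parameters": {"duration": 3600, "bitrate": "15000k"}
}
```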
# Results automatically aggregated and visualized
# Alerts fire if performance regressions detected
Contributions are welcome! See the detailed documentation for development guidelines.
See the LICENSE file for details.
The project includes comprehensive test coverage for critical components:
# Run all tests with race detector
cd shared/pkg
go test -v -race ./...
# Run tests with coverage report
go test -v -coverprofile=coverage.out ./models ./scheduler ./store
go tool cover -html=coverage.out
Test Coverage:
- models: 85% (FSM state machine fully tested)
- scheduler: 53% (priority queues, recovery logic)
- store: Comprehensive database operations tests
- agent: Engine selection, optimizers, encoders
CI/CD:
- Automated testing on every push
- Race condition detection
- Multi-architecture builds (amd64, arm64)
- Binary artifacts for master, worker, and CLI
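A cross-compilation sketch for the multi-architecture builds mentioned above (the ./cmd/master package path is an assumption based on the repository layout):

```bash
GOOS=linux GOARCH=arm64 go build -o bin/master-arm64 ./cmd/master
```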
See CONTRIBUTING.md for testing guidelines.
Core documentation has been streamlined for clarity:
- QUICKSTART.md - Get started in 5 minutes
- docs/ARCHITECTURE.md - System design and architecture
- docs/API.md - Complete API reference
- DEPLOYMENT.md - Production deployment guide
- CONTRIBUTING.md - Contribution guidelines
- docs/SECURITY.md - Security best practices
- docs/LOCAL_STACK_GUIDE.md - Local development setup
- CHANGELOG.md - Version history
Additional technical documentation is available in docs/archive/ for reference.