FFmpeg RTMP Power Monitoring


A comprehensive streaming test and power monitoring stack for analyzing energy consumption during video transcoding. Features high-performance Go exporters, VictoriaMetrics for production-grade telemetry, and distributed compute capabilities for scaling workloads across multiple nodes.

Production deployment uses a master-agent architecture (no Docker required); Docker Compose is available for local development only.

Project Organization

This project is organized into three main directories for clarity:

  • master/ - Master node components (orchestration, monitoring, visualization)
  • worker/ - Worker node components (transcoding, hardware metrics)
  • shared/ - Shared libraries, scripts, and documentation

See ARCHITECTURE.md for system architecture and design.

Quick Start (Local Development)

For local testing, use the automated script to run both master and agent on your machine:

# One-command setup: builds, runs, and verifies everything
./scripts/run_local_stack.sh

This will compile all binaries, start master+agent, and display helpful commands. See docs/LOCAL_STACK_GUIDE.md for details.
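
Once the script reports success, a quick sanity check (a minimal sketch; binary names follow the build targets used later in this README):

# Both services should be running locally
pgrep -a -f bin/master
pgrep -a -f bin/agent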

Quick Start (Production - Distributed Mode)

The recommended production deployment is Distributed Compute Mode, with a master node and one or more agent nodes.

Prerequisites

  • Go 1.21+ (for building binaries)
  • Python 3.10+ (for agent analysis scripts)
  • FFmpeg (for transcoding)
  • Linux with kernel 4.15+ (for RAPL power monitoring)

Deploy Master Node

# Clone, build and run the required parts of the stack
git clone https://github.com/psantana5/ffmpeg-rtmp.git
cd ffmpeg-rtmp
docker compose up -d nginx-rtmp
make build-master

# Set API key (required for production)
export MASTER_API_KEY=$(openssl rand -base64 32)

# Start master service with production defaults
# TLS enabled (auto-generates cert)
# SQLite persistence (master.db)
# Job retry (3 attempts)
# Prometheus metrics (:9090)
./bin/master --port 8080 &

# Start monitoring stack (VictoriaMetrics + Grafana)
make vm-up-build

Deploy Compute Agent(s)

# On compute node(s)
git clone https://github.com/psantana5/ffmpeg-rtmp.git
cd ffmpeg-rtmp
make build-agent

# Set same API key as master
export MASTER_API_KEY="<same-key-as-master>"

# Generate a test video file
ffmpeg -y -f lavfi -i testsrc2=size=3840x2160:rate=60 -t 30 -c:v libx264 -preset veryfast -crf 18 /tmp/test_input.mp4

# Register and start agent (uses HTTPS with TLS)
./bin/agent --register --master https://MASTER_IP:8080 --api-key "$MASTER_API_KEY"

Submit and Run Job

# Submit job to master (requires API key)
curl -X POST https://MASTER_IP:8080/jobs \
  -H "Authorization: Bearer $MASTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "scenario": "1080p-test",
    "confidence": "auto",
    "parameters": {"duration": 300, "bitrate": "5000k"}
  }'

# Agent automatically picks up and executes job
# Failed jobs auto-retry up to 3 times

Access Dashboards

  • Grafana: http://MASTER_IP:3000 (admin/admin)
  • VictoriaMetrics: http://MASTER_IP:8428
  • Master API: https://MASTER_IP:8080/nodes (view registered nodes)
  • Prometheus Metrics: http://MASTER_IP:9090/metrics
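
The same node list is also available from the command line (a minimal sketch; -k accepts the auto-generated self-signed certificate described above):

# List registered nodes via the master API
curl -k -H "Authorization: Bearer $MASTER_API_KEY" https://MASTER_IP:8080/nodes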

Production Deployment with Systemd

See deployment/README.md for systemd service templates and production setup.
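
The typical flow looks like this (a sketch only; the unit file location is an assumption, and the service names follow the Common Commands section below):

# Install and enable the services (see deployment/README.md for the real templates)
sudo cp deployment/systemd/*.service /etc/systemd/system/   # path is illustrative
sudo systemctl daemon-reload
sudo systemctl enable --now ffmpeg-master
sudo systemctl enable --now ffmpeg-agent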


🔬 Quick Start (Development - Local Testing Mode)

For development and local testing only, you can use Docker Compose to run all components on a single machine.

Important: Docker Compose mode is NOT recommended for production. Use Distributed Mode above for production workloads.

Prerequisites

  • Docker 20.10+ and Docker Compose 2.0+
  • Python 3.10+
  • FFmpeg

Start Local Stack

# Clone repository
git clone https://github.com/psantana5/ffmpeg-rtmp.git
cd ffmpeg-rtmp

# Start all services
make up-build

Run Local Test

# Build the CLI tool first
go build -o bin/ffrtmp ./cmd/ffrtmp

# Run a simple transcoding job
./bin/ffrtmp jobs submit --scenario "test1" --bitrate 2000k --duration 60

# View dashboards at http://localhost:3000

See shared/docs/DEPLOYMENT_MODES.md for detailed comparison and setup instructions.

Exporters can also be run standalone without Docker; see the project documentation.

What's New: Production-Ready v2.2

Distributed mode is now production-ready with enterprise features:

  • TLS/HTTPS - Enabled by default with auto-generated certificates
  • API Authentication - Required via MASTER_API_KEY environment variable
  • SQLite Persistence - Default storage, survives restarts
  • Automatic Job Retry - Failed jobs retry up to 3 times
  • Prometheus Metrics - Built-in metrics endpoint on port 9090
  • Structured Logging - Production-grade logging support

See shared/docs/PRODUCTION_FEATURES.md for the complete feature guide.
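
A few quick checks against a running master confirm these defaults (a sketch; endpoints and files as described above):

# TLS + API auth: an authenticated HTTPS request should succeed
curl -k -H "Authorization: Bearer $MASTER_API_KEY" https://MASTER_IP:8080/jobs

# Prometheus metrics endpoint on port 9090
curl -s http://MASTER_IP:9090/metrics | head

# SQLite persistence: master state lives in master.db
ls -lh master.db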

NEW: Enterprise-Grade Fault Tolerance

Production-ready reliability features for mission-critical workloads:

Automatic Job Recovery

  • Node Failure Detection - Identifies dead nodes based on heartbeat timeout (2min default)
  • Automatic Job Reassignment - Jobs from failed nodes automatically reassigned to healthy workers
  • Transient Failure Retry - Smart retry for connection errors, timeouts, network issues
  • Configurable Max Retries - Default 3 attempts with exponential backoff
  • Stale Job Detection - Batch jobs timeout after 30min, live jobs after 5min inactivity

Priority Queue Management

  • Multi-Level Priorities - Live > High > Medium > Low > Batch
  • Queue-Based Scheduling - live, default, batch queues with different SLAs
  • FIFO Within Priority - Fair scheduling for same-priority jobs
  • Smart Job Selection - Automatic priority-based job assignment

Observability

  • Distributed Tracing - OpenTelemetry integration for end-to-end visibility
  • Prometheus Metrics - Comprehensive metrics for jobs, nodes, and system health
  • Structured Logging - Production-grade logging with context
  • Rate Limiting - Built-in per-IP rate limiting (100 req/s default)

Security

  • TLS/mTLS - Mutual TLS authentication between master and workers
  • API Key Authentication - Required for all API operations
  • Certificate Management - Auto-generation and rotation support
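
For example, submitting a high-priority live job and tuning the fault-tolerance settings: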
# Submit high-priority live stream job
./bin/ffrtmp jobs submit \
    --scenario live-4k \
    --queue live \
    --priority high \
    --duration 3600

# Configure fault tolerance
./bin/master \
    --max-retries 5 \
    --scheduler-interval 10s \
    --heartbeat-interval 30s

See docs/PRODUCTION.md for the complete production deployment guide.

Dual Transcoding Engine Support

Choose the best transcoding engine for your workload:

  • FFmpeg (default) - Versatile, mature, excellent for file transcoding
  • GStreamer - Optimized for low-latency live streaming
  • Intelligent Auto-Selection - System picks the best engine automatically
  • Hardware Acceleration - NVIDIA NVENC, Intel QSV/VAAPI support for both engines
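
For example: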
# Auto-select best engine (default)
ffrtmp jobs submit --scenario live-stream --engine auto

# Force specific engine
ffrtmp jobs submit --scenario transcode --engine ffmpeg
ffrtmp jobs submit --scenario live-rtmp --engine gstreamer

Auto-selection logic:

  • LIVE queue → GStreamer (low latency)
  • FILE/batch → FFmpeg (better for offline)
  • RTMP streaming → GStreamer
  • GPU+NVENC+streaming → GStreamer

See docs/DUAL_ENGINE_SUPPORT.md for complete documentation.

What This Project Does

This project helps you:

  1. Run FFmpeg streaming tests with various configurations (bitrate, resolution, codec)
  2. Monitor power consumption in real-time using Intel RAPL
  3. Collect system metrics (CPU, memory, network, Docker overhead)
  4. Analyze energy efficiency and get recommendations for optimal transcoding settings
  5. Visualize results in Grafana dashboards
  6. Set up alerts for power thresholds
  7. Scale workloads across multiple compute nodes (NEW in v2.1)

Architecture

The system supports two deployment modes:

1. Distributed Compute Mode (Production)

Master-agent architecture for scaling across multiple nodes:

  • Master Node: Job orchestration, metrics aggregation, dashboards
    • Master Service (Go HTTP API)
    • VictoriaMetrics (TSDB with 30-day retention)
    • Grafana (visualization)
  • Compute Agents: Execute transcoding workloads
    • Hardware auto-detection
    • Job polling and execution
    • Local metrics collection
    • Results reporting

2. Local Testing Mode (Development Only)

Docker Compose stack on single machine:

  • Nginx RTMP: Streaming server
  • VictoriaMetrics: Time-series database
  • Grafana: Dashboards
  • Go Exporters: CPU (RAPL), GPU (NVML), FFmpeg stats
  • Python Exporters: QoE metrics, cost analysis, results tracking
  • Alertmanager: Alert routing

Local Testing mode is for development only. Use Distributed Compute mode for production.

See shared/docs/DEPLOYMENT_MODES.md for detailed comparison and architecture diagrams.

Documentation

Documentation is organized by topic:

  • Deployment & Operations
  • Development & Testing
  • Technical Reference

Common Commands

Distributed Mode (Production)

# Build binaries
make build-master          # Build master node binary
make build-agent           # Build compute agent binary
make build-distributed     # Build both

# Run services
./bin/master --port 8080                        # Start master
./bin/agent --register --master https://MASTER_IP:8080 --api-key "$MASTER_API_KEY"  # Start agent

# Production with systemd
sudo systemctl start ffmpeg-master    # Start master service
sudo systemctl start ffmpeg-agent     # Start agent service
sudo systemctl status ffmpeg-master   # Check status

# Monitor
curl -k -H "Authorization: Bearer $MASTER_API_KEY" https://localhost:8080/nodes   # List registered agents
curl -k -H "Authorization: Bearer $MASTER_API_KEY" https://localhost:8080/jobs    # List jobs
journalctl -u ffmpeg-master -f        # View master logs
journalctl -u ffmpeg-agent -f         # View agent logs

Local Testing Mode (Development)

# Stack management
make up-build              # Start Docker Compose stack
make down                  # Stop stack
make ps                    # Show container status
make logs SERVICE=victoriametrics  # View specific service logs

# Testing (local mode)
make test-single           # Run single stream test
make test-batch            # Run batch test matrix
make run-benchmarks        # Run automated benchmark suite
make analyze               # Analyze latest results

# Development
make lint                  # Run code linting
make format                # Format code
make test                  # Run test suite

Example Use Cases

Production: Distributed Transcoding Benchmarks

Run long-duration benchmarks across multiple compute nodes:

# Submit multiple jobs to master
curl -X POST https://master:8080/jobs \
  -H "Authorization: Bearer $MASTER_API_KEY" \
  -H "Content-Type: application/json" -d '{
  "scenario": "4K-h265", "confidence": "auto",
  "parameters": {"duration": 3600, "bitrate": "15000k"}
}'

# Agents automatically pick up and execute jobs in parallel
# View results in Grafana at http://master:3000

Development: Find Energy-Efficient Encoding Settings

Use local testing mode to iterate quickly:

# Start local stack
make up-build

# Submit multiple test jobs with different configurations
ffrtmp jobs submit --scenario "4K60-h264" --bitrate 10M --duration 120
ffrtmp jobs submit --scenario "1080p60-h265" --bitrate 5M --duration 60
ffrtmp jobs submit --scenario "720p30-h264" --bitrate 2M --duration 60

# Analyze results and get recommendations
python3 scripts/analyze_results.py

The analyzer ranks configurations by energy efficiency and recommends optimal settings.

Development: Compare H.264 vs H.265 Power Consumption

Submit jobs to test different codecs:

# H.264 tests
ffrtmp jobs submit --scenario "4K60-h264" --bitrate 10M --duration 120
ffrtmp jobs submit --scenario "1080p60-h264" --bitrate 5M --duration 60

# H.265 tests
ffrtmp jobs submit --scenario "4K60-h265" --bitrate 10M --duration 120
ffrtmp jobs submit --scenario "1080p60-h265" --bitrate 5M --duration 60

# Compare results in Grafana dashboards

Production: Continuous CI/CD Benchmarking

Deploy distributed mode with agents on your build servers:

# CI/CD pipeline submits jobs to master after each release
curl -X POST https://master:8080/jobs \
  -H "Authorization: Bearer $MASTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d @benchmark_config.json

# Results automatically aggregated and visualized
# Alerts fire if performance regressions detected

Contributing

Contributions are welcome! See the detailed documentation for development guidelines.

License

This project is licensed under the MIT License; see the LICENSE file for details.

🧪 Testing

The project includes comprehensive test coverage for critical components:

# Run all tests with race detector
cd shared/pkg
go test -v -race ./...

# Run tests with coverage report
go test -v -coverprofile=coverage.out ./models ./scheduler ./store
go tool cover -html=coverage.out

Test Coverage:

  • models: 85% (FSM state machine fully tested)
  • scheduler: 53% (priority queues, recovery logic)
  • store: Comprehensive database operations tests
  • agent: Engine selection, optimizers, encoders

CI/CD:

  • Automated testing on every push
  • Race condition detection
  • Multi-architecture builds (amd64, arm64)
  • Binary artifacts for master, worker, and CLI
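
To reproduce a cross-architecture build locally (illustrative; only ./cmd/ffrtmp is shown elsewhere in this README, and the CI pipeline defines the actual build matrix):

# Build an arm64 Linux binary of the CLI
GOOS=linux GOARCH=arm64 go build -o bin/ffrtmp-arm64 ./cmd/ffrtmp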

See CONTRIBUTING.md for testing guidelines.

📚 Documentation

Core documentation has been streamlined for clarity.

Additional technical documentation is available in docs/archive/ for reference.
