
shubhamwagdarkar/workflow-execution-tracer


workflow-execution-tracer


AIOps observability service — instruments distributed workflow executions, traces each step, detects bottlenecks against historical baselines, exposes Prometheus metrics, visualizes in Grafana, and fires Slack alerts.


Why This Project

In enterprise automation platforms, long-running workflows can degrade silently: one slow step stretches out the entire execution, and nothing flags it. This service gives you full execution visibility:

  • Every step is traced with millisecond precision
  • Bottlenecks are auto-detected by comparing each step against its own historical average
  • Slack alerts fire as soon as a slow step completes, so slowdowns surface before users notice them
  • Metrics are scraped by Prometheus and visualized in a pre-built Grafana dashboard

Architecture

Client (automation platform)
    │
    ▼
FastAPI (main.py)
    ├── POST /workflows                  → start workflow execution
    ├── POST /workflows/{id}/steps       → start a step
    ├── PUT  /workflows/{id}/steps/{sid} → complete step → bottleneck check → Slack alert
    ├── PUT  /workflows/{id}             → complete workflow
    ├── GET  /workflows/{id}/bottlenecks → on-demand bottleneck report
    ├── GET  /stats/{workflow_name}      → historical step averages
    ├── POST /simulate                   → run a simulated demo workflow
    └── GET  /metrics                    → Prometheus text format
         │
         ▼
    Prometheus (port 9090) ← scrapes /metrics every 15s
         │
         ▼
    Grafana (port 3000) ← queries Prometheus, renders the dashboard

PostgreSQL (written by FastAPI)
    ├── workflow_executions
    └── workflow_steps
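A client on the automation platform drives the endpoints above in order: start the workflow, start and complete each step, then complete the workflow. The sketch below is hypothetical: the JSON field names (`workflow_name`, `step_name`, `status`, and an `id` in each response) are assumptions rather than the service's documented schema (check http://localhost:8000/docs), and the HTTP call is injected as a plain `send` function so any HTTP library can be plugged in.

```python
from contextlib import contextmanager
from typing import Any, Callable, Dict, Iterator, Optional

# Hypothetical transport: (method, path, json_body) -> decoded JSON response.
Send = Callable[[str, str, Optional[Dict[str, Any]]], Dict[str, Any]]


class WorkflowTracerClient:
    """Sketch of a client for the tracer's REST endpoints (field names assumed)."""

    def __init__(self, send: Send) -> None:
        self.send = send

    def start_workflow(self, workflow_name: str) -> str:
        return self.send("POST", "/workflows", {"workflow_name": workflow_name})["id"]

    def complete_workflow(self, workflow_id: str, status: str = "success") -> Dict[str, Any]:
        return self.send("PUT", f"/workflows/{workflow_id}", {"status": status})

    @contextmanager
    def step(self, workflow_id: str, step_name: str) -> Iterator[str]:
        """Start a step on entry and complete it on exit, marking it failed on error."""
        step_id = self.send("POST", f"/workflows/{workflow_id}/steps",
                            {"step_name": step_name})["id"]
        status = "failed"
        try:
            yield step_id
            status = "success"
        finally:
            # Completing the step is the call that triggers the bottleneck check.
            self.send("PUT", f"/workflows/{workflow_id}/steps/{step_id}", {"status": status})
```

Wrapping each unit of work in `with client.step(wf_id, "extract"):` guarantees the step is always closed, so a step that raises is reported with `status` set to `failed` instead of being left open.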

Key Highlights

| Feature | Detail |
|---|---|
| Step-level tracing | Every step traced with start/end timestamps and duration in ms |
| Bottleneck detection | Steps > 2× historical average → WARNING; > 3× → CRITICAL |
| Prometheus metrics | /metrics endpoint: workflow count, step durations, error rates |
| Grafana dashboard | Pre-built dashboard JSON auto-provisioned via Docker Compose |
| Slack alerting | Structured alerts with workflow ID, step name, severity, and duration |
| One-command start | Full stack (API + PostgreSQL + Prometheus + Grafana) via docker-compose up |
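The bottleneck thresholds in the table translate into a small classification rule. A minimal sketch, assuming the comparisons are strict (`>`) as written and that a step with no history yet (missing or zero average) is never flagged; `classify_step` is a hypothetical name, not the service's own code.

```python
from typing import Optional, Tuple

WARNING_FACTOR = 2.0   # duration > 2x historical average -> WARNING
CRITICAL_FACTOR = 3.0  # duration > 3x historical average -> CRITICAL


def classify_step(duration_ms: float,
                  historical_avg_ms: Optional[float]) -> Tuple[Optional[str], float]:
    """Return (severity, factor); severity is None when the step is within bounds."""
    if not historical_avg_ms or historical_avg_ms <= 0:
        # No baseline yet (e.g. first run of this step): nothing to compare against.
        return None, 0.0
    factor = duration_ms / historical_avg_ms
    if factor > CRITICAL_FACTOR:
        severity: Optional[str] = "CRITICAL"
    elif factor > WARNING_FACTOR:
        severity = "WARNING"
    else:
        severity = None
    # Compare on the raw ratio; report a rounded one.
    return severity, round(factor, 2)
```

With the numbers from the Demo section's sample response, `classify_step(2100, 420)` gives `("CRITICAL", 5.0)`.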

Demo

# Start the full stack
docker-compose up --build

# Run a simulated workflow with randomized slow steps
curl -X POST "http://localhost:8000/simulate?workflow_name=demo"

# Sample response
{
  "workflow_id": "wf_9c3e1a",
  "workflow_name": "demo",
  "total_duration_ms": 4823,
  "steps_completed": 5,
  "bottlenecks_detected": [
    {
      "step_name": "data_transform",
      "duration_ms": 2100,
      "historical_avg_ms": 420,
      "severity": "CRITICAL",
      "factor": 5.0
    }
  ]
}

Open Grafana at http://localhost:3000 (login: admin / admin) to see live metrics.
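The `historical_avg_ms` value in the sample response is a per-step baseline of the kind GET /stats/{workflow_name} reports. A rough sketch of computing it from the two tables in the architecture diagram: the column names are assumptions, and sqlite3 stands in for PostgreSQL so the example runs without a server (with psycopg2 the placeholder would be `%s` instead of `?`).

```python
import sqlite3
from typing import Dict

# Hypothetical schema mirroring the two tables named in the architecture diagram.
SCHEMA = """
CREATE TABLE workflow_executions (
    id            TEXT PRIMARY KEY,
    workflow_name TEXT NOT NULL
);
CREATE TABLE workflow_steps (
    id          INTEGER PRIMARY KEY,
    workflow_id TEXT NOT NULL REFERENCES workflow_executions(id),
    step_name   TEXT NOT NULL,
    duration_ms INTEGER  -- NULL while the step is still running
);
"""

# AVG ignores NULLs, so in-flight steps do not skew the baseline;
# HAVING drops steps that have never completed.
STATS_SQL = """
SELECT s.step_name, AVG(s.duration_ms)
FROM workflow_steps AS s
JOIN workflow_executions AS e ON e.id = s.workflow_id
WHERE e.workflow_name = ?
GROUP BY s.step_name
HAVING COUNT(s.duration_ms) > 0
"""


def step_averages(conn: sqlite3.Connection, workflow_name: str) -> Dict[str, float]:
    """Per-step historical average duration (ms) for one workflow name."""
    return {name: avg for name, avg in conn.execute(STATS_SQL, (workflow_name,))}
```

Whether the baseline should include the execution currently being checked is a design choice this sketch does not make; filtering it out would need one more `WHERE` condition.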


Quick Start (Docker — recommended)

docker-compose up --build

| Service | URL | Notes |
|---|---|---|
| API | http://localhost:8000 | FastAPI app |
| API docs | http://localhost:8000/docs | Interactive Swagger UI |
| Prometheus | http://localhost:9090 | Scrapes /metrics every 15s |
| Grafana | http://localhost:3000 | Login admin / admin; dashboard auto-provisioned |

# Stop and remove containers
docker-compose down

# Also remove volumes (clears DB data)
docker-compose down -v
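Prometheus scrapes /metrics in its plain-text exposition format. To illustrate what a step-duration histogram looks like in that format, here is a stdlib-only sketch; the metric name `workflow_step_duration_ms`, the `step` label, and the bucket bounds are hypothetical, and the real service may use the `prometheus_client` library instead. Prometheus naming conventions favor base units (seconds); milliseconds are used here only to match the rest of this README.

```python
from bisect import bisect_left
from typing import Dict, List, Sequence


class StepDurationHistogram:
    """Tiny Prometheus-style histogram labelled by step name (illustrative only)."""

    def __init__(self, name: str, buckets_ms: Sequence[float]) -> None:
        self.name = name
        self.bounds = sorted(float(b) for b in buckets_ms)
        # Per-step, non-cumulative counts; the last slot is the +Inf bucket.
        self.counts: Dict[str, List[int]] = {}
        self.sums: Dict[str, float] = {}

    def observe(self, step: str, duration_ms: float) -> None:
        counts = self.counts.setdefault(step, [0] * (len(self.bounds) + 1))
        # First bucket whose upper bound is >= the value ("le" = less or equal).
        counts[bisect_left(self.bounds, duration_ms)] += 1
        self.sums[step] = self.sums.get(step, 0.0) + duration_ms

    def render(self) -> str:
        lines = [f"# HELP {self.name} Workflow step duration in milliseconds.",
                 f"# TYPE {self.name} histogram"]
        for step, counts in sorted(self.counts.items()):
            # Step names are assumed to need no label-value escaping.
            cumulative = 0
            for bound, n in zip(self.bounds + [float("inf")], counts):
                cumulative += n  # Prometheus buckets are cumulative.
                le = "+Inf" if bound == float("inf") else str(bound)
                lines.append(f'{self.name}_bucket{{step="{step}",le="{le}"}} {cumulative}')
            lines.append(f'{self.name}_sum{{step="{step}"}} {self.sums[step]}')
            lines.append(f'{self.name}_count{{step="{step}"}} {cumulative}')
        return "\n".join(lines) + "\n"
```

Histograms, rather than a single average, let Grafana plot latency quantiles per step, which is what makes a slow tail visible.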

Manual Setup

git clone https://github.com/shubhamwagdarkar/workflow-execution-tracer.git
cd workflow-execution-tracer

python -m venv .venv
source .venv/bin/activate

pip install -r requirements.txt
cp .env.example .env   # fill in PostgreSQL + Slack details
uvicorn main:app --reload
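The .env file holds the PostgreSQL and Slack settings. A minimal sketch of reading them at startup and failing fast when the database setting is missing; the variable names `DATABASE_URL` and `SLACK_WEBHOOK_URL` are hypothetical (use the keys defined in .env.example), and the sketch assumes the values are already in the process environment.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    database_url: str
    slack_webhook_url: Optional[str]  # None disables Slack alerting in this sketch


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from the environment (hypothetical variable names)."""
    db = env.get("DATABASE_URL", "").strip()
    if not db:
        raise RuntimeError("DATABASE_URL is not set; copy .env.example to .env and fill it in")
    # An empty webhook value is treated the same as an absent one.
    slack = env.get("SLACK_WEBHOOK_URL", "").strip() or None
    return Settings(database_url=db, slack_webhook_url=slack)
```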

Stack

| Layer | Technology |
|---|---|
| API | FastAPI + Uvicorn |
| Database | PostgreSQL + psycopg2 |
| Alerting | Slack Incoming Webhooks (slack-sdk) |
| Observability | Prometheus metrics + Grafana dashboard |
| Containerization | Docker + Docker Compose |
| Testing | pytest |
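Slack Incoming Webhooks accept an HTTP POST whose JSON body carries the message in a `text` field. The project uses slack-sdk for this; the stdlib sketch below only illustrates the structured alert described under Key Highlights (workflow ID, step name, severity, duration). The message layout and the function names `build_alert` and `post_alert` are assumptions.

```python
import json
import urllib.request
from typing import Any, Dict


def build_alert(workflow_id: str, step_name: str, severity: str,
                duration_ms: int, historical_avg_ms: int) -> Dict[str, Any]:
    """Build an Incoming Webhook payload; the wording is illustrative."""
    icon = ":red_circle:" if severity == "CRITICAL" else ":warning:"
    factor = duration_ms / historical_avg_ms if historical_avg_ms else 0.0
    text = (f"{icon} *{severity}* bottleneck in workflow `{workflow_id}`\n"
            f"Step `{step_name}` took {duration_ms} ms "
            f"({factor:.1f}x the {historical_avg_ms} ms historical average)")
    return {"text": text}


def post_alert(webhook_url: str, payload: Dict[str, Any], timeout: float = 5.0) -> int:
    """POST the payload as JSON to the webhook URL and return the HTTP status code."""
    req = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```

Keeping payload construction separate from the HTTP call makes the alert format testable without a live webhook.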
