
feat(telemetry): DevOps CI/CD Dashboard with GitHub, Cloudflare, and Railway integrations #69

@rockfridrich

Summary

Enhance the telemetry dashboard with a comprehensive DevOps CI/CD flow visualization that integrates all our infrastructure APIs: GitHub Actions, Cloudflare, and Railway.

Motivation

Currently, the telemetry dashboard shows basic health checks and GitHub Actions status. We need a unified view of our entire deployment pipeline with interactive controls for common DevOps operations.

Proposed Features

1. Interactive Pipeline Visualization

┌─────────────┐    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│   Commit    │───▶│     CI      │───▶│   Staging   │───▶│ Production  │
│   (GitHub)  │    │  (Actions)  │    │ (Railway)   │    │  (Railway)  │
└─────────────┘    └─────────────┘    └─────────────┘    └─────────────┘
      │                  │                  │                  │
      ▼                  ▼                  ▼                  ▼
  Branch info       Build logs        Deploy status      Traffic stats
  PR status        Test results       Health check       Cache status
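
One way to model the four stages in the diagram, as a minimal sketch. The type names, the status values, and the `overallStatus` helper are hypothetical and not taken from the existing dashboard code.

```typescript
// Hypothetical data model for the four pipeline stages in the diagram.
type StageStatus = "pending" | "running" | "success" | "failed";

interface PipelineStage {
  id: "commit" | "ci" | "staging" | "production";
  provider: "github" | "actions" | "railway";
  status: StageStatus;
}

// Roll the per-stage statuses up into one indicator for the whole pipeline:
// failed wins over running, running wins over success, and success requires
// every stage to have succeeded.
function overallStatus(stages: PipelineStage[]): StageStatus {
  if (stages.some((s) => s.status === "failed")) return "failed";
  if (stages.some((s) => s.status === "running")) return "running";
  if (stages.length > 0 && stages.every((s) => s.status === "success")) {
    return "success";
  }
  return "pending";
}
```

The animated "running" indicator could then be driven by `overallStatus` for the header and by each stage's own `status` for the individual boxes.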

2. GitHub Integration (Enhanced)

Feature           API Endpoint                                                  Status
Workflow runs     GET /repos/{owner}/{repo}/actions/runs                        ✅ Exists
Job details       GET /repos/{owner}/{repo}/actions/runs/{id}/jobs              ✅ Exists
Deployments       GET /repos/{owner}/{repo}/deployments                         🆕 New
Check suites      GET /repos/{owner}/{repo}/commits/{ref}/check-suites          🆕 New
PR status         GET /repos/{owner}/{repo}/pulls/{id}                          🆕 New
Trigger workflow  POST /repos/{owner}/{repo}/actions/workflows/{id}/dispatches  🆕 New
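
A rough sketch of how the "Trigger workflow" call could be assembled. `buildWorkflowDispatch` and the `GitHubRequest` shape are hypothetical names for illustration; the function only builds the request, so the caller decides how to send it. The endpoint expects a JSON body with a `ref` (branch or tag) and optional `inputs`.

```typescript
// Plain description of an HTTP request (hypothetical shape, not a library type).
interface GitHubRequest {
  method: "POST";
  url: string;
  headers: Record<string, string>;
  body: string;
}

// Build the workflow_dispatch request. workflowId may be the numeric workflow
// ID or the workflow file name (for example "e2e.yml").
function buildWorkflowDispatch(
  owner: string,
  repo: string,
  workflowId: string | number,
  ref: string,
  token: string,
  inputs: Record<string, string> = {},
): GitHubRequest {
  const path = [owner, repo].map(encodeURIComponent).join("/");
  const workflow = encodeURIComponent(String(workflowId));
  return {
    method: "POST",
    url: `https://api.github.com/repos/${path}/actions/workflows/${workflow}/dispatches`,
    headers: {
      Accept: "application/vnd.github+json",
      Authorization: `Bearer ${token}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ ref, inputs }),
  };
}
```

The workflow itself must declare an `on: workflow_dispatch` trigger, otherwise GitHub rejects the dispatch.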

3. Railway Integration (New)

API Endpoint: https://backboard.railway.com/graphql/v2

Feature           GraphQL Query                        Purpose
Service status    services { ... status }              Real-time deployment status
Deploy history    deployments { ... }                  Recent deployments with timing
Environment vars  variables { ... }                    Config visibility
Logs streaming    deploymentLogs { ... }               Live log tailing
Trigger deploy    mutation { serviceInstanceDeploy }   One-click redeploy
Restart service   mutation { serviceInstanceRestart }  Service restart

Authentication: Team token required (stored as env var RAILWAY_API_TOKEN)
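
Since every Railway call goes through the single GraphQL endpoint above, a small request builder might be enough. This is a sketch only: `buildRailwayRequest` is a hypothetical helper, the query text is supplied by the caller (the field selections in the table are elided, so none are filled in here), and sending the token as `Authorization: Bearer` is an assumption that should be checked against Railway's documentation for the token type in use.

```typescript
const RAILWAY_ENDPOINT = "https://backboard.railway.com/graphql/v2";

interface RailwayRequest {
  method: "POST";
  url: string;
  headers: Record<string, string>;
  body: string;
}

// Wrap a caller-supplied GraphQL document and its variables in a POST request.
function buildRailwayRequest(
  query: string,
  variables: Record<string, unknown>,
  token: string,
): RailwayRequest {
  return {
    method: "POST",
    url: RAILWAY_ENDPOINT,
    headers: {
      // Assumption: bearer auth; verify for team tokens before relying on it.
      Authorization: `Bearer ${token}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ query, variables }),
  };
}
```

GraphQL servers typically report failures in an `errors` array even on HTTP 200, so the dashboard would need to check the response body rather than the status code alone.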

4. Cloudflare Integration (New)

API Endpoint: https://api.cloudflare.com/client/v4

Feature      Endpoint                                  Purpose
DNS records  GET /zones/{zone_id}/dns_records          DNS management
Analytics    GET /zones/{zone_id}/analytics/dashboard  Traffic metrics
Cache purge  POST /zones/{zone_id}/purge_cache         Cache invalidation
Page Rules   GET /zones/{zone_id}/pagerules            Redirect rules
SSL status   GET /zones/{zone_id}/settings/ssl         Certificate status

Authentication: API token with Zone permissions (stored as CLOUDFLARE_API_TOKEN)
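
For the cache purge row, the request could look roughly like this. `buildCachePurge` and `CloudflareRequest` are hypothetical names. The purge endpoint accepts either `purge_everything: true` or a `files` list of URLs, and the sketch refuses an empty list so that a bug upstream cannot silently turn into a full purge.

```typescript
interface CloudflareRequest {
  method: "POST";
  url: string;
  headers: Record<string, string>;
  body: string;
}

// Build a purge request: specific URLs when `files` is given,
// the whole zone cache when it is omitted.
function buildCachePurge(
  zoneId: string,
  token: string,
  files?: string[],
): CloudflareRequest {
  if (files !== undefined && files.length === 0) {
    throw new Error("files must not be empty; omit it to purge everything");
  }
  const payload = files === undefined ? { purge_everything: true } : { files };
  return {
    method: "POST",
    url: `https://api.cloudflare.com/client/v4/zones/${encodeURIComponent(zoneId)}/purge_cache`,
    headers: {
      Authorization: `Bearer ${token}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(payload),
  };
}
```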

UI Components

A. Pipeline Flow (Existing, Enhanced)

  • Visual representation of code → CI → staging → production
  • Click any stage to see details
  • Real-time status indicators (animated when running)

B. Quick Actions Panel (New)

<QuickActions>
  <ActionButton icon="🚀" label="Deploy to Staging" onClick={deployStaging} />
  <ActionButton icon="🔄" label="Restart Production" onClick={restartProd} confirm />
  <ActionButton icon="🧹" label="Purge CDN Cache" onClick={purgeCache} confirm />
  <ActionButton icon="🧪" label="Run E2E Tests" onClick={triggerE2E} />
</QuickActions>
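
The `confirm` prop could be implemented as a thin wrapper around the click handler. A minimal sketch, assuming a synchronous prompt such as the browser's `window.confirm`; `withConfirmation` is a hypothetical helper, not an existing component API.

```typescript
// Wrap an action so it only runs after the user agrees.
// `ask` is injected so the prompt can be window.confirm in the browser
// or a stub in tests.
function withConfirmation<R>(
  label: string,
  action: () => R,
  ask: (message: string) => boolean,
): () => R | undefined {
  return () => (ask(`Run "${label}"?`) ? action() : undefined);
}
```

An `ActionButton` with `confirm` set would then call `withConfirmation(label, onClick, window.confirm)` instead of `onClick` directly; a custom modal would need an asynchronous variant.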

C. Service Health Grid (Enhanced)

  • Add Railway deployment metadata (commit SHA, deploy time)
  • Add Cloudflare edge response times
  • Add DNS propagation status

D. Metrics Dashboard (New)

┌────────────────────────────────────────────────────────────┐
│  Traffic (24h)          Cache Hit Rate       Error Rate   │
│  ████████████ 12.5k    ████████░░ 87%       ░░░░░░░░ 0.2%│
└────────────────────────────────────────────────────────────┘
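
The two rates in the mockup are simple ratios over the request count. A sketch of the arithmetic, with a hypothetical `TrafficSample` shape; the real field names depend on what the Cloudflare analytics response provides.

```typescript
// One bucket of traffic data (hypothetical shape).
interface TrafficSample {
  requests: number;
  cachedRequests: number;
  errorResponses: number;
}

interface TrafficSummary {
  totalRequests: number;
  cacheHitRate: number; // fraction between 0 and 1
  errorRate: number; // fraction between 0 and 1
}

// Sum the buckets, then divide; an empty window yields zero rates
// instead of NaN.
function summarizeTraffic(samples: TrafficSample[]): TrafficSummary {
  let total = 0;
  let cached = 0;
  let errors = 0;
  for (const s of samples) {
    total += s.requests;
    cached += s.cachedRequests;
    errors += s.errorResponses;
  }
  if (total === 0) return { totalRequests: 0, cacheHitRate: 0, errorRate: 0 };
  return {
    totalRequests: total,
    cacheHitRate: cached / total,
    errorRate: errors / total,
  };
}
```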

E. Deploy Timeline (New)

  • Chronological view of all deployments across environments
  • Rollback capability (click to redeploy previous version)
  • Deploy notes / changelog
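
A sketch of how the timeline and the rollback target could be derived from a flat list of deployments. The `Deployment` fields are assumptions about what the Railway data will be mapped to, and "rollback target" is taken here to mean the most recent successful deployment before the current one in the same environment.

```typescript
// Hypothetical normalized deployment record.
interface Deployment {
  id: string;
  environment: "staging" | "production";
  commitSha: string;
  deployedAt: string; // ISO 8601 timestamp
  succeeded: boolean;
}

// Newest first, across all environments.
function buildTimeline(deployments: Deployment[]): Deployment[] {
  return [...deployments].sort(
    (a, b) => Date.parse(b.deployedAt) - Date.parse(a.deployedAt),
  );
}

// Skip the current (latest) deployment of the environment and return
// the next most recent one that succeeded, if any.
function rollbackTarget(
  deployments: Deployment[],
  environment: Deployment["environment"],
): Deployment | undefined {
  const history = buildTimeline(deployments).filter(
    (d) => d.environment === environment,
  );
  return history.slice(1).find((d) => d.succeeded);
}
```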

Technical Implementation

API Routes to Add

/api/railway/services      - List all Railway services
/api/railway/deploy        - Trigger deployment
/api/railway/logs          - Stream deployment logs
/api/cloudflare/analytics  - Traffic and performance metrics
/api/cloudflare/cache      - Cache management
/api/cloudflare/dns        - DNS records
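
To keep error handling uniform across these routes, each handler could translate the upstream response into one shared result shape. This is only a sketch; `ApiResult` and `fromUpstream` are hypothetical, and the messages are placeholders.

```typescript
// Shared envelope the dashboard routes could return to the UI.
type ApiResult<T> =
  | { ok: true; data: T }
  | { ok: false; status: number; error: string };

// Map an upstream HTTP status and parsed body onto the envelope.
function fromUpstream<T>(status: number, body: T): ApiResult<T> {
  if (status >= 200 && status < 300) return { ok: true, data: body };
  if (status === 429) {
    return { ok: false, status, error: "Upstream rate limit reached" };
  }
  if (status === 401 || status === 403) {
    return { ok: false, status, error: "Upstream rejected the API token" };
  }
  return { ok: false, status, error: `Upstream request failed with status ${status}` };
}
```

With a discriminated `ok` field, the UI can render a rate-limit indicator or an auth warning without parsing provider-specific error payloads.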

Environment Variables Required

RAILWAY_API_TOKEN=         # Railway team token
CLOUDFLARE_API_TOKEN=      # Cloudflare API token
CLOUDFLARE_ZONE_ID=        # villa.cash zone ID
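
A small startup check can fail fast when one of these variables is missing or left blank. A sketch, with `loadConfig` and `DashboardConfig` as hypothetical names; the environment is passed in as a plain object so the function can be called with `process.env` on the server or with a stub in tests.

```typescript
interface DashboardConfig {
  railwayApiToken: string;
  cloudflareApiToken: string;
  cloudflareZoneId: string;
}

const REQUIRED_VARS = [
  "RAILWAY_API_TOKEN",
  "CLOUDFLARE_API_TOKEN",
  "CLOUDFLARE_ZONE_ID",
] as const;

// Validate all three variables at once so the error lists every missing name.
function loadConfig(env: Record<string, string | undefined>): DashboardConfig {
  const missing = REQUIRED_VARS.filter((name) => (env[name] ?? "").trim() === "");
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  return {
    railwayApiToken: env["RAILWAY_API_TOKEN"] as string,
    cloudflareApiToken: env["CLOUDFLARE_API_TOKEN"] as string,
    cloudflareZoneId: env["CLOUDFLARE_ZONE_ID"] as string,
  };
}
```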

Security Considerations

  1. Local-only by default: Telemetry dashboard should only be accessible locally or with authentication
  2. Action confirmations: Destructive actions (restart, purge cache) require confirmation
  3. Audit logging: Log all actions taken through the dashboard
  4. Rate limiting: Respect API rate limits (Railway: 1,000 requests per hour; Cloudflare: 1,200 requests per 5 minutes)
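
For item 4, a client-side budget could be tracked with a sliding window over recent call timestamps. This is a sketch under the limits quoted above; `RateBudget` is a hypothetical class, and it only counts calls made by this dashboard process, not other consumers of the same token.

```typescript
// Sliding-window call budget: at most `limit` calls in any `windowMs` span.
class RateBudget {
  private readonly limit: number;
  private readonly windowMs: number;
  private calls: number[] = [];

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Calls still available at time `now` (milliseconds).
  remaining(now: number): number {
    this.calls = this.calls.filter((t) => now - t < this.windowMs);
    return this.limit - this.calls.length;
  }

  // Record a call if the budget allows it; return false otherwise.
  tryAcquire(now: number): boolean {
    if (this.remaining(now) <= 0) return false;
    this.calls.push(now);
    return true;
  }
}

// Budgets matching the limits listed in this issue.
const railwayBudget = new RateBudget(1000, 60 * 60 * 1000);
const cloudflareBudget = new RateBudget(1200, 5 * 60 * 1000);
```

`remaining(Date.now())` could also feed the rate limit indicators mentioned in the acceptance criteria.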

Acceptance Criteria

  • Railway service status visible in dashboard
  • One-click deploy to staging from dashboard
  • Cloudflare traffic metrics displayed
  • Cache purge functionality
  • Deploy timeline with rollback capability
  • All destructive actions (restart, purge cache) have confirmation dialogs
  • Error handling for API failures
  • Rate limit indicators

Resources

Estimated Effort

  • Phase 1 (Railway): 2-3 hours
  • Phase 2 (Cloudflare): 2-3 hours
  • Phase 3 (UI polish): 1-2 hours

Total: ~5-8 hours


cc @rockfridrich
