
agenticdevops/devops-execution-engine


DevOps Execution Engine

Professional DevOps skills for Clawdbot

Bring enterprise-grade DevOps expertise to your Clawdbot instance with safe, auditable execution.


Built for Clawdbot - The AI assistant that actually helps with DevOps

Join Agentic Ops Builders to build agentic DevOps automation together with fellow builders.


What Is This?

DevOps Execution Engine is a comprehensive skill package for Clawdbot that transforms it into a professional DevOps assistant.

Why Clawdbot?

Clawdbot is the only AI assistant that:

  • Actually executes commands (rather than only suggesting them, as ChatGPT does)
  • Integrates with your infrastructure (kubectl, AWS CLI, Terraform, etc.)
  • Provides human-in-the-loop safety (approve before execution)
  • Maintains audit trails (complete accountability)
  • Works across platforms (Telegram, Discord, WhatsApp, CLI, web)

This skill package extends Clawdbot with 11 production-ready DevOps skills, giving it deep domain expertise in:

  • Kubernetes operations and debugging
  • Cloud cost optimization
  • Incident response
  • Infrastructure as Code
  • Container management
  • Log analysis, health checks, and Git workflows

Platform Compatibility

Primary Platform: Clawdbot ⭐ (Full integration)

Also Compatible With:

  • LangChain - Can be adapted as custom tools
  • AutoGPT/BabyAGI - Execution engine can be integrated
  • Custom AI Agents - Core modules are platform-agnostic Node.js

Note: Full functionality (approval workflow, audit logging, skill integration) works best with Clawdbot's architecture.


The Problem

DevOps teams face a dilemma:

  • ✅ AI can diagnose issues faster than humans
  • ❌ But you can't let AI execute commands blindly in production
  • ✅ Manual execution is slow and error-prone
  • ❌ But automation without oversight is dangerous

Current options are inadequate:

  • Pure automation → Risky, no human oversight
  • Manual everything → Slow, defeats the purpose of AI
  • ChatGPT → Can't actually execute, just suggests commands

The Solution

DevOps Execution Engine bridges the gap:

  1. AI Diagnoses - Analyzes logs, metrics, cluster state
  2. AI Generates Plan - Creates detailed, reviewable execution plan
  3. Human Approves - You review and approve (or reject)
  4. AI Executes Safely - Runs with monitoring, rollback ready
  5. AI Verifies - Confirms success and logs everything

The result: AI speed + Human judgment = Safe, fast operations
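The five-step loop above can be sketched in Node.js, the language the core modules use. This is a hypothetical illustration rather than the engine's actual API: `runWorkflow` and the five callbacks (`diagnose`, `generatePlan`, `askApproval`, `execute`, `verify`) are placeholder names. What it shows is the ordering guarantee: `execute` is unreachable unless the approval callback returns true.

```javascript
// Hypothetical sketch of the diagnose -> plan -> approve -> execute -> verify loop.
// The callbacks are placeholders you supply; they are not the engine's real exports.
function runWorkflow(request, { diagnose, generatePlan, askApproval, execute, verify }) {
  const findings = diagnose(request);   // 1. read-only diagnosis
  const plan = generatePlan(findings);  // 2. reviewable plan with risk and rollback
  if (!askApproval(plan)) {             // 3. human gate: stop here on "no"
    return { status: "rejected", plan };
  }
  const result = execute(plan);         // 4. only reached after approval
  const ok = verify(plan, result);      // 5. post-execution check
  return { status: ok ? "success" : "failed", plan, result };
}

// Usage with stub callbacks standing in for real diagnosis and execution:
const outcome = runWorkflow("fix crashloop pods", {
  diagnose: () => ({ rootCause: "OOMKilled" }),
  generatePlan: () => ({ title: "Raise memory limit", risk: "MEDIUM" }),
  askApproval: () => true,
  execute: () => ({ exitCode: 0 }),
  verify: (plan, result) => result.exitCode === 0,
});
console.log(outcome.status); // "success"
```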


What You Get

🔒 Safety First

  • No auto-execution - Always requires human approval for risky operations
  • Risk classification - Every action rated LOW/MEDIUM/HIGH/CRITICAL
  • Rollback plans - Every plan includes how to undo
  • Audit trail - Complete log of who approved what and when
  • Pre/post validation - Checks before and after execution

📚 Comprehensive Skills Library

11 Production-Ready DevOps Skills:

  • Kubernetes - Debug, deploy, manage (k8s-debug, k8s-deploy, argocd-gitops)
  • Cloud - AWS operations and cost optimization (aws-ops, cost-optimization)
  • Infrastructure - Terraform, Docker operations (terraform-workflow, docker-ops)
  • Operations - Incident response, log analysis, health checks (incident-response, log-analysis, system-health)
  • Development - Git workflows and best practices (git-workflow)

🎯 Real-World Use Cases

Incident Response:

You: SEV1 - API is down!
AI: [diagnoses → identifies database crash → generates recovery plan]
You: [reviews plan → approves]
AI: [executes → restores service → verifies → logs incident]
Result: 5-minute recovery instead of a 30-minute scramble

Cost Optimization:

You: Find AWS cost savings
AI: [analyzes → identifies $3,250/month in waste]
You: [reviews idle resources → approves cleanup]
AI: [terminates safely → verifies → updates inventory]
Result: $39,000/year saved with full audit trail

Safe Deployments:

You: Deploy api v2.5.0 with canary strategy
AI: [generates multi-stage canary plan → monitors each stage]
You: [approves each stage after review]
AI: [deploys → monitors → promotes → completes]
Result: Zero-downtime deployment with human gates
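The canary example can be sketched as a loop over traffic stages with a health check and a human gate between them. Everything here is an assumption for illustration: `canaryRollout`, the 10/50/100 percent stages, and the `setTraffic`, `isHealthy` and `approveStage` hooks are hypothetical and are not part of the skill package.

```javascript
// Hypothetical canary loop: each stage must pass a health check, and every stage
// except the last needs a human "yes" before traffic shifts further.
// The hooks are placeholders for your own deployment tooling.
function canaryRollout(version, { stages = [10, 50, 100], setTraffic, isHealthy, approveStage }) {
  const completed = [];
  for (const percent of stages) {
    setTraffic(version, percent);               // route this share of traffic to the canary
    if (!isHealthy(version, percent)) {
      setTraffic(version, 0);                   // unhealthy: send all traffic back to stable
      return { status: "rolled_back", completed };
    }
    completed.push(percent);
    const isLast = percent === stages[stages.length - 1];
    if (!isLast && !approveStage(version, percent)) {
      return { status: "paused", completed };   // human declined; traffic stays at this stage
    }
  }
  return { status: "complete", completed };
}
```

For the request above you would call something like `canaryRollout("v2.5.0", hooks)` with hooks wired to your deployment tooling. Stopping on the first unhealthy stage, rather than retrying, keeps the failure mode predictable and hands the decision back to a human.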

Why Add These Skills to Your Clawdbot?

Transform Clawdbot Into Your DevOps Co-Pilot

With this skill package installed, your Clawdbot can:

  • Diagnose production issues in seconds (not hours)
  • Generate safe execution plans with full rollback procedures
  • Execute approved changes with monitoring and verification
  • Respond to incidents with structured playbooks
  • Optimize cloud costs and find waste automatically
  • Deploy safely with canary strategies and human gates
  • Maintain complete audit trails for compliance

Perfect For

DevOps Teams Using Clawdbot who want:

  • Professional-grade DevOps expertise built-in
  • Safe execution with human oversight
  • Domain knowledge for Kubernetes, AWS, Docker, Terraform
  • Incident response capabilities
  • Cost optimization insights
  • Complete audit trails for compliance

Platform Engineers who want to:

  • Give their team a 24/7 DevOps assistant
  • Standardize operations with tested playbooks
  • Reduce mean-time-to-recovery (MTTR)
  • Onboard new team members faster

Solo DevOps/SREs who want:

  • A second pair of eyes before executing
  • Quick diagnosis without searching docs
  • Structured incident response
  • Cost optimization without manual analysis



Quick Start (5 Minutes)

Prerequisites

You need Clawdbot installed first:

# Install Clawdbot
npm install -g clawdbot

# Start the gateway
clawdbot gateway start

# Verify it's running
clawdbot status

📚 New to Clawdbot? See docs.clawd.bot for the installation guide.

Optional tools (depending on what you'll manage):

  • kubectl for Kubernetes operations
  • aws CLI for AWS operations
  • terraform for IaC operations
  • docker for container operations

Install DevOps Skills Into Clawdbot

# 1. Clone this skill package
git clone https://github.com/agenticdevops/devops-execution-engine.git
cd devops-execution-engine

# 2. Install into your Clawdbot instance
clawdbot skills:install .

# 3. Verify installation
clawdbot skills:list | grep devops-execution-engine

That's it! Your Clawdbot now has professional DevOps expertise.

First Steps (Recommended)

Start with read-only operations to build trust:

# Start Clawdbot chat
clawdbot chat

Then try these safe commands:

You: Check cluster health
You: List all pods across namespaces
You: Show recent Kubernetes events
You: Analyze system resource usage

All of these are read-only and change nothing. Use them to get familiar with how it works.

Your First Execution Plan

When you're ready to let AI execute (with your approval):

You: I have pods in CrashLoopBackOff, can you fix them?

AI: [diagnoses the issue]
    [generates detailed execution plan]
    [shows you exactly what will be done]
    [waits for your approval]

You: yes

AI: [executes step-by-step with progress updates]
    [verifies the fix worked]
    [logs everything to audit trail]

You're always in control. Review every plan before approving.


How It Works

The Workflow

┌─────────────────────────────────────────────────────────────┐
│  1. You: "Fix the crashloop pods"                           │
└─────────────────────────────────────────────────────────────┘
                            ↓
┌─────────────────────────────────────────────────────────────┐
│  2. AI Diagnoses (read-only, safe)                          │
│     - Checks pod status                                      │
│     - Analyzes logs                                          │
│     - Reviews events                                         │
│     - Identifies root cause                                  │
└─────────────────────────────────────────────────────────────┘
                            ↓
┌─────────────────────────────────────────────────────────────┐
│  3. AI Generates Execution Plan                             │
│                                                              │
│     📋 PLAN: Fix CrashLoopBackOff                           │
│     Risk: MEDIUM | Time: ~5min                              │
│                                                              │
│     Steps:                                                   │
│     1. Increase memory limit 256Mi → 512Mi                  │
│     2. Wait for rollout (5min)                              │
│     3. Verify all pods running                              │
│                                                              │
│     Rollback: kubectl rollout undo deployment/api           │
│                                                              │
│     Approve? (yes/no)                                       │
└─────────────────────────────────────────────────────────────┘
                            ↓
┌─────────────────────────────────────────────────────────────┐
│  4. YOU REVIEW & APPROVE                                    │
│     - Read the plan                                          │
│     - Understand impact                                      │
│     - Check rollback procedure                              │
│     - Approve or reject                                      │
└─────────────────────────────────────────────────────────────┘
                            ↓
┌─────────────────────────────────────────────────────────────┐
│  5. AI Executes (only after approval)                       │
│     ✓ Step 1: Patching deployment... done                  │
│     ✓ Step 2: Waiting for rollout... done (2m 15s)         │
│     ✓ Step 3: Verifying pods... all running                │
│                                                              │
│     ✅ Complete! Logged to audit trail                      │
└─────────────────────────────────────────────────────────────┘
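Step 5 of the diagram (run the steps in order, keep the rollback ready) can be sketched as a small step runner. This is a minimal, hypothetical version, not `core/executor.js`: `executePlan` and the injected `run` function are assumed names, and the plan shape (`steps[].command`, `rollback[]`) mirrors the plan fields used in this README.

```javascript
// Hypothetical step runner: executes plan steps in order and, on the first
// failure, stops and runs the plan's rollback commands.
// `run` is injected and must return { ok: boolean } for a command string.
function executePlan(plan, run) {
  const results = [];
  for (const step of plan.steps) {
    const res = run(step.command);
    results.push({ command: step.command, ok: res.ok });
    if (!res.ok) {
      // Stop immediately and undo, using the rollback listed in the plan.
      const rollback = plan.rollback.map((cmd) => ({ command: cmd, ok: run(cmd).ok }));
      return { status: "rolled_back", results, rollback };
    }
  }
  return { status: "success", results };
}
```

Injecting `run` keeps the runner testable without a cluster. In practice it could wrap Node's `child_process.spawnSync(cmd, { shell: true })` and report `ok` when the returned `status` is 0, with `rollback` holding a command such as `kubectl rollout undo deployment/api`.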

What Gets Logged

Every action creates an audit entry:

{
  "timestamp": "2026-01-26T13:00:00Z",
  "plan_id": "plan-20260126-001",
  "action": "kubectl patch deployment",
  "risk": "MEDIUM",
  "status": "success",
  "approver": "your-name",
  "duration_seconds": 135
}

Full transparency. Full accountability.
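An audit entry like the one above could be produced and stored as JSON Lines (one object per line) with a few lines of Node.js. This is a sketch under assumptions: `auditEntry` and `appendAudit` are hypothetical helpers, not the engine's `logger.js`, and `timestamp` is assumed to record when execution started.

```javascript
// Hypothetical audit helpers using the field names from the example entry.
// Entries are appended as JSON Lines: one JSON object per line.
const fs = require("node:fs");

function auditEntry({ planId, action, risk, status, approver, startedAt, finishedAt }) {
  return {
    // Assumption: timestamp marks the start of execution, in the example's format.
    timestamp: startedAt.toISOString().replace(/\.\d{3}Z$/, "Z"),
    plan_id: planId,
    action,
    risk,
    status,
    approver,
    duration_seconds: Math.round((finishedAt - startedAt) / 1000),
  };
}

function appendAudit(file, entry) {
  // Append-only writes leave earlier entries untouched.
  fs.appendFileSync(file, JSON.stringify(entry) + "\n");
}
```

An append-only, line-per-entry file is easy to grep and to ship to other log tooling without rewriting history.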


Example Usage

You: Check cluster health

Clawd: [runs diagnostics]
- 3/3 nodes ready
- 2 pods in CrashLoopBackOff (api-service)
- Disk usage: worker-1 at 85%

You: Fix the crashloop

Clawd: 📋 EXECUTION PLAN: plan-001

Title: Fix CrashLoopBackOff in api-service
Risk: MEDIUM
Time: ~5min

Steps:
1. Increase memory 256Mi → 512Mi
2. Wait for rollout (5min)
3. Verify pods running

Approve? (yes/no)

You: yes

Clawd: ✅ Executing...
[runs steps with progress]
✅ Completed! All pods running.

Features

🔒 Safety First

  • No auto-execution - risky operations always require approval
  • Risk assessment for every action
  • Pre-flight validation
  • Rollback plans included
  • Complete audit trail

📚 Comprehensive Skills

Kubernetes

  • k8s-debug, k8s-deploy, argocd-gitops

Cloud

  • aws-ops, cost-optimization

Infrastructure

  • terraform-workflow, docker-ops

Operations

  • incident-response, log-analysis, system-health, git-workflow

📝 Structured Plans

Every operation generates a YAML execution plan:

plan:
  title: "What I'm fixing"
  risk: MEDIUM
  estimated_time: 5min
  rollback: ["how to undo"]
  
steps:
  - action: kubectl_patch
    command: "exact command"
    risk: MEDIUM
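Before anything runs, a parsed plan can be checked for the fields shown above. The sketch assumes the YAML has already been loaded into a plain object (for example with the `js-yaml` package's `load()`); `validatePlan` is a hypothetical helper, and the allowed risk values are taken from this README's four risk levels.

```javascript
// Hypothetical pre-flight check for a plan parsed from the YAML format above.
// Returns a list of problems; an empty list means the plan passed.
const RISKS = ["LOW", "MEDIUM", "HIGH", "CRITICAL"];

function validatePlan(doc) {
  const errors = [];
  const plan = doc.plan ?? {};
  if (!plan.title) errors.push("plan.title is required");
  if (!RISKS.includes(plan.risk)) errors.push(`plan.risk must be one of ${RISKS.join("/")}`);
  if (!Array.isArray(plan.rollback) || plan.rollback.length === 0) {
    errors.push("plan.rollback must list at least one undo step");
  }
  const steps = Array.isArray(doc.steps) ? doc.steps : [];
  if (steps.length === 0) errors.push("steps must contain at least one entry");
  steps.forEach((step, i) => {
    const s = step ?? {};
    if (!s.command) errors.push(`steps[${i}].command is required`);
    if (!RISKS.includes(s.risk)) errors.push(`steps[${i}].risk must be one of ${RISKS.join("/")}`);
  });
  return errors;
}
```

Returning every problem at once, instead of throwing on the first, lets the reviewer see the full list before deciding whether to regenerate the plan.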

🎯 Use Cases

  • Incident Response - Structured playbooks for outages
  • Kubernetes Management - Debug and fix cluster issues
  • Cost Optimization - Find and eliminate waste
  • Safe Deployments - Deploy with confidence and rollback
  • Infrastructure as Code - Terraform workflows
  • Container Operations - Docker debugging and management



Examples

Kubernetes Debugging

"Debug pods in production"
"Why is api-service crashing?"
"Check node resource usage"

Incident Response

"We have a SEV1 - API down"
"High error rates in payment service"
"Check recent deployments"

Cost Analysis

"Analyze AWS costs"
"Find idle resources"
"Suggest optimizations"

Deployments

"Deploy api v2.1.0 to prod"
"Rollback last deployment"
"Check ArgoCD status"

Architecture

devops-execution-engine/
├── SKILL.md              # Main documentation
├── core/                 # Execution engine
│   ├── plan-generator.js
│   ├── executor.js
│   ├── approval.js
│   └── logger.js
├── templates/            # Plan templates
├── skills/               # 11 DevOps skills
├── examples/             # Example plans
└── docs/                 # Documentation

Safety Model

Risk Levels

  • 🟢 LOW - Read-only, no impact
  • 🟡 MEDIUM - Resource changes, reversible
  • 🔴 HIGH - Production changes, potential downtime
  • CRITICAL - Data/security operations
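The four levels form an ordering, which makes it easy to derive a plan's overall risk from its steps. This is a hypothetical sketch: `overallRisk` and `requiresApproval` are illustrative names, and the rule that only LOW work may skip approval is an assumption based on the read-only guidance in this README.

```javascript
// Hypothetical helpers mirroring the four risk levels, ordered low to high.
const RISK_ORDER = ["LOW", "MEDIUM", "HIGH", "CRITICAL"];

// Unknown labels are treated as CRITICAL so a typo can never lower the risk.
function rank(risk) {
  const i = RISK_ORDER.indexOf(risk);
  return i === -1 ? RISK_ORDER.length - 1 : i;
}

// A plan is as risky as its riskiest step; an empty plan is LOW.
function overallRisk(steps) {
  const worst = steps.reduce((max, s) => Math.max(max, rank(s.risk)), 0);
  return RISK_ORDER[worst];
}

// Assumption: only LOW (read-only) work may run without an explicit approval.
function requiresApproval(risk) {
  return rank(risk) > 0;
}
```

Treating unknown labels as CRITICAL is a deliberate fail-closed choice: a misspelled risk level raises the bar instead of lowering it. For example, `overallRisk([{ risk: "LOW" }, { risk: "HIGH" }])` returns `"HIGH"`.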

Approval Process

  1. Generate plan
  2. Present with risk assessment
  3. Wait for approval
  4. Execute with monitoring
  5. Validate results
  6. Log to audit trail
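Step 3 of the approval process hinges on interpreting the operator's reply to the `Approve? (yes/no)` prompt. Below is a minimal, hypothetical sketch (`parseApproval` and `decide` are illustrative names, not the engine's `approval.js`) that fails closed and records the decision for the audit trail:

```javascript
// Hypothetical parser for the "Approve? (yes/no)" prompt. It fails closed:
// anything other than an explicit yes counts as a rejection.
function parseApproval(reply) {
  const answer = String(reply ?? "").trim().toLowerCase();
  return answer === "yes" || answer === "y";
}

// Records who decided and when, so the decision can go into the audit trail.
function decide(planId, reply, approver, now = new Date()) {
  return {
    plan_id: planId,
    approved: parseApproval(reply),
    approver,
    decided_at: now.toISOString(),
  };
}
```

Only an exact `yes` or `y` (ignoring case and surrounding whitespace) approves; a reply such as `yes please` or an empty message is treated as a rejection, so ambiguity never results in execution.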

Contributing

We welcome contributions!

  • 🐛 Bug reports - Open an issue
  • 💡 Feature requests - Start a discussion
  • 🔧 Pull requests - See CONTRIBUTING.md
  • 📚 Documentation - Improvements welcome
  • 🎓 Skills - Add new DevOps skills

Requirements

  • Clawdbot v1.0.0+
  • kubectl (for Kubernetes operations)
  • aws CLI (for AWS operations, optional)
  • terraform (for IaC operations, optional)
  • docker (for container operations, optional)

License

Apache 2.0 - See LICENSE




Why This Exists

DevOps teams love AI assistance but fear automation. This skill package bridges that gap:

  • AI does the diagnosis and planning
  • Human reviews and approves
  • AI executes safely with monitoring
  • Everything is logged and reversible

The best of both worlds: AI speed + human oversight.


Why Clawdbot?

Clawdbot is different from other AI assistants:

Feature ChatGPT GitHub Copilot Clawdbot + DevOps Skills
Suggests commands
Actually executes
Domain expertise Code only ✅ DevOps
Approval workflow
Audit trail
Rollback procedures
Multi-platform Web only IDE only ✅ Everywhere

Clawdbot + DevOps Skills = The only AI that can safely manage your infrastructure



Contributing

Help make this the best DevOps skill package for Clawdbot! See CONTRIBUTING.md




Built with ❤️ by the Clawdbot community
