gustcol/dedo-duro
Dedo-Duro: AWS Resource Utilization Analyzer

A comprehensive command-line tool for analyzing AWS resource utilization: it identifies cost optimization opportunities, finds orphaned resources, and flags security and privacy compliance issues.



Overview

Dedo-Duro (Portuguese for "Snitch" or "Tattletale") is a powerful command-line tool designed to help AWS administrators and DevOps engineers gain deep insights into their AWS resource utilization. It analyzes various services, identifies potential cost savings, flags security and privacy compliance issues, detects orphaned resources, and provides actionable recommendations.

mindmap
  root((Dedo-Duro))
    Cost Optimization
      Right-sizing
      Spot Instances
      Savings Plans
      Schedule Optimization
      Cost Explorer Integration
    Security & Privacy
      GDPR Compliance
      ISO 27701
      Best Practices
    Resource Analysis
      30+ AWS Services
      AI/ML Services
      Multi-Region
      Multi-Account
    Kubernetes
      EKS Sessions
      Deployment Lifecycle
      RTO/RPO Analysis
    Reporting
      HTML Interactive
      JSON/CSV Export
      S3 Upload
      Visual Charts
    CI/CD Integration
      GitHub Actions
      Jenkins
      CircleCI

Features

For a detailed history of changes and features introduced in specific versions, please see the changelog.md.

Comprehensive Resource Analysis

  • Graceful ExpiredToken Handling: When an AWS API returns an ExpiredToken error, analysis stops immediately and a partial report is generated from the data collected so far. This ensures a graceful exit instead of prolonged execution with invalid credentials.
  • Enhanced EBS Snapshot Analysis: The ebs_snapshot analyzer now identifies and reports on "repeated" snapshots (multiple snapshots from the same volume) that are also older than one year. This includes their estimated monthly costs, highlighting additional cost-saving opportunities.
  • Cost Optimization: Analyzes utilization (CPU, Memory, Network, IOPS) for EC2, RDS, Lambda, EBS, etc., over configurable periods (e.g., 30/60/90 days). Provides right-sizing recommendations, identifies idle/unused resources (EBS, EIP, NAT Gateways), suggests configuration optimizations (e.g., migrating EBS gp2 to gp3). Now uses real-time AWS Pricing API data for more accurate cost and savings estimations. Includes EC2 Reserved Instance (RI) awareness, identifying instances covered by RIs expiring soon.
  • Compute Optimizer Integration: Ingests and reports recommendations directly from AWS Compute Optimizer for EC2, ASG, EBS, and Lambda (compute_optimizer analyzer), leveraging AWS's ML-based analysis.
  • Savings Plans Analysis: Analyzes Compute, EC2 Instance, and SageMaker Savings Plans utilization, coverage, and expirations (savings_plans analyzer).
  • Instance Schedule Optimization: Automatically identifies EC2 instances eligible for scheduled start/stop to reduce costs (schedule_optimizer analyzer):
    • Environment Detection: Automatically identifies development, test, staging, and demo environments from instance names or tags
    • Schedule Profiles: Recommends optimal schedules including business hours only (70% savings), extended business hours (58% savings), development hours (64% savings), staging hours (48% savings), and weekend shutdown (29% savings)
    • Financial Calculations: Calculates potential monthly/annual savings based on real instance pricing data, including holiday savings
    • Terraform Integration: Provides schedule tags for use with AWS Instance Scheduler or Lambda-based automation
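The savings percentages quoted for the schedule profiles follow directly from the fraction of the 168-hour week an instance stays off. A minimal sketch of that arithmetic (the per-profile hour counts below are illustrative assumptions, not the analyzer's exact definitions):

```python
def monthly_savings_fraction(weekly_on_hours: float) -> float:
    """Fraction of on-demand cost saved by running only weekly_on_hours out of 168."""
    return 1.0 - weekly_on_hours / 168.0

# Illustrative profiles (assumed hour counts that reproduce the quoted figures):
profiles = {
    "business_hours_only": 10 * 5,   # 10h/day on weekdays -> ~70% savings
    "development_hours":   12 * 5,   # 12h/day on weekdays -> ~64% savings
    "weekend_shutdown":    24 * 5,   # always on, weekdays only -> ~29% savings
}

for name, hours in profiles.items():
    print(f"{name}: {monthly_savings_fraction(hours):.0%}")
```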

Resource Coverage

graph TB
    subgraph Compute["Compute Services"]
        EC2["EC2 Instances"]
        Lambda["Lambda Functions"]
        ECS["ECS Clusters"]
        Spot["Spot Analysis"]
    end

    subgraph Storage["Storage Services"]
        S3["S3 Buckets"]
        EBS["EBS Volumes"]
        EFS["EFS File Systems"]
        Snapshots["EBS Snapshots"]
    end

    subgraph Database["Database Services"]
        RDS["RDS Instances"]
        DynamoDB["DynamoDB Tables"]
        ElastiCache["ElastiCache"]
        OpenSearch["OpenSearch"]
    end

    subgraph Network["Network Services"]
        VPC["VPC Endpoints"]
        NAT["NAT Gateways"]
        ELB["Load Balancers"]
        CloudFront["CloudFront"]
        Route53["Route 53"]
    end

    subgraph AIML["AI/ML Services"]
        SageMaker["SageMaker"]
        Bedrock["Bedrock"]
        Comprehend["Comprehend"]
        Rekognition["Rekognition"]
        Textract["Textract"]
        Transcribe["Transcribe"]
        Kendra["Kendra"]
    end

    subgraph Financial["Financial Analysis"]
        ComputeOpt["Compute Optimizer"]
        SavingsPlans["Savings Plans"]
        CUR["Cost & Usage Report"]
        ScheduleOpt["Schedule Optimizer"]
        CostExplorer["Cost Explorer"]
    end

    subgraph Governance["Governance"]
        Security["Security Analysis"]
        Privacy["Privacy Compliance"]
        Orphan["Orphaned Resources"]
        RTO["RTO/RPO Analysis"]
    end

    subgraph Kubernetes["Kubernetes (EKS)"]
        EKSSessions["Session Monitoring"]
        EKSDeployment["Deployment Lifecycle"]
    end

    style AIML fill:#9b59b6,color:#fff
    style Financial fill:#27ae60,color:#fff
    style Governance fill:#e74c3c,color:#fff
    style Kubernetes fill:#326ce5,color:#fff

AI/ML Service Analysis

Comprehensive cost optimization for AWS AI/ML services:

  • Amazon SageMaker: Analyzes notebook instances (idle detection, GPU warnings), endpoints (utilization metrics, serverless recommendations), training jobs (Spot usage), models, Feature Store, and Studio domains.
  • Amazon Bedrock: Analyzes provisioned throughput (underutilization detection), custom models, logging configuration, guardrails, and knowledge bases.
  • Amazon Comprehend: Analyzes endpoints (idle detection), document classifiers, entity recognizers, and flywheels.
  • Amazon Rekognition: Analyzes Custom Labels projects/models (running cost detection), stream processors, and face collections.
  • Amazon Textract: Analyzes custom adapters and usage patterns with per-operation cost estimation.
  • Amazon Transcribe: Analyzes custom vocabularies, language models, call analytics categories, and job patterns (failed/stuck detection).
  • Amazon Kendra: Analyzes indexes (edition-based costs), data sources, experiences, and query patterns.

v12.0 New Features

Multi-Account Analysis

  • Consolidated Analysis: Analyze multiple AWS accounts simultaneously with the --accounts-file option
  • Cross-Account Reports: Generate both individual and consolidated reports across accounts
  • AWS Organizations Support: Leverage AWS Organizations for automatic account discovery
  • Partition Support: Full support for AWS Commercial, GovCloud, and China partitions

Cost Explorer Integration

  • Real Cost Data: Integrates with AWS Cost Explorer API for actual spend data
  • Anomaly Detection: Identifies cost spikes and unusual spending patterns
  • Budget Tracking: Compares actual costs against estimated costs
  • Service-Level Analysis: Breaks down costs by AWS service
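A service-level cost breakdown of this kind can be obtained from the Cost Explorer API via boto3's get_cost_and_usage. A sketch under stated assumptions (the function names are illustrative, not the project's; the live call requires credentials and Cost Explorer enabled on the account):

```python
def build_cost_query(start: str, end: str) -> dict:
    """Request parameters for a per-service monthly cost breakdown."""
    return {
        "TimePeriod": {"Start": start, "End": end},  # ISO dates, end exclusive
        "Granularity": "MONTHLY",
        "Metrics": ["UnblendedCost"],
        "GroupBy": [{"Type": "DIMENSION", "Key": "SERVICE"}],
    }

def fetch_service_costs(start: str, end: str) -> dict:
    """Live call -- requires boto3, AWS credentials, and Cost Explorer access."""
    import boto3  # local import keeps the pure helper above dependency-free
    ce = boto3.client("ce", region_name="us-east-1")  # CE is served from us-east-1
    resp = ce.get_cost_and_usage(**build_cost_query(start, end))
    return {
        g["Keys"][0]: float(g["Metrics"]["UnblendedCost"]["Amount"])
        for g in resp["ResultsByTime"][0]["Groups"]
    }
```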

RTO/RPO Analysis

  • Backup Assessment: Analyzes backup configurations for RDS, S3, and other services
  • Cross-Region Replication: Checks for disaster recovery readiness
  • Recovery Metrics: Calculates estimated RTO (Recovery Time Objective) and RPO (Recovery Point Objective)
  • Compliance Checks: Identifies resources not meeting recovery requirements
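One way such a recovery metric can be estimated: the worst-case RPO for a resource is bounded by the largest gap between its consecutive backups, since data written just after one backup could be lost for the whole interval until the next. A sketch of that estimate (not the analyzer's exact formula):

```python
from datetime import datetime, timedelta
from typing import List

def worst_case_rpo(backup_times: List[datetime]) -> timedelta:
    """Largest gap between consecutive backups = worst-case data-loss window."""
    times = sorted(backup_times)
    if len(times) < 2:
        return timedelta.max  # no usable backup history -> effectively unbounded
    return max(later - earlier for earlier, later in zip(times, times[1:]))
```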

EKS Monitoring (Kubernetes)

  • Session Monitoring: Tracks active kubectl and SSM sessions to EKS clusters
  • Deployment Lifecycle: Monitors deployment health, age, and update frequency
  • Restart Analysis: Identifies pods with excessive restart counts
  • Stale Deployment Detection: Flags deployments not updated in 90+ days

Environment Filtering

  • Environment Tags: Filter analysis by environment (production, staging, development, test)
  • Tag-Based Grouping: Group resources by custom tags (Team, Project, CostCenter)
  • Targeted Reports: Generate environment-specific reports
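The core of tag-based filtering is a simple match against each resource's tag map. A minimal sketch, assuming resources are represented as dicts with a "tags" key (the field names here are assumptions for illustration):

```python
from typing import Dict, List

def filter_by_environment(resources: List[Dict], environment: str,
                          tag_key: str = "Environment") -> List[Dict]:
    """Keep only resources whose Environment tag matches (case-insensitive)."""
    return [
        r for r in resources
        if r.get("tags", {}).get(tag_key, "").lower() == environment.lower()
    ]
```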

CI/CD Integration

  • GitHub Actions: Pre-built workflow for automated weekly analysis
  • Jenkins Pipeline: Jenkinsfile for Jenkins CI/CD integration
  • CircleCI Config: Configuration for CircleCI pipelines
  • Artifact Upload: Automatic report upload to S3 or CI artifacts

Web Dashboard (Enterprise)

  • Real-time Monitoring: Flask-based web dashboard for live analysis status
  • REST API: Full API for triggering analysis and retrieving results
  • Report History: View and compare historical analysis reports
  • Alert Configuration: Configure custom alert thresholds via web interface

Notifications (Enterprise)

  • Slack Integration: Send alerts and reports to Slack channels via webhooks
  • Microsoft Teams: Teams channel integration for notifications
  • Custom Alerts: Configurable thresholds for cost, security, and idle resources
  • Alert Severity Levels: Critical, warning, and info classifications
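A webhook alert of this kind boils down to posting a small JSON payload. A minimal sketch using Slack's standard incoming-webhook `text` field (the function names and emoji mapping are illustrative, not the project's API):

```python
import json
import urllib.request

def build_alert_payload(severity: str, message: str) -> dict:
    """Slack incoming-webhook payload; the emoji map is an illustrative choice."""
    icons = {"critical": ":red_circle:", "warning": ":warning:",
             "info": ":information_source:"}
    return {"text": f"{icons.get(severity, ':grey_question:')} "
                    f"[{severity.upper()}] {message}"}

def send_to_slack(webhook_url: str, payload: dict) -> None:
    """POST the payload to a Slack incoming webhook; raises on HTTP errors."""
    req = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)
```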

Auto-Remediation (Experimental)

  • Safe Operations Only: Tagging and snapshot operations by default
  • Dry-Run Mode: All actions simulated unless explicitly enabled
  • Approval Workflow: High-risk actions require manual approval
  • Audit Logging: Complete audit trail of all remediation actions
  • Risk Levels: SAFE, LOW, MEDIUM, HIGH, CRITICAL classifications
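The interaction between risk levels, dry-run mode, and the approval workflow can be sketched as a small gating function (a hedged illustration of the policy described above, not the module's actual code):

```python
from enum import IntEnum

class RiskLevel(IntEnum):
    SAFE = 0
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4

def decide_action(risk: RiskLevel, dry_run: bool = True,
                  approved: bool = False) -> str:
    """Gate a remediation: simulate by default, require approval when risky."""
    if dry_run:
        return "simulate"          # dry-run wins over everything else
    if risk >= RiskLevel.HIGH and not approved:
        return "await_approval"    # high-risk actions need manual sign-off
    return "execute"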

Performance & Resilience

  • Session Reuse: Cached boto3 sessions and clients reduce connection overhead
  • Per-Analyzer Timeouts: Configurable timeouts prevent slow analyzers from blocking the pipeline
  • CloudWatch Metrics Cache: TTL-based caching reduces redundant API calls (5-30 min TTL by metric type)
  • Circuit Breaker: Failing services are automatically bypassed after threshold failures
  • Exponential Backoff: Intelligent retry with jitter for AWS API throttling
  • Progressive Results: Event-driven architecture enables streaming partial results
  • Performance Telemetry: Detailed metrics on analyzer execution times and cache efficiency
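The retry behavior described above is commonly implemented as "full jitter" exponential backoff: the delay grows exponentially with the attempt number but is randomized to spread retries out. A self-contained sketch (parameter values are illustrative defaults, not the project's configuration):

```python
import random

def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Full-jitter backoff: uniform random delay in [0, min(cap, base * 2^attempt)]."""
    return random.uniform(0, min(cap, base * (2 ** attempt)))
```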

Advanced Capabilities & Reporting

  • Multi-Region & China Region Support: Analyzes resources across multiple specified AWS regions simultaneously, including AWS China regions (cn-north-1, cn-northwest-1).
  • Infrastructure-as-Code Recommendations: Generates suggested Terraform (terraform_recommendations) and CloudFormation (cloudformation_recommendations) code snippets based on analysis findings to aid remediation.
  • Flexible & Enhanced Reporting: Creates detailed reports in multiple formats:
    • HTML: Interactive, sortable, filterable reports with enhanced formatting, combined results, efficiency scores, RI details, and CloudWatch Agent information.
    • Visual Analytics Charts: HTML reports now include interactive Chart.js visualizations
    • JSON & CSV: Structured data formats suitable for programmatic consumption.
    • Supports direct report upload to a specified S3 bucket.

Architecture

High-Level System Architecture

flowchart TB
    subgraph Input["Input Layer"]
        CLI[/"CLI Arguments"/]
        Config["Configuration Files"]
        AWS["AWS Credentials"]
    end

    subgraph Core["Core Engine"]
        Manager["AWSResourceManager<br/>Orchestrator"]
        Metrics["CloudWatchMetrics<br/>Handler"]
        Types["Type System<br/>& Protocols"]
    end

    subgraph Analyzers["Analyzer Layer (30+ Analyzers)"]
        direction LR
        Compute["Compute<br/>EC2, Lambda, ECS"]
        Storage["Storage<br/>S3, EBS, EFS"]
        Database["Database<br/>RDS, DynamoDB"]
        AIML["AI/ML<br/>SageMaker, Bedrock"]
        Network["Network<br/>VPC, NAT, ELB"]
        Security["Security<br/>& Privacy"]
    end

    subgraph Output["Output Layer"]
        HTML["HTML Reporter<br/>Interactive Charts"]
        JSON["JSON Reporter"]
        CSV["CSV Reporter"]
        S3Out["S3 Upload"]
    end

    subgraph AWS_Services["AWS Services"]
        direction LR
        CloudWatch["CloudWatch"]
        Pricing["Pricing API"]
        STS["STS"]
        IAM["IAM"]
    end

    CLI --> Manager
    Config --> Manager
    AWS --> Manager

    Manager --> Metrics
    Manager --> Analyzers
    Types -.-> Manager

    Analyzers --> AWS_Services
    Metrics --> CloudWatch

    Manager --> Output
    Output --> S3Out

    style Manager fill:#4a90d9,color:#fff
    style Analyzers fill:#50c878,color:#fff
    style Output fill:#ff9500,color:#fff

From version 2.0 onwards, the codebase follows a modular, object-oriented architecture:

  1. Core Components:

    • AWSResourceManager: Main orchestrator that coordinates all analyzers.
    • ResourceAnalyzer: Base class for all resource-specific analyzers with lazy initialization for AWS clients and metrics.
    • CloudWatchMetrics: Handles CloudWatch metric collection.
    • ReportCoordinator: Coordinates report generation.
    • AnalysisResult: Standardized dataclass for analyzer results with backward-compatible conversion methods.
  2. Resource Analyzers:

    • Each AWS service has its dedicated analyzer class.
    • Analyzers inherit from the ResourceAnalyzer base class.
    • AWS clients are initialized lazily on first access, reducing unnecessary API calls.
  3. Reporters:

    • Format-specific reporter classes (HTML, JSON, CSV).
    • Reporters inherit from a common BaseReporter class.
  4. Utilities:

    • AWS utilities for common operations (API calls, pagination, tags).
    • Console utilities for user interface (progress bars, colored output).
    • Cost estimator for savings calculations with real-time pricing data.
    • Protocol definitions for type-safe AWS client usage.
  5. Type System:

    • Protocol classes in utils/protocols.py for AWS client type hints.
    • TypedDict definitions for structured result dictionaries.
    • utc_now() helper replacing deprecated datetime.utcnow() calls.
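A helper of this kind is the standard replacement for the deprecated datetime.utcnow(): it returns a timezone-aware timestamp instead of a naive one. A minimal sketch of what it likely amounts to:

```python
from datetime import datetime, timezone

def utc_now() -> datetime:
    """Timezone-aware 'now' in UTC, replacing the deprecated datetime.utcnow()."""
    return datetime.now(timezone.utc)
```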

Analysis Flow

sequenceDiagram
    autonumber
    participant User
    participant Main as main.py
    participant Manager as AWSResourceManager
    participant Analyzer as ResourceAnalyzer
    participant AWS as AWS APIs
    participant Reporter as ReportCoordinator

    User->>Main: python main.py --region us-east-1
    Main->>Manager: Initialize with configs
    Manager->>AWS: Fetch account info (STS)
    AWS-->>Manager: Account ID, Partition

    Main->>Manager: Register analyzers

    loop For each analyzer
        Manager->>Analyzer: Create instance (lazy init)
        Analyzer->>AWS: Fetch resources
        AWS-->>Analyzer: Resource data
        Analyzer->>AWS: Get CloudWatch metrics
        AWS-->>Analyzer: Utilization metrics
        Analyzer->>Analyzer: Analyze & generate recommendations
        Analyzer-->>Manager: Results + Summary
    end

    Manager->>Manager: Calculate savings summary
    Manager->>Reporter: Generate report
    Reporter->>Reporter: Create HTML/JSON/CSV
    Reporter-->>User: Save report file

    opt S3 Upload
        Reporter->>AWS: Upload to S3
    end

Component Architecture

flowchart LR
    subgraph Entry["Entry Points"]
        main["main.py"]
        run_npe["run_npe.py"]
        run_spot["run_spot_analyzer.py"]
    end

    subgraph Core["core/"]
        analyzer["analyzer.py<br/>─────────<br/>ResourceAnalyzer<br/>AWSResourceManager"]
        metrics["metrics.py<br/>─────────<br/>CloudWatchMetrics"]
        reporter["reporter.py<br/>─────────<br/>ReportCoordinator"]
        types["types.py<br/>─────────<br/>AnalysisResult<br/>TypedDicts"]
    end

    subgraph Analyzers["analyzers/"]
        ec2["ec2.py"]
        s3["s3.py"]
        rds["rds.py"]
        lambda_a["lambda.py"]
        spot["spot.py"]
        sagemaker["sagemaker.py"]
        bedrock["bedrock.py"]
        more["... 25+ more"]
    end

    subgraph Reporters["reporters/"]
        html["html_reporter.py"]
        json_r["json_reporter.py"]
        csv_r["csv_reporter.py"]
    end

    subgraph Utils["utils/"]
        aws_utils["aws_utils.py"]
        cost_est["cost_estimator.py"]
        console["console.py"]
        protocols["protocols.py"]
    end

    subgraph NPE["npe/"]
        collector["collector.py"]
        docker_gen["dockerfile_generator.py"]
        helm_gen["helm_generator.py"]
    end

    main --> Core
    Core --> Analyzers
    Core --> Reporters
    Analyzers --> Utils
    run_npe --> NPE

    style Core fill:#4a90d9,color:#fff
    style Analyzers fill:#50c878,color:#fff
    style Reporters fill:#ff9500,color:#fff

Project Structure

.
├── __init__.py                 # Package initialization and version info
├── main.py                     # Main Dedo-Duro analysis entry point
├── run_npe.py                  # Entry point for NPE Kubernetes artifact generation
├── config.py                   # Configuration settings
├── requirements.txt            # Required Python packages for pip
├── Pipfile                     # Required Python packages for pipenv
├── Pipfile.lock                # Lock file for pipenv
├── analyzers/                  # Resource-specific analyzers for Dedo-Duro
│   ├── __init__.py
│   ├── ... (various analyzer files) ...
│   ├── cur.py                  # Cost and Usage Report (CUR) analysis
│   ├── cost_explorer_analyzer.py    # Cost Explorer integration (v12.0)
│   ├── rto_analyzer.py              # RTO/RPO analysis (v12.0)
│   ├── eks_session_analyzer.py      # EKS session monitoring (v12.0)
│   └── eks_deployment_lifecycle.py  # EKS deployment lifecycle (v12.0)
├── core/                       # Core Dedo-Duro functionality
│   ├── __init__.py
│   ├── analyzer.py             # Main analyzer orchestration with lazy initialization
│   ├── metrics.py              # CloudWatch metrics handling (with caching)
│   ├── reporter.py             # Report generation coordination
│   ├── types.py                # Type definitions (AnalysisResult, TypedDicts, utc_now)
│   ├── multi_account.py        # Multi-account orchestration (v12.0)
│   ├── cache.py                # TTL-based CloudWatch metrics cache (v12.0)
│   ├── timeout.py              # Per-analyzer timeout management (v12.0)
│   ├── circuit_breaker.py      # Circuit breaker for failing services (v12.0)
│   ├── events.py               # Progressive results event bus (v12.0)
│   └── telemetry.py            # Performance monitoring/telemetry (v12.0)
├── reporters/                  # Dedo-Duro report generators
│   ├── __init__.py
│   ├── json_reporter.py        # JSON report generation
│   ├── html_reporter.py        # HTML report generation
│   └── csv_reporter.py         # CSV report generation
├── npe/                        # Native-to-Platform Extractor (Kubernetes migration)
│   ├── __init__.py
│   ├── collector.py            # Collects AWS Lambda/API GW data
│   ├── dockerfile_generator.py # Generates Dockerfiles
│   ├── helm_generator.py       # Generates Helm charts
│   └── templates/              # Templates for Dockerfile/Helm
├── schedule/                   # Schedule optimization module
│   └── __init__.py
├── ci/                         # CI/CD integration templates (v12.0)
│   ├── Jenkinsfile             # Jenkins pipeline configuration
│   └── github_reporter.py      # GitHub-specific output format
├── .github/workflows/          # GitHub Actions workflows (v12.0)
│   └── dedo-duro-analysis.yml  # Automated analysis workflow
├── .circleci/                  # CircleCI configuration (v12.0)
│   └── config.yml              # CircleCI pipeline config
├── web/                        # Web Dashboard (v12.0-Enterprise)
│   ├── app.py                  # Flask application
│   ├── templates/              # HTML templates
│   │   └── index.html          # Dashboard template
│   └── static/                 # CSS/JS assets
│       ├── style.css           # Dashboard styles
│       └── app.js              # Dashboard JavaScript
├── notifications/              # Notification System (v12.0-Enterprise)
│   ├── __init__.py
│   ├── slack.py                # Slack webhook integration
│   ├── teams.py                # Microsoft Teams integration
│   └── alerting.py             # Alert manager with thresholds
├── remediation/                # Auto-Remediation (v12.0-Enterprise)
│   ├── __init__.py
│   ├── base.py                 # Base remediation framework
│   ├── ec2_remediation.py      # EC2 remediation actions
│   ├── rds_remediation.py      # RDS remediation actions
│   └── s3_remediation.py       # S3 remediation actions
├── docs/                       # Documentation
│   └── kubernetes_permissions.md  # K8s permissions guide
└── utils/                      # Utility functions (shared)
    ├── __init__.py
    ├── aws_utils.py            # AWS-specific utilities
    ├── cost_estimator.py       # Cost estimation utilities
    ├── console.py              # Console output utilities
    └── protocols.py            # AWS client Protocol definitions for type safety

Implementation Details

Key Classes

AWSConfig (config.py)

Manages AWS configuration including region, profile, retry mechanisms, and timeouts.

aws_config = AWSConfig(
    region="us-east-1",
    profile="production",
    max_attempts=5,
    retry_mode="adaptive"
)

# Create a copy with a different region for multi-region analysis
west_config = aws_config.copy_with_region("us-west-2")

AnalysisConfig (config.py)

Controls analysis behavior including concurrency, throttling, and batch sizes.

analysis_config = AnalysisConfig(
    verbose=True,
    single_thread=False,
    multi_region=False,
    max_workers=10
)

ResourceAnalyzer (core/analyzer.py)

Base class for all resource analyzers with lazy initialization.

class ResourceAnalyzer(ABC):
    META_ANALYZER_SERVICES: Set[str] = {
        'terraform_recommendations',
        'cloudformation_recommendations',
        'schedule_estimation'
    }

    @property
    def client(self) -> Any:
        """Lazy initialization of AWS client - only created when first accessed."""
        if not self._client_initialized:
            self._client_initialized = True
            service_name = self.get_service_name()
            if service_name and service_name not in self.META_ANALYZER_SERVICES:
                self._client = self.aws_config.create_client(service_name)
        return self._client

    @abstractmethod
    def analyze(self, cur_data: Optional[Dict] = None, **kwargs) -> Union[List[Dict[str, Any]], Tuple[List[Dict[str, Any]], Dict[str, Any]]]:
        pass

AnalysisResult (core/types.py)

Standardized dataclass for analyzer results.

@dataclass
class AnalysisResult:
    """Standardized result container for analyzers."""
    resources: List[Dict[str, Any]] = field(default_factory=list)
    summary: Dict[str, Any] = field(default_factory=dict)
    metadata: Dict[str, Any] = field(default_factory=dict)

    def to_list(self) -> List[Dict[str, Any]]:
        """Convert to list format for backward compatibility."""
        return self.resources

AWS Client Protocols (utils/protocols.py)

Protocol definitions for type-safe AWS client usage:

@runtime_checkable
class EC2ClientProtocol(AWSClientProtocol, Protocol):
    """Protocol for EC2 client operations."""
    def describe_instances(self, **kwargs: Any) -> Dict[str, Any]: ...
    def describe_volumes(self, **kwargs: Any) -> Dict[str, Any]: ...
    def describe_snapshots(self, **kwargs: Any) -> Dict[str, Any]: ...

Protocols are available for: EC2, S3, RDS, DynamoDB, ELBv2, Lambda, CloudWatch, SageMaker, Bedrock, Comprehend, Rekognition, Textract, Transcribe, Kendra, and more.
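Because the protocols are @runtime_checkable, isinstance() checks are structural (they verify method presence only), which makes it easy to swap in test doubles for real boto3 clients. A self-contained sketch (the minimal protocol here mirrors, but is not, the project's definition):

```python
from typing import Any, Dict, Protocol, runtime_checkable

@runtime_checkable
class EC2ClientProtocol(Protocol):
    """Minimal structural stand-in for an EC2 client."""
    def describe_instances(self, **kwargs: Any) -> Dict[str, Any]: ...

class FakeEC2:
    """Test double: satisfies the protocol structurally, no boto3 required."""
    def describe_instances(self, **kwargs: Any) -> Dict[str, Any]:
        return {"Reservations": []}

print(isinstance(FakeEC2(), EC2ClientProtocol))  # -> True (method-name check only)
```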


Installation

Prerequisites

  • Python 3.7 or higher
  • pip and pipenv (Recommended: pip install pipenv)
  • AWS credentials configured
  • For CUR Analysis: polars and pyarrow packages
  • For NPE Module: PyYAML package

Using pipenv (Recommended)

# Clone the repository
git clone https://github.com/your-org/dedo-duro.git
cd dedo-duro

# Install dependencies
pipenv install

# Activate virtual environment
pipenv shell

# Run the analyzer
python main.py --region us-west-2

Using pip

# Clone the repository
git clone https://github.com/your-org/dedo-duro.git
cd dedo-duro

# Create virtual environment (optional but recommended)
python3 -m venv .venv
source .venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Run the analyzer
python main.py --region us-west-2

Usage

Basic Usage

# Analyze all resources in a region
python main.py --region us-east-1

# Generate HTML report with specific output file
python main.py --region us-west-2 --output-file my-report.html

# Analyze specific resource types
python main.py --resource-types ec2,s3,rds,spot

Command Flow

flowchart TD
    Start([Start]) --> Args{Parse Arguments}
    Args --> Config[Create Configs<br/>AWS, Analysis, Report]
    Config --> Manager[Initialize<br/>AWSResourceManager]
    Manager --> Register[Register Analyzers]

    Register --> CUR{CUR Analysis<br/>Requested?}
    CUR -->|Yes| RunCUR[Run CUR Analyzer]
    CUR -->|No| Primary
    RunCUR --> Primary[Run Primary Analyzers]

    Primary --> Multi{Multi-Region?}
    Multi -->|Yes| Parallel[Parallel Region Analysis]
    Multi -->|No| Single[Single Region Analysis]

    Parallel --> Combine[Combine Results]
    Single --> Summary
    Combine --> Summary[Calculate Summary]

    Summary --> Report[Generate Report]
    Report --> S3{Upload to S3?}
    S3 -->|Yes| Upload[Upload Report]
    S3 -->|No| Done
    Upload --> Done([Complete])

    style Start fill:#27ae60,color:#fff
    style Done fill:#27ae60,color:#fff
    style Manager fill:#3498db,color:#fff
    style Report fill:#e67e22,color:#fff

Available Arguments

Argument             Description                                Default
--region             AWS region to analyze                      AWS default
--profile            AWS credential profile                     Default profile
--output-format      Report format (html/json/csv)              html
--output-file        Output file path                           Auto-generated
--output-s3-bucket   S3 bucket for upload                       None
--output-s3-prefix   S3 prefix for upload                       None
--resource-types     Comma-separated analyzers                  All standard
--cur-s3-uri         S3 URI for CUR data                        None
--cur-days-ago       Days of CUR data to analyze                None
--multi-region       Analyze all regions                        False
--accounts-file      JSON file with account configs             None
--all-accounts       Analyze all Organization accounts          False
--environment        Filter by environment (prod/test/dev)      None
--grouping-tags      Tags for resource grouping                 Team,Project
--verbose            Detailed output                            False
--single-thread      Disable parallel processing                False
--max-workers        Parallel workers                           10
--retry-attempts     Max retry attempts                         5
--no-cache           Disable CloudWatch metrics caching         False
--analyzer-timeout   Per-analyzer timeout in seconds            180
--enable-streaming   Enable progressive results streaming       False

Multi-Account Analysis (v12.0)

Analyze multiple AWS accounts simultaneously:

# Using accounts file
python main.py --accounts-file accounts.json --output-format html

# Analyze all accounts in AWS Organizations
python main.py --all-accounts --output-format html

accounts.json format:

{
  "accounts": [
    {
      "account_id": "111111111111",
      "role_arn": "arn:aws:iam::111111111111:role/DedoDuroRole",
      "alias": "production",
      "regions": ["us-east-1", "us-west-2"]
    },
    {
      "account_id": "222222222222",
      "role_arn": "arn:aws:iam::222222222222:role/DedoDuroRole",
      "alias": "staging",
      "regions": ["us-east-1"]
    }
  ],
  "partition": "aws"
}
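Each entry's role_arn would typically be assumed via STS to obtain temporary credentials for that account. A sketch of how such a cross-account session can be built (the function names are illustrative; the live call requires boto3 and permission to assume the target role):

```python
def session_kwargs_from_credentials(creds: dict) -> dict:
    """Map an STS AssumeRole credentials block onto boto3.Session keyword args."""
    return {
        "aws_access_key_id": creds["AccessKeyId"],
        "aws_secret_access_key": creds["SecretAccessKey"],
        "aws_session_token": creds["SessionToken"],
    }

def cross_account_session(role_arn: str, region: str):
    """Live call -- requires boto3 and sts:AssumeRole on the target role."""
    import boto3  # local import keeps the pure helper above dependency-free
    resp = boto3.client("sts").assume_role(
        RoleArn=role_arn, RoleSessionName="dedo-duro-analysis"
    )
    return boto3.Session(**session_kwargs_from_credentials(resp["Credentials"]),
                         region_name=region)
```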

Partition values:

  • aws - AWS Commercial (default)
  • aws-us-gov - AWS GovCloud
  • aws-cn - AWS China

Environment Filtering (v12.0)

Filter analysis by environment:

# Analyze only production resources
python main.py --environment prod --region us-east-1

# Analyze with custom grouping tags
python main.py --grouping-tags Team,Project,CostCenter --region us-east-1

Supported Resource Types (Analyzer Keys)

graph LR
    subgraph Compute
        ec2[ec2]
        ec2_eff[ec2-eff]
        lambda[lambda]
        ecs[ecs]
        spot[spot]
    end

    subgraph Storage
        s3[s3]
        ebs[ebs]
        ebs_snap[ebs_snapshot]
        efs[efs]
    end

    subgraph Database
        rds[rds]
        dynamodb[dynamodb]
        elasticache[elasticache]
        opensearch[opensearch]
    end

    subgraph Network
        nat[nat]
        elb[elb]
        vpc_ep[vpc_endpoints]
        eip[eip]
        cloudfront[cloudfront]
        route53[route53]
    end

    subgraph AI_ML
        sagemaker[sagemaker]
        bedrock[bedrock]
        comprehend[comprehend]
        rekognition[rekognition]
        textract[textract]
        transcribe[transcribe]
        kendra[kendra]
    end

    subgraph Financial
        compute_opt[compute_optimizer]
        savings[savings_plans]
        cur[cur]
        schedule[schedule_optimizer]
    end

    subgraph Governance
        security[security_privacy]
        orphan[orphan]
        terraform[terraform_recommendations]
    end

Full list of analyzer keys:

  • ec2, ec2-eff, s3, rds, ebs, ebs_snapshot, lambda, elasticache, elb, dynamodb, api_gateway, nat, eip, vpc_endpoints, spot, security_privacy, orphan, ecs, sagemaker, bedrock, comprehend, rekognition, textract, transcribe, kendra, opensearch, compute_optimizer, savings_plans, cur, cloudfront, efs, route53, schedule_optimizer, terraform_recommendations, cloudformation_recommendations
  • v12.0 New: cost_explorer, rto_analysis, eks_sessions, eks_deployments

Native-to-Platform Extractor (NPE)

The NPE module helps migrate AWS Lambda/API Gateway to Kubernetes:

flowchart LR
    subgraph AWS["AWS Resources"]
        Lambda["Lambda Functions"]
        APIGW["API Gateway"]
    end

    subgraph NPE["NPE Module"]
        Collector["collector.py<br/>Gather configs"]
        DockerGen["dockerfile_generator.py<br/>Create Dockerfiles"]
        HelmGen["helm_generator.py<br/>Create Helm charts"]
    end

    subgraph Output["Generated Artifacts"]
        Dockerfile["Dockerfile<br/>per function"]
        Values["values.yaml"]
        Templates["Helm templates"]
    end

    subgraph K8s["Kubernetes"]
        Deployment["Deployments"]
        Service["Services"]
        Ingress["Ingress"]
    end

    AWS --> Collector
    Collector --> DockerGen
    Collector --> HelmGen
    DockerGen --> Dockerfile
    HelmGen --> Values
    HelmGen --> Templates
    Output --> K8s

    style NPE fill:#9b59b6,color:#fff
    style K8s fill:#326ce5,color:#fff

Usage

  1. Configure: Edit variables in run_npe.py
  2. Run: Execute python run_npe.py
  3. Review: Examine generated artifacts in ./generated_npe_artifacts

Note: The NPE module generates starting points for migration. Manual review and refinement are required for production deployment.


Data Integration & External Tools

Forwarding Reports to ELK Stack

  1. Generate JSON Report: python main.py --output-format json --output-file dedo_duro_results.json
  2. Configure Logstash to read the JSON file
  3. Output to Elasticsearch
  4. Visualize in Kibana

CUR Setup Best Practices

  • Enable Resource IDs: Check "Include Resource IDs" when creating your CUR
  • Parquet Format: Configure CUR to be delivered in Apache Parquet format
  • Hourly Granularity: Select hourly time granularity for detailed analysis
  • Permissions: IAM role needs s3:GetObject and s3:ListBucket on the CUR bucket

CI/CD Integration (v12.0)

Dedo-Duro includes ready-to-use CI/CD configurations for automated analysis.

GitHub Actions

# .github/workflows/dedo-duro-analysis.yml
name: Dedo-Duro AWS Analysis

on:
  schedule:
    - cron: '0 6 * * 1'  # Weekly on Mondays
  workflow_dispatch:

jobs:
  analyze:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Configure AWS Credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: us-east-1

      - name: Run Dedo-Duro Analysis
        run: |
          pip install -r requirements.txt
          python main.py --region us-east-1 --output-format html

      - name: Upload Report
        uses: actions/upload-artifact@v4
        with:
          name: dedo-duro-report
          path: '*.html'

Jenkins Pipeline

// ci/Jenkinsfile
pipeline {
    agent any

    triggers {
        cron('0 6 * * 1')  // Weekly
    }

    environment {
        AWS_DEFAULT_REGION = 'us-east-1'
    }

    stages {
        stage('Setup') {
            steps {
                sh 'pip install -r requirements.txt'
            }
        }

        stage('Analyze') {
            steps {
                withCredentials([[$class: 'AmazonWebServicesCredentialsBinding',
                                  credentialsId: 'aws-credentials']]) {
                    sh 'python main.py --region us-east-1 --output-format html'
                }
            }
        }

        stage('Archive') {
            steps {
                archiveArtifacts artifacts: '*.html', fingerprint: true
            }
        }
    }
}

CircleCI

# .circleci/config.yml
version: 2.1

jobs:
  analyze:
    docker:
      - image: cimg/python:3.11
    steps:
      - checkout
      - run:
          name: Install dependencies
          command: pip install -r requirements.txt
      - run:
          name: Run analysis
          command: python main.py --region us-east-1 --output-format html
      - store_artifacts:
          path: .
          destination: reports

workflows:
  weekly-analysis:
    triggers:
      - schedule:
          cron: "0 6 * * 1"
          filters:
            branches:
              only: main
    jobs:
      - analyze

Automated Multi-Account/Region Execution (AWS Batch)

flowchart TB
    subgraph Trigger["Scheduling"]
        EventBridge["EventBridge<br/>Scheduler"]
    end

    subgraph Orchestration["Orchestration"]
        Lambda["Lambda<br/>Orchestrator"]
    end

    subgraph Execution["AWS Batch"]
        JobQueue["Job Queue"]
        ComputeEnv["Compute<br/>Environment"]
        Container["Dedo-Duro<br/>Container"]
    end

    subgraph Accounts["Target Accounts"]
        Account1["Account 1"]
        Account2["Account 2"]
        AccountN["Account N"]
    end

    subgraph Storage["Central Storage"]
        S3Bucket["S3 Bucket<br/>Reports"]
        ECR["ECR<br/>Container Image"]
    end

    EventBridge --> Lambda
    Lambda --> JobQueue
    JobQueue --> ComputeEnv
    ComputeEnv --> Container
    Container --> Accounts
    Container --> S3Bucket
    ECR --> Container

    style Trigger fill:#ff9500,color:#fff
    style Execution fill:#4a90d9,color:#fff
    style Storage fill:#27ae60,color:#fff

Setup Steps

  1. Build and Push Docker Image to ECR
  2. Configure AWS Batch (Compute Environment, Job Queue, Job Definition)
  3. Create Orchestrator (Lambda function to submit jobs)
  4. Schedule Execution with EventBridge Scheduler
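Step 3's orchestrator reduces to one submit_job call per target account. A minimal sketch of how the Lambda might build those calls — the environment-variable names, queue, and job definition here are illustrative, not part of Dedo-Duro:

```python
def build_submit_job_kwargs(account_id, region, job_queue, job_definition):
    """Build kwargs for AWS Batch submit_job() for one target account/region.

    The container entrypoint is assumed (hypothetically) to read these
    environment variables, assume a role in the target account, and then
    invoke main.py against that account/region.
    """
    return {
        "jobName": f"dedo-duro-{account_id}-{region}",
        "jobQueue": job_queue,
        "jobDefinition": job_definition,
        "containerOverrides": {
            "environment": [
                {"name": "TARGET_ACCOUNT_ID", "value": account_id},
                {"name": "TARGET_REGION", "value": region},
            ]
        },
    }

# Inside the Lambda handler, the loop would be roughly:
#   batch = boto3.client("batch")
#   for account_id in accounts:
#       batch.submit_job(**build_submit_job_kwargs(
#           account_id, "us-east-1", JOB_QUEUE, JOB_DEFINITION))
```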

Report Examples

HTML Report Features

flowchart TB
    subgraph Report["Interactive HTML Report"]
        Summary["Executive Summary<br/>Total savings, opportunities"]
        Charts["Visual Analytics<br/>Chart.js charts"]
        Tables["Resource Tables<br/>Sortable, filterable"]
        Recommendations["Recommendations<br/>Prioritized actions"]
    end

    subgraph Charts_Detail["Chart Types"]
        Bar["Savings by<br/>Resource Type"]
        Doughnut["Resource<br/>Distribution"]
        Horizontal["AI/ML Services<br/>Overview"]
    end

    Report --> Charts_Detail

Sample Analysis Output

╔══════════════════════════════════════════════════════════════╗
║           AWS Resource Utilization Analyzer                   ║
║     Starting analysis for: ec2, s3, rds, spot, security      ║
╚══════════════════════════════════════════════════════════════╝

Analysis Progress
├── EC2 (Medium 30-60s).................... Complete ✅
├── S3 (Long 60-180s)...................... Complete ✅
├── RDS (Medium 30-60s).................... Complete ✅
├── Spot (Medium 30-90s)................... Complete ✅
└── Security (Medium-Long 60-120s)......... Complete ✅

Summary:
  Total Resources Analyzed: 147
  Optimization Opportunities: 23
  Estimated Monthly Savings: $2,450.00
  Estimated Annual Savings: $29,400.00

Data Flow

flowchart LR
    subgraph Input
        Credentials[AWS Credentials]
        Region[Region Config]
        Options[CLI Options]
    end

    subgraph Processing
        Discovery[Resource<br/>Discovery]
        Metrics[Metrics<br/>Collection]
        Analysis[Utilization<br/>Analysis]
        Pricing[Cost<br/>Calculation]
    end

    subgraph Output
        HTML[HTML Report]
        JSON[JSON Data]
        CSV[CSV Export]
        S3[S3 Storage]
    end

    Input --> Discovery
    Discovery --> Metrics
    Metrics --> Analysis
    Analysis --> Pricing
    Pricing --> Output

    style Discovery fill:#3498db
    style Metrics fill:#9b59b6
    style Analysis fill:#e67e22
    style Pricing fill:#27ae60

Customization and Extension

Adding a New Resource Analyzer

  1. Create a new file in analyzers/ (e.g., analyzers/new_service.py)
  2. Define a class inheriting from ResourceAnalyzer
  3. Implement get_service_name() and analyze()
  4. Update available_analyzers in main.py
'new_service': {
    'class': 'analyzers.new_service.NewServiceAnalyzer',
    'time': "Fast (5-10s)",
    'desc': "Analyzes the new awesome service for optimization."
},
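Putting steps 2 and 3 together, a skeleton analyzer might look like the following. The ResourceAnalyzer stub stands in for the project's actual base class, whose real interface may differ:

```python
from abc import ABC, abstractmethod

class ResourceAnalyzer(ABC):
    """Stand-in for the project's base class (the real one lives in core/)."""

    @abstractmethod
    def get_service_name(self) -> str: ...

    @abstractmethod
    def analyze(self) -> dict: ...

class NewServiceAnalyzer(ResourceAnalyzer):
    def get_service_name(self) -> str:
        return "new_service"

    def analyze(self) -> dict:
        # A real analyzer would create a boto3 client, enumerate resources,
        # pull CloudWatch metrics, and attach a savings estimate per finding.
        return {"service": self.get_service_name(), "findings": []}
```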

Customizing Reports

Modify reporters/html_reporter.py for HTML customization, or create a new reporter class inheriting from BaseReporter.


Performance Tuning

| Environment Size       | v1.0        | v2.0+     | v12.0+ (with cache) | Improvement |
|------------------------|-------------|-----------|---------------------|-------------|
| Small (<100 resources) | 2-3 min     | 1-2 min   | 30-60s              | ~70%        |
| Medium (100-500)       | 10-15 min   | 5-10 min  | 3-6 min             | ~60%        |
| Large (500+)           | 30-45 min   | 15-30 min | 10-20 min           | ~50%        |
| Very Large (1000+)     | Often fails | 30-60 min | 20-40 min           | Reliability |

Performance Features (v12.0+)

CloudWatch Metrics Caching:

# Run with caching enabled (default)
python main.py --region us-east-1

# Disable caching for fresh data
python main.py --region us-east-1 --no-cache

Cache TTLs by metric type:

| Metric Type     | TTL    | Rationale           |
|-----------------|--------|---------------------|
| CPU/Memory      | 5 min  | Changes frequently  |
| Network/Disk    | 10 min | Moderately volatile |
| S3 Size/Objects | 30 min | Rarely changes      |
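Conceptually, the cache pairs per-metric TTLs with LRU eviction. The sketch below illustrates that behavior only — it is not the project's implementation, and the `now` parameter exists purely to make expiry testable:

```python
import time
from collections import OrderedDict

# Illustrative TTLs mirroring the table above (keys are hypothetical)
TTL_SECONDS = {"cpu_memory": 300, "network_disk": 600, "s3_size": 1800}

class MetricCache:
    """Tiny TTL + LRU cache for CloudWatch metric results."""

    def __init__(self, max_entries=1024):
        self.max_entries = max_entries
        self._store = OrderedDict()  # key -> (expires_at, value)

    def get(self, key, now=None):
        now = time.monotonic() if now is None else now
        entry = self._store.get(key)
        if entry is None or entry[0] < now:
            self._store.pop(key, None)  # drop expired entries lazily
            return None
        self._store.move_to_end(key)    # LRU touch on hit
        return entry[1]

    def put(self, key, value, ttl, now=None):
        now = time.monotonic() if now is None else now
        self._store[key] = (now + ttl, value)
        self._store.move_to_end(key)
        while len(self._store) > self.max_entries:
            self._store.popitem(last=False)  # evict least recently used
```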

Per-Analyzer Timeouts:

# Custom timeout (default: 180s)
python main.py --region us-east-1 --analyzer-timeout 120

# Timeout protects against slow analyzers blocking the pipeline
# Partial results are returned if an analyzer times out

Circuit Breaker:

  • Automatically opens after 5 consecutive failures per service
  • Prevents wasting time on unavailable services
  • Recovers automatically after 60 seconds
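The breaker's contract above — open after 5 consecutive failures, recover after 60 seconds — can be sketched as a small state object. This is an illustration, not the project's actual module; the `now` parameters exist only to make the timing testable:

```python
import time

class CircuitBreaker:
    """Per-service breaker: opens after `threshold` consecutive failures,
    allows a probe again once `recovery_s` seconds have elapsed."""

    def __init__(self, threshold=5, recovery_s=60.0):
        self.threshold = threshold
        self.recovery_s = recovery_s
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def allow(self, now=None):
        if self.opened_at is None:
            return True
        now = time.monotonic() if now is None else now
        return (now - self.opened_at) >= self.recovery_s  # half-open probe

    def record_success(self):
        self.failures = 0
        self.opened_at = None

    def record_failure(self, now=None):
        self.failures += 1
        if self.failures >= self.threshold:
            self.opened_at = time.monotonic() if now is None else now
```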

Tips:

  • Use --max-workers to control concurrency (default: 10)
  • Use --single-thread for debugging
  • Increase --retry-attempts for slow connections
  • Use --no-cache when you need the freshest metrics data
  • Lower --analyzer-timeout for faster feedback on slow environments

AWS Permissions

Essential Permissions

Essential:
  - ec2:Describe*
  - cloudwatch:GetMetricStatistics
  - s3:List*, s3:GetBucket*
  - rds:Describe*
  - lambda:List*, lambda:Get*
  - iam:ListAccountAliases

Spot Analysis:
  - ec2:DescribeSpotPriceHistory
  - pricing:GetProducts

AI/ML Services:
  - sagemaker:List*, sagemaker:Describe*
  - bedrock:List*, bedrock:Get*
  - comprehend:List*, comprehend:Describe*
  - rekognition:Describe*, rekognition:List*
  - textract:List*
  - transcribe:List*
  - kendra:List*, kendra:Describe*

Testing

# Run all tests
python -m pytest tests/ -v

# Run specific test file
python -m pytest tests/test_aiml_analyzers.py -v

# Run performance tests only
python -m pytest tests/test_performance.py -v

# Run with coverage report
python -m pytest tests/ --cov=analyzers --cov=core --cov-report=html

Test Coverage

  • AI/ML Analyzer Tests: Comprehensive mock tests for all 7 AI/ML analyzers
  • HTML Reporter Tests: Verifies Chart.js visual analytics generation
  • Integration Tests: Validates analyzer interface compliance
  • Performance Tests (v12.0): 39 tests covering:
    • Session reuse and thread safety
    • Per-analyzer timeouts
    • CloudWatch metrics caching (TTL, LRU eviction)
    • Circuit breaker state transitions
    • Event bus for progressive results
    • Performance telemetry
    • Exponential backoff with jitter
    • Backward compatibility
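Of the mechanisms exercised above, exponential backoff with jitter is the easiest to get subtly wrong. The widely recommended "full jitter" variant draws each delay uniformly from [0, min(cap, base * 2^attempt)], which spreads retries out instead of synchronizing them:

```python
import random

def backoff_delays(attempts, base=0.5, cap=30.0):
    """Full-jitter exponential backoff: each retry waits a uniform random
    amount in [0, min(cap, base * 2**attempt)] seconds."""
    return [random.uniform(0.0, min(cap, base * (2 ** i)))
            for i in range(attempts)]
```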

Troubleshooting

| Issue                | Solution                                              |
|----------------------|-------------------------------------------------------|
| API Throttling       | Reduce --max-workers or use --single-thread           |
| Memory Usage         | Analyze fewer --resource-types at once                |
| Permission Errors    | Check AWS credentials and IAM permissions             |
| Analyzer Not Running | Verify key in --resource-types matches supported keys |
| Missing Resources    | Verify resources exist in the specified region        |
| Boto3 Errors         | Update boto3: pip install -U boto3 botocore           |
| Analyzer Timeout     | Increase --analyzer-timeout (default: 180s)           |
| Stale Metrics Data   | Use --no-cache to fetch fresh CloudWatch data         |
| Circuit Breaker Open | Wait 60s for automatic recovery, or restart analysis  |

Use --verbose for detailed error messages and debugging.


Development Roadmap

timeline
    title Dedo-Duro Evolution

    section Foundation
        v1.0 : Monolithic script
        v2.0 : Modular architecture

    section Features
        v3.0 : Security/Privacy analysis
        v4.0 : Spot analysis
        v5.0 : Orphan detection

    section Enhancement
        v7.0-v8.0 : EC2 efficiency, RI awareness
        v9.0 : Containerization, S3 reports
        v10.0 : Type safety, lazy init
        v11.0 : AI/ML services

    section Current
        v12.0 : Multi-Account support
              : Cost Explorer integration
              : RTO/RPO Analysis
              : EKS Monitoring
              : CI/CD Integration
              : Environment filtering
              : Performance & Resilience
              : Metrics Caching
              : Circuit Breaker

    section Future
        v13.0+ : Web interface
               : Auto-remediation
               : Real-time dashboards

Technical Implementation Details

Analysis Pipeline

flowchart LR
    A[Resource Discovery] --> B[Metrics Collection]
    B --> C[Utilization Analysis]
    C --> D[Cost Calculation]
    D --> E[Recommendation Engine]
    E --> F[Report Generation]

    subgraph Details
        A1["List all resources<br/>via AWS APIs"]
        B1["30/60/90 day<br/>CloudWatch metrics"]
        C1["CPU, Memory, IOPS<br/>Network utilization"]
        D1["Real-time pricing<br/>Savings estimation"]
        E1["Right-sizing<br/>Spot migration<br/>Schedule optimization"]
        F1["HTML with charts<br/>JSON/CSV export"]
    end

    A -.-> A1
    B -.-> B1
    C -.-> C1
    D -.-> D1
    E -.-> E1
    F -.-> F1

    style A fill:#3498db,color:#fff
    style B fill:#9b59b6,color:#fff
    style C fill:#e67e22,color:#fff
    style D fill:#27ae60,color:#fff
    style E fill:#e74c3c,color:#fff
    style F fill:#1abc9c,color:#fff

Parallel Processing with Error Handling

import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def parallel_process_batch(items, process_func, config, service):
    """Process a batch of items in parallel with controlled concurrency."""
    results = []
    batch_size = config.get_batch_size_for_service(service)

    for i in range(0, len(items), batch_size):
        batch = items[i:i + batch_size]
        num_workers = max(1, min(config.max_workers, len(batch)))
        with ThreadPoolExecutor(max_workers=num_workers) as executor:
            future_to_item = {executor.submit(process_func, item): item for item in batch}
            for future in as_completed(future_to_item):
                try:
                    result = future.result()
                    if result:
                        results.append(result)
                except Exception as exc:
                    item = future_to_item[future]
                    item_id = item.get("InstanceId") or item.get("VolumeId") or "Unknown"
                    print_error(f'Item {item_id} generated an exception: {exc}')  # project logging helper
        # Pause between batches to stay under the service's API rate limits
        time.sleep(config.get_delay_for_service(service))
    return results

Team Collaboration

The Dedo-Duro architecture supports team collaboration:

  1. Modular Structure: Different team members can work on different analyzers
  2. Clear Interfaces: Well-defined interfaces between components
  3. Independent Testing: Each analyzer can be tested independently
  4. Easy Onboarding: New team members can understand isolated components
  5. Version Control Friendly: Minimizes merge conflicts with separate files

Conclusion

Dedo-Duro 12.0 introduces enterprise-scale analysis capabilities:

v12.0 Highlights

  • Multi-Account Analysis: Analyze entire AWS Organizations with consolidated reporting
  • Cost Explorer Integration: Real cost data with anomaly detection
  • RTO/RPO Analysis: Disaster recovery readiness assessment
  • EKS Monitoring: Kubernetes session tracking and deployment lifecycle analysis
  • Environment Filtering: Target specific environments (prod/test/dev)
  • CI/CD Integration: GitHub Actions, Jenkins, and CircleCI support out-of-the-box
  • Performance & Resilience:
    • CloudWatch metrics caching (40-60% faster repeat analyses)
    • Per-analyzer timeouts (graceful handling of slow services)
    • Circuit breaker (automatic bypass of failing services)
    • Session/client reuse (reduced connection overhead)

v11.0 AI/ML Capabilities (Retained)

  • Amazon SageMaker: Notebook instances, endpoints, training jobs, Feature Store, Studio
  • Amazon Bedrock: Provisioned throughput, custom models, guardrails, knowledge bases
  • Amazon Comprehend: Endpoints, classifiers, entity recognizers, flywheels
  • Amazon Rekognition: Custom Labels projects, stream processors, face collections
  • Amazon Textract: Custom adapters, per-operation cost estimation
  • Amazon Transcribe: Vocabularies, language models, job patterns
  • Amazon Kendra: Indexes, data sources, experiences

This object-oriented approach results in a maintainable and adaptable codebase, capable of evolving with new AWS services and user requirements.


License

MIT License


Contributors

Developed and maintained by Gustavo Lima.

Key milestones: v2.0 (architecture), v3.0 (security), v4.0 (Spot), v5.0 (orphan), v7-8.0 (efficiency), v9.0 (automation), v10.0 (type safety), v11.0 (AI/ML services), v12.0 (multi-account, Cost Explorer, EKS, CI/CD).


To Do's

Completed in v12.0 ✅

  • Division of execution by environment (production and test) → Environment filtering (--environment flag)
  • Monitoring of open sessions per environment in Kubernetes → EKS Session Analyzer (eks_sessions)
  • Monitoring the deployment lifecycle with Kubernetes → EKS Deployment Lifecycle (eks_deployments)
  • RTO analysis process for production → RTO/RPO Analyzer (rto_analysis)
  • Reading files with tags and metadata to facilitate resource grouping → Tag-based grouping (--grouping-tags)
  • An all-in option to run against a set of accounts at once → Multi-Account Analysis (--accounts-file, --all-accounts)

Completed in v12.0-Enterprise ✅

  • Web interface for real-time monitoring → Web Dashboard (web/app.py, Flask-based)
  • Auto-remediation capabilities (experimental) → Remediation Framework (remediation/ module)
  • Integration with Slack/Teams for notifications → Notification System (notifications/ module)
  • Custom alerting thresholds → Alert Manager (notifications/alerting.py)
  • Kubernetes permissions documentation → Kubernetes Permissions (docs/kubernetes_permissions.md)

Pending

  • Enhanced web dashboard with real-time WebSocket updates
  • Remediation approval workflow via web interface
  • Historical trend analysis and forecasting

Dedo-Duro - Because your AWS resources shouldn't keep secrets from you.
