AWS Resource Utilization Analyzer - A comprehensive tool for analyzing AWS resource utilization, identifying cost optimization opportunities, finding orphaned resources, and ensuring security and privacy compliance.
- Overview
- Features
- Architecture
- Project Structure
- Implementation Details
- Installation
- Usage
- Native-to-Platform Extractor (NPE)
- Data Integration & External Tools
- Automated Multi-Account/Region Execution (AWS Batch)
- Example Report
- Customization and Extension
- Performance Tuning
- AWS Permissions
- Testing
- Troubleshooting
- Comparison Across Versions
- Development Roadmap
- Technical Implementation Details
- Team Collaboration
- Conclusion
- License
- Contributors
- To Do's
Dedo-Duro (Portuguese for "Snitch" or "Tattletale") is a powerful command-line tool designed to help AWS administrators and DevOps engineers gain deep insights into their AWS resource utilization. It analyzes various services, identifies potential cost savings, flags security and privacy compliance issues, detects orphaned resources, and provides actionable recommendations.
mindmap
root((Dedo-Duro))
Cost Optimization
Right-sizing
Spot Instances
Savings Plans
Schedule Optimization
Cost Explorer Integration
Security & Privacy
GDPR Compliance
ISO 27701
Best Practices
Resource Analysis
30+ AWS Services
AI/ML Services
Multi-Region
Multi-Account
Kubernetes
EKS Sessions
Deployment Lifecycle
RTO/RPO Analysis
Reporting
HTML Interactive
JSON/CSV Export
S3 Upload
Visual Charts
CI/CD Integration
GitHub Actions
Jenkins
CircleCI
For a detailed history of changes and features introduced in specific versions, please see the changelog.md.
- Graceful ExpiredToken Handling: Stops the analysis immediately and generates a partial report when AWS APIs return `ExpiredToken` errors. This ensures a graceful exit with whatever data was collected, instead of prolonged execution with invalid credentials.
- Enhanced EBS Snapshot Analysis: The `ebs_snapshot` analyzer now identifies and reports "repeated" snapshots (multiple snapshots of the same volume) that are also older than one year, including their estimated monthly costs, highlighting additional cost-saving opportunities.
- Cost Optimization: Analyzes utilization (CPU, Memory, Network, IOPS) for EC2, RDS, Lambda, EBS, etc., over configurable periods (e.g., 30/60/90 days). Provides right-sizing recommendations, identifies idle/unused resources (EBS, EIP, NAT Gateways), and suggests configuration optimizations (e.g., migrating EBS gp2 to gp3). Now uses real-time AWS Pricing API data for more accurate cost and savings estimates. Includes EC2 Reserved Instance (RI) awareness, identifying instances covered by RIs that expire soon.
- Compute Optimizer Integration: Ingests and reports recommendations directly from AWS Compute Optimizer for EC2, ASG, EBS, and Lambda (`compute_optimizer` analyzer), leveraging AWS's ML-based analysis.
- Savings Plans Analysis: Analyzes Compute, EC2 Instance, and SageMaker Savings Plans utilization, coverage, and expirations (`savings_plans` analyzer).
- Instance Schedule Optimization: Automatically identifies EC2 instances eligible for scheduled start/stop to reduce costs (`schedule_optimizer` analyzer):
  - Environment Detection: Automatically identifies development, test, staging, and demo environments from instance names or tags
  - Schedule Profiles: Recommends optimal schedules, including business hours only (70% savings), extended business hours (58% savings), development hours (64% savings), staging hours (48% savings), and weekend shutdown (29% savings)
  - Financial Calculations: Calculates potential monthly/annual savings based on real instance pricing data, including holiday savings
  - Terraform Integration: Provides schedule tags for use with AWS Instance Scheduler or Lambda-based automation
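The schedule savings percentages above follow directly from the fraction of the week an instance stays running (50 of 168 hours for a 10-hour, 5-day "business hours only" profile gives roughly 70%). A minimal sketch of that arithmetic, assuming on-demand billing stops while the instance is stopped and ignoring EBS storage charges (which continue); the price and schedule values are illustrative:

```python
def monthly_schedule_savings(hourly_price: float,
                             on_hours_per_day: float,
                             on_days_per_week: int,
                             hours_per_month: float = 730.0) -> dict:
    """Estimate savings from stopping an instance outside its schedule.

    Assumes on-demand billing stops while the instance is stopped;
    EBS storage costs, which continue regardless, are ignored here.
    """
    on_fraction = (on_hours_per_day * on_days_per_week) / (24 * 7)
    full_cost = hourly_price * hours_per_month
    return {
        "savings_pct": round((1 - on_fraction) * 100, 1),
        "monthly_savings": round(full_cost * (1 - on_fraction), 2),
    }

# "Business hours only" (10h x 5 days) at an illustrative ~$0.0832/h:
print(monthly_schedule_savings(0.0832, 10, 5))  # savings_pct is ~70%
```

The same formula reproduces the weekend-shutdown figure: running 24h on weekdays only leaves 120 of 168 hours on, i.e. about 29% savings.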
graph TB
subgraph Compute["Compute Services"]
EC2["EC2 Instances"]
Lambda["Lambda Functions"]
ECS["ECS Clusters"]
Spot["Spot Analysis"]
end
subgraph Storage["Storage Services"]
S3["S3 Buckets"]
EBS["EBS Volumes"]
EFS["EFS File Systems"]
Snapshots["EBS Snapshots"]
end
subgraph Database["Database Services"]
RDS["RDS Instances"]
DynamoDB["DynamoDB Tables"]
ElastiCache["ElastiCache"]
OpenSearch["OpenSearch"]
end
subgraph Network["Network Services"]
VPC["VPC Endpoints"]
NAT["NAT Gateways"]
ELB["Load Balancers"]
CloudFront["CloudFront"]
Route53["Route 53"]
end
subgraph AIML["AI/ML Services"]
SageMaker["SageMaker"]
Bedrock["Bedrock"]
Comprehend["Comprehend"]
Rekognition["Rekognition"]
Textract["Textract"]
Transcribe["Transcribe"]
Kendra["Kendra"]
end
subgraph Financial["Financial Analysis"]
ComputeOpt["Compute Optimizer"]
SavingsPlans["Savings Plans"]
CUR["Cost & Usage Report"]
ScheduleOpt["Schedule Optimizer"]
CostExplorer["Cost Explorer"]
end
subgraph Governance["Governance"]
Security["Security Analysis"]
Privacy["Privacy Compliance"]
Orphan["Orphaned Resources"]
RTO["RTO/RPO Analysis"]
end
subgraph Kubernetes["Kubernetes (EKS)"]
EKSSessions["Session Monitoring"]
EKSDeployment["Deployment Lifecycle"]
end
style AIML fill:#9b59b6,color:#fff
style Financial fill:#27ae60,color:#fff
style Governance fill:#e74c3c,color:#fff
style Kubernetes fill:#326ce5,color:#fff
Comprehensive cost optimization for AWS AI/ML services:
- Amazon SageMaker: Analyzes notebook instances (idle detection, GPU warnings), endpoints (utilization metrics, serverless recommendations), training jobs (Spot usage), models, Feature Store, and Studio domains.
- Amazon Bedrock: Analyzes provisioned throughput (underutilization detection), custom models, logging configuration, guardrails, and knowledge bases.
- Amazon Comprehend: Analyzes endpoints (idle detection), document classifiers, entity recognizers, and flywheels.
- Amazon Rekognition: Analyzes Custom Labels projects/models (running cost detection), stream processors, and face collections.
- Amazon Textract: Analyzes custom adapters and usage patterns with per-operation cost estimation.
- Amazon Transcribe: Analyzes custom vocabularies, language models, call analytics categories, and job patterns (failed/stuck detection).
- Amazon Kendra: Analyzes indexes (edition-based costs), data sources, experiences, and query patterns.
- Consolidated Analysis: Analyze multiple AWS accounts simultaneously with the `--accounts-file` option
- Cross-Account Reports: Generate both individual and consolidated reports across accounts
- AWS Organizations Support: Leverage AWS Organizations for automatic account discovery
- Partition Support: Full support for AWS Commercial, GovCloud, and China partitions
- Real Cost Data: Integrates with AWS Cost Explorer API for actual spend data
- Anomaly Detection: Identifies cost spikes and unusual spending patterns
- Budget Tracking: Compares actual costs against estimated costs
- Service-Level Analysis: Breaks down costs by AWS service
- Backup Assessment: Analyzes backup configurations for RDS, S3, and other services
- Cross-Region Replication: Checks for disaster recovery readiness
- Recovery Metrics: Calculates estimated RTO (Recovery Time Objective) and RPO (Recovery Point Objective)
- Compliance Checks: Identifies resources not meeting recovery requirements
- Session Monitoring: Tracks active kubectl and SSM sessions to EKS clusters
- Deployment Lifecycle: Monitors deployment health, age, and update frequency
- Restart Analysis: Identifies pods with excessive restart counts
- Stale Deployment Detection: Flags deployments not updated in 90+ days
- Environment Tags: Filter analysis by environment (production, staging, development, test)
- Tag-Based Grouping: Group resources by custom tags (Team, Project, CostCenter)
- Targeted Reports: Generate environment-specific reports
- GitHub Actions: Pre-built workflow for automated weekly analysis
- Jenkins Pipeline: Jenkinsfile for Jenkins CI/CD integration
- CircleCI Config: Configuration for CircleCI pipelines
- Artifact Upload: Automatic report upload to S3 or CI artifacts
- Real-time Monitoring: Flask-based web dashboard for live analysis status
- REST API: Full API for triggering analysis and retrieving results
- Report History: View and compare historical analysis reports
- Alert Configuration: Configure custom alert thresholds via web interface
- Slack Integration: Send alerts and reports to Slack channels via webhooks
- Microsoft Teams: Teams channel integration for notifications
- Custom Alerts: Configurable thresholds for cost, security, and idle resources
- Alert Severity Levels: Critical, warning, and info classifications
- Safe Operations Only: Tagging and snapshot operations by default
- Dry-Run Mode: All actions simulated unless explicitly enabled
- Approval Workflow: High-risk actions require manual approval
- Audit Logging: Complete audit trail of all remediation actions
- Risk Levels: SAFE, LOW, MEDIUM, HIGH, CRITICAL classifications
- Session Reuse: Cached boto3 sessions and clients reduce connection overhead
- Per-Analyzer Timeouts: Configurable timeouts prevent slow analyzers from blocking the pipeline
- CloudWatch Metrics Cache: TTL-based caching reduces redundant API calls (5-30 min TTL by metric type)
- Circuit Breaker: Failing services are automatically bypassed after threshold failures
- Exponential Backoff: Intelligent retry with jitter for AWS API throttling
- Progressive Results: Event-driven architecture enables streaming partial results
- Performance Telemetry: Detailed metrics on analyzer execution times and cache efficiency
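The backoff-with-jitter retry described above can be sketched generically as follows. In practice boto3's `standard`/`adaptive` retry modes and botocore's `ClientError` would handle this; this simplified version only illustrates the timing strategy:

```python
import random
import time

def with_backoff(call, max_attempts=5, base=0.5, cap=30.0):
    """Retry a throttled call with capped exponential backoff and full jitter."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:  # in practice: botocore ClientError / ThrottlingException
            if attempt == max_attempts - 1:
                raise
            # full jitter: sleep a random amount up to the exponential cap
            time.sleep(random.uniform(0, min(cap, base * 2 ** attempt)))

# Usage: with_backoff(lambda: ec2.describe_instances())
```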
- Multi-Region & China Region Support: Analyzes resources across multiple specified AWS regions simultaneously, including AWS China regions (`cn-north-1`, `cn-northwest-1`).
- Infrastructure-as-Code Recommendations: Generates suggested Terraform (`terraform_recommendations`) and CloudFormation (`cloudformation_recommendations`) code snippets based on analysis findings to aid remediation.
- Flexible & Enhanced Reporting: Creates detailed reports in multiple formats:
- HTML: Interactive, sortable, filterable reports with enhanced formatting, combined results, efficiency scores, RI details, and CloudWatch Agent information.
- Visual Analytics Charts: HTML reports now include interactive Chart.js visualizations
- JSON & CSV: Structured data formats suitable for programmatic consumption.
- Supports direct report upload to a specified S3 bucket.
flowchart TB
subgraph Input["Input Layer"]
CLI[/"CLI Arguments"/]
Config["Configuration Files"]
AWS["AWS Credentials"]
end
subgraph Core["Core Engine"]
Manager["AWSResourceManager<br/>Orchestrator"]
Metrics["CloudWatchMetrics<br/>Handler"]
Types["Type System<br/>& Protocols"]
end
subgraph Analyzers["Analyzer Layer (30+ Analyzers)"]
direction LR
Compute["Compute<br/>EC2, Lambda, ECS"]
Storage["Storage<br/>S3, EBS, EFS"]
Database["Database<br/>RDS, DynamoDB"]
AIML["AI/ML<br/>SageMaker, Bedrock"]
Network["Network<br/>VPC, NAT, ELB"]
Security["Security<br/>& Privacy"]
end
subgraph Output["Output Layer"]
HTML["HTML Reporter<br/>Interactive Charts"]
JSON["JSON Reporter"]
CSV["CSV Reporter"]
S3Out["S3 Upload"]
end
subgraph AWS_Services["AWS Services"]
direction LR
CloudWatch["CloudWatch"]
Pricing["Pricing API"]
STS["STS"]
IAM["IAM"]
end
CLI --> Manager
Config --> Manager
AWS --> Manager
Manager --> Metrics
Manager --> Analyzers
Types -.-> Manager
Analyzers --> AWS_Services
Metrics --> CloudWatch
Manager --> Output
Output --> S3Out
style Manager fill:#4a90d9,color:#fff
style Analyzers fill:#50c878,color:#fff
style Output fill:#ff9500,color:#fff
Version 2.0 onwards follows a modular, object-oriented architecture:
- Core Components:
  - `AWSResourceManager`: Main orchestrator that coordinates all analyzers.
  - `ResourceAnalyzer`: Base class for all resource-specific analyzers, with lazy initialization for AWS clients and metrics.
  - `CloudWatchMetrics`: Handles CloudWatch metric collection.
  - `ReportCoordinator`: Coordinates report generation.
  - `AnalysisResult`: Standardized dataclass for analyzer results, with backward-compatible conversion methods.
- Resource Analyzers:
  - Each AWS service has its dedicated analyzer class.
  - Analyzers inherit from the `ResourceAnalyzer` base class.
  - AWS clients are initialized lazily on first access, reducing unnecessary API calls.
- Reporters:
  - Format-specific reporter classes (HTML, JSON, CSV).
  - Reporters inherit from a common `BaseReporter` class.
- Utilities:
  - AWS utilities for common operations (API calls, pagination, tags).
  - Console utilities for the user interface (progress bars, colored output).
  - Cost estimator for savings calculations with real-time pricing data.
  - Protocol definitions for type-safe AWS client usage.
- Type System:
  - `Protocol` classes in `utils/protocols.py` for AWS client type hints.
  - `TypedDict` definitions for structured result dictionaries.
  - `utc_now()` helper replacing deprecated `datetime.utcnow()` calls.
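The `utc_now()` helper is presumably equivalent to the standard timezone-aware idiom:

```python
from datetime import datetime, timezone

def utc_now() -> datetime:
    """Timezone-aware 'now' in UTC, replacing the deprecated datetime.utcnow()."""
    return datetime.now(timezone.utc)
```

Unlike `datetime.utcnow()`, the returned value carries `tzinfo`, so comparisons and arithmetic against other aware datetimes are safe.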
sequenceDiagram
autonumber
participant User
participant Main as main.py
participant Manager as AWSResourceManager
participant Analyzer as ResourceAnalyzer
participant AWS as AWS APIs
participant Reporter as ReportCoordinator
User->>Main: python main.py --region us-east-1
Main->>Manager: Initialize with configs
Manager->>AWS: Fetch account info (STS)
AWS-->>Manager: Account ID, Partition
Main->>Manager: Register analyzers
loop For each analyzer
Manager->>Analyzer: Create instance (lazy init)
Analyzer->>AWS: Fetch resources
AWS-->>Analyzer: Resource data
Analyzer->>AWS: Get CloudWatch metrics
AWS-->>Analyzer: Utilization metrics
Analyzer->>Analyzer: Analyze & generate recommendations
Analyzer-->>Manager: Results + Summary
end
Manager->>Manager: Calculate savings summary
Manager->>Reporter: Generate report
Reporter->>Reporter: Create HTML/JSON/CSV
Reporter-->>User: Save report file
opt S3 Upload
Reporter->>AWS: Upload to S3
end
flowchart LR
subgraph Entry["Entry Points"]
main["main.py"]
run_npe["run_npe.py"]
run_spot["run_spot_analyzer.py"]
end
subgraph Core["core/"]
analyzer["analyzer.py<br/>─────────<br/>ResourceAnalyzer<br/>AWSResourceManager"]
metrics["metrics.py<br/>─────────<br/>CloudWatchMetrics"]
reporter["reporter.py<br/>─────────<br/>ReportCoordinator"]
types["types.py<br/>─────────<br/>AnalysisResult<br/>TypedDicts"]
end
subgraph Analyzers["analyzers/"]
ec2["ec2.py"]
s3["s3.py"]
rds["rds.py"]
lambda_a["lambda.py"]
spot["spot.py"]
sagemaker["sagemaker.py"]
bedrock["bedrock.py"]
more["... 25+ more"]
end
subgraph Reporters["reporters/"]
html["html_reporter.py"]
json_r["json_reporter.py"]
csv_r["csv_reporter.py"]
end
subgraph Utils["utils/"]
aws_utils["aws_utils.py"]
cost_est["cost_estimator.py"]
console["console.py"]
protocols["protocols.py"]
end
subgraph NPE["npe/"]
collector["collector.py"]
docker_gen["dockerfile_generator.py"]
helm_gen["helm_generator.py"]
end
main --> Core
Core --> Analyzers
Core --> Reporters
Analyzers --> Utils
run_npe --> NPE
style Core fill:#4a90d9,color:#fff
style Analyzers fill:#50c878,color:#fff
style Reporters fill:#ff9500,color:#fff
.
├── __init__.py # Package initialization and version info
├── main.py # Main Dedo-Duro analysis entry point
├── run_npe.py # Entry point for NPE Kubernetes artifact generation
├── config.py # Configuration settings
├── requirements.txt # Required Python packages for pip
├── Pipfile # Required Python packages for pipenv
├── Pipfile.lock # Lock file for pipenv
├── analyzers/ # Resource-specific analyzers for Dedo-Duro
│ ├── __init__.py
│ ├── ... (various analyzer files) ...
│ ├── cur.py # Cost and Usage Report (CUR) analysis
│ ├── cost_explorer_analyzer.py # Cost Explorer integration (v12.0)
│ ├── rto_analyzer.py # RTO/RPO analysis (v12.0)
│ ├── eks_session_analyzer.py # EKS session monitoring (v12.0)
│ └── eks_deployment_lifecycle.py # EKS deployment lifecycle (v12.0)
├── core/ # Core Dedo-Duro functionality
│ ├── __init__.py
│ ├── analyzer.py # Main analyzer orchestration with lazy initialization
│ ├── metrics.py # CloudWatch metrics handling (with caching)
│ ├── reporter.py # Report generation coordination
│ ├── types.py # Type definitions (AnalysisResult, TypedDicts, utc_now)
│ ├── multi_account.py # Multi-account orchestration (v12.0)
│ ├── cache.py # TTL-based CloudWatch metrics cache (v12.0)
│ ├── timeout.py # Per-analyzer timeout management (v12.0)
│ ├── circuit_breaker.py # Circuit breaker for failing services (v12.0)
│ ├── events.py # Progressive results event bus (v12.0)
│ └── telemetry.py # Performance monitoring/telemetry (v12.0)
├── reporters/ # Dedo-Duro report generators
│ ├── __init__.py
│ ├── json_reporter.py # JSON report generation
│ ├── html_reporter.py # HTML report generation
│ └── csv_reporter.py # CSV report generation
├── npe/ # Native-to-Platform Extractor (Kubernetes migration)
│ ├── __init__.py
│ ├── collector.py # Collects AWS Lambda/API GW data
│ ├── dockerfile_generator.py # Generates Dockerfiles
│ ├── helm_generator.py # Generates Helm charts
│ └── templates/ # Templates for Dockerfile/Helm
├── schedule/ # Schedule optimization module
│ └── __init__.py
├── ci/ # CI/CD integration templates (v12.0)
│ ├── Jenkinsfile # Jenkins pipeline configuration
│ └── github_reporter.py # GitHub-specific output format
├── .github/workflows/ # GitHub Actions workflows (v12.0)
│ └── dedo-duro-analysis.yml # Automated analysis workflow
├── .circleci/ # CircleCI configuration (v12.0)
│ └── config.yml # CircleCI pipeline config
├── web/ # Web Dashboard (v12.0-Enterprise)
│ ├── app.py # Flask application
│ ├── templates/ # HTML templates
│ │ └── index.html # Dashboard template
│ └── static/ # CSS/JS assets
│ ├── style.css # Dashboard styles
│ └── app.js # Dashboard JavaScript
├── notifications/ # Notification System (v12.0-Enterprise)
│ ├── __init__.py
│ ├── slack.py # Slack webhook integration
│ ├── teams.py # Microsoft Teams integration
│ └── alerting.py # Alert manager with thresholds
├── remediation/ # Auto-Remediation (v12.0-Enterprise)
│ ├── __init__.py
│ ├── base.py # Base remediation framework
│ ├── ec2_remediation.py # EC2 remediation actions
│ ├── rds_remediation.py # RDS remediation actions
│ └── s3_remediation.py # S3 remediation actions
├── docs/ # Documentation
│ └── kubernetes_permissions.md # K8s permissions guide
└── utils/ # Utility functions (shared)
├── __init__.py
├── aws_utils.py # AWS-specific utilities
├── cost_estimator.py # Cost estimation utilities
├── console.py # Console output utilities
└── protocols.py # AWS client Protocol definitions for type safety
Manages AWS configuration including region, profile, retry mechanisms, and timeouts.
aws_config = AWSConfig(
region="us-east-1",
profile="production",
max_attempts=5,
retry_mode="adaptive"
)
# Create a copy with a different region for multi-region analysis
west_config = aws_config.copy_with_region("us-west-2")

Controls analysis behavior including concurrency, throttling, and batch sizes.
analysis_config = AnalysisConfig(
verbose=True,
single_thread=False,
multi_region=False,
max_workers=10
)

Base class for all resource analyzers with lazy initialization.
class ResourceAnalyzer(ABC):
META_ANALYZER_SERVICES: Set[str] = {
'terraform_recommendations',
'cloudformation_recommendations',
'schedule_estimation'
}
@property
def client(self) -> Any:
"""Lazy initialization of AWS client - only created when first accessed."""
if not self._client_initialized:
self._client_initialized = True
service_name = self.get_service_name()
if service_name and service_name not in self.META_ANALYZER_SERVICES:
self._client = self.aws_config.create_client(service_name)
return self._client
@abstractmethod
def analyze(self, cur_data: Optional[Dict] = None, **kwargs) -> Union[List[Dict[str, Any]], Tuple[List[Dict[str, Any]], Dict[str, Any]]]:
        pass

Standardized dataclass for analyzer results.
@dataclass
class AnalysisResult:
"""Standardized result container for analyzers."""
resources: List[Dict[str, Any]] = field(default_factory=list)
summary: Dict[str, Any] = field(default_factory=dict)
metadata: Dict[str, Any] = field(default_factory=dict)
def to_list(self) -> List[Dict[str, Any]]:
"""Convert to list format for backward compatibility."""
        return self.resources

Protocol definitions for type-safe AWS client usage:
@runtime_checkable
class EC2ClientProtocol(AWSClientProtocol, Protocol):
"""Protocol for EC2 client operations."""
def describe_instances(self, **kwargs: Any) -> Dict[str, Any]: ...
def describe_volumes(self, **kwargs: Any) -> Dict[str, Any]: ...
    def describe_snapshots(self, **kwargs: Any) -> Dict[str, Any]: ...

Protocols are available for: EC2, S3, RDS, DynamoDB, ELBv2, Lambda, CloudWatch, SageMaker, Bedrock, Comprehend, Rekognition, Textract, Transcribe, Kendra, and more.
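As a simplified, standalone illustration (the real protocols extend `AWSClientProtocol` and live in `utils/protocols.py`), structural typing lets tests substitute a stub for a real boto3 client without any inheritance:

```python
from typing import Any, Dict, Protocol, runtime_checkable

@runtime_checkable
class EC2ClientProtocol(Protocol):
    """Minimal stand-in for the project's EC2 protocol."""
    def describe_instances(self, **kwargs: Any) -> Dict[str, Any]: ...

def count_instances(ec2: EC2ClientProtocol) -> int:
    """Works with a real boto3 EC2 client or any object matching the protocol."""
    resp = ec2.describe_instances()
    return sum(len(r["Instances"]) for r in resp["Reservations"])

class FakeEC2:  # structural typing: no inheritance from the protocol needed
    def describe_instances(self, **kwargs: Any) -> Dict[str, Any]:
        return {"Reservations": [{"Instances": [{"InstanceId": "i-0abc"}]}]}

assert isinstance(FakeEC2(), EC2ClientProtocol)
```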
- Python 3.7 or higher
- `pip` and `pipenv` (Recommended: `pip install pipenv`)
- AWS credentials configured
- For CUR Analysis: `polars` and `pyarrow` packages
- For NPE Module: `PyYAML` package
# Clone the repository
git clone https://github.com/your-org/dedo-duro.git
cd dedo-duro
# Install dependencies
pipenv install
# Activate virtual environment
pipenv shell
# Run the analyzer
python main.py --region us-west-2

# Clone the repository
git clone https://github.com/your-org/dedo-duro.git
cd dedo-duro
# Create virtual environment (optional but recommended)
python3 -m venv .venv
source .venv/bin/activate
# Install dependencies
pip install -r requirements.txt
# Run the analyzer
python main.py --region us-west-2

# Analyze all resources in a region
python main.py --region us-east-1
# Generate HTML report with specific output file
python main.py --region us-west-2 --output-file my-report.html
# Analyze specific resource types
python main.py --resource-types ec2,s3,rds,spot

flowchart TD
Start([Start]) --> Args{Parse Arguments}
Args --> Config[Create Configs<br/>AWS, Analysis, Report]
Config --> Manager[Initialize<br/>AWSResourceManager]
Manager --> Register[Register Analyzers]
Register --> CUR{CUR Analysis<br/>Requested?}
CUR -->|Yes| RunCUR[Run CUR Analyzer]
CUR -->|No| Primary
RunCUR --> Primary[Run Primary Analyzers]
Primary --> Multi{Multi-Region?}
Multi -->|Yes| Parallel[Parallel Region Analysis]
Multi -->|No| Single[Single Region Analysis]
Parallel --> Combine[Combine Results]
Single --> Summary
Combine --> Summary[Calculate Summary]
Summary --> Report[Generate Report]
Report --> S3{Upload to S3?}
S3 -->|Yes| Upload[Upload Report]
S3 -->|No| Done
Upload --> Done([Complete])
style Start fill:#27ae60,color:#fff
style Done fill:#27ae60,color:#fff
style Manager fill:#3498db,color:#fff
style Report fill:#e67e22,color:#fff
| Argument | Description | Default |
|---|---|---|
| `--region` | AWS region to analyze | AWS default |
| `--profile` | AWS credential profile | Default profile |
| `--output-format` | Report format (html/json/csv) | html |
| `--output-file` | Output file path | Auto-generated |
| `--output-s3-bucket` | S3 bucket for upload | None |
| `--output-s3-prefix` | S3 prefix for upload | None |
| `--resource-types` | Comma-separated analyzers | All standard |
| `--cur-s3-uri` | S3 URI for CUR data | None |
| `--cur-days-ago` | Days of CUR data to analyze | None |
| `--multi-region` | Analyze all regions | False |
| `--accounts-file` | JSON file with account configs | None |
| `--all-accounts` | Analyze all Organization accounts | False |
| `--environment` | Filter by environment (prod/test/dev) | None |
| `--grouping-tags` | Tags for resource grouping | Team,Project |
| `--verbose` | Detailed output | False |
| `--single-thread` | Disable parallel processing | False |
| `--max-workers` | Parallel workers | 10 |
| `--retry-attempts` | Max retry attempts | 5 |
| `--no-cache` | Disable CloudWatch metrics caching | False |
| `--analyzer-timeout` | Per-analyzer timeout in seconds | 180 |
| `--enable-streaming` | Enable progressive results streaming | False |
Analyze multiple AWS accounts simultaneously:
# Using accounts file
python main.py --accounts-file accounts.json --output-format html
# Analyze all accounts in AWS Organizations
python main.py --all-accounts --output-format html

`accounts.json` format:
{
"accounts": [
{
"account_id": "111111111111",
"role_arn": "arn:aws:iam::111111111111:role/DedoDuroRole",
"alias": "production",
"regions": ["us-east-1", "us-west-2"]
},
{
"account_id": "222222222222",
"role_arn": "arn:aws:iam::222222222222:role/DedoDuroRole",
"alias": "staging",
"regions": ["us-east-1"]
}
],
"partition": "aws"
}

Partition values:
- `aws` - AWS Commercial (default)
- `aws-us-gov` - AWS GovCloud
- `aws-cn` - AWS China
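The actual multi-account orchestration lives in `core/multi_account.py`; as a hypothetical sketch, consuming this file to enumerate per-account, per-region analysis targets could look like the following (each `role_arn` would then be assumed via STS before running the analyzers):

```python
import json

def load_account_targets(path: str):
    """Yield (alias, role_arn, region) triples from an accounts file as shown above."""
    with open(path) as f:
        cfg = json.load(f)
    for acct in cfg["accounts"]:
        for region in acct.get("regions", ["us-east-1"]):
            yield acct["alias"], acct["role_arn"], region

# Each role_arn would then be assumed via STS, roughly:
#   creds = boto3.client("sts").assume_role(
#       RoleArn=role_arn, RoleSessionName="dedo-duro")["Credentials"]
```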
Filter analysis by environment:
# Analyze only production resources
python main.py --environment prod --region us-east-1
# Analyze with custom grouping tags
python main.py --grouping-tags Team,Project,CostCenter --region us-east-1

graph LR
subgraph Compute
ec2[ec2]
ec2_eff[ec2-eff]
lambda[lambda]
ecs[ecs]
spot[spot]
end
subgraph Storage
s3[s3]
ebs[ebs]
ebs_snap[ebs_snapshot]
efs[efs]
end
subgraph Database
rds[rds]
dynamodb[dynamodb]
elasticache[elasticache]
opensearch[opensearch]
end
subgraph Network
nat[nat]
elb[elb]
vpc_ep[vpc_endpoints]
eip[eip]
cloudfront[cloudfront]
route53[route53]
end
subgraph AI_ML
sagemaker[sagemaker]
bedrock[bedrock]
comprehend[comprehend]
rekognition[rekognition]
textract[textract]
transcribe[transcribe]
kendra[kendra]
end
subgraph Financial
compute_opt[compute_optimizer]
savings[savings_plans]
cur[cur]
schedule[schedule_optimizer]
end
subgraph Governance
security[security_privacy]
orphan[orphan]
terraform[terraform_recommendations]
end
Full list of analyzer keys:
`ec2`, `ec2-eff`, `s3`, `rds`, `ebs`, `ebs_snapshot`, `lambda`, `elasticache`, `elb`, `dynamodb`, `api_gateway`, `nat`, `eip`, `vpc_endpoints`, `spot`, `security_privacy`, `orphan`, `ecs`, `sagemaker`, `bedrock`, `comprehend`, `rekognition`, `textract`, `transcribe`, `kendra`, `opensearch`, `compute_optimizer`, `savings_plans`, `cur`, `cloudfront`, `efs`, `route53`, `schedule_optimizer`, `terraform_recommendations`, `cloudformation_recommendations`

- v12.0 New: `cost_explorer`, `rto_analysis`, `eks_sessions`, `eks_deployments`
The NPE module helps migrate AWS Lambda/API Gateway to Kubernetes:
flowchart LR
subgraph AWS["AWS Resources"]
Lambda["Lambda Functions"]
APIGW["API Gateway"]
end
subgraph NPE["NPE Module"]
Collector["collector.py<br/>Gather configs"]
DockerGen["dockerfile_generator.py<br/>Create Dockerfiles"]
HelmGen["helm_generator.py<br/>Create Helm charts"]
end
subgraph Output["Generated Artifacts"]
Dockerfile["Dockerfile<br/>per function"]
Values["values.yaml"]
Templates["Helm templates"]
end
subgraph K8s["Kubernetes"]
Deployment["Deployments"]
Service["Services"]
Ingress["Ingress"]
end
AWS --> Collector
Collector --> DockerGen
Collector --> HelmGen
DockerGen --> Dockerfile
HelmGen --> Values
HelmGen --> Templates
Output --> K8s
style NPE fill:#9b59b6,color:#fff
style K8s fill:#326ce5,color:#fff
- Configure: Edit variables in `run_npe.py`
- Run: Execute `python run_npe.py`
- Review: Examine generated artifacts in `./generated_npe_artifacts`
Note: The NPE module generates starting points for migration. Manual review and refinement are required for production deployment.
- Generate JSON Report: `python main.py --output-format json --output-file dedo_duro_results.json`
- Configure Logstash to read the JSON file
- Output to Elasticsearch
- Visualize in Kibana
- Enable Resource IDs: Check "Include Resource IDs" when creating your CUR
- Parquet Format: Configure CUR to be delivered in Apache Parquet format
- Hourly Granularity: Select hourly time granularity for detailed analysis
- Permissions: IAM role needs `s3:GetObject` and `s3:ListBucket` on the CUR bucket
Dedo-Duro includes ready-to-use CI/CD configurations for automated analysis.
# .github/workflows/dedo-duro-analysis.yml
name: Dedo-Duro AWS Analysis
on:
schedule:
- cron: '0 6 * * 1' # Weekly on Mondays
workflow_dispatch:
jobs:
analyze:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Configure AWS Credentials
uses: aws-actions/configure-aws-credentials@v4
with:
aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
aws-region: us-east-1
- name: Run Dedo-Duro Analysis
run: |
pip install -r requirements.txt
python main.py --region us-east-1 --output-format html
- name: Upload Report
uses: actions/upload-artifact@v4
with:
name: dedo-duro-report
path: '*.html'

// ci/Jenkinsfile
pipeline {
agent any
triggers {
cron('0 6 * * 1') // Weekly
}
environment {
AWS_DEFAULT_REGION = 'us-east-1'
}
stages {
stage('Setup') {
steps {
sh 'pip install -r requirements.txt'
}
}
stage('Analyze') {
steps {
withCredentials([[$class: 'AmazonWebServicesCredentialsBinding',
credentialsId: 'aws-credentials']]) {
sh 'python main.py --region us-east-1 --output-format html'
}
}
}
stage('Archive') {
steps {
archiveArtifacts artifacts: '*.html', fingerprint: true
}
}
}
}

# .circleci/config.yml
version: 2.1
jobs:
analyze:
docker:
- image: cimg/python:3.11
steps:
- checkout
- run:
name: Install dependencies
command: pip install -r requirements.txt
- run:
name: Run analysis
command: python main.py --region us-east-1 --output-format html
- store_artifacts:
path: .
destination: reports
workflows:
weekly-analysis:
triggers:
- schedule:
cron: "0 6 * * 1"
filters:
branches:
only: main
jobs:
- analyze

flowchart TB
subgraph Trigger["Scheduling"]
EventBridge["EventBridge<br/>Scheduler"]
end
subgraph Orchestration["Orchestration"]
Lambda["Lambda<br/>Orchestrator"]
end
subgraph Execution["AWS Batch"]
JobQueue["Job Queue"]
ComputeEnv["Compute<br/>Environment"]
Container["Dedo-Duro<br/>Container"]
end
subgraph Accounts["Target Accounts"]
Account1["Account 1"]
Account2["Account 2"]
AccountN["Account N"]
end
subgraph Storage["Central Storage"]
S3Bucket["S3 Bucket<br/>Reports"]
ECR["ECR<br/>Container Image"]
end
EventBridge --> Lambda
Lambda --> JobQueue
JobQueue --> ComputeEnv
ComputeEnv --> Container
Container --> Accounts
Container --> S3Bucket
ECR --> Container
style Trigger fill:#ff9500,color:#fff
style Execution fill:#4a90d9,color:#fff
style Storage fill:#27ae60,color:#fff
- Build and Push Docker Image to ECR
- Configure AWS Batch (Compute Environment, Job Queue, Job Definition)
- Create Orchestrator (Lambda function to submit jobs)
- Schedule Execution with EventBridge Scheduler
flowchart TB
subgraph Report["Interactive HTML Report"]
Summary["Executive Summary<br/>Total savings, opportunities"]
Charts["Visual Analytics<br/>Chart.js charts"]
Tables["Resource Tables<br/>Sortable, filterable"]
Recommendations["Recommendations<br/>Prioritized actions"]
end
subgraph Charts_Detail["Chart Types"]
Bar["Savings by<br/>Resource Type"]
Doughnut["Resource<br/>Distribution"]
Horizontal["AI/ML Services<br/>Overview"]
end
Report --> Charts_Detail
╔══════════════════════════════════════════════════════════════╗
║ AWS Resource Utilization Analyzer ║
║ Starting analysis for: ec2, s3, rds, spot, security ║
╚══════════════════════════════════════════════════════════════╝
Analysis Progress
├── EC2 (Medium 30-60s).................... Complete ✅
├── S3 (Long 60-180s)...................... Complete ✅
├── RDS (Medium 30-60s).................... Complete ✅
├── Spot (Medium 30-90s)................... Complete ✅
└── Security (Medium-Long 60-120s)......... Complete ✅
Summary:
Total Resources Analyzed: 147
Optimization Opportunities: 23
Estimated Monthly Savings: $2,450.00
Estimated Annual Savings: $29,400.00
flowchart LR
subgraph Input
Credentials[AWS Credentials]
Region[Region Config]
Options[CLI Options]
end
subgraph Processing
Discovery[Resource<br/>Discovery]
Metrics[Metrics<br/>Collection]
Analysis[Utilization<br/>Analysis]
Pricing[Cost<br/>Calculation]
end
subgraph Output
HTML[HTML Report]
JSON[JSON Data]
CSV[CSV Export]
S3[S3 Storage]
end
Input --> Discovery
Discovery --> Metrics
Metrics --> Analysis
Analysis --> Pricing
Pricing --> Output
style Discovery fill:#3498db
style Metrics fill:#9b59b6
style Analysis fill:#e67e22
style Pricing fill:#27ae60
- Create a new file in `analyzers/` (e.g., `analyzers/new_service.py`)
- Define a class inheriting from `ResourceAnalyzer`
- Implement `get_service_name()` and `analyze()`
- Update `available_analyzers` in `main.py`
'new_service': {
'class': 'analyzers.new_service.NewServiceAnalyzer',
'time': "Fast (5-10s)",
'desc': "Analyzes the new awesome service for optimization."
},

Modify `reporters/html_reporter.py` for HTML customization, or create a new reporter class inheriting from `BaseReporter`.
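For instance, a hypothetical Markdown reporter might look like the sketch below. `BaseReporter`'s real constructor and method names aren't shown in this README, so the `output_file` argument and `generate` method here are assumptions (a local stand-in base class keeps the snippet self-contained):

```python
class BaseReporter:  # stand-in: the real base class lives in reporters/
    def __init__(self, output_file: str):
        self.output_file = output_file

class MarkdownReporter(BaseReporter):
    """Hypothetical reporter emitting a per-service findings table."""

    def generate(self, results: dict) -> None:  # method name is an assumption
        lines = ["| Service | Findings |", "|---|---|"]
        lines += [f"| {svc} | {len(items)} |" for svc, items in results.items()]
        with open(self.output_file, "w") as f:
            f.write("\n".join(lines))
```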
| Environment Size | v1.0 | v2.0+ | v12.0+ (with cache) | Improvement |
|---|---|---|---|---|
| Small (<100 resources) | 2-3 min | 1-2 min | 30-60s | ~70% |
| Medium (100-500) | 10-15 min | 5-10 min | 3-6 min | ~60% |
| Large (500+) | 30-45 min | 15-30 min | 10-20 min | ~50% |
| Very Large (1000+) | Often fails | 30-60 min | 20-40 min | Reliability |
CloudWatch Metrics Caching:

```shell
# Run with caching enabled (default)
python main.py --region us-east-1

# Disable caching for fresh data
python main.py --region us-east-1 --no-cache
```

Cache TTLs by metric type:
| Metric Type | TTL | Rationale |
|---|---|---|
| CPU/Memory | 5 min | Changes frequently |
| Network/Disk | 10 min | Moderately volatile |
| S3 Size/Objects | 30 min | Rarely changes |
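A per-metric-type TTL cache like the one described can be sketched in a few lines. The class name and TTL keys below are illustrative, not the tool's actual implementation; the TTL values mirror the table above.

```python
import time

# Illustrative TTLs (seconds) per metric type, matching the table above
TTLS = {"cpu": 300, "memory": 300, "network": 600, "disk": 600, "s3_size": 1800}

class MetricCache:
    """Minimal TTL cache sketch for CloudWatch metric results."""

    def __init__(self):
        self._store = {}  # key -> (expires_at, value)

    def get(self, key):
        entry = self._store.get(key)
        if entry and entry[0] > time.monotonic():
            return entry[1]
        self._store.pop(key, None)  # drop expired entries lazily
        return None

    def put(self, key, metric_type, value):
        ttl = TTLS.get(metric_type, 300)
        self._store[key] = (time.monotonic() + ttl, value)
```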
Per-Analyzer Timeouts:

```shell
# Custom timeout (default: 180s)
python main.py --region us-east-1 --analyzer-timeout 120

# The timeout protects against slow analyzers blocking the pipeline;
# partial results are returned if an analyzer times out
```

Circuit Breaker:
- Automatically opens after 5 consecutive failures per service
- Prevents wasting time on unavailable services
- Recovers automatically after 60 seconds
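The circuit-breaker behaviour described above can be sketched as follows. The thresholds mirror the defaults listed (5 failures, 60-second recovery); the class itself is illustrative, not the tool's actual implementation.

```python
import time

class CircuitBreaker:
    """Opens after N consecutive failures; recovers after a cooldown."""

    def __init__(self, failure_threshold=5, recovery_seconds=60):
        self.failure_threshold = failure_threshold
        self.recovery_seconds = recovery_seconds
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def allow(self):
        if self.opened_at is None:
            return True
        if time.monotonic() - self.opened_at >= self.recovery_seconds:
            # Half-open: allow one attempt through after the cooldown
            self.opened_at = None
            self.failures = 0
            return True
        return False

    def record_success(self):
        self.failures = 0
        self.opened_at = None

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = time.monotonic()
```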
Tips:
- Use `--max-workers` to control concurrency (default: 10)
- Use `--single-thread` for debugging
- Increase `--retry-attempts` for slow connections
- Use `--no-cache` when you need the freshest metrics data
- Lower `--analyzer-timeout` for faster feedback on slow environments
Essential:
- ec2:Describe*
- cloudwatch:GetMetricStatistics
- s3:List*, s3:GetBucket*
- rds:Describe*
- lambda:List*, lambda:Get*
- iam:ListAccountAliases
Spot Analysis:
- ec2:DescribeSpotPriceHistory
- pricing:GetProducts
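For convenience, the essential and Spot-analysis permissions above can be combined into a minimal IAM policy sketch. The `Sid` and statement grouping are illustrative; scope `Resource` down further where your environment allows.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DedoDuroEssential",
      "Effect": "Allow",
      "Action": [
        "ec2:Describe*",
        "ec2:DescribeSpotPriceHistory",
        "cloudwatch:GetMetricStatistics",
        "s3:List*",
        "s3:GetBucket*",
        "rds:Describe*",
        "lambda:List*",
        "lambda:Get*",
        "iam:ListAccountAliases",
        "pricing:GetProducts"
      ],
      "Resource": "*"
    }
  ]
}
```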
AI/ML Services:
- sagemaker:List*, sagemaker:Describe*
- bedrock:List*, bedrock:Get*
- comprehend:List*, comprehend:Describe*
- rekognition:Describe*, rekognition:List*
- textract:List*
- transcribe:List*
- kendra:List*, kendra:Describe*

```shell
# Run all tests
python -m pytest tests/ -v

# Run a specific test file
python -m pytest tests/test_aiml_analyzers.py -v

# Run performance tests only
python -m pytest tests/test_performance.py -v

# Run with a coverage report
python -m pytest tests/ --cov=analyzers --cov=core --cov-report=html
```

- AI/ML Analyzer Tests: Comprehensive mock tests for all 7 AI/ML analyzers
- HTML Reporter Tests: Verifies Chart.js visual analytics generation
- Integration Tests: Validates analyzer interface compliance
- Performance Tests (v12.0): 39 tests covering:
- Session reuse and thread safety
- Per-analyzer timeouts
- CloudWatch metrics caching (TTL, LRU eviction)
- Circuit breaker state transitions
- Event bus for progressive results
- Performance telemetry
- Exponential backoff with jitter
- Backward compatibility
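The exponential backoff with jitter named in the test list above can be sketched as below. This assumes the common "full jitter" variant (a random delay between zero and the exponentially growing cap); the tool's exact strategy may differ.

```python
import random

def backoff_delay(attempt, base=1.0, cap=30.0):
    """Full-jitter exponential backoff: random delay in [0, min(cap, base * 2^attempt)]."""
    return random.uniform(0, min(cap, base * (2 ** attempt)))
```

Each retry draws a fresh random delay, which spreads retrying clients apart and avoids synchronized thundering-herd retries against a throttled AWS API.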
| Issue | Solution |
|---|---|
| API Throttling | Reduce `--max-workers` or use `--single-thread` |
| Memory Usage | Analyze fewer `--resource-types` at once |
| Permission Errors | Check AWS credentials and IAM permissions |
| Analyzer Not Running | Verify the key in `--resource-types` matches a supported key |
| Missing Resources | Verify the resources exist in the specified region |
| Boto3 Errors | Update boto3: `pip install -U boto3 botocore` |
| Analyzer Timeout | Increase `--analyzer-timeout` (default: 180s) |
| Stale Metrics Data | Use `--no-cache` to fetch fresh CloudWatch data |
| Circuit Breaker Open | Wait 60s for automatic recovery, or restart the analysis |

Use `--verbose` for detailed error messages and debugging.
```mermaid
timeline
    title Dedo-Duro Evolution
    section Foundation
        v1.0 : Monolithic script
        v2.0 : Modular architecture
    section Features
        v3.0 : Security/Privacy analysis
        v4.0 : Spot analysis
        v5.0 : Orphan detection
    section Enhancement
        v7.0-v8.0 : EC2 efficiency, RI awareness
        v9.0 : Containerization, S3 reports
        v10.0 : Type safety, lazy init
        v11.0 : AI/ML services
    section Current
        v12.0 : Multi-Account support
              : Cost Explorer integration
              : RTO/RPO Analysis
              : EKS Monitoring
              : CI/CD Integration
              : Environment filtering
              : Performance & Resilience
              : Metrics Caching
              : Circuit Breaker
    section Future
        v13.0+ : Web interface
               : Auto-remediation
               : Real-time dashboards
```
```mermaid
flowchart LR
    A[Resource Discovery] --> B[Metrics Collection]
    B --> C[Utilization Analysis]
    C --> D[Cost Calculation]
    D --> E[Recommendation Engine]
    E --> F[Report Generation]
    subgraph Details
        A1["List all resources<br/>via AWS APIs"]
        B1["30/60/90 day<br/>CloudWatch metrics"]
        C1["CPU, Memory, IOPS<br/>Network utilization"]
        D1["Real-time pricing<br/>Savings estimation"]
        E1["Right-sizing<br/>Spot migration<br/>Schedule optimization"]
        F1["HTML with charts<br/>JSON/CSV export"]
    end
    A -.-> A1
    B -.-> B1
    C -.-> C1
    D -.-> D1
    E -.-> E1
    F -.-> F1
    style A fill:#3498db,color:#fff
    style B fill:#9b59b6,color:#fff
    style C fill:#e67e22,color:#fff
    style D fill:#27ae60,color:#fff
    style E fill:#e74c3c,color:#fff
    style F fill:#1abc9c,color:#fff
```
```python
from concurrent.futures import ThreadPoolExecutor, as_completed
import time

def parallel_process_batch(items, process_func, config, service):
    """Process a batch of items in parallel with controlled concurrency."""
    results = []
    batch_size = config.get_batch_size_for_service(service)
    for i in range(0, len(items), batch_size):
        batch = items[i:i + batch_size]
        num_workers = max(1, min(config.max_workers, len(batch)))
        with ThreadPoolExecutor(max_workers=num_workers) as executor:
            future_to_item = {executor.submit(process_func, item): item for item in batch}
            for future in as_completed(future_to_item):
                try:
                    result = future.result()
                    if result:
                        results.append(result)
                except Exception as exc:
                    item = future_to_item[future]
                    item_id = item.get("InstanceId") or item.get("VolumeId") or "Unknown"
                    # print_error is the project's logging helper
                    print_error(f'Item {item_id} generated an exception: {exc}')
        # Pause between batches to stay under per-service API rate limits
        time.sleep(config.get_delay_for_service(service))
    return results
```

The Dedo-Duro architecture supports team collaboration:
- Modular Structure: Different team members can work on different analyzers
- Clear Interfaces: Well-defined interfaces between components
- Independent Testing: Each analyzer can be tested independently
- Easy Onboarding: New team members can understand isolated components
- Version Control Friendly: Minimizes merge conflicts with separate files
Dedo-Duro 12.0 introduces enterprise-scale analysis capabilities:
- Multi-Account Analysis: Analyze entire AWS Organizations with consolidated reporting
- Cost Explorer Integration: Real cost data with anomaly detection
- RTO/RPO Analysis: Disaster recovery readiness assessment
- EKS Monitoring: Kubernetes session tracking and deployment lifecycle analysis
- Environment Filtering: Target specific environments (prod/test/dev)
- CI/CD Integration: GitHub Actions, Jenkins, and CircleCI support out-of-the-box
- Performance & Resilience:
- CloudWatch metrics caching (40-60% faster repeat analyses)
- Per-analyzer timeouts (graceful handling of slow services)
- Circuit breaker (automatic bypass of failing services)
- Session/client reuse (reduced connection overhead)
- Amazon SageMaker: Notebook instances, endpoints, training jobs, Feature Store, Studio
- Amazon Bedrock: Provisioned throughput, custom models, guardrails, knowledge bases
- Amazon Comprehend: Endpoints, classifiers, entity recognizers, flywheels
- Amazon Rekognition: Custom Labels projects, stream processors, face collections
- Amazon Textract: Custom adapters, per-operation cost estimation
- Amazon Transcribe: Vocabularies, language models, job patterns
- Amazon Kendra: Indexes, data sources, experiences
This object-oriented approach results in a maintainable and adaptable codebase, capable of evolving with new AWS services and user requirements.
MIT License
Developed and maintained by Gustavo Lima.
Key milestones: v2.0 (architecture), v3.0 (security), v4.0 (Spot), v5.0 (orphan), v7-8.0 (efficiency), v9.0 (automation), v10.0 (type safety), v11.0 (AI/ML services), v12.0 (multi-account, Cost Explorer, EKS, CI/CD).
- Division of execution by environment (production vs. test) → Environment filtering (`--environment` flag)
- Monitoring of open sessions by environment vs. Kubernetes → EKS Session Analyzer (`eks_sessions`)
- Monitoring of the deployment lifecycle with Kubernetes → EKS Deployment Lifecycle (`eks_deployments`)
- RTO analysis process for production → RTO/RPO Analyzer (`rto_analysis`)
- Reading files with tags and metadata to facilitate resource grouping → Tag-based grouping (`--grouping-tags`)
- Create the all-in option to run for a set of accounts at the same time → Multi-Account Analysis (`--accounts-file`, `--all-accounts`)
- Web interface for real-time monitoring → Web Dashboard (`web/app.py`, Flask-based)
- Auto-remediation capabilities (experimental) → Remediation Framework (`remediation/` module)
- Integration with Slack/Teams for notifications → Notification System (`notifications/` module)
- Custom alerting thresholds → Alert Manager (`notifications/alerting.py`)
- Kubernetes permissions documentation → Kubernetes Permissions (`docs/kubernetes_permissions.md`)
- Enhanced web dashboard with real-time WebSocket updates
- Remediation approval workflow via web interface
- Historical trend analysis and forecasting
Dedo-Duro - Because your AWS resources shouldn't keep secrets from you.