
refine CloudWatch alert thresholds and severity labels #152

Merged

ian-flores merged 1 commit into main from update-cloudwatch-alert-thresholds on Mar 3, 2026

@ian-flores

Description

Follow-up to #139. Refines the CloudWatch Grafana alerts based on review feedback.

Changes:

  • Add a severity: warning label to all alerts that were missing one (RDS CPU utilization, free storage, freeable memory, database connections; ALB/NLB unhealthy targets, ALB 5XX errors, ALB response latency)
  • Delete NAT Gateway Port Allocation Errors and Packets Dropped alerts
  • Delete RDS Read Latency High alert
  • Reduce ALB and NLB Unhealthy Targets evaluation window from 10m → 5m
  • Lower RDS Database Connections threshold from 500 → 80 and unpause the alert
  • Add comments to RDS Free Storage Low and Freeable Memory Low explaining why they remain absolute byte thresholds: CloudWatch does not expose AllocatedStorage or total instance RAM as RDS time-series metrics, so percentage-based thresholds are infeasible without a separate exporter
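To make the evaluation-window and threshold changes concrete, here is a toy Python model of how a pending period delays firing, plus two small helpers for the severity-label and absolute-threshold points. This is a sketch under stated assumptions, not the Grafana or Terraform configuration changed in this PR: `AlertRule`, `first_firing_minute`, `missing_severity`, and `free_storage_threshold_bytes` are hypothetical names, and the model assumes one evaluation per minute with a simple "value > threshold" condition.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence


@dataclass
class AlertRule:
    """Hypothetical, simplified stand-in for a Grafana alert rule."""
    name: str
    threshold: float      # condition: value > threshold
    pending_minutes: int  # how long the condition must hold before firing
    labels: dict = field(default_factory=dict)


def first_firing_minute(rule: AlertRule, samples: Sequence[float],
                        interval_minutes: int = 1) -> Optional[int]:
    """Return the minute at which the rule would first fire, or None.

    Assumes one evaluation per `interval_minutes`. The rule becomes pending on
    the first breaching sample and fires once the breach has lasted
    `pending_minutes`; any non-breaching sample resets the pending state.
    """
    pending_since: Optional[int] = None
    for i, value in enumerate(samples):
        t = i * interval_minutes
        if value > rule.threshold:
            if pending_since is None:
                pending_since = t
            if t - pending_since >= rule.pending_minutes:
                return t
        else:
            pending_since = None
    return None


def missing_severity(rules: Sequence[AlertRule]) -> List[str]:
    """Names of rules that still lack a `severity` label."""
    return [r.name for r in rules if "severity" not in r.labels]


def free_storage_threshold_bytes(allocated_gib: float, percent: float) -> int:
    """Convert a percentage threshold into an absolute byte threshold.

    `allocated_gib` must come from outside CloudWatch (for example the
    AllocatedStorage field of the RDS DescribeDBInstances API, reported in
    GiB); CloudWatch does not provide it as a metric.
    """
    return int(allocated_gib * 1024**3 * percent / 100)


if __name__ == "__main__":
    # Hypothetical data: one unhealthy target from minute 1 onward.
    unhealthy = [0] + [1] * 12
    old = AlertRule("ALB Unhealthy Targets", threshold=0, pending_minutes=10)
    new = AlertRule("ALB Unhealthy Targets", threshold=0, pending_minutes=5,
                    labels={"severity": "warning"})
    print(first_firing_minute(old, unhealthy))  # 11
    print(first_firing_minute(new, unhealthy))  # 6
    print(missing_severity([old, new]))         # ['ALB Unhealthy Targets']
```

In this model, shortening the window from 10m to 5m lets a sustained breach fire five evaluations sooner, while a breach that clears before the window elapses still never fires. The last helper shows why a percentage threshold needs an input that a CloudWatch-only alert does not have.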

Category of change

  • Bug fix (non-breaking change which fixes an issue)
  • Version upgrade (upgrading the version of a service or product)
  • New feature (non-breaking change which adds functionality)
  • Build: a code change that affects the build system or external dependencies
  • Performance: a code change that improves performance
  • Refactor: a code change that neither fixes a bug nor adds a feature
  • Documentation: documentation changes
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

@ian-flores ian-flores marked this pull request as ready for review February 27, 2026 00:37
@ian-flores ian-flores requested a review from a team as a code owner February 27, 2026 00:37
@ian-flores ian-flores requested a review from amdove February 27, 2026 00:37
@ian-flores ian-flores enabled auto-merge February 27, 2026 00:37
@ian-flores ian-flores added this pull request to the merge queue Mar 3, 2026
Merged via the queue into main with commit 40bfeaa Mar 3, 2026
3 checks passed
@ian-flores ian-flores deleted the update-cloudwatch-alert-thresholds branch March 3, 2026 15:48