
Add Azure Monitor alerting for cloud-level resources #143

Open
ian-flores wants to merge 8 commits into main from azure-monitor-alerting

ian-flores commented Feb 26, 2026

Summary

Add Azure Monitor-based alerting for Azure cloud resources, mirroring what PR #139 did for AWS CloudWatch.

  • Add prometheus.exporter.azure config blocks to Alloy for Azure workloads (PostgreSQL, NetApp Files, Load Balancer, Storage, NAT Gateway)
  • Create Monitoring Reader RBAC role assignment and workload identity for Alloy managed identity
  • Create Azure-specific Grafana alert rule YAML files (azure_postgres, azure_netapp, azure_loadbalancer, azure_storage)
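As a rough illustration of the first bullet, the sketch below renders `prometheus.exporter.azure` blocks as text, with subscription and resource group interpolation, a NAT Gateway block emitted only when `public_subnet_cidr` is set, and an empty result for non-Azure clouds. It is not the code in this PR: the function names, block labels, resource types, and metric names are assumptions chosen for illustration, and the actual generator may differ.

```python
from typing import List, Optional


def render_azure_exporter(label: str, subscription_id: str, resource_group: str,
                          resource_type: str, metrics: List[str]) -> str:
    """Render one prometheus.exporter.azure block as Alloy config text."""
    metric_list = ", ".join('"' + m + '"' for m in metrics)
    # Restrict discovery to a single resource group (case-insensitive match).
    rg_filter = "where resourceGroup =~ '" + resource_group + "'"
    lines = [
        'prometheus.exporter.azure "' + label + '" {',
        '  subscriptions = ["' + subscription_id + '"]',
        '  resource_type = "' + resource_type + '"',
        '  resource_graph_query_filter = "' + rg_filter + '"',
        "  metrics = [" + metric_list + "]",
        "}",
    ]
    return "\n".join(lines) + "\n"


def build_azure_config(cloud: str, subscription_id: str, resource_group: str,
                       public_subnet_cidr: Optional[str] = None) -> str:
    """Return Azure Monitor exporter config, or an empty string off Azure."""
    if cloud != "azure":
        return ""
    blocks = [
        # Resource type and metric names are assumed, not taken from the PR.
        render_azure_exporter(
            "postgres", subscription_id, resource_group,
            "Microsoft.DBforPostgreSQL/flexibleServers",
            ["cpu_percent", "memory_percent", "storage_percent"],
        ),
    ]
    # The NAT Gateway block is only emitted when a public subnet is configured.
    if public_subnet_cidr:
        blocks.append(render_azure_exporter(
            "nat_gateway", subscription_id, resource_group,
            "Microsoft.Network/natGateways",
            ["SNATConnectionCount", "PacketCount"],
        ))
    return "\n".join(blocks)
```

Calling `build_azure_config("azure", "sub-123", "rg-test", public_subnet_cidr="10.0.0.0/24")` would yield both blocks, while `build_azure_config("aws", "sub-123", "rg-test")` yields an empty string.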

Alert rules

| Resource | Alert | Threshold | Duration |
| --- | --- | --- | --- |
| PostgreSQL | CPU High | >80% | 10m |
| PostgreSQL | Storage High | >80% | 5m |
| PostgreSQL | Memory High | >80% | 10m |
| PostgreSQL | Connections High | >500 | 5m |
| PostgreSQL | Failed Connections | >10 | 5m |
| PostgreSQL | Deadlocks | >0 | 5m |
| NetApp Files | Capacity High | >80% | 10m |
| NetApp Files | Read Latency High | >10ms | 10m |
| NetApp Files | Write Latency High | >10ms | 10m |
| Load Balancer | Health Probe Down | <100% | 5m |
| Load Balancer | Data Path Down | <100% | 5m |
| Load Balancer | SNAT Port Exhaustion | >80% | 5m |
| Storage | Availability Low | <99.9% | 5m |
| Storage | Latency High | >1000ms | 10m |
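To make the Threshold and Duration columns concrete, here is a minimal sketch of how a threshold combined with a duration is usually interpreted: the rule fires only once the condition has held continuously for the full duration. This is a simplified model for illustration, not Grafana's evaluation engine; the `is_firing` helper and the sample data shape are hypothetical.

```python
import operator
from typing import Callable, List, Tuple

Sample = Tuple[int, float]  # (unix timestamp in seconds, metric value)


def is_firing(samples: List[Sample], threshold: float, duration_s: int,
              breach: Callable[[float, float], bool] = operator.gt) -> bool:
    """Return True if the latest unbroken run of breaching samples
    spans at least duration_s seconds. Samples must be time-ordered."""
    breach_start = None
    for ts, value in samples:
        if breach(value, threshold):
            if breach_start is None:
                breach_start = ts  # start of a new breaching run
        else:
            breach_start = None  # any healthy sample resets the run
    if breach_start is None:
        return False
    return samples[-1][0] - breach_start >= duration_s


# "PostgreSQL CPU High, >80%, 10m": one sample per minute for 10 minutes.
cpu_high = [(t * 60, 85.0) for t in range(11)]
cpu_with_dip = [(t * 60, 70.0 if t == 5 else 85.0) for t in range(11)]

# "Load Balancer Health Probe Down, <100%, 5m" uses a less-than comparison.
probe_down = [(t * 60, 50.0) for t in range(6)]
```

Under this model `cpu_high` fires, `cpu_with_dip` does not (the dip at minute 5 restarts the clock), and `probe_down` fires when evaluated with `breach=operator.lt`, a threshold of 100, and a duration of 300 seconds.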

Test plan

  • All 190 tests pass (163 existing + 27 new)
  • Lint and format clean
  • Deploy to Azure test cluster
  • Verify metrics arrive in Mimir
  • Verify alert rules appear in Grafana Alerting

ian-flores and others added 8 commits February 26, 2026 09:18
Add prometheus.exporter.azure config blocks to Alloy for Azure workloads
covering PostgreSQL, NetApp Files, Load Balancer, Storage, and NAT Gateway
(conditional on public_subnet_cidr). Create Monitoring Reader role
assignment and workload identity for Alloy service account.

Ref: ptd-config#2779

Create Grafana provisioned alert YAML files for Azure cloud resources:
- azure_postgres.yaml: CPU, storage, memory, connections, deadlocks
- azure_netapp.yaml: capacity, read/write latency
- azure_loadbalancer.yaml: health probe, data path, SNAT exhaustion
- azure_storage.yaml: availability, E2E latency

Ref: ptd-config#2779

Add 27 tests covering:
- Azure Monitor Alloy config generation (metric blocks, NAT conditional,
  subscription/resource group interpolation, AWS returns empty)
- Alert YAML file validation (existence, structure, metric queries)
- Alloy monitoring identity method existence and signature

Ref: ptd-config#2779
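
The alert YAML structure checks described in the commit above could look roughly like the following sketch. It validates an already-parsed rule file (a plain dict, as a YAML loader would return) against the general shape of a Grafana provisioned alert file. The `validate_alert_file` helper, the set of required keys, and the sample rule are assumptions for illustration, not the tests added in this PR.

```python
from typing import Any, Dict, List

# Keys this sketch expects on every rule; an assumed minimal set.
REQUIRED_RULE_KEYS = ("uid", "title", "condition", "data", "for")


def validate_alert_file(doc: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    errors: List[str] = []
    if doc.get("apiVersion") != 1:
        errors.append("apiVersion must be 1")
    groups = doc.get("groups")
    if not isinstance(groups, list) or not groups:
        errors.append("groups must be a non-empty list")
        return errors
    for group in groups:
        name = group.get("name", "<unnamed>")
        rules = group.get("rules")
        if not isinstance(rules, list) or not rules:
            errors.append("group " + name + ": rules must be a non-empty list")
            continue
        for rule in rules:
            for key in REQUIRED_RULE_KEYS:
                if key not in rule:
                    title = rule.get("title", "<untitled>")
                    errors.append("rule " + title + ": missing " + key)
    return errors


# Hypothetical parsed content of a file such as azure_postgres.yaml.
sample = {
    "apiVersion": 1,
    "groups": [{
        "name": "azure_postgres",
        "rules": [{
            "uid": "azure_postgres_cpu_high",
            "title": "PostgreSQL CPU High",
            "condition": "B",
            "data": [{"refId": "A"}, {"refId": "B"}],
            "for": "10m",
        }],
    }],
}
```

Returning a list of problems rather than raising on the first one lets a single test run report every malformed rule in a file.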
@timtalbot timtalbot marked this pull request as ready for review March 10, 2026 19:57
@timtalbot timtalbot requested a review from a team as a code owner March 10, 2026 19:57
@timtalbot timtalbot self-assigned this Mar 10, 2026
@timtalbot timtalbot requested review from Lytol and t-margheim March 10, 2026 19:58