Draft - RFC: Unified Alert Viewing for Prometheus and OpenSearch Alerts
Summary
With OpenSearch adding support for Prometheus metrics querying via PromQL, users need a unified view of alerts across both Prometheus Alertmanager and OpenSearch Alerting. This RFC proposes a unified UI within OpenSearch Dashboards that displays alerts from both systems side-by-side, enabling users to monitor all their observability alerts without requiring alert format conversion or architectural changes to either alerting system.
Problem Statement
Current State
OpenSearch now supports querying Prometheus metrics directly using PromQL, enabling comprehensive metrics observability. However, OpenSearch’s alerting experience currently only supports OpenSearch alerts.
User Pain Points
- Tool fragmentation: Users want one view of their data instead of jumping between tools
- Incomplete observability: Cannot correlate Prometheus metric alerts with OpenSearch log/trace alerts
- Operational overhead: Managing multiple alerting UIs (e.g. Grafana) increases cognitive load and response time
Why This Matters
Prometheus is one of the most widely adopted open source metrics solutions, and users migrating from commercial observability platforms expect a unified experience. With OpenSearch now supporting metrics (Prometheus), traces, and logs, a unified alert view is essential to delivering on the promise of complete observability.
Goals and Non-Goals
Goals
- Unified alert viewing: Display Prometheus and OpenSearch alerts in a single UI within OpenSearch Dashboards
- Preserve native functionality: Maintain full fidelity of alert information from both systems without lossy conversion
- Seamless navigation: Enable drill-down from alerts to underlying metrics, traces, and logs
- Minimal operational changes: Require no changes to existing Prometheus Alertmanager or OpenSearch Alerting configurations
- Unified alert management: Create, edit, and manage alerts from the unified view (the initial release is view-only)
- Notification unification: Consolidate notification channels between the two systems
Non-Goals
- Alert rule conversion: Not converting Prometheus alerting rules to OpenSearch Alerting format
- Alert forwarding: Not ingesting Prometheus alerts as OpenSearch documents
- Historical alert storage: Not persisting Prometheus alert history in OpenSearch (future consideration)
Proposed Solution
Architecture Overview
The unified alert UI will query both Prometheus Alertmanager and OpenSearch Alerting APIs in parallel to display alerts, and provide bidirectional communication to enable full alert lifecycle management including creating, editing, acknowledging, and silencing alerts across both systems.
```
┌─────────────────────────────────────────────────────────┐
│           OpenSearch Dashboards (Unified UI)            │
│                                                         │
│  ┌───────────────────────────────────────────────────┐  │
│  │        Unified Alerts Management Component        │  │
│  │        (Observability Plugin - Alerts Tab)        │  │
│  │                                                   │  │
│  │  • View alerts from both systems                  │  │
│  │  • Create/edit alerting rules & monitors          │  │
│  │  • Acknowledge/silence alerts                     │  │
│  │  • Manage complete alert lifecycle                │  │
│  │  • Bulk operations across systems                 │  │
│  └───────────────────────────────────────────────────┘  │
│             │                           │               │
│             │ (Read/Write)              │ (Read/Write)  │
│             ▼                           ▼               │
│  ┌─────────────────────┐     ┌─────────────────────┐    │
│  │ Prometheus          │     │ OpenSearch          │    │
│  │ Alert Adapter       │     │ Alert Adapter       │    │
│  │ • Query alerts      │     │ • Query alerts      │    │
│  │ • Create rules      │     │ • Create monitors   │    │
│  │ • Edit rules        │     │ • Edit monitors     │    │
│  │ • Silence alerts    │     │ • Acknowledge       │    │
│  │ • Delete rules      │     │ • Enable/disable    │    │
│  └─────────────────────┘     └─────────────────────┘    │
└─────────────────────────────────────────────────────────┘
              │                           │
              │ (API calls)               │ (API calls)
              ▼                           ▼
   ┌─────────────────────┐     ┌─────────────────────┐
   │ Prometheus          │     │ OpenSearch          │
   │ • Alertmanager API  │     │ • Alerting Plugin   │
   │ • Rules API         │     │   API               │
   │ • Silences API      │     │                     │
   └─────────────────────┘     └─────────────────────┘
```
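The parallel query described in the overview could be sketched roughly as follows. The `AlertAdapter` contract, `AlertSummary`, and `queryAllAlerts` are hypothetical names introduced only for illustration; the RFC does not define them. The sketch assumes each adapter exposes a single asynchronous query call and that a failure in one backend should not hide alerts from the other.

```typescript
// Hypothetical adapter contract; these names are assumptions, not part of the RFC.
type AlertSource = 'prometheus' | 'opensearch';

interface AlertSummary {
  id: string;
  name: string;
  source: AlertSource;
}

interface AlertAdapter {
  source: AlertSource;
  queryAlerts(): Promise<AlertSummary[]>;
}

interface UnifiedQueryResult {
  alerts: AlertSummary[];
  // Sources that failed, so the UI can show a partial-results warning.
  failedSources: AlertSource[];
}

// Query every adapter in parallel and collect whatever succeeds.
async function queryAllAlerts(adapters: AlertAdapter[]): Promise<UnifiedQueryResult> {
  const result: UnifiedQueryResult = { alerts: [], failedSources: [] };
  await Promise.all(
    adapters.map(async (adapter) => {
      try {
        const alerts = await adapter.queryAlerts();
        result.alerts.push(...alerts);
      } catch {
        // Record the failure instead of rejecting the whole query.
        result.failedSources.push(adapter.source);
      }
    })
  );
  return result;
}
```

Catching per adapter means an unreachable Alertmanager degrades the view to OpenSearch-only results rather than an empty page. Alerts arrive in completion order, so the caller is expected to sort them.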
Key Components
3. Unified Alert Data Model
Define a common internal representation that preserves information from both systems:
```typescript
interface UnifiedAlert {
  // Common fields
  id: string;
  name: string;
  severity: 'critical' | 'high' | 'medium' | 'low' | 'info';
  // 'error' is included to match the Unified State column of the state mapping table
  state: 'firing' | 'pending' | 'resolved' | 'acknowledged' | 'error';
  source: 'prometheus' | 'opensearch';
  timestamp: string; // ISO8601

  // Alert details
  message: string;
  description?: string;

  // Source-specific metadata (preserved as-is)
  sourceMetadata: PrometheusMetadata | OpenSearchMetadata;

  // Navigation context
  drillDownLinks: {
    metrics?: string; // Link to metrics explorer
    traces?: string; // Link to trace analytics
    logs?: string; // Link to log explorer
  };

  // Management capabilities
  managementActions: {
    canAcknowledge: boolean;
    canSilence: boolean;
    canEdit: boolean;
    canDelete: boolean;
  };
}

interface PrometheusMetadata {
  labels: Record<string, string>;
  annotations: Record<string, string>;
  generatorURL: string;
  fingerprint: string;
  receivers: string[];
  startsAt: string;
  endsAt?: string;
  updatedAt: string;
  ruleId?: string; // For rule management
  silenceId?: string; // If alert is silenced
}

interface OpenSearchMetadata {
  monitorId: string;
  monitorName: string;
  triggerId: string;
  triggerName: string;
  // ...
}
```
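As an illustration of how a Prometheus alert might be mapped into the common representation, here is a minimal sketch. `RawPrometheusAlert`, `UnifiedAlertCore`, `SEVERITY_BY_LABEL`, and `fromPrometheus` are hypothetical names. The sketch assumes the alert carries the conventional `alertname` and `severity` labels and a `summary` or `description` annotation; Prometheus does not enforce any of these, so the label-to-severity table is a guess that a real adapter would need to make configurable.

```typescript
type Severity = 'critical' | 'high' | 'medium' | 'low' | 'info';

// Subset of the Prometheus fields listed in PrometheusMetadata.
interface RawPrometheusAlert {
  labels: Record<string, string>;
  annotations: Record<string, string>;
  generatorURL: string;
  fingerprint: string;
  startsAt: string;
}

// Reduced form of UnifiedAlert, kept small for the example.
interface UnifiedAlertCore {
  id: string;
  name: string;
  severity: Severity;
  source: 'prometheus';
  timestamp: string;
  message: string;
  sourceMetadata: RawPrometheusAlert;
}

// Assumed mapping from common severity label values to unified severities.
const SEVERITY_BY_LABEL: Record<string, Severity> = {
  critical: 'critical',
  page: 'critical',
  error: 'high',
  warning: 'medium',
  info: 'info',
};

function fromPrometheus(raw: RawPrometheusAlert): UnifiedAlertCore {
  const label = (raw.labels['severity'] ?? '').toLowerCase();
  return {
    // Prefix the fingerprint so ids cannot collide with OpenSearch alert ids.
    id: `prometheus:${raw.fingerprint}`,
    name: raw.labels['alertname'] ?? 'unknown',
    // Unknown or missing severity labels fall back to the lowest level.
    severity: SEVERITY_BY_LABEL[label] ?? 'info',
    source: 'prometheus',
    timestamp: raw.startsAt,
    message: raw.annotations['summary'] ?? raw.annotations['description'] ?? '',
    // The original payload is kept untouched, so no information is lost.
    sourceMetadata: raw,
  };
}
```

Keeping the raw payload in `sourceMetadata` is what lets the detail panel show every label and annotation even though the common fields are a lossy summary.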
4. Alert Management Data Models
Define structures for creating and managing alerts:
```typescript
interface PrometheusAlertRule {
  alert: string; // Alert name
  expr: string; // PromQL expression
  for?: string; // Duration (e.g., "5m")
  labels?: Record<string, string>;
  annotations?: Record<string, string>;
}

interface PrometheusRuleGroup {
  name: string;
  interval?: string;
  rules: PrometheusAlertRule[];
}

interface PrometheusSilence {
  matchers: Array<{
    name: string;
    value: string;
    isRegex: boolean;
  }>;
  startsAt: string;
  endsAt: string;
  createdBy: string;
  comment: string;
}

interface OpenSearchMonitor {
  name: string;
  type: 'query_level_monitor' | 'bucket_level_monitor' | 'doc_level_monitor';
  enabled: boolean;
  schedule: {
    period: {
      interval: number;
      unit: 'MINUTES' | 'HOURS' | 'DAYS';
    };
  };
  inputs: Array<{
    search: {
      indices: string[];
      query: object;
    };
  }>;
  triggers: Array<{
    name: string;
    severity: string;
    condition: {
      script: {
        source: string;
        lang: 'painless';
      };
    };
    // ...
  }>;
}
```
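To show how the `PrometheusSilence` structure might be populated, the sketch below builds a time-boxed silence from an alert's labels. `buildAcknowledgeSilence` and its parameters are hypothetical. It assumes exact-match matchers on every label are an acceptable way to target a single alert; a real implementation may prefer a narrower label set.

```typescript
interface SilenceMatcher {
  name: string;
  value: string;
  isRegex: boolean;
}

interface PrometheusSilence {
  matchers: SilenceMatcher[];
  startsAt: string;
  endsAt: string;
  createdBy: string;
  comment: string;
}

// Build a silence that matches one alert's labels for a fixed duration.
function buildAcknowledgeSilence(
  labels: Record<string, string>,
  durationMinutes: number,
  createdBy: string,
  now: Date = new Date()
): PrometheusSilence {
  if (!(durationMinutes > 0)) {
    throw new RangeError('durationMinutes must be positive');
  }
  // Sort by label name so the same alert always yields the same matcher order.
  const matchers = Object.entries(labels)
    .sort(([a], [b]) => a.localeCompare(b))
    .map(([name, value]) => ({ name, value, isRegex: false }));
  if (matchers.length === 0) {
    // An empty matcher list would target nothing useful, so refuse to build it.
    throw new Error('refusing to create a silence without matchers');
  }
  const endsAt = new Date(now.getTime() + durationMinutes * 60 * 1000);
  return {
    matchers,
    startsAt: now.toISOString(),
    endsAt: endsAt.toISOString(),
    createdBy,
    comment: `Acknowledged from the unified alerts view by ${createdBy}`,
  };
}
```

Passing `now` as a parameter keeps the function deterministic in tests. The resulting object would then be submitted through whichever Silences API the Prometheus adapter targets.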
6. Unified Alerts View UI (longer term view)
Core Features:
- Unified alert list: Tabular view showing alerts from both sources
- Source indicator: Visual badge indicating Prometheus vs OpenSearch origin
- Filtering: Filter by source, severity, state, time range
- Sorting: Sort by timestamp, severity, name
- Search: Full-text search across alert names and descriptions
- Detail panel: Side panel showing full alert details with source-specific metadata
- Drill-down navigation: One-click navigation to related metrics, traces, logs
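The filtering, sorting, and search features above could be prototyped on the client with something like the following sketch. `AlertRow`, `AlertFilter`, and `filterAndSortAlerts` are hypothetical names. Search here is a simple case-insensitive substring match rather than true full-text search, and the default ordering (most severe first, then newest first) is an assumption.

```typescript
type Severity = 'critical' | 'high' | 'medium' | 'low' | 'info';
type AlertSource = 'prometheus' | 'opensearch';

interface AlertRow {
  name: string;
  description?: string;
  severity: Severity;
  source: AlertSource;
  timestamp: string; // ISO8601
}

interface AlertFilter {
  source?: AlertSource;
  severities?: Severity[];
  text?: string;
}

// Lower rank sorts first.
const SEVERITY_RANK: Record<Severity, number> = {
  critical: 0,
  high: 1,
  medium: 2,
  low: 3,
  info: 4,
};

function filterAndSortAlerts(rows: AlertRow[], filter: AlertFilter = {}): AlertRow[] {
  const needle = (filter.text ?? '').trim().toLowerCase();
  const severities = filter.severities ?? [];
  return rows
    .filter((row) => filter.source === undefined || row.source === filter.source)
    // An empty severity list means "no severity filter".
    .filter((row) => severities.length === 0 || severities.includes(row.severity))
    .filter(
      (row) =>
        needle === '' ||
        `${row.name} ${row.description ?? ''}`.toLowerCase().includes(needle)
    )
    // filter() returned a copy, so sorting does not mutate the caller's array.
    .sort(
      (a, b) =>
        SEVERITY_RANK[a.severity] - SEVERITY_RANK[b.severity] ||
        Date.parse(b.timestamp) - Date.parse(a.timestamp)
    );
}
```

For large alert volumes the same filter object could instead be translated into server-side query parameters for each backend.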
Management Features:
- Create alert button: Launch unified alert creation wizard
- Inline editing: Edit alert rules/monitors directly from list view
- Bulk operations: Select multiple alerts for acknowledge/silence operations
- Quick actions menu: Context menu for individual alert management
- Silence manager: Dedicated view for managing active silences
- Alert lifecycle controls: Enable/disable, delete, duplicate alerts
State Mapping
| Prometheus State | OpenSearch State | Unified State |
|---|---|---|
| firing | ACTIVE | Firing |
| pending | - | Pending |
| resolved | COMPLETED | Resolved |
| - | ACKNOWLEDGED | Acknowledged |
| - | ERROR | Error |
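The table above translates fairly directly into a lookup. In this sketch, `STATE_MAP` and `toUnifiedState` are hypothetical names, and returning `undefined` for any state the table does not list is an assumption about how unmapped states should be handled.

```typescript
type UnifiedState = 'firing' | 'pending' | 'resolved' | 'acknowledged' | 'error';
type AlertSource = 'prometheus' | 'opensearch';

// One lookup table per source, mirroring the rows of the state mapping table.
const STATE_MAP: Record<AlertSource, Record<string, UnifiedState>> = {
  prometheus: {
    firing: 'firing',
    pending: 'pending',
    resolved: 'resolved',
  },
  opensearch: {
    ACTIVE: 'firing',
    COMPLETED: 'resolved',
    ACKNOWLEDGED: 'acknowledged',
    ERROR: 'error',
  },
};

// Returns undefined for states the table does not cover.
function toUnifiedState(source: AlertSource, rawState: string): UnifiedState | undefined {
  // Normalize case: Prometheus states are lower case, OpenSearch states upper case.
  const key = source === 'prometheus' ? rawState.toLowerCase() : rawState.toUpperCase();
  const table = STATE_MAP[source];
  // Own-property check avoids matching inherited keys such as "constructor".
  return Object.prototype.hasOwnProperty.call(table, key) ? table[key] : undefined;
}
```

Returning `undefined` rather than guessing lets the UI decide whether to hide or flag alerts in a state the mapping does not know.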
Alert Management Operations Mapping
| Operation | Prometheus Implementation | OpenSearch Implementation |
|---|---|---|
| Create | POST rule to Rules API | POST monitor to Alerting API |
| Edit | PUT rule to Rules API | PUT monitor to Alerting API |
| Delete | DELETE rule from Rules API | DELETE monitor from Alerting API |
| Acknowledge | Create silence with duration | POST to acknowledge endpoint |
| Silence | Create silence via Silences API | Create silence (future enhancement) |
| Enable/Disable | Not applicable (delete the rule instead) | Update monitor enabled field |
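One way the operations table might drive bulk actions is a capability matrix that splits a mixed selection into alerts an operation applies to and alerts it must skip. `OPERATION_SUPPORT`, `Support`, and `partitionBulkSelection` are hypothetical names. The classification of Prometheus acknowledge as "emulated" (via a silence) and OpenSearch silence as "unsupported" (listed as a future enhancement) is an interpretation of the table, not something the RFC states in these terms.

```typescript
type AlertSource = 'prometheus' | 'opensearch';
type Operation = 'create' | 'edit' | 'delete' | 'acknowledge' | 'silence' | 'enableDisable';
type Support = 'native' | 'emulated' | 'unsupported';

// Interpretation of the operations mapping table above.
const OPERATION_SUPPORT: Record<AlertSource, Record<Operation, Support>> = {
  prometheus: {
    create: 'native',
    edit: 'native',
    delete: 'native',
    acknowledge: 'emulated', // implemented as a silence with a duration
    silence: 'native',
    enableDisable: 'unsupported',
  },
  opensearch: {
    create: 'native',
    edit: 'native',
    delete: 'native',
    acknowledge: 'native',
    silence: 'unsupported', // listed as a future enhancement
    enableDisable: 'native',
  },
};

interface SelectedAlert {
  id: string;
  source: AlertSource;
}

// Split a mixed selection into alerts the operation can run on and alerts to skip.
function partitionBulkSelection(selection: SelectedAlert[], operation: Operation) {
  const applicable: SelectedAlert[] = [];
  const skipped: SelectedAlert[] = [];
  for (const alert of selection) {
    const support = OPERATION_SUPPORT[alert.source][operation];
    (support === 'unsupported' ? skipped : applicable).push(alert);
  }
  return { applicable, skipped };
}
```

The same matrix could also populate the `managementActions` flags on each unified alert, so buttons are disabled up front instead of failing after a click.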
Open Questions
- Milestone validation: Would a read-only view of Prometheus alerts provide value to the community, or are full CRUD and a unified view required before the feature is useful?
- Unified view: What options (e.g. caching, lazy loading) provide the right level of detail, given the additional network calls needed to blend alerts across Prometheus workspaces and OpenSearch domains?
- Unified alert manager: Does it make sense to have a centralized alert manager experience where OpenSearch manages all alerts from different data sources? There are concerns about OpenSearch scaling to handle Prometheus-level volumes of 100k+ alerting rules in a workspace.
- OpenSearch native metrics: Uber is working on native metrics support in OpenSearch. What does alerting support look like once that lands?