Description
What is the bug?
Bucket-level monitors incorrectly trigger alerts when the monitor's input query fails due to OpenSearchRejectedExecutionException. The exception causes ctx.results to be empty, and TriggerService.kt:192 then throws an IndexOutOfBoundsException when attempting to access ctx.results[0]. This exception is caught and misinterpreted as a trigger condition being met, resulting in false positive alerts.
How can one reproduce the bug?
Steps to reproduce the behavior:
- Configure a bucket-level monitor with a log aggregation query
- Create resource exhaustion conditions on the OpenSearch cluster:
  - Search thread pool queue saturation (e.g., 1033 tasks / 1000 capacity)
  - High heap usage
  - Long-running queries (3+ minute average)
- Let the monitor execute during resource exhaustion
- Observe false positive alert triggered even though no logs matched the condition
What is the expected behavior?
When the monitor's input query fails due to OpenSearchRejectedExecutionException, the monitor should:
- Log an error indicating input collection failed
- NOT trigger the alert action
- Optionally retry the query or mark the execution as failed
Actual Behavior:
The monitor incorrectly triggers the alert action because:
- Search queue rejects query → OpenSearchRejectedExecutionException
- Input collection fails → ctx.results is an empty list
- TriggerService.kt:192 attempts ctx.results[0] → IndexOutOfBoundsException
- Exception handling interprets this as trigger condition met → False alert sent
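The chain above can be illustrated with a minimal, self-contained sketch. This is not the plugin's actual code: `Ctx`, `naiveIsTriggered`, and the `doc_count` key are hypothetical stand-ins, and the catch-all that reports "triggered" is an assumption based on the behavior described in this issue.

```kotlin
// Hypothetical stand-in for the monitor execution context; not the plugin's real type.
data class Ctx(val results: List<Map<String, Any>>)

// Hypothetical evaluator that mirrors the failure chain described above.
fun naiveIsTriggered(ctx: Ctx): Boolean =
    try {
        // Throws IndexOutOfBoundsException when input collection produced no results.
        val first = ctx.results[0]
        ((first["doc_count"] as? Int) ?: 0) > 0
    } catch (e: Exception) {
        // Assumed behavior: any evaluation error is reported as a met condition.
        true
    }
```

With this sketch, `naiveIsTriggered(Ctx(emptyList()))` returns `true` even though no data was ever evaluated, which is the false positive reported here.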
What is your host/environment?
- OS: Linux
Do you have any screenshots?
N/A
Do you have any additional context?
Identified root cause
No validation for ctx.results being non-empty before accessing index 0 at TriggerService.kt#L192
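One possible shape for the missing validation is sketched below. It is a suggestion under stated assumptions, not the plugin's API: `MonitorCtx`, its `error` field, `TriggerOutcome`, and `evaluateTrigger` are all hypothetical names introduced for illustration. The idea is to check for a failed or empty input before reading index 0, and to report that case as its own outcome rather than as a met condition.

```kotlin
// Hypothetical stand-in for the monitor execution context; not the plugin's real type.
// The `error` field is an assumption about how an input failure might be carried.
data class MonitorCtx(
    val results: List<Map<String, Any>>,
    val error: Exception? = null
)

// Three-way outcome so a failed input is never conflated with a met condition.
sealed class TriggerOutcome {
    object Triggered : TriggerOutcome()
    object NotTriggered : TriggerOutcome()
    data class InputFailed(val reason: String) : TriggerOutcome()
}

fun evaluateTrigger(
    ctx: MonitorCtx,
    condition: (Map<String, Any>) -> Boolean
): TriggerOutcome {
    // Guard: look for an input error or empty results before touching index 0.
    val first = ctx.results.firstOrNull()
    if (ctx.error != null || first == null) {
        val reason = ctx.error?.message ?: "input query returned no results"
        return TriggerOutcome.InputFailed(reason)
    }
    return if (condition(first)) TriggerOutcome.Triggered else TriggerOutcome.NotTriggered
}
```

A caller could then log `InputFailed` as an error and skip the alert action, and run actions only for `Triggered`, which matches the expected behavior listed above.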