Open
Description
Context
- Incident: Q27IUX3O3X3EKP (P1, high) — https://microsoft-sre-agent-test.pagerduty.com/incidents/Q27IUX3O3X3EKP
- Affected pattern: Admin page failure due to IndexOutOfRangeException during User-Agent parsing in `DashboardController.ExtractCriticalSegment()`
- Azure resource for tracking: `/subscriptions/be8d491e-109c-4ee1-aaee-dc7615af0a42/resourceGroups/mrsharm-operations-agent-3p-rg/providers/Microsoft.Web/sites/cpu-app`
Summary
- Symptom: Critical exception `IndexOutOfRangeException` breaks admin page.
- Trigger: Security audit system User-Agent parsing assumed fixed structure; out-of-bounds indexing on malformed input.
- Interim mitigation (from incident): Temporarily bypass User-Agent parsing for affected requests.
- Root cause (from incident): Unvalidated input parsing; missing bounds checks/defensive parsing.
Runbook
- Symptoms & Detection
- Signals: sudden spike in 5xx/Exceptions; logs show `IndexOutOfRangeException` from `DashboardController.ExtractCriticalSegment`.
- Check: App logs/exceptions, failed request count, key traces around Admin page requests.
- Immediate Triage
- Confirm blast radius (admin endpoints only vs global).
- Capture recent offending User-Agent strings and request paths.
- Toggle/feature-flag: disable/bypass UA parsing for admin endpoints if available.
- Safe Mitigation
- Hotfix: guard parsing behind a config flag; default to safe path on parse failure.
- Input sanitation: if UA missing or malformed, skip extraction and continue request.
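The mitigation above (a config flag plus a safe path on parse failure) can be sketched as follows. This is a minimal illustration in Python, not the service's actual .NET code: the flag name `UA_PARSING_ENABLED`, the helper names, and the "take the third space-separated token" rule are all assumptions, since the incident does not show what `ExtractCriticalSegment` really parses.

```python
import logging
import os
from typing import Optional

log = logging.getLogger("ua_parsing")


def ua_parsing_enabled() -> bool:
    # Hypothetical flag name; any config source (app settings, feature-flag
    # service) could stand in for the environment variable.
    return os.environ.get("UA_PARSING_ENABLED", "true").lower() == "true"


def extract_critical_segment(user_agent: str) -> str:
    # Stand-in for the fragile parser: assumes at least three
    # space-separated tokens and raises IndexError otherwise.
    return user_agent.split(" ")[2]


def segment_for_request(user_agent: Optional[str]) -> Optional[str]:
    """Return the parsed segment, or None when parsing is bypassed or fails."""
    if not ua_parsing_enabled():
        return None  # bypass: the request continues without the segment
    if not user_agent or not user_agent.strip():
        log.warning("ua_parse_skipped reason=missing")
        return None
    try:
        return extract_critical_segment(user_agent)
    except (IndexError, ValueError) as exc:
        # Safe path: log at warning level and let the request proceed.
        log.warning("ua_parse_skipped reason=%s", type(exc).__name__)
        return None
```

With the flag set to anything other than `true`, every request takes the bypass path, which matches the interim mitigation described in the incident.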
- Root-Cause Validation & Permanent Fix
- Add bounds and null/length checks before indexing arrays/lists.
- Replace manual indexing with safe parsing (TryParse pattern) and fallback behavior.
- Add structured parsing with defensive defaults; unit tests for edge UA cases.
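One possible shape for the permanent fix is a TryParse-style helper that checks for null and length before indexing and never raises. The sketch below is in Python for brevity and would need translating to the service's .NET code; the function name, the whitespace tokenization, and the default index of 2 are illustrative assumptions, not the real parsing rule.

```python
from typing import Optional, Tuple


def try_extract_segment(user_agent: Optional[str], index: int = 2) -> Tuple[bool, str]:
    """TryParse-style helper: returns (ok, segment) and never raises."""
    if user_agent is None:
        return False, ""
    tokens = user_agent.split()  # splits on any run of whitespace
    # Bounds check before indexing; this is the guard the incident lacked.
    if index < 0 or index >= len(tokens):
        return False, ""
    return True, tokens[index]


# Edge cases worth covering in unit tests (hypothetical inputs).
EDGE_CASES = [None, "", "   ", "curl", "a b", "a b c", "x" * 10_000]

if __name__ == "__main__":
    for ua in EDGE_CASES:
        ok, segment = try_extract_segment(ua)
        shown = ua if ua is None or len(ua) < 20 else ua[:20] + "..."
        print(f"{shown!r}: ok={ok} segment={segment[:20]!r}")
```

Callers branch on the boolean and fall back to a default when it is false, so a malformed header degrades the feature rather than failing the request.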
- Rollback/Toggle Guidance
- Keep the bypass flag enabled until the fix is validated in prod; then roll back the bypass by re-enabling parsing once the error rate has been stable for 30–60 minutes.
- Post-Fix Verification
- Metrics: exceptions/5xx back to baseline; no new `IndexOutOfRangeException` for 60m.
- Synthetic test: admin page loads with a set of malformed/edge UA headers.
- Logs: verify warning-level entries for skipped parsing without error spikes.
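The synthetic test above could be scripted roughly like this. It is a sketch under assumptions: the list of malformed User-Agent values is invented for illustration, the admin URL must be supplied by the operator, and the HTTP call is injected so the check can be exercised without a network. The missing-header case is left out because `urllib` adds its own default User-Agent when none is given.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, List, Sequence

# Hypothetical malformed/edge User-Agent values; extend with strings
# captured during triage.
MALFORMED_USER_AGENTS: Sequence[str] = (
    "",
    " ",
    "curl",
    "Mozilla/5.0",
    "Mozilla/5.0 (",
    "/" * 512,
)

Fetch = Callable[[str, Dict[str, str]], int]


def http_fetch(url: str, headers: Dict[str, str]) -> int:
    """Return the HTTP status code for a GET request with the given headers."""
    request = urllib.request.Request(url, headers=headers)
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # 4xx/5xx responses arrive as exceptions


def failing_user_agents(
    fetch: Fetch, url: str, user_agents: Sequence[str] = MALFORMED_USER_AGENTS
) -> List[str]:
    """Return the User-Agent values for which the page answered with 5xx."""
    failures = []
    for ua in user_agents:
        if fetch(url, {"User-Agent": ua}) >= 500:
            failures.append(ua)
    return failures
```

Calling `failing_user_agents(http_fetch, admin_url)` against the deployed admin page should return an empty list once the fix is in place; any returned string is a reproduction case.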
- Prevention & Follow-ups
- Add input fuzz tests for UA parsing.
- Add guardrails: circuit-break parsing after N failures per minute.
- Observability: structured log for parse decisions and counts.
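The circuit-breaker and structured-logging follow-ups could look something like the sketch below. The class name, the default of 20 failures per 60 seconds, and the log field names are assumptions chosen for illustration; the issue only specifies "N failures per minute". The clock is injectable so the window can be tested without waiting.

```python
import json
import logging
import time
from collections import deque
from typing import Callable, Deque


class ParseCircuitBreaker:
    """Stops UA parsing once too many failures occur in a sliding window."""

    def __init__(
        self,
        max_failures: int = 20,
        window_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_failures = max_failures
        self.window_seconds = window_seconds
        self._clock = clock
        self._failures: Deque[float] = deque()  # timestamps, oldest first

    def _prune(self, now: float) -> None:
        # Drop failures that have aged out of the window.
        while self._failures and now - self._failures[0] >= self.window_seconds:
            self._failures.popleft()

    def record_failure(self) -> None:
        now = self._clock()
        self._prune(now)
        self._failures.append(now)

    def failures_in_window(self) -> int:
        self._prune(self._clock())
        return len(self._failures)

    def allow_parsing(self) -> bool:
        return self.failures_in_window() < self.max_failures


def log_parse_decision(
    logger: logging.Logger, decision: str, reason: str, failures: int
) -> str:
    """Emit one structured (JSON) log line per parse decision and return it."""
    line = json.dumps(
        {
            "event": "ua_parse_decision",
            "decision": decision,  # e.g. "parsed", "skipped", "circuit_open"
            "reason": reason,
            "failures_in_window": failures,
        }
    )
    logger.info(line)
    return line
```

A request handler would call `allow_parsing()` before parsing, `record_failure()` when a parse fails, and `log_parse_decision(...)` in either case so that skip counts can be charted.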
References
- PD incident summary/root cause: unvalidated input parsing leading to OOB access.
- Incident link: https://microsoft-sre-agent-test.pagerduty.com/incidents/Q27IUX3O3X3EKP
- Azure resource: `/subscriptions/be8d491e-109c-4ee1-aaee-dc7615af0a42/resourceGroups/mrsharm-operations-agent-3p-rg/providers/Microsoft.Web/sites/cpu-app`
This issue was created by mrsharm-sri1111--3f136ed8
Tracked by the SRE agent here