Skip to content

Runbook: IndexOutOfRangeException in User-Agent parsing (DashboardController.ExtractCriticalSegment) – from PD incident Q27IUX3O3X3EKP #129

@mrsharm

Description

@mrsharm

Context

  • Incident: Q27IUX3O3X3EKP (P1, high) — https://microsoft-sre-agent-test.pagerduty.com/incidents/Q27IUX3O3X3EKP
  • Affected pattern: Admin page failure due to IndexOutOfRangeException during User-Agent parsing in DashboardController.ExtractCriticalSegment()
  • Azure resource for tracking: /subscriptions/be8d491e-109c-4ee1-aaee-dc7615af0a42/resourceGroups/mrsharm-operations-agent-3p-rg/providers/Microsoft.Web/sites/cpu-app

Summary

  • Symptom: Critical exception IndexOutOfRangeException breaks admin page.
  • Trigger: Security audit system User-Agent parsing assumed fixed structure; out-of-bounds indexing on malformed input.
  • Interim mitigation (from incident): Temporarily bypass User-Agent parsing for affected requests.
  • Root cause (from incident): Unvalidated input parsing; missing bounds checks/defensive parsing.

Runbook

  1. Symptoms & Detection
  • Signals: sudden spike in 5xx/Exceptions; logs show IndexOutOfRangeException from DashboardController.ExtractCriticalSegment.
  • Check: App logs/exceptions, failed request count, key traces around Admin page requests.
  1. Immediate Triage
  • Confirm blast radius (admin endpoints only vs global).
  • Capture recent offending User-Agent strings and request paths.
  • Toggle/feature-flag: disable/bypass UA parsing for admin endpoints if available.
  1. Safe Mitigation
  • Hotfix: guard parsing behind a config flag; default to safe path on parse failure.
  • Input sanitation: if UA missing or malformed, skip extraction and continue request.
  1. Root-Cause Validation & Permanent Fix
  • Add bounds and null/length checks before indexing arrays/lists.
  • Replace manual indexing with safe parsing (TryParse pattern) and fallback behavior.
  • Add structured parsing with defensive defaults; unit tests for edge UA cases.
  1. Rollback/Toggle Guidance
  • Keep the bypass flag until the fix is validated in prod; rollback by re-enabling parsing once error rate stable for 30–60m.
  1. Post-Fix Verification
  • Metrics: exceptions/5xx back to baseline; no new IndexOutOfRangeException for 60m.
  • Synthetic test: admin page loads with a set of malformed/edge UA headers.
  • Logs: verify warning-level entries for skipped parsing without error spikes.
  1. Prevention & Follow-ups
  • Add input fuzz tests for UA parsing.
  • Add guardrails: circuit-break parsing after N failures per minute.
  • Observability: structured log for parse decisions and counts.

References


This issue was created by mrsharm-sri1111--3f136ed8
Tracked by the SRE agent here

Metadata

Metadata

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions