
Sentry Alert Rules & Configuration

Overview

This document describes the Sentry alert configuration for error tracking and monitoring on JudgeFinder.io.

Sentry Setup

Project Configuration

DSN (Data Source Name):

https://[key]@sentry.io/[project-id]

Environment Variables:

SENTRY_DSN=<your-dsn>
NEXT_PUBLIC_SENTRY_DSN=<your-dsn>  # For client-side errors
SENTRY_TRACES_SAMPLE_RATE=0.1      # 10% of transactions
SENTRY_REPLAYS_SESSION_SAMPLE_RATE=0   # Disabled by default
SENTRY_REPLAYS_ON_ERROR_SAMPLE_RATE=1  # 100% on errors
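
The SDK does not read the sample-rate variables automatically in every setup; the sketch below shows one way to resolve them with the defaults documented above (the `resolveSampleRates` helper is hypothetical, not part of the Sentry SDK):

```typescript
interface SampleRates {
  tracesSampleRate: number
  replaysSessionSampleRate: number
  replaysOnErrorSampleRate: number
}

// Resolves sample rates from env vars, falling back to the documented defaults.
function resolveSampleRates(env: Record<string, string | undefined>): SampleRates {
  const num = (v: string | undefined, fallback: number) => {
    const n = Number(v)
    return v !== undefined && v !== '' && Number.isFinite(n) ? n : fallback
  }
  return {
    tracesSampleRate: num(env.SENTRY_TRACES_SAMPLE_RATE, 0.1),      // 10% of transactions
    replaysSessionSampleRate: num(env.SENTRY_REPLAYS_SESSION_SAMPLE_RATE, 0), // disabled
    replaysOnErrorSampleRate: num(env.SENTRY_REPLAYS_ON_ERROR_SAMPLE_RATE, 1), // 100% on errors
  }
}
```

The resolved values can then be passed to `Sentry.init` in the relevant config file.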

Alert Destinations

  1. Email: notifications@judgefinder.io
  2. Slack: #monitoring channel
  3. PagerDuty: Critical incidents (optional)

Alert Rules

Rule 1: Error Rate > 1% (CRITICAL)

Trigger Conditions:

  • When event.type == "error"
  • Frequency: > 1% of all events in last 5 minutes
  • Affected environments: production, staging

Notification Actions:

  • Send to: Slack #monitoring + Email
  • Mention: @channel in Slack
  • Create incident: Yes
  • Page on-call: Yes

Configuration:

name: Error Rate > 1%
filter:
  environment: [production, staging]
  event.type: error
  percentage: 1.0
  timeframe: 5m
actions:
  - slack:
      channel: '#monitoring'
      mention: '@channel'
  - email:
      recipient: 'notifications@judgefinder.io'
  - create_incident: true
  - pagerduty:
      severity: critical

Response SLA: 15 minutes


Rule 2: Unhandled Exceptions (CRITICAL)

Trigger Conditions:

  • When an exception is captured
  • Handled: false
  • First occurrence: Yes
  • Environment: production

Notification Actions:

  • Send to: Email + Slack (immediate)
  • Alert frequency: Once per exception type per hour
  • Create Sentry issue: Yes

Configuration:

name: Unhandled Exceptions
filter:
  environment: production
  exception.handled: false
  first_occurrence: true
actions:
  - slack:
      channel: '#alerts'
      mention: '@developers'
  - email:
      recipient: ['admin@judgefinder.io', 'ops@judgefinder.io']
  - create_issue: true
  - escalate_after: 10m
    to: pagerduty

Response SLA: 5 minutes


Rule 3: API Response Time > 2 Seconds (WARNING)

Trigger Conditions:

  • When measurement.duration > 2000 (milliseconds)
  • Endpoint: /api/*
  • Frequency: > 10 occurrences in 10 minutes
  • Environment: production

Notification Actions:

  • Send to: Slack #performance
  • Trigger performance review
  • Do not alert for first occurrence

Configuration:

name: API Response Time > 2s
filter:
  environment: production
  transaction: /api/*
  measurement.duration: '>2000'
  frequency: '>10:10m'
actions:
  - slack:
      channel: '#performance'
  - metric_alert: response_time_warning
  - log_to_cloudwatch: true

Response SLA: 30 minutes


Rule 4: Database Query > 5 Seconds (WARNING)

Trigger Conditions:

  • When database operation duration > 5000ms
  • Database: Supabase/PostgreSQL
  • Frequency: > 5 occurrences in 15 minutes

Notification Actions:

  • Log to monitoring dashboard
  • Alert engineering team
  • No paging (not critical)

Configuration:

name: Slow Database Query
filter:
  span.op: db.query
  measurement.duration: '>5000'
  frequency: '>5:15m'
actions:
  - slack:
      channel: '#performance'
      message: 'Slow database query detected'
  - log_to_cloudwatch: true
  - create_monitoring_ticket: true

Response SLA: 1 hour


Rule 5: Memory Usage > 80% (WARNING)

Trigger Conditions:

  • When heap memory usage > 80%
  • Frequency: Sustained for 5 minutes
  • Environment: production

Notification Actions:

  • Alert ops team
  • Log memory profile
  • Trigger performance review

Configuration:

name: Memory Usage High
filter:
  environment: production
  memory.heap_usage_percentage: '>80'
  duration: 5m
actions:
  - slack:
      channel: '#ops'
      message: 'High memory usage on production'
  - email:
      recipient: 'ops@judgefinder.io'
  - capture_heap_dump: true

Response SLA: 30 minutes


Rule 6: Authentication Failures (MEDIUM)

Trigger Conditions:

  • When auth-related exceptions occur
  • Exception contains: "auth", "token", "permission"
  • Frequency: > 20 occurrences in 10 minutes

Notification Actions:

  • Alert security team
  • Log all details
  • Review for potential breach

Configuration:

name: Authentication Failures
filter:
  event.type: error
  tags.error_type: [auth, authentication, token]
  frequency: '>20:10m'
actions:
  - slack:
      channel: '#security'
      mention: '@security'
  - email:
      recipient: 'security@judgefinder.io'
  - log_to_cloudwatch:
      log_group: security_events
  - escalate_to: incident_commander

Response SLA: 10 minutes


Rule 7: Third-Party API Failures (MEDIUM)

Trigger Conditions:

  • CourtListener API failures
  • Stripe API failures
  • External service timeouts
  • Frequency: > 5 in 5 minutes

Notification Actions:

  • Alert integration team
  • Log API response
  • Monitor for degradation

Configuration:

name: External API Failures
filter:
  tags.service: [courtlistener, stripe, external]
  event.type: error
  frequency: '>5:5m'
actions:
  - slack:
      channel: '#integrations'
  - email:
      recipient: ['integrations@judgefinder.io']
  - create_monitoring_alert: true
  - auto_disable_feature: false

Response SLA: 30 minutes


Rule 8: Payment Processing Errors (CRITICAL)

Trigger Conditions:

  • Stripe payment failures
  • Transaction declined
  • Webhook processing fails
  • Frequency: > 1 in 5 minutes

Notification Actions:

  • Immediate paging
  • Alert payment ops
  • Create incident
  • Capture transaction details

Configuration:

name: Payment Processing Errors
filter:
  tags.feature: payments
  tags.service: stripe
  event.type: error
  frequency: '>1:5m'
actions:
  - pagerduty:
      severity: critical
      title: 'Payment processing failure'
  - slack:
      channel: '#payments'
      mention: '@payments-team'
  - email:
      recipient: ['payments@judgefinder.io', 'finance@judgefinder.io']
  - create_incident: true
  - capture_transaction_details: true

Response SLA: 5 minutes


Rule 9: Deployment Errors (HIGH)

Trigger Conditions:

  • New release deployment
  • Error rate spike > 5x baseline
  • First hour after deployment
  • Environment: production

Notification Actions:

  • Alert deployment team
  • Suggest rollback
  • Create deployment incident

Configuration:

name: Post-Deployment Error Spike
filter:
  environment: production
  release: release.*
  time_since_release: '0-60m'
  error_rate_increase: '>5x'
actions:
  - slack:
      channel: '#deployments'
      message: 'Error spike detected post-deployment'
      suggest_rollback: true
  - email:
      recipient: 'devops@judgefinder.io'
  - create_deployment_incident: true

Response SLA: 15 minutes


Rule 10: JavaScript Console Errors (LOW)

Trigger Conditions:

  • Client-side JavaScript errors
  • Frequency: > 100 in 1 hour
  • Environment: production
  • Excludes errors on the ignore list

Notification Actions:

  • Log to performance dashboard
  • Weekly summary email
  • No real-time alert

Configuration:

name: JavaScript Console Errors
filter:
  environment: production
  platform: javascript
  event.type: error
  frequency: '>100:1h'
  ignore_tags: [expected_error]
actions:
  - slack:
      channel: '#frontend'
      frequency: weekly
  - email:
      recipient: 'frontend-team@judgefinder.io'
      frequency: daily_digest
  - log_to_cloudwatch: true

Response SLA: No SLA (informational)


Alert Management

Enable/Disable Rules

Via Web Console:

  1. Go to: Sentry Project > Alerts
  2. Toggle rules on/off
  3. Click Save

Via API:

# Disable rule
curl -X PUT https://sentry.io/api/0/projects/{org}/{project}/rules/{rule_id}/ \
  -H "Authorization: Bearer {auth_token}" \
  -H "Content-Type: application/json" \
  -d '{"enabled": false}'

# Enable rule
curl -X PUT https://sentry.io/api/0/projects/{org}/{project}/rules/{rule_id}/ \
  -H "Authorization: Bearer {auth_token}" \
  -H "Content-Type: application/json" \
  -d '{"enabled": true}'

Suppress/Ignore Errors

Ignore specific exception:

filter:
  error.message: "Network error"
  action: ignore

Ignore error by pattern:

filter:
  error.message: "*network*"
  environment: staging
  action: ignore

Resurrecting ignored errors:

  1. Go to: Sentry Project > Settings > Ignore
  2. Click "Restore" for the error
  3. Confirm

Alert Grouping

Smart grouping enabled:

  • Groups similar errors by:
    • Exception type
    • Stack trace
    • Error message fingerprint

Custom grouping:

group_by:
  - exception.type
  - tags.service
  - tags.endpoint
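
In SDK code, the `group_by` keys above correspond to setting a custom fingerprint on the event, typically inside `beforeSend`. A minimal sketch (the event shape here is simplified for illustration, not the SDK's full `Event` type):

```typescript
// Simplified event shape for illustration only.
interface EventLike {
  exceptionType?: string
  tags?: Record<string, string>
}

// Builds a fingerprint matching the group_by keys above:
// exception type, service tag, endpoint tag.
function customFingerprint(event: EventLike): string[] {
  return [
    event.exceptionType ?? '<no-exception>',
    event.tags?.service ?? '<no-service>',
    event.tags?.endpoint ?? '<no-endpoint>',
  ]
}
```

In a real config the returned array would be assigned to `event.fingerprint` before the event is sent.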

Sentry Workflow Integration

Slack Integration

Setup:

  1. Go to: Sentry Project > Integrations > Slack
  2. Authorize Sentry app to Slack workspace
  3. Select channels for notifications
  4. Configure notification frequency

Slack Commands:

@sentry-app ignore
@sentry-app resolve
@sentry-app assign @username
@sentry-app status

Email Notifications

Configure:

  1. Go to: Settings > Notifications > Project Alerts
  2. Add recipients

Digest Email:

  • Frequency: Daily
  • Time: 9:00 AM UTC
  • Include: All alerts from past 24h

GitHub Integration

Auto-create issues:

  1. Go to: Integrations > GitHub
  2. Authorize repository
  3. Configure:
    • Create issue on: Error > threshold
    • Issue template: Use standard template
    • Auto-assign: @dev-team

Issue Template:

## Error: {error.title}

**Severity:** {error.level}
**First seen:** {error.first_seen}
**Last seen:** {error.last_seen}
**Occurrences:** {error.count}

### Stack Trace

\`\`\`
{error.stack_trace}
\`\`\`

### Context

- User: {user.username}
- URL: {request.url}
- Environment: {environment}

### Action Items

- [ ] Investigate root cause
- [ ] Create fix
- [ ] Test fix
- [ ] Deploy to production

Monitoring Dashboard

Sentry Dashboard Setup

Custom dashboard:

// Metrics to display
{
  "widgets": [
    {
      "title": "Error Rate (Last 24h)",
      "type": "stat",
      "query": "event.type:error"
    },
    {
      "title": "Top 10 Errors",
      "type": "table",
      "query": "event.type:error",
      "sort": "-frequency"
    },
    {
      "title": "Error Trend",
      "type": "line_chart",
      "query": "event.type:error",
      "period": "1h"
    },
    {
      "title": "Slowest Transactions",
      "type": "table",
      "query": "measurement.duration:>1000"
    },
    {
      "title": "User Impact",
      "type": "stat",
      "query": "affected_users"
    }
  ]
}

Key Metrics to Track

  1. Error Volume

    • Total errors per day
    • Error growth rate
    • Critical vs. warning errors
  2. Response Time

    • p50, p95, p99 latencies
    • Slow transaction endpoints
    • Database query performance
  3. User Impact

    • Unique users affected
    • Error blast radius
    • Session impact
  4. Error Sources

    • Frontend vs. backend
    • Top error types
    • New errors introduced
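
For reference, the latency figures above are nearest-rank percentile statistics over a window of samples; a minimal sketch of the computation:

```typescript
// Nearest-rank percentile over latency samples in milliseconds.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error('no samples')
  const sorted = [...samples].sort((a, b) => a - b)
  // Rank of the p-th percentile, 1-based, converted to a 0-based index.
  const rank = Math.ceil((p / 100) * sorted.length) - 1
  return sorted[Math.max(0, rank)]
}
```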

Alert Testing

Test an Alert Rule

# Create a test event (authenticate with the DSN key via X-Sentry-Auth)
curl -X POST "https://sentry.io/api/[project_id]/store/" \
  -H "Content-Type: application/json" \
  -H "X-Sentry-Auth: Sentry sentry_version=7, sentry_key=[key]" \
  -d '{
    "message": "Test alert",
    "level": "error",
    "exception": {
      "values": [
        {
          "type": "TestException",
          "value": "This is a test error"
        }
      ]
    }
  }'
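
The same payload can also be built in application code and sent with any HTTP client; a small helper (hypothetical, mirroring the curl body above):

```typescript
// Builds the test-event payload sent by the curl command above.
function buildTestEvent(
  message: string,
  errType = 'TestException',
  errValue = 'This is a test error',
) {
  return {
    message,
    level: 'error',
    exception: { values: [{ type: errType, value: errValue }] },
  }
}
```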

Verify Alert Delivery

  1. Create test error in staging
  2. Verify Slack message sent
  3. Verify email received
  4. Confirm PagerDuty incident (if configured)
  5. Review in Sentry dashboard

Best Practices

Alert Fatigue Prevention

  1. Set appropriate thresholds:

    • Avoid alerting on every error
    • Use frequency-based rules
    • Set time windows for context
  2. Use alert grouping:

    • Group similar errors
    • Reduce noise
    • Focus on unique issues
  3. Regular rule reviews:

    • Monthly: Check rule effectiveness
    • Disable unused rules
    • Adjust thresholds based on data
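
A frequency-based rule like Rule 1 reduces to a percentage check over a time window; a sketch of the decision (illustrative only, since Sentry evaluates this server-side):

```typescript
// Returns true when errors exceed thresholdPct percent of all events
// in the window, mirroring Rule 1 (> 1% in 5 minutes).
function shouldAlert(errorCount: number, totalCount: number, thresholdPct: number): boolean {
  if (totalCount === 0) return false
  return (errorCount / totalCount) * 100 > thresholdPct
}
```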

Incident Response

Upon alert:

  1. Acknowledge alert in Slack
  2. Create incident ticket
  3. Assign to on-call engineer
  4. Begin investigation
  5. Implement fix
  6. Verify resolution
  7. Document post-mortem

Performance Monitoring

Track these metrics:

  • Frontend performance (LCP, FID, CLS)
  • API endpoint latency
  • Database query duration
  • Third-party API response times
  • Memory and CPU usage

Integration with Monitoring Stack

Connect to UptimeRobot

When UptimeRobot detects downtime:

  1. Automatically create Sentry alert
  2. Tag with service and endpoint
  3. Correlate with error spike
  4. Provide context for response

Connect to CloudWatch

Stream Sentry errors to CloudWatch:

Sentry -> Webhook -> Lambda -> CloudWatch

Configuration:

webhook:
  url: https://lambda.amazonaws.com/webhook
  events: [error, transaction]
  include_details: true
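
The Lambda step in the pipeline above mostly reshapes the webhook payload into a CloudWatch Logs event; a sketch of that transform (the webhook field names are assumptions and should be checked against Sentry's webhook documentation):

```typescript
// Assumed (simplified) shape of a Sentry webhook payload.
interface SentryWebhookPayload {
  event: { event_id: string; level: string; title: string; timestamp: number }
}

// Maps a Sentry webhook payload to a CloudWatch Logs input event.
function toLogEvent(payload: SentryWebhookPayload) {
  return {
    // Sentry timestamps are epoch seconds; CloudWatch expects milliseconds.
    timestamp: Math.round(payload.event.timestamp * 1000),
    message: JSON.stringify({
      event_id: payload.event.event_id,
      level: payload.event.level,
      title: payload.event.title,
    }),
  }
}
```

The Lambda handler would pass the result to `PutLogEvents` on the `security_events` or application log group.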

Cost Optimization

Event Quota Management

Sentry Plan: Business tier - $9/month per project

Quota allocation:

  • Errors: 10M events/month
  • Transactions: 50M events/month
  • Replays: 1000 sessions/month

Cost optimization:

  1. Use beforeSend to filter events
  2. Reduce sample rates in non-critical envs
  3. Ignore known harmless errors
  4. Archive old issues

Sample Configuration

// sentry.client.config.ts (assumes the Next.js SDK)
import * as Sentry from '@sentry/nextjs'

Sentry.init({
  dsn: process.env.NEXT_PUBLIC_SENTRY_DSN,
  beforeSend(event, hint) {
    // Ignore known errors
    if (event.message?.includes('Network error')) {
      return null
    }

    // Ignore in non-production
    if (process.env.NODE_ENV !== 'production') {
      return null
    }

    return event
  },
  tracesSampleRate: process.env.NODE_ENV === 'production' ? 0.1 : 1.0,
})

Troubleshooting

Events not appearing in Sentry

  1. Verify DSN is correct
  2. Check browser console for errors
  3. Verify beforeSend isn't filtering
  4. Check network requests (DevTools)
  5. Review Sentry project settings

Duplicate alerts

  1. Check rule conditions
  2. Verify grouping is working
  3. Adjust frequency thresholds
  4. Review filter conditions

Missing context

  1. Enable breadcrumbs
  2. Attach user context
  3. Add custom tags
  4. Include HTTP request details

Next Steps

  1. Set up Slack integration
  2. Configure critical rules (Rules 1, 2, 8)
  3. Enable warning rules (Rules 3-7)
  4. Create dashboard
  5. Configure team notifications
  6. Document runbook for each rule
  7. Schedule monthly review

Resources