
Feature: Add free alerting integration (Discord webhook preferred) #1

@1kuna

Description

We need hands-off alerting for node/training failures.

Scope:

  • Add an alert sender with a pluggable backend. A Discord webhook is preferred because it is free; free alternatives are a Slack webhook or SMTP email.
  • Trigger alerts on critical self-check failures (S3 down, missing environment variables, vendor outage) and on repeated job failures.
  • Integrate into the node loop and the training loop; keep alerts rate-limited and deduplicated.
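A minimal sketch of the deduplication and rate-limiting piece, assuming each alert carries a stable key (for example `"s3-down"`) and that a per-key cooldown is sufficient. `AlertDeduper`, `should_send` and `resolve` are hypothetical names, not existing code; the clock is injectable so tests can use a fake time source.

```python
import time
from typing import Callable, Dict


class AlertDeduper:
    """Hypothetical helper: drop repeats of the same alert key within a cooldown."""

    def __init__(self, cooldown_s: float,
                 clock: Callable[[], float] = time.monotonic):
        self.cooldown_s = cooldown_s
        self.clock = clock  # injectable so tests can drive time manually
        self._last_sent: Dict[str, float] = {}

    def should_send(self, key: str) -> bool:
        """Return True (and record the send) if `key` is outside its cooldown."""
        now = self.clock()
        last = self._last_sent.get(key)
        if last is not None and now - last < self.cooldown_s:
            return False  # duplicate within the window: suppress it
        self._last_sent[key] = now
        return True

    def resolve(self, key: str) -> None:
        """Forget a key once the condition clears, so a recurrence alerts again."""
        self._last_sent.pop(key, None)
```

With `cooldown_s=86400` this gives the "once per day until resolved" behaviour; calling `resolve` when the condition clears means the next occurrence alerts immediately instead of waiting out the remaining cooldown.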

Proposed approach:

  • Implement ops/alerts.py exposing send_alert(title, body), with backends discord_webhook, slack_webhook and email (SMTP).
  • Configure via .env: ALERT_BACKEND, DISCORD_WEBHOOK_URL, SLACK_WEBHOOK_URL, SMTP_*.
  • Add a small circuit breaker to avoid alert storms.
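One way `ops/alerts.py` could look, sketched with the standard library only. Assumptions: the `.env` values are already loaded into the process environment (by the shell or a dotenv loader), Discord receives a `{"content": ...}` payload and Slack a `{"text": ...}` payload, and the "circuit breaker" is interpreted here as a sliding-window cap on outgoing alerts rather than a classic closed/open/half-open breaker. `CircuitBreaker`, `BACKENDS`, `_post_json` and the `User-Agent` value are hypothetical, and the email (SMTP) backend is left out for brevity.

```python
import json
import os
import time
import urllib.request
from collections import deque
from typing import Callable, Deque, Dict, Optional, Tuple


def _post_json(url: str, payload: dict) -> None:
    """POST a JSON body to a webhook URL using only the standard library."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        # Explicit User-Agent (hypothetical value): some endpoints reject urllib's default.
        headers={"Content-Type": "application/json", "User-Agent": "ops-alerts/0.1"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        resp.read()  # urlopen raises HTTPError on 4xx/5xx responses


# Backend name -> (env var holding the webhook URL, payload builder).
# The email (SMTP) backend from the issue is not sketched here.
BACKENDS: Dict[str, Tuple[str, Callable[[str, str], dict]]] = {
    # Discord webhooks take a "content" string (capped at 2000 characters).
    "discord_webhook": ("DISCORD_WEBHOOK_URL",
                        lambda t, b: {"content": f"**{t}**\n{b}"[:2000]}),
    # Slack incoming webhooks take a "text" string.
    "slack_webhook": ("SLACK_WEBHOOK_URL",
                      lambda t, b: {"text": f"*{t}*\n{b}"}),
}


class CircuitBreaker:
    """Sliding-window cap: refuse sends once `max_alerts` went out within `window_s`."""

    def __init__(self, max_alerts: int = 5, window_s: float = 300.0,
                 clock: Callable[[], float] = time.monotonic):
        self.max_alerts = max_alerts
        self.window_s = window_s
        self.clock = clock
        self._sent: Deque[float] = deque()

    def allow(self) -> bool:
        now = self.clock()
        # Forget sends that have aged out of the window.
        while self._sent and now - self._sent[0] >= self.window_s:
            self._sent.popleft()
        if len(self._sent) >= self.max_alerts:
            return False  # alert storm in progress: suppress this one
        self._sent.append(now)
        return True


_default_breaker = CircuitBreaker()


def send_alert(title: str, body: str,
               post: Callable[[str, dict], None] = _post_json,
               breaker: Optional[CircuitBreaker] = None) -> bool:
    """Send one alert through the backend named by ALERT_BACKEND; True if sent."""
    if breaker is None:
        breaker = _default_breaker
    name = os.environ.get("ALERT_BACKEND", "discord_webhook")  # Discord as default
    if name not in BACKENDS:
        raise ValueError(f"unsupported ALERT_BACKEND: {name!r}")
    url_var, build_payload = BACKENDS[name]
    url = os.environ.get(url_var)
    if not url:
        return False  # backend not configured; the caller should still log locally
    if not breaker.allow():
        return False
    post(url, build_payload(title, body))
    return True
```

The `post` and `breaker` parameters exist only so tests can swap in a fake transport and clock; callers would normally just use `send_alert(title, body)`.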

Acceptance:

  • Simulated S3 outage -> exactly one alert is sent and logged; the check keeps retrying with backoff without sending further alerts.
  • Missing vendor keys -> a warning alert is sent at most once per day until resolved.
  • Documentation in README with setup instructions.
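The S3-outage acceptance case could be exercised with a loop like the one below. This is a sketch, assuming the health check is a zero-argument callable and that "without spamming" means alerting only on the first failure of an outage; `watch_dependency`, its parameters and the logger name are hypothetical. `sleep` is injectable so a test can record delays instead of waiting.

```python
import logging
import time
from typing import Callable

log = logging.getLogger("ops.alerts")  # hypothetical logger name


def watch_dependency(name: str,
                     check: Callable[[], bool],
                     alert: Callable[[str], None],
                     max_attempts: int,
                     base_delay_s: float = 1.0,
                     max_delay_s: float = 300.0,
                     sleep: Callable[[float], None] = time.sleep) -> bool:
    """Retry `check` with exponential backoff, alerting only on the first failure.

    Returns True if the dependency became healthy within `max_attempts` tries.
    """
    alerted = False
    delay = base_delay_s
    for attempt in range(max_attempts):
        if check():
            return True
        if not alerted:
            message = f"{name} check failed; retrying with backoff"
            log.warning(message)  # the local log entry
            alert(message)        # the single outbound alert for this outage
            alerted = True
        if attempt < max_attempts - 1:
            sleep(delay)
            delay = min(delay * 2, max_delay_s)  # double the wait, capped
    return False
```

For example, with `max_attempts=5` and the default base delay, a check that fails three times and then succeeds produces one alert and sleeps of 1, 2 and 4 seconds.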

Notes:

  • Start with the Discord webhook (free) as the default backend; the other backends are optional.

Metadata

  • Labels: enhancement (New feature or request)
  • Projects: none
  • Milestone: none
  • Relationships: none yet
  • Development: no branches or pull requests