Notifications and database availability #105

gmarziou · 2026-04-02T08:12:55Z

Imagine my database server goes down for an hour. What would happen to all the resulting connection errors?
Would they be buffered in memory and written to the database once it comes back?
When the errors cannot be written to the database, can the notifications (mail, chat) still be sent immediately?

In any case, I think running a separate error database on a different server from the primary database is probably a safe choice, as is using a local SQLite database for errors.

I will try running some tests locally by stopping the Docker container for the primary database.

AnjanJ (Maintainer) · 2026-04-03T03:03:09Z

Hey @gmarziou — thank you for bringing this up, and honestly it's wonderful that you're engaging so deeply with the project. Questions like this are exactly what make RED better.

What actually happens when the DB goes down:

In sync mode (async_logging: false): the error write fails, the exception is caught internally, and the error is silently dropped. No buffer, no replay. Notifications don't fire either — they're dispatched only after a successful DB write, so there's no path to alert you via Slack/email if the DB write fails.
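
The ordering described above can be sketched in plain Ruby. This is a toy model, not RED's actual code: `log_error`, `MemoryStore`, `FailingStore`, and the notifier lambda are hypothetical names introduced only to show why a failed write also suppresses the notification.

```ruby
# Toy model of "notify only after a successful write".
# All names here are hypothetical; this is not RED's implementation.

# Simulates an error database that is unreachable.
class FailingStore
  def write(_payload)
    raise IOError, "error database unreachable"
  end
end

# Simulates a healthy error database by keeping rows in memory.
class MemoryStore
  attr_reader :rows

  def initialize
    @rows = []
  end

  def write(payload)
    @rows << payload
  end
end

# Returns true when the error was stored and the notifier was called,
# false when the write failed and the error was dropped.
def log_error(payload, store:, notifier:)
  store.write(payload)
  notifier.call(payload) # only reached if the write did not raise
  true
rescue StandardError
  # The failure is swallowed: no buffer, no replay, no notification.
  false
end

sent = []
notifier = ->(payload) { sent << payload }

log_error({ message: "boom" }, store: MemoryStore.new, notifier: notifier)
log_error({ message: "db down" }, store: FailingStore.new, notifier: notifier)
puts "notifications sent: #{sent.length}"
```

In this model, decoupling the two steps (for example, notifying inside the `rescue` branch) is what it would take for alerts to survive a database outage.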

In async mode with the :async adapter (the new default): errors are enqueued in an in-process thread pool. If the DB is still down when the job runs, same result — dropped.

In async mode with Sidekiq or SolidQueue: this is the closest thing to "buffering" currently. Jobs are persisted to Redis (Sidekiq) or a queue table (SolidQueue) before execution. If the error DB is down when the job runs, Sidekiq will by default retry with exponential backoff, up to 25 retries over ~21 days. If the DB recovers within that window, the error will eventually be written. This is the best resilience posture RED currently offers.
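
The retry window quoted above can be checked with a little arithmetic. The sketch below assumes Sidekiq's documented default backoff of roughly `count**4 + 15` seconds before each retry, plus random jitter that is ignored here. `base_delay` and `total_retry_window` are names made up for this illustration, and the exact formula may differ between Sidekiq versions or when a job customizes its retry settings.

```ruby
# Approximate Sidekiq's default retry schedule, ignoring jitter.
# Assumption: the delay before retry number `count` (0-based) is
# count**4 + 15 seconds.

def base_delay(count)
  (count**4) + 15
end

# Total seconds spanned by the given number of retries.
def total_retry_window(retries = 25)
  (0...retries).sum { |count| base_delay(count) }
end

seconds = total_retry_window
days = seconds / 86_400.0
puts format("%d retries span about %.1f days", 25, days)
```

Without jitter this comes to a little over 20 days, and jitter pushes it slightly higher, which is consistent with the figure above. Note that the early retries are close together (15 s, 16 s, 31 s, 96 s, ...), so a one-hour outage would be covered by the first several attempts.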

The philosophical paradox:

Any DB-backed error tracker has an inherent blind spot: it can't capture errors that prevent it from writing to its own DB. This is true for Solid Errors, for exception_notification, and to some extent even SaaS trackers (Sentry has a local buffer but drops errors if the Sentry server is unreachable for long enough). The problem can't be solved entirely, only partially mitigated.

Your separate DB suggestion is correct:

Isolating the error DB on a different server means your app's primary DB going down doesn't affect error capture at all. This is the highest-leverage mitigation available today, and it's already supported via use_separate_database: true. If you're running anything production-critical, this is the recommended setup.

SQLite as a local error DB:

Interesting idea. SQLite has zero network dependency — errors write to a local file regardless of what's happening on your DB servers. The tradeoffs: one writer at a time (could bottleneck under high error rates), no built-in replication, and disk-based failures (full disk, permission errors). For low-to-medium traffic apps it could work well. This is something worth exploring as a future supported adapter.

What could theoretically be built:

A lightweight disk-based fallback buffer — when the error DB write fails, serialize the error to a local JSON file (similar to what enable_crash_capture already does for process crashes), and import those files on next successful DB connection. This would close the gap for short outages. Not built yet, but not architecturally difficult either.
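
As a rough illustration of that idea, here is a minimal sketch using only Ruby's standard library. It is not part of RED and does not mirror the file format used by enable_crash_capture; `FallbackBuffer`, `spool`, and `drain` are hypothetical names, and the block passed to `drain` stands in for whatever would actually insert the error into the DB.

```ruby
require "json"
require "fileutils"
require "securerandom"
require "tmpdir"

# Hypothetical disk spool for errors that could not be written to the error DB.
class FallbackBuffer
  def initialize(dir)
    @dir = dir
    FileUtils.mkdir_p(@dir)
  end

  # Serialize one error to its own JSON file. Writing to a temporary name and
  # renaming keeps a concurrent drain from reading a half-written file.
  def spool(payload)
    name = format("%d-%s.json", (Time.now.to_f * 1000).to_i, SecureRandom.hex(4))
    final = File.join(@dir, name)
    tmp = "#{final}.tmp"
    File.write(tmp, JSON.generate(payload))
    File.rename(tmp, final)
    final
  end

  # Yield each spooled payload (roughly oldest first) to the block and delete
  # the file only if the block returns without raising. Stops at the first
  # failure so the remaining files are retried on the next drain.
  # Returns the number of payloads imported.
  def drain
    imported = 0
    Dir.glob(File.join(@dir, "*.json")).sort.each do |path|
      payload = JSON.parse(File.read(path))
      begin
        yield payload
      rescue StandardError
        break
      end
      File.delete(path)
      imported += 1
    end
    imported
  end
end

# Usage: spool while the DB is down, drain once it is reachable again.
Dir.mktmpdir do |dir|
  buffer = FallbackBuffer.new(dir)
  buffer.spool({ "message" => "connection refused" })
  count = buffer.drain { |payload| puts "would import: #{payload['message']}" }
  puts "imported #{count} buffered error(s)"
end
```

One file per error keeps writes independent of each other, at the cost of many small files during a long outage. A real implementation would also need a size or age cap so that the spool directory cannot fill the disk.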

Would love to hear what you find in your local Docker tests — concrete failure data would be really useful for prioritising this. And if I've missed anything in this analysis, or if you have ideas on how to approach the fallback buffer or SQLite adapter, I'd love to hear your thinking — you clearly have good instincts here.
