Hey @gmarziou, thank you for bringing this up, and honestly it's wonderful that you're engaging so deeply with the project. Questions like this are exactly what make RED better.

What actually happens when the DB goes down:

In sync mode (

In async mode with the

In async mode with Sidekiq or SolidQueue: this is the closest thing to "buffering" currently. Jobs are persisted to Redis (Sidekiq) or a queue table (SolidQueue) before execution. If the error DB is down when the job runs, Sidekiq will retry with exponential backoff, up to 25 retries over ~21 days. If the DB recovers within that window, the error will eventually be written. This is the best resilience posture RED currently offers.

The philosophical paradox: Any DB-backed error tracker has an inherent blind spot: it can't capture errors that prevent it from writing to its own DB. This is true for Solid Errors, for exception_notification, and to some extent even for SaaS trackers (Sentry has a local buffer but drops errors if the Sentry server is unreachable for long enough). The problem can't be solved entirely, only partially mitigated.

Your separate DB suggestion is correct: Isolating the error DB on a different server means your app's primary DB going down doesn't affect error capture at all. This is the highest-leverage mitigation available today, and it's already supported.

SQLite as a local error DB: Interesting idea. SQLite has zero network dependency, so errors write to a local file regardless of what's happening on your DB servers. The tradeoffs: one writer at a time (could bottleneck under high error rates), no built-in replication, and disk-based failures (full disk, permission errors). For low-to-medium traffic apps it could work well. This is something worth exploring as a future supported adapter.
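For a sense of scale on the Sidekiq retry window mentioned above, here is a rough back-of-the-envelope sketch in Ruby. It assumes the default backoff is approximately `count**4 + 15` seconds before each retry and deliberately ignores the small random jitter Sidekiq adds, so the numbers are approximations rather than guarantees. The method names are made up for illustration and are not part of Sidekiq or RED.

```ruby
# Rough model of Sidekiq's default retry schedule.
# Assumption: the delay before retry number `count` (0-based) is about
# count**4 + 15 seconds. The random jitter term is ignored here.
RETRY_LIMIT = 25

# Delay in seconds before each of the retries (hypothetical helper).
def retry_delays(limit = RETRY_LIMIT)
  (0...limit).map { |count| (count**4) + 15 }
end

# Total time covered by the whole retry schedule, in seconds.
def cumulative_window_seconds(limit = RETRY_LIMIT)
  retry_delays(limit).sum
end

# How many retries would fire (and fail) during an outage of the given
# length, assuming the job first failed right when the outage began.
def retries_used_during(outage_seconds, limit = RETRY_LIMIT)
  elapsed = 0
  retry_delays(limit).take_while { |delay| (elapsed += delay) <= outage_seconds }.size
end
```

Under these assumptions, `cumulative_window_seconds` comes to 1,763,395 seconds (about 20.4 days before jitter), and `retries_used_during(3600)` is 7. In this model, a one-hour outage like the one in the question would use up roughly 7 of the 25 retries before the write can succeed.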
What could theoretically be built: A lightweight disk-based fallback buffer: when the error DB write fails, serialize the error to a local JSON file (similar to what

Would love to hear what you find in your local Docker tests. Concrete failure data would be really useful for prioritising this. And if I've missed anything in this analysis, or if you have ideas on how to approach the fallback buffer or SQLite adapter, I'd love to hear your thinking. You clearly have good instincts here.
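To make the disk-based fallback buffer idea above a bit more concrete, here is a minimal Ruby sketch using only the standard library. Everything in it is hypothetical: `DiskFallbackBuffer`, its method names, and the file layout are illustrative assumptions, not RED's actual API or a committed design. The caller supplies the real DB write as a block; if that block raises, the payload is spooled to a JSON file, and `flush` replays the spooled files once the DB is reachable again.

```ruby
require "json"
require "fileutils"
require "securerandom"
require "time"

# Hypothetical sketch of a disk fallback buffer; not part of RED.
class DiskFallbackBuffer
  def initialize(dir)
    @dir = dir
    FileUtils.mkdir_p(@dir)
  end

  # Attempts the primary write (the block). If it raises, the payload is
  # written to disk instead. Returns :stored or :buffered.
  def record(payload)
    yield payload
    :stored
  rescue StandardError => e
    spool(payload, e)
    :buffered
  end

  # Replays buffered payloads, oldest first, through the block. A file is
  # deleted only after its write succeeds; the first failure stops the
  # replay so the remaining files are kept. Returns the number replayed.
  def flush
    replayed = 0
    Dir.glob(File.join(@dir, "*.json")).sort.each do |path|
      entry = JSON.parse(File.read(path))
      begin
        yield entry["payload"]
      rescue StandardError
        break # DB still unavailable; try again later
      end
      File.delete(path)
      replayed += 1
    end
    replayed
  end

  def pending_count
    Dir.glob(File.join(@dir, "*.json")).size
  end

  private

  # Writes to a temp file first, then renames, so a reader never sees a
  # half-written JSON file. Note that JSON turns symbol keys into strings.
  def spool(payload, error)
    stamp = Process.clock_gettime(Process::CLOCK_REALTIME, :nanosecond)
    name = format("%020d-%s.json", stamp, SecureRandom.hex(4))
    tmp = File.join(@dir, "#{name}.tmp")
    File.write(tmp, JSON.generate(
      "payload" => payload,
      "write_error" => error.class.name,
      "buffered_at" => Time.now.utc.iso8601
    ))
    File.rename(tmp, File.join(@dir, name))
  end
end
```

A caller might wrap its DB write as `buffer.record(attrs) { |a| write_to_error_db(a) }` (where `write_to_error_db` is a placeholder) and call `flush` with the same block from a periodic job. This inherits the disk-based failure modes mentioned for SQLite (full disk, permission errors), so it narrows the blind spot without removing it.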
Hi @AnjanJ
Imagine my database server goes down for an hour. What would happen to all these connection errors?
Will they get buffered in memory and written to the database once it comes back?
When the errors cannot be written to the DB, can the notifications (mail, chat) still be sent immediately?
In any case, I think that having a separate error database running on a different server from the primary DB is probably a safe choice, as is using a local SQLite DB for errors.
I will try to run some tests locally by stopping the Docker container for the primary DB.