Description
Hi there. We're seeing thousands of errors in prod (5-10% of errors) that happen when we lose our connection to ClickHouse Cloud. When we increase load, these errors can even crash the connection pool and take our server down. We have confirmed with ClickHouse that the errors don't show up in their graphs, so something seems to be going wrong in the connection/reconnection handling. We're trying to figure out why we are losing so many connections. The error we get is ** (Mint.TransportError) socket closed. In some places we added retries and that worked, but in other cases even 10 retries don't help.
Reading the code, the problem may be in maybe_reconnect/1: it returns conn (the broken connection) when the reconnect fails, so subsequent requests use a dead socket and fail with socket closed / timeout errors. Maybe it should return something like {:disconnect, reason, conn}, with the callers updated to propagate it to DBConnection, so that DBConnection knows the connection has failed and drops it instead of reusing it?
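To make the suggestion concrete, here is a minimal sketch of the shape I have in mind. It is not the actual ch code: the module name, the three-argument maybe_reconnect, and the injected alive_fun / connect_fun / run_fun functions are all hypothetical stand-ins so the example runs without Mint or a server. The only part taken from DBConnection is the {:disconnect, reason, state} return shape of callbacks such as handle_execute/4 (DBConnection expects the reason to be an exception; a plain atom is used here for brevity).

```elixir
defmodule ReconnectSketch do
  # Hypothetical replacement for the reconnect helper.
  # alive_fun.(conn) says whether the socket is still usable,
  # connect_fun.() tries to open a fresh connection.
  def maybe_reconnect(conn, alive_fun, connect_fun) do
    if alive_fun.(conn) do
      {:ok, conn}
    else
      case connect_fun.() do
        {:ok, new_conn} ->
          {:ok, new_conn}

        {:error, reason} ->
          # Surface the failure instead of silently returning the dead conn.
          {:disconnect, reason, conn}
      end
    end
  end

  # Hypothetical caller, shaped like a DBConnection handle_execute callback.
  # run_fun.(conn, query) stands in for actually sending the request.
  def handle_execute(query, conn, alive_fun, connect_fun, run_fun) do
    case maybe_reconnect(conn, alive_fun, connect_fun) do
      {:ok, conn} ->
        case run_fun.(conn, query) do
          {:ok, result} -> {:ok, query, result, conn}
          {:error, reason} -> {:disconnect, reason, conn}
        end

      {:disconnect, reason, conn} ->
        # Propagate so the pool can drop this connection.
        {:disconnect, reason, conn}
    end
  end
end

# Reconnect fails: the caller sees a disconnect, not the dead conn.
ReconnectSketch.maybe_reconnect(:dead, fn _ -> false end, fn -> {:error, :closed} end)
```

With this shape, a failed reconnect reaches DBConnection as a disconnect on the very first request, rather than showing up later as repeated socket closed errors on the same dead socket.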
We are using:
ch 0.6.1
elixir 1.18.3-otp-27
erlang 27.3.3