A ton of socket closed errors on prod #287

@sescobb27

Hi there. We're seeing thousands of errors in prod (5-10% of errors) that happen when we lose our connection to ClickHouse Cloud. When we increase load, these errors can even crash the connection pool and our server. We have confirmed with ClickHouse that these errors don't show up in their graphs, so something seems to be going on with the connection/reconnection handling, and we're trying to figure out why we are losing so many connections. The error we get is ** (Mint.TransportError) socket closed. In some places we added retries and that worked, but in some cases even 10 retries don't help.

Reading the code, it seems the error may be in maybe_reconnect/1: it returns conn (the broken connection) on reconnect failure, so subsequent requests use a dead socket and fail with socket closed / timeout errors. Maybe it should return something like {:disconnect, reason, conn}, with callers updated to handle it, so the failure is propagated to DBConnection, which then knows the connection failed and drops it instead of reusing it?
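To make the suggestion concrete, here is a rough sketch of the shape I have in mind. It is not the actual ch code: the module name, the `open?` flag, and the `connect_fun` / `request_fun` arguments are all hypothetical stand-ins so the example runs on its own without Mint or DBConnection.

```elixir
defmodule ReconnectSketch do
  # Hypothetical stand-in for the driver's reconnect helper.
  # `connect_fun` is an assumed zero-arity function returning
  # {:ok, conn} | {:error, reason}.

  # Connection still looks open: reuse it as-is.
  def maybe_reconnect(%{open?: true} = conn, _connect_fun), do: {:ok, conn}

  # Connection is closed: try to reconnect, and surface a failure
  # instead of silently handing back the dead conn.
  def maybe_reconnect(%{open?: false} = conn, connect_fun) do
    case connect_fun.() do
      {:ok, new_conn} -> {:ok, new_conn}
      {:error, reason} -> {:disconnect, reason, conn}
    end
  end

  # Caller shaped loosely like a DBConnection handle_execute callback.
  # `request_fun` is an assumed function (conn, query) -> {:ok, result} | {:error, reason}.
  def handle_execute(query, conn, connect_fun, request_fun) do
    case maybe_reconnect(conn, connect_fun) do
      {:ok, conn} ->
        case request_fun.(conn, query) do
          {:ok, result} -> {:ok, query, result, conn}
          # A transport error on the request also asks the pool to drop the conn.
          {:error, reason} -> {:disconnect, reason, conn}
        end

      # Reconnect failed: pass the disconnect tuple straight up to the pool.
      {:disconnect, _reason, _conn} = disconnect ->
        disconnect
    end
  end
end
```

With this shape, a failed reconnect reaches the pool as a {:disconnect, ...} tuple rather than as a later socket closed error on a dead socket. In a real DBConnection callback the second element would need to be an exception rather than a bare reason; that part is left out here to keep the sketch short.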

We are using:

ch 0.6.1
elixir 1.18.3-otp-27
erlang 27.3.3