Fix use-after-close race causing SEGV #251

jeremy · 2025-11-24T23:20:03Z

Fixes a race condition that causes SIGSEGV when a connection is closed while another thread is mid-query.

The library isn't meant to be thread-safe, but in case of mistaken concurrent use let's raise exceptions rather than crashing. Prefer both belt and suspenders when they're cheap and available.

Adds NULL deref guards to protect against connection close while awaiting IO (rb_wait_for_single_fd without GVL). Now raises Trilogy::ConnectionClosed instead of segfaulting.

Guards:

read_packet: check socket and buffer before read
trilogy_flush_writes: check socket before write
current_packet_type: check buffer before access
Ruby wrapper flush_writes: early exit on closed socket
Ruby read_query_response: check socket after wait returns
Ruby buffer_checkout/checkin: NULL pool fallback

/cc @jhawthorn

Fixes a race condition that causes SIGSEGV when a connection is closed while another thread is mid-query. The library isn't meant to be thread-safe, but in case of mistaken concurrent use let's raise exceptions rather than crashing. Prefer both belt and suspenders when they're cheap and available. Adds NULL deref guards to protect against connection close while awaiting IO (rb_wait_for_single_fd without GVL). Now raises Trilogy::ConnectionClosed instead of segfaulting. Guards: * read_packet: check socket and buffer before read * trilogy_flush_writes: check socket before write * current_packet_type: check buffer before access * Ruby wrapper flush_writes: early exit on closed socket * Ruby read_query_response: check socket after wait returns * Ruby buffer_checkout/checkin: NULL pool fallback

jhawthorn · 2025-12-01T21:11:36Z

I don't think I agree with this approach. Using a connection in parallel is so dangerous that a SEGV is preferable to a silent failure (or a failure that may be retried at another level of the stack, like TRILOGY_CLOSED_CONNECTION is likely to be)

jeremy · 2025-12-01T22:51:31Z

@jhawthorn Goal here is louder failure, not silent failure. When a developer makes the mistake of concurrent access, a Ruby exception pointing at the call site will help them fix it faster than a SEGV they may never see with a native backtrace.

(SEGV in production is often silent or lost in the noise, manifesting as a worker failure + supervisor restart rather than an exception that can be reported and acted on. You get a C backtrace and crash dump rather than a relevant Ruby backtrace, and many environments have core dumps disabled. The Ruby exception is easier to observe and act on: it leads you straight to where concurrent access originated and gets logged with relevant context.)

Good point re. retry risk. Could raise a distinct exception like Trilogy::UnsafeConcurrentAccess that's clearly fatal.

Otherwise, these are cheap NULL checks. They don't make the library thread-safe or encourage concurrent use; they just make violations debuggable rather than catastrophic. Big bonus is that they communicate that the library is fully aware of where state violations may occur and how they should be handled.

jhawthorn · 2025-12-01T23:14:34Z

I think this will provide a false sense of security that invalid concurrent access will raise cleanly. In reality there's no guarantee these NULL checks will practically guard against issues. The value could just as easily become NULL the instant after it is checked.

I would like something that better guards against concurrent access, but it should either look like the mutexes added by #226 or using atomic operations to mark when a connection is in use. Anything else is unsound.

matthewd · 2025-12-01T23:30:26Z

Yeah, I agree with @jhawthorn.

Big bonus is that they communicate that the library is fully aware of where state violations may occur and how they should be handled.

It's not though, and that's the danger: these aren't "oopsie you pushed concurrent-access a little too far, have an exception and back up a bit"; they're evidence that you have fundamentally mishandled the library in a way that has probably already been leaking your customers' data to each other in ways we have no practical way of detecting let alone reporting.

jeremy · 2025-12-01T23:34:40Z

Agreed the preconditions and checks are inherently racy and are incomplete without synchronization. I'll withdraw until that's addressed and these can become invariant checks (indicating improper synchronization impl) rather than concurrency bomb sniffers.

jeremy closed this Dec 1, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Fix use-after-close race causing SEGV #251

Fix use-after-close race causing SEGV #251

Uh oh!

jeremy commented Nov 24, 2025 •

edited

Loading

Uh oh!

jhawthorn commented Dec 1, 2025

Uh oh!

jeremy commented Dec 1, 2025

Uh oh!

jhawthorn commented Dec 1, 2025

Uh oh!

matthewd commented Dec 1, 2025

Uh oh!

jeremy commented Dec 1, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Fix use-after-close race causing SEGV #251

Fix use-after-close race causing SEGV #251

Uh oh!

Conversation

jeremy commented Nov 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jhawthorn commented Dec 1, 2025

Uh oh!

jeremy commented Dec 1, 2025

Uh oh!

jhawthorn commented Dec 1, 2025

Uh oh!

matthewd commented Dec 1, 2025

Uh oh!

jeremy commented Dec 1, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

jeremy commented Nov 24, 2025 •

edited

Loading