Skip to content

Fix hang when HEGEL_SERVER_COMMAND points to incompatible binary#124

Open
DRMacIver wants to merge 3 commits intomainfrom
fix/bad-server-command-hang
Open

Fix hang when HEGEL_SERVER_COMMAND points to incompatible binary#124
DRMacIver wants to merge 3 commits intomainfrom
fix/bad-server-command-hang

Conversation

@DRMacIver
Copy link
Copy Markdown
Member

@DRMacIver DRMacIver commented Mar 26, 2026

Tests would hang if you were using an old version of hegel-core that didn't support the --stdio flag. This fixes that and adds some comprehensive debugging messages when the server start doesn't work.

@DRMacIver DRMacIver marked this pull request as ready for review March 26, 2026 11:48
@DRMacIver DRMacIver requested a review from Liam-DeVoe March 26, 2026 11:48
Copy link
Copy Markdown
Member

@Liam-DeVoe Liam-DeVoe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR makes me nervous and I after thinking about it I figured out why.

hegel-rust was implicitly relying on hegel_server_binary --stdio to be part of the protocol. It isn't, and we didn't bump the protocol in hegel-core when we introduced it, or our supported protocol versions in hegel-rust when we used it.

Instead of adding more specific error messages (which BTW also implicitly make the hegel_server_binary --version format a part of the protocol, which it isn't formally - ie this PR won't work with non-hegel-core-servers), I would prefer to drop the error message changes in this PR and fix this at the root by remembering to bump the protocol version when adding command-line flags that are actually part of the protocol. If we do that, we would get the standard protocol error message here.

This is another thing that would have been caught by #39. I'm going to work on that now.

@Liam-DeVoe
Copy link
Copy Markdown
Member

OK, obvious problem with this approach: the server protocol version is only communicated during the handshake. I'll modify my proposal to depend on hegeldev/hegel-core#67, and then this error path parses that protocol version to give the right error message.

@DRMacIver
Copy link
Copy Markdown
Member Author

I think this doesn't make --version part of the protocol, because it explicitly handles the case where --version doesn't work. It just makes the debugging experience better when --version is supported, which I think is fine.

Agreed we probably should have bumped the min version of the protocol but... I don't think there was any way to make this a non-breaking change, and it's part of the slightly awkward reality that Hegel isn't really independent of the server yet. --stdio already broke that, but it was always implicitly the case that if we are launching the binary ourselves then the CLI is part of the interface.

It will become much easier again when we move everything over to stdio as the default and can drop that flag and just specify that a hegel "server" always communicates that way.

Agreed this PR is a sign we're doing things not quite right, but I think it's still a strict improvement on the status quo.

DRMacIver and others added 2 commits March 27, 2026 17:01
When the server process exits immediately (e.g. unsupported --stdio flag
or non-hegel binary), recv() on the channel blocked forever because the
Sender handles in channel_senders were never dropped. Drop all senders
when the reader thread exits so pending recv() calls unblock.

Also improve error messages: detect whether the binary is a hegel-core
of the wrong version ("possibly wrong hegel-core version") vs not a
hegel binary at all.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@DRMacIver DRMacIver force-pushed the fix/bad-server-command-hang branch from 8fe2b7f to 7a1bf43 Compare March 27, 2026 17:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants