Fix hang when HEGEL_SERVER_COMMAND points to incompatible binary #124
This PR makes me nervous, and after thinking about it I figured out why.
hegel-rust was implicitly relying on hegel_server_binary --stdio to be part of the protocol. It isn't, and we didn't bump the protocol in hegel-core when we introduced it, or our supported protocol versions in hegel-rust when we used it.
Instead of adding more specific error messages (which, BTW, also implicitly make the hegel_server_binary --version output format part of the protocol, which it formally isn't, i.e. this PR won't work with non-hegel-core servers), I would prefer to drop the error message changes in this PR and fix this at the root: remember to bump the protocol version whenever we add command-line flags that are actually part of the protocol. If we do that, we get the standard protocol error message here.
This is another thing that would have been caught by #39. I'm going to work on that now.
OK, obvious problem with this approach: the server protocol version is only communicated during the handshake. I'll modify my proposal to depend on hegeldev/hegel-core#67, and then this error path parses that protocol version to give the right error message.
I think this doesn't make

Agreed, we probably should have bumped the min version of the protocol, but I don't think there was any way to make this a non-breaking change, and it's part of the slightly awkward reality that Hegel isn't really independent of the server yet. It will become much easier again when we move everything over to stdio as the default and can drop that flag and just specify that a hegel "server" always communicates that way. This PR is a sign we're doing things not quite right, but I think it's still a strict improvement on the status quo.
When the server process exits immediately (e.g. unsupported --stdio flag
or non-hegel binary), recv() on the channel blocked forever because the
Sender handles in channel_senders were never dropped. Drop all senders
when the reader thread exits so pending recv() calls unblock.
Also improve error messages: detect whether the binary is a hegel-core
of the wrong version ("possibly wrong hegel-core version") vs not a
hegel binary at all.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
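
For readers following along, the hang described in the commit message can be sketched with plain std::sync::mpsc. This is a minimal illustration, not the hegel-rust code: the `ChannelSenders` alias, `spawn_reader`, and `recv_after_server_exit` are hypothetical names, and the reader thread here exits immediately to stand in for a server process that dies on startup. The point is only that `recv()` returns `Err` once every `Sender` for the channel has been dropped, so clearing the sender table on reader exit unblocks any waiter.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, RecvError, Sender};
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical stand-in for the connection's per-channel sender table
// (the commit message calls the real one `channel_senders`).
type ChannelSenders = Arc<Mutex<HashMap<u32, Sender<Vec<u8>>>>>;

/// Spawns a reader thread that exits right away, as it would when the
/// server process dies on startup and its output reaches EOF.
fn spawn_reader(senders: ChannelSenders) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        // A real reader loop would read packets and dispatch them here.

        // On exit, drop every Sender. Without this line the Sender stays
        // alive inside the map and a pending recv() blocks forever.
        senders.lock().unwrap().clear();
    })
}

/// Registers one channel, lets the reader exit, and returns what a
/// waiting recv() observes.
fn recv_after_server_exit() -> Result<Vec<u8>, RecvError> {
    let senders: ChannelSenders = Arc::new(Mutex::new(HashMap::new()));
    let (tx, rx) = channel();
    // The only Sender for this channel lives in the shared table.
    senders.lock().unwrap().insert(0, tx);

    let reader = spawn_reader(Arc::clone(&senders));
    // Unblocks with Err(RecvError) once the reader clears the table.
    let result = rx.recv();
    reader.join().unwrap();
    result
}
```

Calling `recv_after_server_exit()` yields an `Err`, which the caller can turn into a "server failed to start" error instead of waiting indefinitely.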
8fe2b7f to 7a1bf43
Tests would hang if you were using an old version of hegel-core that didn't support the --stdio flag. This fixes that and adds more detailed debugging messages for when the server fails to start.