-
Notifications
You must be signed in to change notification settings - Fork 529
Add regression test for WebSocket close propagation through containers #6057
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Add regression test for WebSocket close propagation through containers #6057
Conversation
Exercises the full eyeball path: external client connects over a real TCP socket to workerd serve, which forwards through a Durable Object to a container via getTcpPort(8080).fetch() with WebSocket upgrade. Data proxies correctly; the close handshake does not complete.
|
All contributors have signed the CLA ✍️ ✅ |
|
I have read the CLA Document and I hereby sign the CLA |
Merging this PR will degrade performance by 11.48%
Performance Changes
Comparing Footnotes
|
| ws.close(1000, 'client closing'); | ||
| }); | ||
|
|
||
| ws.addEventListener('close', (event) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You always need to make sure to reciprocate the close:
ws.close(1000, 'closing in response to client close');
That is likely the bug you have
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah this is a gap in my repro, but even with reciprocation, I've verified the close event doesn't propagate back to the client. And my earliest test for this on the sandbox-sdk was a server-initiated close scenario, which would still remain.
I can try to expand on the repro to add these cases if that'd be useful!
Summary
This is a repro for the WebSocket close propagation bug discussed in capnproto/capnproto#2556. Sandbox SDK recently introduced a new API to connect xterm.js to a PTY within the container via Websocket. The SDK proxies a WebSocket from the container (via workerd’s
tcpPort.fetch()upgrade path) and then returns that WebSocket to the browser. When testing this feature locally, messages proxy correctly in both directions, but closing the WebSocket does not work. Upon inspection, on the container side it looks like the close handshake never completes unless the connection is forcibly terminated. I wasn’t able to reproduce this behaviour in production; it seems specific to the local-dev stack.The capnproto PR isolated the root cause to KJ's optimised WebSocket pumping, but broadly disabling optimised pumps is obviously not the right approach. This PR provides a workerd-level repro so the behaviour can be fixed here.
Details
The issue is specific to the eyeball coupling path. A worker that accepts the
tcpPort.fetch()WebSocket internally viaws.accept()observes close events fine. The failure happens when that WebSocket is returned asResponse.webSocketand workerd couples it to the external client.Test design
Uses
sh_testwithworkerd serveand an external Node WebSocket client rather than a stub-basedwd_test. Stub-based tests use an in-runtime client that doesn't exercise theResponse.webSocketcoupling boundary, so it works well.The test verifies a bidirectional echo (proving the connection works), then initiates a clean close and expects code 1000 back. It is expected to fail until the underlying bug is fixed — currently with
Timed out after 5000ms waiting for close event.