-
Notifications
You must be signed in to change notification settings - Fork 5
Fix/remove worker id race condition #856
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The client was storing _worker_id from server responses, but this caused a race condition during socket reconnection: 1. Socket reconnects with NEW sid 2. Server assigns pending job to NEW sid 3. Socket event handler receives job:assign 4. But _worker_id still holds OLD value (not yet updated) 5. Worker sends wrong worker_id to server → 400 BAD REQUEST The fix removes _worker_id entirely and always uses socket.sio.sid directly. This is safe because: - The server always assigns jobs to the socket's current sid - The socket that receives job:assign is always the one with that sid - socket.sio.sid always reflects the current connection Changes: - Remove _worker_id field from ZnDraw dataclass - Simplify sid property to return socket.sio.sid directly - Remove worker_id storage in socket_manager._register_extensions_after_join - Update tests to use vis.sid instead of vis._worker_id Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The register_extension() and register_filesystem() methods in api_manager.py were returning workerId from server responses, but callers no longer use this value after removing _worker_id storage. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
📝 WalkthroughWalkthroughChanges add workerId propagation from join responses to ZnDraw instances, update filesystem registration parameters (from provider_type to fs_type), and refactor worker identifier usage across tests and logging to use session IDs instead of stored worker_id attributes. Changes
Sequence Diagram(s)sequenceDiagram
participant Server as Socket.IO Server
participant Client as socket_manager
participant ZnDraw as ZnDraw Instance
participant Extensions as Extensions/Filesystem
Client->>Server: Connect & Join Room
Server->>Server: Generate SID
Server->>Client: Join Response + workerId (SID)
Client->>ZnDraw: Store workerId to _worker_id
Client->>Extensions: Re-register Extensions (public/private)
Client->>Extensions: Re-register Filesystems<br/>(fs_type, public flag)
Extensions->>ZnDraw: Update internal state
Client->>ZnDraw: Update logging with sid
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In `@src/zndraw/socket_manager.py`:
- Around line 176-185: The re-registration loop passes wrong parameters to
APIManager.register_filesystem causing a TypeError; update the call in
socket_manager.py to match APIManager.register_filesystem's signature by passing
name, fs_type (use provider.__class__.__name__), socket_manager=self, and
public=fs["public"] and remove the unexpected root_path/provider_type arguments;
locate the loop iterating self.zndraw._filesystems and the call to
register_filesystem to make this change.
In `@src/zndraw/zndraw.py`:
- Around line 1574-1582: The log message after registering an extension still
uses the label "worker_id" even though it now uses the socket id (self.sid);
update the log.info call that follows self.api.register_extension (the f-string
referencing name and scope) to use a clearer label such as "sid" or "socket_id"
(e.g., "(sid: {self.sid})") and make the same change for the second occurrence
around the register_extension call near the later block (the similar log.info at
the other location).
In `@tests/test_job_endpoints.py`:
- Around line 141-145: Add an explicit timeout=10 kwarg to every requests call
in this test file: locate all uses of the requests module (e.g., the
requests.put call that updates job status using
f"{server}/api/rooms/test/jobs/{job.job_id}/status" and other
requests.get/post/put calls that pass headers=get_jwt_auth_headers(server,
"testuser") or similar) and append timeout=10 to each call so the existing
positional and keyword arguments remain unchanged; ensure you update all nine
occurrences in this file to match the codebase standard.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
src/zndraw/api_manager.pysrc/zndraw/socket_manager.pysrc/zndraw/zndraw.pytests/test_job_endpoints.py
💤 Files with no reviewable changes (1)
- src/zndraw/api_manager.py
🧰 Additional context used
📓 Path-based instructions (2)
**/test_*.py
📄 CodeRabbit inference engine (AGENTS.md)
**/test_*.py: Do not use@pytest.mark.xfailor similar - all tests must pass
When designing new tests, read the old tests first to understand the existing patterns
Usepytest.mark.parametrizeto avoid code duplication in tests
Tests should be very specific and test only one thing
Avoid complex test setups
Each test must be a function, not a method of a class
Files:
tests/test_job_endpoints.py
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
**/*.py: All default values must be defined exclusively within the Pydantic model - do not scatter fallback logic throughout the codebase
Do not perform null checks combined with hardcoded literals for default values - rely on the Pydantic schema to populate default values during initialization
You cannot use LUA scripts in Redis
If sensible, implement collections.abc interfaces for your classes, such as MutableMapping or MutableSequence
Use numpy style docstrings that are concise and to the point
Use type hints wherever possible - uselist[int|float] | Noneinstead oft.Optional[t.List[int|float]]
Imports should always be at the top of the file unless they affect startup time of ZnDraw and can be lazy loaded
Files:
tests/test_job_endpoints.pysrc/zndraw/socket_manager.pysrc/zndraw/zndraw.py
🧬 Code graph analysis (3)
tests/test_job_endpoints.py (1)
src/zndraw/zndraw.py (1)
sid(686-697)
src/zndraw/socket_manager.py (2)
src/zndraw/zndraw.py (1)
register_extension(1519-1581)src/zndraw/api_manager.py (1)
register_extension(319-375)
src/zndraw/zndraw.py (2)
src/zndraw/api_manager.py (2)
register_extension(319-375)register_filesystem(377-429)src/zndraw/app/worker_routes.py (1)
register_filesystem(303-487)
🪛 Ruff (0.14.11)
tests/test_job_endpoints.py
166-166: Probable use of requests call without timeout
(S113)
283-283: Probable use of requests call without timeout
(S113)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: pytest (3.12, ubuntu-latest)
- GitHub Check: pytest (3.13, ubuntu-latest)
- GitHub Check: pytest (3.11, ubuntu-latest)
✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.
|
In
In
In
|
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #856 +/- ##
==========================================
- Coverage 80.04% 79.77% -0.27%
==========================================
Files 165 165
Lines 20131 20205 +74
==========================================
+ Hits 16114 16119 +5
- Misses 4017 4086 +69 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
…emoval (#857) * Initial plan * fix: address review comments - update register_filesystem params, log labels, and add request timeouts Co-authored-by: PythonFZ <46721498+PythonFZ@users.noreply.github.com> --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: PythonFZ <46721498+PythonFZ@users.noreply.github.com>
for more information, see https://pre-commit.ci
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@src/zndraw/socket_manager.py`:
- Around line 176-184: The reconnect loop is using the wrong keys and field
names when re-registering filesystems; change the loop to iterate the stored
mapping so you pass the original filesystem name (not the composite loop key)
and the stored fs instance's provider; specifically, when iterating
self.zndraw._filesystems use the actual stored name and stored fs object (the
entry that contains the provider under key "fs", or use the instance directly
per src/zndraw/zndraw.py), then call self.zndraw.api.register_filesystem with
name set to the original filesystem name and fs_type derived from the stored fs
instance's provider (provider.__class__.__name__) and public from the stored
entry, ensuring you reference the correct keys used by the _filesystems
structure rather than fs["provider"] and the composite loop key.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
src/zndraw/socket_manager.pysrc/zndraw/zndraw.pytests/test_job_endpoints.py
🧰 Additional context used
📓 Path-based instructions (2)
**/test_*.py
📄 CodeRabbit inference engine (AGENTS.md)
**/test_*.py: Do not use@pytest.mark.xfailor similar - all tests must pass
When designing new tests, read the old tests first to understand the existing patterns
Usepytest.mark.parametrizeto avoid code duplication in tests
Tests should be very specific and test only one thing
Avoid complex test setups
Each test must be a function, not a method of a class
Files:
tests/test_job_endpoints.py
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
**/*.py: All default values must be defined exclusively within the Pydantic model - do not scatter fallback logic throughout the codebase
Do not perform null checks combined with hardcoded literals for default values - rely on the Pydantic schema to populate default values during initialization
You cannot use LUA scripts in Redis
If sensible, implement collections.abc interfaces for your classes, such as MutableMapping or MutableSequence
Use numpy style docstrings that are concise and to the point
Use type hints wherever possible - uselist[int|float] | Noneinstead oft.Optional[t.List[int|float]]
Imports should always be at the top of the file unless they affect startup time of ZnDraw and can be lazy loaded
Files:
tests/test_job_endpoints.pysrc/zndraw/zndraw.pysrc/zndraw/socket_manager.py
🧬 Code graph analysis (2)
tests/test_job_endpoints.py (1)
src/zndraw/zndraw.py (1)
sid(686-697)
src/zndraw/socket_manager.py (3)
src/zndraw/zndraw.py (2)
register_extension(1519-1581)register_filesystem(1630-1702)src/zndraw/api_manager.py (2)
register_extension(319-375)register_filesystem(377-429)src/zndraw/app/worker_routes.py (1)
register_filesystem(303-487)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: pytest (3.13, ubuntu-latest)
- GitHub Check: pytest (3.11, ubuntu-latest)
- GitHub Check: pytest (3.12, ubuntu-latest)
🔇 Additional comments (7)
src/zndraw/zndraw.py (3)
685-697: Sid property docstring clarity looks good.
Clear explanation of the socket.io session ID and its usage.
1574-1581: Logging now matches the sid-based workflow.
Nice consistency with the new worker/session semantics.
1695-1702: Filesystem registration call matches the updated API signature.
Looks consistent with the fs_type-based API.tests/test_job_endpoints.py (3)
77-100: Timeouts on GET requests reduce test hangs.
Good alignment with the rest of the test suite.
133-207: Status updates now consistently usevis.sidwith timeouts.
This keeps job lifecycle tests aligned with the new session semantics.
220-295: Job refresh/lifecycle updates are consistent and resilient.
Nice to see timeouts added across the remaining PUT calls.src/zndraw/socket_manager.py (1)
157-174: Extension re-registration matches the updated API usage.
No worker_id capture is consistent with the sid-based flow.
✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.
- Add workerId to room:join response (single source of truth) - Capture _worker_id on connect and reconnect - Remove redundant _worker_id captures from registration methods (DRY) - Fix filesystem re-registration keys (fs_entry['fs'], fs_entry['name']) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Summary by CodeRabbit
New Features
Bug Fixes
Refactor
✏️ Tip: You can customize this high-level summary in your review settings.