Skip to content

Conversation

@PythonFZ
Copy link
Member

@PythonFZ PythonFZ commented Jan 16, 2026

Summary by CodeRabbit

  • New Features

    • Worker identification is now included in room-join responses for better session tracking.
  • Bug Fixes

    • Improved worker identification handling during reconnection flows.
    • Added request timeouts to prevent operations from hanging.
  • Refactor

    • Updated filesystem registration parameter handling for consistency.

✏️ Tip: You can customize this high-level summary in your review settings.

PythonFZ and others added 2 commits January 16, 2026 13:30
The client was storing _worker_id from server responses, but this caused
a race condition during socket reconnection:

1. Socket reconnects with NEW sid
2. Server assigns pending job to NEW sid
3. Socket event handler receives job:assign
4. But _worker_id still holds OLD value (not yet updated)
5. Worker sends wrong worker_id to server → 400 BAD REQUEST

The fix removes _worker_id entirely and always uses socket.sio.sid
directly. This is safe because:
- The server always assigns jobs to the socket's current sid
- The socket that receives job:assign is always the one with that sid
- socket.sio.sid always reflects the current connection

Changes:
- Remove _worker_id field from ZnDraw dataclass
- Simplify sid property to return socket.sio.sid directly
- Remove worker_id storage in socket_manager._register_extensions_after_join
- Update tests to use vis.sid instead of vis._worker_id

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The register_extension() and register_filesystem() methods in
api_manager.py were returning workerId from server responses, but
callers no longer use this value after removing _worker_id storage.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@coderabbitai
Copy link

coderabbitai bot commented Jan 16, 2026

📝 Walkthrough

Walkthrough

Changes add workerId propagation from join responses to ZnDraw instances, update filesystem registration parameters (from provider_type to fs_type), and refactor worker identifier usage across tests and logging to use session IDs instead of stored worker_id attributes.

Changes

Cohort / File(s) Summary
Socket connection and worker ID propagation
src/zndraw/socket_manager.py
On both initial and auto-reconnect paths, now capture and store workerId from join response via self.zndraw._worker_id. In filesystem re-registration after join, iterate over filesystem entries as dictionaries, extract fs instance, and call register_filesystem with renamed parameter fs_type (replacing provider_type) and removed root_path parameter.
Worker ID storage and logging updates
src/zndraw/zndraw.py
Removed persistent storage of worker_id after extension and filesystem registration. Updated logging references to use self.sid instead of stored worker_id. Added comments clarifying that worker_id derives from room:join response and that REST registration paths do not store worker_id locally.
Test updates for worker identifier and timeout
tests/test_job_endpoints.py
Changed all job status API calls to use vis.sid instead of vis._worker_id for workerId in payloads. Added timeout=10 parameter to GET and PUT requests to prevent test hangs.
Join response payload expansion
src/zndraw/app/events.py
Added "workerId": sid field to the successful join response, exposing the server's Socket.IO SID for worker identification.

Sequence Diagram(s)

sequenceDiagram
    participant Server as Socket.IO Server
    participant Client as socket_manager
    participant ZnDraw as ZnDraw Instance
    participant Extensions as Extensions/Filesystem

    Client->>Server: Connect & Join Room
    Server->>Server: Generate SID
    Server->>Client: Join Response + workerId (SID)
    Client->>ZnDraw: Store workerId to _worker_id
    Client->>Extensions: Re-register Extensions (public/private)
    Client->>Extensions: Re-register Filesystems<br/>(fs_type, public flag)
    Extensions->>ZnDraw: Update internal state
    Client->>ZnDraw: Update logging with sid
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • Refactor/room join minimal response #841: Refactors room-join response payload and adjusts socket_manager join/reconnect logic, modifying the same join/connect code paths affected by workerId propagation changes in this PR.

Poem

🐰 A worker now knows its own name,
Through the join response it came—
SessionID as wings it flies,
While filesystems are re-organized,
The socket dance, forever the same! 🎪

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title directly addresses the main objective: fixing a worker ID race condition. It accurately summarizes the core issue being resolved across the codebase.
Docstring Coverage ✅ Passed Docstring coverage is 94.44% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Fix all issues with AI agents
In `@src/zndraw/socket_manager.py`:
- Around line 176-185: The re-registration loop passes wrong parameters to
APIManager.register_filesystem causing a TypeError; update the call in
socket_manager.py to match APIManager.register_filesystem's signature by passing
name, fs_type (use provider.__class__.__name__), socket_manager=self, and
public=fs["public"] and remove the unexpected root_path/provider_type arguments;
locate the loop iterating self.zndraw._filesystems and the call to
register_filesystem to make this change.

In `@src/zndraw/zndraw.py`:
- Around line 1574-1582: The log message after registering an extension still
uses the label "worker_id" even though it now uses the socket id (self.sid);
update the log.info call that follows self.api.register_extension (the f-string
referencing name and scope) to use a clearer label such as "sid" or "socket_id"
(e.g., "(sid: {self.sid})") and make the same change for the second occurrence
around the register_extension call near the later block (the similar log.info at
the other location).

In `@tests/test_job_endpoints.py`:
- Around line 141-145: Add an explicit timeout=10 kwarg to every requests call
in this test file: locate all uses of the requests module (e.g., the
requests.put call that updates job status using
f"{server}/api/rooms/test/jobs/{job.job_id}/status" and other
requests.get/post/put calls that pass headers=get_jwt_auth_headers(server,
"testuser") or similar) and append timeout=10 to each call so the existing
positional and keyword arguments remain unchanged; ensure you update all nine
occurrences in this file to match the codebase standard.
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9617d98 and 1f4e3d5.

📒 Files selected for processing (4)
  • src/zndraw/api_manager.py
  • src/zndraw/socket_manager.py
  • src/zndraw/zndraw.py
  • tests/test_job_endpoints.py
💤 Files with no reviewable changes (1)
  • src/zndraw/api_manager.py
🧰 Additional context used
📓 Path-based instructions (2)
**/test_*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/test_*.py: Do not use @pytest.mark.xfail or similar - all tests must pass
When designing new tests, read the old tests first to understand the existing patterns
Use pytest.mark.parametrize to avoid code duplication in tests
Tests should be very specific and test only one thing
Avoid complex test setups
Each test must be a function, not a method of a class

Files:

  • tests/test_job_endpoints.py
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: All default values must be defined exclusively within the Pydantic model - do not scatter fallback logic throughout the codebase
Do not perform null checks combined with hardcoded literals for default values - rely on the Pydantic schema to populate default values during initialization
You cannot use LUA scripts in Redis
If sensible, implement collections.abc interfaces for your classes, such as MutableMapping or MutableSequence
Use numpy style docstrings that are concise and to the point
Use type hints wherever possible - use list[int|float] | None instead of t.Optional[t.List[int|float]]
Imports should always be at the top of the file unless they affect startup time of ZnDraw and can be lazy loaded

Files:

  • tests/test_job_endpoints.py
  • src/zndraw/socket_manager.py
  • src/zndraw/zndraw.py
🧬 Code graph analysis (3)
tests/test_job_endpoints.py (1)
src/zndraw/zndraw.py (1)
  • sid (686-697)
src/zndraw/socket_manager.py (2)
src/zndraw/zndraw.py (1)
  • register_extension (1519-1581)
src/zndraw/api_manager.py (1)
  • register_extension (319-375)
src/zndraw/zndraw.py (2)
src/zndraw/api_manager.py (2)
  • register_extension (319-375)
  • register_filesystem (377-429)
src/zndraw/app/worker_routes.py (1)
  • register_filesystem (303-487)
🪛 Ruff (0.14.11)
tests/test_job_endpoints.py

166-166: Probable use of requests call without timeout

(S113)


283-283: Probable use of requests call without timeout

(S113)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: pytest (3.12, ubuntu-latest)
  • GitHub Check: pytest (3.13, ubuntu-latest)
  • GitHub Check: pytest (3.11, ubuntu-latest)

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.

@PythonFZ
Copy link
Member Author

@copilot

In @src/zndraw/socket_manager.py:

  • Around line 176-185: The re-registration loop passes wrong parameters to
    APIManager.register_filesystem causing a TypeError; update the call in
    socket_manager.py to match APIManager.register_filesystem's signature by passing
    name, fs_type (use provider.class.name), socket_manager=self, and
    public=fs["public"] and remove the unexpected root_path/provider_type arguments;
    locate the loop iterating self.zndraw._filesystems and the call to
    register_filesystem to make this change.

In @src/zndraw/zndraw.py:

  • Around line 1574-1582: The log message after registering an extension still
    uses the label "worker_id" even though it now uses the socket id (self.sid);
    update the log.info call that follows self.api.register_extension (the f-string
    referencing name and scope) to use a clearer label such as "sid" or "socket_id"
    (e.g., "(sid: {self.sid})") and make the same change for the second occurrence
    around the register_extension call near the later block (the similar log.info at
    the other location).

In @tests/test_job_endpoints.py:

  • Around line 141-145: Add an explicit timeout=10 kwarg to every requests call
    in this test file: locate all uses of the requests module (e.g., the
    requests.put call that updates job status using
    f"{server}/api/rooms/test/jobs/{job.job_id}/status" and other
    requests.get/post/put calls that pass headers=get_jwt_auth_headers(server,
    "testuser") or similar) and append timeout=10 to each call so the existing
    positional and keyword arguments remain unchanged; ensure you update all nine
    occurrences in this file to match the codebase standard.

Copy link
Contributor

Copilot AI commented Jan 16, 2026

@PythonFZ I've opened a new pull request, #857, to work on those changes. Once the pull request is ready, I'll request review from you.

@codecov-commenter
Copy link

codecov-commenter commented Jan 16, 2026

Codecov Report

❌ Patch coverage is 75.00000% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 79.77%. Comparing base (9617d98) to head (ea2ae9d).
⚠️ Report is 6 commits behind head on main.
✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
src/zndraw/socket_manager.py 57.14% 3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #856      +/-   ##
==========================================
- Coverage   80.04%   79.77%   -0.27%     
==========================================
  Files         165      165              
  Lines       20131    20205      +74     
==========================================
+ Hits        16114    16119       +5     
- Misses       4017     4086      +69     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copilot AI and others added 2 commits January 16, 2026 16:33
…emoval (#857)

* Initial plan

* fix: address review comments - update register_filesystem params, log labels, and add request timeouts

Co-authored-by: PythonFZ <46721498+PythonFZ@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: PythonFZ <46721498+PythonFZ@users.noreply.github.com>
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@src/zndraw/socket_manager.py`:
- Around line 176-184: The reconnect loop is using the wrong keys and field
names when re-registering filesystems; change the loop to iterate the stored
mapping so you pass the original filesystem name (not the composite loop key)
and the stored fs instance's provider; specifically, when iterating
self.zndraw._filesystems use the actual stored name and stored fs object (the
entry that contains the provider under key "fs", or use the instance directly
per src/zndraw/zndraw.py), then call self.zndraw.api.register_filesystem with
name set to the original filesystem name and fs_type derived from the stored fs
instance's provider (provider.__class__.__name__) and public from the stored
entry, ensuring you reference the correct keys used by the _filesystems
structure rather than fs["provider"] and the composite loop key.
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1f4e3d5 and 640d662.

📒 Files selected for processing (3)
  • src/zndraw/socket_manager.py
  • src/zndraw/zndraw.py
  • tests/test_job_endpoints.py
🧰 Additional context used
📓 Path-based instructions (2)
**/test_*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/test_*.py: Do not use @pytest.mark.xfail or similar - all tests must pass
When designing new tests, read the old tests first to understand the existing patterns
Use pytest.mark.parametrize to avoid code duplication in tests
Tests should be very specific and test only one thing
Avoid complex test setups
Each test must be a function, not a method of a class

Files:

  • tests/test_job_endpoints.py
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: All default values must be defined exclusively within the Pydantic model - do not scatter fallback logic throughout the codebase
Do not perform null checks combined with hardcoded literals for default values - rely on the Pydantic schema to populate default values during initialization
You cannot use LUA scripts in Redis
If sensible, implement collections.abc interfaces for your classes, such as MutableMapping or MutableSequence
Use numpy style docstrings that are concise and to the point
Use type hints wherever possible - use list[int|float] | None instead of t.Optional[t.List[int|float]]
Imports should always be at the top of the file unless they affect startup time of ZnDraw and can be lazy loaded

Files:

  • tests/test_job_endpoints.py
  • src/zndraw/zndraw.py
  • src/zndraw/socket_manager.py
🧬 Code graph analysis (2)
tests/test_job_endpoints.py (1)
src/zndraw/zndraw.py (1)
  • sid (686-697)
src/zndraw/socket_manager.py (3)
src/zndraw/zndraw.py (2)
  • register_extension (1519-1581)
  • register_filesystem (1630-1702)
src/zndraw/api_manager.py (2)
  • register_extension (319-375)
  • register_filesystem (377-429)
src/zndraw/app/worker_routes.py (1)
  • register_filesystem (303-487)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: pytest (3.13, ubuntu-latest)
  • GitHub Check: pytest (3.11, ubuntu-latest)
  • GitHub Check: pytest (3.12, ubuntu-latest)
🔇 Additional comments (7)
src/zndraw/zndraw.py (3)

685-697: Sid property docstring clarity looks good.
Clear explanation of the socket.io session ID and its usage.


1574-1581: Logging now matches the sid-based workflow.
Nice consistency with the new worker/session semantics.


1695-1702: Filesystem registration call matches the updated API signature.
Looks consistent with the fs_type-based API.

tests/test_job_endpoints.py (3)

77-100: Timeouts on GET requests reduce test hangs.
Good alignment with the rest of the test suite.


133-207: Status updates now consistently use vis.sid with timeouts.
This keeps job lifecycle tests aligned with the new session semantics.


220-295: Job refresh/lifecycle updates are consistent and resilient.
Nice to see timeouts added across the remaining PUT calls.

src/zndraw/socket_manager.py (1)

157-174: Extension re-registration matches the updated API usage.
No worker_id capture is consistent with the sid-based flow.

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.

- Add workerId to room:join response (single source of truth)
- Capture _worker_id on connect and reconnect
- Remove redundant _worker_id captures from registration methods (DRY)
- Fix filesystem re-registration keys (fs_entry['fs'], fs_entry['name'])

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@PythonFZ PythonFZ merged commit fe5b9c9 into main Jan 19, 2026
6 checks passed
@PythonFZ PythonFZ deleted the fix/remove-worker-id-race-condition branch January 19, 2026 21:50
@coderabbitai coderabbitai bot mentioned this pull request Jan 19, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants