Skip to content

Conversation

@polfeliu
Copy link

@polfeliu polfeliu commented Dec 17, 2025

Hi, I previously contributed with this PR: #156 which has prevented a lot of crashes in the applications/scripts I do. However, I've detected there's a very small race condition that I would like to address.

The Problem

A race condition occurs when close() is called while another thread executes an SDO transaction or other context-dependent operation:

  1. Thread A acquires critical section locks during SDO operation
  2. Thread B calls close() and destroys those locks
  3. Thread A waits indefinitely on a lock that no longer exists
  4. Result: hangs, crashes, undefined behavior

https://learn.microsoft.com/en-us/windows/win32/api/synchapi/nf-synchapi-deletecriticalsection
"If a critical section is deleted while it is still owned, the state of the threads waiting for ownership of the deleted critical section is undefined."

The Solution

Implement a GIL-protected operation counter with timeout-based close handshake:

  • On operation start: Increment counter, reject if closing
  • On operation end: Decrement counter, signal event when zero
  • On close: Set closing flag, wait up to 5 seconds for counter to reach zero, then cleanup regardless

@bnjmnp
Copy link
Owner

bnjmnp commented Jan 4, 2026

Hi @polfeliu. Thanks for another great suggestion. I'm starting to think this through and I'm wondering: If PySOEM doesn't use internal threading, and race conditions only occur if the user implements threading on their side, why should the protection mechanism be built into PySOEM rather than being handled in the user's code?

@polfeliu
Copy link
Author

polfeliu commented Jan 7, 2026

Hi @bnjmnp You’re right that PySOEM doesn’t create threads, but the race occurs because close() tears down internal locks that only PySOEM knows about.

Users can’t reliably guard against that without fully wrapping the library, essentially building a “PySOEMsafethreaded” layer, which feels like a heavy burden. It’s common to start with a simple single-threaded script and later add threads, so having safety built-in helps avoid surprises and rewrites to add a simple thread.

I think thread/memory safety management is a matter of opinion these days, but I prefer not to push that responsibility to higher layers. It usually makes things more complicated or fragmented. That said, I will respect your opinion as the maintainer.

A lightweight handshake (closing flag + op counter) makes close() safe and predictable without extra complexity for users, and prevents undefined behavior at the C level. Minimal overhead, big robustness win.

@bnjmnp
Copy link
Owner

bnjmnp commented Jan 11, 2026

You're right, adding thread safety is a great improvement. Thank you for elaborating on this; it's helpful to have the detailed reasoning for these changes tracked here in the PR.
Why did you went for a Draft PR? Can we turn it to a normal PR?

@bnjmnp
Copy link
Owner

bnjmnp commented Jan 11, 2026

Also, it would be interesting how much the new _operation_context() effects the performance. Maybe you can investigate the PDO cycles possible within a second, before and after it was added?

@polfeliu
Copy link
Author

polfeliu commented Jan 12, 2026

It's on draft because I wanted to document the changes I did before I went on holidays :) but I still want to do some more tests, I can certainly include these pdo timing tests. If they go well I don't expect any more changes. I'll let you know

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants