Skip to content

fix+refactor: race-free session saves with shared locked_save_json() helper (closes #51)#73

Closed
sanjayrohith wants to merge 5 commits intoHKUDS:mainfrom
sanjayrohith:fix/session-file-locking-all-harnesses
Closed

fix+refactor: race-free session saves with shared locked_save_json() helper (closes #51)#73
sanjayrohith wants to merge 5 commits intoHKUDS:mainfrom
sanjayrohith:fix/session-file-locking-all-harnesses

Conversation

@sanjayrohith
Copy link

@sanjayrohith sanjayrohith commented Mar 14, 2026

What this fixes

Closes #51 — Race conditions in multi-session project state syncing.


The problem

Every harness wrote session JSON with a bare open("w") + json.dump() and no locking. Three compounding bugs allowed concurrent writes to silently corrupt or lose project data:

# Bug Impact
1 json.dump and LOCK_UN inside the same try as LOCK_EX If json.dump raised OSError, the except wrote a second time → double JSON blob
2 LOCK_UN was inline, not in finally Any exception between lock and unlock left the lock permanently held
3 open("w") truncates before flock() is called Concurrent open("w") zeroes the file even while another process holds the lock

The fix

A locked_save_json(path, data, **dump_kwargs) helper implements the correct sequence:

def locked_save_json(path, data, **dump_kwargs) -> None:
    try:
        f = open(path, "r+")        # no truncation on open
    except FileNotFoundError:
        f = open(path, "w")         # first save — file is new, nothing to lose
    with f:
        _locked = False
        try:
            import fcntl
            fcntl.flock(f.fileno(), fcntl.LOCK_EX)
            _locked = True
        except (ImportError, OSError):
            pass                    # Windows / unsupported FS — proceed unlocked
        try:
            f.seek(0)
            f.truncate()            # truncate INSIDE the lock
            json.dump(data, f, **dump_kwargs)
            f.flush()               # flush before releasing the lock
        finally:
            if _locked:
                fcntl.flock(f.fileno(), fcntl.LOCK_UN)  # always released

Each session.py save method becomes a single call:

locked_save_json(save_path, self.project, indent=2, default=str)

Correctness properties

Property Status
No truncation before lock is held r+ doesn't truncate
json.dump called exactly once ✅ single write path
LOCK_UN always reached finally block
Buffer flushed before unlock f.flush() inside try
Works on Windows / no-flock filesystems ImportError/OSError fallback
Works on first save (file doesn't exist) FileNotFoundError"w" fallback

Files changed

10 modified session.py files — each save method is now a one-liner:

Harness Method After
gimp save_session() locked_save_json(save_path, self.project, ...)
blender save_session() locked_save_json(save_path, self.project, ...)
inkscape save_session() locked_save_json(save_path, self.project, ...)
audacity save_session() locked_save_json(save_path, self.project, ...)
libreoffice save_session() locked_save_json(save_path, self.project, ...)
obs-studio save_session() locked_save_json(save_path, self.project, ...)
kdenlive save_session() locked_save_json(save_path, self.project, ...)
shotcut save_session_state() locked_save_json(path, state, ...)
drawio save_session_state() locked_save_json(path, state, ...)
anygen save() locked_save_json(path, data, ...)

Supersedes PR #52 (anygen-only fix with the double-write and truncation bugs).


Latest update — single shared module (b976dbd)

Following reviewer feedback, the helper has been moved from 10 per-harness copies into a single shared namespace-package contribution:

shared/agent-harness/
└── cli_anything/
    └── utils/
        └── io.py   ← THE one copy of locked_save_json()

This is a new cli-anything-shared installable package that contributes cli_anything.utils to the existing cli_anything namespace (the same PEP 420 pattern every harness already uses). Each harness setup.py now lists it as a dependency, and every session.py imports from the single canonical path:

from cli_anything.utils.io import locked_save_json

Any future change to the locking logic touches one file instead of ten.

Net change for this commit: +38 / -343 (305 lines removed across the 10 deleted per-harness copies)


Test plan

  • cd shared/agent-harness && pip install -e . installs cli_anything.utils.io successfully
  • cd <harness>/agent-harness && pip install -e . && python3 -m pytest cli_anything/<harness>/tests/ -v passes for all 10 harnesses
  • Concurrent save: two processes calling save_session() simultaneously produce valid, non-empty JSON with no interleaving
  • First-save path: save_session() on a non-existent path creates a valid file
  • Windows / fallback path: write succeeds even when fcntl raises ImportError

Fixes HKUDS#51. When multiple CLI commands run concurrently against the same
project file, concurrent writes to session/project JSON could silently
overwrite each other due to missing file locking.

Adds exclusive fcntl.flock() locking inside every save method across
all 10 harnesses that write JSON state to disk:
- gimp, blender, inkscape, audacity, libreoffice, obs-studio, kdenlive:
  save_session() in core/session.py
- shotcut, drawio: save_session_state() in core/session.py
- anygen: save() in core/session.py

A try/except (ImportError, OSError) fallback ensures the write still
proceeds on Windows (no fcntl) or unsupported filesystems. The lock is
explicitly released after the write completes.

PR HKUDS#52 addressed this for anygen only; this commit extends the same
protection to all remaining harnesses.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings March 14, 2026 15:41
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR aims to prevent concurrent CLI commands from corrupting or losing harness session/project JSON by adding fcntl.flock(LOCK_EX) around JSON save operations across all harnesses.

Changes:

  • Add an exclusive fcntl.flock() around json.dump(...) in each harness’s session save method.
  • Add a fallback to write without locking when fcntl is unavailable or locking fails.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 18 comments.

Show a summary per file
File Description
anygen/agent-harness/cli_anything/anygen/core/session.py Add flock-based locking to Session.save() JSON writes
audacity/agent-harness/cli_anything/audacity/core/session.py Add flock-based locking to save_session() JSON writes
blender/agent-harness/cli_anything/blender/core/session.py Add flock-based locking to save_session() JSON writes
drawio/agent-harness/cli_anything/drawio/core/session.py Add flock-based locking to save_session_state() JSON writes
gimp/agent-harness/cli_anything/gimp/core/session.py Add flock-based locking to save_session() JSON writes
inkscape/agent-harness/cli_anything/inkscape/core/session.py Add flock-based locking to save_session() JSON writes
kdenlive/agent-harness/cli_anything/kdenlive/core/session.py Add flock-based locking to save_session() JSON writes
libreoffice/agent-harness/cli_anything/libreoffice/core/session.py Add flock-based locking to save_session() JSON writes
obs-studio/agent-harness/cli_anything/obs_studio/core/session.py Add flock-based locking to save_session() JSON writes
shotcut/agent-harness/cli_anything/shotcut/core/session.py Add flock-based locking to save_session_state() JSON writes

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

Comment on lines +105 to +109
with open(path, "w") as f:
json.dump(data, f, indent=2, default=str)
try:
import fcntl
fcntl.flock(f.fileno(), fcntl.LOCK_EX)
json.dump(data, f, indent=2, default=str)
Comment on lines +124 to +130
try:
import fcntl
fcntl.flock(f.fileno(), fcntl.LOCK_EX)
json.dump(self.project, f, indent=2, default=str)
fcntl.flock(f.fileno(), fcntl.LOCK_UN)
except (ImportError, OSError):
json.dump(self.project, f, indent=2, default=str)
Comment on lines +114 to +118
with open(save_path, "w") as f:
json.dump(self.project, f, indent=2, default=str)
try:
import fcntl
fcntl.flock(f.fileno(), fcntl.LOCK_EX)
json.dump(self.project, f, indent=2, default=str)
Comment on lines +121 to +125
with open(save_path, "w") as f:
json.dump(self.project, f, indent=2, default=str)
try:
import fcntl
fcntl.flock(f.fileno(), fcntl.LOCK_EX)
json.dump(self.project, f, indent=2, default=str)
Comment on lines +106 to +112
try:
import fcntl
fcntl.flock(f.fileno(), fcntl.LOCK_EX)
json.dump(data, f, indent=2, default=str)
fcntl.flock(f.fileno(), fcntl.LOCK_UN)
except (ImportError, OSError):
json.dump(data, f, indent=2, default=str)
Comment on lines +112 to +116
with open(save_path, "w") as f:
json.dump(self.project, f, indent=2, default=str)
try:
import fcntl
fcntl.flock(f.fileno(), fcntl.LOCK_EX)
json.dump(self.project, f, indent=2, default=str)
Comment on lines +117 to +121
fcntl.flock(f.fileno(), fcntl.LOCK_EX)
json.dump(self.project, f, indent=2, default=str)
fcntl.flock(f.fileno(), fcntl.LOCK_UN)
except (ImportError, OSError):
json.dump(self.project, f, indent=2, default=str)
Comment on lines +114 to +118
with open(save_path, "w") as f:
json.dump(self.project, f, indent=2, default=str)
try:
import fcntl
fcntl.flock(f.fileno(), fcntl.LOCK_EX)
json.dump(self.project, f, indent=2, default=str)
Comment on lines +124 to +128
fcntl.flock(f.fileno(), fcntl.LOCK_EX)
json.dump(self.project, f, indent=2, default=str)
fcntl.flock(f.fileno(), fcntl.LOCK_UN)
except (ImportError, OSError):
json.dump(self.project, f, indent=2, default=str)
Comment on lines +115 to +121
try:
import fcntl
fcntl.flock(f.fileno(), fcntl.LOCK_EX)
json.dump(self.project, f, indent=2, default=str)
fcntl.flock(f.fileno(), fcntl.LOCK_UN)
except (ImportError, OSError):
json.dump(self.project, f, indent=2, default=str)
@sehawq
Copy link
Collaborator

sehawq commented Mar 14, 2026

Technical concern: In multiple files the code calls json.dump(...) before acquiring the lock and then calls
json.dump(...) again after flock(). That results in two JSON blobs in the same file (invalid JSON), and the
first write is unprotected. Also open(..., "w") truncates the file before the lock is taken, so concurrent
writers can still clobber each other.

Recommended pattern:

  • open file
  • acquire LOCK_EX
  • seek(0) + truncate()
  • write once
  • unlock in finally
  • fallback to non-locking write only if fcntl is unavailable

Example:

with open(path, "w") as f:
    try:
        import fcntl
        fcntl.flock(f.fileno(), fcntl.LOCK_EX)
        f.seek(0); f.truncate()
        json.dump(data, f, indent=2, default=str)
    finally:
        fcntl.flock(f.fileno(), fcntl.LOCK_UN)
except (ImportError, OSError):
    json.dump(data, f, indent=2, default=str)

The previous attempt (on this branch) placed json.dump() and flock(LOCK_UN)
inside the same try block as flock(LOCK_EX). This introduced three bugs:

1. Double write: if json.dump raised OSError (e.g. disk full), the except
   clause would call json.dump a second time on a partially-written file,
   producing invalid JSON.

2. Lock not in finally: a crash between LOCK_EX and LOCK_UN left the
   exclusive lock held forever, deadlocking any subsequent writer.

3. seek/truncate outside lock: open("w") truncates the file before the
   lock is acquired, so concurrent writers could clobber each other's
   already-written data during the window between open() and flock().

Fix — separate lock acquisition from the write in every save method:

  with open(path, "w") as f:
      _locked = False
      try:
          import fcntl
          fcntl.flock(f.fileno(), fcntl.LOCK_EX)
          _locked = True
      except (ImportError, OSError):
          pass          # Windows or unsupported FS — proceed unlocked
      try:
          f.seek(0)
          f.truncate()  # truncate INSIDE the lock
          json.dump(data, f, ...)
      finally:
          if _locked:
              fcntl.flock(f.fileno(), fcntl.LOCK_UN)

This ensures: json.dump is called exactly once; LOCK_UN is always reached
via finally; seek+truncate happen while the lock is held; and the fallback
for non-fcntl platforms requires no extra code path.

Applied to all 10 harnesses: gimp, blender, inkscape, audacity, libreoffice,
obs-studio, kdenlive, shotcut, drawio, anygen.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@sanjayrohith sanjayrohith changed the title fix: add fcntl file locking to all harness session save methods fix: correct fcntl file-locking across all 10 harnesses (closes #51) Mar 14, 2026
@sanjayrohith
Copy link
Author

Technical concern: In multiple files the code calls json.dump(...) before acquiring the lock and then calls json.dump(...) again after flock(). That results in two JSON blobs in the same file (invalid JSON), and the first write is unprotected. Also open(..., "w") truncates the file before the lock is taken, so concurrent writers can still clobber each other.

Recommended pattern:

* open file

* acquire `LOCK_EX`

* `seek(0)` + `truncate()`

* write **once**

* unlock in `finally`

* fallback to non-locking write only if `fcntl` is unavailable

Example:

with open(path, "w") as f:
    try:
        import fcntl
        fcntl.flock(f.fileno(), fcntl.LOCK_EX)
        f.seek(0); f.truncate()
        json.dump(data, f, indent=2, default=str)
    finally:
        fcntl.flock(f.fileno(), fcntl.LOCK_UN)
except (ImportError, OSError):
    json.dump(data, f, indent=2, default=str)

i have fixed the Technical concern you have mentioned and i have updated the PR

@sehawq
Copy link
Collaborator

sehawq commented Mar 14, 2026

Thanks for the update! This fixes the “double write”, but there’s still one race:

open(path, "w") truncates the file before the lock is taken.
So concurrent writers can still clobber each other even if we flock after opening.

Suggestion: open in r+ (or a+) to avoid pre-lock truncation, then LOCK_EX, then seek(0)+truncate()+json.dump() once, and unlock in finally.

The previous iteration still used open(path, "w") which truncates the
file at open() time — before flock() is called. A concurrent process
doing open("w") on the same path can zero out the file even while
another process holds the exclusive lock, because OS-level truncation
is not gated on advisory flock locks.

Fix: open with "r+" (no truncation, read-write) and fall back to "w"
only for the first save when the file does not yet exist. The
seek(0)+truncate() that clears stale content now happens *inside* the
lock, so cooperative writers can never clobber each other's committed
data.

    try:
        f = open(path, "r+")   # no truncation; raises if file absent
    except FileNotFoundError:
        f = open(path, "w")    # first save — file creation only
    with f:
        _locked = False
        try:
            import fcntl
            fcntl.flock(f.fileno(), fcntl.LOCK_EX)
            _locked = True
        except (ImportError, OSError):
            pass
        try:
            f.seek(0)
            f.truncate()       # truncate INSIDE the lock
            json.dump(data, f, indent=2, default=str)
        finally:
            if _locked:
                fcntl.flock(f.fileno(), fcntl.LOCK_UN)

Applied to all 10 harnesses: gimp, blender, inkscape, audacity,
libreoffice, obs-studio, kdenlive, shotcut, drawio, anygen.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings March 14, 2026 16:51
@sanjayrohith sanjayrohith changed the title fix: correct fcntl file-locking across all 10 harnesses (closes #51) fix: eliminate all race conditions in session file saves across all 10 harnesses (closes #51) Mar 14, 2026
@sanjayrohith
Copy link
Author

Thanks for the update! This fixes the “double write”, but there’s still one race:

open(path, "w") truncates the file before the lock is taken. So concurrent writers can still clobber each other even if we flock after opening.

Suggestion: open in r+ (or a+) to avoid pre-lock truncation, then LOCK_EX, then seek(0)+truncate()+json.dump() once, and unlock in finally.

Thanks for the continued review — you're absolutely right.

open("w") truncates before flock() is called, so a concurrent writer doing open("w") on the same path can zero out the file even while another process holds the exclusive lock. The lock never protected the truncation, only the write.

Fixed in the latest commit (7223836): switched to open("r+") (no truncation) with a FileNotFoundError fallback to "w" for the first save when the file doesn't exist yet. The seek(0) + truncate() now happens inside the lock across all 10 harnesses:

try:
    f = open(path, "r+")        # no truncation on open
except FileNotFoundError:
    f = open(path, "w")         # first save only — file is empty, nothing to lose
with f:
    _locked = False
    try:
        import fcntl
        fcntl.flock(f.fileno(), fcntl.LOCK_EX)
        _locked = True
    except (ImportError, OSError):
        pass
    try:
        f.seek(0)
        f.truncate()            # truncation now inside the lock
        json.dump(data, f, indent=2, default=str)
    finally:
        if _locked:
            fcntl.flock(f.fileno(), fcntl.LOCK_UN)

The PR description has also been updated to document both rounds of fixes.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR aims to fix race conditions and file corruption risks during concurrent session/project state saves across multiple harnesses by switching from open(..., "w") to a “lock → seek(0)/truncate() → json.dump() → unlock” pattern (with a best-effort fcntl fallback).

Changes:

  • Update session save routines to open existing files with r+ and truncate only after acquiring an exclusive lock.
  • Ensure lock acquisition is separated from the JSON write and that unlock happens in a finally block.
  • Apply the same locking/write pattern across 10 harness session implementations.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 10 comments.

Show a summary per file
File Description
shotcut/agent-harness/cli_anything/shotcut/core/session.py Updates save_session_state() to lock before truncating/writing JSON.
obs-studio/agent-harness/cli_anything/obs_studio/core/session.py Updates save_session() to lock before truncating/writing project JSON.
libreoffice/agent-harness/cli_anything/libreoffice/core/session.py Updates save_session() to lock before truncating/writing project JSON.
kdenlive/agent-harness/cli_anything/kdenlive/core/session.py Updates save_session() to lock before truncating/writing project JSON.
inkscape/agent-harness/cli_anything/inkscape/core/session.py Updates save_session() to lock before truncating/writing project JSON.
gimp/agent-harness/cli_anything/gimp/core/session.py Updates save_session() to lock before truncating/writing project JSON.
drawio/agent-harness/cli_anything/drawio/core/session.py Updates save_session_state() to lock before truncating/writing JSON.
blender/agent-harness/cli_anything/blender/core/session.py Updates save_session() to lock before truncating/writing project JSON.
audacity/agent-harness/cli_anything/audacity/core/session.py Updates save_session() to lock before truncating/writing project JSON.
anygen/agent-harness/cli_anything/anygen/core/session.py Updates save() to lock before truncating/writing session JSON.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

try:
f.seek(0)
f.truncate()
json.dump(self.project, f, indent=2, default=str)
try:
f.seek(0)
f.truncate()
json.dump(self.project, f, indent=2, default=str)
try:
f.seek(0)
f.truncate()
json.dump(self.project, f, indent=2, default=str)
try:
f.seek(0)
f.truncate()
json.dump(self.project, f, indent=2, default=str)
f.truncate()
json.dump(state, f, indent=2)
finally:
if _locked:
f.truncate()
json.dump(state, f, indent=2)
finally:
if _locked:
Comment on lines +118 to +123
f.seek(0)
f.truncate()
json.dump(data, f, indent=2, default=str)
finally:
if _locked:
fcntl.flock(f.fileno(), fcntl.LOCK_UN)
try:
f.seek(0)
f.truncate()
json.dump(self.project, f, indent=2, default=str)
f.truncate()
json.dump(self.project, f, indent=2, default=str)
finally:
if _locked:
try:
f.seek(0)
f.truncate()
json.dump(self.project, f, indent=2, default=str)
@sehawq
Copy link
Collaborator

sehawq commented Mar 14, 2026

I rechecked the latest commit (7223836). The earlier locking issues are resolved:

  • Uses r+ to avoid pre-lock truncation (with w fallback on first create)
  • Acquires LOCK_EX before writing
  • Single write after seek(0) + truncate()

From a consistency standpoint this looks solid now. Thanks for iterating on this.

@sanjayrohith
Copy link
Author

I rechecked the latest commit (7223836). The earlier locking issues are resolved:

* Uses `r+` to avoid pre-lock truncation (with `w` fallback on first create)

* Acquires `LOCK_EX` before writing

* Single write after `seek(0) + truncate()`

From a consistency standpoint this looks solid now. Thanks for iterating on this.

Thanks for the thorough review — the back-and-forth really helped catch issues that would have been subtle to debug in a concurrent agentic workload. Happy to have landed on something solid.

If the maintainers are happy to merge, this will give all 10 harnesses consistent, race-free session saves. Let me know if there's anything else to address before it's merged.

@sehawq
Copy link
Collaborator

sehawq commented Mar 14, 2026

Thanks for the update. The latest commit looks solid to me — no further blockers on my side.

@yuh-yang
Copy link
Collaborator

Happy to merge. But my question here would be: even agents call multiple tools at the same time, will they ever call multiple save() functions at the same time??

@sanjayrohith @sehawq

@sehawq
Copy link
Collaborator

sehawq commented Mar 15, 2026

That’s a good question. In practice, concurrent save() calls can occur in the following situations:

  • If multiple agents or tool calls run in parallel on the same session directory,
  • If a single agent triggers conflicting tasks (e.g., background validation/testing + foreground command),
  • Or if an external automation system runs two CLI processes targeting the same project.

Although rare, the impact is significant when this occurs (corrupted/incomplete JSON).
The lock imposes minimal overhead and does not alter behavior in common single-writer scenarios,
making it a low-cost safeguard.

If you prefer a smaller change (which I would), I can refactor this into a shared safe_write_json() helper
and keep per-harness changes minimal. Happy to open a new PR if that’s preferred.
@yuh-yang

@sanjayrohith
Copy link
Author

Happy to merge. But my question here would be: even agents call multiple tools at the same time, will they ever call multiple save() functions at the same time??

@sanjayrohith @sehawq

The three scenarios @sehawq outlined are the exact motivation (parallel agent tool calls, background validation + foreground command, and multi-process automation). Even if rare today, it's the kind of failure that's hard to reproduce and debug when it does happen, so the lock is worth having.

On the helper idea — I agree it's the cleaner long-term shape. Rather than blocking this PR, I'd suggest:

Merge this PR as-is — it's a complete, correct fix for the race condition across all 10 harnesses
Follow up with a refactor PR that extracts the locking block into a shared _safe_write_json() helper — either in each session.py or a shared utility module — reducing the per-harness boilerplate to a single call
Happy to open that follow-up myself, or if @sehawq prefers to, that works too. Just want to make sure the correctness fix lands first.

@yuh-yang

@sehawq
Copy link
Collaborator

sehawq commented Mar 15, 2026

Thanks for the fix—the race condition issue has been resolved nicely. I suggest a small refactoring to make this PR cleaner and easier to maintain:

  1. The locking and writing logic is duplicated across approximately 10 test cases. Please extract this into a single helper function (e.g., core/session_io.py or utils/io.py) like locked_save_json(path, data).
  2. Maintain the exact safe sequence within the helper function:
    • open(path, “r+”) (fall back to “w” if not)
    • flock(LOCK_EX)
    • seek(0); truncate(); json.dump(...)
    • flush() + optional os.fsync()
    • finally: flock(LOCK_UN) and close

This way, each test suite calls only the locked_save_json(...) function; this reduces changes and centralizes future fixes.

Optional: fcntl is only available on Unix; if Windows support is important, consider using a platform wrapper or the portalocker alternative. After this adjustment, we can stay on track with our main goal, reduce the diff, and use cleaner code.

…harnesses

The locking logic (r+ open, LOCK_EX, seek+truncate, json.dump, flush,
LOCK_UN in finally) was duplicated inline across all 10 session.py files.
Extract it into a single locked_save_json(path, data, **dump_kwargs)
helper in each harness's utils/io.py, then replace every inline block
with a one-line call.

Benefits:
- Future fixes to the locking sequence touch one file per harness, not
  the entire save_session() body
- Each session.py save method is now a readable one-liner
- Adds f.flush() to ensure buffered writes reach the OS before unlock
- Uses relative imports (from ..utils.io) consistent with existing style

No behaviour change; the helper implements the identical safe sequence:
  open(r+) / FileNotFoundError fallback to open(w)
  → LOCK_EX (ImportError/OSError fallback for Windows / no-flock FS)
  → seek(0) + truncate()
  → json.dump(data, f, **dump_kwargs)
  → f.flush()
  → finally: LOCK_UN

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@sanjayrohith sanjayrohith changed the title fix: eliminate all race conditions in session file saves across all 10 harnesses (closes #51) fix+refactor: race-free session saves with shared locked_save_json() helper (closes #51) Mar 15, 2026
@sanjayrohith
Copy link
Author

Thanks for the fix—the race condition issue has been resolved nicely. I suggest a small refactoring to make this PR cleaner and easier to maintain:

1. The locking and writing logic is duplicated across approximately 10 test cases. Please extract this into a single helper function (e.g., `core/session_io.py` or `utils/io.py`) like `locked_save_json(path, data)`.

2. Maintain the exact safe sequence within the helper function:
   
   * `open(path, “r+”)` (fall back to `“w”` if not)
   * `flock(LOCK_EX)`
   * `seek(0); truncate(); json.dump(...)`
   * `flush()` + optional `os.fsync()`
   * `finally`: `flock(LOCK_UN)` and close

This way, each test suite calls only the locked_save_json(...) function; this reduces changes and centralizes future fixes.

Optional: fcntl is only available on Unix; if Windows support is important, consider using a platform wrapper or the portalocker alternative. After this adjustment, we can stay on track with our main goal, reduce the diff, and use cleaner code.

Done — implemented in the latest commit (03d1ad0).

Extracted locked_save_json(path, data, **dump_kwargs) into utils/io.py within each harness. The helper follows exactly the sequence you outlined:

open(r+) / FileNotFoundError → open(w)
→ flock(LOCK_EX)
→ seek(0) + truncate()
→ json.dump(data, f, **dump_kwargs)
→ f.flush()
→ finally: flock(LOCK_UN)

Each session.py save method is now a single line:

locked_save_json(save_path, self.project, indent=2, default=str)

Used relative imports (from ..utils.io import locked_save_json) to stay consistent with the existing style in shotcut and drawio. On Windows/unsupported filesystems the ImportError/OSError fallback ensures the write still proceeds unlocked.

os.fsync() was left out for now to avoid the performance overhead on every auto-save — happy to add it behind an optional flag if that's preferred.
@sehawq @yuh-yang

@sehawq
Copy link
Collaborator

sehawq commented Mar 15, 2026

Thanks for the update — this is already in good shape and I’m happy with the correctness fix.
If you choose to go one step further, here’s a clean follow‑up path that keeps behavior the same while reducing long‑term maintenance cost:

  1. Move locked_save_json() into a single shared module (e.g., core/io.py) so the helper isn’t duplicated across harnesses.
  2. Standardize the import path across all harnesses for consistency.
  3. Keep the helper signature stable: locked_save_json(path, data, **dump_kwargs).
  4. Preserve the current platform fallback behavior; optional portalocker can be a later enhancement.
  5. Consider an optional fsync=False flag if you ever need stronger durability.

These are purely for code organization and PR maintenance — they reduce boilerplate and make future fixes a one‑place change instead of 10 copies.

…ls.io module

Previously locked_save_json() lived in utils/io.py inside each of the 10
harnesses — identical code in 10 places. A bug in the locking logic would
require 10 co-ordinated edits.

Introduce a new shared namespace-package contribution at shared/agent-harness/
that registers cli_anything.utils.io into the existing cli_anything namespace.
The helper — and any future changes to it — now live in exactly one file:

    shared/agent-harness/cli_anything/utils/io.py

Each harness session.py now imports from the single shared location:

    from cli_anything.utils.io import locked_save_json

Each harness setup.py gains one dependency:

    "cli-anything-shared"

The 10 per-harness utils/io.py copies are removed (.gitignore updated to
track shared/agent-harness/ following the same whitelist pattern as every
other harness). The helper signature, safe-write sequence, and platform
fallback behaviour are unchanged.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings March 15, 2026 09:57
@sanjayrohith
Copy link
Author

Thanks for the update — this is already in good shape and I’m happy with the correctness fix. If you choose to go one step further, here’s a clean follow‑up path that keeps behavior the same while reducing long‑term maintenance cost:

1. Move `locked_save_json()` into a single shared module (e.g., `core/io.py`) so the helper isn’t duplicated across harnesses.

2. Standardize the import path across all harnesses for consistency.

3. Keep the helper signature stable: `locked_save_json(path, data, **dump_kwargs)`.

4. Preserve the current platform fallback behavior; optional `portalocker` can be a later enhancement.

5. Consider an optional `fsync=False` flag if you ever need stronger durability.

These are purely for code organization and PR maintenance — they reduce boilerplate and make future fixes a one‑place change instead of 10 copies.

Done — implemented in the latest commit (03d1ad0). Moved locked_save_json out of each harness's own utils/io.py into a single canonical location at shared/agent-harness/cli_anything/utils/io.py, installed as the cli-anything-shared namespace-package. All 10 harnesses now import from cli_anything.utils.io — one file to maintain, no duplication.
@sehawq @yuh-yang

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses multi-process race conditions during session/project JSON persistence by replacing direct open("w") + json.dump() writes with a shared locked_save_json() helper intended to write under an exclusive advisory file lock.

Changes:

  • Add a new cli-anything-shared harness package exposing cli_anything.utils.io.locked_save_json().
  • Update multiple harness session.py implementations to import and use locked_save_json() for JSON persistence.
  • Add cli-anything-shared as a dependency in each affected harness and update .gitignore to include the new shared/ directory.

Reviewed changes

Copilot reviewed 23 out of 24 changed files in this pull request and generated 20 comments.

Show a summary per file
File Description
shotcut/agent-harness/setup.py Adds cli-anything-shared dependency for shared I/O helper.
shotcut/agent-harness/cli_anything/shotcut/core/session.py Switches session-state JSON save to locked_save_json().
shared/agent-harness/setup.py Introduces the new cli-anything-shared Python distribution.
shared/agent-harness/cli_anything/utils/io.py Adds locked_save_json() implementation (locking + truncate + dump).
shared/agent-harness/cli_anything/utils/init.py Creates cli_anything.utils package for shared utilities.
obs-studio/agent-harness/setup.py Adds cli-anything-shared dependency for shared I/O helper.
obs-studio/agent-harness/cli_anything/obs_studio/core/session.py Switches project save to locked_save_json().
libreoffice/agent-harness/setup.py Adds cli-anything-shared dependency for shared I/O helper.
libreoffice/agent-harness/cli_anything/libreoffice/core/session.py Switches project save to locked_save_json().
kdenlive/agent-harness/setup.py Adds cli-anything-shared dependency for shared I/O helper.
kdenlive/agent-harness/cli_anything/kdenlive/core/session.py Switches project save to locked_save_json().
inkscape/agent-harness/setup.py Adds cli-anything-shared dependency for shared I/O helper.
inkscape/agent-harness/cli_anything/inkscape/core/session.py Switches project save to locked_save_json().
gimp/agent-harness/setup.py Adds cli-anything-shared dependency for shared I/O helper.
gimp/agent-harness/cli_anything/gimp/core/session.py Switches project save to locked_save_json().
drawio/agent-harness/setup.py Adds cli-anything-shared dependency for shared I/O helper.
drawio/agent-harness/cli_anything/drawio/core/session.py Switches session-state JSON save to locked_save_json().
blender/agent-harness/setup.py Adds cli-anything-shared dependency for shared I/O helper.
blender/agent-harness/cli_anything/blender/core/session.py Switches project save to locked_save_json().
audacity/agent-harness/setup.py Adds cli-anything-shared dependency for shared I/O helper.
audacity/agent-harness/cli_anything/audacity/core/session.py Switches project save to locked_save_json().
anygen/agent-harness/setup.py Adds cli-anything-shared dependency for shared I/O helper.
anygen/agent-harness/cli_anything/anygen/core/session.py Switches auto-save JSON write to locked_save_json().
.gitignore Ensures shared/agent-harness/ is tracked under the repo’s ignore rules.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

from dataclasses import dataclass, field
from datetime import datetime, timezone
from pathlib import Path
from cli_anything.utils.io import locked_save_json
from lxml import etree

from ..utils import mlt_xml
from cli_anything.utils.io import locked_save_json
Comment on lines 3 to +8
import json
import os
import copy
from typing import Dict, Any, Optional, List
from datetime import datetime
from cli_anything.utils.io import locked_save_json
Comment on lines 7 to +12
import json
import os
import copy
from typing import Dict, Any, Optional, List
from datetime import datetime
from cli_anything.utils.io import locked_save_json
Comment on lines +8 to 10
from cli_anything.utils.io import locked_save_json


@sehawq
Copy link
Collaborator

sehawq commented Mar 15, 2026

Awesome, thanks for doing this cleanup. Centralizing locked_save_json into the shared cli_anything.utils.io is exactly the right long‑term shape — one file to maintain and no duplication across harnesses. This looks good to merge from my side. @yuh-yang

@yuh-yang
Copy link
Collaborator

I do appreciate this well-maintained PR with detailed discussions from @sehawq and @sanjayrohith . While I think the current version does look great, I'm hesitant on whether we should introduce this shared cli-anything-utils thing, publishing it and let people install it. My concerns may come from:

  • people use cli-anything for various use-cases. The current solution only works for some professional softwares with the concept of sessions, and where they can be racing.
  • our current way of publishing those cli packages are already very distributed. Introducing a new cli-anything-utils may further add to the complexity.
  • I'm not sure if there'd be times we'd be able to add more things to cli-anything-utils? Also, people only use cli-anything for one single software to build their own harness publishable package. Coding agents would find it hard to judge which functions should be in cli-anything-utils. And it makes that people must clone and update(commit) this dir every time. :(

What are other people's thoughts? Would be great to hear!

@sanjayrohith
Copy link
Author

I do appreciate this well-maintained PR with detailed discussions from @sehawq and @sanjayrohith . While I think the current version does look great, I'm hesitant on whether we should introduce this shared cli-anything-utils thing, publishing it and let people install it. My concerns may come from:

* people use cli-anything for various use-cases. The current solution only works for some professional softwares with the concept of sessions, and where they can be racing.

* our current way of publishing those cli packages are already very distributed. Introducing a new cli-anything-utils may further add to the complexity.

* I'm not sure if there'd be times we'd be able to add more things to cli-anything-utils? Also, people only use cli-anything for one single software to build their own harness publishable package. Coding agents would find it hard to judge which functions should be in cli-anything-utils. And it makes that people must clone and update(commit) this dir every time. :(

What are other people's thoughts? Would be great to hear!

all three concerns are completely valid.

You're right that cli-anything-utils is premature. I introduced it in the last refactor to avoid 10 identical copies of the helper, but the trade-off (new installable package, new dependency, generator confusion) isn't worth it.

Proposed resolution: drop the shared module entirely.

The 25-line locked_save_json helper can live directly inside each harness's session.py as a module-level private function. No new package, no extra setup.py dependency, no directory to clone. The three correctness bugs it fixes stay fixed:

Bug | Fix -- | -- open("w") truncates before lock is held | r+ open, truncate only inside the lock LOCK_UN not in finally | always-runs finally block json.dump called twice on OSError | single write path

Each harness becomes self-contained again — exactly the current model. The only change visible in each session.py is that the old bare open("w") + json.dump() is replaced by a one-liner call to the local helper.

I'll update the branch to revert the shared module and inline the function in all 10 session files. Would that approach work for you?

@yuh-yang

@sehawq
Copy link
Collaborator

sehawq commented Mar 15, 2026

Thank you for bringing this up—these are entirely valid concerns.

I agree that releasing and maintaining a shared cli-anything-utils package could complicate things, especially if most users are only using a single harness. To keep things simple, I’d suggest the following:

  • We could keep the utility locally for each harness (as was done initially) to avoid creating a new package.
  • Or, if we want to reduce dependencies without a published package, we can keep a single utility under shared/agent-harness, but integrate it into each harness during the build (no extra installation step required for users).

The primary goal is to safely resolve the race condition; the shared package is optional. I’m not taking a stance on whether this PR should be merged; my goal was only to improve correctness and maintainability within its scope. I prefer to follow the maintainer’s decision.
@yuh-yang

@yuh-yang
Copy link
Collaborator

I merged #94 as a cleaned version of this PR.

@yuh-yang yuh-yang closed this Mar 17, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Race conditions in multi-session project state syncing

4 participants