fix+refactor: race-free session saves with shared locked_save_json() helper (closes #51)#73
Conversation
Fixes HKUDS#51. When multiple CLI commands run concurrently against the same project file, concurrent writes to session/project JSON could silently overwrite each other due to missing file locking. Adds exclusive fcntl.flock() locking inside every save method across all 10 harnesses that write JSON state to disk: - gimp, blender, inkscape, audacity, libreoffice, obs-studio, kdenlive: save_session() in core/session.py - shotcut, drawio: save_session_state() in core/session.py - anygen: save() in core/session.py A try/except (ImportError, OSError) fallback ensures the write still proceeds on Windows (no fcntl) or unsupported filesystems. The lock is explicitly released after the write completes. PR HKUDS#52 addressed this for anygen only; this commit extends the same protection to all remaining harnesses. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
This PR aims to prevent concurrent CLI commands from corrupting or losing harness session/project JSON by adding fcntl.flock(LOCK_EX) around JSON save operations across all harnesses.
Changes:
- Add an exclusive
fcntl.flock()aroundjson.dump(...)in each harness’s session save method. - Add a fallback to write without locking when
fcntlis unavailable or locking fails.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 18 comments.
Show a summary per file
| File | Description |
|---|---|
| anygen/agent-harness/cli_anything/anygen/core/session.py | Add flock-based locking to Session.save() JSON writes |
| audacity/agent-harness/cli_anything/audacity/core/session.py | Add flock-based locking to save_session() JSON writes |
| blender/agent-harness/cli_anything/blender/core/session.py | Add flock-based locking to save_session() JSON writes |
| drawio/agent-harness/cli_anything/drawio/core/session.py | Add flock-based locking to save_session_state() JSON writes |
| gimp/agent-harness/cli_anything/gimp/core/session.py | Add flock-based locking to save_session() JSON writes |
| inkscape/agent-harness/cli_anything/inkscape/core/session.py | Add flock-based locking to save_session() JSON writes |
| kdenlive/agent-harness/cli_anything/kdenlive/core/session.py | Add flock-based locking to save_session() JSON writes |
| libreoffice/agent-harness/cli_anything/libreoffice/core/session.py | Add flock-based locking to save_session() JSON writes |
| obs-studio/agent-harness/cli_anything/obs_studio/core/session.py | Add flock-based locking to save_session() JSON writes |
| shotcut/agent-harness/cli_anything/shotcut/core/session.py | Add flock-based locking to save_session_state() JSON writes |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
| with open(path, "w") as f: | ||
| json.dump(data, f, indent=2, default=str) | ||
| try: | ||
| import fcntl | ||
| fcntl.flock(f.fileno(), fcntl.LOCK_EX) | ||
| json.dump(data, f, indent=2, default=str) |
| try: | ||
| import fcntl | ||
| fcntl.flock(f.fileno(), fcntl.LOCK_EX) | ||
| json.dump(self.project, f, indent=2, default=str) | ||
| fcntl.flock(f.fileno(), fcntl.LOCK_UN) | ||
| except (ImportError, OSError): | ||
| json.dump(self.project, f, indent=2, default=str) |
| with open(save_path, "w") as f: | ||
| json.dump(self.project, f, indent=2, default=str) | ||
| try: | ||
| import fcntl | ||
| fcntl.flock(f.fileno(), fcntl.LOCK_EX) | ||
| json.dump(self.project, f, indent=2, default=str) |
| with open(save_path, "w") as f: | ||
| json.dump(self.project, f, indent=2, default=str) | ||
| try: | ||
| import fcntl | ||
| fcntl.flock(f.fileno(), fcntl.LOCK_EX) | ||
| json.dump(self.project, f, indent=2, default=str) |
| try: | ||
| import fcntl | ||
| fcntl.flock(f.fileno(), fcntl.LOCK_EX) | ||
| json.dump(data, f, indent=2, default=str) | ||
| fcntl.flock(f.fileno(), fcntl.LOCK_UN) | ||
| except (ImportError, OSError): | ||
| json.dump(data, f, indent=2, default=str) |
| with open(save_path, "w") as f: | ||
| json.dump(self.project, f, indent=2, default=str) | ||
| try: | ||
| import fcntl | ||
| fcntl.flock(f.fileno(), fcntl.LOCK_EX) | ||
| json.dump(self.project, f, indent=2, default=str) |
| fcntl.flock(f.fileno(), fcntl.LOCK_EX) | ||
| json.dump(self.project, f, indent=2, default=str) | ||
| fcntl.flock(f.fileno(), fcntl.LOCK_UN) | ||
| except (ImportError, OSError): | ||
| json.dump(self.project, f, indent=2, default=str) |
| with open(save_path, "w") as f: | ||
| json.dump(self.project, f, indent=2, default=str) | ||
| try: | ||
| import fcntl | ||
| fcntl.flock(f.fileno(), fcntl.LOCK_EX) | ||
| json.dump(self.project, f, indent=2, default=str) |
| fcntl.flock(f.fileno(), fcntl.LOCK_EX) | ||
| json.dump(self.project, f, indent=2, default=str) | ||
| fcntl.flock(f.fileno(), fcntl.LOCK_UN) | ||
| except (ImportError, OSError): | ||
| json.dump(self.project, f, indent=2, default=str) |
| try: | ||
| import fcntl | ||
| fcntl.flock(f.fileno(), fcntl.LOCK_EX) | ||
| json.dump(self.project, f, indent=2, default=str) | ||
| fcntl.flock(f.fileno(), fcntl.LOCK_UN) | ||
| except (ImportError, OSError): | ||
| json.dump(self.project, f, indent=2, default=str) |
|
Technical concern: In multiple files the code calls Recommended pattern:
Example: with open(path, "w") as f:
try:
import fcntl
fcntl.flock(f.fileno(), fcntl.LOCK_EX)
f.seek(0); f.truncate()
json.dump(data, f, indent=2, default=str)
finally:
fcntl.flock(f.fileno(), fcntl.LOCK_UN)
except (ImportError, OSError):
json.dump(data, f, indent=2, default=str) |
The previous attempt (on this branch) placed json.dump() and flock(LOCK_UN)
inside the same try block as flock(LOCK_EX). This introduced three bugs:
1. Double write: if json.dump raised OSError (e.g. disk full), the except
clause would call json.dump a second time on a partially-written file,
producing invalid JSON.
2. Lock not in finally: a crash between LOCK_EX and LOCK_UN left the
exclusive lock held forever, deadlocking any subsequent writer.
3. seek/truncate outside lock: open("w") truncates the file before the
lock is acquired, so concurrent writers could clobber each other's
already-written data during the window between open() and flock().
Fix — separate lock acquisition from the write in every save method:
with open(path, "w") as f:
_locked = False
try:
import fcntl
fcntl.flock(f.fileno(), fcntl.LOCK_EX)
_locked = True
except (ImportError, OSError):
pass # Windows or unsupported FS — proceed unlocked
try:
f.seek(0)
f.truncate() # truncate INSIDE the lock
json.dump(data, f, ...)
finally:
if _locked:
fcntl.flock(f.fileno(), fcntl.LOCK_UN)
This ensures: json.dump is called exactly once; LOCK_UN is always reached
via finally; seek+truncate happen while the lock is held; and the fallback
for non-fcntl platforms requires no extra code path.
Applied to all 10 harnesses: gimp, blender, inkscape, audacity, libreoffice,
obs-studio, kdenlive, shotcut, drawio, anygen.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
i have fixed the Technical concern you have mentioned and i have updated the PR |
|
Thanks for the update! This fixes the “double write”, but there’s still one race:
Suggestion: open in |
The previous iteration still used open(path, "w") which truncates the
file at open() time — before flock() is called. A concurrent process
doing open("w") on the same path can zero out the file even while
another process holds the exclusive lock, because OS-level truncation
is not gated on advisory flock locks.
Fix: open with "r+" (no truncation, read-write) and fall back to "w"
only for the first save when the file does not yet exist. The
seek(0)+truncate() that clears stale content now happens *inside* the
lock, so cooperative writers can never clobber each other's committed
data.
try:
f = open(path, "r+") # no truncation; raises if file absent
except FileNotFoundError:
f = open(path, "w") # first save — file creation only
with f:
_locked = False
try:
import fcntl
fcntl.flock(f.fileno(), fcntl.LOCK_EX)
_locked = True
except (ImportError, OSError):
pass
try:
f.seek(0)
f.truncate() # truncate INSIDE the lock
json.dump(data, f, indent=2, default=str)
finally:
if _locked:
fcntl.flock(f.fileno(), fcntl.LOCK_UN)
Applied to all 10 harnesses: gimp, blender, inkscape, audacity,
libreoffice, obs-studio, kdenlive, shotcut, drawio, anygen.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Thanks for the continued review — you're absolutely right. open("w") truncates before flock() is called, so a concurrent writer doing open("w") on the same path can zero out the file even while another process holds the exclusive lock. The lock never protected the truncation, only the write. Fixed in the latest commit (7223836): switched to open("r+") (no truncation) with a FileNotFoundError fallback to "w" for the first save when the file doesn't exist yet. The seek(0) + truncate() now happens inside the lock across all 10 harnesses: The PR description has also been updated to document both rounds of fixes. |
There was a problem hiding this comment.
Pull request overview
This PR aims to fix race conditions and file corruption risks during concurrent session/project state saves across multiple harnesses by switching from open(..., "w") to a “lock → seek(0)/truncate() → json.dump() → unlock” pattern (with a best-effort fcntl fallback).
Changes:
- Update session save routines to open existing files with
r+and truncate only after acquiring an exclusive lock. - Ensure lock acquisition is separated from the JSON write and that unlock happens in a
finallyblock. - Apply the same locking/write pattern across 10 harness session implementations.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 10 comments.
Show a summary per file
| File | Description |
|---|---|
| shotcut/agent-harness/cli_anything/shotcut/core/session.py | Updates save_session_state() to lock before truncating/writing JSON. |
| obs-studio/agent-harness/cli_anything/obs_studio/core/session.py | Updates save_session() to lock before truncating/writing project JSON. |
| libreoffice/agent-harness/cli_anything/libreoffice/core/session.py | Updates save_session() to lock before truncating/writing project JSON. |
| kdenlive/agent-harness/cli_anything/kdenlive/core/session.py | Updates save_session() to lock before truncating/writing project JSON. |
| inkscape/agent-harness/cli_anything/inkscape/core/session.py | Updates save_session() to lock before truncating/writing project JSON. |
| gimp/agent-harness/cli_anything/gimp/core/session.py | Updates save_session() to lock before truncating/writing project JSON. |
| drawio/agent-harness/cli_anything/drawio/core/session.py | Updates save_session_state() to lock before truncating/writing JSON. |
| blender/agent-harness/cli_anything/blender/core/session.py | Updates save_session() to lock before truncating/writing project JSON. |
| audacity/agent-harness/cli_anything/audacity/core/session.py | Updates save_session() to lock before truncating/writing project JSON. |
| anygen/agent-harness/cli_anything/anygen/core/session.py | Updates save() to lock before truncating/writing session JSON. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
| try: | ||
| f.seek(0) | ||
| f.truncate() | ||
| json.dump(self.project, f, indent=2, default=str) |
| try: | ||
| f.seek(0) | ||
| f.truncate() | ||
| json.dump(self.project, f, indent=2, default=str) |
| try: | ||
| f.seek(0) | ||
| f.truncate() | ||
| json.dump(self.project, f, indent=2, default=str) |
| try: | ||
| f.seek(0) | ||
| f.truncate() | ||
| json.dump(self.project, f, indent=2, default=str) |
| f.truncate() | ||
| json.dump(state, f, indent=2) | ||
| finally: | ||
| if _locked: |
| f.truncate() | ||
| json.dump(state, f, indent=2) | ||
| finally: | ||
| if _locked: |
| f.seek(0) | ||
| f.truncate() | ||
| json.dump(data, f, indent=2, default=str) | ||
| finally: | ||
| if _locked: | ||
| fcntl.flock(f.fileno(), fcntl.LOCK_UN) |
| try: | ||
| f.seek(0) | ||
| f.truncate() | ||
| json.dump(self.project, f, indent=2, default=str) |
| f.truncate() | ||
| json.dump(self.project, f, indent=2, default=str) | ||
| finally: | ||
| if _locked: |
| try: | ||
| f.seek(0) | ||
| f.truncate() | ||
| json.dump(self.project, f, indent=2, default=str) |
|
I rechecked the latest commit (
From a consistency standpoint this looks solid now. Thanks for iterating on this. |
Thanks for the thorough review — the back-and-forth really helped catch issues that would have been subtle to debug in a concurrent agentic workload. Happy to have landed on something solid. If the maintainers are happy to merge, this will give all 10 harnesses consistent, race-free session saves. Let me know if there's anything else to address before it's merged. |
|
Thanks for the update. The latest commit looks solid to me — no further blockers on my side. |
|
Happy to merge. But my question here would be: even agents call multiple tools at the same time, will they ever call multiple save() functions at the same time?? |
|
That’s a good question. In practice, concurrent
Although rare, the impact is significant when this occurs (corrupted/incomplete JSON). If you prefer a smaller change (which I would), I can refactor this into a shared |
The three scenarios @sehawq outlined are the exact motivation (parallel agent tool calls, background validation + foreground command, and multi-process automation). Even if rare today, it's the kind of failure that's hard to reproduce and debug when it does happen, so the lock is worth having. On the helper idea — I agree it's the cleaner long-term shape. Rather than blocking this PR, I'd suggest: Merge this PR as-is — it's a complete, correct fix for the race condition across all 10 harnesses |
|
Thanks for the fix—the race condition issue has been resolved nicely. I suggest a small refactoring to make this PR cleaner and easier to maintain:
This way, each test suite calls only the Optional: |
…harnesses The locking logic (r+ open, LOCK_EX, seek+truncate, json.dump, flush, LOCK_UN in finally) was duplicated inline across all 10 session.py files. Extract it into a single locked_save_json(path, data, **dump_kwargs) helper in each harness's utils/io.py, then replace every inline block with a one-line call. Benefits: - Future fixes to the locking sequence touch one file per harness, not the entire save_session() body - Each session.py save method is now a readable one-liner - Adds f.flush() to ensure buffered writes reach the OS before unlock - Uses relative imports (from ..utils.io) consistent with existing style No behaviour change; the helper implements the identical safe sequence: open(r+) / FileNotFoundError fallback to open(w) → LOCK_EX (ImportError/OSError fallback for Windows / no-flock FS) → seek(0) + truncate() → json.dump(data, f, **dump_kwargs) → f.flush() → finally: LOCK_UN Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Done — implemented in the latest commit (03d1ad0). Extracted locked_save_json(path, data, **dump_kwargs) into utils/io.py within each harness. The helper follows exactly the sequence you outlined: Each session.py save method is now a single line: Used relative imports (from ..utils.io import locked_save_json) to stay consistent with the existing style in shotcut and drawio. On Windows/unsupported filesystems the ImportError/OSError fallback ensures the write still proceeds unlocked. os.fsync() was left out for now to avoid the performance overhead on every auto-save — happy to add it behind an optional flag if that's preferred. |
|
Thanks for the update — this is already in good shape and I’m happy with the correctness fix.
These are purely for code organization and PR maintenance — they reduce boilerplate and make future fixes a one‑place change instead of 10 copies. |
…ls.io module
Previously locked_save_json() lived in utils/io.py inside each of the 10
harnesses — identical code in 10 places. A bug in the locking logic would
require 10 co-ordinated edits.
Introduce a new shared namespace-package contribution at shared/agent-harness/
that registers cli_anything.utils.io into the existing cli_anything namespace.
The helper — and any future changes to it — now live in exactly one file:
shared/agent-harness/cli_anything/utils/io.py
Each harness session.py now imports from the single shared location:
from cli_anything.utils.io import locked_save_json
Each harness setup.py gains one dependency:
"cli-anything-shared"
The 10 per-harness utils/io.py copies are removed (.gitignore updated to
track shared/agent-harness/ following the same whitelist pattern as every
other harness). The helper signature, safe-write sequence, and platform
fallback behaviour are unchanged.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Done — implemented in the latest commit (03d1ad0). Moved locked_save_json out of each harness's own utils/io.py into a single canonical location at shared/agent-harness/cli_anything/utils/io.py, installed as the cli-anything-shared namespace-package. All 10 harnesses now import from cli_anything.utils.io — one file to maintain, no duplication. |
There was a problem hiding this comment.
Pull request overview
This PR addresses multi-process race conditions during session/project JSON persistence by replacing direct open("w") + json.dump() writes with a shared locked_save_json() helper intended to write under an exclusive advisory file lock.
Changes:
- Add a new
cli-anything-sharedharness package exposingcli_anything.utils.io.locked_save_json(). - Update multiple harness
session.pyimplementations to import and uselocked_save_json()for JSON persistence. - Add
cli-anything-sharedas a dependency in each affected harness and update.gitignoreto include the newshared/directory.
Reviewed changes
Copilot reviewed 23 out of 24 changed files in this pull request and generated 20 comments.
Show a summary per file
| File | Description |
|---|---|
| shotcut/agent-harness/setup.py | Adds cli-anything-shared dependency for shared I/O helper. |
| shotcut/agent-harness/cli_anything/shotcut/core/session.py | Switches session-state JSON save to locked_save_json(). |
| shared/agent-harness/setup.py | Introduces the new cli-anything-shared Python distribution. |
| shared/agent-harness/cli_anything/utils/io.py | Adds locked_save_json() implementation (locking + truncate + dump). |
| shared/agent-harness/cli_anything/utils/init.py | Creates cli_anything.utils package for shared utilities. |
| obs-studio/agent-harness/setup.py | Adds cli-anything-shared dependency for shared I/O helper. |
| obs-studio/agent-harness/cli_anything/obs_studio/core/session.py | Switches project save to locked_save_json(). |
| libreoffice/agent-harness/setup.py | Adds cli-anything-shared dependency for shared I/O helper. |
| libreoffice/agent-harness/cli_anything/libreoffice/core/session.py | Switches project save to locked_save_json(). |
| kdenlive/agent-harness/setup.py | Adds cli-anything-shared dependency for shared I/O helper. |
| kdenlive/agent-harness/cli_anything/kdenlive/core/session.py | Switches project save to locked_save_json(). |
| inkscape/agent-harness/setup.py | Adds cli-anything-shared dependency for shared I/O helper. |
| inkscape/agent-harness/cli_anything/inkscape/core/session.py | Switches project save to locked_save_json(). |
| gimp/agent-harness/setup.py | Adds cli-anything-shared dependency for shared I/O helper. |
| gimp/agent-harness/cli_anything/gimp/core/session.py | Switches project save to locked_save_json(). |
| drawio/agent-harness/setup.py | Adds cli-anything-shared dependency for shared I/O helper. |
| drawio/agent-harness/cli_anything/drawio/core/session.py | Switches session-state JSON save to locked_save_json(). |
| blender/agent-harness/setup.py | Adds cli-anything-shared dependency for shared I/O helper. |
| blender/agent-harness/cli_anything/blender/core/session.py | Switches project save to locked_save_json(). |
| audacity/agent-harness/setup.py | Adds cli-anything-shared dependency for shared I/O helper. |
| audacity/agent-harness/cli_anything/audacity/core/session.py | Switches project save to locked_save_json(). |
| anygen/agent-harness/setup.py | Adds cli-anything-shared dependency for shared I/O helper. |
| anygen/agent-harness/cli_anything/anygen/core/session.py | Switches auto-save JSON write to locked_save_json(). |
| .gitignore | Ensures shared/agent-harness/ is tracked under the repo’s ignore rules. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
| from dataclasses import dataclass, field | ||
| from datetime import datetime, timezone | ||
| from pathlib import Path | ||
| from cli_anything.utils.io import locked_save_json |
| from lxml import etree | ||
|
|
||
| from ..utils import mlt_xml | ||
| from cli_anything.utils.io import locked_save_json |
| import json | ||
| import os | ||
| import copy | ||
| from typing import Dict, Any, Optional, List | ||
| from datetime import datetime | ||
| from cli_anything.utils.io import locked_save_json |
| import json | ||
| import os | ||
| import copy | ||
| from typing import Dict, Any, Optional, List | ||
| from datetime import datetime | ||
| from cli_anything.utils.io import locked_save_json |
| from cli_anything.utils.io import locked_save_json | ||
|
|
||
|
|
|
Awesome, thanks for doing this cleanup. Centralizing |
|
I do appreciate this well-maintained PR with detailed discussions from @sehawq and @sanjayrohith . While I think the current version does look great, I'm hesitant on whether we should introduce this shared cli-anything-utils thing, publishing it and let people install it. My concerns may come from:
What are other people's thoughts? Would be great to hear! |
all three concerns are completely valid. You're right that Proposed resolution: drop the shared module entirely. The 25-line
Bug | Fix
-- | --
open("w") truncates before lock is held | r+ open, truncate only inside the lock
LOCK_UN not in finally | always-runs finally block
json.dump called twice on OSError | single write path
Each harness becomes self-contained again — exactly the current model. The only change visible in each I'll update the branch to revert the shared module and inline the function in all 10 session files. Would that approach work for you? @yuh-yang |
|
Thank you for bringing this up—these are entirely valid concerns. I agree that releasing and maintaining a shared
The primary goal is to safely resolve the race condition; the shared package is optional. I’m not taking a stance on whether this PR should be merged; my goal was only to improve correctness and maintainability within its scope. I prefer to follow the maintainer’s decision. |
|
I merged #94 as a cleaned version of this PR. |
What this fixes
Closes #51 — Race conditions in multi-session project state syncing.
The problem
Every harness wrote session JSON with a bare
open("w")+json.dump()and no locking. Three compounding bugs allowed concurrent writes to silently corrupt or lose project data:json.dumpandLOCK_UNinside the sametryasLOCK_EXjson.dumpraisedOSError, theexceptwrote a second time → double JSON blobLOCK_UNwas inline, not infinallyopen("w")truncates beforeflock()is calledopen("w")zeroes the file even while another process holds the lockThe fix
A
locked_save_json(path, data, **dump_kwargs)helper implements the correct sequence:Each
session.pysave method becomes a single call:Correctness properties
r+doesn't truncatejson.dumpcalled exactly onceLOCK_UNalways reachedfinallyblockf.flush()insidetryImportError/OSErrorfallbackFileNotFoundError→"w"fallbackFiles changed
10 modified
session.pyfiles — each save method is now a one-liner:save_session()locked_save_json(save_path, self.project, ...)save_session()locked_save_json(save_path, self.project, ...)save_session()locked_save_json(save_path, self.project, ...)save_session()locked_save_json(save_path, self.project, ...)save_session()locked_save_json(save_path, self.project, ...)save_session()locked_save_json(save_path, self.project, ...)save_session()locked_save_json(save_path, self.project, ...)save_session_state()locked_save_json(path, state, ...)save_session_state()locked_save_json(path, state, ...)save()locked_save_json(path, data, ...)Latest update — single shared module (
b976dbd)Following reviewer feedback, the helper has been moved from 10 per-harness copies into a single shared namespace-package contribution:
This is a new
cli-anything-sharedinstallable package that contributescli_anything.utilsto the existingcli_anythingnamespace (the same PEP 420 pattern every harness already uses). Each harnesssetup.pynow lists it as a dependency, and everysession.pyimports from the single canonical path:Any future change to the locking logic touches one file instead of ten.
Net change for this commit:
+38 / -343(305 lines removed across the 10 deleted per-harness copies)Test plan
cd shared/agent-harness && pip install -e .installscli_anything.utils.iosuccessfullycd <harness>/agent-harness && pip install -e . && python3 -m pytest cli_anything/<harness>/tests/ -vpasses for all 10 harnessessave_session()simultaneously produce valid, non-empty JSON with no interleavingsave_session()on a non-existent path creates a valid filefcntlraisesImportError