Refactor flush logic to avoid cache manager deadlocks #1

Open
JoneKone wants to merge 1 commit into master from codex/analyze-flush-sequence-and-deadlock-in-ext4fsd
@JoneKone
Owner

Summary

  • release file resources before invoking cache manager flush so cache operations run outside driver locks
  • gather volume busy ranges before issuing cache flushes to avoid calling CcFlushCache while holding Ext4Fsd locks
  • update shutdown flushing to drop the VCB resource prior to cache flushes and reacquire afterwards

Testing

  • Not run (driver stack and verifier tooling unavailable in container)

https://chatgpt.com/codex/tasks/task_e_68fcb0f6c698833391e035c8896ec6ad

@JoneKone
Owner Author

JoneKone commented Oct 25, 2025

Summary

Released each FCB’s main resource before invoking CcFlushCache in Ext2FlushFile, then reacquired it to clear flags only after the cache operations succeed. This prevents the deadlocks that can arise when the cache manager is called while Ext4Fsd locks are held.
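
The release, flush, reacquire ordering described above can be illustrated with a minimal user-mode sketch. It is not the driver code: `mock_resource`, `mock_fcb`, `mock_cache_flush`, and `flush_file_sketch` are hypothetical stand-ins for an ERESOURCE, an FCB, CcFlushCache, and Ext2FlushFile, and it assumes the caller enters with the file resource already held.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-mode stand-ins; these are NOT the Ext4Fsd or WDK types. */
typedef struct { int held; } mock_resource;      /* models an ERESOURCE */

typedef struct {
    mock_resource main_resource;                 /* models the FCB main resource */
    bool dirty;                                  /* models a "needs flush" flag */
    bool flush_should_fail;                      /* knob to simulate a failed flush */
} mock_fcb;

static void mock_acquire(mock_resource *r) { assert(r->held == 0); r->held = 1; }
static void mock_release(mock_resource *r) { assert(r->held == 1); r->held = 0; }

/* Stand-in for the cache manager call: it asserts the file lock is NOT held. */
static int mock_cache_flush(const mock_fcb *fcb)
{
    assert(fcb->main_resource.held == 0);
    return fcb->flush_should_fail ? -1 : 0;
}

/* Assumption: the caller holds fcb->main_resource on entry and on return. */
int flush_file_sketch(mock_fcb *fcb)
{
    int status;

    assert(fcb->main_resource.held == 1);

    mock_release(&fcb->main_resource);           /* drop the lock first */
    status = mock_cache_flush(fcb);              /* cache work runs unlocked */
    mock_acquire(&fcb->main_resource);           /* reacquire before touching state */

    if (status == 0) {
        fcb->dirty = false;                      /* clear the flag only on success */
    }
    return status;
}
```

Calling `flush_file_sketch` on a `mock_fcb` whose `main_resource.held` is 1 returns 0 and clears `dirty`; with `flush_should_fail` set it returns -1 and leaves `dirty` untouched. In both cases the resource is held again on return.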

Updated the top-level Ext2Flush and shutdown paths to drop and later reacquire the VCB main resource around file and volume flush calls so that cache work executes outside driver-held locks.

Reworked Ext2FlushVcb to capture buffered ranges while internal locks are held and defer CcFlushCache invocations until after those resources are released, keeping BCB activity outside Ext4Fsd lock scopes.
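
A rough sketch of the capture-then-flush idea follows, again in user-mode C with hypothetical names (`mock_vcb`, `busy_range`, `mock_cache_flush_range`, `flush_vcb_sketch`, `MAX_RANGES`). It assumes the busy ranges fit in a small fixed array; the real Ext2FlushVcb may store and walk them differently.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_RANGES 8                             /* hypothetical fixed capacity */

typedef struct {
    long long offset;                            /* start of a buffered range */
    unsigned long length;                        /* size of the range in bytes */
} busy_range;

/* Hypothetical stand-in for the volume state; NOT the real VCB layout. */
typedef struct {
    int lock_held;                               /* models the internal driver lock */
    busy_range dirty[MAX_RANGES];                /* ranges that still need flushing */
    size_t dirty_count;
    size_t flushed_count;                        /* bookkeeping for the sketch */
    unsigned long long flushed_bytes;
} mock_vcb;

/* Stand-in for a per-range cache flush: it asserts no driver lock is held. */
static void mock_cache_flush_range(mock_vcb *vcb, const busy_range *r)
{
    assert(vcb->lock_held == 0);
    vcb->flushed_count++;
    vcb->flushed_bytes += r->length;
}

size_t flush_vcb_sketch(mock_vcb *vcb)
{
    busy_range snapshot[MAX_RANGES];
    size_t n, i;

    /* Phase 1: copy the ranges while the lock is held. */
    vcb->lock_held = 1;
    n = vcb->dirty_count < MAX_RANGES ? vcb->dirty_count : MAX_RANGES;
    for (i = 0; i < n; i++) {
        snapshot[i] = vcb->dirty[i];
    }
    vcb->lock_held = 0;

    /* Phase 2: flush from the private copy with the lock released. */
    for (i = 0; i < n; i++) {
        mock_cache_flush_range(vcb, &snapshot[i]);
    }
    return n;
}
```

One consequence of this pattern is that the snapshot can go stale: ranges added after the lock is dropped are not covered by the current pass and would have to be picked up by a later flush.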

Testing

⚠️ Not run (kernel-mode verifier and heavy-load test harnesses are unavailable in this environment)

Flush path trace

During shutdown the driver now updates metadata while holding Vcb->MainResource, releases it before Ext2FlushFiles/Ext2FlushVolume, and reacquires it only after cache flushes complete, ensuring CcFlushCache is reached with no VCB locks held.

Ext2Flush follows the same pattern for user-initiated flushes: it releases the VCB resource before calling Ext2FlushFiles/Ext2FlushVolume, while Ext2FlushFiles still serializes per-FCB access and calls Ext2FlushFile, which now drops and reacquires each FCB resource around the cache manager call.

Volume-level flushing records outstanding buffer-head ranges while holding internal locks, then performs the cache flushes once those locks are dropped, so cache/BCB activity no longer nests inside Ext4Fsd resources.

Reproduction notes

The requested deadlock reproduction with Driver Verifier could not be executed because the container environment lacks kernel-mode driver deployment and verification tooling.
