
feat: wire up RemapFs inode table serialize/restore for snapshot clones#549

Merged
ejc3 merged 9 commits into main from fuse-pipe-restore
Mar 4, 2026

Conversation

@ejc3
Owner

@ejc3 ejc3 commented Mar 3, 2026

Summary

Wire up RemapFs serialize_table()/restore_from_table() into fcvm's snapshot/clone flow so clone VolumeServers restore the inode table from the snapshot baseline. This eliminates the 1-second TTL glitch window where cached inodes return ENOENT after clone boot.

Also adds lazy handle re-open: when a stale file handle (from the snapshot baseline) gets EBADF, RemapFs automatically reopens the file and retries the operation.
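The lazy re-open flow can be sketched in std-only Rust. This is a hypothetical reduction: `Response` and `Fs` stand in for fuse-pipe's `VolumeResponse` and `RemapFs`, a `HashMap` replaces the DashMap, and "reopening" is simulated by minting a fresh handle instead of sending an `Open` to the inner handler.

```rust
use std::collections::HashMap;

// Stand-in for VolumeResponse: either data or a stale-handle error.
#[derive(Debug, PartialEq)]
enum Response {
    Data(Vec<u8>),
    Ebadf,
}

struct Fs {
    // old_fh -> reopened fh (the PR uses a DashMap; HashMap for brevity)
    handle_remap: HashMap<u64, u64>,
    valid_handles: Vec<u64>,
}

impl Fs {
    // Stand-in for forwarding a request to the inner handler.
    fn dispatch(&self, fh: u64) -> Response {
        if self.valid_handles.contains(&fh) {
            Response::Data(vec![1])
        } else {
            Response::Ebadf
        }
    }

    // On EBADF, look up (or create) a replacement handle and retry once.
    fn handle_request(&mut self, fh: u64) -> Response {
        match self.dispatch(fh) {
            Response::Ebadf => {
                // "reopen": mint a new handle; the real code opens the file O_RDWR
                let new_fh = *self.handle_remap.entry(fh).or_insert(fh + 1000);
                self.valid_handles.push(new_fh);
                self.dispatch(new_fh)
            }
            ok => ok,
        }
    }
}
```

The key property is that the remap entry is created once and reused, so every later request carrying the stale handle is silently redirected without reopening again.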

Changes

fuse-pipe

  • AsyncServer::from_arc() — accept pre-wrapped Arc<H> so caller retains reference for serialization
  • VolumeRequest::fh(), with_fh(), ino(), is_dir_handle_op() — protocol helpers for handle manipulation
  • VolumeResponse::is_ebadf() — detect EBADF responses
  • RemapFs::handle_remap — DashMap for old_fh→new_fh mapping after restore
  • RemapFs::try_reopen_handle() — lazy reopen with O_RDWR, race-safe via DashMap::entry()
  • EBADF interception in handle_request_with_groups — auto-retry with reopened handle
  • FD leak fix: loser in concurrent reopen race releases its fd
  • Error logging on corrupt inode table JSON (instead of silent fallback)

fcvm

  • SpawnedVolumes return type with handles + remap_refs (Arc references)
  • spawn_volume_servers_with_tables() — restore inode tables from serialized JSON
  • resolve_volume_path() — deduplicated path validation helper
  • Serialize inode tables to volume-{port}-inode-table.json at snapshot time (pre-start cache path)
  • Load inode tables from snapshot dir at clone time
  • Startup log for portable Arc-based volume servers

Known Limitations

  • CopyFileRange/RemapFileRange fh_out is not remapped after restore (only fh_in). These operations with a stale destination handle will fail. Rare in practice.
  • CLI fcvm snapshot create runs in a separate process without access to the running VolumeServer's RemapFs, so inode table serialization only works in the pre-start snapshot cache path (same process).

Test Results

Integration test test_open_handle_survives_snapshot_restore: baseline VM with portable volume, write file, snapshot, clone, verify clone reads and writes work.

$ make test-root FILTER=test_open_handle STREAM=1
test test_open_handle_survives_snapshot_restore ... ok

ejc3 added 3 commits March 3, 2026 22:20
Wire fuse-pipe's RemapFs serialize_table/restore_from_table into fcvm's
snapshot/clone flow so clones get pre-populated inode mappings instead of
starting with an empty table.

Changes:
- AsyncServer::from_arc() — accepts pre-wrapped Arc<H> so caller retains
  reference for serialization while server runs
- VolumeRequest::fh(), with_fh(), ino(), is_dir_handle_op() — protocol
  helpers for handle remapping
- VolumeResponse::is_ebadf() — detect stale handle errors
- RemapFs handle_remap DashMap + try_reopen_handle() — lazily reopens
  stale file handles after snapshot restore (EBADF → auto-reopen → retry)
- SpawnedVolumes return type with remap_refs for portable volumes
- Serialize inode tables to snapshot dir at snapshot creation time
- Load and restore inode tables when spawning clone VolumeServers

Before: clone VolumeServer starts with empty inode table, old inodes
return EIO for ~1s until FUSE TTL expires, old file handles return EBADF
forever.

After: clone VolumeServer starts with restored inode table, old inodes
work immediately, stale file handles are lazily reopened on first use.

Tested: cargo check, cargo clippy, cargo test -p fuse-pipe (21/21 remap tests pass)
Adds test_open_handle_survives_snapshot_restore that verifies:
- Clone can read files from snapshot through FUSE portable volume
- Clone can write new files (FUSE mount bidirectional)
- Host sees clone's writes (data integrity through clone VolumeServer)

Tests the clone VolumeServer pipeline end-to-end with --portable-volumes.

Tested: make test-root FILTER=test_open_handle_survives (passes)

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ea92862bc7


Comment on lines +423 to +427
pub fn fh(&self) -> Option<u64> {
match self {
Self::Read { fh, .. }
| Self::Write { fh, .. }
| Self::Release { fh, .. }


P2: Include Setattr fh in stale-handle remapping

The new stale-handle recovery path depends on VolumeRequest::fh()/with_fh(), but this helper set does not handle Setattr { fh: Some(...) } requests. Setattr is used for fd-based truncate operations, so after snapshot restore an ftruncate() on a pre-snapshot descriptor can return EBADF permanently because RemapFs never remaps or reopens that handle.
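The fix is a one-arm addition to the `fh()` match. A minimal sketch, assuming a reduced `VolumeRequest` (these variants and fields are illustrative, not the real protocol definition):

```rust
// Hypothetical subset of fuse-pipe's VolumeRequest. The point is that
// Setattr carries an *optional* fh (the fd-based ftruncate path) and
// must be covered by fh() for stale-handle recovery to reach it.
enum VolumeRequest {
    Read { fh: u64 },
    Write { fh: u64 },
    Setattr { fh: Option<u64> },
    Lookup { name: String },
}

impl VolumeRequest {
    fn fh(&self) -> Option<u64> {
        match self {
            Self::Read { fh } | Self::Write { fh } => Some(*fh),
            // Without this arm, ftruncate on a pre-snapshot fd
            // returns EBADF forever after restore.
            Self::Setattr { fh } => *fh,
            Self::Lookup { .. } => None,
        }
    }
}
```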


Comment on lines +458 to +460
Self::CopyFileRange { fh_in, .. } | Self::RemapFileRange { fh_in, .. } => {
*fh_in = new_fh
}


P2: Remap both handles for copy/remap file range requests

CopyFileRange/RemapFileRange carry both fh_in and fh_out, but the new helper only rewrites fh_in. If the destination handle is stale after restore, retry logic still sends the old fh_out, so these operations continue failing with EBADF instead of recovering like other handle-based ops.


@claude-claude
Contributor

claude-claude bot commented Mar 3, 2026

🔍 Claude Review

SEVERITY: medium

Findings

[MEDIUM] File descriptor leak in try_reopen_handle on concurrent race (fuse-pipe/src/server/remap.rs:620-677)

When two threads race to reopen the same stale old_fh, both pass the initial handle_remap.get() check (line 622) and both open a new handle via Open/Opendir. At the handle_remap.entry(old_fh) atomic insert (line 666), one thread wins the Vacant path and stores its handle. The losing thread hits Occupied, returns the winner's handle, but its own opened fd is never released — no Release/Releasedir is sent to the inner handler. This leaks one fd per race. While races require near-simultaneous operations on the same stale handle (uncommon), it's a real resource leak in long-running servers.
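The fix shape is: open first, then race on the map entry, and have the loser release its freshly opened fd. A std-only sketch (a `HashMap` `Entry` stands in for `DashMap::entry()`, and `open`/`release` are simulated with a counter and a log; names are illustrative):

```rust
use std::collections::{hash_map::Entry, HashMap};

struct Reopener {
    handle_remap: HashMap<u64, u64>,
    next_fd: u64,
    released: Vec<u64>, // fds closed because another thread won the race
}

impl Reopener {
    // Stand-in for sending Open/Opendir to the inner handler.
    fn open_new_fd(&mut self) -> u64 {
        self.next_fd += 1;
        self.next_fd
    }

    // Stand-in for sending Release/Releasedir to the inner handler.
    fn release(&mut self, fd: u64) {
        self.released.push(fd);
    }

    fn reopen(&mut self, old_fh: u64) -> u64 {
        // Both racers open *before* the atomic entry() check.
        let candidate = self.open_new_fd();
        let winner = match self.handle_remap.entry(old_fh) {
            // Winner: store our fd and use it.
            Entry::Vacant(v) => {
                v.insert(candidate);
                return candidate;
            }
            // Loser: someone beat us; take their fd.
            Entry::Occupied(o) => *o.get(),
        };
        // The loser's fd must be released here, or it leaks.
        self.release(candidate);
        winner
    }
}
```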

[MEDIUM] fh_out not remapped for CopyFileRange/RemapFileRange (fuse-pipe/src/protocol/request.rs:437,458)

fh() only returns fh_in and with_fh() only updates fh_in for CopyFileRange/RemapFileRange. The existing remap_request_inodes correctly remaps both ino_in and ino_out (line 217-224 in remap.rs), but the new stale-handle recovery path at lines 693-697 only checks/remaps one handle. If fh_out is stale after restore, the operation will fail with EBADF and the retry also won't fix it. While copy_file_range is rare, this is an incomplete implementation.

[LOW] Misleading log field entries = json.len() (src/commands/podman/snapshot.rs:139)

serialize_table() returns a String, so json.len() is the byte count, not the number of inode entries. The field name entries is misleading in operator logs — should be bytes.

[LOW] Copy-paste doc comment on test (tests/test_portable_volumes.rs:595)

The doc comment opens with "Verify that --non-blocking-output delivers lines best-effort" which is from a different test. The actual test (test_open_handle_survives_snapshot_restore) verifies clone read/write through portable volumes after snapshot restore.

Summary

The PR adds a solid inode table serialize/restore mechanism for snapshot clones. Two medium issues: a file descriptor leak under concurrent handle reopen races, and incomplete fh_out remapping for CopyFileRange/RemapFileRange operations. Both are fixable additions on top of the PR's work.


Review by Claude | Run

@claude-claude
Contributor

claude-claude bot commented Mar 3, 2026

🔧 Auto-Fix Created

I found issues and created a fix PR: #550

Please review and merge the fix PR first, then this PR.

View Claude Run

- Fix FD leak in concurrent reopen race: release the losing thread's
  opened fd via Release/Releasedir when another thread wins the
  handle_remap.entry() race
- Add Setattr fh to fh()/with_fh() helpers (ftruncate uses fd-based path)
- Remap both fh_in and fh_out for CopyFileRange/RemapFileRange
- Fix misleading log field: entries → bytes (json.len() is byte count)
- Fix copy-paste doc comment on test function
@claude-claude
Contributor

claude-claude bot commented Mar 3, 2026

🔍 Claude Review

SEVERITY: medium

Previous Review Status

The previous review identified 2 medium and 2 low issues. All four were addressed in commit 13526d1 (fix: address review findings for fuse-pipe restore):

  • ✅ FD leak in concurrent reopen race — fixed with Release/Releasedir on loser's fd
  • ⚠️ fh_out not remapped for CopyFileRange/RemapFileRange — partially fixed (see below)
  • ✅ Misleading log field entries → bytes — fixed
  • ✅ Copy-paste doc comment — fixed

Findings

[MEDIUM] with_fh() corrupts fh_out for CopyFileRange/RemapFileRange (refinement of previously-reported issue)

The fix in 13526d1 addressed the missing fh_out remapping by having with_fh() set both fh_in and fh_out to the same value (request.rs:460-464). This introduces a different problem: when only one handle is stale, the other gets overwritten with an incorrect value.

Concrete scenario in handle_request_with_groups (remap.rs:710-714):

  1. A CopyFileRange request arrives with stale fh_in=100 and valid fh_out=200
  2. remapped.fh() returns fh_in=100, which is found in handle_remap → new_fh=300
  3. remapped.with_fh(300) sets both fh_in=300 and fh_out=300
  4. The operation now reads from the correct file but writes to the wrong file (source instead of destination)

The same corruption occurs in the EBADF retry path (remap.rs:728-729).

In practice, CopyFileRange across a snapshot restore boundary is rare, and both handles would typically be stale simultaneously (both came from the previous VolumeServer). But the fix is architecturally incomplete — it needs independent remapping of fh_in and fh_out. Consider adding fh_out() and with_fh_out() helpers and remapping each handle independently.
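The suggested independent remapping could look like this sketch (`CopyFileRange` is reduced to its two handles, and `remap_handles` is a hypothetical helper, not the PR's actual API):

```rust
use std::collections::HashMap;

// Reduced request: the two handles reference *different* files.
struct CopyFileRange {
    fh_in: u64,
    fh_out: u64,
}

// Remap each handle independently against the restore table.
// A handle that is not stale is left untouched, so a single
// with_fh(new) can never clobber the other side.
fn remap_handles(req: &mut CopyFileRange, handle_remap: &HashMap<u64, u64>) {
    if let Some(&n) = handle_remap.get(&req.fh_in) {
        req.fh_in = n;
    }
    if let Some(&n) = handle_remap.get(&req.fh_out) {
        req.fh_out = n;
    }
}
```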

[LOW] restore_from_table silently discards invalid JSON (remap.rs)

restore_from_table uses serde_json::from_str(json).unwrap_or_default() — if the serialized inode table JSON is corrupt (disk error, partial write), it silently falls back to an empty table with no warning. The caller logs success ("loaded inode table from snapshot") even though restoration effectively failed. A warn! log on parse failure would help operators diagnose stale-inode issues after restore.
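The pattern being asked for is an explicit, logged fallback instead of `unwrap_or_default()`. A std-only sketch, where a toy `k:v,k:v` encoding stands in for the real serde_json table format and `eprintln!` stands in for `warn!`:

```rust
use std::collections::HashMap;

// Parse a serialized inode table, logging on corruption instead of
// silently returning an empty table. Format here is illustrative.
fn restore_from_table(data: &str) -> HashMap<u64, u64> {
    let parsed: Result<HashMap<u64, u64>, String> = data
        .split(',')
        .map(|pair| {
            let (k, v) = pair
                .split_once(':')
                .ok_or_else(|| format!("missing ':' in {pair:?}"))?;
            Ok((
                k.trim().parse::<u64>().map_err(|e| e.to_string())?,
                v.trim().parse::<u64>().map_err(|e| e.to_string())?,
            ))
        })
        .collect();
    match parsed {
        Ok(table) => table,
        Err(e) => {
            // Operators can now see that restoration failed.
            eprintln!("failed to parse inode table, starting empty: {e}");
            HashMap::new()
        }
    }
}
```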

Summary

The previous review's fixes were committed directly to the branch. The CopyFileRange/RemapFileRange dual-handle remapping remains architecturally incomplete — the fix for the missing fh_out introduced a new failure mode where a valid fh_out can be overwritten. This is low-risk in practice (rare operation, both handles typically stale after restore) but worth addressing for correctness.


Review by Claude | Run

@claude-claude
Contributor

claude-claude bot commented Mar 3, 2026

🔧 Auto-Fix Created

I found issues and created a fix PR: #551

Please review and merge the fix PR first, then this PR.

View Claude Run

ejc3 added 3 commits March 3, 2026 23:39
… handle remap

- Extract resolve_volume_path() to eliminate duplicate canonicalize + is_dir
  between VolumeServer::new() and spawn_volume_servers_with_tables()
- Fix with_fh() for CopyFileRange/RemapFileRange: only remap fh_in, not fh_out
  (they reference different files; setting both to same value corrupts cross-file ops)
- Remove incorrect #[allow(dead_code)] on remap_refs (field IS used for serialization)
- Add debug log when O_RDWR fallback to O_RDONLY occurs in try_reopen_handle
Volumes can be read-write, so silently downgrading to O_RDONLY would
cause subsequent writes to fail. Just use O_RDWR — if the file can't
be opened (e.g., mode 0444), return the error honestly rather than
masking it with a wrong access mode.
@claude-claude
Contributor

claude-claude bot commented Mar 3, 2026

🔍 Claude Review

SEVERITY: medium

Previous Review Status

Previous reviews (review 1, review 2) identified issues. Fix PRs #550 and #551 were closed; fixes were committed directly to the branch instead. Current status of previously reported issues:

  • ✅ FD leak in concurrent reopen race — fixed in 13526d1
  • ✅ Misleading log field entries → bytes — fixed in 13526d1
  • ✅ Copy-paste doc comment on test — fixed in 13526d1
  • ✅ Silent JSON fallback — fixed with error! log in f072f6a
  • ⚠️ fh_out not remapped for CopyFileRange/RemapFileRange — partially addressed (see below)

Findings

[MEDIUM] fh_out for CopyFileRange/RemapFileRange remains unremapped after snapshot restore (refinement of previously-reported issue)

As noted in review 2, the dual-handle remapping for CopyFileRange/RemapFileRange is architecturally incomplete. Commit 1d49748 correctly fixed the corruption bug (where with_fh() was overwriting both handles), but left fh_out completely unhandled:

  1. fh() (request.rs:438) only returns fh_in — never fh_out
  2. The stale handle pre-check (remap.rs:702-707) only remaps fh_in
  3. The EBADF retry path (remap.rs:714-730) only reopens and retries with fh_in
  4. with_fh() (request.rs:460-465) now correctly only touches fh_in, with a comment claiming "the EBADF retry in RemapFs handles one handle at a time" — but there's no code that ever invokes this for fh_out

If fh_out is stale after restore, the operation will fail with EBADF and the retry will re-open fh_in (which may already be valid), leaving fh_out still stale. The comment on with_fh() is misleading about the actual behavior.

In practice this is low-risk: copy_file_range across a snapshot boundary is rare, and both handles would typically be stale together (both from the old VolumeServer). But the code claims to handle this case and doesn't. Needs either fh_out()/with_fh_out() helpers with independent remapping, or an explicit comment that fh_out remapping is a known limitation.

[LOW] PR description doesn't mention several fix commits

The PR description covers the original feature commit (d99c8f0) well but doesn't mention the 5 subsequent fix commits (13526d1 through f072f6a) that address: concurrent FD leak fix, CopyFileRange handle remap fix, O_RDONLY fallback removal, and corrupt JSON error logging. These were pushed directly to the branch in response to reviews and should be reflected in the description for posterity.

Summary

The core inode table serialize/restore mechanism is solid and the previous review findings have been addressed. The remaining fh_out remapping gap for CopyFileRange/RemapFileRange is a known limitation from prior reviews — low risk in practice but the code's comments overstate what's actually handled. No new issues found beyond what was previously flagged.


Review by Claude | Run

@claude-claude
Contributor

claude-claude bot commented Mar 3, 2026

🔧 Auto-Fix Created

I found issues and created a fix PR: #552

Fixes a misleading comment on with_fh() that claimed the EBADF retry handles fh_out remapping (it doesn't), and a pre-existing cargo fmt issue.

View Claude Run

@claude-claude
Contributor

claude-claude bot commented Mar 3, 2026

🔍 Claude Review

SEVERITY: low

Previous Review Status

Three previous reviews (review 1, review 2, review 3) identified issues. All fixes were committed directly to the branch (commits 13526d1 through f072f6a). Current status:

  • ✅ FD leak in concurrent reopen race — fixed in 13526d1
  • ✅ Misleading log field entries → bytes — fixed in 13526d1
  • ✅ Copy-paste doc comment on test — fixed in 13526d1
  • ✅ Silent JSON fallback — fixed with error! log in f072f6a
  • ✅ with_fh() CopyFileRange corruption — fixed in 1d49748
  • ⚠️ fh_out unremapped for CopyFileRange/RemapFileRange — known limitation, misleading comment fix in open PR #552

Findings

No new [MEDIUM] or [CRITICAL] issues found.

[LOW] PR description doesn't reflect fix commits — As noted in review 3, the description only covers the original feature. The 5 fix commits (13526d1 through f072f6a) addressing concurrent FD leak, CopyFileRange handle remap, O_RDONLY fallback removal, and corrupt JSON error logging should be mentioned.

[LOW] serve_vsock_with_remap_arc missing startup log — The regular serve_vsock_with_ready_signal path logs "VolumeServer starting" with port, host_path, read_only, and socket info (volume/mod.rs:105-111). The new serve_vsock_with_remap_arc path (used for portable volumes with Arc) skips this log entirely. The caller in spawn_volume_servers_with_tables logs "spawned VolumeServer" (line 296), but this happens before the server task actually starts. The missing startup log makes it harder to correlate VolumeServer activity in production logs.

Summary

The core inode table serialize/restore mechanism is solid and well-tested. All medium issues from previous reviews have been fixed. The remaining fh_out limitation is addressed by open PR #552. No new issues found beyond minor observability and documentation gaps.


Review by Claude | Run

…le volumes

- Update CopyFileRange/RemapFileRange comment to accurately note fh_out is
  NOT remapped (known limitation) instead of misleadingly saying it "must be
  remapped separately"
- Add startup log to serve_vsock_with_remap_arc so portable volume servers
  log the same info as the regular path
@claude-claude
Contributor

claude-claude bot commented Mar 4, 2026

🔍 Claude Review

SEVERITY: low

Previous Review Status

Four previous reviews (review 1, review 2, review 3, review 4) identified issues. All medium issues have been fixed in commits 13526d1 through 8a9caad:

  • ✅ FD leak in concurrent reopen race — fixed
  • ✅ with_fh() CopyFileRange corruption — fixed
  • ✅ Misleading log field entries → bytes — fixed
  • ✅ Copy-paste doc comment on test — fixed
  • ✅ Silent JSON fallback → error! log — fixed
  • ✅ Missing startup log for portable Arc volumes — fixed
  • ⚠️ fh_out limitation documented with TODO comment

Findings

No new [MEDIUM] or [CRITICAL] issues found.

All previously reported [LOW] issues have either been fixed or remain as noted in prior reviews (PR description not updated for fix commits).

Summary

The inode table serialize/restore implementation is solid. All medium issues from four rounds of review have been addressed. The code correctly aligns remap_refs, inode_tables, and volume_configs indices across the snapshot/clone flow. Ready to merge.


Review by Claude | Run

@claude-claude
Contributor

claude-claude bot commented Mar 4, 2026

🔧 CI Auto-Fix

Created fix PR: #554

Root cause: During snapshot restore, post_start() probes pasta port forwarding before the VM snapshot is loaded into Firecracker. This forces pasta to attempt L2 forwarding to a non-existent guest, poisoning its connection tracking state. All subsequent HTTP requests through pasta return 0 bytes (TCP connect succeeds but no data flows).

Fix: Skip the premature port probe in post_start() during snapshot restore. Port forwarding is verified later via verify_port_forwarding() after the VM is actually running.

View Claude Run

@ejc3 ejc3 merged commit 9736ffa into main Mar 4, 2026
18 checks passed
@ejc3 ejc3 deleted the fuse-pipe-restore branch March 4, 2026 16:28
