fix: skip pasta port probe during snapshot restore to prevent 0-byte responses #554

Closed
claude-claude[bot] wants to merge 1 commit into fuse-pipe-restore from claude/fix-22649156157

claude-claude bot commented Mar 4, 2026

CI Fix

Fixes CI #22648352091

Problem

The clone_http/rootless/nginx benchmark panicked with:

clone HTTP failed after 10 attempts
addr: 127.0.0.3:8080
last_response: 0 bytes
connect_err: connect succeeded

TCP connections through pasta succeeded but returned 0 bytes. The root cause is a sequencing bug in the snapshot restore path:

  1. restore_from_snapshot() in common.rs calls network.post_start() (line 997) before loading the VM snapshot (line 1033+)
  2. post_start() calls wait_for_port_forwarding() which does TcpStream::connect("127.0.0.3:8080")
  3. At this point, Firecracker is running but no guest exists — the snapshot hasn't been loaded yet
  4. pasta accepts the TCP connection and attempts L2 forwarding (constructs a SYN Ethernet frame) to a non-existent guest VM
  5. This failed L2 handshake poisons pasta's internal connection tracking state
  6. All subsequent data-bearing connections through pasta get 0-byte responses — pasta accepts TCP connects but fails to forward data end-to-end

The health monitor doesn't catch this because it uses nsenter + curl via the bridge (L2 path), completely bypassing pasta's L4 translation. So the VM reports "healthy" while pasta's port forwarding is broken.
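
The difference between "connect succeeded" and "data actually flowed" can be made concrete with a data-bearing probe. The following is a minimal sketch, not the benchmark's or the health monitor's actual code: `ProbeOutcome` and `probe_http` are hypothetical names introduced here for illustration, and the HTTP/1.0 request line is an assumption about what a minimal probe might send.

```rust
use std::io::{Read, Write};
use std::net::{SocketAddr, TcpStream};
use std::time::Duration;

/// Hypothetical classification of a single probe attempt.
#[derive(Debug, PartialEq)]
pub enum ProbeOutcome {
    /// The TCP connect itself failed (nothing is listening or it timed out).
    ConnectFailed,
    /// The connect succeeded but no response bytes arrived. This matches the
    /// "connect succeeded / last_response: 0 bytes" symptom described above.
    EmptyResponse,
    /// The connect succeeded and this many response bytes were read.
    Data(usize),
}

/// Connects to `addr`, sends a minimal HTTP request, and reports whether any
/// response bytes came back. A connect-only check would stop after the
/// first step and could not tell `EmptyResponse` apart from `Data`.
pub fn probe_http(addr: SocketAddr, timeout: Duration) -> ProbeOutcome {
    let mut stream = match TcpStream::connect_timeout(&addr, timeout) {
        Ok(stream) => stream,
        Err(_) => return ProbeOutcome::ConnectFailed,
    };
    let _ = stream.set_read_timeout(Some(timeout));
    let _ = stream.set_write_timeout(Some(timeout));

    // If the request cannot even be written, no response can follow.
    if stream.write_all(b"GET / HTTP/1.0\r\n\r\n").is_err() {
        return ProbeOutcome::EmptyResponse;
    }

    // On a read error (timeout, reset), bytes already received stay in `buf`,
    // so the result is classified by what actually arrived.
    let mut buf = Vec::new();
    let _ = stream.read_to_end(&mut buf);

    if buf.is_empty() {
        ProbeOutcome::EmptyResponse
    } else {
        ProbeOutcome::Data(buf.len())
    }
}
```

A check of this shape, run against the pasta-forwarded address, would report `EmptyResponse` in the failure described above, whereas a check that goes through the bridge never exercises the forwarded port at all.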

Solution

Added a restore_mode flag to PastaNetwork that skips the premature port forwarding probe in post_start() during snapshot restore. Port forwarding is still properly verified later via verify_port_forwarding(), which runs after:

  • The VM snapshot is loaded and resumed
  • fc-agent detects the restore-epoch and sends gratuitous ARP
  • fc-agent reconnects its output vsock (readiness signal)

This is the correct ordering — ports should only be probed when a guest actually exists to respond.

The normal VM boot path (podman/vm_config.rs) is unaffected — post_start() still probes ports there because the VM is already running.
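
The shape of the change can be sketched as follows. This is a simplified illustration under stated assumptions, not the code from the diff: only `PastaNetwork`, `restore_mode`, `post_start()`, `wait_for_port_forwarding()`, and `verify_port_forwarding()` are names taken from the description above. The constructor, the `with_restore_mode` builder, the `probes_run` counter, and the stubbed probe bodies are hypothetical and exist only so the control flow can be exercised without a running pasta process.

```rust
use std::io;

/// Simplified stand-in for the real `PastaNetwork`.
pub struct PastaNetwork {
    /// When true, `post_start()` must not touch the forwarded port because
    /// no guest exists yet (the snapshot has not been loaded).
    restore_mode: bool,
    /// Hypothetical counter of probe attempts, for illustration only.
    probes_run: u32,
}

impl PastaNetwork {
    pub fn new() -> Self {
        PastaNetwork { restore_mode: false, probes_run: 0 }
    }

    /// Hypothetical builder; the real flag may be set differently.
    pub fn with_restore_mode(mut self, enabled: bool) -> Self {
        self.restore_mode = enabled;
        self
    }

    /// Called once the VMM process is up. On a normal boot the guest is
    /// already running, so probing here is safe. On restore it is skipped.
    pub fn post_start(&mut self) -> io::Result<()> {
        if self.restore_mode {
            // Defer verification until after the snapshot is loaded.
            return Ok(());
        }
        self.wait_for_port_forwarding()
    }

    /// Stub: the real function connects to the forwarded address.
    fn wait_for_port_forwarding(&mut self) -> io::Result<()> {
        self.probes_run += 1;
        Ok(())
    }

    /// Stub: in the restore path this runs after the guest has resumed.
    pub fn verify_port_forwarding(&mut self) -> io::Result<()> {
        self.probes_run += 1;
        Ok(())
    }

    pub fn probes_run(&self) -> u32 {
        self.probes_run
    }
}
```

In this sketch, a restore caller would build the network with `with_restore_mode(true)`, call `post_start()`, load and resume the snapshot, and only then call `verify_port_forwarding()`. The normal boot caller leaves the flag at its default and keeps the existing probe.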


Generated by Claude | Fix Run

…responses

During snapshot restore, post_start() runs BEFORE the VM snapshot is loaded
into Firecracker. The port forwarding probe in post_start() forces pasta to
accept a TCP connection and attempt L2 forwarding to a non-existent guest.
This poisoned connection attempt corrupts pasta's internal connection tracking
state, causing all subsequent data-bearing connections through pasta to return
0 bytes (TCP connect succeeds but HTTP responses are empty).

The fix adds a restore_mode flag to PastaNetwork that skips the premature
port probe in post_start(). Port forwarding is still properly verified later
via verify_port_forwarding(), which runs after the VM is resumed and fc-agent
has sent its gratuitous ARP.

Root cause: common.rs calls network.post_start() at line 997, then loads the
snapshot at line 1033+. The wait_for_port_forwarding() inside post_start()
probes pasta before any guest exists, poisoning pasta's L4 translation state.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

ejc3 commented Mar 4, 2026

Superseded by PR #555, which applies the same fix to main (instead of the fuse-pipe-restore branch) and adds a stress test.

ejc3 closed this Mar 4, 2026