IPC handshake can block indefinitely during container startup #405

@Aman-Cool

IPC handshake may block indefinitely during container startup

Description

The IPC handshake between urunc create and urunc start can block indefinitely if the peer process never connects or never sends the expected message.

AwaitMessage() currently performs an unbounded wait while:

  • accepting the Unix socket connection
  • reading the IPC message

If the peer process is interrupted (for example by a containerd restart, an OOM kill, or heavy load on the node), the waiting process has no way to notice and may never exit.

Impact

  • Orphaned urunc --reexec processes
  • Containers stuck in ContainerCreating
  • Gradual resource leaks on the node
  • No clear error surfaced to the caller

Reproduction hints (non-deterministic)

This behavior is timing-dependent, but has been observed when:

  • Restarting containerd between urunc create and urunc start
  • Terminating the urunc start process during startup
  • Running on a heavily loaded node

Minimal repro outline (best-effort)

  1. Start container creation using urunc.
  2. Interrupt the startup sequence before urunc start completes.
  3. Observe that the IPC helper process remains blocked indefinitely.

Expected behavior

Container startup should either succeed or fail with a clear error.
It should not hang indefinitely in failure paths.

Related work

There is an open PR that adds a bounded timeout to the IPC handshake to avoid unbounded blocking. This issue is intended to document the problem and gather feedback on the appropriate behavior.
