Orphan tap*_urunc left after urunc restart in Kubernetes, preventing NetworkSetup #406

@sidneychang

Description

Summary

When urunc exits while the Pod's network namespace remains, previously created tap*_urunc devices can persist in the namespace in a NO-CARRIER / DOWN state. On startup, urunc currently treats the mere existence of such a TAP device as evidence of an active unikernel and refuses to create a new one, so network setup fails.

Impact

  • urunc Pod restart/retry can fail to configure networking because leftover TAPs block creation of a fresh TAP.
  • Only affects TAPs created by urunc (naming pattern tap*_urunc). Must avoid deleting other CNI/user interfaces.
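
To illustrate the scoping constraint above, here is a minimal Go sketch of a name filter that only accepts interfaces matching tap*_urunc. The function name isUruncTap and the constant are hypothetical and not taken from the urunc code base; the sketch only assumes that the pattern quoted in this issue is the right one.

```go
package main

import (
	"fmt"
	"path"
)

// uruncTapPattern is the naming pattern quoted in this issue.
// (Hypothetical constant, not taken from the urunc source.)
const uruncTapPattern = "tap*_urunc"

// isUruncTap reports whether an interface name matches tap*_urunc.
// path.Match anchors the pattern to the whole name, so names such as
// "eth0" or "mytap0_urunc" are rejected.
func isUruncTap(name string) bool {
	ok, err := path.Match(uruncTapPattern, name)
	return err == nil && ok
}

func main() {
	for _, name := range []string{"lo", "eth0", "tap0_urunc", "tap0", "cni0"} {
		fmt.Printf("%-12s urunc TAP: %v\n", name, isUruncTap(name))
	}
}
```

Any cleanup logic could apply a filter like this first, so that interfaces created by the CNI plugin or by the user are never candidates for deletion.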

Steps to reproduce

  1. Deploy the test Pod

Apply the test manifest and wait for the Pod to reach Running:

kubectl apply -f nginx-urunc.yaml
kubectl get pods

output:

deployment.apps/nginx-urunc created
service/nginx-urunc created

NAME                           READY   STATUS    RESTARTS   AGE
nginx-urunc-67f8694dd6-874rc   1/1     Running   0          55s

  2. Locate the QEMU process for the urunc Pod

List QEMU processes on the host and record the PID of the Pod's QEMU instance (PID 1374168 in this example):

ps aux | grep qemu

output:

root     1374168 35.0  0.0 840108 85048 ? Ssl 05:28 0:00 /usr/bin/qemu-system-x86_64 ... -net tap,ifname=tap0_urunc ...

  3. Inspect network interfaces inside the Pod netns

Enter the QEMU network namespace and list interfaces:

nsenter -t 1374168 -n ip link

output:

1: lo: <LOOPBACK,UP,LOWER_UP> ...
2: eth0@if288: <BROADCAST,MULTICAST,UP,LOWER_UP> ...
3: tap0_urunc: <BROADCAST,MULTICAST,UP,LOWER_UP> ...

At this point:

eth0 is UP, LOWER_UP
tap0_urunc is UP, LOWER_UP

The Pod is functioning normally

  4. Force-kill the QEMU process (simulate a crash)

kill -9 1374168

Check Pod status:

kubectl get pods

output:

NAME                           READY   STATUS   RESTARTS   AGE
nginx-urunc-67f8694dd6-874rc   0/1     Error    0          65s

Kubernetes then automatically restarts the Pod:

NAME                           READY   STATUS    RESTARTS     AGE
nginx-urunc-67f8694dd6-874rc   1/1     Running   1 (6s ago)   69s

  5. Locate the new QEMU process after restart

List QEMU processes again and record the new PID (PID 1374761 in this example):

ps aux | grep qemu

output:

root     1374761  1.2  0.0 838364 83976 ? Ssl 05:29 0:00 /usr/bin/qemu-system-x86_64 ...

  6. Inspect interfaces in the new QEMU ns

nsenter -t 1374761 -n ip link

output:

1: lo: <LOOPBACK,UP,LOWER_UP> ...
2: eth0@if288: <BROADCAST,MULTICAST,UP,LOWER_UP> ...
3: tap0_urunc: <NO-CARRIER,BROADCAST,MULTICAST,UP> ... state DOWN

Observed state after restart:

eth0 remains UP, LOWER_UP

tap0_urunc still exists but is now NO-CARRIER and state DOWN
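
The difference between the healthy and the leftover TAP is visible in the flag list printed by ip link. As a rough sketch, the following Go code parses one summary line in the format shown above and flags a device as "looks orphaned" when NO-CARRIER is set and LOWER_UP is absent. All names here (linkInfo, parseIPLinkLine, looksOrphaned) are hypothetical, and treating that flag combination as "orphaned" is an assumption based only on the state observed in this report.

```go
package main

import (
	"fmt"
	"strings"
)

// linkInfo holds the fields of interest from one `ip link` summary line.
type linkInfo struct {
	Name  string
	Flags []string
}

// parseIPLinkLine parses a line such as
// "3: tap0_urunc: <NO-CARRIER,BROADCAST,MULTICAST,UP> ... state DOWN".
func parseIPLinkLine(line string) (linkInfo, error) {
	fields := strings.Fields(line)
	if len(fields) < 3 {
		return linkInfo{}, fmt.Errorf("unexpected ip link line: %q", line)
	}
	name := strings.TrimSuffix(fields[1], ":")
	// Drop the peer suffix of names like "eth0@if288".
	if i := strings.Index(name, "@"); i >= 0 {
		name = name[:i]
	}
	flags := strings.Split(strings.Trim(fields[2], "<>"), ",")
	return linkInfo{Name: name, Flags: flags}, nil
}

// hasFlag reports whether the given flag appears in the flag list.
func (l linkInfo) hasFlag(flag string) bool {
	for _, f := range l.Flags {
		if f == flag {
			return true
		}
	}
	return false
}

// looksOrphaned encodes the assumption that NO-CARRIER without LOWER_UP
// means nothing is attached to the device any more.
func (l linkInfo) looksOrphaned() bool {
	return l.hasFlag("NO-CARRIER") && !l.hasFlag("LOWER_UP")
}

func main() {
	lines := []string{
		"2: eth0@if288: <BROADCAST,MULTICAST,UP,LOWER_UP> ...",
		"3: tap0_urunc: <NO-CARRIER,BROADCAST,MULTICAST,UP> ... state DOWN",
	}
	for _, line := range lines {
		info, err := parseIPLinkLine(line)
		if err != nil {
			fmt.Println("skip:", err)
			continue
		}
		fmt.Printf("%s orphaned=%v\n", info.Name, info.looksOrphaned())
	}
}
```

Parsing text output is only meant for illustration; a runtime would more likely query the link state through netlink directly.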

  7. Observe urunc error logs

Relevant log messages from urunc:

Failed to setup network :unsupported operation: can't spawn multiple unikernels in the same network namespace

The Pod restart succeeds at the Kubernetes level, but network setup inside urunc fails due to the presence of the pre-existing tap0_urunc device.

Reason

In Kubernetes setups, when a Pod is restarted, the network namespace (created by the pause container) remains active, so the tap0_urunc device still exists. When urunc (re)creates the container, it finds the existing tap0_urunc device and does not recreate it.
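
One possible direction, sketched below in Go, is to separate "a urunc TAP exists" from "a urunc TAP is in use". This is not urunc's actual implementation: the iface type, the staleUruncTaps function, and the use of carrier state as the liveness signal are all assumptions made for illustration. The error text mirrors the log message quoted above.

```go
package main

import (
	"errors"
	"fmt"
	"path"
)

// iface is a simplified, hypothetical view of a network interface.
type iface struct {
	Name       string
	HasCarrier bool // assumed liveness signal: false for the leftover TAP
}

// errActiveUnikernel mirrors the message from the urunc log in this issue.
var errActiveUnikernel = errors.New("can't spawn multiple unikernels in the same network namespace")

// staleUruncTaps returns the names of tap*_urunc devices that appear to be
// leftovers and could be removed before creating a fresh TAP. It returns
// an error if a matching device still has carrier, since that suggests a
// running unikernel. Interfaces that do not match the pattern are ignored.
func staleUruncTaps(ifaces []iface) ([]string, error) {
	var stale []string
	for _, i := range ifaces {
		match, err := path.Match("tap*_urunc", i.Name)
		if err != nil {
			return nil, err
		}
		if !match {
			continue // never touch CNI or user interfaces
		}
		if i.HasCarrier {
			return nil, errActiveUnikernel
		}
		stale = append(stale, i.Name)
	}
	return stale, nil
}

func main() {
	afterRestart := []iface{
		{Name: "lo", HasCarrier: true},
		{Name: "eth0", HasCarrier: true},
		{Name: "tap0_urunc", HasCarrier: false},
	}
	stale, err := staleUruncTaps(afterRestart)
	fmt.Println("stale:", stale, "err:", err)
}
```

Whether carrier state alone is a reliable enough signal, or whether the check should also confirm that no process holds the device, is left open here.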

Metadata

Assignees

No one assigned

    Labels

    K8s/Tools (Related to container/cloud native tools, orchestrators), Network, bug (Something isn't working)

    Projects

    Status

    Todo
