
Network VIF leak when destroying VMs #3

@trapframe


We are trying to use lixs to replace oxenstored, which cannot cope with many VM restores/destroys running in parallel.

Everything works fine: the VMs are functional, and the performance of lixs and of the whole system is much better than with oxenstored.

However, we found that when we issue an "xl destroy", the VIF interface attached to the VM is not deleted and xl reports an error.

Below is the debug output of xl:

libxl: debug: libxl_domain.c:1040:libxl_domain_destroy: Domain 6:ao 0x56156ae580f0: create: how=(nil) callback=(nil) poller=0x56156ae549b0
libxl: debug: libxl_dm.c:3237:libxl__destroy_device_model: Domain 6:Didn't find dm UID; destroying by pid
libxl: debug: libxl_dm.c:3106:kill_device_model: Device Model signaled
libxl: debug: libxl_event.c:639:libxl__ev_xswatch_register: watch w=0x56156ae5f190 wpath=/local/domain/0/backend/vif/6/0/state token=3/0: register slotnum=3
libxl: debug: libxl_domain.c:1049:libxl_domain_destroy: Domain 6:ao 0x56156ae580f0: inprogress: poller=0x56156ae549b0, flags=i
libxl: debug: libxl_event.c:576:watchfd_callback: watch w=0x56156ae5f190 wpath=/local/domain/0/backend/vif/6/0/state token=3/0: event epath=/local/domain/0/backend/vif/6/0/state
libxl: debug: libxl_event.c:881:devstate_callback: backend /local/domain/0/backend/vif/6/0/state wanted state 6 still waiting state 5
libxl: debug: libxl_linux.c:235:libxl__get_hotplug_script_info: Domain 6:backend_kind 3, no need to execute scripts
libxl: debug: libxl_device.c:1176:device_hotplug: Domain 6:No hotplug script to execute
libxl: debug: libxl_event.c:689:libxl__ev_xswatch_deregister: watch w=0x56156ae5e340: deregister unregistered
libxl: debug: libxl_linux.c:235:libxl__get_hotplug_script_info: Domain 6:backend_kind 3, no need to execute scripts
libxl: debug: libxl_device.c:1176:device_hotplug: Domain 6:No hotplug script to execute
libxl: debug: libxl_event.c:689:libxl__ev_xswatch_deregister: watch w=0x56156ae5e750: deregister unregistered
libxl: debug: libxl_linux.c:235:libxl__get_hotplug_script_info: Domain 6:backend_kind 6, no need to execute scripts
libxl: debug: libxl_device.c:1176:device_hotplug: Domain 6:No hotplug script to execute
libxl: debug: libxl_event.c:689:libxl__ev_xswatch_deregister: watch w=0x56156ae5d550: deregister unregistered
libxl: debug: libxl_aoutils.c:88:xswait_timeout_callback: backend /local/domain/0/backend/vif/6/0/state (hoping for state change to 6): xswait timeout (path=/local/domain/0/backend/vif/6/0/state)
libxl: debug: libxl_event.c:676:libxl__ev_xswatch_deregister: watch w=0x56156ae5f190 wpath=/local/domain/0/backend/vif/6/0/state token=3/0: deregister slotnum=3
libxl: debug: libxl_event.c:865:devstate_callback: backend /local/domain/0/backend/vif/6/0/state wanted state 6  timed out
libxl: debug: libxl_event.c:689:libxl__ev_xswatch_deregister: watch w=0x56156ae5f190: deregister unregistered
libxl: debug: libxl_device.c:1090:device_backend_callback: Domain 6:calling device_backend_cleanup
libxl: debug: libxl_event.c:689:libxl__ev_xswatch_deregister: watch w=0x56156ae5f190: deregister unregistered
libxl: error: libxl_device.c:1105:device_backend_callback: Domain 6:unable to remove device with path /local/domain/0/backend/vif/6/0
libxl: debug: libxl_event.c:689:libxl__ev_xswatch_deregister: watch w=0x56156ae5f290: deregister unregistered
libxl: error: libxl_domain.c:1290:devices_destroy_cb: Domain 6:libxl__devices_destroy failed
libxl: debug: libxl_domain.c:1355:devices_destroy_cb: Domain 6:Forked pid 21164 for destroy of domain
libxl: debug: libxl_event.c:1893:libxl__ao_complete: ao 0x56156ae580f0: complete, rc=0
libxl: debug: libxl_event.c:1862:libxl__ao__destroy: ao 0x56156ae580f0: destroy

The issue seems to be related to this line:

libxl: debug: libxl_event.c:881:devstate_callback: backend /local/domain/0/backend/vif/6/0/state wanted state 6 still waiting state 5
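For reference, the numbers in this line are xenbus states as defined by enum xenbus_state in xen/include/public/io/xenbus.h: 5 is Closing and 6 is Closed, so libxl is waiting for the vif backend to move from Closing to Closed. The sketch below names the states and mimics the wait-then-time-out behaviour seen in the xswait_timeout_callback and devstate_callback lines. It is a simplification: libxl registers a xenstore watch (see libxl__ev_xswatch_register above) rather than polling, and read_state is a hypothetical stand-in for a read of the backend state node.

```python
import time
from enum import IntEnum
from typing import Callable, Optional


class XenbusState(IntEnum):
    """Values of enum xenbus_state (xen/include/public/io/xenbus.h)."""
    UNKNOWN = 0
    INITIALISING = 1
    INIT_WAIT = 2
    INITIALISED = 3
    CONNECTED = 4
    CLOSING = 5
    CLOSED = 6
    RECONFIGURING = 7
    RECONFIGURED = 8


def describe(wanted: int, current: int) -> str:
    """Render a devstate-style message with symbolic state names."""
    w, c = XenbusState(wanted), XenbusState(current)
    return f"wanted {w.name} ({w.value}), still in {c.name} ({c.value})"


def wait_for_state(read_state: Callable[[], Optional[str]],
                   wanted: XenbusState,
                   timeout_s: float,
                   poll_s: float = 0.1) -> bool:
    """Poll until the state node equals `wanted`; return False on timeout.

    Simplification: libxl uses a xenstore watch instead of polling.
    `read_state` is a hypothetical callable returning the node's string
    value, or None if the node does not exist.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        raw = read_state()
        if raw is not None and raw.strip().isdigit() and int(raw) == wanted:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
```

With a backend stuck at "5", wait_for_state(lambda: "5", XenbusState.CLOSED, timeout_s=1.0) returns False, which corresponds to the "wanted state 6 timed out" line in the log.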

Looking at the lixs debug log, or using strace, I cannot see a write of state 6 on the socket.
We can see the write of state 5, and that operation succeeds:

INFO  [S177] > { type = 11, req_id = 0, tx_id = 210, len = 40, msg = "/local/domain/0/backend/vif/6/0/online 0" }
INFO  [S177] < { type = 11, req_id = 0, tx_id = 210, len = 2, msg = "OK" }
INFO  [S177] > { type = 11, req_id = 0, tx_id = 210, len = 39, msg = "/local/domain/0/backend/vif/6/0/state 5" }
INFO  [S177] < { type = 11, req_id = 0, tx_id = 210, len = 2, msg = "OK" }
INFO  [S177] > { type = 7, req_id = 0, tx_id = 210, len = 2, msg = "T " }
INFO  [S177] < { type = 7, req_id = 0, tx_id = 210, len = 2, msg = "OK" }
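The lixs lines above are xenstore wire messages. Per xen/include/public/io/xs_wire.h, type 11 is XS_WRITE and type 7 is XS_TRANSACTION_END, and an XS_WRITE payload is the path, a NUL byte, then the value; that is why the state write has len = 39 (37 bytes of path, 1 NUL, 1 byte of value). A minimal decoder sketch, assuming the 16-byte xsd_sockmsg header is in little-endian (x86) byte order, could be used to check raw bytes captured with strace for a state write:

```python
import struct
from typing import NamedTuple, Tuple

# A few message types from xen/include/public/io/xs_wire.h.
XS_TYPES = {
    2: "XS_READ",
    6: "XS_TRANSACTION_START",
    7: "XS_TRANSACTION_END",
    11: "XS_WRITE",
    16: "XS_ERROR",
}

# struct xsd_sockmsg: type, req_id, tx_id, len (all uint32_t).
# Assumption: little-endian host byte order, as on x86.
HEADER = struct.Struct("<IIII")


class XsMsg(NamedTuple):
    type: int
    req_id: int
    tx_id: int
    payload: bytes

    @property
    def type_name(self) -> str:
        return XS_TYPES.get(self.type, f"type {self.type}")


def decode(buf: bytes) -> XsMsg:
    """Decode one message: 16-byte header followed by `len` payload bytes."""
    if len(buf) < HEADER.size:
        raise ValueError("buffer shorter than xsd_sockmsg header")
    mtype, req_id, tx_id, length = HEADER.unpack_from(buf)
    payload = buf[HEADER.size:HEADER.size + length]
    if len(payload) != length:
        raise ValueError("truncated payload")
    return XsMsg(mtype, req_id, tx_id, payload)


def split_write(payload: bytes) -> Tuple[str, str]:
    """An XS_WRITE payload is <path> NUL <value>, with no trailing NUL."""
    path, sep, value = payload.partition(b"\0")
    if not sep:
        raise ValueError("XS_WRITE payload has no NUL separator")
    return path.decode(), value.decode()
```

Applied to the transaction above, this yields two XS_WRITE messages (online = 0, state = 5) followed by an XS_TRANSACTION_END whose payload is "T" plus NUL, i.e. a commit; none of them writes 6 to the state node.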

If we set the state to 6 manually before the destroy, using:

xenstore-write /local/domain/0/backend/vif/6/0/state 6

the destroy succeeds and xl no longer complains, but only part of the interface is deleted (the -qemu one); the vif itself remains (it is only visible with ifconfig -a).
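To tell whether a state 6 write ever reaches the socket, it helps to know which bytes such a request carries. The sketch below builds the XS_WRITE request for the same path and value as the manual xenstore-write. The req_id and tx_id defaults of 0 are illustrative assumptions (0 is XBT_NULL, meaning no transaction); a real client may use other values, so matching on the payload rather than the whole header is the safer check.

```python
import struct

XS_WRITE = 11  # message type from xen/include/public/io/xs_wire.h
XBT_NULL = 0   # transaction id meaning "not in a transaction"


def encode_write(path: str, value: str,
                 req_id: int = 0, tx_id: int = XBT_NULL) -> bytes:
    """Build an XS_WRITE request: 16-byte header, then path NUL value.

    Assumes little-endian (x86) header fields; the req_id/tx_id defaults
    are illustrative and may differ from what a real client sends.
    """
    payload = path.encode() + b"\0" + value.encode()
    header = struct.pack("<IIII", XS_WRITE, req_id, tx_id, len(payload))
    return header + payload


msg = encode_write("/local/domain/0/backend/vif/6/0/state", "6")
# The payload after the 16-byte header is what to search for in a capture.
print(msg[16:])  # b'/local/domain/0/backend/vif/6/0/state\x006'
```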

We are running Xen 4.13.

Any ideas on how to troubleshoot this further?
