
CARP enabled boxes with NetFlow cause routing loop #279

@m2martin

Description

Describe the bug

Two boxes sharing addresses via CARP cause routing loops and packet amplification as soon as NetFlow is enabled on the CARP interfaces and both boxes receive the same packet.
The CARP backup routes packets destined to addresses of the current CARP master, whether the master's physical address or the virtual address, and afterwards the same happens the other way round.
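To make the amplification concrete, here is a simplified Python model of the ping-pong between the two boxes. It assumes that each box re-routes every copy forwarded by the other box and that both follow standard IPv4 forwarding, which decrements the TTL and discards a packet instead of forwarding it once the TTL would reach 0. The function name `ping_pong_trace` and the starting TTL of 64 (a common default on FreeBSD and Linux hosts) are illustrative assumptions, not OPNsense code.

```python
from typing import List, Tuple


def ping_pong_trace(initial_ttl: int, first_box: str = "backup") -> List[Tuple[str, int]]:
    """Model one packet bouncing between the two CARP boxes (simplified assumption).

    Returns one (forwarding box, TTL after forwarding) entry per extra copy.
    A box only forwards a packet that arrives with TTL > 1; a packet arriving
    with TTL 1 is dropped (a real router would also send ICMP time exceeded).
    """
    other = {"backup": "master", "master": "backup"}
    trace: List[Tuple[str, int]] = []
    ttl = initial_ttl
    box = first_box
    while ttl > 1:
        ttl -= 1
        trace.append((box, ttl))
        box = other[box]  # the other box picks up this copy next
    return trace


if __name__ == "__main__":
    trace = ping_pong_trace(64)
    print(len(trace), "extra copies; first hops:", trace[:3])
```

In this model a single echo request with an initial TTL of 64 turns into 63 extra copies on the wire; a test PC that starts at TTL 128 (typical for Windows) would produce 127.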

To Reproduce

Steps to reproduce the behavior:

  1. Set up two boxes
  2. For debugging, delete the NAT rules, create a floating any/any allow rule, allow local nets, etc.
  3. Set up the following network layout
                      TEST-PC1
                         |
     --+-----------------+----------------+--
       |                                  |
      em0                                em0
+-------------+                    +-------------+
| OPNsense 01 |em2  <-pfsync->  em2| OPNsense 02 |
+-------------+                    +-------------+
      em1                                em1
       |                                  |
     --+-----------------+----------------+--
                         |
                      TEST-PC2

LAN (em0)
   CARP VIP     192.168.50.10/24
   OPNsense 01  192.168.50.11/24
   OPNsense 02  192.168.50.12/24
   TEST-PC1     192.168.50.50/24
WAN (em1)
   CARP VIP     192.168.222.10/24
   OPNsense 01  192.168.222.11/24
   OPNsense 02  192.168.222.12/24
   TEST-PC2     192.168.222.50/24
pfsync (em2)
   OPNsense 01  192.168.99.1/24
   OPNsense 02  192.168.99.2/24
  4. Set the OPNsense CARP VIPs as the gateway on the test PCs
  5. Verify that packets are routed correctly and run failover tests (these should succeed)
  6. Enable NetFlow on em0 and em1 with local collection
  7. Reboot
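The address plan from step 3 can be sanity-checked before testing, so that a typo in the lab setup is not mistaken for the bug. The sketch below uses only Python's standard `ipaddress` module; `LAB` and `check_segment` are hypothetical names, and the only rule it checks is that every address on a segment, including the CARP VIP, lies in a single subnet and is unique.

```python
import ipaddress
from typing import Dict, List, Tuple

# Address plan copied from the reproduction steps: segment -> [(role, address/prefix)].
LAB: Dict[str, List[Tuple[str, str]]] = {
    "LAN (em0)": [
        ("CARP VIP", "192.168.50.10/24"),
        ("OPNsense 01", "192.168.50.11/24"),
        ("OPNsense 02", "192.168.50.12/24"),
        ("TEST-PC1", "192.168.50.50/24"),
    ],
    "WAN (em1)": [
        ("CARP VIP", "192.168.222.10/24"),
        ("OPNsense 01", "192.168.222.11/24"),
        ("OPNsense 02", "192.168.222.12/24"),
        ("TEST-PC2", "192.168.222.50/24"),
    ],
    "pfsync (em2)": [
        ("OPNsense 01", "192.168.99.1/24"),
        ("OPNsense 02", "192.168.99.2/24"),
    ],
}


def check_segment(entries: List[Tuple[str, str]]) -> List[str]:
    """Return the problems found on one segment (empty list if the plan is consistent)."""
    ifaces = [ipaddress.ip_interface(addr) for _, addr in entries]
    problems: List[str] = []
    networks = {iface.network for iface in ifaces}
    if len(networks) != 1:
        problems.append("mixed subnets: " + ", ".join(sorted(str(n) for n in networks)))
    addresses = [iface.ip for iface in ifaces]
    if len(set(addresses)) != len(addresses):
        problems.append("duplicate addresses")
    return problems


if __name__ == "__main__":
    for segment, entries in LAB.items():
        print(segment, check_segment(entries) or "OK")
```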

To test, ping TEST-PC2 from TEST-PC1, ping one of the boxes' physical addresses, or ping the CARP VIP.

Both boxes process the packet: the box it is not destined for routes it with the TTL decremented by 1, and this repeats in a loop until the TTL expires.
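When capturing on one of the segments (for example with tcpdump on TEST-PC1 or TEST-PC2), the loop shows up as the same ICMP echo request (same source, destination, identifier and sequence number) appearing several times with different TTLs. The sketch below assumes the capture has already been parsed into `EchoPacket` tuples; that type, the helper names and the sample values are hypothetical, and parsing the capture file is left out.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple

EchoKey = Tuple[str, str, int, int]


class EchoPacket(NamedTuple):
    src: str
    dst: str
    icmp_id: int
    icmp_seq: int
    ttl: int


def find_duplicate_echoes(packets: Iterable[EchoPacket]) -> Dict[EchoKey, List[int]]:
    """Group echo packets by (src, dst, id, seq) and keep only groups seen more than once.

    Each value lists the TTLs of the copies in capture order.
    """
    groups: Dict[EchoKey, List[int]] = defaultdict(list)
    for p in packets:
        groups[(p.src, p.dst, p.icmp_id, p.icmp_seq)].append(p.ttl)
    return {key: ttls for key, ttls in groups.items() if len(ttls) > 1}


def was_rerouted(ttls: List[int]) -> bool:
    """Copies of one packet with different TTLs were routed a different number of times."""
    return len(set(ttls)) > 1


if __name__ == "__main__":
    # Hypothetical sample values, not taken from the report.
    capture = [
        EchoPacket("192.168.50.50", "192.168.222.50", 1, 1, 63),
        EchoPacket("192.168.50.50", "192.168.222.50", 1, 1, 62),
        EchoPacket("192.168.50.50", "192.168.222.50", 1, 1, 61),
        EchoPacket("192.168.50.50", "192.168.222.50", 1, 2, 63),
    ]
    for key, ttls in find_duplicate_echoes(capture).items():
        print(key, ttls, "re-routed" if was_rerouted(ttls) else "duplicate")
```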

Important:

If you are using a switched network, you will only see the behaviour while the switch is flooding packets (e.g. on a destination lookup failure, after an STP topology change, etc.). This limits the effect to a very short time. For a persistent lab environment, use hubs or a non-switched virtual network.
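Why a switch hides the problem most of the time can be illustrated with a minimal model of a transparent learning switch: frames to a known unicast MAC leave on a single port, while frames to an unknown destination are flooded to every other port, so both boxes see them. The sketch assumes VHID 1 (virtual MAC `00:00:5e:00:01:01`), that the master sources its CARP advertisements from that virtual MAC, and hypothetical port names and PC MAC. Aging timers, VLANs and the exact STP topology-change handling are left out; `flush()` only approximates the effect of a topology change.

```python
from typing import Dict, List


class LearningSwitch:
    """Minimal model of a transparent learning switch (no aging, no VLANs)."""

    def __init__(self, ports: List[str]) -> None:
        self.ports = ports
        self.mac_table: Dict[str, str] = {}

    def flush(self) -> None:
        """Forget all learned MACs, roughly what a topology change can cause."""
        self.mac_table.clear()

    def receive(self, in_port: str, src_mac: str, dst_mac: str) -> List[str]:
        """Learn the source MAC, then return the ports the frame is sent out of."""
        self.mac_table[src_mac] = in_port
        out = self.mac_table.get(dst_mac)
        if out is None:
            # Destination lookup failure (or multicast): flood to every other port.
            return [p for p in self.ports if p != in_port]
        return [] if out == in_port else [out]


if __name__ == "__main__":
    vip_mac = "00:00:5e:00:01:01"  # CARP virtual MAC for VHID 1 (assumed)
    pc1_mac = "02:00:00:00:00:50"  # hypothetical MAC of TEST-PC1
    sw = LearningSwitch(["pc1", "opn01", "opn02"])
    print(sw.receive("pc1", pc1_mac, vip_mac))  # VIP MAC unknown: flooded to both boxes
    print(sw.receive("opn01", vip_mac, "01:00:5e:00:00:12"))  # master's CARP advertisement
    print(sw.receive("pc1", pc1_mac, vip_mac))  # now delivered to the master only
```

Once the switch has learned the virtual MAC from the master's advertisements, the backup stops seeing the frames, which matches the short-lived effect described above.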

To stop or mitigate the problem, stop the samplicate service, manually shut down the netgraph nodes netflow_emX using ngctl, or disable CARP.

Expected behavior

A CARP failover cluster with working NetFlow and no re-routed or looped packets.
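For comparison, the expected behaviour can be written down as the ordinary Ethernet receive decision: a box accepts unicast frames only for its own MAC, and for the CARP virtual MAC only while it is master, so the backup never gets to route a frame meant for the master. This is a simplified model, not FreeBSD's actual input path, and it does not claim to show where NetFlow bypasses this check. The MAC addresses and VHID 1 are hypothetical; the `00:00:5e:00:01:<vhid>` virtual MAC format is the one CARP shares with VRRP.

```python
from dataclasses import dataclass


def carp_vmac(vhid: int) -> str:
    """Virtual MAC used by CARP for a given VHID (same range as VRRP)."""
    return f"00:00:5e:00:01:{vhid:02x}"


@dataclass
class Box:
    own_mac: str     # physical MAC of this box's interface (hypothetical)
    vhid: int        # VHID of the CARP VIP on this interface (assumed)
    is_master: bool  # current CARP state of this box

    def accepts(self, dst_mac: str) -> bool:
        """Return True if a frame to dst_mac should be handed to the IP stack."""
        dst = dst_mac.lower()
        if int(dst.split(":")[0], 16) & 1:
            # Group bit set: broadcast or multicast (simplified, no membership check).
            return True
        if dst == self.own_mac.lower():
            return True
        # The virtual MAC belongs to this box only while it is CARP master.
        return self.is_master and dst == carp_vmac(self.vhid)


if __name__ == "__main__":
    master = Box("02:00:00:00:00:11", vhid=1, is_master=True)
    backup = Box("02:00:00:00:00:12", vhid=1, is_master=False)
    for mac in (carp_vmac(1), master.own_mac):
        print(mac, "master:", master.accepts(mac), "backup:", backup.accepts(mac))
```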

Describe alternatives you considered

Disable NetFlow or use NetFlow without CARP.

Screenshots

None

Relevant log files

None

Additional context

This lab setup reproduces an issue observed on hardware boxes. The finding is not limited to my tests; it came out of debugging a real network.

Environment

Software version used:

OPNsense 25.7.11_2-amd64
