1 change: 1 addition & 0 deletions .gitignore
@@ -22,3 +22,4 @@
/doc/draft.txt
/doc/draft.xml
/doc/tracker.log
/scripts/mem_subset.csv
131 changes: 131 additions & 0 deletions doc/c4-design.md
@@ -1051,6 +1051,137 @@ and then maybe switch to startup mode if a lot of capacity is
available. This is something that we intend to test, but have not
implemented yet.

## Adaptation to ECN/L4S

Tests with L4S active queue management exposed a tension between C4's
periodic updates and the L4S goal of minimizing queue sizes. Typical L4S
deployments start marking packets with ECN/CE when the queue size reaches
about 1.5ms, and increase the marking rate progressively as the queue grows,
reaching 100% when the queue size is about 2ms. If C4 pushes at 25% every 6 RTT,
and if the bandwidth estimate is accurate,
the queue size will increase by 25% of the RTT during the first roundtrip,
before any correction signal can be applied. The increased marking
rate will affect all connections sharing the bottleneck, which is
not desirable.

L4S is tuned for the "Prague" algorithm, which increases the CWND by one packet
every RTT. In a typical trial with a 20ms RTT and a 100 Mbps data rate, it takes
0.12ms to send a packet, and thus 12.5 RTT to build a queue of 1.5ms. In the same
conditions, C4 in its aggressive scenario would have increased the rate by 25%
after 6 RTT, immediately triggering a high rate of marking.
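
The arithmetic behind this comparison can be checked with a short script. The
parameters (1500-byte packets, 100 Mbps, 20ms RTT, 1.5ms mark threshold) come
from the trial described above; the packet size is an assumption, since the
text does not state it explicitly.

```python
link_bps = 100e6        # 100 Mbps bottleneck data rate
packet_bytes = 1500     # assumed full-size packet
rtt = 0.020             # 20 ms round-trip time
mark_threshold = 0.0015 # L4S starts marking at ~1.5 ms of queue

# Serialization time of one packet: 1500 * 8 / 100e6 = 0.12 ms.
packet_time = packet_bytes * 8 / link_bps

# Prague adds roughly one packet of queue per RTT, so it needs
# 1.5 ms / 0.12 ms = 12.5 RTT to reach the marking threshold.
prague_rtts = mark_threshold / packet_time

# A 25% push overshoots the bottleneck rate for a full roundtrip,
# so the queue grows by 25% of the RTT before any feedback arrives:
c4_queue = 0.25 * rtt   # 5 ms, well past the 2 ms / 100% marking point
```

The contrast is stark: Prague takes 12.5 RTT to reach the point where marking
begins, while a single 25% push overshoots the full marking range in one roundtrip.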

The cascade process makes the problem even worse. If a push at 6.25% increases
the nominal rate, the next push will be at 25%. If that push and the next one
also increase the nominal rate, C4 reenters the initial phase, even if some
of the pushes caused ECN/CE marks. The initial phase will then cause a lot
of packet losses, which degrades performance.


To mitigate this issue, we added a "very low" pushing mode, setting the
pushing rate to only 3.125% if the previous push resulted in a high rate of ECN/CE marks.
We also replaced the somewhat ad hoc "count of successive probes" with the management
of a "probe level", defining 4 levels:

- level 0: pushing at 3.125%, spend 1 cycle in cruising before pushing.
- level 1: pushing at 6.25%, spend 4 cycles in cruising before pushing.
- level 2: pushing at 25%, spend at most 1 cycle in cruising before pushing.
- level 3: pushing at 25%, spend at most 1 cycle in cruising before pushing.

The "probe level" is updated after the recovery phase as follows:

- if the previous probe was successful and did not result in a high rate of ECN/CE marks,
increase the probe level by 1. If the probe level was already at 3, reenter the startup phase.
- if the previous probe was successful but did result in a high rate of ECN/CE marks,
remain at the same probe level.
- if the previous probe was not successful but did not result in a high rate of ECN/CE marks,
stay at probe level 0 if already at that level, otherwise move back to probe level 1.
- if the previous probe was not successful and did result in a high rate of ECN/CE marks,
move to probe level 0.
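
The four rules above can be sketched as a small transition function. The table,
the sentinel value, and the function name are illustrative choices, not the
actual C4 implementation.

```python
# (push fraction, cruising cycles before the next push) per probe level,
# transcribed from the level definitions above.
PROBE_LEVELS = {
    0: (0.03125, 1),
    1: (0.0625, 4),
    2: (0.25, 1),
    3: (0.25, 1),
}

STARTUP = -1  # sentinel: reenter the startup phase


def next_probe_level(level, successful, high_ce_rate):
    """Apply the probe level update rules after the recovery phase."""
    if successful and not high_ce_rate:
        # Successful, clean probe: climb one level, or reenter startup from 3.
        return STARTUP if level == 3 else level + 1
    if successful and high_ce_rate:
        return level  # successful but heavily marked: hold the level
    if not high_ce_rate:
        # Unsuccessful but unmarked: stay at 0, otherwise fall back to 1.
        return 0 if level == 0 else 1
    return 0  # unsuccessful and heavily marked: drop to the lowest level
```

For example, a run of clean successful probes walks the level from 0 up to 3 and
then back into startup, while a single heavily marked failure resets it to 0.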

This logic treats CE marking differently from other congestion signals, because
CE marks are an intentional indication of congestion by the network, and are thus
less ambiguous than delay increases or packet losses, which can be caused by other
factors such as delay jitter or random transmission issues. Simulations show that
this logic lets C4 quickly discover the available capacity in L4S networks, without
spuriously reentering the startup phase and causing packet losses. It is equivalent
to the previous logic when the network does not support L4S.

# Revisiting the Initial Phase

Our November 2025 design of C4 included a "rate based"
initial phase, during which C4 sends at twice the "nominal rate",
monitors acknowledgments, increases the nominal rate if the measurements
increase, and exits if congestion is detected or if the measurements
do not increase for 3 consecutive RTT. That algorithm works
well in most scenarios, but we observed early exits in
"high delay jitter" scenarios, such as Wi-Fi networks with lots of
packet collisions.

After observing that phenomenon, we realized that the
rate based algorithm was failing in cases of high delay jitter
because it set the CWND to the product of the pacing rate
and the "nominal" max RTT. The nominal max RTT was set to a fixed
value, observed either before the initial phase or on the first
roundtrip in that phase. This works if the initial phase
starts during a high jitter event and the initial RTT is large
enough, but in many cases it was not, and the CWND became a
limiting factor.

## Why not increase the Max RTT during the Initial phase?

In the initial phase, the algorithm tries to discover the bandwidth
and does not yet have a good estimate of delay jitter, which typically
requires a series of measurements. In these conditions, it is
easy to underestimate the max RTT. On the other hand, the flow is
deliberately probing at a high data rate. If the algorithm
allows updates of the max RTT during that phase, the risk of
spiraling into bufferbloat is very high; but if the CWND
remains too low, the risk of exiting startup with a severely
underestimated data rate is also very high.

We tried to develop simple rules to classify delay measurements
as caused by jitter or caused by congestion. If we could do that,
we would be able to increase the max RTT safely, when appropriate.
However, we could not find variables that were both easy to monitor
and well correlated with the actual cause of the delay.


## Building a robust initial estimator

The "rate based" initial estimator requires estimating both the
data rate and the max RTT simultaneously. In contrast, the "CWND based"
initial estimator used in algorithms like Reno or Cubic
only requires estimating the CWND, plus possibly a
loose estimate of the data rate. The Reno algorithm is remarkably
simple: just increase the CWND by the number of bytes acknowledged,
without any explicit dependency on the measured latency.

The Reno algorithm terminates when packet losses are observed,
leading to bufferbloat. Hystart improves on that by terminating when
the measured delays start increasing, but this can lead to early
exits in case of delay jitter. The rate based algorithm terminates when
the measured bandwidth stops growing, which provides good
results. Our proposal is to combine a Reno-like growth of the
CWND with a rate-control-like exit condition.

Of course, things are not that simple. The "rate" test only stops the
growth of the CWND after the third "non growing" round. If the CWND doubles
after each round it becomes excessive, buffers fill up, and lots
of packets are lost. We dealt with that problem by essentially
freezing the increases of the CWND after the first "non growing" round.
If a larger measurement happens before 3 RTT, the increases
resume; otherwise, C4 exits the initial phase.
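
A minimal sketch of this combined rule, with assumed names and a per-round
simplification (the real implementation would grow the CWND per acknowledgment,
not per round):

```python
def on_round_end(state, measured_rate, acked_bytes):
    """Process one round of the initial phase; return True when C4 should exit.

    state: dict with "max_rate", "non_growing_rounds", and "cwnd".
    """
    if measured_rate > state["max_rate"]:
        # A larger measurement: record it and let the increases resume.
        state["max_rate"] = measured_rate
        state["non_growing_rounds"] = 0
    else:
        state["non_growing_rounds"] += 1

    if state["non_growing_rounds"] == 0:
        # Reno-like growth, frozen as soon as the rate stops growing.
        state["cwnd"] += acked_bytes

    # Exit the initial phase after the third "non growing" round.
    return state["non_growing_rounds"] >= 3
```

In this sketch the CWND only grows during rounds where the measured rate set a
new maximum, so a transient stall freezes the growth rather than doubling into
a full buffer.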

When the initial phase completes, we retain as the data rate
estimate the highest value measured so far.
We also want a reasonable estimate of the "max RTT".
In the Reno logic, "ssthresh" is set to half the CWND
value at the time congestion is detected. C4 does not use the
ssthresh variable after exiting the initial phase, but it
can set the max RTT to the quotient of ssthresh by the
final rate estimate.
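
Since ssthresh is in bytes and the rate estimate in bytes per second, the
quotient is a time. A worked example with made-up numbers:

```python
ssthresh = 250_000           # bytes: half the CWND when congestion was seen
final_rate = 12_500_000      # bytes/s, i.e. a 100 Mbps rate estimate

# bytes / (bytes per second) = seconds: here, a 20 ms max RTT estimate.
max_rtt = ssthresh / final_rate
```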


# State Machine

The state machine for C4 has the following states: