1 change: 1 addition & 0 deletions .gitignore
@@ -22,3 +22,4 @@
/doc/draft.txt
/doc/draft.xml
/doc/tracker.log
/scripts/mem_subset.csv
131 changes: 131 additions & 0 deletions doc/c4-design.md
@@ -1051,6 +1051,137 @@ and then maybe switch to startup mode if a lot of capacity is
available. This is something that we intend to test, but have not
implemented yet.

## Adaptation to ECN/L4S

Tests with L4S active queue management exposed a tension between C4's
periodic updates and the L4S goal of minimizing queue sizes. Typical L4S
deployments start marking packets with ECN/CE when the queue size reaches
about 1.5ms, and increase the marking rate progressively as the queue grows,
reaching 100% when the queue size is about 2ms. If C4 pushes at 25% every 6 RTT,
and if the bandwidth estimate is accurate,
the queue size will increase by 25% of the RTT during the first roundtrip,
before any correction signal can be applied. The increased marking
rate will affect all connections sharing the bottleneck, which is
not desirable.

L4S is tuned for the "Prague" algorithm, which increases the CWND by one packet
every RTT. In a typical trial with a 20ms RTT and a 100 Mbps data rate, it takes
0.12ms to send a packet, and thus 12.5 RTT to build a queue of 1.5ms. In the same
conditions, C4 in its aggressive scenario would have increased the rate by 25%
after 6 RTT, immediately triggering a high rate of marking.
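
The arithmetic behind this comparison can be checked with a short script. The
parameters (1500-byte packets, 100 Mbps, 20ms RTT, 1.5ms mark threshold) come
from the trial described above; the packet size is an assumption, since the
text does not state it explicitly.

```python
link_bps = 100e6        # 100 Mbps bottleneck data rate
packet_bytes = 1500     # assumed full-size packet
rtt = 0.020             # 20 ms round-trip time
mark_threshold = 0.0015 # L4S starts marking at ~1.5 ms of queue

# Serialization time of one packet: 1500 * 8 / 100e6 = 0.12 ms.
packet_time = packet_bytes * 8 / link_bps

# Prague adds roughly one packet of queue per RTT, so it needs
# 1.5 ms / 0.12 ms = 12.5 RTT to reach the marking threshold.
prague_rtts = mark_threshold / packet_time

# A 25% push overshoots the bottleneck rate for a full roundtrip,
# so the queue grows by 25% of the RTT before any feedback arrives:
c4_queue = 0.25 * rtt   # 5 ms, well past the 2 ms / 100% marking point
```

The contrast is stark: Prague takes 12.5 RTT to reach the point where marking
begins, while a single 25% push overshoots the full marking range in one roundtrip.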

The cascade process makes the problem even worse. If a push at 6.25% increases
the nominal rate, the next push will be at 25%. If that push and the next one
also increase the nominal rate, C4 reenters the initial phase, even if some
of the pushes caused ECN/CE marks. The initial phase will then cause a lot
of packet losses, which degrades performance.


To mitigate this issue, we added a "very low" pushing mode, setting the
pushing rate to only 3.125% if the previous push resulted in a high rate of ECN/CE marks.
We also replaced the somewhat ad hoc "count of successive probes" with the management
of a "probe level", defining 4 levels:

- level 0: pushing at 3.125%, spend 1 cycle in cruising before pushing.
- level 1: pushing at 6.25%, spend 4 cycles in cruising before pushing.
- level 2: pushing at 25%, spend at most 1 cycle in cruising before pushing.
- level 3: pushing at 25%, spend at most 1 cycle in cruising before pushing.

The "probe level" is updated after the recovery phase as follows:

- if the previous probe was successful and did not result in a high rate of ECN/CE marks,
increase the probe level by 1. If the probe level was already at 3, reenter the startup phase.
- if the previous probe was successful but did result in a high rate of ECN/CE marks,
remain at the same probe level.
- if the previous probe was not successful but did not result in a high rate of ECN/CE marks,
stay at probe level 0 if already at that level, otherwise move back to probe level 1.
- if the previous probe was not successful and did result in a high rate of ECN/CE marks,
move to probe level 0.
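
The four rules above can be sketched as a small transition function. The table,
the sentinel value, and the function name are illustrative choices, not the
actual C4 implementation.

```python
# (push fraction, cruising cycles before the next push) per probe level,
# transcribed from the level definitions above.
PROBE_LEVELS = {
    0: (0.03125, 1),
    1: (0.0625, 4),
    2: (0.25, 1),
    3: (0.25, 1),
}

STARTUP = -1  # sentinel: reenter the startup phase


def next_probe_level(level, successful, high_ce_rate):
    """Apply the probe level update rules after the recovery phase."""
    if successful and not high_ce_rate:
        # Successful, clean probe: climb one level, or reenter startup from 3.
        return STARTUP if level == 3 else level + 1
    if successful and high_ce_rate:
        return level  # successful but heavily marked: hold the level
    if not high_ce_rate:
        # Unsuccessful but unmarked: stay at 0, otherwise fall back to 1.
        return 0 if level == 0 else 1
    return 0  # unsuccessful and heavily marked: drop to the lowest level
```

For example, a run of clean successful probes walks the level from 0 up to 3 and
then back into startup, while a single heavily marked failure resets it to 0.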

This logic treats CE marking differently from other congestion signals, because
CE marks are an intentional indication of congestion by the network, and are thus
less ambiguous than delay increases or packet losses, which can be caused by other
factors such as delay jitter or random transmission issues. Simulations show that
this logic lets C4 quickly discover the available capacity in L4S networks, without
spuriously reentering the startup phase and causing packet losses. It is equivalent
to the previous logic when the network does not support L4S.

# Revisiting the Initial Phase

Our November 2025 design of C4 included a "rate based"
initial phase, during which C4 sends at twice the "nominal rate",
monitors acknowledgments, increases the nominal rate if the measurements
increase, and exits if congestion is detected or if the measurements
do not increase for 3 consecutive RTT. That algorithm works
well in most scenarios, but we observed early exits in
"high delay jitter" scenarios, such as Wi-Fi networks with lots of
packet collisions.

After observing that phenomenon, we realized that the
rate based algorithm was failing in cases of high delay jitter
because it set the CWND to the product of the pacing rate
and the "nominal" max RTT. The nominal max RTT was set to a fixed
value, observed either before the initial phase or on the first
roundtrip in that phase. This works if the initial phase
starts during a high jitter event and the initial RTT is large
enough, but in many cases it was not, and the CWND became a
limiting factor.

## Why not increase the Max RTT during the Initial phase?

In the initial phase, the algorithm tries to discover the bandwidth
and does not yet have a good estimate of delay jitter, which typically
requires a series of measurements. In these conditions, it is
easy to underestimate the max RTT. On the other hand, the flow is
deliberately probing at a high data rate. If the algorithm
allows updates of the max RTT during that phase, the risk of
spiraling into bufferbloat is very high; but if the CWND
remains too low, the risk of exiting startup with a severely
underestimated data rate is also very high.

We tried to develop simple rules to classify delay measurements
as caused by jitter or caused by congestion. If we could do that,
we would be able to increase the max RTT safely, when appropriate.
However, we could not find variables that were both easy to monitor
and well correlated with the actual cause of the delay.


## Building a robust initial estimator

The "rate based" initial estimator requires estimating both the
data rate and the max RTT simultaneously. In contrast, the "CWND based"
initial estimator used in algorithms like Reno or Cubic
only requires estimating the CWND, plus possibly a
loose estimate of the data rate. The Reno algorithm is remarkably
simple: just increase the CWND by the number of bytes acknowledged,
without any explicit dependency on the measured latency.

The Reno algorithm terminates when packet losses are observed,
leading to bufferbloat. Hystart improves on that by terminating when
the measured delays start increasing, but this can lead to early
exits in case of delay jitter. The rate based algorithm terminates when
the measured bandwidth stops growing, which provides good
results. Our proposal is to combine a Reno-like growth of the
CWND with a rate-control-like exit condition.

Of course, things are not that simple. The "rate" test only stops the
growth of the CWND after the third "non growing" round. If the CWND doubles
after each round it becomes excessive, buffers fill up, and lots
of packets are lost. We dealt with that problem by essentially
freezing the increases of the CWND after the first "non growing" round.
If a larger measurement happens before 3 RTT, the increases
resume; otherwise, C4 exits the initial phase.
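
A minimal sketch of this combined rule, with assumed names and a per-round
simplification (the real implementation would grow the CWND per acknowledgment,
not per round):

```python
def on_round_end(state, measured_rate, acked_bytes):
    """Process one round of the initial phase; return True when C4 should exit.

    state: dict with "max_rate", "non_growing_rounds", and "cwnd".
    """
    if measured_rate > state["max_rate"]:
        # A larger measurement: record it and let the increases resume.
        state["max_rate"] = measured_rate
        state["non_growing_rounds"] = 0
    else:
        state["non_growing_rounds"] += 1

    if state["non_growing_rounds"] == 0:
        # Reno-like growth, frozen as soon as the rate stops growing.
        state["cwnd"] += acked_bytes

    # Exit the initial phase after the third "non growing" round.
    return state["non_growing_rounds"] >= 3
```

In this sketch the CWND only grows during rounds where the measured rate set a
new maximum, so a transient stall freezes the growth rather than doubling into
a full buffer.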

When the initial phase completes, we retain as the data rate
estimate the highest value measured so far.
We also want a reasonable estimate of the "max RTT".
In the Reno logic, "ssthresh" is set to half the CWND
value at the time congestion is detected. C4 does not use the
ssthresh variable after exiting the initial phase, but it
can set the max RTT to the quotient of ssthresh by the
final rate estimate.
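
Since ssthresh is in bytes and the rate estimate in bytes per second, the
quotient is a time. A worked example with made-up numbers:

```python
ssthresh = 250_000           # bytes: half the CWND when congestion was seen
final_rate = 12_500_000      # bytes/s, i.e. a 100 Mbps rate estimate

# bytes / (bytes per second) = seconds: here, a 20 ms max RTT estimate.
max_rtt = ssthresh / final_rate
```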


# State Machine

The state machine for C4 has the following states: