Long exfil throughput stays low / collapses under sustained load #14

@Mygod

Summary

Long-running exfil (upload) transfers can become disproportionately slow compared to downloads, even on loopback. The behavior appears to be protocol-level: polls (which are needed to elicit ACKs and data from the server) are only sent when picoquic produces a packet, but a pacing/cwnd collapse suppresses sends, starving ACKs and locking the connection into a low-throughput regime.

Environment

  • Slipstream: local build from main
  • Mode: direct connection (client points at server as resolver)
  • Network: loopback, no netem

Reproduction (direct connection, loopback)

  1. Start a TCP sink for the server target:
    nc -l 5201 > /dev/null
  2. Start the slipstream server:
    slipstream-server \
      --dns-listen-port=8853 \
      --target-address=127.0.0.1:5201 \
      --domain=test.com
  3. Start the slipstream client:
    slipstream-client \
      --congestion-control=bbr \
      --tcp-listen-port=7000 \
      --resolver=127.0.0.1:8853 \
      --domain=test.com
  4. Exfiltrate 100MB and time it:
    /usr/bin/time -f "elapsed=%e" sh -c \
      "dd if=/dev/zero bs=1M count=100 2>/dev/null | nc 127.0.0.1 7000"
  5. (Optional) Repeat with 10MB (count=10) for comparison.

Observed

Example 100MB run (loopback):

  • end‑to‑end exfil: ~1.65 MiB/s
  • end‑to‑end download: ~46.7 MiB/s

Despite zero loss and minimal RTT, exfil remains much slower than download and does not recover over time. Short transfers often complete before the low‑throughput regime is visible.

Expected

Exfil throughput should stay in the same order of magnitude as download for local direct‑connection tests, and should not collapse purely as transfer size increases.

Suspected Cause (protocol‑level)

Polling in slipstream depends on the QUIC stack producing a packet:

  • The client requests a poll by setting cnx->is_poll_requested and calling picoquic_prepare_packet_ex.
  • In src/slipstream_sockloop.c, if send_length == 0, the poll loop stops.
  • In subprojects/picoquic/picoquic/sender.c, poll insertion only happens when pacing allows sending.

This creates a feedback loop under sustained exfil:

  1. Congestion controller shrinks cwin/pacing_rate.
  2. picoquic_prepare_packet_ex yields send_length == 0.
  3. Polls stop (no DNS queries), so ACKs/data are not elicited.
  4. RTT samples worsen and pacing remains low.

The behavior aligns with docs/protocol.md (poll frames are non‑ACK‑eliciting), but in the DNS tunnel this can suppress the only mechanism that drives ACKs back from the server.

Proposed Fix (primary)

Introduce a rate‑limited poll bypass when there is in‑flight data:

  • Allow emitting a poll frame even if pacing would normally block sends.
  • Rate‑limit the bypass (e.g., one poll per RTT/4 or a small fixed interval).
  • Keep polls non‑ACK‑eliciting to avoid ping‑pong storms.

This preserves congestion safety while preventing the “no poll → no ACK → low cwnd → no poll” deadlock.

Alternative / Additional Mitigations

  • Keep is_poll_requested sticky until a poll is actually emitted (do not drop on send_length == 0).
  • Trigger periodic poll requests while bytes_in_transit > 0, rather than tying them solely to inbound DNS responses.

Files referenced

  • src/slipstream_sockloop.c (poll request + send loop)
  • docs/protocol.md (poll frame behavior)
  • subprojects/picoquic/picoquic/sender.c (poll insertion gated by pacing)
