DoH3 upstream gets stuck with http3 timeouts / "transport is closed" until ctrld is restarted (dev-882c59d on UniFi OS 10.0.162) #271

@KittyFarts

Summary

After running for a while, ctrld CLI (dev-882c59d) on my UniFi UCG-Fiber stops resolving all queries.
Internet connectivity on the router is fine (routing works, ping to external IPs works), but every DNS query through ctrld fails.

The only way to recover is to either:

  • restart the ctrld CLI process, or
  • reboot the UCG-Fiber

Once restarted, resolution immediately works again using the same config and upstream.


Environment

  • Device: Ubiquiti UCG-Fiber (Cloud Gateway Fiber)
  • OS: UniFi OS 10.0.162 (EA release)
  • ctrld CLI version: dev-882c59d
  • Platform: Linux arm64 (UniFi OS)
  • Config (relevant parts):

[service]
cache_enable = true
cache_size = 300000
cache_ttl_override = 0
cache_serve_stale = true
max_concurrent_requests = 4096
metrics_query_stats = true
leak_on_upstream_failure = false
log_level = "info"
log_path = "/tmp/ctrld.log"

[listener]
  [listener.0]
  ip = '0.0.0.0'
  port = 5354

[network]
  [network.0]
  name = 'Network 0'
  cidrs = ['0.0.0.0/0', '::/0']

[upstream]
  [upstream.0]
  type = 'doh3'
  endpoint = 'https://dns.controld.com/RESOLVERID'

  • What happens:

After some time (in this case around 2025-12-07T17:23), ctrld starts logging a burst of http3 errors, then marks upstream.0 as down and never recovers on its own.

Example snippet (resolver ID was removed):

{"level":"warn","time":"2025-12-07T17:23:16-08:00.501","message":"using direct IP for \"https://dns.controld.com/RESOLVERID\": 76.76.2.22"}
{"level":"warn","error":"Get \"https://dns.controld.com/RESOLVERID?...\": http3: parsing frame failed: timeout: no recent network activity","time":"2025-12-07T17:23:16-08:00.501","message":"[a11b40] retrying request after fallback to direct ip"}
{"level":"error","error":"could not perform request: Get \"https://dns.controld.com/RESOLVERID?...\": http3: parsing frame failed: timeout: no recent network activity","time":"2025-12-07T17:23:16-08:00.506","message":"[27f140] failed to resolve query"}

{"level":"error","error":"could not perform request: Get \"https://76.76.2.22/RESOLVERID?...\": http3: transport is closed","time":"2025-12-07T17:23:23-08:00.467","message":"[fffc79] failed to resolve query"}
{"level":"warn","time":"2025-12-07T17:23:26-08:00.507","message":"upstream \"upstream.0\" marked as down after 10 seconds (failure count: 5)"}

... then many repeats of:

{"level":"error","error":"could not perform request: Get \"https://76.76.2.22/RESOLVERID?...\": http3: transport is closed","time":"2025-12-07T17:24:13-08:00.722","message":"[...] failed to resolve query"}
{"level":"warn","time":"2025-12-07T17:24:13-08:00.722","message":"upstream \"upstream.0\" marked as down immediately (failure count: 5x)"}
{"level":"error","time":"2025-12-07T17:24:13-08:00.722","message":"[...] all [upstream.0] endpoints failed"}

During this time, clients keep sending queries, and every one of them fails with one of the same two errors: "http3: transport is closed" or "http3: parsing frame failed: timeout: no recent network activity". No successful replies are logged until I restart ctrld.
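To pin down when the stuck state begins and how many queries fail during it, the JSON log can be scanned for the two error strings shown above. The following is a rough sketch rather than part of ctrld: the function name `summarize` is hypothetical, and it assumes every log entry is a single JSON object per line with the `error`, `message`, and `time` fields seen in the excerpt.

```python
import json
from collections import Counter

# Error substrings copied from the log excerpt in this report.
STUCK_MARKERS = (
    "http3: transport is closed",
    "timeout: no recent network activity",
)


def summarize(lines):
    """Count stuck-transport errors and find the first 'marked as down' time.

    `lines` is any iterable of log lines (for example an open file).
    Returns (Counter keyed by marker, timestamp string or None).
    """
    counts = Counter()
    first_down = None
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        try:
            entry = json.loads(raw)
        except json.JSONDecodeError:
            continue  # ignore anything that is not a JSON log line
        if not isinstance(entry, dict):
            continue
        err = entry.get("error", "")
        for marker in STUCK_MARKERS:
            if marker in err:
                counts[marker] += 1
        # Record only the earliest "marked as down" message.
        if first_down is None and "marked as down" in entry.get("message", ""):
            first_down = entry.get("time")
    return counts, first_down
```

With the `log_path` from the config above, this could be called as `summarize(open("/tmp/ctrld.log"))`. The timestamp is returned as the raw string from the log, since its format (offset followed by milliseconds) is not standard ISO 8601.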

After restarting the ctrld service, resolution works again immediately against the same DoH3 endpoint.

  • What I’ve checked:

Router still has internet: ping 1.1.1.1 and ping 76.76.2.22 from the UCG-Fiber succeed while ctrld is in this broken state.

No firewall rules block traffic between the UCG-Fiber and 76.76.2.22.

This has happened multiple times on dev-882c59d with the same pattern and is always fixed by restarting ctrld or rebooting the gateway.
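For anyone trying to reproduce this, the checks above can be complemented by probing the ctrld listener directly, which separates "router has connectivity" from "ctrld answers queries". Below is a minimal sketch using only the Python standard library. It assumes the listener from the config (port 5354) is reachable over UDP on 127.0.0.1; the helper names `build_query` and `probe` are hypothetical. How ctrld responds while stuck (an error RCODE versus no reply at all) is not stated in this report, so the sketch reports both cases.

```python
import socket
import struct


def build_query(name, qid=0x1234, qtype=1):
    """Build a minimal DNS query: one question, class IN, recursion desired."""
    # Header: ID, flags (RD bit set), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0.
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    # QNAME: length-prefixed labels terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE (1 = A by default) and QCLASS (1 = IN).
    return header + qname + struct.pack("!HH", qtype, 1)


def probe(server="127.0.0.1", port=5354, name="example.com", timeout=3.0):
    """Send one query and return the reply's RCODE.

    Returns None on timeout, socket error, or a reply whose ID does not match.
    """
    query = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (server, port))
            reply, _ = sock.recvfrom(4096)
        except OSError:  # includes socket timeouts
            return None
    if len(reply) < 12 or reply[:2] != query[:2]:
        return None
    return reply[3] & 0x0F  # low 4 bits of the second flags byte
```

An RCODE of 0 means the listener answered normally, 2 is SERVFAIL, and `None` means no usable reply arrived within the timeout. Running `probe()` periodically while the router can still ping 76.76.2.22 would show whether the failure is confined to ctrld's upstream connection.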
