
test#1

Open
0pcom wants to merge 247 commits into master from develop

Conversation

Owner

0pcom commented Jul 13, 2023

Fixes #

Changes:

How to test this PR:

0pcom added 29 commits February 4, 2026 13:36
GetServers: replace log.Fatal with log.Error + retry loop on
dmsg-discovery errors and fix broken recursive call.
ListenAndServe: replace log.Fatal with returned error on dmsg
listen failure. Update dependencies.
Add GET /dmsg-discovery/servers/clients to return all client entries
grouped by delegated server, and GET /dmsg-discovery/server/{pk}/clients
for a single server. Includes store, API, and client implementations.
- Simplify response format: return client PKs instead of full entry objects
  - /servers/clients: { "server_pk": ["client_pk1", ...], ... }
  - /server/{pk}/clients: ["client_pk1", "client_pk2", ...]
Replace the global math/rand.Shuffle with a local random generator
seeded from crypto/rand. Each dmsg client now tries servers in an
independently randomized order, preventing load imbalance when multiple
clients start simultaneously.
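
A minimal sketch of the seeding approach described above, assuming the server list is a plain slice of strings. The helper names and the `shuffleServers` signature are illustrative and not taken from the dmsg code:

```go
package main

import (
	crand "crypto/rand"
	"encoding/binary"
	"fmt"
	mrand "math/rand"
)

// newSeededRand returns a math/rand generator whose seed comes from
// crypto/rand, so two processes started at the same instant do not
// share a shuffle order. (Hypothetical helper.)
func newSeededRand() (*mrand.Rand, error) {
	var b [8]byte
	if _, err := crand.Read(b[:]); err != nil {
		return nil, err
	}
	seed := int64(binary.LittleEndian.Uint64(b[:]))
	return mrand.New(mrand.NewSource(seed)), nil
}

// shuffleServers shuffles the slice in place using a local generator
// instead of the package-level math/rand state.
func shuffleServers(servers []string) error {
	r, err := newSeededRand()
	if err != nil {
		return err
	}
	r.Shuffle(len(servers), func(i, j int) {
		servers[i], servers[j] = servers[j], servers[i]
	})
	return nil
}

func main() {
	servers := []string{"srv-a", "srv-b", "srv-c", "srv-d"}
	if err := shuffleServers(servers); err != nil {
		fmt.Println("shuffle failed:", err)
		return
	}
	fmt.Println(servers)
}
```

The generator is only used to pick an order, so a non-cryptographic PRNG with an unpredictable seed is enough here.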
* Add HTTP endpoint documentation and examples to service help menus

dmsg-discovery: Added endpoint list and example JSON responses
dmsg-server: Added endpoint documentation

JSON examples are colorized using tidwall/pretty.

* Remove errant ANSI escape codes from flag descriptions

The ANSI reset codes (\033[0m) and newlines (\n\r) in flag descriptions
were causing issues with Cobra help menu rendering, disrupting color
output and formatting. These codes have been removed from all flag
descriptions across the codebase.

* Revert "Remove errant ANSI escape codes from flag descriptions"

This reverts commit 5a8ed71.

* Vendor skywire 48c9b3e7bf79 (help menu improvements)

* Vendor skywire f11c468 with coloredcobra help template fix

Updates skywire to include color functions in the custom help template
for usage=false mode, enabling colored output in dmsg CLI help.

* Add complete response examples for dmsg-discovery endpoints

Includes examples for:
- GET /health
- GET /dmsg-discovery/entry/{pk} (client and server entries)
- POST /dmsg-discovery/entry/ (new and update responses)
- DELETE /dmsg-discovery/entry
- GET /dmsg-discovery/entries
- GET /dmsg-discovery/visorEntries
- GET /dmsg-discovery/available_servers
- GET /dmsg-discovery/all_servers
- GET /dmsg-discovery/servers/clients
- GET /dmsg-discovery/server/{pk}/clients

* Use actual buildinfo in health example with fallback values

* Show actual JSON arrays for list endpoint examples

* Use actual DMSG server entries from embedded deployment config in examples

* Style flags with defaults on new line to match skywire services
…kycoin#340)

When a dmsg server returns a non-public IP address (e.g., from a LAN
dmsg server), the client now logs a warning and continues trying other
servers instead of immediately returning an error. This ensures visors
connected to local dmsg servers can still obtain their public IP for
survey generation.
* Pass TERM environment variable from client to PTY

- Add Env field to CommandReq struct for passing environment variables
  through the RPC protocol
- Update Pty.Start to accept and merge client environment variables
  with host environment (client vars override host vars)
- Capture essential env vars (TERM, COLORTERM, LANG, LC_ALL) from
  CLI client and pass to remote PTY
- Default to TERM=xterm-256color if not set by client
- Set TERM=xterm-256color for UI sessions (web terminal)
- Remove debug log.Print("xxxx") in ui.go
- Update dependencies

This fixes terminal rendering issues caused by missing TERM variable
when connecting to remote PTY sessions.

* Improve dmsgpty-ui with resize support and reconnection

UI improvements:
- Add terminal resize support: client sends resize events to server,
  server updates PTY dimensions via SetPtySize
- Add automatic reconnection with up to 5 retry attempts
- Add visual connection status messages (connecting, connected,
  disconnected, reconnecting)
- Improve terminal styling with VS Code-like dark theme
- Add cursor blinking and better font settings
- Debounce resize events to prevent excessive updates
- Fix duplicate HTML closing tags at end of term.html

Server improvements:
- Add wsReader that intercepts JSON resize messages from WebSocket
- Parse resize messages with type="resize", cols, rows fields
- Forward regular terminal data to PTY unchanged
- Add bounds checking for terminal dimensions
Use binary WebSocket mode instead of text mode for PTY data.

PTY output contains bytes that aren't valid UTF-8 (raw terminal escape
sequences, binary data). Text mode WebSocket frames require valid UTF-8,
causing "Could not decode a text frame as UTF-8" errors and disconnects.

Binary mode handles raw terminal data correctly while the frontend
already supports ArrayBuffer messages.

Changes:
- ui.go: Change websocket.MessageText to websocket.MessageBinary
…nce (skycoin#345)

* Fix accept loop exits that permanently kill stream/connection acceptance

Non-fatal errors in accept loops were causing permanent exit of the
goroutine. The listener/session remains open but never accepts again,
silently stopping all new connections until process restart.

Fixed in: dmsgctrl ServeListener, server session smux and yamux
stream accept loops.

* Fix panics that can crash dmsg client, server, and pty host

- client.go: Fix send-on-closed-channel race in session serve goroutine
  by adding mutex protection and select with default case
- client.go: Fix unsafe type assertion on ctx.Value("dmsgServer") that
  panics if value is not a string
- stream.go: Convert prepareFields from panic to error return so callers
  can handle noise initialization failure gracefully
- host.go: Change whitelist error from Panic to Error log level so
  transient whitelist errors don't crash the pty host
- noise.go: Replace panic in RemoteStatic with error log and empty key
  return so corrupted handshake data doesn't crash the process
- read_writer.go: Replace panic in ReadRawFrame Discard with error
  return so reader errors propagate instead of crashing

* Fix CI lint errors: errcheck on writeIPRequest, naked returns in stream.go

- Add error check for prepareFields in writeIPRequest (errcheck)
- Replace all naked returns with explicit returns in writeRequest,
  writeIPRequest, and readRequest to satisfy nakedret linter

* Fix pre-existing lint errors in dmsgpty

- ui.go: Check bw.Close() return value (errcheck)
- ui_html.go: Remove blank lines after opening brace (gofmt)
- ui_html.go: Cap gzip decompression with LimitReader (gosec G110)

* Fix errcheck lint: use nolint directive for deferred bw.Close()

* Fix gofmt alignment on nolint comment
Use errors.Is(err, dmsg.ErrEntityClosed) to cleanly exit the accept
loop on shutdown instead of logging warnings for every pending accept.
* Fix DialStream to try alternative servers on stream failure

DialStream returned immediately on the first existing session's
stream dial failure without trying other delegated servers. This
caused persistent failures when a dmsg server relay was broken,
even though other servers could relay successfully.

Now both phases (existing sessions and new sessions) continue to
the next server when DialStream fails, matching the fallback
pattern already used by LookupIP.

* Fix FallbackRoundTripper request body consumption on retry

When the first transport fails, RoundTrip consumes the request
body. Subsequent transports receive an empty body, causing POST/PUT
requests to silently fail. Buffer the body upfront and reset it
for each retry attempt.
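
The buffering fix can be sketched roughly as follows. The `fallbackRT` type and its field are hypothetical stand-ins, not the actual FallbackRoundTripper; the point is only that each attempt gets a fresh reader over the same bytes:

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"io"
	"net/http"
)

// roundTripFunc adapts a function to http.RoundTripper.
type roundTripFunc func(*http.Request) (*http.Response, error)

func (f roundTripFunc) RoundTrip(r *http.Request) (*http.Response, error) { return f(r) }

// fallbackRT tries each transport in order (hypothetical type).
type fallbackRT struct{ transports []http.RoundTripper }

func (f fallbackRT) RoundTrip(req *http.Request) (*http.Response, error) {
	// Read the body once, up front, so it can be replayed.
	var body []byte
	if req.Body != nil {
		b, err := io.ReadAll(req.Body)
		_ = req.Body.Close()
		if err != nil {
			return nil, err
		}
		body = b
	}
	lastErr := errors.New("no transports configured")
	for _, rt := range f.transports {
		attempt := req.Clone(req.Context())
		if body != nil {
			attempt.Body = io.NopCloser(bytes.NewReader(body))
		}
		resp, err := rt.RoundTrip(attempt)
		if err == nil {
			return resp, nil
		}
		lastErr = err
	}
	return nil, lastErr
}

// demo sends a POST through a transport that drains the body and fails,
// then through one that echoes whatever body it receives.
func demo() (string, error) {
	failing := roundTripFunc(func(r *http.Request) (*http.Response, error) {
		_, _ = io.Copy(io.Discard, r.Body)
		return nil, errors.New("first transport down")
	})
	echo := roundTripFunc(func(r *http.Request) (*http.Response, error) {
		b, err := io.ReadAll(r.Body)
		if err != nil {
			return nil, err
		}
		return &http.Response{StatusCode: 200, Body: io.NopCloser(bytes.NewReader(b))}, nil
	})
	req, err := http.NewRequest(http.MethodPost, "http://example.invalid/entry", bytes.NewReader([]byte("payload")))
	if err != nil {
		return "", err
	}
	rt := fallbackRT{transports: []http.RoundTripper{failing, echo}}
	resp, err := rt.RoundTrip(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	got, err := io.ReadAll(resp.Body)
	return string(got), err
}

func main() {
	got, err := demo()
	fmt.Printf("second transport saw %q (err=%v)\n", got, err)
}
```

Buffering the whole body trades memory for correctness, which is reasonable for small API requests but would need a size cap for large uploads.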

* Fix multiple bugs in core dmsg server/client and CI lint error

- Fix gosec G104 lint error in FallbackRoundTripper (CI fix)
- Defer wg.Done() in server session goroutine to prevent Close() hang on panic
- Move delEntry inside once.Do so Server.Close() is fully idempotent
- Add missing continue after empty server discovery to avoid falling
  through to connection logic with zero entries
- Reset client backoff to initial value on successful session establishment
- Fix data race: hold sessionsMx when reading c.sessions in
  startUpdateEntryLoop and initilizeClientEntry

* Fix 32 bugs across dmsg codebase

noise:
- Fix DecryptWithNonceMap never recording used nonces (replay attack)
- Fix TCP conn leak in establishConn on post-dial failure
- Fix Listener.Accept leaking conn on handshake failure
- Fix handshake goroutine leak on timeout (set deadline to unblock)
- Panic on DH crypto errors instead of silently returning zero key

disc:
- Deep-copy DelegatedServers slice in Entry.Copy()
- Fix PutEntry corrupting caller's entry sequence on failure
- Fix PutEntry returning wrong error variable on Entry() failure
- Drain response body before close for HTTP connection reuse

dmsgcurl:
- Fix response body leak on maxSize error path
- Fix division by zero in ProgressWriter when Content-Length unknown
- Replace log.Fatal with error return in Download()
- Fix -t 0 (unlimited retries) doing zero iterations

dmsgpty:
- Add mutex to protect global whitelist state from data races
- Fix open() returning stale ErrNotExist after creating config file
- Buffer excess WebSocket data in wsReader instead of discarding
- Remove infinite keep-alive loop from writeWSError
- Store exec.Cmd and call Wait() to prevent zombie processes (Unix)
- Close ConPty handle on Spawn failure (Windows)
- Use defer f.Close() in WriteConfig to prevent fd leak
- Fix discarded strings.ReplaceAll result in conf.go

dmsgctrl:
- Add write mutex to prevent concurrent conn.Write corruption
- Protect c.err with mutex to fix data race in Close()/Err()
- Close leaked Control+connection when ServeListener channel full

dmsg-discovery:
- Fix inverted nil check that always overwrites caller's logger
- Add defer r.Body.Close() in delEntry handler
- Fix net.ParseIP nil dereference on hostname input
- Use errors.Is() for wrapped error matching in handleError

dmsg-server:
- Buffer error channel to prevent deadlock
- Fix deferred listener close racing with running goroutine
- Move mutex lock before map reads in updateAverageNumberOfPacketsPerMinute

* Fix additional bugs found in second pass

cmd/dmsgcurl:
- Fix defer inside loop leaking response bodies on retry
- Fix closeAndCleanFile always seeing nil error (closure capture)
- Fix division by zero in progress writer when Content-Length unknown

cmd/dmsg-discovery:
- Fix recursive getServers discarding return value
- Fix data race on package-level err variable from goroutines

cmd/dmsgweb:
- Replace TrimRight with TrimSuffix for domain suffix stripping
- Preserve signal context instead of replacing with Background()

cmd/dmsgwebsrv:
- Preserve signal context instead of replacing with Background()

pkg/noise:
- Copy ReadRawFrame data before Discard to prevent buffer aliasing
- Remove no-op slice expression

pkg/dmsghttp:
- Fix goroutine leak in ListenAndServe when Serve returns early
- Add nil check for server.Server before accessing ServerType

pkg/direct:
- Use direct map lookup instead of O(n) scan in Entry()

pkg/disc:
- Remove duplicate nolint comment
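
The TrimRight to TrimSuffix change listed under cmd/dmsgweb is easy to get wrong, because `strings.TrimRight` takes a set of characters rather than a suffix. A small illustration, assuming the suffix being stripped is `.dmsg` (the actual suffix is not stated in the commit message):

```go
package main

import (
	"fmt"
	"strings"
)

// stripWithTrimRight removes every trailing character that appears in the
// cutset ".dmsg", which can eat into the host name itself.
func stripWithTrimRight(host string) string {
	return strings.TrimRight(host, ".dmsg")
}

// stripWithTrimSuffix removes the literal suffix ".dmsg" at most once.
func stripWithTrimSuffix(host string) string {
	return strings.TrimSuffix(host, ".dmsg")
}

func main() {
	host := "gems.dmsg"
	fmt.Println(stripWithTrimRight(host))  // prints "ge"
	fmt.Println(stripWithTrimSuffix(host)) // prints "gems"
}
```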
e2e-style tests (pkg/dmsgtest/e2e_test.go):
- TestBidirectionalStreams: bidirectional data transfer at 32B/4KB/64KB
- TestMultiServerStreams: streams across multiple servers and clients
- TestConcurrentStreams: 20 simultaneous streams with data integrity
- TestSessionReconnect: client reconnects after server shutdown
- TestListenerAcceptAll: listener accepts multiple connections
- TestPortOccupied: duplicate listen returns ErrPortOccupied
- TestDialNonexistentClient: dial unknown PK returns ErrDiscEntryNotFound

direct client tests (pkg/direct/client_test.go):
- Entry lookup, post, delete, put operations
- AvailableServers/AllServers filtering
- AllEntries enumeration
- ClientsByServer/AllClientsByServer grouping
- GetClientEntry and GetAllEntries utility functions

ioutil tests (pkg/ioutil/buf_read_test.go):
- BufRead with exact fit, short buffer, empty data, large data

noise nonce tests (pkg/noise/nonce_test.go):
- DecryptWithNonceMap replay prevention
- Out-of-order decryption with nonce map
- Encrypt/decrypt roundtrip
- Large payload (64KB) roundtrip
* Add tests for disc, dmsghttp, dmsgctrl, dmsgcurl, dmsgpty, dmsgserver and update dependency graph

Improve test coverage across core packages:
- pkg/disc: 24% -> 85.6% (client lifecycle, HTTP client, entry validation)
- pkg/dmsghttp: 23.8% -> 65.5% (transport, GetServers, ListenAndServe)
- pkg/dmsgctrl: 49.3% -> 84.9% (ServeListener, ping/pong, concurrency)
- pkg/dmsgcurl: 16.2% -> 44.2% (URL parsing, progress writer, CancellableCopy)
- pkg/dmsgpty: 43.1% -> 47.5% (whitelist, RPC utils, config)
- pkg/dmsgserver: 0% -> 88.2% (config generation, flush)

Also update README to use `go run github.com/loov/goda@latest` and
regenerate the dependency graph SVG.

* Eliminate internal packages to enable external testing

Move all internal packages to pkg/ so they can be imported and tested
by external packages, addressing the testing infrastructure limitation.

Package moves:
- internal/servermetrics -> pkg/dmsg/metrics
- internal/discmetrics -> pkg/disc/metrics
- internal/cli + internal/flags -> pkg/dmsgclient (merged)
- internal/dmsg-discovery/api -> pkg/discovery/api
- internal/dmsg-discovery/store -> pkg/discovery/store
- internal/dmsg-server/api -> pkg/dmsgserver (merged with existing config)
- internal/fsutil -> deleted (inlined os.Stat at single call site)

Only internal/e2e/ remains, containing integration test infrastructure
that is legitimately test-only.

API renames in pkg/dmsgserver: API -> ServerAPI, New -> NewServerAPI

* Refactor: extract cmd boilerplate, convert to go:embed, fix panics, add CloseQuietly

Command boilerplate:
- Add ExecName() and Execute() helpers to pkg/dmsgclient
- Replace duplicated Use: expression and Execute() in all 13 cmd packages

Embedded HTML:
- Convert pkg/dmsgpty/ui_html.go from 5738-line hex literal to //go:embed
  with term.html.gz asset file (same runtime behavior)

Panic fixes (library code only, tests left as-is):
- pkg/dmsg/types.go: SignBytes, MakeSignedStreamRequest/Response now return errors
- pkg/dmsg/util.go: encodeGob now returns error
- pkg/dmsg/const.go: shuffleServers now returns error
- pkg/dmsg/metrics/victoria_metrics.go: invalid delta logs instead of panicking
- pkg/dmsgcurl/dmsgcurl.go: String() returns error string instead of panicking
- pkg/dmsgpty/ui.go: writeHeader returns error instead of panicking
- All callers updated to handle new error returns

Error suppression:
- Add pkg/ioutil.CloseQuietly for deferred Close() calls

Also regenerate dependency graph SVG.

* Split large files and add composable sub-interfaces

File splits (same package, no API changes):
- pkg/dmsg/client.go (723 lines) -> client.go + client_sessions.go + client_dial.go
- cmd/dmsg-discovery/commands/dmsg-discovery.go (533 lines) -> dmsg-discovery.go + examples.go
- pkg/dmsgclient/cli.go (553 lines) -> cli.go + cli_fallback.go

Interface segregation (backwards-compatible, existing interfaces unchanged):
- pkg/disc: Add EntryReader and EntryWriter sub-interfaces; APIClient now embeds them
- pkg/discovery/store: Add EntryStore, ServerLister, EntryEnumerator sub-interfaces;
  Storer now embeds them

All existing implementations continue to satisfy the original interfaces.
The new sub-interfaces allow callers to accept narrower types.

* Fix CI lint errors: gofmt formatting and errcheck in test files

- Run gofmt on cmd files with formatting issues
- Add //nolint:errcheck to test cleanup Close() calls
- Fix indentation in test files

* Fix bugs: resource leaks, race conditions, missing error handling

HIGH:
- cmd/dmsgweb: Fix nil deref crash when url.Parse fails in reverse proxy
- cmd/dmsgweb: Add missing wg.Done() in SOCKS5 goroutine (deadlock on shutdown)
- cmd/dmsgweb: Use defer for wg.Done() in proxyHTTPConn (deadlock if panic)
- pkg/dmsg/client: Make errCh send non-blocking to prevent goroutine hang

MEDIUM:
- pkg/dmsg/server: Server.Close() now returns actual error instead of nil
- pkg/dmsg/server: Close conn when smux/yamux Server init fails (TCP leak)
- pkg/dmsg/client_sessions: Close conn on makeClientSession/mux failure (TCP leak)
- pkg/dmsg/client: Retry initial post on failure instead of giving up
- pkg/dmsg/entity_common: Copy session keys while holding mutex (was empty)
- pkg/dmsg/entity_common: Wrap error context in getServerEntry/getClientEntry
- pkg/dmsg/listener: Close drained streams on listener shutdown (resource leak)
- pkg/dmsgserver: Acquire mutex in SetDmsgServer (data race)
- pkg/dmsgcurl: Use caller's context for dmsgC.Serve (cancellation propagation)
- cmd/dmsgweb: Fix duplicate DmsgDiscURL check (second should be DmsgDiscAddr)
- cmd/dmsgweb: Close both conns after io.Copy to unblock goroutine

LOW:
- pkg/dmsg/client: Fix typo "successed" -> "succeeded", stop ticker leak
- pkg/dmsgpty: Use %w instead of %v for error wrapping

* Fix CI lint: gofmt, errcheck, gosec, misspellings in test files

- Run gofmt on dmsg-server commands and dmsgserver api
- Add //nolint:errcheck,gosec to test helper functions
- Fix "cancelled" -> "canceled" misspelling (3 test files)
- Add //nolint:gosec for G304 (file variable in test) and G114 (test http.Serve)

* Fix remaining CI lint: gofmt, errcheck, gosec annotations

* Fix CI lint: gofmt all files, add errcheck/gosec nolint on conn.Close

* Fix CI: add gosec to remaining nolint annotations

* Update vendor dependencies

- github.com/skycoin/skycoin v0.28.3 -> v0.28.5-alpha1 (90b668188f85)
- github.com/skycoin/skywire v1.3.35 -> v1.3.37
- golang.org/x/crypto v0.48.0 -> v0.49.0
- golang.org/x/net v0.51.0 -> v0.52.0
- golang.org/x/sys v0.41.0 -> v0.42.0
- Various other minor updates (smux, VictoriaMetrics, etc.)

* Update CI to Go 1.26.x and golangci-lint v2.11.4

- Bump go-version from 1.25.x to 1.26.x in CI workflow
- Bump golangci-lint from v2.6.1 to v2.11.4 (built with go1.26)
- Simplify Makefile lint target to single ./... pass

* Suppress new gosec G118/G115 rules from golangci-lint v2.11.4

* Move gosec G118 nolint to go func() line where error is reported

* Update Dockerfiles to Go 1.26 (matches go.mod)
* Add pprof flags to all long-running dmsg services

Add --pprofmode and --pprofaddr flags matching the skywire visor pattern
to dmsg-server, dmsg-discovery, dmsgweb, dmsgpty-host, dmsghttp, and
dmsg-socks5. Extract shared pprof utility to pkg/cmdutil/pprof.go to
eliminate duplication. Supports cpu, mem, mutex, block, trace, and http
profiling modes.

* Update vendor dependencies

go-toml/v2 v2.2.4 -> v2.3.0
skycoin v0.28.5-alpha1 -> v0.28.5

* Fix lint errors in pprof utility

Fix nakedret and gosec G104 violations by using explicit returns and
handling file close errors.

* Vendor skycoin commit f48988877c68

Update github.com/skycoin/skycoin to f48988877c68c8f92773008b6d73ce7f6f357d1e
* Add ephemeral keypair pool for noise handshakes

Pre-generate secp256k1 keypairs in a background goroutine and serve
them from a buffered channel pool. This eliminates the per-handshake
cost of EC key generation under load, allowing burst handling of
concurrent handshakes without blocking on crypto operations.
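
A rough sketch of the pooling idea, not the dmsg implementation. It uses P-256 keys from the standard library's `crypto/ecdh` as a stand-in for secp256k1 (which is not in the standard library), and all type and function names are hypothetical:

```go
package main

import (
	"crypto/ecdh"
	"crypto/rand"
	"fmt"
)

// keyPool hands out pre-generated ephemeral keys from a buffered channel.
type keyPool struct {
	keys chan *ecdh.PrivateKey
	done chan struct{}
}

// newKeyPool starts a background goroutine that keeps the pool topped up.
func newKeyPool(size int) *keyPool {
	p := &keyPool{
		keys: make(chan *ecdh.PrivateKey, size),
		done: make(chan struct{}),
	}
	go p.fill()
	return p
}

// fill blocks on the channel send once the pool is full, so it only
// does work when a key has been taken.
func (p *keyPool) fill() {
	for {
		k, err := ecdh.P256().GenerateKey(rand.Reader)
		if err != nil {
			return // stop refilling; Get falls back to inline generation
		}
		select {
		case p.keys <- k:
		case <-p.done:
			return
		}
	}
}

// Get returns a pooled key if one is ready, otherwise generates inline
// so a drained pool never blocks a handshake.
func (p *keyPool) Get() (*ecdh.PrivateKey, error) {
	select {
	case k := <-p.keys:
		return k, nil
	default:
		return ecdh.P256().GenerateKey(rand.Reader)
	}
}

// Close stops the background goroutine. It must be called at most once.
func (p *keyPool) Close() { close(p.done) }

func main() {
	pool := newKeyPool(8)
	defer pool.Close()
	k, err := pool.Get()
	if err != nil {
		fmt.Println("key generation failed:", err)
		return
	}
	fmt.Println("public key bytes:", len(k.PublicKey().Bytes()))
}
```

Each key is handed out exactly once, which matters because ephemeral keys must never be reused across handshakes.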

* Optimize noise handshake and encrypt/decrypt hot paths

- Eliminate per-encrypt nonce buffer allocation by using a reusable
  [8]byte field in the Noise struct
- Pre-allocate output buffer in EncryptUnsafe to avoid append growth
- Add sync.Pool for write frame buffers to reduce allocation pressure
- Skip redundant NewPubKey/NewSecKey validation in DH() since keys are
  already validated by the noise state machine (ECDH still validates)
- Skip cipher.NewPubKey validation in RemoteStatic() since the key was
  already verified during the handshake
- actions/checkout: v3/v4 → v5
- actions/setup-go: v5 → v6
- docker/login-action: v2 → v4
- golangci/golangci-lint-action: v7 → v9

Node.js 20 actions are deprecated and will be forced to Node.js 24
starting June 2nd, 2026.
* Fix server CPU exhaustion under high stream load

- Enforce maxSessions limit: reject new TCP connections when at capacity
  instead of accepting and logging a debug message
- Add per-session concurrent stream limit (2048) using a semaphore to
  prevent a single session (e.g. setup-node) from spawning unbounded
  goroutines that starve the CPU
- Add backoff delay (50ms) on non-fatal stream accept errors to prevent
  tight CPU spin loops when persistent errors occur
- Streams that exceed the concurrency limit are immediately closed
  rather than queued, providing backpressure to the client
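
The per-session stream limit can be pictured as a non-blocking counting semaphore. This sketch uses a buffered channel and a tiny limit for illustration; the names are hypothetical and the real limit in the commit is 2048:

```go
package main

import "fmt"

// maxStreams is deliberately small here so the demo hits the limit.
const maxStreams = 2

// streamLimiter is a counting semaphore built on a buffered channel.
type streamLimiter struct{ slots chan struct{} }

func newStreamLimiter(n int) *streamLimiter {
	return &streamLimiter{slots: make(chan struct{}, n)}
}

// tryAcquire claims a slot without blocking. A false result means the
// caller should close the stream immediately instead of queueing it.
func (l *streamLimiter) tryAcquire() bool {
	select {
	case l.slots <- struct{}{}:
		return true
	default:
		return false
	}
}

// release frees a slot once the stream handler has finished.
func (l *streamLimiter) release() { <-l.slots }

func main() {
	lim := newStreamLimiter(maxStreams)
	for i := 1; i <= 3; i++ {
		if lim.tryAcquire() {
			fmt.Printf("stream %d accepted\n", i)
		} else {
			fmt.Printf("stream %d closed: limit reached\n", i)
		}
	}
	lim.release()
	fmt.Println("accepted after release:", lim.tryAcquire())
}
```

Rejecting rather than blocking keeps the accept loop responsive, which is the backpressure behaviour the commit describes.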

* Revert maxSessions rejection to original behavior

maxSessions only controls discovery advertisement, not connection
acceptance. Services and visors connect to all servers regardless
of advertised load, so rejecting sessions would break connectivity.

* Add stream read deadline and fix indentation

- Add read deadline (HandshakeTimeout) on initial stream request read
  so slow or malicious clients cannot hold goroutines and semaphore
  slots indefinitely. Deadline is cleared before the long-lived
  bidirectional copy loop.
- Remove stale TODO comment in server accept loop
- Fix indentation from previous revert

* Ensure pprof HTTP server remains responsive under high load

Run the pprof HTTP server on a dedicated OS thread via
runtime.LockOSThread() and bump GOMAXPROCS by 1 to reserve a thread
for it. This ensures the kernel scheduler gives pprof CPU time even
when the Go runtime is saturated with thousands of stream-handling
goroutines, which is exactly when pprof is needed most to diagnose
the problem.
…skycoin#354)

Streams that complete the handshake but never receive data would block
in smux.waitRead indefinitely, holding their ephemeral port forever.
Over time this exhausts all ~16K ports (49152-65535) on the Porter,
causing "ephemeral port space exhausted" errors for new streams.

Fix by adding a 2-minute idle timeout (StreamIdleTimeout) that is:
- Set as a read deadline after the stream handshake completes
- Refreshed on every successful read, so active streams are unaffected
- Applied on both initiating (DialStream) and responding (acceptStream)

Stale streams will time out, the caller gets an error, and the stream
is closed — releasing its ephemeral port back to the pool.
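
One way to express a refreshed idle deadline is a thin `net.Conn` wrapper. This is a sketch under assumptions: the `idleConn` type is hypothetical, the timeout is shortened to 200ms so the demo finishes quickly, and `net.Pipe` stands in for a dmsg stream:

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"os"
	"time"
)

// idleConn pushes the read deadline forward after every successful read,
// so only a connection that stays silent for `timeout` will time out.
type idleConn struct {
	net.Conn
	timeout time.Duration
}

func (c *idleConn) Read(p []byte) (int, error) {
	n, err := c.Conn.Read(p)
	if err == nil {
		if dErr := c.Conn.SetReadDeadline(time.Now().Add(c.timeout)); dErr != nil {
			return n, dErr
		}
	}
	return n, err
}

// demo reads one message from an active peer, then waits on a silent
// peer and reports whether the second read hit the idle deadline.
func demo() (string, bool, error) {
	client, server := net.Pipe()
	defer client.Close()
	defer server.Close()

	c := &idleConn{Conn: server, timeout: 200 * time.Millisecond}
	// Initial deadline, set once the (imaginary) handshake has completed.
	if err := c.SetReadDeadline(time.Now().Add(c.timeout)); err != nil {
		return "", false, err
	}
	go func() { _, _ = client.Write([]byte("hi")) }()

	buf := make([]byte, 8)
	n, err := c.Read(buf)
	if err != nil {
		return "", false, err
	}
	first := string(buf[:n])

	_, err = c.Read(buf) // the peer stays silent from here on
	return first, errors.Is(err, os.ErrDeadlineExceeded), nil
}

func main() {
	first, timedOut, err := demo()
	fmt.Printf("first=%q timedOut=%v err=%v\n", first, timedOut, err)
}
```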
* Fix audit findings: panics, deadlocks, underflow, and error handling

- Replace panic with error return in updateServerEntry for empty addr
- Fix integer underflow in available sessions calculation (clamp to 0)
- Fix deadlock risk: move session callbacks outside sessionsMx lock,
  have callbacks acquire lock themselves to avoid recursive locking
- Fix double-close in SessionCommon.Close: use else-if so only the
  active mux (smux or yamux) is closed, not both
- Fix unbounded backoff growth when maxBO is 0
- Add logging when listener accept buffer is full (was silent drop)
- Log when error channel is full and errors are dropped

* Fix second audit pass: panics, unbounded reads, missing limits

- Replace panic() with error return in hostMux.Handle and ServeConn
  path match — prevents crashes from malformed URL patterns
- Cap PtyGateway.Read allocation to 64KB to prevent memory exhaustion
  from malicious or buggy RPC requests
- Add MaxHeaderBytes (16KB) to dmsghttp server to mitigate slowloris
- Remove stray println() debug output in dmsgpty-cli
- Fix context.Background() replacing parent context in dmsghttp proxy
  setup — signal cancellation was being lost
- Add 50ms backoff on temporary accept errors in dmsgpty host to
  prevent CPU spin on persistent transient errors

* Fix path traversal in dmsgcurl output file handling

When output is a directory, the URL path was joined directly without
sanitization, allowing paths like ../../etc/passwd to escape the
intended output directory. Use filepath.Base to extract only the
filename component.
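
A small sketch of the sanitization step, assuming the output path is built from a directory plus the URL path. The `outputPath` helper and the example URL are made up for illustration:

```go
package main

import (
	"fmt"
	"net/url"
	"path/filepath"
)

// outputPath returns where a download would be written when the output
// is a directory. Only the final element of the URL path is kept, so
// "../" segments cannot climb out of dir.
func outputPath(dir, rawURL string) (string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", err
	}
	name := filepath.Base(u.Path)
	// Base returns ".", a separator, or ".." for paths with no usable name.
	if name == "." || name == ".." || name == string(filepath.Separator) {
		return "", fmt.Errorf("no usable file name in %q", rawURL)
	}
	return filepath.Join(dir, name), nil
}

func main() {
	p, err := outputPath("out", "dmsg://0123abcd:80/../../etc/passwd")
	fmt.Println(p, err)
}
```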

* Vendor skywire commit a5facdc74e72

Update github.com/skycoin/skywire to a5facdc74e72d4a3562e90cf7318e0f235b6d48f
Also updates skycoin, pgx, goldmark, and resolves genproto module conflict.
* Implement server-to-server mesh for cross-server client connectivity

Enable clients connected to different dmsg servers to communicate by
having servers peer with each other. This removes the scaling limitation
where clients must be on the same server to reach each other.

Design:
- Servers peer as clients to each other using existing session mechanism
  (TCP + noise XK handshake + yamux), requiring no new transport code
- Peers configured via static config (no discovery dependency)
- When a server can't find destination client locally, it tries
  forwarding through peer server sessions
- 1-hop maximum: peer servers only check local sessions, no further
  forwarding (prevents loops without TTL)
- Original SignedObject forwarded as-is (client signature preserved)
- Backward compatible: no wire protocol changes, existing clients
  work unchanged

Key changes:
- ServerConfig.Peers: static peer server list (PK + address)
- Server.peerSessions: outbound connections to peer servers
- Server.peerPKs: identifies incoming sessions as peer servers
- SessionCommon.isPeer: relaxes SrcAddr.PK check for forwarded requests
- ServerSession.forwardViaPeer: iterates peers on local lookup failure
- maintainPeerConnection: persistent connection with reconnect backoff

Config example:
  "peers": [{"public_key": "02abc...", "address": "1.2.3.4:8081"}]

* Auto-discover peer servers from discovery

Servers now automatically discover and peer with all other servers
registered in dmsg discovery, in addition to statically configured
peers. A background loop queries AllServers periodically and
establishes peer connections to any new servers found.

Static config peers take priority and are always connected. Discovery-
based peers are additive — they're discovered and connected without
requiring any config changes.

This means in the current deployment, all dmsg servers will
automatically mesh with each other as long as they share the same
dmsg discovery.

* Add mesh fallback in DialStream and cross-server e2e test

DialStream now falls back to trying all existing sessions when the
target's delegated servers are unreachable. If the client's server is
meshed with the target's server, the request is forwarded through the
peer connection transparently.

The e2e test verifies: two servers peered via static config, each with
one isolated client (separate filtered discovery), cross-server dial
succeeds with bidirectional 1KB data transfer through the mesh.

* Prefer existing sessions over new connections in DialStream

Reorder DialStream to try mesh forwarding through existing sessions
before attempting to establish new server connections. The new order:

1. Existing sessions matching target's delegated servers (direct, free)
2. All other existing sessions via mesh (free, already connected)
3. New sessions to delegated servers (expensive, last resort)

This avoids unnecessary TCP+noise+yamux handshakes when the client
is already connected to meshed servers that can forward the request.

* Fix session handshake timeout and DefaultMaxSessions inconsistency

- Replace hardcoded 5s timeout in initClient/initServer with the
  HandshakeTimeout constant (20s). The 5s was too aggressive and
  inconsistent with the exported constant used elsewhere.
- Change DefaultMaxSessions from 100 to 2048 to match the actual
  production default in dmsgserver config.
- Use dmsg.DefaultMaxSessions in dmsgserver GenerateDefaultConfig
  instead of a hardcoded 2048, ensuring a single source of truth.

* Update vendor dependencies

bytedance/sonic/loader v0.5.0 -> v0.5.1
gin-contrib/sse v1.1.0 -> v1.1.1

* Add useful Makefile targets from skywire

Add targets ported from skywire's Makefile:
- update-dep: go get -u, tidy, vendor, auto-commit
- update-skywire: update skywire dep to latest develop
- update-skycoin: update skycoin dep to latest develop
- push-deps: commit and push vendor changes
- sync-upstream-develop: sync fork's develop with upstream
- tidy: standalone go mod tidy
- format now depends on tidy (like skywire)
- dep now depends on tidy

* Fix TODO audit: whitelist, waitgroup, kill workaround, stale comments

- Implement SOCKS5 whitelist enforcement: connections from PKs not in
  the --wl list are now rejected (was a no-op despite accepting the flag)
- Add waitgroup to Client for clean goroutine shutdown on Close()
- Remove kill.go force-exit workaround: all commands now use
  cmdutil.SignalContext for proper signal handling
- Document why timestamp tracking passes 0: concurrent streams from the
  same client can arrive out of order, and noise nonce tracking already
  prevents replay at the session level
- Remove resolved TODO on pty_client.go error choice

* Trigger CI re-run

* Improve test reliability for CI flaky tests

- TestControl_Ping: use require.NoError for fail-fast, close controls
  in correct order (responder first) to avoid EOF race on pipe cleanup
- TestHTTPTransport_RoundTrip: use graceful srv.Shutdown() instead of
  raw lis.Close() to let in-flight HTTP requests finish before closing,
  preventing race between handler goroutines and listener teardown

* Fix data race on peerPKs map access

peerPKs was read in isPeerPK (from handleSession goroutines) and
written in discoverAndConnectPeers without synchronization. Protect
both accesses with peerSessionsMx.
The NonceMap (map[uint64]struct{}) grew forever on long-lived sessions,
accumulating one entry per decrypted message. For the setup-node
handling thousands of streams, this leaked megabytes of memory over time.

Replace with NonceWindow: a sliding window using a 1024-bit bitmap
(128 bytes) that tracks the highest nonce seen and the last 1024 nonces
for out-of-order replay detection. Memory usage is constant regardless
of session lifetime.

Since the transport is reliable (TCP via yamux/smux), nonces arrive
mostly in order, so a 1024-entry window is more than sufficient.
Nonces older than the window are rejected as replays.

The old NonceMap and DecryptWithNonceMap are kept but deprecated for
backward compatibility.
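
A minimal sketch of a sliding replay window of this shape. It is not the NonceWindow from pkg/noise: the type, its fields, and the ring-buffer layout are assumptions chosen to match the description (1024 bits, constant memory, old nonces rejected):

```go
package main

import "fmt"

const windowSize = 1024 // nonces tracked at or below the highest one seen

// nonceWindow remembers the highest nonce and a 1024-bit ring bitmap
// (16 x uint64 = 128 bytes) of recently seen nonces.
type nonceWindow struct {
	started bool
	highest uint64
	bitmap  [windowSize / 64]uint64
}

// bit maps a nonce to its word index and bit mask in the ring.
func (w *nonceWindow) bit(n uint64) (int, uint64) {
	slot := n % windowSize
	return int(slot / 64), uint64(1) << (slot % 64)
}

// accept reports whether n is fresh, recording it if so.
func (w *nonceWindow) accept(n uint64) bool {
	word, mask := w.bit(n)
	switch {
	case !w.started:
		w.started = true
		w.highest = n
	case n > w.highest:
		if diff := n - w.highest; diff >= windowSize {
			w.bitmap = [windowSize / 64]uint64{} // whole window is stale
		} else {
			// Clear the ring slots that the window slides over.
			for i := uint64(1); i <= diff; i++ {
				cw, cm := w.bit(w.highest + i)
				w.bitmap[cw] &^= cm
			}
		}
		w.highest = n
	case w.highest-n >= windowSize:
		return false // older than the window: treated as a replay
	case w.bitmap[word]&mask != 0:
		return false // already seen
	}
	w.bitmap[word] |= mask
	return true
}

func main() {
	var w nonceWindow
	for _, n := range []uint64{1, 1, 3, 2, 2, 5000, 3, 4999} {
		fmt.Printf("nonce %d accepted: %v\n", n, w.accept(n))
	}
}
```

The trade-off is the one the commit message names: a legitimate message delayed by more than 1024 nonces is rejected, which is acceptable when the transport delivers mostly in order.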
* Update README with badges, mesh docs, and dependency graph

- Replace dead Travis CI badge with GitHub Actions badges (test,
  deploy, release), Go Report Card, OpenSSF Scorecard, go.mod version,
  and Arch Linux package badges
- Document server-to-server mesh architecture and configuration
- Add descriptions for dmsgweb, dmsghttp, and dmsg-socks5 tools
- Expand architecture section with key concepts (sessions, streams,
  mesh)
- Regenerate dependency graph with goda

* Update README and disable failing deploy workflow

- Remove deploy badge (always fails), add GoDoc badge
- Replace "mesh" terminology with "relay" and "server-to-server
  connections" for accuracy — dmsg is an anonymous relay system
- Hide deploy.yml workflow by renaming to .deploy.yml (keeps the
  file but GitHub Actions won't run it)
- Document the dial order for cross-server relay
- Clarify that relay servers cannot read stream contents

* Remove GoDoc badge (no license file in repo)

* Skip CI tests when only docs/non-code files change

Add paths-ignore to test workflow so PRs that only modify markdown,
docs, LICENSE, .gitignore, or CHANGELOG don't trigger the full test
suite across all three platforms.
smux (unlike yamux) has no built-in ping. Implement it using a
lightweight stream-level ping protocol:

Client side (SessionCommon.Ping):
- Opens a temporary smux stream
- Writes a 2-byte zero marker [0x00, 0x00] (ping)
- Reads 2-byte echo, measures RTT
- Closes stream (5s deadline)

Server side (serveStream):
- Reads first 2 bytes of each new stream
- If [0x00, 0x00]: echoes the marker back and closes (ping response)
- Otherwise: passes the bytes through to readRequest via MultiReader

The [0x00, 0x00] marker is safe because it represents a zero-length
object, which cannot occur in normal session traffic (valid
SignedObjects always have length > 0).

Yamux sessions continue to use the built-in yamux.Ping().
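
The server-side peek can be sketched as follows. This is an illustration under assumptions, not the PR's code: `classifyStream` and `describe` are hypothetical names, and the stream is modelled as a plain `io.Reader`.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// pingMarker is the 2-byte zero marker described in the commit message.
var pingMarker = []byte{0x00, 0x00}

// classifyStream reads the first two bytes of a new stream. For a ping it
// returns isPing == true. Otherwise it returns a reader that replays the two
// consumed bytes ahead of the remaining stream, so the request parser sees
// the stream from its very first byte.
func classifyStream(r io.Reader) (isPing bool, rest io.Reader, err error) {
	head := make([]byte, 2)
	if _, err := io.ReadFull(r, head); err != nil {
		return false, nil, err
	}
	if bytes.Equal(head, pingMarker) {
		return true, nil, nil
	}
	return false, io.MultiReader(bytes.NewReader(head), r), nil
}

// describe runs classifyStream over in-memory data for demonstration.
func describe(data []byte) string {
	isPing, rest, err := classifyStream(bytes.NewReader(data))
	if err != nil {
		return "error: " + err.Error()
	}
	if isPing {
		return "ping"
	}
	body, _ := io.ReadAll(rest)
	return fmt.Sprintf("request % x", body)
}

func main() {
	fmt.Println(describe([]byte{0x00, 0x00}))
	fmt.Println(describe([]byte{0x00, 0x05, 'h', 'e', 'l', 'l', 'o'}))
}
```

In a real handler the ping branch would also write the marker back and close the stream, as the commit message describes.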
If Close() runs before Serve() calls wg.Add(1), the WaitGroup counter
is 0, Wait() returns immediately, and then Serve() calls Add(1) on a
completed WaitGroup — a data race. Check the done channel before
wg.Add(1) so Serve() returns ErrClosed if the server is already shut
down.
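
A minimal sketch of the check-before-Add pattern, with a hypothetical `server` type. One assumption goes beyond the commit message: the sketch holds a mutex around the check and `wg.Add(1)` so that `Close()` cannot slip in between them. Whether the actual fix does this is not stated here.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrClosed mirrors the error named in the commit message.
var ErrClosed = errors.New("server closed")

// server is a hypothetical stand-in for the real server type.
type server struct {
	mu   sync.Mutex // serializes the done check with wg.Add (assumption)
	done chan struct{}
	once sync.Once
	wg   sync.WaitGroup
}

func newServer() *server { return &server{done: make(chan struct{})} }

// Serve registers itself with the WaitGroup only if the server is still open.
func (s *server) Serve(work func()) error {
	s.mu.Lock()
	select {
	case <-s.done:
		s.mu.Unlock()
		return ErrClosed
	default:
	}
	s.wg.Add(1)
	s.mu.Unlock()
	defer s.wg.Done()

	work()
	return nil
}

// Close marks the server as shut down and waits for running Serve calls.
func (s *server) Close() {
	s.mu.Lock()
	s.once.Do(func() { close(s.done) })
	s.mu.Unlock()
	s.wg.Wait()
}

func main() {
	s := newServer()
	fmt.Println("serve before close:", s.Serve(func() {}))
	s.Close()
	fmt.Println("serve after close:", s.Serve(func() {}))
}
```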
skycoin#361)

* Optimize DialStream with route caching, latency sorting, and entry caching

- Add route cache: remember which server successfully reached a destination,
  try it first on subsequent dials, evict on failure
- Sort sessions by measured ping latency so lowest-latency server is tried
  first instead of random map iteration order
- Cache discovery entry lookups with 30s TTL to avoid re-querying HTTP
  discovery on every request
- Background ping loop measures all session RTTs every 30s
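
The dial ordering described above could look roughly like this. It is a sketch only: `session`, `dialOrder`, and the string server identifiers are hypothetical, and the discovery entry cache and background ping loop are left out.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// session is a hypothetical view of one server session and its measured RTT.
type session struct {
	Server string
	RTT    time.Duration
}

// dialOrder returns the servers in the order they should be tried: the
// cached route (if any) first, then the rest by ascending RTT.
func dialOrder(sessions []session, cached string) []string {
	sorted := append([]session(nil), sessions...) // do not reorder the caller's slice
	sort.SliceStable(sorted, func(i, j int) bool {
		ci, cj := sorted[i].Server == cached, sorted[j].Server == cached
		if ci != cj {
			return ci // the cached server sorts ahead of all others
		}
		return sorted[i].RTT < sorted[j].RTT
	})
	out := make([]string, len(sorted))
	for i, s := range sorted {
		out[i] = s.Server
	}
	return out
}

func main() {
	sessions := []session{
		{Server: "a", RTT: 30 * time.Millisecond},
		{Server: "b", RTT: 10 * time.Millisecond},
		{Server: "c", RTT: 20 * time.Millisecond},
	}
	fmt.Println(dialOrder(sessions, ""))  // no cached route
	fmt.Println(dialOrder(sessions, "a")) // "a" reached the destination last time
}
```

On a failed dial through the cached server, the caller would evict the cache entry and fall through to the latency-ordered remainder.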

* Fix dmsgweb proxy: propagate request context and fix error handling

- Use http.NewRequestWithContext to propagate browser request context
  to dmsg dial, so cancellations stop the stream dial immediately
  instead of waiting for the full 20s HandshakeTimeout
- Remove impossible c.String(500) after c.Status() was already written,
  which caused "Headers were already written" warnings in gin

* Refactor HTTPTransport to use http.Transport with dmsg DialContext

Replace manual stream-per-request dial/write/read pattern with Go's
http.Transport using a custom DialContext. Keep-alives are disabled
because dmsg streams use noise-encrypted per-stream handshakes that
make connection reuse unreliable (server ReadTimeout can expire between
requests, and POST requests cannot be retried on stale connections).

Benefits:
- Proper request context propagation through the transport
- Standard error handling and timeout support
- Removes manual wrappedBody response draining hack
- Normalizes dmsg:// URLs to http:// for Go's transport
- Cleans up idle connections on context cancellation

* Fix TCP proxy race, gin server leak, ReverseProxy Director, and HTTP timeouts

- Fix TCP proxy io.Copy race: close both connections after first copy
  returns to unblock the second, preventing goroutine leak
- Replace dlog.Fatal with error return on port overflow (was killing process)
- Replace gin r.Run() with http.Server and graceful Shutdown on context
  cancel, preventing goroutine leak on shutdown
- Pass context to proxyTCPConn/proxyHTTPConn for proper cancellation
- Fix silent ReverseProxy Director failure: parse URL before creating
  proxy, return 500 on parse error instead of forwarding to wrong URL
- Add 30s timeout to HTTP clients in dmsghttp/util.go to prevent hanging
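
The io.Copy fix in the first bullet follows a common pattern, sketched here with `net.Pipe` standing in for the real connections. `proxy` and `demo` are hypothetical names; the PR's own functions are `proxyTCPConn`/`proxyHTTPConn`, whose bodies are not shown here.

```go
package main

import (
	"fmt"
	"io"
	"net"
)

// proxy copies in both directions and closes both connections as soon as
// either direction finishes, which unblocks the other io.Copy.
func proxy(a, b net.Conn) {
	done := make(chan struct{}, 2) // buffered so neither goroutine blocks on send
	cp := func(dst, src net.Conn) {
		_, _ = io.Copy(dst, src)
		done <- struct{}{}
	}
	go cp(a, b)
	go cp(b, a)

	<-done // first direction finished
	_ = a.Close()
	_ = b.Close()
	<-done // second direction is now unblocked and finishes too
}

// demo sends msg through the proxy, hangs up on the client side, and waits
// for proxy to return, showing that no copy goroutine is left behind.
func demo(msg string) (string, error) {
	client, proxyIn := net.Pipe()
	proxyOut, upstream := net.Pipe()

	finished := make(chan struct{})
	go func() {
		proxy(proxyIn, proxyOut)
		close(finished)
	}()

	go func() { _, _ = client.Write([]byte(msg)) }()
	buf := make([]byte, len(msg))
	if _, err := io.ReadFull(upstream, buf); err != nil {
		return "", err
	}

	_ = client.Close()
	<-finished
	_ = upstream.Close()
	return string(buf), nil
}

func main() {
	got, err := demo("hello")
	fmt.Println(got, err)
}
```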

* Harden dmsgweb: connection limits, body limits, close error logging

- Add connection semaphore (max 256) to server-side TCP proxy to prevent
  unbounded goroutine growth from many simultaneous connections
- Fix server-side TCP proxy io.Copy race: close both connections after
  first copy returns, wait for goroutine with done channel
- Add 10MB request body limit via http.MaxBytesReader in HTTP proxy
- Log close errors at debug level instead of silently ignoring them
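
A connection semaphore is often built from a buffered channel. The sketch below assumes a non-blocking acquire that rejects connections over the limit; the commit message does not say whether the real code rejects or waits, and `limiter` is a hypothetical name.

```go
package main

import "fmt"

const maxConns = 256 // limit quoted in the commit message

// limiter is a counting semaphore backed by a buffered channel.
type limiter chan struct{}

func newLimiter(n int) limiter { return make(limiter, n) }

// tryAcquire takes a slot if one is free and reports whether it did.
func (l limiter) tryAcquire() bool {
	select {
	case l <- struct{}{}:
		return true
	default:
		return false
	}
}

// release frees a slot taken by a successful tryAcquire.
func (l limiter) release() { <-l }

func main() {
	sem := newLimiter(maxConns)
	accepted := 0
	for i := 0; i < 300; i++ {
		if sem.tryAcquire() {
			accepted++ // a real proxy would start its goroutine here and
			// call sem.release() when the connection ends
		}
	}
	fmt.Println("accepted:", accepted)
}
```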

* Fix CI lint errors: gosec, misspell, and unhandled errors

- Fix G104 (gosec): handle Close() errors with debug logging instead
  of ignoring them in TCP proxy
- Fix G112 (gosec): add ReadHeaderTimeout to HTTP server to prevent
  Slowloris attacks
- Fix G118 (gosec): use parent context for DialStream instead of
  context.Background(); add nolint for intentional Background in
  graceful shutdown
- Fix misspell: cancelled -> canceled in comment

* Revert HTTPTransport to direct stream approach for CI compatibility

The http.Transport wrapper with DisableKeepAlives caused timeouts on
Windows CI and hangs on Linux CI due to Go's transport adding overhead
(Connection: close headers, persistConn goroutines) that interacts
poorly with noise-encrypted streams under concurrent load.

Revert to the proven direct approach: dial stream, write request, read
response, wrap body to close stream. Keep the dmsg:// URL normalization.

* Remove unnecessary nolint:govet directive
* Fix shutdown hang: add timeout to discovery entry deletion

Client.Close() called delEntry(context.Background()) which makes HTTP
requests to the discovery server with no timeout. When discovery is
accessed over dmsg (the transport being closed), the Entry() lookup
falls back to HTTP-over-dmsg which hangs forever since the dmsg
client is already shut down.

Add a 5-second timeout context so Close() always completes.
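
The shape of the fix, as a sketch: `closeWithTimeout`, `hangingDelete`, and `quickDelete` are hypothetical, and the delete function stands in for the discovery call that `Client.Close()` makes.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// delEntryTimeout is the bound quoted in the commit message.
const delEntryTimeout = 5 * time.Second

// closeWithTimeout runs del with a bounded context instead of
// context.Background(), so a hung discovery request cannot block shutdown.
func closeWithTimeout(del func(ctx context.Context) error, timeout time.Duration) error {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	return del(ctx)
}

// hangingDelete simulates a request that never completes on its own,
// like HTTP-over-dmsg after the dmsg client has shut down.
func hangingDelete(ctx context.Context) error {
	<-ctx.Done()
	return ctx.Err()
}

// quickDelete simulates a request that completes immediately.
func quickDelete(ctx context.Context) error { return nil }

func main() {
	fmt.Println("quick:", closeWithTimeout(quickDelete, delEntryTimeout))
	// A short timeout keeps the demo fast; the PR uses 5 seconds.
	fmt.Println("hanging:", closeWithTimeout(hangingDelete, 50*time.Millisecond))
}
```

This only helps if the delete path honors its context, which the hanging case above relies on.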

* Add hidden --with-kill flag for force-exit safety net

Add a hidden persistent flag --with-kill that enables the force-exit
goroutine (3x Ctrl+C = os.Exit). Available on all subcommands as a
safety net when graceful shutdown hangs.

Usage: skywire dmsg web --with-kill