
feat(cli): add multi-instance mode via subprocess spawning#11

Open
paradixe wants to merge 13 commits into ssmirr:master from paradixe:feat/multi-instance-subprocess

Conversation

@paradixe

Spawns multiple conduit processes for higher client capacity.

Usage:

conduit start --multi-instance -m 500        # auto: 5 instances (100 clients each)
conduit start --instances 3 -m 300           # explicit: 3 instances

stats.json output:

{
  "liveInstances": 3,
  "totalInstances": 3,
  "connectedClients": 42,
  "instances": [
    {"id": "instance-0", "connected": 15},
    {"id": "instance-1", "connected": 14},
    {"id": "instance-2", "connected": 13}
  ]
}

How it works:

  • Parent spawns N child processes
  • Each child gets own data dir (~/.conduit/instance-N/)
  • Parent aggregates stats, writes combined stats.json
  • Clients/bandwidth split evenly across instances
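A minimal sketch of that flow, assuming a hypothetical --data-dir flag and standard library process spawning (this is an illustration, not the PR's actual multi.go):

```go
package main

import (
    "context"
    "fmt"
    "os"
    "os/exec"
    "path/filepath"
    "sync"
)

func spawnInstances(ctx context.Context, totalClients, instances int) {
    if instances < 1 {
        instances = 1
    }
    clientsPer := max(totalClients/instances, 1) // at least 1 client per instance (Go 1.21+ max)
    home, _ := os.UserHomeDir()

    var wg sync.WaitGroup
    for i := 0; i < instances; i++ {
        dataDir := filepath.Join(home, ".conduit", fmt.Sprintf("instance-%d", i))

        wg.Add(1)
        go func(idx int, dir string) {
            defer wg.Done()
            if err := os.MkdirAll(dir, 0o700); err != nil {
                fmt.Fprintf(os.Stderr, "instance-%d: %v\n", idx, err)
                return
            }
            // Each child is a separate conduit process, so psiphon's global
            // notice writer is per-process and stats are not N-counted.
            // --data-dir is an assumed flag name for this sketch.
            cmd := exec.CommandContext(ctx, "conduit", "start",
                "-m", fmt.Sprint(clientsPer), "--data-dir", dir)
            cmd.Stdout = os.Stdout
            cmd.Stderr = os.Stderr
            if err := cmd.Run(); err != nil && ctx.Err() == nil {
                fmt.Fprintf(os.Stderr, "instance-%d exited: %v\n", idx, err)
            }
        }(i, dataDir)
    }
    wg.Wait()
}

func main() {
    ctx, cancel := context.WithCancel(context.Background())
    defer cancel()
    spawnInstances(ctx, 500, 5) // e.g. -m 500 → 5 instances of 100 clients each
}
```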

Why subprocesses:
psiphon.SetNoticeWriter() is a global singleton, so running multiple instances in-process causes stats to be counted N times.

Solves the psiphon.SetNoticeWriter() global singleton issue by spawning
separate conduit processes instead of running instances in-process.

Changes:
- Add --multi-instance flag to spawn N child processes (1 per 100 clients)
- Add --instances flag for explicit instance count control
- New multi.go: subprocess spawner with stats aggregation
- Export FormatBytes/FormatDuration for use by multi-instance aggregator
- Parent captures child STATS lines and prints AGGREGATE every 10s
- Stats JSON includes per-instance breakdown
- Final stats write on graceful shutdown

Each child process has its own data directory with separate keys,
avoiding the global SetNoticeWriter conflict that caused N-counted stats.

Usage:
  conduit start --multi-instance -m 200  # spawns 2 instances
  conduit start --instances=3 -m 300     # explicit 3 instances
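For illustration, a hypothetical sketch of the STATS capture and aggregation described above: the parent scans each child's stdout for lines prefixed with "STATS ", keeps the latest per-instance counts, and periodically writes the combined stats.json. The line format, field handling, and package layout here are assumptions, not the PR's actual code.

```go
package multi // hypothetical; the PR's multi.go may be organized differently

import (
    "bufio"
    "context"
    "encoding/json"
    "io"
    "os"
    "strings"
    "sync"
    "time"
)

type instanceStats struct {
    ID        string `json:"id"`
    Connected int    `json:"connected"`
}

type aggregator struct {
    mu    sync.Mutex
    stats map[string]instanceStats
}

// captureStats reads one child's stdout and records lines prefixed with "STATS ".
func (a *aggregator) captureStats(id string, stdout io.Reader) {
    sc := bufio.NewScanner(stdout)
    for sc.Scan() {
        line := sc.Text()
        if !strings.HasPrefix(line, "STATS ") {
            continue
        }
        var s instanceStats
        if err := json.Unmarshal([]byte(strings.TrimPrefix(line, "STATS ")), &s); err == nil {
            a.mu.Lock()
            s.ID = id
            a.stats[id] = s
            a.mu.Unlock()
        }
    }
}

// writeCombined sums per-instance counts into the combined stats.json layout.
func (a *aggregator) writeCombined(path string) error {
    a.mu.Lock()
    defer a.mu.Unlock()
    total := 0
    list := make([]instanceStats, 0, len(a.stats))
    for _, s := range a.stats {
        total += s.Connected
        list = append(list, s)
    }
    out := map[string]any{
        "liveInstances":    len(a.stats), // real code would track running vs. configured separately
        "totalInstances":   len(a.stats),
        "connectedClients": total,
        "instances":        list,
    }
    b, err := json.MarshalIndent(out, "", "  ")
    if err != nil {
        return err
    }
    return os.WriteFile(path, b, 0o644)
}

// run rewrites stats.json every 10 seconds and once more on shutdown.
func (a *aggregator) run(ctx context.Context, path string) {
    t := time.NewTicker(10 * time.Second)
    defer t.Stop()
    for {
        select {
        case <-ctx.Done():
            a.writeCombined(path) // final stats write on graceful shutdown
            return
        case <-t.C:
            a.writeCombined(path)
        }
    }
}
```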
@amirhnajafiz

@paradixe Why not just run separate instances as independent processes (or multiple docker containers)? In the proposed solution, if one of the sub-processes goes down (due to a network spike or any issue that causes it to reset), then all instances go down with it. Maybe a better retention policy would help, so clients aren’t lost when one instance fails?

@paradixe
Author

> @paradixe Why not just run separate instances as independent processes (or multiple docker containers)? In the proposed solution, if one of the sub-processes goes down (due to a network spike or any issue that causes it to reset), then all instances go down with it. Maybe a better retention policy would help, so clients aren’t lost when one instance fails?

@amirhnajafiz

  • You can run separate containers/systemd units - that works fine. This just adds convenience (single command, unified stats).

  • Re failure isolation: children are independent - if one crashes, others keep running. Only risk is parent dying (then all children die). Could add auto-restart for failed children if needed.

  • Re "why subprocesses at all": see feat(cli): add run-multi command for parallel instances #7 is a global singleton. In-process multi-instance causes stats to be counted N times. Subprocesses give each instance its own globals.

  • Re "why no docker": overhead and compatibility with forks of this repo and people who are continuously running the stuff. We don't necessarily want to push people who don't want docker to add it, and further, docker itself can be fickle in some setups. IMO, docker comes after this is figured, and running one container with multi-instances > multiple conts with one instance.

  • Re "clients aren't lost": Client's are fairly fickle as is. Crashes are generally rare unless OOM or CPU bottleneck (even then, rare).

I appreciate the comment though, and it's worth thinking about:

  • Auto-restart failed children in the parent
  • Or: just document "use systemd with Restart=always"

But compatibility with previous releases makes a huge/substantial change difficult to communicate, and that's not really a fit for a repo with this many users.

wg.Add(1)
go func(idx int, dataDir string) {
    defer wg.Done()
    if err := m.runInstance(ctx, idx, dataDir, clientsPerInstance, bandwidthPerInstance); err != nil {


@paradixe Thanks for explaining it in detail. I totally understand why you needed subprocesses. My only concern was the restart policy—for example, here, instead of submitting the error to errChan, causing a return and terminating the parent, it might be better to just restart.

That said, I assume if the coordinator goes down, it doesn’t really matter. I’m also assuming the coordinator is just a goroutine that doesn’t do much, but it looks like a failure in one subprocess could stop the parent goroutine, and hence bring down the main process.

If you think this won’t cause any issues, then feel free to ignore my comment.

Thanks
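To make the suggestion concrete, here is a hedged, self-contained sketch of a restart-instead-of-propagate supervisor: rather than sending a child's first error to errChan and returning (which tears down the parent), it restarts the child with a fixed backoff until the parent context is cancelled. All names here are illustrative, not the PR's code.

```go
package main

import (
    "context"
    "log"
    "os/exec"
    "time"
)

// supervise keeps one child process running, restarting it after failures
// instead of reporting the first error upward.
func supervise(ctx context.Context, name string, args ...string) {
    const backoff = 5 * time.Second
    for {
        cmd := exec.CommandContext(ctx, name, args...)
        err := cmd.Run()
        if ctx.Err() != nil {
            return // parent is shutting down; don't restart
        }
        log.Printf("%s exited (%v); restarting in %s", name, err, backoff)
        select {
        case <-ctx.Done():
            return
        case <-time.After(backoff):
        }
    }
}

func main() {
    ctx, cancel := context.WithCancel(context.Background())
    defer cancel()
    supervise(ctx, "conduit", "start", "-m", "100")
}
```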

@paradixe
Author

To be completely honest, I'm a noob at Go and I very much appreciate you looking out. @ssmirr knows what he wants and he may have written something already so we'll see if this one even makes the cut


@paradixe This patch is actually pretty good. I really appreciate your effort and look forward to having it merged ASAP.

@aburnap Jan 28, 2026

@amirhnajafiz @paradixe - thank you for the work, I was also beginning on a multi-instance fork but you all have done the work.

One comment and many questions:

  • Comment: each instance in the subprocess uses the same entry/exit network interface
  • Question: Same question as @amirhnajafiz: why not instead spin up multiple separate "instances" (containers, LXCs, VMs, etc.)? What are the tradeoffs?
    • In my head this approach seems to better allocate CPU resources per "instance", which seems to be the bottleneck (we've spun up servers across ASNs on 1G/10G connections, and it looks CPU-bound; is this your experience too?)
  • But this may come at the tradeoff of nodes that are easier to blacklist (FQDN, IP). We've had that happen already, as I'm sure you all have too, when running earlier servers using different protocols.

Thank you all again. This whole conduit project is incredible work, during what must be very personal time.

@ssmirr
Owner

Guys I'm so happy for all the interest in this repo and the feedback I'm getting. Thank y'all!


Separate container instances work totally OK! I actually tested with that as a proof of concept before putting the "multi instance" idea out there. And please note, I'm keeping single-instance mode, so you can use that if it works better for you in a multi-container setup.

Problem is I don't want this CLI to manage containers, because I want to keep it as close as possible to upstream code (the goal is to stay up to date with the features they add, and maybe contribute from this fork upstream if we think a feature fits their philosophy).

But I think there is value in having a multi-instance setup, and the simplest way we can achieve this is with subprocesses. A lot of people really just want to run a binary and not mess with Docker (especially if running on macOS / Windows).

@aburnap It would be great if you could share more about the node blacklisting you experienced (I haven't seen this happen) -- feel free to share here or you can DM me on X, whichever works best.

@aburnap yes from what I've seen, usually the CPU is the bottleneck on most servers.


@ssmirr BTW, the code doesn't need to manage the containers/processes. We could add a small shell script to run multiple instances, or even use docker-compose. No need for a code change.

@ssmirr
Owner

ssmirr commented Jan 28, 2026

Hey guys (@paradixe @amirhnajafiz)! It's awesome to see everyone is helping with the feature developments here. I really appreciate it!

I'm reviewing the PR now and had some (somewhat-more-opinionated) WIP locally. I think I might mix the two and test it. I will keep you posted on how it goes.

@amirhnajafiz

@ssmirr Don't hesitate to ask if you need any help.

- Move regex compilation to package level (fix N-counting perf issue)
- Show per-instance stats only with -v flag
- Properly propagate parent verbosity to child processes
- Always show connection events and errors regardless of verbosity
- Add BytesPerSecondToMbps constant (replaces hardcoded 125000)
- Move byteMultipliers map to package level to avoid repeated allocation

Improves child process management with graceful shutdown, proper I/O handling,
and better concurrency patterns.

Changes:

1. Graceful Shutdown with Timeout (Two-Phase Pattern):
   - Use CommandContext to send SIGTERM when parent context cancelled
   - Child process receives signal and can cleanup gracefully
   - Force-kill with SIGKILL after 2s timeout if child still running
   - Prevents abrupt termination and allows connection cleanup

2. WaitGroup for I/O Goroutines:
   - Promote WaitGroup from local variable to struct field
   - Track stderr/stdout reader goroutines with Add/Done
   - Parent waits for all I/O to complete before exiting
   - Prevents truncated output during shutdown

3. Increased Scanner Buffer Size:
   - Create newLargeBufferScanner() helper function
   - 64KB initial buffer, 1MB maximum (vs 64KB default limit)
   - Prevents scanner failure on long lines (stack traces, verbose JSON)
   - Eliminates duplicate buffer configuration code (DRY)

4. Proper Stream Separation:
   - Write child stderr to os.Stderr (was os.Stdout)
   - Write child stdout to os.Stdout (unchanged)
   - Follows Unix convention: stderr=diagnostics, stdout=data
   - Enables separate redirection: conduit 2>errors.log 1>output.log

5. Concurrent I/O Reading:
   - Move stdout reading to goroutine (was blocking in main flow)
   - Both stderr and stdout now read concurrently
   - Prevents potential deadlock if child writes to both streams
   - Symmetric design improves code clarity

Shutdown Flow:
  User signals shutdown (Ctrl+C)
     ↓
  Context cancelled → CommandContext sends SIGTERM to child
     ↓
  Child receives signal → starts graceful cleanup
     ↓
  Monitor goroutine waits 2 seconds
     ↓
  If child still alive → Force kill (SIGKILL)
     ↓
  cmd.Wait() returns → I/O goroutines finish reading
     ↓
  m.wg.Wait() blocks until all I/O complete
     ↓
  Parent exits cleanly

Constants Added:
  - ShutdownTimeout = 2s (grace period before force-kill)
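A self-contained sketch of the same shutdown and I/O handling, using the standard library's exec.Cmd.Cancel/WaitDelay (Go 1.20+) as the two-phase mechanism instead of a separate monitor goroutine; flag values and names are illustrative, and the PR's actual implementation may differ:

```go
package main

import (
    "bufio"
    "context"
    "fmt"
    "io"
    "os"
    "os/exec"
    "os/signal"
    "sync"
    "syscall"
    "time"
)

const ShutdownTimeout = 2 * time.Second // grace period before force-kill

// newLargeBufferScanner returns a scanner with a 64KB initial buffer and a
// 1MB cap, so long lines (stack traces, verbose JSON) don't stop scanning.
func newLargeBufferScanner(r io.Reader) *bufio.Scanner {
    sc := bufio.NewScanner(r)
    sc.Buffer(make([]byte, 64*1024), 1024*1024)
    return sc
}

func runChild(ctx context.Context, args ...string) error {
    cmd := exec.CommandContext(ctx, "conduit", args...)
    // Phase 1: on context cancellation, ask the child to exit gracefully (POSIX signal).
    cmd.Cancel = func() error { return cmd.Process.Signal(syscall.SIGTERM) }
    // Phase 2: if the child is still running after the grace period, it is force-killed.
    cmd.WaitDelay = ShutdownTimeout

    stdout, _ := cmd.StdoutPipe()
    stderr, _ := cmd.StderrPipe()
    if err := cmd.Start(); err != nil {
        return err
    }

    // Read both streams concurrently so a full pipe can never block the child,
    // keeping child stderr on os.Stderr (diagnostics) and stdout on os.Stdout (data).
    var wg sync.WaitGroup
    wg.Add(2)
    go func() { defer wg.Done(); copyLines(newLargeBufferScanner(stdout), os.Stdout) }()
    go func() { defer wg.Done(); copyLines(newLargeBufferScanner(stderr), os.Stderr) }()

    wg.Wait() // all I/O drained first, so no output is truncated
    return cmd.Wait()
}

func copyLines(sc *bufio.Scanner, w io.Writer) {
    for sc.Scan() {
        fmt.Fprintln(w, sc.Text())
    }
    if err := sc.Err(); err != nil {
        fmt.Fprintln(os.Stderr, "scanner:", err)
    }
}

func main() {
    // Ctrl+C cancels the context, which starts the two-phase shutdown above.
    ctx, stop := signal.NotifyContext(context.Background(), os.Interrupt, syscall.SIGTERM)
    defer stop()
    if err := runChild(ctx, "start", "-m", "100"); err != nil {
        fmt.Fprintln(os.Stderr, err)
    }
}
```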
Replace time-based periodic printing (every 10s) with event-driven printing
that only outputs when stats actually change. Eliminates spam during idle
periods and provides immediate feedback when stats update.
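A minimal sketch of that event-driven loop, assuming a statsChanged channel that the parsers signal (names here are illustrative):

```go
package main

import (
    "context"
    "fmt"
    "time"
)

// printLoop prints only when statsChanged fires, instead of on a 10s ticker,
// so idle periods produce no output and updates are reported immediately.
func printLoop(ctx context.Context, statsChanged <-chan struct{}, print func()) {
    for {
        select {
        case <-ctx.Done():
            return
        case <-statsChanged:
            print()
        }
    }
}

func main() {
    ctx, cancel := context.WithTimeout(context.Background(), time.Second)
    defer cancel()

    statsChanged := make(chan struct{}, 1)
    go printLoop(ctx, statsChanged, func() { fmt.Println("AGGREGATE: ...") })

    // A parser would do a non-blocking send after updating shared state,
    // so repeated updates coalesce into a single pending signal.
    select {
    case statsChanged <- struct{}{}:
    default:
    }
    <-ctx.Done()
}
```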
@ssmirr
Owner

ssmirr commented Jan 28, 2026

3 hours later 😅

It would be great if you all can also take a look at my changes. (Warning: it is a bit opinionated in some parts of it).

Will merge later in the day and make a build for a few to test before doing final release.

@ssmirr
Owner

ssmirr commented Jan 28, 2026

Here is an experimental build from this PR; please give multi-instance a try and let me know if you find any issues:

https://github.com/ssmirr/conduit/releases/tag/experimental-pr11

@amirhnajafiz left a comment

@ssmirr Hope it helps.

…ality

This commit addresses critical feedback from code review:

1. Simplify clientsPerInstance calculation using max()
   - Replace 4-line if-statement with idiomatic max() builtin
   - Ensures minimum of 1 client per instance more concisely

2. Add missing scanner error handling
   - Check scanner.Err() after both stdout and stderr scan loops
   - Prevents silent failures on buffer overflow or I/O errors
   - Follows Go scanner best practices

3. Fix mutex/channel race condition in stats processing
   - CRITICAL: Eliminates lock contention under heavy load
   - Root cause: parseInstanceOutput() held m.mu while signaling
     m.statsChanged, causing aggregateAndPrintStats() to block
     waiting for the same mutex
   - Solution: Release mutex BEFORE signaling channel
   - Changed parseStatsLine() to return bool instead of signaling
   - Moved channel signaling to parseInstanceOutput() after unlock

Race condition flow:
  Before:
    parseInstanceOutput() [HOLDS m.mu]
      → parseStatsLine() signals m.statsChanged
      → aggregateAndPrintStats() wakes, blocks on m.mu
      → Lock contention cascade under load

  After:
    parseInstanceOutput() [HOLDS m.mu]
      → parseStatsLine() returns changed flag
      → m.mu.Unlock() [RELEASES LOCK]
      → Signal m.statsChanged (no lock held)
      → aggregateAndPrintStats() acquires m.mu immediately
      → No contention

This follows the critical concurrency pattern: separate data access
(under mutex) from coordination (channel signaling) to prevent
cascading lock contention in high-throughput scenarios.
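In code, that pattern looks roughly like this (illustrative types and names; the PR's actual Manager and parse functions may differ):

```go
package multi

import "sync"

type Manager struct {
    mu           sync.Mutex
    perInstance  map[string]int // latest connected-client count per instance
    statsChanged chan struct{}  // capacity 1 so repeated signals coalesce
}

func NewManager() *Manager {
    return &Manager{
        perInstance:  make(map[string]int),
        statsChanged: make(chan struct{}, 1),
    }
}

// parseStatsLine updates shared state and reports whether anything changed.
// It runs with m.mu held and does NOT signal the channel itself.
func (m *Manager) parseStatsLine(id string, connected int) bool {
    if m.perInstance[id] == connected {
        return false
    }
    m.perInstance[id] = connected
    return true
}

// parseInstanceOutput separates data access (under the mutex) from
// coordination (channel signaling): the lock is released before the signal,
// so the stats printer can acquire m.mu immediately when it wakes.
func (m *Manager) parseInstanceOutput(id string, connected int) {
    m.mu.Lock()
    changed := m.parseStatsLine(id, connected)
    m.mu.Unlock() // release BEFORE signaling

    if changed {
        select {
        case m.statsChanged <- struct{}{}: // non-blocking; updates coalesce
        default:
        }
    }
}
```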
@ssmirr
Owner

ssmirr commented Jan 28, 2026

This is great, thank you so much for the feedback @amirhnajafiz ! Appreciate your help, changes are applied.

Running a build here: https://github.com/ssmirr/conduit/actions/runs/21454769787/job/61792608490

@amirhnajafiz

@ssmirr @paradixe Nice work all.

@ssmirr
Owner

ssmirr commented Jan 28, 2026

Great collaboration from everyone!

@paradixe
Author

paradixe commented Jan 28, 2026

Previous version (2fd31d4) vs experimental-pr11

| Metric                  | Jan 27 (2fd31d4) | Jan 28 (experimental-pr11) |
|-------------------------|------------------|----------------------------|
| Instances               | 14               | 45 (multi-instance)        |
| Peak concurrent clients | 222              | 161                        |
| Cumulative upload       | 1.7 TB           | 3.1 TB                     |
| Cumulative download     | 20.5 TB          | 37.5 TB                    |

Notes:

  • experimental-pr11 tested with -m 50/75/100 mix across 13 servers
  • ~6 hour runtime, stable operation, no crashes
  • Traffic roughly doubled with multi-instance enabled

No crashes, stable operation. Seems to me the broker is choosy and/or we have a lot of nodes, which makes my nodes (all US-based) less desirable. If someone can test EU-based servers, that would give us more data points. Thanks to everyone for their contributions and thoughts.

@ssmirr
Owner

ssmirr commented Jan 28, 2026

@paradixe could you update your comment above to add numbers from previous version on same servers for comparison?
