Performance: Avoid holding sendMu during Flush network writes#386

Merged
Sandertv merged 3 commits into Sandertv:master from HashimTheArab:performance-improvements-2 on Feb 5, 2026

Conversation

@HashimTheArab
Contributor

This PR removes a major send-path bottleneck by ensuring we don’t hold the hot sendMu while doing network writes. Flush() now quickly snapshots and clears the queued batch, then performs the expensive enc.Encode(...) write outside the queue lock. This dramatically reduces mutex contention under load (especially when the write path stalls/backpressures), improving throughput and reducing latency spikes.
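
For illustration, here is a minimal sketch of the snapshot-and-clear pattern described above. Apart from sendMu, the type and field names (queue, writeMu, enc) are assumptions made for the sketch and do not mirror gophertunnel's actual internals.

package sketch

import "sync"

type encoder interface {
	Encode(batch [][]byte) error
}

type conn struct {
	sendMu  sync.Mutex // guards only the queued batch
	queue   [][]byte
	writeMu sync.Mutex // serialises the actual network write
	enc     encoder
}

func (c *conn) Flush() error {
	// Hold sendMu only long enough to snapshot and clear the queue, so that
	// Write/WritePacket can keep queueing while a network write is in flight.
	c.sendMu.Lock()
	batch := c.queue
	c.queue = nil
	c.sendMu.Unlock()

	if len(batch) == 0 {
		return nil
	}
	// The expensive encode + network write happens outside sendMu.
	c.writeMu.Lock()
	defer c.writeMu.Unlock()
	return c.enc.Encode(batch)
}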

Benchmarks show throughput improvements of roughly 32%–99% depending on workload: ~32% for WritePacket() under simulated slow writes, and up to ~99% for raw Write() when the write path stalls or backpressures.

Also fixes a bug in WritePacket that appeared when ConvertFromLatest returned multiple packets: the same buffer was reused across all of them.
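
As a self-contained illustration of that bug class (not the actual diff), queueing buf.Bytes() directly and then reusing the buffer makes every queued entry alias the same backing array; copying the bytes out before the next iteration avoids it.

package sketch

import "bytes"

// encodeAll encodes each payload with a shared scratch buffer. Copying the
// bytes out before the buffer is reset prevents later packets from
// overwriting entries that were already queued.
func encodeAll(payloads [][]byte) [][]byte {
	var buf bytes.Buffer
	queue := make([][]byte, 0, len(payloads))
	for _, p := range payloads {
		buf.Reset()
		buf.Write(p) // stand-in for header + packet encoding
		// Wrong: append(queue, buf.Bytes()) — all entries would share buf's
		// backing array and be clobbered on the next Reset/Write.
		data := make([]byte, buf.Len())
		copy(data, buf.Bytes())
		queue = append(queue, data)
	}
	return queue
}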

Benchmarks

Results

(screenshot of benchmark results)

Code

package minecraft

import (
	"io"
	"log/slog"
	"net"
	"sync"
	"sync/atomic"
	"testing"
	"time"

	"github.com/sandertv/gophertunnel/minecraft/protocol/packet"
)

type dummyAddr string

func (a dummyAddr) Network() string { return "bench" }
func (a dummyAddr) String() string  { return string(a) }

// discardConn is a net.Conn that discards writes and optionally sleeps to simulate
// a slow network write path (kernel buffers, encryption overhead, etc.).
type discardConn struct {
	writeDelay time.Duration
	closed     atomic.Bool
}

func (c *discardConn) Read(_ []byte) (int, error) { return 0, io.EOF }
func (c *discardConn) Write(p []byte) (int, error) {
	if c.writeDelay > 0 {
		time.Sleep(c.writeDelay)
	}
	return len(p), nil
}
func (c *discardConn) Close() error { c.closed.Store(true); return nil }
func (c *discardConn) LocalAddr() net.Addr {
	return dummyAddr("local")
}
func (c *discardConn) RemoteAddr() net.Addr {
	return dummyAddr("remote")
}
func (c *discardConn) SetDeadline(time.Time) error      { return nil }
func (c *discardConn) SetReadDeadline(time.Time) error  { return nil }
func (c *discardConn) SetWriteDeadline(time.Time) error { return nil }

func newBenchConn(writeDelay time.Duration) *Conn {
	log := slog.New(slog.NewTextHandler(io.Discard, &slog.HandlerOptions{Level: slog.LevelError}))
	return newConn(&discardConn{writeDelay: writeDelay}, nil, log, DefaultProtocol, 0, true)
}

// BenchmarkConnWrite_FlushParallel measures raw Write throughput while a
// background goroutine flushes the connection continuously.
func BenchmarkConnWrite_FlushParallel(b *testing.B) {
	cases := []struct {
		name  string
		delay time.Duration
	}{
		{name: "NoDelay", delay: 0},
		{name: "WriteDelay50us", delay: 50 * time.Microsecond},
	}
	for _, tc := range cases {
		b.Run(tc.name, func(b *testing.B) {
			conn := newBenchConn(tc.delay)
			payload := make([]byte, 512)

			var stop atomic.Bool
			var wg sync.WaitGroup
			wg.Add(1)
			go func() {
				defer wg.Done()
				for !stop.Load() {
					_ = conn.Flush()
				}
			}()

			b.ResetTimer()
			b.RunParallel(func(pb *testing.PB) {
				for pb.Next() {
					_, _ = conn.Write(payload)
				}
			})
			b.StopTimer()

			stop.Store(true)
			wg.Wait()
			_ = conn.Flush()
		})
	}
}

// BenchmarkConnWritePacket_FlushParallel measures WritePacket throughput while
// a background goroutine flushes the connection continuously.
func BenchmarkConnWritePacket_FlushParallel(b *testing.B) {
	cases := []struct {
		name  string
		delay time.Duration
	}{
		{name: "NoDelay", delay: 0},
		{name: "WriteDelay50us", delay: 50 * time.Microsecond},
	}
	for _, tc := range cases {
		b.Run(tc.name, func(b *testing.B) {
			conn := newBenchConn(tc.delay)

			var stop atomic.Bool
			var wg sync.WaitGroup
			wg.Add(1)
			go func() {
				defer wg.Done()
				for !stop.Load() {
					_ = conn.Flush()
				}
			}()

			b.ResetTimer()
			b.RunParallel(func(pb *testing.PB) {
				pk := &packet.Text{
					TextType: packet.TextTypeChat,
					Message:  "hi",
				}
				for pb.Next() {
					_ = conn.WritePacket(pk)
				}
			})
			b.StopTimer()

			stop.Store(true)
			wg.Wait()
			_ = conn.Flush()
		})
	}
}
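
The benchmarks can be run from the repository root with something along the lines of go test -run '^$' -bench 'FlushParallel' -benchmem ./minecraft (the package path and flags are illustrative).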

@Sandertv
Owner

Thanks for the PR! The changes, although not pretty, seem reasonable. I do wonder what sort of difference we would see in real-world applications since Flush() isn't called very often normally. If you do have any data on that, please feel free to post.

buf.Reset()
conn.hdr.PacketID = converted.ID()
_ = conn.hdr.Write(buf)
l := buf.Len()

@Sandertv
Owner

Good find on this btw.

@Sandertv Sandertv merged commit c839e60 into Sandertv:master on Feb 5, 2026
1 check passed
@HashimTheArab HashimTheArab deleted the performance-improvements-2 branch February 5, 2026 14:55
@HashimTheArab
Contributor, Author

> Thanks for the PR! The changes, although not pretty, seem reasonable. I do wonder what sort of difference we would see in real-world applications since Flush() isn't called very often normally. If you do have any data on that, please feel free to post.

@didntpot

@didntpot
Contributor

didntpot commented Feb 6, 2026

> Thanks for the PR! The changes, although not pretty, seem reasonable. I do wonder what sort of difference we would see in real-world applications since Flush() isn't called very often normally. If you do have any data on that, please feel free to post.

Before with 143 connections on a Dragonfly server

  • 93.05s total mutex delay
  • 85.20s (91.56%) in (*Conn).Flush
  • ~596ms per connection in Flush contention

File: aeris_practice
Build ID: 9e12147240340cdcdd0c828deef867efa7d2f42d
Type: delay
Time: 2026-01-24 17:39:51 GMT
Showing nodes accounting for 92.99s, 99.93% of 93.05s total
      flat  flat%   sum%        cum   cum%
    91.03s 97.82% 97.82%     91.03s 97.82%  sync.(*Mutex).Unlock
     1.97s  2.11% 99.93%      1.97s  2.11%  runtime.unlock (partial-inline)
         0     0% 99.93%     85.20s 91.56%  github.com/sandertv/gophertunnel/minecraft.(*Conn).Flush

After with 136 connections on a Dragonfly server

  • 722.30ms total mutex delay
  • 55.70ms (7.71%) in (*Conn).Flush
  • ~0.4ms per connection in Flush contention

File: aeris_practice
Build ID: aee0b9245c95c2d928d9ad566ef325ce19c3ebf7
Type: delay
Time: 2026-01-24 18:41:40 GMT
Showing nodes accounting for 719.60ms, 99.63% of 722.30ms total
      flat  flat%   sum%        cum   cum%
  515.07ms 71.31% 71.31%   515.07ms 71.31%  sync.(*Mutex).Unlock (partial-inline)
  194.48ms 26.92% 98.24%   194.48ms 26.92%  runtime.unlock (partial-inline)
   10.05ms  1.39% 99.63%    10.05ms  1.39%  runtime._LostContendedRuntimeLock
         0     0% 99.63%    55.70ms  7.71%  github.com/sandertv/gophertunnel/minecraft.(*Conn).Flush
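
For context, a mutex contention ("delay") profile like the ones above can be collected from a running Go server roughly as follows; the listen address and sampling rate are illustrative.

package main

import (
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof handlers
	"runtime"
)

func main() {
	// Sample roughly 1 in 5 contended mutex events.
	runtime.SetMutexProfileFraction(5)
	go func() {
		_ = http.ListenAndServe("localhost:6060", nil)
	}()
	// ... start the server here ...
	select {}
}

The profile is then fetched with go tool pprof -top http://localhost:6060/debug/pprof/mutex.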
