
perf(core): extreme performance optimization, ~50% faster route matching, production network tuning #198

Merged
hubertshelley merged 3 commits into main from perf/extreme-optimization
Mar 25, 2026
@hubertshelley
Member

Summary

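  • Handler HashMap clone eliminated: self.clone().get() → self.get() + Arc::clone, removing the per-request deep copy of the HashMap
  • Connection-level RouteTree sharing: an Arc<RouteTree> is built once at startup and shared by all HTTP/QUIC connections, removing the per-connection rebuild
  • Middleware chain optimization: a fast path skips building the Next chain when no middleware is registered; Next::call avoids unnecessary Arc clones
  • Zero-allocation not_found: SilentError::NotFound replaces BusinessError plus a String allocation
  • Request::from_parts optimization: removes the redundant http::Request construction and teardown caused by ..Self::default()
  • TCP_NODELAY: Nagle's algorithm is disabled immediately after accept, reducing small-packet latency across networks
  • hyper Builder tuning: HTTP/1.1 pipeline_flush plus HTTP/2 window and concurrent-stream tuning
  • Release profile: LTO=fat, codegen-units=1, strip=symbols
  • no-tracing feature: disables tracing at compile time for benchmark scenarios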
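
The first item can be illustrated with a minimal sketch. The types below (`Method`, `Handler`, `HandlerMap`, `lookup_cloning`, `lookup_borrowing`) are hypothetical stand-ins, not the framework's actual definitions; the sketch only assumes that handlers are stored as `Arc` values in a `HashMap` keyed by method.

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Hypothetical stand-ins for the framework's types; names are illustrative only.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
enum Method {
    Get,
    Post,
}

trait Handler: Send + Sync {
    fn call(&self, path: &str) -> String;
}

struct Hello;

impl Handler for Hello {
    fn call(&self, path: &str) -> String {
        format!("hello from {path}")
    }
}

type HandlerMap = HashMap<Method, Arc<dyn Handler>>;

// Before: clones the whole map (a fresh table allocation plus one refcount
// bump per entry) just to read a single entry.
fn lookup_cloning(map: &HandlerMap, method: Method) -> Option<Arc<dyn Handler>> {
    map.clone().get(&method).cloned()
}

// After: borrows the map and bumps the refcount of the matched handler only.
fn lookup_borrowing(map: &HandlerMap, method: Method) -> Option<Arc<dyn Handler>> {
    map.get(&method).map(Arc::clone)
}

fn main() {
    let mut map: HandlerMap = HashMap::new();
    map.insert(Method::Get, Arc::new(Hello));

    let handler = lookup_borrowing(&map, Method::Get).expect("GET handler registered");
    println!("{}", handler.call("/"));

    // Both lookups return the same answer; only their cost differs.
    assert!(lookup_cloning(&map, Method::Get).is_some());
    assert!(lookup_borrowing(&map, Method::Post).is_none());
}
```

Both functions return the same handler, so the change is invisible to callers; the borrowing version simply stops paying for a map copy on every request.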

Benchmark results (Criterion, route-matching layer, vs main)

| Benchmark | main | Optimized | Improvement |
| --- | --- | --- | --- |
| simple route match | 107.56 ns | 53.19 ns | 50.6% |
| route with middleware | 107.90 ns | 48.78 ns | 54.8% |
| route with multiple middleware | 108.56 ns | 49.55 ns | 54.4% |
| nested route match | 132.32 ns | 91.31 ns | 31.0% |
| Complex GET /api/v1/users | 133.66 ns | 91.60 ns | 31.5% |
| Complex POST /api/v1/posts | 134.95 ns | 88.97 ns | 34.1% |
| 1000 sequential requests | 184.69 µs | 131.35 µs | 28.9% |
| deep nested 10 levels | 207.72 ns | 162.97 ns | 21.5% |
| deep nested with params | 299.38 ns | 256.76 ns | 14.2% |
| 3 levels route match | 147.58 ns | 100.55 ns | 31.9% |
| 5 levels route match | 168.73 ns | 124.33 ns | 26.3% |
| 7 levels route match | 184.70 ns | 143.07 ns | 22.5% |
| 10 levels route match | 207.63 ns | 163.87 ns | 21.1% |

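End-to-end HTTP benchmark (bombardier, localhost): ~140,000 RPS, on par with main.
In the end-to-end scenario the route-matching gains are diluted by I/O and runtime overhead (route matching accounts for only ~6% of total latency).
TCP_NODELAY and the hyper tuning take effect in real cross-network scenarios rather than on localhost.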
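
A rough back-of-the-envelope estimate shows why the end-to-end number barely moves. It assumes that the ~6% share was measured on main, that the best-case 50.6% routing speedup applies, and that everything outside route matching is unchanged:

$$
\frac{T_{\text{new}}}{T_{\text{main}}} \approx 1 - 0.06 \times 0.506 \approx 0.970,
\qquad
\frac{\text{RPS}_{\text{new}}}{\text{RPS}_{\text{main}}} \approx \frac{1}{0.970} \approx 1.03
$$

Under these assumptions the expected gain is about 3%, or roughly 140,000 to 144,000 RPS in the best case, which can plausibly disappear into the run-to-run variance of a localhost load test.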

Test plan

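  • cargo check --all-features: zero warnings
  • cargo clippy --all-features: zero warnings
  • cargo nextest run --all-features: all 1780 tests pass
  • Criterion benchmark comparison verified
  • bombardier end-to-end load test verified
  • No breaking API changes; no changes required in user code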

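- handler_trait: eliminate the HashMap clone; use a direct reference + Arc::clone instead
- route_connection: build the RouteTree once at startup; all connections share an Arc<RouteTree>
- route_tree: fast path skips building the Next chain when there is no middleware; zero-allocation not_found
- next: avoid unnecessary Arc clones when calling the middleware chain
- quic/service: adapted to the shared Arc<RouteTree> mechanism
- Cargo.toml: add LTO/codegen-units=1/strip to the release profile
- Add a no-tracing feature that disables tracing at compile time for benchmark scenarios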

Benchmark results (vs main):
  Simple route: 107ns → 53ns (~50%)
  Middleware route: 108ns → 49ns (~55%)
  Nested route: 132ns → 91ns (~31%)
  1000 requests: 185µs → 131µs (~29%)
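
The connection-level sharing described in this commit can be sketched as follows. This is a simplified model under stated assumptions: `RouteTree`, `serve_connection`, `run`, and the `BUILDS` counter are hypothetical stand-ins, and OS threads take the place of the framework's async connection tasks.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

// Counts how many times the (hypothetical) route tree is built.
static BUILDS: AtomicUsize = AtomicUsize::new(0);

// Illustrative stand-in for the framework's RouteTree.
struct RouteTree {
    routes: Vec<&'static str>,
}

impl RouteTree {
    fn build() -> Self {
        BUILDS.fetch_add(1, Ordering::SeqCst);
        RouteTree {
            routes: vec!["/", "/api/v1/users", "/api/v1/posts"],
        }
    }

    fn matches(&self, path: &str) -> bool {
        self.routes.iter().any(|r| *r == path)
    }
}

// Each simulated connection receives only a cheap Arc handle to the tree.
fn serve_connection(tree: Arc<RouteTree>, path: &'static str) -> bool {
    tree.matches(path)
}

// Builds the tree once, then serves every path on its own thread.
fn run(paths: &[&'static str]) -> Vec<bool> {
    let tree = Arc::new(RouteTree::build()); // built once at startup
    let handles: Vec<_> = paths
        .iter()
        .map(|&p| {
            let tree = Arc::clone(&tree); // refcount bump, no rebuild
            thread::spawn(move || serve_connection(tree, p))
        })
        .collect();
    handles
        .into_iter()
        .map(|h| h.join().expect("connection thread panicked"))
        .collect()
}

fn main() {
    let results = run(&["/", "/api/v1/users", "/missing"]);
    println!("{results:?}, builds = {}", BUILDS.load(Ordering::SeqCst));
}
```

However many connections are served, the build counter increases by one per `run` call, which is the property the commit relies on.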
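from_parts previously expanded ..Self::default(), which internally constructs a complete
http::Request and splits it into Parts, only for those Parts to be overwritten at once by the value the caller passed in.
All fields are now constructed inline, avoiding the pointless intermediate allocation.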
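
The cost of struct-update syntax described above can be reproduced in a small model. `Parts`, `Request`, and the `INNER_BUILDS` counter here are simplified, hypothetical stand-ins for the real types; the sketch only assumes that the default value of one field is expensive to construct.

```rust
use std::cell::Cell;

thread_local! {
    // Counts constructions of the default inner value (illustration only).
    static INNER_BUILDS: Cell<usize> = Cell::new(0);
}

// Stand-in for the parts obtained by building and splitting an http::Request.
#[derive(Debug, PartialEq)]
struct Parts {
    uri: String,
}

impl Default for Parts {
    fn default() -> Self {
        INNER_BUILDS.with(|c| c.set(c.get() + 1));
        Parts { uri: String::from("/") }
    }
}

#[derive(Default, Debug)]
struct Request {
    parts: Parts,
    body: Vec<u8>,
    params: Vec<(String, String)>,
}

impl Request {
    // Before: struct-update syntax evaluates the whole `Self::default()` first,
    // including a default `Parts` that is dropped immediately.
    fn from_parts_with_default(parts: Parts, body: Vec<u8>) -> Self {
        Request { parts, body, ..Self::default() }
    }

    // After: every field is written out, so no throwaway `Parts` is built.
    fn from_parts_inline(parts: Parts, body: Vec<u8>) -> Self {
        Request { parts, body, params: Vec::new() }
    }
}

fn inner_builds() -> usize {
    INNER_BUILDS.with(|c| c.get())
}

fn main() {
    let start = inner_builds();
    let a = Request::from_parts_with_default(Parts { uri: "/a".into() }, Vec::new());
    let after_default = inner_builds();
    let b = Request::from_parts_inline(Parts { uri: "/b".into() }, Vec::new());
    let after_inline = inner_builds();

    println!("{} and {}", a.parts.uri, b.parts.uri);
    println!(
        "extra default builds: with default = {}, inline = {}",
        after_default - start,
        after_inline - after_default
    );
}
```

The two constructors produce equivalent values; the inline version trades a little verbosity for not building and discarding a default value on every call.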
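- listener: call set_nodelay(true) immediately after TCP accept,
  disabling Nagle's algorithm to reduce small-packet latency across networks
- route_connection: configure the hyper Builder parameters
  - HTTP/1.1: enable pipeline_flush to reduce response latency
  - HTTP/2: 1MB stream window, 2MB connection window, adaptive window,
    256 max concurrent streams, improving throughput on high-latency networks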
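
The listener change can be sketched with the standard library alone. This is an illustration under assumptions, not the project's listener code: it uses blocking `std::net` sockets, and `accept_nodelay` is a hypothetical helper name. The hyper Builder settings are not shown because they depend on the hyper version in use.

```rust
use std::io;
use std::net::{TcpListener, TcpStream};

// Accepts one connection and disables Nagle's algorithm on it before any
// data is written, so small responses are not held back for coalescing.
fn accept_nodelay(listener: &TcpListener) -> io::Result<TcpStream> {
    let (stream, _peer) = listener.accept()?;
    stream.set_nodelay(true)?;
    Ok(stream)
}

fn main() -> io::Result<()> {
    // Port 0 asks the OS for any free port on the loopback interface.
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let addr = listener.local_addr()?;

    // The pending connection waits in the listener's backlog until accepted.
    let _client = TcpStream::connect(addr)?;
    let server_side = accept_nodelay(&listener)?;

    println!("nodelay = {}", server_side.nodelay()?);
    Ok(())
}
```

Setting the option right after accept, before the stream is handed to the HTTP layer, means every connection gets it regardless of which protocol path serves it.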
@hubertshelley hubertshelley merged commit 36742a2 into main Mar 25, 2026
3 checks passed
@hubertshelley hubertshelley deleted the perf/extreme-optimization branch March 25, 2026 07:19