Optimize nt_batch automatically to improve performance #10
gauravharsha merged 10 commits into Green-Phys:main
Conversation
Pull Request Overview
This PR adds automatic optimization of the nt_batch parameter for the CUDA GW solver to maximize GPU performance. Previously, users had to manually specify this value, which required knowledge of GPU memory constraints and optimal batch sizing.
- Changed default `nt_batch` from 1 to 0 (triggers automatic optimization)
- Added `optimize_ntbatch()` function to calculate the optimal batch size based on available GPU memory
- Extended test coverage to verify both automatic optimization and manual specification modes
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| src/green/gpu/gpu_factory.h | Updated nt_batch parameter default from 1 to 0 and clarified documentation |
| src/green/gpu/gw_gpu_kernel.h | Added declaration for optimize_ntbatch() function with documentation |
| src/gw_gpu_kernel.cpp | Implemented optimize_ntbatch() logic and integrated it into memory checking; updated warning/error messages |
| test/cu_solver_test.cpp | Extended test coverage to test both automatic optimization (nt_batch="0") and manual setting (nt_batch="1") |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…or very large jobs
Pull Request Overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
egull left a comment
Looks good to me. Print nt_batch so we have it in the log (ignore if it's done elsewhere), and then let's try it out!
Merging now. More tests and benchmarks are under way, but the output of unit tests for Hydrogen on an ancient Quadro P1000 already shows improvements with optimization:
The GW kernel uses `cublas::gemm_strided_batched`, which performs best when the batch size is large. This PR proposes automatic optimization of `nt_batch` to achieve optimal performance, with the following logic:

- The default is `nt_batch = 0`.
- When `nt_batch == 0`, the value is optimized for better performance.

Optimization logic:

- Based on available GPU memory, determine the largest feasible `nt_batch`.
- If `nt_batch` is large but `n_tau - nt_batch < n_tau / 4` (i.e., the second batch is small), we opt instead for `nt_batch = n_tau / 2` -- this is debatable, but for both small and large applications it shouldn't make much of a difference.