Async copy of self-energy from Device to Host by gauravharsha · Pull Request #9 · Green-Phys/green-gpu

gauravharsha · 2025-10-29T05:01:08Z

This PR makes the copy of the self-energy results from device to host (D2H) asynchronous.

In the current version, we use `cudaMemcpy`, which is a blocking call: all subsequent qkpt workers wait for the data transfer to finish before they start reading integrals from the filesystem.

With the asynchronous copy, the self-energy computation blocks become highly efficient and keep the GPU busy most of the time.
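To make the scheduling difference concrete, here is a host-side analogy in plain C++ (not CUDA): `std::async` stands in for an asynchronous D2H copy on a worker stream, and the function names are hypothetical, not the ones in green-gpu. The blocking variant mirrors `cudaMemcpy`, where the caller waits for the transfer; the asynchronous variant returns a handle immediately, and the caller synchronizes only when it needs the host data.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <thread>
#include <vector>

// Hypothetical stand-in for a D2H transfer of a Sigma block; the sleep
// simulates transfer latency.
std::vector<double> transfer_to_host(const std::vector<double>& device_sigma) {
  std::this_thread::sleep_for(std::chrono::milliseconds(20));
  return device_sigma;
}

// Blocking variant (analogous to cudaMemcpy): the calling worker can do
// nothing else until the transfer has finished.
std::vector<double> blocking_copy(const std::vector<double>& device_sigma) {
  return transfer_to_host(device_sigma);
}

// Asynchronous variant (analogous to cudaMemcpyAsync on a worker stream):
// returns immediately; the caller synchronizes with get() only when it
// actually needs the host data.
std::future<std::vector<double>> async_copy(const std::vector<double>& device_sigma) {
  return std::async(std::launch::async, transfer_to_host, device_sigma);
}
```

In CUDA itself the asynchronous counterpart is `cudaMemcpyAsync(dst, src, count, cudaMemcpyDeviceToHost, stream)`. The host buffer should be page-locked (e.g. allocated with `cudaMallocHost`): for transfers from device memory to pageable host memory, the runtime returns only once the copy has completed, so the overlap is lost.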

Copilot

Pull Request Overview

This PR refactors the self-energy (Sigma) computation workflow in GPU-accelerated GW calculations to improve asynchronous data handling and memory management. The changes introduce a cleanup mechanism for asynchronous device-to-host data transfers and separate cuBLAS handles for each qkpt worker stream.

- Removed the `Sigmak_stij_host` parameter from the computation functions, introducing a deferred-cleanup pattern
- Added a `cleanup()` method and a `cleanup_req_` flag to manage asynchronous Sigma data transfers
- Introduced separate cuBLAS handles for each qkpt worker to enable concurrent operations
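A minimal sketch of the deferred-cleanup pattern, assuming a worker that owns a staging buffer and holds a reference to the host-side Sigma. Only the names `cleanup()` and `cleanup_req_` come from the PR description; the class, `submit()`, and the accumulate-into-Sigma step are hypothetical, and `std::async` stands in for the asynchronous copy. The worker launches its copy and returns; the wait and the accumulation happen in `cleanup()`, either before the worker is reused or at the end of the solve cycle.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical qkpt-worker sketch (host-only analogy, no CUDA calls).
class QkptWorkerSketch {
 public:
  explicit QkptWorkerSketch(std::vector<double>& sigma_host)
      : sigma_host_(sigma_host), staging_(sigma_host.size(), 0.0) {}

  bool cleanup_required() const { return cleanup_req_; }

  // Launch a job and return without waiting. The previous job (if any) is
  // cleaned up first, because its staging buffer is about to be reused.
  void submit(double contribution) {
    cleanup();
    pending_ = std::async(std::launch::async, [this, contribution] {
      for (double& x : staging_) x = contribution;  // simulated async D2H copy
    });
    cleanup_req_ = true;
  }

  // Deferred cleanup: wait for the outstanding copy, then fold the staged
  // result into the host-side Sigma and clear the flag.
  void cleanup() {
    if (!cleanup_req_) return;
    pending_.wait();
    for (std::size_t i = 0; i < staging_.size(); ++i) sigma_host_[i] += staging_[i];
    cleanup_req_ = false;
  }

 private:
  std::vector<double>& sigma_host_;
  std::vector<double> staging_;
  std::future<void> pending_;  // declared after staging_, so destroyed (and waited on) first
  bool cleanup_req_ = false;
};
```

Because a pending result only reaches Sigma when `cleanup()` runs, every worker has to be cleaned up before Sigma is read. On the cuBLAS side, a handle is associated with one stream at a time (set with `cublasSetStream`), which is presumably why giving each qkpt worker its own handle lets the workers issue BLAS calls concurrently without re-binding a shared handle.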

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| src/green/gpu/cugw_qpt.h | Added cleanup infrastructure (methods, member variables) and helper functions for managing qkpt workers; modified function signatures to remove host pointer parameters |
| src/cugw_qpt.cu | Implemented cleanup logic, moved Sigma copy operations into the cleanup method, replaced synchronous memcpy with the async version |
| src/green/gpu/cu_routines.h | Added `_qkpt_handles` vector to store separate cuBLAS handles for qkpt streams |
| src/cu_routines.cu | Integrated the new cleanup pattern into the solve workflow, created separate cuBLAS handles for each qkpt worker |
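The helper functions for managing qkpt workers are not shown on this page. As a hypothetical host-side analogue, the sketch below picks the first worker whose previous job has finished by polling with a zero timeout, much as one would poll a CUDA stream with `cudaStreamQuery` (which returns `cudaSuccess` once all work in the stream has completed and `cudaErrorNotReady` otherwise). `find_ready_worker` and `wait_all` are illustrative names, not the project's API; `wait_all` plays roughly the role that the name `wait_and_clean_qkpts` (from one of the commit messages) suggests.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <future>
#include <optional>
#include <vector>

// A worker slot is free if it has no job, or if its job has finished.
// wait_for(0) is a non-blocking readiness check, like cudaStreamQuery().
bool is_ready(const std::future<void>& job) {
  return !job.valid() ||
         job.wait_for(std::chrono::seconds(0)) == std::future_status::ready;
}

// Return the index of the first ready worker, or std::nullopt if all are busy.
std::optional<std::size_t> find_ready_worker(const std::vector<std::future<void>>& jobs) {
  for (std::size_t i = 0; i < jobs.size(); ++i) {
    if (is_ready(jobs[i])) return i;
  }
  return std::nullopt;
}

// Block until every outstanding job has finished (end of a solve cycle).
// get() also invalidates each future, marking the slot as free again.
void wait_all(std::vector<std::future<void>>& jobs) {
  for (auto& job : jobs) {
    if (job.valid()) job.get();
  }
}
```

Polling instead of blocking on one particular worker lets the scheduler hand the next job to whichever worker frees up first.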


src/green/gpu/cugw_qpt.h

src/cugw_qpt.cu

src/green/gpu/cugw_qpt.h


src/green/gpu/cugw_qpt.h

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

gauravharsha · 2025-11-06T05:56:20Z

The build will fail until PR #3 on Green-Utils is merged.

src/cugw_qpt.cu

egull

Cool, thanks. I think this is ready to merge. It would be easier to review if there were more pull requests about smaller changes (e.g. timers separate from async copy and k-index buffering), but this is not a perfect world...

src/green/gpu/cu_routines.h

src/green/gpu/cugw_qpt.h

src/green/gpu/gpu_kernel.h

egull · 2025-11-06T15:43:44Z

That works. Choose whatever you think is clearest.

…

On Thu, Nov 6, 2025, 4:36 PM Gaurav Harsha wrote:

> In src/green/gpu/gpu_kernel.h:
>
> @@ -66,7 +66,7 @@ namespace green::gpu {
>  */
>  inline void set_shared_Coulomb() {
>    if (_coul_int_reading_type == as_a_whole) {
> -    statistics.start("Read");
> +    statistics.start("Allocate shared Coulomb");
>
> I think the event "Read" was a bit of a misnomer. We do not do any reading in this scope; we only allocate the shared memory space into which the integrals are loaded. Alternatively, I can rename it to "read whole integral". That way, the event timing will reflect allocating and reading together.
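The naming question above is about which work a timed scope actually covers. A hypothetical minimal event timer illustrates keeping allocation and reading as separately named events, so each timing reflects exactly the work inside its scope. The project's real `statistics` object is not shown here; the `EventTimer` class and its `end()` call are assumptions for illustration.

```cpp
#include <cassert>
#include <chrono>
#include <map>
#include <string>

// Hypothetical event timer: start()/end() bracket a named scope and
// accumulate wall-clock time and call counts per event name.
class EventTimer {
 public:
  void start(const std::string& name) { open_[name] = Clock::now(); }

  void end(const std::string& name) {
    auto it = open_.find(name);
    if (it == open_.end()) return;  // end() without a matching start(): ignore
    std::chrono::duration<double> elapsed = Clock::now() - it->second;
    total_[name] += elapsed.count();
    ++count_[name];
    open_.erase(it);
  }

  double total_seconds(const std::string& name) const {
    auto it = total_.find(name);
    return it == total_.end() ? 0.0 : it->second;
  }

  int count(const std::string& name) const {
    auto it = count_.find(name);
    return it == count_.end() ? 0 : it->second;
  }

 private:
  using Clock = std::chrono::steady_clock;
  std::map<std::string, Clock::time_point> open_;
  std::map<std::string, double> total_;
  std::map<std::string, int> count_;
};
```

With this split, "Allocate shared Coulomb" measures only the allocation, while a separate event such as "Read whole integral" would bracket the allocation plus the read; the event name is the only thing telling the reader of a profile which of the two a number means.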

Copilot

Pull Request Overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 7 comments.


src/gw_gpu_kernel.cpp

src/green/gpu/gw_gpu_kernel.h

src/green/gpu/cugw_qpt.h

Copilot

Pull Request Overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.


src/cugw_qpt.cu

Copilot

Pull Request Overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.


src/cugw_qpt.cu

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

gauravharsha added 14 commits August 16, 2025 22:38

new async self-energy update in qkpt streams

c4f0c74

fix minor bugs

1e76013

more bugs

b386b6d

update definition of the cleanup function

5fa2212

add profile arguments

f8bc8ec

modify profile colors

c15e7b4

adding different handles for each qkpt stream to encourage concurrency

90a761e

add profile statement for polarization retrieval

7f7ff16

typo in nvtx3 function

48e610c

use push/pop range for Pqk_tQP function

1118a67

remove nvtx tags

304e158

add comments and clean up function signatures

5dd3a8e

remove unnecessary stream synchronization calls from obtain_Pq functions

259d3f5

more refactoring

0592ace

gauravharsha requested review from Copilot and egull October 29, 2025 05:01

Copilot AI reviewed Oct 29, 2025


redo Pqk_tQP function -- checking stream readiness has some effect on small jobs

d2fffab

WSLinkK approved these changes Nov 3, 2025

src/green/gpu/cugw_qpt.h

gauravharsha and others added 7 commits November 5, 2025 16:28

Update src/green/gpu/cugw_qpt.h

1aa3a28

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Update src/green/gpu/cugw_qpt.h

552b7e2

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Update src/green/gpu/cugw_qpt.h

06cc9ac

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

address Copilot reviews

fb6a62c

rename shared Coulomb allocation event

3666025

clean up and modify estimation of flops achieved

91ebc4b

fix typo in wait_and_clean_qkpts

2c68663

gauravharsha requested a review from WSLinkK November 6, 2025 05:56

egull reviewed Nov 6, 2025

src/cugw_qpt.cu (outdated)

egull approved these changes Nov 6, 2025

src/green/gpu/cu_routines.h (outdated)

src/green/gpu/cugw_qpt.h (outdated)

src/green/gpu/cugw_qpt.h (outdated)

src/green/gpu/gpu_kernel.h (outdated)

rename qkpt_handles

a7f6141

naming and documentation updates; use reset events at end of solve cycle

6ad23d9

gauravharsha requested a review from Copilot November 6, 2025 16:47

fix function call for flops_achieved()

8e0e0ef

Copilot AI reviewed Nov 6, 2025


gauravharsha added 2 commits November 6, 2025 11:58

documentation fixes from copilot review

422947e

documentation and logic fix thanks to copilot

d22fb80

gauravharsha requested a review from Copilot November 6, 2025 17:00

Copilot AI reviewed Nov 6, 2025

src/cugw_qpt.cu (outdated)

gauravharsha mentioned this pull request Nov 6, 2025

Improve efficiency in memory trace and device->host mem-copy #5

Closed

gauravharsha added 7 commits November 6, 2025 14:32

optimize host pinned memory buffers

d07819e

use only specific cuda arch for pauli-master gpu quadro p1000

25acf9c

undo update for test.yaml

c01e1c8

fix mpi_reduce for performance metrics

1ff204a

fix sigmak_stij_buffer allocation - same memory as Gk is conflicting

c36ee81

modify complexity estimation and performance metrics

7bca915

minor fix in complexity estimation

86bc13c

gauravharsha requested a review from Copilot November 6, 2025 21:39

Copilot AI reviewed Nov 6, 2025

src/cugw_qpt.cu (outdated)

src/cugw_qpt.cu

gauravharsha and others added 2 commits November 6, 2025 16:44

Update src/cugw_qpt.cu

1bbc908

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Update src/cugw_qpt.cu

69c13e5

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

gauravharsha merged commit d7b9d7d into Green-Phys:main Nov 6, 2025
1 check passed



4 participants