

@ilectra (Collaborator) commented Mar 5, 2025

Fixes #21

Still needs a bit of work:

  • Propagate same changes to Cshift_mpi.h
  • Test and profile with MPI as well (is --grid 16.16.16.32 --mpi 1.1.1.2 OK?)
  • Clean up code related to profiling, memory logging, and the regression test

@ilectra self-assigned this Mar 5, 2025

@qiUip (Collaborator) left a comment

This looks really great and very concise! I'm impressed by how much you managed to minimise the changes.

My only "comments" are questions to help me understand things correctly, plus requests to clean up the branch before we merge. I will do that cleanup myself once you give me the go-ahead, as those are messes I introduced for the profiling branch.

@ilectra (Collaborator, Author) commented Mar 21, 2025

To check MPI performance, use --grid 32.32.32.48 or --grid 32.32.32.64 with --mpi 1.1.1.4 (1 node) and --mpi 1.1.2.4 (2 nodes).
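For reference, a minimal Python sketch of how these layouts divide the lattice between ranks, assuming the even per-dimension split that Grid uses (each --grid extent must be divisible by the matching --mpi entry). The helpers parse_dims and local_lattice are hypothetical names for illustration, not part of Grid.

```python
from math import prod


def parse_dims(spec: str) -> list[int]:
    """Turn a dotted spec such as '32.32.32.64' into [32, 32, 32, 64]."""
    return [int(x) for x in spec.split(".")]


def local_lattice(grid: str, mpi: str) -> tuple[list[int], int, int]:
    """Return (local extents per rank, number of ranks, sites per rank).

    Assumes each global extent is split evenly over the ranks in that
    dimension, as with Grid's --grid/--mpi options.
    """
    g, p = parse_dims(grid), parse_dims(mpi)
    if len(g) != len(p):
        raise ValueError("--grid and --mpi need the same number of dimensions")
    local = []
    for extent, ranks in zip(g, p):
        if extent % ranks:
            raise ValueError(f"extent {extent} is not divisible by {ranks} ranks")
        local.append(extent // ranks)
    return local, prod(p), prod(local)


# The layouts proposed above (1 node = 1.1.1.4, 2 nodes = 1.1.2.4).
for grid in ("32.32.32.48", "32.32.32.64"):
    for mpi in ("1.1.1.4", "1.1.2.4"):
        local, nranks, sites = local_lattice(grid, mpi)
        print(f"--grid {grid} --mpi {mpi}: {nranks} ranks, local {local}, {sites} sites/rank")
```

With 1.1.1.4 only the fourth dimension is distributed, whereas 1.1.2.4 also splits the third one, so Cshift presumably has to communicate over MPI in one more direction in the 2-node runs.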

@ilectra (Collaborator, Author) commented Mar 21, 2025

@asifsamiarain, benchmarks to run (make sure you check out the latest version of this branch):
All MPI configurations are to be run with --grid 32.32.32.64, except configuration 1 (no MPI), which is to be run with --grid 24.24.24.48.

MPI configurations:

  1. --mpi 1.1.1.1 (no MPI)
  2. --mpi 1.1.1.4 (1 node)
  3. --mpi 1.1.2.4 (2 nodes)

Benchmarks to run:

  • sp2n with MPI configurations 1, 2, 3
  • su3Mobi with MPI configuration 1
  • su3Prod with MPI configurations 1, 2, 3
  • su3 with MPI configurations 1, 2, 3
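The requested matrix amounts to ten runs, which matches the ten log rows in each of the experiment listings further down. A hypothetical Python helper that expands it into --grid/--mpi argument strings; it covers only the two flags named here, so binary paths and any other run-time options would have to be added separately.

```python
# Hypothetical helper: expand the benchmark/MPI matrix into one
# "--grid ... --mpi ..." argument string per run. Other flags are omitted.
MPI_CONFIGS = {1: "1.1.1.1", 2: "1.1.1.4", 3: "1.1.2.4"}
GRID_NO_MPI = "24.24.24.48"  # used for configuration 1 only
GRID_MPI = "32.32.32.64"     # used for configurations 2 and 3

BENCHMARKS = {
    "sp2n": [1, 2, 3],
    "su3Mobi": [1],
    "su3Prod": [1, 2, 3],
    "su3": [1, 2, 3],
}


def run_matrix() -> list[tuple[str, str]]:
    """Return (benchmark, argument string) pairs for every requested run."""
    runs = []
    for bench, configs in BENCHMARKS.items():
        for c in configs:
            grid = GRID_NO_MPI if c == 1 else GRID_MPI
            runs.append((bench, f"--grid {grid} --mpi {MPI_CONFIGS[c]}"))
    return runs


for bench, args in run_matrix():
    print(f"{bench}: {args}")
```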

Let me know if any of this doesn't make sense!

@asifsamiarain (Collaborator) commented Mar 25, 2025

pgda032 is upstream develop (hash 3d01486, dated 20250306), and pgda034 is also upstream develop but with the relevant changes adopted for Grid/cshift/Cshift_common.h, Grid/cshift/Cshift_mpi.h and Grid/cshift/Cshift_table.cc (up to hash 40ee258, dated 20250321).

Please be aware that both experiment IDs also include the changes from PRs paboyle#465 and paboyle#471.

A bit about terminology (just in case):
sp2n: Test_hmc_Sp_WilsonFundFermionGauge
su3: Test_hmc_WilsonFermionGauge
su3Mobi: Mobius2p1f
su3Prod: Test_WilsonFlow

[dc-asif1@tursa-login1 pgda032]$ mlc gdchk def | grep "LOG\|sp2n"; mlc gdchk def | grep "LOG\|su3-"; mlc gdchk def | grep "LOG\|su3M"; mlc gdchk def | grep "LOG\|su3P"
                                                                        LOG_NAME          SEC  ckpoint_rng  ckpoint_lat            plaq/Plaquette        NODES
              gd_tursa-gnucuda_001-008_sp2n-24.24.24.48-1.1.1.1_defe00_53180.log  5688.424348           NA           NA        0.5265701266420175   tu-c0r1n90
              gd_tursa-gnucuda_004-008_sp2n-32.32.32.64-1.1.1.4_deff00_53432.log  7139.306419           NA           NA        0.5263363789417429   tu-c0r3n30
              gd_tursa-gnucuda_008-008_sp2n-32.32.32.64-1.1.2.4_deff00_55304.log  3735.331673           NA           NA        0.5263363789417429 tu-c0r0n[72,75]
                                                                        LOG_NAME          SEC  ckpoint_rng  ckpoint_lat            plaq/Plaquette        NODES
               gd_tursa-gnucuda_001-008_su3-24.24.24.48-1.1.1.1_defe00_53178.log  1961.201352     79302b9f     e5021321               0.457048226   tu-c0r0n72
               gd_tursa-gnucuda_004-008_su3-32.32.32.64-1.1.1.4_deff00_53177.log  2439.995319     eb46abe2     eb995f1b               0.457036544   tu-c0r0n72
               gd_tursa-gnucuda_008-008_su3-32.32.32.64-1.1.2.4_deff00_55363.log  1314.512590     eb46abe2     ae1b3afe               0.457036544 tu-c0r0n[72,75]
                                                                        LOG_NAME          SEC  ckpoint_rng  ckpoint_lat            plaq/Plaquette        NODES
           gd_tursa-gnucuda_001-008_su3Mobi-24.24.24.48-1.1.1.1_defe01_54115.log 14093.659630     74bf0c85     cd6b488e               0.597430431   tu-c0r7n45
                                                                        LOG_NAME          SEC  ckpoint_rng  ckpoint_lat            plaq/Plaquette        NODES
           gd_tursa-gnucuda_001-008_su3Prod-24.24.24.48-1.1.1.1_defe01_54239.log   168.958267           NA           NA         0.996634196056962   tu-c0r0n72
           gd_tursa-gnucuda_004-008_su3Prod-32.32.32.64-1.1.1.4_deff01_54089.log   225.550086           NA           NA         0.996599127390423   tu-c0r0n72
           gd_tursa-gnucuda_008-008_su3Prod-32.32.32.64-1.1.2.4_deff01_55153.log   133.869080           NA           NA         0.996599127390423 tu-c0r2n[27,30]
[dc-asif1@tursa-login1 pgda034]$ mlc gdchk def | grep "LOG\|sp2n"; mlc gdchk def | grep "LOG\|su3-"; mlc gdchk def | grep "LOG\|su3M"; mlc gdchk def | grep "LOG\|su3P"
                                                                        LOG_NAME          SEC  ckpoint_rng  ckpoint_lat            plaq/Plaquette        NODES
              gd_tursa-gnucuda_001-008_sp2n-24.24.24.48-1.1.1.1_defe00_55154.log  5461.260282           NA           NA        0.5265701266420175   tu-c0r1n90
              gd_tursa-gnucuda_004-008_sp2n-32.32.32.64-1.1.1.4_deff00_55155.log  7026.832860           NA           NA        0.5263363789417429   tu-c0r1n90
              gd_tursa-gnucuda_008-008_sp2n-32.32.32.64-1.1.2.4_deff00_55305.log  3681.235152           NA           NA        0.5263363789417429 tu-c0r0n[72,75]
                                                                        LOG_NAME          SEC  ckpoint_rng  ckpoint_lat            plaq/Plaquette        NODES
               gd_tursa-gnucuda_001-008_su3-24.24.24.48-1.1.1.1_defe00_55157.log  1837.455302     79302b9f     e5021321               0.457048226   tu-c0r2n24
               gd_tursa-gnucuda_004-008_su3-32.32.32.64-1.1.1.4_deff00_55158.log  2372.930123     eb46abe2     eb995f1b               0.457036544   tu-c0r1n90
               gd_tursa-gnucuda_008-008_su3-32.32.32.64-1.1.2.4_deff00_55362.log  1285.399552     eb46abe2     ae1b3afe               0.457036544 tu-c0r0n[72,75]
                                                                        LOG_NAME          SEC  ckpoint_rng  ckpoint_lat            plaq/Plaquette        NODES
           gd_tursa-gnucuda_001-008_su3Mobi-24.24.24.48-1.1.1.1_defe01_55252.log 14082.409955     74bf0c85     cd6b488e               0.597430431   tu-c0r5n39
                                                                        LOG_NAME          SEC  ckpoint_rng  ckpoint_lat            plaq/Plaquette        NODES
           gd_tursa-gnucuda_001-008_su3Prod-24.24.24.48-1.1.1.1_defe01_55161.log   138.587976           NA           NA         0.996634196056962   tu-c0r2n33
           gd_tursa-gnucuda_004-008_su3Prod-32.32.32.64-1.1.1.4_deff01_55162.log   210.947990           NA           NA         0.996599127390423   tu-c0r2n33
           gd_tursa-gnucuda_008-008_su3Prod-32.32.32.64-1.1.2.4_deff01_55163.log   126.551815           NA           NA         0.996599127390423 tu-c0r2n[27,30]
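Both listings report the same plaquette and checkpoint hashes for every benchmark/layout pair, so the Cshift changes did not alter these outputs. A minimal sketch for checking that automatically, assuming the six whitespace-separated columns shown (LOG_NAME SEC ckpoint_rng ckpoint_lat plaq/Plaquette NODES) and that the fourth underscore-separated field of LOG_NAME is <benchmark>-<grid>-<mpi>. Only two rows per experiment are copied for brevity, and parse_gdchk/mismatches are hypothetical names.

```python
# Two rows copied from each listing above (pgda032 = baseline, pgda034 = changes).
PGDA032 = """\
gd_tursa-gnucuda_008-008_sp2n-32.32.32.64-1.1.2.4_deff00_55304.log  3735.331673  NA  NA  0.5263363789417429  tu-c0r0n[72,75]
gd_tursa-gnucuda_004-008_su3-32.32.32.64-1.1.1.4_deff00_53177.log  2439.995319  eb46abe2  eb995f1b  0.457036544  tu-c0r0n72
"""
PGDA034 = """\
gd_tursa-gnucuda_008-008_sp2n-32.32.32.64-1.1.2.4_deff00_55305.log  3681.235152  NA  NA  0.5263363789417429  tu-c0r0n[72,75]
gd_tursa-gnucuda_004-008_su3-32.32.32.64-1.1.1.4_deff00_55158.log  2372.930123  eb46abe2  eb995f1b  0.457036544  tu-c0r1n90
"""


def parse_gdchk(text: str) -> dict:
    """Map (benchmark, grid, mpi) -> run data for each six-column row."""
    runs = {}
    for line in text.splitlines():
        cols = line.split()
        if len(cols) != 6 or cols[0] == "LOG_NAME":
            continue  # skip headers, prompts and blank lines
        log, sec, rng, lat, plaq, _nodes = cols
        bench, grid, mpi = log.split("_")[3].split("-")
        runs[(bench, grid, mpi)] = {"sec": float(sec), "rng": rng, "lat": lat, "plaq": plaq}
    return runs


def mismatches(a: dict, b: dict) -> list:
    """Runs present in both listings whose plaquette or checkpoint hashes differ."""
    return [key for key in sorted(a.keys() & b.keys())
            if any(a[key][f] != b[key][f] for f in ("rng", "lat", "plaq"))]


print(mismatches(parse_gdchk(PGDA032), parse_gdchk(PGDA034)))  # prints []
```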
Run-time reduction (SEC column, pgda034 relative to pgda032), listed for MPI configurations 1, 2, 3:
sp2n: ~3.99%, ~1.58%, ~1.45%
su3: ~6.31%, ~2.75%, ~2.21%
su3Mobi: ~0.08% (configuration 1 only)
su3Prod: ~17.98%, ~6.47%, ~5.47%
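The reductions can be reproduced from the SEC column of the two listings. A small Python sketch, with the values copied from the output above and each list ordered by MPI configuration 1, 2, 3; the dictionary and function names are illustrative only.

```python
# SEC values copied from the listings above, ordered by MPI configuration 1, 2, 3.
BASELINE = {  # pgda032: upstream develop
    "sp2n": [5688.424348, 7139.306419, 3735.331673],
    "su3": [1961.201352, 2439.995319, 1314.512590],
    "su3Mobi": [14093.659630],
    "su3Prod": [168.958267, 225.550086, 133.869080],
}
PATCHED = {  # pgda034: develop plus the Cshift changes
    "sp2n": [5461.260282, 7026.832860, 3681.235152],
    "su3": [1837.455302, 2372.930123, 1285.399552],
    "su3Mobi": [14082.409955],
    "su3Prod": [138.587976, 210.947990, 126.551815],
}


def reduction_pct(before: float, after: float) -> float:
    """Percentage decrease in run time relative to the baseline."""
    return 100.0 * (before - after) / before


for bench, before in BASELINE.items():
    pcts = [reduction_pct(b, a) for b, a in zip(before, PATCHED[bench])]
    print(f"{bench}: " + ", ".join(f"~{p:.2f}%" for p in pcts))
```

Running it prints the percentages quoted above, e.g. "sp2n: ~3.99%, ~1.58%, ~1.45%".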

@ilectra (Collaborator, Author) commented Mar 28, 2025

Closing this PR in favour of paboyle#476, which is clean and opened against upstream.

@ilectra closed this Mar 28, 2025

Development: linked issue "Investigate the origins of the small memory transfers from Staple to Cshift"

4 participants