Reassess mad/MEs split in tmad timing measurements (speed up the Bridge?) #546

@valassi

Reassess mad/MEs split in tmad timing measurements

Something is clearly fishy here. It is obvious with ggttggg, but there are small signs of it in the other processes too: the "mad" part of the time decreases from SIMD/none to higher SIMD levels exactly like the MEs part does. I would tend to exclude that the Fortran is being vectorized (also because I think the compiler flags that select SIMD/none vs SIMD/avx2 are only in cudacpp).

Note also that the "mad" part is VERY HIGH in C++/none (115.27 s for ggttggg) and much lower in Fortran-only (5.01 s)...

Most likely, some other part (in the bridge??) is currently being wrongly attributed to the Fortran "mad" component, when it is in fact C++ or CUDA.

It would be nice not only to attribute this time to the right component in the timing, but also to speed it up...

See the summary table for ggttggg:
https://github.com/madgraph5/madgraph4gpu/blob/d72071a332b98f06dfd7cc6f625748143e8d4c50/epochX/cudacpp/tmad/summaryTable_ggttggg.txt

===========================================================================================================
|            | mad                        | mad               | mad               | sa/brdg   | sa/full   |
-----------------------------------------------------------------------------------------------------------
| ggttggg    | [sec] tot = mad + MEs      | [TOT/sec]         | [MEs/sec]         | [MEs/sec] | [MEs/sec] |
===========================================================================================================
| nevt/grid  |                       8192 |              8192 |              8192 |      8192 |      8192 |
| nevt total |                      90112 |             90112 |             90112 |  256*32*1 |  256*32*1 |
-----------------------------------------------------------------------------------------------------------
| FORTRAN    | 1226.61 =   5.01 + 1221.60 |  7.35e+01 (= 1.0) |  7.38e+01 (= 1.0) |       --- |       --- |
| CPP/none   | 1576.95 = 115.27 + 1461.68 |  5.71e+01 (x 0.8) |  6.16e+01 (x 0.8) |  7.46e+01 |  7.45e+01 |
| CPP/sse4   |  841.07 =  63.89 +  777.19 |  1.07e+02 (x 1.5) |  1.16e+02 (x 1.6) |  1.40e+02 |  1.40e+02 |
| CPP/avx2   |  412.51 =  33.40 +  379.11 |  2.18e+02 (x 3.0) |  2.38e+02 (x 3.2) |  2.89e+02 |  2.89e+02 |
| CPP/512y   |  375.07 =  30.44 +  344.63 |  2.40e+02 (x 3.3) |  2.61e+02 (x 3.5) |  3.23e+02 |  3.22e+02 |
| CPP/512z   |  342.04 =  31.18 +  310.86 |  2.63e+02 (x 3.6) |  2.90e+02 (x 3.9) |  3.14e+02 |  3.15e+02 |
| CUDA/8192  |   19.54 =   7.47 +   12.08 |  4.61e+03 (x62.8) |  7.46e+03 (x101.) |  7.47e+03 |  9.16e+03 |
===========================================================================================================
| nevt/grid  |                                                                    |     16384 |     16384 |
| nevt total |                                                                    |  512*32*1 |  512*32*1 |
--------------                                                                    -------------------------
| CUDA/max   |                                                                    |  9.35e+03 |  9.52e+03 |
|            |                                                                    |           |   (x129.) |
==============                                                                    =========================
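
To make the scaling argument concrete, the short sketch below recomputes, from the "mad" and "MEs" seconds in the table above, the ratio mad/MEs for each backend and the excess "mad" time relative to the Fortran-only run. The numbers are copied from the table; the helper names `mad_over_mes` and `excess_mad` are hypothetical and only serve this illustration.

```python
# Timings in seconds copied from the ggttggg summary table: (mad, MEs).
timings = {
    "FORTRAN":   (5.01, 1221.60),
    "CPP/none":  (115.27, 1461.68),
    "CPP/sse4":  (63.89, 777.19),
    "CPP/avx2":  (33.40, 379.11),
    "CPP/512y":  (30.44, 344.63),
    "CPP/512z":  (31.18, 310.86),
    "CUDA/8192": (7.47, 12.08),
}


def mad_over_mes(backend):
    """Ratio of the 'mad' time to the 'MEs' time for one backend (hypothetical helper)."""
    mad, mes = timings[backend]
    return mad / mes


def excess_mad(backend):
    """'mad' seconds in excess of the Fortran-only 'mad' time (hypothetical helper)."""
    return timings[backend][0] - timings["FORTRAN"][0]


for backend, (mad, mes) in timings.items():
    print(
        f"{backend:10s} mad={mad:7.2f}s  "
        f"mad/MEs={mad_over_mes(backend):6.3f}  "
        f"excess over Fortran={excess_mad(backend):7.2f}s"
    )
```

If the "mad" part were pure Fortran, it should stay near the Fortran-only value whatever the SIMD level. Instead, mad/MEs stays at roughly 8% to 10% across all the C++ builds, against about 0.4% for Fortran-only, which is consistent with the suspicion that part of the "mad" time is really C++ code that scales with the SIMD build.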
