Lecture 7 homework for Teacher Xiaopeng's course "High-Performance Parallel Programming and Optimization in C++" #12

Open
Paul-Laifan wants to merge 1 commit into parallel101:main from Paul-Laifan:main

@Paul-Laifan

- matrix_randomize: swap loop order (y outer, x inner) for sequential write
- matrix_transpose: use 32x32 tiling to improve cache locality
- matrix_multiply: reorder loops to (y,t,x), hoist rhs scalar, 44x speedup
- matrix_RtAR: use static temp matrices to avoid repeated malloc/free
- overall: 5.36s -> 0.15s, ~35.7x speedup
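
A minimal sketch of the first two items, assuming a layout where `x` is the contiguous index (element `(x, y)` stored at `data[x + y * nx]`), which is what "y outer, x inner for sequential write" implies. The `Matrix` struct and `hash_to_float` are hypothetical stand-ins, not the code from this PR, and OpenMP is omitted to keep the example self-contained.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the homework's matrix type:
// element (x, y) lives at data[x + y * nx], so x is the contiguous index.
struct Matrix {
    std::size_t nx = 0, ny = 0;
    std::vector<float> data;
    void reshape(std::size_t new_nx, std::size_t new_ny) {
        nx = new_nx;
        ny = new_ny;
        data.assign(nx * ny, 0.0f);
    }
    float &operator()(std::size_t x, std::size_t y) { return data[x + y * nx]; }
    float operator()(std::size_t x, std::size_t y) const { return data[x + y * nx]; }
};

// Hypothetical deterministic per-element value in [0, 1), replacing the course RNG.
static float hash_to_float(std::uint32_t x, std::uint32_t y) {
    std::uint32_t h = x * 0x9E3779B1u ^ (y + 0x7F4A7C15u);
    h ^= h >> 16;
    h *= 0x85EBCA6Bu;
    h ^= h >> 13;
    return static_cast<float>(h & 0xFFFFFFu) / static_cast<float>(0x1000000u);
}

// y outer, x inner: consecutive iterations write consecutive addresses.
static void matrix_randomize(Matrix &out) {
    for (std::size_t y = 0; y < out.ny; y++)
        for (std::size_t x = 0; x < out.nx; x++)
            out(x, y) = hash_to_float(static_cast<std::uint32_t>(x),
                                      static_cast<std::uint32_t>(y));
}

constexpr std::size_t TILE = 32;  // tile edge used in the PR

// Tiled transpose: the strided writes stay inside one TILE x TILE block at a time.
static void matrix_transpose(Matrix &out, Matrix const &in) {
    assert(&out != &in && "transpose cannot run in place");
    out.reshape(in.ny, in.nx);
    for (std::size_t yBase = 0; yBase < in.ny; yBase += TILE) {
        for (std::size_t xBase = 0; xBase < in.nx; xBase += TILE) {
            // Edge tiles may be smaller than TILE when sizes are not multiples of it.
            std::size_t yEnd = std::min(yBase + TILE, in.ny);
            std::size_t xEnd = std::min(xBase + TILE, in.nx);
            for (std::size_t y = yBase; y < yEnd; y++)
                for (std::size_t x = xBase; x < xEnd; x++)
                    out(y, x) = in(x, y);  // contiguous read, strided write
        }
    }
}
```

Within one 32x32 tile the strided side of the transpose touches a 4 KiB block of floats, small enough to stay in L1 while the tile is processed; TILE=32 follows the PR and may not be the best choice on every CPU.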

Benchmark comparison at n=1120 (before / after):

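| Function | Before | After | Speedup | Optimization |
|---|---|---|---|---|
| matrix_randomize | 0.000928s | 0.000305s | ~3x | Swap loop order so x is the inner loop, giving sequential writes |
| matrix_transpose | 0.002528s | 0.000579s | ~4.4x | Tiling (TILE=32) |
| matrix_multiply | 0.904947s | 0.020365s | ~44x | Reorder loops to (y,t,x): contiguous inner loop + scalar hoisting |
| matrix_RtAR | 1.80908s | 0.044072s | ~41x | The above optimizations combined + static temporaries |
| overall | 5.357s | 0.150s | ~35.7x | |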
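
The `matrix_multiply` and `matrix_RtAR` rows can be sketched the same way. This assumes the definition `out(x, y) = sum over t of lhs(x, t) * rhs(t, y)` and the same hypothetical x-contiguous `Matrix` stand-in; it only illustrates the (y,t,x) loop order and the static temporaries and is not the PR's actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in: element (x, y) is stored at data[x + y * nx].
struct Matrix {
    std::size_t nx = 0, ny = 0;
    std::vector<float> data;
    // Resizes and zero-fills; typically reuses the buffer if it is already large enough.
    void reshape(std::size_t new_nx, std::size_t new_ny) {
        nx = new_nx;
        ny = new_ny;
        data.assign(nx * ny, 0.0f);
    }
    float &operator()(std::size_t x, std::size_t y) { return data[x + y * nx]; }
    float operator()(std::size_t x, std::size_t y) const { return data[x + y * nx]; }
};

// out(x, y) = sum_t lhs(x, t) * rhs(t, y), with loops ordered (y, t, x).
static void matrix_multiply(Matrix &out, Matrix const &lhs, Matrix const &rhs) {
    std::size_t nx = lhs.nx, nt = lhs.ny, ny = rhs.ny;
    assert(rhs.nx == nt && "matrix_multiply: shape mismatch");
    out.reshape(nx, ny);  // zero-filled, so the loop can accumulate directly
    for (std::size_t y = 0; y < ny; y++) {
        for (std::size_t t = 0; t < nt; t++) {
            float r = rhs(t, y);  // invariant in the x loop: hoisted into a scalar
            for (std::size_t x = 0; x < nx; x++)
                out(x, y) += lhs(x, t) * r;  // out and lhs both walk x contiguously
        }
    }
}

static void matrix_transpose(Matrix &out, Matrix const &in) {
    out.reshape(in.ny, in.nx);
    for (std::size_t y = 0; y < in.ny; y++)
        for (std::size_t x = 0; x < in.nx; x++)
            out(y, x) = in(x, y);
}

// Computes R^T A R. The temporaries are static, so repeated calls with the
// same shapes reuse their buffers instead of allocating and freeing each time.
static void matrix_RtAR(Matrix &RtAR, Matrix const &R, Matrix const &A) {
    static Matrix Rt, RtA;
    matrix_transpose(Rt, R);
    matrix_multiply(RtA, Rt, A);
    matrix_multiply(RtAR, RtA, R);
}
```

The static temporaries trade memory that stays allocated until program exit for fewer allocations, and they make `matrix_RtAR` non-reentrant: concurrent calls from several threads would race on `Rt` and `RtA`.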
