Conversation

@jeffdaily
Collaborator

Decrease HCC_ASYNCOPS_SIZE from 16k to 1k.
HCC_KERNARG_BUFFER_SIZE is now an environment variable.
HCC_KERNARG_POOL_SIZE is now an environment variable.
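
For reviewers unfamiliar with the HCC tunables, here is a minimal sketch of the conventional read-env-var-with-compiled-in-default pattern these settings follow. The `env_or_default` helper and the units (bytes for the buffer size, entry count for asyncops) are illustrative assumptions, not HCC's actual parsing code:

```cpp
#include <cstdio>
#include <cstdlib>
#include <string>

// Illustrative helper: read a numeric environment variable, falling back
// to a compiled-in default when it is unset.
static long env_or_default(const char* name, long fallback) {
    const char* value = std::getenv(name);
    return value ? std::stol(value) : fallback;
}

int main() {
    // The defaults shown are the values this PR proposes.
    const long kernarg_buffer_size = env_or_default("HCC_KERNARG_BUFFER_SIZE", 4096);
    const long asyncops_size       = env_or_default("HCC_ASYNCOPS_SIZE", 1024);
    std::printf("kernarg buffer: %ld bytes, asyncops: %ld entries\n",
                kernarg_buffer_size, asyncops_size);
    return 0;
}
```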

@jeffdaily jeffdaily requested a review from scchan January 25, 2020 00:34
@jeffdaily jeffdaily changed the title from "cincrease KERNARG_BUFFER_SIZE from 512 to 4k" to "increase KERNARG_BUFFER_SIZE from 512 to 4k" Jan 25, 2020
@jeffdaily
Collaborator Author

Since this PR also reduces the asyncops size, it could replace #1261.

@emankov
Contributor

emankov commented Jan 25, 2020

Justification for all the numbers is needed.

@jeffdaily
Collaborator Author

@emankov

  1. CUDA's default kernarg size is 4 KB: "__global__ function parameters are passed to the device via constant memory and are limited to 4 KB" (https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#function-parameters).
  2. The PyTorch translate model uses a number of kernels with kernargs larger than 512 bytes, the current default. Raising the default kernarg buffer size yields a 30% performance improvement because those kernargs are no longer allocated on demand.
  3. Since the kernarg buffer size grows 8x, HCC_ASYNCOPS_SIZE shrinks 16x to keep worst-case memory use roughly the same, assuming two streams fully queuing to the same device; see the sketch below.
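
As a back-of-the-envelope check of point 3, under the simplifying assumption that every queued async op can pin one kernarg buffer (an assumption for illustration, not a statement about HCC's actual pooling), the per-stream worst case shrinks rather than grows:

```cpp
#include <cstdio>

int main() {
    // Assumed worst-case model: every queued async op holds one kernarg
    // buffer. The real pooling behavior lives inside HCC; this only
    // illustrates the 8x-larger-buffer / 16x-fewer-ops trade-off.
    constexpr long before = 16384L * 512;   // old asyncops * old buffer size
    constexpr long after  =  1024L * 4096;  // new asyncops * new buffer size
    std::printf("per-stream worst case: before %ld KiB, after %ld KiB\n",
                before >> 10, after >> 10);  // prints 8192 and 4096
    return 0;
}
```

With two streams fully queuing to the same device, as in the worst case above, that is 16 MiB before versus 8 MiB after.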

@jeffdaily
Collaborator Author

@emankov The most important change in this PR is the increase in the default kernarg buffer size. If needed, would that change alone be acceptable, without the others?

@emankov
Contributor

emankov commented Jan 27, 2020

@jeffdaily, thank you for the explanation. Could you please add a few words about this in the code comments?

@jeffdaily
Collaborator Author

@emankov Comments added in commit f0e2b40.

@jeffdaily jeffdaily force-pushed the increase_kernarg_buffer_size branch from f0e2b40 to d6b79a3 Compare January 27, 2020 20:57
@scchan scchan merged commit 05d7af3 into ROCm:clang_tot_upgrade Jan 29, 2020