Increment dispatch signal before kernel dispatch in ggml-hsa.cpp #153
Open
aamarnat wants to merge 115 commits into ypapadop-amd:hsa-backend from
* CMake support for HSA backend * Stubs for initializing HSA * Stubs for HSA buffers * Stubs for HSA and host buffers * Using new backend CMake declaration * Additional stubs for HSA backend * Formatting * Adding function to track unimplemented APIs
* Identifying memory pools * Support for buffer type alignment and max size * Cache memory properties * Comments * Using fixed-width integrals * Buffer allocation support
* Adding HSA backend to examples/simple-backend * Adding HSA backend to backend registration * Adding Eigen as a temporary matrix mulmat implementation * Support for device type reporting * Support for free and total memory reporting * Properly reporting which kernels are supported
* More function implementations, cleanups * Remove redundant information, catch exception
* Correcting comment * Add description from agent name * Implementation of backend get_device_description * Comments * Adding cpu backend as fallback for all ops * Marking which functions can be improved and correct guid * Remove needs-implementation marker on more functions * Hide cpu backend internally * Remove extra header
* HSA backend in test-mul-mat * HSA backend in gpt-2-backend * Offloading to CPU backend if operation not implemented
* Creating HSA queue * Zero-init all members * Adding signal support
* Add HSA backend to GPT-2 example * Remove CPU backend from HSA * Returning that it is host buffer for NPU memory * Adding CPY kernels and factoring out kernel code * Formatting, comments * Temporary storage for cpy * Extracting supports_op conditions * Renaming function
* Add option for CPU fallback in CMakeLists * Adding fallback to CPU backend if operation is not supported
* Add operation example * Using tensor count variable * Count source tensors and copy name * Detect if execution failed * Switching test to int32_t * add kernel using XRT * Aligning example size with kernel * Adding dev heap pool * Using HSA in add kernel * Using relaxed write to queue * Remove XRT dependency * Size independent test * Correct elements for kernel * Moving load functions to common.h * Using simplified AIE packet * Moving loading to a kernel registry * Adding constructor * Add kernel * Refactoring add script * Single name for PDI and instr.txt * Refactoring * Generalizing add.py * Adding dims * Comments and error checking * Modularize python script * CMake kernel generation * Remove magic numbers and use GGML data type naming * Adding a structure for NPU kernels and free function * Accepting only contiguous tensors for now * Stub for keeping loaded kernel in context * Passing device info as parameter, renaming contexts for easy filtering * Renaming variables * Reworking example * Using registry of kernels * Using HSA agent name for kernels * Using dladdr to get the kernel directory
* Using static instead of anonymous namespace * Handling exceptions
* clang-format configuration loosely based on ggml-sycl * Formatting
* Comments, disabling copy/move when not allowed * Replacing high / low bits macros * Factoring out dispatch functionality * Free all finished packets
* Missing checks in example * Adding init_tensor support
* Vector add for floats * Handling higher dims upon load * Move tensor testing in the operation supports function * mul_mat kernel compilation * Smaller gemm * Renaming args * Smaller gemm * Copy instead of moving PDI * Unify cmake kernel generation functions * Using latest CMake Python integration * Missing CMake HSA integration * Install kernels * Adding missing dependencies * Updating test to use HSA conditionally. HSA-specific mul_mat test * Encoding all dims in kernel filename
* Using new compilation process in CMake * Loading insts as binary
* Renaming dispatch function * Track allocated memory for packets
* Avoid warnings * Adding extra data to HSA backend tensors * Caching kernel in tensor extra metadata
* Renaming pending data functions. Refactor packet dispatch * Guard all CPU fallback cases * Internal nodes do not init extra until after graph allocation * Assert cleanups * Comments * Separate CMake support
* Add expected find_package definitions * Expose both C and C++ Peano compilers * Remove unused property * Relocating kernels * Output kernels for a device in a directory * Explicit names for Peano compilation
* Python script fixes * Separating kernel discovery to its own header
* Remove conservative asserts * Removing cpy kernel. Delegating to the CPU device for supports_op * Extracting types in example * Create completely independent fallback graph * Correct source tensor iteration * Better messages * Caching emulated tensors
* Update IRON environment set-up * Fixing typo and index url
* Renaming device to arch * Replace device with arch * Fix headers * Use arch in binary_ops * Info logs only during debug * Refactor binary_op implementation * Refactoring unary ops * Unary ops refactor * Temporary storage for input conversion * Adding i16 support for ggml_hsa_assign * Rename device to arch * Lower alignment requirements to 64bytes * Making CoreFunction a dataclass * Fix typos * Unary ops simplification * Adding alignment checks and simplifying tensor creation * Refactor internal nodes
* Aligning tensor sizes for bf16 / int8 / int16 * Using constant. Removing extra import. * Adding comment
…adop-amd#112) * Rephrasing error * Don't return true for is_host on HSA memory. Refactor exception catching. * Reenabling warning and refactoring registration. * Remove printf
* Avoid multiple logging * Abstracting log switch * Enable/disable logging at run-time
* Verbose log when kernel not found * Fix typo * Use ggml_op_is_empty when possible. Remove deadcode * Move ggml-hsa specific tests to separate directory * Move simple-vector example as a test-vector-hsa
* Raise exception if module not found * Moving kernels as generic * Reorganizing IRON kernels * Avoiding shadowing function name * Update README * Update src/ggml-hsa/kernels/build.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Initial plan * Add GitHub Copilot instructions for GGML repository Co-authored-by: ypapadop-amd <102817138+ypapadop-amd@users.noreply.github.com> --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: ypapadop-amd <102817138+ypapadop-amd@users.noreply.github.com>
* Updating requirements * Remove changing cwd
* Remove CoreFunction from kernel implementation * Moving parameters out of CoreFunction * Per arch num of cols * Hybrid solution with both CoreFunction and external functions helper * Moving more out of the CoreFunction factory * Remove CoreFunction * Renaming kernel files * Remove unused variable
* Update documentation with supported configurations * Update compilation checks * Update src/ggml-hsa/kernels/binary_ops.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Adding new unary ops * Assert if type is not floating point * Fix floor implementation
* Update README on supported NPUs and prerequisites * Update src/ggml-hsa/README.md Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Moving ops and kernel files registration to build script * Using single op to kernel map * Freezing dataclass * Update src/ggml-hsa/kernels/build.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update src/ggml-hsa/kernels/build.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* More generic kernel discovery * Renaming compilation function to suggest it's for AIE agents * Renaming AIE kernel compiler files * Updating references to AIE kernel compiler files * Use switch-case * Update comments
* Making TensorDesc into a dataclass * Create TensorDesc from ggml_tensor interface * Update src/ggml-hsa/kernels/tensor_desc.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Formatting * Correct TensorDesc missing members * Update src/ggml-hsa/kernels/build.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Adding alternative data type for members --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update README on how to compile * Update src/ggml-hsa/README.md Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update README.md --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
092e35d to 6f9f0ea
6f9f0ea to b32f95e
331c960 to fcd1205
04d0c66 to ba22186
07b1565 to 8ba19ce
Increment the dispatch signal before kernel dispatch in ggml-hsa.cpp. Decrement it again if the dispatch fails. Handle multiple packet dispatches.
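
For reviewers who want the ordering spelled out, here is a minimal sketch of the pattern the description names. It is not the code in this PR: `dispatch_signal`, `dispatch_packets`, `complete_packet`, and the `submit` callback are hypothetical names, and a `std::atomic` counter stands in for the HSA signal so the example builds without the HSA runtime. The likely motivation (an assumption, not stated in the PR) is that if the packet were submitted first, the device could finish and decrement the signal before the host had incremented it.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <functional>

// Hypothetical stand-in for an HSA completion signal. A real backend would
// hold an hsa_signal_t and use the HSA runtime's signal operations instead.
struct dispatch_signal {
    std::atomic<std::int64_t> pending{0};
};

// Reserve `packet_count` completions on the signal, then submit the packets.
// `submit` is a hypothetical callback that writes the packets to the queue
// and returns false on failure. On failure the reservation is rolled back,
// so a waiter never blocks on packets that were never enqueued.
inline bool dispatch_packets(dispatch_signal & signal,
                             std::int64_t packet_count,
                             const std::function<bool()> & submit) {
    // Increment first: the device may complete a packet immediately.
    signal.pending.fetch_add(packet_count, std::memory_order_release);
    if (!submit()) {
        // Dispatch failed: undo the increment made above.
        signal.pending.fetch_sub(packet_count, std::memory_order_release);
        return false;
    }
    return true;
}

// Models what the device does each time one packet completes.
inline void complete_packet(dispatch_signal & signal) {
    signal.pending.fetch_sub(1, std::memory_order_release);
}
```

With this shape, a dispatch made of several packets adds its full packet count in one step, and a waiter only needs to observe the counter returning to zero.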