[DRAFT] Deeploy-GAP9 Platform #143

runwangdl · 2025-12-17T01:33:19Z

Summary

This PR adds complete GAP9 platform support to Deeploy, including platform integration, DMA support, tiling capabilities, CI/CD workflows, and comprehensive testing infrastructure. This represents 20 commits specifically focused on GAP9 development.

Added

GAP9 Platform Support

Initial GAP9 platform integration with full deployer, bindings, and platform configuration (Deeploy/Targets/GAP9/)
GAP9 DMA support with L3 DMA and Mchan DMA implementations
GAP9-specific memory allocation and free templates
GAP9 tiling support for L3 memory
GAP9 CI/CD workflows (.github/workflows/_runner-gap9.yml, .github/workflows/ci-platform-gap9.yml, .github/workflows/ci-platform-gap9-tiled.yml)
Link to PULP-NN, PULP kernels, and Math libraries for GAP9
GAP9 SDK configuration with cluster stack macros
GAP9 GVSoC simulation support

Changed

Minimally modified PULP kernel syntax to work around GAP9 compiler issues. The changes keep the PULP kernels compatible while meeting the requirements of the GAP9 GCC toolchain:
- Transpose operator: Fixed GCC segmentation fault caused by template syntax (commit 9ca4595)
- LayerNorm operator: Resolved epsilon ABI compatibility issue (commit 6b5c2e5)

Known Limitations

L3-L2 Async DMA - Currently synchronous; async blocked by Siracusa inheritance
NE16 Accelerator - Not yet integrated
AutoTiler DW/PW - GAP9 SDK AutoTiler kernels not integrated
GAP9 Float Math - Limited coverage (affects RMSNorm, etc.)

Platform Capabilities

✅ Multi-core (1-8) | ✅ L1/L2/L3 memory | ✅ Multi-channel DMA
✅ GVSoC simulation | ✅ Tiling | ✅ PULP-NN integration

PR Merge Checklist

The PR is rebased on the latest devel commit and pointing to devel.
Your PR is reviewed and approved.
All checks are passing.
The CHANGELOG.md file has been updated.
If the docker was modified, change back its link after review.

- Enable pulling from private GitLab repo
- Improve caching for pip, apt and cargo
- Fix CMake version
- Remove problematic pip installation in favor of apt package
- Add ZSH and Oh My ZSH
- Add package dependencies for GAP9 SDK
- Remove unused files from the container
- Fix banshee package problems

… PULPOpen

…Dory

… issue

Fix duplicate template generation due to PULP inheritance issue

coderabbitai · 2026-01-11T20:45:11Z

📝 Walkthrough

Summary by CodeRabbit

New Features
- Added experimental GAP9 target: runtime, memory management, DMA acceleration, and wide operator support for tiled and untiled deployments.
CI/CD
- New CI workflows and reusable runners for GAP9 (tiled/untiled) and updated CI images; GAP9-aware ccache generation.
- New Docker image variants with GAP9 toolchain and SDK support.
Documentation
- Added GAP9 usage guide.
Tests
- New GAP9 test runners and platform test harness with emulation support.

Walkthrough

Adds comprehensive GAP9 support: CI workflows, container/toolchain updates, GAP9 deployment backend (DMA, bindings, tiler, deployer), target runtime libraries, CMake/gvsoc integration, test runners and test harness for GAP9, and documentation.

Changes

Cohort / File(s)	Summary
CI Workflows `.github/workflows/` (`.github/workflows/ci-platform-gap9.yml`, `.github/workflows/ci-platform-gap9-tiled.yml`, `._runner-gap9.yml`, `._runner-gap9-tiled.yml`, `._select-env.yml`, `ci-deeploy.yml`, `infra-generate-ccache.yml`)	New GAP9 CI pipelines and reusable runners; updated default deeploy image to GAP9 image; added GAP9-specific ccache generation and L3/L2 handling.
Container & Toolchain `Container/` (`Dockerfile.Gap9`, `Dockerfile.deeploy`, `Dockerfile.toolchain`, `Makefile`, `amd64.list`) and `toolchain/`, `Makefile`	New Gap9 Dockerfile; toolchain/image tweaks (CMake arg, SSH/known_hosts, ccache, GAP SDK env vars); Make targets for gap9-toolchain/sdk; patches for banshee/gap9-sdk.
Deeploy GAP9 Platform & Tiling `Deeploy/Targets/GAP9/` (`Platform.py`, `Deployer.py`, `Bindings.py`, `Tiler.py`, `Templates/`, `DMA/*`, `__init__.py`)	New GAP9 platform: variable/constant/struct buffer types, cluster engine, GAP9Mapping, extensive node bindings, tiling-ready bindings, MCHAN/L3 async DMA implementations, allocation/free templates, and GAP9-specific deployer (L3 allocation/loading).
Test Infrastructure & Runners `DeeployTest/` (`testRunner_gap9.py`, `testRunner_tiled_gap9.py`, `Platforms/GAP9/`, `testUtils/*`, `CMakeLists.txt`)	Added GAP9 test runners (tiled/untiled), GAP9 test platform sources (deeploytest, cycle counter), SDK config, test harness integration, gvsoc_install_dir CLI flag, cleanup and debug prints.
Target Libraries (C) for GAP9 `TargetLibraries/GAP9/` (`CMakeLists.txt`, `inc/`, `src/*`)	New static deeploygap9 library and runtime: DMA implementations (mchan), memory layer (ram/fs/cl_ram), utility functions, headers (mchan, dory_dma/mem), cycle counter, and math macros.
Build System / CMake / gvsoc `CMakeLists.txt`, `cmake/*` (`common.cmake`, `gap9/gap9_gvsoc.cmake`, `simulation.cmake`)	Bumped CMake minimum, GAP9 build branch and gvsoc emulation macro (L2 vs L3/readfs), helper to add files to flash, and moved/adjusted try-compile/deeploylib changes.
PULPOpen / Templates / Small Fixes `Deeploy/Targets/PULPOpen/`, `TargetLibraries/PULPOpen/`, `TargetLibraries/Generic/*`	Reordered LayerNorm parameter (epsilon moved), updated templates (Transpose/FloatLayernorm), added math includes in several files, and propagated GAP9 engine core-count handling.
Docs `GAP9.md`, `README.md`	New GAP9 usage doc and README reference.

Sequence Diagram(s)

sequenceDiagram
    participant CI as CI Workflow
    participant Container as Docker/Image Build
    participant TestRunner as TestRunner
    participant CMake as CMake
    participant GVSOC as GVSOC
    participant Network as GAP9 Network Executable

    CI->>Container: build GAP9 image (toolchain + SDK)
    CI->>TestRunner: start test run (tiled / untiled)
    TestRunner->>CMake: configure & build (platform=GAP9)
    CMake->>Network: link deeploygap9, generate binary(s)
    TestRunner->>GVSOC: launch emulation (L2 or L3/readfs)
    GVSOC->>Network: load & execute
    Network->>GVSOC: perform cluster tasks and DMA transfers
    GVSOC->>TestRunner: return cycles & results
    TestRunner->>CI: report pass/fail

sequenceDiagram
    participant App as Host App
    participant Cluster as GAP9 Cluster
    participant DMA as MCHAN/L3 DMA
    participant L1 as L1 TCDM
    participant L2 as L2 RAM
    participant L3 as L3 RAM

    App->>Cluster: init cluster, allocate buffers
    Cluster->>L3: load constants (cl_ram_read)
    Cluster->>DMA: async DMA L3→L2 / L2→L1
    DMA->>L2: transfer data
    Cluster->>L1: compute kernel using L1 buffers
    Cluster->>DMA: write-back L1→L2 / L2→L3
    App->>Cluster: retrieve outputs, compare

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Refactor tiling code generation #105 — Related AsyncDma/Mchan/L3 refactor and DMA abstractions that GAP9 DMA classes build upon.
Support Fully Asynchronous DMAs #114 — Related changes to DMA futures/waiting strategies and async DMA infrastructure reused here.
TinyViT on non-tiled Siracusa #117 — Related propagation of cluster core handling and annotateNCores adjustments referenced by GAP9 engine changes.

Suggested reviewers

Victor-Jung
Xeratec

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 1.32% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title '[DRAFT] Deeploy-GAP9 Platform' clearly and concisely summarizes the main change: adding GAP9 platform support to Deeploy.
Description check	✅ Passed	The description is comprehensive and directly related to the changeset, detailing GAP9 platform support additions, modifications, known limitations, and platform capabilities.

📜 Recent review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 050f0ae and ff05304.

📒 Files selected for processing (1)

.github/workflows/_select-env.yml

🔇 Additional comments (1)

.github/workflows/_select-env.yml (1)

48-48: LGTM! EOF newline added.

This change adds a newline at the end of the file, which is a best practice and aligns with POSIX standards. No functional changes.

coderabbitai

Actionable comments posted: 7

Note

Due to the large number of review comments, Critical, Major severity comments were prioritized as inline comments.

🤖 Fix all issues with AI agents

In @.github/workflows/_runner-gap9.yml:
- Around line 34-40: The CI step "Build Deeploy" currently masks failures by
appending "|| true" to commands (notably the "source
/app/install/gap9-sdk/configs/gap9_evk_audio.sh || true" and "pip install -e .
|| true"); remove the "|| true" from these commands so errors cause the job to
fail, and for the optional sourcing use a guarded conditional (e.g., test for
the file before sourcing) rather than swallowing errors—locate the commands
inside the "Build Deeploy" run block in the workflow and update the "source ...
|| true" and "pip install -e . || true" lines accordingly.
- Around line 47-63: The current loop "echo $testNames | while IFS= read -r
testName; do ... python testRunner_gap9.py -t Tests/$testName ... done" can hide
failures because of the pipe and missing errexit; enable strict failure handling
(set -euo pipefail or at least set -e and set -o pipefail) and restructure the
loop so it doesn't run in a subshell (e.g., read from a
here-string/process-substitution or iterate over testNames), capture each python
invocation's exit status (or maintain a failure flag) and after the loop call
exit with non-zero if any test failed so the job fails when any test fails.
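
The failure-tracking idea described for the test loop (record each invocation's exit status, then fail once at the end) can be sketched in Python instead of shell. This is only an illustration under assumptions: `run_all` and the demo commands are hypothetical stand-ins for the `python testRunner_gap9.py -t Tests/<name>` calls, and nothing here is taken from the workflow itself.

```python
import subprocess
import sys
from typing import List, Sequence


def run_all(commands: Sequence[Sequence[str]]) -> List[Sequence[str]]:
    """Run every command, remember the ones that fail, and keep going."""
    failed = []
    for cmd in commands:
        # check=False: inspect the return code ourselves instead of raising.
        result = subprocess.run(cmd, check=False)
        if result.returncode != 0:
            failed.append(cmd)
    return failed


# Hypothetical stand-ins for real test invocations: one passes, one fails.
demo = [
    [sys.executable, "-c", "raise SystemExit(0)"],
    [sys.executable, "-c", "raise SystemExit(3)"],
]
failures = run_all(demo)
# A CI wrapper would hand this value to sys.exit() so the job fails.
exit_code = 1 if failures else 0
```

Because the loop runs in the same process that decides the final exit code, a failing test cannot be lost the way it can be in a piped shell subshell.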

In @.github/workflows/_select-env.yml:
- Line 29: The workflow hardcodes IMAGE="ghcr.io/runwangdl/deeploy:gap9", making
the inputs.docker_image_deeploy input unused and tying CI to a personal repo;
change the assignment for IMAGE in .github/workflows/_select-env.yml to use the
workflow input or the org image instead (e.g., set IMAGE to the inputs variable
docker_image_deeploy or to "ghcr.io/pulp-platform/deeploy:gap9") so contributors
can override it and avoid depending on a personal account.

In @.github/workflows/ci-deeploy.yml:
- Around line 17-20: The workflow input docker_image_deeploy currently defaults
to a personal registry image ("ghcr.io/runwangdl/deeploy:gap9"), which must not
be used for CI; update the default value of the docker_image_deeploy input to
point to an official, organization-managed image (e.g.,
"ghcr.io/pulp-platform/deeploy:gap9" if published, or revert to
"ghcr.io/pulp-platform/deeploy:devel" until a GAP9 image is available) and
ensure any documentation or README referencing docker_image_deeploy is
consistent with the org registry change.

In @Container/Dockerfile.toolchain:
- Line 33: The Dockerfile.toolchain contains a typo in the APT package list:
replace the incorrect package name "ibglib2.0-dev" with the correct
"libglib2.0-dev" in the package installation line (the line that currently lists
"ibglib2.0-dev" among the packages) so the package installation succeeds.

In @DeeployTest/CMakeLists.txt:
- Around line 53-80: The message uses ${CMAKE_MATCH_COUNT} which is wrong for
counting files; replace that usage by computing the HEXLIST length with
list(LENGTH HEXLIST HEXCOUNT) and then log HEXCOUNT in the "Found … hex file(s)"
message; keep the existing GLOB_RECURSE pattern ("${GENERATED_SOURCE}/hex/*")
as-is and ensure you update the message call that currently references
${CMAKE_MATCH_COUNT} to reference ${HEXCOUNT} instead so the reported count
matches the HEXLIST contents.

In @TargetLibraries/GAP9/src/Util.c:
- Around line 18-22: The preprocessor conditional uses a bitwise OR and Apollo
macros that don't exist here; change the operator from '|' to '||',
remove/replace Apollo-specific macros (AM_PART_APOLLO4B, DAM_PART_APOLLO3) and
any call to am_util_stdio_vprintf (which isn't declared) with the correct GAP9
platform macros and the proper GAP9 logging/printf API (or fall back to
vprintf). Locate the conditional around am_util_stdio_vprintf/vprintf in Util.c,
replace the macro checks with the GAP9-specific macro(s) you find via the
suggested rg search, use '||' for logical OR, and ensure the selected function
(vprintf or the GAP9 declared equivalent) is actually declared/available in
included headers.

🟡 Minor comments (15)

Container/Dockerfile.Gap9-3-5 (1)
3-5: Remove duplicate environment variable.

GAP_RISCV_GCC_TOOLCHAIN is set twice with identical values on lines 3 and 5.
Proposed fix
 FROM ghcr.io/pulp-platform/deeploy:latest

 ENV GAP_RISCV_GCC_TOOLCHAIN=/app/install/gcc/gap9
 ENV GAP_SDK_HOME=/app/install/gap9-sdk
-ENV GAP_RISCV_GCC_TOOLCHAIN=/app/install/gcc/gap9
DeeployTest/Platforms/GAP9/src/deeploytest.c-98-98 (1)
98-98: Typo: "Intializing" should be "Initializing".
Fix
-  printf("Intializing\r\n");
+  printf("Initializing\r\n");
DeeployTest/testUtils/platformMapping.py-210-225 (1)
210-225: Missing inputOffsets parameter in GAP9Deployer instantiation.

All other platform deployers in this function pass the inputOffsets parameter, but the GAP9Deployer call on line 218 omits it. This means user-specified input offsets will be ignored for GAP9 deployments.
🔧 Suggested fix
     deployer = GAP9Deployer(graph,
                             platform,
                             inputTypes,
                             loweringOptimizer,
                             scheduler,
                             name = name,
                             default_channels_first = default_channels_first,
-                            deeployStateDir = deeployStateDir)
+                            deeployStateDir = deeployStateDir,
+                            inputOffsets = inputOffsets)
DeeployTest/Platforms/GAP9/sdk.config-11-19 (1)

11-19: Double-check storage/readfs configuration coherence (FLASH vs MRAM).

You enable CONFIG_DRIVER_TYPE_FLASH, CONFIG_DRIVER_MRAM, and pick CONFIG_READFS_FLASH_TYPE_OSPI=y while the MRAM readfs type is commented. If the intent is “L3/readfs via OSPI”, consider adding a short comment explaining why MRAM is enabled (driver dependency vs actual storage).

DeeployTest/Platforms/GAP9/sdk.config-5-7 (1)

5-7: Ensure board selection is mutually exclusive (or clearly intentional).

Both CONFIG_BOARD_GAP9MOD_V1_0_B=y and CONFIG_BOARD_GAP9EVK_V1_3=y are enabled; many SDKs assume exactly one board is selected, which can lead to conflicting BSP config.
DeeployTest/Platforms/GAP9/CMakeLists.txt-20-23 (1)
20-23: target_compile_options(${ProjectId} INTERFACE network) looks accidental.

network here is a target name, not a compiler flag. Even if it’s harmless (INTERFACE on an executable), it’s confusing and easy to cargo-cult elsewhere.
Proposed fix
 target_link_libraries(${ProjectId} PRIVATE network deeploylib)
-target_compile_options(${ProjectId} INTERFACE network)
 add_gvsoc_emulation(${ProjectId} "gap9.evk")
GAP9.md-6-6 (1)
6-6: Fix grammatical error.

The phrase "does yet not include" should be "does not yet include".
📝 Proposed fix
-To use Deeploy with GAP9, a custom Docker container is required because the official Deeploy Docker image does yet not include the necessary SDKs and dependencies for GAP9 development, because they are not publicly available.
+To use Deeploy with GAP9, a custom Docker container is required because the official Deeploy Docker image does not yet include the necessary SDKs and dependencies for GAP9 development, because they are not publicly available.
GAP9.md-22-22 (1)
22-22: Fix typo in variable name.

The variable name should be DEEPLOY_IMAGE not DEEPOY_IMAGE (missing an 'L').
📝 Proposed fix
-make deeploy DEEPOY_IMAGE=deeploy:gap9
+make deeploy DEEPLOY_IMAGE=deeploy:gap9
DeeployTest/testUtils/testRunner.py-315-315 (1)
315-315: Fix unnecessary f-string prefix.

The assertion message is an f-string without any placeholders. Remove the f prefix for consistency and clarity.
🐛 Proposed fix
-assert self._args.gvsoc_install_dir is not None, f"Environment variable GVSOC_INSTALL_DIR is not set"
+assert self._args.gvsoc_install_dir is not None, "Environment variable GVSOC_INSTALL_DIR is not set"
Based on static analysis hint.
.github/workflows/_runner-gap9-tiled.yml-62-65 (1)
62-65: Error suppression could mask setup failures.

Both the SDK config sourcing and pip installation use || true to suppress errors. This means failures in environment setup or dependency installation will be silently ignored, potentially leading to test failures that are harder to diagnose.

Consider removing || true for the pip install step to ensure dependencies are properly installed:
💡 Proposed fix
-source /app/install/gap9-sdk/configs/gap9_evk_audio.sh || true
-pip install -e . || true
+source /app/install/gap9-sdk/configs/gap9_evk_audio.sh
+pip install -e .
If these failures are expected in some scenarios, please document why silent failure is acceptable.
TargetLibraries/GAP9/CMakeLists.txt-25-27 (1)
25-27: Duplicate NUM_CORES compile definition.

NUM_CORES is added twice:

Line 26: target_compile_options(deeploygap9 PUBLIC -DNUM_CORES=${NUM_CORES})

Line 44: add_compile_definitions(NUM_CORES=${NUM_CORES})

The add_compile_definitions call applies globally to all targets in the directory, which may unintentionally affect other targets. If the intent is to propagate NUM_CORES to pulp-nn-mixed, consider using target_compile_definitions instead.
Suggested fix
-add_compile_definitions(NUM_CORES=${NUM_CORES})
+# NUM_CORES is already propagated via deeploygap9 PUBLIC compile options
Or if pulp-nn-mixed needs it explicitly:
-add_compile_definitions(NUM_CORES=${NUM_CORES})
+target_compile_definitions(pulp-nn-mixed PUBLIC NUM_CORES=${NUM_CORES})
Also applies to: 44-44
TargetLibraries/GAP9/inc/mchan.h-98-105 (1)

98-105: Comment typo: duplicate "v7" reference.

The comment on lines 98-99 mentions "v7" twice. Based on the code logic, the #else branch handles non-v7 (presumably v6), so the comment should read something like:

"MCHAN version 7 takes 2D count and stride in 2 steps; v6 takes it in 1 step with the stride shifted to the upper 16 bits."
Deeploy/Targets/GAP9/Deployer.py-45-68 (1)
45-68: Fix mutable default argument for inputOffsets.

Using {} as a default argument is a Python anti-pattern—the same dictionary instance is shared across all calls, which can lead to subtle bugs.
Proposed fix
     def __init__(self,
                  graph: gs.Graph,
                  deploymentPlatform: DeploymentPlatform,
                  inputTypes: Dict[str, Type[Pointer]],
                  loweringOptimizer: TopologyOptimizer,
                  scheduler: Callable = lambda x: x,
                  name: str = 'DeeployNetwork',
                  default_channels_first = False,
                  deeployStateDir: str = "DeeployStateDir",
-                 inputOffsets = {}):
+                 inputOffsets: Dict[str, int] = None):
+        if inputOffsets is None:
+            inputOffsets = {}
         super().__init__(graph,
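
The shared-dictionary pitfall behind this comment can be reproduced in a few lines. The sketch below is a standalone illustration; `SharedDefault` and `FreshDefault` are hypothetical classes, not Deeploy's deployers, and the key `input_0` is made up.

```python
from typing import Dict, Optional


class SharedDefault:
    """Hypothetical class using a mutable default argument (the anti-pattern)."""

    def __init__(self, offsets: Dict[str, int] = {}):
        self.offsets = offsets


class FreshDefault:
    """Hypothetical class using the None-sentinel idiom from the proposed fix."""

    def __init__(self, offsets: Optional[Dict[str, int]] = None):
        self.offsets = {} if offsets is None else offsets


a, b = SharedDefault(), SharedDefault()
a.offsets["input_0"] = 128
# b sees a's entry because both instances hold the same default dict.
shared_leak = "input_0" in b.offsets

c, d = FreshDefault(), FreshDefault()
c.offsets["input_0"] = 128
# d is unaffected: each call built its own dict.
fresh_leak = "input_0" in d.offsets
```

The default dict is created once, when the `def` statement is evaluated, which is why the first variant leaks state between instances.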
TargetLibraries/GAP9/src/dory_mem.c-153-176 (1)
153-176: File handle is never closed after reading.

Same issue as load_file_to_ram - the file descriptor is not closed after use.
🔧 Suggested fix
     offset += load_size;
   }

+  pi_fs_close(fd);
   return offset;
 }
TargetLibraries/GAP9/src/dory_mem.c-126-151 (1)
126-151: File handle is never closed after reading.

The function opens a file with pi_fs_open but never closes it with pi_fs_close, which could lead to resource leaks if called multiple times.
🔧 Suggested fix
   } while (offset < size);

+  pi_fs_close(fd);
   return offset;
 }

🧹 Nitpick comments (41)

TargetLibraries/GAP9/src/Util.c (1)

7-7: Potentially unused include.

DeeployGAP9Math.h doesn't appear to be used in this file. Consider removing it unless it's required for type definitions or forward declarations used elsewhere.

toolchain/banshee.patch (1)

90-98: LGTM: Using libc::c_char improves FFI type correctness.

Changing NONAME from &'static i8 to &'static libc::c_char is the proper way to represent C character pointers in Rust FFI code. While c_char is typically i8, it can be u8 on some platforms, so this change improves portability and type safety.
Deeploy/Targets/GAP9/Templates/__init__.py (1)
5-5: Consider explicit imports for better clarity.

While from . import * is functional for package re-exports, explicit imports improve code clarity and enable static analysis. For example:
from .AllocateTemplate import *
from .FreeTemplate import *
This makes it clear which submodules are being exported and helps static analyzers verify the imports.
Deeploy/Targets/GAP9/__init__.py (1)
5-5: Consider explicit imports for better clarity.

While from . import * is functional for package re-exports, explicit imports improve code clarity and enable static analysis. For example:
from . import Bindings
from . import Deployer
from . import Platform
from . import DMA
from . import Templates
from . import Tiler
This makes it clear which submodules are being exported and helps static analyzers verify the imports.
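
In standard Python, a wildcard import of the package itself does not load submodules unless they are listed in `__all__` or were imported earlier. The sketch below builds two throwaway packages to show the difference; the package names `wildcard_pkg` and `explicit_pkg` and the `build_package` helper are hypothetical, and only the submodule name `Bindings` is borrowed from the comment above.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def build_package(root: Path, name: str, init_source: str) -> None:
    """Create a throwaway package with one submodule called Bindings."""
    pkg = root / name
    pkg.mkdir()
    (pkg / "__init__.py").write_text(init_source)
    (pkg / "Bindings.py").write_text("VALUE = 1\n")


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    build_package(root, "wildcard_pkg", "from . import *\n")
    build_package(root, "explicit_pkg", "from . import Bindings\n")
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        wildcard = importlib.import_module("wildcard_pkg")
        explicit = importlib.import_module("explicit_pkg")
        # The wildcard form leaves the submodule unloaded.
        wildcard_has_bindings = hasattr(wildcard, "Bindings")
        # The explicit form binds the submodule on the package.
        explicit_has_bindings = hasattr(explicit, "Bindings")
    finally:
        sys.path.remove(tmp)
```

Whether this matters for the GAP9 package depends on how callers import its submodules, which the review does not state.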
Container/Makefile (1)

37-41: Add documentation for SSH agent requirement.

The --ssh default flag is necessary—GAP9 SDK builds require SSH access to clone private dependencies from the PULP toolchain repositories. To help users, document this requirement (e.g., in a README or Makefile comment) noting that the build requires an SSH agent with proper credentials configured.
DeeployTest/Platforms/GAP9/inc/CycleCounter.h (1)
7-8: Consider a more specific include guard name.

The include guard CYCLECOUNTER is generic and may conflict with other headers. A more descriptive guard like DEEPLOY_GAP9_CYCLECOUNTER_H or GAP9_CYCLE_COUNTER_H_ would reduce collision risk.
Suggested improvement
-#ifndef CYCLECOUNTER
-#define CYCLECOUNTER
+#ifndef DEEPLOY_GAP9_CYCLECOUNTER_H
+#define DEEPLOY_GAP9_CYCLECOUNTER_H
And at line 22:
-#endif
+#endif // DEEPLOY_GAP9_CYCLECOUNTER_H
Container/Dockerfile.Gap9 (1)
30-32: Trailing whitespace in cleanup command.

Line 32 has trailing whitespace after the path which could cause issues in some contexts.
Proposed fix
 RUN --mount=type=cache,target=/ccache \
     ccache -z && make gap9-toolchain && \
-    rm -rf /app/toolchain/gap9-toolchain 
+    rm -rf /app/toolchain/gap9-toolchain
DeeployTest/Platforms/GAP9/src/CycleCounter.c (1)
10-19: Minor: Function parameter style inconsistent with header.

The header declares functions with (void) but the implementation uses empty (). While technically compatible in C, consistent style is preferred.
Suggested fix for consistency
-void ResetTimer() {
+void ResetTimer(void) {
   pi_perf_conf(1 << PI_PERF_CYCLES);
   pi_perf_reset();
 }

-void StartTimer() { pi_perf_start(); }
+void StartTimer(void) { pi_perf_start(); }

-void StopTimer() { pi_perf_stop(); }
+void StopTimer(void) { pi_perf_stop(); }

-unsigned int getCycles() { return pi_perf_read(PI_PERF_CYCLES); }
+unsigned int getCycles(void) { return pi_perf_read(PI_PERF_CYCLES); }
.github/workflows/infra-generate-ccache.yml (1)
51-66: Silenced configuration sourcing may hide failures.

Line 61 uses || true which suppresses any errors from sourcing the GAP9 config. If the configuration fails, the subsequent tests may run in an incorrect environment without any indication.

Consider logging a warning or removing || true to fail fast on configuration issues.
Suggested improvement
-          source /app/install/gap9-sdk/configs/gap9_evk_audio.sh || true
+          if ! source /app/install/gap9-sdk/configs/gap9_evk_audio.sh; then
+            echo "Warning: Failed to source GAP9 config, continuing anyway"
+          fi
cmake/gap9/gap9_gvsoc.cmake (3)
47-52: Redundant condition check.

The if(GAPY_RUNNER_ARGS) check on line 48 is redundant since the enclosing block (line 24) already confirms GAPY_RUNNER_ARGS is set.
Suggested simplification
         # Add readfs files if provided
-        if(GAPY_RUNNER_ARGS)
-            list(LENGTH GAPY_RUNNER_ARGS num_readfs_files)
-            message(STATUS "[Deeploy GAP9] Adding ${num_readfs_files} readfs file(s)")
-            list(APPEND GAPY_CMD ${GAPY_RUNNER_ARGS})
-        endif()
+        list(LENGTH GAPY_RUNNER_ARGS num_readfs_files)
+        message(STATUS "[Deeploy GAP9] Adding ${num_readfs_files} readfs file(s)")
+        list(APPEND GAPY_CMD ${GAPY_RUNNER_ARGS})
83-85: POST_BUILD has no effect on custom targets.

POST_BUILD is only meaningful for add_custom_command attached to library/executable targets, not for add_custom_target. It can be safely removed from both blocks.

Additionally, VERBATIM is used in L3 mode (line 85) but not in L2 mode (line 121), which could cause inconsistent shell quoting behavior.
Suggested fix

For L3 mode (lines 83-85):
             COMMENT "Simulating ${name} with gapy for GAP9 (L3 mode)"
-            POST_BUILD
             USES_TERMINAL
             VERBATIM
For L2 mode (lines 119-121):
             COMMENT "Simulating ${name} with gvsoc for GAP9 (L2 mode)"
-            POST_BUILD
             USES_TERMINAL
+            VERBATIM
         )
Also applies to: 119-121

73-73: Inconsistent error handling for copy commands.

L3 mode uses bash -c with 2>/dev/null || true (line 73), while L2 mode uses direct CMake command with || true (line 109). The latter may not work correctly as CMake's copy_if_different doesn't recognize shell operators.

Consider using consistent error suppression:
Suggested fix for L2 mode
-            COMMAND ${CMAKE_COMMAND} -E copy_if_different ${CMAKE_BINARY_DIR}/*.bin ${GVSOC_WORKDIR}/ || true
+            COMMAND bash -c "${CMAKE_COMMAND} -E copy_if_different ${CMAKE_BINARY_DIR}/*.bin ${GVSOC_WORKDIR}/ 2>/dev/null || true"
Also applies to: 109-109
TargetLibraries/GAP9/inc/dory_mem.h (3)
7-8: Consider using a more specific include guard name.

The guard __MEM_H__ is very generic and could conflict with other headers in the project or dependencies. A more specific name like __DEEPLOY_GAP9_DORY_MEM_H__ or __GAP9_DORY_MEM_H__ would reduce collision risk.
Suggested fix
-#ifndef __MEM_H__
-#define __MEM_H__
+#ifndef __DEEPLOY_GAP9_DORY_MEM_H__
+#define __DEEPLOY_GAP9_DORY_MEM_H__
And at the end:
-#endif // __MEM_H__
+#endif // __DEEPLOY_GAP9_DORY_MEM_H__
10-12: Missing include for struct pi_device declaration.

The header declares extern struct pi_device ram; but doesn't include the header that defines struct pi_device. Users of this header will get compilation errors unless they include pmsis.h before this header. Consider adding the necessary include or a forward declaration.
Suggested fix
 #include <stddef.h>
+#include "pmsis.h"
 
 extern struct pi_device ram;
Or if you want to minimize includes, add a forward declaration (though this may not work depending on how pi_device is defined):
 #include <stddef.h>
+
+struct pi_device;
 
 extern struct pi_device ram;
19-24: Inconsistent const qualifiers between header and implementation.

The implementation in dory_mem.c uses const size_t size for ram_read and ram_write, but the header declares them without const. While this doesn't affect ABI compatibility, it's good practice to keep declarations consistent for clarity.
Suggested fix
-void ram_read(void *dest, void *src, size_t size);
-void ram_write(void *dest, void *src, size_t size);
+void ram_read(void *dest, void *src, const size_t size);
+void ram_write(void *dest, void *src, const size_t size);
DeeployTest/Platforms/GAP9/src/deeploytest.c (3)
61-63: Inefficient use of pi_cl_team_fork when only core 0 performs work.

CL_CompareFloat forks NUM_CORES threads but CompareFloatOnCluster only executes on core 0 (line 32). This wastes resources spawning idle cores. Consider either using a single-core dispatch or distributing the comparison work across all cores for parallelism.
Option 1: Single core execution
 void CL_CompareFloat(void *arg) {
-  pi_cl_team_fork(NUM_CORES, CompareFloatOnCluster, arg);
+  pi_cl_team_fork(1, CompareFloatOnCluster, arg);
 }
Option 2: Parallelize comparison (if performance is critical)

Distribute num_elements across cores using pi_core_id() to partition work.
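
The index arithmetic behind Option 2 can be modelled in Python before writing the C version. This is a sketch of one possible partitioning scheme, not code from the PR: `core_range` is a hypothetical helper, `core_id` plays the role of the value returned by `pi_core_id()`, and the core count of 8 is an assumption.

```python
from typing import Tuple

NUM_CORES = 8  # assumed cluster core count for this illustration


def core_range(core_id: int, num_elements: int,
               num_cores: int = NUM_CORES) -> Tuple[int, int]:
    """Return the half-open [start, end) slice one core would compare."""
    chunk = (num_elements + num_cores - 1) // num_cores  # ceiling division
    start = min(core_id * chunk, num_elements)
    end = min(start + chunk, num_elements)
    return start, end


# With 10 elements, the first five cores get two elements each and the
# remaining cores get empty ranges.
ranges = [core_range(core, 10) for core in range(NUM_CORES)]
covered = [i for start, end in ranges for i in range(start, end)]
```

Clamping both bounds to `num_elements` keeps trailing cores from reading past the end of the buffer when the element count is not a multiple of the core count.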

110-110: Extract magic number 0x10000000 to a named constant.

The address 0x10000000 is used multiple times to distinguish between L3 and L2 memory regions. Consider defining a descriptive constant to improve readability and maintainability.
Suggested fix
+#define L2_MEM_BASE_ADDR 0x10000000
+
 // Then replace usages:
-    if ((uint32_t)DeeployNetwork_inputs[buf] >= 0x10000000) {
+    if ((uint32_t)DeeployNetwork_inputs[buf] >= L2_MEM_BASE_ADDR) {
Also applies to: 138-138, 176-176

42-55: Float comparison tolerance is hardcoded.

The tolerance 1e-4 for float comparison is hardcoded. For different models or precision requirements, this may need adjustment. Consider making it configurable or at least defining it as a named constant.
Suggested fix
+#define FLOAT_COMPARE_TOLERANCE 1e-4
+
 // In CompareFloatOnCluster:
-      if ((diff < -1e-4) || (diff > 1e-4) || isnan(diff)) {
+      if ((diff < -FLOAT_COMPARE_TOLERANCE) || (diff > FLOAT_COMPARE_TOLERANCE) || isnan(diff)) {
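
For reference, the comparison rule quoted above (absolute difference beyond the tolerance, or NaN) can be written out in Python to check its behaviour on a few values. This is a sketch mirroring the quoted condition; `count_float_errors` is a hypothetical name and the sample inputs are invented.

```python
import math
from typing import Sequence

FLOAT_COMPARE_TOLERANCE = 1e-4  # same value as the hardcoded tolerance


def count_float_errors(expected: Sequence[float], actual: Sequence[float],
                       tol: float = FLOAT_COMPARE_TOLERANCE) -> int:
    """Count elements whose difference exceeds tol or is NaN."""
    errors = 0
    for exp, act in zip(expected, actual):
        diff = exp - act
        # NaN compares false against everything, so it needs its own check.
        if diff < -tol or diff > tol or math.isnan(diff):
            errors += 1
    return errors


errors = count_float_errors([1.0, 2.0, 3.0], [1.0, 2.5, float("nan")])
```

Passing the tolerance as a parameter is one way to make it adjustable per model, as the comment suggests.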
toolchain/gap9-sdk.patch (1)

34-65: Large blocks of commented-out bindings in soc.py.

The patch comments out 32 self.bind() calls for SFU stream bindings. If these are permanently disabled for GAP9 support, consider removing them entirely or adding a comment explaining why they're disabled. Commented-out code can become stale and confusing.

If these bindings are expected to be re-enabled later, add a # TODO: Re-enable when SFU support is added comment. Otherwise, consider removing the commented lines.

Also applies to: 73-104
Deeploy/Targets/GAP9/DMA/MchanDma.py (1)
29-39: Annotate mutable class attributes with typing.ClassVar.

Per static analysis, _transferTemplates and _waitingStrategy are mutable class attributes that should be annotated with ClassVar to clarify they are shared across all instances.
Suggested fix
-from typing import Dict, Tuple
+from typing import ClassVar, Dict, Tuple
 
 # ... 

 class GAP9MchanDma(AsyncDma):
 
-    _transferTemplates = {
+    _transferTemplates: ClassVar[Dict[int, NodeTemplate]] = {
             NodeTemplate(
                 "{ mchan_transfer_t __mchan_tmp = { .cmd = ${cmd}, .size = ${size}, .loc = ${loc}, .ext = ${ext} }; mchan_transfer_push_1d(__mchan_tmp); }"
             ),
             NodeTemplate(
                 "{ mchan_transfer_t __mchan_tmp = { .cmd = ${cmd}, .size = ${size}, .loc = ${loc}, .ext = ${ext}, .ext_size_1d = ${size_1d}, .ext_stride_1d = ${stride_2d} }; mchan_transfer_push_2d(__mchan_tmp); }"
             ),
     }
-    _waitingStrategy = DirectionWaitingStrategy(MchanTransferFuture, "transfer")
+    _waitingStrategy: ClassVar[DirectionWaitingStrategy] = DirectionWaitingStrategy(MchanTransferFuture, "transfer")
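
As a standalone illustration of what the `ClassVar` annotation communicates, the sketch below uses a hypothetical `DemoDma` class with placeholder template strings; it is not Deeploy's `AsyncDma`, and the dictionary keys 1 and 2 are assumptions made for the example. `ClassVar` has no runtime effect here; it tells type checkers that the attribute belongs to the class rather than to each instance.

```python
from typing import ClassVar, Dict


class DemoDma:
    """Hypothetical stand-in for a DMA class; not Deeploy's AsyncDma."""

    # ClassVar marks this dict as class-level state shared by every instance.
    _templates: ClassVar[Dict[int, str]] = {
        1: "push_1d(${size})",
        2: "push_2d(${size}, ${stride})",
    }

    def __init__(self, name: str) -> None:
        self.name = name  # per-instance state

    def template_for(self, rank: int) -> str:
        """Look up the placeholder template for a transfer rank."""
        return self._templates[rank]


first, second = DemoDma("a"), DemoDma("b")
# Both instances resolve the attribute to the single class-level dict.
shares_templates = first._templates is second._templates
```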
DeeployTest/testUtils/codeGenerate.py (1)
269-274: Good practice: Cleanup prevents test contamination.

Removing stale hex files between runs is important for GAP9 L2/L3 testing. The implementation is correct.
Optional: Move import to module level

For consistency with other imports, consider moving shutil to the top-level imports (line 5) instead of importing it locally:
 import os
+import shutil
 from typing import List, Tuple
Then remove the local import on line 270. This is a minor style preference.
TargetLibraries/GAP9/inc/DeeployGAP9Math.h (1)
16-18: Single-core macros are functional but could benefit from documentation.

The macros correctly restrict execution to core 8 (fabric controller) or core 0 (first cluster core). Consider adding comments explaining when to use each macro and the implications of other cores skipping the guarded code.
Suggested documentation
+// Single-core execution macros for GAP9
+// Core 8: Fabric Controller, Core 0: First cluster core
+// Use BEGIN_SINGLE_CORE/END_SINGLE_CORE for code blocks
+// Use SINGLE_CORE for single statements
+// Other cores will skip the guarded code
 #define BEGIN_SINGLE_CORE if (pi_core_id() == 8 || pi_core_id() == 0) {
 #define END_SINGLE_CORE }
 #define SINGLE_CORE if (pi_core_id() == 8 || pi_core_id() == 0)
Deeploy/Targets/PULPOpen/Templates/TransposeTemplate.py (1)
96-103: Good workaround; tighten types (const src, and use data_out_type for dst).

The src/dst pointer approach is much cleaner and should avoid the previous GCC crash. Minor hardening: make src const, and type dst off data_out_type (in case future transpose variants allow differing in/out types). Based on learnings, this also keeps explicit, templated bitwidth types.
Proposed tweak
-// RW: GCC Segmentation fault
-
-${data_in_type.referencedType.typeName} (*src)${shapeStr} = (${data_in_type.referencedType.typeName} (*)${shapeStr})<%text>${data_in}</%text>;
-
-${data_in_type.referencedType.typeName} (*dst)${outShapeStr} = (${data_in_type.referencedType.typeName} (*)${outShapeStr})<%text>${data_out}</%text>;
+// RW: GCC segmentation fault (workaround: simplify indexing via typed pointers)
+
+const ${data_in_type.referencedType.typeName} (*src)${shapeStr} =
+  (const ${data_in_type.referencedType.typeName} (*)${shapeStr})<%text>${data_in}</%text>;
+
+${data_out_type.referencedType.typeName} (*dst)${outShapeStr} =
+  (${data_out_type.referencedType.typeName} (*)${outShapeStr})<%text>${data_out}</%text>;
DeeployTest/Platforms/GAP9/CMakeLists.txt (1)

24-30: Consider avoiding a blanket -Wno-error for network.

If the goal is just to get past known SDK/header warnings, limiting the suppression to the specific warning classes is safer than disabling -Werror behavior wholesale.
CMakeLists.txt (2)
38-38: Remove or properly enable the commented debug line.

The commented set( CMAKE_MESSAGE_LOG_LEVEL "DEBUG" ) should either be removed or enabled via a configuration option.
♻️ Proposed fix
-  # set( CMAKE_MESSAGE_LOG_LEVEL "DEBUG" )
Or enable it conditionally:
-  # set( CMAKE_MESSAGE_LOG_LEVEL "DEBUG" )
+  if(CMAKE_VERBOSE_MAKEFILE)
+    set(CMAKE_MESSAGE_LOG_LEVEL "DEBUG")
+  endif()
56-59: Document why common.cmake is excluded for GAP9.

The exclusion of common.cmake for GAP9 suggests a fundamentally different build pattern. Please add a comment explaining why GAP9 requires this special treatment.
 # Import useful functions / macros
 include(${CMAKE_CURRENT_LIST_DIR}/cmake/Util.cmake)
-# Only if not GAP9
+# GAP9 uses GAP SDK's own build system setup, so common.cmake is not needed
 if(NOT platform STREQUAL GAP9)
   include(${CMAKE_CURRENT_LIST_DIR}/cmake/common.cmake)
 endif()
DeeployTest/testRunner_gap9.py (1)
26-28: Avoid accessing private parser attributes.

Directly accessing parser._actions and modifying action defaults is fragile and couples the code to ArgumentParser's internal implementation.
♻️ Proposed refactor using public API

Consider reading the default from an environment variable and applying it with set_defaults():
-    # Set default GVSOC install dir
-    for action in parser._actions:
-        if action.dest == 'gvsoc_install_dir':
-            action.default = "${GAP_SDK_HOME}/install/workstation"
-    args = parser.parse_args()
+    import os
+    
+    # Set default GVSOC install dir from GAP SDK if available
+    gap_sdk_home = os.environ.get('GAP_SDK_HOME')
+    if gap_sdk_home:
+        default_gvsoc = os.path.join(gap_sdk_home, 'install', 'workstation')
+        parser.set_defaults(gvsoc_install_dir=default_gvsoc)
+    
+    args = parser.parse_args()
This approach:

Uses the public set_defaults() API

Properly constructs the path using os.path.join

Handles the case when GAP_SDK_HOME is not set

Resolves the environment variable immediately rather than relying on shell expansion
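A self-contained sketch of this approach is shown below. The `build_parser` helper and its `env` parameter are hypothetical and exist only so the behavior can be exercised without touching the real environment. The real `TestRunnerArgumentParser` is not reproduced here.

```python
import argparse
import os
from typing import Mapping, Optional


def build_parser(env: Optional[Mapping[str, str]] = None) -> argparse.ArgumentParser:
    """Build a parser whose GVSoC default is derived from GAP_SDK_HOME, if set."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser()
    parser.add_argument("--gvsoc_install_dir", default=None)

    gap_sdk_home = env.get("GAP_SDK_HOME")
    if gap_sdk_home:
        # Parser-level defaults override the argument-level default above.
        parser.set_defaults(
            gvsoc_install_dir=os.path.join(gap_sdk_home, "install", "workstation"))
    return parser


sdk_env = {"GAP_SDK_HOME": "/opt/gap_sdk"}  # example value, not a real install path
with_sdk = build_parser(sdk_env).parse_args([])
without_sdk = build_parser({}).parse_args([])
overridden = build_parser(sdk_env).parse_args(["--gvsoc_install_dir", "/custom"])
```

In this sketch an explicit command-line value still wins over the environment-derived default, and the attribute stays `None` when `GAP_SDK_HOME` is unset.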
Deeploy/Targets/GAP9/Templates/FreeTemplate.py (2)
7-8: Identical L2 free templates suggest potential consolidation.

gap9L2LocalTemplate and gap9L2GlobalTemplate have identical implementations. If they're truly meant to behave the same way, consider using a single template definition to reduce duplication.

If these are intentionally separate because they may diverge in the future or have different semantic meanings despite identical current implementations, please add a comment explaining the distinction.
♻️ Proposed consolidation
-gap9L2LocalTemplate = NodeTemplate("pi_l2_free(${name}, sizeof(${type.referencedType.typeName}) * ${size});")
-gap9L2GlobalTemplate = NodeTemplate("pi_l2_free(${name}, sizeof(${type.referencedType.typeName}) * ${size});")
+gap9L2FreeTemplate = NodeTemplate("pi_l2_free(${name}, sizeof(${type.referencedType.typeName}) * ${size});")
+gap9L2LocalTemplate = gap9L2FreeTemplate
+gap9L2GlobalTemplate = gap9L2FreeTemplate
12-22: LGTM with minor formatting suggestion.

The generic free template provides comprehensive coverage of memory levels with appropriate fallback handling. The logic correctly routes to the right free API based on the memory level.

Minor formatting suggestion for the compiler block comment:
♻️ Proposed formatting improvement
-//COMPILER BLOCK - MEMORYLEVEL ${_memoryLevel} NOT FOUND \n
+// COMPILER BLOCK - MEMORY LEVEL ${_memoryLevel} NOT FOUND\n
.github/workflows/ci-platform-gap9-tiled.yml (2)

155-182: Consider simplifying redundant matrix entries.

The num-cores: [8] matrix (line 175) has a single value and could be simplified by using a direct value in the with block, similar to how gap9-kernels-tiled-singlebuffer-L2 handles it (line 93). This reduces indirection.

If you intend to expand to multiple core counts in the future, keeping it as a matrix is fine—just noting the inconsistency with the kernels jobs.

184-207: Inconsistent double-buffer specification.

The double-buffer flag is defined in the matrix (line 199) with a single value [true], then passed via ${{ matrix.double-buffer }} (line 207). Compare this to gap9-kernels-tiled-doublebuffer-L2 (line 153) which simply uses double-buffer: true directly.

Consider aligning the approach for consistency—either use a direct value or keep it in the matrix if you plan to test both modes in the same job.

Deeploy/Targets/GAP9/Templates/AllocateTemplate.py (2)

57-68: Silent fallback may hide configuration errors.

The else branch (lines 64-67) silently falls back to L2 allocation when an unknown _memoryLevel is encountered, only leaving a comment in the generated code. This could mask configuration bugs.

Consider emitting a warning or raising an error during code generation instead of silently proceeding.

Alternative: Fail at generation time

Instead of generating fallback code, you could validate _memoryLevel in the deployer before template generation and raise an explicit error for unsupported levels.
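A generation-time check along these lines could look like the sketch below. The function name, the set of level names, and the error message are all assumptions for illustration. They are not taken from the Deeploy code base.

```python
# Assumed level names; the real set would come from the platform's memory hierarchy.
SUPPORTED_MEMORY_LEVELS = frozenset({"L1", "L2", "L3"})


def check_memory_level(buffer_name: str, memory_level: str) -> str:
    """Return memory_level unchanged, or raise if no allocator exists for it."""
    if memory_level not in SUPPORTED_MEMORY_LEVELS:
        raise ValueError(
            f"Buffer '{buffer_name}' targets unsupported memory level "
            f"'{memory_level}'; expected one of {sorted(SUPPORTED_MEMORY_LEVELS)}")
    return memory_level


# Usage: a valid level passes through, an unknown one fails before any C code is emitted.
ok_level = check_memory_level("input_0", "L2")
try:
    check_memory_level("input_0", "L4")
    error_message = None
except ValueError as exc:
    error_message = str(exc)
```

Failing here surfaces a misconfigured memory level when the network is generated, rather than as an unexpected L2 allocation on the target.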

10-10: Consider removing commented-out code.

Lines 10, 23, and 30 contain commented-out alternative implementations. If these are obsolete, consider removing them to reduce noise. If they're kept for reference, a brief comment explaining why would help.

TargetLibraries/GAP9/CMakeLists.txt (1)

29-36: Aggressive warning suppressions may hide issues.

Several of these suppressions (-Wno-implicit-function-declaration, -Wno-incompatible-pointer-types) can mask real bugs in your own code. Consider whether all of them are necessary for the GAP9-specific sources, or whether some could be scoped more narrowly to third-party code only.
Deeploy/Targets/GAP9/DMA/L3Dma.py (1)
27-38: Class attribute annotation.

The static analyzer suggests annotating _transferTemplates with typing.ClassVar. While not strictly required, it clarifies intent and prevents accidental instance-level mutation.
Optional fix
+from typing import ClassVar, Dict, Tuple
+
 class GAP9L3Dma(AsyncDma):

-    _transferTemplates = {
+    _transferTemplates: ClassVar[Dict[int, NodeTemplate]] = {
             NodeTemplate(
                 "pi_cl_ram_copy_2d(get_ram_ptr(), ${ext}, ${loc}, ${transfer_size}, ${stride}, ${length}, ${ext2loc}, &${future});"
             )
     }
TargetLibraries/GAP9/inc/mchan.h (1)
86-138: Static functions in header should be static inline.

Functions defined in a header file with just static linkage will be duplicated in each translation unit that includes this header. While this works, using static inline is the conventional approach and hints to the compiler that inlining is preferred.
Proposed changes
-static int mchan_transfer_get_id() { return MCHAN_READ_CMD(); }
+static inline int mchan_transfer_get_id() { return MCHAN_READ_CMD(); }

-static void mchan_transfer_push_1d(mchan_transfer_t trans) {
+static inline void mchan_transfer_push_1d(mchan_transfer_t trans) {
   // ...
 }

-static void mchan_transfer_push_2d(mchan_transfer_t trans) {
+static inline void mchan_transfer_push_2d(mchan_transfer_t trans) {
   // ...
 }
Apply similarly to mchan_transfer_push, mchan_transfer_free, mchan_transfer_busy, and mchan_transfer_wait.
TargetLibraries/GAP9/src/dory_mem.c (1)
129-130: Error messages missing newline character.

The printf statements on lines 129 and 156 are missing the trailing newline character \n, which is inconsistent with the other error messages in this file.
✏️ Suggested fix
-    printf("ERROR: Cannot open file %s! Exiting...", filename);
+    printf("ERROR: Cannot open file %s! Exiting...\n", filename);
Also applies to: 155-157
TargetLibraries/GAP9/src/dory_dma.c (1)

110-143: Consider simplifying single-core 3D transfer logic.

The log2(1) on line 113 always evaluates to 0, making number_of_2d_copies_per_core equal to copy->number_of_2d_copies. This effectively means no parallelization occurs. If this is intentional (single-core execution), the code could be simplified. If multi-core execution is desired, this appears to be incomplete.
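The arithmetic can be checked with a small sketch. The exact expression in dory_dma.c is not quoted in this review, so the shift-and-mask ceiling division below is an assumption modeled on the usual way such per-core splits are written. `copies_per_core` is a hypothetical name.

```python
def copies_per_core(number_of_2d_copies: int, log2_cores: int) -> int:
    """Ceil-divide the 2D copies among 2**log2_cores cores using shift and mask."""
    num_cores = 1 << log2_cores
    # A non-zero remainder means some core needs one extra copy.
    has_remainder = (number_of_2d_copies & (num_cores - 1)) != 0
    return (number_of_2d_copies >> log2_cores) + int(has_remainder)


# With log2(1) == 0 the shift does nothing and the mask is zero,
# so a single core is assigned every copy.
single_core = copies_per_core(10, 0)

# With 8 cores (log2 == 3), 10 copies need at most 2 copies per core.
eight_cores = copies_per_core(10, 3)
```

Under this assumed formula, `single_core` is 10 and `eight_cores` is 2, which is why a hard-coded log2(1) amounts to running the whole 3D transfer on one core.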
Deeploy/Targets/GAP9/Platform.py (3)
263-272: Avoid mutable default arguments and function calls in argument defaults.

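The engines default is a list built by a function call. Python evaluates default arguments once, when the function is defined, so every instance created without an explicit engines argument shares the same list and the same GAP9ClusterEngine object. This is flagged by static analysis (B006, B008).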
♻️ Suggested fix
 class GAP9Platform(DeploymentPlatform):

     def __init__(self,
-                 engines = [GAP9ClusterEngine("GAP9Cluster")],
+                 engines = None,
                  variableBuffer = GAP9VariableBuffer,
                  constantBuffer = GAP9ConstantBuffer,
                  structBuffer = GAP9StructBuffer,
                  transientBuffer = GAP9TransientBuffer) -> None:
+        if engines is None:
+            engines = [GAP9ClusterEngine("GAP9Cluster")]
         super().__init__(engines, variableBuffer, constantBuffer, structBuffer, transientBuffer)
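The pitfall behind B006 can be reproduced with a minimal sketch. `SharedDefault` and `FreshDefault` are hypothetical classes that use plain strings in place of Deeploy's engine objects.

```python
from typing import List, Optional


class SharedDefault:
    # Anti-pattern, shown on purpose: the list is created once at definition time.
    def __init__(self, engines: List[str] = []) -> None:  # noqa: B006
        self.engines = engines


class FreshDefault:
    # Suggested pattern: use None as a sentinel and build the list per call.
    def __init__(self, engines: Optional[List[str]] = None) -> None:
        if engines is None:
            engines = ["GAP9Cluster"]
        self.engines = engines


p1, p2 = SharedDefault(), SharedDefault()
p1.engines.append("extra")
leaked = p2.engines  # p2 sees the element appended through p1

q1, q2 = FreshDefault(), FreshDefault()
q1.engines.append("extra")
isolated = q2.engines  # q2 keeps its own untouched list
```

In the first class the appended element leaks into the second instance, while the sentinel pattern gives each instance an independent list.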
274-292: Same mutable default argument issue and missing ClassVar annotation.

The MemoryGAP9Platform class has the same mutable default argument issue with engines. Additionally, untiledOps is a mutable class attribute that should be annotated with typing.ClassVar (RUF012).
♻️ Suggested fix
+from typing import ClassVar, List
+
 class MemoryGAP9Platform(MemoryPlatform):

-    untiledOps = ["add"]
+    untiledOps: ClassVar[List[str]] = ["add"]

     def __init__(self,
                  memoryHierarchy: MemoryHierarchy,
                  defaultTargetMemoryLevel: MemoryLevel,
-                 engines = [GAP9ClusterEngine("GAP9Cluster")],
+                 engines = None,
                  variableBuffer = GAP9VariableBuffer,
                  constantBuffer = GAP9ConstantBuffer,
                  structBuffer = GAP9StructBuffer,
                  transientBuffer = GAP9TransientBuffer) -> None:
+        if engines is None:
+            engines = [GAP9ClusterEngine("GAP9Cluster")]
         super().__init__(memoryHierarchy, defaultTargetMemoryLevel, engines, variableBuffer, constantBuffer,
                          structBuffer, transientBuffer)
295-307: Same ClassVar annotation issue for untiledOps.

The MemoryGAP9PlatformWrapper class also has a mutable class attribute untiledOps that should be annotated with ClassVar.

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between ecae48a and 050f0ae.

📒 Files selected for processing (60)

.github/workflows/_runner-gap9-tiled.yml
.github/workflows/_runner-gap9.yml
.github/workflows/_select-env.yml
.github/workflows/ci-deeploy.yml
.github/workflows/ci-platform-gap9-tiled.yml
.github/workflows/ci-platform-gap9.yml
.github/workflows/infra-generate-ccache.yml
CMakeLists.txt
Container/Dockerfile.Gap9
Container/Dockerfile.deeploy
Container/Dockerfile.toolchain
Container/Makefile
Container/amd64.list
Deeploy/Targets/GAP9/Bindings.py
Deeploy/Targets/GAP9/DMA/L3Dma.py
Deeploy/Targets/GAP9/DMA/MchanDma.py
Deeploy/Targets/GAP9/Deployer.py
Deeploy/Targets/GAP9/Platform.py
Deeploy/Targets/GAP9/Templates/AllocateTemplate.py
Deeploy/Targets/GAP9/Templates/FreeTemplate.py
Deeploy/Targets/GAP9/Templates/__init__.py
Deeploy/Targets/GAP9/Tiler.py
Deeploy/Targets/GAP9/__init__.py
Deeploy/Targets/PULPOpen/Deployer.py
Deeploy/Targets/PULPOpen/Templates/FloatLayernormTemplate.py
Deeploy/Targets/PULPOpen/Templates/TransposeTemplate.py
DeeployTest/CMakeLists.txt
DeeployTest/Platforms/GAP9/CMakeLists.txt
DeeployTest/Platforms/GAP9/inc/CycleCounter.h
DeeployTest/Platforms/GAP9/sdk.config
DeeployTest/Platforms/GAP9/src/CycleCounter.c
DeeployTest/Platforms/GAP9/src/deeploytest.c
DeeployTest/testRunner_gap9.py
DeeployTest/testRunner_tiled_gap9.py
DeeployTest/testUtils/codeGenerate.py
DeeployTest/testUtils/platformMapping.py
DeeployTest/testUtils/testRunner.py
GAP9.md
Makefile
README.md
TargetLibraries/GAP9/CMakeLists.txt
TargetLibraries/GAP9/inc/DeeployGAP9Math.h
TargetLibraries/GAP9/inc/DeeployMchan.h
TargetLibraries/GAP9/inc/dory_dma.h
TargetLibraries/GAP9/inc/dory_mem.h
TargetLibraries/GAP9/inc/mchan.h
TargetLibraries/GAP9/src/Util.c
TargetLibraries/GAP9/src/dory_dma.c
TargetLibraries/GAP9/src/dory_mem.c
TargetLibraries/Generic/src/BatchNorm_fp32.c
TargetLibraries/PULPOpen/inc/kernel/Layernorm.h
TargetLibraries/PULPOpen/src/DWConvolution_fp32.c
TargetLibraries/PULPOpen/src/GELU.c
TargetLibraries/PULPOpen/src/Layernorm.c
TargetLibraries/PULPOpen/src/Softmax.c
cmake/common.cmake
cmake/gap9/gap9_gvsoc.cmake
cmake/simulation.cmake
toolchain/banshee.patch
toolchain/gap9-sdk.patch

💤 Files with no reviewable changes (1)

cmake/common.cmake

🧰 Additional context used

🧠 Learnings (7)

📚 Learning: 2025-12-02T13:54:22.700Z

Learnt from: Xeratec
Repo: pulp-platform/Deeploy PR: 69
File: Deeploy/Targets/PULPOpen/Templates/FloatLayernormTemplate.py:36-38
Timestamp: 2025-12-02T13:54:22.700Z
Learning: In Deeploy templates (Python files in Deeploy/Targets/PULPOpen/Templates/), always use explicit bitwidth types (e.g., `float${...type.referencedType.typeWidth}_t*`) instead of hardcoded types (e.g., `float*`) to ensure type consistency with templated kernel calls.

Applied to files:

Deeploy/Targets/PULPOpen/Templates/TransposeTemplate.py
Deeploy/Targets/PULPOpen/Templates/FloatLayernormTemplate.py

📚 Learning: 2025-09-09T15:43:20.195Z

Learnt from: Xeratec
Repo: pulp-platform/Deeploy PR: 105
File: Deeploy/Targets/PULPOpen/TileConstraints/GEMMTileConstraint.py:120-124
Timestamp: 2025-09-09T15:43:20.195Z
Learning: In GEMMTileConstraint.serializeTilingSolution, transpose flags (transA, transB) must be read from operatorRepresentation and used to adjust NSize calculation and matrix offset/shape calculations, following the pattern in FloatGEMMTileConstraint.

Applied to files:

Deeploy/Targets/PULPOpen/Templates/TransposeTemplate.py

📚 Learning: 2025-09-24T12:17:21.624Z

Learnt from: diaconuccalin
Repo: pulp-platform/Deeploy PR: 117
File: Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py:46-0
Timestamp: 2025-09-24T12:17:21.624Z
Learning: In Deeploy's PULP templates, transient buffer size calculation can return element counts as strings from computeTransientBuffersSize(), and then manually set the buffer type in hoistTransientBuffers() using ctxt.lookup(buffer_name)._type.referencedType = input_type. The allocation system automatically multiplies the element count by the element size when the buffer type is properly set, achieving correct byte allocation.

Applied to files:

Deeploy/Targets/PULPOpen/Templates/TransposeTemplate.py

📚 Learning: 2025-09-24T12:49:17.889Z

Learnt from: diaconuccalin
Repo: pulp-platform/Deeploy PR: 117
File: Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py:100-0
Timestamp: 2025-09-24T12:49:17.889Z
Learning: In Deeploy's PULP FloatConvTemplate.py, the parameter order for PULP_Conv2d_Im2Col_fp*_HWC calls uses X,Y ordering (dim_im_in_x, dim_im_in_y, dim_kernel_x, dim_kernel_y, stride_x, stride_y) which is correct for the implementation, despite appearing different from some other function signatures.

Applied to files:

Deeploy/Targets/PULPOpen/Templates/TransposeTemplate.py
Deeploy/Targets/PULPOpen/Templates/FloatLayernormTemplate.py
TargetLibraries/PULPOpen/src/DWConvolution_fp32.c
TargetLibraries/PULPOpen/inc/kernel/Layernorm.h

📚 Learning: 2025-09-24T11:43:47.236Z

Learnt from: diaconuccalin
Repo: pulp-platform/Deeploy PR: 117
File: .github/workflows/ci-platform-siracusa.yml:57-60
Timestamp: 2025-09-24T11:43:47.236Z
Learning: In the Deeploy test system, test names in CI workflows correspond to directory names under DeeployTest/Tests/, not Python function names. The TestRunner class executes tests by passing directory paths via the `-t` argument, where each directory contains test configurations and definitions.

Applied to files:

DeeployTest/testRunner_tiled_gap9.py
.github/workflows/_runner-gap9-tiled.yml
DeeployTest/testRunner_gap9.py

📚 Learning: 2025-09-09T15:58:06.454Z

Learnt from: Xeratec
Repo: pulp-platform/Deeploy PR: 105
File: Deeploy/Targets/PULPOpen/DMA/MchanDma.py:61-64
Timestamp: 2025-09-09T15:58:06.454Z
Learning: The _legalizeTransfers function in TilingCodeGeneration.py handles conversion from elements to bytes for DMA operations when isFinalMemoryLevel is true, eliminating the need for individual DMA implementations like MchanDma to perform this conversion manually.

Applied to files:

Deeploy/Targets/GAP9/DMA/L3Dma.py
TargetLibraries/GAP9/src/dory_dma.c
Deeploy/Targets/GAP9/DMA/MchanDma.py

📚 Learning: 2025-09-09T15:58:06.454Z

Learnt from: Xeratec
Repo: pulp-platform/Deeploy PR: 105
File: Deeploy/Targets/PULPOpen/DMA/MchanDma.py:61-64
Timestamp: 2025-09-09T15:58:06.454Z
Learning: The _legalizeTransfers function in TilingCodeGeneration.py handles conversion from elements to bytes for DMA operations when isFinalMemoryLevel is true, eliminating the need for individual DMA implementations like MchanDma to perform this conversion.

Applied to files:

Deeploy/Targets/GAP9/DMA/L3Dma.py
TargetLibraries/GAP9/src/dory_dma.c
Deeploy/Targets/GAP9/DMA/MchanDma.py

🧬 Code graph analysis (13)

DeeployTest/Platforms/GAP9/inc/CycleCounter.h (1)

DeeployTest/Platforms/GAP9/src/CycleCounter.c (4)

ResetTimer (10-13)

StartTimer (15-15)

StopTimer (17-17)

getCycles (19-19)

TargetLibraries/GAP9/inc/dory_dma.h (1)

TargetLibraries/GAP9/src/dory_dma.c (9)

dory_dma_memcpy_hwc_to_chw (38-72)

dory_dma_memcpy_1d_async (74-86)

dory_dma_memcpy_2d_async (88-108)

dory_dma_memcpy_3d_async (110-143)

dory_dma_memcpy_async (145-162)

dory_dma_memcpy_mindims_async (200-208)

dory_dma_free (210-210)

dory_dma_barrier (212-212)

dory_dma_allocate (214-214)

DeeployTest/testUtils/platformMapping.py (2)

Deeploy/Targets/GAP9/Deployer.py (1)

GAP9Deployer (32-102)

Deeploy/Targets/GAP9/Platform.py (3)

GAP9Platform (263-271)

MemoryGAP9Platform (274-292)

MemoryGAP9PlatformWrapper (295-307)

DeeployTest/Platforms/GAP9/src/deeploytest.c (2)

DeeployTest/Platforms/GAP9/src/CycleCounter.c (4)

ResetTimer (10-13)

StartTimer (15-15)

getCycles (19-19)

StopTimer (17-17)

TargetLibraries/GAP9/src/dory_mem.c (3)

mem_init (64-78)

open_fs (52-62)

ram_read (92-94)

TargetLibraries/GAP9/src/dory_dma.c (1)

TargetLibraries/GAP9/inc/mchan.h (5)

mchan_transfer_push_2d (94-106)

mchan_transfer_push_1d (88-92)

mchan_transfer_free (124-124)

mchan_transfer_wait (130-138)

mchan_transfer_get_id (86-86)

Deeploy/Targets/GAP9/Deployer.py (3)

Deeploy/CommonExtensions/NetworkDeployers/SignPropDeployer.py (1)

SignPropDeployer (14-57)

Deeploy/DeeployTypes.py (5)

ConstantBuffer (393-430)

DeploymentPlatform (2377-2420)

TopologyOptimizer (2175-2204)

VariableBuffer (232-360)

outputs (2522-2539)

Deeploy/Targets/PULPOpen/Deployer.py (1)

generateBufferAllocationCode (109-138)

Deeploy/Targets/PULPOpen/Deployer.py (2)

Deeploy/Targets/GAP9/Platform.py (1)

GAP9ClusterEngine (251-260)

Deeploy/Targets/PULPOpen/Platform.py (1)

PULPClusterEngine (246-255)

Deeploy/Targets/GAP9/Templates/AllocateTemplate.py (1)

Deeploy/DeeployTypes.py (1)

NodeTemplate (87-229)

Deeploy/Targets/GAP9/Tiler.py (5)

Deeploy/Targets/PULPOpen/TileConstraints/ConvTileConstraint.py (1)

Conv2DTileConstraint (233-598)

Deeploy/Targets/PULPOpen/TileConstraints/DWConvTileConstraint.py (1)

DWConv2DTileConstraint (238-255)

Deeploy/Targets/PULPOpen/TileConstraints/SGDTileConstraint.py (1)

SGDTileConstraint (8-12)

Deeploy/Targets/PULPOpen/TileConstraints/SoftmaxCrossEntropyTileConstraint.py (2)

SoftmaxCrossEntropyTileConstraint (19-109)

SoftmaxCrossEntropyGradTileConstraint (112-115)

Deeploy/TilingExtension/TilerExtension.py (1)

TilingReadyNodeBindings (1027-1035)

Deeploy/Targets/GAP9/Bindings.py (2)

Deeploy/CommonExtensions/DataTypes.py (3)

float32_t (74-78)

int8_t (12-15)

int32_t (24-27)

Deeploy/Targets/GAP9/DMA/MchanDma.py (1)

GAP9MchanDma (27-91)

TargetLibraries/GAP9/inc/dory_mem.h (1)

TargetLibraries/GAP9/src/dory_mem.c (13)

open_fs (52-62)

mem_init (64-78)

get_ram_ptr (80-80)

ram_malloc (82-86)

ram_free (88-90)

ram_read (92-94)

ram_write (96-98)

cl_ram_malloc (100-106)

cl_ram_free (108-112)

cl_ram_read (114-118)

cl_ram_write (120-124)

load_file_to_ram (126-151)

load_file_to_local (153-176)

DeeployTest/testRunner_gap9.py (1)

DeeployTest/testUtils/testRunner.py (6)

TestRunner (285-454)

TestRunnerArgumentParser (110-282)

parse_args (95-107)

parse_args (235-237)

cmake_args (277-282)

run (333-344)

Deeploy/Targets/GAP9/Templates/FreeTemplate.py (1)

Deeploy/DeeployTypes.py (1)

NodeTemplate (87-229)

🪛 Ruff (0.14.10)

DeeployTest/testUtils/testRunner.py

315-315: f-string without any placeholders

Remove extraneous f prefix

(F541)

Deeploy/Targets/GAP9/DMA/L3Dma.py

29-34: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)

Deeploy/Targets/GAP9/Templates/__init__.py

5-5: from . import * used; unable to detect undefined names

(F403)

Deeploy/Targets/GAP9/__init__.py

5-5: from . import * used; unable to detect undefined names

(F403)

Deeploy/Targets/GAP9/Deployer.py

54-54: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)

Deeploy/Targets/GAP9/Platform.py

266-266: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)

266-266: Do not perform function call GAP9ClusterEngine in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)

276-276: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)

281-281: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)

281-281: Do not perform function call GAP9ClusterEngine in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)

297-297: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)

Deeploy/Targets/GAP9/DMA/MchanDma.py

29-38: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)

Deeploy/Targets/GAP9/Bindings.py

162-162: zip() without an explicit strict= parameter