2 changes: 1 addition & 1 deletion .cursorrules
@@ -6,7 +6,7 @@ Always respond in Chinese
The project is divided into 3 parts
1. Front end. The Python library's API style follows pytorch; other languages such as go, java, c, rust, etc. will be designed later.
2. Scheduler, to be designed
3. Executor, implemented with c++, cuda, metal, omp simd, etc., providing the forward and backward of the operators for the different excuters
3. Executor, implemented with c++, cuda, metal, omp simd, etc., providing the forward and backward of the operators for the different executors

About the concepts
deepx.Tensor is just a tensor; unlike a pytorch tensor, which actually contains the data of two tensors, itself and its gradient
@@ -1,8 +1,8 @@
---
name: Execution engine
about: Executes storage, computation, and network transfer according to a given computation graph
title: '[excuter] '
labels: excuter,
title: '[executor] '
labels: executor,
assignees: ''
---

@@ -2,10 +2,10 @@ name: Excuter/cppcommon Build
on:
push:
paths:
- 'excuter/cpp-common/**'
- 'executor/cpp-common/**'
pull_request:
paths:
- 'excuter/cpp-common/**'
- 'executor/cpp-common/**'
env:
HIGHWAY_VERSION: 1.2.0

@@ -48,7 +48,7 @@ jobs:
uses: actions/cache@v3
with:
path: |
excuter/cpp-common/build
executor/cpp-common/build
~/.ccache
key: ${{ runner.os }}-build-${{ hashFiles('**/CMakeLists.txt') }}
restore-keys: |
@@ -57,7 +57,7 @@
# Build the cpp-common library
- name: Build Common Library
run: |
cd excuter/cpp-common
cd executor/cpp-common
mkdir -p build && cd build
cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_COMPILER_LAUNCHER=ccache ..
cmake --build . --config Release -j$(nproc)
@@ -2,10 +2,10 @@ name: Excuter/cuda-linux Build
on:
push:
paths:
- 'excuter/op-mem-cuda/**'
- 'executor/op-mem-cuda/**'
pull_request:
paths:
- 'excuter/op-mem-cuda/**'
- 'executor/op-mem-cuda/**'
env:
CUDA_VERSION: "12.6.0"
CUDA_MAJOR_VERSION: "12"
@@ -62,7 +62,7 @@ jobs:
cd /workspace && \

# Build the common library
cd excuter/cpp-common && \
cd executor/cpp-common && \
mkdir -p build && cd build && \
cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_COMPILER_LAUNCHER=ccache -GNinja .. && \
ninja && \
@@ -2,10 +2,10 @@ name: Excuter/ompsimd-linux Build
on:
push:
paths:
- 'excuter/op-mem-ompsimd/**'
- 'executor/op-mem-ompsimd/**'
pull_request:
paths:
- 'excuter/op-mem-ompsimd/**'
- 'executor/op-mem-ompsimd/**'
env:
HIGHWAY_VERSION: 1.2.0

@@ -48,8 +48,8 @@ jobs:
uses: actions/cache@v3
with:
path: |
excuter/op-mem-ompsimd/build
excuter/cpp-common/build
executor/op-mem-ompsimd/build
executor/cpp-common/build
~/.ccache
key: ${{ runner.os }}-build-${{ hashFiles('**/CMakeLists.txt') }}
restore-keys: |
@@ -84,15 +84,15 @@ jobs:
# Build the cpp-common library
- name: Build Common Library
run: |
cd excuter/cpp-common
cd executor/cpp-common
mkdir -p build && cd build
cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_COMPILER_LAUNCHER=ccache ..
cmake --build . --config Release -j$(nproc)

# Build the executor
- name: CMake Build
run: |
cd excuter/op-mem-ompsimd
cd executor/op-mem-ompsimd
mkdir -p build && cd build
cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_COMPILER_LAUNCHER=ccache ..
cmake --build . --config Release -j$(nproc)
2 changes: 1 addition & 1 deletion CONTRIBUTING.md
@@ -4,7 +4,7 @@ The development of the deepx framework mainly covers five major directions

+ front: add new models, modules, Python class functions, etc.
+ Middle layer: includes the computation-graph optimizer, the plugin system (automatic KVcache system), automatic distribution, automatic release of stack tensors, automatic in-place conversion, and similar passes
+ Add or modify an excuter
+ Add or modify an executor
+ Add or modify operators, which further divide into leaftensorfunc (indivisible basic operators) and fusedtensorfunc (fused operators)
+ Documentation enrichment:
+ Operations automation
10 changes: 5 additions & 5 deletions README.md
@@ -15,7 +15,7 @@ deepx is split into front, middle, and back ends, namely the front-end expression side, the compile/replace/schedule layer,
The python sdk provides an API close to pytorch's
SDKs in other languages are also allowed to plug in,

+ IR communication scheduling. Unlike pytorch and other py+bound-C++ frameworks, where scheduling and execution happen as on-stack function calls inside a single process, the deepx programs (e.g. the front python sdk, the back-end computation-graph compiler/optimizer, and excuters such as ompsimd) are scheduled through IR over network communication, and each must be started as its own process.
+ IR communication scheduling. Unlike pytorch and other py+bound-C++ frameworks, where scheduling and execution happen as on-stack function calls inside a single process, the deepx programs (e.g. the front python sdk, the back-end computation-graph compiler/optimizer, and executors such as ompsimd) are scheduled through IR over network communication, and each must be started as its own process.


| Dimension | PyTorch-style frameworks | DeepX |
@@ -30,7 +30,7 @@ The python sdk provides an API close to pytorch's
+ Registry: collects the operator lists of the executors that are currently ready, and collects operator time-cost and memory-footprint information
+ Computation-graph compiler/optimizer: fuses operators, eliminates graph nodes, and automatically generates tensor-splitting parallel subgraphs that replace the original nodes
+ Execution scheduler: data parallelism, pipeline parallelism (forward/backward parallelism), model parallelism.
+ The front end generates basic IR, and the compiler fuses it into the higher-level operators registered by the excuter
+ The front end generates basic IR, and the compiler fuses it into the higher-level operators registered by the executor


### Execution layer
@@ -44,16 +44,16 @@ Op{args(args_grad),returns(returns_grad)|func run}

An Op must implement the run method

As for excuters: anything that can execute a sequence of deepxIR and return the results can be plugged into the deepx distributed scheduling framework. So everything from hardware, instruction sets, and acceleration libraries to higher-level frameworks, including training and inference engines, can be hooked into the deepx ecosystem with only minor modifications.
As for executors: anything that can execute a sequence of deepxIR and return the results can be plugged into the deepx distributed scheduling framework. So everything from hardware, instruction sets, and acceleration libraries to higher-level frameworks, including training and inference engines, can be hooked into the deepx ecosystem with only minor modifications.
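As a rough illustration of the Op structure shown above, `Op{args(args_grad),returns(returns_grad)|func run}`, here is a minimal C++ sketch. It is not the actual deepx executor interface; the member and method names are assumptions drawn only from this README.

```cpp
#include <string>
#include <vector>

// Illustrative sketch only -- not the real deepx Op class.
// An Op names its argument and return tensors (whose gradients live in
// separately named tensors) and must implement run().
struct Op {
    std::vector<std::string> args;          // input tensor names
    std::vector<std::string> args_grad;     // names of the inputs' gradients
    std::vector<std::string> returns;       // output tensor names
    std::vector<std::string> returns_grad;  // names of the outputs' gradients

    virtual ~Op() = default;
    virtual void run() = 0;  // execute against the executor's tensor memory
};
```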

Currently available:


#### Default executor
+ CPU executor: ompsimd is implemented. Its supported operator list: [ompsimd](doc/excuter/op-mem-ompsimd/list.md)
+ CPU executor: ompsimd is implemented. Its supported operator list: [ompsimd](docs/executor/op-mem-ompsimd/list.md)

#### GPU executor
+ CUDA executor. Its supported operator list: [cuda](doc/excuter/op-mem-cuda/list.md)
+ CUDA executor. Its supported operator list: [cuda](docs/executor/op-mem-cuda/list.md)

Everyone is welcome to contribute cuda code

1 change: 1 addition & 0 deletions cutlass
Submodule cutlass added at bbe579
4 changes: 0 additions & 4 deletions doc/excuter/deepx.op.drawio.svg

This file was deleted.

4 changes: 0 additions & 4 deletions doc/front/deepx.op.drawio.svg

This file was deleted.

File renamed without changes.
2 changes: 1 addition & 1 deletion doc/README.md → docs/README.md
@@ -53,7 +53,7 @@ deepx aims to design an ultra-large-scale, automatically distributed and parallel, training-and-inference-unified deep
- Use hierarchical caches. Use the AI chip's multi-level high-speed caches to reduce the chip's data read/write time
- Support compute-in-memory GPUs, AI accelerator cards, etc., to reduce data transfer

![Architecture diagram](./doc/deepx.jpg)
![Architecture diagram](./docs/deepx.jpg)

## 3. Framework front-end design

File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
14 changes: 7 additions & 7 deletions doc/deepxIR/ir.md → docs/deepxIR/ir.md
@@ -21,7 +21,7 @@ newtensor 3 4 5 -> T1

## Function definition (funcdef)

Function definitions are registered and implemented by the excuter layer and declare the parameter and return-value types of an operation. An excuter declares the tensorfuncs it supports by registering funcdefs
Function definitions are registered and implemented by the executor layer and declare the parameter and return-value types of an operation. An executor declares the tensorfuncs it supports by registering funcdefs

Therefore detailed type constraints need to be specified for the parameters and return values

@@ -50,7 +50,7 @@ matmul(A,B)->C //id=1 created_at=123456789 sent_at=123456790

For the tensorfunc type system, we only care about the types related to tensors

See excuter/common/src/deepx/dtype.hpp
See executor/common/src/deepx/dtype.hpp

```
{
@@ -79,7 +79,7 @@

## funcdef

The excuter is responsible for defining the tensorfuncs it supports
The executor is responsible for defining the tensorfuncs it supports

1. Matrix multiplication:
```
@@ -89,7 +89,7 @@ matmul(Tensor<float32|float64> A, Tensor<float32|float64> B) -> Tensor<float32|f
# funccall
matmul A,B -> C
// The rtf (remote tensor func) parser automatically parses the lists of parameters and return values
// The excuter fetches the 3 tensors A, B, and C from mem and performs the matmul operation
// The executor fetches the 3 tensors A, B, and C from mem and performs the matmul operation
```

2. Tensor summation:
@@ -100,9 +100,9 @@ sum(Tensor<any> input, vector<int32> dims,var<bool> keepdim) -> Tensor<any> outp
# funccall
sum(T1,[0 1],true) -> T2
// The rtf (remote tensor func) parser automatically parses the lists of parameters and return values
// Here [0 1] is parsed as vector<int32>, for the excuter to use at execution time
// true is parsed as var<bool> keepdim, for the excuter to use at execution time
// The excuter fetches the 2 tensors T1 and T2 from mem and performs the sum operation
// Here [0 1] is parsed as vector<int32>, for the executor to use at execution time
// true is parsed as var<bool> keepdim, for the executor to use at execution time
// The executor fetches the 2 tensors T1 and T2 from mem and performs the sum operation
```
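
As a hedged aside (this is not the actual rtf parser, whose real implementation lives in the executors), the following minimal C++ sketch shows how a funccall line such as `sum(T1,[0 1],true) -> T2` splits into a function name, an argument list, and a return list; type resolution such as vector<int32> and var<bool>, and whitespace trimming, are omitted.

```cpp
#include <iostream>
#include <string>
#include <vector>

// Illustrative only: split "name(a,b,c) -> r1,r2" into its parts.
struct FuncCall {
    std::string name;
    std::vector<std::string> args;
    std::vector<std::string> returns;
};

// Split a comma-separated list, but do not split inside [ ... ] groups.
static std::vector<std::string> splitList(const std::string& s) {
    std::vector<std::string> out;
    std::string cur;
    int depth = 0;
    for (char ch : s) {
        if (ch == '[') ++depth;
        if (ch == ']') --depth;
        if (ch == ',' && depth == 0) { out.push_back(cur); cur.clear(); }
        else cur += ch;
    }
    if (!cur.empty()) out.push_back(cur);
    return out;
}

static FuncCall parseFuncCall(const std::string& line) {
    FuncCall fc;
    const auto lp = line.find('(');
    const auto rp = line.rfind(')');
    const auto arrow = line.find("->", rp);   // "->" separates args from returns
    fc.name = line.substr(0, lp);
    fc.args = splitList(line.substr(lp + 1, rp - lp - 1));
    fc.returns = splitList(line.substr(arrow + 2));
    return fc;
}

int main() {
    const FuncCall fc = parseFuncCall("sum(T1,[0 1],true) -> T2");
    std::cout << fc.name << " has " << fc.args.size()
              << " args and " << fc.returns.size() << " return tensor(s)\n";
}
```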

3. Creating a new tensor:
File renamed without changes.
4 changes: 2 additions & 2 deletions doc/design.md → docs/design.md
@@ -15,9 +15,9 @@ deepIR{
}
```

Rules for how an excuter executes deepxIR
Rules for how an executor executes deepxIR

+ When executing deepxIR, the excuter must not modify the tensors in args
+ When executing deepxIR, the executor must not modify the tensors in args
+ However, deepIR does not forbid a Param in args and a Param in returns from having the same name, which makes inplace-like operations possible


6 changes: 3 additions & 3 deletions doc/excuter/deepx.op.drawio → docs/executor/deepx.op.drawio
@@ -4,7 +4,7 @@
<root>
<mxCell id="0" />
<mxCell id="1" parent="0" />
<mxCell id="Az3iGQj9nxM2931kqFIP-2" value="excuter&amp;nbsp;&lt;div&gt;ompsimd&lt;/div&gt;" style="rounded=1;whiteSpace=wrap;html=1;" parent="1" vertex="1">
<mxCell id="Az3iGQj9nxM2931kqFIP-2" value="executor&amp;nbsp;&lt;div&gt;ompsimd&lt;/div&gt;" style="rounded=1;whiteSpace=wrap;html=1;" parent="1" vertex="1">
<mxGeometry x="-30" y="100" width="120" height="60" as="geometry" />
</mxCell>
<mxCell id="Az3iGQj9nxM2931kqFIP-20" value="tensorfunc&amp;lt;T&amp;gt;" style="text;html=1;align=center;verticalAlign=middle;whiteSpace=wrap;rounded=0;fontSize=21;" parent="1" vertex="1">
Expand All @@ -25,7 +25,7 @@
<mxPoint x="-540" y="160" as="targetPoint" />
</mxGeometry>
</mxCell>
<mxCell id="Az3iGQj9nxM2931kqFIP-3" value="excuter&amp;nbsp;&lt;div&gt;cuda&lt;/div&gt;" style="rounded=1;whiteSpace=wrap;html=1;" parent="1" vertex="1">
<mxCell id="Az3iGQj9nxM2931kqFIP-3" value="executor&amp;nbsp;&lt;div&gt;cuda&lt;/div&gt;" style="rounded=1;whiteSpace=wrap;html=1;" parent="1" vertex="1">
<mxGeometry x="440" y="100" width="120" height="60" as="geometry" />
</mxCell>
<mxCell id="ZCyPKg8dckpy5I3t4faN-10" value="TFfactory" style="text;html=1;align=center;verticalAlign=middle;whiteSpace=wrap;rounded=0;fontSize=21;" vertex="1" parent="1">
@@ -132,7 +132,7 @@
<mxCell id="JBEWLCwWRuB5Uu3qIstv-10" value="" style="rounded=1;whiteSpace=wrap;html=1;arcSize=8;" vertex="1" parent="JBEWLCwWRuB5Uu3qIstv-14">
<mxGeometry x="20" y="30" width="350" height="400" as="geometry" />
</mxCell>
<mxCell id="JBEWLCwWRuB5Uu3qIstv-13" value="process&lt;br&gt;excuter-cpu" style="ellipse;whiteSpace=wrap;html=1;aspect=fixed;fillColor=#fff2cc;strokeColor=#d6b656;" vertex="1" parent="JBEWLCwWRuB5Uu3qIstv-14">
<mxCell id="JBEWLCwWRuB5Uu3qIstv-13" value="process&lt;br&gt;executor-cpu" style="ellipse;whiteSpace=wrap;html=1;aspect=fixed;fillColor=#fff2cc;strokeColor=#d6b656;" vertex="1" parent="JBEWLCwWRuB5Uu3qIstv-14">
<mxGeometry width="80" height="80" as="geometry" />
</mxCell>
<mxCell id="JBEWLCwWRuB5Uu3qIstv-1" value="tensorfunc" style="swimlane;childLayout=stackLayout;horizontal=1;startSize=50;horizontalStack=0;rounded=1;fontSize=14;fontStyle=0;strokeWidth=2;resizeParent=0;resizeLast=1;shadow=0;dashed=0;align=center;arcSize=4;whiteSpace=wrap;html=1;fillColor=#dae8fc;strokeColor=#6c8ebf;" vertex="1" parent="1">
4 changes: 4 additions & 0 deletions docs/executor/deepx.op.drawio.svg
File renamed without changes
10 changes: 5 additions & 5 deletions doc/excuter/excuter.md → docs/executor/executor.md
@@ -1,4 +1,4 @@
## How to add a new operator to an excuter
## How to add a new operator to an executor

### Hierarchy diagram

@@ -14,7 +14,7 @@

#### Op

An Op is an excuter operator, the excuter's unit of execution
An Op is an executor operator, the executor's unit of execution

In the code, Op is a base class, and different Ops have different implementations, such as Add, Mul, MatMul, etc.
Every Op must override the forward and backward functions
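
To make the forward/backward contract concrete, here is a hedged C++ sketch of what such a subclass could look like. The class names and the raw-buffer members are assumptions made purely for illustration; they are not the real deepx Op API (see the executor sources for the actual signatures).

```cpp
#include <cstddef>
#include <vector>

// Hedged sketch of the pattern described above -- not the real deepx classes.
struct OpBase {
    virtual ~OpBase() = default;
    virtual void forward()  = 0;  // compute outputs from inputs
    virtual void backward() = 0;  // compute input gradients from output gradients
};

// Hypothetical element-wise Add operator over flat float buffers: c = a + b.
// Assumes all buffers are non-null and already sized identically.
struct AddOp : OpBase {
    const std::vector<float>* a = nullptr;
    const std::vector<float>* b = nullptr;
    std::vector<float>* c = nullptr;
    std::vector<float>* a_grad = nullptr;
    std::vector<float>* b_grad = nullptr;
    const std::vector<float>* c_grad = nullptr;

    void forward() override {
        for (std::size_t i = 0; i < a->size(); ++i)
            (*c)[i] = (*a)[i] + (*b)[i];
    }

    void backward() override {
        // d(a+b)/da = 1 and d(a+b)/db = 1, so the output gradient is
        // accumulated unchanged into both input gradients.
        for (std::size_t i = 0; i < c_grad->size(); ++i) {
            (*a_grad)[i] += (*c_grad)[i];
            (*b_grad)[i] += (*c_grad)[i];
        }
    }
};
```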
@@ -31,7 +31,7 @@ Matmul will choose a default implementation
git clone https://github.com/deepx-org/deepx.git

#### 1. CPU executor
cd deepx/excuter/op-mem-ompsimd
cd deepx/executor/op-mem-ompsimd

Dependencies must be installed beforehand
+ highway must be installed from source
Expand All @@ -43,7 +43,7 @@ make build && cd build && cmake .. && make


#### 2. CUDA executor
cd deepx/excuter/op-mem-cuda
cd deepx/executor/op-mem-cuda

Dependencies must be installed beforehand
+ cuda
@@ -60,7 +60,7 @@ todo

#### 4. Front-end integration test

1. First start the excuter executable, located in excuter/op-mem-{cuda/ompsimd}/build; the executable name is the same as the excuter name
1. First start the executor executable, located in executor/op-mem-{cuda/ompsimd}/build; the executable name is the same as the executor name
2. Then test the corresponding operator scripts of the Python front end (in the front/py/examples directory)

They can be tested in order, one by one
@@ -8,7 +8,7 @@ mix precision is a mixed-precision training method that uses 16-bit floating point and 8

In deep learning, models are usually trained with 32-bit floating point, which ensures model accuracy. However, 32-bit floats take up more GPU memory and take longer to compute. Therefore, to reduce memory usage and computation time, the mix precision training method can be used.

## 3. On the mix precision implementation in the excuter
## 3. On the mix precision implementation in the executor

For example:

@@ -1,6 +1,6 @@
## op-mem-cuda supported operator list

This page is generated by `excuter/op-mem-cuda; do not modify it manually
This page is generated by `executor/op-mem-cuda; do not modify it manually

### matmul

@@ -1,4 +1,4 @@
## excuter
## executor

### op-mem-ompsimd

@@ -1,6 +1,6 @@
## op-mem-ompsimd supported operator list

This page is generated by `excuter/op-mem-ompsimd; do not modify it manually
This page is generated by `executor/op-mem-ompsimd; do not modify it manually

### matmul

@@ -1,4 +1,4 @@
## excuter
## executor

### op-mem-ompsimd

@@ -8,9 +8,9 @@ The range function is a function of the shape class, used to perform omp over a tensor according to its shape

The definition and implementation are in:

excuter/common/src/deepx/shape.hpp
executor/common/src/deepx/shape.hpp

excuter/common/src/deepx/shape_range.cpp
executor/common/src/deepx/shape_range.cpp

| func | omp parallel | omp thread-local objects | call scenario |
| ---- | ---- | ------ | ---------- |
File renamed without changes.
File renamed without changes.
File renamed without changes
4 changes: 4 additions & 0 deletions docs/front/deepx.op.drawio.svg
File renamed without changes
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
8 changes: 4 additions & 4 deletions doc/index.rst → docs/index.rst
@@ -18,7 +18,7 @@ DeepX, a natively distributed and parallel unified deep learning training and inference framework
:caption: Documentation

front/py/deepx/about
excuter/op-mem-ompsimd/list
executor/op-mem-ompsimd/list
deepxIR/ir

.. toctree::
@@ -27,9 +27,9 @@

front/py/contribute
scheduler/scheduler
excuter/excuter
excuter/op-mem-ompsimd/contribute
excuter/op-mem-ompsimd/range
executor/executor
executor/op-mem-ompsimd/contribute
executor/op-mem-ompsimd/range

Index and search
==========
6 changes: 3 additions & 3 deletions doc/language.md → docs/language.md
@@ -1,4 +1,4 @@
## c++: compute executor (excuter)
## c++: compute executor (executor)

Responsible for implementing the concrete computation of tensors and interfacing with hardware such as GPU and CPU simd instructions

@@ -33,9 +33,9 @@ deepxctl: provides unified management of all tools, libraries, models, and images in the deepx ecosystem
## deepxIR
Although deepxIR is not a standalone programming language, it is the standard program format of the deepx ecosystem

What an excuter executes is a sequence of deepxir or a deepxir computation graph
What an executor executes is a sequence of deepxir or a deepxir computation graph

https://github.com/array2d/deepx/blob/main/doc/excuter/op-mem-cuda/list.md
https://github.com/array2d/deepx/blob/main/docs/executor/op-mem-cuda/list.md

deepxir falls into 4 categories
