2 changes: 1 addition & 1 deletion .cursorrules
@@ -6,7 +6,7 @@ Always respond in Chinese
The project is divided into 3 parts
1. Front end. The Python library's API style follows pytorch; other languages such as go, java, c, rust, etc. will be designed later.
2. Scheduler, to be designed
3. Executor, implemented with c++, cuda, metal, omp simd, etc., providing the forward and backward of the operators for the different excuters
3. Executor, implemented with c++, cuda, metal, omp simd, etc., providing the forward and backward of the operators for the different executors

About the concepts
deepx.Tensor is just a tensor; unlike a pytorch tensor, which actually contains the data of two tensors, itself and its gradient
@@ -1,8 +1,8 @@
---
name: Execution engine
about: Executes storage, computation, and network transfer according to a given computation graph
title: '[excuter] '
labels: excuter,
title: '[executor] '
labels: executor,
assignees: ''
---

@@ -2,10 +2,10 @@ name: Excuter/cppcommon Build
on:
push:
paths:
- 'excuter/cpp-common/**'
- 'executor/cpp-common/**'
pull_request:
paths:
- 'excuter/cpp-common/**'
- 'executor/cpp-common/**'
env:
HIGHWAY_VERSION: 1.2.0

@@ -48,7 +48,7 @@ jobs:
uses: actions/cache@v3
with:
path: |
excuter/cpp-common/build
executor/cpp-common/build
~/.ccache
key: ${{ runner.os }}-build-${{ hashFiles('**/CMakeLists.txt') }}
restore-keys: |
@@ -57,7 +57,7 @@
# Build the cpp-common library
- name: Build Common Library
run: |
cd excuter/cpp-common
cd executor/cpp-common
mkdir -p build && cd build
cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_COMPILER_LAUNCHER=ccache ..
cmake --build . --config Release -j$(nproc)
@@ -2,10 +2,10 @@ name: Excuter/cuda-linux Build
on:
push:
paths:
- 'excuter/op-mem-cuda/**'
- 'executor/op-mem-cuda/**'
pull_request:
paths:
- 'excuter/op-mem-cuda/**'
- 'executor/op-mem-cuda/**'
env:
CUDA_VERSION: "12.6.0"
CUDA_MAJOR_VERSION: "12"
@@ -62,7 +62,7 @@ jobs:
cd /workspace && \

# Build the common library
cd excuter/cpp-common && \
cd executor/cpp-common && \
mkdir -p build && cd build && \
cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_COMPILER_LAUNCHER=ccache -GNinja .. && \
ninja && \
@@ -2,10 +2,10 @@ name: Excuter/ompsimd-linux Build
on:
push:
paths:
- 'excuter/op-mem-ompsimd/**'
- 'executor/op-mem-ompsimd/**'
pull_request:
paths:
- 'excuter/op-mem-ompsimd/**'
- 'executor/op-mem-ompsimd/**'
env:
HIGHWAY_VERSION: 1.2.0

@@ -48,8 +48,8 @@ jobs:
uses: actions/cache@v3
with:
path: |
excuter/op-mem-ompsimd/build
excuter/cpp-common/build
executor/op-mem-ompsimd/build
executor/cpp-common/build
~/.ccache
key: ${{ runner.os }}-build-${{ hashFiles('**/CMakeLists.txt') }}
restore-keys: |
@@ -84,15 +84,15 @@ jobs:
# Build the cpp-common library
- name: Build Common Library
run: |
cd excuter/cpp-common
cd executor/cpp-common
mkdir -p build && cd build
cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_COMPILER_LAUNCHER=ccache ..
cmake --build . --config Release -j$(nproc)

# Build the executor
- name: CMake Build
run: |
cd excuter/op-mem-ompsimd
cd executor/op-mem-ompsimd
mkdir -p build && cd build
cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_COMPILER_LAUNCHER=ccache ..
cmake --build . --config Release -j$(nproc)
2 changes: 1 addition & 1 deletion CONTRIBUTING.md
@@ -4,7 +4,7 @@ The development of the deepx framework mainly covers five major directions

+ front: add new models, modules, Python class functions, etc.
+ Middle layer: includes the computation-graph optimizer, the plugin system (automatic KVcache system), automatic distribution, automatic release of stack tensors, automatic in-place conversion, and similar passes
+ Add or modify an excuter
+ Add or modify an executor
+ Add or modify operators, which further divide into leaftensorfunc (indivisible basic operators) and fusedtensorfunc (fused operators)
+ Documentation enrichment:
+ Operations automation
10 changes: 5 additions & 5 deletions README.md
@@ -15,7 +15,7 @@ deepx is split into front, middle, and back ends, namely the front-end expression side, the compile/replace/schedule layer,
The python sdk provides an API close to pytorch's
SDKs in other languages are also allowed to plug in,

+ IR communication scheduling. Unlike pytorch and other py+bound-C++ frameworks, where scheduling and execution happen as on-stack function calls inside a single process, the deepx programs (e.g. the front python sdk, the back-end computation-graph compiler/optimizer, and excuters such as ompsimd) are scheduled through IR over network communication, and each must be started as its own process.
+ IR communication scheduling. Unlike pytorch and other py+bound-C++ frameworks, where scheduling and execution happen as on-stack function calls inside a single process, the deepx programs (e.g. the front python sdk, the back-end computation-graph compiler/optimizer, and executors such as ompsimd) are scheduled through IR over network communication, and each must be started as its own process.


| Dimension | PyTorch-style frameworks | DeepX |
@@ -30,7 +30,7 @@ The python sdk provides an API close to pytorch's
+ Registry: collects the operator lists of the executors that are currently ready, and collects operator time-cost and memory-footprint information
+ Computation-graph compiler/optimizer: fuses operators, eliminates graph nodes, and automatically generates tensor-splitting parallel subgraphs that replace the original nodes
+ Execution scheduler: data parallelism, pipeline parallelism (forward/backward parallelism), model parallelism.
+ The front end generates basic IR, and the compiler fuses it into the higher-level operators registered by the excuter
+ The front end generates basic IR, and the compiler fuses it into the higher-level operators registered by the executor


### Execution layer
@@ -44,16 +44,16 @@ Op{args(args_grad),returns(returns_grad)|func run}

An Op must implement the run method

As for excuters: anything that can execute a sequence of deepxIR and return the results can be plugged into the deepx distributed scheduling framework. So everything from hardware, instruction sets, and acceleration libraries to higher-level frameworks, including training and inference engines, can be hooked into the deepx ecosystem with only minor modifications.
As for executors: anything that can execute a sequence of deepxIR and return the results can be plugged into the deepx distributed scheduling framework. So everything from hardware, instruction sets, and acceleration libraries to higher-level frameworks, including training and inference engines, can be hooked into the deepx ecosystem with only minor modifications.
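As a rough illustration of the Op structure shown above, `Op{args(args_grad),returns(returns_grad)|func run}`, here is a minimal C++ sketch. It is not the actual deepx executor interface; the member and method names are assumptions drawn only from this README.

```cpp
#include <string>
#include <vector>

// Illustrative sketch only -- not the real deepx Op class.
// An Op names its argument and return tensors (whose gradients live in
// separately named tensors) and must implement run().
struct Op {
    std::vector<std::string> args;          // input tensor names
    std::vector<std::string> args_grad;     // names of the inputs' gradients
    std::vector<std::string> returns;       // output tensor names
    std::vector<std::string> returns_grad;  // names of the outputs' gradients

    virtual ~Op() = default;
    virtual void run() = 0;  // execute against the executor's tensor memory
};
```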

Currently available:


#### Default executor
+ CPU executor: ompsimd is implemented. Its supported operator list: [ompsimd](doc/excuter/op-mem-ompsimd/list.md)
+ CPU executor: ompsimd is implemented. Its supported operator list: [ompsimd](docs/executor/op-mem-ompsimd/list.md)

#### GPU executor
+ CUDA executor. Its supported operator list: [cuda](doc/excuter/op-mem-cuda/list.md)
+ CUDA executor. Its supported operator list: [cuda](docs/executor/op-mem-cuda/list.md)

Everyone is welcome to contribute cuda code

1 change: 1 addition & 0 deletions cutlass
Submodule cutlass added at bbe579
4 changes: 0 additions & 4 deletions doc/excuter/deepx.op.drawio.svg

This file was deleted.

4 changes: 0 additions & 4 deletions doc/front/deepx.op.drawio.svg

This file was deleted.

File renamed without changes.
2 changes: 1 addition & 1 deletion doc/README.md → docs/README.md
@@ -53,7 +53,7 @@ deepx aims to design an ultra-large-scale, automatically distributed and parallel, training-and-inference-unified deep
- Use hierarchical caches. Use the AI chip's multi-level high-speed caches to reduce the chip's data read/write time
- Support compute-in-memory GPUs, AI accelerator cards, etc., to reduce data transfer

![Architecture diagram](./doc/deepx.jpg)
![Architecture diagram](./docs/deepx.jpg)

## 3. Framework front-end design

File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
14 changes: 7 additions & 7 deletions doc/deepxIR/ir.md → docs/deepxIR/ir.md
@@ -21,7 +21,7 @@ newtensor 3 4 5 -> T1

## Function definition (funcdef)

Function definitions are registered and implemented by the excuter layer and declare the parameter and return-value types of an operation. An excuter declares the tensorfuncs it supports by registering funcdefs
Function definitions are registered and implemented by the executor layer and declare the parameter and return-value types of an operation. An executor declares the tensorfuncs it supports by registering funcdefs

Therefore detailed type constraints need to be specified for the parameters and return values

@@ -50,7 +50,7 @@ matmul(A,B)->C //id=1 created_at=123456789 sent_at=123456790

For the tensorfunc type system, we only care about the types related to tensors

See excuter/common/src/deepx/dtype.hpp
See executor/common/src/deepx/dtype.hpp

```
{
@@ -79,7 +79,7 @@

## funcdef

The excuter is responsible for defining the tensorfuncs it supports
The executor is responsible for defining the tensorfuncs it supports

1. Matrix multiplication:
```
@@ -89,7 +89,7 @@ matmul(Tensor<float32|float64> A, Tensor<float32|float64> B) -> Tensor<float32|f
# funccall
matmul A,B -> C
// The rtf (remote tensor func) parser automatically parses the lists of parameters and return values
// The excuter fetches the 3 tensors A, B, and C from mem and performs the matmul operation
// The executor fetches the 3 tensors A, B, and C from mem and performs the matmul operation
```

2. Tensor summation:
@@ -100,9 +100,9 @@ sum(Tensor<any> input, vector<int32> dims,var<bool> keepdim) -> Tensor<any> outp
# funccall
sum(T1,[0 1],true) -> T2
// The rtf (remote tensor func) parser automatically parses the lists of parameters and return values
// Here [0 1] is parsed as vector<int32>, for the excuter to use at execution time
// true is parsed as var<bool> keepdim, for the excuter to use at execution time
// The excuter fetches the 2 tensors T1 and T2 from mem and performs the sum operation
// Here [0 1] is parsed as vector<int32>, for the executor to use at execution time
// true is parsed as var<bool> keepdim, for the executor to use at execution time
// The executor fetches the 2 tensors T1 and T2 from mem and performs the sum operation
```
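
As a hedged aside (this is not the actual rtf parser, whose real implementation lives in the executors), the following minimal C++ sketch shows how a funccall line such as `sum(T1,[0 1],true) -> T2` splits into a function name, an argument list, and a return list; type resolution such as vector<int32> and var<bool>, and whitespace trimming, are omitted.

```cpp
#include <iostream>
#include <string>
#include <vector>

// Illustrative only: split "name(a,b,c) -> r1,r2" into its parts.
struct FuncCall {
    std::string name;
    std::vector<std::string> args;
    std::vector<std::string> returns;
};

// Split a comma-separated list, but do not split inside [ ... ] groups.
static std::vector<std::string> splitList(const std::string& s) {
    std::vector<std::string> out;
    std::string cur;
    int depth = 0;
    for (char ch : s) {
        if (ch == '[') ++depth;
        if (ch == ']') --depth;
        if (ch == ',' && depth == 0) { out.push_back(cur); cur.clear(); }
        else cur += ch;
    }
    if (!cur.empty()) out.push_back(cur);
    return out;
}

static FuncCall parseFuncCall(const std::string& line) {
    FuncCall fc;
    const auto lp = line.find('(');
    const auto rp = line.rfind(')');
    const auto arrow = line.find("->", rp);   // "->" separates args from returns
    fc.name = line.substr(0, lp);
    fc.args = splitList(line.substr(lp + 1, rp - lp - 1));
    fc.returns = splitList(line.substr(arrow + 2));
    return fc;
}

int main() {
    const FuncCall fc = parseFuncCall("sum(T1,[0 1],true) -> T2");
    std::cout << fc.name << " has " << fc.args.size()
              << " args and " << fc.returns.size() << " return tensor(s)\n";
}
```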

3. Creating a new tensor:
File renamed without changes.
4 changes: 2 additions & 2 deletions doc/design.md → docs/design.md
@@ -15,9 +15,9 @@ deepIR{
}
```

Rules for how an excuter executes deepxIR
Rules for how an executor executes deepxIR

+ When executing deepxIR, the excuter must not modify the tensors in args
+ When executing deepxIR, the executor must not modify the tensors in args
+ However, deepIR does not forbid a Param in args and a Param in returns from having the same name, which makes inplace-like operations possible


6 changes: 3 additions & 3 deletions doc/excuter/deepx.op.drawio → docs/executor/deepx.op.drawio
@@ -4,7 +4,7 @@
<root>
<mxCell id="0" />
<mxCell id="1" parent="0" />
<mxCell id="Az3iGQj9nxM2931kqFIP-2" value="excuter&amp;nbsp;&lt;div&gt;ompsimd&lt;/div&gt;" style="rounded=1;whiteSpace=wrap;html=1;" parent="1" vertex="1">
<mxCell id="Az3iGQj9nxM2931kqFIP-2" value="executor&amp;nbsp;&lt;div&gt;ompsimd&lt;/div&gt;" style="rounded=1;whiteSpace=wrap;html=1;" parent="1" vertex="1">
<mxGeometry x="-30" y="100" width="120" height="60" as="geometry" />
</mxCell>
<mxCell id="Az3iGQj9nxM2931kqFIP-20" value="tensorfunc&amp;lt;T&amp;gt;" style="text;html=1;align=center;verticalAlign=middle;whiteSpace=wrap;rounded=0;fontSize=21;" parent="1" vertex="1">
Expand All @@ -25,7 +25,7 @@
<mxPoint x="-540" y="160" as="targetPoint" />
</mxGeometry>
</mxCell>
<mxCell id="Az3iGQj9nxM2931kqFIP-3" value="excuter&amp;nbsp;&lt;div&gt;cuda&lt;/div&gt;" style="rounded=1;whiteSpace=wrap;html=1;" parent="1" vertex="1">
<mxCell id="Az3iGQj9nxM2931kqFIP-3" value="executor&amp;nbsp;&lt;div&gt;cuda&lt;/div&gt;" style="rounded=1;whiteSpace=wrap;html=1;" parent="1" vertex="1">
<mxGeometry x="440" y="100" width="120" height="60" as="geometry" />
</mxCell>
<mxCell id="ZCyPKg8dckpy5I3t4faN-10" value="TFfactory" style="text;html=1;align=center;verticalAlign=middle;whiteSpace=wrap;rounded=0;fontSize=21;" vertex="1" parent="1">
@@ -132,7 +132,7 @@
<mxCell id="JBEWLCwWRuB5Uu3qIstv-10" value="" style="rounded=1;whiteSpace=wrap;html=1;arcSize=8;" vertex="1" parent="JBEWLCwWRuB5Uu3qIstv-14">
<mxGeometry x="20" y="30" width="350" height="400" as="geometry" />
</mxCell>
<mxCell id="JBEWLCwWRuB5Uu3qIstv-13" value="process&lt;br&gt;excuter-cpu" style="ellipse;whiteSpace=wrap;html=1;aspect=fixed;fillColor=#fff2cc;strokeColor=#d6b656;" vertex="1" parent="JBEWLCwWRuB5Uu3qIstv-14">
<mxCell id="JBEWLCwWRuB5Uu3qIstv-13" value="process&lt;br&gt;executor-cpu" style="ellipse;whiteSpace=wrap;html=1;aspect=fixed;fillColor=#fff2cc;strokeColor=#d6b656;" vertex="1" parent="JBEWLCwWRuB5Uu3qIstv-14">
<mxGeometry width="80" height="80" as="geometry" />
</mxCell>
<mxCell id="JBEWLCwWRuB5Uu3qIstv-1" value="tensorfunc" style="swimlane;childLayout=stackLayout;horizontal=1;startSize=50;horizontalStack=0;rounded=1;fontSize=14;fontStyle=0;strokeWidth=2;resizeParent=0;resizeLast=1;shadow=0;dashed=0;align=center;arcSize=4;whiteSpace=wrap;html=1;fillColor=#dae8fc;strokeColor=#6c8ebf;" vertex="1" parent="1">
4 changes: 4 additions & 0 deletions docs/executor/deepx.op.drawio.svg
File renamed without changes
10 changes: 5 additions & 5 deletions doc/excuter/excuter.md → docs/executor/executor.md
@@ -1,4 +1,4 @@
## How to add a new operator to an excuter
## How to add a new operator to an executor

### Hierarchy diagram

@@ -14,7 +14,7 @@

#### Op

An Op is an excuter operator, the excuter's unit of execution
An Op is an executor operator, the executor's unit of execution

In the code, Op is a base class, and different Ops have different implementations, such as Add, Mul, MatMul, etc.
Every Op must override the forward and backward functions
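
To make the forward/backward contract concrete, here is a hedged C++ sketch of what such a subclass could look like. The class names and the raw-buffer members are assumptions made purely for illustration; they are not the real deepx Op API (see the executor sources for the actual signatures).

```cpp
#include <cstddef>
#include <vector>

// Hedged sketch of the pattern described above -- not the real deepx classes.
struct OpBase {
    virtual ~OpBase() = default;
    virtual void forward()  = 0;  // compute outputs from inputs
    virtual void backward() = 0;  // compute input gradients from output gradients
};

// Hypothetical element-wise Add operator over flat float buffers: c = a + b.
// Assumes all buffers are non-null and already sized identically.
struct AddOp : OpBase {
    const std::vector<float>* a = nullptr;
    const std::vector<float>* b = nullptr;
    std::vector<float>* c = nullptr;
    std::vector<float>* a_grad = nullptr;
    std::vector<float>* b_grad = nullptr;
    const std::vector<float>* c_grad = nullptr;

    void forward() override {
        for (std::size_t i = 0; i < a->size(); ++i)
            (*c)[i] = (*a)[i] + (*b)[i];
    }

    void backward() override {
        // d(a+b)/da = 1 and d(a+b)/db = 1, so the output gradient is
        // accumulated unchanged into both input gradients.
        for (std::size_t i = 0; i < c_grad->size(); ++i) {
            (*a_grad)[i] += (*c_grad)[i];
            (*b_grad)[i] += (*c_grad)[i];
        }
    }
};
```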
@@ -31,7 +31,7 @@ Matmul will choose a default implementation
git clone https://github.com/deepx-org/deepx.git

#### 1. CPU executor
cd deepx/excuter/op-mem-ompsimd
cd deepx/executor/op-mem-ompsimd

Dependencies must be installed beforehand
+ highway must be installed from source
Expand All @@ -43,7 +43,7 @@ make build && cd build && cmake .. && make


#### 2. CUDA executor
cd deepx/excuter/op-mem-cuda
cd deepx/executor/op-mem-cuda

Dependencies must be installed beforehand
+ cuda
@@ -60,7 +60,7 @@ todo

#### 4. Front-end integration test

1. First start the excuter executable, located in excuter/op-mem-{cuda/ompsimd}/build; the executable name is the same as the excuter name
1. First start the executor executable, located in executor/op-mem-{cuda/ompsimd}/build; the executable name is the same as the executor name
2. Then test the corresponding operator scripts of the Python front end (in the front/py/examples directory)

They can be tested in order, one by one
@@ -8,7 +8,7 @@ mix precision is a mixed-precision training method that uses 16-bit floating point and 8

In deep learning, models are usually trained with 32-bit floating point, which ensures model accuracy. However, 32-bit floats take up more GPU memory and take longer to compute. Therefore, to reduce memory usage and computation time, the mix precision training method can be used.

## 3. On the mix precision implementation in the excuter
## 3. On the mix precision implementation in the executor

For example:

@@ -1,6 +1,6 @@
## op-mem-cuda supported operator list

This page is generated by `excuter/op-mem-cuda; do not modify it manually
This page is generated by `executor/op-mem-cuda; do not modify it manually

### matmul

@@ -1,4 +1,4 @@
## excuter
## executor

### op-mem-ompsimd

@@ -1,6 +1,6 @@
## op-mem-ompsimd supported operator list

This page is generated by `excuter/op-mem-ompsimd; do not modify it manually
This page is generated by `executor/op-mem-ompsimd; do not modify it manually

### matmul

@@ -1,4 +1,4 @@
## excuter
## executor

### op-mem-ompsimd

@@ -8,9 +8,9 @@ The range function is a function of the shape class, used to perform omp over a tensor according to its shape

The definition and implementation are in:

excuter/common/src/deepx/shape.hpp
executor/common/src/deepx/shape.hpp

excuter/common/src/deepx/shape_range.cpp
executor/common/src/deepx/shape_range.cpp

| func | omp parallel | omp thread-local objects | call scenario |
| ---- | ---- | ------ | ---------- |
File renamed without changes.
File renamed without changes.
File renamed without changes
4 changes: 4 additions & 0 deletions docs/front/deepx.op.drawio.svg
File renamed without changes
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
8 changes: 4 additions & 4 deletions doc/index.rst → docs/index.rst
@@ -18,7 +18,7 @@ DeepX, a natively distributed and parallel unified deep learning training and inference framework
:caption: Documentation

front/py/deepx/about
excuter/op-mem-ompsimd/list
executor/op-mem-ompsimd/list
deepxIR/ir

.. toctree::
@@ -27,9 +27,9 @@

front/py/contribute
scheduler/scheduler
excuter/excuter
excuter/op-mem-ompsimd/contribute
excuter/op-mem-ompsimd/range
executor/executor
executor/op-mem-ompsimd/contribute
executor/op-mem-ompsimd/range

Index and search
==========
6 changes: 3 additions & 3 deletions doc/language.md → docs/language.md
@@ -1,4 +1,4 @@
## c++: compute executor (excuter)
## c++: compute executor (executor)

Responsible for implementing the concrete computation of tensors and interfacing with hardware such as GPU and CPU simd instructions

@@ -33,9 +33,9 @@ deepxctl: provides unified management of all tools, libraries, models, and images in the deepx ecosystem
## deepxIR
Although deepxIR is not a standalone programming language, it is the standard program format of the deepx ecosystem

What an excuter executes is a sequence of deepxir or a deepxir computation graph
What an executor executes is a sequence of deepxir or a deepxir computation graph

https://github.com/array2d/deepx/blob/main/doc/excuter/op-mem-cuda/list.md
https://github.com/array2d/deepx/blob/main/docs/executor/op-mem-cuda/list.md

deepxir falls into 4 categories
