diff --git a/.gitignore b/.gitignore index 5563689..65d1591 100644 --- a/.gitignore +++ b/.gitignore @@ -27,3 +27,5 @@ yolov4_training/yolov4.conv.137 yolov4_training/build_docker.sh yolov4_training/dockerfile_tmp yolov4_training/yolov4.conv.137 +det-demo-tmi/voc_dog +site/ diff --git a/.readthedocs.yaml b/.readthedocs.yaml new file mode 100644 index 0000000..e2645f9 --- /dev/null +++ b/.readthedocs.yaml @@ -0,0 +1,20 @@ +# .readthedocs.yaml +# Read the Docs configuration file +# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details + +# Required +version: 2 + +# Set the version of Python and other tools you might need +build: + os: ubuntu-22.04 + tools: + python: "3.11" + +mkdocs: + configuration: mkdocs.yml + +# Optionally declare the Python requirements required to build your docs +python: + install: + - requirements: docs/requirements.txt diff --git a/README.MD b/README.MD deleted file mode 100644 index bcba683..0000000 --- a/README.MD +++ /dev/null @@ -1,105 +0,0 @@ -# ymir-executor 使用文档 - -## det-yolov4-training - -- yolov4的训练镜像,采用mxnet与darknet框架,默认cuda版本为`10.1`,无法直接在高版本显卡如GTX3080/GTX3090上运行,需要修改dockerfile将cuda版本提升为11.1以上,并修改其它依赖。 - -## det-yolov4-mining - -- yolov4挖掘与推理镜像,与det-yolov4-training对应 - -## det-yolov5-tmi - -- yolov5训练、挖掘及推理镜像,训练时会从github上下载权重 - -- yolov5-FAQ - - - 镜像训练时权重下载出错或慢:提前将权重下载好并复制到镜像`/app`目录下或通过ymir导入预训练模型,在训练时进行加载。 - -## live-code-executor - -- 可以通过`git_url`, `git_branch`从网上clone代码到镜像并运行 - -- 参考 [live-code](https://github.com/IndustryEssentials/ymir-remote-git) - -## det-mmdetection-tmi - -- mmdetection 训练、挖掘及推理镜像,目前还没开发完 - - -## 如何制作自己的ymir-executor - -- [ymir-executor 制作指南](https://github.com/IndustryEssentials/ymir/blob/dev/docs/ymir-dataset-zh-CN.md) - -## 如何导入预训练模型 - -- [如何导入外部模型](https://github.com/IndustryEssentials/ymir/blob/dev/docs/import-extra-models.md) - - - 通过ymir网页端的 `模型管理/模型列表/导入模型` 同样可以导入模型 - ---- - -# FAQ - -- apt 或 pip 安装慢或出错 - - - 采用国内源,如在docker file 中添加如下命令 - - ``` - RUN sed -i 's/archive.ubuntu.com/mirrors.tuna.tsinghua.edu.cn/g' /etc/apt/sources.list - - RUN pip config set global.index-url https://mirrors.aliyun.com/pypi/simple - ``` - -- docker build 的时候出错,找不到相应docker file或`COPY/ADD`时出错 - - - 回到项目根目录或docker file对应根目录,确保docker file 中`COPY/ADD`的文件与文件夹能够访问,以yolov5为例. - - ``` - cd ymir-executor - - docker build -t ymir-executor/yolov5 . -f det-yolov5-tmi/cuda111.dockerfile - ``` - -- 镜像运行完`/in`与`/out`目录中的文件被清理 - - - ymir系统为节省空间,会在任务`成功结束`后删除其中不必要的文件,如果不想删除,可以在部署ymir时,修改文件`ymir/command/mir/tools/command_run_in_out.py`,注释其中的`_cleanup(work_dir=work_dir)`。注意需要重新构建后端镜像 - - ``` - cd ymir - docker build -t industryessentials/ymir-backend --build-arg PIP_SOURCE=https://pypi.mirrors.ustc.edu.cn/simple --build-arg SERVER_MODE='dev' -f Dockerfile.backend . 
-    docker-compose down -v && docker-compose up -d
-    ```
-
-- 训练镜像如何调试
-
-    - 先通过失败任务的tensorboard链接拿到任务id,如`t000000100000175245d1656933456`
-
-    - 进入ymir部署目录 `ymir-workplace/sandbox/work_dir/TaskTypeTraining/t000000100000175245d1656933456/sub_task/t000000100000175245d1656933456`, `ls` 可以看到以下结果
-
-    ```
-    # ls
-    in out task_config.yaml
-    ```
-
-    - 挂载目录并运行镜像``,注意需要将ymir部署目录挂载到镜像中
-
-    ```
-    docker run -it --gpus all -v $PWD/in:/in -v $PWD/out:/out -v : bash
-
-    # 以/home/ymir/ymir-workplace作为ymir部署目录为例
-    docker run -it --gpus all -v $PWD/in:/in -v $PWD/out:/out -v /home/ymir/ymir-workplace:/home/ymir/ymir-workplace bash
-    ```
-
-    - 推理与挖掘镜像调试同理,注意对应目录均为`ymir-workplace/sandbox/work_dir/TaskTypeMining`
-
-- 模型精度/速度如何权衡与提升
-
-    - 模型精度与数据集大小、数据集质量、学习率、batch size、迭代次数、模型结构、数据增强方式、损失函数等相关,在此不做展开,详情参考:
-
-    - [Object Detection in 20 Years: A Survey](https://arxiv.org/abs/1905.05055)
-
-    - [Paper with Code: Object Detection](https://paperswithcode.com/task/object-detection)
-
-    - [awesome object detection](https://github.com/amusi/awesome-object-detection)
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..5032783
--- /dev/null
+++ b/README.md
@@ -0,0 +1,80 @@
+# ymir-executor documentation [English](./README.md) | [简体中文](./README_zh-CN.md)
+
+- 🏠 [ymir](https://github.com/IndustryEssentials/ymir)
+
+- 📺 [video tutorial](https://b23.tv/KS5b5oF)
+
+- 👨‍👩‍👧‍👧 [Image Community](http://pubimg.vesionbook.com:8110/img) search and share open source ymir images.
+
+- 📘 [Documentation](https://ymir-executor-fork.readthedocs.io/zh/latest/#)
+
+## overview
+
+| docker image | [finetune](https://github.com/modelai/ymir-executor-fork/wiki/use-yolov5-to-finetune-or-training-model) | tensorboard | args/cfg options | framework | onnx | pretrained weights |
+| - | - | - | - | - | - | - |
+| yolov4 | ? | ✔️ | ❌ | darknet + mxnet | ❌ | local |
+| yolov5 | ✔️ | ✔️ | ✔️ | pytorch | ✔️ | local+online |
+| yolov7 | ✔️ | ✔️ | ✔️ | pytorch | ❌ | local+online |
+| mmdetection | ✔️ | ✔️ | ✔️ | pytorch | ❌ | local+online |
+| detectron2 | ✔️ | ✔️ | ✔️ | pytorch | ❌ | online |
+| vidt | ?
| ✔️ | ✔️ | pytorch | ❌ | online |
+| nanodet | ✔️ | ✔️ | ❌ | pytorch_lightning | ❌ | local+online |
+
+- `online` pretrained weights may be downloaded over the network at training time
+
+- `local` pretrained weights are copied into the docker image when the image is built
+
+### benchmark
+
+- training dataset: voc2012-train 5717 images
+- validation dataset: voc2012-val 5823 images
+- image size: 640
+
+gpu: single Tesla P4
+
+| docker image | batch size | epoch number | model | voc2012 val map50 | training time | note |
+| - | - | - | - | - | - | - |
+| yolov5 | 16 | 100 | yolov5s | 70.05% | 9h | coco-pretrained |
+| vidt | 2 | 100 | swin-nano | 54.13% | 2d | imagenet-pretrained |
+| yolov4 | 4 | 20000 steps | yolov4 | 66.18% | 2d | imagenet-pretrained |
+| yolov7 | 16 | 100 | yolov7-tiny | 70% | 8h | coco-pretrained |
+
+gpu: single GeForce GTX 1080 Ti
+
+| docker image | image size | batch size | epoch number | model | voc2012 val map50 | training time | note |
+| - | - | - | - | - | - | - | - |
+| yolov4 | 608 | 64/32 | 20000 steps | yolov4 | 72.73% | 6h | imagenet-pretrained |
+| yolov5 | 640 | 16 | 100 | yolov5s | 70.35% | 2h | coco-pretrained |
+| yolov7 | 640 | 16 | 100 | yolov7-tiny | 70.4% | 5h | coco-pretrained |
+| mmdetection | 640 | 16 | 100 | yolox_tiny | 66.2% | 5h | coco-pretrained |
+| detectron2 | 640 | 2 | 20000 steps | retinanet_R_50_FPN_1x | 53.54% | 2h | imagenet-pretrained |
+| nanodet | 416 | 16 | 100 | nanodet-plus-m_416 | 58.63% | 5h | imagenet-pretrained |
+
+---
+
+## how to import pretrained model weights
+
+- [import and finetune model](https://github.com/modelai/ymir-executor-fork/wiki/import-and-finetune-model)
+
+- [import pretrained model weights](https://github.com/IndustryEssentials/ymir/blob/master/dev_docs/import-extra-models.md)
+
+## reference
+
+### object detection
+- [ymir-yolov5](https://github.com/modelai/ymir-yolov5)
+- [ymir-yolov7](https://github.com/modelai/ymir-yolov7)
+- [ymir-nanodet](https://github.com/modelai/ymir-nanodet)
+- [ymir-mmyolo](https://github.com/modelai/ymir-mmyolo)
+- [ymir-vidt](https://github.com/modelai/ymir-vidt)
+- [ymir-detectron2](https://github.com/modelai/ymir-detectron2)
+
+### semantic segmentation
+- [ymir-mmsegmentation](https://github.com/modelai/ymir-mmsegmentation)
+
+### instance segmentation
+- [ymir-yolov5-seg](https://github.com/modelai/ymir-yolov5-seg)
+
+### resources
+- [ymir-executor-sdk](https://github.com/modelai/ymir-executor-sdk) the ymir_exc package, which helps you develop your own image
+- [ymir-executor-verifier](https://github.com/modelai/ymir-executor-verifier) test your ymir image
+- [ymir-flask](https://github.com/modelai/ymir-flask) deploy your model as a web service
diff --git a/README_zh-CN.md b/README_zh-CN.md
new file mode 100644
index 0000000..2a02159
--- /dev/null
+++ b/README_zh-CN.md
@@ -0,0 +1,84 @@
+# ymir-executor 使用文档 [English](./README.md) | [简体中文](./README_zh-CN.md)
+
+- 🏠 [ymir](https://github.com/IndustryEssentials/ymir)
+
+- 📺 [视频教程](https://b23.tv/KS5b5oF)
+
+- 👨‍👩‍👧‍👧 [镜像社区](http://pubimg.vesionbook.com:8110/img) 可搜索到所有公开的ymir算法镜像, 同时可共享其他人发布的镜像。
+
+- 📘 [文档](https://ymir-executor-fork.readthedocs.io/zh/latest/#)
+
+## 比较
+
+| docker image | [finetune](https://github.com/modelai/ymir-executor-fork/wiki/use-yolov5-to-finetune-or-training-model) | tensorboard | args/cfg options | framework | onnx | pretrained weights |
+| - | - | - | - | - | - | - |
+| yolov4 | ?
| ✔️ | ❌ | darknet + mxnet | ❌ | local |
+| yolov5 | ✔️ | ✔️ | ✔️ | pytorch | ✔️ | local+online |
+| yolov7 | ✔️ | ✔️ | ✔️ | pytorch | ❌ | local+online |
+| mmdetection | ✔️ | ✔️ | ✔️ | pytorch | ❌ | local+online |
+| detectron2 | ✔️ | ✔️ | ✔️ | pytorch | ❌ | online |
+| vidt | ? | ✔️ | ✔️ | pytorch | ❌ | online |
+| nanodet | ✔️ | ✔️ | ❌ | pytorch_lightning | ❌ | local+online |
+
+- `online` 预训练权重可能在训练时通过网络下载
+
+- `local` 预训练权重在构建镜像时复制到了镜像
+
+### benchmark
+
+- 训练集: voc2012-train 5717 images
+- 测试集: voc2012-val 5823 images
+- 图像大小: 640 (nanodet为416, yolov4为608)
+
+**由于 coco 数据集包含 voc 数据集中的类, 因此这个对比并不公平, 仅供参考**
+
+gpu: single Tesla P4
+
+| docker image | batch size | epoch number | model | voc2012 val map50 | training time | note |
+| - | - | - | - | - | - | - |
+| yolov5 | 16 | 100 | yolov5s | 70.05% | 9h | coco-pretrained |
+| vidt | 2 | 100 | swin-nano | 54.13% | 2d | imagenet-pretrained |
+| yolov4 | 4 | 20000 steps | yolov4 | 66.18% | 2d | imagenet-pretrained |
+| yolov7 | 16 | 100 | yolov7-tiny | 70% | 8h | coco-pretrained |
+
+gpu: single GeForce GTX 1080 Ti
+
+| docker image | image size | batch size | epoch number | model | voc2012 val map50 | training time | note |
+| - | - | - | - | - | - | - | - |
+| yolov4 | 608 | 64/32 | 20000 steps | yolov4 | 72.73% | 6h | imagenet-pretrained |
+| yolov5 | 640 | 16 | 100 | yolov5s | 70.35% | 2h | coco-pretrained |
+| yolov7 | 640 | 16 | 100 | yolov7-tiny | 70.4% | 5h | coco-pretrained |
+| mmdetection | 640 | 16 | 100 | yolox_tiny | 66.2% | 5h | coco-pretrained |
+| detectron2 | 640 | 2 | 20000 steps | retinanet_R_50_FPN_1x | 53.54% | 2h | imagenet-pretrained |
+| nanodet | 416 | 16 | 100 | nanodet-plus-m_416 | 58.63% | 5h | imagenet-pretrained |
+
+---
+
+## 如何导入预训练模型
+
+- [如何导入并精调外部模型](https://github.com/modelai/ymir-executor-fork/wiki/import-and-finetune-model)
+
+- [如何导入外部模型](https://github.com/IndustryEssentials/ymir/blob/master/dev_docs/import-extra-models.md)
+
+    - 通过ymir网页端的 `模型管理/模型列表/导入模型` 同样可以导入模型
+
+## 参考
+
+### 目标检测
+- [ymir-yolov5](https://github.com/modelai/ymir-yolov5)
+- [ymir-yolov7](https://github.com/modelai/ymir-yolov7)
+- [ymir-nanodet](https://github.com/modelai/ymir-nanodet)
+- [ymir-mmyolo](https://github.com/modelai/ymir-mmyolo)
+- [ymir-vidt](https://github.com/modelai/ymir-vidt)
+- [ymir-detectron2](https://github.com/modelai/ymir-detectron2)
+
+### 语义分割
+- [ymir-mmsegmentation](https://github.com/modelai/ymir-mmsegmentation)
+
+### 实例分割
+- [ymir-yolov5-seg](https://github.com/modelai/ymir-yolov5-seg)
+
+### 资源
+- [ymir-executor-sdk](https://github.com/modelai/ymir-executor-sdk) ymir_exc 包,辅助开发镜像
+- [ymir-executor-verifier](https://github.com/modelai/ymir-executor-verifier) 测试镜像工具
+- [ymir-flask](https://github.com/modelai/ymir-flask) 云端部署示例
diff --git a/det-demo-tmi/Dockerfile b/det-demo-tmi/Dockerfile
new file mode 100644
index 0000000..dae7ae0
--- /dev/null
+++ b/det-demo-tmi/Dockerfile
@@ -0,0 +1,30 @@
+# a dockerfile for a sample training / mining / infer executor
+
+FROM python:3.8.13-alpine
+
+RUN sed -i 's/dl-cdn.alpinelinux.org/mirrors.tuna.tsinghua.edu.cn/g' /etc/apk/repositories
+# Add bash
+RUN apk add bash
+# Required to build numpy wheel
+RUN apk add g++ git make
+
+COPY requirements.txt ./
+RUN pip3 install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple
+
+WORKDIR /app
+# copy user code to WORKDIR
+COPY ./app/start.py /app/
+
+# copy user config template and manifest.yaml to /img-man
+RUN mkdir -p /img-man
+COPY img-man/*.yaml /img-man/
+
+# view
https://github.com/protocolbuffers/protobuf/issues/10051 for detail
+ENV PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python
+
+# entry point for your app
+# the whole docker image will be started with `nvidia-docker run `
+# and this command will run automatically
+
+RUN echo "python /app/start.py" > /usr/bin/start.sh
+CMD bash /usr/bin/start.sh
diff --git a/det-demo-tmi/README.md b/det-demo-tmi/README.md
new file mode 100644
index 0000000..36853ec
--- /dev/null
+++ b/det-demo-tmi/README.md
@@ -0,0 +1,282 @@
+# ymir 用户自定义镜像制作指南
+
+!!! 最新文档参考 https://ymir-executor-fork.readthedocs.io/zh/latest/object_detection/simple_det_training/
+
+此文档编写于 ymir1.3.0;现 ymir 最新版为 ymir2.1.0,相应代码中的接口也已更新到 ymir2.1.0,需要安装相应版本的 sdk。
+
+```
+pip install "git+https://github.com/modelai/ymir-executor-sdk.git@ymir2.1.0"
+```
+
+## 目的
+
+此文档面向以下人员:
+
+* 为 ymir 开发训练,挖掘及推理镜像的算法人员及工程人员
+
+* 希望将已经有的训练,挖掘及推理镜像对接到 ymir 系统的算法及工程人员
+
+此文档将详细描述如何使用 ymir executor framework 开发新的镜像。
+
+![](../docs/ymir-docker-develop.drawio.png)
+
+## 准备工作
+
+1. 下载 ymir 工程并构建自己的 demo 镜像:
+
+```
+git clone https://github.com/modelai/ymir-executor-fork -b ymir-dev
+cd ymir-executor-fork/det-demo-tmi
+
+docker build -t ymir/executor:det-demo-tmi .
+```
+
+2. 下载 voc dog 数据集
+
+```
+sudo apt install wget unzip
+
+wget https://github.com/modelai/ymir-executor-fork/releases/download/dataset/voc_dog_debug_sample.zip -O voc_dog_debug_sample.zip
+
+unzip voc_dog_debug_sample.zip
+```
+
+运行上述脚本将得到如下目录
+
+```
+voc_dog
+├── in  # 输入目录
+│   ├── annotations  # 标注文件目录
+│   ├── assets  # 图像文件目录
+│   ├── train-index.tsv  # 训练集索引文件
+│   └── val-index.tsv  # 验证集索引文件
+└── out  # 输出目录
+```
+
+3. 配置 `/in/env.yaml` 与 `/in/config.yaml`(镜像内的读取方式见本节末尾的示例)
+
+    * 示例 `voc_dog/in/env.yaml`
+
+    * protocol_version: ymir1.3.0之后添加的字段,说明ymir接口版本
+
+    ```
+    task_id: task0
+    protocol_version: 1.0.0
+    run_training: True
+    run_mining: False
+    run_infer: False
+    input:
+        root_dir: /in
+        assets_dir: /in/assets
+        annotations_dir: /in/annotations
+        models_dir: /in/models
+        training_index_file: /in/train-index.tsv
+        val_index_file: /in/val-index.tsv
+        candidate_index_file: /in/candidate-index.tsv
+        config_file: /in/config.yaml
+    output:
+        root_dir: /out
+        models_dir: /out/models
+        tensorboard_dir: /out/tensorboard
+        training_result_file: /out/models/result.yaml
+        mining_result_file: /out/result.tsv
+        infer_result_file: /out/infer-result.json
+        monitor_file: /out/monitor.txt
+        executor_log_file: /out/ymir-executor-out.log
+    ```
+
+    * 示例 `voc_dog/in/config.yaml`
+
+    ```
+    class_names:
+    - dog
+    export_format: ark:raw
+    gpu_count: 1
+    # gpu_id: '0,1,2,3'
+    gpu_id: '0'
+    pretrained_model_params: []
+    shm_size: 128G
+    task_id: t00000020000020167c11661328921
+
+    # just for test, remove this key in your own docker image
+    expected_map: 0.983  # expected map for training task
+    idle_seconds: 60  # idle seconds for each task
+    ```
+
+4. 运行测试镜像
+
+```
+# 交互式运行
+docker run -it --rm -v $PWD/voc_dog/in:/in -v $PWD/voc_dog/out:/out ymir/executor:det-demo-tmi bash
+> bash /usr/bin/start.sh
+
+# 直接运行
+docker run --rm -v $PWD/voc_dog/in:/in -v $PWD/voc_dog/out:/out ymir/executor:det-demo-tmi
+```
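+下面是一个示意性片段(假设已按上文安装 ymir2.1.0 版本的 ymir_exc),展示镜像内如何读取上述两个配置文件:`get_merged_config()` 会把 `/in/env.yaml` 读入 `cfg.ymir`,把 `/in/config.yaml` 读入 `cfg.param`:
+
+```
+from ymir_exc.util import get_merged_config
+
+cfg = get_merged_config()
+
+# cfg.ymir 来自 /in/env.yaml
+print(cfg.ymir.task_id, cfg.ymir.run_training)
+print(cfg.ymir.input.training_index_file)
+
+# cfg.param 来自 /in/config.yaml
+print(cfg.param.class_names, cfg.param.gpu_id)
+```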
+## ymir 对镜像的调用流程
+
+ymir 通过 mir train / mir mining / mir infer 命令启动镜像,遵循以下步骤:
+
+1. 导出镜像需要用的图像资源以及标注资源文件
+
+2. 准备镜像配置 config.yaml 及 env.yaml
+
+3. 通过 nvidia-docker run 激活镜像,在启动镜像时,将提供以下目录及文件:
+
+| 目录或文件 | 说明 | 权限 |
+| --- | --- | --- |
+| `/in/env.yaml` | 任务类型,任务 id,数据集索引文件位置等信息 | 只读 |
+| `/in/config.yaml` | 镜像本身所用到的超参等配置信息 | 只读 |
+| `/in/*-index.tsv` | 数据集索引文件 | 只读 |
+| `/in/models` | 预训练模型存放目录 | 只读 |
+| `/in/assets` | 图像资源存放目录 | 只读 |
+| `/in/annotations` | 标注文件存放目录 | 只读 |
+| `/out/tensorboard` | tensorboard 日志写入目录 | 读写 |
+| `/out/models` | 结果模型保存目录 | 读写 |
+
+4. 镜像启动以后,完成自己的训练、挖掘或推理任务,将相应结果写入对应文件,若成功,则返回 0,若失败,则返回非 0 错误码
+
+5. ymir 将正确结果或异常结果归档,完成整个过程
+
+## 训练、挖掘与推理镜像的开发工具包 ymir_exc
+
+`app/start.py` 展示了一个简单的镜像执行部分,此文档也将基于这个样例工程来说明如何使用 `ymir_exc` 来开发镜像。
+
+关于这个文件,有以下部分值得注意:
+
+1. 在 Dockerfile 中,最后一条命令说明了:当此镜像被 ymir 系统通过 nvidia-docker run 启动时,默认执行的是 `bash /usr/bin/start.sh`,即调用 `python /app/start.py` 命令,也就是此工程中的 `app/start.py` 文件
+
+2. 镜像框架相关的所有内容都在 `ymir_exc` 包中,包括以下部分:
+
+    安装方式为 `pip install "git+https://github.com/modelai/ymir-executor-sdk.git@ymir2.1.0"`;注意通过 ~~`pip install ymir_exc`~~ 方式安装的版本不具有 `ymir_exc.util` 包,前者在后者的代码基础上进行了扩展,提供了更多的功能(如 `ymir_exc.util`)。
+
+    * `env`:环境,提供任务类型,任务 id 等信息
+
+    * `dataset_reader`:使用数据集读取器来取得数据集信息
+
+    * `result_writer`:写入训练,挖掘以及推理结果
+
+    * `monitor`:写入进度信息
+
+    * `util`:常用函数,如 `get_merged_config()`
+
+3. 使用 `cfg = util.get_merged_config()` 可以取得默认的 `EasyDict` 实例,这个实例的 `cfg.ymir` 来源于文件 `/in/env.yaml`;如果出于测试的目的想要更改这个默认文件,可以直接更改 `settings.DEFAULT_ENV_FILE_PATH`,但在实际封装成镜像的时候,应该把它的值重新指回默认的 `/in/env.yaml`。`cfg.param` 则来源于 `/in/config.yaml`
+
+4. 在 `start()` 方法中,通过 `cfg.ymir` 中的 `run_training` / `run_mining` / `run_infer` 来判断本次需要执行的任务类型。如果任务类型是本镜像不支持的,可以直接报错
+
+5. 虽然 `app/start.py` 展示的是一个训练,挖掘和推理多合一的镜像,开发者也可以分成若干个独立的镜像,例如,训练一个,挖掘和推理合成一个。实际应用中,镜像可以同时运行推理和挖掘这两个任务,注意其进度与单独运行时不同。
+
+    * 单独运行时,推理或者挖掘的进度值 `percent` 在 [0, 1] 区间,并通过 `monitor.write_monitor_logger(percent)` 记录在 `/out/monitor.txt` 中。
+
+    * 同时运行时,假设先进行挖掘任务,那么挖掘的进度值在 [0, 0.5] 区间,推理的进度值在 [0.5, 1] 区间。
+
+## 训练过程
+
+`app/start.py` 中的函数 `_run_training` 展示了一个训练功能的样例,有以下部分需要注意:
+
+1. 超参的取得
+
+    * 使用 `cfg.param` 取得外部传入的超参数等信息
+
+    * 每个训练镜像都应该准备一个超参模板 `training-template.yaml`,ymir 系统将以此模板为基础提供超参
+
+    * 以下 key 为保留字,将由系统指定:
+
+| key | 类型 | 说明 |
+| --- | --- | --- |
+| class_names | list | 类别 |
+| gpu_id | str | 可使用的 gpu id,以英文逗号分隔,如果为空,则表示用 cpu 训练 |
+| pretrained_model_params | list | 预训练模型列表,如果指定了,则表示需要基于此模型做继续训练 |
+
+2. 训练集和验证集的取得:使用 `cfg.ymir.input.training_index_file` 和 `cfg.ymir.input.val_index_file` 取得训练集和验证集的索引文件。索引文件中每一行为图像绝对路径与标注绝对路径,以 `\t` 进行分隔。
+
+```
+from ymir_exc.util import get_merged_config
+
+cfg = get_merged_config()
+with open(cfg.ymir.input.training_index_file, 'r') as fp:
+    lines = fp.readlines()
+
+for idx, line in enumerate(lines):
+    image_path, annotation_path = line.strip().split()
+    ...
+```
+
+3. 模型的保存(完整的串联示例见本节末尾)
+
+    * 模型按当前正在进行的 stage name,分目录保存
+
+    * 在 `cfg.ymir.output.models_dir` 中提供了模型的保存目录,用户可以使用 pytorch, mxnet, darknet 等训练框架自带的保存方法将模型保存在此目录下的以当前 stage_name 命名的子目录中
+
+    * 例如,如果需要保存 stage_name 为 'epoch-5000' 的模型,则需要把这些模型文件保存到 `os.path.join(cfg.ymir.output.models_dir, 'epoch-5000')` 目录下
+
+    * 推荐使用 `util.write_ymir_training_result()` 方法保存训练结果(不带目录的模型名称列表,mAP 等),它对 `result_writer.write_model_stage()` 进行了封装,兼容性与容错性更好。
+
+    * 需要保存的模型实际记录在 `cfg.ymir.output.training_result_file` 中,ymir 将依据此文件进行文件打包,供用户下载、迭代训练及推理挖掘。
+
+4. 进度的记录:使用 `monitor.write_monitor_logger(percent)` 方法记录任务当前的进度;实际使用时,可以每隔若干轮迭代,根据当前迭代次数和总迭代次数来估算当前进度(一个 0 到 1 之间的数),调用此方法记录
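+下面是一个示意性片段(基于上文的 `ymir_exc` 接口;stage 名称、文件名与 mAP 数值均为演示用的假设值),串联「保存模型 → 写训练结果 → 上报进度」:
+
+```
+import os
+
+from ymir_exc import monitor
+from ymir_exc import result_writer as rw
+from ymir_exc.util import get_merged_config
+
+cfg = get_merged_config()
+
+stage_name = 'epoch-5000'  # 演示用的 stage 名称
+stage_dir = os.path.join(cfg.ymir.output.models_dir, stage_name)
+os.makedirs(stage_dir, exist_ok=True)
+# ... 此处用训练框架自带的方法把权重文件保存到 stage_dir ...
+
+rw.write_model_stage(stage_name=stage_name,
+                     files=['best.pt', 'config.py'],  # 不带目录的文件名列表,演示值
+                     evaluation_result=dict(mAP=0.65))  # 演示用的 mAP
+
+monitor.write_monitor_logger(percent=0.5)  # 进度为 0 到 1 之间的数
+```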
+## 挖掘过程
+
+所谓挖掘过程指的是:提供一个基础模型,以及一个不带标注的候选数据集,在此候选数据集上运行 active learning 算法,得到每张图片的得分,并将这个得分结果保存。
+
+`app/start.py` 中的函数 `_run_mining` 展示了一个数据挖掘过程的样例,有以下部分需要注意:
+
+1. 参数的取得
+
+    * 使用 `cfg = get_merged_config()` 取得外部传入的参数 `cfg.param`
+
+    * 每个挖掘镜像都应该准备一个参数模板 `mining-template.yaml`,ymir 系统将以此模板为基础提供参数
+
+    * 以下 key 为保留字,将由系统指定:
+
+| key | 类型 | 说明 |
+| --- | --- | --- |
+| class_names | list | 类别 |
+| gpu_id | str | 可使用的 gpu id,以英文逗号分隔,如果为空,则表示用 cpu 训练 |
+| model_params_path | list | 模型路径列表,镜像应该从里面选择自己可以使用的模型,如果有多个模型可以使用,直接报错 |
+
+2. 候选集的取得
+
+    * 进行挖掘任务时,所使用的数据集是一个没有带标注的候选集,可以使用 `cfg.ymir.input.candidate_index_file` 取得挖掘数据集的索引文件,这个文件中每一行为图片的绝对路径。
+
+    ```
+    with open(cfg.ymir.input.candidate_index_file, 'r') as fp:
+        lines = fp.readlines()
+
+    for line in lines:
+        image_path = line.strip()
+        ...
+    ```
+
+3. 结果的保存
+
+    * 使用 `result_writer.write_mining_result()` 对挖掘结果进行保存,结果将保存到 `cfg.ymir.output.mining_result_file`,ymir 将依据这个文件进行新数据集生成,见下面的示意代码。
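+一个最小示意(得分此处用演示用的常数代替,实际应由 active learning 算法给出):
+
+```
+from ymir_exc import result_writer as rw
+from ymir_exc.util import get_merged_config
+
+cfg = get_merged_config()
+with open(cfg.ymir.input.candidate_index_file, 'r') as fp:
+    image_paths = [line.strip() for line in fp.readlines()]
+
+# (图片路径, 得分) 列表,得分仅为演示
+mining_result = [(p, 0.5) for p in image_paths]
+rw.write_mining_result(mining_result=mining_result)
+```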
+## 推理过程
+
+所谓推理过程指的是:提供一个基础模型,以及一个不带标注的候选数据集,在此候选数据集上进行模型推理,得到每张图片的 detection 结果(框,类别,得分),并保存此结果。
+
+`app/start.py` 中的函数 `_run_infer` 展示了一个推理过程的样例,有以下部分需要注意:
+
+1. 参数的取得:同数据挖掘过程
+
+2. 候选集的取得:同数据挖掘过程,也是利用文件 `cfg.ymir.input.candidate_index_file`
+
+3. 结果的保存
+
+    * 推理结果本身是一个 dict,key 是候选集图片的路径,value 是一个由 `result_writer.Annotation` 构成的 list
+
+    * 使用 `result_writer.write_infer_result()` 保存推理结果,推理结果将保存到 `cfg.ymir.output.infer_result_file`,ymir 将依据这个文件进行结果展示与新数据集生成。
+
+## 镜像打包
+
+可以在 `Dockerfile` 的基础上构建自己的打包脚本
+
+## 测试
+
+可以使用以下几种方式进行测试:
+
+1. 通过 [ymir-executor-verifier](https://github.com/modelai/ymir-executor-verifier) 进行测试
+
+2. 通过 ymir web 系统进行测试
+
+3. 通过 ymir 命令行启动 mir train / mir mining / mir infer 命令进行测试
diff --git a/det-demo-tmi/app/start.py b/det-demo-tmi/app/start.py
new file mode 100644
index 0000000..b9d17a7
--- /dev/null
+++ b/det-demo-tmi/app/start.py
@@ -0,0 +1,222 @@
+import logging
+import os
+import random
+import sys
+import time
+from typing import List
+
+from easydict import EasyDict as edict
+from tensorboardX import SummaryWriter
+from ymir_exc import monitor
+from ymir_exc import result_writer as rw
+from ymir_exc.util import get_merged_config
+
+
+def start() -> int:
+    cfg = get_merged_config()
+
+    if cfg.ymir.run_training:
+        _run_training(cfg)
+    if cfg.ymir.run_mining:
+        _run_mining(cfg)
+    if cfg.ymir.run_infer:
+        _run_infer(cfg)
+
+    return 0
+
+
+def _run_training(cfg: edict) -> None:
+    """
+    sample function of training, which shows:
+    1. how to get the config
+    2. how to read training and validation datasets
+    3. how to write logs
+    4. how to write the training result
+    """
+    # use `cfg.param` to get the hyper-parameters for training
+    gpu_id: str = cfg.param.get('gpu_id')
+    class_names: List[str] = cfg.param.get('class_names')
+    expected_mAP: float = cfg.param.get('expected_map')
+    idle_seconds: float = cfg.param.get('idle_seconds')
+    trigger_crash: bool = cfg.param.get('trigger_crash')
+    # use `logging` or `print` to write log to console
+    # note that logging.basicConfig is invoked in the __main__ entry below
+    logging.info(f'gpu device: {gpu_id}')
+    logging.info(f'dataset class names: {class_names}')
+    logging.info(f"training config: {cfg.param}")
+
+    # count the image and annotation files
+    with open(cfg.ymir.input.training_index_file, 'r') as fp:
+        lines = fp.readlines()
+
+    valid_image_count = 0
+    valid_ann_count = 0
+
+    N = len(lines)
+    monitor_gap = max(1, N // 100)
+    for idx, line in enumerate(lines):
+        asset_path, annotation_path = line.strip().split()
+        if os.path.isfile(asset_path):
+            valid_image_count += 1
+
+        if os.path.isfile(annotation_path):
+            valid_ann_count += 1
+
+        # use `monitor.write_monitor_logger` to write the task progress percent to monitor.txt
+        if idx % monitor_gap == 0:
+            monitor.write_monitor_logger(percent=0.2 * idx / N)
+
+    logging.info(f'total image-ann pair: {N}')
+    logging.info(f'valid images: {valid_image_count}')
+    logging.info(f'valid annotations: {valid_ann_count}')
+
+    # use `monitor.write_monitor_logger` to write the task progress percent to monitor.txt
+    monitor.write_monitor_logger(percent=0.2)
+
+    # suppose we have a long time training, and have saved the final model
+    models_dir = cfg.ymir.output.models_dir
+    os.makedirs(models_dir, exist_ok=True)
+    with open(os.path.join(models_dir, 'epoch10.pt'), 'w') as f:
+        f.write('fake model weight')
+    with open(os.path.join(models_dir, 'config.py'), 'w') as f:
+        f.write('fake model config file')
+    # use `rw.write_model_stage` to save the training result
+    rw.write_model_stage(stage_name='epoch10',
+                         files=['epoch10.pt', 'config.py'],
+                         evaluation_result=dict(mAP=random.random() / 2))
+
+    _dummy_work(idle_seconds=idle_seconds, trigger_crash=trigger_crash)
+
+    write_tensorboard_log(cfg.ymir.output.tensorboard_dir)
+
+    with open(os.path.join(models_dir, 'epoch20.pt'), 'w') as f:
+        f.write('fake model weight')
+    with open(os.path.join(models_dir, 'config.py'), 'w') as f:
+        f.write('fake model config file')
+    rw.write_model_stage(stage_name='epoch20',
+                         files=['epoch20.pt', 'config.py'],
+                         evaluation_result=dict(mAP=expected_mAP))
+
+    # the task is done, write 100% percent log
+    logging.info('training done')
+    monitor.write_monitor_logger(percent=1.0)
+
+
+def _run_mining(cfg: edict) -> None:
+    # use `cfg.param` to get the hyper-parameters for mining
+    # pretrained models are in `cfg.ymir.input.models_dir`
+    gpu_id: str = cfg.param.get('gpu_id')
+    class_names: List[str] = cfg.param.get('class_names')
+    idle_seconds: float = cfg.param.get('idle_seconds', 60)
+    trigger_crash: bool = cfg.param.get('trigger_crash', False)
+    # use `logging` or `print` to write log to console
+    logging.info(f"mining config: {cfg.param}")
+    logging.info(f'gpu device: {gpu_id}')
+    logging.info(f'dataset class names: {class_names}')
+
+    # use `cfg.ymir.input.candidate_index_file` to read candidate dataset items
+    # note that the annotation path will be an empty str if the dataset has no annotations
+    # count the image files
+    with open(cfg.ymir.input.candidate_index_file, 'r') as fp:
+        lines = fp.readlines()
+
+    valid_images = []
+    valid_image_count = 0
+    for line in lines:
+        if
os.path.isfile(line.strip()): + valid_image_count += 1 + valid_images.append(line.strip()) + + # use `monitor.write_monitor_logger` to write task process to monitor.txt + logging.info(f"assets count: {len(lines)}, valid: {valid_image_count}") + monitor.write_monitor_logger(percent=0.2) + + _dummy_work(idle_seconds=idle_seconds, trigger_crash=trigger_crash) + + # write mining result + # here we give a fake score to each assets + total_length = len(valid_images) + mining_result = [(asset_path, index / total_length) for index, asset_path in enumerate(valid_images)] + rw.write_mining_result(mining_result=mining_result) + + # if task done, write 100% percent log + logging.info('mining done') + monitor.write_monitor_logger(percent=1.0) + + +def _run_infer(cfg: edict) -> None: + # use `cfg.param` to get config file for training + # models are transfered in `cfg.ymir.input.models_dir` model_params_path + class_names = cfg.param.get('class_names') + idle_seconds: float = cfg.param.get('idle_seconds', 60) + trigger_crash: bool = cfg.param.get('trigger_crash', False) + seed: int = cfg.param.get('seed', 15) + # use `logging` or `print` to write log to console + logging.info(f"infer config: {cfg.param}") + + # use `cfg.ymir.input.candidate_index_file` to read candidate dataset items + # note that annotations path will be empty str if there's no annotations in that dataset + with open(cfg.ymir.input.candidate_index_file, 'r') as fp: + lines = fp.readlines() + + valid_images = [] + invalid_images = [] + valid_image_count = 0 + for line in lines: + if os.path.isfile(line.strip()): + valid_image_count += 1 + valid_images.append(line.strip()) + else: + invalid_images.append(line.strip()) + + # use `monitor.write_monitor_logger` to write log to console and write task process percent to monitor.txt + logging.info(f"assets count: {len(lines)}, valid: {valid_image_count}") + monitor.write_monitor_logger(percent=0.2) + + _dummy_work(idle_seconds=idle_seconds, trigger_crash=trigger_crash) + + # write infer result + fake_anns = [] + random.seed(seed) + for class_name in class_names: + x = random.randint(0, 100) + y = random.randint(0, 100) + w = random.randint(50, 100) + h = random.randint(50, 100) + ann = rw.Annotation(class_name=class_name, score=random.random(), box=rw.Box(x=x, y=y, w=w, h=h)) + + fake_anns.append(ann) + + infer_result = {asset_path: fake_anns for asset_path in valid_images} + for asset_path in invalid_images: + infer_result[asset_path] = [] + rw.write_infer_result(infer_result=infer_result) + + # if task done, write 100% percent log + logging.info('infer done') + monitor.write_monitor_logger(percent=1.0) + + +def _dummy_work(idle_seconds: float, trigger_crash: bool = False, gpu_memory_size: int = 0) -> None: + if idle_seconds > 0: + time.sleep(idle_seconds) + if trigger_crash: + raise RuntimeError('app crashed') + + +def write_tensorboard_log(tensorboard_dir: str) -> None: + tb_log = SummaryWriter(tensorboard_dir) + + total_epoch = 30 + for e in range(total_epoch): + tb_log.add_scalar("fake_loss", 10 / (1 + e), e) + time.sleep(1) + monitor.write_monitor_logger(percent=e / total_epoch) + + +if __name__ == '__main__': + logging.basicConfig(stream=sys.stdout, + format='%(levelname)-8s: [%(asctime)s] %(message)s', + datefmt='%Y%m%d-%H:%M:%S', + level=logging.INFO) + sys.exit(start()) diff --git a/det-demo-tmi/img-man/infer-template.yaml b/det-demo-tmi/img-man/infer-template.yaml new file mode 100644 index 0000000..f360cff --- /dev/null +++ b/det-demo-tmi/img-man/infer-template.yaml @@ -0,0 +1,12 
@@
+# infer template for your executor app
+# after building the image, it should be at /img-man/infer-template.yaml
+# keys gpu_id, task_id, model_params_path, class_names are reserved
+
+# gpu_id: '0'
+# task_id: 'default-infer-task'
+# model_params_path: []
+# class_names: []
+
+# just for test, remove this key in your own docker image
+idle_seconds: 3  # idle seconds for each task
+seed: 15
diff --git a/det-demo-tmi/img-man/manifest.yaml b/det-demo-tmi/img-man/manifest.yaml
new file mode 100644
index 0000000..3353f64
--- /dev/null
+++ b/det-demo-tmi/img-man/manifest.yaml
@@ -0,0 +1,2 @@
+# object_type: 2 for object detection, 3 for semantic segmentation, default: 2
+"object_type": 2
diff --git a/det-demo-tmi/img-man/mining-template.yaml b/det-demo-tmi/img-man/mining-template.yaml
new file mode 100644
index 0000000..3e4b3ae
--- /dev/null
+++ b/det-demo-tmi/img-man/mining-template.yaml
@@ -0,0 +1,11 @@
+# mining template for your executor app
+# after building the image, it should be at /img-man/mining-template.yaml
+# keys gpu_id, task_id, model_params_path, class_names are reserved
+
+# gpu_id: '0'
+# task_id: 'default-mining-task'
+# model_params_path: []
+# class_names: []
+
+# just for test, remove this key in your own docker image
+idle_seconds: 6  # idle seconds for each task
diff --git a/det-demo-tmi/img-man/training-template.yaml b/det-demo-tmi/img-man/training-template.yaml
new file mode 100644
index 0000000..ac88de3
--- /dev/null
+++ b/det-demo-tmi/img-man/training-template.yaml
@@ -0,0 +1,13 @@
+# training template for your executor app
+# after building the image, it should be at /img-man/training-template.yaml
+# keys gpu_id, task_id, pretrained_model_params, class_names are reserved
+
+# gpu_id: '0'
+# task_id: 'default-training-task'
+# pretrained_model_params: []
+# class_names: []
+export_format: 'det-ark:raw'
+
+# just for test, remove this key in your own docker image
+expected_map: 0.983  # expected map for training task
+idle_seconds: 60  # idle seconds for each task
diff --git a/det-demo-tmi/requirements.txt b/det-demo-tmi/requirements.txt
new file mode 100644
index 0000000..cadfd37
--- /dev/null
+++ b/det-demo-tmi/requirements.txt
@@ -0,0 +1,5 @@
+pydantic>=1.8.2
+pyyaml>=5.4.1
+tensorboardX>=2.4
+packaging>=23.0
+ymir_exc@git+https://github.com/modelai/ymir-executor-sdk.git@ymir2.1.0
diff --git a/det-mmdetection-tmi/README.md b/det-mmdetection-tmi/README.md
index c1d63cc..f1c0ab6 100644
--- a/det-mmdetection-tmi/README.md
+++ b/det-mmdetection-tmi/README.md
@@ -1,329 +1,34 @@
-(OpenMMLab banner: website and platform links, HOT / TRY IT OUT badges)
+# det-mmdetection-tmi -[![PyPI](https://img.shields.io/pypi/v/mmdet)](https://pypi.org/project/mmdet) -[![docs](https://img.shields.io/badge/docs-latest-blue)](https://mmdetection.readthedocs.io/en/latest/) -[![badge](https://github.com/open-mmlab/mmdetection/workflows/build/badge.svg)](https://github.com/open-mmlab/mmdetection/actions) -[![codecov](https://codecov.io/gh/open-mmlab/mmdetection/branch/master/graph/badge.svg)](https://codecov.io/gh/open-mmlab/mmdetection) -[![license](https://img.shields.io/github/license/open-mmlab/mmdetection.svg)](https://github.com/open-mmlab/mmdetection/blob/master/LICENSE) -[![open issues](https://isitmaintained.com/badge/open/open-mmlab/mmdetection.svg)](https://github.com/open-mmlab/mmdetection/issues) +- [mmdetection](./README_mmdet.md) - +`mmdetection` framework for object `det`ection `t`raining/`m`ining/`i`nfer task -[📘Documentation](https://mmdetection.readthedocs.io/en/v2.21.0/) | -[🛠️Installation](https://mmdetection.readthedocs.io/en/v2.21.0/get_started.html) | -[👀Model Zoo](https://mmdetection.readthedocs.io/en/v2.21.0/model_zoo.html) | -[🆕Update News](https://mmdetection.readthedocs.io/en/v2.21.0/changelog.html) | -[🚀Ongoing Projects](https://github.com/open-mmlab/mmdetection/projects) | -[🤔Reporting Issues](https://github.com/open-mmlab/mmdetection/issues/new/choose) - -
- -## Introduction - -English | [简体中文](README_zh-CN.md) - -MMDetection is an open source object detection toolbox based on PyTorch. It is -a part of the [OpenMMLab](https://openmmlab.com/) project. - -The master branch works with **PyTorch 1.5+**. - -
-Major features - -- **Modular Design** - - We decompose the detection framework into different components and one can easily construct a customized object detection framework by combining different modules. - -- **Support of multiple frameworks out of box** - - The toolbox directly supports popular and contemporary detection frameworks, *e.g.* Faster RCNN, Mask RCNN, RetinaNet, etc. - -- **High efficiency** - - All basic bbox and mask operations run on GPUs. The training speed is faster than or comparable to other codebases, including [Detectron2](https://github.com/facebookresearch/detectron2), [maskrcnn-benchmark](https://github.com/facebookresearch/maskrcnn-benchmark) and [SimpleDet](https://github.com/TuSimple/simpledet). - -- **State of the art** - - The toolbox stems from the codebase developed by the *MMDet* team, who won [COCO Detection Challenge](http://cocodataset.org/#detection-leaderboard) in 2018, and we keep pushing it forward. - -
- -Apart from MMDetection, we also released a library [mmcv](https://github.com/open-mmlab/mmcv) for computer vision research, which is heavily depended on by this toolbox. - -## License - -This project is released under the [Apache 2.0 license](LICENSE). - -## Changelog - -**2.22.0** was released in 24/2/2022: - -- Support [MaskFormer](configs/maskformer), [DyHead](configs/dyhead), [OpenImages Dataset](configs/openimages) and [TIMM backbone](configs/timm_example) -- Support visualization for Panoptic Segmentation -- Release a good recipe of using ResNet in object detectors pre-trained by [ResNet Strikes Back](https://arxiv.org/abs/2110.00476), which consistently brings about 3~4 mAP improvements over RetinaNet, Faster/Mask/Cascade Mask R-CNN - -Please refer to [changelog.md](docs/en/changelog.md) for details and release history. - -For compatibility changes between different versions of MMDetection, please refer to [compatibility.md](docs/en/compatibility.md). - -## Overview of Benchmark and Model Zoo - -Results and models are available in the [model zoo](docs/en/model_zoo.md). - -
-(model zoo tables — Architectures: Object Detection | Instance Segmentation | Panoptic Segmentation | Other, incl. Contrastive Learning and Distillation)
-(model zoo tables — Components: Backbones | Necks | Loss | Common)
    - -Some other methods are also supported in [projects using MMDetection](./docs/en/projects.md). - -## Installation - -Please refer to [get_started.md](docs/en/get_started.md) for installation. - -## Getting Started - -Please see [get_started.md](docs/en/get_started.md) for the basic usage of MMDetection. -We provide [colab tutorial](demo/MMDet_Tutorial.ipynb), and full guidance for quick run [with existing dataset](docs/en/1_exist_data_model.md) and [with new dataset](docs/en/2_new_data_model.md) for beginners. -There are also tutorials for [finetuning models](docs/en/tutorials/finetune.md), [adding new dataset](docs/en/tutorials/customize_dataset.md), [designing data pipeline](docs/en/tutorials/data_pipeline.md), [customizing models](docs/en/tutorials/customize_models.md), [customizing runtime settings](docs/en/tutorials/customize_runtime.md) and [useful tools](docs/en/useful_tools.md). - -Please refer to [FAQ](docs/en/faq.md) for frequently asked questions. - -## Contributing - -We appreciate all contributions to improve MMDetection. Ongoing projects can be found in out [GitHub Projects](https://github.com/open-mmlab/mmdetection/projects). Welcome community users to participate in these projects. Please refer to [CONTRIBUTING.md](.github/CONTRIBUTING.md) for the contributing guideline. - -## Acknowledgement - -MMDetection is an open source project that is contributed by researchers and engineers from various colleges and companies. We appreciate all the contributors who implement their methods or add new features, as well as users who give valuable feedbacks. -We wish that the toolbox and benchmark could serve the growing research community by providing a flexible toolkit to reimplement existing methods and develop their own new detectors. - -## Citation - -If you use this toolbox or benchmark in your research, please cite this project. +# build docker image ``` -@article{mmdetection, - title = {{MMDetection}: Open MMLab Detection Toolbox and Benchmark}, - author = {Chen, Kai and Wang, Jiaqi and Pang, Jiangmiao and Cao, Yuhang and - Xiong, Yu and Li, Xiaoxiao and Sun, Shuyang and Feng, Wansen and - Liu, Ziwei and Xu, Jiarui and Zhang, Zheng and Cheng, Dazhi and - Zhu, Chenchen and Cheng, Tianheng and Zhao, Qijie and Li, Buyu and - Lu, Xin and Zhu, Rui and Wu, Yue and Dai, Jifeng and Wang, Jingdong - and Shi, Jianping and Ouyang, Wanli and Loy, Chen Change and Lin, Dahua}, - journal= {arXiv preprint arXiv:1906.07155}, - year={2019} -} +docker build -t ymir-executor/mmdet:cuda102-tmi --build-arg YMIR=1.1.0 -f docker/Dockerfile.cuda102 . + +docker build -t ymir-executor/mmdet:cuda111-tmi --build-arg YMIR=1.1.0 -f docker/Dockerfile.cuda111 . ``` -## Projects in OpenMMLab -- [MMCV](https://github.com/open-mmlab/mmcv): OpenMMLab foundational library for computer vision. -- [MIM](https://github.com/open-mmlab/mim): MIM installs OpenMMLab packages. -- [MMClassification](https://github.com/open-mmlab/mmclassification): OpenMMLab image classification toolbox and benchmark. -- [MMDetection](https://github.com/open-mmlab/mmdetection): OpenMMLab detection toolbox and benchmark. -- [MMDetection3D](https://github.com/open-mmlab/mmdetection3d): OpenMMLab's next-generation platform for general 3D object detection. -- [MMRotate](https://github.com/open-mmlab/mmrotate): OpenMMLab rotated object detection toolbox and benchmark. -- [MMSegmentation](https://github.com/open-mmlab/mmsegmentation): OpenMMLab semantic segmentation toolbox and benchmark. 
-- [MMOCR](https://github.com/open-mmlab/mmocr): OpenMMLab text detection, recognition, and understanding toolbox. -- [MMPose](https://github.com/open-mmlab/mmpose): OpenMMLab pose estimation toolbox and benchmark. -- [MMHuman3D](https://github.com/open-mmlab/mmhuman3d): OpenMMLab 3D human parametric model toolbox and benchmark. -- [MMSelfSup](https://github.com/open-mmlab/mmselfsup): OpenMMLab self-supervised learning toolbox and benchmark. -- [MMRazor](https://github.com/open-mmlab/mmrazor): OpenMMLab model compression toolbox and benchmark. -- [MMFewShot](https://github.com/open-mmlab/mmfewshot): OpenMMLab fewshot learning toolbox and benchmark. -- [MMAction2](https://github.com/open-mmlab/mmaction2): OpenMMLab's next-generation action understanding toolbox and benchmark. -- [MMTracking](https://github.com/open-mmlab/mmtracking): OpenMMLab video perception toolbox and benchmark. -- [MMFlow](https://github.com/open-mmlab/mmflow): OpenMMLab optical flow toolbox and benchmark. -- [MMEditing](https://github.com/open-mmlab/mmediting): OpenMMLab image and video editing toolbox. -- [MMGeneration](https://github.com/open-mmlab/mmgeneration): OpenMMLab image and video generative models toolbox. -- [MMDeploy](https://github.com/open-mmlab/mmdeploy): OpenMMLab model deployment framework. +# changelog +- modify `mmdet/datasets/coco.py`, save the evaluation result to `os.environ.get('COCO_EVAL_TMP_FILE')` with json format +- modify `mmdet/core/evaluation/eval_hooks.py`, write training result file and monitor task process +- modify `mmdet/datasets/__init__.py, mmdet/datasets/coco.py` and add `mmdet/datasets/ymir.py`, add class `YmirDataset` to load YMIR dataset. +- modify `requirements/runtime.txt` to add new dependent package. +- add `mmdet/utils/util_ymir.py` for ymir training/infer/mining +- add `ymir_infer.py` for infer +- add `ymir_mining.py` for mining +- add `ymir_train.py` modify `tools/train.py` to update the mmcv config for training +- add `start.py`, the entrypoint for docker image +- add `training-template.yaml, infer-template.yaml, mining-template.yaml` for ymir pre-defined hyper-parameters. +- add `docker/Dockerfile.cuda102, docker/Dockerfile.cuda111` to build docker image +- remove `docker/Dockerfile` to avoid misuse + +--- + +- 2022/09/06: set `find_unused_parameters = True`, fix DDP bug +- 2022/10/18: add `random` and `aldd` mining algorithm. `aldd` algorithm support yolox only. +- 2022/10/19: fix training class_number bug in `recursive_modify_attribute()` diff --git a/det-mmdetection-tmi/README_mmdet.md b/det-mmdetection-tmi/README_mmdet.md new file mode 100644 index 0000000..c1d63cc --- /dev/null +++ b/det-mmdetection-tmi/README_mmdet.md @@ -0,0 +1,329 @@ +
+(OpenMMLab banner: website and platform links, HOT / TRY IT OUT badges)
    + +[![PyPI](https://img.shields.io/pypi/v/mmdet)](https://pypi.org/project/mmdet) +[![docs](https://img.shields.io/badge/docs-latest-blue)](https://mmdetection.readthedocs.io/en/latest/) +[![badge](https://github.com/open-mmlab/mmdetection/workflows/build/badge.svg)](https://github.com/open-mmlab/mmdetection/actions) +[![codecov](https://codecov.io/gh/open-mmlab/mmdetection/branch/master/graph/badge.svg)](https://codecov.io/gh/open-mmlab/mmdetection) +[![license](https://img.shields.io/github/license/open-mmlab/mmdetection.svg)](https://github.com/open-mmlab/mmdetection/blob/master/LICENSE) +[![open issues](https://isitmaintained.com/badge/open/open-mmlab/mmdetection.svg)](https://github.com/open-mmlab/mmdetection/issues) + + + +[📘Documentation](https://mmdetection.readthedocs.io/en/v2.21.0/) | +[🛠️Installation](https://mmdetection.readthedocs.io/en/v2.21.0/get_started.html) | +[👀Model Zoo](https://mmdetection.readthedocs.io/en/v2.21.0/model_zoo.html) | +[🆕Update News](https://mmdetection.readthedocs.io/en/v2.21.0/changelog.html) | +[🚀Ongoing Projects](https://github.com/open-mmlab/mmdetection/projects) | +[🤔Reporting Issues](https://github.com/open-mmlab/mmdetection/issues/new/choose) + +
    + +## Introduction + +English | [简体中文](README_zh-CN.md) + +MMDetection is an open source object detection toolbox based on PyTorch. It is +a part of the [OpenMMLab](https://openmmlab.com/) project. + +The master branch works with **PyTorch 1.5+**. + +
    +Major features + +- **Modular Design** + + We decompose the detection framework into different components and one can easily construct a customized object detection framework by combining different modules. + +- **Support of multiple frameworks out of box** + + The toolbox directly supports popular and contemporary detection frameworks, *e.g.* Faster RCNN, Mask RCNN, RetinaNet, etc. + +- **High efficiency** + + All basic bbox and mask operations run on GPUs. The training speed is faster than or comparable to other codebases, including [Detectron2](https://github.com/facebookresearch/detectron2), [maskrcnn-benchmark](https://github.com/facebookresearch/maskrcnn-benchmark) and [SimpleDet](https://github.com/TuSimple/simpledet). + +- **State of the art** + + The toolbox stems from the codebase developed by the *MMDet* team, who won [COCO Detection Challenge](http://cocodataset.org/#detection-leaderboard) in 2018, and we keep pushing it forward. + +
+
+Apart from MMDetection, we also released a library [mmcv](https://github.com/open-mmlab/mmcv) for computer vision research, which is heavily depended on by this toolbox.
+
+## License
+
+This project is released under the [Apache 2.0 license](LICENSE).
+
+## Changelog
+
+**2.22.0** was released on 24/2/2022:
+
+- Support [MaskFormer](configs/maskformer), [DyHead](configs/dyhead), [OpenImages Dataset](configs/openimages) and [TIMM backbone](configs/timm_example)
+- Support visualization for Panoptic Segmentation
+- Release a good recipe of using ResNet in object detectors pre-trained by [ResNet Strikes Back](https://arxiv.org/abs/2110.00476), which consistently brings about 3~4 mAP improvements over RetinaNet, Faster/Mask/Cascade Mask R-CNN
+
+Please refer to [changelog.md](docs/en/changelog.md) for details and release history.
+
+For compatibility changes between different versions of MMDetection, please refer to [compatibility.md](docs/en/compatibility.md).
+
+## Overview of Benchmark and Model Zoo
+
+Results and models are available in the [model zoo](docs/en/model_zoo.md).
+
+(model zoo tables — Architectures: Object Detection | Instance Segmentation | Panoptic Segmentation | Other, incl. Contrastive Learning and Distillation)
+(model zoo tables — Components: Backbones | Necks | Loss | Common)
+
+Some other methods are also supported in [projects using MMDetection](./docs/en/projects.md).
+
+## Installation
+
+Please refer to [get_started.md](docs/en/get_started.md) for installation.
+
+## Getting Started
+
+Please see [get_started.md](docs/en/get_started.md) for the basic usage of MMDetection.
+We provide [colab tutorial](demo/MMDet_Tutorial.ipynb), and full guidance for quick run [with existing dataset](docs/en/1_exist_data_model.md) and [with new dataset](docs/en/2_new_data_model.md) for beginners.
+There are also tutorials for [finetuning models](docs/en/tutorials/finetune.md), [adding new dataset](docs/en/tutorials/customize_dataset.md), [designing data pipeline](docs/en/tutorials/data_pipeline.md), [customizing models](docs/en/tutorials/customize_models.md), [customizing runtime settings](docs/en/tutorials/customize_runtime.md) and [useful tools](docs/en/useful_tools.md).
+
+Please refer to [FAQ](docs/en/faq.md) for frequently asked questions.
+
+## Contributing
+
+We appreciate all contributions to improve MMDetection. Ongoing projects can be found in our [GitHub Projects](https://github.com/open-mmlab/mmdetection/projects). We welcome community users to participate in these projects. Please refer to [CONTRIBUTING.md](.github/CONTRIBUTING.md) for the contributing guideline.
+
+## Acknowledgement
+
+MMDetection is an open source project that is contributed by researchers and engineers from various colleges and companies. We appreciate all the contributors who implement their methods or add new features, as well as users who give valuable feedback.
+We wish that the toolbox and benchmark could serve the growing research community by providing a flexible toolkit to reimplement existing methods and develop their own new detectors.
+
+## Citation
+
+If you use this toolbox or benchmark in your research, please cite this project.
+
+```
+@article{mmdetection,
+  title   = {{MMDetection}: Open MMLab Detection Toolbox and Benchmark},
+  author  = {Chen, Kai and Wang, Jiaqi and Pang, Jiangmiao and Cao, Yuhang and
+             Xiong, Yu and Li, Xiaoxiao and Sun, Shuyang and Feng, Wansen and
+             Liu, Ziwei and Xu, Jiarui and Zhang, Zheng and Cheng, Dazhi and
+             Zhu, Chenchen and Cheng, Tianheng and Zhao, Qijie and Li, Buyu and
+             Lu, Xin and Zhu, Rui and Wu, Yue and Dai, Jifeng and Wang, Jingdong
+             and Shi, Jianping and Ouyang, Wanli and Loy, Chen Change and Lin, Dahua},
+  journal = {arXiv preprint arXiv:1906.07155},
+  year    = {2019}
+}
+```
+
+## Projects in OpenMMLab
+
+- [MMCV](https://github.com/open-mmlab/mmcv): OpenMMLab foundational library for computer vision.
+- [MIM](https://github.com/open-mmlab/mim): MIM installs OpenMMLab packages.
+- [MMClassification](https://github.com/open-mmlab/mmclassification): OpenMMLab image classification toolbox and benchmark.
+- [MMDetection](https://github.com/open-mmlab/mmdetection): OpenMMLab detection toolbox and benchmark.
+- [MMDetection3D](https://github.com/open-mmlab/mmdetection3d): OpenMMLab's next-generation platform for general 3D object detection.
+- [MMRotate](https://github.com/open-mmlab/mmrotate): OpenMMLab rotated object detection toolbox and benchmark.
+- [MMSegmentation](https://github.com/open-mmlab/mmsegmentation): OpenMMLab semantic segmentation toolbox and benchmark.
+- [MMOCR](https://github.com/open-mmlab/mmocr): OpenMMLab text detection, recognition, and understanding toolbox.
+- [MMPose](https://github.com/open-mmlab/mmpose): OpenMMLab pose estimation toolbox and benchmark.
+- [MMHuman3D](https://github.com/open-mmlab/mmhuman3d): OpenMMLab 3D human parametric model toolbox and benchmark. +- [MMSelfSup](https://github.com/open-mmlab/mmselfsup): OpenMMLab self-supervised learning toolbox and benchmark. +- [MMRazor](https://github.com/open-mmlab/mmrazor): OpenMMLab model compression toolbox and benchmark. +- [MMFewShot](https://github.com/open-mmlab/mmfewshot): OpenMMLab fewshot learning toolbox and benchmark. +- [MMAction2](https://github.com/open-mmlab/mmaction2): OpenMMLab's next-generation action understanding toolbox and benchmark. +- [MMTracking](https://github.com/open-mmlab/mmtracking): OpenMMLab video perception toolbox and benchmark. +- [MMFlow](https://github.com/open-mmlab/mmflow): OpenMMLab optical flow toolbox and benchmark. +- [MMEditing](https://github.com/open-mmlab/mmediting): OpenMMLab image and video editing toolbox. +- [MMGeneration](https://github.com/open-mmlab/mmgeneration): OpenMMLab image and video generative models toolbox. +- [MMDeploy](https://github.com/open-mmlab/mmdeploy): OpenMMLab model deployment framework. diff --git a/det-mmdetection-tmi/docker/Dockerfile b/det-mmdetection-tmi/docker/Dockerfile deleted file mode 100644 index 5ee7a37..0000000 --- a/det-mmdetection-tmi/docker/Dockerfile +++ /dev/null @@ -1,25 +0,0 @@ -ARG PYTORCH="1.6.0" -ARG CUDA="10.1" -ARG CUDNN="7" - -FROM pytorch/pytorch:${PYTORCH}-cuda${CUDA}-cudnn${CUDNN}-devel - -ENV TORCH_CUDA_ARCH_LIST="6.0 6.1 7.0+PTX" -ENV TORCH_NVCC_FLAGS="-Xfatbin -compress-all" -ENV CMAKE_PREFIX_PATH="$(dirname $(which conda))/../" - -RUN apt-get update && apt-get install -y ffmpeg libsm6 libxext6 git ninja-build libglib2.0-0 libsm6 libxrender-dev libxext6 \ - && apt-get clean \ - && rm -rf /var/lib/apt/lists/* - -# Install MMCV -RUN pip install --no-cache-dir --upgrade pip wheel setuptools -RUN pip install --no-cache-dir mmcv-full==1.3.17 -f https://download.openmmlab.com/mmcv/dist/cu101/torch1.6.0/index.html - -# Install MMDetection -RUN conda clean --all -RUN git clone https://github.com/open-mmlab/mmdetection.git /mmdetection -WORKDIR /mmdetection -ENV FORCE_CUDA="1" -RUN pip install --no-cache-dir -r requirements/build.txt -RUN pip install --no-cache-dir -e . diff --git a/det-mmdetection-tmi/docker/Dockerfile.cuda102 b/det-mmdetection-tmi/docker/Dockerfile.cuda102 new file mode 100644 index 0000000..2fd8643 --- /dev/null +++ b/det-mmdetection-tmi/docker/Dockerfile.cuda102 @@ -0,0 +1,42 @@ +ARG PYTORCH="1.8.1" +ARG CUDA="10.2" +ARG CUDNN="7" + +FROM pytorch/pytorch:${PYTORCH}-cuda${CUDA}-cudnn${CUDNN}-devel + +# mmcv>=1.3.17, <=1.5.0 +ARG MMCV="1.4.3" +ARG YMIR="1.1.0" + +ENV TORCH_CUDA_ARCH_LIST="6.0 6.1 7.0+PTX" +ENV TORCH_NVCC_FLAGS="-Xfatbin -compress-all" +ENV CMAKE_PREFIX_PATH="$(dirname $(which conda))/../" +ENV LANG=C.UTF-8 +ENV FORCE_CUDA="1" +ENV PYTHONPATH=. 
+ENV YMIR_VERSION=${YMIR} +# Set timezone +RUN ln -sf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime \ + && echo 'Asia/Shanghai' >/etc/timezone + +RUN apt-key adv --keyserver keyserver.ubuntu.com --recv-keys A4B469963BF863CC \ + && apt-get update \ + && apt-get install -y build-essential ffmpeg libsm6 libxext6 git ninja-build libglib2.0-0 libsm6 libxrender-dev libxext6 \ + && apt-get clean \ + && rm -rf /var/lib/apt/lists/* + +# Install ymir-exc sdk and MMCV (no cu102/torch1.8.1, use torch1.8.0 instead) +RUN pip install --no-cache-dir --upgrade pip wheel setuptools \ + && pip install --no-cache-dir mmcv-full==${MMCV} -f https://download.openmmlab.com/mmcv/dist/cu102/torch1.8.0/index.html \ + && pip install "git+https://github.com/modelai/ymir-executor-sdk.git@ymir1.3.0" \ + && conda clean --all + +# Install det-mmdetection-tmi +COPY . /app/ +WORKDIR /app +RUN pip install --no-cache-dir -r requirements/runtime.txt \ + && mkdir /img-man \ + && mv *-template.yaml /img-man \ + && echo "cd /app && python3 start.py" > /usr/bin/start.sh + +CMD bash /usr/bin/start.sh diff --git a/det-mmdetection-tmi/docker/Dockerfile.cuda111 b/det-mmdetection-tmi/docker/Dockerfile.cuda111 new file mode 100644 index 0000000..2306105 --- /dev/null +++ b/det-mmdetection-tmi/docker/Dockerfile.cuda111 @@ -0,0 +1,49 @@ +ARG PYTORCH="1.8.0" +ARG CUDA="11.1" +ARG CUDNN="8" + +FROM pytorch/pytorch:${PYTORCH}-cuda${CUDA}-cudnn${CUDNN}-runtime + +# mmcv>=1.3.17, <=1.5.0 +ARG MMCV="1.4.3" +ARG YMIR="1.1.0" + +ENV TORCH_CUDA_ARCH_LIST="6.0 6.1 7.0+PTX" +ENV TORCH_NVCC_FLAGS="-Xfatbin -compress-all" +ENV CMAKE_PREFIX_PATH="$(dirname $(which conda))/../" +ENV FORCE_CUDA="1" +ENV PYTHONPATH=. +ENV YMIR_VERSION=${YMIR} +# Set timezone +RUN ln -sf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime \ + && echo 'Asia/Shanghai' >/etc/timezone + +# Install apt package +RUN apt-get update && apt-get install -y build-essential ffmpeg libsm6 libxext6 git ninja-build libglib2.0-0 libsm6 libxrender-dev libxext6 \ + && apt-get clean \ + && rm -rf /var/lib/apt/lists/* + +# Install ymir-exc sdk and MMCV +RUN pip install --no-cache-dir --upgrade pip wheel setuptools \ + && pip install --no-cache-dir mmcv-full==${MMCV} -f https://download.openmmlab.com/mmcv/dist/cu111/torch1.8.0/index.html \ + && pip install "git+https://github.com/modelai/ymir-executor-sdk.git@ymir1.3.0" \ + && conda clean --all + +# Install det-mmdetection-tmi +COPY . 
/app/
+WORKDIR /app
+RUN pip install --no-cache-dir -r requirements/runtime.txt \
+    && mkdir /img-man \
+    && mv *-template.yaml /img-man \
+    && echo "cd /app && python3 start.py" > /usr/bin/start.sh
+
+# Download coco-pretrained yolox weights to /weights
+# view https://github.com/open-mmlab/mmdetection/tree/master/configs/yolox for detail
+# RUN apt-get update && apt install -y wget && rm -rf /var/lib/apt/lists/*
+# RUN mkdir -p /weights && cd /weights \
+#     && wget https://download.openmmlab.com/mmdetection/v2.0/yolox/yolox_tiny_8x8_300e_coco/yolox_tiny_8x8_300e_coco_20211124_171234-b4047906.pth \
+#     && wget https://download.openmmlab.com/mmdetection/v2.0/yolox/yolox_s_8x8_300e_coco/yolox_s_8x8_300e_coco_20211121_095711-4592a793.pth \
+#     && wget https://download.openmmlab.com/mmdetection/v2.0/yolox/yolox_l_8x8_300e_coco/yolox_l_8x8_300e_coco_20211126_140236-d3bd2b23.pth \
+#     && wget https://download.openmmlab.com/mmdetection/v2.0/yolox/yolox_x_8x8_300e_coco/yolox_x_8x8_300e_coco_20211126_140254-1ef88d67.pth
+
+CMD bash /usr/bin/start.sh
diff --git a/det-mmdetection-tmi/infer-template.yaml b/det-mmdetection-tmi/infer-template.yaml
new file mode 100644
index 0000000..80967de
--- /dev/null
+++ b/det-mmdetection-tmi/infer-template.yaml
@@ -0,0 +1,3 @@
+shm_size: '128G'
+cfg_options: ''
+conf_threshold: 0.2
diff --git a/det-mmdetection-tmi/mining-template.yaml b/det-mmdetection-tmi/mining-template.yaml
new file mode 100644
index 0000000..4e05032
--- /dev/null
+++ b/det-mmdetection-tmi/mining-template.yaml
@@ -0,0 +1,3 @@
+shm_size: '128G'
+mining_algorithm: aldd
+class_distribution_scores: ''  # 1.0,1.0,0.1,0.2
diff --git a/det-mmdetection-tmi/mining_base.py b/det-mmdetection-tmi/mining_base.py
new file mode 100644
index 0000000..c357a80
--- /dev/null
+++ b/det-mmdetection-tmi/mining_base.py
@@ -0,0 +1,137 @@
+import warnings
+from typing import List
+
+import torch
+import torch.nn.functional as F  # noqa
+from easydict import EasyDict as edict
+
+
+def binary_classification_entropy(p: torch.Tensor) -> torch.Tensor:
+    """
+    p: BCHW, the feature map after sigmoid, range in (0,1)
+    F.bce(x,y) = -(y * logx + (1-y) * log(1-x))
+    """
+    # return -(p * torch.log(p) + (1 - p) * torch.log(1 - p))
+    return F.binary_cross_entropy(p, p, reduction='none')
+
+
+def multiple_classification_entropy(p: torch.Tensor, activation: str) -> torch.Tensor:
+    """
+    p: BCHW
+
+    yolov5: sigmoid
+    nanodet: sigmoid
+    """
+    assert activation in ['sigmoid', 'softmax'], f'classification type = {activation}, not in sigmoid, softmax'
+
+    if activation == 'sigmoid':
+        entropy = F.binary_cross_entropy(p, p, reduction='none')
+        sum_entropy = torch.sum(entropy, dim=1, keepdim=True)
+        return sum_entropy
+    else:
+        # the origin aldd code uses tf.log(p + 1e-12)
+        entropy = -(p) * torch.log(p + 1e-7)
+        sum_entropy = torch.sum(entropy, dim=1, keepdim=True)
+        return sum_entropy
+
+
+class FeatureMapBasedMining(object):
+
+    def __init__(self, ymir_cfg: edict):
+        self.ymir_cfg = ymir_cfg
+
+    def mining(self, feature_maps: List[torch.Tensor]) -> torch.Tensor:
+        raise NotImplementedError('mining is not implemented')
+
+
+class ALDDMining(FeatureMapBasedMining):
+    """
+    Active Learning for Deep Detection Neural Networks (ICCV 2019)
+    official code: https://gitlab.com/haghdam/deep_active_learning
+
+    changes from the tensorflow code in this pytorch port:
+    1. average pooling changed: pad or not? symmetrical pad or not?
+    2. max pooling changed: ceil or not?
the resize shape for aggregate feature map + + those small change cause 20%-40% difference for P@N, N=100 for total 1000 images. + P@5: 0.2 + P@10: 0.3 + P@20: 0.35 + P@50: 0.5 + P@100: 0.59 + P@200: 0.73 + P@500: 0.848 + """ + + def __init__(self, ymir_cfg: edict, resize_shape: List[int]): + super().__init__(ymir_cfg) + self.resize_shape = resize_shape + self.max_pool_size = 32 + self.avg_pool_size = 9 + self.align_corners = False + self.num_classes = len(ymir_cfg.param.class_names) + + def extract_conf(self, feature_maps: List[torch.Tensor], format='yolov5') -> List[torch.Tensor]: + """ + extract confidence feature map before sigmoid. + """ + if format == 'yolov5': + # feature_maps: [bs, 3, height, width, xywh + conf + num_classes] + return [f[:, :, :, :, 4] for f in feature_maps] + else: + warnings.warn(f'unknown feature map format {format}') + + return feature_maps + + def mining(self, feature_maps: List[torch.Tensor]) -> torch.Tensor: + """mining for feature maps + feature_maps: [BCHW] + 1. resizing followed by sigmoid + 2. get mining score + """ + # fmap = [Batch size, anchor number = 3, height, width, 5 + class_number] + + list_tmp = [] + for fmap in feature_maps: + resized_fmap = F.interpolate(fmap, self.resize_shape, mode='bilinear', align_corners=self.align_corners) + list_tmp.append(resized_fmap) + conf = torch.cat(list_tmp, dim=1).sigmoid() + scores = self.get_mining_score(conf) + return scores + + def get_mining_score(self, confidence_feature_map: torch.Tensor) -> torch.Tensor: + """ + confidence_feature_map: BCHW, value in (0, 1) + 1. A=sum(avg(entropy(fmap))) B,1,H,W + 2. B=sum(entropy(avg(fmap))) B,1,H,W + 3. C=max(B-A) B,1,h,w + 4. mean(C) B + """ + avg_entropy = F.avg_pool2d(self.get_entropy(confidence_feature_map), + kernel_size=self.avg_pool_size, + stride=1, + padding=0) + sum_avg_entropy = torch.sum(avg_entropy, dim=1, keepdim=True) + + entropy_avg = self.get_entropy( + F.avg_pool2d(confidence_feature_map, kernel_size=self.avg_pool_size, stride=1, padding=0)) + sum_entropy_avg = torch.sum(entropy_avg, dim=1, keepdim=True) + + uncertainty = sum_entropy_avg - sum_avg_entropy + + max_uncertainty = F.max_pool2d(uncertainty, + kernel_size=self.max_pool_size, + stride=self.max_pool_size, + padding=0, + ceil_mode=False) + + return torch.mean(max_uncertainty, dim=(1, 2, 3)) + + def get_entropy(self, feature_map: torch.Tensor) -> torch.Tensor: + if self.num_classes == 1: + # binary cross entropy + return binary_classification_entropy(feature_map) + else: + # multi-class cross entropy + return multiple_classification_entropy(feature_map, activation='sigmoid') diff --git a/det-mmdetection-tmi/mmdet/core/evaluation/eval_hooks.py b/det-mmdetection-tmi/mmdet/core/evaluation/eval_hooks.py index 7c1fbe9..cf07e5b 100644 --- a/det-mmdetection-tmi/mmdet/core/evaluation/eval_hooks.py +++ b/det-mmdetection-tmi/mmdet/core/evaluation/eval_hooks.py @@ -6,18 +6,19 @@ import torch.distributed as dist from mmcv.runner import DistEvalHook as BaseDistEvalHook from mmcv.runner import EvalHook as BaseEvalHook +from mmdet.utils.util_ymir import write_ymir_training_result from torch.nn.modules.batchnorm import _BatchNorm +from ymir_exc import monitor +from ymir_exc.util import YmirStage, get_merged_config, write_ymir_monitor_process def _calc_dynamic_intervals(start_interval, dynamic_interval_list): assert mmcv.is_list_of(dynamic_interval_list, tuple) dynamic_milestones = [0] - dynamic_milestones.extend( - [dynamic_interval[0] for dynamic_interval in dynamic_interval_list]) + 
dynamic_milestones.extend([dynamic_interval[0] for dynamic_interval in dynamic_interval_list]) dynamic_intervals = [start_interval] - dynamic_intervals.extend( - [dynamic_interval[1] for dynamic_interval in dynamic_interval_list]) + dynamic_intervals.extend([dynamic_interval[1] for dynamic_interval in dynamic_interval_list]) return dynamic_milestones, dynamic_intervals @@ -25,6 +26,7 @@ class EvalHook(BaseEvalHook): def __init__(self, *args, dynamic_intervals=None, **kwargs): super(EvalHook, self).__init__(*args, **kwargs) + self.ymir_cfg = get_merged_config() self.use_dynamic_intervals = dynamic_intervals is not None if self.use_dynamic_intervals: @@ -43,10 +45,31 @@ def before_train_epoch(self, runner): self._decide_interval(runner) super().before_train_epoch(runner) + def after_train_epoch(self, runner): + """Report the training process for ymir""" + if self.by_epoch: + monitor_interval = max(1, runner.max_epochs // 1000) + if runner.epoch % monitor_interval == 0: + write_ymir_monitor_process(self.ymir_cfg, + task='training', + naive_stage_percent=runner.epoch / runner.max_epochs, + stage=YmirStage.TASK) + super().after_train_epoch(runner) + def before_train_iter(self, runner): self._decide_interval(runner) super().before_train_iter(runner) + def after_train_iter(self, runner): + if not self.by_epoch: + monitor_interval = max(1, runner.max_iters // 1000) + if runner.iter % monitor_interval == 0: + write_ymir_monitor_process(self.ymir_cfg, + task='training', + naive_stage_percent=runner.iter / runner.max_iters, + stage=YmirStage.TASK) + super().after_train_iter(runner) + def _do_evaluate(self, runner): """perform evaluation and save ckpt.""" if not self._should_evaluate(runner): @@ -56,11 +79,18 @@ def _do_evaluate(self, runner): results = single_gpu_test(runner.model, self.dataloader, show=False) runner.log_buffer.output['eval_iter_num'] = len(self.dataloader) key_score = self.evaluate(runner, results) + write_ymir_training_result(last=False, key_score=key_score) # the key_score may be `None` so it needs to skip the action to save # the best checkpoint if self.save_best and key_score: self._save_ckpt(runner, key_score) + # TODO obtain best_score from runner + # best_score = runner.meta['hook_msgs'].get( + # 'best_score', self.init_value_map[self.rule]) + # if self.compare_func(key_score, best_score): + # write_ymir_training_result(key_score) + # Note: Considering that MMCV's EvalHook updated its interface in V1.3.16, # in order to avoid strong version dependency, we did not directly @@ -69,6 +99,7 @@ class DistEvalHook(BaseDistEvalHook): def __init__(self, *args, dynamic_intervals=None, **kwargs): super(DistEvalHook, self).__init__(*args, **kwargs) + self.ymir_cfg = get_merged_config() self.use_dynamic_intervals = dynamic_intervals is not None if self.use_dynamic_intervals: @@ -87,10 +118,31 @@ def before_train_epoch(self, runner): self._decide_interval(runner) super().before_train_epoch(runner) + def after_train_epoch(self, runner): + """Report the training process for ymir""" + if self.by_epoch and runner.rank == 0: + monitor_interval = max(1, runner.max_epochs // 1000) + if runner.epoch % monitor_interval == 0: + write_ymir_monitor_process(self.ymir_cfg, + task='training', + naive_stage_percent=runner.epoch / runner.max_epochs, + stage=YmirStage.TASK) + super().after_train_epoch(runner) + def before_train_iter(self, runner): self._decide_interval(runner) super().before_train_iter(runner) + def after_train_iter(self, runner): + if not self.by_epoch and runner.rank == 0: + 
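+            # max(1, max_iters // 1000) throttles monitor writes to at most
+            # ~1000 per training run, no matter how long the schedule is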
monitor_interval = max(1, runner.max_iters // 1000) + if runner.iter % monitor_interval == 0: + write_ymir_monitor_process(self.ymir_cfg, + task='training', + naive_stage_percent=runner.iter / runner.max_iters, + stage=YmirStage.TASK) + super().after_train_iter(runner) + def _do_evaluate(self, runner): """perform evaluation and save ckpt.""" # Synchronization of BatchNorm's buffer (running_mean @@ -101,8 +153,7 @@ def _do_evaluate(self, runner): if self.broadcast_bn_buffer: model = runner.model for name, module in model.named_modules(): - if isinstance(module, - _BatchNorm) and module.track_running_stats: + if isinstance(module, _BatchNorm) and module.track_running_stats: dist.broadcast(module.running_var, 0) dist.broadcast(module.running_mean, 0) @@ -114,17 +165,19 @@ def _do_evaluate(self, runner): tmpdir = osp.join(runner.work_dir, '.eval_hook') from mmdet.apis import multi_gpu_test - results = multi_gpu_test( - runner.model, - self.dataloader, - tmpdir=tmpdir, - gpu_collect=self.gpu_collect) + results = multi_gpu_test(runner.model, self.dataloader, tmpdir=tmpdir, gpu_collect=self.gpu_collect) if runner.rank == 0: print('\n') runner.log_buffer.output['eval_iter_num'] = len(self.dataloader) key_score = self.evaluate(runner, results) - + write_ymir_training_result(last=False, key_score=key_score) # the key_score may be `None` so it needs to skip # the action to save the best checkpoint if self.save_best and key_score: self._save_ckpt(runner, key_score) + + # TODO obtain best_score from runner + # best_score = runner.meta['hook_msgs'].get( + # 'best_score', self.init_value_map[self.rule]) + # if self.compare_func(key_score, best_score): + # write_ymir_training_result(key_score) diff --git a/det-mmdetection-tmi/mmdet/datasets/__init__.py b/det-mmdetection-tmi/mmdet/datasets/__init__.py index f251d07..ff66046 100644 --- a/det-mmdetection-tmi/mmdet/datasets/__init__.py +++ b/det-mmdetection-tmi/mmdet/datasets/__init__.py @@ -15,6 +15,7 @@ from .voc import VOCDataset from .wider_face import WIDERFaceDataset from .xml_style import XMLDataset +from .ymir import YmirDataset __all__ = [ 'CustomDataset', 'XMLDataset', 'CocoDataset', 'DeepFashionDataset', @@ -24,5 +25,5 @@ 'ClassBalancedDataset', 'WIDERFaceDataset', 'DATASETS', 'PIPELINES', 'build_dataset', 'replace_ImageToTensor', 'get_loading_pipeline', 'NumClassCheckHook', 'CocoPanopticDataset', 'MultiImageMixDataset', - 'OpenImagesDataset', 'OpenImagesChallengeDataset' + 'OpenImagesDataset', 'OpenImagesChallengeDataset', 'YmirDataset' ] diff --git a/det-mmdetection-tmi/mmdet/datasets/coco.py b/det-mmdetection-tmi/mmdet/datasets/coco.py index efd6949..7de1cdb 100644 --- a/det-mmdetection-tmi/mmdet/datasets/coco.py +++ b/det-mmdetection-tmi/mmdet/datasets/coco.py @@ -3,6 +3,7 @@ import io import itertools import logging +import os import os.path as osp import tempfile import warnings @@ -12,7 +13,6 @@ import numpy as np from mmcv.utils import print_log from terminaltables import AsciiTable - from mmdet.core import eval_recalls from .api_wrappers import COCO, COCOeval from .builder import DATASETS @@ -592,4 +592,14 @@ def evaluate(self, f'{ap[4]:.3f} {ap[5]:.3f}') if tmp_dir is not None: tmp_dir.cleanup() + + COCO_EVAL_TMP_FILE = os.getenv('COCO_EVAL_TMP_FILE') + if COCO_EVAL_TMP_FILE is not None: + mmcv.dump(eval_results, COCO_EVAL_TMP_FILE, file_format='json') + else: + raise Exception( + 'please set valid environment variable COCO_EVAL_TMP_FILE to write result into json file') + + print_log( + f'\n write eval result to {COCO_EVAL_TMP_FILE}', 
logger=logger)
         return eval_results
diff --git a/det-mmdetection-tmi/mmdet/datasets/ymir.py b/det-mmdetection-tmi/mmdet/datasets/ymir.py
new file mode 100644
index 0000000..9215624
--- /dev/null
+++ b/det-mmdetection-tmi/mmdet/datasets/ymir.py
@@ -0,0 +1,186 @@
+# Copyright (c) OpenMMLab voc.py. All rights reserved.
+# wangjiaxin 2022-04-25
+
+import os.path as osp
+import imagesize
+
+import json
+from .builder import DATASETS
+from .api_wrappers import COCO
+from .coco import CocoDataset
+
+
+@DATASETS.register_module()
+class YmirDataset(CocoDataset):
+    """
+    dataset converted by the ymir system 1.0.0
+
+    /in/assets: image files directory
+    /in/annotations: annotation files directory
+    /in/train-index.tsv: image_file \t annotation_file
+    /in/val-index.tsv: image_file \t annotation_file
+    """
+
+    def __init__(self,
+                 min_size=0,
+                 ann_prefix='annotations',
+                 **kwargs):
+        self.min_size = min_size
+        self.ann_prefix = ann_prefix
+        super(YmirDataset, self).__init__(**kwargs)
+
+    def load_annotations(self, ann_file):
+        """Load annotations from a TXT style ann_file.
+
+        Args:
+            ann_file (str): Path of the TXT index file.
+
+        Returns:
+            list[dict]: Annotation info from the TXT file.
+        """
+
+        images = []
+        categories = []
+        # category_id starts from 1 for coco, not 0
+        for i, name in enumerate(self.CLASSES):
+            categories.append({'supercategory': 'none',
+                               'id': i + 1,
+                               'name': name})
+
+        annotations = []
+        instance_counter = 1
+        image_counter = 1
+
+        with open(ann_file, 'r') as fp:
+            lines = fp.readlines()
+
+        for line in lines:
+            # split on any whitespace
+            img_path, ann_path = line.strip().split()
+            width, height = imagesize.get(img_path)
+            images.append(
+                dict(id=image_counter,
+                     file_name=img_path,
+                     ann_path=ann_path,
+                     width=width,
+                     height=height))
+
+            try:
+                anns = self.get_txt_ann_info(ann_path)
+            except Exception as e:
+                print(f'bad annotation for {ann_path} with {e}')
+                anns = []
+
+            for ann in anns:
+                ann['image_id'] = image_counter
+                ann['id'] = instance_counter
+                annotations.append(ann)
+                instance_counter += 1
+
+            image_counter += 1
+
+        # pycocotools coco init
+        self.coco = COCO()
+        self.coco.dataset['type'] = 'instances'
+        self.coco.dataset['categories'] = categories
+        self.coco.dataset['images'] = images
+        self.coco.dataset['annotations'] = annotations
+        self.coco.createIndex()
+
+        # mmdetection coco init
+        # avoid the filter problem in CocoDataset, view coco_api.py for detail
+        self.coco.img_ann_map = self.coco.imgToAnns
+        self.coco.cat_img_map = self.coco.catToImgs
+
+        # get valid category_id (in annotation, starts from 1, arbitrary)
+        self.cat_ids = self.coco.get_cat_ids(cat_names=self.CLASSES)
+        # convert category_id to label (train_id, starts from 0)
+        self.cat2label = {cat_id: i for i, cat_id in enumerate(self.cat_ids)}
+        self.img_ids = self.coco.get_img_ids()
+        # self.img_ids = list(self.coco.imgs.keys())
+        assert len(self.img_ids) > 0, 'image number must be > 0'
+        print(f'load {len(self.img_ids)} images from the YMIR dataset')
+
+        data_infos = []
+        total_ann_ids = []
+        for i in self.img_ids:
+            info = self.coco.load_imgs([i])[0]
+            info['filename'] = info['file_name']
+            data_infos.append(info)
+            ann_ids = self.coco.get_ann_ids(img_ids=[i])
+            total_ann_ids.extend(ann_ids)
+        assert len(set(total_ann_ids)) == len(
+            total_ann_ids), f"Annotation ids in '{ann_file}' are not unique!"
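+        # ann_file wires YMIR's exported data into this in-memory COCO
+        # dataset: each of its lines is 'image_path\tannotation_path', and
+        # each annotation file holds one 'class_id,xmin,ymin,xmax,ymax'
+        # line per object (see get_txt_ann_info below)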
+        return data_infos
+
+    def dump(self, ann_file):
+        with open(ann_file, 'w') as fp:
+            json.dump(self.coco.dataset, fp)
+
+    def get_ann_path_from_img_path(self, img_path):
+        img_id = osp.splitext(osp.basename(img_path))[0]
+        return osp.join(self.data_root, self.ann_prefix, img_id + '.txt')
+
+    def get_txt_ann_info(self, txt_path):
+        """Get annotations from a TXT file.
+
+        Args:
+            txt_path (str): Path of the annotation TXT file,
+                one 'class_id,xmin,ymin,xmax,ymax' line per object.
+
+        Returns:
+            list[dict]: Annotation info for the given file.
+        """
+        anns = []
+        if osp.exists(txt_path):
+            with open(txt_path, 'r') as fp:
+                lines = fp.readlines()
+        else:
+            lines = []
+        for line in lines:
+            obj = [int(x) for x in line.strip().split(',')[0:5]]
+            # YMIR category id starts from 0, coco from 1
+            category_id, xmin, ymin, xmax, ymax = obj
+            h, w = ymax - ymin, xmax - xmin
+            ignore = 0
+            if self.min_size:
+                assert not self.test_mode
+                if w < self.min_size or h < self.min_size:
+                    ignore = 1
+
+            ann = dict(
+                segmentation=[
+                    [xmin, ymin, xmax, ymin, xmax, ymax, xmin, ymax]],
+                area=w * h,
+                iscrowd=0,
+                image_id=None,
+                bbox=[xmin, ymin, w, h],
+                category_id=category_id + 1,  # category id is from 1 for coco
+                id=None,
+                ignore=ignore
+            )
+            anns.append(ann)
+        return anns
+
+    def get_cat_ids(self, idx):
+        """Get category ids from the TXT file by index.
+
+        Args:
+            idx (int): Index of data.
+
+        Returns:
+            list[int]: All categories in the image of specified index.
+        """
+
+        cat_ids = []
+        txt_path = self.data_infos[idx]['ann_path']
+        if osp.exists(txt_path):
+            with open(txt_path, 'r') as fp:
+                lines = fp.readlines()
+        else:
+            lines = []
+
+        for line in lines:
+            obj = [int(x) for x in line.strip().split(',')]
+            # label, xmin, ymin, xmax, ymax = obj
+            cat_ids.append(obj[0])
+
+        return cat_ids
diff --git a/det-mmdetection-tmi/mmdet/utils/util_ymir.py b/det-mmdetection-tmi/mmdet/utils/util_ymir.py
new file mode 100644
index 0000000..6cb9ae2
--- /dev/null
+++ b/det-mmdetection-tmi/mmdet/utils/util_ymir.py
@@ -0,0 +1,314 @@
+"""
+utility functions for ymir and mmdetection
+"""
+import glob
+import logging
+import os
+import os.path as osp
+from typing import Any, Iterable, List, Optional, Union
+
+import mmcv
+import yaml
+from easydict import EasyDict as edict
+from mmcv import Config, ConfigDict
+from nptyping import NDArray, Shape, UInt8
+from packaging.version import Version
+from ymir_exc import result_writer as rw
+from ymir_exc.util import get_merged_config
+
+BBOX = NDArray[Shape['*,4'], Any]
+CV_IMAGE = NDArray[Shape['*,*,3'], UInt8]
+
+
+def modify_mmcv_config(mmcv_cfg: Config, ymir_cfg: edict) -> None:
+    """
+    useful for the training process:
+    - modify dataset config
+    - modify model output channel
+    - modify epochs, checkpoint, tensorboard config
+    """
+
+    def recursive_modify_attribute(mmcv_cfgdict: Union[Config, ConfigDict], attribute_key: str, attribute_value: Any):
+        """
+        recursively modify mmcv_cfg:
+        1. mmcv_cfg.attribute_key to attribute_value
+        2. mmcv_cfg.xxx.xxx.xxx.attribute_key to attribute_value (recursive)
+        3. mmcv_cfg.xxx[i].attribute_key to attribute_value (i = 0, 1, 2 ...)
+        4.
mmcv_cfg.xxx[i].xxx.xxx[j].attribute_key to attribute_value + """ + for key in mmcv_cfgdict: + if key == attribute_key: + mmcv_cfgdict[key] = attribute_value + logging.info(f'modify {mmcv_cfgdict}, {key} = {attribute_value}') + elif isinstance(mmcv_cfgdict[key], (Config, ConfigDict)): + recursive_modify_attribute(mmcv_cfgdict[key], attribute_key, attribute_value) + elif isinstance(mmcv_cfgdict[key], Iterable): + for cfg in mmcv_cfgdict[key]: + if isinstance(cfg, (Config, ConfigDict)): + recursive_modify_attribute(cfg, attribute_key, attribute_value) + + # modify dataset config + ymir_ann_files = dict(train=ymir_cfg.ymir.input.training_index_file, + val=ymir_cfg.ymir.input.val_index_file, + test=ymir_cfg.ymir.input.candidate_index_file) + + # validation may augment the image and use more gpu + # so set smaller samples_per_gpu for validation + samples_per_gpu = ymir_cfg.param.samples_per_gpu + workers_per_gpu = ymir_cfg.param.workers_per_gpu + mmcv_cfg.data.samples_per_gpu = samples_per_gpu + mmcv_cfg.data.workers_per_gpu = workers_per_gpu + + # modify model output channel + num_classes = len(ymir_cfg.param.class_names) + recursive_modify_attribute(mmcv_cfg.model, 'num_classes', num_classes) + + for split in ['train', 'val', 'test']: + ymir_dataset_cfg = dict(type='YmirDataset', + ann_file=ymir_ann_files[split], + img_prefix=ymir_cfg.ymir.input.assets_dir, + ann_prefix=ymir_cfg.ymir.input.annotations_dir, + classes=ymir_cfg.param.class_names, + data_root=ymir_cfg.ymir.input.root_dir, + filter_empty_gt=False) + # modify dataset config for `split` + mmdet_dataset_cfg = mmcv_cfg.data.get(split, None) + if mmdet_dataset_cfg is None: + continue + + if isinstance(mmdet_dataset_cfg, (list, tuple)): + for x in mmdet_dataset_cfg: + x.update(ymir_dataset_cfg) + else: + src_dataset_type = mmdet_dataset_cfg.type + if src_dataset_type in ['CocoDataset', 'YmirDataset']: + mmdet_dataset_cfg.update(ymir_dataset_cfg) + elif src_dataset_type in ['MultiImageMixDataset', 'RepeatDataset']: + mmdet_dataset_cfg.dataset.update(ymir_dataset_cfg) + else: + raise Exception(f'unsupported source dataset type {src_dataset_type}') + + # modify epochs, checkpoint, tensorboard config + if ymir_cfg.param.get('max_epochs', None): + mmcv_cfg.runner.max_epochs = int(ymir_cfg.param.max_epochs) + mmcv_cfg.checkpoint_config['out_dir'] = ymir_cfg.ymir.output.models_dir + tensorboard_logger = dict(type='TensorboardLoggerHook', log_dir=ymir_cfg.ymir.output.tensorboard_dir) + if len(mmcv_cfg.log_config['hooks']) <= 1: + mmcv_cfg.log_config['hooks'].append(tensorboard_logger) + else: + mmcv_cfg.log_config['hooks'][1].update(tensorboard_logger) + + # TODO save only the best top-k model weight files. 
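+    # e.g. with max_epochs = 100 and val_interval = 10, evaluation (and
+    # checkpointing, which shares the interval below) runs every 10 epochs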
+ # modify evaluation and interval + val_interval: int = int(ymir_cfg.param.get('val_interval', 1)) + if val_interval > 0: + val_interval = min(val_interval, mmcv_cfg.runner.max_epochs) + else: + val_interval = 1 + + mmcv_cfg.evaluation.interval = val_interval + mmcv_cfg.evaluation.metric = ymir_cfg.param.get('metric', 'bbox') + + # save best top-k model weights files + # max_keep_ckpts <= 0 # save all checkpoints + max_keep_ckpts: int = int(ymir_cfg.param.get('max_keep_checkpoints', 1)) + mmcv_cfg.checkpoint_config.interval = mmcv_cfg.evaluation.interval + mmcv_cfg.checkpoint_config.max_keep_ckpts = max_keep_ckpts + + # TODO Whether to evaluating the AP for each class + # mmdet_cfg.evaluation.classwise = True + + # fix DDP error + mmcv_cfg.find_unused_parameters = True + + # set work dir + mmcv_cfg.work_dir = ymir_cfg.ymir.output.models_dir + + args_options = ymir_cfg.param.get("args_options", '') + cfg_options = ymir_cfg.param.get("cfg_options", '') + + # auto load offered weight file if not set by user! + if (args_options.find('--resume-from') == -1 and args_options.find('--load-from') == -1 + and cfg_options.find('load_from') == -1 and cfg_options.find('resume_from') == -1): # noqa: E129 + + weight_file = get_best_weight_file(ymir_cfg) + if weight_file: + if cfg_options: + cfg_options += f' load_from={weight_file}' + else: + cfg_options = f'load_from={weight_file}' + else: + logging.warning('no weight file used for training!') + + +def get_best_weight_file(cfg: edict) -> str: + """ + return the weight file path by priority + find weight file in cfg.param.pretrained_model_params or cfg.param.model_params_path + load coco-pretrained weight for yolox + """ + if cfg.ymir.run_training: + model_params_path: List[str] = cfg.param.get('pretrained_model_params', []) + else: + model_params_path = cfg.param.get('model_params_path', []) + + model_dir = cfg.ymir.input.models_dir + model_params_path = [ + osp.join(model_dir, p) for p in model_params_path + if osp.exists(osp.join(model_dir, p)) and p.endswith(('.pth', '.pt')) + ] + + # choose weight file by priority, best_xxx.pth > latest.pth > epoch_xxx.pth + best_pth_files = [f for f in model_params_path if osp.basename(f).startswith('best_')] + if len(best_pth_files) > 0: + return max(best_pth_files, key=os.path.getctime) + + epoch_pth_files = [f for f in model_params_path if osp.basename(f).startswith(('epoch_', 'iter_'))] + if len(epoch_pth_files) > 0: + return max(epoch_pth_files, key=os.path.getctime) + + if cfg.ymir.run_training: + weight_files = [f for f in glob.glob('/weights/**/*', recursive=True) if f.endswith(('.pth', '.pt'))] + + # load pretrained model weight for yolox only + model_name_splits = osp.basename(cfg.param.config_file).split('_') + if len(weight_files) > 0 and model_name_splits[0] == 'yolox': + yolox_weight_files = [ + f for f in weight_files if osp.basename(f).startswith(f'yolox_{model_name_splits[1]}') + ] + + if len(yolox_weight_files) == 0: + if model_name_splits[1] == 'nano': + # yolox_tiny_8x8_300e_coco_20211124_171234-b4047906.pth or yolox_tiny.py + yolox_weight_files = [f for f in weight_files if osp.basename(f).startswith('yolox_tiny')] + else: + yolox_weight_files = [f for f in weight_files if osp.basename(f).startswith('yolox_s')] + + if len(yolox_weight_files) > 0: + logging.info(f'load yolox pretrained weight {yolox_weight_files[0]}') + return yolox_weight_files[0] + return "" + + +def write_ymir_training_result(last: bool = False, key_score: Optional[float] = None): + YMIR_VERSION = 
os.environ.get('YMIR_VERSION', '1.2.0') + if Version(YMIR_VERSION) >= Version('1.2.0'): + _write_latest_ymir_training_result(last, key_score) + else: + _write_ancient_ymir_training_result(key_score) + + +def get_topk_checkpoints(files: List[str], k: int) -> List[str]: + """ + keep topk checkpoint files, remove other files. + """ + checkpoints_files = [f for f in files if f.endswith(('.pth', '.pt'))] + + best_pth_files = [f for f in checkpoints_files if osp.basename(f).startswith('best_')] + if len(best_pth_files) > 0: + # newest first + topk_best_pth_files = sorted(best_pth_files, key=os.path.getctime, reverse=True) + else: + topk_best_pth_files = [] + + epoch_pth_files = [f for f in checkpoints_files if osp.basename(f).startswith(('epoch_', 'iter_'))] + if len(epoch_pth_files) > 0: + topk_epoch_pth_files = sorted(epoch_pth_files, key=os.path.getctime, reverse=True) + else: + topk_epoch_pth_files = [] + + # python will check the length of list + return topk_best_pth_files[0:k] + topk_epoch_pth_files[0:k] + + +# TODO save topk checkpoints, fix invalid stage due to delete checkpoint +def _write_latest_ymir_training_result(last: bool = False, key_score: Optional[float] = None): + if key_score: + logging.info(f'key_score is {key_score}') + COCO_EVAL_TMP_FILE = os.getenv('COCO_EVAL_TMP_FILE') + if COCO_EVAL_TMP_FILE is None: + raise Exception('please set valid environment variable COCO_EVAL_TMP_FILE to write result into json file') + + eval_result = mmcv.load(COCO_EVAL_TMP_FILE) + # eval_result may be empty dict {}. + map = eval_result.get('bbox_mAP_50', 0) + + WORK_DIR = os.getenv('YMIR_MODELS_DIR') + if WORK_DIR is None or not osp.isdir(WORK_DIR): + raise Exception(f'please set valid environment variable YMIR_MODELS_DIR, invalid directory {WORK_DIR}') + + # assert only one model config file in work_dir + result_files = [f for f in glob.glob(osp.join(WORK_DIR, '*')) if osp.basename(f) != 'result.yaml'] + + if last: + # save all output file + ymir_cfg = get_merged_config() + max_keep_checkpoints = int(ymir_cfg.param.get('max_keep_checkpoints', 1)) + if max_keep_checkpoints > 0: + topk_checkpoints = get_topk_checkpoints(result_files, max_keep_checkpoints) + result_files = [f for f in result_files if not f.endswith(('.pth', '.pt'))] + topk_checkpoints + + result_files = [osp.basename(f) for f in result_files] + rw.write_model_stage(files=result_files, mAP=float(map), stage_name='last') + else: + result_files = [osp.basename(f) for f in result_files] + # save newest weight file in format epoch_xxx.pth or iter_xxx.pth + weight_files = [ + osp.join(WORK_DIR, f) for f in result_files if f.startswith(('iter_', 'epoch_')) and f.endswith('.pth') + ] + + if len(weight_files) > 0: + newest_weight_file = osp.basename(max(weight_files, key=os.path.getctime)) + + stage_name = osp.splitext(newest_weight_file)[0] + training_result_file = osp.join(WORK_DIR, 'result.yaml') + if osp.exists(training_result_file): + with open(training_result_file, 'r') as f: + training_result = yaml.safe_load(f) + model_stages = training_result.get('model_stages', {}) + else: + model_stages = {} + + if stage_name not in model_stages: + config_files = [f for f in result_files if f.endswith('.py')] + rw.write_model_stage(files=[newest_weight_file] + config_files, mAP=float(map), stage_name=stage_name) + + +def _write_ancient_ymir_training_result(key_score: Optional[float] = None): + if key_score: + logging.info(f'key_score is {key_score}') + + COCO_EVAL_TMP_FILE = os.getenv('COCO_EVAL_TMP_FILE') + if COCO_EVAL_TMP_FILE is None: + 
raise Exception('please set a valid environment variable COCO_EVAL_TMP_FILE to write results into a json file')
+
+    eval_result = mmcv.load(COCO_EVAL_TMP_FILE)
+    # eval_result may be an empty dict {}.
+    map = eval_result.get('bbox_mAP_50', 0)
+
+    ymir_cfg = get_merged_config()
+    WORK_DIR = ymir_cfg.ymir.output.models_dir
+
+    # assert only one model config file in work_dir
+    result_files = [f for f in glob.glob(osp.join(WORK_DIR, '*')) if osp.basename(f) != 'result.yaml']
+
+    max_keep_checkpoints = int(ymir_cfg.param.get('max_keep_checkpoints', 1))
+    if max_keep_checkpoints > 0:
+        topk_checkpoints = get_topk_checkpoints(result_files, max_keep_checkpoints)
+        result_files = [f for f in result_files if not f.endswith(('.pth', '.pt'))] + topk_checkpoints
+
+    # convert to basename
+    result_files = [osp.basename(f) for f in result_files]
+
+    training_result_file = osp.join(WORK_DIR, 'result.yaml')
+    if osp.exists(training_result_file):
+        with open(training_result_file, 'r') as f:
+            training_result = yaml.safe_load(f)
+
+        training_result['model'] = result_files
+        training_result['map'] = max(map, training_result['map'])
+    else:
+        training_result = dict(model=result_files, map=map)
+
+    with open(training_result_file, 'w') as f:
+        yaml.safe_dump(training_result, f)
diff --git a/det-mmdetection-tmi/requirements/runtime.txt b/det-mmdetection-tmi/requirements/runtime.txt
index f7a2cc7..cf0fac6 100644
--- a/det-mmdetection-tmi/requirements/runtime.txt
+++ b/det-mmdetection-tmi/requirements/runtime.txt
@@ -2,4 +2,10 @@ matplotlib
 numpy
 pycocotools
 six
+scipy
 terminaltables
+easydict
+nptyping
+imagesize>=1.3.0
+future
+tensorboard>=2.5.0
diff --git a/det-mmdetection-tmi/start.py b/det-mmdetection-tmi/start.py
new file mode 100644
index 0000000..81e2174
--- /dev/null
+++ b/det-mmdetection-tmi/start.py
@@ -0,0 +1,80 @@
+import logging
+import os
+import subprocess
+import sys
+
+from easydict import EasyDict as edict
+from ymir_exc import monitor
+from ymir_exc.util import find_free_port, get_merged_config
+
+
+def start(cfg: edict) -> int:
+    logging.info(f'merged config: {cfg}')
+
+    if cfg.ymir.run_training:
+        _run_training()
+    elif cfg.ymir.run_mining or cfg.ymir.run_infer:
+        if cfg.ymir.run_mining:
+            _run_mining(cfg)
+        if cfg.ymir.run_infer:
+            _run_infer(cfg)
+    else:
+        logging.warning('no task running')
+
+    return 0
+
+
+def _run_training() -> None:
+    command = 'python3 ymir_train.py'
+    logging.info(f'start training: {command}')
+    subprocess.run(command.split(), check=True)
+
+    # the task is done, write a 100% progress log
+    monitor.write_monitor_logger(percent=1.0)
+    logging.info("training finished")
+
+
+def _run_mining(cfg: edict) -> None:
+    gpu_id: str = str(cfg.param.get('gpu_id', '0'))
+    gpu_count = len(gpu_id.split(','))
+    mining_algorithm: str = cfg.param.get('mining_algorithm', 'aldd')
+
+    supported_miner = ['cald', 'aldd', 'random', 'entropy']
+    assert mining_algorithm in supported_miner, f'unknown mining_algorithm {mining_algorithm}, not in {supported_miner}'
+    if gpu_count <= 1:
+        command = f'python3 ymir_mining_{mining_algorithm}.py'
+    else:
+        port = find_free_port()
+        command = f'python3 -m torch.distributed.launch --nproc_per_node {gpu_count} --master_port {port} ymir_mining_{mining_algorithm}.py'  # noqa
+
+    logging.info(f'start mining: {command}')
+    subprocess.run(command.split(), check=True)
+    logging.info("mining finished")
+
+
+def _run_infer(cfg: edict) -> None:
+    gpu_id: str = str(cfg.param.get('gpu_id', '0'))
+    gpu_count = len(gpu_id.split(','))
+
+    if gpu_count <= 1:
+        command = 'python3 ymir_infer.py'
+    else:
+        port = find_free_port()
+        command = f'python3 -m torch.distributed.launch --nproc_per_node {gpu_count} --master_port {port} ymir_infer.py'  # noqa
+
+    logging.info(f'start infer: {command}')
+    subprocess.run(command.split(), check=True)
+    logging.info("infer finished")
+
+
+if __name__ == '__main__':
+    logging.basicConfig(stream=sys.stdout,
+                        format='%(levelname)-8s: [%(asctime)s] %(message)s',
+                        datefmt='%Y%m%d-%H:%M:%S',
+                        level=logging.INFO)
+
+    cfg = get_merged_config()
+    os.environ.setdefault('YMIR_MODELS_DIR', cfg.ymir.output.models_dir)
+    os.environ.setdefault('COCO_EVAL_TMP_FILE', os.path.join(cfg.ymir.output.root_dir, 'eval_tmp.json'))
+    os.environ.setdefault('PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION', 'python')
+    sys.exit(start(cfg))
diff --git a/det-mmdetection-tmi/tools/train.py b/det-mmdetection-tmi/tools/train.py
index b9e9981..78fbe46 100644
--- a/det-mmdetection-tmi/tools/train.py
+++ b/det-mmdetection-tmi/tools/train.py
@@ -11,12 +11,13 @@ from mmcv import Config, DictAction
 from mmcv.runner import get_dist_info, init_dist
 from mmcv.utils import get_git_hash
-
 from mmdet import __version__
 from mmdet.apis import init_random_seed, set_random_seed, train_detector
 from mmdet.datasets import build_dataset
 from mmdet.models import build_detector
 from mmdet.utils import collect_env, get_root_logger, setup_multi_processes
+from mmdet.utils.util_ymir import modify_mmcv_config
+from ymir_exc.util import get_merged_config


 def parse_args():
@@ -96,8 +97,11 @@ def parse_args():
 def main():
     args = parse_args()
-
+    ymir_cfg = get_merged_config()
     cfg = Config.fromfile(args.config)
+    # modify the mmdet config with the merged ymir config
+    modify_mmcv_config(mmcv_cfg=cfg, ymir_cfg=ymir_cfg)
+
     if args.cfg_options is not None:
         cfg.merge_from_dict(args.cfg_options)
diff --git a/det-mmdetection-tmi/training-template.yaml b/det-mmdetection-tmi/training-template.yaml
new file mode 100644
index 0000000..05b11b2
--- /dev/null
+++ b/det-mmdetection-tmi/training-template.yaml
@@ -0,0 +1,12 @@
+shm_size: '128G'
+export_format: 'ark:raw'
+samples_per_gpu: 16  # batch size per gpu
+workers_per_gpu: 4
+max_epochs: 100
+config_file: 'configs/yolox/yolox_tiny_8x8_300e_coco.py'
+args_options: ''
+cfg_options: ''
+metric: 'bbox'
+val_interval: 1  # <=0 falls back to evaluating every epoch
+max_keep_checkpoints: 1  # <=0 means save all weight files, 1 means save the last and best weight files, k means save the top-k best weight files and the top-k epoch/step weight files
+ymir_saved_file_patterns: ''  # custom saved files, python regular expressions, use ',' to separate multiple patterns
diff --git a/det-mmdetection-tmi/ymir_infer.py b/det-mmdetection-tmi/ymir_infer.py
new file mode 100644
index 0000000..62817ad
--- /dev/null
+++ b/det-mmdetection-tmi/ymir_infer.py
@@ -0,0 +1,153 @@
+import argparse
+import os
+import os.path as osp
+import sys
+import warnings
+from typing import Any, List
+
+import cv2
+import numpy as np
+import torch.distributed as dist
+from easydict import EasyDict as edict
+from mmcv import DictAction
+from mmcv.runner import init_dist
+from tqdm import tqdm
+from ymir_exc import result_writer as rw
+from ymir_exc.util import (YmirStage, get_merged_config,
+                           write_ymir_monitor_process)
+
+from mmdet.apis import inference_detector, init_detector
+from mmdet.apis.test import collect_results_gpu
+from mmdet.utils.util_ymir import get_best_weight_file
+
+LOCAL_RANK = int(os.getenv('LOCAL_RANK', -1))  # https://pytorch.org/docs/stable/elastic/run.html
+RANK = int(os.getenv('RANK', -1))
+WORLD_SIZE = int(os.getenv('WORLD_SIZE', 1))
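A minimal sketch, assuming a hypothetical 10-image candidate list, of the round-robin sharding that `main()` below performs with `images[RANK::WORLD_SIZE]` when launched through `torch.distributed.launch`:

```python
images = [f'{i:04d}.jpg' for i in range(10)]
WORLD_SIZE = 3
shards = [images[rank::WORLD_SIZE] for rank in range(WORLD_SIZE)]
# rank 0 -> ['0000.jpg', '0003.jpg', '0006.jpg', '0009.jpg']
# rank 1 -> ['0001.jpg', '0004.jpg', '0007.jpg']
# rank 2 -> ['0002.jpg', '0005.jpg', '0008.jpg']
assert sorted(sum(shards, [])) == images  # every image is scored exactly once
```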
+ + +def parse_option(cfg_options: str) -> dict: + parser = argparse.ArgumentParser(description='parse cfg options') + parser.add_argument('--cfg-options', + nargs='+', + action=DictAction, + help='override some settings in the used config, the key-value pair ' + 'in xxx=yyy format will be merged into config file. If the value to ' + 'be overwritten is a list, it should be like key="[a,b]" or key=a,b ' + 'It also allows nested list/tuple values, e.g. key="[(a,b),(c,d)]" ' + 'Note that the quotation marks are necessary and that no white space ' + 'is allowed.') + + args = parser.parse_args(f'--cfg-options {cfg_options}'.split()) + return args.cfg_options + + +def mmdet_result_to_ymir(results: List[Any], class_names: List[str]) -> List[rw.Annotation]: + """ + results: List[NDArray[Shape['*,5'], Any]] + """ + ann_list = [] + for idx, result in enumerate(results): + for line in result: + if any(np.isinf(line)): + continue + x1, y1, x2, y2, score = line + ann = rw.Annotation(class_name=class_names[idx], + score=score, + box=rw.Box(x=round(x1), y=round(y1), w=round(x2 - x1), h=round(y2 - y1))) + ann_list.append(ann) + return ann_list + + +def get_config_file(cfg): + if cfg.ymir.run_training: + model_params_path: List = cfg.param.get('pretrained_model_params', []) # type: ignore + else: + model_params_path: List = cfg.param.get('model_params_path', []) # type: ignore + + model_dir = cfg.ymir.input.models_dir + config_files = [ + osp.join(model_dir, p) for p in model_params_path if osp.exists(osp.join(model_dir, p)) and p.endswith(('.py')) + ] + + if len(config_files) > 0: + if len(config_files) > 1: + warnings.warn(f'multiple config file found! use {config_files[0]}') + return config_files[0] + else: + raise Exception(f'no config_file found in {model_dir} and {model_params_path}') + + +class YmirModel: + def __init__(self, cfg: edict): + self.cfg = cfg + + # Specify the path to model config and checkpoint file + config_file = get_config_file(cfg) + checkpoint_file = get_best_weight_file(cfg) + options = cfg.param.get('cfg_options', None) + cfg_options = parse_option(options) if options else None + + # current infer can only use one gpu!!! 
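+        # each process builds its own model copy: under
+        # torch.distributed.launch rank k pins its copy to cuda:k, while a
+        # single-process run (RANK == -1) falls back to cuda:0 via max(0, RANK)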
+ # gpu_ids = cfg.param.get('gpu_id', '0') + # gpu_id = gpu_ids.split(',')[0] + gpu_id = max(0, RANK) + # build the model from a config file and a checkpoint file + self.model = init_detector(config_file, checkpoint_file, device=f'cuda:{gpu_id}', cfg_options=cfg_options) + + def infer(self, img): + return inference_detector(self.model, img) + + +def main(): + if LOCAL_RANK != -1: + init_dist(launcher='pytorch', backend="nccl" if dist.is_nccl_available() else "gloo") + + cfg = get_merged_config() + + with open(cfg.ymir.input.candidate_index_file, 'r') as f: + images = [line.strip() for line in f.readlines()] + + max_barrier_times = len(images) // WORLD_SIZE + if RANK == -1: + N = len(images) + tbar = tqdm(images) + else: + images_rank = images[RANK::WORLD_SIZE] + N = len(images_rank) + if RANK == 0: + tbar = tqdm(images_rank) + else: + tbar = images_rank + infer_result_list = [] + model = YmirModel(cfg) + + # write infer result + monitor_gap = max(1, N // 100) + conf_threshold = float(cfg.param.conf_threshold) + for idx, asset_path in enumerate(tbar): + img = cv2.imread(asset_path) + result = model.infer(img) + raw_anns = mmdet_result_to_ymir(result, cfg.param.class_names) + + # batch-level sync, avoid 30min time-out error + if WORLD_SIZE > 1 and idx < max_barrier_times: + dist.barrier() + + infer_result_list.append((asset_path, [ann for ann in raw_anns if ann.score >= conf_threshold])) + + if idx % monitor_gap == 0 and RANK in [0, -1]: + write_ymir_monitor_process(cfg, task='infer', naive_stage_percent=idx / N, stage=YmirStage.TASK) + + if WORLD_SIZE > 1: + dist.barrier() + infer_result_list = collect_results_gpu(infer_result_list, len(images)) + + if RANK in [0, -1]: + infer_result_dict = {k: v for k, v in infer_result_list} + rw.write_infer_result(infer_result=infer_result_dict) + write_ymir_monitor_process(cfg, task='infer', naive_stage_percent=1.0, stage=YmirStage.POSTPROCESS) + return 0 + + +if __name__ == "__main__": + sys.exit(main()) diff --git a/det-mmdetection-tmi/ymir_mining_aldd.py b/det-mmdetection-tmi/ymir_mining_aldd.py new file mode 100644 index 0000000..59eea4b --- /dev/null +++ b/det-mmdetection-tmi/ymir_mining_aldd.py @@ -0,0 +1,75 @@ +import sys + +import torch +from easydict import EasyDict as edict +from mmcv.parallel import collate, scatter +from ymir_exc.util import get_merged_config + +from mining_base import ALDDMining +from mmdet.datasets import replace_ImageToTensor +from mmdet.datasets.pipelines import Compose +from mmdet.models.detectors import YOLOX +from ymir_infer import YmirModel +from ymir_mining_random import RandomMiner + + +class ALDDMiner(RandomMiner): + + def __init__(self, cfg: edict): + super().__init__(cfg) + self.ymir_model = YmirModel(cfg) + mmdet_cfg = self.ymir_model.model.cfg + mmdet_cfg.data.test.pipeline = replace_ImageToTensor(mmdet_cfg.data.test.pipeline) + self.test_pipeline = Compose(mmdet_cfg.data.test.pipeline) + self.aldd_miner = ALDDMining(cfg, [640, 640]) + + def compute_score(self, asset_path: str) -> float: + dict_data = dict(img_info=dict(filename=asset_path), img_prefix=None) + pipeline_data = self.test_pipeline(dict_data) + data = collate([pipeline_data], samples_per_gpu=1) + # just get the actual data from DataContainer + data['img_metas'] = [img_metas.data[0] for img_metas in data['img_metas']] + data['img'] = [img.data[0] for img in data['img']] + # scatter to specified GPU + data = scatter(data, [self.device])[0] + + if isinstance(self.ymir_model.model, YOLOX): + # results = (cls_maps, reg_maps, iou_maps) + # cls_maps: 
[BxCx52x52, BxCx26x26, BxCx13x13]
+            # reg_maps: [Bx4x52x52, Bx4x26x26, Bx4x13x13]
+            # iou_maps: [Bx1x52x52, Bx1x26x26, Bx1x13x13]
+            results = self.ymir_model.model.forward_dummy(data['img'][0])
+            feature_maps = []
+            for cls, reg, iou in zip(results[0], results[1], results[2]):
+                maps = [reg, iou, cls]
+                feature_maps.append(torch.cat(maps, dim=1))
+            mining_score = self.aldd_miner.mining(feature_maps)
+
+            return mining_score.item()
+        else:
+            raise NotImplementedError(
+                'aldd mining is not currently supported with {}, only YOLOX is supported'.format(
+                    self.ymir_model.model.__class__.__name__))
+
+        # TODO support other SingleStageDetector
+        # if isinstance(self.ymir_model.model, SingleStageDetector):
+        #     pass
+        # elif isinstance(self.ymir_model.model, TwoStageDetector):
+        #     # (rpn_outs, roi_outs)
+        #     # outs = self.ymir_model.model.forward_dummy(img)
+        #     raise NotImplementedError('aldd mining is not currently supported with TwoStageDetector {}'.format(
+        #         self.ymir_model.model.__class__.__name__))
+        # else:
+        #     raise NotImplementedError('aldd mining is not currently supported with {}'.format(
+        #         self.ymir_model.model.__class__.__name__))
+
+
+def main():
+    cfg = get_merged_config()
+    miner = ALDDMiner(cfg)
+    miner.mining()
+    return 0
+
+
+if __name__ == "__main__":
+    sys.exit(main())
diff --git a/det-mmdetection-tmi/ymir_mining_cald.py b/det-mmdetection-tmi/ymir_mining_cald.py
new file mode 100644
index 0000000..efb253f
--- /dev/null
+++ b/det-mmdetection-tmi/ymir_mining_cald.py
@@ -0,0 +1,391 @@
+"""
+data augmentations for the CALD method, including horizontal_flip, rotate (5 degrees) and cutout
+official code: https://github.com/we1pingyu/CALD/blob/master/cald/cald_helper.py
+"""
+import os
+import random
+import sys
+from typing import Any, Callable, Dict, List, Tuple
+
+import cv2
+import numpy as np
+import torch
+import torch.distributed as dist
+from mmcv.runner import init_dist
+from mmdet.apis.test import collect_results_gpu
+from mmdet.utils.util_ymir import BBOX, CV_IMAGE
+from nptyping import NDArray
+from scipy.stats import entropy
+from tqdm import tqdm
+from ymir_exc import result_writer as rw
+from ymir_exc.util import YmirStage, get_merged_config, write_ymir_monitor_process
+from ymir_infer import YmirModel
+
+LOCAL_RANK = int(os.getenv('LOCAL_RANK', -1))  # https://pytorch.org/docs/stable/elastic/run.html
+RANK = int(os.getenv('RANK', -1))
+WORLD_SIZE = int(os.getenv('WORLD_SIZE', 1))
+
+
+def intersect(boxes1: BBOX, boxes2: BBOX) -> NDArray:
+    '''
+    Find the intersection of every box combination between two sets of boxes
+    boxes1: bounding boxes 1, a tensor of dimensions (n1, 4)
+    boxes2: bounding boxes 2, a tensor of dimensions (n2, 4)
+
+    Out: Intersection of each of boxes1 with respect to each of boxes2,
+         a tensor of dimensions (n1, n2)
+    '''
+    n1 = boxes1.shape[0]
+    n2 = boxes2.shape[0]
+    max_xy = np.minimum(
+        np.expand_dims(boxes1[:, 2:], axis=1).repeat(n2, axis=1),
+        np.expand_dims(boxes2[:, 2:], axis=0).repeat(n1, axis=0))
+
+    min_xy = np.maximum(
+        np.expand_dims(boxes1[:, :2], axis=1).repeat(n2, axis=1),
+        np.expand_dims(boxes2[:, :2], axis=0).repeat(n1, axis=0))
+    inter = np.clip(max_xy - min_xy, a_min=0, a_max=None)  # (n1, n2, 2)
+    return inter[:, :, 0] * inter[:, :, 1]  # (n1, n2)
+
+
+def horizontal_flip(image: CV_IMAGE, bbox: BBOX) \
+        -> Tuple[CV_IMAGE, BBOX]:
+    """
+    image: opencv image, [height, width, channels]
+    bbox: numpy.ndarray, [N, 4] --> [x1, y1, x2, y2]
+    """
+    image = image.copy()
+
+    width = image.shape[1]
+    # Flip the image horizontally
+    image = image[:,
::-1, :] + if len(bbox) > 0: + bbox = bbox.copy() + # Flip bbox horizontally + bbox[:, [0, 2]] = width - bbox[:, [2, 0]] + return image, bbox + + +def cutout(image: CV_IMAGE, + bbox: BBOX, + cut_num: int = 2, + fill_val: int = 0, + bbox_remove_thres: float = 0.4, + bbox_min_thres: float = 0.1) -> Tuple[CV_IMAGE, BBOX]: + ''' + Cutout augmentation + image: A PIL image + boxes: bounding boxes, a tensor of dimensions (#objects, 4) + labels: labels of object, a tensor of dimensions (#objects) + fill_val: Value filled in cut out + bbox_remove_thres: Theshold to remove bbox cut by cutout + + Out: new image, new_boxes, new_labels + ''' + image = image.copy() + bbox = bbox.copy() + + if len(bbox) == 0: + return image, bbox + + original_h, original_w, original_channel = image.shape + count = 0 + for _ in range(50): + # Random cutout size: [0.15, 0.5] of original dimension + cutout_size_h = random.uniform(0.05 * original_h, 0.2 * original_h) + cutout_size_w = random.uniform(0.05 * original_w, 0.2 * original_w) + + # Random position for cutout + left = random.uniform(0, original_w - cutout_size_w) + right = left + cutout_size_w + top = random.uniform(0, original_h - cutout_size_h) + bottom = top + cutout_size_h + cutout = np.array([[float(left), float(top), float(right), float(bottom)]]) + + # Calculate intersect between cutout and bounding boxes + overlap_size = intersect(cutout, bbox) + area_boxes = (bbox[:, 2] - bbox[:, 0]) * (bbox[:, 3] - bbox[:, 1]) + ratio = overlap_size / (area_boxes + 1e-14) + # If all boxes have Iou greater than bbox_remove_thres, try again + if ratio.max() > bbox_remove_thres or ratio.max() < bbox_min_thres: + continue + + image[int(top):int(bottom), int(left):int(right), :] = fill_val + count += 1 + if count >= cut_num: + break + return image, bbox + + +def rotate(image: CV_IMAGE, bbox: BBOX, rot: float = 5) -> Tuple[CV_IMAGE, BBOX]: + image = image.copy() + bbox = bbox.copy() + h, w, c = image.shape + center = np.array([w / 2.0, h / 2.0]) + s = max(h, w) * 1.0 + trans = get_affine_transform(center, s, rot, [w, h]) + if len(bbox) > 0: + for i in range(bbox.shape[0]): + x1, y1 = affine_transform(bbox[i, :2], trans) + x2, y2 = affine_transform(bbox[i, 2:], trans) + x3, y3 = affine_transform(bbox[i, [2, 1]], trans) + x4, y4 = affine_transform(bbox[i, [0, 3]], trans) + bbox[i, :2] = [min(x1, x2, x3, x4), min(y1, y2, y3, y4)] + bbox[i, 2:] = [max(x1, x2, x3, x4), max(y1, y2, y3, y4)] + image = cv2.warpAffine(image, trans, (w, h), flags=cv2.INTER_LINEAR) + return image, bbox + + +def get_3rd_point(a: NDArray, b: NDArray) -> NDArray: + direct = a - b + return b + np.array([-direct[1], direct[0]], dtype=np.float32) + + +def get_dir(src_point: NDArray, rot_rad: float) -> List: + sn, cs = np.sin(rot_rad), np.cos(rot_rad) + + src_result = [0, 0] + src_result[0] = src_point[0] * cs - src_point[1] * sn + src_result[1] = src_point[0] * sn + src_point[1] * cs + + return src_result + + +def transform_preds(coords: NDArray, center: NDArray, scale: Any, rot: float, output_size: List) -> NDArray: + trans = get_affine_transform(center, scale, rot, output_size, inv=True) + target_coords = affine_transform(coords, trans) + return target_coords + + +def get_affine_transform(center: NDArray, + scale: Any, + rot: float, + output_size: List, + shift: NDArray = np.array([0, 0], dtype=np.float32), + inv: bool = False) -> NDArray: + if not isinstance(scale, np.ndarray) and not isinstance(scale, list): + scale = np.array([scale, scale], dtype=np.float32) + + scale_tmp = scale + src_w = scale_tmp[0] + 
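+    # the warp is fixed by three point pairs: the center, the center plus a
+    # rotated half-width offset (src_dir/dst_dir), and a third point
+    # perpendicular to the first two (get_3rd_point); cv2.getAffineTransform
+    # needs exactly three non-collinear pairs to determine the six affine
+    # parameters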
dst_w = output_size[0] + dst_h = output_size[1] + + rot_rad = np.pi * rot / 180 + src_dir = get_dir(np.array([0, src_w * -0.5], np.float32), rot_rad) + dst_dir = np.array([0, dst_w * -0.5], np.float32) + + src = np.zeros((3, 2), dtype=np.float32) + dst = np.zeros((3, 2), dtype=np.float32) + src[0, :] = center + scale_tmp * shift + src[1, :] = center + src_dir + scale_tmp * shift + dst[0, :] = [dst_w * 0.5, dst_h * 0.5] + dst[1, :] = np.array([dst_w * 0.5, dst_h * 0.5], np.float32) + dst_dir + + src[2:, :] = get_3rd_point(src[0, :], src[1, :]) + dst[2:, :] = get_3rd_point(dst[0, :], dst[1, :]) + + if inv: + trans = cv2.getAffineTransform(np.float32(dst), np.float32(src)) + else: + trans = cv2.getAffineTransform(np.float32(src), np.float32(dst)) + + return trans + + +def affine_transform(pt: NDArray, t: NDArray) -> NDArray: + new_pt = np.array([pt[0], pt[1], 1.], dtype=np.float32).T + new_pt = np.dot(t, new_pt) + return new_pt[:2] + + +def resize(img: CV_IMAGE, boxes: BBOX, ratio: float = 0.8) -> Tuple[CV_IMAGE, BBOX]: + """ + ratio: <= 1.0 + """ + assert ratio <= 1.0, f'resize ratio {ratio} must <= 1.0' + + h, w, _ = img.shape + ow = int(w * ratio) + oh = int(h * ratio) + resize_img = cv2.resize(img, (ow, oh)) + new_img = np.zeros_like(img) + new_img[:oh, :ow] = resize_img + + if len(boxes) == 0: + return new_img, boxes + else: + return new_img, boxes * ratio + + +def get_ious(boxes1: BBOX, boxes2: BBOX) -> NDArray: + """ + args: + boxes1: np.array, (N, 4), xyxy + boxes2: np.array, (M, 4), xyxy + return: + iou: np.array, (N, M) + """ + area1 = (boxes1[:, 2] - boxes1[:, 0]) * (boxes1[:, 3] - boxes1[:, 1]) + area2 = (boxes2[:, 2] - boxes2[:, 0]) * (boxes2[:, 3] - boxes2[:, 1]) + iner_area = intersect(boxes1, boxes2) + area1 = area1.reshape(-1, 1).repeat(area2.shape[0], axis=1) + area2 = area2.reshape(1, -1).repeat(area1.shape[0], axis=0) + iou = iner_area / (area1 + area2 - iner_area + 1e-14) + return iou + + +def split_result(result: NDArray) -> Tuple[BBOX, NDArray, NDArray]: + if len(result) > 0: + bboxes = result[:, :4].astype(np.int32) + conf = result[:, 4] + class_id = result[:, 5] + else: + bboxes = np.zeros(shape=(0, 4), dtype=np.int32) + conf = np.zeros(shape=(0, 1), dtype=np.float32) + class_id = np.zeros(shape=(0, 1), dtype=np.int32) + + return bboxes, conf, class_id + + +class CALDMiner(YmirModel): + def mining(self): + with open(self.cfg.ymir.input.candidate_index_file, 'r') as f: + images = [line.strip() for line in f.readlines()] + + max_barrier_times = len(images) // WORLD_SIZE + if RANK == -1: + N = len(images) + tbar = tqdm(images) + else: + images_rank = images[RANK::WORLD_SIZE] + N = len(images_rank) + if RANK == 0: + tbar = tqdm(images_rank) + else: + tbar = images_rank + + monitor_gap = max(1, N // 100) + idx = -1 + beta = 1.3 + mining_result = [] + for idx, asset_path in enumerate(tbar): + if idx % monitor_gap == 0 and RANK in [0, -1]: + write_ymir_monitor_process(self.cfg, task='mining', naive_stage_percent=idx / N, stage=YmirStage.TASK) + + # batch-level sync, avoid 30min time-out error + if WORLD_SIZE > 1 and idx < max_barrier_times: + dist.barrier() + + img = cv2.imread(asset_path) + # xyxy,conf,cls + result = self.predict(img) + bboxes, conf, _ = split_result(result) + if len(result) == 0: + # no result for the image without augmentation + mining_result.append((asset_path, -beta)) + continue + + consistency = 0.0 + aug_bboxes_dict, aug_results_dict = self.aug_predict(img, bboxes) + for key in aug_results_dict: + # no result for the image with augmentation f'{key}' + 
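+                # per-instance consistency below is
+                # |IoU + 0.5 * (p_orig + p_aug) * (1 - JS) - beta|, where JS
+                # is the Jensen-Shannon divergence between original and
+                # augmented confidences; the image score averages the
+                # per-augmentation minima over the four augmentations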
if len(aug_results_dict[key]) == 0: + consistency += beta + continue + + bboxes_key, conf_key, _ = split_result(aug_results_dict[key]) + cls_scores_aug = 1 - conf_key + cls_scores = 1 - conf + + consistency_per_aug = 2.0 + ious = get_ious(bboxes_key, aug_bboxes_dict[key]) + aug_idxs = np.argmax(ious, axis=0) + for origin_idx, aug_idx in enumerate(aug_idxs): + max_iou = ious[aug_idx, origin_idx] + if max_iou == 0: + consistency_per_aug = min(consistency_per_aug, beta) + p = cls_scores_aug[aug_idx] + q = cls_scores[origin_idx] + m = (p + q) / 2. + js = 0.5 * entropy([p, 1 - p], [m, 1 - m]) + 0.5 * entropy([q, 1 - q], [m, 1 - m]) + if js < 0: + js = 0 + consistency_box = max_iou + consistency_cls = 0.5 * \ + (conf[origin_idx] + conf_key[aug_idx]) * (1 - js) + consistency_per_inst = abs(consistency_box + consistency_cls - beta) + consistency_per_aug = min(consistency_per_aug, consistency_per_inst.item()) + + consistency += consistency_per_aug + + consistency /= len(aug_results_dict) + + mining_result.append((asset_path, consistency)) + + if WORLD_SIZE > 1: + mining_result = collect_results_gpu(mining_result, len(images)) + + return mining_result + + def predict(self, img: CV_IMAGE) -> NDArray: + """ + predict single image and return bbox information + img: opencv BGR, uint8 format + """ + results = self.infer(img) + + xyxy_conf_idx_list = [] + for idx, result in enumerate(results): + for line in result: + if any(np.isinf(line)): + continue + x1, y1, x2, y2, score = line + xyxy_conf_idx_list.append([x1, y1, x2, y2, score, idx]) + + if len(xyxy_conf_idx_list) == 0: + return np.zeros(shape=(0, 6), dtype=np.float32) + else: + return np.array(xyxy_conf_idx_list, dtype=np.float32) + + def aug_predict(self, image: CV_IMAGE, bboxes: BBOX) -> Tuple[Dict[str, BBOX], Dict[str, NDArray]]: + """ + for different augmentation methods: flip, cutout, rotate and resize + augment the image and bbox and use model to predict them. + + return the predict result and augment bbox. 
+        """
+        aug_dict: Dict[str, Callable] = dict(flip=horizontal_flip, cutout=cutout, rotate=rotate, resize=resize)
+
+        aug_bboxes = dict()
+        aug_results = dict()
+        for key in aug_dict:
+            aug_img, aug_bbox = aug_dict[key](image, bboxes)
+
+            aug_result = self.predict(aug_img)
+            aug_bboxes[key] = aug_bbox
+            aug_results[key] = aug_result
+
+        return aug_bboxes, aug_results
+
+
+def main():
+    if LOCAL_RANK != -1:
+        init_dist(launcher='pytorch', backend="nccl" if dist.is_nccl_available() else "gloo")
+
+    cfg = get_merged_config()
+    miner = CALDMiner(cfg)
+    gpu = max(0, LOCAL_RANK)
+    device = torch.device('cuda', gpu)
+    miner.model.to(device)
+    mining_result = miner.mining()
+
+    if RANK in [0, -1]:
+        rw.write_mining_result(mining_result=mining_result)
+        write_ymir_monitor_process(cfg, task='mining', naive_stage_percent=1, stage=YmirStage.POSTPROCESS)
+
+    return 0
+
+
+if __name__ == "__main__":
+    sys.exit(main())
diff --git a/det-mmdetection-tmi/ymir_mining_entropy.py b/det-mmdetection-tmi/ymir_mining_entropy.py
new file mode 100644
index 0000000..dc18ee3
--- /dev/null
+++ b/det-mmdetection-tmi/ymir_mining_entropy.py
@@ -0,0 +1,87 @@
+"""
+entropy mining
+"""
+import os
+import sys
+
+import cv2
+import numpy as np
+import torch
+import torch.distributed as dist
+from mmcv.runner import init_dist
+from mmdet.apis.test import collect_results_gpu
+from tqdm import tqdm
+from ymir_exc import result_writer as rw
+from ymir_exc.util import YmirStage, get_merged_config, write_ymir_monitor_process
+from ymir_mining_cald import split_result, CALDMiner
+
+LOCAL_RANK = int(os.getenv('LOCAL_RANK', -1))  # https://pytorch.org/docs/stable/elastic/run.html
+RANK = int(os.getenv('RANK', -1))
+WORLD_SIZE = int(os.getenv('WORLD_SIZE', 1))
+
+
+class EntropyMiner(CALDMiner):
+
+    def mining(self):
+        with open(self.cfg.ymir.input.candidate_index_file, 'r') as f:
+            images = [line.strip() for line in f.readlines()]
+
+        max_barrier_times = len(images) // WORLD_SIZE
+        if RANK == -1:
+            N = len(images)
+            tbar = tqdm(images)
+        else:
+            images_rank = images[RANK::WORLD_SIZE]
+            N = len(images_rank)
+            if RANK == 0:
+                tbar = tqdm(images_rank)
+            else:
+                tbar = images_rank
+
+        monitor_gap = max(1, N // 100)
+        mining_result = []
+        for idx, asset_path in enumerate(tbar):
+            if idx % monitor_gap == 0 and RANK in [0, -1]:
+                write_ymir_monitor_process(self.cfg, task='mining', naive_stage_percent=idx / N, stage=YmirStage.TASK)
+            # batch-level sync, avoid 30min time-out error
+            if WORLD_SIZE > 1 and idx < max_barrier_times:
+                dist.barrier()
+
+            img = cv2.imread(asset_path)
+            # xyxy,conf,cls
+            result = self.predict(img)
+            bboxes, conf, _ = split_result(result)
+            if len(result) == 0:
+                # no detections for the image
+                mining_result.append((asset_path, -10))
+                continue
+            # split_result() already returns a numpy array of confidences
+            mining_result.append((asset_path, -np.sum(conf * np.log2(conf))))
+
+        if WORLD_SIZE > 1:
+            mining_result = collect_results_gpu(mining_result, len(images))
+
+        return mining_result
+
+
+def main():
+    if LOCAL_RANK != -1:
+        init_dist(launcher='pytorch', backend="nccl" if dist.is_nccl_available() else "gloo")
+
+    cfg = get_merged_config()
+    miner = EntropyMiner(cfg)
+    gpu = max(0, LOCAL_RANK)
+    device = torch.device('cuda', gpu)
+    miner.model.to(device)
+    mining_result = miner.mining()
+
+    if RANK in [0, -1]:
+        rw.write_mining_result(mining_result=mining_result)
+
+        write_ymir_monitor_process(cfg, task='mining', naive_stage_percent=1, stage=YmirStage.POSTPROCESS)
+
+    return 0
+
+
+if __name__ == "__main__":
+    sys.exit(main())
diff
--git a/det-mmdetection-tmi/ymir_mining_random.py b/det-mmdetection-tmi/ymir_mining_random.py new file mode 100644 index 0000000..0bb5afb --- /dev/null +++ b/det-mmdetection-tmi/ymir_mining_random.py @@ -0,0 +1,87 @@ +import os +import random +import sys + +import torch +import torch.distributed as dist +from easydict import EasyDict as edict +from mmcv.runner import init_dist +from mmdet.apis.test import collect_results_gpu +from tqdm import tqdm +from ymir_exc import result_writer as rw +from ymir_exc.util import YmirStage, get_merged_config, write_ymir_monitor_process + +LOCAL_RANK = int(os.getenv('LOCAL_RANK', -1)) # https://pytorch.org/docs/stable/elastic/run.html +RANK = int(os.getenv('RANK', -1)) +WORLD_SIZE = int(os.getenv('WORLD_SIZE', 1)) + + +class RandomMiner(object): + + def __init__(self, cfg: edict): + if LOCAL_RANK != -1: + init_dist(launcher='pytorch', backend="nccl" if dist.is_nccl_available() else "gloo") + + self.cfg = cfg + gpu = max(0, LOCAL_RANK) + self.device = f'cuda:{gpu}' + + def mining(self): + with open(self.cfg.ymir.input.candidate_index_file, 'r') as f: + images = [line.strip() for line in f.readlines()] + + max_barrier_times = len(images) // WORLD_SIZE + if RANK == -1: + N = len(images) + tbar = tqdm(images) + else: + images_rank = images[RANK::WORLD_SIZE] + N = len(images_rank) + if RANK == 0: + tbar = tqdm(images_rank) + else: + tbar = images_rank + + monitor_gap = max(1, N // 100) + + mining_result = [] + for idx, asset_path in enumerate(tbar): + if idx % monitor_gap == 0: + write_ymir_monitor_process(cfg=self.cfg, + task='mining', + naive_stage_percent=idx / N, + stage=YmirStage.TASK, + task_order='tmi') + + if WORLD_SIZE > 1 and idx < max_barrier_times: + dist.barrier() + + with torch.no_grad(): + consistency = self.compute_score(asset_path=asset_path) + mining_result.append((asset_path, consistency)) + + if WORLD_SIZE > 1: + mining_result = collect_results_gpu(mining_result, len(images)) + + if RANK in [0, -1]: + rw.write_mining_result(mining_result=mining_result) + write_ymir_monitor_process(cfg=self.cfg, + task='mining', + naive_stage_percent=1, + stage=YmirStage.POSTPROCESS, + task_order='tmi') + return mining_result + + def compute_score(self, asset_path: str) -> float: + return random.random() + + +def main(): + cfg = get_merged_config() + miner = RandomMiner(cfg) + miner.mining() + return 0 + + +if __name__ == "__main__": + sys.exit(main()) diff --git a/det-mmdetection-tmi/ymir_train.py b/det-mmdetection-tmi/ymir_train.py new file mode 100644 index 0000000..b06d882 --- /dev/null +++ b/det-mmdetection-tmi/ymir_train.py @@ -0,0 +1,84 @@ +import logging +import os +import os.path as osp +import subprocess +import sys + +from easydict import EasyDict as edict +from ymir_exc.util import (YmirStage, find_free_port, get_merged_config, + write_ymir_monitor_process) + +from mmdet.utils.util_ymir import (get_best_weight_file, + write_ymir_training_result) + + +def main(cfg: edict) -> int: + # default ymir config + gpu_id: str = str(cfg.param.get("gpu_id", '0')) + num_gpus = len(gpu_id.split(",")) + if num_gpus == 0: + raise Exception(f'gpu_id = {gpu_id} is not valid, eg: 0 or 2,4') + + classes = cfg.param.class_names + num_classes = len(classes) + if num_classes == 0: + raise Exception('not find class_names in config!') + + # mmcv args config + config_file = cfg.param.get("config_file") + args_options = cfg.param.get("args_options", None) + cfg_options = cfg.param.get("cfg_options", None) + + # auto load offered weight file if not set by user! 
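+    # get_best_weight_file() prefers best_*.pth, then the newest
+    # epoch_*/iter_*.pth, then a coco-pretrained yolox weight under
+    # /weights; the lookup is skipped when the user already passed
+    # --resume-from/--load-from or a load_from/resume_from cfg option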
diff --git a/det-mmdetection-tmi/ymir_train.py b/det-mmdetection-tmi/ymir_train.py
new file mode 100644
index 0000000..b06d882
--- /dev/null
+++ b/det-mmdetection-tmi/ymir_train.py
@@ -0,0 +1,84 @@
+import logging
+import os
+import os.path as osp
+import subprocess
+import sys
+
+from easydict import EasyDict as edict
+from ymir_exc.util import (YmirStage, find_free_port, get_merged_config,
+                           write_ymir_monitor_process)
+
+from mmdet.utils.util_ymir import (get_best_weight_file,
+                                   write_ymir_training_result)
+
+
+def main(cfg: edict) -> int:
+    # default ymir config
+    gpu_id: str = str(cfg.param.get("gpu_id", '0'))
+    # an empty gpu_id means CPU training: '' -> 0 gpus, '0' -> 1, '2,4' -> 2;
+    # note ''.split(',') == [''], so the empty string must be guarded explicitly
+    num_gpus = len(gpu_id.split(",")) if gpu_id else 0
+
+    classes = cfg.param.class_names
+    num_classes = len(classes)
+    if num_classes == 0:
+        raise Exception('cannot find class_names in config!')
+
+    # mmcv args config
+    config_file = cfg.param.get("config_file")
+    args_options = cfg.param.get("args_options", None)
+    cfg_options = cfg.param.get("cfg_options", None)
+
+    # auto-load the offered weight file if not set by the user
+    if (args_options is None or args_options.find('--resume-from') == -1) and \
+            (cfg_options is None or (cfg_options.find('load_from') == -1 and
+                                     cfg_options.find('resume_from') == -1)):
+
+        weight_file = get_best_weight_file(cfg)
+        if weight_file:
+            if cfg_options:
+                cfg_options += f' load_from={weight_file}'
+            else:
+                cfg_options = f'load_from={weight_file}'
+        else:
+            logging.warning('no weight file used for training!')
+
+    write_ymir_monitor_process(cfg, task='training', naive_stage_percent=0.2, stage=YmirStage.POSTPROCESS)
+
+    work_dir = cfg.ymir.output.models_dir
+    if num_gpus == 0:
+        # view https://mmdetection.readthedocs.io/en/stable/1_exist_data_model.html#training-on-cpu
+        os.environ.setdefault('CUDA_VISIBLE_DEVICES', "-1")
+        cmd = f"python3 tools/train.py {config_file} " + \
+            f"--work-dir {work_dir}"
+    elif num_gpus == 1:
+        cmd = f"python3 tools/train.py {config_file} " + \
+            f"--work-dir {work_dir} --gpu-id {gpu_id}"
+    else:
+        os.environ.setdefault('CUDA_VISIBLE_DEVICES', gpu_id)
+        port = find_free_port()
+        os.environ.setdefault('PORT', str(port))
+        cmd = f"bash ./tools/dist_train.sh {config_file} {num_gpus} " + \
+            f"--work-dir {work_dir}"
+
+    if args_options:
+        cmd += f" {args_options}"
+
+    if cfg_options:
+        cmd += f" --cfg-options {cfg_options}"
+
+    logging.info(f"training command: {cmd}")
+    subprocess.run(cmd.split(), check=True)
+
+    # save the last checkpoint
+    write_ymir_training_result(last=True)
+    return 0
+
+
+if __name__ == '__main__':
+    cfg = get_merged_config()
+    os.environ.setdefault('YMIR_MODELS_DIR', cfg.ymir.output.models_dir)
+    os.environ.setdefault('COCO_EVAL_TMP_FILE', osp.join(
+        cfg.ymir.output.root_dir, 'eval_tmp.json'))
+    os.environ.setdefault('PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION', 'python')
+    sys.exit(main(cfg))
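ymir_train.py shells out to the stock mmdetection launchers rather than importing a train API, picking one of three commands from the parsed gpu_id: a plain tools/train.py run for CPU or a single GPU, or tools/dist_train.sh for multi-GPU. A condensed sketch of that selection (build_cmd and its default paths are hypothetical, for illustration only):

```python
# condensed sketch of the gpu_id -> command mapping used above
def build_cmd(gpu_id: str, config_file: str = 'cfg.py', work_dir: str = '/out/models') -> str:
    num_gpus = len(gpu_id.split(',')) if gpu_id else 0
    if num_gpus == 0:   # CPU path: the script also sets CUDA_VISIBLE_DEVICES=-1
        return f'python3 tools/train.py {config_file} --work-dir {work_dir}'
    if num_gpus == 1:
        return f'python3 tools/train.py {config_file} --work-dir {work_dir} --gpu-id {gpu_id}'
    return f'bash ./tools/dist_train.sh {config_file} {num_gpus} --work-dir {work_dir}'

assert build_cmd('') == 'python3 tools/train.py cfg.py --work-dir /out/models'
assert build_cmd('0,1').startswith('bash ./tools/dist_train.sh cfg.py 2')
```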
diff --git a/det-yolov4-mining/Dockerfile b/det-yolov4-mining/Dockerfile
deleted file mode 100644
index 4305760..0000000
--- a/det-yolov4-mining/Dockerfile
+++ /dev/null
@@ -1,20 +0,0 @@
-FROM industryessentials/mxnet_python:1.5.0_gpu_cu101mkl_py3_ub18
-
-RUN sed -i '/developer\.download\.nvidia\.com\/compute\/cuda\/repos/d' /etc/apt/sources.list.d/* \
-    && sed -i '/developer\.download\.nvidia\.com\/compute\/machine-learning\/repos/d' /etc/apt/sources.list.d/* \
-    && apt-key del 7fa2af80 \
-    && wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-keyring_1.0-1_all.deb \
-    && dpkg -i cuda-keyring_1.0-1_all.deb
-RUN apt-get update && apt-get install -y --no-install-recommends libsm6 libxext6 libfontconfig1 libxrender1 libgl1-mesa-glx \
-    && apt-get clean && rm -rf /var/lib/apt/lists/*
-
-RUN pip3 install --upgrade pip setuptools wheel && pip3 install opencv-python pyyaml scipy tqdm && rm -rf /root/.cache/pip3
-
-COPY . /app
-WORKDIR /app
-RUN cp ./start.sh /usr/bin/start.sh && \
-    mkdir -p /img-man && \
-    cp ./mining-template.yaml /img-man/mining-template.yaml && \
-    cp ./infer-template.yaml /img-man/infer-template.yaml && \
-    cp ./README.md /img-man/readme.md
-CMD sh /usr/bin/start.sh
diff --git a/det-yolov4-mining/cuda112.dockerfile b/det-yolov4-mining/cuda112.dockerfile
deleted file mode 100644
index 871b00f..0000000
--- a/det-yolov4-mining/cuda112.dockerfile
+++ /dev/null
@@ -1,15 +0,0 @@
-FROM industryessentials/ymir-executor:cuda112-yolov4-training
-
-RUN apt-get update && apt-get install -y --no-install-recommends libsm6 libxext6 libfontconfig1 libxrender1 libgl1-mesa-glx \
-    && apt-get clean && rm -rf /var/lib/apt/lists/*
-
-RUN pip3 install --upgrade pip setuptools wheel && pip3 install opencv-python pyyaml scipy tqdm && rm -rf /root/.cache/pip3
-
-COPY . /app
-WORKDIR /app
-RUN cp ./start.sh /usr/bin/start.sh && \
-    mkdir -p /img-man && \
-    cp ./mining-template.yaml /img-man/mining-template.yaml && \
-    cp ./infer-template.yaml /img-man/infer-template.yaml && \
-    cp ./README.md /img-man/readme.md
-CMD sh /usr/bin/start.sh
diff --git a/det-yolov4-training/.circleci/config.yml b/det-yolov4-tmi/.circleci/config.yml
similarity index 100%
rename from det-yolov4-training/.circleci/config.yml
rename to det-yolov4-tmi/.circleci/config.yml
diff --git a/det-yolov4-training/.travis.yml b/det-yolov4-tmi/.travis.yml
similarity index 100%
rename from det-yolov4-training/.travis.yml
rename to det-yolov4-tmi/.travis.yml
diff --git a/det-yolov4-training/3rdparty/pthreads/bin/pthreadGC2.dll b/det-yolov4-tmi/3rdparty/pthreads/bin/pthreadGC2.dll
similarity index 100%
rename from det-yolov4-training/3rdparty/pthreads/bin/pthreadGC2.dll
rename to det-yolov4-tmi/3rdparty/pthreads/bin/pthreadGC2.dll
diff --git a/det-yolov4-training/3rdparty/pthreads/bin/pthreadVC2.dll b/det-yolov4-tmi/3rdparty/pthreads/bin/pthreadVC2.dll
similarity index 100%
rename from det-yolov4-training/3rdparty/pthreads/bin/pthreadVC2.dll
rename to det-yolov4-tmi/3rdparty/pthreads/bin/pthreadVC2.dll
diff --git a/det-yolov4-training/3rdparty/pthreads/include/pthread.h b/det-yolov4-tmi/3rdparty/pthreads/include/pthread.h
similarity index 100%
rename from det-yolov4-training/3rdparty/pthreads/include/pthread.h
rename to det-yolov4-tmi/3rdparty/pthreads/include/pthread.h
diff --git a/det-yolov4-training/3rdparty/pthreads/include/sched.h b/det-yolov4-tmi/3rdparty/pthreads/include/sched.h
similarity index 100%
rename from det-yolov4-training/3rdparty/pthreads/include/sched.h
rename to det-yolov4-tmi/3rdparty/pthreads/include/sched.h
diff --git a/det-yolov4-training/3rdparty/pthreads/include/semaphore.h b/det-yolov4-tmi/3rdparty/pthreads/include/semaphore.h
similarity index 100%
rename from det-yolov4-training/3rdparty/pthreads/include/semaphore.h
rename to det-yolov4-tmi/3rdparty/pthreads/include/semaphore.h
diff --git a/det-yolov4-training/3rdparty/pthreads/lib/libpthreadGC2.a b/det-yolov4-tmi/3rdparty/pthreads/lib/libpthreadGC2.a
similarity index 100%
rename from det-yolov4-training/3rdparty/pthreads/lib/libpthreadGC2.a
rename to det-yolov4-tmi/3rdparty/pthreads/lib/libpthreadGC2.a
diff --git a/det-yolov4-training/3rdparty/pthreads/lib/pthreadVC2.lib b/det-yolov4-tmi/3rdparty/pthreads/lib/pthreadVC2.lib
similarity index 100%
rename from det-yolov4-training/3rdparty/pthreads/lib/pthreadVC2.lib
rename to det-yolov4-tmi/3rdparty/pthreads/lib/pthreadVC2.lib
diff --git a/det-yolov4-training/3rdparty/stb/include/stb_image.h 
b/det-yolov4-tmi/3rdparty/stb/include/stb_image.h similarity index 100% rename from det-yolov4-training/3rdparty/stb/include/stb_image.h rename to det-yolov4-tmi/3rdparty/stb/include/stb_image.h diff --git a/det-yolov4-training/3rdparty/stb/include/stb_image_write.h b/det-yolov4-tmi/3rdparty/stb/include/stb_image_write.h similarity index 100% rename from det-yolov4-training/3rdparty/stb/include/stb_image_write.h rename to det-yolov4-tmi/3rdparty/stb/include/stb_image_write.h diff --git a/det-yolov4-training/CMakeLists.txt b/det-yolov4-tmi/CMakeLists.txt similarity index 100% rename from det-yolov4-training/CMakeLists.txt rename to det-yolov4-tmi/CMakeLists.txt diff --git a/det-yolov4-training/DarknetConfig.cmake.in b/det-yolov4-tmi/DarknetConfig.cmake.in similarity index 100% rename from det-yolov4-training/DarknetConfig.cmake.in rename to det-yolov4-tmi/DarknetConfig.cmake.in diff --git a/det-yolov4-training/LICENSE b/det-yolov4-tmi/LICENSE similarity index 100% rename from det-yolov4-training/LICENSE rename to det-yolov4-tmi/LICENSE diff --git a/det-yolov4-training/Makefile b/det-yolov4-tmi/Makefile similarity index 100% rename from det-yolov4-training/Makefile rename to det-yolov4-tmi/Makefile diff --git a/det-yolov4-training/README.md b/det-yolov4-tmi/README.md similarity index 100% rename from det-yolov4-training/README.md rename to det-yolov4-tmi/README.md diff --git a/det-yolov4-training/build.ps1 b/det-yolov4-tmi/build.ps1 similarity index 100% rename from det-yolov4-training/build.ps1 rename to det-yolov4-tmi/build.ps1 diff --git a/det-yolov4-training/calc_map.sh b/det-yolov4-tmi/calc_map.sh similarity index 100% rename from det-yolov4-training/calc_map.sh rename to det-yolov4-tmi/calc_map.sh diff --git a/det-yolov4-training/cfg/9k.labels b/det-yolov4-tmi/cfg/9k.labels similarity index 100% rename from det-yolov4-training/cfg/9k.labels rename to det-yolov4-tmi/cfg/9k.labels diff --git a/det-yolov4-training/cfg/9k.names b/det-yolov4-tmi/cfg/9k.names similarity index 100% rename from det-yolov4-training/cfg/9k.names rename to det-yolov4-tmi/cfg/9k.names diff --git a/det-yolov4-training/cfg/9k.tree b/det-yolov4-tmi/cfg/9k.tree similarity index 100% rename from det-yolov4-training/cfg/9k.tree rename to det-yolov4-tmi/cfg/9k.tree diff --git a/det-yolov4-training/cfg/Gaussian_yolov3_BDD.cfg b/det-yolov4-tmi/cfg/Gaussian_yolov3_BDD.cfg similarity index 100% rename from det-yolov4-training/cfg/Gaussian_yolov3_BDD.cfg rename to det-yolov4-tmi/cfg/Gaussian_yolov3_BDD.cfg diff --git a/det-yolov4-training/cfg/alexnet.cfg b/det-yolov4-tmi/cfg/alexnet.cfg similarity index 100% rename from det-yolov4-training/cfg/alexnet.cfg rename to det-yolov4-tmi/cfg/alexnet.cfg diff --git a/det-yolov4-training/cfg/cd53paspp-gamma.cfg b/det-yolov4-tmi/cfg/cd53paspp-gamma.cfg similarity index 100% rename from det-yolov4-training/cfg/cd53paspp-gamma.cfg rename to det-yolov4-tmi/cfg/cd53paspp-gamma.cfg diff --git a/det-yolov4-training/cfg/cifar.cfg b/det-yolov4-tmi/cfg/cifar.cfg similarity index 100% rename from det-yolov4-training/cfg/cifar.cfg rename to det-yolov4-tmi/cfg/cifar.cfg diff --git a/det-yolov4-training/cfg/cifar.test.cfg b/det-yolov4-tmi/cfg/cifar.test.cfg similarity index 100% rename from det-yolov4-training/cfg/cifar.test.cfg rename to det-yolov4-tmi/cfg/cifar.test.cfg diff --git a/det-yolov4-training/cfg/coco.data b/det-yolov4-tmi/cfg/coco.data similarity index 100% rename from det-yolov4-training/cfg/coco.data rename to det-yolov4-tmi/cfg/coco.data diff --git 
a/det-yolov4-training/cfg/coco.names b/det-yolov4-tmi/cfg/coco.names similarity index 100% rename from det-yolov4-training/cfg/coco.names rename to det-yolov4-tmi/cfg/coco.names diff --git a/det-yolov4-training/cfg/coco9k.map b/det-yolov4-tmi/cfg/coco9k.map similarity index 100% rename from det-yolov4-training/cfg/coco9k.map rename to det-yolov4-tmi/cfg/coco9k.map diff --git a/det-yolov4-training/cfg/combine9k.data b/det-yolov4-tmi/cfg/combine9k.data similarity index 100% rename from det-yolov4-training/cfg/combine9k.data rename to det-yolov4-tmi/cfg/combine9k.data diff --git a/det-yolov4-training/cfg/crnn.train.cfg b/det-yolov4-tmi/cfg/crnn.train.cfg similarity index 100% rename from det-yolov4-training/cfg/crnn.train.cfg rename to det-yolov4-tmi/cfg/crnn.train.cfg diff --git a/det-yolov4-training/cfg/csdarknet53-omega.cfg b/det-yolov4-tmi/cfg/csdarknet53-omega.cfg similarity index 100% rename from det-yolov4-training/cfg/csdarknet53-omega.cfg rename to det-yolov4-tmi/cfg/csdarknet53-omega.cfg diff --git a/det-yolov4-training/cfg/cspx-p7-mish-omega.cfg b/det-yolov4-tmi/cfg/cspx-p7-mish-omega.cfg similarity index 100% rename from det-yolov4-training/cfg/cspx-p7-mish-omega.cfg rename to det-yolov4-tmi/cfg/cspx-p7-mish-omega.cfg diff --git a/det-yolov4-training/cfg/cspx-p7-mish.cfg b/det-yolov4-tmi/cfg/cspx-p7-mish.cfg similarity index 100% rename from det-yolov4-training/cfg/cspx-p7-mish.cfg rename to det-yolov4-tmi/cfg/cspx-p7-mish.cfg diff --git a/det-yolov4-training/cfg/cspx-p7-mish_hp.cfg b/det-yolov4-tmi/cfg/cspx-p7-mish_hp.cfg similarity index 100% rename from det-yolov4-training/cfg/cspx-p7-mish_hp.cfg rename to det-yolov4-tmi/cfg/cspx-p7-mish_hp.cfg diff --git a/det-yolov4-training/cfg/csresnext50-panet-spp-original-optimal.cfg b/det-yolov4-tmi/cfg/csresnext50-panet-spp-original-optimal.cfg similarity index 100% rename from det-yolov4-training/cfg/csresnext50-panet-spp-original-optimal.cfg rename to det-yolov4-tmi/cfg/csresnext50-panet-spp-original-optimal.cfg diff --git a/det-yolov4-training/cfg/csresnext50-panet-spp.cfg b/det-yolov4-tmi/cfg/csresnext50-panet-spp.cfg similarity index 100% rename from det-yolov4-training/cfg/csresnext50-panet-spp.cfg rename to det-yolov4-tmi/cfg/csresnext50-panet-spp.cfg diff --git a/det-yolov4-training/cfg/darknet.cfg b/det-yolov4-tmi/cfg/darknet.cfg similarity index 100% rename from det-yolov4-training/cfg/darknet.cfg rename to det-yolov4-tmi/cfg/darknet.cfg diff --git a/det-yolov4-training/cfg/darknet19.cfg b/det-yolov4-tmi/cfg/darknet19.cfg similarity index 100% rename from det-yolov4-training/cfg/darknet19.cfg rename to det-yolov4-tmi/cfg/darknet19.cfg diff --git a/det-yolov4-training/cfg/darknet19_448.cfg b/det-yolov4-tmi/cfg/darknet19_448.cfg similarity index 100% rename from det-yolov4-training/cfg/darknet19_448.cfg rename to det-yolov4-tmi/cfg/darknet19_448.cfg diff --git a/det-yolov4-training/cfg/darknet53.cfg b/det-yolov4-tmi/cfg/darknet53.cfg similarity index 100% rename from det-yolov4-training/cfg/darknet53.cfg rename to det-yolov4-tmi/cfg/darknet53.cfg diff --git a/det-yolov4-training/cfg/darknet53_448_xnor.cfg b/det-yolov4-tmi/cfg/darknet53_448_xnor.cfg similarity index 100% rename from det-yolov4-training/cfg/darknet53_448_xnor.cfg rename to det-yolov4-tmi/cfg/darknet53_448_xnor.cfg diff --git a/det-yolov4-training/cfg/densenet201.cfg b/det-yolov4-tmi/cfg/densenet201.cfg similarity index 100% rename from det-yolov4-training/cfg/densenet201.cfg rename to det-yolov4-tmi/cfg/densenet201.cfg diff --git 
a/det-yolov4-training/cfg/efficientnet-lite3.cfg b/det-yolov4-tmi/cfg/efficientnet-lite3.cfg similarity index 100% rename from det-yolov4-training/cfg/efficientnet-lite3.cfg rename to det-yolov4-tmi/cfg/efficientnet-lite3.cfg diff --git a/det-yolov4-training/cfg/efficientnet_b0.cfg b/det-yolov4-tmi/cfg/efficientnet_b0.cfg similarity index 100% rename from det-yolov4-training/cfg/efficientnet_b0.cfg rename to det-yolov4-tmi/cfg/efficientnet_b0.cfg diff --git a/det-yolov4-training/cfg/enet-coco.cfg b/det-yolov4-tmi/cfg/enet-coco.cfg similarity index 100% rename from det-yolov4-training/cfg/enet-coco.cfg rename to det-yolov4-tmi/cfg/enet-coco.cfg diff --git a/det-yolov4-training/cfg/extraction.cfg b/det-yolov4-tmi/cfg/extraction.cfg similarity index 100% rename from det-yolov4-training/cfg/extraction.cfg rename to det-yolov4-tmi/cfg/extraction.cfg diff --git a/det-yolov4-training/cfg/extraction.conv.cfg b/det-yolov4-tmi/cfg/extraction.conv.cfg similarity index 100% rename from det-yolov4-training/cfg/extraction.conv.cfg rename to det-yolov4-tmi/cfg/extraction.conv.cfg diff --git a/det-yolov4-training/cfg/extraction22k.cfg b/det-yolov4-tmi/cfg/extraction22k.cfg similarity index 100% rename from det-yolov4-training/cfg/extraction22k.cfg rename to det-yolov4-tmi/cfg/extraction22k.cfg diff --git a/det-yolov4-training/cfg/go.test.cfg b/det-yolov4-tmi/cfg/go.test.cfg similarity index 100% rename from det-yolov4-training/cfg/go.test.cfg rename to det-yolov4-tmi/cfg/go.test.cfg diff --git a/det-yolov4-training/cfg/gru.cfg b/det-yolov4-tmi/cfg/gru.cfg similarity index 100% rename from det-yolov4-training/cfg/gru.cfg rename to det-yolov4-tmi/cfg/gru.cfg diff --git a/det-yolov4-training/cfg/imagenet.labels.list b/det-yolov4-tmi/cfg/imagenet.labels.list similarity index 100% rename from det-yolov4-training/cfg/imagenet.labels.list rename to det-yolov4-tmi/cfg/imagenet.labels.list diff --git a/det-yolov4-training/cfg/imagenet.shortnames.list b/det-yolov4-tmi/cfg/imagenet.shortnames.list similarity index 100% rename from det-yolov4-training/cfg/imagenet.shortnames.list rename to det-yolov4-tmi/cfg/imagenet.shortnames.list diff --git a/det-yolov4-training/cfg/imagenet1k.data b/det-yolov4-tmi/cfg/imagenet1k.data similarity index 100% rename from det-yolov4-training/cfg/imagenet1k.data rename to det-yolov4-tmi/cfg/imagenet1k.data diff --git a/det-yolov4-training/cfg/imagenet22k.dataset b/det-yolov4-tmi/cfg/imagenet22k.dataset similarity index 100% rename from det-yolov4-training/cfg/imagenet22k.dataset rename to det-yolov4-tmi/cfg/imagenet22k.dataset diff --git a/det-yolov4-training/cfg/imagenet9k.hierarchy.dataset b/det-yolov4-tmi/cfg/imagenet9k.hierarchy.dataset similarity index 100% rename from det-yolov4-training/cfg/imagenet9k.hierarchy.dataset rename to det-yolov4-tmi/cfg/imagenet9k.hierarchy.dataset diff --git a/det-yolov4-training/cfg/inet9k.map b/det-yolov4-tmi/cfg/inet9k.map similarity index 100% rename from det-yolov4-training/cfg/inet9k.map rename to det-yolov4-tmi/cfg/inet9k.map diff --git a/det-yolov4-training/cfg/jnet-conv.cfg b/det-yolov4-tmi/cfg/jnet-conv.cfg similarity index 100% rename from det-yolov4-training/cfg/jnet-conv.cfg rename to det-yolov4-tmi/cfg/jnet-conv.cfg diff --git a/det-yolov4-training/cfg/lstm.train.cfg b/det-yolov4-tmi/cfg/lstm.train.cfg similarity index 100% rename from det-yolov4-training/cfg/lstm.train.cfg rename to det-yolov4-tmi/cfg/lstm.train.cfg diff --git a/det-yolov4-training/cfg/openimages.data b/det-yolov4-tmi/cfg/openimages.data similarity index 100% rename 
from det-yolov4-training/cfg/openimages.data rename to det-yolov4-tmi/cfg/openimages.data diff --git a/det-yolov4-training/cfg/resnet101.cfg b/det-yolov4-tmi/cfg/resnet101.cfg similarity index 100% rename from det-yolov4-training/cfg/resnet101.cfg rename to det-yolov4-tmi/cfg/resnet101.cfg diff --git a/det-yolov4-training/cfg/resnet152.cfg b/det-yolov4-tmi/cfg/resnet152.cfg similarity index 100% rename from det-yolov4-training/cfg/resnet152.cfg rename to det-yolov4-tmi/cfg/resnet152.cfg diff --git a/det-yolov4-training/cfg/resnet152_trident.cfg b/det-yolov4-tmi/cfg/resnet152_trident.cfg similarity index 100% rename from det-yolov4-training/cfg/resnet152_trident.cfg rename to det-yolov4-tmi/cfg/resnet152_trident.cfg diff --git a/det-yolov4-training/cfg/resnet50.cfg b/det-yolov4-tmi/cfg/resnet50.cfg similarity index 100% rename from det-yolov4-training/cfg/resnet50.cfg rename to det-yolov4-tmi/cfg/resnet50.cfg diff --git a/det-yolov4-training/cfg/resnext152-32x4d.cfg b/det-yolov4-tmi/cfg/resnext152-32x4d.cfg similarity index 100% rename from det-yolov4-training/cfg/resnext152-32x4d.cfg rename to det-yolov4-tmi/cfg/resnext152-32x4d.cfg diff --git a/det-yolov4-training/cfg/rnn.cfg b/det-yolov4-tmi/cfg/rnn.cfg similarity index 100% rename from det-yolov4-training/cfg/rnn.cfg rename to det-yolov4-tmi/cfg/rnn.cfg diff --git a/det-yolov4-training/cfg/rnn.train.cfg b/det-yolov4-tmi/cfg/rnn.train.cfg similarity index 100% rename from det-yolov4-training/cfg/rnn.train.cfg rename to det-yolov4-tmi/cfg/rnn.train.cfg diff --git a/det-yolov4-training/cfg/strided.cfg b/det-yolov4-tmi/cfg/strided.cfg similarity index 100% rename from det-yolov4-training/cfg/strided.cfg rename to det-yolov4-tmi/cfg/strided.cfg diff --git a/det-yolov4-training/cfg/t1.test.cfg b/det-yolov4-tmi/cfg/t1.test.cfg similarity index 100% rename from det-yolov4-training/cfg/t1.test.cfg rename to det-yolov4-tmi/cfg/t1.test.cfg diff --git a/det-yolov4-training/cfg/tiny-yolo-voc.cfg b/det-yolov4-tmi/cfg/tiny-yolo-voc.cfg similarity index 100% rename from det-yolov4-training/cfg/tiny-yolo-voc.cfg rename to det-yolov4-tmi/cfg/tiny-yolo-voc.cfg diff --git a/det-yolov4-training/cfg/tiny-yolo.cfg b/det-yolov4-tmi/cfg/tiny-yolo.cfg similarity index 100% rename from det-yolov4-training/cfg/tiny-yolo.cfg rename to det-yolov4-tmi/cfg/tiny-yolo.cfg diff --git a/det-yolov4-training/cfg/tiny-yolo_xnor.cfg b/det-yolov4-tmi/cfg/tiny-yolo_xnor.cfg similarity index 100% rename from det-yolov4-training/cfg/tiny-yolo_xnor.cfg rename to det-yolov4-tmi/cfg/tiny-yolo_xnor.cfg diff --git a/det-yolov4-training/cfg/tiny.cfg b/det-yolov4-tmi/cfg/tiny.cfg similarity index 100% rename from det-yolov4-training/cfg/tiny.cfg rename to det-yolov4-tmi/cfg/tiny.cfg diff --git a/det-yolov4-training/cfg/vgg-16.cfg b/det-yolov4-tmi/cfg/vgg-16.cfg similarity index 100% rename from det-yolov4-training/cfg/vgg-16.cfg rename to det-yolov4-tmi/cfg/vgg-16.cfg diff --git a/det-yolov4-training/cfg/vgg-conv.cfg b/det-yolov4-tmi/cfg/vgg-conv.cfg similarity index 100% rename from det-yolov4-training/cfg/vgg-conv.cfg rename to det-yolov4-tmi/cfg/vgg-conv.cfg diff --git a/det-yolov4-training/cfg/voc.data b/det-yolov4-tmi/cfg/voc.data similarity index 100% rename from det-yolov4-training/cfg/voc.data rename to det-yolov4-tmi/cfg/voc.data diff --git a/det-yolov4-training/cfg/writing.cfg b/det-yolov4-tmi/cfg/writing.cfg similarity index 100% rename from det-yolov4-training/cfg/writing.cfg rename to det-yolov4-tmi/cfg/writing.cfg diff --git a/det-yolov4-training/cfg/yolo-voc.2.0.cfg 
b/det-yolov4-tmi/cfg/yolo-voc.2.0.cfg similarity index 100% rename from det-yolov4-training/cfg/yolo-voc.2.0.cfg rename to det-yolov4-tmi/cfg/yolo-voc.2.0.cfg diff --git a/det-yolov4-training/cfg/yolo-voc.cfg b/det-yolov4-tmi/cfg/yolo-voc.cfg similarity index 100% rename from det-yolov4-training/cfg/yolo-voc.cfg rename to det-yolov4-tmi/cfg/yolo-voc.cfg diff --git a/det-yolov4-training/cfg/yolo.2.0.cfg b/det-yolov4-tmi/cfg/yolo.2.0.cfg similarity index 100% rename from det-yolov4-training/cfg/yolo.2.0.cfg rename to det-yolov4-tmi/cfg/yolo.2.0.cfg diff --git a/det-yolov4-training/cfg/yolo.cfg b/det-yolov4-tmi/cfg/yolo.cfg similarity index 100% rename from det-yolov4-training/cfg/yolo.cfg rename to det-yolov4-tmi/cfg/yolo.cfg diff --git a/det-yolov4-training/cfg/yolo9000.cfg b/det-yolov4-tmi/cfg/yolo9000.cfg similarity index 100% rename from det-yolov4-training/cfg/yolo9000.cfg rename to det-yolov4-tmi/cfg/yolo9000.cfg diff --git a/det-yolov4-training/cfg/yolov1/tiny-coco.cfg b/det-yolov4-tmi/cfg/yolov1/tiny-coco.cfg similarity index 100% rename from det-yolov4-training/cfg/yolov1/tiny-coco.cfg rename to det-yolov4-tmi/cfg/yolov1/tiny-coco.cfg diff --git a/det-yolov4-training/cfg/yolov1/tiny-yolo.cfg b/det-yolov4-tmi/cfg/yolov1/tiny-yolo.cfg similarity index 100% rename from det-yolov4-training/cfg/yolov1/tiny-yolo.cfg rename to det-yolov4-tmi/cfg/yolov1/tiny-yolo.cfg diff --git a/det-yolov4-training/cfg/yolov1/xyolo.test.cfg b/det-yolov4-tmi/cfg/yolov1/xyolo.test.cfg similarity index 100% rename from det-yolov4-training/cfg/yolov1/xyolo.test.cfg rename to det-yolov4-tmi/cfg/yolov1/xyolo.test.cfg diff --git a/det-yolov4-training/cfg/yolov1/yolo-coco.cfg b/det-yolov4-tmi/cfg/yolov1/yolo-coco.cfg similarity index 100% rename from det-yolov4-training/cfg/yolov1/yolo-coco.cfg rename to det-yolov4-tmi/cfg/yolov1/yolo-coco.cfg diff --git a/det-yolov4-training/cfg/yolov1/yolo-small.cfg b/det-yolov4-tmi/cfg/yolov1/yolo-small.cfg similarity index 100% rename from det-yolov4-training/cfg/yolov1/yolo-small.cfg rename to det-yolov4-tmi/cfg/yolov1/yolo-small.cfg diff --git a/det-yolov4-training/cfg/yolov1/yolo.cfg b/det-yolov4-tmi/cfg/yolov1/yolo.cfg similarity index 100% rename from det-yolov4-training/cfg/yolov1/yolo.cfg rename to det-yolov4-tmi/cfg/yolov1/yolo.cfg diff --git a/det-yolov4-training/cfg/yolov1/yolo.train.cfg b/det-yolov4-tmi/cfg/yolov1/yolo.train.cfg similarity index 100% rename from det-yolov4-training/cfg/yolov1/yolo.train.cfg rename to det-yolov4-tmi/cfg/yolov1/yolo.train.cfg diff --git a/det-yolov4-training/cfg/yolov1/yolo2.cfg b/det-yolov4-tmi/cfg/yolov1/yolo2.cfg similarity index 100% rename from det-yolov4-training/cfg/yolov1/yolo2.cfg rename to det-yolov4-tmi/cfg/yolov1/yolo2.cfg diff --git a/det-yolov4-training/cfg/yolov2-tiny-voc.cfg b/det-yolov4-tmi/cfg/yolov2-tiny-voc.cfg similarity index 100% rename from det-yolov4-training/cfg/yolov2-tiny-voc.cfg rename to det-yolov4-tmi/cfg/yolov2-tiny-voc.cfg diff --git a/det-yolov4-training/cfg/yolov2-tiny.cfg b/det-yolov4-tmi/cfg/yolov2-tiny.cfg similarity index 100% rename from det-yolov4-training/cfg/yolov2-tiny.cfg rename to det-yolov4-tmi/cfg/yolov2-tiny.cfg diff --git a/det-yolov4-training/cfg/yolov2-voc.cfg b/det-yolov4-tmi/cfg/yolov2-voc.cfg similarity index 100% rename from det-yolov4-training/cfg/yolov2-voc.cfg rename to det-yolov4-tmi/cfg/yolov2-voc.cfg diff --git a/det-yolov4-training/cfg/yolov2.cfg b/det-yolov4-tmi/cfg/yolov2.cfg similarity index 100% rename from det-yolov4-training/cfg/yolov2.cfg rename to 
det-yolov4-tmi/cfg/yolov2.cfg diff --git a/det-yolov4-training/cfg/yolov3-openimages.cfg b/det-yolov4-tmi/cfg/yolov3-openimages.cfg similarity index 100% rename from det-yolov4-training/cfg/yolov3-openimages.cfg rename to det-yolov4-tmi/cfg/yolov3-openimages.cfg diff --git a/det-yolov4-training/cfg/yolov3-spp.cfg b/det-yolov4-tmi/cfg/yolov3-spp.cfg similarity index 100% rename from det-yolov4-training/cfg/yolov3-spp.cfg rename to det-yolov4-tmi/cfg/yolov3-spp.cfg diff --git a/det-yolov4-training/cfg/yolov3-tiny-prn.cfg b/det-yolov4-tmi/cfg/yolov3-tiny-prn.cfg similarity index 100% rename from det-yolov4-training/cfg/yolov3-tiny-prn.cfg rename to det-yolov4-tmi/cfg/yolov3-tiny-prn.cfg diff --git a/det-yolov4-training/cfg/yolov3-tiny.cfg b/det-yolov4-tmi/cfg/yolov3-tiny.cfg similarity index 100% rename from det-yolov4-training/cfg/yolov3-tiny.cfg rename to det-yolov4-tmi/cfg/yolov3-tiny.cfg diff --git a/det-yolov4-training/cfg/yolov3-tiny_3l.cfg b/det-yolov4-tmi/cfg/yolov3-tiny_3l.cfg similarity index 100% rename from det-yolov4-training/cfg/yolov3-tiny_3l.cfg rename to det-yolov4-tmi/cfg/yolov3-tiny_3l.cfg diff --git a/det-yolov4-training/cfg/yolov3-tiny_obj.cfg b/det-yolov4-tmi/cfg/yolov3-tiny_obj.cfg similarity index 100% rename from det-yolov4-training/cfg/yolov3-tiny_obj.cfg rename to det-yolov4-tmi/cfg/yolov3-tiny_obj.cfg diff --git a/det-yolov4-training/cfg/yolov3-tiny_occlusion_track.cfg b/det-yolov4-tmi/cfg/yolov3-tiny_occlusion_track.cfg similarity index 100% rename from det-yolov4-training/cfg/yolov3-tiny_occlusion_track.cfg rename to det-yolov4-tmi/cfg/yolov3-tiny_occlusion_track.cfg diff --git a/det-yolov4-training/cfg/yolov3-tiny_xnor.cfg b/det-yolov4-tmi/cfg/yolov3-tiny_xnor.cfg similarity index 100% rename from det-yolov4-training/cfg/yolov3-tiny_xnor.cfg rename to det-yolov4-tmi/cfg/yolov3-tiny_xnor.cfg diff --git a/det-yolov4-training/cfg/yolov3-voc.cfg b/det-yolov4-tmi/cfg/yolov3-voc.cfg similarity index 100% rename from det-yolov4-training/cfg/yolov3-voc.cfg rename to det-yolov4-tmi/cfg/yolov3-voc.cfg diff --git a/det-yolov4-training/cfg/yolov3-voc.yolov3-giou-40.cfg b/det-yolov4-tmi/cfg/yolov3-voc.yolov3-giou-40.cfg similarity index 100% rename from det-yolov4-training/cfg/yolov3-voc.yolov3-giou-40.cfg rename to det-yolov4-tmi/cfg/yolov3-voc.yolov3-giou-40.cfg diff --git a/det-yolov4-training/cfg/yolov3.cfg b/det-yolov4-tmi/cfg/yolov3.cfg similarity index 100% rename from det-yolov4-training/cfg/yolov3.cfg rename to det-yolov4-tmi/cfg/yolov3.cfg diff --git a/det-yolov4-training/cfg/yolov3.coco-giou-12.cfg b/det-yolov4-tmi/cfg/yolov3.coco-giou-12.cfg similarity index 100% rename from det-yolov4-training/cfg/yolov3.coco-giou-12.cfg rename to det-yolov4-tmi/cfg/yolov3.coco-giou-12.cfg diff --git a/det-yolov4-training/cfg/yolov3_5l.cfg b/det-yolov4-tmi/cfg/yolov3_5l.cfg similarity index 100% rename from det-yolov4-training/cfg/yolov3_5l.cfg rename to det-yolov4-tmi/cfg/yolov3_5l.cfg diff --git a/det-yolov4-training/cfg/yolov4-csp-swish.cfg b/det-yolov4-tmi/cfg/yolov4-csp-swish.cfg similarity index 100% rename from det-yolov4-training/cfg/yolov4-csp-swish.cfg rename to det-yolov4-tmi/cfg/yolov4-csp-swish.cfg diff --git a/det-yolov4-training/cfg/yolov4-csp-x-swish-frozen.cfg b/det-yolov4-tmi/cfg/yolov4-csp-x-swish-frozen.cfg similarity index 100% rename from det-yolov4-training/cfg/yolov4-csp-x-swish-frozen.cfg rename to det-yolov4-tmi/cfg/yolov4-csp-x-swish-frozen.cfg diff --git a/det-yolov4-training/cfg/yolov4-csp-x-swish.cfg b/det-yolov4-tmi/cfg/yolov4-csp-x-swish.cfg 
similarity index 100% rename from det-yolov4-training/cfg/yolov4-csp-x-swish.cfg rename to det-yolov4-tmi/cfg/yolov4-csp-x-swish.cfg diff --git a/det-yolov4-training/cfg/yolov4-csp.cfg b/det-yolov4-tmi/cfg/yolov4-csp.cfg similarity index 100% rename from det-yolov4-training/cfg/yolov4-csp.cfg rename to det-yolov4-tmi/cfg/yolov4-csp.cfg diff --git a/det-yolov4-training/cfg/yolov4-custom.cfg b/det-yolov4-tmi/cfg/yolov4-custom.cfg similarity index 100% rename from det-yolov4-training/cfg/yolov4-custom.cfg rename to det-yolov4-tmi/cfg/yolov4-custom.cfg diff --git a/det-yolov4-training/cfg/yolov4-p5-frozen.cfg b/det-yolov4-tmi/cfg/yolov4-p5-frozen.cfg similarity index 100% rename from det-yolov4-training/cfg/yolov4-p5-frozen.cfg rename to det-yolov4-tmi/cfg/yolov4-p5-frozen.cfg diff --git a/det-yolov4-training/cfg/yolov4-p5.cfg b/det-yolov4-tmi/cfg/yolov4-p5.cfg similarity index 100% rename from det-yolov4-training/cfg/yolov4-p5.cfg rename to det-yolov4-tmi/cfg/yolov4-p5.cfg diff --git a/det-yolov4-training/cfg/yolov4-p6.cfg b/det-yolov4-tmi/cfg/yolov4-p6.cfg similarity index 100% rename from det-yolov4-training/cfg/yolov4-p6.cfg rename to det-yolov4-tmi/cfg/yolov4-p6.cfg diff --git a/det-yolov4-training/cfg/yolov4-sam-mish-csp-reorg-bfm.cfg b/det-yolov4-tmi/cfg/yolov4-sam-mish-csp-reorg-bfm.cfg similarity index 100% rename from det-yolov4-training/cfg/yolov4-sam-mish-csp-reorg-bfm.cfg rename to det-yolov4-tmi/cfg/yolov4-sam-mish-csp-reorg-bfm.cfg diff --git a/det-yolov4-training/cfg/yolov4-tiny-3l.cfg b/det-yolov4-tmi/cfg/yolov4-tiny-3l.cfg similarity index 100% rename from det-yolov4-training/cfg/yolov4-tiny-3l.cfg rename to det-yolov4-tmi/cfg/yolov4-tiny-3l.cfg diff --git a/det-yolov4-training/cfg/yolov4-tiny-custom.cfg b/det-yolov4-tmi/cfg/yolov4-tiny-custom.cfg similarity index 100% rename from det-yolov4-training/cfg/yolov4-tiny-custom.cfg rename to det-yolov4-tmi/cfg/yolov4-tiny-custom.cfg diff --git a/det-yolov4-training/cfg/yolov4-tiny.cfg b/det-yolov4-tmi/cfg/yolov4-tiny.cfg similarity index 100% rename from det-yolov4-training/cfg/yolov4-tiny.cfg rename to det-yolov4-tmi/cfg/yolov4-tiny.cfg diff --git a/det-yolov4-training/cfg/yolov4-tiny_contrastive.cfg b/det-yolov4-tmi/cfg/yolov4-tiny_contrastive.cfg similarity index 100% rename from det-yolov4-training/cfg/yolov4-tiny_contrastive.cfg rename to det-yolov4-tmi/cfg/yolov4-tiny_contrastive.cfg diff --git a/det-yolov4-training/cfg/yolov4.cfg b/det-yolov4-tmi/cfg/yolov4.cfg similarity index 100% rename from det-yolov4-training/cfg/yolov4.cfg rename to det-yolov4-tmi/cfg/yolov4.cfg diff --git a/det-yolov4-training/cfg/yolov4_iter1000.cfg b/det-yolov4-tmi/cfg/yolov4_iter1000.cfg similarity index 100% rename from det-yolov4-training/cfg/yolov4_iter1000.cfg rename to det-yolov4-tmi/cfg/yolov4_iter1000.cfg diff --git a/det-yolov4-training/cfg/yolov4x-mish.cfg b/det-yolov4-tmi/cfg/yolov4x-mish.cfg similarity index 100% rename from det-yolov4-training/cfg/yolov4x-mish.cfg rename to det-yolov4-tmi/cfg/yolov4x-mish.cfg diff --git a/det-yolov4-training/cmake/Modules/FindCUDNN.cmake b/det-yolov4-tmi/cmake/Modules/FindCUDNN.cmake similarity index 100% rename from det-yolov4-training/cmake/Modules/FindCUDNN.cmake rename to det-yolov4-tmi/cmake/Modules/FindCUDNN.cmake diff --git a/det-yolov4-training/cmake/Modules/FindPThreads4W.cmake b/det-yolov4-tmi/cmake/Modules/FindPThreads4W.cmake similarity index 100% rename from det-yolov4-training/cmake/Modules/FindPThreads4W.cmake rename to det-yolov4-tmi/cmake/Modules/FindPThreads4W.cmake diff --git 
a/det-yolov4-training/cmake/Modules/FindStb.cmake b/det-yolov4-tmi/cmake/Modules/FindStb.cmake
similarity index 100%
rename from det-yolov4-training/cmake/Modules/FindStb.cmake
rename to det-yolov4-tmi/cmake/Modules/FindStb.cmake
diff --git a/det-yolov4-training/config_and_train.py b/det-yolov4-tmi/config_and_train.py
similarity index 100%
rename from det-yolov4-training/config_and_train.py
rename to det-yolov4-tmi/config_and_train.py
diff --git a/det-yolov4-training/convert_label_ark2txt.py b/det-yolov4-tmi/convert_label_ark2txt.py
similarity index 92%
rename from det-yolov4-training/convert_label_ark2txt.py
rename to det-yolov4-tmi/convert_label_ark2txt.py
index 1043b53..2e963f7 100755
--- a/det-yolov4-training/convert_label_ark2txt.py
+++ b/det-yolov4-tmi/convert_label_ark2txt.py
@@ -1,6 +1,6 @@
 import os
+import imagesize
-
-import cv2
 
 
 def _annotation_path_for_image(image_path: str, annotations_dir: str) -> str:
@@ -21,18 +21,16 @@ def _convert_annotations(index_file_path: str, dst_annotations_dir: str) -> None
     files = f.readlines()
     files = [each.strip() for each in files]
 
+    N = len(files)
     for i, each_img_anno_path in enumerate(files):
         if i % 1000 == 0:
-            print(f"converted {i} image annotations")
+            print(f"converted {i}/{N} image annotations")
 
         # each_imgpath: asset path
         # each_txtfile: annotation path
         each_imgpath, each_txtfile = each_img_anno_path.split()
 
-        img = cv2.imread(each_imgpath)
-        if img is None:
-            raise ValueError(f"can not read image: {each_imgpath}")
-        img_h, img_w, _ = img.shape
+        img_w, img_h = imagesize.get(each_imgpath)
 
         with open(each_txtfile, 'r') as f:
            txt_content = f.readlines()
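The convert_label_ark2txt.py hunk above swaps cv2.imread for imagesize.get: the former decodes the whole bitmap just to learn its shape, while the latter parses only the file header, which is far cheaper inside a per-image loop. One caveat: the old cv2 path doubled as a readability check (raising when imread returned None), which a header parse does not fully replace. Roughly (sample.jpg is a placeholder file):

```python
import cv2
import imagesize

img = cv2.imread('sample.jpg')                 # decodes the full bitmap into memory
img_h, img_w = img.shape[:2]                   # note cv2 shape order: (height, width)

img_w2, img_h2 = imagesize.get('sample.jpg')   # parses only the header; returns (width, height)
assert (img_w, img_h) == (img_w2, img_h2)
```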
diff --git a/det-yolov4-training/convert_model_darknet2mxnet_yolov4.py b/det-yolov4-tmi/convert_model_darknet2mxnet_yolov4.py
similarity index 100%
rename from det-yolov4-training/convert_model_darknet2mxnet_yolov4.py
rename to det-yolov4-tmi/convert_model_darknet2mxnet_yolov4.py
diff --git a/det-yolov4-training/counters_per_class.txt b/det-yolov4-tmi/counters_per_class.txt
similarity index 100%
rename from det-yolov4-training/counters_per_class.txt
rename to det-yolov4-tmi/counters_per_class.txt
diff --git a/det-yolov4-training/Dockerfile b/det-yolov4-tmi/cuda101.dockerfile
similarity index 82%
rename from det-yolov4-training/Dockerfile
rename to det-yolov4-tmi/cuda101.dockerfile
index 61ce1f6..66273c3 100644
--- a/det-yolov4-training/Dockerfile
+++ b/det-yolov4-tmi/cuda101.dockerfile
@@ -1,5 +1,8 @@
 FROM nvidia/cuda:10.1-cudnn7-devel-ubuntu18.04
 ARG PIP_SOURCE=https://pypi.mirrors.ustc.edu.cn/simple
+
+ENV PYTHONPATH=.
+
 WORKDIR /darknet
 RUN sed -i 's#http://archive.ubuntu.com#https://mirrors.ustc.edu.cn#g' /etc/apt/sources.list
 RUN apt-key adv --keyserver keyserver.ubuntu.com --recv-keys A4B469963BF863CC && apt-get update
@@ -12,11 +15,13 @@ RUN wget https://github.com/AlexeyAB/darknet/releases/download/darknet_yolo_v3_o
 RUN rm /usr/bin/python3
 RUN ln -s /usr/bin/python3.7 /usr/bin/python3
 RUN python3 get-pip.py
-RUN pip3 install -i ${PIP_SOURCE} mxnet-cu101==1.5.1 numpy opencv-python pyyaml watchdog tensorboardX six
+RUN pip3 install -i ${PIP_SOURCE} mxnet-cu101==1.5.1 numpy opencv-python pyyaml watchdog tensorboardX six scipy tqdm imagesize
+
 ENV DEBIAN_FRONTEND noninteractive
 RUN apt-get update && apt-get install -y libopencv-dev
 COPY . /darknet
-RUN cp /darknet/make_train_test_darknet.sh /usr/bin/start.sh
-RUN mkdir /img-man && cp /darknet/training-template.yaml /img-man/training-template.yaml
 RUN make -j
+
+RUN mkdir /img-man && cp /darknet/training-template.yaml /img-man/training-template.yaml && cp /darknet/mining/*-template.yaml /img-man
+RUN echo "python3 /darknet/start.py" > /usr/bin/start.sh
 CMD bash /usr/bin/start.sh
diff --git a/det-yolov4-training/cuda112.dockerfile b/det-yolov4-tmi/cuda112.dockerfile
similarity index 82%
rename from det-yolov4-training/cuda112.dockerfile
rename to det-yolov4-tmi/cuda112.dockerfile
index 3e6884b..bab5c7d 100644
--- a/det-yolov4-training/cuda112.dockerfile
+++ b/det-yolov4-tmi/cuda112.dockerfile
@@ -1,5 +1,8 @@
 FROM nvidia/cuda:11.2.1-cudnn8-devel-ubuntu18.04
 ARG PIP_SOURCE=https://pypi.mirrors.ustc.edu.cn/simple
+
+ENV PYTHONPATH=.
+
 WORKDIR /darknet
 RUN sed -i 's#http://archive.ubuntu.com#https://mirrors.ustc.edu.cn#g' /etc/apt/sources.list
 RUN apt-key adv --keyserver keyserver.ubuntu.com --recv-keys A4B469963BF863CC && apt-get update
@@ -12,12 +15,13 @@ RUN wget https://github.com/AlexeyAB/darknet/releases/download/darknet_yolo_v3_o
 RUN rm /usr/bin/python3
 RUN ln -s /usr/bin/python3.7 /usr/bin/python3
 RUN python3 get-pip.py
-RUN pip3 install -i ${PIP_SOURCE} mxnet-cu112==1.9.1 numpy opencv-python pyyaml watchdog tensorboardX six
+RUN pip3 install -i ${PIP_SOURCE} mxnet-cu112==1.9.1 numpy opencv-python pyyaml watchdog tensorboardX six scipy tqdm imagesize
 
 ENV DEBIAN_FRONTEND noninteractive
 RUN apt-get update && apt-get install -y libopencv-dev
 COPY . /darknet
-RUN cp /darknet/make_train_test_darknet.sh /usr/bin/start.sh
-RUN mkdir /img-man && cp /darknet/training-template.yaml /img-man/training-template.yaml
 RUN make -j
+
+RUN mkdir /img-man && cp /darknet/training-template.yaml /img-man/training-template.yaml && cp /darknet/mining/*-template.yaml /img-man
+RUN echo "python3 /darknet/start.py" > /usr/bin/start.sh
 CMD bash /usr/bin/start.sh
diff --git a/det-yolov4-training/darknet.py b/det-yolov4-tmi/darknet.py
similarity index 100%
rename from det-yolov4-training/darknet.py
rename to det-yolov4-tmi/darknet.py
diff --git a/det-yolov4-training/darknet_images.py b/det-yolov4-tmi/darknet_images.py
similarity index 100%
rename from det-yolov4-training/darknet_images.py
rename to det-yolov4-tmi/darknet_images.py
diff --git a/det-yolov4-training/darknet_video.py b/det-yolov4-tmi/darknet_video.py
similarity index 100%
rename from det-yolov4-training/darknet_video.py
rename to det-yolov4-tmi/darknet_video.py
diff --git a/det-yolov4-training/data/9k.tree b/det-yolov4-tmi/data/9k.tree
similarity index 100%
rename from det-yolov4-training/data/9k.tree
rename to det-yolov4-tmi/data/9k.tree
diff --git a/det-yolov4-training/data/coco.names b/det-yolov4-tmi/data/coco.names
similarity index 100%
rename from det-yolov4-training/data/coco.names
rename to det-yolov4-tmi/data/coco.names
diff --git a/det-yolov4-training/data/coco9k.map b/det-yolov4-tmi/data/coco9k.map
similarity index 100%
rename from det-yolov4-training/data/coco9k.map
rename to det-yolov4-tmi/data/coco9k.map
diff --git a/det-yolov4-training/data/goal.txt b/det-yolov4-tmi/data/goal.txt
similarity index 100%
rename from det-yolov4-training/data/goal.txt
rename to det-yolov4-tmi/data/goal.txt
diff --git a/det-yolov4-training/data/imagenet.labels.list b/det-yolov4-tmi/data/imagenet.labels.list
similarity index 100%
rename from det-yolov4-training/data/imagenet.labels.list
rename to det-yolov4-tmi/data/imagenet.labels.list
diff --git 
a/det-yolov4-training/data/imagenet.shortnames.list b/det-yolov4-tmi/data/imagenet.shortnames.list similarity index 100% rename from det-yolov4-training/data/imagenet.shortnames.list rename to det-yolov4-tmi/data/imagenet.shortnames.list diff --git a/det-yolov4-training/data/labels/make_labels.py b/det-yolov4-tmi/data/labels/make_labels.py similarity index 100% rename from det-yolov4-training/data/labels/make_labels.py rename to det-yolov4-tmi/data/labels/make_labels.py diff --git a/det-yolov4-training/data/openimages.names b/det-yolov4-tmi/data/openimages.names similarity index 100% rename from det-yolov4-training/data/openimages.names rename to det-yolov4-tmi/data/openimages.names diff --git a/det-yolov4-training/data/voc.names b/det-yolov4-tmi/data/voc.names similarity index 100% rename from det-yolov4-training/data/voc.names rename to det-yolov4-tmi/data/voc.names diff --git a/det-yolov4-training/image_yolov3.sh b/det-yolov4-tmi/image_yolov3.sh similarity index 100% rename from det-yolov4-training/image_yolov3.sh rename to det-yolov4-tmi/image_yolov3.sh diff --git a/det-yolov4-training/image_yolov4.sh b/det-yolov4-tmi/image_yolov4.sh similarity index 100% rename from det-yolov4-training/image_yolov4.sh rename to det-yolov4-tmi/image_yolov4.sh diff --git a/det-yolov4-training/img.txt b/det-yolov4-tmi/img.txt similarity index 100% rename from det-yolov4-training/img.txt rename to det-yolov4-tmi/img.txt diff --git a/det-yolov4-training/include/darknet.h b/det-yolov4-tmi/include/darknet.h similarity index 100% rename from det-yolov4-training/include/darknet.h rename to det-yolov4-tmi/include/darknet.h diff --git a/det-yolov4-training/include/yolo_v2_class.hpp b/det-yolov4-tmi/include/yolo_v2_class.hpp similarity index 100% rename from det-yolov4-training/include/yolo_v2_class.hpp rename to det-yolov4-tmi/include/yolo_v2_class.hpp diff --git a/det-yolov4-training/json_mjpeg_streams.sh b/det-yolov4-tmi/json_mjpeg_streams.sh similarity index 100% rename from det-yolov4-training/json_mjpeg_streams.sh rename to det-yolov4-tmi/json_mjpeg_streams.sh diff --git a/det-yolov4-training/make_train_test_darknet.sh b/det-yolov4-tmi/make_train_test_darknet.sh similarity index 100% rename from det-yolov4-training/make_train_test_darknet.sh rename to det-yolov4-tmi/make_train_test_darknet.sh diff --git a/det-yolov4-mining/.dockerignore b/det-yolov4-tmi/mining/.dockerignore similarity index 100% rename from det-yolov4-mining/.dockerignore rename to det-yolov4-tmi/mining/.dockerignore diff --git a/det-yolov4-mining/README.md b/det-yolov4-tmi/mining/README.md similarity index 100% rename from det-yolov4-mining/README.md rename to det-yolov4-tmi/mining/README.md diff --git a/det-yolov4-mining/active_learning/__init__.py b/det-yolov4-tmi/mining/active_learning/__init__.py similarity index 100% rename from det-yolov4-mining/active_learning/__init__.py rename to det-yolov4-tmi/mining/active_learning/__init__.py diff --git a/det-yolov4-mining/active_learning/apis/__init__.py b/det-yolov4-tmi/mining/active_learning/apis/__init__.py similarity index 100% rename from det-yolov4-mining/active_learning/apis/__init__.py rename to det-yolov4-tmi/mining/active_learning/apis/__init__.py diff --git a/det-yolov4-mining/active_learning/apis/al_api.py b/det-yolov4-tmi/mining/active_learning/apis/al_api.py similarity index 100% rename from det-yolov4-mining/active_learning/apis/al_api.py rename to det-yolov4-tmi/mining/active_learning/apis/al_api.py diff --git a/det-yolov4-mining/active_learning/apis/docker_api.py 
b/det-yolov4-tmi/mining/active_learning/apis/docker_api.py similarity index 100% rename from det-yolov4-mining/active_learning/apis/docker_api.py rename to det-yolov4-tmi/mining/active_learning/apis/docker_api.py diff --git a/det-yolov4-mining/active_learning/dataset/__init__.py b/det-yolov4-tmi/mining/active_learning/dataset/__init__.py similarity index 100% rename from det-yolov4-mining/active_learning/dataset/__init__.py rename to det-yolov4-tmi/mining/active_learning/dataset/__init__.py diff --git a/det-yolov4-mining/active_learning/dataset/datareader.py b/det-yolov4-tmi/mining/active_learning/dataset/datareader.py similarity index 100% rename from det-yolov4-mining/active_learning/dataset/datareader.py rename to det-yolov4-tmi/mining/active_learning/dataset/datareader.py diff --git a/det-yolov4-mining/active_learning/dataset/labeled_dataset.py b/det-yolov4-tmi/mining/active_learning/dataset/labeled_dataset.py similarity index 100% rename from det-yolov4-mining/active_learning/dataset/labeled_dataset.py rename to det-yolov4-tmi/mining/active_learning/dataset/labeled_dataset.py diff --git a/det-yolov4-mining/active_learning/dataset/unlabeled_dataset.py b/det-yolov4-tmi/mining/active_learning/dataset/unlabeled_dataset.py similarity index 100% rename from det-yolov4-mining/active_learning/dataset/unlabeled_dataset.py rename to det-yolov4-tmi/mining/active_learning/dataset/unlabeled_dataset.py diff --git a/det-yolov4-mining/active_learning/model_inference/__init__.py b/det-yolov4-tmi/mining/active_learning/model_inference/__init__.py similarity index 100% rename from det-yolov4-mining/active_learning/model_inference/__init__.py rename to det-yolov4-tmi/mining/active_learning/model_inference/__init__.py diff --git a/det-yolov4-mining/active_learning/model_inference/centernet.py b/det-yolov4-tmi/mining/active_learning/model_inference/centernet.py similarity index 100% rename from det-yolov4-mining/active_learning/model_inference/centernet.py rename to det-yolov4-tmi/mining/active_learning/model_inference/centernet.py diff --git a/det-yolov4-mining/active_learning/model_inference/yolo_models.py b/det-yolov4-tmi/mining/active_learning/model_inference/yolo_models.py similarity index 100% rename from det-yolov4-mining/active_learning/model_inference/yolo_models.py rename to det-yolov4-tmi/mining/active_learning/model_inference/yolo_models.py diff --git a/det-yolov4-mining/active_learning/strategy/__init__.py b/det-yolov4-tmi/mining/active_learning/strategy/__init__.py similarity index 100% rename from det-yolov4-mining/active_learning/strategy/__init__.py rename to det-yolov4-tmi/mining/active_learning/strategy/__init__.py diff --git a/det-yolov4-mining/active_learning/strategy/aldd.py b/det-yolov4-tmi/mining/active_learning/strategy/aldd.py similarity index 100% rename from det-yolov4-mining/active_learning/strategy/aldd.py rename to det-yolov4-tmi/mining/active_learning/strategy/aldd.py diff --git a/det-yolov4-mining/active_learning/strategy/aldd_yolo.py b/det-yolov4-tmi/mining/active_learning/strategy/aldd_yolo.py similarity index 100% rename from det-yolov4-mining/active_learning/strategy/aldd_yolo.py rename to det-yolov4-tmi/mining/active_learning/strategy/aldd_yolo.py diff --git a/det-yolov4-mining/active_learning/strategy/cald.py b/det-yolov4-tmi/mining/active_learning/strategy/cald.py similarity index 100% rename from det-yolov4-mining/active_learning/strategy/cald.py rename to det-yolov4-tmi/mining/active_learning/strategy/cald.py diff --git 
a/det-yolov4-mining/active_learning/strategy/data_augment.py b/det-yolov4-tmi/mining/active_learning/strategy/data_augment.py
similarity index 100%
rename from det-yolov4-mining/active_learning/strategy/data_augment.py
rename to det-yolov4-tmi/mining/active_learning/strategy/data_augment.py
diff --git a/det-yolov4-mining/active_learning/strategy/random_strategy.py b/det-yolov4-tmi/mining/active_learning/strategy/random_strategy.py
similarity index 100%
rename from det-yolov4-mining/active_learning/strategy/random_strategy.py
rename to det-yolov4-tmi/mining/active_learning/strategy/random_strategy.py
diff --git a/det-yolov4-mining/active_learning/utils/__init__.py b/det-yolov4-tmi/mining/active_learning/utils/__init__.py
similarity index 100%
rename from det-yolov4-mining/active_learning/utils/__init__.py
rename to det-yolov4-tmi/mining/active_learning/utils/__init__.py
diff --git a/det-yolov4-mining/active_learning/utils/al_log.py b/det-yolov4-tmi/mining/active_learning/utils/al_log.py
similarity index 100%
rename from det-yolov4-mining/active_learning/utils/al_log.py
rename to det-yolov4-tmi/mining/active_learning/utils/al_log.py
diff --git a/det-yolov4-mining/active_learning/utils/operator.py b/det-yolov4-tmi/mining/active_learning/utils/operator.py
similarity index 100%
rename from det-yolov4-mining/active_learning/utils/operator.py
rename to det-yolov4-tmi/mining/active_learning/utils/operator.py
diff --git a/det-yolov4-mining/al_main.py b/det-yolov4-tmi/mining/al_main.py
similarity index 100%
rename from det-yolov4-mining/al_main.py
rename to det-yolov4-tmi/mining/al_main.py
diff --git a/det-yolov4-mining/combined_class.txt b/det-yolov4-tmi/mining/combined_class.txt
similarity index 100%
rename from det-yolov4-mining/combined_class.txt
rename to det-yolov4-tmi/mining/combined_class.txt
diff --git a/det-yolov4-mining/docker_main.py b/det-yolov4-tmi/mining/docker_main.py
similarity index 89%
rename from det-yolov4-mining/docker_main.py
rename to det-yolov4-tmi/mining/docker_main.py
index 3eb4641..359d066 100644
--- a/det-yolov4-mining/docker_main.py
+++ b/det-yolov4-tmi/mining/docker_main.py
@@ -9,8 +9,8 @@
 import write_result
 
 
-def _load_config() -> dict:
-    with open("/in/config.yaml", "r", encoding='utf8') as f:
+def _load_config(config_file) -> dict:
+    with open(config_file, "r", encoding='utf8') as f:
         config = yaml.safe_load(f)
 
     # set default task id
@@ -34,10 +34,12 @@
 
 
 if __name__ == '__main__':
-    config = _load_config()
+    config = _load_config("/in/config.yaml")
 
-    run_infer = int(config['run_infer'])
-    run_mining = int(config['run_mining'])
+    with open("/in/env.yaml", "r", encoding='utf8') as f:
+        env_config = yaml.safe_load(f)
+    run_infer = int(env_config['run_infer'])
+    run_mining = int(env_config['run_mining'])
 
     if not run_infer and not run_mining:
         raise ValueError('both run_infer and run_mining set to 0, abort')
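The docker_main.py hunk above changes the executor contract: the run_infer / run_mining switches no longer ride in the user-facing /in/config.yaml but in a separate /in/env.yaml provided by the backend. A sketch of what each file is expected to carry under that split (keys and values illustrative only):

```python
# illustrative contents only; the real files are written by the ymir backend
env_yaml = {          # /in/env.yaml: task plumbing, not user-editable
    'run_infer': 1,
    'run_mining': 0,
}
config_yaml = {       # /in/config.yaml: user hyper-parameters, as before
    'confidence_thresh': 0.1,
    'nms_thresh': 0.45,
    'max_boxes': 50,
}
```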
diff --git a/det-yolov4-mining/docker_readme.md b/det-yolov4-tmi/mining/docker_readme.md
similarity index 100%
rename from det-yolov4-mining/docker_readme.md
rename to det-yolov4-tmi/mining/docker_readme.md
diff --git a/det-yolov4-mining/infer-template.yaml b/det-yolov4-tmi/mining/infer-template.yaml
similarity index 97%
rename from det-yolov4-mining/infer-template.yaml
rename to det-yolov4-tmi/mining/infer-template.yaml
index dce6501..11c6502 100644
--- a/det-yolov4-mining/infer-template.yaml
+++ b/det-yolov4-tmi/mining/infer-template.yaml
@@ -14,7 +14,7 @@ write_result: True
 confidence_thresh: 0.1
 nms_thresh: 0.45
 max_boxes: 50
-# shm_size: '16G'
+shm_size: '128G'
 # gpu_id: ''
 # model_params_path: []
 # class_names:
diff --git a/det-yolov4-mining/mining-template.yaml b/det-yolov4-tmi/mining/mining-template.yaml
similarity index 93%
rename from det-yolov4-mining/mining-template.yaml
rename to det-yolov4-tmi/mining/mining-template.yaml
index e02770f..2ff8270 100644
--- a/det-yolov4-mining/mining-template.yaml
+++ b/det-yolov4-tmi/mining/mining-template.yaml
@@ -13,14 +13,14 @@ model_type: detection
 strategy: aldd_yolo
 image_height: 608
 image_width: 608
-batch_size: 16
+batch_size: 4
 anchors: '12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401'
 confidence_thresh: 0.1
 nms_thresh: 0.45
 max_boxes: 50
-# shm_size: '16G'
+shm_size: '128G'
 # gpu_id: '0,1,2,3'
 # model_params_path: []
 # task_id: cycle-node-mined-0
 # class_names:
-# - expose_rubbish
\ No newline at end of file
+# - expose_rubbish
diff --git a/det-yolov4-mining/monitor_process.py b/det-yolov4-tmi/mining/monitor_process.py
similarity index 100%
rename from det-yolov4-mining/monitor_process.py
rename to det-yolov4-tmi/mining/monitor_process.py
diff --git a/det-yolov4-mining/start.sh b/det-yolov4-tmi/mining/start.sh
similarity index 100%
rename from det-yolov4-mining/start.sh
rename to det-yolov4-tmi/mining/start.sh
diff --git a/det-yolov4-mining/test_api.py b/det-yolov4-tmi/mining/test_api.py
similarity index 100%
rename from det-yolov4-mining/test_api.py
rename to det-yolov4-tmi/mining/test_api.py
diff --git a/det-yolov4-mining/test_centernet.py b/det-yolov4-tmi/mining/test_centernet.py
similarity index 100%
rename from det-yolov4-mining/test_centernet.py
rename to det-yolov4-tmi/mining/test_centernet.py
diff --git a/det-yolov4-mining/tools/al_strategsy_union.py b/det-yolov4-tmi/mining/tools/al_strategsy_union.py
similarity index 100%
rename from det-yolov4-mining/tools/al_strategsy_union.py
rename to det-yolov4-tmi/mining/tools/al_strategsy_union.py
diff --git a/det-yolov4-mining/tools/imagenet_hard_negative.py b/det-yolov4-tmi/mining/tools/imagenet_hard_negative.py
similarity index 100%
rename from det-yolov4-mining/tools/imagenet_hard_negative.py
rename to det-yolov4-tmi/mining/tools/imagenet_hard_negative.py
diff --git a/det-yolov4-mining/tools/plot_dataset_class_hist.py b/det-yolov4-tmi/mining/tools/plot_dataset_class_hist.py
similarity index 100%
rename from det-yolov4-mining/tools/plot_dataset_class_hist.py
rename to det-yolov4-tmi/mining/tools/plot_dataset_class_hist.py
diff --git a/det-yolov4-mining/tools/visualize_aldd.py b/det-yolov4-tmi/mining/tools/visualize_aldd.py
similarity index 100%
rename from det-yolov4-mining/tools/visualize_aldd.py
rename to det-yolov4-tmi/mining/tools/visualize_aldd.py
diff --git a/det-yolov4-mining/tools/visualize_cald.py b/det-yolov4-tmi/mining/tools/visualize_cald.py
similarity index 100%
rename from det-yolov4-mining/tools/visualize_cald.py
rename to det-yolov4-tmi/mining/tools/visualize_cald.py
diff --git a/det-yolov4-mining/write_result.py b/det-yolov4-tmi/mining/write_result.py
similarity index 100%
rename from det-yolov4-mining/write_result.py
rename to det-yolov4-tmi/mining/write_result.py
diff --git a/det-yolov4-training/net_cam_v3.sh b/det-yolov4-tmi/net_cam_v3.sh
similarity index 100%
rename from det-yolov4-training/net_cam_v3.sh
rename to det-yolov4-tmi/net_cam_v3.sh
diff --git a/det-yolov4-training/net_cam_v4.sh b/det-yolov4-tmi/net_cam_v4.sh
similarity index 100%
rename from det-yolov4-training/net_cam_v4.sh
rename to det-yolov4-tmi/net_cam_v4.sh
diff --git 
a/det-yolov4-training/src/.editorconfig b/det-yolov4-tmi/src/.editorconfig similarity index 100% rename from det-yolov4-training/src/.editorconfig rename to det-yolov4-tmi/src/.editorconfig diff --git a/det-yolov4-training/src/activation_kernels.cu b/det-yolov4-tmi/src/activation_kernels.cu similarity index 100% rename from det-yolov4-training/src/activation_kernels.cu rename to det-yolov4-tmi/src/activation_kernels.cu diff --git a/det-yolov4-training/src/activation_layer.c b/det-yolov4-tmi/src/activation_layer.c similarity index 100% rename from det-yolov4-training/src/activation_layer.c rename to det-yolov4-tmi/src/activation_layer.c diff --git a/det-yolov4-training/src/activation_layer.h b/det-yolov4-tmi/src/activation_layer.h similarity index 100% rename from det-yolov4-training/src/activation_layer.h rename to det-yolov4-tmi/src/activation_layer.h diff --git a/det-yolov4-training/src/activations.c b/det-yolov4-tmi/src/activations.c similarity index 100% rename from det-yolov4-training/src/activations.c rename to det-yolov4-tmi/src/activations.c diff --git a/det-yolov4-training/src/activations.h b/det-yolov4-tmi/src/activations.h similarity index 100% rename from det-yolov4-training/src/activations.h rename to det-yolov4-tmi/src/activations.h diff --git a/det-yolov4-training/src/art.c b/det-yolov4-tmi/src/art.c similarity index 100% rename from det-yolov4-training/src/art.c rename to det-yolov4-tmi/src/art.c diff --git a/det-yolov4-training/src/avgpool_layer.c b/det-yolov4-tmi/src/avgpool_layer.c similarity index 100% rename from det-yolov4-training/src/avgpool_layer.c rename to det-yolov4-tmi/src/avgpool_layer.c diff --git a/det-yolov4-training/src/avgpool_layer.h b/det-yolov4-tmi/src/avgpool_layer.h similarity index 100% rename from det-yolov4-training/src/avgpool_layer.h rename to det-yolov4-tmi/src/avgpool_layer.h diff --git a/det-yolov4-training/src/avgpool_layer_kernels.cu b/det-yolov4-tmi/src/avgpool_layer_kernels.cu similarity index 100% rename from det-yolov4-training/src/avgpool_layer_kernels.cu rename to det-yolov4-tmi/src/avgpool_layer_kernels.cu diff --git a/det-yolov4-training/src/batchnorm_layer.c b/det-yolov4-tmi/src/batchnorm_layer.c similarity index 100% rename from det-yolov4-training/src/batchnorm_layer.c rename to det-yolov4-tmi/src/batchnorm_layer.c diff --git a/det-yolov4-training/src/batchnorm_layer.h b/det-yolov4-tmi/src/batchnorm_layer.h similarity index 100% rename from det-yolov4-training/src/batchnorm_layer.h rename to det-yolov4-tmi/src/batchnorm_layer.h diff --git a/det-yolov4-training/src/blas.c b/det-yolov4-tmi/src/blas.c similarity index 100% rename from det-yolov4-training/src/blas.c rename to det-yolov4-tmi/src/blas.c diff --git a/det-yolov4-training/src/blas.h b/det-yolov4-tmi/src/blas.h similarity index 100% rename from det-yolov4-training/src/blas.h rename to det-yolov4-tmi/src/blas.h diff --git a/det-yolov4-training/src/blas_kernels.cu b/det-yolov4-tmi/src/blas_kernels.cu similarity index 100% rename from det-yolov4-training/src/blas_kernels.cu rename to det-yolov4-tmi/src/blas_kernels.cu diff --git a/det-yolov4-training/src/box.c b/det-yolov4-tmi/src/box.c similarity index 100% rename from det-yolov4-training/src/box.c rename to det-yolov4-tmi/src/box.c diff --git a/det-yolov4-training/src/box.h b/det-yolov4-tmi/src/box.h similarity index 100% rename from det-yolov4-training/src/box.h rename to det-yolov4-tmi/src/box.h diff --git a/det-yolov4-training/src/captcha.c b/det-yolov4-tmi/src/captcha.c similarity index 100% rename from 
det-yolov4-training/src/captcha.c rename to det-yolov4-tmi/src/captcha.c diff --git a/det-yolov4-training/src/cifar.c b/det-yolov4-tmi/src/cifar.c similarity index 100% rename from det-yolov4-training/src/cifar.c rename to det-yolov4-tmi/src/cifar.c diff --git a/det-yolov4-training/src/classifier.c b/det-yolov4-tmi/src/classifier.c similarity index 100% rename from det-yolov4-training/src/classifier.c rename to det-yolov4-tmi/src/classifier.c diff --git a/det-yolov4-training/src/classifier.h b/det-yolov4-tmi/src/classifier.h similarity index 100% rename from det-yolov4-training/src/classifier.h rename to det-yolov4-tmi/src/classifier.h diff --git a/det-yolov4-training/src/coco.c b/det-yolov4-tmi/src/coco.c similarity index 100% rename from det-yolov4-training/src/coco.c rename to det-yolov4-tmi/src/coco.c diff --git a/det-yolov4-training/src/col2im.c b/det-yolov4-tmi/src/col2im.c similarity index 100% rename from det-yolov4-training/src/col2im.c rename to det-yolov4-tmi/src/col2im.c diff --git a/det-yolov4-training/src/col2im.h b/det-yolov4-tmi/src/col2im.h similarity index 100% rename from det-yolov4-training/src/col2im.h rename to det-yolov4-tmi/src/col2im.h diff --git a/det-yolov4-training/src/col2im_kernels.cu b/det-yolov4-tmi/src/col2im_kernels.cu similarity index 100% rename from det-yolov4-training/src/col2im_kernels.cu rename to det-yolov4-tmi/src/col2im_kernels.cu diff --git a/det-yolov4-training/src/compare.c b/det-yolov4-tmi/src/compare.c similarity index 100% rename from det-yolov4-training/src/compare.c rename to det-yolov4-tmi/src/compare.c diff --git a/det-yolov4-training/src/connected_layer.c b/det-yolov4-tmi/src/connected_layer.c similarity index 100% rename from det-yolov4-training/src/connected_layer.c rename to det-yolov4-tmi/src/connected_layer.c diff --git a/det-yolov4-training/src/connected_layer.h b/det-yolov4-tmi/src/connected_layer.h similarity index 100% rename from det-yolov4-training/src/connected_layer.h rename to det-yolov4-tmi/src/connected_layer.h diff --git a/det-yolov4-training/src/conv_lstm_layer.c b/det-yolov4-tmi/src/conv_lstm_layer.c similarity index 100% rename from det-yolov4-training/src/conv_lstm_layer.c rename to det-yolov4-tmi/src/conv_lstm_layer.c diff --git a/det-yolov4-training/src/conv_lstm_layer.h b/det-yolov4-tmi/src/conv_lstm_layer.h similarity index 100% rename from det-yolov4-training/src/conv_lstm_layer.h rename to det-yolov4-tmi/src/conv_lstm_layer.h diff --git a/det-yolov4-training/src/convolutional_kernels.cu b/det-yolov4-tmi/src/convolutional_kernels.cu similarity index 100% rename from det-yolov4-training/src/convolutional_kernels.cu rename to det-yolov4-tmi/src/convolutional_kernels.cu diff --git a/det-yolov4-training/src/convolutional_layer.c b/det-yolov4-tmi/src/convolutional_layer.c similarity index 100% rename from det-yolov4-training/src/convolutional_layer.c rename to det-yolov4-tmi/src/convolutional_layer.c diff --git a/det-yolov4-training/src/convolutional_layer.h b/det-yolov4-tmi/src/convolutional_layer.h similarity index 100% rename from det-yolov4-training/src/convolutional_layer.h rename to det-yolov4-tmi/src/convolutional_layer.h diff --git a/det-yolov4-training/src/cost_layer.c b/det-yolov4-tmi/src/cost_layer.c similarity index 100% rename from det-yolov4-training/src/cost_layer.c rename to det-yolov4-tmi/src/cost_layer.c diff --git a/det-yolov4-training/src/cost_layer.h b/det-yolov4-tmi/src/cost_layer.h similarity index 100% rename from det-yolov4-training/src/cost_layer.h rename to det-yolov4-tmi/src/cost_layer.h 
diff --git a/det-yolov4-training/src/cpu_gemm.c b/det-yolov4-tmi/src/cpu_gemm.c similarity index 100% rename from det-yolov4-training/src/cpu_gemm.c rename to det-yolov4-tmi/src/cpu_gemm.c diff --git a/det-yolov4-training/src/crnn_layer.c b/det-yolov4-tmi/src/crnn_layer.c similarity index 100% rename from det-yolov4-training/src/crnn_layer.c rename to det-yolov4-tmi/src/crnn_layer.c diff --git a/det-yolov4-training/src/crnn_layer.h b/det-yolov4-tmi/src/crnn_layer.h similarity index 100% rename from det-yolov4-training/src/crnn_layer.h rename to det-yolov4-tmi/src/crnn_layer.h diff --git a/det-yolov4-training/src/crop_layer.c b/det-yolov4-tmi/src/crop_layer.c similarity index 100% rename from det-yolov4-training/src/crop_layer.c rename to det-yolov4-tmi/src/crop_layer.c diff --git a/det-yolov4-training/src/crop_layer.h b/det-yolov4-tmi/src/crop_layer.h similarity index 100% rename from det-yolov4-training/src/crop_layer.h rename to det-yolov4-tmi/src/crop_layer.h diff --git a/det-yolov4-training/src/crop_layer_kernels.cu b/det-yolov4-tmi/src/crop_layer_kernels.cu similarity index 100% rename from det-yolov4-training/src/crop_layer_kernels.cu rename to det-yolov4-tmi/src/crop_layer_kernels.cu diff --git a/det-yolov4-training/src/csharp/CMakeLists.txt b/det-yolov4-tmi/src/csharp/CMakeLists.txt similarity index 100% rename from det-yolov4-training/src/csharp/CMakeLists.txt rename to det-yolov4-tmi/src/csharp/CMakeLists.txt diff --git a/det-yolov4-training/src/csharp/YoloCSharpWrapper.cs b/det-yolov4-tmi/src/csharp/YoloCSharpWrapper.cs similarity index 100% rename from det-yolov4-training/src/csharp/YoloCSharpWrapper.cs rename to det-yolov4-tmi/src/csharp/YoloCSharpWrapper.cs diff --git a/det-yolov4-training/src/dark_cuda.c b/det-yolov4-tmi/src/dark_cuda.c similarity index 100% rename from det-yolov4-training/src/dark_cuda.c rename to det-yolov4-tmi/src/dark_cuda.c diff --git a/det-yolov4-training/src/dark_cuda.h b/det-yolov4-tmi/src/dark_cuda.h similarity index 100% rename from det-yolov4-training/src/dark_cuda.h rename to det-yolov4-tmi/src/dark_cuda.h diff --git a/det-yolov4-training/src/darknet.c b/det-yolov4-tmi/src/darknet.c similarity index 100% rename from det-yolov4-training/src/darknet.c rename to det-yolov4-tmi/src/darknet.c diff --git a/det-yolov4-training/src/darkunistd.h b/det-yolov4-tmi/src/darkunistd.h similarity index 100% rename from det-yolov4-training/src/darkunistd.h rename to det-yolov4-tmi/src/darkunistd.h diff --git a/det-yolov4-training/src/data.c b/det-yolov4-tmi/src/data.c similarity index 100% rename from det-yolov4-training/src/data.c rename to det-yolov4-tmi/src/data.c diff --git a/det-yolov4-training/src/data.h b/det-yolov4-tmi/src/data.h similarity index 100% rename from det-yolov4-training/src/data.h rename to det-yolov4-tmi/src/data.h diff --git a/det-yolov4-training/src/deconvolutional_kernels.cu b/det-yolov4-tmi/src/deconvolutional_kernels.cu similarity index 100% rename from det-yolov4-training/src/deconvolutional_kernels.cu rename to det-yolov4-tmi/src/deconvolutional_kernels.cu diff --git a/det-yolov4-training/src/deconvolutional_layer.c b/det-yolov4-tmi/src/deconvolutional_layer.c similarity index 100% rename from det-yolov4-training/src/deconvolutional_layer.c rename to det-yolov4-tmi/src/deconvolutional_layer.c diff --git a/det-yolov4-training/src/deconvolutional_layer.h b/det-yolov4-tmi/src/deconvolutional_layer.h similarity index 100% rename from det-yolov4-training/src/deconvolutional_layer.h rename to det-yolov4-tmi/src/deconvolutional_layer.h diff 
--git a/det-yolov4-training/src/demo.c b/det-yolov4-tmi/src/demo.c similarity index 100% rename from det-yolov4-training/src/demo.c rename to det-yolov4-tmi/src/demo.c diff --git a/det-yolov4-training/src/demo.h b/det-yolov4-tmi/src/demo.h similarity index 100% rename from det-yolov4-training/src/demo.h rename to det-yolov4-tmi/src/demo.h diff --git a/det-yolov4-training/src/detection_layer.c b/det-yolov4-tmi/src/detection_layer.c similarity index 100% rename from det-yolov4-training/src/detection_layer.c rename to det-yolov4-tmi/src/detection_layer.c diff --git a/det-yolov4-training/src/detection_layer.h b/det-yolov4-tmi/src/detection_layer.h similarity index 100% rename from det-yolov4-training/src/detection_layer.h rename to det-yolov4-tmi/src/detection_layer.h diff --git a/det-yolov4-training/src/detector.c b/det-yolov4-tmi/src/detector.c similarity index 100% rename from det-yolov4-training/src/detector.c rename to det-yolov4-tmi/src/detector.c diff --git a/det-yolov4-training/src/dice.c b/det-yolov4-tmi/src/dice.c similarity index 100% rename from det-yolov4-training/src/dice.c rename to det-yolov4-tmi/src/dice.c diff --git a/det-yolov4-training/src/dropout_layer.c b/det-yolov4-tmi/src/dropout_layer.c similarity index 100% rename from det-yolov4-training/src/dropout_layer.c rename to det-yolov4-tmi/src/dropout_layer.c diff --git a/det-yolov4-training/src/dropout_layer.h b/det-yolov4-tmi/src/dropout_layer.h similarity index 100% rename from det-yolov4-training/src/dropout_layer.h rename to det-yolov4-tmi/src/dropout_layer.h diff --git a/det-yolov4-training/src/dropout_layer_kernels.cu b/det-yolov4-tmi/src/dropout_layer_kernels.cu similarity index 100% rename from det-yolov4-training/src/dropout_layer_kernels.cu rename to det-yolov4-tmi/src/dropout_layer_kernels.cu diff --git a/det-yolov4-training/src/gaussian_yolo_layer.c b/det-yolov4-tmi/src/gaussian_yolo_layer.c similarity index 100% rename from det-yolov4-training/src/gaussian_yolo_layer.c rename to det-yolov4-tmi/src/gaussian_yolo_layer.c diff --git a/det-yolov4-training/src/gaussian_yolo_layer.h b/det-yolov4-tmi/src/gaussian_yolo_layer.h similarity index 100% rename from det-yolov4-training/src/gaussian_yolo_layer.h rename to det-yolov4-tmi/src/gaussian_yolo_layer.h diff --git a/det-yolov4-training/src/gemm.c b/det-yolov4-tmi/src/gemm.c similarity index 100% rename from det-yolov4-training/src/gemm.c rename to det-yolov4-tmi/src/gemm.c diff --git a/det-yolov4-training/src/gemm.h b/det-yolov4-tmi/src/gemm.h similarity index 100% rename from det-yolov4-training/src/gemm.h rename to det-yolov4-tmi/src/gemm.h diff --git a/det-yolov4-training/src/getopt.c b/det-yolov4-tmi/src/getopt.c similarity index 100% rename from det-yolov4-training/src/getopt.c rename to det-yolov4-tmi/src/getopt.c diff --git a/det-yolov4-training/src/getopt.h b/det-yolov4-tmi/src/getopt.h similarity index 100% rename from det-yolov4-training/src/getopt.h rename to det-yolov4-tmi/src/getopt.h diff --git a/det-yolov4-training/src/gettimeofday.c b/det-yolov4-tmi/src/gettimeofday.c similarity index 100% rename from det-yolov4-training/src/gettimeofday.c rename to det-yolov4-tmi/src/gettimeofday.c diff --git a/det-yolov4-training/src/gettimeofday.h b/det-yolov4-tmi/src/gettimeofday.h similarity index 100% rename from det-yolov4-training/src/gettimeofday.h rename to det-yolov4-tmi/src/gettimeofday.h diff --git a/det-yolov4-training/src/go.c b/det-yolov4-tmi/src/go.c similarity index 100% rename from det-yolov4-training/src/go.c rename to det-yolov4-tmi/src/go.c diff 
--git a/det-yolov4-training/src/gru_layer.c b/det-yolov4-tmi/src/gru_layer.c similarity index 100% rename from det-yolov4-training/src/gru_layer.c rename to det-yolov4-tmi/src/gru_layer.c diff --git a/det-yolov4-training/src/gru_layer.h b/det-yolov4-tmi/src/gru_layer.h similarity index 100% rename from det-yolov4-training/src/gru_layer.h rename to det-yolov4-tmi/src/gru_layer.h diff --git a/det-yolov4-training/src/http_stream.cpp b/det-yolov4-tmi/src/http_stream.cpp similarity index 100% rename from det-yolov4-training/src/http_stream.cpp rename to det-yolov4-tmi/src/http_stream.cpp diff --git a/det-yolov4-training/src/http_stream.h b/det-yolov4-tmi/src/http_stream.h similarity index 100% rename from det-yolov4-training/src/http_stream.h rename to det-yolov4-tmi/src/http_stream.h diff --git a/det-yolov4-training/src/httplib.h b/det-yolov4-tmi/src/httplib.h similarity index 100% rename from det-yolov4-training/src/httplib.h rename to det-yolov4-tmi/src/httplib.h diff --git a/det-yolov4-training/src/im2col.c b/det-yolov4-tmi/src/im2col.c similarity index 100% rename from det-yolov4-training/src/im2col.c rename to det-yolov4-tmi/src/im2col.c diff --git a/det-yolov4-training/src/im2col.h b/det-yolov4-tmi/src/im2col.h similarity index 100% rename from det-yolov4-training/src/im2col.h rename to det-yolov4-tmi/src/im2col.h diff --git a/det-yolov4-training/src/im2col_kernels.cu b/det-yolov4-tmi/src/im2col_kernels.cu similarity index 100% rename from det-yolov4-training/src/im2col_kernels.cu rename to det-yolov4-tmi/src/im2col_kernels.cu diff --git a/det-yolov4-training/src/image.c b/det-yolov4-tmi/src/image.c similarity index 100% rename from det-yolov4-training/src/image.c rename to det-yolov4-tmi/src/image.c diff --git a/det-yolov4-training/src/image.h b/det-yolov4-tmi/src/image.h similarity index 100% rename from det-yolov4-training/src/image.h rename to det-yolov4-tmi/src/image.h diff --git a/det-yolov4-training/src/image_opencv.cpp b/det-yolov4-tmi/src/image_opencv.cpp similarity index 100% rename from det-yolov4-training/src/image_opencv.cpp rename to det-yolov4-tmi/src/image_opencv.cpp diff --git a/det-yolov4-training/src/image_opencv.h b/det-yolov4-tmi/src/image_opencv.h similarity index 100% rename from det-yolov4-training/src/image_opencv.h rename to det-yolov4-tmi/src/image_opencv.h diff --git a/det-yolov4-training/src/layer.c b/det-yolov4-tmi/src/layer.c similarity index 100% rename from det-yolov4-training/src/layer.c rename to det-yolov4-tmi/src/layer.c diff --git a/det-yolov4-training/src/layer.h b/det-yolov4-tmi/src/layer.h similarity index 100% rename from det-yolov4-training/src/layer.h rename to det-yolov4-tmi/src/layer.h diff --git a/det-yolov4-training/src/list.c b/det-yolov4-tmi/src/list.c similarity index 100% rename from det-yolov4-training/src/list.c rename to det-yolov4-tmi/src/list.c diff --git a/det-yolov4-training/src/list.h b/det-yolov4-tmi/src/list.h similarity index 100% rename from det-yolov4-training/src/list.h rename to det-yolov4-tmi/src/list.h diff --git a/det-yolov4-training/src/local_layer.c b/det-yolov4-tmi/src/local_layer.c similarity index 100% rename from det-yolov4-training/src/local_layer.c rename to det-yolov4-tmi/src/local_layer.c diff --git a/det-yolov4-training/src/local_layer.h b/det-yolov4-tmi/src/local_layer.h similarity index 100% rename from det-yolov4-training/src/local_layer.h rename to det-yolov4-tmi/src/local_layer.h diff --git a/det-yolov4-training/src/lstm_layer.c b/det-yolov4-tmi/src/lstm_layer.c similarity index 100% rename from 
det-yolov4-training/src/lstm_layer.c rename to det-yolov4-tmi/src/lstm_layer.c diff --git a/det-yolov4-training/src/lstm_layer.h b/det-yolov4-tmi/src/lstm_layer.h similarity index 100% rename from det-yolov4-training/src/lstm_layer.h rename to det-yolov4-tmi/src/lstm_layer.h diff --git a/det-yolov4-training/src/matrix.c b/det-yolov4-tmi/src/matrix.c similarity index 100% rename from det-yolov4-training/src/matrix.c rename to det-yolov4-tmi/src/matrix.c diff --git a/det-yolov4-training/src/matrix.h b/det-yolov4-tmi/src/matrix.h similarity index 100% rename from det-yolov4-training/src/matrix.h rename to det-yolov4-tmi/src/matrix.h diff --git a/det-yolov4-training/src/maxpool_layer.c b/det-yolov4-tmi/src/maxpool_layer.c similarity index 100% rename from det-yolov4-training/src/maxpool_layer.c rename to det-yolov4-tmi/src/maxpool_layer.c diff --git a/det-yolov4-training/src/maxpool_layer.h b/det-yolov4-tmi/src/maxpool_layer.h similarity index 100% rename from det-yolov4-training/src/maxpool_layer.h rename to det-yolov4-tmi/src/maxpool_layer.h diff --git a/det-yolov4-training/src/maxpool_layer_kernels.cu b/det-yolov4-tmi/src/maxpool_layer_kernels.cu similarity index 100% rename from det-yolov4-training/src/maxpool_layer_kernels.cu rename to det-yolov4-tmi/src/maxpool_layer_kernels.cu diff --git a/det-yolov4-training/src/network.c b/det-yolov4-tmi/src/network.c similarity index 100% rename from det-yolov4-training/src/network.c rename to det-yolov4-tmi/src/network.c diff --git a/det-yolov4-training/src/network.h b/det-yolov4-tmi/src/network.h similarity index 100% rename from det-yolov4-training/src/network.h rename to det-yolov4-tmi/src/network.h diff --git a/det-yolov4-training/src/network_kernels.cu b/det-yolov4-tmi/src/network_kernels.cu similarity index 100% rename from det-yolov4-training/src/network_kernels.cu rename to det-yolov4-tmi/src/network_kernels.cu diff --git a/det-yolov4-training/src/nightmare.c b/det-yolov4-tmi/src/nightmare.c similarity index 100% rename from det-yolov4-training/src/nightmare.c rename to det-yolov4-tmi/src/nightmare.c diff --git a/det-yolov4-training/src/normalization_layer.c b/det-yolov4-tmi/src/normalization_layer.c similarity index 100% rename from det-yolov4-training/src/normalization_layer.c rename to det-yolov4-tmi/src/normalization_layer.c diff --git a/det-yolov4-training/src/normalization_layer.h b/det-yolov4-tmi/src/normalization_layer.h similarity index 100% rename from det-yolov4-training/src/normalization_layer.h rename to det-yolov4-tmi/src/normalization_layer.h diff --git a/det-yolov4-training/src/option_list.c b/det-yolov4-tmi/src/option_list.c similarity index 100% rename from det-yolov4-training/src/option_list.c rename to det-yolov4-tmi/src/option_list.c diff --git a/det-yolov4-training/src/option_list.h b/det-yolov4-tmi/src/option_list.h similarity index 100% rename from det-yolov4-training/src/option_list.h rename to det-yolov4-tmi/src/option_list.h diff --git a/det-yolov4-training/src/parser.c b/det-yolov4-tmi/src/parser.c similarity index 100% rename from det-yolov4-training/src/parser.c rename to det-yolov4-tmi/src/parser.c diff --git a/det-yolov4-training/src/parser.h b/det-yolov4-tmi/src/parser.h similarity index 100% rename from det-yolov4-training/src/parser.h rename to det-yolov4-tmi/src/parser.h diff --git a/det-yolov4-training/src/region_layer.c b/det-yolov4-tmi/src/region_layer.c similarity index 100% rename from det-yolov4-training/src/region_layer.c rename to det-yolov4-tmi/src/region_layer.c diff --git 
a/det-yolov4-training/src/region_layer.h b/det-yolov4-tmi/src/region_layer.h similarity index 100% rename from det-yolov4-training/src/region_layer.h rename to det-yolov4-tmi/src/region_layer.h diff --git a/det-yolov4-training/src/reorg_layer.c b/det-yolov4-tmi/src/reorg_layer.c similarity index 100% rename from det-yolov4-training/src/reorg_layer.c rename to det-yolov4-tmi/src/reorg_layer.c diff --git a/det-yolov4-training/src/reorg_layer.h b/det-yolov4-tmi/src/reorg_layer.h similarity index 100% rename from det-yolov4-training/src/reorg_layer.h rename to det-yolov4-tmi/src/reorg_layer.h diff --git a/det-yolov4-training/src/reorg_old_layer.c b/det-yolov4-tmi/src/reorg_old_layer.c similarity index 100% rename from det-yolov4-training/src/reorg_old_layer.c rename to det-yolov4-tmi/src/reorg_old_layer.c diff --git a/det-yolov4-training/src/reorg_old_layer.h b/det-yolov4-tmi/src/reorg_old_layer.h similarity index 100% rename from det-yolov4-training/src/reorg_old_layer.h rename to det-yolov4-tmi/src/reorg_old_layer.h diff --git a/det-yolov4-training/src/representation_layer.c b/det-yolov4-tmi/src/representation_layer.c similarity index 100% rename from det-yolov4-training/src/representation_layer.c rename to det-yolov4-tmi/src/representation_layer.c diff --git a/det-yolov4-training/src/representation_layer.h b/det-yolov4-tmi/src/representation_layer.h similarity index 100% rename from det-yolov4-training/src/representation_layer.h rename to det-yolov4-tmi/src/representation_layer.h diff --git a/det-yolov4-training/src/rnn.c b/det-yolov4-tmi/src/rnn.c similarity index 100% rename from det-yolov4-training/src/rnn.c rename to det-yolov4-tmi/src/rnn.c diff --git a/det-yolov4-training/src/rnn_layer.c b/det-yolov4-tmi/src/rnn_layer.c similarity index 100% rename from det-yolov4-training/src/rnn_layer.c rename to det-yolov4-tmi/src/rnn_layer.c diff --git a/det-yolov4-training/src/rnn_layer.h b/det-yolov4-tmi/src/rnn_layer.h similarity index 100% rename from det-yolov4-training/src/rnn_layer.h rename to det-yolov4-tmi/src/rnn_layer.h diff --git a/det-yolov4-training/src/rnn_vid.c b/det-yolov4-tmi/src/rnn_vid.c similarity index 100% rename from det-yolov4-training/src/rnn_vid.c rename to det-yolov4-tmi/src/rnn_vid.c diff --git a/det-yolov4-training/src/route_layer.c b/det-yolov4-tmi/src/route_layer.c similarity index 100% rename from det-yolov4-training/src/route_layer.c rename to det-yolov4-tmi/src/route_layer.c diff --git a/det-yolov4-training/src/route_layer.h b/det-yolov4-tmi/src/route_layer.h similarity index 100% rename from det-yolov4-training/src/route_layer.h rename to det-yolov4-tmi/src/route_layer.h diff --git a/det-yolov4-training/src/sam_layer.c b/det-yolov4-tmi/src/sam_layer.c similarity index 100% rename from det-yolov4-training/src/sam_layer.c rename to det-yolov4-tmi/src/sam_layer.c diff --git a/det-yolov4-training/src/sam_layer.h b/det-yolov4-tmi/src/sam_layer.h similarity index 100% rename from det-yolov4-training/src/sam_layer.h rename to det-yolov4-tmi/src/sam_layer.h diff --git a/det-yolov4-training/src/scale_channels_layer.c b/det-yolov4-tmi/src/scale_channels_layer.c similarity index 100% rename from det-yolov4-training/src/scale_channels_layer.c rename to det-yolov4-tmi/src/scale_channels_layer.c diff --git a/det-yolov4-training/src/scale_channels_layer.h b/det-yolov4-tmi/src/scale_channels_layer.h similarity index 100% rename from det-yolov4-training/src/scale_channels_layer.h rename to det-yolov4-tmi/src/scale_channels_layer.h diff --git 
a/det-yolov4-training/src/shortcut_layer.c b/det-yolov4-tmi/src/shortcut_layer.c similarity index 100% rename from det-yolov4-training/src/shortcut_layer.c rename to det-yolov4-tmi/src/shortcut_layer.c diff --git a/det-yolov4-training/src/shortcut_layer.h b/det-yolov4-tmi/src/shortcut_layer.h similarity index 100% rename from det-yolov4-training/src/shortcut_layer.h rename to det-yolov4-tmi/src/shortcut_layer.h diff --git a/det-yolov4-training/src/softmax_layer.c b/det-yolov4-tmi/src/softmax_layer.c similarity index 100% rename from det-yolov4-training/src/softmax_layer.c rename to det-yolov4-tmi/src/softmax_layer.c diff --git a/det-yolov4-training/src/softmax_layer.h b/det-yolov4-tmi/src/softmax_layer.h similarity index 100% rename from det-yolov4-training/src/softmax_layer.h rename to det-yolov4-tmi/src/softmax_layer.h diff --git a/det-yolov4-training/src/super.c b/det-yolov4-tmi/src/super.c similarity index 100% rename from det-yolov4-training/src/super.c rename to det-yolov4-tmi/src/super.c diff --git a/det-yolov4-training/src/swag.c b/det-yolov4-tmi/src/swag.c similarity index 100% rename from det-yolov4-training/src/swag.c rename to det-yolov4-tmi/src/swag.c diff --git a/det-yolov4-training/src/tag.c b/det-yolov4-tmi/src/tag.c similarity index 100% rename from det-yolov4-training/src/tag.c rename to det-yolov4-tmi/src/tag.c diff --git a/det-yolov4-training/src/tree.c b/det-yolov4-tmi/src/tree.c similarity index 100% rename from det-yolov4-training/src/tree.c rename to det-yolov4-tmi/src/tree.c diff --git a/det-yolov4-training/src/tree.h b/det-yolov4-tmi/src/tree.h similarity index 100% rename from det-yolov4-training/src/tree.h rename to det-yolov4-tmi/src/tree.h diff --git a/det-yolov4-training/src/upsample_layer.c b/det-yolov4-tmi/src/upsample_layer.c similarity index 100% rename from det-yolov4-training/src/upsample_layer.c rename to det-yolov4-tmi/src/upsample_layer.c diff --git a/det-yolov4-training/src/upsample_layer.h b/det-yolov4-tmi/src/upsample_layer.h similarity index 100% rename from det-yolov4-training/src/upsample_layer.h rename to det-yolov4-tmi/src/upsample_layer.h diff --git a/det-yolov4-training/src/utils.c b/det-yolov4-tmi/src/utils.c similarity index 100% rename from det-yolov4-training/src/utils.c rename to det-yolov4-tmi/src/utils.c diff --git a/det-yolov4-training/src/utils.h b/det-yolov4-tmi/src/utils.h similarity index 100% rename from det-yolov4-training/src/utils.h rename to det-yolov4-tmi/src/utils.h diff --git a/det-yolov4-training/src/version.h b/det-yolov4-tmi/src/version.h similarity index 100% rename from det-yolov4-training/src/version.h rename to det-yolov4-tmi/src/version.h diff --git a/det-yolov4-training/src/version.h.in b/det-yolov4-tmi/src/version.h.in similarity index 100% rename from det-yolov4-training/src/version.h.in rename to det-yolov4-tmi/src/version.h.in diff --git a/det-yolov4-training/src/voxel.c b/det-yolov4-tmi/src/voxel.c similarity index 100% rename from det-yolov4-training/src/voxel.c rename to det-yolov4-tmi/src/voxel.c diff --git a/det-yolov4-training/src/writing.c b/det-yolov4-tmi/src/writing.c similarity index 100% rename from det-yolov4-training/src/writing.c rename to det-yolov4-tmi/src/writing.c diff --git a/det-yolov4-training/src/yolo.c b/det-yolov4-tmi/src/yolo.c similarity index 100% rename from det-yolov4-training/src/yolo.c rename to det-yolov4-tmi/src/yolo.c diff --git a/det-yolov4-training/src/yolo_console_dll.cpp b/det-yolov4-tmi/src/yolo_console_dll.cpp similarity index 100% rename from 
det-yolov4-training/src/yolo_console_dll.cpp rename to det-yolov4-tmi/src/yolo_console_dll.cpp diff --git a/det-yolov4-training/src/yolo_layer.c b/det-yolov4-tmi/src/yolo_layer.c similarity index 100% rename from det-yolov4-training/src/yolo_layer.c rename to det-yolov4-tmi/src/yolo_layer.c diff --git a/det-yolov4-training/src/yolo_layer.h b/det-yolov4-tmi/src/yolo_layer.h similarity index 100% rename from det-yolov4-training/src/yolo_layer.h rename to det-yolov4-tmi/src/yolo_layer.h diff --git a/det-yolov4-training/src/yolo_v2_class.cpp b/det-yolov4-tmi/src/yolo_v2_class.cpp similarity index 100% rename from det-yolov4-training/src/yolo_v2_class.cpp rename to det-yolov4-tmi/src/yolo_v2_class.cpp diff --git a/det-yolov4-tmi/start.py b/det-yolov4-tmi/start.py new file mode 100644 index 0000000..67da850 --- /dev/null +++ b/det-yolov4-tmi/start.py @@ -0,0 +1,24 @@ +import logging +import subprocess +import sys + +import yaml + + +def start() -> int: + with open("/in/env.yaml", "r", encoding='utf8') as f: + config = yaml.safe_load(f) + + logging.info(f"config is {config}") + if config['run_training']: + cmd = 'bash /darknet/make_train_test_darknet.sh' + cwd = '/darknet' + else: + cmd = 'python3 docker_main.py' + cwd = '/darknet/mining' + subprocess.run(cmd, check=True, shell=True, cwd=cwd) + + return 0 + +if __name__ == '__main__': + sys.exit(start()) diff --git a/det-yolov4-training/train.sh b/det-yolov4-tmi/train.sh similarity index 100% rename from det-yolov4-training/train.sh rename to det-yolov4-tmi/train.sh diff --git a/det-yolov4-training/train_watcher.py b/det-yolov4-tmi/train_watcher.py similarity index 100% rename from det-yolov4-training/train_watcher.py rename to det-yolov4-tmi/train_watcher.py diff --git a/det-yolov4-training/train_yolov3.sh b/det-yolov4-tmi/train_yolov3.sh similarity index 100% rename from det-yolov4-training/train_yolov3.sh rename to det-yolov4-tmi/train_yolov3.sh diff --git a/det-yolov4-training/training-template.yaml b/det-yolov4-tmi/training-template.yaml similarity index 82% rename from det-yolov4-training/training-template.yaml rename to det-yolov4-tmi/training-template.yaml index 17c32f7..bb276dc 100644 --- a/det-yolov4-training/training-template.yaml +++ b/det-yolov4-tmi/training-template.yaml @@ -5,8 +5,9 @@ learning_rate: 0.0013 max_batches: 20000 warmup_iterations: 1000 batch: 64 -subdivisions: 32 -shm_size: '16G' +subdivisions: 64 +shm_size: '128G' +export_format: 'ark:raw' # class_names: # - cat # gpu_id: '0,1,2,3' diff --git a/det-yolov4-training/video_yolov3.sh b/det-yolov4-tmi/video_yolov3.sh similarity index 100% rename from det-yolov4-training/video_yolov3.sh rename to det-yolov4-tmi/video_yolov3.sh diff --git a/det-yolov4-training/video_yolov4.sh b/det-yolov4-tmi/video_yolov4.sh similarity index 100% rename from det-yolov4-training/video_yolov4.sh rename to det-yolov4-tmi/video_yolov4.sh diff --git a/det-yolov4-training/warm_up_training.py b/det-yolov4-tmi/warm_up_training.py similarity index 100% rename from det-yolov4-training/warm_up_training.py rename to det-yolov4-tmi/warm_up_training.py diff --git a/det-yolov5-tmi/.dockerignore b/det-yolov5-tmi/.dockerignore index af51ccc..39f415a 100644 --- a/det-yolov5-tmi/.dockerignore +++ b/det-yolov5-tmi/.dockerignore @@ -11,9 +11,13 @@ data/samples/* **/results*.csv *.jpg +ymir/tensorrt/build +ymir/tensorrt/pt_result +ymir/tensorrt/trt_result # Neural Network weights ----------------------------------------------------------------------------------------------- -**/*.pt +#**/*.pt **/*.pth +**/*.pkl 
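Before the det-yolov5-tmi changes that follow, it is worth pausing on the new `det-yolov4-tmi/start.py` added above: it turns the old training-only image into a combined training/mining/infer (tmi) image by dispatching on the task flag the ymir backend writes into `/in/env.yaml`. A minimal sketch of that dispatch contract, assuming env.yaml carries a boolean `run_training` key as the entrypoint above reads it; the helper function and the local test file here are illustrative only:

```
# Hypothetical smoke test for the dispatch pattern in det-yolov4-tmi/start.py.
# It assumes /in/env.yaml carries a boolean `run_training` key (written by the
# ymir backend); the local env.yaml created below is for illustration only.
import yaml


def choose_command(env_yaml_path: str) -> str:
    with open(env_yaml_path, 'r', encoding='utf8') as f:
        config = yaml.safe_load(f)
    if config['run_training']:
        return 'bash /darknet/make_train_test_darknet.sh'  # darknet training entry
    return 'python3 docker_main.py'  # mining/infer entry


if __name__ == '__main__':
    with open('env.yaml', 'w', encoding='utf8') as f:
        yaml.safe_dump({'run_training': True}, f)
    assert choose_command('env.yaml').startswith('bash')
```

Note also the training-template.yaml hunk above: `subdivisions` and `shm_size` are raised for heavier default training loads, and the new `export_format: 'ark:raw'` key appears to pin the annotation/asset format the image requests from ymir.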
**/*.onnx **/*.engine **/*.mlmodel diff --git a/det-yolov5-tmi/Dockerfile b/det-yolov5-tmi/Dockerfile deleted file mode 100644 index 489dd04..0000000 --- a/det-yolov5-tmi/Dockerfile +++ /dev/null @@ -1,64 +0,0 @@ -# YOLOv5 🚀 by Ultralytics, GPL-3.0 license - -# Start FROM Nvidia PyTorch image https://ngc.nvidia.com/catalog/containers/nvidia:pytorch -FROM nvcr.io/nvidia/pytorch:21.10-py3 - -# Install linux packages -RUN apt update && apt install -y zip htop screen libgl1-mesa-glx - -# Install python dependencies -COPY requirements.txt . -RUN python -m pip install --upgrade pip -RUN pip uninstall -y torch torchvision torchtext -RUN pip install --no-cache -r requirements.txt albumentations wandb gsutil notebook \ - torch==1.10.2+cu113 torchvision==0.11.3+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html -# RUN pip install --no-cache -U torch torchvision - -# Create working directory -RUN mkdir -p /usr/src/app -WORKDIR /usr/src/app - -# Copy contents -COPY . /usr/src/app - -# Downloads to user config dir -ADD https://ultralytics.com/assets/Arial.ttf /root/.config/Ultralytics/ - -# Set environment variables -# ENV HOME=/usr/src/app - - -# Usage Examples ------------------------------------------------------------------------------------------------------- - -# Build and Push -# t=ultralytics/yolov5:latest && sudo docker build -t $t . && sudo docker push $t - -# Pull and Run -# t=ultralytics/yolov5:latest && sudo docker pull $t && sudo docker run -it --ipc=host --gpus all $t - -# Pull and Run with local directory access -# t=ultralytics/yolov5:latest && sudo docker pull $t && sudo docker run -it --ipc=host --gpus all -v "$(pwd)"/datasets:/usr/src/datasets $t - -# Kill all -# sudo docker kill $(sudo docker ps -q) - -# Kill all image-based -# sudo docker kill $(sudo docker ps -qa --filter ancestor=ultralytics/yolov5:latest) - -# Bash into running container -# sudo docker exec -it 5a9b5863d93d bash - -# Bash into stopped container -# id=$(sudo docker ps -qa) && sudo docker start $id && sudo docker exec -it $id bash - -# Clean up -# docker system prune -a --volumes - -# Update Ubuntu drivers -# https://www.maketecheasier.com/install-nvidia-drivers-ubuntu/ - -# DDP test -# python -m torch.distributed.run --nproc_per_node 2 --master_port 1 train.py --epochs 3 - -# GCP VM from Image -# docker.io/ultralytics/yolov5:latest diff --git a/det-yolov5-tmi/mining/mining_cald.py b/det-yolov5-tmi/mining/mining_cald.py deleted file mode 100644 index d93fb43..0000000 --- a/det-yolov5-tmi/mining/mining_cald.py +++ /dev/null @@ -1,145 +0,0 @@ -""" -Consistency-based Active Learning for Object Detection CVPR 2022 workshop -official code: https://github.com/we1pingyu/CALD/blob/master/cald_train.py -""" -import sys -from typing import Dict, List, Tuple - -import cv2 -import numpy as np -from nptyping import NDArray -from scipy.stats import entropy -from tqdm import tqdm -from ymir_exc import dataset_reader as dr -from ymir_exc import env, monitor -from ymir_exc import result_writer as rw - -from mining.data_augment import cutout, horizontal_flip, intersect, resize, rotate -from utils.ymir_yolov5 import BBOX, CV_IMAGE, YmirYolov5, YmirStage, get_ymir_process, get_merged_config - - -def split_result(result: NDArray) -> Tuple[BBOX, NDArray, NDArray]: - if len(result) > 0: - bboxes = result[:, :4].astype(np.int32) - conf = result[:, 4] - class_id = result[:, 5] - else: - bboxes = np.zeros(shape=(0, 4), dtype=np.int32) - conf = np.zeros(shape=(0, 1), dtype=np.float32) - class_id = np.zeros(shape=(0, 1), 
dtype=np.int32)
-
-    return bboxes, conf, class_id
-
-
-class MiningCald(YmirYolov5):
-    def mining(self) -> List:
-        N = dr.items_count(env.DatasetType.CANDIDATE)
-        monitor_gap = max(1, N // 100)
-        idx = -1
-        beta = 1.3
-        mining_result = []
-        for asset_path, _ in tqdm(dr.item_paths(dataset_type=env.DatasetType.CANDIDATE)):
-            img = cv2.imread(asset_path)
-            # xyxy,conf,cls
-            result = self.predict(img)
-            bboxes, conf, _ = split_result(result)
-            if len(result) == 0:
-                # no result for the image without augmentation
-                mining_result.append((asset_path, -beta))
-                continue
-
-            consistency = 0.0
-            aug_bboxes_dict, aug_results_dict = self.aug_predict(img, bboxes)
-            for key in aug_results_dict:
-                # no result for the image with augmentation f'{key}'
-                if len(aug_results_dict[key]) == 0:
-                    consistency += beta
-                    continue
-
-                bboxes_key, conf_key, _ = split_result(aug_results_dict[key])
-                cls_scores_aug = 1 - conf_key
-                cls_scores = 1 - conf
-
-                consistency_per_aug = 2.0
-                ious = get_ious(bboxes_key, aug_bboxes_dict[key])
-                aug_idxs = np.argmax(ious, axis=0)
-                for origin_idx, aug_idx in enumerate(aug_idxs):
-                    max_iou = ious[aug_idx, origin_idx]
-                    if max_iou == 0:
-                        consistency_per_aug = min(consistency_per_aug, beta)
-                    p = cls_scores_aug[aug_idx]
-                    q = cls_scores[origin_idx]
-                    m = (p + q) / 2.
-                    js = 0.5 * entropy(p, m) + 0.5 * entropy(q, m)
-                    if js < 0:
-                        js = 0
-                    consistency_box = max_iou
-                    consistency_cls = 0.5 * (conf[origin_idx] + conf_key[aug_idx]) * (1 - js)
-                    consistency_per_inst = abs(consistency_box + consistency_cls - beta)
-                    consistency_per_aug = min(consistency_per_aug, consistency_per_inst.item())
-
-                consistency += consistency_per_aug
-
-            consistency /= len(aug_results_dict)
-
-            mining_result.append((asset_path, consistency))
-            idx += 1
-
-            if idx % monitor_gap == 0:
-                percent = get_ymir_process(stage=YmirStage.TASK, p=idx / N)
-                monitor.write_monitor_logger(percent=percent)
-
-        return mining_result
-
-    def aug_predict(self, image: CV_IMAGE, bboxes: BBOX) -> Tuple[Dict[str, BBOX], Dict[str, NDArray]]:
-        """
-        for different augmentation methods: flip, cutout, rotate and resize
-        augment the image and bbox and use model to predict them.
-
-        return the predict result and augment bbox.
-        """
-        aug_dict = dict(flip=horizontal_flip,
-                        cutout=cutout,
-                        rotate=rotate,
-                        resize=resize)
-
-        aug_bboxes = dict()
-        aug_results = dict()
-        for key in aug_dict:
-            aug_img, aug_bbox = aug_dict[key](image, bboxes)
-
-            aug_result = self.predict(aug_img)
-            aug_bboxes[key] = aug_bbox
-            aug_results[key] = aug_result
-
-        return aug_bboxes, aug_results
-
-
-def get_ious(boxes1: BBOX, boxes2: BBOX) -> NDArray:
-    """
-    args:
-        boxes1: np.array, (N, 4), xyxy
-        boxes2: np.array, (M, 4), xyxy
-    return:
-        iou: np.array, (N, M)
-    """
-    area1 = (boxes1[:, 2] - boxes1[:, 0]) * (boxes1[:, 3] - boxes1[:, 1])
-    area2 = (boxes2[:, 2] - boxes2[:, 0]) * (boxes2[:, 3] - boxes2[:, 1])
-    iner_area = intersect(boxes1, boxes2)
-    area1 = area1.reshape(-1, 1).repeat(area2.shape[0], axis=1)
-    area2 = area2.reshape(1, -1).repeat(area1.shape[0], axis=0)
-    iou = iner_area / (area1 + area2 - iner_area + 1e-14)
-    return iou
-
-
-def main():
-    cfg = get_merged_config()
-    miner = MiningCald(cfg)
-    mining_result = miner.mining()
-    rw.write_mining_result(mining_result=mining_result)
-
-    return 0
-
-
-if __name__ == "__main__":
-    sys.exit(main())
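The deleted CALD miner above scores each candidate image by the consistency between original and augmented predictions, and the pairwise IoU matrix from `get_ious` is the geometric half of that score. A self-contained numpy sketch of the same computation, with broadcasting standing in for the `intersect` helper from `mining/data_augment`; the sample boxes are illustrative:

```
# Minimal sketch of the pairwise-IoU computation used by the CALD miner
# (mirrors the deleted get_ious helper; boxes are xyxy numpy arrays).
import numpy as np


def pairwise_iou(boxes1: np.ndarray, boxes2: np.ndarray) -> np.ndarray:
    # per-box areas: shapes (N,) and (M,)
    area1 = (boxes1[:, 2] - boxes1[:, 0]) * (boxes1[:, 3] - boxes1[:, 1])
    area2 = (boxes2[:, 2] - boxes2[:, 0]) * (boxes2[:, 3] - boxes2[:, 1])
    # intersection corners broadcast to (N, M, 2)
    lt = np.maximum(boxes1[:, None, :2], boxes2[None, :, :2])
    rb = np.minimum(boxes1[:, None, 2:], boxes2[None, :, 2:])
    wh = np.clip(rb - lt, 0, None)
    inter = wh[..., 0] * wh[..., 1]  # (N, M) intersection areas
    return inter / (area1[:, None] + area2[None, :] - inter + 1e-14)


boxes = np.array([[0, 0, 10, 10], [5, 5, 15, 15]], dtype=np.float32)
iou = pairwise_iou(boxes, boxes)
assert np.allclose(np.diag(iou), 1.0)        # self-IoU is 1
assert abs(iou[0, 1] - 25 / 175) < 1e-6      # 25 overlap / 175 union
```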
- """ - aug_dict = dict(flip=horizontal_flip, - cutout=cutout, - rotate=rotate, - resize=resize) - - aug_bboxes = dict() - aug_results = dict() - for key in aug_dict: - aug_img, aug_bbox = aug_dict[key](image, bboxes) - - aug_result = self.predict(aug_img) - aug_bboxes[key] = aug_bbox - aug_results[key] = aug_result - - return aug_bboxes, aug_results - - -def get_ious(boxes1: BBOX, boxes2: BBOX) -> NDArray: - """ - args: - boxes1: np.array, (N, 4), xyxy - boxes2: np.array, (M, 4), xyxy - return: - iou: np.array, (N, M) - """ - area1 = (boxes1[:, 2] - boxes1[:, 0]) * (boxes1[:, 3] - boxes1[:, 1]) - area2 = (boxes2[:, 2] - boxes2[:, 0]) * (boxes2[:, 3] - boxes2[:, 1]) - iner_area = intersect(boxes1, boxes2) - area1 = area1.reshape(-1, 1).repeat(area2.shape[0], axis=1) - area2 = area2.reshape(1, -1).repeat(area1.shape[0], axis=0) - iou = iner_area / (area1 + area2 - iner_area + 1e-14) - return iou - - -def main(): - cfg = get_merged_config() - miner = MiningCald(cfg) - mining_result = miner.mining() - rw.write_mining_result(mining_result=mining_result) - - return 0 - - -if __name__ == "__main__": - sys.exit(main()) diff --git a/det-yolov5-tmi/models/common.py b/det-yolov5-tmi/models/common.py index d116aa5..289eb78 100644 --- a/det-yolov5-tmi/models/common.py +++ b/det-yolov5-tmi/models/common.py @@ -5,6 +5,7 @@ import json import math +import os import platform import warnings from collections import OrderedDict, namedtuple @@ -41,7 +42,18 @@ def __init__(self, c1, c2, k=1, s=1, p=None, g=1, act=True): # ch_in, ch_out, k super().__init__() self.conv = nn.Conv2d(c1, c2, k, s, autopad(k, p), groups=g, bias=False) self.bn = nn.BatchNorm2d(c2) - self.act = nn.Hardswish() if act is True else (act if isinstance(act, nn.Module) else nn.Identity()) + + activation = os.environ.get('ACTIVATION', None) + if activation is None: + self.act = nn.SiLU() if act is True else (act if isinstance(act, nn.Module) else nn.Identity()) + else: + act_dict = dict(relu=nn.ReLU, relu6=nn.ReLU6, leakyrelu=nn.LeakyReLU, hardswish=nn.Hardswish, silu=nn.SiLU) + if activation.lower() in act_dict: + custom_act = act_dict[activation.lower()]() + else: + # view https://pytorch.org/docs/stable/nn.html#non-linear-activations-weighted-sum-nonlinearity + custom_act = getattr(nn, activation)() + self.act = custom_act if act is True else (act if isinstance(act, nn.Module) else nn.Identity()) def forward(self, x): return self.act(self.bn(self.conv(x))) @@ -115,7 +127,15 @@ def __init__(self, c1, c2, n=1, shortcut=True, g=1, e=0.5): # ch_in, ch_out, nu self.cv3 = nn.Conv2d(c_, c_, 1, 1, bias=False) self.cv4 = Conv(2 * c_, c2, 1, 1) self.bn = nn.BatchNorm2d(2 * c_) # applied to cat(cv2, cv3) - self.act = nn.SiLU() + activation = os.environ.get('ACTIVATION', None) + if activation is None: + self.act = nn.SiLU() + else: + if activation.lower() == 'relu': + self.act = nn.ReLU() + else: + warnings.warn(f'unknown activation {activation}, use SiLU instead') + self.act = nn.SiLU() self.m = nn.Sequential(*(Bottleneck(c_, c_, shortcut, g, e=1.0) for _ in range(n))) def forward(self, x): @@ -227,11 +247,12 @@ class GhostBottleneck(nn.Module): def __init__(self, c1, c2, k=3, s=1): # ch_in, ch_out, kernel, stride super().__init__() c_ = c2 // 2 - self.conv = nn.Sequential(GhostConv(c1, c_, 1, 1), # pw - DWConv(c_, c_, k, s, act=False) if s == 2 else nn.Identity(), # dw - GhostConv(c_, c2, 1, 1, act=False)) # pw-linear - self.shortcut = nn.Sequential(DWConv(c1, c1, k, s, act=False), - Conv(c1, c2, 1, 1, act=False)) if s == 2 else nn.Identity() + 
@@ -227,11 +247,12 @@ class GhostBottleneck(nn.Module):
     def __init__(self, c1, c2, k=3, s=1):  # ch_in, ch_out, kernel, stride
         super().__init__()
         c_ = c2 // 2
-        self.conv = nn.Sequential(GhostConv(c1, c_, 1, 1),  # pw
-                                  DWConv(c_, c_, k, s, act=False) if s == 2 else nn.Identity(),  # dw
-                                  GhostConv(c_, c2, 1, 1, act=False))  # pw-linear
-        self.shortcut = nn.Sequential(DWConv(c1, c1, k, s, act=False),
-                                      Conv(c1, c2, 1, 1, act=False)) if s == 2 else nn.Identity()
+        self.conv = nn.Sequential(
+            GhostConv(c1, c_, 1, 1),  # pw
+            DWConv(c_, c_, k, s, act=False) if s == 2 else nn.Identity(),  # dw
+            GhostConv(c_, c2, 1, 1, act=False))  # pw-linear
+        self.shortcut = nn.Sequential(DWConv(c1, c1, k, s, act=False), Conv(c1, c2, 1, 1,
+                                                                            act=False)) if s == 2 else nn.Identity()

     def forward(self, x):
         return self.conv(x) + self.shortcut(x)
@@ -260,9 +281,9 @@ def __init__(self, gain=2):
     def forward(self, x):
         b, c, h, w = x.size()  # assert C / s ** 2 == 0, 'Indivisible gain'
         s = self.gain
-        x = x.view(b, s, s, c // s ** 2, h, w)  # x(1,2,2,16,80,80)
+        x = x.view(b, s, s, c // s**2, h, w)  # x(1,2,2,16,80,80)
         x = x.permute(0, 3, 4, 1, 5, 2).contiguous()  # x(1,16,80,2,80,2)
-        return x.view(b, c // s ** 2, h * s, w * s)  # x(1,16,160,160)
+        return x.view(b, c // s**2, h * s, w * s)  # x(1,16,160,160)


 class Concat(nn.Module):
@@ -315,7 +336,7 @@ def __init__(self, weights='yolov5s.pt', device=None, dnn=False, data=None):
             stride, names = int(d['stride']), d['names']
         elif dnn:  # ONNX OpenCV DNN
             LOGGER.info(f'Loading {w} for ONNX OpenCV DNN inference...')
-            check_requirements(('opencv-python>=4.5.4',))
+            check_requirements(('opencv-python>=4.5.4', ))
             net = cv2.dnn.readNetFromONNX(w)
         elif onnx:  # ONNX Runtime
             LOGGER.info(f'Loading {w} for ONNX Runtime inference...')
@@ -326,7 +347,7 @@
             session = onnxruntime.InferenceSession(w, providers=providers)
         elif xml:  # OpenVINO
             LOGGER.info(f'Loading {w} for OpenVINO inference...')
-            check_requirements(('openvino-dev',))  # requires openvino-dev: https://pypi.org/project/openvino-dev/
+            check_requirements(('openvino-dev', ))  # requires openvino-dev: https://pypi.org/project/openvino-dev/
             import openvino.inference_engine as ie
             core = ie.IECore()
             if not Path(w).is_file():  # if not *.xml
@@ -381,9 +402,11 @@ def wrap_frozen_graph(gd, inputs, outputs):
             Interpreter, load_delegate = tf.lite.Interpreter, tf.lite.experimental.load_delegate,
             if edgetpu:  # Edge TPU https://coral.ai/software/#edgetpu-runtime
                 LOGGER.info(f'Loading {w} for TensorFlow Lite Edge TPU inference...')
-                delegate = {'Linux': 'libedgetpu.so.1',
-                            'Darwin': 'libedgetpu.1.dylib',
-                            'Windows': 'edgetpu.dll'}[platform.system()]
+                delegate = {
+                    'Linux': 'libedgetpu.so.1',
+                    'Darwin': 'libedgetpu.1.dylib',
+                    'Windows': 'edgetpu.dll'
+                }[platform.system()]
                 interpreter = Interpreter(model_path=w, experimental_delegates=[load_delegate(delegate)])
             else:  # Lite
                 LOGGER.info(f'Loading {w} for TensorFlow Lite inference...')
@@ -554,8 +577,13 @@ def forward(self, imgs, size=640, augment=False, profile=False):
         t.append(time_sync())

         # Post-process
-        y = non_max_suppression(y if self.dmb else y[0], self.conf, iou_thres=self.iou, classes=self.classes,
-                                agnostic=self.agnostic, multi_label=self.multi_label, max_det=self.max_det)  # NMS
+        y = non_max_suppression(y if self.dmb else y[0],
+                                self.conf,
+                                iou_thres=self.iou,
+                                classes=self.classes,
+                                agnostic=self.agnostic,
+                                multi_label=self.multi_label,
+                                max_det=self.max_det)  # NMS
         for i in range(n):
             scale_coords(shape1, y[i][:, :4], shape0[i])
@@ -596,8 +624,13 @@ def display(self, pprint=False, show=False, save=False, crop=False, render=False
                         label = f'{self.names[int(cls)]} {conf:.2f}'
                         if crop:
                             file = save_dir / 'crops' / self.names[int(cls)] / self.files[i] if save else None
-                            crops.append({'box': box, 'conf': conf, 'cls': cls, 'label': label,
-                                          'im': save_one_box(box, im, file=file, save=save)})
+                            crops.append({
+                                'box': box,
+                                'conf': conf,
+                                'cls': cls,
+                                'label':
label, + 'im': save_one_box(box, im, file=file, save=save) + }) else: # all others annotator.box_label(box, label, color=colors(cls)) im = annotator.im diff --git a/det-yolov5-tmi/models/experimental.py b/det-yolov5-tmi/models/experimental.py index 463e551..dbfecbf 100644 --- a/det-yolov5-tmi/models/experimental.py +++ b/det-yolov5-tmi/models/experimental.py @@ -2,6 +2,7 @@ """ Experimental modules """ +import os import math import numpy as np @@ -10,6 +11,7 @@ from models.common import Conv from utils.downloads import attempt_download +import warnings class CrossConv(nn.Module): @@ -59,14 +61,22 @@ def __init__(self, c1, c2, k=(1, 3), s=1, equal_ch=True): # ch_in, ch_out, kern b = [c2] + [0] * n a = np.eye(n + 1, n, k=-1) a -= np.roll(a, 1, axis=1) - a *= np.array(k) ** 2 + a *= np.array(k)**2 a[0] = 1 c_ = np.linalg.lstsq(a, b, rcond=None)[0].round() # solve for equal weight indices, ax = b self.m = nn.ModuleList( [nn.Conv2d(c1, int(c_), k, s, k // 2, groups=math.gcd(c1, int(c_)), bias=False) for k, c_ in zip(k, c_)]) self.bn = nn.BatchNorm2d(c2) - self.act = nn.SiLU() + activation = os.environ.get('ACTIVATION', None) + if activation is None: + self.act = nn.SiLU() + else: + if activation.lower() == 'relu': + self.act = nn.ReLU() + else: + warnings.warn(f'unknown activation {activation}, use SiLU instead') + self.act = nn.SiLU() def forward(self, x): return self.act(self.bn(torch.cat([m(x) for m in self.m], 1))) diff --git a/det-yolov5-tmi/mypy.ini b/det-yolov5-tmi/mypy.ini index 85e751a..6a356a3 100644 --- a/det-yolov5-tmi/mypy.ini +++ b/det-yolov5-tmi/mypy.ini @@ -1,8 +1,7 @@ [mypy] ignore_missing_imports = True disallow_untyped_defs = False -files = [mining/*.py, utils/ymir_yolov5.py, start.py, train.py] -exclude = [utils/general.py] +exclude = [utils/general.py, models/*.py, utils/*.py] [mypy-torch.*] ignore_errors = True diff --git a/det-yolov5-tmi/start.py b/det-yolov5-tmi/start.py deleted file mode 100644 index fba6632..0000000 --- a/det-yolov5-tmi/start.py +++ /dev/null @@ -1,133 +0,0 @@ -import logging -import os -import os.path as osp -import shutil -import subprocess -import sys - -import cv2 -from easydict import EasyDict as edict -from ymir_exc import dataset_reader as dr -from ymir_exc import env, monitor -from ymir_exc import result_writer as rw - -from utils.ymir_yolov5 import (YmirStage, YmirYolov5, convert_ymir_to_yolov5, download_weight_file, get_merged_config, - get_weight_file, get_ymir_process) - - -def start() -> int: - cfg = get_merged_config() - - logging.info(f'merged config: {cfg}') - - if cfg.ymir.run_training: - _run_training(cfg) - else: - if cfg.ymir.run_mining: - _run_mining(cfg) - if cfg.ymir.run_infer: - _run_infer(cfg) - - return 0 - - -def _run_training(cfg: edict) -> None: - """ - function for training task - 1. convert dataset - 2. training model - 3. save model weight/hyperparameter/... to design directory - """ - # 1. convert dataset - out_dir = cfg.ymir.output.root_dir - convert_ymir_to_yolov5(cfg) - logging.info(f'generate {out_dir}/data.yaml') - monitor.write_monitor_logger(percent=get_ymir_process(stage=YmirStage.PREPROCESS, p=1.0)) - - # 2. 
training model - epochs = cfg.param.epochs - batch_size = cfg.param.batch_size - model = cfg.param.model - img_size = cfg.param.img_size - save_period = cfg.param.save_period - args_options = cfg.param.args_options - weights = get_weight_file(cfg) - if not weights: - # download pretrained weight - weights = download_weight_file(model) - - models_dir = cfg.ymir.output.models_dir - command = f'python3 train.py --epochs {epochs} ' + \ - f'--batch-size {batch_size} --data {out_dir}/data.yaml --project /out ' + \ - f'--cfg models/{model}.yaml --name models --weights {weights} ' + \ - f'--img-size {img_size} ' + \ - f'--save-period {save_period}' - if args_options: - command += f" {args_options}" - - logging.info(f'start training: {command}') - - subprocess.run(command.split(), check=True) - monitor.write_monitor_logger(percent=get_ymir_process(stage=YmirStage.TASK, p=1.0)) - - # 3. convert to onnx and save model weight to design directory - opset = cfg.param.opset - command = f'python3 export.py --weights {models_dir}/best.pt --opset {opset} --include onnx' - logging.info(f'export onnx weight: {command}') - subprocess.run(command.split(), check=True) - - # save hyperparameter - shutil.copy(f'models/{model}.yaml', f'{models_dir}/{model}.yaml') - - # if task done, write 100% percent log - monitor.write_monitor_logger(percent=1.0) - - -def _run_mining(cfg: edict()) -> None: - # generate data.yaml for mining - out_dir = cfg.ymir.output.root_dir - convert_ymir_to_yolov5(cfg) - logging.info(f'generate {out_dir}/data.yaml') - monitor.write_monitor_logger(percent=get_ymir_process(stage=YmirStage.PREPROCESS, p=1.0)) - - command = 'python3 mining/mining_cald.py' - logging.info(f'mining: {command}') - subprocess.run(command.split(), check=True) - monitor.write_monitor_logger(percent=1.0) - - -def _run_infer(cfg: edict) -> None: - # generate data.yaml for infer - out_dir = cfg.ymir.output.root_dir - convert_ymir_to_yolov5(cfg) - logging.info(f'generate {out_dir}/data.yaml') - monitor.write_monitor_logger(percent=get_ymir_process(stage=YmirStage.PREPROCESS, p=1.0)) - - N = dr.items_count(env.DatasetType.CANDIDATE) - infer_result = dict() - model = YmirYolov5(cfg) - idx = -1 - - monitor_gap = max(1, N // 100) - for asset_path, _ in dr.item_paths(dataset_type=env.DatasetType.CANDIDATE): - img = cv2.imread(asset_path) - result = model.infer(img) - infer_result[asset_path] = result - idx += 1 - - if idx % monitor_gap == 0: - percent = get_ymir_process(stage=YmirStage.TASK, p=idx / N) - monitor.write_monitor_logger(percent=percent) - - rw.write_infer_result(infer_result=infer_result) - monitor.write_monitor_logger(percent=1.0) - - -if __name__ == '__main__': - logging.basicConfig(stream=sys.stdout, - format='%(levelname)-8s: [%(asctime)s] %(message)s', - datefmt='%Y%m%d-%H:%M:%S', - level=logging.INFO) - - os.environ.setdefault('PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION', 'python') - sys.exit(start()) diff --git a/det-yolov5-tmi/train.py b/det-yolov5-tmi/train.py index 7fcbbce..2001443 100644 --- a/det-yolov5-tmi/train.py +++ b/det-yolov5-tmi/train.py @@ -16,6 +16,7 @@ import math import os import random +import subprocess import sys import time from copy import deepcopy @@ -47,33 +48,32 @@ from utils.datasets import create_dataloader from utils.downloads import attempt_download from utils.general import (LOGGER, check_dataset, check_file, check_git_status, check_img_size, check_requirements, - check_suffix, check_version, check_yaml, colorstr, get_latest_run, increment_path, init_seeds, - intersect_dicts, 
labels_to_class_weights, labels_to_image_weights, methods, one_cycle, - print_args, print_mutation, strip_optimizer) + check_suffix, check_version, check_yaml, colorstr, get_latest_run, increment_path, + init_seeds, intersect_dicts, labels_to_class_weights, labels_to_image_weights, methods, + one_cycle, print_args, print_mutation, strip_optimizer) from utils.loggers import Loggers from utils.loggers.wandb.wandb_utils import check_wandb_resume from utils.loss import ComputeLoss from utils.metrics import fitness from utils.plots import plot_evolve, plot_labels from utils.torch_utils import EarlyStopping, ModelEMA, de_parallel, select_device, torch_distributed_zero_first -from utils.ymir_yolov5 import write_ymir_training_result, YmirStage, get_ymir_process, get_merged_config -from ymir_exc import monitor +from ymir_exc.util import YmirStage, get_merged_config, write_ymir_monitor_process, write_ymir_training_result LOCAL_RANK = int(os.getenv('LOCAL_RANK', -1)) # https://pytorch.org/docs/stable/elastic/run.html RANK = int(os.getenv('RANK', -1)) WORLD_SIZE = int(os.getenv('WORLD_SIZE', 1)) -def train(hyp, # path/to/hyp.yaml or hyp dictionary - opt, - device, - callbacks - ): +def train( + hyp, # path/to/hyp.yaml or hyp dictionary + opt, + device, + callbacks): save_dir, epochs, batch_size, weights, single_cls, evolve, data, cfg, resume, noval, nosave, workers, freeze = \ Path(opt.save_dir), opt.epochs, opt.batch_size, opt.weights, opt.single_cls, opt.evolve, opt.data, opt.cfg, \ opt.resume, opt.noval, opt.nosave, opt.workers, opt.freeze ymir_cfg = opt.ymir_cfg - opt.ymir_cfg = '' # yaml cannot dump edict, remove it here + opt.ymir_cfg = '' # yaml cannot dump edict, remove it here log_dir = Path(ymir_cfg.ymir.output.tensorboard_dir) # Directories @@ -184,7 +184,10 @@ def train(hyp, # path/to/hyp.yaml or hyp dictionary if opt.cos_lr: lf = one_cycle(1, hyp['lrf'], epochs) # cosine 1->hyp['lrf'] else: - lf = lambda x: (1 - x / epochs) * (1.0 - hyp['lrf']) + hyp['lrf'] # linear + + def lf(x): + return (1 - x / epochs) * (1.0 - hyp['lrf']) + hyp['lrf'] # linear + scheduler = lr_scheduler.LambdaLR(optimizer, lr_lambda=lf) # plot_lr_scheduler(optimizer, scheduler, epochs) # EMA @@ -206,7 +209,7 @@ def train(hyp, # path/to/hyp.yaml or hyp dictionary # Epochs start_epoch = ckpt['epoch'] + 1 if resume: - assert start_epoch > 0, f'{weights} training to {epochs} epochs is finished, nothing to resume.' + assert start_epoch > 0, f'{weights} training from {start_epoch} to {epochs} epochs is finished, nothing to resume.' if epochs < start_epoch: LOGGER.info(f"{weights} has been trained for {ckpt['epoch']} epochs. 
Fine-tuning for {epochs} more epochs.") epochs += ckpt['epoch'] # finetune additional epochs @@ -225,20 +228,38 @@ def train(hyp, # path/to/hyp.yaml or hyp dictionary LOGGER.info('Using SyncBatchNorm()') # Trainloader - train_loader, dataset = create_dataloader(train_path, imgsz, batch_size // WORLD_SIZE, gs, single_cls, - hyp=hyp, augment=True, cache=None if opt.cache == 'val' else opt.cache, - rect=opt.rect, rank=LOCAL_RANK, workers=workers, - image_weights=opt.image_weights, quad=opt.quad, - prefix=colorstr('train: '), shuffle=True) + train_loader, dataset = create_dataloader(train_path, + imgsz, + batch_size // WORLD_SIZE, + gs, + single_cls, + hyp=hyp, + augment=True, + cache=None if opt.cache == 'val' else opt.cache, + rect=opt.rect, + rank=LOCAL_RANK, + workers=workers, + image_weights=opt.image_weights, + quad=opt.quad, + prefix=colorstr('train: '), + shuffle=True) mlc = int(np.concatenate(dataset.labels, 0)[:, 0].max()) # max label class nb = len(train_loader) # number of batches assert mlc < nc, f'Label class {mlc} exceeds nc={nc} in {data}. Possible class labels are 0-{nc - 1}' # Process 0 if RANK in [-1, 0]: - val_loader = create_dataloader(val_path, imgsz, batch_size // WORLD_SIZE * 2, gs, single_cls, - hyp=hyp, cache=None if noval else opt.cache, - rect=True, rank=-1, workers=workers * 2, pad=0.5, + val_loader = create_dataloader(val_path, + imgsz, + batch_size // WORLD_SIZE * 2, + gs, + single_cls, + hyp=hyp, + cache=None if noval else opt.cache, + rect=True, + rank=-1, + workers=workers * 2, + pad=0.5, prefix=colorstr('val: '))[0] if not resume: @@ -267,7 +288,7 @@ def train(hyp, # path/to/hyp.yaml or hyp dictionary nl = de_parallel(model).model[-1].nl # number of detection layers (to scale hyps) hyp['box'] *= 3 / nl # scale to layers hyp['cls'] *= nc / 80 * 3 / nl # scale to classes and layers - hyp['obj'] *= (imgsz / 640) ** 2 * 3 / nl # scale to image size and layers + hyp['obj'] *= (imgsz / 640)**2 * 3 / nl # scale to image size and layers hyp['label_smoothing'] = opt.label_smoothing model.nc = nc # attach number of classes to model model.hyp = hyp # attach hyperparameters to model @@ -295,13 +316,12 @@ def train(hyp, # path/to/hyp.yaml or hyp dictionary model.train() # ymir monitor - if epoch % monitor_gap == 0: - percent = get_ymir_process(stage=YmirStage.TASK, p=epoch/(epochs-start_epoch+1)) - monitor.write_monitor_logger(percent=percent) + if epoch % monitor_gap == 0 and RANK in [0, -1]: + write_ymir_monitor_process(ymir_cfg, task='training', naive_stage_percent=(epoch - start_epoch + 1) / (epochs - start_epoch + 1), stage=YmirStage.TASK) # Update image weights (optional, single-GPU only) if opt.image_weights: - cw = model.class_weights.cpu().numpy() * (1 - maps) ** 2 / nc # class weights + cw = model.class_weights.cpu().numpy() * (1 - maps)**2 / nc # class weights iw = labels_to_image_weights(dataset.labels, nc=nc, class_weights=cw) # image weights dataset.indices = random.choices(range(dataset.n), weights=iw, k=dataset.n) # rand weighted idx @@ -365,8 +385,8 @@ def train(hyp, # path/to/hyp.yaml or hyp dictionary if RANK in [-1, 0]: mloss = (mloss * i + loss_items) / (i + 1) # update mean losses mem = f'{torch.cuda.memory_reserved() / 1E9 if torch.cuda.is_available() else 0:.3g}G' # (GB) - pbar.set_description(('%10s' * 2 + '%10.4g' * 5) % ( - f'{epoch}/{epochs - 1}', mem, *mloss, targets.shape[0], imgs.shape[-1])) + pbar.set_description(('%10s' * 2 + '%10.4g' * 5) % + (f'{epoch}/{epochs - 1}', mem, *mloss, targets.shape[0], imgs.shape[-1])) 
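The `write_ymir_monitor_process` call added to the epoch loop above reports training progress back to ymir as a fraction of the whole task. A sketch of the stage-weighted mapping behind that percent, reusing the 10%/80%/10% preprocess/task/postprocess split visible in the deleted `utils/ymir_yolov5.py` near the end of this diff; the function name here is illustrative:

```
# Illustrative reimplementation of the stage-weighted progress mapping
# (constants follow get_ymir_process in the deleted utils/ymir_yolov5.py:
# preprocess 10%, task 80%, postprocess 10% of the overall percent).
from enum import IntEnum


class YmirStage(IntEnum):
    PREPROCESS = 1   # convert dataset
    TASK = 2         # training/mining/infer
    POSTPROCESS = 3  # export model


def overall_percent(stage: YmirStage, p: float) -> float:
    if not 0.0 <= p <= 1.0:
        raise ValueError(f'p not in [0, 1], p={p}')
    if stage == YmirStage.PREPROCESS:
        return 0.1 * p
    if stage == YmirStage.TASK:
        return 0.1 + 0.8 * p
    return 0.9 + 0.1 * p


# half-way through the training stage maps to 50% of the whole task
assert abs(overall_percent(YmirStage.TASK, 0.5) - 0.5) < 1e-9
```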
callbacks.run('on_train_batch_end', ni, model, imgs, targets, paths, plots, opt.sync_bn) if callbacks.stop_training: return @@ -401,24 +421,27 @@ def train(hyp, # path/to/hyp.yaml or hyp dictionary callbacks.run('on_fit_epoch_end', log_vals, epoch, best_fitness, fi) # Save model - if (not nosave) or (final_epoch and not evolve): # if save - ckpt = {'epoch': epoch, - 'best_fitness': best_fitness, - 'model': deepcopy(de_parallel(model)).half(), - 'ema': deepcopy(ema.ema).half(), - 'updates': ema.updates, - 'optimizer': optimizer.state_dict(), - 'wandb_id': loggers.wandb.wandb_run.id if loggers.wandb else None, - 'date': datetime.now().isoformat()} + if (not nosave) or (best_fitness == fi) or (final_epoch and not evolve): # if save + ckpt = { + 'epoch': epoch, + 'best_fitness': best_fitness, + 'model': deepcopy(de_parallel(model)).half(), + 'ema': deepcopy(ema.ema).half(), + 'updates': ema.updates, + 'optimizer': optimizer.state_dict(), + 'wandb_id': loggers.wandb.wandb_run.id if loggers.wandb else None, + 'date': datetime.now().isoformat() + } # Save last, best and delete torch.save(ckpt, last) if best_fitness == fi: torch.save(ckpt, best) - if (epoch > 0) and (opt.save_period > 0) and (epoch % opt.save_period == 0): + write_ymir_training_result(ymir_cfg, map50=best_fitness, id='yolov5_best', files=[str(best)]) + if (not nosave) and (epoch > 0) and (opt.save_period > 0) and (epoch % opt.save_period == 0): torch.save(ckpt, w / f'epoch{epoch}.pt') weight_file = str(w / f'epoch{epoch}.pt') - write_ymir_training_result(ymir_cfg, map50=results[2], epoch=epoch, weight_file=weight_file) + write_ymir_training_result(ymir_cfg, map50=results[2], id=f'epoch_{epoch}', files=[weight_file]) del ckpt callbacks.run('on_model_save', last, epoch, final_epoch, best_fitness, fi) @@ -426,7 +449,7 @@ def train(hyp, # path/to/hyp.yaml or hyp dictionary if RANK == -1 and stopper(epoch=epoch, fitness=fi): break - # Stop DDP TODO: known issues shttps://github.com/ultralytics/yolov5/pull/4576 + # Stop DDP TODO: known issues https://github.com/ultralytics/yolov5/pull/4576 # stop = stopper(epoch=epoch, fitness=fi) # if RANK == 0: # dist.broadcast_object_list([stop], 0) # broadcast 'stop' to all ranks @@ -445,28 +468,43 @@ def train(hyp, # path/to/hyp.yaml or hyp dictionary strip_optimizer(f) # strip optimizers if f is best: LOGGER.info(f'\nValidating {f}...') - results, _, _ = val.run(data_dict, - batch_size=batch_size // WORLD_SIZE * 2, - imgsz=imgsz, - model=attempt_load(f, device).half(), - iou_thres=0.65 if is_coco else 0.60, # best pycocotools results at 0.65 - single_cls=single_cls, - dataloader=val_loader, - save_dir=save_dir, - save_json=is_coco, - verbose=True, - plots=True, - callbacks=callbacks, - compute_loss=compute_loss) # val best model with plots + results, _, _ = val.run( + data_dict, + batch_size=batch_size // WORLD_SIZE * 2, + imgsz=imgsz, + model=attempt_load(f, device).half(), + iou_thres=0.65 if is_coco else 0.60, # best pycocotools results at 0.65 + single_cls=single_cls, + dataloader=val_loader, + save_dir=save_dir, + save_json=is_coco, + verbose=True, + plots=True, + callbacks=callbacks, + compute_loss=compute_loss) # val best model with plots if is_coco: callbacks.run('on_fit_epoch_end', list(mloss) + list(results) + lr, epoch, best_fitness, fi) callbacks.run('on_train_end', last, best, plots, epoch, results) LOGGER.info(f"Results saved to {colorstr('bold', save_dir)}") + opset = ymir_cfg.param.opset + onnx_file: Path = best.with_suffix('.onnx') + command = f'python3 export.py --weights {best} 
--opset {opset} --include onnx' + LOGGER.info(f'export onnx weight: {command}') + subprocess.run(command.split(), check=True) + + if nosave: + # save best.pt and best.onnx + write_ymir_training_result(ymir_cfg, + map50=best_fitness, + id='yolov5_best', + files=[str(best), str(onnx_file)]) + else: + # set files = [] to save all files in /out/models + write_ymir_training_result(ymir_cfg, map50=best_fitness, id='yolov5_best', files=[]) + torch.cuda.empty_cache() - # save the best and last weight file with other files in models_dir - write_ymir_training_result(ymir_cfg, map50=best_fitness, epoch=epochs, weight_file='') return results @@ -522,12 +560,18 @@ def main(opt, callbacks=Callbacks()): check_git_status() check_requirements(exclude=['thop']) + ymir_cfg = get_merged_config() # Resume if opt.resume and not check_wandb_resume(opt) and not opt.evolve: # resume an interrupted run - ckpt = opt.resume if isinstance(opt.resume, str) else get_latest_run() # specified or most recent path + ckpt = opt.resume if isinstance(opt.resume, str) else get_latest_run( + ymir_cfg.ymir.input.root_dir) # specified or most recent path assert os.path.isfile(ckpt), 'ERROR: --resume checkpoint does not exist' - with open(Path(ckpt).parent.parent / 'opt.yaml', errors='ignore') as f: - opt = argparse.Namespace(**yaml.safe_load(f)) # replace + + opt_file = Path(ckpt).parent / 'opt.yaml' + if opt_file.exists(): + with open(opt_file, errors='ignore') as f: + opt = argparse.Namespace(**yaml.safe_load(f)) # replace + os.makedirs(opt.save_dir, exist_ok=True) opt.cfg, opt.weights, opt.resume = '', ckpt, True # reinstate LOGGER.info(f'Resuming training from {ckpt}') else: @@ -538,9 +582,8 @@ def main(opt, callbacks=Callbacks()): if opt.project == str(ROOT / 'runs/train'): # if default project name, rename to runs/evolve opt.project = str(ROOT / 'runs/evolve') opt.save_dir = str(increment_path(Path(opt.project) / opt.name, exist_ok=opt.exist_ok)) - ymir_cfg = get_merged_config() - opt.ymir_cfg = ymir_cfg + opt.ymir_cfg = ymir_cfg # DDP mode device = select_device(opt.device, batch_size=opt.batch_size) @@ -558,42 +601,41 @@ def main(opt, callbacks=Callbacks()): # Train if not opt.evolve: train(opt.hyp, opt, device, callbacks) - if WORLD_SIZE > 1 and RANK == 0: - LOGGER.info('Destroying process group... 
') - dist.destroy_process_group() # Evolve hyperparameters (optional) else: # Hyperparameter evolution metadata (mutation scale 0-1, lower_limit, upper_limit) - meta = {'lr0': (1, 1e-5, 1e-1), # initial learning rate (SGD=1E-2, Adam=1E-3) - 'lrf': (1, 0.01, 1.0), # final OneCycleLR learning rate (lr0 * lrf) - 'momentum': (0.3, 0.6, 0.98), # SGD momentum/Adam beta1 - 'weight_decay': (1, 0.0, 0.001), # optimizer weight decay - 'warmup_epochs': (1, 0.0, 5.0), # warmup epochs (fractions ok) - 'warmup_momentum': (1, 0.0, 0.95), # warmup initial momentum - 'warmup_bias_lr': (1, 0.0, 0.2), # warmup initial bias lr - 'box': (1, 0.02, 0.2), # box loss gain - 'cls': (1, 0.2, 4.0), # cls loss gain - 'cls_pw': (1, 0.5, 2.0), # cls BCELoss positive_weight - 'obj': (1, 0.2, 4.0), # obj loss gain (scale with pixels) - 'obj_pw': (1, 0.5, 2.0), # obj BCELoss positive_weight - 'iou_t': (0, 0.1, 0.7), # IoU training threshold - 'anchor_t': (1, 2.0, 8.0), # anchor-multiple threshold - 'anchors': (2, 2.0, 10.0), # anchors per output grid (0 to ignore) - 'fl_gamma': (0, 0.0, 2.0), # focal loss gamma (efficientDet default gamma=1.5) - 'hsv_h': (1, 0.0, 0.1), # image HSV-Hue augmentation (fraction) - 'hsv_s': (1, 0.0, 0.9), # image HSV-Saturation augmentation (fraction) - 'hsv_v': (1, 0.0, 0.9), # image HSV-Value augmentation (fraction) - 'degrees': (1, 0.0, 45.0), # image rotation (+/- deg) - 'translate': (1, 0.0, 0.9), # image translation (+/- fraction) - 'scale': (1, 0.0, 0.9), # image scale (+/- gain) - 'shear': (1, 0.0, 10.0), # image shear (+/- deg) - 'perspective': (0, 0.0, 0.001), # image perspective (+/- fraction), range 0-0.001 - 'flipud': (1, 0.0, 1.0), # image flip up-down (probability) - 'fliplr': (0, 0.0, 1.0), # image flip left-right (probability) - 'mosaic': (1, 0.0, 1.0), # image mixup (probability) - 'mixup': (1, 0.0, 1.0), # image mixup (probability) - 'copy_paste': (1, 0.0, 1.0)} # segment copy-paste (probability) + meta = { + 'lr0': (1, 1e-5, 1e-1), # initial learning rate (SGD=1E-2, Adam=1E-3) + 'lrf': (1, 0.01, 1.0), # final OneCycleLR learning rate (lr0 * lrf) + 'momentum': (0.3, 0.6, 0.98), # SGD momentum/Adam beta1 + 'weight_decay': (1, 0.0, 0.001), # optimizer weight decay + 'warmup_epochs': (1, 0.0, 5.0), # warmup epochs (fractions ok) + 'warmup_momentum': (1, 0.0, 0.95), # warmup initial momentum + 'warmup_bias_lr': (1, 0.0, 0.2), # warmup initial bias lr + 'box': (1, 0.02, 0.2), # box loss gain + 'cls': (1, 0.2, 4.0), # cls loss gain + 'cls_pw': (1, 0.5, 2.0), # cls BCELoss positive_weight + 'obj': (1, 0.2, 4.0), # obj loss gain (scale with pixels) + 'obj_pw': (1, 0.5, 2.0), # obj BCELoss positive_weight + 'iou_t': (0, 0.1, 0.7), # IoU training threshold + 'anchor_t': (1, 2.0, 8.0), # anchor-multiple threshold + 'anchors': (2, 2.0, 10.0), # anchors per output grid (0 to ignore) + 'fl_gamma': (0, 0.0, 2.0), # focal loss gamma (efficientDet default gamma=1.5) + 'hsv_h': (1, 0.0, 0.1), # image HSV-Hue augmentation (fraction) + 'hsv_s': (1, 0.0, 0.9), # image HSV-Saturation augmentation (fraction) + 'hsv_v': (1, 0.0, 0.9), # image HSV-Value augmentation (fraction) + 'degrees': (1, 0.0, 45.0), # image rotation (+/- deg) + 'translate': (1, 0.0, 0.9), # image translation (+/- fraction) + 'scale': (1, 0.0, 0.9), # image scale (+/- gain) + 'shear': (1, 0.0, 10.0), # image shear (+/- deg) + 'perspective': (0, 0.0, 0.001), # image perspective (+/- fraction), range 0-0.001 + 'flipud': (1, 0.0, 1.0), # image flip up-down (probability) + 'fliplr': (0, 0.0, 1.0), # image flip left-right 
(probability)
+        'mosaic': (1, 0.0, 1.0),  # image mosaic (probability)
+        'mixup': (1, 0.0, 1.0),  # image mixup (probability)
+        'copy_paste': (1, 0.0, 1.0)
+    }  # segment copy-paste (probability)

         with open(opt.hyp, errors='ignore') as f:
             hyp = yaml.safe_load(f)  # load hyps dict
diff --git a/det-yolov5-tmi/training-template.yaml b/det-yolov5-tmi/training-template.yaml
deleted file mode 100644
index c6d0ee4..0000000
--- a/det-yolov5-tmi/training-template.yaml
+++ /dev/null
@@ -1,16 +0,0 @@
-# training template for your executor app
-# after build image, it should at /img-man/training-template.yaml
-# key: gpu_id, task_id, pretrained_model_params, class_names should be preserved
-
-# gpu_id: '0'
-# task_id: 'default-training-task'
-# pretrained_model_params: []
-# class_names: []
-
-model: 'yolov5s'
-batch_size: 16
-epochs: 300
-img_size: 640
-opset: 11
-args_options: '--exist-ok'
-save_period: 10
diff --git a/det-yolov5-tmi/utils/downloads.py b/det-yolov5-tmi/utils/downloads.py
index d7b87cb..c71fad2 100644
--- a/det-yolov5-tmi/utils/downloads.py
+++ b/det-yolov5-tmi/utils/downloads.py
@@ -58,17 +58,11 @@ def attempt_download(file, repo='ultralytics/yolov5'):  # from utils.downloads i
     # GitHub assets
     file.parent.mkdir(parents=True, exist_ok=True)  # make parent dir (if required)
-    try:
-        response = requests.get(f'https://api.github.com/repos/{repo}/releases/latest').json()  # github api
-        assets = [x['name'] for x in response['assets']]  # release assets, i.e. ['yolov5s.pt', 'yolov5m.pt', ...]
-        tag = response['tag_name']  # i.e. 'v1.0'
-    except Exception:  # fallback plan
-        assets = ['yolov5n.pt', 'yolov5s.pt', 'yolov5m.pt', 'yolov5l.pt', 'yolov5x.pt',
-                  'yolov5n6.pt', 'yolov5s6.pt', 'yolov5m6.pt', 'yolov5l6.pt', 'yolov5x6.pt']
-        try:
-            tag = subprocess.check_output('git tag', shell=True, stderr=subprocess.STDOUT).decode().split()[-1]
-        except Exception:
-            tag = 'v6.0'  # current release
+    assets = [
+        'yolov5n.pt', 'yolov5s.pt', 'yolov5m.pt', 'yolov5l.pt', 'yolov5x.pt', 'yolov5n6.pt', 'yolov5s6.pt',
+        'yolov5m6.pt', 'yolov5l6.pt', 'yolov5x6.pt'
+    ]
+    tag = 'v6.1'
 
     if name in assets:
         safe_download(file,
diff --git a/det-yolov5-tmi/utils/ymir_yolov5.py b/det-yolov5-tmi/utils/ymir_yolov5.py
deleted file mode 100644
index 492822f..0000000
--- a/det-yolov5-tmi/utils/ymir_yolov5.py
+++ /dev/null
@@ -1,232 +0,0 @@
-"""
-utils function for ymir and yolov5
-"""
-import glob
-import os.path as osp
-import shutil
-from enum import IntEnum
-from typing import Any, List, Tuple
-
-import numpy as np
-import torch
-import yaml
-from easydict import EasyDict as edict
-from nptyping import NDArray, Shape, UInt8
-from ymir_exc import env
-from ymir_exc import result_writer as rw
-
-from models.common import DetectMultiBackend
-from models.experimental import attempt_download
-from utils.augmentations import letterbox
-from utils.general import check_img_size, non_max_suppression, scale_coords
-from utils.torch_utils import select_device
-
-
-class YmirStage(IntEnum):
-    PREPROCESS = 1  # convert dataset
-    TASK = 2  # training/mining/infer
-    POSTPROCESS = 3  # export model
-
-
-BBOX = NDArray[Shape['*,4'], Any]
-CV_IMAGE = NDArray[Shape['*,*,3'], UInt8]
-
-
-def get_ymir_process(stage: YmirStage, p: float) -> float:
-    # const value for ymir process
-    PREPROCESS_PERCENT = 0.1
-    TASK_PERCENT = 0.8
-    POSTPROCESS_PERCENT = 0.1
-
-    if p < 0 or p > 1.0:
-        raise Exception(f'p not in [0,1], p={p}')
-
-    if stage == YmirStage.PREPROCESS:
-        return PREPROCESS_PERCENT * p
-    elif stage == YmirStage.TASK:
-        return PREPROCESS_PERCENT + 
TASK_PERCENT * p - elif stage == YmirStage.POSTPROCESS: - return PREPROCESS_PERCENT + TASK_PERCENT + POSTPROCESS_PERCENT * p - else: - raise NotImplementedError(f'unknown stage {stage}') - - -def get_merged_config() -> edict: - """ - merge ymir_config and executor_config - """ - merged_cfg = edict() - # the hyperparameter information - merged_cfg.param = env.get_executor_config() - - # the ymir path information - merged_cfg.ymir = env.get_current_env() - return merged_cfg - - -def get_weight_file(cfg: edict) -> str: - """ - return the weight file path by priority - find weight file in cfg.param.model_params_path or cfg.param.model_params_path - """ - if cfg.ymir.run_training: - model_params_path = cfg.param.get('pretrained_model_params',[]) - else: - model_params_path = cfg.param.model_params_path - - model_dir = osp.join(cfg.ymir.input.root_dir, - cfg.ymir.input.models_dir) - model_params_path = [p for p in model_params_path if osp.exists(osp.join(model_dir, p))] - - # choose weight file by priority, best.pt > xxx.pt - if 'best.pt' in model_params_path: - return osp.join(model_dir, 'best.pt') - else: - for f in model_params_path: - if f.endswith('.pt'): - return osp.join(model_dir, f) - - return "" - - -def download_weight_file(model_name): - weights = attempt_download(f'{model_name}.pt') - return weights - - -class YmirYolov5(): - """ - used for mining and inference to init detector and predict. - """ - - def __init__(self, cfg: edict): - self.cfg = cfg - device = select_device(cfg.param.get('gpu_id', 'cpu')) - - self.model = self.init_detector(device) - self.device = device - self.class_names = cfg.param.class_names - self.stride = self.model.stride - self.conf_thres = float(cfg.param.conf_thres) - self.iou_thres = float(cfg.param.iou_thres) - - img_size = int(cfg.param.img_size) - imgsz = (img_size, img_size) - imgsz = check_img_size(imgsz, s=self.stride) - - self.model.warmup(imgsz=(1, 3, *imgsz), half=False) # warmup - self.img_size = imgsz - - def init_detector(self, device: torch.device) -> DetectMultiBackend: - weights = get_weight_file(self.cfg) - - data_yaml = osp.join(self.cfg.ymir.output.root_dir, 'data.yaml') - model = DetectMultiBackend(weights=weights, - device=device, - dnn=False, # not use opencv dnn for onnx inference - data=data_yaml) # dataset.yaml path - - return model - - def predict(self, img: CV_IMAGE) -> NDArray: - """ - predict single image and return bbox information - img: opencv BGR, uint8 format - """ - # preprocess: padded resize - img1 = letterbox(img, self.img_size, stride=self.stride, auto=True)[0] - - # preprocess: convert data format - img1 = img1.transpose((2, 0, 1))[::-1] # HWC to CHW, BGR to RGB - img1 = np.ascontiguousarray(img1) - img1 = torch.from_numpy(img1).to(self.device) - - img1 = img1 / 255 # 0 - 255 to 0.0 - 1.0 - img1.unsqueeze_(dim=0) # expand for batch dim - pred = self.model(img1) - - # postprocess - conf_thres = self.conf_thres - iou_thres = self.iou_thres - classes = None # not filter class_idx in results - agnostic_nms = False - max_det = 1000 - - pred = non_max_suppression(pred, conf_thres, iou_thres, classes, agnostic_nms, max_det=max_det) - - result = [] - for det in pred: - if len(det): - # Rescale boxes from img_size to img size - det[:, :4] = scale_coords(img1.shape[2:], det[:, :4], img.shape).round() - result.append(det) - - # xyxy, conf, cls - if len(result) > 0: - tensor_result = torch.cat(result, dim=0) - numpy_result = tensor_result.data.cpu().numpy() - else: - numpy_result = np.zeros(shape=(0, 6), dtype=np.float32) - - 
return numpy_result
-
-    def infer(self, img: CV_IMAGE) -> List[rw.Annotation]:
-        anns = []
-        result = self.predict(img)
-
-        for i in range(result.shape[0]):
-            xmin, ymin, xmax, ymax, conf, cls = result[i, :6].tolist()
-            ann = rw.Annotation(class_name=self.class_names[int(cls)], score=conf, box=rw.Box(
-                x=int(xmin), y=int(ymin), w=int(xmax - xmin), h=int(ymax - ymin)))
-
-            anns.append(ann)
-
-        return anns
-
-
-def convert_ymir_to_yolov5(cfg: edict) -> None:
-    """
-    convert ymir format dataset to yolov5 format
-    generate data.yaml for training/mining/infer
-    """
-
-    data = dict(path=cfg.ymir.output.root_dir,
-                nc=len(cfg.param.class_names),
-                names=cfg.param.class_names)
-    for split, prefix in zip(['train', 'val', 'test'], ['training', 'val', 'candidate']):
-        src_file = getattr(cfg.ymir.input, f'{prefix}_index_file')
-        if osp.exists(src_file):
-            shutil.copy(src_file, f'{cfg.ymir.output.root_dir}/{split}.tsv')
-
-        data[split] = f'{split}.tsv'
-
-    with open(osp.join(cfg.ymir.output.root_dir, 'data.yaml'), 'w') as fw:
-        fw.write(yaml.safe_dump(data))
-
-
-def write_ymir_training_result(cfg: edict,
-                               map50: float,
-                               epoch: int,
-                               weight_file: str) -> int:
-    """
-    cfg: ymir config
-    results: (mp, mr, map50, map, loss)
-    maps: map@0.5:0.95 for all classes
-    epoch: stage
-    weight_file: saved weight files, empty weight_file will save all files
-    """
-    model = cfg.param.model
-    # use `rw.write_training_result` to save training result
-    if weight_file:
-        rw.write_model_stage(stage_name=f"{model}_{epoch}",
-                             files=[osp.basename(weight_file)],
-                             mAP=float(map50))
-    else:
-        # save other files with
-        files = [osp.basename(f) for f in glob.glob(osp.join(cfg.ymir.output.models_dir, '*'))
-                 if not f.endswith('.pt')] + ['last.pt', 'best.pt']
-
-        rw.write_model_stage(stage_name=f"{model}_last_and_best",
-                             files=files,
-                             mAP=float(map50))
-    return 0
diff --git a/det-yolov5-tmi/ymir/README.md b/det-yolov5-tmi/ymir/README.md
new file mode 100644
index 0000000..7ab2d25
--- /dev/null
+++ b/det-yolov5-tmi/ymir/README.md
@@ -0,0 +1,154 @@
+# yolov5-ymir readme
+update 2022/11/23
+
+## build your ymir-executor
+
+```
+cd det-yolov5-tmi
+
+docker build -t your/ymir-executor:ymir2.0.0-cuda102-yolov5-tmi -f ymir/docker/cuda102.dockerfile .
+
+docker build -t your/ymir-executor:ymir2.0.0-cuda111-yolov5-tmi -f ymir/docker/cuda111.dockerfile .
+```
+
+## Training: training
+
+### Performance
+
+|Model |size<br>(pixels) |mAPval<br>0.5:0.95 |mAPval<br>0.5 |Speed<br>CPU b1<br>(ms) |Speed<br>V100 b1<br>(ms) |Speed<br>V100 b32<br>(ms) |params<br>(M) |FLOPs<br>@640 (B)
+|--- |--- |--- |--- |--- |--- |--- |--- |---
+|[YOLOv5n] |640 |28.0 |45.7 |**45** |**6.3**|**0.6**|**1.9**|**4.5**
+|[YOLOv5s] |640 |37.4 |56.8 |98 |6.4 |0.9 |7.2 |16.5
+|[YOLOv5m] |640 |45.4 |64.1 |224 |8.2 |1.7 |21.2 |49.0
+|[YOLOv5l] |640 |49.0 |67.3 |430 |10.1 |2.7 |46.5 |109.1
+|[YOLOv5x] |640 |50.7 |68.9 |766 |12.1 |4.8 |86.7 |205.7
+| | | | | | | | |
+|[YOLOv5n6] |1280 |36.0 |54.4 |153 |8.1 |2.1 |3.2 |4.6
+|[YOLOv5s6] |1280 |44.8 |63.7 |385 |8.2 |3.6 |16.8 |12.6
+|[YOLOv5m6] |1280 |51.3 |69.3 |887 |11.1 |6.8 |35.7 |50.0
+|[YOLOv5l6] |1280 |53.7 |71.3 |1784 |15.8 |10.5 |76.8 |111.4
+
+### Training parameters
+
+- Some parameters are generated by the ymir backend, e.g. `gpu_id`, `class_names`
+  - `gpu_id`: ids of the GPUs to use, e.g. `0,1,2`, type `str`. The corresponding host GPUs are assigned arbitrarily and may actually be `3,1,7`; inside the image they are always seen and used as device ids `0,1,2`.
+  - `task_id`: ymir task id, type `str`
+  - `pretrained_model_params`: paths of the pretrained model files, type `List[str]`
+  - `class_names`: class names, type `List[str]`
+
+- Some parameters are handled by the ymir backend, e.g. `shm_size` and `export_format`. `shm_size` controls the shared memory available to the docker image; if it is too small, errors such as `out of memory` occur. `export_format` determines the data format seen inside the docker image.
+
+| hyper-parameter | default value | type | note | advice |
+| - | - | - | - | - |
+| shm_size | 128G | str | handled by the ymir backend; shared memory available to the docker image | suggested size: number of GPUs used * 32G |
+| export_format | ark:raw | str | handled by the ymir backend; ymir dataset export format | - |
+| model | yolov5s | str | yolov5 model, one of yolov5n, yolov5s, yolov5m, yolov5l, ... | yolov5n for speed, yolov5l or yolov5x for accuracy, yolov5s or yolov5m for a balance |
+| batch_size_per_gpu | 16 | int | number of images processed per GPU per step | if GPU memory usage is below 50%, double it to speed up training |
+| num_workers_per_gpu | 4 | int | number of data-loading workers per GPU | - |
+| epochs | 100 | int | number of training passes over the whole dataset | the default is usually fine; check tensorboard before changing it |
+| img_size | 640 | int | input image resolution of the model | - |
+| opset | 11 | int | opset for onnx export | onnx is rarely needed; keep the default |
+| args_options | '--exist-ok' | str | extra yolov5 command-line options | advanced users may pass any yolov5 command-line option |
+| save_best_only | True | bool | whether to save only the best model | keep True to save disk space |
+| save_period | 10 | int | interval (in epochs) between saved checkpoints | with save_best_only set to False, about `epoch/save_period` intermediate checkpoints are kept |
+| sync_bn | False | bool | whether to synchronize batch-norm layers across GPUs | enable it for more stable training and better accuracy |
+| activation | '' | str | activation function; empty keeps the default nn.Hardswish(), see [pytorch activations](https://pytorch.org/docs/stable/nn.html#non-linear-activations-weighted-sum-nonlinearity) | options: ELU, Hardswish, LeakyReLU, PReLU, ReLU, ReLU6, SiLU, ... |
+
+### Example training output files
+```
+.
+├── data.yaml            # data.yaml generated by the ymir dataset conversion
+├── models               # model save directory
+├── monitor.txt          # ymir progress file
+├── tensorboard          # tensorboard log files
+│   ├── events.out.tfevents.1669112949.2cf0844ff367.337.0
+│   ├── results.csv
+│   ├── results.png
+│   ├── train_batch0.jpg
+│   ├── train_batch1.jpg
+│   └── train_batch2.jpg
+├── test.tsv             # test index file from the ymir dataset conversion (empty)
+├── train.cache          # training set cache file
+├── train.tsv            # training set index file from the ymir dataset conversion
+├── val.cache            # validation set cache file
+└── val.tsv              # validation set index file from the ymir dataset conversion
+```
+
+---
+
+## Inference: infer
+
+For an inference task, the ymir backend generates the parameters `gpu_id`, `class_names`, `task_id` and `model_param_path`; `model_param_path` plays the same role as `pretrained_model_params` in a training task.
+
+### Inference parameters
+
+| hyper-parameter | default value | type | note | advice |
+| - | - | - | - | - |
+| img_size | 640 | int | input image size of the model | use a multiple of 32, at least 224 (= 32*7) |
+| conf_thres | 0.25 | float | confidence threshold | keep the default |
+| iou_thres | 0.45 | float | IoU threshold for NMS | keep the default |
+| batch_size_per_gpu | 16 | int | number of images processed per GPU per step | if GPU memory usage is below 50%, double it to speed up inference |
+| num_workers_per_gpu | 4 | int | number of data-loading workers per GPU | - |
+| shm_size | 128G | str | handled by the ymir backend; shared memory available to the docker image | suggested size: number of GPUs used * 32G |
+| pin_memory | False | bool | whether to pin host memory for the dataset loader | set True to speed up data loading when RAM is plentiful |
+
+---
+
+## Mining: mining
+
+For a mining task, the ymir backend generates the parameters `gpu_id`, `class_names`, `task_id` and `model_param_path`; `model_param_path` plays the same role as `pretrained_model_params` in a training task. The backend-generated parameters are identical for inference and mining tasks.
+
+### Mining parameters
+
+| hyper-parameter | default value | type | note | advice |
+| - | - | - | - | - |
+| img_size | 640 | int | input image size of the model | use a multiple of 32, at least 224 (= 32*7) |
+| mining_algorithm | aldd | str | mining algorithm, one of random, aldd, cald, entropy | aldd for single-class detection, entropy for multi-class detection |
+| class_distribution_scores | '' | str encoding of List[float] | class-balance weights for the aldd algorithm | usually left unchanged; advanced users can reweight classes, e.g. for 4-class detection `1.0,1.0,0.1,0.2` lowers the mining weight of the last two classes |
+| conf_thres | 0.25 | float | confidence threshold | keep the default |
+| iou_thres | 0.45 | float | IoU threshold for NMS | keep the default |
+| batch_size_per_gpu | 16 | int | number of images processed per GPU per step | if GPU memory usage is below 50%, double it to speed up mining |
+| num_workers_per_gpu | 4 | int | number of data-loading workers per GPU | - |
+| shm_size | 128G | str | handled by the ymir backend; shared memory available to the docker image | suggested size: number of GPUs used * 32G |
+| pin_memory | False | bool | whether to pin host memory for the dataset loader | set True to speed up data loading when RAM is plentiful |
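+
+The entropy miner scores an image by the entropy of its detection confidences. A minimal sketch of the score used in `ymir/mining/ymir_mining_entropy.py` (the helper name `entropy_score` is illustrative; the real implementation runs inside a batched loop):
+
+```
+import numpy as np
+
+def entropy_score(conf: np.ndarray) -> float:
+    """conf: detection confidences for one image, shape (N,)"""
+    if len(conf) == 0:
+        return -10.0  # images without detections get a fixed low score
+    return float(-np.sum(conf * np.log2(conf)))
+```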
+
+## Main change log
+
+- add `start.py` and `ymir/ymir_yolov5.py` for train/infer/mining
+
+- add `ymir/ymir_yolov5.py` for useful functions
+
+  - `get_merged_config()` merges the ymir path config (`cfg.ymir`) and the hyper-parameters (`cfg.param`)
+
+  - `convert_ymir_to_yolov5()` generates the yolov5 dataset config file `data.yaml`
+
+  - `write_ymir_training_result()` saves model weights, mAP and other files
+
+  - `get_weight_file()` gets the pretrained or initial weight file from the ymir system
+
+- modify `utils/datasets.py` for the ymir dataset format
+
+- modify `train.py` to report training progress
+
+- add `mining/data_augment.py` and `mining/mining_cald.py` for mining
+
+- add `training/infer/mining-template.yaml` for `/img-man/training/infer/mining-template.yaml`
+
+- add `cuda102/111.dockerfile`, remove the original `Dockerfile`
+
+- modify `requirements.txt`
+
+- other minor changes to support onnx export
+
+## New features
+
+- 2022/09/08: add the aldd active learning algorithm for the mining task, [Active Learning for Deep Detection Neural Networks (ICCV 2019)](https://gitlab.com/haghdam/deep_active_learning)
+- 2022/09/14: support changing the hyper-parameter `num_workers_per_gpu`
+- 2022/09/16: support changing the activation function, see [rknn](https://github.com/airockchip/rknn_model_zoo/tree/main/models/vision/object_detection/yolov5-pytorch)
+- 2022/10/09: fix dist.destroy_process_group() hang
diff --git a/det-yolov5-tmi/cuda102.dockerfile b/det-yolov5-tmi/ymir/docker/cuda102.dockerfile
similarity index 55%
rename from det-yolov5-tmi/cuda102.dockerfile
rename to det-yolov5-tmi/ymir/docker/cuda102.dockerfile
index 49a29d3..94b5eaf 100644
--- a/det-yolov5-tmi/cuda102.dockerfile
+++ b/det-yolov5-tmi/ymir/docker/cuda102.dockerfile
@@ -3,39 +3,38 @@ ARG CUDA="10.2"
 ARG CUDNN="7"
 
 FROM pytorch/pytorch:${PYTORCH}-cuda${CUDA}-cudnn${CUDNN}-runtime
-ARG SERVER_MODE=prod
+# support YMIR=1.0.0, 1.1.0 or 1.2.0
+ARG YMIR="1.1.0"
 ENV TORCH_CUDA_ARCH_LIST="6.0 6.1 7.0+PTX"
 ENV TORCH_NVCC_FLAGS="-Xfatbin -compress-all"
 ENV CMAKE_PREFIX_PATH="$(dirname $(which conda))/../"
 ENV LANG=C.UTF-8
+ENV YMIR_VERSION=${YMIR}
+ENV YOLOV5_CONFIG_DIR='/app/data'
 
 # Install linux package
 RUN apt-get update && apt-get install -y gnupg2 git libglib2.0-0 \
-    libgl1-mesa-glx curl wget zip \
+    libgl1-mesa-glx libsm6 libxext6 libxrender-dev curl wget zip vim \
+    build-essential ninja-build \
     && apt-get clean \
     && rm -rf /var/lib/apt/lists/*
 
 # install ymir-exc sdk
-RUN if [ "${SERVER_MODE}" = "dev" ]; then \
-        pip install --force-reinstall -U "git+https://github.com/IndustryEssentials/ymir.git/@dev#egg=ymir-exc&subdirectory=docker_executor/sample_executor/ymir_exc"; \
-    else \
-        pip install ymir-exc; \
-    fi
+RUN pip install "git+https://github.com/modelai/ymir-executor-sdk.git@ymir1.3.0"
 
 # Copy file from host to docker and install requirements
-ADD ./det-yolov5-tmi /app
-RUN mkdir /img-man && mv /app/*-template.yaml /img-man/ \
+COPY . /app
+RUN mkdir /img-man && mv /app/ymir/img-man/*-template.yaml /img-man/ \
     && pip install -r /app/requirements.txt
 
 # Download pretrained weight and font file
 RUN cd /app && bash data/scripts/download_weights.sh \
-    && mkdir -p /root/.config/Ultralytics \
-    && wget https://ultralytics.com/assets/Arial.ttf -O /root/.config/Ultralytics/Arial.ttf
+    && wget https://ultralytics.com/assets/Arial.ttf -O ${YOLOV5_CONFIG_DIR}/Arial.ttf
 
 # Make PYTHONPATH find local package
 ENV PYTHONPATH=.
 
 WORKDIR /app
-RUN echo "python3 /app/start.py" > /usr/bin/start.sh
+RUN echo "python3 /app/ymir/start.py" > /usr/bin/start.sh
 
 CMD bash /usr/bin/start.sh
diff --git a/det-yolov5-tmi/cuda111.dockerfile b/det-yolov5-tmi/ymir/docker/cuda111.dockerfile
similarity index 52%
rename from det-yolov5-tmi/cuda111.dockerfile
rename to det-yolov5-tmi/ymir/docker/cuda111.dockerfile
index 0c6e5dd..be05e87 100644
--- a/det-yolov5-tmi/cuda111.dockerfile
+++ b/det-yolov5-tmi/ymir/docker/cuda111.dockerfile
@@ -4,39 +4,40 @@ ARG CUDNN="8"
 # cuda11.1 + pytorch 1.9.0 + cudnn8 does not work!
FROM pytorch/pytorch:${PYTORCH}-cuda${CUDA}-cudnn${CUDNN}-runtime -ARG SERVER_MODE=prod +# support YMIR=1.0.0, 1.1.0 or 1.2.0 +ARG YMIR="1.1.0" + ENV TORCH_CUDA_ARCH_LIST="6.0 6.1 7.0+PTX" ENV TORCH_NVCC_FLAGS="-Xfatbin -compress-all" ENV CMAKE_PREFIX_PATH="$(dirname $(which conda))/../" ENV LANG=C.UTF-8 +ENV YMIR_VERSION=$YMIR +ENV YOLOV5_CONFIG_DIR='/app/data' # Install linux package RUN apt-get update && apt-get install -y gnupg2 git libglib2.0-0 \ - libgl1-mesa-glx curl wget zip \ + libgl1-mesa-glx libsm6 libxext6 libxrender-dev curl wget zip vim \ + build-essential ninja-build \ && apt-get clean \ && rm -rf /var/lib/apt/lists/* -# install ymir-exc sdk -RUN if [ "${SERVER_MODE}" = "dev" ]; then \ - pip install --force-reinstall -U "git+https://github.com/IndustryEssentials/ymir.git/@dev#egg=ymir-exc&subdirectory=docker_executor/sample_executor/ymir_exc"; \ - else \ - pip install ymir-exc; \ - fi +COPY ./requirements.txt /workspace/ +# install ymir-exc sdk and requirements +RUN pip install "git+https://github.com/modelai/ymir-executor-sdk.git@ymir1.3.0" \ + && pip install -r /workspace/requirements.txt # Copy file from host to docker and install requirements -ADD ./det-yolov5-tmi /app -RUN mkdir /img-man && mv /app/*-template.yaml /img-man/ \ - && pip install -r /app/requirements.txt +COPY . /app +RUN mkdir /img-man && mv /app/ymir/img-man/*-template.yaml /img-man/ # Download pretrained weight and font file RUN cd /app && bash data/scripts/download_weights.sh \ - && mkdir -p /root/.config/Ultralytics \ - && wget https://ultralytics.com/assets/Arial.ttf -O /root/.config/Ultralytics/Arial.ttf + && wget https://ultralytics.com/assets/Arial.ttf -O ${YOLOV5_CONFIG_DIR}/Arial.ttf # Make PYTHONPATH find local package ENV PYTHONPATH=. WORKDIR /app -RUN echo "python3 /app/start.py" > /usr/bin/start.sh +RUN echo "python3 /app/ymir/start.py" > /usr/bin/start.sh CMD bash /usr/bin/start.sh diff --git a/det-yolov5-tmi/infer-template.yaml b/det-yolov5-tmi/ymir/img-man/infer-template.yaml similarity index 79% rename from det-yolov5-tmi/infer-template.yaml rename to det-yolov5-tmi/ymir/img-man/infer-template.yaml index 89dcc96..c19dc74 100644 --- a/det-yolov5-tmi/infer-template.yaml +++ b/det-yolov5-tmi/ymir/img-man/infer-template.yaml @@ -10,3 +10,7 @@ img_size: 640 conf_thres: 0.25 iou_thres: 0.45 +batch_size_per_gpu: 16 +num_workers_per_gpu: 4 +pin_memory: False +shm_size: 128G diff --git a/det-yolov5-tmi/mining-template.yaml b/det-yolov5-tmi/ymir/img-man/mining-template.yaml similarity index 68% rename from det-yolov5-tmi/mining-template.yaml rename to det-yolov5-tmi/ymir/img-man/mining-template.yaml index 20106dc..485c8bb 100644 --- a/det-yolov5-tmi/mining-template.yaml +++ b/det-yolov5-tmi/ymir/img-man/mining-template.yaml @@ -8,5 +8,11 @@ # class_names: [] img_size: 640 +mining_algorithm: aldd +class_distribution_scores: '' # 1.0,1.0,0.1,0.2 conf_thres: 0.25 iou_thres: 0.45 +batch_size_per_gpu: 16 +num_workers_per_gpu: 4 +pin_memory: False +shm_size: 128G diff --git a/det-yolov5-tmi/ymir/img-man/training-template.yaml b/det-yolov5-tmi/ymir/img-man/training-template.yaml new file mode 100644 index 0000000..dc7bb02 --- /dev/null +++ b/det-yolov5-tmi/ymir/img-man/training-template.yaml @@ -0,0 +1,23 @@ +# training template for your executor app +# after build image, it should at /img-man/training-template.yaml +# key: gpu_id, task_id, pretrained_model_params, class_names should be preserved + +# gpu_id: '0' +# task_id: 'default-training-task' +# pretrained_model_params: [] +# class_names: [] + 
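+# the keys below are user hyper-parameters; their defaults and tuning advice are
+# documented in the training parameter table of ymir/README.md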
+shm_size: '128G' +export_format: 'ark:raw' +model: 'yolov5s' +batch_size_per_gpu: 16 +num_workers_per_gpu: 4 +epochs: 100 +img_size: 640 +opset: 11 +args_options: '--exist-ok' +save_best_only: True # save the best weight file only +save_period: 10 +sync_bn: False # work for multi-gpu only +activation: 'SiLU' # view https://pytorch.org/docs/stable/nn.html#non-linear-activations-weighted-sum-nonlinearity +ymir_saved_file_patterns: '' # custom saved files, support python regular expression, use , to split multiple pattern diff --git a/det-yolov5-tmi/mining/data_augment.py b/det-yolov5-tmi/ymir/mining/data_augment.py similarity index 91% rename from det-yolov5-tmi/mining/data_augment.py rename to det-yolov5-tmi/ymir/mining/data_augment.py index 47b1d50..595bfac 100644 --- a/det-yolov5-tmi/mining/data_augment.py +++ b/det-yolov5-tmi/ymir/mining/data_augment.py @@ -9,7 +9,7 @@ import numpy as np from nptyping import NDArray -from utils.ymir_yolov5 import BBOX, CV_IMAGE +from ymir.ymir_yolov5 import BBOX, CV_IMAGE def intersect(boxes1: BBOX, boxes2: BBOX) -> NDArray: @@ -23,11 +23,13 @@ def intersect(boxes1: BBOX, boxes2: BBOX) -> NDArray: ''' n1 = boxes1.shape[0] n2 = boxes2.shape[0] - max_xy = np.minimum(np.expand_dims(boxes1[:, 2:], axis=1).repeat(n2, axis=1), - np.expand_dims(boxes2[:, 2:], axis=0).repeat(n1, axis=0)) + max_xy = np.minimum( + np.expand_dims(boxes1[:, 2:], axis=1).repeat(n2, axis=1), + np.expand_dims(boxes2[:, 2:], axis=0).repeat(n1, axis=0)) - min_xy = np.maximum(np.expand_dims(boxes1[:, :2], axis=1).repeat(n2, axis=1), - np.expand_dims(boxes2[:, :2], axis=0).repeat(n1, axis=0)) + min_xy = np.maximum( + np.expand_dims(boxes1[:, :2], axis=1).repeat(n2, axis=1), + np.expand_dims(boxes2[:, :2], axis=0).repeat(n1, axis=0)) inter = np.clip(max_xy - min_xy, a_min=0, a_max=None) # (n1, n2, 2) return inter[:, :, 0] * inter[:, :, 1] # (n1, n2) @@ -50,8 +52,12 @@ def horizontal_flip(image: CV_IMAGE, bbox: BBOX) \ return image, bbox -def cutout(image: CV_IMAGE, bbox: BBOX, cut_num: int = 2, fill_val: int = 0, - bbox_remove_thres: float = 0.4, bbox_min_thres: float = 0.1) -> Tuple[CV_IMAGE, BBOX]: +def cutout(image: CV_IMAGE, + bbox: BBOX, + cut_num: int = 2, + fill_val: int = 0, + bbox_remove_thres: float = 0.4, + bbox_min_thres: float = 0.1) -> Tuple[CV_IMAGE, BBOX]: ''' Cutout augmentation image: A PIL image diff --git a/det-yolov5-tmi/ymir/mining/util.py b/det-yolov5-tmi/ymir/mining/util.py new file mode 100644 index 0000000..0e9e3f5 --- /dev/null +++ b/det-yolov5-tmi/ymir/mining/util.py @@ -0,0 +1,149 @@ +"""run.py: +img --(model)--> pred --(augmentation)--> (aug1_pred, aug2_pred, ..., augN_pred) +img --(augmentation)--> aug1_img --(model)--> pred1 +img --(augmentation)--> aug2_img --(model)--> pred2 +... +img --(augmentation)--> augN_img --(model)--> predN + +dataload(img) --(model)--> pred +dataload(img, pred) --(augmentation1)--> (aug1_img, aug1_pred) --(model)--> pred1 + +1. split dataset with DDP sampler +2. use DDP model to infer sampled dataloader +3. 
gather infer result
+
+"""
+import os
+from typing import Any, List
+
+import cv2
+import numpy as np
+import torch.utils.data as td
+from nptyping import NDArray
+from scipy.stats import entropy
+from torch.utils.data._utils.collate import default_collate
+from utils.augmentations import letterbox
+from ymir.mining.data_augment import cutout, horizontal_flip, intersect, resize, rotate
+from ymir.ymir_yolov5 import BBOX
+
+LOCAL_RANK = int(os.getenv('LOCAL_RANK', -1))  # https://pytorch.org/docs/stable/elastic/run.html
+RANK = int(os.getenv('RANK', -1))
+WORLD_SIZE = int(os.getenv('WORLD_SIZE', 1))
+
+
+def get_ious(boxes1: BBOX, boxes2: BBOX) -> NDArray:
+    """
+    args:
+        boxes1: np.array, (N, 4), xyxy
+        boxes2: np.array, (M, 4), xyxy
+    return:
+        iou: np.array, (N, M)
+    """
+    area1 = (boxes1[:, 2] - boxes1[:, 0]) * (boxes1[:, 3] - boxes1[:, 1])
+    area2 = (boxes2[:, 2] - boxes2[:, 0]) * (boxes2[:, 3] - boxes2[:, 1])
+    inter_area = intersect(boxes1, boxes2)
+    area1 = area1.reshape(-1, 1).repeat(area2.shape[0], axis=1)
+    area2 = area2.reshape(1, -1).repeat(area1.shape[0], axis=0)
+    iou = inter_area / (area1 + area2 - inter_area + 1e-14)
+    return iou
+
+
+def preprocess(img, img_size, stride):
+    # padded resize, then convert HWC/BGR to CHW/RGB and scale to [0, 1]
+    img1 = letterbox(img, img_size, stride=stride, auto=False)[0]
+    img1 = img1.transpose((2, 0, 1))[::-1]  # HWC to CHW, BGR to RGB
+    img1 = np.ascontiguousarray(img1)
+    img1 = img1 / 255  # 0 - 255 to 0.0 - 1.0
+    return img1
+
+
+def load_image_file(img_file: str, img_size, stride):
+    img = cv2.imread(img_file)
+    img1 = preprocess(img, img_size, stride)
+    return dict(image=img1, origin_shape=img.shape[0:2], image_file=img_file)
+
+
+def load_image_file_with_ann(image_info: dict, img_size, stride):
+    img_file = image_info['image_file']
+    # xyxy(int) conf(float) class_index(int)
+    bboxes = image_info['results'][:, :4].astype(np.int32)
+    img = cv2.imread(img_file)
+    aug_dict = dict(flip=horizontal_flip, cutout=cutout, rotate=rotate, resize=resize)
+
+    data = dict(image_file=img_file, origin_shape=img.shape[0:2])
+    for key in aug_dict:
+        aug_img, aug_bbox = aug_dict[key](img, bboxes)
+        preprocess_aug_img = preprocess(aug_img, img_size, stride)
+        data[f'image_{key}'] = preprocess_aug_img
+        data[f'bboxes_{key}'] = aug_bbox
+        data[f'origin_shape_{key}'] = aug_img.shape[0:2]
+
+    data.update(image_info)
+    return data
+
+
+def collate_fn_with_fake_ann(batch):
+    new_batch = dict()
+    for key in ['flip', 'cutout', 'rotate', 'resize']:
+        new_batch[f'bboxes_{key}_list'] = [data[f'bboxes_{key}'] for data in batch]
+
+        new_batch[f'image_{key}'] = default_collate([data[f'image_{key}'] for data in batch])
+
+        new_batch[f'origin_shape_{key}'] = default_collate([data[f'origin_shape_{key}'] for data in batch])
+
+    new_batch['results_list'] = [data['results'] for data in batch]
+    new_batch['image_file'] = [data['image_file'] for data in batch]
+
+    return new_batch
+
+
+def update_consistency(consistency, consistency_per_aug, beta, pred_bboxes_key, pred_conf_key, aug_bboxes_key,
+                       aug_conf):
+    cls_scores_aug = 1 - pred_conf_key
+    cls_scores = 1 - aug_conf
+
+    consistency_per_aug = 2.0
+    ious = get_ious(pred_bboxes_key, 
aug_bboxes_key)
+    aug_idxs = np.argmax(ious, axis=0)
+    for origin_idx, aug_idx in enumerate(aug_idxs):
+        max_iou = ious[aug_idx, origin_idx]
+        if max_iou == 0:
+            consistency_per_aug = min(consistency_per_aug, beta)
+        p = cls_scores_aug[aug_idx]
+        q = cls_scores[origin_idx]
+        m = (p + q) / 2.
+        js = 0.5 * entropy([p, 1 - p], [m, 1 - m]) + 0.5 * entropy([q, 1 - q], [m, 1 - m])
+        if js < 0:
+            js = 0
+        consistency_box = max_iou
+        consistency_cls = 0.5 * (aug_conf[origin_idx] + pred_conf_key[aug_idx]) * (1 - js)
+        consistency_per_inst = abs(consistency_box + consistency_cls - beta)
+        consistency_per_aug = min(consistency_per_aug, consistency_per_inst.item())
+
+    consistency += consistency_per_aug
+    return consistency
+
+
+class YmirDataset(td.Dataset):
+    def __init__(self, images: List[Any], load_fn=None):
+        super().__init__()
+        self.images = images
+        self.load_fn = load_fn
+
+    def __getitem__(self, index):
+        return self.load_fn(self.images[index])
+
+    def __len__(self):
+        return len(self.images)
diff --git a/det-yolov5-tmi/ymir/mining/ymir_infer.py b/det-yolov5-tmi/ymir/mining/ymir_infer.py
new file mode 100644
index 0000000..5f3d56a
--- /dev/null
+++ b/det-yolov5-tmi/ymir/mining/ymir_infer.py
@@ -0,0 +1,138 @@
+"""use fake DDP to infer
+1. split data with `images_rank = images[RANK::WORLD_SIZE]`
+2. save the per-rank result with `torch.save(results, f'/out/infer_results_{RANK}.pt')`
+3. merge the results
+"""
+import os
+import sys
+import warnings
+from functools import partial
+
+import torch
+import torch.distributed as dist
+import torch.utils.data as td
+from easydict import EasyDict as edict
+from tqdm import tqdm
+from ymir_exc import result_writer as rw
+from ymir_exc.util import YmirStage, get_merged_config, write_ymir_monitor_process
+
+from utils.general import scale_coords
+from ymir.mining.util import YmirDataset, load_image_file
+from ymir.ymir_yolov5 import YmirYolov5
+
+LOCAL_RANK = int(os.getenv('LOCAL_RANK', -1))  # https://pytorch.org/docs/stable/elastic/run.html
+RANK = int(os.getenv('RANK', -1))
+WORLD_SIZE = int(os.getenv('WORLD_SIZE', 1))
+
+
+def run(ymir_cfg: edict, ymir_yolov5: YmirYolov5):
+    # eg: gpu_id = 1,3,5,7 for LOCAL_RANK = 2, will use gpu 5.
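+    # note: YmirYolov5.__init__ calls select_device(gpu_id), which sets CUDA_VISIBLE_DEVICES,
+    # so the visible devices are renumbered from 0 and LOCAL_RANK indexes into them directly.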
+ gpu = max(0, LOCAL_RANK) + device = torch.device('cuda', gpu) + ymir_yolov5.to(device) + + load_fn = partial(load_image_file, img_size=ymir_yolov5.img_size, stride=ymir_yolov5.stride) + batch_size_per_gpu = ymir_yolov5.batch_size_per_gpu + gpu_count = ymir_yolov5.gpu_count + cpu_count: int = os.cpu_count() or 1 + num_workers_per_gpu = min([ + cpu_count // max(gpu_count, 1), batch_size_per_gpu if batch_size_per_gpu > 1 else 0, + ymir_yolov5.num_workers_per_gpu + ]) + + with open(ymir_cfg.ymir.input.candidate_index_file, 'r') as f: + images = [line.strip() for line in f.readlines()] + + max_barrier_times = len(images) // max(1, WORLD_SIZE) // batch_size_per_gpu + # origin dataset + if RANK != -1: + images_rank = images[RANK::WORLD_SIZE] + else: + images_rank = images + origin_dataset = YmirDataset(images_rank, load_fn=load_fn) + origin_dataset_loader = td.DataLoader(origin_dataset, + batch_size=batch_size_per_gpu, + shuffle=False, + sampler=None, + num_workers=num_workers_per_gpu, + pin_memory=ymir_yolov5.pin_memory, + drop_last=False) + + results = [] + dataset_size = len(images_rank) + monitor_gap = max(1, dataset_size // 1000 // batch_size_per_gpu) + pbar = tqdm(origin_dataset_loader) if RANK in [0, -1] else origin_dataset_loader + for idx, batch in enumerate(pbar): + # batch-level sync, avoid 30min time-out error + if WORLD_SIZE > 1 and idx < max_barrier_times: + dist.barrier() + + with torch.no_grad(): + pred = ymir_yolov5.forward(batch['image'].float().to(device), nms=True) + + if idx % monitor_gap == 0 and RANK in [0, -1]: + write_ymir_monitor_process(ymir_cfg, + task='infer', + naive_stage_percent=idx * batch_size_per_gpu / dataset_size, + stage=YmirStage.TASK) + + preprocess_image_shape = batch['image'].shape[2:] + for idx, det in enumerate(pred): # per image + result_per_image = [] + image_file = batch['image_file'][idx] + if len(det): + origin_image_shape = (batch['origin_shape'][0][idx], batch['origin_shape'][1][idx]) + # Rescale boxes from img_size to img size + det[:, :4] = scale_coords(preprocess_image_shape, det[:, :4], origin_image_shape).round() + result_per_image.append(det) + results.append(dict(image_file=image_file, result=result_per_image)) + + torch.save(results, f'/out/infer_results_{max(0,RANK)}.pt') + + +def main() -> int: + ymir_cfg = get_merged_config() + ymir_yolov5 = YmirYolov5(ymir_cfg) + + if LOCAL_RANK != -1: + assert torch.cuda.device_count() > LOCAL_RANK, 'insufficient CUDA devices for DDP command' + torch.cuda.set_device(LOCAL_RANK) + dist.init_process_group(backend="nccl" if dist.is_nccl_available() else "gloo") + + run(ymir_cfg, ymir_yolov5) + + # wait all process to save the infer result + if WORLD_SIZE > 1: + dist.barrier() + + if RANK in [0, -1]: + results = [] + for rank in range(WORLD_SIZE): + results.append(torch.load(f'/out/infer_results_{rank}.pt')) + + ymir_infer_result = dict() + for result in results: + for img_data in result: + img_file = img_data['image_file'] + anns = [] + for each_det in img_data['result']: + each_det_np = each_det.data.cpu().numpy() + for i in range(each_det_np.shape[0]): + xmin, ymin, xmax, ymax, conf, cls = each_det_np[i, :6].tolist() + if conf < ymir_yolov5.conf_thres: + continue + if int(cls) >= len(ymir_yolov5.class_names): + warnings.warn(f'class index {int(cls)} out of range for {ymir_yolov5.class_names}') + continue + ann = rw.Annotation(class_name=ymir_yolov5.class_names[int(cls)], + score=conf, + box=rw.Box(x=int(xmin), y=int(ymin), w=int(xmax - xmin), + h=int(ymax - ymin))) + anns.append(ann) + 
ymir_infer_result[img_file] = anns
+        rw.write_infer_result(infer_result=ymir_infer_result)
+    return 0
+
+
+if __name__ == '__main__':
+    sys.exit(main())
diff --git a/det-yolov5-tmi/ymir/mining/ymir_mining_aldd.py b/det-yolov5-tmi/ymir/mining/ymir_mining_aldd.py
new file mode 100644
index 0000000..fd0087b
--- /dev/null
+++ b/det-yolov5-tmi/ymir/mining/ymir_mining_aldd.py
@@ -0,0 +1,213 @@
+"""use fake DDP to mine
+1. split data with `images_rank = images[RANK::WORLD_SIZE]`
+2. infer on the origin dataset
+3. compute an aldd uncertainty score per image
+4. save the split mining result with `torch.save(results, f'/out/mining_results_{RANK}.pt')`
+5. merge the mining results
+"""
+import os
+import sys
+import warnings
+from functools import partial
+from typing import Any, List
+
+import numpy as np
+import torch
+import torch.distributed as dist
+import torch.nn.functional as F
+import torch.utils.data as td
+from easydict import EasyDict as edict
+from tqdm import tqdm
+from ymir.mining.util import YmirDataset, load_image_file
+from ymir.ymir_yolov5 import YmirYolov5
+from ymir_exc import result_writer as rw
+from ymir_exc.util import YmirStage, get_merged_config, write_ymir_monitor_process
+
+LOCAL_RANK = int(os.getenv('LOCAL_RANK', -1))  # https://pytorch.org/docs/stable/elastic/run.html
+RANK = int(os.getenv('RANK', -1))
+WORLD_SIZE = int(os.getenv('WORLD_SIZE', 1))
+
+
+class ALDD(object):
+
+    def __init__(self, ymir_cfg: edict):
+        self.avg_pool_size = 9
+        self.max_pool_size = 32
+        self.avg_pool_pad = (self.avg_pool_size - 1) // 2
+
+        self.num_classes = len(ymir_cfg.param.class_names)
+        if ymir_cfg.param.get('class_distribution_scores', ''):
+            scores = [float(x.strip()) for x in ymir_cfg.param.class_distribution_scores.split(',')]
+            if len(scores) < self.num_classes:
+                warnings.warn('extend 1.0 to class_distribution_scores')
+                scores.extend([1.0] * (self.num_classes - len(scores)))
+            self.class_distribution_scores = np.array(scores[0:self.num_classes], dtype=np.float32)
+        else:
+            self.class_distribution_scores = np.array([1.0] * self.num_classes, dtype=np.float32)
+
+    def calc_unc_val(self, heatmap: torch.Tensor) -> torch.Tensor:
+        # mean of entropy
+        ent = F.binary_cross_entropy(heatmap, heatmap, reduction='none')
+        avg_ent = F.avg_pool2d(ent,
+                               kernel_size=self.avg_pool_size,
+                               stride=1,
+                               padding=self.avg_pool_pad,
+                               count_include_pad=False)  # N, C, H, W
+        mean_of_entropy = torch.sum(avg_ent, dim=1, keepdim=True)  # N, 1, H, W
+
+        # entropy of mean
+        avg_heatmap = F.avg_pool2d(heatmap,
+                                   kernel_size=self.avg_pool_size,
+                                   stride=1,
+                                   padding=self.avg_pool_pad,
+                                   count_include_pad=False)  # N, C, H, W
+        ent_avg = F.binary_cross_entropy(avg_heatmap, avg_heatmap, reduction='none')
+        entropy_of_mean = torch.sum(ent_avg, dim=1, keepdim=True)  # N, 1, H, W
+
+        uncertainty = entropy_of_mean - mean_of_entropy
+        unc = F.max_pool2d(uncertainty,
+                           kernel_size=self.max_pool_size,
+                           stride=self.max_pool_size,
+                           padding=0,
+                           ceil_mode=False)
+
+        # aggregating
+        scores = torch.mean(unc, dim=(1, 2, 3))  # (N,)
+        return scores
+
+    def compute_aldd_score(self, net_output: List[torch.Tensor], net_input_shape: Any):
+        """
+        args:
+            net_output: multi-scale feature maps from the detection head, each [bs, 3, h, w, 5 + num_classes]
+            net_input_shape: int or (h, w) of the network input
+        returns:
+            total_scores: np.array, one aldd score per image in the batch
+        """
+        if not isinstance(net_input_shape, (list, tuple)):
+            net_input_shape = (net_input_shape, net_input_shape)
+
+        scores_list = []
+
+        for feature_map in net_output:
+            feature_map.sigmoid_()
+
+        for each_class_index in range(self.num_classes):
+            feature_map_list: 
List[torch.Tensor] = [] + + # each_output_feature_map: [bs, 3, h, w, 5 + num_classes] + for each_output_feature_map in net_output: + net_output_conf = each_output_feature_map[:, :, :, :, 4] + net_output_cls_mult_conf = net_output_conf * each_output_feature_map[:, :, :, :, 5 + each_class_index] + # feature_map_reshape: [bs, 3, h, w] + feature_map_reshape = F.interpolate(net_output_cls_mult_conf, + net_input_shape, + mode='bilinear', + align_corners=False) + feature_map_list.append(feature_map_reshape) + + # len(net_output) = 3 + # feature_map_concate: [bs, 9, h, w] + feature_map_concate = torch.cat(feature_map_list, 1) + # scores: [bs, 1] for each class + scores = self.calc_unc_val(feature_map_concate) + scores = scores.cpu().detach().numpy() + scores_list.append(scores) + + # total_scores: [bs, num_classes] + total_scores = np.stack(scores_list, axis=1) + total_scores = total_scores * self.class_distribution_scores + total_scores = np.sum(total_scores, axis=1) + + return total_scores + + +def run(ymir_cfg: edict, ymir_yolov5: YmirYolov5): + # eg: gpu_id = 1,3,5,7 for LOCAL_RANK = 2, will use gpu 5. + gpu = LOCAL_RANK if LOCAL_RANK >= 0 else 0 + device = torch.device('cuda', gpu) + ymir_yolov5.to(device) + + load_fn = partial(load_image_file, img_size=ymir_yolov5.img_size, stride=ymir_yolov5.stride) + batch_size_per_gpu: int = ymir_yolov5.batch_size_per_gpu + gpu_count: int = ymir_yolov5.gpu_count + cpu_count: int = os.cpu_count() or 1 + num_workers_per_gpu = min([ + cpu_count // max(gpu_count, 1), batch_size_per_gpu if batch_size_per_gpu > 1 else 0, + ymir_yolov5.num_workers_per_gpu + ]) + + with open(ymir_cfg.ymir.input.candidate_index_file, 'r') as f: + images = [line.strip() for line in f.readlines()] + + max_barrier_times = (len(images) // max(1, WORLD_SIZE)) // batch_size_per_gpu + + # origin dataset + if RANK != -1: + images_rank = images[RANK::WORLD_SIZE] + else: + images_rank = images + origin_dataset = YmirDataset(images_rank, load_fn=load_fn) + origin_dataset_loader = td.DataLoader(origin_dataset, + batch_size=batch_size_per_gpu, + shuffle=False, + sampler=None, + num_workers=num_workers_per_gpu, + pin_memory=ymir_yolov5.pin_memory, + drop_last=False) + + mining_results = dict() + dataset_size = len(images_rank) + pbar = tqdm(origin_dataset_loader) if RANK in [-1, 0] else origin_dataset_loader + miner = ALDD(ymir_cfg) + for idx, batch in enumerate(pbar): + # batch-level sync, avoid 30min time-out error + if WORLD_SIZE > 1 and idx < max_barrier_times: + dist.barrier() + + with torch.no_grad(): + featuremap_output = ymir_yolov5.model.model(batch['image'].float().to(device))[1] + unc_scores = miner.compute_aldd_score(featuremap_output, ymir_yolov5.img_size) + + for each_imgname, each_score in zip(batch["image_file"], unc_scores): + mining_results[each_imgname] = each_score + + if RANK in [-1, 0]: + write_ymir_monitor_process(ymir_cfg, + task='mining', + naive_stage_percent=idx * batch_size_per_gpu / dataset_size, + stage=YmirStage.TASK) + + torch.save(mining_results, f'/out/mining_results_{max(0,RANK)}.pt') + + +def main() -> int: + ymir_cfg = get_merged_config() + # note select_device(gpu_id) will set os.environ['CUDA_VISIBLE_DEVICES'] to gpu_id + ymir_yolov5 = YmirYolov5(ymir_cfg) + + if LOCAL_RANK != -1: + assert torch.cuda.device_count() > LOCAL_RANK, 'insufficient CUDA devices for DDP command' + torch.cuda.set_device(LOCAL_RANK) + dist.init_process_group(backend="nccl" if dist.is_nccl_available() else "gloo") + + run(ymir_cfg, ymir_yolov5) + + # wait all process to save the 
mining result
+    if WORLD_SIZE > 1:
+        dist.barrier()
+
+    if RANK in [0, -1]:
+        results = []
+        for rank in range(WORLD_SIZE):
+            results.append(torch.load(f'/out/mining_results_{rank}.pt'))
+
+        ymir_mining_result = []
+        for result in results:
+            for img_file, score in result.items():
+                ymir_mining_result.append((img_file, score))
+        rw.write_mining_result(mining_result=ymir_mining_result)
+    return 0
+
+
+if __name__ == '__main__':
+    sys.exit(main())
diff --git a/det-yolov5-tmi/ymir/mining/ymir_mining_cald.py b/det-yolov5-tmi/ymir/mining/ymir_mining_cald.py
new file mode 100644
index 0000000..b36eb1b
--- /dev/null
+++ b/det-yolov5-tmi/ymir/mining/ymir_mining_cald.py
@@ -0,0 +1,199 @@
+"""use fake DDP to mine
+1. split data with `images_rank = images[RANK::WORLD_SIZE]`
+2. infer on the origin dataset
+3. infer on the augmented dataset
+4. save the split mining result with `torch.save(results, f'/out/mining_results_{RANK}.pt')`
+5. merge the mining results
+"""
+import os
+import sys
+from functools import partial
+
+import numpy as np
+import torch
+import torch.distributed as dist
+import torch.utils.data as td
+from easydict import EasyDict as edict
+from tqdm import tqdm
+from ymir_exc import result_writer as rw
+from ymir_exc.util import YmirStage, get_merged_config, write_ymir_monitor_process
+
+from utils.general import scale_coords
+from ymir.mining.util import (YmirDataset, collate_fn_with_fake_ann, load_image_file, load_image_file_with_ann,
+                              update_consistency)
+from ymir.ymir_yolov5 import YmirYolov5
+
+LOCAL_RANK = int(os.getenv('LOCAL_RANK', -1))  # https://pytorch.org/docs/stable/elastic/run.html
+RANK = int(os.getenv('RANK', -1))
+WORLD_SIZE = int(os.getenv('WORLD_SIZE', 1))
+
+
+def run(ymir_cfg: edict, ymir_yolov5: YmirYolov5):
+    # eg: gpu_id = 1,3,5,7 for LOCAL_RANK = 2, will use gpu 5.
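+    # CALD scores each image by the consistency between predictions on the original image and
+    # predictions on its augmented variants (flip/cutout/rotate/resize); the least consistent
+    # images are the most informative to mine.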
+ gpu = LOCAL_RANK if LOCAL_RANK >= 0 else 0 + device = torch.device('cuda', gpu) + ymir_yolov5.to(device) + + load_fn = partial(load_image_file, img_size=ymir_yolov5.img_size, stride=ymir_yolov5.stride) + batch_size_per_gpu: int = ymir_yolov5.batch_size_per_gpu + gpu_count: int = ymir_yolov5.gpu_count + cpu_count: int = os.cpu_count() or 1 + num_workers_per_gpu = min([ + cpu_count // max(gpu_count, 1), batch_size_per_gpu if batch_size_per_gpu > 1 else 0, + ymir_yolov5.num_workers_per_gpu + ]) + + with open(ymir_cfg.ymir.input.candidate_index_file, 'r') as f: + images = [line.strip() for line in f.readlines()] + + max_barrier_times = (len(images) // max(1, WORLD_SIZE)) // batch_size_per_gpu + # origin dataset + if RANK != -1: + images_rank = images[RANK::WORLD_SIZE] + else: + images_rank = images + origin_dataset = YmirDataset(images_rank, load_fn=load_fn) + origin_dataset_loader = td.DataLoader(origin_dataset, + batch_size=batch_size_per_gpu, + shuffle=False, + sampler=None, + num_workers=num_workers_per_gpu, + pin_memory=ymir_yolov5.pin_memory, + drop_last=False) + + results = [] + mining_results = dict() + beta = 1.3 + dataset_size = len(images_rank) + pbar = tqdm(origin_dataset_loader) if RANK in [-1, 0] else origin_dataset_loader + for idx, batch in enumerate(pbar): + # batch-level sync, avoid 30min time-out error + if WORLD_SIZE > 1 and idx < max_barrier_times: + dist.barrier() + + with torch.no_grad(): + pred = ymir_yolov5.forward(batch['image'].float().to(device), nms=True) + + if RANK in [-1, 0]: + write_ymir_monitor_process(ymir_cfg, + task='mining', + naive_stage_percent=0.3 * idx * batch_size_per_gpu / dataset_size, + stage=YmirStage.TASK) + preprocess_image_shape = batch['image'].shape[2:] + for inner_idx, det in enumerate(pred): # per image + result_per_image = [] + image_file = batch['image_file'][inner_idx] + if len(det): + origin_image_shape = (batch['origin_shape'][0][inner_idx], batch['origin_shape'][1][inner_idx]) + # Rescale boxes from img_size to img size + det[:, :4] = scale_coords(preprocess_image_shape, det[:, :4], origin_image_shape).round() + result_per_image.append(det) + else: + mining_results[image_file] = -beta + continue + + results_per_image = torch.cat(result_per_image, dim=0).data.cpu().numpy() + results.append(dict(image_file=image_file, origin_shape=origin_image_shape, results=results_per_image)) + + aug_load_fn = partial(load_image_file_with_ann, img_size=ymir_yolov5.img_size, stride=ymir_yolov5.stride) + aug_dataset = YmirDataset(results, load_fn=aug_load_fn) + aug_dataset_loader = td.DataLoader(aug_dataset, + batch_size=batch_size_per_gpu, + shuffle=False, + sampler=None, + collate_fn=collate_fn_with_fake_ann, + num_workers=num_workers_per_gpu, + pin_memory=ymir_yolov5.pin_memory, + drop_last=False) + + dataset_size = len(results) + monitor_gap = max(1, dataset_size // 1000 // batch_size_per_gpu) + pbar = tqdm(aug_dataset_loader) if RANK in [0, -1] else aug_dataset_loader + for idx, batch in enumerate(pbar): + if idx % monitor_gap == 0 and RANK in [-1, 0]: + write_ymir_monitor_process(ymir_cfg, + task='mining', + naive_stage_percent=0.3 + 0.7 * idx * batch_size_per_gpu / dataset_size, + stage=YmirStage.TASK) + + batch_consistency = [0.0 for _ in range(len(batch['image_file']))] + aug_keys = ['flip', 'cutout', 'rotate', 'resize'] + + pred_result = dict() + for key in aug_keys: + with torch.no_grad(): + pred_result[key] = ymir_yolov5.forward(batch[f'image_{key}'].float().to(device), nms=True) + + for inner_idx in range(len(batch['image_file'])): + for 
key in aug_keys:
+                preprocess_image_shape = batch[f'image_{key}'].shape[2:]
+                result_per_image = []
+                det = pred_result[key][inner_idx]
+                if len(det) == 0:
+                    # no result for the image with augmentation f'{key}'
+                    batch_consistency[inner_idx] += beta
+                    continue
+
+                # prediction result from the origin image
+                fake_ann = batch['results_list'][inner_idx]
+                conf = fake_ann[:, 4]
+
+                # augmented bbox from bboxes, aug_conf = conf
+                aug_bboxes_key = batch[f'bboxes_{key}_list'][inner_idx].astype(np.int32)
+
+                origin_image_shape = (batch[f'origin_shape_{key}'][0][inner_idx],
+                                      batch[f'origin_shape_{key}'][1][inner_idx])
+
+                # Rescale boxes from img_size to img size
+                det[:, :4] = scale_coords(preprocess_image_shape, det[:, :4], origin_image_shape).round()
+                result_per_image.append(det)
+
+                pred_bboxes_key = det[:, :4].data.cpu().numpy().astype(np.int32)
+                pred_conf_key = det[:, 4].data.cpu().numpy()
+                batch_consistency[inner_idx] = update_consistency(consistency=batch_consistency[inner_idx],
+                                                                  consistency_per_aug=2.0,
+                                                                  beta=beta,
+                                                                  pred_bboxes_key=pred_bboxes_key,
+                                                                  pred_conf_key=pred_conf_key,
+                                                                  aug_bboxes_key=aug_bboxes_key,
+                                                                  aug_conf=conf)
+
+        for inner_idx in range(len(batch['image_file'])):
+            batch_consistency[inner_idx] /= len(aug_keys)
+            image_file = batch['image_file'][inner_idx]
+            mining_results[image_file] = batch_consistency[inner_idx]
+
+    torch.save(mining_results, f'/out/mining_results_{max(0,RANK)}.pt')
+
+
+def main() -> int:
+    ymir_cfg = get_merged_config()
+    ymir_yolov5 = YmirYolov5(ymir_cfg)
+
+    if LOCAL_RANK != -1:
+        assert torch.cuda.device_count() > LOCAL_RANK, 'insufficient CUDA devices for DDP command'
+        torch.cuda.set_device(LOCAL_RANK)
+        dist.init_process_group(backend="nccl" if dist.is_nccl_available() else "gloo")
+
+    run(ymir_cfg, ymir_yolov5)
+
+    # wait for all processes to save their mining results
+    if WORLD_SIZE > 1:
+        dist.barrier()
+
+    if RANK in [0, -1]:
+        results = []
+        for rank in range(WORLD_SIZE):
+            results.append(torch.load(f'/out/mining_results_{rank}.pt'))
+
+        ymir_mining_result = []
+        for result in results:
+            for img_file, score in result.items():
+                ymir_mining_result.append((img_file, score))
+        rw.write_mining_result(mining_result=ymir_mining_result)
+    return 0
+
+
+if __name__ == '__main__':
+    sys.exit(main())
diff --git a/det-yolov5-tmi/ymir/mining/ymir_mining_entropy.py b/det-yolov5-tmi/ymir/mining/ymir_mining_entropy.py
new file mode 100644
index 0000000..eff29fe
--- /dev/null
+++ b/det-yolov5-tmi/ymir/mining/ymir_mining_entropy.py
@@ -0,0 +1,115 @@
+"""use fake DDP to mine
+1. split data with `images_rank = images[RANK::WORLD_SIZE]`
+2. infer on the origin dataset
+3. save the split mining result with `torch.save(results, f'/out/mining_results_{RANK}.pt')`
+4. 
merge mining result +""" +import os +import sys +from functools import partial + +import numpy as np +import torch +import torch.distributed as dist +import torch.utils.data as td +from easydict import EasyDict as edict +from tqdm import tqdm +from ymir.mining.util import YmirDataset, load_image_file +from ymir.ymir_yolov5 import YmirYolov5 +from ymir_exc import result_writer as rw +from ymir_exc.util import YmirStage, get_merged_config, write_ymir_monitor_process + +LOCAL_RANK = int(os.getenv('LOCAL_RANK', -1)) # https://pytorch.org/docs/stable/elastic/run.html +RANK = int(os.getenv('RANK', -1)) +WORLD_SIZE = int(os.getenv('WORLD_SIZE', 1)) + + +def run(ymir_cfg: edict, ymir_yolov5: YmirYolov5): + # eg: gpu_id = 1,3,5,7 for LOCAL_RANK = 2, will use gpu 5. + gpu = LOCAL_RANK if LOCAL_RANK >= 0 else 0 + device = torch.device('cuda', gpu) + ymir_yolov5.to(device) + + load_fn = partial(load_image_file, img_size=ymir_yolov5.img_size, stride=ymir_yolov5.stride) + batch_size_per_gpu: int = ymir_yolov5.batch_size_per_gpu + gpu_count: int = ymir_yolov5.gpu_count + cpu_count: int = os.cpu_count() or 1 + num_workers_per_gpu = min([ + cpu_count // max(gpu_count, 1), batch_size_per_gpu if batch_size_per_gpu > 1 else 0, + ymir_yolov5.num_workers_per_gpu + ]) + + with open(ymir_cfg.ymir.input.candidate_index_file, 'r') as f: + images = [line.strip() for line in f.readlines()] + + max_barrier_times = (len(images) // max(1, WORLD_SIZE)) // batch_size_per_gpu + # origin dataset + if RANK != -1: + images_rank = images[RANK::WORLD_SIZE] + else: + images_rank = images + origin_dataset = YmirDataset(images_rank, load_fn=load_fn) + origin_dataset_loader = td.DataLoader(origin_dataset, + batch_size=batch_size_per_gpu, + shuffle=False, + sampler=None, + num_workers=num_workers_per_gpu, + pin_memory=ymir_yolov5.pin_memory, + drop_last=False) + + mining_results = dict() + dataset_size = len(images_rank) + pbar = tqdm(origin_dataset_loader) if RANK in [0, -1] else origin_dataset_loader + for idx, batch in enumerate(pbar): + # batch-level sync, avoid 30min time-out error + if WORLD_SIZE > 1 and idx < max_barrier_times: + dist.barrier() + + with torch.no_grad(): + pred = ymir_yolov5.forward(batch['image'].float().to(device), nms=False) + + if RANK in [-1, 0]: + write_ymir_monitor_process(ymir_cfg, task='mining', naive_stage_percent=idx * batch_size_per_gpu / dataset_size, stage=YmirStage.TASK) + for inner_idx, det in enumerate(pred): # per image + image_file = batch['image_file'][inner_idx] + if len(det): + conf = det[:, 4].data.cpu().numpy() + mining_results[image_file] = -np.sum(conf * np.log2(conf)) + else: + mining_results[image_file] = -10 + continue + + torch.save(mining_results, f'/out/mining_results_{max(0,RANK)}.pt') + + +def main() -> int: + ymir_cfg = get_merged_config() + ymir_yolov5 = YmirYolov5(ymir_cfg) + + if LOCAL_RANK != -1: + assert torch.cuda.device_count() > LOCAL_RANK, 'insufficient CUDA devices for DDP command' + torch.cuda.set_device(LOCAL_RANK) + dist.init_process_group(backend="nccl" if dist.is_nccl_available() else "gloo") + + run(ymir_cfg, ymir_yolov5) + + # wait all process to save the mining result + if WORLD_SIZE > 1: + dist.barrier() + + if RANK in [0, -1]: + results = [] + for rank in range(WORLD_SIZE): + results.append(torch.load(f'/out/mining_results_{rank}.pt')) + + ymir_mining_result = [] + for result in results: + for img_file, score in result.items(): + ymir_mining_result.append((img_file, score)) + rw.write_mining_result(mining_result=ymir_mining_result) + return 0 + + +if 
__name__ == '__main__':
+    sys.exit(main())
diff --git a/det-yolov5-tmi/ymir/mining/ymir_mining_random.py b/det-yolov5-tmi/ymir/mining/ymir_mining_random.py
new file mode 100644
index 0000000..815783d
--- /dev/null
+++ b/det-yolov5-tmi/ymir/mining/ymir_mining_random.py
@@ -0,0 +1,78 @@
+"""use fake DDP to mine
+1. split data with `images_rank = images[RANK::WORLD_SIZE]`
+2. assign a random score to each image
+3. save the split mining result with `torch.save(results, f'/out/mining_results_{RANK}.pt')`
+4. merge the mining results
+"""
+import os
+import random
+import sys
+
+import torch
+import torch.distributed as dist
+from easydict import EasyDict as edict
+from tqdm import tqdm
+from ymir.ymir_yolov5 import YmirYolov5
+from ymir_exc import result_writer as rw
+from ymir_exc.util import YmirStage, get_merged_config, write_ymir_monitor_process
+
+LOCAL_RANK = int(os.getenv('LOCAL_RANK', -1))  # https://pytorch.org/docs/stable/elastic/run.html
+RANK = int(os.getenv('RANK', -1))
+WORLD_SIZE = int(os.getenv('WORLD_SIZE', 1))
+
+
+def run(ymir_cfg: edict, ymir_yolov5: YmirYolov5):
+    # eg: gpu_id = 1,3,5,7 for LOCAL_RANK = 2, will use gpu 5.
+    gpu = LOCAL_RANK if LOCAL_RANK >= 0 else 0
+    device = torch.device('cuda', gpu)
+    ymir_yolov5.to(device)
+
+    with open(ymir_cfg.ymir.input.candidate_index_file, 'r') as f:
+        images = [line.strip() for line in f.readlines()]
+
+    if RANK != -1:
+        images_rank = images[RANK::WORLD_SIZE]
+    else:
+        images_rank = images
+    mining_results = dict()
+    dataset_size = len(images_rank)
+    pbar = tqdm(images_rank) if RANK in [-1, 0] else images_rank
+    for idx, image in enumerate(pbar):
+        if RANK in [-1, 0]:
+            write_ymir_monitor_process(ymir_cfg, task='mining', naive_stage_percent=idx / dataset_size, stage=YmirStage.TASK)
+        mining_results[image] = random.random()
+
+    torch.save(mining_results, f'/out/mining_results_{max(0,RANK)}.pt')
+
+
+def main() -> int:
+    ymir_cfg = get_merged_config()
+    ymir_yolov5 = YmirYolov5(ymir_cfg)
+
+    if LOCAL_RANK != -1:
+        assert torch.cuda.device_count() > LOCAL_RANK, 'insufficient CUDA devices for DDP command'
+        torch.cuda.set_device(LOCAL_RANK)
+        dist.init_process_group(backend="nccl" if dist.is_nccl_available() else "gloo")
+
+    run(ymir_cfg, ymir_yolov5)
+
+    # wait for all processes to save their mining results
+    if WORLD_SIZE > 1:
+        dist.barrier()
+
+    if RANK in [0, -1]:
+        results = []
+        for rank in range(WORLD_SIZE):
+            results.append(torch.load(f'/out/mining_results_{rank}.pt'))
+
+        ymir_mining_result = []
+        for result in results:
+            for img_file, score in result.items():
+                ymir_mining_result.append((img_file, score))
+        rw.write_mining_result(mining_result=ymir_mining_result)
+    return 0
+
+
+if __name__ == '__main__':
+    sys.exit(main())
diff --git a/det-yolov5-tmi/ymir/start.py b/det-yolov5-tmi/ymir/start.py
new file mode 100644
index 0000000..b368ac1
--- /dev/null
+++ b/det-yolov5-tmi/ymir/start.py
@@ -0,0 +1,159 @@
+import logging
+import os
+import subprocess
+import sys
+
+from easydict import EasyDict as edict
+from ymir_exc import monitor
+from ymir_exc.util import YmirStage, find_free_port, get_bool, get_merged_config, write_ymir_monitor_process
+
+from models.experimental import attempt_download
+from ymir.ymir_yolov5 import convert_ymir_to_yolov5, get_weight_file
+
+
+def start(cfg: edict) -> int:
+    logging.info(f'merged config: {cfg}')
+
+    if cfg.ymir.run_training:
+        _run_training(cfg)
+    else:
+        if cfg.ymir.run_mining:
+            _run_mining(cfg)
+        if cfg.ymir.run_infer:
+            _run_infer(cfg)
+
+    return 0
+
+
+def 
_run_training(cfg: edict) -> None:
+    """
+    function for the training task
+    1. convert the dataset
+    2. train the model
+    3. save the model weights/hyper-parameters/... to the designated directory
+    """
+    # 1. convert dataset
+    out_dir = cfg.ymir.output.root_dir
+    convert_ymir_to_yolov5(cfg)
+    logging.info(f'generate {out_dir}/data.yaml')
+    write_ymir_monitor_process(cfg, task='training', naive_stage_percent=1.0, stage=YmirStage.PREPROCESS)
+
+    # 2. training model
+    epochs: int = int(cfg.param.epochs)
+    batch_size_per_gpu: int = int(cfg.param.batch_size_per_gpu)
+    num_workers_per_gpu: int = int(cfg.param.get('num_workers_per_gpu', 4))
+    model: str = cfg.param.model
+    img_size: int = int(cfg.param.img_size)
+    save_period: int = int(cfg.param.save_period)
+    save_best_only: bool = get_bool(cfg, key='save_best_only', default_value=True)
+    args_options: str = cfg.param.args_options
+    gpu_id: str = str(cfg.param.get('gpu_id', '0'))
+    gpu_count: int = len(gpu_id.split(',')) if gpu_id else 0
+    batch_size: int = batch_size_per_gpu * max(1, gpu_count)
+    port: int = find_free_port()
+    sync_bn: bool = get_bool(cfg, key='sync_bn', default_value=False)
+
+    weights = get_weight_file(cfg)
+    if not weights:
+        # download pretrained weight
+        weights = attempt_download(f'{model}.pt')
+
+    models_dir = cfg.ymir.output.models_dir
+    project = os.path.dirname(models_dir)
+    name = os.path.basename(models_dir)
+    assert os.path.join(project, name) == models_dir
+
+    commands = ['python3']
+    device = gpu_id or 'cpu'
+    if gpu_count > 1:
+        commands.extend(f'-m torch.distributed.launch --nproc_per_node {gpu_count} --master_port {port}'.split())
+
+    commands.extend([
+        'train.py', '--epochs',
+        str(epochs), '--batch-size',
+        str(batch_size), '--data', f'{out_dir}/data.yaml', '--project', project, '--cfg', f'models/{model}.yaml',
+        '--name', name, '--weights', weights, '--img-size',
+        str(img_size), '--save-period',
+        str(save_period), '--device', device, '--workers',
+        str(num_workers_per_gpu)
+    ])
+
+    if save_best_only:
+        commands.append("--nosave")
+
+    if gpu_count > 1 and sync_bn:
+        commands.append("--sync-bn")
+
+    if args_options:
+        commands.extend(args_options.split())
+
+    logging.info(f'start training: {commands}')
+
+    subprocess.run(commands, check=True)
+    write_ymir_monitor_process(cfg, task='training', naive_stage_percent=1.0, stage=YmirStage.TASK)
+
+    # if task done, write 100% percent log
+    monitor.write_monitor_logger(percent=1.0)
+
+
+def _run_mining(cfg: edict) -> None:
+    # generate data.yaml for mining
+    out_dir = cfg.ymir.output.root_dir
+    convert_ymir_to_yolov5(cfg)
+    logging.info(f'generate {out_dir}/data.yaml')
+    write_ymir_monitor_process(cfg, task='mining', naive_stage_percent=1.0, stage=YmirStage.PREPROCESS)
+    gpu_id: str = str(cfg.param.get('gpu_id', '0'))
+    gpu_count: int = len(gpu_id.split(',')) if gpu_id else 0
+
+    mining_algorithm = cfg.param.get('mining_algorithm', 'aldd')
+    support_mining_algorithms = ['aldd', 'cald', 'random', 'entropy']
+    if mining_algorithm not in support_mining_algorithms:
+        raise Exception(f'unknown mining algorithm {mining_algorithm}, not in {support_mining_algorithms}')
+
+    if gpu_count <= 1:
+        command = f'python3 ymir/mining/ymir_mining_{mining_algorithm}.py'
+    else:
+        port = find_free_port()
+        command = f'python3 -m torch.distributed.launch --nproc_per_node {gpu_count} --master_port {port} ymir/mining/ymir_mining_{mining_algorithm}.py'  # noqa
+
+    logging.info(f'mining: {command}')
+    subprocess.run(command.split(), check=True)
+    write_ymir_monitor_process(cfg, task='mining', 
+
+
+def _run_infer(cfg: edict) -> None:
+    # generate data.yaml for infer
+    out_dir = cfg.ymir.output.root_dir
+    convert_ymir_to_yolov5(cfg)
+    logging.info(f'generate {out_dir}/data.yaml')
+    write_ymir_monitor_process(cfg, task='infer', naive_stage_percent=1.0, stage=YmirStage.PREPROCESS)
+
+    gpu_id: str = str(cfg.param.get('gpu_id', '0'))
+    gpu_count: int = len(gpu_id.split(',')) if gpu_id else 0
+
+    if gpu_count <= 1:
+        command = 'python3 ymir/mining/ymir_infer.py'
+    else:
+        port = find_free_port()
+        command = f'python3 -m torch.distributed.launch --nproc_per_node {gpu_count} --master_port {port} ymir/mining/ymir_infer.py'  # noqa
+
+    logging.info(f'infer: {command}')
+    subprocess.run(command.split(), check=True)
+
+    write_ymir_monitor_process(cfg, task='infer', naive_stage_percent=1.0, stage=YmirStage.POSTPROCESS)
+
+
+if __name__ == '__main__':
+    logging.basicConfig(stream=sys.stdout,
+                        format='%(levelname)-8s: [%(asctime)s] %(message)s',
+                        datefmt='%Y%m%d-%H:%M:%S',
+                        level=logging.INFO)
+
+    cfg = get_merged_config()
+    os.environ.setdefault('PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION', 'python')
+
+    # optional custom activation, eg: relu
+    activation: str = cfg.param.get('activation', '')
+    if activation:
+        os.environ.setdefault('ACTIVATION', activation)
+    sys.exit(start(cfg))
diff --git a/det-yolov5-tmi/ymir/ymir_yolov5.py b/det-yolov5-tmi/ymir/ymir_yolov5.py
new file mode 100644
index 0000000..6b924cf
--- /dev/null
+++ b/det-yolov5-tmi/ymir/ymir_yolov5.py
@@ -0,0 +1,172 @@
+"""
+utility functions for ymir and yolov5
+"""
+import os.path as osp
+import shutil
+from typing import Any, List, Optional
+
+import numpy as np
+import torch
+import yaml
+from easydict import EasyDict as edict
+from nptyping import NDArray, Shape, UInt8
+from ymir_exc import result_writer as rw
+from ymir_exc.util import get_bool, get_weight_files
+
+from models.common import DetectMultiBackend
+from utils.augmentations import letterbox
+from utils.general import check_img_size, non_max_suppression, scale_coords
+from utils.torch_utils import select_device
+
+BBOX = NDArray[Shape['*,4'], Any]
+CV_IMAGE = NDArray[Shape['*,*,3'], UInt8]
+
+
+def get_weight_file(cfg: edict) -> str:
+    """
+    return the weight file path by priority
+    find weight files in cfg.param.model_params_path or cfg.param.pretrained_model_params
+    """
+    weight_files = get_weight_files(cfg, suffix=('.pt',))
+    # choose weight file by priority, best.pt > xxx.pt
+    for p in weight_files:
+        if p.endswith('best.pt'):
+            return p
+
+    if len(weight_files) > 0:
+        return max(weight_files, key=osp.getctime)
+
+    return ""
+
+
+class YmirYolov5(torch.nn.Module):
+    """
+    used for mining and inference to init detector and predict.
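+
+    usage (a minimal sketch, assuming `img` is an opencv BGR image and `cfg`
+    is the merged ymir config from `get_merged_config()`):
+        detector = YmirYolov5(cfg)
+        annotations = detector.infer(img)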
+ """ + + def __init__(self, cfg: edict): + super().__init__() + self.cfg = cfg + + self.gpu_id: str = str(cfg.param.get('gpu_id', '0')) + device = select_device(self.gpu_id) # will set CUDA_VISIBLE_DEVICES=self.gpu_id + self.gpu_count: int = len(self.gpu_id.split(',')) if self.gpu_id else 0 + self.batch_size_per_gpu: int = int(cfg.param.get('batch_size_per_gpu', 4)) + self.num_workers_per_gpu: int = int(cfg.param.get('num_workers_per_gpu', 4)) + self.pin_memory: bool = get_bool(cfg, 'pin_memory', False) + self.batch_size: int = self.batch_size_per_gpu * self.gpu_count + self.model = self.init_detector(device) + self.model.eval() + self.device = device + self.class_names: List[str] = cfg.param.class_names + self.stride = self.model.stride + self.conf_thres: float = float(cfg.param.conf_thres) + self.iou_thres: float = float(cfg.param.iou_thres) + + img_size = int(cfg.param.img_size) + imgsz = [img_size, img_size] + imgsz = check_img_size(imgsz, s=self.stride) + + self.model.warmup(imgsz=(1, 3, *imgsz), half=False) # warmup + self.img_size: List[int] = imgsz + + def extract_feats(self, x): + """ + return the feature maps before sigmoid for mining + """ + return self.model.model(x)[1] + + def forward(self, x, nms=False): + pred = self.model(x) + if not nms: + return pred + + pred = non_max_suppression( + pred, + conf_thres=self.conf_thres, + iou_thres=self.iou_thres, + classes=None, # not filter class_idx + agnostic=False, + max_det=100) + return pred + + def init_detector(self, device: torch.device) -> DetectMultiBackend: + weights = get_weight_file(self.cfg) + + if not weights: + raise Exception("no weights file specified!") + + data_yaml = osp.join(self.cfg.ymir.output.root_dir, 'data.yaml') + model = DetectMultiBackend( + weights=weights, + device=device, + dnn=False, # not use opencv dnn for onnx inference + data=data_yaml) # dataset.yaml path + + return model + + def predict(self, img: CV_IMAGE) -> NDArray: + """ + predict single image and return bbox information + img: opencv BGR, uint8 format + """ + # preprocess: padded resize + img1 = letterbox(img, self.img_size, stride=self.stride, auto=True)[0] + + # preprocess: convert data format + img1 = img1.transpose((2, 0, 1))[::-1] # HWC to CHW, BGR to RGB + img1 = np.ascontiguousarray(img1) + img1 = torch.from_numpy(img1).to(self.device) + + img1 = img1 / 255 # 0 - 255 to 0.0 - 1.0 + img1.unsqueeze_(dim=0) # expand for batch dim + pred = self.forward(img1, nms=True) + + result = [] + for det in pred: + if len(det): + # Rescale boxes from img_size to img size + det[:, :4] = scale_coords(img1.shape[2:], det[:, :4], img.shape).round() + result.append(det) + + # xyxy, conf, cls + if len(result) > 0: + tensor_result = torch.cat(result, dim=0) + numpy_result = tensor_result.data.cpu().numpy() + else: + numpy_result = np.zeros(shape=(0, 6), dtype=np.float32) + + return numpy_result + + def infer(self, img: CV_IMAGE) -> List[rw.Annotation]: + anns = [] + result = self.predict(img) + + for i in range(result.shape[0]): + xmin, ymin, xmax, ymax, conf, cls = result[i, :6].tolist() + ann = rw.Annotation(class_name=self.class_names[int(cls)], + score=conf, + box=rw.Box(x=int(xmin), y=int(ymin), w=int(xmax - xmin), h=int(ymax - ymin))) + + anns.append(ann) + + return anns + + +def convert_ymir_to_yolov5(cfg: edict, out_dir: str = None): + """ + convert ymir format dataset to yolov5 format + generate data.yaml for training/mining/infer + """ + + out_dir = out_dir or cfg.ymir.output.root_dir + data = dict(path=out_dir, nc=len(cfg.param.class_names), 
+    for split, prefix in zip(['train', 'val', 'test'], ['training', 'val', 'candidate']):
+        src_file = getattr(cfg.ymir.input, f'{prefix}_index_file')
+        if osp.exists(src_file):
+            shutil.copy(src_file, f'{out_dir}/{split}.tsv')
+
+        data[split] = f'{split}.tsv'
+
+    with open(osp.join(out_dir, 'data.yaml'), 'w') as fw:
+        fw.write(yaml.safe_dump(data))
diff --git a/docs/FAQ.md b/docs/FAQ.md
new file mode 100644
index 0000000..0d06a4f
--- /dev/null
+++ b/docs/FAQ.md
@@ -0,0 +1,49 @@
+# FAQ
+
+## 关于cuda版本
+
+- 推荐主机安装高版本驱动,支持11.2以上的cuda版本,使用11.1及以上的镜像
+
+- GTX3080/GTX3090不支持11.1以下的cuda,只能使用cuda11.1及以上的镜像
+
+## apt 或 pip 安装慢或出错
+
+- 采用国内源,如在 Dockerfile 中添加如下命令
+
+    ```
+    RUN sed -i 's/archive.ubuntu.com/mirrors.tuna.tsinghua.edu.cn/g' /etc/apt/sources.list
+
+    RUN pip config set global.index-url https://mirrors.aliyun.com/pypi/simple
+    ```
+
+## docker build 的时候出错,找不到相应 Dockerfile 或`COPY/ADD`时出错
+
+- 回到项目根目录或 Dockerfile 对应根目录,确保 Dockerfile 中`COPY/ADD`的文件与文件夹能够访问,以yolov5为例:
+
+    ```
+    cd ymir-executor-fork/det-yolov5-tmi
+
+    docker build -t ymir-executor/yolov5:cuda111 . -f ymir/docker/cuda111.dockerfile
+    ```
+
+## 自制镜像出错
+
+- [检测镜像调试](./object_detection/test_det.md)
+
+- [分割镜像调试](./image_segmentation/test_semantic_seg.md)
+
+- [常见镜像错误](./common_image_error.md)
+
+## 模型精度/速度如何权衡与提升
+
+- 模型精度与数据集大小、数据集质量、学习率、batch size、迭代次数、模型结构、数据增强方式、损失函数等相关,在此不做展开,详情参考:
+
+    - [Object Detection in 20 Years: A Survey](https://arxiv.org/abs/1905.05055)
+
+    - [Paper with Code: Object Detection](https://paperswithcode.com/task/object-detection)
+
+    - [awesome object detection](https://github.com/amusi/awesome-object-detection)
+
+    - [voc2012 object detection leaderboard](http://host.robots.ox.ac.uk:8080/leaderboard/displaylb.php?challengeid=11&compid=4)
+
+    - [coco object detection leaderboard](https://cocodataset.org/#detection-leaderboard)
diff --git a/docs/README.MD b/docs/README.MD
new file mode 100644
index 0000000..5975153
--- /dev/null
+++ b/docs/README.MD
@@ -0,0 +1,35 @@
+# ymir镜像文档
+
+## 基于开源仓库进行定制
+
+- [yolov5示例](https://github.com/modelai/ymir-yolov5/pull/2/files)
+
+## 💫 生态环境
+
+- [ymir镜像开发SDK](https://github.com/modelai/ymir-executor-sdk)
+
+    - [读取配置与数据](https://github.com/modelai/ymir-executor-sdk/blob/master/docs/read.md)
+
+    - [写进度与结果文件](https://github.com/modelai/ymir-executor-sdk/blob/master/docs/write.md)
+
+    - [数据集格式转换](https://github.com/modelai/ymir-executor-sdk/blob/master/docs/dataset_convert.md)
+
+- [ymir镜像调试校验工具](https://github.com/modelai/ymir-executor-verifier)
+
+    - 样例数据下载
+
+    - 交互式调试
+
+    - 批量校验镜像
+
+## FAQ
+
+- [FAQ](./FAQ.md)
+
+## 其它
+
+- [ymir版本与接口兼容](./ymir-executor-version.md)
+
+- [加速apt/pip/docker](./speedup_apt_pip_docker.md)
+
+- [导入外部模型权值](./import_outer_weight.md)
diff --git a/docs/algorithms/mmdet.md b/docs/algorithms/mmdet.md
new file mode 100644
index 0000000..1135545
--- /dev/null
+++ b/docs/algorithms/mmdet.md
@@ -0,0 +1,116 @@
+# ymir-mmdetection
+
+此文档采用 `mmdetection v3.x` 架构,阅读此文档前,建议先了解 [mmengine](https://mmengine.readthedocs.io/zh_CN/latest/get_started/introduction.html)。
+
+- [mmdetection v3.x](https://github.com/open-mmlab/mmdetection/tree/3.x)
+
+- [ymir-mmdetection](https://github.com/modelai/ymir-mmdetection)
+
+## mmdetection --> ymir-mmdetection
+
+- mmdetection支持 `coco` 与 `pascal voc` 等多种数据格式。ymir-mmdetection镜像会将ymir平台的检测数据格式 `det-ark:raw` 转换为 `coco`。
+
+- mmdetection通过配置文件如 [configs/_base_/datasets/coco_detection.py](https://github.com/open-mmlab/mmdetection/blob/3.x/configs/_base_/datasets/coco_detection.py#L36-L42) 指明数据集的路径。
+
+```
+# dataset settings
+dataset_type = 'CocoDataset'
+data_root = 'data/coco/'
+
+train_dataloader = dict(
+    dataset=dict(
+        type=dataset_type,
+        data_root=data_root,
+        ann_file='annotations/instances_train2017.json',
+        data_prefix=dict(img='train2017/'),
+        filter_cfg=dict(filter_empty_gt=True, min_size=32),
+        pipeline=train_pipeline))
+```
+
+- 为加载ymir平台数据集,一种方案是参考[自定义数据集](https://mmdetection.readthedocs.io/en/3.x/user_guides/train.html#train-with-customized-datasets),提供配置文件。但这种方案会固定数据集的类别,不适合ymir平台。
+
+- ymir-mmdetection采用另一种方案,在已有配置文件的基础上,直接在内存中进行修改。参考[ymir-mmyolo/ymir/tools/train.py](https://github.com/modelai/ymir-mmyolo/blob/ymir/tools/train.py#L65-L67)
+
+```
+    # 加载已有配置文件如 `configs/yolov8/yolov8_s_syncbn_fast_8xb16-500e_coco.py`
+    cfg = Config.fromfile(args.config)
+    # 获得ymir平台超参数
+    ymir_cfg = get_merged_config()
+    # 直接在内存中修改配置
+    modify_mmengine_config(cfg, ymir_cfg)
+```
+
+## 配置镜像环境
+
+## 提供超参数模板文件与镜像配置文件
+
+- [img-man/*-template.yaml](https://github.com/modelai/ymir-mmdetection/tree/ymir/ymir/img-man)
+
+## 提供默认启动脚本
+
+- [ymir/start.py](https://github.com/modelai/ymir-mmyolo/tree/ymir/ymir/start.py)
+
+- Dockerfile
+```
+RUN echo "python /app/ymir/start.py" > /usr/bin/start.sh  # 生成启动脚本 /usr/bin/start.sh
+CMD bash /usr/bin/start.sh  # 将镜像的默认启动脚本设置为 /usr/bin/start.sh
+```
+
+## 实现基本功能
+
+### 训练
+
+### 推理
+
+### 挖掘
+
+## 制作镜像 det/mmdet:tmi
+
+- [ymir/Dockerfile](https://github.com/modelai/ymir-mmdetection/tree/ymir/ymir/Dockerfile)
+
+```
+docker build -t det/mmdet:tmi -f ymir/Dockerfile .
+```
+
+## 💫复杂用法
+
+!!! 注意
+    这部分内容初学者可以跳过
+
+### cfg_options
+
+当用户使用脚本 `tools/train.py` 或 `tools/test.py` 提交任务,或者使用其他工具时,可以通过指定 `--cfg-options` 参数直接修改配置文件中的内容。
+
+- 更新字典链中的配置的键
+
+    配置项可以通过遵循原始配置中键的层次顺序指定。例如,`--cfg-options model.backbone.norm_eval=False` 将模型 backbones 中的所有 BN 模块改为 train 模式。
+
+- 更新列表中配置的键
+
+    配置中的一些配置字典是由列表组成的。例如,训练 pipeline `data.train.pipeline` 通常是一个列表,如 `[dict(type='LoadImageFromFile'), dict(type='TopDownRandomFlip', flip_prob=0.5), ...]`。如果想要在 pipeline 中将 `flip_prob=0.5` 修改为 `flip_prob=0.0`,可以指定 `--cfg-options data.train.pipeline.1.flip_prob=0.0`。
+
+- 更新 list/tuples 中的值
+
+    如果想要更新的值是一个列表或者元组。例如,一些配置文件中包含 `param_scheduler = "[dict(type='CosineAnnealingLR',T_max=200,by_epoch=True,begin=0,end=200)]"`。如果想要改变这个键,可以指定 `--cfg-options param_scheduler="[dict(type='LinearLR',start_factor=1e-4,by_epoch=True,begin=0,end=40,convert_to_iter_based=True)]"`。注意,引号 " 是必要的,并且在指定值的时候,引号内不能存在空白字符。
+
+### 自定义Dataset
+
+mmengine提供了基类 BaseDataset,可直接定义 YmirDataset,实现以下三个函数,见下方示意。
+
+```
+from typing import List
+
+from mmengine.dataset import BaseDataset
+from mmengine.registry import DATASETS
+
+
+@DATASETS.register_module()
+class YmirDataset(BaseDataset):
+    def __init__(self, *args, **kwargs):
+        ...
+
+    def load_data_list(self) -> List[dict]:
+        ...
+
+    def get_cat_ids(self, idx: int) -> List[int]:
+        ...
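+
+    # 以下为最小示意(作者假设,非 ymir-mmdetection 官方实现):
+    # load_data_list 可读取 ymir 的索引文件,这里假设 self.ann_file
+    # 的每行格式为 "图像路径<TAB>标注路径":
+    #
+    # def load_data_list(self) -> List[dict]:
+    #     data_list = []
+    #     with open(self.ann_file) as f:
+    #         for line in f:
+    #             img_path, ann_path = line.strip().split()
+    #             data_list.append(dict(img_path=img_path, ann_path=ann_path))
+    #     return data_list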
+```
diff --git a/docs/algorithms/mmseg.md b/docs/algorithms/mmseg.md
new file mode 100644
index 0000000..ca0b642
--- /dev/null
+++ b/docs/algorithms/mmseg.md
@@ -0,0 +1,39 @@
+# ymir-mmsegmentation
+
+## mmsegmentation简介
+
+mmsegmentation 是 OpenMMLab 开源的语义分割工具库,包含众多的算法。可以阅读其[官方文档](https://mmsegmentation.readthedocs.io/zh_CN/latest/index.html)了解详细用法,此处仅介绍其训练、推理相关的内容。
+
+### 训练
+
+- 单GPU训练的命令如下,其中的 **CONFIG_FILE** 可以从 [configs](https://github.com/open-mmlab/mmsegmentation/tree/master/configs) 目录下找到
+
+```
+python tools/train.py ${CONFIG_FILE} [可选参数]
+
+# 在cityscapes数据集上训练deeplabv3plus模型
+python tools/train.py configs/deeplabv3plus/deeplabv3plus_r18b-d8_512x1024_80k_cityscapes.py
+```
+
+- 多GPU训练的命令如下
+```
+sh tools/dist_train.sh ${CONFIG_FILE} ${GPUS} [可选参数]
+
+# 采用4块GPU进行训练
+sh tools/dist_train.sh configs/deeplabv3plus/deeplabv3plus_r18b-d8_512x1024_80k_cityscapes.py 4
+```
+
+### 推理
+
+- 可以参考 [demo/image_demo.py](https://github.com/open-mmlab/mmsegmentation/tree/master/demo/image_demo.py)
+
+- 先下载对应config的权重文件,可在[configs/deeplabv3plus/README.md](https://github.com/open-mmlab/mmsegmentation/tree/master/configs/deeplabv3plus/README.md)找到对应的 **CONFIG_FILE** 和权重文件
+
+```
+wget https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3plus/deeplabv3plus_r50-d8_512x1024_40k_cityscapes/deeplabv3plus_r50-d8_512x1024_40k_cityscapes_20200605_094610-d222ffcd.pth
+```
+
+- 进行推理
+```
+python demo/image_demo.py demo/demo.png configs/deeplabv3plus/deeplabv3plus_r50-d8_512x1024_40k_cityscapes.py deeplabv3plus_r50-d8_512x1024_40k_cityscapes_20200605_094610-d222ffcd.pth
+```
diff --git a/docs/algorithms/mmyolo.md b/docs/algorithms/mmyolo.md
new file mode 100644
index 0000000..c0fce14
--- /dev/null
+++ b/docs/algorithms/mmyolo.md
@@ -0,0 +1,96 @@
+# ymir-mmyolo
+
+阅读此文档前,建议先阅读[mmdet](./mmdet.md),了解mmyolo代码仓库的数据加载、超参数加载与模型训练流程。
+
+- [mmyolo](https://github.com/open-mmlab/mmyolo)
+
+- [mmyolo算法解析](https://mmyolo.readthedocs.io/zh_CN/latest/algorithm_descriptions/index.html)
+
+- [ymir-mmyolo](https://github.com/modelai/ymir-mmyolo)
+
+## 配置镜像环境
+
+参考 [mmyolo#installation](https://github.com/modelai/ymir-mmyolo#%EF%B8%8F-installation-)
+
+- [ymir/Dockerfile](https://github.com/modelai/ymir-mmyolo/tree/ymir/ymir/Dockerfile)
+
+## 提供超参数模板文件
+
+- [img-man/*-template.yaml](https://github.com/modelai/ymir-mmyolo/tree/ymir/ymir/img-man)
+
+## 提供镜像说明文件
+
+- [img-man/manifest.yaml](https://github.com/modelai/ymir-mmyolo/tree/ymir/ymir/img-man/manifest.yaml)
+
+## 提供默认启动脚本
+
+- [ymir/start.py](https://github.com/modelai/ymir-mmyolo/tree/ymir/ymir/start.py)
+
+- Dockerfile
+```
+RUN echo "python /app/ymir/start.py" > /usr/bin/start.sh  # 生成启动脚本 /usr/bin/start.sh
+CMD bash /usr/bin/start.sh  # 将镜像的默认启动脚本设置为 /usr/bin/start.sh
+```
+
+## 实现基本功能
+
+完整代码变动参考[ymir-mmyolo/pull/1](https://github.com/modelai/ymir-mmyolo/pull/1/files)
+
+### 训练
+
+1. 启动镜像时调用 `bash /usr/bin/start.sh`
+
+2. `start.sh` 调用 `python3 ymir/start.py`
+
+3. `start.py` 调用 `python3 ymir/ymir_training.py`
+
+4. `ymir_training.py` 调用 `bash tools/dist_train.sh ...`
+
+    - `ymir_training.py` 调用 `convert_ymir_to_coco()` 实现数据集格式转换
+
+    - `ymir_training.py` 获取配置文件(config_file)、GPU数量(num_gpus)、工作目录(work_dir),并拼接到调用命令中
+
+    ```
+    cmd = f"bash ./tools/dist_train.sh {config_file} {num_gpus} --work-dir {work_dir}"
+    ```
+    - 在训练结束后,保存 `max_keep_checkpoints` 份权重文件
+
+5. `dist_train.sh` 调用 `python3 tools/train.py ...`
+
+    - `train.py` 中调用 `modify_mmengine_config()` 加载ymir平台超参数、自动配置预训练模型、添加tensorboard功能、添加ymir进度监控hook等。
+
+### 推理
+
+1. 启动镜像时调用 `bash /usr/bin/start.sh`
+
+2. 
`start.sh` 调用 `python3 ymir/start.py` + +3. `start.py` 调用 `python3 ymir/ymir_infer.py` + + - 调用 `init_detector()` 与 `inference_detector()` 获取推理结果 + + - 调用 `mmdet_result_to_ymir()` 将mmdet推理结果转换为ymir格式 + + - 调用 `rw.write_infer_result()` 保存推理结果 + +### 挖掘 + +1. 启动镜像时调用 `bash /usr/bin/start.sh` + +2. `start.sh` 调用 `python3 ymir/start.py` + +3. `start.py` 调用 `python3 ymir/ymir_mining.py` + + - 调用 `init_detector()` 与 `inference_detector()` 获取推理结果 + + - 调用 `compute_score()` 计算挖掘分数 + + - 调用 `rw.write_mining_result()` 保存挖掘结果 + +## 制作镜像 det/mmyolo:tmi + +- [ymir/Dockerfile](https://github.com/modelai/ymir-mmyolo/tree/ymir/ymir/Dockerfile) + +``` +docker build -t det/mmyolo:tmi -f ymir/Dockerfile . +``` diff --git a/docs/algorithms/yolov5.md b/docs/algorithms/yolov5.md new file mode 100644 index 0000000..19bcea0 --- /dev/null +++ b/docs/algorithms/yolov5.md @@ -0,0 +1,113 @@ +# yolov5 代码库简介 + +## 安装 + +``` +git clone https://github.com/ultralytics/yolov5 # clone +cd yolov5 +pip install -r requirements.txt # install +``` + +## 训练 + +``` +python train.py --data coco.yaml --epochs 300 --weights '' --cfg yolov5n.yaml --batch-size 128 + yolov5s 64 + yolov5m 40 + yolov5l 24 + yolov5x 16 +``` + +## 推理 + +``` +python detect.py --weights yolov5s.pt --source 0 # webcam + img.jpg # image + vid.mp4 # video + screen # screenshot + path/ # directory + list.txt # list of images + list.streams # list of streams + 'path/*.jpg' # glob + 'https://youtu.be/Zgi9g1ksQHc' # YouTube + 'rtsp://example.com/media.mp4' # RTSP, RTMP, HTTP stream +``` + +## 数据集 + +参考[yolov5自定义数据集](https://github.com/ultralytics/yolov5/wiki/Train-Custom-Data#11-create-datasetyaml) + +- 数据集配置文件 `dataset.yaml` + +yolov5通过读取yaml配置文件,获得数据集的以下信息: + + - path: 数据集的根目录 + + - train: 训练集划分,可以是一个目录,也可以是一个索引文件,或者是一个列表 + + - val: 验证集划分 + + - test: 测试集划分 + + - names: 数据集的类别信息 + +``` +# Train/val/test sets as 1) dir: path/to/imgs, 2) file: path/to/imgs.txt, or 3) list: [path/to/imgs1, path/to/imgs2, ..] +path: ../datasets/coco128 # dataset root dir +train: images/train2017 # train images (relative to 'path') 128 images +val: images/train2017 # val images (relative to 'path') 128 images +test: # test images (optional) + +# Classes (80 COCO classes) +names: + 0: person + 1: bicycle + 2: car + ... 
+  77: teddy bear
+  78: hair drier
+  79: toothbrush
+
+```
+
+- 数据集划分索引文件
+
+每行均为图像文件的路径,示例如下:
+```
+coco128/images/im0.jpg
+coco128/images/im1.jpg
+coco128/images/im2.jpg
+```
+
+- 标注文件
+
+    - 标注文件的路径通过图像文件的路径进行替换得到,会将其中的 `/images/` 替换为 `/labels/`,文件后缀替换为 `.txt`,具体代码如下:
+
+    ```
+    def img2label_paths(img_paths):
+        # Define label paths as a function of image paths
+        sa, sb = f'{os.sep}images{os.sep}', f'{os.sep}labels{os.sep}'  # /images/, /labels/ substrings
+        return [sb.join(x.rsplit(sa, 1)).rsplit('.', 1)[0] + '.txt' for x in img_paths]
+    ```
+
+    - 标注文件采用txt格式,每行为一个标注框,采用 `class_id x_center y_center width height` 的格式,以空格进行分割。
+
+    ![](../imgs/yolov5_ann_format.jpg)
+
+    - `class_id`: 表示标注框所属类别的整数,从0开始计数
+
+    - `x_center`: 归一化后标注框的中心 x 坐标,浮点数,取值范围为[0, 1]
+
+    - `y_center`: 归一化后标注框的中心 y 坐标,浮点数,取值范围为[0, 1]
+
+    - `width`: 归一化后的标注框宽度,浮点数,取值范围为[0, 1]
+
+    - `height`: 归一化后的标注框高度,浮点数,取值范围为[0, 1]
+
+    - 标注文件内容示例如下:
+
+    ```
+    0 0.48 0.63 0.69 0.71
+    0 0.74 0.52 0.31 0.93
+    4 0.36 0.79 0.07 0.40
+    ```
diff --git a/docs/algorithms/yolov8.md b/docs/algorithms/yolov8.md
new file mode 100644
index 0000000..e69de29
diff --git a/docs/common_image_error.md b/docs/common_image_error.md
new file mode 100644
index 0000000..628f111
--- /dev/null
+++ b/docs/common_image_error.md
@@ -0,0 +1,11 @@
+# 常见自制镜像错误
+
+## 训练镜像
+
+- 写训练结果精度时数据格式为tensor或numpy等,而不是基本的float
+```
+result = torch.tensor(0.5)
+evaluation_result = dict(mAP=result)  # 应改为 evaluation_result = dict(mAP=result.item())
+
+yaml.representer.RepresenterError: ('cannot represent an object', 0.39)
+```
diff --git a/docs/debug.png b/docs/debug.png
new file mode 100644
index 0000000..e439ca6
Binary files /dev/null and b/docs/debug.png differ
diff --git a/docs/design_doc/ymir_call_image.md b/docs/design_doc/ymir_call_image.md
new file mode 100644
index 0000000..5d95d3b
--- /dev/null
+++ b/docs/design_doc/ymir_call_image.md
@@ -0,0 +1,115 @@
+# 设计文档
+
+本文档介绍ymir平台调用镜像的过程。
+
+- 镜像是ymir平台的核心组件,ymir平台通过调用镜像实现模型训练、推理及挖掘功能。
+
+- 用户可以通过使用不同的镜像,满足不同场景的速度、精度及部署要求。
+
+![ ](../imgs/ymir-design.png)
+
+## 新增镜像
+
+- 即 `镜像管理/我的镜像/新增镜像`
+
+- 新增镜像时,ymir平台将根据镜像地址解析镜像中/img-man下的四个yaml文件
+
+```
+training-template.yaml
+mining-template.yaml
+infer-template.yaml
+manifest.yaml
+```
+
+- 如果镜像地址在本地不存在,将会调用 `docker pull` 进行下载。由于docker hub在国外,有时可能因网络原因而失败,可以尝试[镜像代理加速](https://dockerproxy.com/)
+
+- training-template.yaml 中包含训练任务的默认超参数
+
+- mining-template.yaml 中包含挖掘任务的默认超参数
+
+- infer-template.yaml 中包含推理任务的默认超参数
+
+- manifest.yaml 指明镜像的目标类型,是目标检测、语义分割还是实例分割镜像
+
+## 模型训练
+
+- 在ymir平台选择好训练镜像、训练集、训练目标、验证集、预训练模型、GPU数量及超参数。ymir平台将建立 in 与 out 目录,并挂载到镜像中的 /in 与 /out 目录
+
+### in 目录内容
+
+- 训练集与验证集图片:训练集与验证集的图片均会通过软链接的方式将根目录链接到 in/assets,对应图片的路径输出到 in/train-index.tsv 与 in/val-index.tsv
+
+- 训练集与验证集标注:ymir平台将根据超参数中的`export_format`决定标注的格式,参考[数据集格式](../overview/dataset-format.md)
+
+- 训练目标:ymir平台将训练目标附加到超参数中的 `class_names` 字段
+
+- GPU数量:ymir平台自动选择空闲的GPU(如显存占用率<30%),进行gpu_id映射后,附加到超参数中的 `gpu_ids` 字段
+
+- 预训练模型:ymir平台会将预训练模型解压到 in/models,并将其中的文件路径附加到超参数中的 `pretrained_model_params` 字段
+
+- 超参数:ymir平台将在网页上显示 training-template.yaml 中的默认超参数,用户修改后,将与上面附加的字段一起保存到 in/config.yaml
+
+### out 目录内容
+
+- 镜像运行需要产生 /out/monitor.txt、/out/tensorboard/xxx 及 /out/models/result.yaml
+
+- models目录中存放模型权重及训练结果文件/out/models/result.yaml,ymir平台将依据result.yaml打包权重,显示模型精度。
+
+- ymir平台将链接tensorboard目录到 out/tensorboard,镜像的训练日志需要保存在此
+
+- 镜像训练的进度(0到1之间)需要实时写到/out/monitor.txt,ymir平台依此在页面上显示进度,估计剩余时间。
+
+### in out 目录内容示例
+
+```
+.
+├── in
+│   ├── annotations [257 entries exceeds filelimit, not opening dir]
+│   ├── assets -> /home/ymir/ymir/ymir-workplace/sandbox/0001/training_asset_cache
+│   ├── config.yaml
+│   ├── env.yaml
+│   ├── models
+│   ├── train-index.tsv
+│   └── val-index.tsv
+├── out
+│   ├── models [29 entries exceeds filelimit, not opening dir]
+│   ├── monitor.txt
+│   ├── tensorboard -> /home/ymir/ymir/ymir-workplace/ymir-tensorboard-logs/0001/t00000010000028774b61663839849
+│   └── ymir-executor-out.log
+└── task_config.yaml
+```
+
+### result.yaml 示例
+```
+best_stage_name: epoch2
+mAP: 0.5509647407646582
+model_stages:
+  epoch1:
+    files:
+    - epoch1.pt
+    mAP: 0.2869113044394813
+    stage_name: epoch1
+    timestamp: 1663839980
+  epoch2:
+    files:
+    - epoch2.pt
+    mAP: 0.5509647407646582
+    stage_name: epoch2
+    timestamp: 1663840020
+```
+
+## 模型推理与挖掘
+
+- 大致上与模型训练类似,下面是不同点
+
+### in 目录内容
+
+- 推理集/挖掘集图片:图片均会通过软链接的方式将根目录链接到 in/assets,对应路径均输出到 in/candidate-index.tsv
+
+- 预训练模型:ymir平台会将预训练模型解压到 in/models,并将其中的文件路径附加到超参数中的 `model_params_path` 字段
+
+### out 目录内容
+
+- 模型推理需要产生 /out/monitor.txt 与 /out/infer-result.json
+
+- 模型挖掘需要产生 /out/monitor.txt 与 /out/result.tsv
diff --git a/docs/failed_tensorboard_task_id.png b/docs/failed_tensorboard_task_id.png
new file mode 100644
index 0000000..e34d899
Binary files /dev/null and b/docs/failed_tensorboard_task_id.png differ
diff --git a/docs/failed_training_task.png b/docs/failed_training_task.png
new file mode 100644
index 0000000..861aef7
Binary files /dev/null and b/docs/failed_training_task.png differ
diff --git a/docs/fast_custom/custom_hyper_parameter.md b/docs/fast_custom/custom_hyper_parameter.md
new file mode 100644
index 0000000..eaab406
--- /dev/null
+++ b/docs/fast_custom/custom_hyper_parameter.md
@@ -0,0 +1,101 @@
+# 修改镜像超参数
+
+## ymir后台如何获取镜像超参数
+
+- 通过解析镜像中 `/img-man/training-template.yaml` 获得训练的超参数,若文件不存在则标记镜像不支持训练。
+
+- 通过解析镜像中 `/img-man/infer-template.yaml` 获得推理的超参数,若文件不存在则标记镜像不支持推理。
+
+- 通过解析镜像中 `/img-man/mining-template.yaml` 获得挖掘的超参数,若文件不存在则标记镜像不支持挖掘。
+
+以 `youdaoyzbx/ymir-executor:ymir2.0.0-yolov5-cu111-tmi` 为例,可执行以下命令查看镜像对应超参数
+
+```
+docker run --rm youdaoyzbx/ymir-executor:ymir2.0.0-yolov5-cu111-tmi cat /img-man/training-template.yaml
+
+# 输出结果
+# training template for your executor app
+# after build image, it should at /img-man/training-template.yaml
+# key: gpu_id, task_id, pretrained_model_params, class_names should be preserved
+
+# gpu_id: '0'
+# task_id: 'default-training-task'
+# pretrained_model_params: []
+# class_names: []
+
+shm_size: '128G'
+export_format: 'ark:raw'
+model: 'yolov5s'
+batch_size_per_gpu: 16
+num_workers_per_gpu: 4
+epochs: 100
+img_size: 640
+opset: 11
+args_options: '--exist-ok'
+save_best_only: True  # save the best weight file only
+save_period: 10
+sync_bn: False  # work for multi-gpu only
+ymir_saved_file_patterns: ''  # custom saved files, support python regular expression, use , to split multiple pattern
+```
+
+注:同名镜像在后台更新超参数配置文件如 `/img-man/training-template.yaml` 后,需要在 ymir 网页端重新添加,使超参数配置生效。
+
+## 如何更新镜像默认的超参数
+
+准备以下文件与对应内容:
+
+- training-template.yaml
+
+    ```
+    model: 'yolov5n'  # change from yolov5s --> yolov5n
+    batch_size_per_gpu: 2  # change from 16 --> 2
+    num_workers_per_gpu: 2  # change from 4 --> 2
+    epochs: 100
+    img_size: 640
+    opset: 12  # change from 11 --> 12
+    args_options: '--exist-ok'
+    save_best_only: True  # save the best weight file only
+    save_period: 10
+    sync_bn: False  # work for multi-gpu only
+    ```
+
+- zzz.dockerfile
+
+```
+FROM youdaoyzbx/ymir-executor:ymir2.0.0-yolov5-cu111-tmi
+
+COPY ./training-template.yaml /img-man/training-template.yaml
+
+CMD bash /usr/bin/start.sh
+```
+
+- 执行构建命令即可获得新镜像 `youdaoyzbx/ymir-executor:ymir2.0.1-yolov5-cu111-tmi`
+
+```
+docker build -t youdaoyzbx/ymir-executor:ymir2.0.1-yolov5-cu111-tmi . -f zzz.dockerfile
+```
+
+## 如何增加或删除镜像的超参数
+
+准备以下文件与对应代码,以修改镜像中 `/app/start.py` 为例
+
+- training-template.yaml
+
+- start.py: 修改该文件内容,处理增加或删除的超参数
+
+- zzz.dockerfile
+
+```
+FROM youdaoyzbx/ymir-executor:ymir2.0.0-yolov5-cu111-tmi
+
+COPY ./training-template.yaml /img-man/training-template.yaml
+COPY ./start.py /app/start.py
+
+CMD bash /usr/bin/start.sh
+```
+
+- 执行构建命令即可获得新镜像 `youdaoyzbx/ymir-executor:ymir2.0.2-yolov5-cu111-tmi`
+
+```
+docker build -t youdaoyzbx/ymir-executor:ymir2.0.2-yolov5-cu111-tmi . -f zzz.dockerfile
+```
diff --git a/docs/image_community/det-detectron2-tmi.md b/docs/image_community/det-detectron2-tmi.md
new file mode 100644
index 0000000..e4d185d
--- /dev/null
+++ b/docs/image_community/det-detectron2-tmi.md
@@ -0,0 +1,101 @@
+# ymir-detectron2 镜像说明文档
+
+## 代码仓库
+
+> 参考[facebook/detectron2](https://github.com/facebookresearch/detectron2)
+- [modelai/ymir-detectron2](https://github.com/modelai/ymir-detectron2)
+
+## 镜像地址
+```
+youdaoyzbx/ymir-executor:ymir2.0.0-detectron2-cu111-tmi
+```
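+
+可以按《修改镜像超参数》一节的做法,用类似下面的命令查看该镜像内置的训练超参数模板(示例命令,命令形式沿用上文文档):
+
+```
+docker run --rm youdaoyzbx/ymir-executor:ymir2.0.0-detectron2-cu111-tmi cat /img-man/training-template.yaml
+```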
+
+## 性能表现
+
+> 数据参考[detectron2/Model Zoo](https://github.com/facebookresearch/detectron2/blob/main/MODEL_ZOO.md)
+
+| Name | lr sched | train time (s/iter) | inference time (s/im) | train mem (GB) | box AP | model id | download |
+| - | - | - | - | - | - | - | - |
+| R50 | 1x | 0.205 | 0.041 | 4.1 | 37.4 | 190397773 | model \| metrics |
+| R50 | 3x | 0.205 | 0.041 | 4.1 | 38.7 | 190397829 | model \| metrics |
+| R101 | 3x | 0.291 | 0.054 | 5.2 | 40.4 | 190397697 | model \| metrics |
+
+
+## 训练参数
+
+| 超参数 | 默认值 | 类型 | 说明 | 建议 |
+| - | - | - | - | - |
+| hyper-parameter | default value | type | note | advice |
+| batch_size | 2 | 整数 | batch size 大小 | - |
+| config_file | configs/COCO-Detection/retinanet_R_50_FPN_1x.yaml | 文件路径 | 配置文件路径 | 参考 [configs/COCO-Detection](https://github.com/modelai/ymir-detectron2/tree/ymir/configs/COCO-Detection) |
+| max_iter | 90000 | 整数 | 最大训练次数 | - |
+| learning_rate | 0.001 | 浮点数 | 学习率 | - |
+| args_options | '' | 字符串 | 命令行参数 | 参考 [default_argument_parser](https://github.com/modelai/ymir-detectron2/blob/ymir/detectron2/engine/defaults.py) |
+
+## 推理参数
+
+| 超参数 | 默认值 | 类型 | 说明 | 建议 |
+| - | - | - | - | - |
+| hyper-parameter | default value | type | note | advice |
+| conf_threshold | 0.2 | 浮点数 | 置信度阈值 | 采用默认值 |
+
+## 挖掘参数
+
+| 超参数 | 默认值 | 类型 | 说明 | 建议 |
+| - | - | - | - | - |
+| hyper-parameter | default value | type | note | advice |
+| conf_threshold | 0.2 | 浮点数 | 置信度阈值 | 采用默认值 |
+
+
+## 引用
+```
+@misc{wu2019detectron2,
+  author =       {Yuxin Wu and Alexander Kirillov and Francisco Massa and
+                  Wan-Yen Lo and Ross Girshick},
+  title =        {Detectron2},
+  howpublished = {\url{https://github.com/facebookresearch/detectron2}},
+  year =         {2019}
+}
+```
diff --git a/docs/image_community/det-mmdet-tmi.md b/docs/image_community/det-mmdet-tmi.md
new file mode 100644
index 0000000..d6a6ca9
--- /dev/null
+++ b/docs/image_community/det-mmdet-tmi.md
@@ -0,0 +1,90 @@
+# ymir-mmdetection 镜像说明文档
+
+## 仓库地址
+
+> 参考[mmdetection](https://github.com/open-mmlab/mmdetection)
+
+- [det-mmdetection-tmi](https://github.com/modelai/ymir-executor-fork/tree/master/det-mmdetection-tmi)
+
+## 镜像地址
+```
+youdaoyzbx/ymir-executor:ymir2.0.0-mmdet-cu111-tmi
+youdaoyzbx/ymir-executor:ymir2.0.2-mmdet-cu111-tmi
+```
+
+## 性能表现
+
+> 参考[mmdetection官方数据](https://github.com/open-mmlab/mmdetection/blob/master/configs/yolox/README.md)
+
+| Backbone | size | Mem (GB) | box AP | Config | Download |
+| :--------: | :--: | :------: | :----: | :------: | :------: |
+| YOLOX-tiny | 416 | 3.5 | 32.0 | [config](https://github.com/open-mmlab/mmdetection/tree/master/configs/yolox/yolox_tiny_8x8_300e_coco.py) | [model](https://download.openmmlab.com/mmdetection/v2.0/yolox/yolox_tiny_8x8_300e_coco/yolox_tiny_8x8_300e_coco_20211124_171234-b4047906.pth) \| [log](https://download.openmmlab.com/mmdetection/v2.0/yolox/yolox_tiny_8x8_300e_coco/yolox_tiny_8x8_300e_coco_20211124_171234.log.json) |
+| YOLOX-s | 640 | 7.6 | 40.5 | [config](https://github.com/open-mmlab/mmdetection/tree/master/configs/yolox/yolox_s_8x8_300e_coco.py) | [model](https://download.openmmlab.com/mmdetection/v2.0/yolox/yolox_s_8x8_300e_coco/yolox_s_8x8_300e_coco_20211121_095711-4592a793.pth) \| [log](https://download.openmmlab.com/mmdetection/v2.0/yolox/yolox_s_8x8_300e_coco/yolox_s_8x8_300e_coco_20211121_095711.log.json) |
+| YOLOX-l | 640 | 19.9 | 49.4 | [config](https://github.com/open-mmlab/mmdetection/tree/master/configs/yolox/yolox_l_8x8_300e_coco.py) | [model](https://download.openmmlab.com/mmdetection/v2.0/yolox/yolox_l_8x8_300e_coco/yolox_l_8x8_300e_coco_20211126_140236-d3bd2b23.pth) \| [log](https://download.openmmlab.com/mmdetection/v2.0/yolox/yolox_l_8x8_300e_coco/yolox_l_8x8_300e_coco_20211126_140236.log.json) |
+| YOLOX-x | 640 | 28.1 | 50.9 | [config](https://github.com/open-mmlab/mmdetection/tree/master/configs/yolox/yolox_x_8x8_300e_coco.py) | [model](https://download.openmmlab.com/mmdetection/v2.0/yolox/yolox_x_8x8_300e_coco/yolox_x_8x8_300e_coco_20211126_140254-1ef88d67.pth) \| [log](https://download.openmmlab.com/mmdetection/v2.0/yolox/yolox_x_8x8_300e_coco/yolox_x_8x8_300e_coco_20211126_140254.log.json) |
+
+**说明**:
+
+1. The test score threshold is 0.001, and the box AP indicates the best AP.
+2. Due to the need for pre-training weights, we cannot reproduce the performance of the `yolox-nano` model. Please refer to https://github.com/Megvii-BaseDetection/YOLOX/issues/674 for more information.
+3. We also trained the model by the official release of YOLOX based on [Megvii-BaseDetection/YOLOX#735](https://github.com/Megvii-BaseDetection/YOLOX/issues/735) with commit ID [38c633](https://github.com/Megvii-BaseDetection/YOLOX/tree/38c633bf176462ee42b110c70e4ffe17b5753208). We found that the best AP of `YOLOX-tiny`, `YOLOX-s`, `YOLOX-l`, and `YOLOX-x` is 31.8, 40.3, 49.2, and 50.9, respectively. The performance is consistent with that of our re-implementation (see Table above) but still has a gap (0.3~0.8 AP) in comparison with the reported performance in their [README](https://github.com/Megvii-BaseDetection/YOLOX/blob/38c633bf176462ee42b110c70e4ffe17b5753208/README.md#benchmark).
+
+
+## 训练参数
+
+| 超参数 | 默认值 | 类型 | 说明 | 建议 |
+| - | - | - | - | - |
+| hyper-parameter | default value | type | note | advice |
+| shm_size | 128G | 字符串 | 受ymir后台处理,docker image 可用共享内存 | 建议大小:镜像占用GPU数 * 32G |
+| export_format | ark:raw | 字符串 | 受ymir后台处理,ymir数据集导出格式 | - |
+| config_file | configs/yolox/yolox_tiny_8x8_300e_coco.py | 文件路径 | mmdetection配置文件 | 建议采用yolox系列, 参考[det-mmdetection-tmi/configs](https://github.com/modelai/ymir-executor-fork/tree/master/det-mmdetection-tmi/configs) |
+| samples_per_gpu | 16 | 整数 | 每张GPU一次处理的图片数量 | 建议大小:显存占用<50% 可增加2倍加快训练速度 |
+| workers_per_gpu | 4 | 整数 | 每张GPU对应的数据读取进程数 | - |
+| max_epochs | 100 | 整数 | 整个数据集的训练遍历次数 | 建议:必要时分析tensorboard确定是否有必要改变,一般采用默认值即可 |
+| args_options | '' | 字符串 | 训练命令行参数 | 参考 [det-mmdetection-tmi/tools/train.py](https://github.com/modelai/ymir-executor-fork/blob/master/det-mmdetection-tmi/tools/train.py) |
+| cfg_options | '' | 字符串 | 训练命令行参数 | 参考 [det-mmdetection-tmi/tools/train.py](https://github.com/modelai/ymir-executor-fork/blob/master/det-mmdetection-tmi/tools/train.py) |
+| metric | bbox | 字符串 | 模型评测方式 | 采用默认值即可 |
+| val_interval | 1 | 整数 | 模型在验证集上评测的周期 | 设置为1,每个epoch可评测一次 |
+| max_keep_checkpoints | 1 | 整数 | 最多保存的权重文件数量 | 设置为k, 可保存k个最优权重和k个最新的权重文件,设置为-1可保存所有权重文件。 |
+
+**说明**
+1. config_file 可查看[det-mmdetection-tmi/configs](https://github.com/modelai/ymir-executor-fork/tree/master/det-mmdetection-tmi/configs)进行选择
+
+
+## 推理参数
+
+| 超参数 | 默认值 | 类型 | 说明 | 建议 |
+| - | - | - | - | - |
+| hyper-parameter | default value | type | note | advice |
+| shm_size | 128G | 字符串 | 受ymir后台处理,docker image 可用共享内存 | 建议大小:镜像占用GPU数 * 32G |
+| conf_threshold | 0.2 | 浮点数 | 推理结果置信度过滤阈值 | 设置为0可保存所有结果,设置为0.6可过滤大量结果 |
+| cfg_options | '' | 字符串 | 训练命令行参数 | 参考 [det-mmdetection-tmi/tools/train.py](https://github.com/modelai/ymir-executor-fork/blob/master/det-mmdetection-tmi/tools/train.py) |
+
+**说明**
+1. 由于没有采用批量推理技术,因此没有samples_per_gpu和workers_per_gpu选项
+
+
+## 挖掘参数
+
+| 超参数 | 默认值 | 类型 | 说明 | 建议 |
+| - | - | - | - | - |
+| hyper-parameter | default value | type | note | advice |
+| shm_size | 128G | 字符串 | 受ymir后台处理,docker image 可用共享内存 | 建议大小:镜像占用GPU数 * 32G |
+| mining_algorithm | aldd | 字符串 | 挖掘算法可选 aldd, cald, entropy 和 random | 单类建议采用aldd, 多类检测建议采用entropy |
+| cfg_options | '' | 字符串 | 训练命令行参数 | 参考 [det-mmdetection-tmi/tools/train.py](https://github.com/modelai/ymir-executor-fork/blob/master/det-mmdetection-tmi/tools/train.py) |
+
+**说明**
+1. class_distribution_scores 一些复杂的参数在此不做说明
+2. 由于没有采用批量推理技术,因此没有samples_per_gpu和workers_per_gpu选项
+
+## 论文引用
+
+```latex
+@article{yolox2021,
+  title={{YOLOX}: Exceeding YOLO Series in 2021},
+  author={Ge, Zheng and Liu, Songtao and Wang, Feng and Li, Zeming and Sun, Jian},
+  journal={arXiv preprint arXiv:2107.08430},
+  year={2021}
+}
+```
diff --git a/docs/image_community/det-nanodet-tmi.md b/docs/image_community/det-nanodet-tmi.md
new file mode 100644
index 0000000..2b68296
--- /dev/null
+++ b/docs/image_community/det-nanodet-tmi.md
@@ -0,0 +1,88 @@
+# ymir-nanodet 镜像说明文档
+
+> Super fast and high accuracy lightweight anchor-free object detection model. Real-time on mobile devices.
+
+## 代码仓库
+
+> 参考[RangiLyu/nanodet](https://github.com/RangiLyu/nanodet)
+- [modelai/ymir-nanodet](https://github.com/modelai/ymir-nanodet)
+
+## 镜像地址
+```
+youdaoyzbx/ymir-executor:ymir2.0.0-nanodet-cu111-tmi
+youdaoyzbx/ymir-executor:ymir2.0.2-nanodet-cu111-tmi
+```
+
+## 性能说明
+
+> 参考[RangiLyu/nanodet](https://github.com/RangiLyu/nanodet)
+
+Model |Resolution| mAPval<br>0.5:0.95 |CPU Latency<br>(i7-8700) |ARM Latency<br>(4xA76) | FLOPS | Params | Model Size
+:-------------:|:--------:|:-------:|:--------------------:|:--------------------:|:----------:|:---------:|:-------:
+NanoDet-m | 320*320 | 20.6 | **4.98ms** | **10.23ms** | **0.72G** | **0.95M** | **1.8MB(FP16)** | **980KB(INT8)**
+**NanoDet-Plus-m** | 320*320 | **27.0** | **5.25ms** | **11.97ms** | **0.9G** | **1.17M** | **2.3MB(FP16)** | **1.2MB(INT8)**
+**NanoDet-Plus-m** | 416*416 | **30.4** | **8.32ms** | **19.77ms** | **1.52G** | **1.17M** | **2.3MB(FP16)** | **1.2MB(INT8)**
+**NanoDet-Plus-m-1.5x** | 320*320 | **29.9** | **7.21ms** | **15.90ms** | **1.75G** | **2.44M** | **4.7MB(FP16)** | **2.3MB(INT8)**
+**NanoDet-Plus-m-1.5x** | 416*416 | **34.1** | **11.50ms** | **25.49ms** | **2.97G** | **2.44M** | **4.7MB(FP16)** | **2.3MB(INT8)**
+YOLOv3-Tiny | 416*416 | 16.6 | - | 37.6ms | 5.62G | 8.86M | 33.7MB
+YOLOv4-Tiny | 416*416 | 21.7 | - | 32.81ms | 6.96G | 6.06M | 23.0MB
+YOLOX-Nano | 416*416 | 25.8 | - | 23.08ms | 1.08G | 0.91M | 1.8MB(FP16)
+YOLOv5-n | 640*640 | 28.4 | - | 44.39ms | 4.5G | 1.9M | 3.8MB(FP16)
+FBNetV5 | 320*640 | 30.4 | - | - | 1.8G | - | -
+MobileDet | 320*320 | 25.6 | - | - | 0.9G | - | -
+
+***Download pre-trained models and find more models in [Model Zoo](#model-zoo) or in [Release Files](https://github.com/RangiLyu/nanodet/releases)***
+
+
+## 训练参数
+
+| 超参数 | 默认值 | 类型 | 说明 | 建议 |
+| - | - | - | - | - |
+| hyper-parameter | default value | type | note | advice |
+| shm_size | 128G | 字符串 | 受ymir后台处理,docker image 可用共享内存 | 建议大小:镜像占用GPU数 * 32G |
+| export_format | ark:raw | 字符串 | 受ymir后台处理,ymir数据集导出格式 | - |
+| batch_size_per_gpu | 16 | 整数 | 每张GPU一次处理的图片数量 | 建议大小:显存占用<50% 可增加2倍加快训练速度 |
+| workers_per_gpu | 4 | 整数 | 每张GPU对应的数据读取进程数 | - |
+| config_file | config/nanodet-plus-m_416.yml | 文件路径 | 配置文件路径 | 参考[config](https://github.com/modelai/ymir-nanodet/tree/ymir-dev/config) |
+| epochs | 100 | 整数 | 整个数据集的训练遍历次数 | 建议:必要时分析tensorboard确定是否有必要改变,一般采用默认值即可 |
+| input_size | -1 | 整数 | 输入模型的图像分辨率 | -1表示采用config_file中定义的图像大小 |
+| learning_rate | -1 | 浮点数 | 学习率 | -1表示采用config_file中定义的学习率 |
+| resume | False | 布尔型 | 是否继续训练 | 设置为True可实现提前中断与继续训练功能 |
+| load_from | '' | 文件路径 | 加载权重位置 | 设置后可加载指定位置的权重文件 |
+
+
+## 推理参数
+
+| 超参数 | 默认值 | 类型 | 说明 | 建议 |
+| - | - | - | - | - |
+| hyper-parameter | default value | type | note | advice |
+| batch_size_per_gpu | 16 | 整数 | 每张GPU一次处理的图片数量 | 建议大小:显存占用<50% 可增加2倍加快训练速度 |
+| workers_per_gpu | 4 | 整数 | 每张GPU对应的数据读取进程数 | - |
+| conf_thres | 0.35 | 浮点数 | 置信度阈值 | - |
+| pin_memory | False | 布尔型 | 是否为数据集单独固定内存? | 内存充足时改为True可加快数据集加载 |
+
+
+## 挖掘参数
+
+| 超参数 | 默认值 | 类型 | 说明 | 建议 |
+| - | - | - | - | - |
+| hyper-parameter | default value | type | note | advice |
+| batch_size_per_gpu | 16 | 整数 | 每张GPU一次处理的图片数量 | 建议大小:显存占用<50% 可增加2倍加快训练速度 |
+| workers_per_gpu | 4 | 整数 | 每张GPU对应的数据读取进程数 | - |
+| conf_thres | 0.35 | 浮点数 | 置信度阈值 | - |
+| pin_memory | False | 布尔型 | 是否为数据集单独固定内存? | 内存充足时改为True可加快数据集加载 |
+
+**说明**
+1. nanodet仅支持aldd挖掘算法
+
+
+## 引用
+
+```
+@misc{=nanodet,
+    title={NanoDet-Plus: Super fast and high accuracy lightweight anchor-free object detection model.},
+    author={RangiLyu},
+    howpublished = {\url{https://github.com/RangiLyu/nanodet}},
+    year={2021}
+}
+```
diff --git a/docs/image_community/det-vidt-tmi.md b/docs/image_community/det-vidt-tmi.md
new file mode 100644
index 0000000..8b8f09b
--- /dev/null
+++ b/docs/image_community/det-vidt-tmi.md
@@ -0,0 +1,65 @@
+# ymir-vidt 镜像说明文档
+
+ICLR 2022的 transformer 架构检测器
+
+## 代码仓库
+
+> 参考[naver-ai/vidt](https://github.com/naver-ai/vidt)
+- [modelai/ymir-vidt](https://github.com/modelai/ymir-vidt)
+
+## 镜像地址
+```
+youdaoyzbx/ymir-executor:ymir2.0.0-vidt-cu111-tmi
+```
+
+## 性能表现
+
+> 数据参考[naver-ai/vidt](https://github.com/naver-ai/vidt)
+
+| Backbone | Epochs | AP | AP50 | AP75 | AP_S | AP_M | AP_L | Params | FPS | Checkpoint / Log |
+| :-----: | :-----: | :-----: | :-----: | :-----: | :-----: | :-----: | :-----: | :-----: | :-----: | :-----: |
+| `Swin-nano` | 50 (150) | 40.4 (42.6) | 59.9 (62.2) | 43.0 (45.7) | 23.1 (24.9) | 42.8 (45.4) | 55.9 (59.1) | 16M | 20.0 | [Github](https://github.com/naver-ai/vidt/releases/download/v0.1-vidt/vidt_nano_50.pth) / [Log](https://github.com/naver-ai/vidt/releases/download/v0.1-vidt/vidt_nano_50.txt)<br>([Github](https://github.com/naver-ai/vidt/releases/download/v0.1-vidt/vidt_nano_150.pth) / [Log](https://github.com/naver-ai/vidt/releases/download/v0.1-vidt/vidt_nano_150.txt))|
+| `Swin-tiny` | 50 (150)| 44.9 (47.2) | 64.7 (66.7) | 48.3 (51.4) | 27.5 (28.4) | 47.9 (50.2) | 61.9 (64.7) | 38M | 17.2 | [Github](https://github.com/naver-ai/vidt/releases/download/v0.1-vidt/vidt_tiny_50.pth) / [Log](https://github.com/naver-ai/vidt/releases/download/v0.1-vidt/vidt_tiny_50.txt)<br>([Github](https://github.com/naver-ai/vidt/releases/download/v0.1-vidt/vidt_tiny_150.pth) / [Log](https://github.com/naver-ai/vidt/releases/download/v0.1-vidt/vidt_tiny_150.txt))|
+| `Swin-small` | 50 (150) | 47.4 (48.8) | 67.7 (68.8) | 51.2 (53.0) | 30.4 (30.7) | 50.7 (52.0) | 64.6 (65.9) | 60M | 12.1 | [Github](https://github.com/naver-ai/vidt/releases/download/v0.1-vidt/vidt_small_50.pth) / [Log](https://github.com/naver-ai/vidt/releases/download/v0.1-vidt/vidt_small_50.txt)<br>([Github](https://github.com/naver-ai/vidt/releases/download/v0.1-vidt/vidt_small_150.pth) / [Log](https://github.com/naver-ai/vidt/releases/download/v0.1-vidt/vidt_small_150.txt))|
+| `Swin-base` | 50 (150) | 49.4 (50.4) | 69.6 (70.4) | 53.4 (54.8) | 31.6 (34.1) | 52.4 (54.2) | 66.8 (67.4) | 0.1B | 9.0 | [Github](https://github.com/naver-ai/vidt/releases/download/v0.1-vidt/vidt_base_50.pth) / [Log](https://github.com/naver-ai/vidt/releases/download/v0.1-vidt/vidt_base_50.txt)<br>([Github](https://github.com/naver-ai/vidt/releases/download/v0.1-vidt/vidt_base_150.pth) / [Log](https://github.com/naver-ai/vidt/releases/download/v0.1-vidt/vidt_base_150.txt)) |
+
+
+## 训练参数
+
+| 超参数 | 默认值 | 类型 | 说明 | 建议 |
+| - | - | - | - | - |
+| hyper-parameter | default value | type | note | advice |
+| shm_size | 128G | 字符串 | 受ymir后台处理,docker image 可用共享内存 | 建议大小:镜像占用GPU数 * 32G |
+| export_format | ark:raw | 字符串 | 受ymir后台处理,ymir数据集导出格式 | - |
+| backbone_name | swin_nano | 字符串 | 骨架网络,可选swin_nano, swin_tiny, swin_small, swin_base | - |
+| batch_size_per_gpu | 16 | 整数 | 每张GPU一次处理的图片数量 | 建议大小:显存占用<50% 可增加2倍加快训练速度 |
+| num_workers_per_gpu | 4 | 整数 | 每张GPU对应的数据读取进程数 | - |
+| epochs | 50 | 整数 | 整个数据集的训练遍历次数 | 建议:必要时分析tensorboard确定是否有必要改变,一般采用默认值即可 |
+| learning_rate | 0.0001 | 浮点数 | 学习率 | - |
+| eval_size | 640 | 整数 | 输入网络的图片大小 | - |
+| weight_save_interval | 100 | 整数 | 权重文件保存间隔 | - |
+| args_options | '' | 字符串 | 命令行参数 | 参考 [get_args_parser](https://github.com/modelai/ymir-vidt/blob/ymir-dev/arguments.py) |
+
+## 推理参数
+
+| 超参数 | 默认值 | 类型 | 说明 | 建议 |
+| - | - | - | - | - |
+| hyper-parameter | default value | type | note | advice |
+| conf_threshold | 0.2 | 浮点数 | 置信度阈值 | 采用默认值 |
+
+## 挖掘参数
+
+| 超参数 | 默认值 | 类型 | 说明 | 建议 |
+| - | - | - | - | - |
+| hyper-parameter | default value | type | note | advice |
+| conf_threshold | 0.2 | 浮点数 | 置信度阈值 | 采用默认值 |
+
+## 引用
+```
+@inproceedings{song2022vidt,
+  title={ViDT: An Efficient and Effective Fully Transformer-based Object Detector},
+  author={Song, Hwanjun and Sun, Deqing and Chun, Sanghyuk and Jampani, Varun and Han, Dongyoon and Heo, Byeongho and Kim, Wonjae and Yang, Ming-Hsuan},
+  booktitle={International Conference on Learning Representation},
+  year={2022}
+}
+```
diff --git a/docs/image_community/det-yolov4-tmi.md b/docs/image_community/det-yolov4-tmi.md
new file mode 100644
index 0000000..f07698f
--- /dev/null
+++ b/docs/image_community/det-yolov4-tmi.md
@@ -0,0 +1,67 @@
+# ymir-yolov4 镜像说明文档
+
+## 仓库地址
+
+> 参考仓库 [AlexeyAB/darknet](https://github.com/AlexeyAB/darknet)
+- [det-yolov4-tmi](https://github.com/modelai/ymir-executor-fork/tree/master/det-yolov4-tmi)
+
+## 镜像地址
+```
+youdaoyzbx/ymir-executor:ymir2.0.0-yolov4-cu112-tmi
+```
+
+## 性能表现
+
+> 参考文档 [yolov4 model zoo](https://github.com/AlexeyAB/darknet/wiki/YOLOv4-model-zoo)
+
+| model | size | mAP@0.5:0.95 | mAP@0.5 |
+| - | - | - | - |
+| yolov4 | 608 | 43.5 | 65.7 |
+| yolov4-Leaky | 608 | 42.9 | 65.3 |
+| yolov4-Mish | 608 | 43.8 | 65.6 |
+
+## 训练参数
+
+| 超参数 | 默认值 | 类型 | 说明 | 建议 |
+| - | - | - | - | - |
+| hyper-parameter | default value | type | note | advice |
+| shm_size | 128G | 字符串 | 受ymir后台处理,docker image 可用共享内存 | 建议大小:镜像占用GPU数 * 32G |
+| export_format | ark:raw | 字符串 | 受ymir后台处理,ymir数据集导出格式 | - |
+| image_height | 608 | 整数 | 输入网络的图像高度 | 采用 32的整数倍,如416, 512, 608 |
+| image_width | 608 | 整数 | 输入网络的图像宽度 | 采用 32的整数倍,如416, 512, 608 |
+| learning_rate | 0.0013 | 浮点数 | 学习率 | 采用默认值即可 |
+| max_batches | 20000 | 整数 | 训练次数 | 如要减少训练时间,可减少max_batches |
+| warmup_iterations | 1000 | 整数 | 预热训练次数 | 采用默认值即可 |
+| batch | 64 | 整数 | 累计梯度的批处理大小,即batch size | 采用默认值即可 |
+| subdivisions | 64 | 整数 | 累计梯度的次数 | 需要是batch参数的因数,如32。其中64表示一次加载一张图片,累计梯度64次;32表示一次加载两张图片,共累计32次。实际的batch size均为64。|
+
+**说明**
+1. 过于复杂的参数anchors不做说明,保持默认即可
+
+
+## 推理参数
+
+| 超参数 | 默认值 | 类型 | 说明 | 建议 |
+| - | - | - | - | - |
+| hyper-parameter | default value | type | note | advice |
+| shm_size | 128G | 字符串 | 受ymir后台处理,docker image 可用共享内存 | 建议大小:镜像占用GPU数 * 32G |
+| image_height | 608 | 整数 | 输入网络的图像高度 | 采用 32的整数倍,如416, 512, 608 |
+| image_width | 608 | 整数 | 输入网络的图像宽度 | 采用 32的整数倍,如416, 512, 608 |
+| confidence_thresh | 0.1 | 浮点数 | 置信度阈值 | - |
+| nms_thresh | 0.45 | 浮点数 | nms时的iou阈值 | - |
+| max_boxes | 50 | 整数 | 每张图像最多检测的目标数量 | - |
+
+## 挖掘参数
+
+| 超参数 | 默认值 | 类型 | 说明 | 建议 |
+| - | - | - | - | - |
+| hyper-parameter | default value | type | note | advice |
+| shm_size | 128G | 字符串 | 受ymir后台处理,docker image 可用共享内存 | 建议大小:镜像占用GPU数 * 32G |
+| data_workers | 28 | 整数 | 读取数据时使用的进程数量 | - |
+| strategy | aldd_yolo | 字符串 | 挖掘算法 | - |
+| image_height | 608 | 整数 | 输入网络的图像高度 | 采用 32的整数倍,如416, 512, 608 |
+| image_width | 608 | 整数 | 输入网络的图像宽度 | 采用 32的整数倍,如416, 512, 608 |
+| batch_size | 4 | 整数 | 批处理大小 | - |
+| confidence_thresh | 0.1 | 浮点数 | 置信度阈值 | - |
+| nms_thresh | 0.45 | 浮点数 | nms时的iou阈值 | - |
+| max_boxes | 50 | 整数 | 每张图像最多检测的目标数量 | - |
diff --git a/docs/image_community/det-yolov5-automl-tmi.md b/docs/image_community/det-yolov5-automl-tmi.md
new file mode 100644
index 0000000..f07fbb7
--- /dev/null
+++ b/docs/image_community/det-yolov5-automl-tmi.md
@@ -0,0 +1,37 @@
+# ymir-yolov5 automl 镜像说明文档
+
+## 仓库地址
+
+> 参考[ultralytics/yolov5](https://github.com/ultralytics/yolov5)
+- [modelai/ymir-yolov5](https://github.com/modelai/ymir-yolov5/tree/ymir-automl)
+
+## 镜像地址
+
+```
+youdaoyzbx/ymir-executor:ymir2.0.0-yolov5-cu111-tmi
+youdaoyzbx/ymir-executor:ymir2.0.0-yolov5-cu102-tmi
+```
+
+## 性能表现
+
+|Model |size<br>(pixels) |mAPval<br>0.5:0.95 |mAPval<br>0.5 |Speed<br>CPU b1<br>(ms) |Speed<br>V100 b1<br>(ms) |Speed<br>V100 b32<br>(ms) |params<br>(M) |FLOPs<br>@640 (B)
+|--- |--- |--- |--- |--- |--- |--- |--- |---
+|[YOLOv5n] |640 |28.0 |45.7 |**45** |**6.3**|**0.6**|**1.9**|**4.5**
+|[YOLOv5s] |640 |37.4 |56.8 |98 |6.4 |0.9 |7.2 |16.5
+|[YOLOv5m] |640 |45.4 |64.1 |224 |8.2 |1.7 |21.2 |49.0
+|[YOLOv5l] |640 |49.0 |67.3 |430 |10.1 |2.7 |46.5 |109.1
+|[YOLOv5x] |640 |50.7 |68.9 |766 |12.1 |4.8 |86.7 |205.7
+| | | | | | | | |
+|[YOLOv5n6] |1280 |36.0 |54.4 |153 |8.1 |2.1 |3.2 |4.6
+|[YOLOv5s6] |1280 |44.8 |63.7 |385 |8.2 |3.6 |16.8 |12.6
+|[YOLOv5m6] |1280 |51.3 |69.3 |887 |11.1 |6.8 |35.7 |50.0
+|[YOLOv5l6] |1280 |53.7 |71.3 |1784 |15.8 |10.5 |76.8 |111.4
+
+## 训练/推理/挖掘参数
+
+| 超参数 | 默认值 | 类型 | 说明 | 建议 |
+| - | - | - | - | - |
+| hyper-parameter | default value | type | note | advice |
+| fast | true | 布尔型 | True表示要求速度快 | True, true, False, false 大小写均支持 |
+| accurate | true | 布尔型 | True表示要求精度高 | True, true, False, false 大小写均支持 |
diff --git a/docs/image_community/det-yolov5-tmi.md b/docs/image_community/det-yolov5-tmi.md
new file mode 100644
index 0000000..f4f3475
--- /dev/null
+++ b/docs/image_community/det-yolov5-tmi.md
@@ -0,0 +1,80 @@
+# ymir-yolov5 镜像说明文档
+
+
+## 仓库地址
+
+> 参考[ultralytics/yolov5](https://github.com/ultralytics/yolov5)
+- [modelai/ymir-executor-fork](https://github.com/modelai/ymir-executor-fork/tree/master/det-yolov5-tmi)
+
+## 镜像地址
+
+```
+youdaoyzbx/ymir-executor:ymir2.0.0-yolov5-cu111-tmi
+youdaoyzbx/ymir-executor:ymir2.0.0-yolov5-cu102-tmi
+```
+
+## 性能表现
+
+|Model |size<br>(pixels) |mAPval<br>0.5:0.95 |mAPval<br>0.5 |Speed<br>CPU b1<br>(ms) |Speed<br>V100 b1<br>(ms) |Speed<br>V100 b32<br>(ms) |params<br>(M) |FLOPs<br>@640 (B)
+|--- |--- |--- |--- |--- |--- |--- |--- |---
+|[YOLOv5n] |640 |28.0 |45.7 |**45** |**6.3**|**0.6**|**1.9**|**4.5**
+|[YOLOv5s] |640 |37.4 |56.8 |98 |6.4 |0.9 |7.2 |16.5
+|[YOLOv5m] |640 |45.4 |64.1 |224 |8.2 |1.7 |21.2 |49.0
+|[YOLOv5l] |640 |49.0 |67.3 |430 |10.1 |2.7 |46.5 |109.1
+|[YOLOv5x] |640 |50.7 |68.9 |766 |12.1 |4.8 |86.7 |205.7
+| | | | | | | | |
+|[YOLOv5n6] |1280 |36.0 |54.4 |153 |8.1 |2.1 |3.2 |4.6
+|[YOLOv5s6] |1280 |44.8 |63.7 |385 |8.2 |3.6 |16.8 |12.6
+|[YOLOv5m6] |1280 |51.3 |69.3 |887 |11.1 |6.8 |35.7 |50.0
+|[YOLOv5l6] |1280 |53.7 |71.3 |1784 |15.8 |10.5 |76.8 |111.4
+
+## 训练参数
+
+| 超参数 | 默认值 | 类型 | 说明 | 建议 |
+| - | - | - | - | - |
+| hyper-parameter | default value | type | note | advice |
+| shm_size | 128G | 字符串 | 受ymir后台处理,docker image 可用共享内存 | 建议大小:镜像占用GPU数 * 32G |
+| export_format | ark:raw | 字符串 | 受ymir后台处理,ymir数据集导出格式 | - |
+| model | yolov5s | 字符串 | yolov5模型,可选yolov5n, yolov5s, yolov5m, yolov5l等 | 建议:速度快选yolov5n, 精度高选yolov5l, yolov5x, 平衡选yolov5s或yolov5m |
+| batch_size_per_gpu | 16 | 整数 | 每张GPU一次处理的图片数量 | 建议大小:显存占用<50% 可增加2倍加快训练速度 |
+| num_workers_per_gpu | 4 | 整数 | 每张GPU对应的数据读取进程数 | - |
+| epochs | 100 | 整数 | 整个数据集的训练遍历次数 | 建议:必要时分析tensorboard确定是否有必要改变,一般采用默认值即可 |
+| img_size | 640 | 整数 | 输入模型的图像分辨率 | - |
+| opset | 11 | 整数 | onnx 导出参数 opset | 建议:一般不需要用到onnx,不必改 |
+| args_options | '--exist-ok' | 字符串 | yolov5命令行参数 | 建议:专业用户可用yolov5所有命令行参数 |
+| save_best_only | True | 布尔型 | 是否只保存最优模型 | 建议:为节省空间设为True即可 |
+| save_period | 10 | 整数 | 保存模型的间隔 | 建议:当save_best_only为False时,可保存 `epoch/save_period` 个中间结果 |
+| sync_bn | False | 布尔型 | 是否同步各gpu上的归一化层 | 建议:开启以提高训练稳定性及精度 |
+| activation | '' | 字符串 | 激活函数,默认为nn.Hardswish(), 参考 [pytorch激活函数](https://pytorch.org/docs/stable/nn.html#non-linear-activations-weighted-sum-nonlinearity) | 可选值: ELU, Hardswish, LeakyReLU, PReLU, ReLU, ReLU6, SiLU, ... |
+
+## 推理参数
+
+| 超参数 | 默认值 | 类型 | 说明 | 建议 |
+| - | - | - | - | - |
+| hyper-parameter | default value | type | note | advice |
+| img_size | 640 | 整数 | 模型的输入图像大小 | 采用32的整数倍,224 = 32*7 以上大小 |
+| conf_thres | 0.25 | 浮点数 | 置信度阈值 | 采用默认值 |
+| iou_thres | 0.45 | 浮点数 | nms时的iou阈值 | 采用默认值 |
+| batch_size_per_gpu | 16 | 整数 | 每张GPU一次处理的图片数量 | 建议大小:显存占用<50% 可增加1倍加快训练速度 |
+| num_workers_per_gpu | 4 | 整数 | 每张GPU对应的数据读取进程数 | - |
+| shm_size | 128G | 字符串 | 受ymir后台处理,docker image 可用共享内存 | 建议大小:镜像占用GPU数 * 32G |
+| pin_memory | False | 布尔型 | 是否为数据集单独固定内存? | 内存充足时改为True可加快数据集加载 |
+
+
+## 挖掘参数
+
+| 超参数 | 默认值 | 类型 | 说明 | 建议 |
+| - | - | - | - | - |
+| hyper-parameter | default value | type | note | advice |
+| img_size | 640 | 整数 | 模型的输入图像大小 | 采用32的整数倍,224 = 32*7 以上大小 |
+| mining_algorithm | aldd | 字符串 | 挖掘算法名称,可选 random, aldd, cald, entropy | 建议单类检测采用aldd,多类检测采用entropy |
+| class_distribution_scores | '' | List[float]的字符表示 | aldd算法的类别平衡参数 | 不用更改, 专业用户可根据类别占比进行调整,如对于4类检测,用 `1.0,1.0,0.1,0.2` 降低后两类的挖掘比重 |
+| conf_thres | 0.25 | 浮点数 | 置信度阈值 | 采用默认值 |
+| iou_thres | 0.45 | 浮点数 | nms时的iou阈值 | 采用默认值 |
+| batch_size_per_gpu | 16 | 整数 | 每张GPU一次处理的图片数量 | 建议大小:显存占用<50% 可增加1倍加快训练速度 |
+| num_workers_per_gpu | 4 | 整数 | 每张GPU对应的数据读取进程数 | - |
+| shm_size | 128G | 字符串 | 受ymir后台处理,docker image 可用共享内存 | 建议大小:镜像占用GPU数 * 32G |
+| pin_memory | False | 布尔型 | 是否为数据集单独固定内存?
| 内存充足时改为True可加快数据集加载 | + diff --git a/docs/image_community/det-yolov7-tmi.md b/docs/image_community/det-yolov7-tmi.md new file mode 100644 index 0000000..3acf80d --- /dev/null +++ b/docs/image_community/det-yolov7-tmi.md @@ -0,0 +1,76 @@ +# ymir-yolov7 镜像说明文档 + +## 代码仓库 + +> 参考[WongKinYiu/yolov7](https://github.com/WongKinYiu/yolov7) +- [modelai/ymir-yolov7](https://github.com/modelai/ymir-yolov7) + +## 镜像地址 + +``` +youdaoyzbx/ymir-executor:ymir2.1.0-yolov7-cu111-tmi +``` + +## 性能表现 + +> 数据参考[WongKinYiu/yolov7](https://github.com/WongKinYiu/yolov7) + +| Model | Test Size | APtest | AP50test | AP75test | batch 1 fps | batch 32 average time | +| :-- | :-: | :-: | :-: | :-: | :-: | :-: | +| [**YOLOv7**](https://github.com/WongKinYiu/yolov7/releases/download/v0.1/yolov7.pt) | 640 | **51.4%** | **69.7%** | **55.9%** | 161 *fps* | 2.8 *ms* | +| [**YOLOv7-X**](https://github.com/WongKinYiu/yolov7/releases/download/v0.1/yolov7x.pt) | 640 | **53.1%** | **71.2%** | **57.8%** | 114 *fps* | 4.3 *ms* | +| | | | | | | | +| [**YOLOv7-W6**](https://github.com/WongKinYiu/yolov7/releases/download/v0.1/yolov7-w6.pt) | 1280 | **54.9%** | **72.6%** | **60.1%** | 84 *fps* | 7.6 *ms* | +| [**YOLOv7-E6**](https://github.com/WongKinYiu/yolov7/releases/download/v0.1/yolov7-e6.pt) | 1280 | **56.0%** | **73.5%** | **61.2%** | 56 *fps* | 12.3 *ms* | +| [**YOLOv7-D6**](https://github.com/WongKinYiu/yolov7/releases/download/v0.1/yolov7-d6.pt) | 1280 | **56.6%** | **74.0%** | **61.8%** | 44 *fps* | 15.0 *ms* | +| [**YOLOv7-E6E**](https://github.com/WongKinYiu/yolov7/releases/download/v0.1/yolov7-e6e.pt) | 1280 | **56.8%** | **74.4%** | **62.1%** | 36 *fps* | 18.7 *ms* | + + +## 训练参数 + +| 超参数 | 默认值 | 类型 | 说明 | 建议 | +| - | - | - | - | - | +| hyper-parameter | default value | type | note | advice | +| shm_size | 128G | 字符串| 受ymir后台处理,docker image 可用共享内存 | 建议大小:镜像占用GPU数 * 32G | +| export_format | ark:raw | 字符串| 受ymir后台处理,ymir数据集导出格式 | - | +| model | yolov5s | 字符串 | yolov5模型,可选yolov5n, yolov5s, yolov5m, yolov5l等 | 建议:速度快选yolov5n, 精度高选yolov5l, yolov5x, 平衡选yolov5s或yolov5m | +| batch_size_per_gpu | 16 | 整数 | 每张GPU一次处理的图片数量 | 建议大小:显存占用<50% 可增加2倍加快训练速度 | +| workers_per_gpu | 4 | 整数 | 每张GPU对应的数据读取进程数 | - | +| epochs | 100 | 整数 | 整个数据集的训练遍历次数 | 建议:必要时分析tensorboard确定是否有必要改变,一般采用默认值即可 | +| img_size | 640 | 整数 | 输入模型的图像分辨率 | - | +| args_options | '--exist-ok' | 字符串 | yolov5命令行参数 | 建议:专业用户可用yolov5所有命令行参数 | +| save_weight_file_num | 1 | 整数 | 保存最新模型的数量 | - | +| sync_bn | False | 布尔型 | 是否同步各gpu上的归一化层 | 建议:开启以提高训练稳定性及精度 | +| cfg_file | cfg/training/yolov7-tiny.yaml | 文件路径 | 模型文件路径, 对应 `--cfg` | 参考[cfg/training](https://github.com/modelai/ymir-yolov7/tree/ymir/cfg/training) | +| hyp_file | data/hyp.scratch.tiny.yaml | 文件路径 | 超参数文件路径,对应 `--hyp` | 参考[data](https://github.com/modelai/ymir-yolov7/tree/ymir/data) | +| cache_images | True | 布尔 | 是否缓存图像 | 设置为True可加快训练速度 | + + +## 推理参数 + +| 超参数 | 默认值 | 类型 | 说明 | 建议 | +| - | - | - | - | - | +| hyper-parameter | default value | type | note | advice | +| img_size | 640 | 整数 | 模型的输入图像大小 | 采用32的整数倍,224 = 32*7 以上大小 | +| conf_thres | 0.25 | 浮点数 | 置信度阈值 | 采用默认值 | +| iou_thres | 0.45 | 浮点数 | nms时的iou阈值 | 采用默认值 | + +## 挖掘参数 + +| 超参数 | 默认值 | 类型 | 说明 | 建议 | +| - | - | - | - | - | +| hyper-parameter | default value | type | note | advice | +| shm_size | 128G | 字符串| 受ymir后台处理,docker image 可用共享内存 | 建议大小:镜像占用GPU数 * 32G | +| img_size | 640 | 整数 | 模型的输入图像大小 | 采用32的整数倍,224 = 32*7 以上大小 | +| conf_thres | 0.25 | 浮点数 | 置信度阈值 | 采用默认值 | +| iou_thres | 0.45 | 浮点数 | nms时的iou阈值 | 采用默认值 | + +## 引用 +``` 
+@article{wang2022yolov7,
+  title={{YOLOv7}: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors},
+  author={Wang, Chien-Yao and Bochkovskiy, Alexey and Liao, Hong-Yuan Mark},
+  journal={arXiv preprint arXiv:2207.02696},
+  year={2022}
+}
+```
diff --git a/docs/image_community/image_community.md b/docs/image_community/image_community.md
new file mode 100644
index 0000000..9ffe7a9
--- /dev/null
+++ b/docs/image_community/image_community.md
@@ -0,0 +1,33 @@
+# 镜像社区
+
+- [镜像社区](http://pubimg.vesionbook.com:8110/img)的目的是共享用户之间制作的镜像,增加用户的可用镜像。
+
+![](../imgs/ymir_image_community.png)
+
+- 用户通过ymir平台可使用与发布镜像
+
+![](../imgs/ymir_publish_image.png)
+
+### 将镜像上传到docker hub
+可以参考[runoob/docker](https://www.runoob.com/docker/docker-repository.html),其发布流程与`git`类似。
+
+- 在[docker hub](https://hub.docker.com/) 上注册帐号,假设用户名 `<username> = youdaoyzbx`
+
+- 将本地镜像 `xxx/xxx:xxx` 添加别名,改为 `<username>/xxx:xxx` 的格式
+```
+docker pull ubuntu:18.04
+docker tag ubuntu:18.04 youdaoyzbx/ubuntu:18.04
+```
+- login 到docker hub并上传
+```
+docker login
+docker push youdaoyzbx/ubuntu:18.04
+```
+
+### 在ymir平台进行发布
+
+- 镜像地址:填写 `<username>/xxx:xxx`,需要上传到docker hub
+
+- 填写其它信息与[参数说明](./det-yolov5-tmi.md)
+
+- 点击确定并等待Ymir团队审核
diff --git a/docs/image_community/seg-mmseg-tmi.md b/docs/image_community/seg-mmseg-tmi.md
new file mode 100644
index 0000000..93b1fd7
--- /dev/null
+++ b/docs/image_community/seg-mmseg-tmi.md
@@ -0,0 +1,109 @@
+# ymir-mmsegmentation镜像说明文档
+
+- 支持任务类型: 训练、推理、挖掘
+
+- 支持算法: deeplabv3plus, fastscnn, hrnet, ocrnet 语义分割
+
+- 版本信息
+
+```
+python: 3.8.8
+pytorch: 1.8.0
+torchvision: 0.9.0
+cuda: 11.1
+cudnn: 8
+mmcv: 1.6.1
+mmsegmentation: 0.27.0+
+```
+
+## 镜像信息
+
+> 参考仓库[open-mmlab/mmsegmentation](https://github.com/open-mmlab/mmsegmentation)
+
+- 代码仓库[modelai/ymir-mmsegmentation](https://github.com/modelai/ymir-mmsegmentation)
+
+- 镜像地址
+
+```
+docker pull youdaoyzbx/ymir-executor:ymir2.1.0-mmseg-cu111-tmi
+```
+
+## 性能表现
+
+> 参考 [fastscnn](https://github.com/open-mmlab/mmsegmentation/tree/master/configs/fastscnn)
+
+### Cityscapes
+
+| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | config | download |
+| -------- | -------- | --------- | ------: | -------- | -------------- | ----: | ------------- | ------ | -------- |
+| FastSCNN | FastSCNN | 512x1024 | 160000 | 3.3 | 56.45 | 70.96 | 72.65 | [config](https://github.com/open-mmlab/mmsegmentation/blob/master/configs/fastscnn/fast_scnn_lr0.12_8x4_160k_cityscapes.py) | [model](https://download.openmmlab.com/mmsegmentation/v0.5/fast_scnn/fast_scnn_lr0.12_8x4_160k_cityscapes/fast_scnn_lr0.12_8x4_160k_cityscapes_20210630_164853-0cec9937.pth) \| [log](https://download.openmmlab.com/mmsegmentation/v0.5/fast_scnn/fast_scnn_lr0.12_8x4_160k_cityscapes/fast_scnn_lr0.12_8x4_160k_cityscapes_20210630_164853.log.json) |
+| HRNet | HRNetV2p-W18-Small | 512x1024 | 40000 | 1.7 | 23.74 | 73.86 | 75.91 | [config](https://github.com/open-mmlab/mmsegmentation/blob/master/configs/hrnet/fcn_hr18s_512x1024_40k_cityscapes.py) |
[model](https://download.openmmlab.com/mmsegmentation/v0.5/hrnet/fcn_hr18s_512x1024_40k_cityscapes/fcn_hr18s_512x1024_40k_cityscapes_20200601_014216-93db27d0.pth) \| [log](https://download.openmmlab.com/mmsegmentation/v0.5/hrnet/fcn_hr18s_512x1024_40k_cityscapes/fcn_hr18s_512x1024_40k_cityscapes_20200601_014216.log.json) | +| HRNet | HRNetV2p-W18 | 512x1024 | 40000 | 2.9 | 12.97 | 77.19 | 78.92 | [config](https://github.com/open-mmlab/mmsegmentation/blob/master/configs/hrnet/fcn_hr18_512x1024_40k_cityscapes.py) | [model](https://download.openmmlab.com/mmsegmentation/v0.5/hrnet/fcn_hr18_512x1024_40k_cityscapes/fcn_hr18_512x1024_40k_cityscapes_20200601_014216-f196fb4e.pth) \| [log](https://download.openmmlab.com/mmsegmentation/v0.5/hrnet/fcn_hr18_512x1024_40k_cityscapes/fcn_hr18_512x1024_40k_cityscapes_20200601_014216.log.json) | +| HRNet | HRNetV2p-W18-Small | 512x1024 | 80000 | - | - | 75.31 | 77.48 | [config](https://github.com/open-mmlab/mmsegmentation/blob/master/configs/hrnet/fcn_hr18s_512x1024_80k_cityscapes.py) | [model](https://download.openmmlab.com/mmsegmentation/v0.5/hrnet/fcn_hr18s_512x1024_80k_cityscapes/fcn_hr18s_512x1024_80k_cityscapes_20200601_202700-1462b75d.pth) \| [log](https://download.openmmlab.com/mmsegmentation/v0.5/hrnet/fcn_hr18s_512x1024_80k_cityscapes/fcn_hr18s_512x1024_80k_cityscapes_20200601_202700.log.json) | +| HRNet | HRNetV2p-W18 | 512x1024 | 80000 | - | - | 78.65 | 80.35 | [config](https://github.com/open-mmlab/mmsegmentation/blob/master/configs/hrnet/fcn_hr18_512x1024_80k_cityscapes.py) | [model](https://download.openmmlab.com/mmsegmentation/v0.5/hrnet/fcn_hr18_512x1024_80k_cityscapes/fcn_hr18_512x1024_80k_cityscapes_20200601_223255-4e7b345e.pth) \| [log](https://download.openmmlab.com/mmsegmentation/v0.5/hrnet/fcn_hr18_512x1024_80k_cityscapes/fcn_hr18_512x1024_80k_cityscapes_20200601_223255.log.json) | +| HRNet | HRNetV2p-W18-Small | 512x1024 | 160000 | - | - | 76.31 | 78.31 | [config](https://github.com/open-mmlab/mmsegmentation/blob/master/configs/hrnet/fcn_hr18s_512x1024_160k_cityscapes.py) | [model](https://download.openmmlab.com/mmsegmentation/v0.5/hrnet/fcn_hr18s_512x1024_160k_cityscapes/fcn_hr18s_512x1024_160k_cityscapes_20200602_190901-4a0797ea.pth) \| [log](https://download.openmmlab.com/mmsegmentation/v0.5/hrnet/fcn_hr18s_512x1024_160k_cityscapes/fcn_hr18s_512x1024_160k_cityscapes_20200602_190901.log.json) | +| HRNet | HRNetV2p-W18 | 512x1024 | 160000 | - | - | 78.80 | 80.74 | [config](https://github.com/open-mmlab/mmsegmentation/blob/master/configs/hrnet/fcn_hr18_512x1024_160k_cityscapes.py) | [model](https://download.openmmlab.com/mmsegmentation/v0.5/hrnet/fcn_hr18_512x1024_160k_cityscapes/fcn_hr18_512x1024_160k_cityscapes_20200602_190822-221e4a4f.pth) \| [log](https://download.openmmlab.com/mmsegmentation/v0.5/hrnet/fcn_hr18_512x1024_160k_cityscapes/fcn_hr18_512x1024_160k_cityscapes_20200602_190822.log.json) | +| DeepLabV3+ | R-50-D8 | 512x1024 | 40000 | 7.5 | 3.94 | 79.61 | 81.01 | [config](https://github.com/open-mmlab/mmsegmentation/blob/master/configs/deeplabv3plus/deeplabv3plus_r50-d8_512x1024_40k_cityscapes.py) | [model](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3plus/deeplabv3plus_r50-d8_512x1024_40k_cityscapes/deeplabv3plus_r50-d8_512x1024_40k_cityscapes_20200605_094610-d222ffcd.pth) \| [log](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3plus/deeplabv3plus_r50-d8_512x1024_40k_cityscapes/deeplabv3plus_r50-d8_512x1024_40k_cityscapes_20200605_094610.log.json) | +| DeepLabV3+ | R-50-D8 | 769x769 | 40000 | 
8.5 | 1.72 | 78.97 | 80.46 | [config](https://github.com/open-mmlab/mmsegmentation/blob/master/configs/deeplabv3plus/deeplabv3plus_r50-d8_769x769_40k_cityscapes.py) | [model](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3plus/deeplabv3plus_r50-d8_769x769_40k_cityscapes/deeplabv3plus_r50-d8_769x769_40k_cityscapes_20200606_114143-1dcb0e3c.pth) \| [log](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3plus/deeplabv3plus_r50-d8_769x769_40k_cityscapes/deeplabv3plus_r50-d8_769x769_40k_cityscapes_20200606_114143.log.json) | +| DeepLabV3+ | R-18-D8 | 512x1024 | 80000 | 2.2 | 14.27 | 76.89 | 78.76 | [config](https://github.com/open-mmlab/mmsegmentation/blob/master/configs/deeplabv3plus/deeplabv3plus_r18-d8_512x1024_80k_cityscapes.py) | [model](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3plus/deeplabv3plus_r18-d8_512x1024_80k_cityscapes/deeplabv3plus_r18-d8_512x1024_80k_cityscapes_20201226_080942-cff257fe.pth) \| [log](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3plus/deeplabv3plus_r18-d8_512x1024_80k_cityscapes/deeplabv3plus_r18-d8_512x1024_80k_cityscapes-20201226_080942.log.json) | +| DeepLabV3+ | R-18-D8 | 769x769 | 80000 | 2.5 | 5.74 | 76.26 | 77.91 | [config](https://github.com/open-mmlab/mmsegmentation/blob/master/configs/deeplabv3plus/deeplabv3plus_r18-d8_769x769_80k_cityscapes.py) | [model](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3plus/deeplabv3plus_r18-d8_769x769_80k_cityscapes/deeplabv3plus_r18-d8_769x769_80k_cityscapes_20201226_083346-f326e06a.pth) \| [log](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3plus/deeplabv3plus_r18-d8_769x769_80k_cityscapes/deeplabv3plus_r18-d8_769x769_80k_cityscapes-20201226_083346.log.json) | +| DeepLabV3+ | R-18b-D8 | 512x1024 | 80000 | 2.1 | 14.95 | 75.87 | 77.52 | [config](https://github.com/open-mmlab/mmsegmentation/blob/master/configs/deeplabv3plus/deeplabv3plus_r18b-d8_512x1024_80k_cityscapes.py) | [model](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3plus/deeplabv3plus_r18b-d8_512x1024_80k_cityscapes/deeplabv3plus_r18b-d8_512x1024_80k_cityscapes_20201226_090828-e451abd9.pth) \| [log](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3plus/deeplabv3plus_r18b-d8_512x1024_80k_cityscapes/deeplabv3plus_r18b-d8_512x1024_80k_cityscapes-20201226_090828.log.json) | +| DeepLabV3+ | R-18b-D8 | 769x769 | 80000 | 2.4 | 5.96 | 76.36 | 78.24 | [config](https://github.com/open-mmlab/mmsegmentation/blob/master/configs/deeplabv3plus/deeplabv3plus_r18b-d8_769x769_80k_cityscapes.py) | [model](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3plus/deeplabv3plus_r18b-d8_769x769_80k_cityscapes/deeplabv3plus_r18b-d8_769x769_80k_cityscapes_20201226_151312-2c868aff.pth) \| [log](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3plus/deeplabv3plus_r18b-d8_769x769_80k_cityscapes/deeplabv3plus_r18b-d8_769x769_80k_cityscapes-20201226_151312.log.json) | +| OCRNet | HRNetV2p-W18-Small | 512x1024 | 40000 | 3.5 | 10.45 | 74.30 | 75.95 | [config](https://github.com/open-mmlab/mmsegmentation/blob/master/configs/ocrnet/ocrnet_hr18s_512x1024_40k_cityscapes.py) | [model](https://download.openmmlab.com/mmsegmentation/v0.5/ocrnet/ocrnet_hr18s_512x1024_40k_cityscapes/ocrnet_hr18s_512x1024_40k_cityscapes_20200601_033304-fa2436c2.pth) \| [log](https://download.openmmlab.com/mmsegmentation/v0.5/ocrnet/ocrnet_hr18s_512x1024_40k_cityscapes/ocrnet_hr18s_512x1024_40k_cityscapes_20200601_033304.log.json) | +| OCRNet | HRNetV2p-W18 | 512x1024 | 40000 | 4.7 | 7.50 | 77.72 | 
79.49 | [config](https://github.com/open-mmlab/mmsegmentation/blob/master/configs/ocrnet/ocrnet_hr18_512x1024_40k_cityscapes.py) | [model](https://download.openmmlab.com/mmsegmentation/v0.5/ocrnet/ocrnet_hr18_512x1024_40k_cityscapes/ocrnet_hr18_512x1024_40k_cityscapes_20200601_033320-401c5bdd.pth) \| [log](https://download.openmmlab.com/mmsegmentation/v0.5/ocrnet/ocrnet_hr18_512x1024_40k_cityscapes/ocrnet_hr18_512x1024_40k_cityscapes_20200601_033320.log.json) | +| OCRNet | HRNetV2p-W18-Small | 512x1024 | 80000 | - | - | 77.16 | 78.66 | [config](https://github.com/open-mmlab/mmsegmentation/blob/master/configs/ocrnet/ocrnet_hr18s_512x1024_80k_cityscapes.py) | [model](https://download.openmmlab.com/mmsegmentation/v0.5/ocrnet/ocrnet_hr18s_512x1024_80k_cityscapes/ocrnet_hr18s_512x1024_80k_cityscapes_20200601_222735-55979e63.pth) \| [log](https://download.openmmlab.com/mmsegmentation/v0.5/ocrnet/ocrnet_hr18s_512x1024_80k_cityscapes/ocrnet_hr18s_512x1024_80k_cityscapes_20200601_222735.log.json) | +| OCRNet | HRNetV2p-W18 | 512x1024 | 80000 | - | - | 78.57 | 80.46 | [config](https://github.com/open-mmlab/mmsegmentation/blob/master/configs/ocrnet/ocrnet_hr18_512x1024_80k_cityscapes.py) | [model](https://download.openmmlab.com/mmsegmentation/v0.5/ocrnet/ocrnet_hr18_512x1024_80k_cityscapes/ocrnet_hr18_512x1024_80k_cityscapes_20200614_230521-c2e1dd4a.pth) \| [log](https://download.openmmlab.com/mmsegmentation/v0.5/ocrnet/ocrnet_hr18_512x1024_80k_cityscapes/ocrnet_hr18_512x1024_80k_cityscapes_20200614_230521.log.json) | +| OCRNet | HRNetV2p-W18-Small | 512x1024 | 160000 | - | - | 78.45 | 79.97 | [config](https://github.com/open-mmlab/mmsegmentation/blob/master/configs/ocrnet/ocrnet_hr18s_512x1024_160k_cityscapes.py) | [model](https://download.openmmlab.com/mmsegmentation/v0.5/ocrnet/ocrnet_hr18s_512x1024_160k_cityscapes/ocrnet_hr18s_512x1024_160k_cityscapes_20200602_191005-f4a7af28.pth) \| [log](https://download.openmmlab.com/mmsegmentation/v0.5/ocrnet/ocrnet_hr18s_512x1024_160k_cityscapes/ocrnet_hr18s_512x1024_160k_cityscapes_20200602_191005.log.json) | +| OCRNet | HRNetV2p-W18 | 512x1024 | 160000 | - | - | 79.47 | 80.91 | [config](https://github.com/open-mmlab/mmsegmentation/blob/master/configs/ocrnet/ocrnet_hr18_512x1024_160k_cityscapes.py) | [model](https://download.openmmlab.com/mmsegmentation/v0.5/ocrnet/ocrnet_hr18_512x1024_160k_cityscapes/ocrnet_hr18_512x1024_160k_cityscapes_20200602_191001-b9172d0c.pth) \| [log](https://download.openmmlab.com/mmsegmentation/v0.5/ocrnet/ocrnet_hr18_512x1024_160k_cityscapes/ocrnet_hr18_512x1024_160k_cityscapes_20200602_191001.log.json) | + +## 训练参数 + +| 超参数 | 默认值 | 类型 | 说明 | 建议 | +| - | - | - | - | - | +| hyper-parameter | default value | type | note | advice | +| export_format | seg-coco:raw | 字符串| 受ymir后台处理,ymir分割数据集导出格式 | 禁止改变 | +| shm_size | 128G | 字符串| 受ymir后台处理,docker image 可用共享内存 | 建议大小:镜像占用GPU数 * 32G | +| config_file | configs/fastscnn/fast_scnn_lr0.12_8x4_160k_cityscapes.py | 文件路径 | mmlab配置文件 | 建议采用fastscnn系列, 参考[configs](https://github.com/modelai/ymir-mmsegmentation/tree/master/configs) | +| samples_per_gpu | 2 | 整数 | 每张GPU一次处理的图片数量 | 建议大小:显存占用<50% 可增加2倍加快训练速度 | +| workers_per_gpu | 2 | 整数 | 每张GPU对应的数据读取进程数 | 采用默认值即可,若内存及CPU配置高,可适当增大 | +| max_iters | 20000 | 整数 | 数据集的训练批次 | 建议:必要时分析tensorboard确定是否有必要改变,一般采用默认值即可 | +| interval | 2000 | 整数 | 模型在验证集上评测的周期 | 采用默认值即可 | +| args_options | '' | 字符串 | 训练命令行参数 | 参考[tools/train.py]() +| cfg_options | '' | 字符串 | 训练命令行参数 | 参考 [tools/train.py]() +| save_least_file | True | 布尔型 | 是否只保存最优和最新的权重文件 | 设置为True | +| 
max_keep_ckpts | -1 | 整数 | 当save_least_file为False时,最多保存的权重文件数量 | 设置为k, 可保存k个最优权重和k个最新的权重文件,设置为-1可保存所有权重文件。|
+| ignore_black_area | False | 布尔型 | 是否忽略未标注的区域 | 采用默认即可将空白区域当成背景进行训练 |
+
+## 推理参数
+
+| 超参数 | 默认值 | 类型 | 说明 | 建议 |
+| - | - | - | - | - |
+| hyper-parameter | default value | type | note | advice |
+| shm_size | 128G | 字符串 | 受ymir后台处理,docker image 可用共享内存 | 建议大小:镜像占用GPU数 * 32G |
+| samples_per_gpu | 2 | 整数 | 每张GPU一次处理的图片数量 | 建议大小:显存占用<50% 可增加2倍加快训练速度 |
+| workers_per_gpu | 2 | 整数 | 每张GPU对应的数据读取进程数 | 采用默认值即可,若内存及CPU配置高,可适当增大 |
+
+
+## 挖掘参数
+
+| 超参数 | 默认值 | 类型 | 可选值 | 说明 |
+| - | - | - | - | - |
+| hyper-parameter | default value | type | choices | note |
+| mining_algorithm | RSAL | str | RSAL, RIPU | 挖掘算法名称 |
+| superpixel_algorithm | slico | str | slico, slic, mslic, seeds | 超像素算法名称 |
+| uncertainty_method | BvSB | str | BvSB | 不确定性计算方法名称 |
+| shm_size | 128G | str | 128G | 容器可使用的共享内存大小 |
+| max_superpixel_per_image | 1024 | int | 1024, ... | 一张图像中超像素的数量上限 |
+| max_kept_mining_image | 5000 | int | 500, 1000, 2000, 5000, ... | 挖掘图像数量的上限 |
+| topk_superpixel_score | 3 | int | 3, 5, 10, ... | 一张图像中采用的超像素数量 |
+| class_balance | True | bool | True, False | 是否考虑各类标注的平衡性 |
+| fp16 | True | bool | True, False | 是否采用fp16技术加速 |
+| samples_per_gpu | 2 | int | 2, 4, ... | batch size per gpu |
+| workers_per_gpu | 2 | int | 2 | num_workers per gpu |
+| ripu_region_radius | 1 | int | 1, 2, 3 | ripu挖掘算法专用参数 |
+
+## 镜像制作
+
+- [YMIR语义分割镜像制作](https://ymir-executor-fork.readthedocs.io/zh/latest/image_segmentation/simple_semantic_seg_training/)
+
+- [mmsegmentation简介](https://ymir-executor-fork.readthedocs.io/zh/latest/algorithms/mmseg/)
diff --git a/docs/image_segmentation/simple_instance_seg_tmi.md b/docs/image_segmentation/simple_instance_seg_tmi.md
new file mode 100644
index 0000000..5ebb88b
--- /dev/null
+++ b/docs/image_segmentation/simple_instance_seg_tmi.md
@@ -0,0 +1,38 @@
+# 制作简单的实例分割镜像
+
+参考语义分割镜像的制作:
+
+- [语义分割-训练](./simple_semantic_seg_training.md)
+
+- [语义分割-推理](./simple_semantic_seg_infer.md)
+
+- [语义分割-挖掘](./simple_semantic_seg_mining.md)
+
+## 镜像说明文件
+
+**object_type** 为 4 表示镜像支持实例分割
+
+- [img-man/manifest.yaml](https://github.com/modelai/ymir-executor-fork/tree/ymir-dev/seg-instance-demo-tmi/img-man/manifest.yaml)
+```
+# 4 for instance segmentation
+"object_type": 4
+```
+
+## 训练结果返回
+
+```
+rw.write_model_stage(stage_name='epoch20',
+                     files=['epoch20.pt', 'config.py'],
+                     evaluation_result=dict(maskAP=expected_maskap))
+```
+
+## 推理结果返回
+
+采用coco数据集格式,相比语义分割,实例分割的annotation中需要增加 `bbox` 的置信度。
+```
+# for instance segmentation
+annotation_info['confidence'] = min(1.0, 0.1 + random.random())
+
+coco_results = convert(cfg, results, True)
+rw.write_infer_result(infer_result=coco_results, algorithm='segmentation')
+```
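+
+下面给出一段构造单条实例分割标注的示意代码,仅作参考:字段含义与文档中 seg-coco 格式的示例一致,其中 rle_counts、annotation_id 等均为示例用的占位名,mask 的 rle 编码需由镜像自行生成。
+
+```python
+# 示意代码:构造一条 coco 格式的实例分割标注(字段参考 seg-coco:raw 示例)
+import random
+
+
+def make_instance_annotation(annotation_id: int, image_id: int, category_id: int,
+                             bbox: list, area: float, rle_counts: str) -> dict:
+    annotation_info = {
+        'bbox': bbox,                            # [x, y, w, h]
+        'segmentation': {'counts': rle_counts},  # rle 编码的 mask
+        'area': area,
+        'category_id': category_id,
+        'id': annotation_id,
+        'image_id': image_id,
+    }
+    # 实例分割相比语义分割,需要额外提供 bbox 的置信度
+    annotation_info['confidence'] = min(1.0, 0.1 + random.random())
+    return annotation_info
+```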
diff --git a/docs/image_segmentation/simple_semantic_seg_infer.md b/docs/image_segmentation/simple_semantic_seg_infer.md
new file mode 100644
index 0000000..588e024
--- /dev/null
+++ b/docs/image_segmentation/simple_semantic_seg_infer.md
@@ -0,0 +1,104 @@
+# 制作一个简单的语义分割推理镜像
+
+参考[ymir镜像制作简介](../overview/ymir-executor.md)
+
+## 镜像输入输出示例
+```
+.
+├── in
+│   ├── annotations
+│   ├── assets
+│   ├── candidate-index.tsv
+│   ├── config.yaml
+│   ├── env.yaml
+│   └── models
+└── out
+    ├── monitor.txt
+    └── infer-result.json
+```
+
+## 工作目录
+
+```
+cd seg-semantic-demo-tmi
+```
+
+## 提供超参数模板文件
+
+镜像中包含**/img-man/infer-template.yaml** 表示镜像支持推理
+
+- [img-man/infer-template.yaml](https://github.com/modelai/ymir-executor-fork/tree/ymir-dev/seg-semantic-demo-tmi/img-man/infer-template.yaml)
+
+```yaml
+{!seg-semantic-demo-tmi/img-man/infer-template.yaml!}
+```
+
+- [Dockerfile](https://github.com/modelai/ymir-executor-fork/tree/ymir-dev/seg-semantic-demo-tmi/Dockerfile)
+
+```
+RUN mkdir -p /img-man # 在镜像中生成/img-man目录
+COPY img-man/*.yaml /img-man/ # 将主机中img-man目录下的所有yaml文件复制到镜像/img-man目录
+```
+
+## 提供镜像说明文件
+
+**object_type** 为 3 表示镜像支持语义分割
+
+- [img-man/manifest.yaml](https://github.com/modelai/ymir-executor-fork/tree/ymir-dev/seg-semantic-demo-tmi/img-man/manifest.yaml)
+```
+# 3 for semantic segmentation
+"object_type": 3
+```
+
+- Dockerfile
+`COPY img-man/*.yaml /img-man/` 在复制infer-template.yaml的同时,会将manifest.yaml复制到镜像中的**/img-man**目录
+
+## 提供默认启动脚本
+
+- Dockerfile
+```
+RUN echo "python /app/start.py" > /usr/bin/start.sh # 生成启动脚本 /usr/bin/start.sh
+CMD bash /usr/bin/start.sh # 将镜像的默认启动脚本设置为 /usr/bin/start.sh
+```
+
+## 实现基本功能
+
+- [app/start.py](https://github.com/modelai/ymir-executor-fork/tree/ymir-dev/seg-semantic-demo-tmi/app/start.py)
+
+::: seg-semantic-demo-tmi.app.start._run_infer
+    handler: python
+    options:
+      show_root_heading: false
+      show_source: true
+
+## 写进度
+
+```
+# use `monitor.write_monitor_logger` to write log to console and write task process percent to monitor.txt
+logging.info(f"assets count: {len(lines)}, valid: {valid_image_count}")
+monitor.write_monitor_logger(percent=0.2)
+
+# real-time monitor
+monitor.write_monitor_logger(percent=0.2 + 0.8 * iter / valid_image_count)
+
+# if task done, write 100% percent log
+logging.info('infer done')
+monitor.write_monitor_logger(percent=1.0)
+```
+
+## 写结果文件
+
+```
+coco_results = convert(cfg, results, True)
+rw.write_infer_result(infer_result=coco_results, algorithm='segmentation')
+```
+
+## 制作镜像 demo/semantic_seg:infer
+
+```dockerfile
+{!seg-semantic-demo-tmi/Dockerfile!}
+```
+
+```
+docker build -t demo/semantic_seg:infer -f Dockerfile .
+```
diff --git a/docs/image_segmentation/simple_semantic_seg_mining.md b/docs/image_segmentation/simple_semantic_seg_mining.md
new file mode 100644
index 0000000..a4c50b1
--- /dev/null
+++ b/docs/image_segmentation/simple_semantic_seg_mining.md
@@ -0,0 +1,103 @@
+# 制作一个简单的语义分割挖掘镜像
+
+参考[ymir镜像制作简介](../overview/ymir-executor.md)
+
+## 镜像输入输出示例
+```
+.
+├── in
+│   ├── annotations
+│   ├── assets
+│   ├── candidate-index.tsv
+│   ├── config.yaml
+│   ├── env.yaml
+│   └── models
+└── out
+    ├── monitor.txt
+    └── result.tsv
+```
+
+## 工作目录
+
+```
+cd seg-semantic-demo-tmi
+```
+
+## 提供超参数模板文件
+
+镜像中包含**/img-man/mining-template.yaml** 表示镜像支持挖掘
+
+- [img-man/mining-template.yaml](https://github.com/modelai/ymir-executor-fork/tree/ymir-dev/seg-semantic-demo-tmi/img-man/mining-template.yaml)
+
+```yaml
+{!seg-semantic-demo-tmi/img-man/mining-template.yaml!}
+```
+
+- [Dockerfile](https://github.com/modelai/ymir-executor-fork/tree/ymir-dev/seg-semantic-demo-tmi/Dockerfile)
+
+```
+RUN mkdir -p /img-man # 在镜像中生成/img-man目录
+COPY img-man/*.yaml /img-man/ # 将主机中img-man目录下的所有yaml文件复制到镜像/img-man目录
+```
+
+## 提供镜像说明文件
+
+**object_type** 为 3 表示镜像支持语义分割
+
+- [img-man/manifest.yaml](https://github.com/modelai/ymir-executor-fork/tree/ymir-dev/seg-semantic-demo-tmi/img-man/manifest.yaml)
+```
+# 3 for semantic segmentation
+"object_type": 3
+```
+
+- Dockerfile
+`COPY img-man/*.yaml /img-man/` 在复制mining-template.yaml的同时,会将manifest.yaml复制到镜像中的**/img-man**目录
+
+## 提供默认启动脚本
+
+- Dockerfile
+```
+RUN echo "python /app/start.py" > /usr/bin/start.sh # 生成启动脚本 /usr/bin/start.sh
+CMD bash /usr/bin/start.sh # 将镜像的默认启动脚本设置为 /usr/bin/start.sh
+```
+
+## 实现基本功能
+
+- [app/start.py](https://github.com/modelai/ymir-executor-fork/tree/ymir-dev/seg-semantic-demo-tmi/app/start.py)
+
+::: seg-semantic-demo-tmi.app.start._run_mining
+    handler: python
+    options:
+      show_root_heading: false
+      show_source: true
+
+## 写进度
+
+```
+# use `monitor.write_monitor_logger` to write log to console and write task process percent to monitor.txt
+logging.info(f"assets count: {len(lines)}, valid: {valid_image_count}")
+monitor.write_monitor_logger(percent=0.2)
+
+time.sleep(0.1)
+monitor.write_monitor_logger(percent=0.2 + 0.8 * index / valid_image_count)
+
+# if task done, write 100% percent log
+logging.info('mining done')
+monitor.write_monitor_logger(percent=1.0)
+```
+
+## 写结果文件
+
+```
+rw.write_mining_result(mining_result=mining_result)
+```
+
+## 制作镜像 demo/semantic_seg:mining
+
+```dockerfile
+{!seg-semantic-demo-tmi/Dockerfile!}
+```
+
+```
+docker build -t demo/semantic_seg:mining -f Dockerfile .
+```
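+
+作为补充,下面给出 _run_mining 的一个最小示意实现,仅作参考:假设镜像已安装 ymir-executor-sdk(ymir_exc),随机打分仅为占位,应替换为真实的主动学习算法;env_config 中各字段名以 sdk 实际定义为准。
+
+```python
+# 示意代码:最小的挖掘流程骨架
+import random
+
+from ymir_exc import env, monitor
+from ymir_exc import result_writer as rw
+
+
+def _run_mining() -> None:
+    env_config = env.get_current_env()
+    # 挖掘数据集索引文件,每行为一张图像的绝对路径
+    with open(env_config.input.candidate_index_file) as f:
+        asset_paths = [line.strip() for line in f if line.strip()]
+
+    mining_result = []
+    for idx, asset_path in enumerate(asset_paths):
+        score = random.random()  # 占位:应替换为真实的挖掘算法打分
+        mining_result.append((asset_path, score))
+        monitor.write_monitor_logger(percent=0.9 * (idx + 1) / len(asset_paths))
+
+    rw.write_mining_result(mining_result=mining_result)
+    monitor.write_monitor_logger(percent=1.0)
+```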
diff --git a/docs/image_segmentation/simple_semantic_seg_training.md b/docs/image_segmentation/simple_semantic_seg_training.md
new file mode 100644
index 0000000..3cd4933
--- /dev/null
+++ b/docs/image_segmentation/simple_semantic_seg_training.md
@@ -0,0 +1,126 @@
+# 制作一个简单的语义分割训练镜像
+
+参考[ymir镜像制作简介](../overview/ymir-executor.md), 通过加载 /in 目录下的数据集,超参数,任务信息,预训练权重, 在 /out 目录下产生模型权重,进度文件,训练日志。
+
+## 镜像输入输出示例
+```
+.
+├── in
+│   ├── annotations
+│   │   └── coco-annotations.json
+│   ├── assets -> /home/ymir/ymir/ymir-workplace/sandbox/0001/asset_cache
+│   ├── config.yaml
+│   ├── env.yaml
+│   ├── models
+│   │   ├── best_mIoU_iter_180.pth
+│   │   └── fast_scnn_lr0.12_8x4_160k_cityscapes.py
+│   ├── train-index.tsv
+│   └── val-index.tsv
+├── out
+│   ├── models
+│   │   ├── 20221103_082913.log
+│   │   ├── 20221103_082913.log.json
+│   │   ├── fast_scnn_lr0.12_8x4_160k_cityscapes.py
+│   │   ├── iter_10000.pth
+│   │   ├── iter_12000.pth
+│   │   ├── iter_14000.pth
+│   │   ├── iter_16000.pth
+│   │   ├── iter_18000.pth
+│   │   ├── iter_20000.pth
+│   │   ├── latest.pth -> iter_20000.pth
+│   │   └── result.yaml
+│   ├── monitor.txt
+│   ├── tensorboard -> /home/ymir/ymir/ymir-workplace/ymir-tensorboard-logs/0001/t00000010000043b47591667304420
+│   └── ymir-executor-out.log
+└── task_config.yaml
+```
+
+## 工作目录
+```
+cd seg-semantic-demo-tmi
+```
+
+## 提供超参数模板文件
+
+镜像中包含**/img-man/training-template.yaml** 表示镜像支持训练
+
+- [img-man/training-template.yaml](https://github.com/modelai/ymir-executor-fork/tree/ymir-dev/seg-semantic-demo-tmi/img-man/training-template.yaml)
+
+指明数据格式 **export_format** 为 **seg-coco:raw**, 即语义/实例分割标注格式,详情参考[Ymir镜像数据集格式](../overview/dataset-format.md)
+
+```yaml
+{!seg-semantic-demo-tmi/img-man/training-template.yaml!}
+```
+
+- [Dockerfile](https://github.com/modelai/ymir-executor-fork/tree/ymir-dev/seg-semantic-demo-tmi/Dockerfile)
+
+```
+RUN mkdir -p /img-man # 在镜像中生成/img-man目录
+COPY img-man/*.yaml /img-man/ # 将主机中img-man目录下的所有yaml文件复制到镜像/img-man目录
+```
+
+## 提供镜像说明文件
+
+**object_type** 为 3 表示镜像支持语义分割
+
+- [img-man/manifest.yaml](https://github.com/modelai/ymir-executor-fork/tree/ymir-dev/seg-semantic-demo-tmi/img-man/manifest.yaml)
+```
+# 3 for semantic segmentation
+"object_type": 3
+```
+
+- Dockerfile
+`COPY img-man/*.yaml /img-man/` 在复制training-template.yaml的同时,会将manifest.yaml复制到镜像中的**/img-man**目录
+
+## 提供默认启动脚本
+
+- Dockerfile
+```
+RUN echo "python /app/start.py" > /usr/bin/start.sh # 生成启动脚本 /usr/bin/start.sh
+CMD bash /usr/bin/start.sh # 将镜像的默认启动脚本设置为 /usr/bin/start.sh
+```
+
+## 实现基本功能
+
+- [app/start.py](https://github.com/modelai/ymir-executor-fork/tree/ymir-dev/seg-semantic-demo-tmi/app/start.py)
+
+::: seg-semantic-demo-tmi.app.start._run_training
+    handler: python
+    options:
+      show_root_heading: false
+      show_source: true
+
+## 写进度
+
+```
+if idx % monitor_gap == 0:
+    monitor.write_monitor_logger(percent=0.2 * idx / N)
+
+monitor.write_monitor_logger(percent=0.2)
+
+monitor.write_monitor_logger(percent=1.0)
+```
+
+## 写结果文件
+
+```
+rw.write_model_stage(stage_name='epoch20',
+                     files=['epoch20.pt', 'config.py'],
+                     evaluation_result=dict(mIoU=expected_miou))
+```
+
+## 写tensorboard日志
+
+```
+write_tensorboard_log(cfg.ymir.output.tensorboard_dir)
+```
+
+## 制作镜像 demo/semantic_seg:training
+
+```dockerfile
+{!seg-semantic-demo-tmi/Dockerfile!}
+```
+
+```
+docker build -t demo/semantic_seg:training -f Dockerfile .
+```
diff --git a/docs/image_segmentation/test_semantic_seg.md b/docs/image_segmentation/test_semantic_seg.md
new file mode 100644
index 0000000..c5e8354
--- /dev/null
+++ b/docs/image_segmentation/test_semantic_seg.md
@@ -0,0 +1,101 @@
+# 测试Ymir语义分割镜像
+
+## 通过YMIR平台进行测试
+
+用户可以直接通过Ymir平台发起语义分割的训练,推理及挖掘任务,对镜像进行测试。
+
+!!! 注意
+    YMIR平台发起的任务在顺利结束时,会清理相应的目录,因此在测试时,请确保相应目录存在。
+
+### 导入待测镜像
+
+- 假设用户已经制作好镜像 **demo/semantic_seg:tmi**, 它支持训练、推理及挖掘
+
+- 假设用户具有管理员权限,按照[新增镜像](https://github.com/IndustryEssentials/ymir/wiki/%E6%93%8D%E4%BD%9C%E8%AF%B4%E6%98%8E#%E6%96%B0%E5%A2%9E%E9%95%9C%E5%83%8F) 将**demo/semantic_seg:tmi** 添加到 **我的镜像** 中。
+
+### 导入待测数据集
+
+- 下载示例语义分割数据集 [train-semantic-seg.zip](https://github.com/modelai/ymir-executor-fork/releases/download/dataset-ymir2.0.0/eg100_fgonly_train.zip) [val-semantic-seg.zip](https://github.com/modelai/ymir-executor-fork/releases/download/dataset-ymir2.0.0/eg100_fgonly_val.zip)
+
+- 建立包含对应标签的项目, `训练类别` 中添加对应标签 `foreground`
+
+- 按照[添加数据集](https://github.com/IndustryEssentials/ymir/wiki/%E6%93%8D%E4%BD%9C%E8%AF%B4%E6%98%8E#%E6%B7%BB%E5%8A%A0%E6%95%B0%E6%8D%AE%E9%9B%86)导入示例语义分割数据集
+
+### 发起待测任务
+
+发起待测的训练、推理或挖掘任务后,等待其结束或出错。
+
+### 获取任务id
+
+登录服务器后台,进入YMIR部署的目录 `ymir-workplace`
+
+- 对于训练任务:`cd sandbox/work_dir/TaskTypeTraining`
+
+- 对于挖掘或推理任务: `cd sandbox/work_dir/TaskTypeMining`
+
+- 对于单张图片测试任务: `cd sandbox/work_dir/TaskTypeInfer`
+
+- 列举当前所有的任务,按任务时间找到对应任务id, 此处假设为最新的 **t00000020000023a473e1673591617**
+
+!!! 注意
+    对于训练任务, 可通过tensorboard链接获得对应任务id。
+
+```
+> ls -lt .
+
+drwxr-xr-x 4 root root 45 Jan 13 14:33 t00000020000023a473e1673591617
+drwxr-xr-x 4 root root 45 Jan 13 14:19 t00000020000025d55ff1673590756
+drwxr-xr-x 4 root root 45 Jan 13 14:13 t00000020000028b0cce1673590425
+drwxr-xr-x 4 root root 45 Jan 10 14:09 t00000020000018429301673330944
+drwxr-xr-x 4 root root 45 Jan  9 18:21 t000000200000210e0811673259669
+drwxr-xr-x 4 root root 45 Jan  9 18:07 t00000020000029e02f61673258829
+```
+
+### 通过 docker 进行交互式调试
+
+- 进入任务id对应的工作目录 `cd t00000020000023a473e1673591617/sub_task/t00000020000023a473e1673591617`
+
+- 列举当前目录可以看到 `in` 和 `out` 目录
+
+- 进行交互式调试
+
+  - 假设 `ymir-workplace` 存放在 **/data/ymir/ymir-workplace**, 需要将 `ymir-workplace` 目录也挂载到镜像中相同位置,以确保所有软链接均有效。
+
+  - 假设启动程序为 **/usr/bin/start.sh**
+
+```
+# --ipc host 表示容器共享主机的所有内存
+docker run -it --rm --gpus all --ipc host -v $PWD/in:/in -v $PWD/out:/out -v /data:/data demo/semantic_seg:tmi bash

+# --shm-size 128g 表示容器最多共享主机128G内存
+# docker run -it --rm --gpus all --shm-size 128g -v $PWD/in:/in -v $PWD/out:/out -v /data:/data demo/semantic_seg:tmi bash
+
+bash /usr/bin/start.sh
+```
+
+- 假设用户开发镜像的代码存放在 **/home/modelai/code**, 为方便测试, 可以将 **/home/modelai/code** 也挂载到镜像中进行测试。
+
+  - 假设实际启动程序为 **start.py**
+
+```
+docker run -it --rm --gpus all --ipc host -v $PWD/in:/in -v $PWD/out:/out -v /data:/data -v /home/modelai/code:/home/modelai/code demo/semantic_seg:tmi bash
+
+cd /home/modelai/code
+python start.py
+```
+
+### 测试通过后
+
+- 通过 `docker build` 重新构建镜像, 如果修改了超参数,需要在Ymir平台删除旧镜像并重新添加,使更新的超参数生效。如果仅仅修改了代码,不需要重新添加即可使用本地的最新镜像。
+
+## 💫 YMIR后台错误查看
+
+- 如镜像正确运行,但输出格式不符合YMIR后台要求,或其他错误,可在 `ymir-workplace/ymir-data/logs` 下查看
+
+```
+tail -f -n 200 ymir_controller.log
+```
+
+## 💫 通过 ymir-executor-verifier 进行测试
+
+[ymir-executor-verifier](https://github.com/modelai/ymir-executor-verifier) 面向企业用户,目的是对大量镜像进行自动化测试,以保障镜像的质量。
diff --git a/docs/imgs/2007_000783.png b/docs/imgs/2007_000783.png
new file mode 100644
index 0000000..2151d32
Binary files /dev/null and b/docs/imgs/2007_000783.png differ
diff --git a/docs/imgs/training-hyper-parameter-web.png b/docs/imgs/training-hyper-parameter-web.png
new file mode 100644
index 0000000..f599453
Binary files /dev/null and b/docs/imgs/training-hyper-parameter-web.png differ
diff --git a/docs/imgs/ymir-design.png b/docs/imgs/ymir-design.png
new file mode 100644
index 0000000..54e1238
Binary files /dev/null and b/docs/imgs/ymir-design.png differ
diff --git a/docs/imgs/ymir_image_community.png b/docs/imgs/ymir_image_community.png
new file mode 100644
index 0000000..b65314f
Binary files /dev/null and b/docs/imgs/ymir_image_community.png differ
diff --git a/docs/imgs/ymir_publish_image.png b/docs/imgs/ymir_publish_image.png
new file mode 100644
index 0000000..0795b8f
Binary files /dev/null and b/docs/imgs/ymir_publish_image.png differ
diff --git a/docs/import_outer_weight.md b/docs/import_outer_weight.md
new file mode 100644
index 0000000..6dfa16e
--- /dev/null
+++ b/docs/import_outer_weight.md
@@ -0,0 +1,30 @@
+# 导入外部模型权值
+
+## import extra model for yolov5 (ymir2.0.0)
+
+- create a tar file with weight file `best.pt` and config file `ymir-info.yaml`
+
+```
+$ tar -cf yolov5_best.tar best.pt ymir-info.yaml
+$ cat ymir-info.yaml
+best_stage_name: best
+executor_config:
+  class_names:
+  - dog
+package_version: 2.0.0
+stages:
+  best:
+    files:
+    - best.pt
+    mAP: 0.8349897782446034
+    stage_name: best
+    timestamp: 1669186346
+task_context:
+  executor: youdaoyzbx/ymir-executor:ymir2.0.0-yolov5-cu111-tmi
+  mAP: 0.8349897782446034
+  producer: ymir
+  task_parameters: '{"keywords": ["dog"]}'
+  type: 1
+```
+
+![图片](https://user-images.githubusercontent.com/5005182/184783723-1ce48603-1254-4ed9-90ba-c1dd8510dc79.png)
diff --git a/docs/index.md b/docs/index.md
new file mode 100644
index 0000000..7ca84a2
--- /dev/null
+++ b/docs/index.md
@@ -0,0 +1 @@
+{!docs/overview/introduction.md!}
diff --git a/docs/mining-images-overview.md b/docs/mining-images-overview.md
new file mode 100644
index 0000000..f427e55
--- /dev/null
+++ b/docs/mining-images-overview.md
@@ -0,0 +1,51 @@
+# ymir mining images overview
+
+| docker images | random | cald | aldd | entropy |
+| - | - | - | - | - |
+| yolov5 | ✔️ | ✔️ | ✔️ | ✔️ |
+| mmdetection | ✔️ | ✔️ | ✔️ | ✔️ |
+| yolov4 | ❌ | ✔️ | ✔️ | ❌ |
+| yolov7 | ❌ | ❌ | ✔️ | ❌ |
+| nanodet | ❌ | ❌ | ✔️ | ❌ |
+| vidt | ❌ | ✔️ | ❌ | ❌ |
+| detectron2 | ❌ | ✔️ | ❌ | ❌ |
+
+![](./mining_score.png)
+
+# 带负样本的单类挖掘实验
+
+- view [ALBench: Active Learning Benchmark](https://github.com/modelai/ALBench) for detail
+
+## 实验设置
+
+COCO数据集中选择三个类做实验,分别是Train,Fork,Dog,从选定类别的train中选择1000张图片加入训练集,从不包含该类别的图片中选择3000张作为负样本加入训练集。选定类别的所有val加入验证集,从不包含该类别的val中选择3倍数据作为负样本加入验证集。剩余图片全部加入挖掘集,每次迭代从中选择500张图片加入训练集。
+
+| class | train | val | mining |
+| - | - | - | - |
+| train(火车) | 4000 | 628 | 114287 |
+| fork(叉子) | 4000 | 620 | 114287 |
+| dog(狗) | 4000 | 708 | 114287 |
+
+## 挖掘实验结果
+
+| class | mining algorithm | iter 0 | iter 1 | iter 2 | iter 3 | iter 4 |
+| - | - | - | - | - | - | - |
+| train(火车) | random | 0.647 | 0.639 | 0.652 | 0.620 | 0.622 |
+| train(火车) | entropy | 0.678 | 0.703 | 0.721 | 0.738 | 0.757 |
+| train(火车) | aldd | 0.665 | 0.706 | 0.738 | 0.754 | 0.778 |
+| fork(叉子) | random | 0.244 | 0.221 | 0.224 | 0.227 | 0.225 |
+| fork(叉子) | entropy | 0.239 | 0.255 | 0.313 | 0.367 | 0.372 |
+| fork(叉子) | aldd | 0.220 | 0.290 | 0.329 | 0.368 | 0.379 |
+| dog(狗) | random | 0.391 | 0.418 | 0.401 | 0.389 | 0.416 |
+| dog(狗) | entropy | 0.391 | 0.418 | 0.449 | 0.535 | 0.505 |
+| dog(狗) | aldd | 0.399 | 0.487 | 0.518 | 0.533 | 0.564 |
+
+## reference
+
+- [awesome-active-learning](https://github.com/baifanxxx/awesome-active-learning)
+
+- entropy: `Multi-class active learning for image classification. CVPR 2009`
+
+- [CALD](https://github.com/we1pingyu/CALD/): `Consistency-based Active Learning for Object Detection. CVPR 2022 workshop`
+- [ALDD](https://gitlab.com/haghdam/deep_active_learning): `Active Learning for Deep Detection Neural Networks. ICCV 2019`
diff --git a/docs/mining_score.png b/docs/mining_score.png
new file mode 100644
index 0000000..7582616
Binary files /dev/null and b/docs/mining_score.png differ
diff --git a/docs/object_detection/simple_det_infer.md b/docs/object_detection/simple_det_infer.md
new file mode 100644
index 0000000..3849656
--- /dev/null
+++ b/docs/object_detection/simple_det_infer.md
@@ -0,0 +1,100 @@
+# 制作一个简单的目标检测推理镜像
+
+参考[ymir镜像制作简介](../overview/ymir-executor.md)
+
+## 镜像输入输出示例
+```
+.
+├── in
+│   ├── annotations
+│   ├── assets
+│   ├── candidate-index.tsv
+│   ├── config.yaml
+│   ├── env.yaml
+│   └── models
+└── out
+    ├── monitor.txt
+    └── infer-result.json
+```
+
+## 工作目录
+```
+cd det-demo-tmi
+```
+
+## 提供超参数模板文件
+
+镜像中包含**/img-man/infer-template.yaml** 表示镜像支持推理
+
+- [img-man/infer-template.yaml](https://github.com/modelai/ymir-executor-fork/tree/ymir-dev/det-demo-tmi/img-man/infer-template.yaml)
+
+```yaml
+{!det-demo-tmi/img-man/infer-template.yaml!}
+```
+
+- [Dockerfile](https://github.com/modelai/ymir-executor-fork/tree/ymir-dev/det-demo-tmi/Dockerfile)
+
+```
+RUN mkdir -p /img-man # 在镜像中生成/img-man目录
+COPY img-man/*.yaml /img-man/ # 将主机中img-man目录下的所有yaml文件复制到镜像/img-man目录
+```
+
+## 提供镜像说明文件
+
+**object_type** 为 2 表示镜像支持目标检测
+
+- [img-man/manifest.yaml](https://github.com/modelai/ymir-executor-fork/tree/ymir-dev/det-demo-tmi/img-man/manifest.yaml)
+```
+# 2 for object detection
+"object_type": 2
+```
+
+- Dockerfile
+`COPY img-man/*.yaml /img-man/` 在复制infer-template.yaml的同时,会将manifest.yaml复制到镜像中的**/img-man**目录
+
+## 提供默认启动脚本
+
+- Dockerfile
+```
+RUN echo "python /app/start.py" > /usr/bin/start.sh # 生成启动脚本 /usr/bin/start.sh
+CMD bash /usr/bin/start.sh # 将镜像的默认启动脚本设置为 /usr/bin/start.sh
+```
+
+## 实现基本功能
+
+- [app/start.py](https://github.com/modelai/ymir-executor-fork/tree/ymir-dev/det-demo-tmi/app/start.py)
+
+::: det-demo-tmi.app.start._run_infer
+    handler: python
+    options:
+      show_root_heading: false
+      show_source: true
+
+
+## 写进度
+
+```
+# use `monitor.write_monitor_logger` to write log to console and write task process percent to monitor.txt
+logging.info(f"assets count: {len(lines)}, valid: {valid_image_count}")
+monitor.write_monitor_logger(percent=0.2)
+
+# if task done, write 100% percent log
+logging.info('infer done')
+monitor.write_monitor_logger(percent=1.0)
+```
+
+## 写结果文件
+
+```
+rw.write_infer_result(infer_result=infer_result, algorithm='detection')
+```
+
+## 制作镜像 demo/det:infer
+
+```dockerfile
+{!det-demo-tmi/Dockerfile!}
+```
+
+```
+docker build -t demo/det:infer -f Dockerfile .
+```
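+
+作为补充,下面给出 _run_infer 的一个最小示意实现,仅作参考:假设镜像已安装 ymir-executor-sdk(ymir_exc),detect() 为假设的占位函数,结果文件路径取自 env_config,结果的具体格式需符合 ymir 后台的约定。
+
+```python
+# 示意代码:最小的推理流程骨架
+import json
+import logging
+
+from ymir_exc import env, monitor
+
+
+def detect(asset_path: str) -> list:
+    # 占位:此处应加载模型并对单张图像推理,返回检测结果列表
+    return []
+
+
+def _run_infer() -> None:
+    env_config = env.get_current_env()
+    with open(env_config.input.candidate_index_file) as f:
+        asset_paths = [line.strip() for line in f if line.strip()]
+
+    infer_result = {}
+    for idx, asset_path in enumerate(asset_paths):
+        infer_result[asset_path] = detect(asset_path)
+        monitor.write_monitor_logger(percent=0.9 * (idx + 1) / len(asset_paths))
+
+    # 将推理结果写入 /out/infer-result.json(路径由 env_config 给出)
+    with open(env_config.output.infer_result_file, 'w') as f:
+        f.write(json.dumps(infer_result))
+
+    logging.info('infer done')
+    monitor.write_monitor_logger(percent=1.0)
+```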
diff --git a/docs/object_detection/simple_det_mining.md b/docs/object_detection/simple_det_mining.md
new file mode 100644
index 0000000..a0b9d1a
--- /dev/null
+++ b/docs/object_detection/simple_det_mining.md
@@ -0,0 +1,100 @@
+# 制作一个简单的目标检测挖掘镜像
+
+参考[ymir镜像制作简介](../overview/ymir-executor.md)
+
+## 镜像输入输出示例
+```
+.
+├── in
+│   ├── annotations
+│   ├── assets
+│   ├── candidate-index.tsv
+│   ├── config.yaml
+│   ├── env.yaml
+│   └── models
+└── out
+    ├── monitor.txt
+    └── result.tsv
+```
+
+## 工作目录
+
+```
+cd det-demo-tmi
+```
+
+## 提供超参数模板文件
+
+镜像中包含**/img-man/mining-template.yaml** 表示镜像支持挖掘
+
+- [img-man/mining-template.yaml](https://github.com/modelai/ymir-executor-fork/tree/ymir-dev/det-demo-tmi/img-man/mining-template.yaml)
+
+```yaml
+{!det-demo-tmi/img-man/mining-template.yaml!}
+```
+
+- [Dockerfile](https://github.com/modelai/ymir-executor-fork/tree/ymir-dev/det-demo-tmi/Dockerfile)
+
+```
+RUN mkdir -p /img-man # 在镜像中生成/img-man目录
+COPY img-man/*.yaml /img-man/ # 将主机中img-man目录下的所有yaml文件复制到镜像/img-man目录
+```
+
+## 提供镜像说明文件
+
+**object_type** 为 2 表示镜像支持目标检测
+
+- [img-man/manifest.yaml](https://github.com/modelai/ymir-executor-fork/tree/ymir-dev/det-demo-tmi/img-man/manifest.yaml)
+```
+# 2 for object detection
+"object_type": 2
+```
+
+- Dockerfile
+`COPY img-man/*.yaml /img-man/` 在复制mining-template.yaml的同时,会将manifest.yaml复制到镜像中的**/img-man**目录
+
+## 提供默认启动脚本
+
+- Dockerfile
+```
+RUN echo "python /app/start.py" > /usr/bin/start.sh # 生成启动脚本 /usr/bin/start.sh
+CMD bash /usr/bin/start.sh # 将镜像的默认启动脚本设置为 /usr/bin/start.sh
+```
+
+## 实现基本功能
+
+- [app/start.py](https://github.com/modelai/ymir-executor-fork/tree/ymir-dev/det-demo-tmi/app/start.py)
+
+::: det-demo-tmi.app.start._run_mining
+    handler: python
+    options:
+      show_root_heading: false
+      show_source: true
+
+## 写进度
+
+```
+# use `monitor.write_monitor_logger` to write log to console and write task process percent to monitor.txt
+logging.info(f"assets count: {len(lines)}, valid: {valid_image_count}")
+monitor.write_monitor_logger(percent=0.2)
+
+# if task done, write 100% percent log
+logging.info('mining done')
+monitor.write_monitor_logger(percent=1.0)
+```
+
+## 写结果文件
+
+```
+rw.write_mining_result(mining_result=mining_result)
+```
+
+## 制作镜像 demo/det:mining
+
+```dockerfile
+{!det-demo-tmi/Dockerfile!}
+```
+
+```
+docker build -t demo/det:mining -f Dockerfile .
+```
diff --git a/docs/object_detection/simple_det_training.md b/docs/object_detection/simple_det_training.md
new file mode 100644
index 0000000..04bccb6
--- /dev/null
+++ b/docs/object_detection/simple_det_training.md
@@ -0,0 +1,117 @@
+# 制作一个简单的目标检测训练镜像
+
+参考[ymir镜像制作简介](../overview/ymir-executor.md), 通过加载 /in 目录下的数据集,超参数,任务信息,预训练权重, 在 /out 目录下产生模型权重,进度文件,训练日志。
+
+## 镜像输入输出示例
+```
+.
+├── in
+│   ├── annotations [257 entries exceeds filelimit, not opening dir]
+│   ├── assets -> /home/ymir/ymir/ymir-workplace/sandbox/0001/training_asset_cache
+│   ├── config.yaml
+│   ├── env.yaml
+│   ├── models
+│   ├── train-index.tsv
+│   └── val-index.tsv
+├── out
+│   ├── models [29 entries exceeds filelimit, not opening dir]
+│   ├── monitor.txt
+│   ├── tensorboard -> /home/ymir/ymir/ymir-workplace/ymir-tensorboard-logs/0001/t00000010000028774b61663839849
+│   └── ymir-executor-out.log
+└── task_config.yaml
+```
+
+## 工作目录
+```
+cd det-demo-tmi
+```
+
+## 提供超参数模板文件
+
+镜像中包含**/img-man/training-template.yaml** 表示镜像支持训练
+
+- [img-man/training-template.yaml](https://github.com/modelai/ymir-executor-fork/tree/ymir-dev/det-demo-tmi/img-man/training-template.yaml)
+
+指明数据格式 **export_format** 为 **det-ark:raw**, 即目标检测标注格式,详情参考[Ymir镜像数据集格式](../overview/dataset-format.md)
+
+```yaml
+{!det-demo-tmi/img-man/training-template.yaml!}
+```
+
+- [Dockerfile](https://github.com/modelai/ymir-executor-fork/tree/ymir-dev/det-demo-tmi/Dockerfile)
+
+```
+RUN mkdir -p /img-man # 在镜像中生成/img-man目录
+COPY img-man/*.yaml /img-man/ # 将主机中img-man目录下的所有yaml文件复制到镜像/img-man目录
+```
+
+## 提供镜像说明文件
+
+**object_type** 为 2 表示镜像支持目标检测
+
+- [img-man/manifest.yaml](https://github.com/modelai/ymir-executor-fork/tree/ymir-dev/det-demo-tmi/img-man/manifest.yaml)
+```
+# 2 for object detection
+"object_type": 2
+```
+
+- Dockerfile
+`COPY img-man/*.yaml /img-man/` 在复制training-template.yaml的同时,会将manifest.yaml复制到镜像中的**/img-man**目录
+
+## 提供默认启动脚本
+
+- Dockerfile
+```
+RUN echo "python /app/start.py" > /usr/bin/start.sh # 生成启动脚本 /usr/bin/start.sh
+CMD bash /usr/bin/start.sh # 将镜像的默认启动脚本设置为 /usr/bin/start.sh
+```
+
+## 实现基本功能
+
+- [app/start.py](https://github.com/modelai/ymir-executor-fork/tree/ymir-dev/det-demo-tmi/app/start.py)
+
+::: det-demo-tmi.app.start._run_training
+    handler: python
+    options:
+      show_root_heading: false
+      show_source: true
+
+## 写进度
+
+```
+if idx % monitor_gap == 0:
+    monitor.write_monitor_logger(percent=0.2 * idx / N)
+
+monitor.write_monitor_logger(percent=0.2)
+
+monitor.write_monitor_logger(percent=1.0)
+```
+
+## 写结果文件
+
+```
+# use `rw.write_model_stage` to save training result
+rw.write_model_stage(stage_name='epoch10',
+                     files=['epoch10.pt', 'config.py'],
+                     evaluation_result=dict(mAP=random.random() / 2))
+
+rw.write_model_stage(stage_name='epoch20',
+                     files=['epoch20.pt', 'config.py'],
+                     evaluation_result=dict(mAP=expected_mAP))
+```
+
+## 写tensorboard日志
+
+```
+write_tensorboard_log(cfg.ymir.output.tensorboard_dir)
+```
+
+## 制作镜像 demo/det:training
+
+```dockerfile
+{!det-demo-tmi/Dockerfile!}
+```
+
+```
+docker build -t demo/det:training -f Dockerfile .
+```
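+
+作为补充,下面给出 _run_training 的一个最小示意实现,仅作参考:假设镜像已安装 ymir-executor-sdk(ymir_exc),训练与验证逻辑为占位,expected_mAP 应替换为验证集上的真实精度,权重文件需事先保存到 /out/models 目录。
+
+```python
+# 示意代码:最小的训练流程骨架
+import random
+
+from ymir_exc import monitor
+from ymir_exc import result_writer as rw
+
+
+def _run_training() -> None:
+    epochs = 10
+    for epoch in range(epochs):
+        # 占位:此处应执行真实的单轮训练与验证
+        monitor.write_monitor_logger(percent=0.9 * (epoch + 1) / epochs)
+
+    expected_mAP = random.random() / 2  # 占位:应替换为真实的验证集 mAP
+    rw.write_model_stage(stage_name='epoch10',
+                         files=['epoch10.pt', 'config.py'],
+                         evaluation_result=dict(mAP=expected_mAP))
+    monitor.write_monitor_logger(percent=1.0)
+```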
diff --git a/docs/object_detection/test_det.md b/docs/object_detection/test_det.md
new file mode 100644
index 0000000..8a702c5
--- /dev/null
+++ b/docs/object_detection/test_det.md
@@ -0,0 +1,102 @@
+# 测试Ymir目标检测镜像
+
+## 通过YMIR平台进行测试
+
+用户可以直接通过Ymir平台发起目标检测的训练,推理及挖掘任务,对镜像进行测试。
+
+!!! 注意
+    YMIR平台发起的任务在顺利结束时,会清理相应的目录,因此在测试时,请确保相应目录存在。
+
+### 导入待测镜像
+
+- 假设用户已经制作好镜像 **demo/det:tmi**, 它支持训练、推理及挖掘
+
+- 假设用户具有管理员权限,按照[新增镜像](https://github.com/IndustryEssentials/ymir/wiki/%E6%93%8D%E4%BD%9C%E8%AF%B4%E6%98%8E#%E6%96%B0%E5%A2%9E%E9%95%9C%E5%83%8F) 将**demo/det:tmi** 添加到 **我的镜像** 中。
+
+### 导入待测数据集
+
+- 下载示例目标检测数据集 [ymir2.0.0_dog_train.zip](https://github.com/modelai/ymir-executor-fork/releases/download/dataset-ymir2.0.0/ymir2.0.0_dog_train.zip) [ymir2.0.0_dog_val.zip](https://github.com/modelai/ymir-executor-fork/releases/download/dataset-ymir2.0.0/ymir2.0.0_dog_val.zip)
+
+- 建立包含对应标签的项目, `训练类别` 中添加对应标签 `dog`
+
+- 按照[添加数据集](https://github.com/IndustryEssentials/ymir/wiki/%E6%93%8D%E4%BD%9C%E8%AF%B4%E6%98%8E#%E6%B7%BB%E5%8A%A0%E6%95%B0%E6%8D%AE%E9%9B%86)导入示例目标检测数据集
+
+### 发起待测任务
+
+发起待测的训练、推理或挖掘任务后,等待其结束或出错。
+
+### 获取任务id
+
+登录服务器后台,进入YMIR部署的目录 `ymir-workplace`
+
+- 对于训练任务:`cd sandbox/work_dir/TaskTypeTraining`
+
+- 对于挖掘或推理任务: `cd sandbox/work_dir/TaskTypeMining`
+
+- 对于单张图片测试任务: `cd sandbox/work_dir/TaskTypeInfer`
+
+- 列举当前所有的任务,按任务时间找到对应任务id, 此处假设为最新的 **t00000020000023a473e1673591617**
+
+!!! 注意
+    对于训练任务, 可通过tensorboard链接获得对应任务id。
+
+```
+> ls -lt .
+
+drwxr-xr-x 4 root root 45 Jan 13 14:33 t00000020000023a473e1673591617
+drwxr-xr-x 4 root root 45 Jan 13 14:19 t00000020000025d55ff1673590756
+drwxr-xr-x 4 root root 45 Jan 13 14:13 t00000020000028b0cce1673590425
+drwxr-xr-x 4 root root 45 Jan 10 14:09 t00000020000018429301673330944
+drwxr-xr-x 4 root root 45 Jan  9 18:21 t000000200000210e0811673259669
+drwxr-xr-x 4 root root 45 Jan  9 18:07 t00000020000029e02f61673258829
+```
+
+### 通过 docker 进行交互式调试
+
+- 进入任务id对应的工作目录 `cd t00000020000023a473e1673591617/sub_task/t00000020000023a473e1673591617`
+
+- 列举当前目录可以看到 `in` 和 `out` 目录
+
+- 进行交互式调试
+
+  - 假设 `ymir-workplace` 存放在 **/data/ymir/ymir-workplace**, 需要将 `ymir-workplace` 目录也挂载到镜像中相同位置,以确保所有软链接均有效。
+
+  - 假设启动程序为 **/usr/bin/start.sh**
+
+```
+# --ipc host 表示容器共享主机的所有内存
+docker run -it --rm --gpus all --ipc host -v $PWD/in:/in -v $PWD/out:/out -v /data:/data demo/det:tmi bash
+
+# --shm-size 128g 表示容器最多共享主机128G内存
+# docker run -it --rm --gpus all --shm-size 128g -v $PWD/in:/in -v $PWD/out:/out -v /data:/data demo/det:tmi bash
+
+bash /usr/bin/start.sh
+```
+
+- 假设用户开发镜像的代码存放在 **/home/modelai/code**, 为方便测试, 可以将 **/home/modelai/code** 也挂载到镜像中进行测试。
+
+  - 假设实际启动程序为 **start.py**
+
+```
+docker run -it --rm --gpus all --ipc host -v $PWD/in:/in -v $PWD/out:/out -v /data:/data -v /home/modelai/code:/home/modelai/code demo/det:tmi bash
+
+cd /home/modelai/code
+python start.py
+```
+
+### 测试通过后
+
+- 通过 `docker build` 重新构建镜像, 如果修改了超参数,需要在Ymir平台删除旧镜像并重新添加,使更新的超参数生效。如果仅仅修改了代码,不需要重新添加即可使用本地的最新镜像。
+
+
+## 💫 YMIR后台错误查看
+
+- 如镜像正确运行,但输出格式不符合YMIR后台要求,或其他错误,可在 `ymir-workplace/ymir-data/logs` 下查看
+
+```
+tail -f -n 200 ymir_controller.log
+```
+
+## 💫 通过 ymir-executor-verifier 进行测试
+
+[ymir-executor-verifier](https://github.com/modelai/ymir-executor-verifier) 面向企业用户,目的是对大量镜像进行自动化测试,以保障镜像的质量。
diff --git a/docs/official-docker-image.md b/docs/official-docker-image.md
new file mode 100644
index 0000000..a737618
--- /dev/null
+++ b/docs/official-docker-image.md
@@ -0,0 +1,175 @@
+# official docker image
+
+update: 2022/11/01
+
+## the hyper-parameters for ymir-executor
+
+| docker images | epochs/iters | model structure | image size | batch_size |
+| - | - | - | - | - |
+| yolov5 | epochs | model | img_size | batch_size_per_gpu |
+| mmdetection | max_epochs | config_file | - | samples_per_gpu |
+| yolov4 | max_batches | - | image_height, image_width | batch |
+| yolov7 | epochs | cfg_file | img_size | batch_size_per_gpu |
+| nanodet | epochs | config_file | input_size | batch_size_per_gpu |
+| vidt | epochs | backbone_name | eval_size | batch_size_per_gpu |
+| detectron2 | max_iter | config_file | - | batch_size |
+
+- epochs: such as `epochs` or `max_epochs`, control the time for training.
+- iters: such as `max_batches` or `max_iter`, control the time for training.
+- ymir_saved_file_patterns: save the files matching one of the patterns. for example `best.pt, *.yaml` will save `best.pt` and all the `*.yaml` files in the `/out/models` directory.
+- export_format: the dataset format for ymir-executor in `/in`, support `ark:raw` and `voc:raw`
+- args_options/cfg_options: for yolov5, use it for other options, such as `--multi-scale --single-cls --optimizer SGD` and so on, view `train.py, parse_opt()` for details. for mmdetection and detectron2, it provides methods to change other hyper-parameters not defined in `/img-man/training-template.yaml`
+
+## docker image format
+
+youdaoyzbx/ymir-executor:[ymir-version]-[repository]-[cuda version]-[ymir-executor function]
+
+- ymir-version
+  - ymir1.1.0
+  - ymir1.2.0
+  - ymir1.3.0
+  - ymir2.0.0
+
+- repository
+  - yolov4
+  - yolov5
+  - yolov7
+  - mmdet
+  - detectron2
+  - vidt
+  - nanodet
+
+- cuda version
+  - cu101: cuda 10.1
+  - cu102: cuda 10.2
+  - cu111: cuda 11.1
+  - cu112: cuda 11.2
+
+- ymir-executor function
+  - t: training
+  - m: mining
+  - i: infer
+  - d: deploy
+
+
+
+## ymir2.0.0
+
+2022/10/26: support ymir1.1.0/1.2.0/1.3.0/2.0.0
+
+```
+youdaoyzbx/ymir-executor:ymir2.0.0-yolov5-cu111-tmi
+youdaoyzbx/ymir-executor:ymir2.0.0-yolov7-cu111-tmi
+youdaoyzbx/ymir-executor:ymir2.0.0-mmdet-cu111-tmi
+youdaoyzbx/ymir-executor:ymir2.0.0-detectron2-cu111-tmi
+youdaoyzbx/ymir-executor:ymir2.0.0-vidt-cu111-tmi
+youdaoyzbx/ymir-executor:ymir2.0.0-nanodet-cu111-tmi
+youdaoyzbx/ymir-executor:ymir2.0.0-yolov5-cu111-tmid # support deploy
+youdaoyzbx/ymir-executor:ymir2.0.0-yolov4-cu111-tmi # deprecated
+```
+
+## ymir1.3.0
+
+2022/10/10: support ymir1.1.0/1.2.0/1.3.0/2.0.0
+
+```
+youdaoyzbx/ymir-executor:ymir1.3.0-yolov5-cu111-tmi
+youdaoyzbx/ymir-executor:ymir1.3.0-yolov5-v6.2-cu111-tmi
+youdaoyzbx/ymir-executor:ymir1.3.0-yolov5-cu111-modelstore
+youdaoyzbx/ymir-executor:ymir1.3.0-mmdet-cu111-tmi
+```
+
+## ymir1.1.0
+
+- [yolov4](https://github.com/modelai/ymir-executor-fork#det-yolov4-training)
+
+  ```
+  docker pull youdaoyzbx/ymir-executor:ymir1.1.0-yolov4-cu112-tmi
+
+  docker pull youdaoyzbx/ymir-executor:ymir1.1.0-yolov4-cu101-tmi
+  ```
+
+- [yolov5](https://github.com/modelai/ymir-executor-fork#det-yolov5-tmi)
+
+  - [change log](https://github.com/modelai/ymir-executor-fork/tree/ymir-dev/det-yolov5-tmi/ymir/README.md)
+
+  ```
+  docker pull youdaoyzbx/ymir-executor:ymir1.1.0-yolov5-cu111-tmi
+
+  docker pull youdaoyzbx/ymir-executor:ymir1.1.0-yolov5-cu102-tmi
+  ```
+
+- [mmdetection](https://github.com/modelai/ymir-executor-fork#det-mmdetection-tmi)
+
+  - [change log](https://github.com/modelai/ymir-executor-fork/tree/ymir-dev/det-mmdetection-tmi/README.md)
+
+  ```
+  docker pull youdaoyzbx/ymir-executor:ymir1.1.0-mmdet-cu111-tmi
+
+  docker pull youdaoyzbx/ymir-executor:ymir1.1.0-mmdet-cu102-tmi
+  ```
+
+- [detectron2](https://github.com/modelai/ymir-detectron2)
+
+  - [change log](https://github.com/modelai/ymir-detectron2/blob/master/README.md)
+
+  ```
+  docker pull youdaoyzbx/ymir-executor:ymir1.1.0-detectron2-cu111-tmi
+  ```
+
+- [yolov7](https://github.com/modelai/ymir-yolov7)
+
+  - [change log](https://github.com/modelai/ymir-yolov7/blob/main/ymir/README.md)
+
+  ```
+  docker pull youdaoyzbx/ymir-executor:ymir1.1.0-yolov7-cu111-tmi
+  ```
+
+- [vidt](https://github.com/modelai/ymir-vidt)
+
+  - [change log](https://github.com/modelai/ymir-vidt/tree/main/ymir)
+
+  ```
+  docker pull youdaoyzbx/ymir-executor:ymir1.1.0-vidt-cu111-tmi
+  ```
+
+- [nanodet](https://github.com/modelai/ymir-nanodet/tree/ymir-dev)
+
+  - [change log](https://github.com/modelai/ymir-nanodet/tree/ymir-dev/ymir)
+
+  ```
+  docker pull youdaoyzbx/ymir-executor:ymir1.1.0-nanodet-cu111-tmi
+  ```
+
+# build ymir executor
+
+## det-yolov4-tmi
+
+- yolov4 training, mining and infer docker image, use `mxnet` and `darknet` framework
+
+  ```
+  cd det-yolov4-tmi
+  docker build -t ymir-executor/yolov4:cuda101-tmi -f cuda101.dockerfile .
+
+  docker build -t ymir-executor/yolov4:cuda112-tmi -f cuda112.dockerfile .
+  ```
+
+## det-yolov5-tmi
+
+- yolov5 training, mining and infer docker image, use `pytorch` framework
+
+```
+cd det-yolov5-tmi
+docker build -t ymir-executor/yolov5:cuda102-tmi -f cuda102.dockerfile .
+
+docker build -t ymir-executor/yolov5:cuda111-tmi -f cuda111.dockerfile .
+```
+
+## det-mmdetection-tmi
+
+```
+cd det-mmdetection-tmi
+docker build -t ymir-executor/mmdet:cu102-tmi -f docker/Dockerfile.cuda102 .
+
+docker build -t ymir-executor/mmdet:cu111-tmi -f docker/Dockerfile.cuda111 .
+```
diff --git a/docs/overview/dataset-format.md b/docs/overview/dataset-format.md
new file mode 100644
index 0000000..4c302fa
--- /dev/null
+++ b/docs/overview/dataset-format.md
@@ -0,0 +1,230 @@
+# Ymir镜像数据集格式
+
+## 数据集整体格式
+
+### /in/env.yaml
+
+ymir平台提供的数据集信息存储在镜像文件 /in/env.yaml 中。
+
+{!docs/sample_files/in_env.md!}
+
+### 训练任务
+
+ymir平台导出的数据集格式,其中图片格式固定为 'raw', 而标注格式可为 ["ark", "voc", "det-ark", "det-voc", "seg-coco"] 中的某一个, 用户可以通过超参数 `export_format` 修改ymir平台为训练任务提供的数据集格式。
+
+
+- 训练与验证数据集信息分别存储在索引文件`/in/train-index.tsv`与`/in/val-index.tsv`中, 其中每行的格式为`<图像文件绝对路径>\t<标注文件绝对路径>`
+
+```
+<图像文件1绝对路径>	<标注文件1绝对路径>
+<图像文件2绝对路径>	<标注文件2绝对路径>
+<图像文件3绝对路径>	<标注文件3绝对路径>
+```
+
+### 推理任务与挖掘任务
+
+- 推理任务与挖掘任务的数据集格式相同
+
+- 推理或挖掘数据集信息存储在索引文件`/in/candidate-index.tsv`中,其中每行的格式为`<图像文件绝对路径>`
+
+```
+<图像文件1绝对路径>
+<图像文件2绝对路径>
+<图像文件3绝对路径>
+```
+
+## det-ark:raw
+
+也可写为 ark:raw, 为目标检测格式
+
+- export_format = det-ark:raw 时的训练/验证集索引文件
+
+```
+/in/assets/02/1c5c432085dc136f6920f901792d357d4266df02.jpg	/in/annotations/02/1c5c432085dc136f6920f901792d357d4266df02.txt
+/in/assets/95/e47ac9932cdf6fb08681f6b0007cbdeefdf49c95.jpg	/in/annotations/95/e47ac9932cdf6fb08681f6b0007cbdeefdf49c95.txt
+/in/assets/56/56f3af57d381154d377ad92a99b53e4d12de6456.jpg	/in/annotations/56/56f3af57d381154d377ad92a99b53e4d12de6456.txt
+```
+
+这个索引文件采用文本文件格式,每行包含一个图像的`绝对路径`及对应标注的`绝对路径`,以制表符 `\t`进行分隔。
+
+- 标注txt文件每行的格式为 `class_id, xmin, ymin, xmax, ymax, ann_quality, bbox_angle`, 以英文逗号 `,` 进行分隔。
+
+  - `class_id`: 表示标注框所属类别的整数,从0开始计数
+
+  - `xmin, ymin, xmax, ymax`: 表示标注框左上角和右下角的整数坐标值,以像素为单位。
+
+  - `ann_quality`:表示标注质量的浮点数,默认为-1.0
+
+  - `bbox_angle`: 表示标注框旋转角度的浮点数,以[弧度RAD](https://baike.baidu.com/item/RAD/2262445)为单位,默认为0.0
+
+```
+0, 242, 61, 424, 249, -1.0, 0.0
+1, 211, 147, 325, 255, -1.0, 0.0
+1, 122, 7, 372, 375, -1.0, 0.0
+```
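+
+下面给出解析该格式的示意代码,仅作参考:
+
+```python
+# 示意代码:解析 det-ark:raw 的索引文件与标注 txt(字段顺序见上文说明)
+def parse_index_file(index_path: str) -> list:
+    """每行格式为 <图像文件绝对路径>\t<标注文件绝对路径>"""
+    pairs = []
+    with open(index_path) as f:
+        for line in f:
+            if not line.strip():
+                continue
+            img_path, ann_path = line.strip().split('\t')
+            pairs.append((img_path, ann_path))
+    return pairs
+
+
+def parse_ark_annotation_file(txt_path: str) -> list:
+    boxes = []
+    with open(txt_path) as f:
+        for line in f:
+            if not line.strip():
+                continue
+            fields = line.strip().split(',')
+            class_id, xmin, ymin, xmax, ymax = [int(v) for v in fields[:5]]
+            ann_quality = float(fields[5])  # 标注质量,默认 -1.0
+            bbox_angle = float(fields[6])   # 旋转角度,单位为弧度,默认 0.0
+            boxes.append(dict(class_id=class_id, xmin=xmin, ymin=ymin,
+                              xmax=xmax, ymax=ymax,
+                              ann_quality=ann_quality, bbox_angle=bbox_angle))
+    return boxes
+```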
+
+## det-voc:raw
+
+也可写为 voc:raw, 为目标检测格式
+
+- export_format = det-voc:raw 时的训练/验证集索引文件
+
+```
+/in/assets/02/1c5c432085dc136f6920f901792d357d4266df02.jpg	/in/annotations/02/1c5c432085dc136f6920f901792d357d4266df02.xml
+/in/assets/95/e47ac9932cdf6fb08681f6b0007cbdeefdf49c95.jpg	/in/annotations/95/e47ac9932cdf6fb08681f6b0007cbdeefdf49c95.xml
+/in/assets/56/56f3af57d381154d377ad92a99b53e4d12de6456.jpg	/in/annotations/56/56f3af57d381154d377ad92a99b53e4d12de6456.xml
+```
+
+- 示例xml文件
+```
+<annotation>
+    <folder>VOC2012</folder>
+    <filename>2008_000026.jpg</filename>
+    <source>
+        <database>The VOC2008 Database</database>
+        <annotation>PASCAL VOC2008</annotation>
+        <image>flickr</image>
+    </source>
+    <size>
+        <width>500</width>
+        <height>375</height>
+        <depth>3</depth>
+    </size>
+    <segmented>0</segmented>
+    <object>
+        <name>person</name>
+        <pose>Frontal</pose>
+        <truncated>1</truncated>
+        <occluded>1</occluded>
+        <bndbox>
+            <xmin>122</xmin>
+            <ymin>7</ymin>
+            <xmax>372</xmax>
+            <ymax>375</ymax>
+        </bndbox>
+        <difficult>0</difficult>
+    </object>
+    <object>
+        <name>dog</name>
+        <pose>Unspecified</pose>
+        <truncated>0</truncated>
+        <occluded>1</occluded>
+        <bndbox>
+            <xmin>211</xmin>
+            <ymin>147</ymin>
+            <xmax>325</xmax>
+            <ymax>255</ymax>
+        </bndbox>
+        <difficult>0</difficult>
+    </object>
+</annotation>
+```
+
+## seg-coco:raw
+
+语义与实例分割的标注格式, 参考coco数据集给出的格式
+
+- `export_format = seg-coco:raw` 时的训练/验证集索引文件
+
+!!! 注意
+    训练集与验证集共享一个标注文件,需要根据索引文件进行数据集划分
+
+!!! 注意
+    语义与实例分割标注中不包含背景类,即只提供项目标签的标注mask。
+    如下图所示,annotations中可能只编码人和马的区域。
+    用户可以通过超参数控制训练镜像是否忽略背景区域。
+
+![](../imgs/2007_000783.png)
+
+```
+/in/assets/02/1c5c432085dc136f6920f901792d357d4266df02.jpg	/in/annotations/coco-annotations.json
+/in/assets/95/e47ac9932cdf6fb08681f6b0007cbdeefdf49c95.jpg	/in/annotations/coco-annotations.json
+/in/assets/56/56f3af57d381154d377ad92a99b53e4d12de6456.jpg	/in/annotations/coco-annotations.json
+```
+
+- 示例json文件
+
+标注mask采用 `rle` 编码。
+
+```json
+{
+    "images": [
+        {
+            "file_name": "fake1.jpg",
+            "height": 800,
+            "width": 800,
+            "id": 0
+        },
+        {
+            "file_name": "fake2.jpg",
+            "height": 800,
+            "width": 800,
+            "id": 1
+        },
+        {
+            "file_name": "fake3.jpg",
+            "height": 800,
+            "width": 800,
+            "id": 2
+        }
+    ],
+    "annotations": [
+        {
+            "bbox": [
+                0,
+                0,
+                20,
+                20
+            ],
+            "segmentation": {"counts": ""},
+            "area": 400.00,
+            "score": 1.0,
+            "category_id": 1,
+            "id": 1,
+            "image_id": 0
+        },
+        {
+            "bbox": [
+                0,
+                0,
+                20,
+                20
+            ],
+            "segmentation": {"counts": ""},
+            "area": 400.00,
+            "score": 1.0,
+            "category_id": 2,
+            "id": 2,
+            "image_id": 0
+        },
+        {
+            "bbox": [
+                0,
+                0,
+                20,
+                20
+            ],
+            "segmentation": {"counts": ""},
+            "area": 400.00,
+            "score": 1.0,
+            "category_id": 1,
+            "id": 3,
+            "image_id": 1
+        }
+    ],
+    "categories": [
+        {
+            "id": 1,
+            "name": "bus",
+            "supercategory": "none"
+        },
+        {
+            "id": 2,
+            "name": "car",
+            "supercategory": "none"
+        }
+    ],
+    "licenses": [],
+    "info": null
+}
+```
diff --git a/docs/overview/framework.md b/docs/overview/framework.md
new file mode 100644
index 0000000..b6915b9
--- /dev/null
+++ b/docs/overview/framework.md
@@ -0,0 +1,67 @@
+# ymir镜像简介
+
+- 从数据的角度看,ymir平台实现了数据的导入、划分、合并与标注等功能;镜像则提供代码与环境依赖,利用数据训练模型,对数据进行推理或挖掘出最有标注价值的数据。
+
+- 从镜像的角度看,ymir平台提供数据集、任务与超参数信息,镜像处理后产生结果文件,ymir对结果文件进行解析,并显示在ymir平台上。
+
+- 从接口的角度看,约定好ymir平台提供的数据与超参数格式,以及镜像产生的结果文件格式,则可以提供多种镜像,实现不同的算法功能并对接到ymir平台。
+
+!!!
注意 + 与其它docker镜像不同,ymir镜像中包含镜像配置文件、代码与运行环境。 + +## ymir镜像使用 + +- [模型训练](https://github.com/IndustryEssentials/ymir/wiki/%E6%93%8D%E4%BD%9C%E8%AF%B4%E6%98%8E#%E6%A8%A1%E5%9E%8B%E8%AE%AD%E7%BB%83) + +- [模型推理](https://github.com/IndustryEssentials/ymir/wiki/%E6%93%8D%E4%BD%9C%E8%AF%B4%E6%98%8E#%E6%A8%A1%E5%9E%8B%E6%8E%A8%E7%90%86) + +- [数据挖掘](https://github.com/IndustryEssentials/ymir/wiki/%E6%93%8D%E4%BD%9C%E8%AF%B4%E6%98%8E#%E6%95%B0%E6%8D%AE%E6%8C%96%E6%8E%98) + +## ymir镜像 + +> 将ymir镜像视为一个对象或黑盒,它有以下属性 + +- 镜像类型:按镜像提供的功能,可以将镜像分类为训练镜像,推理镜像及挖掘镜像。一个镜像可以同时为训练,推理及挖掘镜像,也可以仅支持一种或两种功能。 + + - ymir平台基于镜像或数据集,可以发起训练,推理及挖掘任务,任务信息提供到选择的镜像,启动对应的代码实现对应功能。如发起训练任务,将启动镜像中对应的训练代码;发起推理任务,将启动镜像中对应的推理代码。目前ymir平台支持发起单一任务,也支持发起推理及挖掘的联合任务。 + +- 镜像地址:来自[docker](https://www.runoob.com/docker/docker-tutorial.html)的概念,即镜像的仓库源加标签,一般采用<仓库源>:<标签>的格式,如 `ubuntu:22.04`, `youdaoyzbx/ymir-executor:ymir2.0.0-yolov5-cu111-tmi`。 + - 对于公开的镜像,仓库源对应docker hub上的镜像仓库,如 [youdaoyzbx/ymir-executor](https://hub.docker.com/r/youdaoyzbx/ymir-executor/tags), [pytorch/pytorch](https://hub.docker.com/r/pytorch/pytorch/tags) + +- 镜像名称:用户自定义的镜像名称,注意名称长度,最多50个字符 + +- 镜像功能参数:为提高镜像的灵活性,用户可以在ymir平台上修改镜像的默认功能参数。如 `epochs`, `batch_size_per_gpu`,控制训练镜像的训练时长及显存占用。注意ymir平台为所有镜像提供额外的[通用参数](./hyper-parameter.md) + + - 训练镜像功能参数:对应训练超参数,常见的有`epochs`, `batch_size_per_gpu`, `num_workers_per_gpu`。默认训练参数配置文件存放在镜像的`/img-man/training-template.yaml` + + - 推理镜像功能参数:常见的有`confidence_threshold`,设置推理置信度。默认推理参数配置文件存放在镜像的`/img-man/infer-template.yaml` + + - 挖掘镜像功能参数:常见的有`confidence_threshold`设置推理置信度, `mining_algorithm`设置挖掘算法。默认挖掘参数配置文件存放在镜像的`/img-man/mining-template.yaml` + +- 镜像目标:根据镜像中算法的类型,将镜像分为目标检测镜像、语义分割镜像及实例分割镜像等。 + + - 镜像目标定义在镜像的 `/img-man/manifest.yaml` 文件中,如此文件不存在,ymir则默认镜像为目标检测镜像。 + +- 关联镜像:对于单一功能的镜像,训练镜像产生的模型,其它镜像不一定能使用。如采用基于[yolov4](https://github.com/AlexeyAB/darknet)训练的模型权重,基于[yolov7](https://github.com/WongKinYiu/yolov7) 推理镜像不支持加载相应模型权重。 因此需要对此类镜像进行关联,推荐使用多功能镜像。 + +!!! 
添加镜像 + 添加镜像时需要管理员权限,ymir平台首先会通过 `docker pull` 下载镜像,再解析镜像的`/img-man`目录,确定镜像中算法的类型及镜像支持的功能。 + + +## ymir平台与镜像之间的接口 + +> 从镜像的角度看,ymir平台将任务信息,数据集信息,超参数信息放在镜像的`/in`目录,而镜像输出的进度信息,结果文件放在镜像的`/out`目录。 + +- 任务信息:任务信息包含是否要执行的训练,推理或挖掘任务,任务id。参考镜像文件[/in/env.yaml](../sample_files/in_env.md) + +- [数据集信息](./dataset-format.md):ymir平台中所有的数据集存放在相同的目录下,其中图片以其hash码命名,以避免图片的重复。ymir平台为镜像提供索引文件,索引文件的每一行包含图像绝对路径及对应标注绝对路径。 + + - 对于训练任务,标注的格式由超参数 [export-format](./hyper-parameter.md) 决定。 + + - 对于推理及挖掘任务,索引文件仅包含图像绝对路径。 + + - 参考镜像文件 [/in/env.yaml](../sample_files/in_config.md) + +- [超参数信息](./hyper-parameter.md) + +- [接口文档](../design_doc/ymir_call_image.md) diff --git a/docs/overview/hyper-parameter.md b/docs/overview/hyper-parameter.md new file mode 100644 index 0000000..1dc4db0 --- /dev/null +++ b/docs/overview/hyper-parameter.md @@ -0,0 +1,203 @@ +# Ymir镜像超参数 + +- ymir平台为每个镜像提供通用的参数,同时每个镜像按任务拥有相应的训练、推理及挖掘功能参数。 + +- 部分通用参数由ymir平台自动生成,剩余通用参数可以手动修改。 + +- 默认功能参数由镜像提供,用户可以手动修改功能参数。 + +- 从镜像的角度,通用参数与功能参数均以 [yaml格式](https://www.runoob.com/w3cnote/yaml-intro.html) 存储在镜像中的 `/in/config.yaml` + +{!docs/sample_files/in_config.md!} + +## ymir平台的通用参数 + +- gpu_count: 用户可在启动任务中进行修改 + +- gpu_id: ymir平台根据gpu_count自动生成 + +- task_id: ymir平台自动生成 + +- class_names: ymir平台根据用户选择自动生成 + +- shm_size: 用户可在`超参数配置`页面中手动修改 + +- export_format: 用户可在`超参数配置`页面中手动修改 + +- pretrained_model_params: ymir平台根据用户选择自动生成 + +- model_params_path: ymir平台根据用户选择自动生成 + +### gpu_count + +ymir平台为镜像提供的显卡数量 + +``` +gpu_count: 0 # 表示不使用显卡,即仅使用cpu +gpu_count: 2 # 表示使用 2 块显卡 +``` + +### gpu_id + +ymir 平台为镜像提供的gpu编号,编号从0开始,但实际上使用的显卡为当前空闲显存超过80%的随机显卡。 + +``` +gpu_id: '0' # 采用一块显卡,实际上可能使用编号为5, 6的显卡。 +gpu_id: '0, 1' # 采用两块显卡,实际上可能使用编号为1和8的显卡,或者使用编号为3和5的显卡。 +``` + +!!! 注意 + 对于镜像而言,直接使用 `gpu_id` 对应的显卡即可,不需要考虑 `CUDA_VISIBLE_DEVICES`等变量。 ymir平台在启动镜像时通过`--gpus '"device=5,7"'`指定使用编号为5, 7的显卡,但实际镜像中只能使用编号 `0, 1`,其效果如下。 + +``` +> docker run --gpus '"device=5,7"' nvidia/cuda:10.1-cudnn7-devel-ubuntu16.04 nvidia-smi +Thu Jan 12 06:19:03 2023 ++-----------------------------------------------------------------------------+ +| NVIDIA-SMI 465.31 Driver Version: 465.31 CUDA Version: 11.3 | +|-------------------------------+----------------------+----------------------+ +| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC | +| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. | +| | | MIG M. | +|===============================+======================+======================| +| 0 NVIDIA GeForce ... On | 00000000:86:00.0 Off | N/A | +| 22% 31C P8 1W / 250W | 0MiB / 11019MiB | 0% Default | +| | | N/A | ++-------------------------------+----------------------+----------------------+ +| 1 NVIDIA GeForce ... On | 00000000:8A:00.0 Off | N/A | +| 22% 29C P8 5W / 250W | 0MiB / 11019MiB | 0% Default | +| | | N/A | ++-------------------------------+----------------------+----------------------+ +``` + +### task_id + +任务 id, 可以唯一确定某项任务, 如 `t000000100000208ac7a1664337925` + +### class_names + +数据集的类别名称 + +``` +class_names: ['cat', 'dog'] # 任务目标中包含cat和dog +class_names: ['cat'] # 任务目标中仅包含cat +``` + +### shm_size + +ymir平台为镜像提供的共享内存大小,对于ymir2.0.0后的版本,默认的共享内存为 `16G` 乘以 `gpu_count`。而ymir2.0.0之前的版本,默认的共享内存固定为 `16G`。 + +``` +shm_size: 128G # 为镜像提供128G共享内存 +shm_size: 256G # 为镜像提供256G共享内存 +``` + +!!! 
注意
+    共享内存过小时,会报 `Out of Memory`的错误,即内存不足错误。可以考虑减少`gpu_count`,`batch size`, `num_workers` 或 增加 `shm_size`。服务器的共享内存可以通过 `df -h` 查看,下面服务器的共享内存为 `63G`
+
+```
+> df -h | grep shm
+Filesystem Size Used Avail Use% Mounted on
+tmpfs 63G 0 63G 0% /dev/shm
+```
+
+### export_format
+
+ymir平台为训练任务导出的图像及标注格式, 详情参考 [数据集格式](./dataset-format.md)
+
+- 图像格式:`['raw']`, `raw` 代表常用的图片存储格式,如 `jpg`。
+
+- 标注格式:`["ark", "voc", "det-ark", "det-voc", "seg-coco"]`, 其中 `ark` 与 `det-ark` 为同一种目标检测格式,标注文件为txt文件;`voc`与`det-voc`为同一种目标检测格式,标注文件为xml文件;`seg-coco`为语义分割与实例分割的格式,标注文件为[coco格式](https://cocodataset.org/#format-data)的json文件。
+
+```
+export_format: ark:raw # 类似 yolov5 的目标检测格式, 标注文件为txt格式
+export_format: det-voc:raw # 类似 voc 目标检测格式,标注文件为xml格式
+export_format: seg-coco:raw # 类似 coco 的语义分割或实例分割的格式,标注文件为json格式
+```
+
+!!! 注意
+    仅对训练任务起作用,对于推理或挖掘任务,此参数不起作用
+
+### pretrained_model_params
+
+- ymir平台为训练任务对应镜像提供的参数,其中包含预训练文件的绝对路径。对应[训练配置](https://github.com/IndustryEssentials/ymir/wiki/%E6%93%8D%E4%BD%9C%E8%AF%B4%E6%98%8E#%E8%AE%AD%E7%BB%83%E9%85%8D%E7%BD%AE)中的预训练模型。
+- 预训练模型文件对应训练任务的输出文件,可以包含任意文件,如配置文件等,不局限于权重文件。
+
+```
+pretrained_model_params: ['/in/models/a.pth', '/in/models/b.pth', '/in/models/a.py']
+```
+
+!!! 注意
+    对于推理或挖掘任务,此参数不提供
+
+
+### model_params_path
+
+- ymir平台为推理或挖掘任务对应镜像提供的参数,其中包含权重文件的绝对路径,对应[模型推理](https://github.com/IndustryEssentials/ymir/wiki/%E6%93%8D%E4%BD%9C%E8%AF%B4%E6%98%8E#%E6%A8%A1%E5%9E%8B%E6%8E%A8%E7%90%86) 或 [数据挖掘](https://github.com/IndustryEssentials/ymir/wiki/%E6%93%8D%E4%BD%9C%E8%AF%B4%E6%98%8E#%E6%95%B0%E6%8D%AE%E6%8C%96%E6%8E%98) 中选择的模型。
+
+```
+model_params_path: ['/in/models/a.pth', '/in/models/b.pth', '/in/models/a.py']
+```
+
+!!! 注意
+    对于训练任务,此参数不提供
+
+## 任务超参数
+
+### 训练任务超参数
+
+镜像可以通过 `/img-man/training-template.yaml` 向ymir平台暴露训练任务的超参数, 以`youdaoyzbx/ymir-executor:ymir2.0.0-yolov5-cu111-tmi`镜像为例,它的训练任务超参数配置文件如下:
+
+- 训练任务超参数配置文件: 镜像中的 `/img-man/training-template.yaml`
+
+```
+shm_size: '128G'
+export_format: 'ark:raw'
+model: 'yolov5s'
+batch_size_per_gpu: 16
+num_workers_per_gpu: 4
+epochs: 100
+img_size: 640
+opset: 11
+args_options: '--exist-ok'
+save_best_only: True # save the best weight file only
+save_period: 10
+sync_bn: False # work for multi-gpu only
+activation: 'SiLU' # view https://pytorch.org/docs/stable/nn.html#non-linear-activations-weighted-sum-nonlinearity
+```
+
+- ymir平台对应的超参数编辑页面, 编辑页面与配置文件一一对应。
+
+![](../imgs/training-hyper-parameter-web.png)
+
+### 推理任务超参数
+
+- 推理任务超参数配置文件: 镜像中的 `/img-man/infer-template.yaml`
+
+
+### 挖掘任务超参数
+
+- 挖掘任务超参数配置文件: 镜像中的 `/img-man/mining-template.yaml`
+
+### 常用任务超参数
+
+- epochs: 整数,如 100, 表示在训练任务中,整个数据集循环的次数。epochs 越大,数据集越大,训练时间越长。
+
+  - 类似的参数有 `max_epochs`, `num_epochs`,表达的意思相同。
+
+- steps: 整数,如20000,表示训练任务中,训练步骤循环的次数。steps越大,训练时间越长。
+
+  - 类似的参数有 `max_steps`, `num_steps`, `iters`, `max_iters`, `num_iters`
+
+  - `steps = epochs * dataset_size / batch_size`
+
+- batch_size: 整数,批量大小,如 8。由于数据集往往上万张,计算机无法一次性全部加载到内存或显存中,因此在处理时,可以一次处理 8 张。
+
+  - 类似的参数有 `batch`, `num_batch`。
+
+  - 对于支持分布式处理的镜像, `batch_size_per_gpu` 与 `num_images_per_gpu` 乘以 使用的GPU数(gpu_count),则为实际的 batch_size。
+
+- num_workers: 整数,数据加载时使用的进程数,设置为0则是采用单进程进行加载,一般设置为4 或 8。
+
+  - 类似的参数有: `workers`
+
+  - 对于支持分布式处理的镜像, `num_workers_per_gpu` 乘以使用的GPU数(gpu_count), 则为实际的 num_workers。
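+
+下面给出镜像内读取 /in/config.yaml 的示意代码,仅作参考:假设该训练镜像的超参数中包含 epochs 与 batch_size_per_gpu,dataset_size 为示例用的占位变量。
+
+```python
+# 示意代码:读取通用参数与功能参数
+import yaml
+
+with open('/in/config.yaml') as f:
+    cfg = yaml.safe_load(f)
+
+gpu_id = cfg['gpu_id']            # 如 '0, 1',镜像内直接使用该编号即可
+class_names = cfg['class_names']  # 任务目标类别
+gpu_count = int(cfg.get('gpu_count', 1))
+epochs = int(cfg.get('epochs', 100))
+batch_size = int(cfg.get('batch_size_per_gpu', 16)) * max(1, gpu_count)
+
+# steps 与 epochs 的换算关系:steps = epochs * dataset_size / batch_size
+dataset_size = 10000  # 占位:实际应取训练集图像数量
+steps = epochs * dataset_size // batch_size
+```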
diff --git a/docs/overview/introduction.md b/docs/overview/introduction.md
new file mode 100644
index 0000000..83acf37
--- /dev/null
+++ b/docs/overview/introduction.md
@@ -0,0 +1,79 @@
+# YMIR简介
+
+YMIR是一款专为规模化生产而设计的AI平台,旨在为算法开发和标注人员提供端到端的算法研发工具。YMIR平台可以将企业的数据和模型进行平台化管理,降低算法开发和维护成本,提高数据标注和模型训练效率。除了数据标注、模型训练和模型部署功能外,YMIR还提供以下特色功能:
+
+1. 数据挖掘功能:YMIR利用主动学习算法,可以挖掘高质量数据,并仅使用10%的标注量即可获得接近100%标注的精度。
+
+2. 数据和模型版本管理:YMIR系统可以对数据和模型进行版本管理,支持历史追溯和复现。
+
+3. 项目划分:每个项目都具有固定的标签集,用户可以在同一项目中进行数据操作和模型训练,产生多个数据和模型版本,并对不同版本的数据和模型进行对比分析,提高工作效率。
+
+4. 可视化:YMIR支持对数据、模型训练、模型推理和模型评估进行可视化,方便用户理解和把控AI算法生产的所有环节。
+
+![ ](../imgs/ymir-design.png)
+
+# 安装简介
+
+详情参考[官方安装说明](https://github.com/IndustryEssentials/ymir/blob/master/README_zh-CN.md#2-%E5%AE%89%E8%A3%85)
+
+## 服务器系统
+
+- 推荐使用 ubuntu 18.04, 使用ubuntu 22.04+ 可能会出现glibc缺失的问题。
+
+## nvidia驱动
+
+- 推荐使用 NVIDIA driver >= 510.47.03, 以支持cuda11.6及以下镜像
+
+```
+# 测试命令
+nvidia-smi
+```
+
+## docker & docker compose
+
+- 推荐使用 docker >= 20.10, 安装参考 [docker install](https://docs.docker.com/engine/install/ubuntu/)
+
+- 推荐使用 docker compose >= 1.29.2
+
+```
+# 安装docker engine, 此方式经过第三方,可能有风险
+curl -sSL https://get.daocloud.io/docker | sh
+
+# 安装docker compose
+pip3 install docker-compose
+
+# 普通用户添加docker权限,重启生效
+sudo groupadd docker
+sudo usermod -aG docker $USER
+
+# 测试普通用户使用docker
+docker run hello-world
+
+# 查看docker-compose版本
+docker-compose version
+```
+
+## nvidia-docker
+
+- [安装参考](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#installation-guide)
+
+```
+# 测试命令
+docker run --rm --runtime=nvidia --gpus all nvidia/cuda:11.6.2-base-ubuntu20.04 nvidia-smi
+```
+
+## ymir安装命令
+
+- 安装并开启服务
+```
+git clone https://github.com/IndustryEssentials/ymir.git
+cd ymir
+bash ymir.sh start
+```
+
+- 安装完成后,直接访问 http://localhost:12001 即可显示登录界面, 默认用户名为admin@example.com, 密码为12345678
+
+- 停止服务
+```
+bash ymir.sh stop
+```
+
diff --git a/docs/overview/ymir-executor.md b/docs/overview/ymir-executor.md
new file mode 100644
index 0000000..a390210
--- /dev/null
+++ b/docs/overview/ymir-executor.md
@@ -0,0 +1,174 @@
+# ymir镜像制作简介
+
+## 背景知识
+
+- [python3](https://www.runoob.com/python3/python3-tutorial.html) ymir平台、深度学习框架、开源算法库主要以python3进行开发
+
+- [docker](https://www.runoob.com/docker/docker-tutorial.html) 制作ymir镜像,需要了解docker 及 [dockerfile](https://www.runoob.com/docker/docker-dockerfile.html)
+
+- [linux](https://www.runoob.com/linux/linux-shell.html) ymir镜像主要基于linux系统,需要了解linux 及 [linux-shell](https://www.runoob.com/linux/linux-shell.html)
+
+- [深度视觉算法] ymir镜像的核心算法是深度视觉算法,需要了解[深度学习](https://leonardoaraujosantos.gitbook.io/artificial-inteligence/machine_learning/deep_learning)与计算机视觉。
+
+- [深度学习框架] 应用深度学习算法离不开深度学习框架如 [pytorch](https://pytorch.org/), [tensorflow](https://tensorflow.google.cn/?hl=en) 与 [keras](https://keras.io/) 等的支持。熟悉其中的一种即可,推荐pytorch。
+
+- [深度学习算法库] 基于已有的算法库应用前沿算法或开发新算法是常规操作,推荐了解 [mmdetection](https://github.com/open-mmlab/mmdetection) 与 [yolov5](https://github.com/ultralytics/yolov5)
+
+
+## 环境依赖
+
+假设拥有一台带nvidia显卡的linux服务器, 以ubuntu18.04 为例
+
+!!! 注意
+    如果apt update 或 apt install 速度缓慢,可以考虑更换软件源
+    [清华软件源](https://mirrors.tuna.tsinghua.edu.cn/help/ubuntu/)
+    [中科大软件源](http://mirrors.ustc.edu.cn/help/ubuntu.html)
+
+- [docker](https://www.runoob.com/docker/ubuntu-docker-install.html)
+```
+# 安装
+curl -sSL https://get.daocloud.io/docker | sh
+
+# 测试
+sudo docker run hello-world
+
+# 添加普通用户执行权限
+sudo usermod -aG docker $USER
+
+# 重新login后测试
+docker run hello-world
+```
+
+- [nvidia-docker](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#installation-guide)
+
+!!! 注意
+    先按照上述链接中的前提条件安装好 **NVIDIA Driver >= 510.47.03**, 以支持 `cuda11.6+`
+
+!!! gpu驱动与cuda版本
+    引用自openmmlab:
+
+    对于基于 Ampere 的 NVIDIA GPU,例如 GeForce 30 系列和 NVIDIA A100,CUDA 版本需要 >= 11。
+    对于较旧的 NVIDIA GPU,CUDA 11 向后兼容,但 CUDA 10.2 提供更好的兼容性并且更轻量级。
+    请确保 GPU 驱动程序满足最低版本要求。有关详细信息,请参阅[此表](https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html#cuda-major-component-versions__table-cuda-toolkit-driver-versions)
+
+```
+# 添加软件源
+distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
+    && curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
+    && curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | \
+        sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
+        sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
+
+# 更新索引
+sudo apt-get update
+
+# 安装
+sudo apt-get install -y nvidia-docker2
+
+# 重启docker
+sudo systemctl restart docker
+
+# 测试
+docker run --rm --gpus all nvidia/cuda:11.6.2-base-ubuntu20.04 nvidia-smi
+```
+
+## 制作一个hello world 镜像
+
+### 编辑Dockerfile
+
+```
+# vim Dockerfile
+# cat Dockerfile
+
+# 基于ubuntu18.04镜像制作新镜像
+# 注意 dockerfile 的注释需单独成行,不支持写在指令行尾
+FROM ubuntu:18.04
+
+# 新镜像在运行时默认执行的命令
+CMD echo "hello ymir executor"
+```
+
+### 制作 hello-ymir:latest 镜像
+
+```
+# docker build -t hello-ymir:latest -f Dockerfile .
+
+Sending build context to Docker daemon 52.74kB
+Step 1/2 : FROM ubuntu:18.04
+18.04: Pulling from library/ubuntu
+a055bf07b5b0: Pull complete
+Digest: sha256:c1d0baf2425ecef88a2f0c3543ec43690dc16cc80d3c4e593bb95e4f45390e45
+Status: Downloaded newer image for ubuntu:18.04
+ ---> e28a50f651f9
+Step 2/2 : CMD echo "hello ymir executor"
+ ---> Running in 6dd391c7688d
+Removing intermediate container 6dd391c7688d
+ ---> 4c8672e6ce02
+Successfully built 4c8672e6ce02
+Successfully tagged hello-ymir:latest
+```
+
+### 测试
+
+```
+# docker run -it --rm hello-ymir
+
+hello ymir executor
+```
+
+## ymir 镜像制作
+
+### 基础镜像
+
+需要选择一个合适的基础镜像,避免从0开始制作ymir镜像。上面的例子中我们采用ubuntu18.04作为基础镜像构建新镜像。基于实践,我们推荐制作ymir镜像的基础镜像包含以下配置:
+
+- python 版本 >= 3.8
+
+- ymir镜像的cuda版本 <= 主机支持的cuda版本
+
+- 推荐基于 [nvidia/cuda](https://hub.docker.com/r/nvidia/cuda/tags) 与 [pytorch/pytorch](https://hub.docker.com/r/pytorch/pytorch/tags) 进行ymir镜像制作
+
+### 所有ymir镜像均需要实现的功能
+
+- 提供超参数模板文件: 必选,ymir平台需要解析镜像的 **/img-man** 目录生成超参数配置页面
+
+- 提供默认启动脚本:必选,推荐采用 **bash /usr/bin/start.sh** 作为镜像的默认启动脚本
+
+- 写进度: 必选, 将程序当前完成的百分比反馈到ymir平台,从而估计程序的剩余运行时间,见本节末尾的示例
+
+- 写结果文件:必选,将程序运行的结果反馈到ymir平台
+
+- 提供镜像说明文件:可选,ymir平台通过解析 **/img-man/manifest.yaml** 得到镜像的目标类型,即镜像支持目标检测、语义分割还是实例分割。默认目标类型为目标检测。
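+
+下面是一个写进度的最小示例(仅作示意,假设镜像内已安装 [ymir-executor-sdk](https://github.com/modelai/ymir-executor-sdk) 提供的 `ymir_exc` 包):
+
+```
+from ymir_exc import monitor
+
+# percent 取值范围为 0 ~ 1,表示任务当前完成的百分比
+monitor.write_monitor_logger(percent=0.5)
+```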
+
+### 训练镜像需要实现的额外功能
+
+- 基本功能:加载数据集与超参数进行训练,将模型权重,模型精度等结果保存到 **/out** 目录的指定文件。
+
+```
+# pip install "git+https://github.com/modelai/ymir-executor-sdk.git@ymir2.1.0"
+import yaml
+from ymir_exc import env
+
+training_result = {'mAP': 0.8}  # 示例:训练结果字典,实际内容由镜像生成
+env_config = env.get_current_env()
+with open(env_config.output.training_result_file, "w") as f:
+    yaml.safe_dump(data=training_result, stream=f)
+```
+
+- 写tensorboard日志:可选, ymir平台支持查看训练任务的tensorboard训练日志
+
+### 推理镜像需要实现的额外功能
+
+- 基本功能:加载数据集与模型权重进行推理,将推理结果保存到 **/out** 目录的指定文件。
+
+```
+import json
+
+result = {}  # 示例:推理结果,实际内容由镜像生成
+env_config = env.get_current_env()
+with open(env_config.output.infer_result_file, "w") as f:
+    f.write(json.dumps(result))
+```
+
+### 挖掘镜像需要实现的额外功能
+
+- 基本功能:加载数据集与模型权重进行挖掘,基于主动学习算法获得每张图片的重要程度分数,将分数保存到 **/out** 目录的指定文件。
+
+```
+# sorted_mining_result: [(图片id, 分数), ...],示例数据
+sorted_mining_result = [('asset_id_1', 0.9), ('asset_id_2', 0.1)]
+env_config = env.get_current_env()
+with open(env_config.output.mining_result_file, "w") as f:
+    for asset_id, score in sorted_mining_result:
+        f.write(f"{asset_id}\t{score}\n")
+```
diff --git a/docs/requirements.in b/docs/requirements.in
new file mode 100644
index
0000000..bec300c --- /dev/null +++ b/docs/requirements.in @@ -0,0 +1,3 @@ +mkdocs +mkdocstrings[python] +markdown-include diff --git a/docs/requirements.txt b/docs/requirements.txt new file mode 100644 index 0000000..323332c --- /dev/null +++ b/docs/requirements.txt @@ -0,0 +1,68 @@ +# +# This file is autogenerated by pip-compile with python 3.10 +# To update, run: +# +# pip-compile docs/requirements.in +# +click==8.1.3 + # via mkdocs +ghp-import==2.1.0 + # via mkdocs +griffe==0.22.0 + # via mkdocstrings-python +importlib-metadata==4.12.0 + # via mkdocs +jinja2==3.1.2 + # via + # mkdocs + # mkdocstrings +markdown==3.3.7 + # via + # markdown-include + # mkdocs + # mkdocs-autorefs + # mkdocstrings + # pymdown-extensions +markdown-include==0.6.0 + # via -r docs/requirements.in +markupsafe==2.1.1 + # via + # jinja2 + # mkdocstrings +mergedeep==1.3.4 + # via mkdocs +mkdocs==1.3.0 + # via + # -r docs/requirements.in + # mkdocs-autorefs + # mkdocstrings +mkdocs-autorefs==0.4.1 + # via mkdocstrings +mkdocstrings[python]==0.19.0 + # via + # -r docs/requirements.in + # mkdocstrings-python +mkdocstrings-python==0.7.1 + # via mkdocstrings +packaging==21.3 + # via mkdocs +pymdown-extensions==9.5 + # via mkdocstrings +pyparsing==3.0.9 + # via packaging +python-dateutil==2.8.2 + # via ghp-import +pyyaml==6.0 + # via + # mkdocs + # pyyaml-env-tag +pyyaml-env-tag==0.1 + # via mkdocs +six==1.16.0 + # via python-dateutil +watchdog==2.1.9 + # via mkdocs +zipp==3.8.0 + # via importlib-metadata +mkdocs-include-dir-to-nav==1.2.0 + # via mkdocs diff --git a/docs/sample_files/in_config.md b/docs/sample_files/in_config.md new file mode 100644 index 0000000..8d767f5 --- /dev/null +++ b/docs/sample_files/in_config.md @@ -0,0 +1,20 @@ +``` +args_options: --exist-ok +batch_size_per_gpu: 16 +class_names: +- dog +- cat +- person +epochs: 10 +export_format: ark:raw +gpu_count: 4 +gpu_id: '0,1,2,3' +img_size: 640 +model: yolov5s +num_workers_per_gpu: 8 +opset: 11 +save_period: 10 +shm_size: 32G +sync_bn: false +task_id: t000000100000208ac7a1664337925 +``` diff --git a/docs/sample_files/in_env.md b/docs/sample_files/in_env.md new file mode 100644 index 0000000..07e0a14 --- /dev/null +++ b/docs/sample_files/in_env.md @@ -0,0 +1,27 @@ +``` +input: + annotations_dir: /in/annotations # 标注文件存储目录 + assets_dir: /in/assets # 图像文件存储目录 + candidate_index_file: /in/candidate-index.tsv # 推理或挖掘任务中的数据集索引文件 + config_file: /in/config.yaml # 超参数文件 + models_dir: /in/models # 预训练模型文件存储目录 + root_dir: /in # 输入信息根目录 + training_index_file: /in/train-index.tsv # 训练任务中的训练数据集索引文件 + val_index_file: /in/val-index.tsv # 训练任务中的验证数据集索引文件 +output: + infer_result_file: /out/infer-result.json # 推理任务结果文件 + mining_result_file: /out/result.tsv # 挖掘任务结果文件 + models_dir: /out/models # 训练任务权重文件输出目录 + monitor_file: /out/monitor.txt # 进度记录文件 + root_dir: /out # 输出信息根目录 + tensorboard_dir: /out/tensorboard # 训练任务中tensorboard日志目录 + training_result_file: /out/models/result.yaml # 训练任务的结果文件 +run_infer: false # 是否执行推理任务 +run_mining: true # 是否执行挖掘任务 +run_training: false # 是否执行训练任务 +protocol_version: 1.0.0 # ymir平台镜像接口版本 +task_id: t00000020000029d077c1662111056 # 任务id +``` + +!!! 
注意
+    /in/env.yaml 中的所有路径均为绝对路径
diff --git a/docs/speedup_apt_pip_docker.md b/docs/speedup_apt_pip_docker.md
new file mode 100644
index 0000000..4eac2ae
--- /dev/null
+++ b/docs/speedup_apt_pip_docker.md
@@ -0,0 +1,36 @@
+# docker 加速 apt
+
+在 `dockerfile` 中添加如下命令,再进行 `apt` 安装
+
+```
+# Install linux package
+RUN sed -i 's#http://archive.ubuntu.com#https://mirrors.ustc.edu.cn#g' /etc/apt/sources.list \
+    && sed -i 's#http://security.ubuntu.com#https://mirrors.ustc.edu.cn#g' /etc/apt/sources.list \
+    && apt-get update
+```
+
+- [ubuntu/debian 加速apt](https://mirrors.tuna.tsinghua.edu.cn/help/ubuntu/)
+- [centos 加速yum](https://mirrors.tuna.tsinghua.edu.cn/help/centos/)
+
+# docker 加速 pip
+
+在 `dockerfile` 中添加如下命令,再进行 `pip` 安装
+
+```
+# install ymir-exc sdk
+RUN pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple
+```
+
+- [pip 加速](https://mirrors.tuna.tsinghua.edu.cn/help/pypi/)
+- [conda/anaconda 加速](https://mirrors.tuna.tsinghua.edu.cn/help/anaconda/)
+
+
+# docker pull/push 加速
+
+以下链接均未测试,欢迎反馈
+
+- [南京大学 mirror](https://nju-mirror-help.njuer.org/dockerhub.html)
+
+- [百度网易阿里 mirror](https://yeasy.gitbook.io/docker_practice/install/mirror)
+
+- [华为 mirror](https://bbs.huaweicloud.com/blogs/381362)
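+
+另一种常见做法是为 docker daemon 配置 registry-mirrors(仅作示意,下面的镜像源地址为占位符,请替换为实际可用的地址):
+
+```
+# 编辑 /etc/docker/daemon.json, 添加如下内容
+{
+    "registry-mirrors": ["https://docker.mirrors.example.com"]
+}
+```
+
+```
+# 重启 docker 使配置生效
+sudo systemctl restart docker
+```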
diff --git a/docs/ymir-docker-develop.drawio.png b/docs/ymir-docker-develop.drawio.png
new file mode 100644
index 0000000..706a95e
Binary files /dev/null and b/docs/ymir-docker-develop.drawio.png differ
diff --git a/docs/ymir-executor-version.md b/docs/ymir-executor-version.md
new file mode 100644
index 0000000..1135b33
--- /dev/null
+++ b/docs/ymir-executor-version.md
@@ -0,0 +1,21 @@
+# ymir2.0.0 (2022-09-30)
+
+- 支持分开输出模型权重,用户可以采用epoch10.pth进行推理,也可以选择epoch20.pth进行推理
+
+- 训练镜像需要指定数据集标注格式, ymir1.1.0默认标注格式为`ark:raw`, ymir2.0.0默认标注格式为`ark:voc`
+
+- 训练镜像可以获得系统的ymir接口版本,方便镜像兼容
+
+- 预训练模型文件在ymir1.1.0时放在/in/models目录下,ymir2.0.0时放在 /in/models/目录下
+
+## 辅助库
+
+- [ymir-executor-sdk](https://github.com/modelai/ymir-executor-sdk) 采用ymir2.0.0分支
+
+- [ymir-executor-verifier](https://github.com/modelai/ymir-executor-verifier) 镜像检查工具
+
+# ymir1.1.0
+
+- [custom ymir-executor](https://github.com/IndustryEssentials/ymir/blob/dev/dev_docs/ymir-dataset-zh-CN.md)
+
+- [ymir-executor-sdk](https://github.com/modelai/ymir-executor-sdk) 采用ymir1.0.0分支
diff --git a/live-code-executor/img-man/training-template.yaml b/live-code-executor/img-man/training-template.yaml
index 865b40b..df87016 100644
--- a/live-code-executor/img-man/training-template.yaml
+++ b/live-code-executor/img-man/training-template.yaml
@@ -6,3 +6,5 @@ gpu_id: '0'
 task_id: 'default-training-task'
 pretrained_model_params: []
 class_names: []
+export_format: 'ark:raw'
+shm_size: '128G'
diff --git a/live-code-executor/mxnet.dockerfile b/live-code-executor/mxnet.dockerfile
index e04bd4b..ed08fff 100644
--- a/live-code-executor/mxnet.dockerfile
+++ b/live-code-executor/mxnet.dockerfile
@@ -15,7 +15,8 @@ ENV PATH /opt/conda/bin:$PATH
 # install linux package, needs to fix GPG error first.
 RUN apt-key adv --keyserver keyserver.ubuntu.com --recv-keys A4B469963BF863CC && \
     apt-get update && \
-    apt-get install -y git gcc wget curl zip libglib2.0-0 libgl1-mesa-glx && \
+    apt-get install -y git gcc wget curl zip libglib2.0-0 libgl1-mesa-glx \
+    libsm6 libxext6 libxrender-dev build-essential && \
     apt-get clean && \
     rm -rf /var/lib/apt/lists/* && \
     wget "${MINICONDA_URL}" -O miniconda.sh -q && \
diff --git a/live-code-executor/torch.dockerfile b/live-code-executor/torch.dockerfile
index a71476f..4fd9a90 100644
--- a/live-code-executor/torch.dockerfile
+++ b/live-code-executor/torch.dockerfile
@@ -15,7 +15,8 @@ ENV LANG=C.UTF-8
 # install linux package
 RUN apt-get update && apt-get install -y git curl wget zip gcc \
-    libglib2.0-0 libgl1-mesa-glx \
+    libglib2.0-0 libgl1-mesa-glx libsm6 libxext6 libxrender-dev \
+    build-essential \
     && apt-get clean \
     && rm -rf /var/lib/apt/lists/*
diff --git a/live-code-executor/ymir_start.py b/live-code-executor/ymir_start.py
index d2c5415..ee81336 100644
--- a/live-code-executor/ymir_start.py
+++ b/live-code-executor/ymir_start.py
@@ -50,7 +50,13 @@ def main():
         logger.info('no python package needs to install')
 
     # step 3. run /app/start.py
-    cmd = 'python3 start.py'
+    if osp.exists('/app/start.py'):
+        cmd = 'python3 start.py'
+    elif osp.exists('/app/ymir/start.py'):
+        cmd = 'python3 ymir/start.py'
+    else:
+        raise Exception('cannot find start.py')
+
     logger.info(f'run task: {cmd}')
     subprocess.run(cmd.split(), check=True, cwd='/app')
diff --git a/mkdocs.yml b/mkdocs.yml
new file mode 100644
index 0000000..1592a16
--- /dev/null
+++ b/mkdocs.yml
@@ -0,0 +1,51 @@
+site_name: Ymir-Executor Documentation
+theme:
+  name: readthedocs
+  highlightjs: true
+plugins:
+  - search
+  - mkdocstrings:
+      handlers:
+        # See: https://mkdocstrings.github.io/python/usage/
+        python:
+          options:
+            docstring_style: numpy
+      # watch:
+      #   - seg-semantic-demo-tmi.app.start
+  - include_dir_to_nav
+markdown_extensions:
+  - markdown_include.include:
+      base_path: .
+ - admonition + - toc: + permalink: "#" +# - sane_lists +nav: + - Home: index.md + - 基本概念: + - overview/introduction.md + - overview/framework.md + - overview/dataset-format.md + - overview/hyper-parameter.md + - overview/ymir-executor.md + - 目标检测: object_detection + - 图像分割: + - image_segmentation/simple_semantic_seg_training.md + - image_segmentation/simple_semantic_seg_infer.md + - image_segmentation/simple_semantic_seg_mining.md + - image_segmentation/test_semantic_seg.md + - image_segmentation/simple_instance_seg_tmi.md + - 快速定制: fast_custom + - 镜像社区: + - image_community/image_community.md + - image_community/seg-mmseg-tmi.md + - image_community/det-yolov5-tmi.md + - image_community/det-mmdet-tmi.md + - image_community/det-nanodet-tmi.md + - image_community/det-detectron2-tmi.md + - image_community/det-yolov7-tmi.md + - image_community/det-vidt-tmi.md + - image_community/det-yolov5-automl-tmi.md + - image_community/det-yolov4-tmi.md + - 算法仓库: algorithms + - 设计文档: design_doc diff --git a/pyproject.toml b/pyproject.toml new file mode 100644 index 0000000..14a2dda --- /dev/null +++ b/pyproject.toml @@ -0,0 +1,8 @@ +[build-system] +requires = ["flit_core >=3.2,<4"] +build-backend = "flit_core.buildapi" + +[project] +name = "lumache" +authors = [{name = "Graziella", email = "graziella@lumache"}] +dynamic = ["version", "description"] diff --git a/seg-instance-demo-tmi/Dockerfile b/seg-instance-demo-tmi/Dockerfile new file mode 100644 index 0000000..7481013 --- /dev/null +++ b/seg-instance-demo-tmi/Dockerfile @@ -0,0 +1,42 @@ +# a docker file for an sample training / mining / infer executor + +# FROM ubuntu:20.04 +FROM python:3.8.16 + +ENV LANG=C.UTF-8 + +# Change mirror +RUN sed -i 's#http://archive.ubuntu.com#http://mirrors.ustc.edu.cn#g' /etc/apt/sources.list \ + && sed -i 's#http://security.ubuntu.com#http://mirrors.ustc.edu.cn#g' /etc/apt/sources.list + +# Set timezone +RUN ln -sf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime \ + && echo 'Asia/Shanghai' >/etc/timezone + +# Install linux package +RUN apt-get update && apt-get install -y gnupg2 git libglib2.0-0 \ + libgl1-mesa-glx libsm6 libxext6 libxrender-dev \ + build-essential ninja-build \ + && apt-get clean \ + && rm -rf /var/lib/apt/lists/* + +COPY requirements.txt ./ +RUN pip3 install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple + +WORKDIR /app +# copy user code to WORKDIR +COPY ./app/*.py /app/ + +# copy user config template and manifest.yaml to /img-man +RUN mkdir -p /img-man +COPY img-man/*.yaml /img-man/ + +# view https://github.com/protocolbuffers/protobuf/issues/10051 for detail +ENV PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python + +# entry point for your app +# the whole docker image will be started with `nvidia-docker run ` +# and this command will run automatically + +RUN echo "python /app/start.py" > /usr/bin/start.sh +CMD bash /usr/bin/start.sh diff --git a/seg-instance-demo-tmi/README.MD b/seg-instance-demo-tmi/README.MD new file mode 100644 index 0000000..cd30a72 --- /dev/null +++ b/seg-instance-demo-tmi/README.MD @@ -0,0 +1,3 @@ +# ymir 自定义实例分割镜像 + + diff --git a/seg-instance-demo-tmi/app/pycococreatortools.py b/seg-instance-demo-tmi/app/pycococreatortools.py new file mode 100644 index 0000000..edf777b --- /dev/null +++ b/seg-instance-demo-tmi/app/pycococreatortools.py @@ -0,0 +1,143 @@ +#!/usr/bin/env python3 +""" +from https://github.com/waspinator/pycococreator/blob/0.2.1/pycococreatortools/pycococreatortools.py +""" +import datetime +from itertools import groupby + +import numpy as np +from 
PIL import Image +from pycocotools import mask + + +def resize_binary_mask(array, new_size): + image = Image.fromarray(array.astype(np.uint8) * 255) + image = image.resize(new_size) + return np.asarray(image).astype(np.bool_) + + +def close_contour(contour): + if not np.array_equal(contour[0], contour[-1]): + contour = np.vstack((contour, contour[0])) + return contour + + +def binary_mask_to_rle(binary_mask, compress=True): + """ + if compress: + return {'counts': b'', 'size': list(binary_mask.shape)} + else: + return {'counts': [0, 56541, 7, 338, ...], 'size': list(binary_mask.shape)} + """ + if compress: + rle = mask.encode(np.asfortranarray(binary_mask.astype(np.uint8))) + rle['counts'] = rle['counts'].decode('utf-8') + return rle + + rle = {'counts': [], 'size': list(binary_mask.shape)} + counts = rle.get('counts') + for i, (value, elements) in enumerate(groupby(binary_mask.ravel(order='F'))): + if i == 0 and value == 1: + counts.append(0) + counts.append(len(list(elements))) + + return rle + + +def binary_mask_to_polygon(binary_mask, tolerance=0): + """Converts a binary mask to COCO polygon representation + + Args: + binary_mask: a 2D binary numpy array where '1's represent the object + tolerance: Maximum distance from original points of polygon to approximated + polygonal chain. If tolerance is 0, the original coordinate array is returned. + + """ + from skimage import measure + + polygons = [] + # pad mask to close contours of shapes which start and end at an edge + padded_binary_mask = np.pad(binary_mask, pad_width=1, mode='constant', constant_values=0) + contours = measure.find_contours(padded_binary_mask, 0.5) + contours = np.subtract(contours, 1) + for contour in contours: + contour = close_contour(contour) + contour = measure.approximate_polygon(contour, tolerance) + if len(contour) < 3: + continue + contour = np.flip(contour, axis=1) + segmentation = contour.ravel().tolist() + # after padding and subtracting 1 we may get -0.5 points in our segmentation + segmentation = [0 if i < 0 else i for i in segmentation] + polygons.append(segmentation) + + return polygons + + +def create_image_info(image_id, + file_name, + image_size, + date_captured=datetime.datetime.utcnow().isoformat(' '), + license_id=1, + coco_url="", + flickr_url=""): + + image_info = { + "id": image_id, + "file_name": file_name, + "width": image_size[0], + "height": image_size[1], + "date_captured": date_captured, + "license": license_id, + "coco_url": coco_url, + "flickr_url": flickr_url + } + + return image_info + + +def create_annotation_info(annotation_id, + image_id, + category_info, + binary_mask, + image_size=None, + tolerance=2, + bounding_box=None): + + if image_size is not None: + binary_mask = resize_binary_mask(binary_mask, image_size) + + binary_mask_encoded = mask.encode(np.asfortranarray(binary_mask.astype(np.uint8))) + + area = mask.area(binary_mask_encoded) + if area < 1: + return None + + if bounding_box is None: + bounding_box = mask.toBbox(binary_mask_encoded) + + if category_info["is_crowd"]: + is_crowd = 1 + # segmentation = binary_mask_to_rle(binary_mask) + segmentation = binary_mask_encoded + # avoid TypeError: Object of type bytes is not JSON serializable + segmentation['counts'] = segmentation['counts'].decode('utf-8') + else: + is_crowd = 0 + segmentation = binary_mask_to_polygon(binary_mask, tolerance) + if not segmentation: + return None + + annotation_info = { + "id": annotation_id, + "image_id": image_id, + "category_id": category_info["id"], + "iscrowd": is_crowd, + "area": 
area.tolist(),
+        "bbox": bounding_box.tolist(),
+        "segmentation": segmentation,
+        "width": binary_mask.shape[1],
+        "height": binary_mask.shape[0],
+    }
+
+    return annotation_info
diff --git a/seg-instance-demo-tmi/app/result_to_coco.py b/seg-instance-demo-tmi/app/result_to_coco.py
new file mode 100644
index 0000000..69a48e9
--- /dev/null
+++ b/seg-instance-demo-tmi/app/result_to_coco.py
@@ -0,0 +1,108 @@
+#!/usr/bin/env python3
+
+import datetime
+import os.path as osp
+import random
+from typing import Dict, List
+
+import imagesize
+import numpy as np
+from easydict import EasyDict as edict
+from tqdm import tqdm
+
+import pycococreatortools
+
+INFO = {
+    "description": "Example Dataset",
+    "url": "https://github.com/waspinator/pycococreator",
+    "version": "0.1.0",
+    "year": 2022,
+    "contributor": "ymir",
+    "date_created": datetime.datetime.utcnow().isoformat(' ')
+}
+
+LICENSES = [{
+    "id": 1,
+    "name": "Attribution-NonCommercial-ShareAlike License",
+    "url": "http://creativecommons.org/licenses/by-nc-sa/2.0/"
+}]
+
+CATEGORIES = [
+    {
+        'id': 1,
+        'name': 'square',
+        'supercategory': 'shape',
+    },
+    {
+        'id': 2,
+        'name': 'circle',
+        'supercategory': 'shape',
+    },
+    {
+        'id': 3,
+        'name': 'triangle',
+        'supercategory': 'shape',
+    },
+]
+
+
+def convert(ymir_cfg: edict, results: List[Dict], with_blank_area: bool):
+    """
+    convert ymir infer result to coco instance segmentation format
+    the mask is encoded in compressed rle
+    the is_crowd is True
+    """
+    class_names = ymir_cfg.param.class_names
+
+    categories = []
+    # categories should start from 0
+    for idx, name in enumerate(class_names):
+        categories.append(dict(id=idx, name=name, supercategory='none'))
+
+    coco_output = {"info": INFO, "licenses": LICENSES, "categories": categories, "images": [], "annotations": []}
+
+    image_id = 1
+    annotation_id = 1
+
+    for idx, d in enumerate(tqdm(results, desc='convert result to coco')):
+        image_f = d['image']
+        result = d['result']
+
+        width, height = imagesize.get(image_f)
+
+        image_info = pycococreatortools.create_image_info(image_id=image_id,
+                                                          file_name=osp.basename(image_f),
+                                                          image_size=(width, height))
+
+        coco_output["images"].append(image_info)  # type: ignore
+
+        # category_id === class_id start from 0
+        unique_ids = np.unique(result)
+        for np_class_id in unique_ids:
+            if with_blank_area:
+                class_id = int(np_class_id) - 1
+            else:
+                class_id = int(np_class_id)
+
+            # remove background class in infer-result
+            if with_blank_area and class_id < 0:
+                continue
+
+            assert class_id < len(class_names), f'class_id {class_id} must < class_num {len(class_names)}'
+            category_info = {'id': class_id, 'is_crowd': True}
+            binary_mask = result == np_class_id
+            annotation_info = pycococreatortools.create_annotation_info(annotation_id,
+                                                                        image_id,
+                                                                        category_info,
+                                                                        binary_mask,
+                                                                        tolerance=2)
+
+            # create_annotation_info 可能返回 None, 需先判空再写入置信度
+            if annotation_info is not None:
+                # for instance segmentation
+                annotation_info['confidence'] = min(1.0, 0.1 + random.random())
+                coco_output["annotations"].append(annotation_info)  # type: ignore
+                annotation_id = annotation_id + 1
+
+        image_id += 1
+
+    return coco_output
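+
+
+if __name__ == '__main__':
+    # 仅作示意(假设的最小用例): 演示 pycocotools 的 compressed RLE 编码,
+    # convert() 输出的 segmentation 即为这种格式
+    from pycocotools import mask as mask_utils
+
+    demo_mask = np.zeros((4, 4), dtype=np.uint8)
+    demo_mask[1:3, 1:3] = 1  # 一个 2x2 的前景块
+    rle = mask_utils.encode(np.asfortranarray(demo_mask))
+    print(mask_utils.area(rle))  # 前景像素数: 4.0
+    rle['counts'] = rle['counts'].decode('utf-8')  # JSON 序列化前需转为 str
+    print(rle)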
diff --git a/seg-instance-demo-tmi/app/start.py b/seg-instance-demo-tmi/app/start.py
new file mode 100644
index 0000000..e528ff4
--- /dev/null
+++ b/seg-instance-demo-tmi/app/start.py
@@ -0,0 +1,239 @@
+import logging
+import os
+import random
+import sys
+import time
+from typing import List
+
+import cv2
+import numpy as np
+from easydict import EasyDict as edict
+from tensorboardX import SummaryWriter
+from ymir_exc import monitor
+from ymir_exc import result_writer as rw
+from ymir_exc.util import get_merged_config
+
+from result_to_coco import convert
+
+
+def start() -> int:
+    cfg = get_merged_config()
+
+    if cfg.ymir.run_training:
+        _run_training(cfg)
+    if cfg.ymir.run_mining:
+        _run_mining(cfg)
+    if cfg.ymir.run_infer:
+        _run_infer(cfg)
+
+    return 0
+
+
+def _run_training(cfg: edict) -> None:
+    """sample function of training
+
+    which shows:
+    - how to get config file
+    - how to read training and validation datasets
+    - how to write logs
+    - how to write training result
+    """
+    # use `cfg.param` to get merged config (hyper-parameters) for training
+    gpu_id: str = cfg.param.get('gpu_id')
+    class_names: List[str] = cfg.param.get('class_names')
+    expected_maskap: float = cfg.param.get('expected_maskap', 0.6)
+    idle_seconds: float = cfg.param.get('idle_seconds', 60)
+    trigger_crash: bool = cfg.param.get('trigger_crash', False)
+    # use `logging` or `print` to write log to console
+    # notice that logging.basicConfig is invoked at executor.env
+    logging.info(f'gpu device: {gpu_id}')
+    logging.info(f'dataset class names: {class_names}')
+    logging.info(f"training config: {cfg.param}")
+
+    # count for image and annotation file
+    with open(cfg.ymir.input.training_index_file, 'r') as fp:
+        lines = fp.readlines()
+
+    valid_image_count = 0
+    valid_ann_count = 0
+
+    N = len(lines)
+    monitor_gap = max(1, N // 100)
+    for idx, line in enumerate(lines):
+        asset_path, annotation_path = line.strip().split()
+        if os.path.isfile(asset_path):
+            valid_image_count += 1
+
+        if os.path.isfile(annotation_path):
+            valid_ann_count += 1
+
+        # use `monitor.write_monitor_logger` to write task progress percent to monitor.txt
+        if idx % monitor_gap == 0:
+            monitor.write_monitor_logger(percent=0.2 * idx / N)
+
+    logging.info(f'total image-ann pair: {N}')
+    logging.info(f'valid images: {valid_image_count}')
+    logging.info(f'valid annotations: {valid_ann_count}')
+
+    # use `monitor.write_monitor_logger` to write task progress percent to monitor.txt
+    monitor.write_monitor_logger(percent=0.2)
+
+    # suppose we have a long time training, and have saved the final model
+    # model output dir: os.path.join(cfg.ymir.output.models_dir, your_stage_name)
+    stage_dir = os.path.join(cfg.ymir.output.models_dir, 'epoch10')
+    os.makedirs(stage_dir, exist_ok=True)
+    with open(os.path.join(stage_dir, 'epoch10.pt'), 'w') as f:
+        f.write('fake model weight')
+    with open(os.path.join(stage_dir, 'config.py'), 'w') as f:
+        f.write('fake model config file')
+    # use `rw.write_model_stage` to save training result
+    rw.write_model_stage(stage_name='epoch10',
+                         files=['epoch10.pt', 'config.py'],
+                         evaluation_result=dict(maskAP=random.random() / 2))
+
+    _dummy_work(idle_seconds=idle_seconds, trigger_crash=trigger_crash)
+
+    write_tensorboard_log(cfg.ymir.output.tensorboard_dir)
+
+    stage_dir = os.path.join(cfg.ymir.output.models_dir, 'epoch20')
+    os.makedirs(stage_dir, exist_ok=True)
+    with open(os.path.join(stage_dir, 'epoch20.pt'), 'w') as f:
+        f.write('fake model weight')
+    with open(os.path.join(stage_dir, 'config.py'), 'w') as f:
+        f.write('fake model config file')
+    rw.write_model_stage(stage_name='epoch20',
+                         files=['epoch20.pt', 'config.py'],
+                         evaluation_result=dict(maskAP=expected_maskap))
+
+    # if task done, write 100% percent log
+    logging.info('training done')
+    monitor.write_monitor_logger(percent=1.0)
+
+
+def _run_mining(cfg: edict) -> None:
+    # use `cfg.param` to get config for mining
+    # pretrained models in `cfg.ymir.input.models_dir`
+    gpu_id: str = 
cfg.param.get('gpu_id') + class_names: List[str] = cfg.param.get('class_names') + idle_seconds: float = cfg.param.get('idle_seconds', 60) + trigger_crash: bool = cfg.param.get('trigger_crash', False) + # use `logging` or `print` to write log to console + logging.info(f"mining config: {cfg.param}") + logging.info(f'gpu device: {gpu_id}') + logging.info(f'dataset class names: {class_names}') + + # use `cfg.input.candidate_index_file` to read candidate dataset items + # note that annotations path will be empty str if there's no annotations in that dataset + # count for image files + with open(cfg.ymir.input.candidate_index_file, 'r') as fp: + lines = fp.readlines() + + valid_images = [] + valid_image_count = 0 + for line in lines: + if os.path.isfile(line.strip()): + valid_image_count += 1 + valid_images.append(line.strip()) + + # use `monitor.write_monitor_logger` to write task process to monitor.txt + logging.info(f"assets count: {len(lines)}, valid: {valid_image_count}") + monitor.write_monitor_logger(percent=0.2) + + _dummy_work(idle_seconds=idle_seconds, trigger_crash=trigger_crash) + + # write mining result + # here we give a fake score to each assets + total_length = len(valid_images) + mining_result = [] + for index, asset_path in enumerate(valid_images): + mining_result.append((asset_path, index / total_length)) + time.sleep(0.1) + monitor.write_monitor_logger(percent=0.2 + 0.8 * index / valid_image_count) + + rw.write_mining_result(mining_result=mining_result) + + # if task done, write 100% percent log + logging.info('mining done') + monitor.write_monitor_logger(percent=1.0) + + +def _run_infer(cfg: edict) -> None: + # use `cfg.param` to get config file for training + # models are transfered in `cfg.ymir.input.models_dir` model_params_path + class_names = cfg.param.get('class_names') + idle_seconds: float = cfg.param.get('idle_seconds', 60) + trigger_crash: bool = cfg.param.get('trigger_crash', False) + seed: int = cfg.param.get('seed', 15) + # use `logging` or `print` to write log to console + logging.info(f"infer config: {cfg.param}") + + # use `cfg.ymir.input.candidate_index_file` to read candidate dataset items + # note that annotations path will be empty str if there's no annotations in that dataset + with open(cfg.ymir.input.candidate_index_file, 'r') as fp: + lines = fp.readlines() + + valid_images = [] + invalid_images = [] + valid_image_count = 0 + for line in lines: + if os.path.isfile(line.strip()): + valid_image_count += 1 + valid_images.append(line.strip()) + else: + invalid_images.append(line.strip()) + + # use `monitor.write_monitor_logger` to write log to console and write task process percent to monitor.txt + logging.info(f"assets count: {len(lines)}, valid: {valid_image_count}") + monitor.write_monitor_logger(percent=0.2) + + _dummy_work(idle_seconds=idle_seconds, trigger_crash=trigger_crash) + + # write infer result + random.seed(seed) + results = [] + + fake_mask_num = min(len(class_names), 10) + for iter, img_file in enumerate(valid_images): + img = cv2.imread(img_file, cv2.IMREAD_GRAYSCALE) + mask = np.zeros(shape=img.shape[0:2], dtype=np.uint8) + for idx in range(fake_mask_num): + percent = 100 * idx / fake_mask_num + value = np.percentile(img, percent) + mask[img > value] = idx + 1 + + results.append(dict(image=img_file, result=mask)) + + # real-time monitor + monitor.write_monitor_logger(percent=0.2 + 0.8 * iter / valid_image_count) + + coco_results = convert(cfg, results, True) + rw.write_infer_result(infer_result=coco_results, algorithm='segmentation') + + 
# if task done, write 100% percent log + logging.info('infer done') + monitor.write_monitor_logger(percent=1.0) + + +def _dummy_work(idle_seconds: float, trigger_crash: bool = False) -> None: + if idle_seconds > 0: + time.sleep(idle_seconds) + if trigger_crash: + raise RuntimeError('app crashed') + + +def write_tensorboard_log(tensorboard_dir: str) -> None: + tb_log = SummaryWriter(tensorboard_dir) + + total_epoch = 30 + for e in range(total_epoch): + tb_log.add_scalar("fake_loss", 10 / (1 + e), e) + time.sleep(1) + monitor.write_monitor_logger(percent=e / total_epoch) + + +if __name__ == '__main__': + logging.basicConfig(stream=sys.stdout, + format='%(levelname)-8s: [%(asctime)s] %(message)s', + datefmt='%Y%m%d-%H:%M:%S', + level=logging.INFO) + sys.exit(start()) diff --git a/seg-instance-demo-tmi/fast.Dockerfile b/seg-instance-demo-tmi/fast.Dockerfile new file mode 100644 index 0000000..6033dda --- /dev/null +++ b/seg-instance-demo-tmi/fast.Dockerfile @@ -0,0 +1,21 @@ +FROM youdaoyzbx/ymir-executor:ymir2.0.2-seg-semantic-demo-base + +WORKDIR /app +# copy user code to WORKDIR +COPY ./app/*.py /app/ + +# copy user config template and manifest.yaml to /img-man +RUN mkdir -p /img-man +COPY img-man/*.yaml /img-man/ + +COPY ./requirements.txt /app/ +RUN pip3 install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple +# view https://github.com/protocolbuffers/protobuf/issues/10051 for detail +ENV PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python + +# entry point for your app +# the whole docker image will be started with `nvidia-docker run ` +# and this command will run automatically + +RUN echo "python /app/start.py" > /usr/bin/start.sh +CMD bash /usr/bin/start.sh diff --git a/seg-instance-demo-tmi/img-man/infer-template.yaml b/seg-instance-demo-tmi/img-man/infer-template.yaml new file mode 100644 index 0000000..67295db --- /dev/null +++ b/seg-instance-demo-tmi/img-man/infer-template.yaml @@ -0,0 +1,12 @@ +# infer template for your executor app +# after build image, it should at /img-man/infer-template.yaml +# key: gpu_id, task_id, model_params_path, class_names, gpu_count should be preserved + +# gpu_id: '0' +# gpu_count: 1 +# task_id: 'default-infer-task' +# model_params_path: [] +# class_names: [] + +# just for test, remove this key in your own docker image +idle_seconds: 3 # idle seconds for each task diff --git a/seg-instance-demo-tmi/img-man/manifest.yaml b/seg-instance-demo-tmi/img-man/manifest.yaml new file mode 100644 index 0000000..38c8521 --- /dev/null +++ b/seg-instance-demo-tmi/img-man/manifest.yaml @@ -0,0 +1,3 @@ +# object_type: 2 for object detection, 3 for semantic segmentation +# 4 for instance segmentation +"object_type": 4 diff --git a/seg-instance-demo-tmi/img-man/mining-template.yaml b/seg-instance-demo-tmi/img-man/mining-template.yaml new file mode 100644 index 0000000..3eae941 --- /dev/null +++ b/seg-instance-demo-tmi/img-man/mining-template.yaml @@ -0,0 +1,12 @@ +# mining template for your executor app +# after build image, it should at /img-man/mining-template.yaml +# key: gpu_id, task_id, model_params_path, class_names, gpu_count should be preserved + +# gpu_id: '0' +# gpu_count: 1 +# task_id: 'default-mining-task' +# model_params_path: [] +# class_names: [] + +# just for test, remove this key in your own docker image +idle_seconds: 3 # idle seconds for each task diff --git a/seg-instance-demo-tmi/img-man/training-template.yaml b/seg-instance-demo-tmi/img-man/training-template.yaml new file mode 100644 index 0000000..c6d423a --- /dev/null +++ 
b/seg-instance-demo-tmi/img-man/training-template.yaml
@@ -0,0 +1,18 @@
+# training template for your executor app
+# after build image, it should be at /img-man/training-template.yaml
+# key: gpu_id, task_id, pretrained_model_params, class_names, gpu_count should be preserved
+
+# gpu_id: '0'
+# gpu_count: 1
+# task_id: 'default-training-task'
+# pretrained_model_params: []
+# class_names: []
+
+# format of annotations and images that ymir should provide to this docker container
+# annotation format: must be seg-coco
+# image format: must be raw
export_format: 'seg-coco:raw'
+
+# just for test, remove this key in your own docker image
+expected_maskap: 0.983 # expected maskAP for training task
+idle_seconds: 3 # idle seconds for each task
diff --git a/seg-instance-demo-tmi/requirements.txt b/seg-instance-demo-tmi/requirements.txt
new file mode 100644
index 0000000..708647b
--- /dev/null
+++ b/seg-instance-demo-tmi/requirements.txt
@@ -0,0 +1,11 @@
+pycocotools
+pydantic>=1.8.2
+pyyaml>=5.4.1
+tensorboardX>=2.4
+numpy
+opencv-python>=4.0
+pillow
+imagesize
+tqdm
+easydict
+ymir_exc@git+https://github.com/modelai/ymir-executor-sdk.git@ymir2.1.0
diff --git a/seg-semantic-demo-tmi/Dockerfile b/seg-semantic-demo-tmi/Dockerfile
new file mode 100644
index 0000000..b1ee2f0
--- /dev/null
+++ b/seg-semantic-demo-tmi/Dockerfile
@@ -0,0 +1,42 @@
+# a docker file for a sample training / mining / infer executor
+
+# FROM ubuntu:20.04
+FROM python:3.8.16
+
+ENV LANG=C.UTF-8
+
+# Change mirror
+RUN sed -i 's#http://archive.ubuntu.com#http://mirrors.ustc.edu.cn#g' /etc/apt/sources.list \
+    && sed -i 's#http://security.ubuntu.com#http://mirrors.ustc.edu.cn#g' /etc/apt/sources.list
+
+# Set timezone
+RUN ln -sf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime \
+    && echo 'Asia/Shanghai' >/etc/timezone
+
+# Install linux package
+RUN apt-get update && apt-get install -y gnupg2 git libglib2.0-0 \
+    libgl1-mesa-glx libsm6 libxext6 libxrender-dev \
+    build-essential ninja-build \
+    && apt-get clean \
+    && rm -rf /var/lib/apt/lists/*
+
+COPY requirements.txt /app/
+RUN pip3 install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple
+
+WORKDIR /app
+# copy user code to WORKDIR
+COPY ./app/*.py /app/
+
+# copy user config template and manifest.yaml to /img-man
+RUN mkdir -p /img-man
+COPY img-man/*.yaml /img-man/
+
+# view https://github.com/protocolbuffers/protobuf/issues/10051 for detail
+ENV PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python
+
+# entry point for your app
+# the whole docker image will be started with `nvidia-docker run `
+# and this command will run automatically
+
+RUN echo "python /app/start.py" > /usr/bin/start.sh
+CMD bash /usr/bin/start.sh
diff --git a/seg-semantic-demo-tmi/README.MD b/seg-semantic-demo-tmi/README.MD
new file mode 100644
index 0000000..cf47032
--- /dev/null
+++ b/seg-semantic-demo-tmi/README.MD
@@ -0,0 +1,3 @@
+# ymir 自定义语义分割镜像
+
+
diff --git a/seg-semantic-demo-tmi/app/pycococreatortools.py b/seg-semantic-demo-tmi/app/pycococreatortools.py
new file mode 100644
index 0000000..edf777b
--- /dev/null
+++ b/seg-semantic-demo-tmi/app/pycococreatortools.py
@@ -0,0 +1,143 @@
+#!/usr/bin/env python3
+"""
+from https://github.com/waspinator/pycococreator/blob/0.2.1/pycococreatortools/pycococreatortools.py
+"""
+import datetime
+from itertools import groupby
+
+import numpy as np
+from PIL import Image
+from pycocotools import mask
+
+
+def resize_binary_mask(array, new_size):
+    image = Image.fromarray(array.astype(np.uint8) * 255)
+    image = image.resize(new_size)
+    return 
np.asarray(image).astype(np.bool_) + + +def close_contour(contour): + if not np.array_equal(contour[0], contour[-1]): + contour = np.vstack((contour, contour[0])) + return contour + + +def binary_mask_to_rle(binary_mask, compress=True): + """ + if compress: + return {'counts': b'', 'size': list(binary_mask.shape)} + else: + return {'counts': [0, 56541, 7, 338, ...], 'size': list(binary_mask.shape)} + """ + if compress: + rle = mask.encode(np.asfortranarray(binary_mask.astype(np.uint8))) + rle['counts'] = rle['counts'].decode('utf-8') + return rle + + rle = {'counts': [], 'size': list(binary_mask.shape)} + counts = rle.get('counts') + for i, (value, elements) in enumerate(groupby(binary_mask.ravel(order='F'))): + if i == 0 and value == 1: + counts.append(0) + counts.append(len(list(elements))) + + return rle + + +def binary_mask_to_polygon(binary_mask, tolerance=0): + """Converts a binary mask to COCO polygon representation + + Args: + binary_mask: a 2D binary numpy array where '1's represent the object + tolerance: Maximum distance from original points of polygon to approximated + polygonal chain. If tolerance is 0, the original coordinate array is returned. + + """ + from skimage import measure + + polygons = [] + # pad mask to close contours of shapes which start and end at an edge + padded_binary_mask = np.pad(binary_mask, pad_width=1, mode='constant', constant_values=0) + contours = measure.find_contours(padded_binary_mask, 0.5) + contours = np.subtract(contours, 1) + for contour in contours: + contour = close_contour(contour) + contour = measure.approximate_polygon(contour, tolerance) + if len(contour) < 3: + continue + contour = np.flip(contour, axis=1) + segmentation = contour.ravel().tolist() + # after padding and subtracting 1 we may get -0.5 points in our segmentation + segmentation = [0 if i < 0 else i for i in segmentation] + polygons.append(segmentation) + + return polygons + + +def create_image_info(image_id, + file_name, + image_size, + date_captured=datetime.datetime.utcnow().isoformat(' '), + license_id=1, + coco_url="", + flickr_url=""): + + image_info = { + "id": image_id, + "file_name": file_name, + "width": image_size[0], + "height": image_size[1], + "date_captured": date_captured, + "license": license_id, + "coco_url": coco_url, + "flickr_url": flickr_url + } + + return image_info + + +def create_annotation_info(annotation_id, + image_id, + category_info, + binary_mask, + image_size=None, + tolerance=2, + bounding_box=None): + + if image_size is not None: + binary_mask = resize_binary_mask(binary_mask, image_size) + + binary_mask_encoded = mask.encode(np.asfortranarray(binary_mask.astype(np.uint8))) + + area = mask.area(binary_mask_encoded) + if area < 1: + return None + + if bounding_box is None: + bounding_box = mask.toBbox(binary_mask_encoded) + + if category_info["is_crowd"]: + is_crowd = 1 + # segmentation = binary_mask_to_rle(binary_mask) + segmentation = binary_mask_encoded + # avoid TypeError: Object of type bytes is not JSON serializable + segmentation['counts'] = segmentation['counts'].decode('utf-8') + else: + is_crowd = 0 + segmentation = binary_mask_to_polygon(binary_mask, tolerance) + if not segmentation: + return None + + annotation_info = { + "id": annotation_id, + "image_id": image_id, + "category_id": category_info["id"], + "iscrowd": is_crowd, + "area": area.tolist(), + "bbox": bounding_box.tolist(), + "segmentation": segmentation, + "width": binary_mask.shape[1], + "height": binary_mask.shape[0], + } + + return annotation_info diff --git 
a/seg-semantic-demo-tmi/app/result_to_coco.py b/seg-semantic-demo-tmi/app/result_to_coco.py new file mode 100644 index 0000000..5346320 --- /dev/null +++ b/seg-semantic-demo-tmi/app/result_to_coco.py @@ -0,0 +1,105 @@ +#!/usr/bin/env python3 + +import datetime +import os.path as osp +from typing import Dict, List + +import imagesize +import numpy as np +from easydict import EasyDict as edict +from tqdm import tqdm + +import pycococreatortools + +INFO = { + "description": "Example Dataset", + "url": "https://github.com/waspinator/pycococreator", + "version": "0.1.0", + "year": 2022, + "contributor": "ymir", + "date_created": datetime.datetime.utcnow().isoformat(' ') +} + +LICENSES = [{ + "id": 1, + "name": "Attribution-NonCommercial-ShareAlike License", + "url": "http://creativecommons.org/licenses/by-nc-sa/2.0/" +}] + +CATEGORIES = [ + { + 'id': 1, + 'name': 'square', + 'supercategory': 'shape', + }, + { + 'id': 2, + 'name': 'circle', + 'supercategory': 'shape', + }, + { + 'id': 3, + 'name': 'triangle', + 'supercategory': 'shape', + }, +] + + +def convert(ymir_cfg: edict, results: List[Dict], with_blank_area: bool): + """ + convert ymir infer result to coco instance segmentation format + the mask is encode in compressed rle + the is_crowd is True + """ + class_names = ymir_cfg.param.class_names + + categories = [] + # categories should start from 0 + for idx, name in enumerate(class_names): + categories.append(dict(id=idx, name=name, supercategory='none')) + + coco_output = {"info": INFO, "licenses": LICENSES, "categories": categories, "images": [], "annotations": []} + + image_id = 1 + annotation_id = 1 + + for idx, d in enumerate(tqdm(results, desc='convert result to coco')): + image_f = d['image'] + result = d['result'] + + width, height = imagesize.get(image_f) + + image_info = pycococreatortools.create_image_info(image_id=image_id, + file_name=osp.basename(image_f), + image_size=(width, height)) + + coco_output["images"].append(image_info) # type: ignore + + # category_id === class_id start from 0 + unique_ids = np.unique(result) + for np_class_id in unique_ids: + if with_blank_area: + class_id = int(np_class_id) - 1 + else: + class_id = int(np_class_id) + + # remove background class in infer-result + if with_blank_area and class_id < 0: + continue + + assert class_id < len(class_names), f'class_id {class_id} must < class_num {len(class_names)}' + category_info = {'id': class_id, 'is_crowd': True} + binary_mask = result == np_class_id + annotation_info = pycococreatortools.create_annotation_info(annotation_id, + image_id, + category_info, + binary_mask, + tolerance=2) + + if annotation_info is not None: + coco_output["annotations"].append(annotation_info) # type: ignore + annotation_id = annotation_id + 1 + + image_id += 1 + + return coco_output diff --git a/seg-semantic-demo-tmi/app/start.py b/seg-semantic-demo-tmi/app/start.py new file mode 100644 index 0000000..040817b --- /dev/null +++ b/seg-semantic-demo-tmi/app/start.py @@ -0,0 +1,239 @@ +import logging +import os +import random +import sys +import time +from typing import List + +import cv2 +import numpy as np +from easydict import EasyDict as edict +from tensorboardX import SummaryWriter +from ymir_exc import monitor +from ymir_exc import result_writer as rw +from ymir_exc.util import get_merged_config + +from result_to_coco import convert + + +def start() -> int: + cfg = get_merged_config() + + if cfg.ymir.run_training: + _run_training(cfg) + if cfg.ymir.run_mining: + _run_mining(cfg) + if cfg.ymir.run_infer: + 
_run_infer(cfg) + + return 0 + + +def _run_training(cfg: edict) -> None: + """sample function of training + + which shows: + - how to get config file + - how to read training and validation datasets + - how to write logs + - how to write training result + """ + # use `env.get_executor_config` to get config file for training + gpu_id: str = cfg.param.get('gpu_id') + class_names: List[str] = cfg.param.get('class_names') + expected_miou: float = cfg.param.get('expected_miou', 0.6) + idle_seconds: float = cfg.param.get('idle_seconds', 60) + trigger_crash: bool = cfg.param.get('trigger_crash', False) + # use `logging` or `print` to write log to console + # notice that logging.basicConfig is invoked at executor.env + logging.info(f'gpu device: {gpu_id}') + logging.info(f'dataset class names: {class_names}') + logging.info(f"training config: {cfg.param}") + + # count for image and annotation file + with open(cfg.ymir.input.training_index_file, 'r') as fp: + lines = fp.readlines() + + valid_image_count = 0 + valid_ann_count = 0 + + N = len(lines) + monitor_gap = max(1, N // 100) + for idx, line in enumerate(lines): + asset_path, annotation_path = line.strip().split() + if os.path.isfile(asset_path): + valid_image_count += 1 + + if os.path.isfile(annotation_path): + valid_ann_count += 1 + + # use `monitor.write_monitor_logger` to write write task process percent to monitor.txt + if idx % monitor_gap == 0: + monitor.write_monitor_logger(percent=0.2 * idx / N) + + logging.info(f'total image-ann pair: {N}') + logging.info(f'valid images: {valid_image_count}') + logging.info(f'valid annotations: {valid_ann_count}') + + # use `monitor.write_monitor_logger` to write write task process percent to monitor.txt + monitor.write_monitor_logger(percent=0.2) + + # suppose we have a long time training, and have saved the final model + # model output dir: os.path.join(cfg.ymir.output.models_dir, your_stage_name) + stage_dir = os.path.join(cfg.ymir.output.models_dir, 'epoch10') + os.makedirs(stage_dir, exist_ok=True) + with open(os.path.join(stage_dir, 'epoch10.pt'), 'w') as f: + f.write('fake model weight') + with open(os.path.join(stage_dir, 'config.py'), 'w') as f: + f.write('fake model config file') + # use `rw.write_model_stage` to save training result + rw.write_model_stage(stage_name='epoch10', + files=['epoch10.pt', 'config.py'], + evaluation_result=dict(mIoU=random.random() / 2)) + + _dummy_work(idle_seconds=idle_seconds, trigger_crash=trigger_crash) + + write_tensorboard_log(cfg.ymir.output.tensorboard_dir) + + stage_dir = os.path.join(cfg.ymir.output.models_dir, 'epoch20') + os.makedirs(stage_dir, exist_ok=True) + with open(os.path.join(stage_dir, 'epoch20.pt'), 'w') as f: + f.write('fake model weight') + with open(os.path.join(stage_dir, 'config.py'), 'w') as f: + f.write('fake model config file') + rw.write_model_stage(stage_name='epoch20', + files=['epoch20.pt', 'config.py'], + evaluation_result=dict(mIoU=expected_miou)) + + # if task done, write 100% percent log + logging.info('training done') + monitor.write_monitor_logger(percent=1.0) + + +def _run_mining(cfg: edict) -> None: + # use `cfg.param` to get config file for training + # pretrained models in `cfg.ymir.input.models_dir` + gpu_id: str = cfg.param.get('gpu_id') + class_names: List[str] = cfg.param.get('class_names') + idle_seconds: float = cfg.param.get('idle_seconds', 60) + trigger_crash: bool = cfg.param.get('trigger_crash', False) + # use `logging` or `print` to write log to console + logging.info(f"mining config: {cfg.param}") + 
logging.info(f'gpu device: {gpu_id}') + logging.info(f'dataset class names: {class_names}') + + # use `cfg.input.candidate_index_file` to read candidate dataset items + # note that annotations path will be empty str if there's no annotations in that dataset + # count for image files + with open(cfg.ymir.input.candidate_index_file, 'r') as fp: + lines = fp.readlines() + + valid_images = [] + valid_image_count = 0 + for line in lines: + if os.path.isfile(line.strip()): + valid_image_count += 1 + valid_images.append(line.strip()) + + # use `monitor.write_monitor_logger` to write task process to monitor.txt + logging.info(f"assets count: {len(lines)}, valid: {valid_image_count}") + monitor.write_monitor_logger(percent=0.2) + + _dummy_work(idle_seconds=idle_seconds, trigger_crash=trigger_crash) + + # write mining result + # here we give a fake score to each assets + total_length = len(valid_images) + mining_result = [] + for index, asset_path in enumerate(valid_images): + mining_result.append((asset_path, index / total_length)) + time.sleep(0.1) + monitor.write_monitor_logger(percent=0.2 + 0.8 * index / valid_image_count) + + rw.write_mining_result(mining_result=mining_result) + + # if task done, write 100% percent log + logging.info('mining done') + monitor.write_monitor_logger(percent=1.0) + + +def _run_infer(cfg: edict) -> None: + # use `cfg.param` to get config file for training + # models are transfered in `cfg.ymir.input.models_dir` model_params_path + class_names = cfg.param.get('class_names') + idle_seconds: float = cfg.param.get('idle_seconds', 60) + trigger_crash: bool = cfg.param.get('trigger_crash', False) + seed: int = cfg.param.get('seed', 15) + # use `logging` or `print` to write log to console + logging.info(f"infer config: {cfg.param}") + + # use `cfg.ymir.input.candidate_index_file` to read candidate dataset items + # note that annotations path will be empty str if there's no annotations in that dataset + with open(cfg.ymir.input.candidate_index_file, 'r') as fp: + lines = fp.readlines() + + valid_images = [] + invalid_images = [] + valid_image_count = 0 + for line in lines: + if os.path.isfile(line.strip()): + valid_image_count += 1 + valid_images.append(line.strip()) + else: + invalid_images.append(line.strip()) + + # use `monitor.write_monitor_logger` to write log to console and write task process percent to monitor.txt + logging.info(f"assets count: {len(lines)}, valid: {valid_image_count}") + monitor.write_monitor_logger(percent=0.2) + + _dummy_work(idle_seconds=idle_seconds, trigger_crash=trigger_crash) + + # write infer result + random.seed(seed) + results = [] + + fake_mask_num = min(len(class_names), 10) + for iter, img_file in enumerate(valid_images): + img = cv2.imread(img_file, cv2.IMREAD_GRAYSCALE) + mask = np.zeros(shape=img.shape[0:2], dtype=np.uint8) + for idx in range(fake_mask_num): + percent = 100 * idx / fake_mask_num + value = np.percentile(img, percent) + mask[img > value] = idx + 1 + + results.append(dict(image=img_file, result=mask)) + + # real-time monitor + monitor.write_monitor_logger(percent=0.2 + 0.8 * iter / valid_image_count) + + coco_results = convert(cfg, results, True) + rw.write_infer_result(infer_result=coco_results, algorithm='segmentation') + + # if task done, write 100% percent log + logging.info('infer done') + monitor.write_monitor_logger(percent=1.0) + + +def _dummy_work(idle_seconds: float, trigger_crash: bool = False) -> None: + if idle_seconds > 0: + time.sleep(idle_seconds) + if trigger_crash: + raise RuntimeError('app crashed') + 
+ +def write_tensorboard_log(tensorboard_dir: str) -> None: + tb_log = SummaryWriter(tensorboard_dir) + + total_epoch = 30 + for e in range(total_epoch): + tb_log.add_scalar("fake_loss", 10 / (1 + e), e) + time.sleep(1) + monitor.write_monitor_logger(percent=e / total_epoch) + + +if __name__ == '__main__': + logging.basicConfig(stream=sys.stdout, + format='%(levelname)-8s: [%(asctime)s] %(message)s', + datefmt='%Y%m%d-%H:%M:%S', + level=logging.INFO) + sys.exit(start()) diff --git a/seg-semantic-demo-tmi/fast.Dockerfile b/seg-semantic-demo-tmi/fast.Dockerfile new file mode 100644 index 0000000..30ec6b8 --- /dev/null +++ b/seg-semantic-demo-tmi/fast.Dockerfile @@ -0,0 +1,22 @@ +FROM youdaoyzbx/ymir-executor:ymir2.0.2-seg-semantic-demo-base + +WORKDIR /app +# copy user code to WORKDIR +COPY ./app/*.py /app/ + +COPY ./requirements.txt /app/ +RUN pip3 install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple + +# copy user config template and manifest.yaml to /img-man +RUN mkdir -p /img-man +COPY img-man/*.yaml /img-man/ + +# view https://github.com/protocolbuffers/protobuf/issues/10051 for detail +ENV PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python + +# entry point for your app +# the whole docker image will be started with `nvidia-docker run ` +# and this command will run automatically + +RUN echo "python /app/start.py" > /usr/bin/start.sh +CMD bash /usr/bin/start.sh diff --git a/seg-semantic-demo-tmi/img-man/infer-template.yaml b/seg-semantic-demo-tmi/img-man/infer-template.yaml new file mode 100644 index 0000000..67295db --- /dev/null +++ b/seg-semantic-demo-tmi/img-man/infer-template.yaml @@ -0,0 +1,12 @@ +# infer template for your executor app +# after build image, it should at /img-man/infer-template.yaml +# key: gpu_id, task_id, model_params_path, class_names, gpu_count should be preserved + +# gpu_id: '0' +# gpu_count: 1 +# task_id: 'default-infer-task' +# model_params_path: [] +# class_names: [] + +# just for test, remove this key in your own docker image +idle_seconds: 3 # idle seconds for each task diff --git a/seg-semantic-demo-tmi/img-man/manifest.yaml b/seg-semantic-demo-tmi/img-man/manifest.yaml new file mode 100644 index 0000000..1aadd8e --- /dev/null +++ b/seg-semantic-demo-tmi/img-man/manifest.yaml @@ -0,0 +1,2 @@ +# object_type: 2 for object detection, 3 for semantic segmentation, default: 2 +"object_type": 3 diff --git a/seg-semantic-demo-tmi/img-man/mining-template.yaml b/seg-semantic-demo-tmi/img-man/mining-template.yaml new file mode 100644 index 0000000..3eae941 --- /dev/null +++ b/seg-semantic-demo-tmi/img-man/mining-template.yaml @@ -0,0 +1,12 @@ +# mining template for your executor app +# after build image, it should at /img-man/mining-template.yaml +# key: gpu_id, task_id, model_params_path, class_names, gpu_count should be preserved + +# gpu_id: '0' +# gpu_count: 1 +# task_id: 'default-mining-task' +# model_params_path: [] +# class_names: [] + +# just for test, remove this key in your own docker image +idle_seconds: 3 # idle seconds for each task diff --git a/seg-semantic-demo-tmi/img-man/training-template.yaml b/seg-semantic-demo-tmi/img-man/training-template.yaml new file mode 100644 index 0000000..5f2b638 --- /dev/null +++ b/seg-semantic-demo-tmi/img-man/training-template.yaml @@ -0,0 +1,18 @@ +# training template for your executor app +# after build image, it should at /img-man/training-template.yaml +# key: gpu_id, task_id, pretrained_model_paths, class_names, gpu_count should be preserved + +# gpu_id: '0' +# gpu_count: 1 +# task_id: 
'default-training-task' +# pretrained_model_params: [] +# class_names: [] + +# format of annotations and images that ymir should provide to this docker container +# annotation format: must be seg-coco +# image format: must be raw +export_format: 'seg-coco:raw' + +# just for test, remove this key in your own docker image +expected_miou: 0.983 # expected mIoU for training task +idle_seconds: 3 # idle seconds for each task diff --git a/seg-semantic-demo-tmi/requirements.txt b/seg-semantic-demo-tmi/requirements.txt new file mode 100644 index 0000000..708647b --- /dev/null +++ b/seg-semantic-demo-tmi/requirements.txt @@ -0,0 +1,11 @@ +pycocotools +pydantic>=1.8.2 +pyyaml>=5.4.1 +tensorboardX>=2.4 +numpy +opencv-python>=4.0 +pillow +imagesize +tqdm +easydict +ymir_exc@git+https://github.com/modelai/ymir-executor-sdk.git@ymir2.1.0