
mlops_pos

A project that aims to create a well-organized ML pipeline for the pose estimation task.

├── Dockerfile
├── README.md
├── cfg
│   ├── config.yaml
│   ├── datasets
│   │   └── coco17-pose.yaml
│   └── models
│       └── yolov8-pose.yaml
├── commands.py
├── datasets
│   └── COCO
│       ├── images
│       └── labels
├── datasets.dvc
├── poetry.lock
├── pos_detection
│   ├── __init__.py
│   ├── datasets
│   │   ├── COCOL.py
│   │   └── __init__.py
│   ├── infer.py
│   ├── model
│   │   ├── YOLOL.py
│   │   ├── __init__.py
│   │   └── utils.py
│   ├── pose_drawer.py
│   ├── pose_match.py
│   └── train.py
├── pyproject.toml
└── test_imgs
    └── game-streamer-tips.jpg

Above is the basic structure of the project. commands.py exposes the train, infer, and match_pose commands as a CLI. The source code lives in the pos_detection folder. The datasets folder is empty (datasets are not uploaded to git). A test image is provided in the test_imgs folder. You can simply run the match_pose command and the program will draw a simple sketch of the person on the test image.
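
For reference, the command dispatch could be wired up roughly like the sketch below. This is a hypothetical argparse-based version, not the actual commands.py; the train.main, infer.main, and pose_match.main entry points are assumptions.

    # Illustrative only: a minimal argparse dispatcher for the three commands.
    # The real commands.py may use a different CLI library and entry points.
    import argparse

    from pos_detection import infer, pose_match, train  # assumed module-level entry points


    def main() -> None:
        parser = argparse.ArgumentParser(description="mlops_pos command-line interface")
        subparsers = parser.add_subparsers(dest="command", required=True)
        subparsers.add_parser("train", help="train the pose estimation model")
        subparsers.add_parser("infer", help="run inference on an image")
        subparsers.add_parser("match_pose", help="draw a pose sketch for the test image")

        args = parser.parse_args()
        if args.command == "train":
            train.main()
        elif args.command == "infer":
            infer.main()
        elif args.command == "match_pose":
            pose_match.main()


    if __name__ == "__main__":
        main()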

Relevance

Pose estimation is a popular and very practical problem that can be applied in many areas, from movie production and gaming to streaming. This project was mainly inspired by the popularity of VTubers, who use machine learning to capture their movements and render them on screen through avatars. The quality of such motion transfer is quite high today, so we can build on known solutions.

Architecture and data

Many architectures can technically be reused for pose estimation (starting from R-CNN variants), but I plan to use YOLOv8 as a baseline and, depending on the quality of the results, search for further solutions. There are many open datasets for pose estimation, and some of them are very large (this dataset and that one). The mentioned datasets will serve as our baseline, and we may expand from there. Together they contain over 40 thousand labeled images covering various human poses in different situations (with a focus on the upper half of the body).
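
As a rough illustration of the intended baseline, running a pretrained YOLOv8 pose model with the ultralytics package looks roughly like this. The checkpoint name is only an example; the project config (cfg/models/yolov8-pose.yaml) may point to a different variant.

    # Rough sketch of a YOLOv8-pose baseline via the ultralytics package.
    from ultralytics import YOLO

    # Example pretrained pose checkpoint (not necessarily the one used by the pipeline).
    model = YOLO("yolov8n-pose.pt")

    # Run inference on the bundled test image.
    results = model("test_imgs/game-streamer-tips.jpg")

    # Each result holds per-person keypoints; .xy is an (num_people, num_keypoints, 2) tensor.
    for result in results:
        print(result.keypoints.xy.shape)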

Application

In my view, the process should look as follows: an image is fed into the pipeline, the model processes it, and the result is the estimated pose. After that, we take one of the prepared avatars and map it onto the pose. Ideally, this process could be animated, but I am not yet sure how difficult that will be. The avatar could be drawn over the original image in place of the person, but that would require size matching, which is complicated. For now, I plan to display the result on separate static images. Generally speaking, this is a project to simulate a VTuber experience.
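
To make the "draw the estimated pose on a separate static image" step concrete, a minimal sketch could look like the snippet below. The skeleton indices and canvas handling are illustrative assumptions, not the project's actual pose_drawer.py.

    # Illustrative pose-drawing sketch (not the project's pose_drawer.py):
    # connect detected keypoints with lines on a blank canvas using OpenCV.
    import cv2
    import numpy as np

    # A few COCO-style keypoint connections for the upper body (standard COCO
    # keypoint ordering; the project may use its own skeleton definition).
    SKELETON = [(5, 7), (7, 9), (6, 8), (8, 10), (5, 6), (5, 11), (6, 12)]


    def draw_sketch(keypoints: np.ndarray, canvas_size: tuple[int, int] = (640, 640)) -> np.ndarray:
        """Draw a stick-figure sketch from an (N, 2) array of (x, y) keypoints."""
        canvas = np.full((*canvas_size, 3), 255, dtype=np.uint8)  # white background
        for a, b in SKELETON:
            if a < len(keypoints) and b < len(keypoints):
                pt_a = (int(keypoints[a][0]), int(keypoints[a][1]))
                pt_b = (int(keypoints[b][0]), int(keypoints[b][1]))
                cv2.line(canvas, pt_a, pt_b, color=(0, 0, 0), thickness=3)
        for x, y in keypoints:
            cv2.circle(canvas, (int(x), int(y)), radius=4, color=(0, 0, 255), thickness=-1)
        return canvas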
