Nano Computer Vision model

Setup:

macOS Ventura 13.6.1
2 GHz quad-core Intel Core i5 processor
16 GB 3733 MHz LPDDR4X

Description

The primary goal of this repository is to solve different computer vision problems, such as:

image classification
object detection (TBD)
segmentation (TBD)
generation (TBD)

Users can download pretrained weights with DVC or train their own model on the provided datasets.

Example: Flowers classification

poetry install
poetry shell

dvc pull

python train.py +data=flowers +model=lenet # train model lenet on flowers dataset
python infer.py +data=flowers +model=lenet # infer model lenet on flowers dataset

python export_model.py +data=flowers +model=lenet # export model to onnx

Model repository structure

Triton server metrics

The main metrics we look at are latency99, latency90 and throughput at concurrency == 4.

Instance number: 1

metric	max batch	max queue delay (usec)	value
latency90 (usec)	4	1000	19962
latency99 (usec)	4	1000	27117
throughput (inf/s)	4	1000	233.097
latency90 (usec)	4	500	16475
latency99 (usec)	4	500	23288
throughput (inf/s)	4	500	256.486
latency90 (usec)	4	2000	17884
latency99 (usec)	4	2000	24160
throughput (inf/s)	4	2000	241.077
latency90 (usec)	4	4000	17249
latency99 (usec)	4	4000	22958
throughput (inf/s)	4	4000	252.268

Judging by these metrics, max queue delay == 4000 usec is the better choice: at this delay the gap between latency90 and latency99 is the smallest, and throughput is close to the maximum.
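The comparison behind this choice can be reproduced with a short script. This is a minimal sketch that only re-uses the numbers from the table above; the names `measurements`, `tail_gap` and `summary` are illustrative and are not part of the repository.

```python
# Measurements copied from the table above, keyed by max queue delay (usec).
measurements = {
    1000: {"latency90": 19962, "latency99": 27117, "throughput": 233.097},
    500: {"latency90": 16475, "latency99": 23288, "throughput": 256.486},
    2000: {"latency90": 17884, "latency99": 24160, "throughput": 241.077},
    4000: {"latency90": 17249, "latency99": 22958, "throughput": 252.268},
}


def tail_gap(metrics):
    """Difference between latency99 and latency90 in usec."""
    return metrics["latency99"] - metrics["latency90"]


best_throughput = max(m["throughput"] for m in measurements.values())

# For every delay: the tail gap and the throughput relative to the best run.
summary = {
    delay: {
        "tail_gap_usec": tail_gap(m),
        "throughput_ratio": m["throughput"] / best_throughput,
    }
    for delay, m in measurements.items()
}

smallest_gap_delay = min(summary, key=lambda d: summary[d]["tail_gap_usec"])

for delay in sorted(summary):
    row = summary[delay]
    print(
        f"{delay:>5} usec: gap={row['tail_gap_usec']} usec, "
        f"throughput={row['throughput_ratio']:.1%} of best"
    )
print("smallest latency99 - latency90 gap at delay:", smallest_gap_delay)
```

The highest throughput is measured at 500 usec, so the choice of 4000 usec trades a small amount of throughput for a tighter latency tail.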

With instance count == 2, the measured metrics were considerably worse.
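For reference, the three parameters varied in these measurements (max batch, max queue delay, instance count) correspond to the `max_batch_size`, `dynamic_batching.max_queue_delay_microseconds` and `instance_group.count` fields of a Triton `config.pbtxt`. The sketch below renders such a file from Python. The model name `lenet_onnx`, the `onnxruntime_onnx` platform and the `KIND_CPU` instance kind are assumptions made for illustration, not values taken from this repository, and the input/output sections are omitted.

```python
def render_triton_config(
    name,
    max_batch_size,
    max_queue_delay_usec,
    instance_count,
    kind="KIND_CPU",  # assumption: the benchmark host has no GPU
):
    """Render a minimal Triton config.pbtxt with dynamic batching enabled."""
    return (
        f'name: "{name}"\n'
        'platform: "onnxruntime_onnx"\n'  # assumption: the model is served as ONNX
        f"max_batch_size: {max_batch_size}\n"
        "dynamic_batching {\n"
        f"  max_queue_delay_microseconds: {max_queue_delay_usec}\n"
        "}\n"
        "instance_group [\n"
        f"  {{ count: {instance_count} kind: {kind} }}\n"
        "]\n"
    )


# Hypothetical model name; the other values are the ones chosen above.
config_text = render_triton_config(
    name="lenet_onnx",
    max_batch_size=4,
    max_queue_delay_usec=4000,
    instance_count=1,
)
print(config_text)
```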

Estimating whether the model is compute-bound or memory-bound

Assume the model runs on a V100 => thresholds (arithmetic intensity, FLOPs per byte): 40 - 140

FLOPS(conv2d) == 2 x N x OUT x IN x Kx x Ky x H_OUT x W_OUT

FLOPS(FC) == 2 x N x OUT x IN
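The two formulas translate directly into code. The sketch below evaluates them for the shapes of conv1 and the classifier, assuming batch size N == 1 and 7*7*256 input features for the classifier (the flattened conv5 output); the function names are illustrative.

```python
def conv2d_flops(n, out_channels, in_channels, kx, ky, h_out, w_out):
    """FLOPs of a conv2d layer: one multiply and one add per kernel tap."""
    return 2 * n * out_channels * in_channels * kx * ky * h_out * w_out


def fc_flops(n, out_features, in_features):
    """FLOPs of a fully connected layer (bias ignored)."""
    return 2 * n * out_features * in_features


# conv1: 3 -> 16 channels, 3x3 kernel, 112x112 output, batch size 1 (assumed).
conv1_flops = conv2d_flops(1, 16, 3, 3, 3, 112, 112)
# clf: 7*7*256 input features (assumed) -> 4 classes.
clf_flops = fc_flops(1, 4, 7 * 7 * 256)

print(f"conv1: {conv1_flops / 1e6:.3f} MFLOPs")
print(f"clf:   {clf_flops / 1e6:.3f} MFLOPs")
```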

layer name	kernel	in/out channels	in/out size	MFLOPs	NPARAMS
conv1	3x3	3/16	224/112	10.838	NPARAMS
conv2	3x3	16/32	112/56	28.901	NPARAMS
conv3	3x3	32/64	56/28	28.901	NPARAMS
conv4	3x3	64/128	28/14	28.901	NPARAMS
conv5	3x3	128/256	14/7	28.901	NPARAMS
clf	n/a	7*7*256/4	n/a	0.1	NPARAMS
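To compare each layer with the 40 - 140 thresholds, the FLOPs have to be divided by the bytes the layer moves. The sketch below does this under simplifying assumptions that are not stated in the original: batch size 1, FP32 (4 bytes per element), no biases, and every input, weight and output element crossing memory exactly once. Reading values below 40 as memory-bound and values above 140 as compute-bound is likewise an assumed interpretation of the thresholds.

```python
FP32_BYTES = 4        # assumption: all tensors are FP32
LOW, HIGH = 40, 140   # V100 thresholds quoted above (FLOPs per byte)


def conv_intensity(in_ch, out_ch, in_size, out_size, k, n=1):
    """FLOPs per byte for a k x k convolution on square feature maps."""
    flops = 2 * n * out_ch * in_ch * k * k * out_size * out_size
    elements = (
        n * in_ch * in_size * in_size        # input activations
        + out_ch * in_ch * k * k             # weights
        + n * out_ch * out_size * out_size   # output activations
    )
    return flops / (elements * FP32_BYTES)


def fc_intensity(in_features, out_features, n=1):
    """FLOPs per byte for a fully connected layer."""
    flops = 2 * n * out_features * in_features
    elements = n * in_features + out_features * in_features + n * out_features
    return flops / (elements * FP32_BYTES)


def classify(intensity):
    """Assumed reading of the thresholds: below LOW is memory-bound."""
    if intensity < LOW:
        return "memory-bound"
    if intensity > HIGH:
        return "compute-bound"
    return "between thresholds"


# Shapes taken from the layer table: (in_ch, out_ch, in_size, out_size, kernel).
conv_layers = {
    "conv1": (3, 16, 224, 112, 3),
    "conv2": (16, 32, 112, 56, 3),
    "conv3": (32, 64, 56, 28, 3),
    "conv4": (64, 128, 28, 14, 3),
    "conv5": (128, 256, 14, 7, 3),
}

intensities = {name: conv_intensity(*shape) for name, shape in conv_layers.items()}
intensities["clf"] = fc_intensity(7 * 7 * 256, 4)

for name, value in intensities.items():
    print(f"{name}: {value:6.2f} FLOPs/byte -> {classify(value)}")
```

Under these assumptions no layer gets close to the upper threshold, and a larger batch size would raise the intensity because the weights are reused across samples.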

Conv2D:

The last rows of the screenshot show that these convolution sizes, combined with FP32, do not run very efficiently.
