Description

Here we try to recognise the letters in a captcha with an OCR model. First, we want to understand the structure of the project. Then we create a DVC pipeline and experiment with it.

Blogpost

The original blogpost can be found here.

Data

The data can be downloaded here.

Original notebook

The original notebook was downloaded from here.

The environment

The environment is specified in the conda.txt file and can be created with conda:

    conda create --name <NAME> --file conda.txt

DVC

Adding pipeline steps to DVC works as follows. All options come first, and the command that the stage executes goes last:

    dvc run --name <STAGENAME> \
    --deps <PATH_TO_INPUT> \
    --outs <PATH_TO_OUTPUT> \
    --params <FIELD in params.yaml> \
    python <PATH_TO_SCRIPT> --config=params.yaml
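The ordering of the arguments can be illustrated with a small Python sketch that assembles a dvc run invocation as an argument list. The helper name build_dvc_run is hypothetical and not part of DVC or of this repository; it only uses the options that appear in this README (--name, --deps, --outs, --params) and assumes every stage script takes --config params.yaml.

```python
import shlex


def build_dvc_run(name, script, deps=(), outs=(), params=(), config="params.yaml"):
    """Assemble a `dvc run` invocation as a list of arguments.

    Hypothetical helper for illustration: options first, stage command last.
    """
    args = ["dvc", "run", "--name", name]
    for dep in deps:
        args += ["--deps", dep]
    for out in outs:
        args += ["--outs", out]
    for param in params:
        args += ["--params", param]
    # The command that the stage executes always comes after all options.
    args += ["python", script, "--config", config]
    return args


cmd = build_dvc_run(
    name="model_setup",
    script="src/stages/model_setup.py",
    deps=["data/split/characterset.txt"],
    outs=["models/untrained_model.h5"],
    params=["model_setup", "base"],
)
print(shlex.join(cmd))
```

The printed line is the single-line form of the model setup stage command from this README, and could be passed to subprocess.run(cmd) instead of being typed by hand.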

For example, the first stage is added as follows:

dvc run --name split_data \
--deps data/raw \
--outs data/split/characterset.txt \
--outs data/split/x_train.txt \
--outs data/split/y_train.txt \
--outs data/split/x_valid.txt \
--outs data/split/y_valid.txt \
--params split \
--params base \
python src/stages/split.py --config params.yaml
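The contents of src/stages/split.py are not shown in this README. As a rough sketch of what a stage with these outputs might do, the following splits a list of captcha images into a training and a validation part and collects the character set. It rests on several assumptions: that each label is the image file name without its extension, that the split section of params.yaml holds a validation fraction, and that the base section holds a random seed. The function names and example file names are hypothetical.

```python
import random
from pathlib import Path


def split_samples(image_paths, valid_fraction=0.1, seed=42):
    """Shuffle deterministically and split into (train, valid) lists."""
    paths = sorted(str(p) for p in image_paths)  # stable starting order
    random.Random(seed).shuffle(paths)           # same seed, same split
    n_valid = int(len(paths) * valid_fraction)
    return paths[n_valid:], paths[:n_valid]


def label_of(path):
    """Assumption: the label is the file name without its extension."""
    return Path(path).stem


def character_set(paths):
    """Sorted list of all characters that occur in the labels."""
    return sorted(set("".join(label_of(p) for p in paths)))


# Hypothetical file names standing in for the contents of data/raw.
images = [
    "data/raw/2b827.png",
    "data/raw/x3f9a.png",
    "data/raw/77abc.png",
    "data/raw/p0q1z.png",
]
train, valid = split_samples(images, valid_fraction=0.25, seed=0)
print(len(train), len(valid))
print(character_set(images))
```

Fixing the seed matters for DVC: if the split changed on every run, all downstream stages would be invalidated each time the pipeline is reproduced.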

Check out the two generated files, dvc.lock and dvc.yaml, and note down what you find in them.

Then add the other stages with the dvc run command:

Dataset stage

dvc run --name create_datasets \
 --deps data/split/x_train.txt \
 --deps data/split/x_valid.txt \
 --deps data/split/y_train.txt \
 --deps data/split/y_valid.txt \
 --outs data/datasets/train_dataset \
 --outs data/datasets/validation_dataset \
 --params transform \
 --params base \
 python src/stages/datasets.py --config params.yaml

Model setup stage

dvc run --name model_setup \
 --deps data/split/characterset.txt \
 --outs models/untrained_model.h5 \
 --params model_setup \
 --params base \
 python src/stages/model_setup.py --config params.yaml

Training stage

dvc run --name training \
 --deps models/untrained_model.h5 \
 --deps data/datasets/train_dataset \
 --deps data/datasets/validation_dataset \
 --outs models/trained_model.h5 \
 --outs models/prediction_model.h5 \
 --params train \
 --params base \
 python src/stages/training.py --config params.yaml

Prediction stage

dvc run --name predict \
 --deps models/prediction_model.h5 \
 --deps data/split/characterset.txt \
 --deps data/datasets/validation_dataset \
 python src/stages/predict.py --config params.yaml
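Because each stage declares its dependencies and outputs, the five stages form a dependency graph. The following sketch derives a valid execution order from the --deps and --outs values given in the commands above. It is only an illustration of the idea and not how DVC is implemented; the helper name stage_graph is hypothetical, and graphlib requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Dependencies and outputs copied from the dvc run commands in this README.
STAGES = {
    "split_data": {
        "deps": ["data/raw"],
        "outs": [
            "data/split/characterset.txt",
            "data/split/x_train.txt",
            "data/split/y_train.txt",
            "data/split/x_valid.txt",
            "data/split/y_valid.txt",
        ],
    },
    "create_datasets": {
        "deps": [
            "data/split/x_train.txt",
            "data/split/x_valid.txt",
            "data/split/y_train.txt",
            "data/split/y_valid.txt",
        ],
        "outs": ["data/datasets/train_dataset", "data/datasets/validation_dataset"],
    },
    "model_setup": {
        "deps": ["data/split/characterset.txt"],
        "outs": ["models/untrained_model.h5"],
    },
    "training": {
        "deps": [
            "models/untrained_model.h5",
            "data/datasets/train_dataset",
            "data/datasets/validation_dataset",
        ],
        "outs": ["models/trained_model.h5", "models/prediction_model.h5"],
    },
    "predict": {
        "deps": [
            "models/prediction_model.h5",
            "data/split/characterset.txt",
            "data/datasets/validation_dataset",
        ],
        "outs": [],
    },
}


def stage_graph(stages):
    """Map each stage to the set of stages that produce its dependencies."""
    producer = {out: name for name, spec in stages.items() for out in spec["outs"]}
    return {
        name: {producer[dep] for dep in spec["deps"] if dep in producer}
        for name, spec in stages.items()
    }


graph = stage_graph(STAGES)
order = list(TopologicalSorter(graph).static_order())
print(order)
```

The order always starts with split_data and ends with predict; create_datasets and model_setup do not depend on each other and may appear in either order. DVC can display the graph of the actual pipeline with dvc dag.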
