
ryokugyu/dvc_tutorial


DVC Tutorial

Machine learning projects deal with both data and code. DVC (Data Version Control) is a tool that provides data versioning. For more information on DVC, visit the DVC website (https://dvc.org).

The main purpose of this tutorial is to give a brief overview of DVC and how it addresses common problems in machine learning projects, such as keeping data, models, and code in sync.

Contents of the repository

Follow these steps:

Clone this repository

git clone https://github.com/ryokugyu/dvc_tutorial.git

After cloning, change into the repository directory:

cd dvc_tutorial

Initialize the DVC repository:

dvc init

After initializing the DVC repository, pull the data to your local machine:

dvc pull
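Pulling data with DVC downloads files from remote storage into a local cache and then restores them into the workspace (dvc pull is documented as a fetch followed by a checkout). DVC identifies each file by its MD5 content hash. The sketch below models that content-addressed cache in plain Python; the helper names are hypothetical, and the cache layout (first two hex characters as a subdirectory) is an assumption modeled on older DVC 2.x releases, not DVC's actual implementation.

```python
import hashlib
import os
import shutil
import tempfile


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def cache_path(cache_dir, md5):
    """Hypothetical layout: first two hex chars form a subdirectory."""
    return os.path.join(cache_dir, md5[:2], md5[2:])


def add_to_cache(path, cache_dir):
    """Copy a file into the content-addressed cache and return its digest."""
    md5 = file_md5(path)
    dest = cache_path(cache_dir, md5)
    if not os.path.exists(dest):  # identical content is stored only once
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        shutil.copyfile(path, dest)
    return md5


def checkout(md5, cache_dir, workspace_path):
    """Restore a workspace file from the cache, as a checkout step would."""
    shutil.copyfile(cache_path(cache_dir, md5), workspace_path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data = os.path.join(tmp, "mnist_train.csv")
        with open(data, "w") as f:
            f.write("label,pixel0\n5,0\n")  # toy file, not the real dataset
        cache = os.path.join(tmp, "cache")
        digest = add_to_cache(data, cache)
        os.remove(data)                # simulate a fresh clone without the data
        checkout(digest, cache, data)  # restore it from the cache
        print(digest, os.path.exists(data))
```

Because files are keyed by content, two identical files share one cache entry, and Git only needs to track the small hash references.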

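Now both the data and the code are available locally. First, split the training data into training and validation sets. The command below passes 0.33 to code/split_test_train.py, which presumably holds out about a third of the rows for validation (roughly a 70/30 split).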

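Splitting the dataset. In the command below, each -d flag declares a dependency of the stage, each -o flag declares an output that DVC caches, and the trailing python command is what the stage runs. The dvc run command comes from older DVC releases; newer releases define stages with dvc stage add and run them with dvc repro.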

dvc run -d data/mnist_train.csv -d code/split_test_train.py -d code/conf.py -o data/X_train.npy -o data/Y_train.npy -o data/X_val.npy -o data/Y_val.npy python code/split_test_train.py 0.33 2
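The split step can be pictured as a shuffled, seeded train/validation split. This sketch uses only the standard library and assumes, without having checked code/split_test_train.py, that its two arguments are the validation fraction (0.33) and a random seed (2); train_val_split is a hypothetical helper, and unlike the real script it does not write NumPy .npy files.

```python
import random


def train_val_split(rows, val_fraction, seed):
    """Shuffle row indices with a fixed seed and split off a validation set.

    Assumption: val_fraction=0.33 and seed=2 mirror the arguments passed to
    code/split_test_train.py; the real script's semantics may differ.
    """
    if not 0.0 < val_fraction < 1.0:
        raise ValueError("val_fraction must be between 0 and 1")
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)  # same seed -> same split every run
    n_val = round(len(rows) * val_fraction)
    val_idx, train_idx = indices[:n_val], indices[n_val:]
    return [rows[i] for i in train_idx], [rows[i] for i in val_idx]


if __name__ == "__main__":
    rows = [(i % 10, [i]) for i in range(100)]  # toy (label, pixels) pairs
    train, val = train_val_split(rows, val_fraction=0.33, seed=2)
    print(len(train), len(val))  # 67 33
```

Fixing the seed keeps the split deterministic, which fits DVC's model: rerunning the stage on unchanged inputs reproduces the same outputs.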
Next, create a folder named model:

mkdir model

Compile and train the model, validate its performance, and store the trained model in model/model.json and model/model.h5:

dvc run -v -d data/X_train.npy -d data/Y_train.npy -d data/X_val.npy -d data/Y_val.npy -d code/conf.py -d code/model_train.py -o model/model.json -o model/model.h5 python code/model_train.py 1 256
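DVC links stages through files: the training stage depends on data/X_train.npy and the other arrays that the split stage produces, so the split must run first. The sketch below models that ordering rule with Kahn's topological sort over a hypothetical dict of stages (name, dependencies, outputs). It is not how DVC stores or schedules stages internally, only an illustration of why declaring -d and -o lets DVC chain stages into a pipeline.

```python
from collections import deque


def pipeline_order(stages):
    """Order stages so each one runs after the stages producing its deps.

    stages: hypothetical mapping of stage name -> {"deps": [...], "outs": [...]}.
    """
    # Which stage produces each output file.
    producer = {out: name for name, s in stages.items() for out in s["outs"]}

    # Edge up -> name whenever stage `name` depends on a file `up` produces.
    downstream = {name: set() for name in stages}
    indegree = {name: 0 for name in stages}
    for name, s in stages.items():
        for up in {producer[d] for d in s["deps"] if d in producer}:
            downstream[up].add(name)
            indegree[name] += 1

    # Kahn's algorithm: repeatedly take a stage with no unmet dependencies.
    ready = deque(sorted(n for n, deg in indegree.items() if deg == 0))
    order = []
    while ready:
        current = ready.popleft()
        order.append(current)
        for nxt in sorted(downstream[current]):
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)

    if len(order) != len(stages):
        raise ValueError("stage graph contains a cycle")
    return order


if __name__ == "__main__":
    stages = {
        "train": {
            "deps": ["data/X_train.npy", "data/Y_train.npy", "code/model_train.py"],
            "outs": ["model/model.json", "model/model.h5"],
        },
        "split": {
            "deps": ["data/mnist_train.csv", "code/split_test_train.py"],
            "outs": ["data/X_train.npy", "data/Y_train.npy"],
        },
    }
    print(pipeline_order(stages))  # ['split', 'train']
```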

Load the model and test its performance on an external test dataset:

dvc run -d data/mnist_train.csv -d code/conf.py -d code/model_test.py -M data/eval.txt -f Dvcfile python code/model_test.py
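The -M data/eval.txt option declares eval.txt as a metrics file that DVC does not cache, so it can be committed to Git and read by the metrics commands. The README does not show what code/model_test.py writes, so the sketch below is a hypothetical example of producing such a file: it computes accuracy and writes it as a small JSON object. The JSON format, the "accuracy" key, and the helper names are all assumptions.

```python
import json
import os
import tempfile


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def write_metrics(path, metrics):
    """Write a small JSON metrics file (hypothetical format for data/eval.txt)."""
    with open(path, "w") as f:
        json.dump(metrics, f, indent=2)


if __name__ == "__main__":
    y_true = [7, 2, 1, 0, 4]
    y_pred = [7, 2, 1, 6, 4]  # toy predictions with one mistake
    with tempfile.TemporaryDirectory() as tmp:
        out = os.path.join(tmp, "eval.txt")
        write_metrics(out, {"accuracy": accuracy(y_true, y_pred)})
        with open(out) as f:
            print(f.read())
```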

Finally, use DVC's metrics feature to display the evaluation results stored in data/eval.txt:

dvc metrics show
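dvc metrics show reads the metrics files a project declares (here data/eval.txt) and prints their values. The sketch below imitates that idea in plain Python; it assumes the metrics file holds JSON, which the README does not confirm, and flatten and show_metrics are hypothetical helpers whose output layout is not DVC's exact format.

```python
import json
import os
import tempfile


def flatten(metrics, prefix=""):
    """Flatten nested dicts into {"a.b": value} pairs."""
    flat = {}
    for key, value in metrics.items():
        name = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


def show_metrics(paths):
    """Print every metric from each JSON file, one per line."""
    for path in paths:
        with open(path) as f:
            for name, value in sorted(flatten(json.load(f)).items()):
                print(f"{path}\t{name}: {value}")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "eval.txt")
        with open(path, "w") as f:
            # toy values, not results from this tutorial
            json.dump({"accuracy": 0.98, "loss": {"val": 0.07}}, f)
        show_metrics([path])
```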

About

Get Started: MNIST tutorial for DVC
