
MLOps

Design for an MLOps pipeline

Project Organization

├── LICENSE
├── Makefile           <- Makefile with commands like `make data` or `make train`
├── README.md          <- The top-level README for developers using this project.
├── data
│   ├── external       <- Data from third party sources.
│   ├── interim        <- Intermediate data that has been transformed.
│   ├── processed      <- The final, canonical data sets for modeling.
│   └── raw            <- The original, immutable data dump.
│
├── docs               <- A default Sphinx project; see sphinx-doc.org for details
│
├── models             <- Trained and serialized models, model predictions, or model summaries
│
├── notebooks          <- Jupyter notebooks. Naming convention is a number (for ordering),
│                         the creator's initials, and a short `-` delimited description, e.g.
│                         `1.0-jqp-initial-data-exploration`.
│
├── references         <- Data dictionaries, manuals, and all other explanatory materials.
│
├── reports            <- Generated analysis as HTML, PDF, LaTeX, etc.
│   └── figures        <- Generated graphics and figures to be used in reporting
│
├── requirements.txt   <- The requirements file for reproducing the analysis environment, e.g.
│                         generated with `pip freeze > requirements.txt`
│
├── setup.py           <- makes project pip installable (pip install -e .) so src can be imported
├── src                <- Source code for use in this project.
│   ├── __init__.py    <- Makes src a Python module
│   │
│   ├── data           <- Scripts to download or generate data
│   │   └── make_dataset.py
│   │
│   ├── features       <- Scripts to turn raw data into features for modeling
│   │   └── build_features.py
│   │
│   ├── models         <- Scripts to train models and then use trained models to make
│   │   │                 predictions
│   │   ├── predict_model.py
│   │   └── train_model.py
│   │
│   └── visualization  <- Scripts to create exploratory and results oriented visualizations
│       └── visualize.py
│
└── tox.ini            <- tox file with settings for running tox; see tox.readthedocs.io

Project based on the cookiecutter data science project template. #cookiecutterdatascience

Setting up the Project

Clone the repository

git clone https://github.com/StarsCDS/MLOps.git

Install dependencies

  • Create a virtual environment using venv (you can also use conda instead)
python -m venv MLOps
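  • Activate the virtual environment (Linux/macOS shown; on Windows run MLOps\Scripts\activate)
source MLOps/bin/activate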
  • Install dependencies (alternatively, install them manually with pip install -r requirements.txt)
cd MLOps
make requirements

Data Versioning with dvc

  • Pull the raw data (run pip install dvc dvc-gdrive first if the dvc command is not found)
dvc pull

Data versioning tips

  • Add new data, or modifications to existing data, to DVC tracking
dvc add <filepath>
  • Push new data to remote
dvc push
  • Version control the .dvc file using git
git add <filepath>.dvc
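For example, a typical end-to-end flow for versioning a new data file (the path below is hypothetical):

# Track a new data file with DVC; this writes data/raw/images.zip.dvc
dvc add data/raw/images.zip
# Version the pointer file (and DVC's .gitignore update) with git
git add data/raw/images.zip.dvc data/raw/.gitignore
git commit -m "Track raw images archive with DVC"
# Upload the data itself to the DVC remote
dvc push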

Data Preprocessing

  • Unzip the raw data
make data
  • Process the unzipped data
make features
  • Train a model using the processed data
make train
  • Predict an image using the trained model
make predict img='/path/to/image'
  • Visualize the trained model
make visualization
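These targets live in the Makefile; a minimal sketch of how they are typically wired to the scripts under src (the exact commands and arguments in the repository's Makefile may differ):

data:
	python src/data/make_dataset.py

features:
	python src/features/build_features.py

train:
	python src/models/train_model.py

# `make predict img='/path/to/image'` passes img in as a Make variable
predict:
	python src/models/predict_model.py $(img)

visualization:
	python src/visualization/visualize.py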

MLflow

  • MLflow helps manage and monitor machine learning experiments
  • script.py contains the MLflow tracking code
  • Experiments can be monitored by running the following commands (the UI is served at http://localhost:5000 by default)
python script.py
mlflow ui
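A minimal sketch of the kind of tracking code that lives in script.py (the parameter and metric names below are illustrative assumptions, not the script's actual contents):

import mlflow

# Everything logged inside the run appears in the MLflow UI
with mlflow.start_run():
    mlflow.log_param("epochs", 10)        # hypothetical hyperparameter
    mlflow.log_metric("accuracy", 0.92)   # hypothetical evaluation metric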

GitHub Actions

  • GitHub Actions automates tasks around code merging and deployment
  • The configuration file for the action that runs on every push to the main branch is at .github/workflows/python-app.yml
  • It checks for lint errors and runs unit tests on every push to the main branch and on every pull request targeting it
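The actual contents of python-app.yml may differ, but a minimal sketch of a lint-and-test workflow with these triggers looks like this (the Python version and tool choices below are assumptions):

name: Python application

on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      # Check out the repository and set up a Python toolchain
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v4
        with:
          python-version: "3.10"
      # Install the project's dependencies plus lint/test tools
      - run: pip install flake8 pytest -r requirements.txt
      # Fail the build on lint errors
      - run: flake8 src
      # Run the unit tests
      - run: pytest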

Containerization using Docker

  • Build an image
docker build -t mlops-api .
  • Run the image in a container
docker run -p 8000:8000 mlops-api
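The image is built from the repository's Dockerfile; a minimal sketch of what such a Dockerfile might contain, assuming a FastAPI app served by uvicorn from src/app.py (both assumptions, not the repository's actual file):

FROM python:3.10-slim
WORKDIR /app
# Install dependencies first so this layer is cached across code changes
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 8000
# Assumed entry point; adjust the module path to the actual app
CMD ["uvicorn", "src.app:app", "--host", "0.0.0.0", "--port", "8000"]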

Orchestration and Containerization with Kubernetes and Docker

minikube is used to run a single-node Kubernetes cluster for development purposes

Setting up

  • Start minikube to create a local Kubernetes environment
minikube start
  • List the docker images inside minikube
minikube image list
  • Build the Docker image inside the minikube VM (the eval command points the Docker CLI at minikube's Docker daemon)
eval $(minikube docker-env)
docker build -t mlops-api .
  • Create a deployment (which manages the running pods) in Kubernetes. If the pod later reports ErrImagePull, rebuild the image with a non-latest tag (e.g. mlops-api:v1) and reference that tag here, since Kubernetes tries to pull :latest images from a registry instead of using the local one
kubectl create deployment mlops-deploy --image=mlops-api
  • Check if it's running properly
kubectl get deployment
kubectl get pod
  • Expose the deployment
kubectl expose deployment mlops-deploy --type=NodePort --port=8000
  • Check the assigned NodePort
kubectl get svc
  • Run the service tunnel
minikube service mlops-deploy
  • After this, the API can be accessed at the URL that minikube prints
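For scripting, the service URL can also be printed instead of opening a browser:

# Print the service URL without launching a browser
minikube service mlops-deploy --url
# Query the API at that URL (the root path is just a placeholder)
curl "$(minikube service mlops-deploy --url)"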

Tips

  • Manually scale the replicas (the number of running pods)
kubectl scale deploy/mlops-deploy --replicas=5
  • View the logs of a particular pod
kubectl logs -f <pod-name>


TODO

  • Use cookiecutter/yeoman for project structure
  • Add unit tests
  • Add github actions for running the unit tests on pull requests
  • Use dvc to manage data versions
  • Use mlflow/kubeflow for mlops
