Kubeflow is an open-source platform that makes it easy to deploy and manage machine learning workflows on Kubernetes. MLflow is another open-source platform that provides tools for tracking and managing machine learning experiments. In this project, you will be setting up a Kubeflow cluster and integrating it with MLflow to track the experiments run on the cluster. You will then use this setup to train and track the performance of a model on a dataset.
- Install and configure Kubeflow on a Kubernetes/minikube cluster
- Install and configure MLflow on the same cluster
- Write a Python script to train a model on a dataset and log the experiment with MLflow
- Use the Kubeflow pipeline system to run multiple experiments with different hyperparameters and track them with MLflow
- Compare the performance of the different models using the MLflow UI
- Integrate MLflow with Charmed Kubeflow and use MinIO as artifact storage for all of the model's data outputs
- Docker — version 1.19
- kubectl — version 1.15
- minikube — version 1.15
1a) You can deploy Kubeflow Pipelines on a Kubernetes/minikube cluster from a Windows host machine using PowerShell with administrative privileges and the following commands:
$PIPELINE_VERSION = "2.0.0"
kubectl apply -k "github.com/kubeflow/pipelines/manifests/kustomize/cluster-scoped-resources?ref=$PIPELINE_VERSION"
kubectl wait --for condition=established --timeout=60s crd/applications.app.k8s.io
kubectl apply -k "github.com/kubeflow/pipelines/manifests/kustomize/env/platform-agnostic-pns?ref=$PIPELINE_VERSION"kubectl get pods -AIt'll show all the pods in the default as well as Kubeflow namespace.
To view only the pods in the kubeflow namespace, use the following command:
kubectl get pods -n kubeflow
Use the command below to port-forward the pipeline UI service:
kubectl port-forward -n kubeflow svc/ml-pipeline-ui 8080:80
It'll give you a local address through which you can view the Kubeflow dashboard.
After opening localhost:8080 in your browser, you can view the Kubeflow dashboard.
Hence, we have successfully installed and configured Kubeflow on our minikube cluster.
To integrate MLflow and Kubeflow together on a minikube cluster, follow these steps:
2c) Install MLflow on your minikube cluster. You can use Helm charts to simplify the installation process.
Run the following commands to install MLflow using Helm charts:
helm repo add community-charts https://community-charts.github.io/helm-charts
helm install my-mlflow community-charts/mlflow --version 0.7.19
Use the following kubectl command to verify the installation:
kubectl get pods -n default
You'll see your MLflow pod up and running.
Once you type the "mlflow ui" in your terminal it'll give you the Localhost address for accessing your MLflow Dashboard.
Hence, we have successfully Integrated Kubeflow and MLflow on our minikube cluster.
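Before logging anything from your training code, point MLflow at the tracking server running in the cluster. A minimal sketch in Python, assuming you have port-forwarded the MLflow service installed above (the release name my-mlflow and port 5000 depend on your Helm values and may differ):

import mlflow

# Assumes: kubectl port-forward -n default svc/my-mlflow 5000:5000
# The service name and port depend on your Helm release and chart values.
mlflow.set_tracking_uri("http://localhost:5000")

# Quick smoke test: this run should show up in the MLflow UI.
with mlflow.start_run(run_name="connectivity-check"):
    mlflow.log_param("setup", "minikube + helm")
    mlflow.log_metric("ok", 1.0)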
- Create conda environment
- Activate conda environment
You can activate the conda environment created in the previous step using the following command:
conda activate <ENV_NAME>
- Launch Jupyter Notebook from the Anaconda prompt
Type the following command to launch Jupyter Notebook:
jupyter notebook
3b) Create the ".ipynb" file to write a Python script to train a model on a dataset and log the experiment with MLflow
First, let's clone the repository so you have access to the code. You can use the terminal or do it directly in the browser.
git clone https://github.com/adilshaikh165/ML-OPS.git
Then open "MLOPS-INTERNSHIP-ASSESSMENT-TASK.ipynb" to get the gist of the Python script I created to train a model on "bank-full.csv" and log the experiment with MLflow.
Refer the "create_experiment()" Function from the "MLOPS-INTERNSHIP-ASSESSMENT-TASK.ipynb" file.
It'll create and log a "basic classfier experiment" in the MLflow UI and will record all the relavant metrics as shown in the below few screenshots :
You can view all the relavant metrics, tags and artificats related to that perticular run.
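The exact implementation lives in the notebook; this is only a minimal sketch of what such a create_experiment() function typically does, where the dataset separator, feature columns, and metric choices are illustrative assumptions rather than the notebook's exact code:

import mlflow
import mlflow.sklearn
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split

def create_experiment(experiment_name="basic classifier experiment"):
    # Illustrative: load the bank marketing dataset and pick a few numeric columns.
    df = pd.read_csv("bank-full.csv", sep=";")
    X = df[["age", "balance", "duration", "campaign"]]
    y = (df["y"] == "yes").astype(int)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    mlflow.set_experiment(experiment_name)
    with mlflow.start_run(run_name="basic_classifier"):
        model = LogisticRegression(max_iter=1000)
        model.fit(X_train, y_train)
        preds = model.predict(X_test)

        # Log metrics, tags, and the trained model so they show up in the MLflow UI.
        mlflow.log_metric("accuracy", accuracy_score(y_test, preds))
        mlflow.log_metric("f1_score", f1_score(y_test, preds))
        mlflow.set_tag("model_type", "LogisticRegression")
        mlflow.sklearn.log_model(model, "model")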
Refer the "hyper_parameter_tuning()" Function from the "MLOPS-INTERNSHIP-ASSESSMENT-TASK.ipynb" file.
It'll create and log a "Optimized Classifier Experiment" in the MLflow UI and will record all the relavant metrics as well as Parameters as shown in the below few screenshots :
This time along with the Metrics, tags and artifcats you'll also get to log all hyper parameters, metrics, and artifacts which contains model, roc_auc curve PNG, confusion Matrix PNG Related to that Optimized Model.
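Again, the notebook holds the real implementation; the sketch below only illustrates the hyperparameter-tuning pattern it describes, and the parameter grid, file names, and plotting choices are assumptions for illustration:

import mlflow
import mlflow.sklearn
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import roc_auc_score, RocCurveDisplay, ConfusionMatrixDisplay

def hyper_parameter_tuning(X_train, X_test, y_train, y_test):
    mlflow.set_experiment("Optimized Classifier Experiment")
    param_grid = {"C": [0.01, 0.1, 1, 10], "penalty": ["l2"]}

    with mlflow.start_run(run_name="optimized_classifier"):
        search = GridSearchCV(LogisticRegression(max_iter=1000), param_grid, cv=5, scoring="roc_auc")
        search.fit(X_train, y_train)
        best = search.best_estimator_

        # Log the winning hyperparameters and the test metric.
        mlflow.log_params(search.best_params_)
        mlflow.log_metric("test_roc_auc", roc_auc_score(y_test, best.predict_proba(X_test)[:, 1]))

        # Save the ROC curve and confusion matrix as PNG artifacts.
        RocCurveDisplay.from_estimator(best, X_test, y_test)
        plt.savefig("roc_auc_curve.png")
        mlflow.log_artifact("roc_auc_curve.png")
        plt.close()

        ConfusionMatrixDisplay.from_estimator(best, X_test, y_test)
        plt.savefig("confusion_matrix.png")
        mlflow.log_artifact("confusion_matrix.png")
        plt.close()

        mlflow.sklearn.log_model(best, "model")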
Kubeflow Pipelines (KFP) is the most widely used component of Kubeflow. It lets you turn every step or function in your ML project into a reusable, containerized pipeline component, and chain these components together into an ML pipeline.
For this classifier application, the pipeline is already created with the Python SDK. You can find the code in the file "kf-pipeline.ipynb".
- Write the Python functions needed to train and predict
We need to create several functions in order to train our ML model and make predictions: prepare_data(), train_test_split(), and training_basic_classifier(). You can find all of these functions in the "kf-pipeline.ipynb" file. A hedged sketch of turning one of them into a containerized component is shown below.
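This sketch assumes the KFP v1 SDK; the dataset URL, base image, and package versions are placeholder assumptions, and the real function bodies live in the notebook:

from kfp import components as comp

def prepare_data():
    # Illustrative body: download the dataset and persist it on the mounted volume.
    import pandas as pd
    df = pd.read_csv("https://example.com/bank-full.csv", sep=";")  # placeholder URL
    df = df.dropna()
    df.to_csv("/data/final_df.csv", index=False)  # /data is the mounted volume path

# Wrap the plain Python function as a reusable, containerized KFP component.
create_step_prepare_data = comp.create_component_from_func(
    prepare_data,
    base_image="python:3.9",
    packages_to_install=["pandas==1.5.3"],
)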
- Define the pipeline function and put together all the components
@dsl.pipeline(
    name='Basic MLOPS classifier Kubeflow Demo Pipeline',
    description='A sample pipeline that performs IRIS classifier task'
)
def basic_classifier_pipeline(data_path: str):
    vop = dsl.VolumeOp(
        name="t-vol-1",
        resource_name="t-vol-1",
        size="1Gi",
        modes=dsl.VOLUME_MODE_RWO)

    prepare_data_task = create_step_prepare_data().add_pvolumes({data_path: vop.volume})
    train_test_split = create_step_train_test_split().add_pvolumes({data_path: vop.volume}).after(prepare_data_task)
    classifier_training = create_step_training_basic_classifier().add_pvolumes({data_path: vop.volume}).after(train_test_split)

    prepare_data_task.execution_options.caching_strategy.max_cache_staleness = "P0D"
    train_test_split.execution_options.caching_strategy.max_cache_staleness = "P0D"
    classifier_training.execution_options.caching_strategy.max_cache_staleness = "P0D"
- Mounting a volume for the components' output storage and binding this volume to all the components. The pipeline defines a volume named "t-vol-1" with a size of 1GiB. This volume is used to store the dataset and the model artifacts.
- Compiling the pipeline and generating the YAML
Once the pipeline is compiled, the YAML file is generated automatically and can be uploaded directly to Kubeflow to create experiments and runs from the UI. You can refer to the sample YAML file in the GitHub repo, named "basic_classifier_pipeline_adil.yaml".
kfp.compiler.Compiler().compile(
    pipeline_func=basic_classifier_pipeline,
    package_path='basic_classifier_pipeline_adil.yaml')
- Create a run from the pipeline function using code. A minimal sketch with the KFP client is shown below.
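This sketch assumes the pipeline UI is still port-forwarded to localhost:8080 as in step 1a; the experiment and run names are illustrative:

import kfp

# Connect to the Kubeflow Pipelines API behind the port-forwarded UI from step 1a.
client = kfp.Client(host="http://localhost:8080")

client.create_run_from_pipeline_func(
    basic_classifier_pipeline,
    arguments={"data_path": "/data"},
    experiment_name="basic_classifier_experiment",
    run_name="basic_classifier_run_1",
)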
- Creation of the Persistent Volume
- Prepare data for the train-test split. prepare_data_task loads the dataset from a URL and saves it to a subdirectory called data in the pipeline's working directory.
- Generation of train-test split. train_test_split splits the dataset into a training set and a test set.
- Training of the basic classifier model. classifier_training trains a logistic regression model on the training set. This step involves converting the string columns "job" and "marital" to numeric values.
I have mapped the various categories of the "job" column to static float values and performed one-hot encoding on the "marital" column, along the lines of the sketch below.
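A rough sketch of that encoding step, where the specific float values are illustrative assumptions rather than the notebook's exact mapping:

import pandas as pd

def encode_features(df: pd.DataFrame) -> pd.DataFrame:
    # Map the categorical "job" column to static float values (illustrative mapping).
    job_mapping = {
        "admin.": 0.0, "blue-collar": 1.0, "entrepreneur": 2.0, "housemaid": 3.0,
        "management": 4.0, "retired": 5.0, "self-employed": 6.0, "services": 7.0,
        "student": 8.0, "technician": 9.0, "unemployed": 10.0, "unknown": 11.0,
    }
    df["job"] = df["job"].map(job_mapping)

    # One-hot encode the "marital" column into separate indicator columns.
    df = pd.get_dummies(df, columns=["marital"])
    return df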
6. Integration of MLflow with Charmed Kubeflow, plus artifact storage for all of the model's data outputs using MinIO
Kindly refer to the blog link below, where I explain in depth the MLflow integration with Kubeflow on Charmed Kubeflow using "microk8s". This stable version of Charmed Kubeflow addresses the drawbacks of a traditional local Kubeflow deployment.
Blog link : https://adilshaikh165.hashnode.dev/mlflow-integration-with-kubeflow-on-charmed-kubeflow
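For the MinIO part specifically, MLflow treats MinIO as an S3-compatible artifact store, so the client only needs the S3 endpoint and credentials before it logs artifacts. A minimal sketch, where the endpoint, credentials, and tracking URI are placeholder assumptions for your own deployment (the blog above covers the Charmed Kubeflow specifics):

import os
import mlflow

# MinIO is S3-compatible; MLflow reads these variables when uploading artifacts.
# The endpoint, credentials, and tracking URI below are placeholders for your deployment.
os.environ["MLFLOW_S3_ENDPOINT_URL"] = "http://minio.kubeflow.svc.cluster.local:9000"
os.environ["AWS_ACCESS_KEY_ID"] = "minio"
os.environ["AWS_SECRET_ACCESS_KEY"] = "minio123"

mlflow.set_tracking_uri("http://localhost:5000")
mlflow.set_experiment("minio-artifact-demo")

with mlflow.start_run():
    with open("output.txt", "w") as f:
        f.write("model output")
    # The file is uploaded to the MinIO bucket configured as the experiment's artifact root.
    mlflow.log_artifact("output.txt")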