
Udacity Nanodegree Machine Learning Engineer with Azure project 2 (MLOps)

This project is part of the assignments of the Udacity Nanodegree Machine Learning Engineer with Azure. In this case the project focuses on applying the whole MLOps philosophy to an example case.

First, we train a model on a bank marketing dataset, with the goal of finding out whether a customer will subscribe to a certain financial product or not. The classification model is trained using Azure AutoML. The best model from the AutoML training is then deployed as an endpoint, and we test it with a Python script that sends some dummy data as a JSON payload.

Finally, we develop a pipeline that automates all the steps we previously carried out manually: launching the AutoML training and selecting the best model to deploy as an endpoint.

Architectural Diagram

Project Architectural Diagram

As we can see on the left side of the diagram, the project starts with a CSV file that provides the data we need. The first thing to do is register this data as an Azure dataset; once that is done, all the tools available in Azure ML Studio can access it.

The next step is to create an Azure AutoML experiment for our problem. This is a classification problem in which we predict whether a certain customer will subscribe to a deposit or not, based on the different features provided in our data.

Once the AutoML experiment is finished we have several models available, and we can choose the model whose metrics best suit our case. We can then deploy the best model as an endpoint with a URI; it is important to deploy it with authentication enabled so that security issues do not arise.

The deployed model provides an API for interacting with it and getting predictions. We can see how to use this API by looking at the documentation generated by Swagger, which can be accessed from a web browser.

These steps can be automated with an Azure pipeline, which groups all the steps needed to deploy the best AutoML model from the registered dataset into a single workflow. The pipeline can be published and then invoked at any time to replay all the steps: build an AutoML classification model and deploy it as an endpoint.

Finally, we can ask the model for predictions, using the endpoint URI and the proper credentials (provided by Azure as primary and secondary keys). The data the model needs to make its prediction is attached to the request as a JSON payload.

To monitor our architecture and model performance we can use the Apache Benchmark tool, which gives us metrics aimed at detecting bottlenecks in our infrastructure or potential problems in the prediction process of our deployed model.

Key Steps

In this section we discuss the most important steps in creating a machine learning model endpoint and building a pipeline that automates the process in Azure.

First of all, we register our dataset in Azure so that it is available to all the Azure tools.

Registered Dataset
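
A minimal SDK sketch of this registration step could look like the following (the CSV URL and the dataset name are placeholders, not the project's actual values):

```python
from azureml.core import Workspace, Dataset

# Assumes a config.json for the workspace is present in the working directory
ws = Workspace.from_config()

# Placeholder URL for the bank marketing CSV; substitute the real source
csv_url = "https://<storage-account>.blob.core.windows.net/data/bankmarketing_train.csv"

# Create a tabular dataset from the CSV and register it in the workspace
dataset = Dataset.Tabular.from_delimited_files(path=csv_url)
dataset = dataset.register(
    workspace=ws,
    name="bankmarketing",  # assumed dataset name
    description="Bank marketing dataset for the deposit subscription problem",
)
```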

We train a classification model using Azure AutoML. Here we can see the completed AutoML experiment in Azure.

AutoML completed
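
The same experiment could be submitted from the SDK; here is a sketch, where the label column "y", the primary metric, and the compute cluster name are assumptions:

```python
from azureml.core import Experiment, Workspace, Dataset
from azureml.core.compute import ComputeTarget
from azureml.train.automl import AutoMLConfig

ws = Workspace.from_config()
dataset = Dataset.get_by_name(ws, name="bankmarketing")
compute_target = ComputeTarget(workspace=ws, name="automl-cluster")  # assumed cluster

automl_config = AutoMLConfig(
    task="classification",
    primary_metric="accuracy",      # assumed primary metric
    training_data=dataset,
    label_column_name="y",          # assumed label column of the bank marketing data
    compute_target=compute_target,
    experiment_timeout_minutes=30,
    n_cross_validations=5,
)

experiment = Experiment(ws, "bankmarketing-automl")
automl_run = experiment.submit(automl_config, show_output=True)
```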

Next, we select the best model from the AutoML experiment. The model is deployed by clicking on the "Deploy" tab and choosing "Web service deployment". It takes a few minutes for the endpoint creation to finish.

AutoML Best model
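
The screenshot shows the UI flow; an equivalent SDK deployment might look like this sketch, continuing from the AutoML run above (the model, entry script, and endpoint names are assumptions), with key authentication enabled as discussed earlier:

```python
from azureml.core.model import InferenceConfig, Model
from azureml.core.webservice import AciWebservice

# Retrieve the best run from the AutoML experiment and register its model
best_run, fitted_model = automl_run.get_output()
model = best_run.register_model(model_name="bankmarketing-best",
                                model_path="outputs/model.pkl")

inference_config = InferenceConfig(entry_script="score.py",  # assumed entry script
                                   environment=best_run.get_environment())

# ACI deployment configuration with key authentication enabled
deployment_config = AciWebservice.deploy_configuration(cpu_cores=1,
                                                       memory_gb=1,
                                                       auth_enabled=True)

service = Model.deploy(ws, "bankmarketing-endpoint", [model],
                       inference_config, deployment_config)
service.wait_for_deployment(show_output=True)
```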

In order to monitor the performance of the model endpoint, we have to activate Application Insights so that Azure is more verbose in its output to the logs. Application Insights can be activated through the UI or with the AzureML SDK in a Python script.

Model AppInsights
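
A sketch of the SDK route (the endpoint name is an assumption):

```python
from azureml.core import Workspace
from azureml.core.webservice import Webservice

ws = Workspace.from_config()
service = Webservice(ws, name="bankmarketing-endpoint")  # assumed endpoint name

# Turn on Application Insights for the deployed service
service.update(enable_app_insights=True)
```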

We can use a Python script to access the logs, as shown in the next screenshot.

Model logs
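
Such a script can be as short as the following sketch (again assuming the endpoint name):

```python
from azureml.core import Workspace
from azureml.core.webservice import Webservice

ws = Workspace.from_config()
service = Webservice(ws, name="bankmarketing-endpoint")  # assumed endpoint name

# Fetch and print the container logs of the deployed service
for line in service.get_logs().split("\n"):
    print(line)
```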

The model endpoint provides an API to get predictions. This API is documented with Swagger, so we know the format of the data we have to send in our requests to the endpoint to get new predictions. To get Swagger running we need to download a Docker container and run it on the proper port (9000 in our case), and use a Python script to serve the swagger.json file that contains all the API information; we can then access the documentation through a web browser pointed at http://localhost:9000.

Swagger API doc
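
A sketch of such a serving script, assuming swagger.json sits next to it and that port 8000 is free (the Swagger UI container on port 9000 can then be pointed at http://localhost:8000/swagger.json):

```python
from http.server import HTTPServer, SimpleHTTPRequestHandler

class CORSRequestHandler(SimpleHTTPRequestHandler):
    """Serve files with a CORS header so the Swagger UI on another port can fetch them."""

    def end_headers(self):
        self.send_header("Access-Control-Allow-Origin", "*")
        super().end_headers()

if __name__ == "__main__":
    # Serve the directory containing swagger.json on http://localhost:8000
    HTTPServer(("0.0.0.0", 8000), CORSRequestHandler).serve_forever()
```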

Using a Python script to send requests to the model endpoint, we can get predictions. The script embeds the data the model needs as input, in JSON format. In the next screenshot we can see the answer to our prediction request.

Endpoint results
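
A sketch of such a request script; the URI, key, and feature values are placeholders, and a real payload must include every feature column the model expects:

```python
import json
import requests

scoring_uri = "http://<endpoint>.azurecontainer.io/score"  # from the endpoint details page
key = "<primary-key>"                                      # primary key of the endpoint

# Abbreviated sample record; a complete payload needs all feature columns
data = {"data": [{
    "age": 35,
    "job": "technician",
    "marital": "married",
    "education": "university.degree",
    "default": "no",
}]}

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {key}",  # key-based authentication of the endpoint
}

response = requests.post(scoring_uri, data=json.dumps(data), headers=headers)
print(response.json())
```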

Finally, we can test the performance of our model serving using Apache Benchmark, which provides very useful information about how our model endpoint responds when it receives prediction requests.

Apache Benchmark
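
A sketch of invoking Apache Benchmark from Python (the file name, URI, and key are placeholders, and the `ab` binary must be installed); the same flags work directly from a shell:

```python
import subprocess

scoring_uri = "http://<endpoint>.azurecontainer.io/score"  # assumed endpoint URI
key = "<primary-key>"

subprocess.run([
    "ab",
    "-n", "10",                            # total number of requests
    "-v", "4",                             # verbose output (headers and body)
    "-p", "data.json",                     # file with the JSON payload to POST
    "-T", "application/json",              # content type of the POST body
    "-H", f"Authorization: Bearer {key}",  # key-based endpoint authentication
    scoring_uri,
])
```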

Now we add some automation to the process by creating a pipeline. A pipeline can be created through the UI or with the SDK from a Jupyter Notebook; we have chosen the latter. In the next screenshot we can see the created pipeline running.

Pipeline running
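
A sketch of building such a pipeline around an AutoML step in the notebook; it reuses the AutoMLConfig shown earlier, and the step, output, and experiment names are assumptions:

```python
from azureml.core import Experiment, Workspace
from azureml.pipeline.core import Pipeline, PipelineData, TrainingOutput
from azureml.pipeline.steps import AutoMLStep

ws = Workspace.from_config()
ds = ws.get_default_datastore()

# Outputs that capture the metrics and the best model of the AutoML step
metrics_data = PipelineData(name="metrics_data", datastore=ds,
                            pipeline_output_name="metrics_output",
                            training_output=TrainingOutput(type="Metrics"))
model_data = PipelineData(name="model_data", datastore=ds,
                          pipeline_output_name="best_model_output",
                          training_output=TrainingOutput(type="Model"))

# automl_config is the AutoMLConfig object from the earlier sketch
automl_step = AutoMLStep(name="automl_module",
                         automl_config=automl_config,
                         outputs=[metrics_data, model_data],
                         allow_reuse=True)

pipeline = Pipeline(workspace=ws, steps=[automl_step])
pipeline_run = Experiment(ws, "bankmarketing-pipeline").submit(pipeline)
```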

We can also create a pipeline endpoint so we can interact with the pipeline and launch it whenever it is needed. In the pipeline endpoints section of the Pipelines tab in the Azure UI we can see the newly created pipeline endpoint.

Pipeline published
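
Publishing the finished run is a one-liner in the notebook; a sketch continuing from the pipeline run above (the name and description are assumptions):

```python
# Publish the completed pipeline run so it gets a REST endpoint
published_pipeline = pipeline_run.publish_pipeline(
    name="Bankmarketing AutoML",  # assumed pipeline name
    description="Train an AutoML classifier on the bank marketing dataset",
    version="1.0",
)
print(published_pipeline.endpoint)  # REST endpoint URI of the published pipeline
```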

Clicking on the name of the pipeline endpoint shows the URI of its REST API and gives access to different options, such as the pipeline parameters and a diagram of the workflow associated with the pipeline.

Pipeline endpoint
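
With that URI, the pipeline can be triggered from anywhere via a plain HTTP call; a sketch, continuing from the published pipeline above (the experiment name is an assumption):

```python
import requests
from azureml.core.authentication import InteractiveLoginAuthentication

# Obtain an Azure AD token for the REST call
auth_header = InteractiveLoginAuthentication().get_authentication_header()

rest_endpoint = published_pipeline.endpoint  # URI shown on the endpoint details page
response = requests.post(rest_endpoint,
                         headers=auth_header,
                         json={"ExperimentName": "bankmarketing-pipeline-rest"})
print(response.json().get("Id"))  # id of the pipeline run just triggered
```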

In the Jupyter Notebook used to interact with the Azure SDK and create our pipeline, we can also see the pipeline details through the RunDetails widget.

Pipeline scheme
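
The widget usage, continuing from the pipeline run above:

```python
from azureml.widgets import RunDetails

# Render the interactive run monitor in the notebook, then block until the run finishes
RunDetails(pipeline_run).show()
pipeline_run.wait_for_completion(show_output=True)
```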

At the end of the workflow associated with the pipeline, we can see that the pipeline run has completed.

Pipeline Scheduled completed

Screen Recording

Below is a link to a video where the whole pipeline workflow is described.

https://youtu.be/YJxICUy7xZg

Future improvements

As an improvement over the existing project, we could develop a better UI for interacting with our model endpoint, so that users do not need to run Python scripts directly and get a more user-friendly interface.

We could also develop some kind of stress test using several simultaneous requests, to check whether our infrastructure can scale properly to handle a high-demand period. We would measure this with Apache Benchmark and monitor it through the logs generated by the endpoint.
