This project is the second project of the Udacity's "Machine Learning Engineer with Microsoft Azure" Nanodegree. It consists of two parts:
-
In the first part we use AutoML in our bankmarketing dataset, find the best model and deploy it via a REST API. We also use the Swagger in order to see the documentation of the API that we deployed.
-
In the second part we create and publish a pipeline using the Azure ML SDK, while we use the same dataset.
Feel free to read below more details about each task.
One image corresponds to many words, so below you see the architectural diagram of the operations in this project
-
Register of the bankmarketing dataset. This dataset contains data about bank marketing campaigns based on phone calls to potential clients. The campaign has as target to convince the potential clients to make a term deposit with the bank. We seek to predict whether or not the potential client would accept to make a term deposit with the bank.
-
Running AutoML and choose classification from Azure ML studio. This will run many algorithms and finds the best algorithm based on the accuracy.
-
Deploying the best model. We choose the best model and deploy it, a REST endpoint will be made.
-
Enable the application insights, producing Swagger documentation and benchmarking the REST endpoint.
-
Consuming the REST endpoint, so we send json requests to the REST endpoint to get predictions.
We create and publish a pipeline using the Azure ML SDK. Pipeline is a workflow of a complete machine learning task, the subtasks are incorporated in the pipeline as series of steps.
-
Initialize the
Workspace, reload theExperimentthe cluster and theDatasetfrom the first task. -
Setting the AutoML settings and configuration in order to incorporate an
AutoMLStepin the pipeline. Setting alsoPipelineDatafor the metrics output and the best model output of the pipeline. -
Creation of the
Pipelineand running. -
Retrieve the metrics(such as
weighted_accuracy,AUC_weighted,average_precision_score_macro,f1_score_micro) of all child runs and the best model. -
Publishing the pipeline so that you can rerun the pipeline via the REST endpoint.
-
Making a request to the REST endpoint with a JSON payload object that has the experiment name in it.
-
We could add an alert when the dataset changes so that we could be notified and done it in an automated way and/or running a script in an automated way.
-
Increase the time of training in order to have more accurate predictions.
- After the registration of the aforementioned(above)dataset we can see the registered datasets as below:
- After the AutoML has finished you can see the status "Completed" as below, with other useful information such as the duration and the compute target.
- If you select the run of the AutoML and select "Models" you can see the list of all the models in descent order of accuracy. As you can see below the best algorithm was "Voting Ensemble".
- After the deployment of the model in order to make a REST endpoint to interact with. A pretty useful thing of the endpoint is the application insigths. With the application insights you can see the "Failed Requests", "Server Response Time" and "Server Requests" and other useful things. We enabled with running the "logs.py" script. Below you see in the first image that the "Applications Insights Enabled" as "true" and to the next three images the running of the "logs.py" in the command prompt.
- If we hit the "Swagger URI" in the deployed model(you can see it in the first of the previous 4 images) we can download the swagger.json. If we place it in the "swagger" folder and run the "bash swagger.sh"(choosing the right port in the file) and the "serve.py"(choosing the right port in the file) and open the localhost as the below images showing, you can see all the HTTP API methods and responses.
- The most useful thing about deploying a REST endpoint is to consuming it i.e to send HTTP requests with a json and to take responses. In our case the json has as values the x's of which the y's want to be predicted from the model. So we replace in the "endpoint.py" the "scoring_uri" and the "key" with the values that are visible to the endpoint in the ML studio. Then we run the "endpoint.py" and get the result as the below image depict, the response was "yes" and "no" for the first and second json respectively.
- In order to load-test our model we have run the benchmark.sh. Some requests are proceed to the endpoint and useful metrics about the total time of requests, the total requests, how many requests have been failed and other metrics as the results show below.
As you can see in the last image you see some gathered data about the requests have been done with the benchmark.
- Creation of the pipeline from the Azure SDK using the "aml-pipelines-with-automated-machine-learning-step.ipynb" notebook depicted below.
- Below we see in turn the pipeline endpoint completion, the bank marketin dataset(2 images), published pipeline overview(5 images due to small screen size).
You can see a screencast that shows to you the working deployed model endpoint, the deployed pipeline, the AutoML model and successfull API requests we have done: https://youtu.be/-XwdFxlc-lQ















benchmark1.png)
benchmark2.png)
benchmark3.png)
benchmark4.png)
benchmark5.png)












