Loans are the core business of loan companies, and their main profit comes directly from loan interest. Loan companies grant a loan only after an intensive process of verification and validation. However, they still have no guarantee that the applicant will be able to repay the loan without difficulty.
In this tutorial, we’ll build a predictive model to predict whether an applicant will be able to repay the lending company. We will prepare the data using Watson Studio's Refinery and then build a model in two ways: using SPSS Modeler and using the new AutoAI feature of Watson Studio. Finally, we will deploy a web application that can use either of these two models.
After completing this tutorial, you’ll understand how to:
- Add and prepare your data
- Build a machine learning model using two different techniques
- Save & Deploy the models
- Use the models from a web application
In order to complete this tutorial, you will need:
- An IBM Cloud account
- A Cloud Object Storage service
- A Watson Studio service
- A Machine Learning service
The services will be created in the following steps.
Reading and completing this tutorial takes approximately one hour.
The dataset is taken from Analytics Vidhya, but a copy is included in the data folder for your convenience.
The format of the training data in train_loan.csv is:
- Loan_ID: Unique loan ID
- Gender: Male/Female
- Married: Applicant married (Y/N)
- Dependents: Number of dependents
- Education: Applicant education (Graduate/Under Graduate)
- Self_Employed: Self employed (Y/N)
- ApplicantIncome: Applicant income
- CoapplicantIncome: Coapplicant income
- LoanAmount: Loan amount in thousands
- Loan_Amount_Term: Term of loan in months
- Credit_History: Credit history meets guidelines
- Property_Area: Urban/Semi Urban/Rural
- Loan_Status: Loan approved (Y/N). This is the target to predict.
The test_loan.csv file does not provide this field.
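To make the column list concrete, here is a minimal Python sketch of one row as a typed record. The class and helper names (LoanApplication, parse_row) are hypothetical, the numeric types follow the ones Refinery infers later in this tutorial, and the two sample rows are made up for illustration rather than taken from the dataset.

```python
import csv
import io
from dataclasses import dataclass
from typing import Optional


@dataclass
class LoanApplication:
    """Typed view of one row of train_loan.csv or test_loan.csv (illustrative)."""
    loan_id: str
    gender: Optional[str]
    married: Optional[str]
    dependents: Optional[str]          # kept as text, not assumed to be purely numeric
    education: Optional[str]
    self_employed: Optional[str]
    applicant_income: Optional[int]
    coapplicant_income: Optional[float]
    loan_amount: Optional[int]         # in thousands
    loan_amount_term: Optional[int]    # in months
    credit_history: Optional[int]
    property_area: Optional[str]
    loan_status: Optional[str]         # "Y"/"N"; None for test_loan.csv


def _text(value):
    """Return None for an empty or absent CSV cell, otherwise the stripped text."""
    value = (value or "").strip()
    return value or None


def _int(value):
    text = _text(value)
    return None if text is None else int(float(text))  # accepts "360" and "360.0"


def _float(value):
    text = _text(value)
    return None if text is None else float(text)


def parse_row(row):
    """Convert one csv.DictReader row into a LoanApplication."""
    return LoanApplication(
        loan_id=row["Loan_ID"],
        gender=_text(row["Gender"]),
        married=_text(row["Married"]),
        dependents=_text(row["Dependents"]),
        education=_text(row["Education"]),
        self_employed=_text(row["Self_Employed"]),
        applicant_income=_int(row["ApplicantIncome"]),
        coapplicant_income=_float(row["CoapplicantIncome"]),
        loan_amount=_int(row["LoanAmount"]),
        loan_amount_term=_int(row["Loan_Amount_Term"]),
        credit_history=_int(row["Credit_History"]),
        property_area=_text(row["Property_Area"]),
        # test_loan.csv has no Loan_Status column, so .get() falls back to None.
        loan_status=_text(row.get("Loan_Status")),
    )


# Two made-up rows for illustration; they are not taken from the real dataset.
SAMPLE_CSV = """Loan_ID,Gender,Married,Dependents,Education,Self_Employed,ApplicantIncome,CoapplicantIncome,LoanAmount,Loan_Amount_Term,Credit_History,Property_Area,Loan_Status
LP_A,Male,Yes,1,Graduate,No,4500,1500,120,360,1,Urban,Y
LP_B,Female,No,0,Graduate,,2800,0,,360.0,0,Rural,N
"""

applications = [parse_row(r) for r in csv.DictReader(io.StringIO(SAMPLE_CSV))]
```

Empty cells become None rather than raising, which matters because the dataset contains missing values that are handled later in the SPSS flow.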
- Create a project in Watson Studio
- Upload the dataset to Watson Studio
- Refine the training dataset, using the Watson Studio Refinery capability
- Build a visual flow model and deploy it as a web service with no coding
- Build an alternative model using AutoAI capabilities of Watson Studio & Watson Machine Learning
- Deploy a client Web Application
If you do not have one yet, go to the IBM Cloud Catalog and create a Watson Studio instance, selecting the Lite plan and a location such as London or Dallas.

Navigate to either https://eu-gb.dataplatform.cloud.ibm.com or https://dataplatform.cloud.ibm.com, depending on whether you want to work in London or Dallas, and log in with your IBM Cloud credentials.

From the Watson Studio main page, click New project and choose Create an empty project. After entering your project name, if no Cloud Object Storage instance is associated with the project, click Add to select a new storage service.
A new tab opens on the Cloud Object Storage (COS) service page. Select New, ensure the selected plan is Lite, and press Create. A pop-up asks you to confirm the creation. You may change the name of the service instance, and then press Create.

The tab closes and you return to the Watson Studio project creation screen. Press Refresh to load the newly created COS instance. Finally, press the Create button to create the project in Watson Studio.
In the new project screen, select the Assets tab at the top.
This opens the Find and add data section in the right-side panel. Drag the two dataset files (train_loan.csv and test_loan.csv), located in the data directory of your git clone, from your computer and drop them into the Load area.

- On the asset page, click the train_loan.csv data asset. A new screen opens.
You can see that all the columns have been identified as strings. Although there are different ways to fix this, let's use the Refinery capability to adjust the data without any programming.
- Press the Refine button at the top right. The tool starts, analyzes the dataset, and presents a new screen. Note that a list of transformation steps has been created and an initial Convert column type step has been added automatically. This step has converted each column to its best-fit type: ApplicantIncome to Integer, CoapplicantIncome to Decimal, and LoanAmount, Loan_Amount_Term, and Credit_History all to Integer. You can revisit how the file is transformed at each step.
- Now, select the column Loan_Status, which is the target column we will predict, and press the Operations button.
- Scroll down under Organize and select Conditional Replace.
- Add two conditions: one where the field is equal to Y, replaced by 1, and one that replaces any remaining values with 0. Press the Apply button at the bottom of the page. A new step is created. It says: Replaced values for Loan_Status: Loan_Status where value contains "Y" as "1". Replaced all remaining values with "0".
- Finally, convert the column type of Loan_Status from string to integer. To do that, press the context button of the column, then select Convert Column Type and select the suggested type Integer.
There are now three recorded steps.
- Press the save button, and then the edit button, to adjust the output options.
- In the new screen, press the edit icon and change the name of the output file to train_shaped.csv. We will use this file later. Note that the tool supports different output file formats; leave CSV as it is.
- Press the save button to save the new name, and then press the Done button to return to the Refinery screen.
- Press the save button again, and then the play button (select Save and create a job).
- A new job creation screen opens. Here you can configure a single run or schedule a recurring run of the transformation flow. Enter a name for the job and press Create and Run in the bottom right part of the screen.
- A job execution window opens, showing that the job is running. Return to the project assets page, where you will see a new data asset named train_shaped.csv. You can preview it and verify that all the defined changes have been applied.
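For readers who want to see what the Refinery flow does in code terms, the following sketch reproduces its three recorded steps with Python's standard csv module. It is an illustration under assumptions, not how Refinery is implemented: the function names are hypothetical, empty cells are treated as missing values, and the sample uses a made-up subset of the columns.

```python
import csv
import io

# Target types chosen by the "Convert column type" step described above.
INT_COLUMNS = ["ApplicantIncome", "LoanAmount", "Loan_Amount_Term", "Credit_History"]
DECIMAL_COLUMNS = ["CoapplicantIncome"]


def shape_row(row):
    """Apply the three recorded Refinery steps to one CSV row (a dict of strings)."""
    shaped = dict(row)
    # Step 1: convert column types; empty cells become None (missing).
    for col in INT_COLUMNS:
        text = shaped[col].strip()
        shaped[col] = int(float(text)) if text else None
    for col in DECIMAL_COLUMNS:
        text = shaped[col].strip()
        shaped[col] = float(text) if text else None
    # Steps 2 and 3: "Y" -> 1, every remaining value -> 0, stored as an integer.
    # (Refinery's rule says "contains"; for Y/N data an equality test gives the same result.)
    shaped["Loan_Status"] = 1 if shaped["Loan_Status"].strip() == "Y" else 0
    return shaped


def shape_csv(source_text):
    """Return the shaped rows and their CSV text (what train_shaped.csv would hold)."""
    reader = csv.DictReader(io.StringIO(source_text))
    rows = [shape_row(r) for r in reader]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames)
    writer.writeheader()
    writer.writerows(rows)  # None is written as an empty cell
    return rows, out.getvalue()


# Made-up sample with a subset of the columns, for illustration only.
SAMPLE = """Loan_ID,ApplicantIncome,CoapplicantIncome,LoanAmount,Loan_Amount_Term,Credit_History,Loan_Status
LP_A,4500,1500,120,360,1,Y
LP_B,2800,0,,360.0,0,N
"""

rows, shaped_text = shape_csv(SAMPLE)
```

Missing values are deliberately kept as empty cells here, mirroring the Refinery output, so they can still be filtered later in the SPSS flow.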
- On the same Assets page, select Add to Project and, from the options, select Modeler flows.
- In the New Modeler Flow screen, name your modeler flow Loan Eligibility Predictive model, and ensure the selected runtime is IBM SPSS Modeler.
- Click Create.
- Add data to the canvas using the Data Asset node.
- Double-click the node and click Change Data Asset to open the Asset Browser. Select train_shaped.csv, then click OK and Save.
Let’s look into the summary statistics of our data using the Data Audit node.
- Drag and drop the Data Audit node, and connect it to the Data Asset node. After running the node, you can see the audit report in the right-side panel.
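Conceptually, the missing-value part of the audit report is a per-column count of empty cells, plus summary statistics for numeric columns. The sketch below (a hypothetical data_audit function over a made-up sample) shows that idea; the Data Audit node itself reports more than this.

```python
import csv
import io
import statistics


def data_audit(csv_text):
    """Per column: count of missing cells, plus min/max/mean for numeric columns."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    report = {}
    for col in rows[0]:  # column names, in file order
        values = [(r[col] or "").strip() for r in rows]
        present = [v for v in values if v]
        entry = {"missing": len(values) - len(present)}
        try:
            numbers = [float(v) for v in present]
        except ValueError:
            numbers = []  # non-numeric column: report only the missing count
        if numbers:
            entry.update(min=min(numbers), max=max(numbers),
                         mean=statistics.mean(numbers))
        report[col] = entry
    return report


# Made-up sample for illustration only.
SAMPLE = """Gender,LoanAmount,Loan_Status
Male,100,1
,200,0
Female,,1
"""

report = data_audit(SAMPLE)
```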
We can see that some columns have missing values. Let’s remove the rows that have null values using the Select node.
- Drag and drop the Select node, connect it to the Data Asset node, then right-click it and open the node.
- Select Discard mode and enter the following condition to remove rows with null values:
(@NULL(Gender) or @NULL(Married) or @NULL(Dependents) or @NULL(Self_Employed) or @NULL(LoanAmount) or @NULL(Loan_Amount_Term) or @NULL(Credit_History))
Now our data is clean, and we can proceed with building the model.
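The Select node in Discard mode drops every row for which the condition above is true. This Python sketch mirrors that logic over rows held as dictionaries; it assumes empty CSV cells arrive as null, and the function names are hypothetical.

```python
# Columns checked by the @NULL(...) condition in the Select node.
NULL_CHECK_COLUMNS = ["Gender", "Married", "Dependents", "Self_Employed",
                      "LoanAmount", "Loan_Amount_Term", "Credit_History"]


def is_null(value):
    """Treat None and empty or blank strings as null, like an empty CSV cell."""
    return value is None or str(value).strip() == ""


def discard_incomplete(rows):
    """Discard mode: drop every row for which the OR-ed @NULL condition is true.

    A column absent from a row also counts as null (row.get returns None).
    """
    return [row for row in rows
            if not any(is_null(row.get(col)) for col in NULL_CHECK_COLUMNS)]


# Made-up rows for illustration only.
complete = {col: "x" for col in NULL_CHECK_COLUMNS}
rows = [complete, dict(complete, LoanAmount=""), dict(complete, Gender=None)]
kept = discard_incomplete(rows)
```

Because the conditions are joined with `or`, a single missing value in any listed column is enough to discard the row.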
- Drag and drop the Type node from the Field Operations palette to configure the variable types.
- Double-click the node, or right-click it, to open it.
- Choose Configure Types to read the metadata.
- In the Role drop-down menu of [Loan_Status], change the role from Input to Target.
- In the Role drop-down menu of [Loan_ID], change the role from None to Record ID.
- Click Save.
The model predicts loan eligibility as one of two classes (1: Yes or 0: No, after the Refinery steps). We chose a Bayesian network for this task because it is a well-established method for classification problems like this one.
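To show what a Bayesian-network classifier computes, here is a minimal sketch of the simplest such network, naive Bayes, in which Loan_Status is the only parent of every input field, so P(class | inputs) is proportional to P(class) times the product of P(input | class). The SPSS Bayes Net node can learn richer network structures, so this is a simplified, swapped-in technique for illustration, not what the node runs; the class name and the tiny training set are made up.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Categorical naive Bayes: the target is the only parent of every feature."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing, so unseen values never get probability 0

    def fit(self, rows, target):
        self.target = target
        self.n = len(rows)
        self.class_counts = Counter(row[target] for row in rows)
        self.features = [name for name in rows[0] if name != target]
        # counts[feature][cls][value] = training rows of class cls with that value
        self.counts = {f: defaultdict(Counter) for f in self.features}
        self.values = {f: set() for f in self.features}
        for row in rows:
            for f in self.features:
                self.counts[f][row[target]][row[f]] += 1
                self.values[f].add(row[f])
        return self

    def predict_proba(self, row):
        """P(class | features) via Bayes' rule with conditionally independent features."""
        log_scores = {}
        for cls, cls_count in self.class_counts.items():
            score = math.log(cls_count / self.n)  # log prior
            for f in self.features:
                seen = self.counts[f][cls][row.get(f)]
                k = len(self.values[f])
                score += math.log((seen + self.alpha) / (cls_count + self.alpha * k))
            log_scores[cls] = score
        # Normalise in log space (log-sum-exp) to avoid underflow.
        top = max(log_scores.values())
        weights = {cls: math.exp(s - top) for cls, s in log_scores.items()}
        total = sum(weights.values())
        return {cls: w / total for cls, w in weights.items()}

    def predict(self, row):
        proba = self.predict_proba(row)
        return max(proba, key=proba.get)


# Tiny made-up training set (already shaped: Loan_Status is 1/0).
TRAIN = [
    {"Credit_History": 1, "Property_Area": "Urban", "Loan_Status": 1},
    {"Credit_History": 1, "Property_Area": "Rural", "Loan_Status": 1},
    {"Credit_History": 0, "Property_Area": "Urban", "Loan_Status": 0},
    {"Credit_History": 0, "Property_Area": "Rural", "Loan_Status": 0},
    {"Credit_History": 1, "Property_Area": "Urban", "Loan_Status": 1},
]
model = NaiveBayes().fit(TRAIN, target="Loan_Status")
```

For an applicant with Credit_History 1 in an urban area, the unnormalised scores are 0.6 × 0.8 × 0.6 = 0.288 for class 1 and 0.4 × 0.25 × 0.5 = 0.05 for class 0, so the model predicts 1.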
- Split the data into training and testing sets using the Partition node from the Field Operations palette. (We are not going to use test_loan.csv here, because that file does not contain the target needed to validate the training.)
- Double-click the Partition node and set the partition sizes to 80:20: change Training Partition to 80 and Testing Partition to 20.
- Drag and drop the Bayes Net node from the Modeling palette.
- Double-click the node to look at its settings. This time we are not going to change anything.
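The roles from the Type node and the Partition settings can be sketched together: Loan_ID is dropped from the inputs, Loan_Status becomes the target, and the rows are split 80:20 with a fixed seed. This shuffle-and-cut approach gives exact 80:20 sizes; the Partition node's own random assignment may work differently. The function names are hypothetical.

```python
import random

# Roles set in the Type node: Loan_ID is a record ID, Loan_Status is the target.
RECORD_ID, TARGET = "Loan_ID", "Loan_Status"


def features_and_target(row):
    """Split a row into model inputs and the target, leaving out the record ID."""
    features = {k: v for k, v in row.items() if k not in (RECORD_ID, TARGET)}
    return features, row[TARGET]


def partition(rows, train_ratio=0.8, seed=42):
    """Shuffle with a fixed seed, then cut into training and testing sets (80:20 by default)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


train, test = partition(range(100))
```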

- Run your Bayesian Network node; your model then appears as an orange node.
- Right-click the orange node, then click View.
- You can now see the Network Graph and other model information.
- Drag and drop the Analysis node from the Output palette, and connect it to the model.
- After running the node, you can see the analysis report in the right-side panel.

The analysis report shows that this model achieved 75.22% accuracy on the test partition (your result may differ). From here, you can build more models on the same canvas until you get the result you want.
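The accuracy figure is the share of test-partition rows whose predicted class matches the actual Loan_Status. Here is a minimal sketch of that calculation, together with the 2x2 confusion matrix (the same kind of table AutoAI displays later), using made-up labels and a hypothetical analysis function.

```python
from collections import Counter


def analysis(actual, predicted, labels=(0, 1)):
    """Return (accuracy, confusion matrix) where matrix[(a, p)] counts actual a predicted p."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    pairs = Counter(zip(actual, predicted))
    correct = sum(count for (a, p), count in pairs.items() if a == p)
    matrix = {(a, p): pairs[(a, p)] for a in labels for p in labels}
    return correct / len(actual), matrix


# Made-up labels for illustration: 3 of 5 predictions are correct.
accuracy, matrix = analysis([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```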
Now let's build the inference flow, using the data structure that will be used during inference, which is different from the one used for training.
- As in step 4.2, add data to the canvas using the Data Asset node.
- Double-click the new node and click Change Data Asset to open the Asset Browser. Select test_loan.csv, then click OK and Save.
- Delete the connection from Partition to the Loan Status model (yellow node): select the connection, open the context menu with the right mouse button, and press Delete.
- Connect the new Data Asset node to the Loan Status model (yellow node).
- Drag and drop the Table node from the Output palette, and connect it to the model (yellow node).
Right-click the Table node and select Save branch as a model. If no Watson Machine Learning instance exists in the project, a screen may appear asking you to create one. Click Create a new Watson Machine Learning service instance.

A new tab will open to create a new instance of Watson Machine Learning.
- Select New, then select the Lite plan and a location such as London or Dallas, ideally the same one you chose for Watson Studio.
- Then press Create. A new dialog pops up.
- Review the options, change the Service name if you want, and press Confirm.
The tab closes and you go back to the SPSS Flow screen. Repeat the right-click on the Table node and select Save branch as a model. The Save Model screen opens.
- Enter a meaningful model name, check that the Table branch is selected and, finally, press Save. You will see a confirmation message; once you accept it, you return to the SPSS Flow Editor. Click the project name at the top to return to the Assets page. On the Assets page, under Watson Machine Learning models, you can access your saved model.
- Select the model. A new screen opens, where you can see some model information as well as the input and output schema.

- Select the Deployments tab.
- Select the Add Deployment link.
- Add a name for the deployment (e.g. SPSS Deploy model).
- Click Save. It will take some time, and you may need to refresh the screen before the deployment status appears.
YES! IT HAS FAILED! Why?
Working that out is part of the exercise!
Let's look at the error message:
- Select the `SPSS Deploy model` link (or whatever name you gave to the deployment).
- In the new screen, select the `Details` tab and check the error message.
Think about it and answer the questions in the exercise guide!
Step 5: Build an alternative model using AutoAI capabilities of Watson Studio & Watson Machine Learning
- On the Assets page, select Add to Project and, from the options, select AutoAI experiment. A new screen appears.
- Enter a name for the model.
- Check that the WML instance created in the previous step is selected and press Create.
In the new screen, choose Select from Project to select a data asset as the data to train the model. Currently this is limited to a single CSV file. Soon it will also support selecting a database connection, including table joins. A new pop-up dialog appears.
- Select train_shaped.csv.
- Press Select Asset. The file is added as a data source.
- In the right column, select the column that contains the target to predict (Loan_Status).
- Press Experiment Settings to further customize the experiment.
- In the configuration screen, adjust the train/test split to 85/15.
- Uncheck Loan_ID, as it is not a valid feature; it is only the record ID.
- Press Prediction to see the available configuration, but do not change anything.
- Press General to see the available configuration, but do not change anything.
- Finally, press Save settings. When the browser returns to the previous screen, press Run experiment.
This is the initial screen shown when AutoAI starts its calculation.

After a few minutes, the animation evolves until the experiment finishes.

- Click Pipeline comparison to visualize more metrics comparing the four experiments.
- Click Holdout to visualize the metrics on the 15% holdout data instead of cross-validation (the results are slightly worse).
- Explore the comparison and then click the first and best experiment (Pipeline 3 in this case). Another screen opens.
- Check the different sections, such as the Confusion Matrix.
- Finally, select Save as and then Model. A pop-up dialog appears. Accept the content as it is and press Save. Return to the Assets page, where you will see the new model.
- Click on the model name to open it.
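The difference between the cross-validation scores and the Holdout scores can be sketched in terms of row indices: a 15% holdout set is reserved first, and k-fold cross-validation rotates a validation fold over the remaining rows. This is a conceptual illustration, not AutoAI's internal code; k=3 is only an example value, and the rows are assumed to be shuffled already.

```python
def holdout_split(n_rows, holdout_ratio=0.15):
    """Reserve the last share of (already shuffled) row indices as holdout data."""
    n_holdout = round(n_rows * holdout_ratio)
    return list(range(n_rows - n_holdout)), list(range(n_rows - n_holdout, n_rows))


def k_fold(indices, k=3):
    """Yield (train, validation) index lists; each index is validated exactly once."""
    n = len(indices)
    start = 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)  # spread the remainder over the first folds
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


train_idx, holdout_idx = holdout_split(100)
folds = list(k_fold(train_idx, k=3))
```

The holdout rows never take part in any fold, which is why their scores act as an independent check of the selected pipeline.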

- Click Deployments.
- In the new screen, select the Add deployment link.
- In the new screen, enter a deployment name, such as AutoAI deployed model, and click Save. As before, wait a minute and refresh the screen.
- Check that the status is ready.
- Click the deployment name (AutoAI deployed model in this case).
- In the new screen, click Implementation at the top.
- In the Code Snippets tab, select Python as the language.
- Copy the example code in Python. Save the code aside; you will reuse it later.
We will use a notebook to validate that the deployed model works fine as a web service.
- Go to the main asset page, click the Add to Project button and select Notebook. A pop-up screen appears.
- Press the From URL tab.
- Add the following URL: https://raw.githubusercontent.com/jaumemir/watson_studio/master/assets/ScoringSimulation.ipynb, which points to a file in this same GitHub repository. Name the notebook and select a runtime environment (the free one is enough).
- Press Create Notebook.
- Watson Studio instantiates an environment and the Jupyter Notebook screen opens.

- Press the data button. A side window on the right shows the data assets.
- Ensure the first empty cell is selected and, under the test_loan.csv filename, expand the insert to code drop-down list and select insert pandas dataframe.
- Ensure the last variable name generated is df_data_1. Rename any df_data_2 to df_data_1 if needed.
- Save the notebook by pressing the save button, and then run the cell. You should get a table showing the first 5 rows.
- Run the second cell. It shows the type of each column of the dataset.
- Run the third cell, which samples some records in Python list format, ready for the next step.
- Run the fourth cell, which prepares the payload message that will be sent for scoring.
- In the fifth cell, copy the WML credentials from the IBM Cloud WML instance into the cell and run it. If you don't have the credentials at hand, go to the IBM Cloud resource list and find the WML instance. Click it and navigate to the Credentials menu, where you will find them.
Once done, you should have a notebook like the one in the figure.
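The payload prepared in the fourth cell can be sketched like this. The shape below (input_data holding fields and values) is an assumption based on the v4 predictions endpoint that appears in the deployment URL later in this tutorial; compare it with the payload_scoring line in the snippet you copied, which is authoritative for your deployment. The field order is assumed to follow the CSV columns without Loan_ID and Loan_Status, and the applicant values are made up.

```python
import json

# Input columns sent for scoring (assumed order: CSV order without Loan_ID and
# Loan_Status). Check it against the input schema shown for your model.
FIELDS = ["Gender", "Married", "Dependents", "Education", "Self_Employed",
          "ApplicantIncome", "CoapplicantIncome", "LoanAmount",
          "Loan_Amount_Term", "Credit_History", "Property_Area"]


def build_payload(records):
    """Build a v4-style payload: {"input_data": [{"fields": [...], "values": [[...], ...]}]}."""
    # A field missing from a record becomes None, i.e. JSON null.
    values = [[record.get(field) for field in FIELDS] for record in records]
    return {"input_data": [{"fields": FIELDS, "values": values}]}


# One made-up applicant, for illustration only.
applicant = {"Gender": "Female", "Married": "No", "Dependents": "0",
             "Education": "Graduate", "Self_Employed": "No",
             "ApplicantIncome": 3500, "CoapplicantIncome": 0.0, "LoanAmount": 110,
             "Loan_Amount_Term": 360, "Credit_History": 1, "Property_Area": "Urban"}
payload_scoring = build_payload([applicant])
body = json.dumps(payload_scoring)
```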
- Run the sixth cell, which retrieves the authentication token from the IBM Cloud IAM service.
- In the seventh cell, paste the code you saved in step 5.3.
- In the pasted code, remove or comment out the line starting with payload_scoring =, because the payload was already prepared in the fourth cell.
- Execute that cell.
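The sixth and seventh cells perform two HTTP calls: exchanging the API key for an IAM token, and posting the payload to the deployment. The sketch below builds both requests with the standard library instead of requests so that it is self-contained. The IAM token endpoint and grant type follow IBM Cloud's API-key flow; the ML-Instance-ID header is an assumption modeled on the WML v4 snippets of the time, so follow your copied snippet if it differs. Nothing is sent unless you call send().

```python
import json
import urllib.parse
import urllib.request

IAM_TOKEN_URL = "https://iam.cloud.ibm.com/identity/token"


def iam_token_request(api_key):
    """Request that exchanges an IBM Cloud API key for an IAM access token."""
    body = urllib.parse.urlencode({
        "grant_type": "urn:ibm:params:oauth:grant-type:apikey",
        "apikey": api_key,
    }).encode()
    return urllib.request.Request(
        IAM_TOKEN_URL, data=body, method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded",
                 "Accept": "application/json"})


def scoring_request(scoring_url, token, ml_instance_id, payload):
    """POST request to the deployment's predictions URL (the WML_URL value)."""
    return urllib.request.Request(
        scoring_url, data=json.dumps(payload).encode(), method="POST",
        headers={"Content-Type": "application/json",
                 "Authorization": "Bearer " + token,
                 # Assumed header, as in the v4 snippets of the time.
                 "ML-Instance-ID": ml_instance_id})


def send(request):
    """Perform the HTTP call and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Usage, with your own credentials and URL: `token = send(iam_token_request(api_key))["access_token"]`, then `send(scoring_request(wml_url, token, instance_id, payload_scoring))`.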
You should get the scoring results: an array where each element contains the predicted target and the confidence (or probability) for each of the two possible classes (0, 1).
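Reading the results can be sketched as follows. The response shape (a predictions list whose first block has fields and values) is an assumption to check against your actual output, and the numbers in SAMPLE_RESPONSE are invented.

```python
# Hypothetical response in the assumed shape (values are made up).
SAMPLE_RESPONSE = {
    "predictions": [{
        "fields": ["prediction", "probability"],
        "values": [[0, [0.8, 0.2]], [1, [0.3, 0.7]]],
    }]
}


def read_predictions(response):
    """Return (predicted_class, confidence) for each scored record."""
    block = response["predictions"][0]
    pred_i = block["fields"].index("prediction")
    prob_i = block["fields"].index("probability")
    results = []
    for row in block["values"]:
        probabilities = row[prob_i]  # one probability per class, here [p(0), p(1)]
        # The predicted class is the most probable one, so its confidence is the maximum.
        results.append((row[pred_i], max(probabilities)))
    return results


results = read_predictions(SAMPLE_RESPONSE)
```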

In this step, you will deploy a web application that will call the published AutoAI web service endpoint to get the loan granting decision.

Click the Deploy to IBM Cloud button to deploy the application to IBM Cloud.
IBM Cloud DevOps opens and a new toolchain is created for you.

- Click Delivery Pipeline and then press New to generate a new key. Press OK in the dialog that appears.
- Region, organization, and space are populated automatically. If organization and space are not populated, change the region until all fields are populated. Check that they are correct and then press Create at the top right.
The toolchain and delivery pipeline are created and start running.
Click Delivery Pipeline to access and monitor how the toolchain builds the application and deploys it as an IBM Cloud Foundry Node.js application.
When it completes, click View console or, if there is any problem, go to Resource List, find the application watson_studio-202001nnnnnnnnn (where each n is a digit) and open it.
- Click the Runtime menu.
- Click Environment Variables and scroll down.
- Fill in the three environment variables. The needed values can be found in the notebook from Step 5:
  - Find APIKEY in the wml_credentials dictionary, in cell 5.
  - Find ML_INSTANCE_ID, also in the wml_credentials dictionary, in cell 5.
  - Find WML_URL in cell 7, in the code: response_scoring = requests.post('https://eu-gb.ml.cloud.ibm.com/v4/deployments/b67c9df3-535f-4b98-ba55-71dc811e36f5/predictions', json=payload_scoring, headers=header). The URL passed to requests.post is the value you need (yours will differ).
- Press Save. The application restarts.
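The web application itself is a Node.js app, so the following Python sketch only illustrates the configuration contract it expects: three non-empty variables, with WML_URL pointing at the deployment's predictions URL. The load_config function and its checks are hypothetical; they can be handy for validating the same values in the notebook before pasting them into the Runtime screen.

```python
import os
from urllib.parse import urlparse

REQUIRED = ("APIKEY", "ML_INSTANCE_ID", "WML_URL")


def load_config(environ=None):
    """Read the three settings and fail early with a clear message if any is wrong."""
    environ = os.environ if environ is None else environ
    missing = [name for name in REQUIRED if not environ.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    url = urlparse(environ["WML_URL"])
    if url.scheme != "https" or not url.path.endswith("/predictions"):
        raise ValueError("WML_URL should be the deployment's https .../predictions URL")
    return {name: environ[name] for name in REQUIRED}
```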
Once the application has restarted, click Visit App URL. The application screen opens.
- Enter some values of your choice.
- Press Send Data to Watson. You will see the results of the prediction.
You have learned how to create a complete predictive model without programming: from importing and preparing the data to training and saving the model. You also learned how to use SPSS Modeler and AutoAI, and how to save the AutoAI model to Watson Machine Learning, where you deployed it as a web service. Then you created a notebook to test the model as a web service, and finally you deployed a web application that consumes the web service and shows the model results.
- Adapted from the original tutorial from Hissah AlMuneef | Published January 18, 2019

