diff --git a/examples/bankDefault.ipynb b/examples/bankDefault.ipynb new file mode 100644 index 00000000..2097cc59 --- /dev/null +++ b/examples/bankDefault.ipynb @@ -0,0 +1,542 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "source": [ + "# Tutorial on using AIF 360 to predict credit card default while adjusting for bias on sex\n", + "\n", + "Author: Yi Wang\n", + "\n", + "GitHub: https://github.com/yahowang\n", + "\n", + "Data source: https://archive.ics.uci.edu/ml/datasets/default+of+credit+card+clients\n", + "\n", + "Idea Inspiration: https://nbviewer.jupyter.org/github/IBM/AIF360/blob/master/examples/tutorial_credit_scoring.ipynb\n", + "\n", + "Public Paper: Yeh, I. C., & Lien, C. H. (2009). The comparisons of data mining techniques for the predictive accuracy of probability of default of credit card clients. Expert Systems with Applications, 36(2), 2473-2480.\n", + "\n", + "License: MIT License" + ], + "metadata": {} + }, + { + "cell_type": "markdown", + "source": [ + "## Intro\n", + "Thanks to advances in the consumer credit system, customers can make purchases on credit and pay them off later. It is challenging but vital for banks to determine whether their customers will be able to repay, so as to minimize the risk of capital loss. One way of doing so is to track historical transactions and payments in order to identify customers who are unlikely to meet repayment requirements. These customers are classified as having a greater chance of default in the future. \n", + "\n", + "In this task, the dataset was obtained from banks in Taiwan and concerns customers' default payments. This is a binary classification problem: the target variable is whether a customer will default, predicted from different features and payment histories. \n", + "\n", + "This dataset contains more females than males, and males default on their credit cards at a higher rate. 
Using AI Fairness 360 (AIF 360), we can adjust for the bias on sex and compare the performance of a simple model on the original and transformed datasets.\n", + "\n", + "## Feature Description\n", + "Because the UCI Machine Learning Repository page contains some incorrect feature descriptions, I list the features in detail below.\n", + "\n", + "$X_1$: Amount of the given credit (NT dollar): it includes both the individual consumer credit and his/her family (supplementary) credit.\n", + "\n", + "$X_2$: Sex (1 = male; 2 = female).\n", + "\n", + "$X_3$: Education (1 = graduate school; 2 = university; 3 = high school; 0,4,5,6 = others).\n", + "\n", + "$X_4$: Marriage (1 = married; 2 = single; 3 = divorce; 0 = others).\n", + "\n", + "$X_5$: Age (year).\n", + "\n", + "$X_6$ - $X_{11}$: History of past payment (with 10 categories for each feature).\n", + "- The measurement scale for the repayment status is: \n", + " - -2 = No consumption\n", + " - -1 = payment in full\n", + " - 0 = minimum amount of payment\n", + " - 1 = payment delay for one month\n", + " - 2 = payment delay for two months\n", + " - 3 = payment delay for three months\n", + " - 4 = payment delay for four months\n", + " - 5 = payment delay for five months\n", + " - 6 = payment delay for six months\n", + " - 7 = payment delay for seven months\n", + " - 8 = payment delay for eight months\n", + " - 9 = payment delay for nine months and above\n", + "- $X_6$ = the repayment status in September, 2005 \n", + "- $X_7$ = the repayment status in August, 2005\n", + "- $X_8$ = the repayment status in July, 2005\n", + "- $X_9$ = the repayment status in June, 2005\n", + "- $X_{10}$ = the repayment status in May, 2005\n", + "- $X_{11}$ = the repayment status in April, 2005\n", + "\n", + "$X_{12}$-$X_{17}$: Amount of bill statement (NT dollar)\n", + "- $X_{12}$ = amount of bill statement in September, 2005\n", + "- $X_{13}$ = amount of bill statement in August, 2005\n", + "- $X_{14}$ = amount of bill statement 
in July, 2005\n", + "- $X_{15}$ = amount of bill statement in June, 2005\n", + "- $X_{16}$ = amount of bill statement in May, 2005\n", + "- $X_{17}$ = amount of bill statement in April, 2005\n", + "\n", + "$X_{18}$-$X_{23}$: Amount of previous payment (NT dollar).\n", + "- $X_{18}$ = amount paid in September, 2005\n", + "- $X_{19}$ = amount paid in August, 2005\n", + "- $X_{20}$ = amount paid in July, 2005\n", + "- $X_{21}$ = amount paid in June, 2005\n", + "- $X_{22}$ = amount paid in May, 2005\n", + "- $X_{23}$ = amount paid in April, 2005\n", + "\n", + "$X_{24}$: Whether the customer will default in the next month (1 = Yes; 0 = No)" + ], + "metadata": {} + }, + { + "cell_type": "markdown", + "source": [ + "## Loading Raw Data and Preprocessing" + ], + "metadata": {} + }, + { + "cell_type": "markdown", + "source": [ + "First we need to load the data and wrap it in a standard AIF360 dataset so that AIF360's metrics can be applied to it later." + ], + "metadata": {} + }, + { + "cell_type": "code", + "source": [ + "import pandas as pd\n", + "import numpy as np\n", + "from matplotlib import pylab as plt\n", + "import matplotlib\n", + "from sklearn.preprocessing import OrdinalEncoder\n", + "from sklearn.preprocessing import MinMaxScaler\n", + "from sklearn.preprocessing import OneHotEncoder\n", + "from sklearn.preprocessing import StandardScaler\n", + "from sklearn.preprocessing import LabelEncoder\n", + "from aif360.datasets import StandardDataset\n", + "from aif360.metrics import BinaryLabelDatasetMetric\n", + "from aif360.algorithms.preprocessing import Reweighing\n", + "from IPython.display import Markdown, display\n", + "from sklearn.ensemble import RandomForestClassifier\n", + "from sklearn.metrics import accuracy_score, roc_curve, auc" + ], + "outputs": [], + "execution_count": 16, + "metadata": {} + }, + { + "cell_type": "markdown", + "source": [ + "The customized preprocessing function for this dataset:" + ], + "metadata": {} + }, + { + "cell_type": "code", + 
"source": [ + "def preprocess(df):\n", + " \"\"\"\n", + " Customized preprocessing method for the dataset.\n", + " Input: original df\n", + " Output: preprocessed df\n", + " \"\"\"\n", + " # Standard Scaler\n", + " scaler = StandardScaler()\n", + " continuous =['LIMIT_BAL','BILL_AMT1','BILL_AMT2','BILL_AMT3','BILL_AMT4','BILL_AMT5','BILL_AMT6','PAY_AMT1','PAY_AMT2','PAY_AMT3','PAY_AMT4','PAY_AMT5','PAY_AMT6']\n", + " df[continuous] = scaler.fit_transform(df[continuous])\n", + "\n", + " # Payment status contains a mixture of qualitative and quantitative values\n", + " # Thus, it needs a special preprocessing to decouple the mixture\n", + " mixed_features = [\"PAY_0\", \"PAY_2\", \"PAY_3\", \"PAY_4\", \"PAY_5\", \"PAY_6\"]\n", + " for mixed_feature in mixed_features:\n", + " for i in range(-2,1):\n", + " df[mixed_feature + \"_\" + str(i)] = [1 if t else 0 for t in df[mixed_feature] == i]\n", + " df[mixed_feature] = [0 if t == i else t for t in df[mixed_feature]]\n", + "\n", + " # One Hot\n", + " ohe = OneHotEncoder(sparse=False, handle_unknown = \"ignore\")\n", + " features = [\"EDUCATION\", \"MARRIAGE\"]\n", + " temp = ohe.fit_transform(df[features])\n", + "\n", + " for i in range(len(ohe.get_feature_names())):\n", + " feature_name = ohe.get_feature_names()[i].split(\"_\")\n", + " feature_name = features[int(feature_name[0][1])] + \"_\" + feature_name[1] \n", + " temp_i = pd.DataFrame(temp[:, i], columns = [feature_name])\n", + " df = pd.concat([df, temp_i], axis = 1)\n", + "\n", + " for feature in features:\n", + " df = df.drop(feature, axis = 1)\n", + "\n", + " # Minmax\n", + " continuous = [\"AGE\"] + mixed_features\n", + " minmax = MinMaxScaler()\n", + " df[continuous] = minmax.fit_transform(df[continuous])\n", + "\n", + " return df" + ], + "outputs": [], + "execution_count": 2, + "metadata": {} + }, + { + "cell_type": "markdown", + "source": [ + "#### Dataset Wrapper" + ], + "metadata": {} + }, + { + "cell_type": "markdown", + "source": [ + "Here we convert the 
raw data to a standard dataset. We need to mark the feature that carries bias as a protected attribute; in our data that is 'SEX'. We also need to specify the privileged and unprivileged groups for that attribute. Here, the privileged group is male and the unprivileged group is female." + ], + "metadata": {} + }, + { + "cell_type": "code", + "source": [ + "file_path = \"https://archive.ics.uci.edu/ml/machine-learning-databases/00350/default%20of%20credit%20card%20clients.xls\"\n", + "df = pd.read_excel(file_path, skiprows = 1)\n", + "df['SEX'] = df['SEX']-1 # recode so that male is 0 and female is 1\n", + "\n", + "privileged_groups = [{'SEX': 0}] # male\n", + "unprivileged_groups = [{'SEX': 1}] # female\n", + "label_name = \"default payment next month\"\n", + "favorable_classes = [1] # label 1 means the customer defaults\n", + "protected_attribute_names = ['SEX']\n", + "privileged_classes = [[0]] # male\n", + "\n", + "metadata={'label_maps': [{1: 'Default', 0: 'Not Default'}]}\n", + "\n", + "orig_data = StandardDataset(df, \n", + " label_name, \n", + " favorable_classes, \n", + " protected_attribute_names, \n", + " privileged_classes,\n", + " custom_preprocessing=preprocess)\n", + "\n", + "np.random.seed(2080)\n", + "dataset_orig_train, dataset_orig_vt = orig_data.split([0.7], shuffle=True)\n", + "dataset_orig_valid, dataset_orig_test = dataset_orig_vt.split([0.5], shuffle=True)" + ], + "outputs": [], + "execution_count": 3, + "metadata": {} + }, + { + "cell_type": "markdown", + "source": [ + "## Metric for the original dataset" + ], + "metadata": {} + }, + { + "cell_type": "markdown", + "source": [ + "Now that we have identified the protected attribute 'SEX' and defined privileged and unprivileged values, we can use aif360 to detect bias in the dataset. One simple test is to compare the percentage of favorable results for the privileged and unprivileged groups, subtracting the former percentage from the latter. 
A negative value indicates that females are less likely than males to default on their credit cards. This is implemented in the method called mean_difference on the BinaryLabelDatasetMetric class. The code below performs this check and displays the output, showing that the difference is -0.034253." + ], + "metadata": {} + }, + { + "cell_type": "code", + "source": [ + "# Metric for the original dataset\n", + "metric_orig_train = BinaryLabelDatasetMetric(dataset_orig_train, \n", + " unprivileged_groups=unprivileged_groups,\n", + " privileged_groups=privileged_groups)\n", + "display(Markdown(\"#### Original training dataset\"))\n", + "print(\"Difference in mean outcomes between unprivileged and privileged groups = %f\" % metric_orig_train.mean_difference())" + ], + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/markdown": [ + "#### Original training dataset" + ], + "text/plain": [ + "" + ] + }, + "metadata": {} + }, + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Difference in mean outcomes between unprivileged and privileged groups = -0.034253\n" + ] + } + ], + "execution_count": 58, + "metadata": {} + }, + { + "cell_type": "markdown", + "source": [ + "## Transform the training data\n", + "\n", + "We do see some bias between females and males. Now we need to adjust for this bias in our training. Reweighing is a preprocessing technique that weights each (group, label) combination differently to ensure fairness before classification." + ], + "metadata": {} + }, + { + "cell_type": "code", + "source": [ + "RW = Reweighing(unprivileged_groups=unprivileged_groups,\n", + " privileged_groups=privileged_groups)\n", + "RW.fit(dataset_orig_train)\n", + "dataset_transf_train = RW.transform(dataset_orig_train)" + ], + "outputs": [], + "execution_count": 4, + "metadata": {} + }, + { + "cell_type": "markdown", + "source": [ + "Check the mean difference between the two groups again, this time on the transformed training data." 
+ ], + "metadata": {} + }, + { + "cell_type": "code", + "source": [ + "metric_transf_train = BinaryLabelDatasetMetric(dataset_transf_train, \n", + " unprivileged_groups=unprivileged_groups,\n", + " privileged_groups=privileged_groups)\n", + "display(Markdown(\"#### Transformed training dataset\"))\n", + "print(\"Difference in mean outcomes between unprivileged and privileged groups = %f\" % metric_transf_train.mean_difference())" + ], + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/markdown": [ + "#### Transformed training dataset" + ], + "text/plain": [ + "" + ] + }, + "metadata": {} + }, + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Difference in mean outcomes between unprivileged and privileged groups = -0.000000\n" + ] + } + ], + "execution_count": 5, + "metadata": {} + }, + { + "cell_type": "markdown", + "source": [ + "## Model comparison with fittings on different datasets\n", + "\n", + "We would like to see whether preprocessing the data hurts the performance of the model. So we train a simple Random Forest classifier on both the original and the transformed training data, then compare the area under the ROC curve (AUC) on the test data." 
+ ], + "metadata": {} + }, + { + "cell_type": "markdown", + "source": [ + "### Fit a Random Forest classifier with the original data" + ], + "metadata": {} + }, + { + "cell_type": "code", + "source": [ + "X_train = dataset_orig_train.features\n", + "y_train = dataset_orig_train.labels.ravel()\n", + "X_valid = dataset_orig_valid.features\n", + "y_valid = dataset_orig_valid.labels.ravel()\n", + "X_test = dataset_orig_test.features\n", + "y_test = dataset_orig_test.labels.ravel()\n", + "\n", + "random_state = 2080\n", + "max_depth = range(1,10) # candidate maximum tree depths\n", + "minimum_splits = range(2,10) # candidate values for min_samples_split\n", + "accuracies = [] # validation accuracies\n", + "models = []\n", + "\n", + "# grid search over the hyperparameters\n", + "for d in max_depth:\n", + " for s in minimum_splits:\n", + " rfc = RandomForestClassifier(n_estimators=50, min_samples_split=s, max_depth=d, random_state=random_state)\n", + " rfc.fit(X_train,y_train, sample_weight=dataset_orig_train.instance_weights)\n", + " accuracies.append(accuracy_score(y_valid, rfc.predict(X_valid)))\n", + " models.append(rfc)\n", + "\n", + "# find the best model based on the validation accuracies\n", + "model = models[np.argmax(accuracies)]\n", + "\n", + "# calculate the false positive rate and true positive rate of our best model\n", + "fpr, tpr, threshold = roc_curve(y_test, model.predict_proba(X_test)[:,1])" + ], + "outputs": [], + "execution_count": 17, + "metadata": {} + }, + { + "cell_type": "markdown", + "source": [ + "#### Visualize the ROC for model fitted on the original data" + ], + "metadata": {} + }, + { + "cell_type": "code", + "source": [ + "plt.plot(fpr, tpr, 'b', label = 'AUC = %0.2f' % auc(fpr, tpr))\n", + "plt.title('Receiver Operating Characteristic RF (Original Data)')\n", + "plt.legend(loc = 'lower right')\n", + "plt.plot([0, 1], [0, 1],'r--')\n", + "plt.xlim([0, 1])\n", + "plt.ylim([0, 1])\n", + "plt.ylabel('True Positive Rate')\n", + 
"plt.xlabel('False Positive Rate')\n", + "plt.show()" + ], + "outputs": [ + { + "output_type": "display_data", + "data": { + "image/png": [ + "\n" + ], + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + } + } + ], + "execution_count": 18, + "metadata": {} + }, + { + "cell_type": "markdown", + "source": [ + "### Fit a Random Forest classifier with the transformed data" + ], + "metadata": {} + }, + { + "cell_type": "code", + "source": [ + "X_train = dataset_transf_train.features\n", + "y_train = dataset_transf_train.labels.ravel()\n", + "\n", + "random_state = 2080\n", + "max_depth = range(1,10)\n", + "minimum_splits = range(2,10)\n", + "accuracies = []\n", + "models = []\n", + "\n", + "# finding the best metrics\n", + "for d in max_depth:\n", + " for s in minimum_splits:\n", + " rfc = RandomForestClassifier(n_estimators=50, min_samples_split=s, max_depth=d, random_state=random_state)\n", + " rfc.fit(X_train,y_train, sample_weight=dataset_transf_train.instance_weights)\n", + " accuracies.append(accuracy_score(y_valid, rfc.predict(X_valid)))\n", + " models.append(rfc)\n", + "\n", + "# find the best model according to the accuracy score\n", + "model = models[np.argmax(accuracies)]\n", + "\n", + "# calculate the false positive rate and true positive rate of our best model\n", + "fpr, tpr, threshold = roc_curve(y_test, model.predict_proba(X_test)[:,1])" + ], + "outputs": [], + "execution_count": 19, + "metadata": {} + }, + { + "cell_type": "markdown", + "source": [ + "#### Visualize the ROC for model fitted on the transformed data" + ], + "metadata": {} + }, + { + "cell_type": "code", + "source": [ + "plt.plot(fpr, tpr, 'b', label = 'AUC = %0.2f' % auc(fpr, tpr))\n", + "plt.title('Receiver Operating Characteristic RF (Transformed Data)')\n", + "plt.legend(loc = 'lower right')\n", + "plt.plot([0, 1], [0, 1],'r--')\n", + "plt.xlim([0, 1])\n", + "plt.ylim([0, 1])\n", + "plt.ylabel('True Positive Rate')\n", + "plt.xlabel('False Positive Rate')\n", + "plt.show()" + ], + "outputs": [ + { + "output_type": "display_data", + "data": { + "image/png": [ + "\n" + ], + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + } + } + ], + "execution_count": 20, + "metadata": {} + }, + { + "cell_type": "markdown", + "source": [ + "## Conclusion" + ], + "metadata": {} + }, + { + "cell_type": "markdown", + "source": [ + "Reweighing reduces the mean difference in default rate between females and males in the training data from -0.034253 to essentially zero. A model fitted on the transformed data is therefore expected to treat the two groups more fairly than one fitted on the original data. The ROC curves of the two models differ slightly, but the AUC is not affected much, which is the outcome we hoped for. This suggests that a model fitted on the reweighed 'SEX' groups can evaluate people more fairly, relying less on their sex, without a large loss in predictive performance." + ], + "metadata": {} + }, + { + "cell_type": "code", + "source": [], + "outputs": [], + "execution_count": null, + "metadata": {} + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "name": "python", + "version": "3.7.6", + "mimetype": "text/x-python", + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "pygments_lexer": "ipython3", + "nbconvert_exporter": "python", + "file_extension": ".py" + }, + "nteract": { + "version": "0.22.4" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} \ No newline at end of file