Training a Machine Learning Model to Find Receptive Donors for Our Charity

The following notebook provides a walkthrough of the algorithm design choices I made to create an ML model capable of selecting receptive donors, along with an analysis of the results.
Below I will employ several supervised algorithms to model individuals' income using data collected from the 1994 U.S. Census. I will then choose the best candidate algorithm from the preliminary results and further optimize it to best model the data.
The goal of this implementation is to construct a model that accurately predicts whether an individual makes more than $50,000 a year. This sort of task arises often in a non-profit setting, where organizations survive on donations. Understanding an individual's income can help a non-profit better understand how large a donation to request, or whether they should reach out to begin with. While it can be difficult to determine an individual's general income bracket directly from public sources, we can infer this value from other publicly available features.
The dataset for this project originates from the UCI Machine Learning Repository. The dataset was donated by Ron Kohavi and Barry Becker after being published in the article "Scaling Up the Accuracy of Naive-Bayes Classifiers: A Decision-Tree Hybrid".
You can find the article by Ron Kohavi online. The data investigated here differs slightly from the original dataset in a few small ways: the 'fnlwgt' feature has been removed, along with records containing missing or ill-formatted entries.
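As a minimal sketch of how this cleaned dataset might be loaded, assuming it has been saved locally as 'census.csv' (a hypothetical filename), the following uses pandas:

```python
import pandas as pd

# Assumption: the cleaned dataset is available locally as 'census.csv'
data = pd.read_csv('census.csv')

# If starting from the raw UCI 'adult.data' file instead, the cleaning
# described above could be reproduced roughly like this (with 'columns'
# holding the feature names listed in the UCI documentation):
# raw = pd.read_csv('adult.data', header=None, names=columns,
#                   na_values='?', skipinitialspace=True)
# raw = raw.drop(columns=['fnlwgt']).dropna()

print("Total number of records: {}".format(len(data)))
data.head()
```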
For highly skewed feature distributions such as 'capital-gain' and 'capital-loss', it is common practice to apply a logarithmic transformation so that very large and very small values do not negatively affect the performance of a learning algorithm. A logarithmic transformation significantly reduces the range of values caused by outliers. Care must be taken when applying this transformation, however: the logarithm of 0 is undefined, so we must translate the values by a small amount above 0 to apply the logarithm successfully.
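A sketch of this transformation, assuming the 'data' DataFrame from the loading step above and using the common translate-by-one trick:

```python
import numpy as np

# Features whose distributions are highly skewed
skewed = ['capital-gain', 'capital-loss']

# Copy the DataFrame so the raw values remain untouched
features_log = data.copy()

# Translate each value by +1 before taking the logarithm, since log(0)
# is undefined; zeros then map cleanly to log(1) = 0
features_log[skewed] = features_log[skewed].apply(lambda x: np.log(x + 1))
```

As an aside, np.log1p computes log(x + 1) directly and is numerically more accurate for values near zero.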
In addition to transforming highly skewed features, it is often good practice to perform some type of scaling on numerical features. Applying a scaling to the data does not change the shape of each feature's distribution (such as 'capital-gain' or 'capital-loss' above); however, normalization ensures that each feature is treated equally when applying supervised learners. Note that once scaling is applied, observing the data in its raw form will no longer have its original meaning, as the example below illustrates.
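A minimal sketch using scikit-learn's MinMaxScaler, which rescales each feature to the [0, 1] range; the column names here are assumptions based on the census dataset's numerical features:

```python
from sklearn.preprocessing import MinMaxScaler

# Numerical features to scale (names assume the census dataset's columns)
numerical = ['age', 'education-num', 'capital-gain',
             'capital-loss', 'hours-per-week']

scaler = MinMaxScaler()  # maps each feature onto [0, 1]

# Scale the log-transformed data from the previous step
features_scaled = features_log.copy()
features_scaled[numerical] = scaler.fit_transform(features_log[numerical])

# A raw value such as age = 39 now reads as a fraction of the observed
# age range rather than as an age, illustrating the loss of raw meaning
features_scaled.head()
```

MinMaxScaler is one reasonable choice here; scikit-learn's StandardScaler is a common alternative when zero-mean, unit-variance features are preferred.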