GitHub - drkarthi/driven-data-predicting-poverty: My submission for the 'Predicting Poverty' data science competition on DrivenData.

The 'Predicting Poverty' data science competition

This is my submission for the 'Predicting Poverty' data science competition hosted by the World Bank on DrivenData (https://www.drivendata.org/competitions/50/worldbank-poverty-prediction/). The goal of the competition is to predict whether a household is poor. The input data came from household-level and individual-level World Bank surveys for three countries. Column names and categorical values were anonymized so that the countries could not be identified. The evaluation metric used for the leaderboard was mean log loss (lower is better). I finished the competition with a mean log loss of 0.1625 (the top score was 0.1480) and ranked 126th out of more than 2,500 contestants.
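As a rough illustration of the metric, the sketch below computes binary log loss averaged over households. The function name `mean_log_loss` and the clipping constant `eps` are assumptions made for this example; how the competition combined scores across the three countries is not covered here.

```python
import math


def mean_log_loss(y_true, p_pred, eps=1e-15):
    """Average binary log loss over all observations.

    y_true: 0/1 labels (1 = poor). p_pred: predicted probability of label 1.
    eps clips probabilities away from 0 and 1 so that log() stays finite.
    """
    if not y_true or len(y_true) != len(p_pred):
        raise ValueError("y_true and p_pred must be non-empty and equal in length")
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)
```

Predicting 0.5 for every household gives a loss of ln 2 (about 0.693), so any useful model should score well below that. Confident wrong predictions are penalized heavily, which is why well-calibrated probabilities matter for this metric.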

Summary of my approach

I used a regularized logistic regression model to classify whether a household is poor and to predict the probability of poverty. I chose the model's hyperparameters, including the choice between lasso (L1) and ridge (L2) regularization, using k-fold cross-validation (with k = 5). I started with logistic regression because it is easy to implement and well suited to predicting probabilities. I compared its performance against a random forest model (from the H2O library) and found that regularized logistic regression performed better on my features for this dataset.
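The selection procedure described above can be sketched without any dependencies. This is not the notebook's code: the gradient-descent fitter, the candidate penalty strengths, the learning rate, and the synthetic data from `make_toy_data` are all assumptions chosen to keep the example small. It fits L1- and L2-penalized logistic regression and picks the combination with the lowest 5-fold cross-validated log loss.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(model, X):
    w, b = model
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]


def fit_logreg(X, y, penalty, lam, lr=0.1, epochs=200):
    """Batch gradient descent. 'l2' shrinks weights via the gradient;
    'l1' applies a soft-thresholding (proximal) step after each update."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        errs = [p - yi for p, yi in zip(predict_proba((w, b), X), y)]
        grad_w = [sum(e * xi[j] for e, xi in zip(errs, X)) / n for j in range(d)]
        for j in range(d):
            if penalty == "l2":
                w[j] -= lr * (grad_w[j] + lam * w[j])
            else:
                step = w[j] - lr * grad_w[j]
                w[j] = math.copysign(max(abs(step) - lr * lam, 0.0), step)
        b -= lr * sum(errs) / n  # the intercept is not penalized
    return w, b


def log_loss(y, p, eps=1e-15):
    clipped = [min(max(pi, eps), 1.0 - eps) for pi in p]
    return -sum(yi * math.log(pi) + (1 - yi) * math.log(1.0 - pi)
                for yi, pi in zip(y, clipped)) / len(y)


def cv_score(X, y, penalty, lam, k=5, seed=0):
    """Mean held-out log loss over k shuffled folds."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    scores = []
    for fold in (idx[i::k] for i in range(k)):
        held = set(fold)
        train = [i for i in idx if i not in held]
        model = fit_logreg([X[i] for i in train], [y[i] for i in train], penalty, lam)
        scores.append(log_loss([y[i] for i in fold],
                               predict_proba(model, [X[i] for i in fold])))
    return sum(scores) / k


def select_model(X, y, penalties=("l1", "l2"), lams=(0.001, 0.01, 0.1)):
    """Return the (penalty, lam) pair with the lowest cross-validated log loss."""
    grid = [(p, lam) for p in penalties for lam in lams]
    return min(grid, key=lambda g: cv_score(X, y, g[0], g[1]))


def make_toy_data(n=100, seed=0):
    # Synthetic stand-in for the survey features: two informative columns, one noise.
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        x = [rng.gauss(0, 1) for _ in range(3)]
        X.append(x)
        y.append(1 if x[0] + 0.5 * x[1] + rng.gauss(0, 0.5) > 0 else 0)
    return X, y
```

Calling `select_model(*make_toy_data())` returns the winning penalty and strength. Scoring the folds with log loss rather than accuracy keeps the selection aligned with the leaderboard metric.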

Other techniques that helped improve the classifier's performance were removing columns with a high proportion of missing values, imputing the remaining missing values, and dropping features with low variation. Please refer to the IPython notebook (predict_poverty.ipynb) for the code and more details on the approach.
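The three preprocessing steps above might look roughly like the following. The thresholds (`max_missing`, `max_top_share`), the median/mode imputation rule, and the column-dictionary layout are assumptions for illustration; the notebook may use different choices.

```python
from collections import Counter
from statistics import median


def drop_sparse_columns(table, max_missing=0.5):
    """Drop columns whose share of missing (None) values exceeds max_missing."""
    return {name: col for name, col in table.items()
            if sum(v is None for v in col) / len(col) <= max_missing}


def impute(table):
    """Fill gaps with the median (numeric columns) or the mode (other columns)."""
    filled = {}
    for name, col in table.items():
        present = [v for v in col if v is not None]
        if not present:
            continue  # nothing to impute from, so the column is dropped
        if all(isinstance(v, (int, float)) for v in present):
            fill = median(present)
        else:
            fill = Counter(present).most_common(1)[0][0]
        filled[name] = [fill if v is None else v for v in col]
    return filled


def drop_low_variation(table, max_top_share=0.95):
    """Drop columns where a single value accounts for nearly every row."""
    return {name: col for name, col in table.items()
            if Counter(col).most_common(1)[0][1] / len(col) < max_top_share}


def preprocess(table):
    return drop_low_variation(impute(drop_sparse_columns(table)))


# Hypothetical toy table; None marks a missing value.
toy = {
    "a": [1.0, None, 3.0, 4.0],    # 25% missing, numeric
    "b": [None, None, None, "x"],  # 75% missing
    "c": ["u", "u", "u", "u"],     # constant
    "d": ["p", "q", None, "p"],    # 25% missing, categorical
}
cleaned = preprocess(toy)
```

On the toy table, column `b` is removed for being mostly missing and `c` for being constant, while `a` and `d` are kept with their gaps filled. Sparse columns are dropped before imputing so that columns with almost no observed values are not filled with a single repeated value and then removed anyway.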

