GitHub - Tsihan/567Project_HomeCredit: This is the group project for CSCI 567 24 Spring.

Environment Settings

To run these notebooks, we recommend using conda to manage a virtual environment. The recommended Python version is 3.11.x. Assuming you use Anaconda, the commands are conda create -n HomeCredit python=3.11 -y and conda activate HomeCredit. (If you use an IDE such as PyCharm, you can do this through its graphical interface.)
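As a quick sanity check after activating the environment, the interpreter version can be verified from Python. This is only a minimal sketch; the helper name is_recommended_python is hypothetical and not part of the repository.

```python
import sys


def is_recommended_python(version_info=sys.version_info):
    """Return True if the given version is a 3.11.x release."""
    # Only major and minor are compared, so any 3.11 patch release passes.
    return tuple(version_info[:2]) == (3, 11)


if __name__ == "__main__":
    if is_recommended_python():
        print("Python 3.11 detected.")
    else:
        current = sys.version.split()[0]
        print(f"Warning: running Python {current}; 3.11.x is recommended.")
```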

Remember to install the required libraries listed in requirements.txt with this command: pip install -r requirements.txt. Some of the libraries are unnecessary, but for simplicity we did not filter them out.

Run the code

You can run the notebooks in either VS Code or PyCharm. If you want to run the neural network models, pay attention to your OS: comment and uncomment the relevant code snippets to use a GPU or MPS (Metal Performance Shaders), or just use the CPU for a simple test.
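Instead of commenting and uncommenting code by hand, the device could be selected at run time. The sketch below assumes the neural network notebooks use PyTorch, which this README does not state; the function name pick_device is hypothetical. If PyTorch is not installed, it simply falls back to the CPU.

```python
def pick_device():
    """Return "cuda", "mps", or "cpu", preferring an accelerator if present."""
    try:
        import torch  # assumption: the neural network notebooks use PyTorch
    except ImportError:
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    # torch.backends.mps is absent in older PyTorch builds, so look it up safely.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


if __name__ == "__main__":
    print(f"Using device: {pick_device()}")
```

With PyTorch installed, the returned string can be passed to torch.device(...) and the model and tensors moved with .to(device).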

How to get the data

The data is available here: https://www.kaggle.com/competitions/home-credit-credit-risk-model-stability/data. If you don't want to register, we also provide it here: https://drive.google.com/file/d/17u2HrtrU8T3aG50jjeKn5xqRwa-6B24w/view?usp=drive_link. We use the CSV files for the competition; the complete data set is about 3 GB compressed and about 22 GB uncompressed. Check the file paths used in the code and make sure to put the files in the correct place (the test folder, the train folder, and these notebooks should sit side by side in the same directory). feature_definitions.csv and sample_submission.csv are not used by the models; they only help in understanding the features defined in the CSV files and the submission format.
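Before launching a notebook, the expected layout can be checked with a few lines of Python. This is a rough sketch: the folder names train and test are assumed from the description above, and the helper missing_data_dirs is hypothetical.

```python
from pathlib import Path


def missing_data_dirs(root=".", names=("train", "test")):
    """Return the expected data folders that do not exist under root."""
    root = Path(root)
    # The folder names are assumptions; adjust them to match the paths in the code.
    return [name for name in names if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_data_dirs()
    if missing:
        print("Missing data folders next to the notebooks:", ", ".join(missing))
    else:
        print("Data folders found.")
```

Run it from the directory that contains the notebooks; an empty result means both folders are in place.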

Notebook File Names

Names starting with data_processed_frequence_encoded mean that we use 255 features and apply frequency encoding. Names starting with eg mean that we run on only a limited set of features. There are also some in-between files, whose purpose you can infer from their names.
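For readers unfamiliar with frequency encoding, the idea is to replace each category with how often it occurs in the column. The sketch below is an illustration using only the standard library and relative frequencies; the notebooks may implement it differently (for example with a dataframe library or raw counts), and the function name frequency_encode is hypothetical.

```python
from collections import Counter


def frequency_encode(values):
    """Replace each category with its relative frequency in values."""
    counts = Counter(values)
    total = len(values)
    # Every occurrence of a category maps to the same number in (0, 1].
    return [counts[v] / total for v in values]


if __name__ == "__main__":
    column = ["a", "b", "a", "c", "a", "b"]
    for category, code in zip(column, frequency_encode(column)):
        print(category, round(code, 3))
```

The encoding turns a categorical column into a single numeric column, which keeps the feature count fixed regardless of how many distinct categories there are.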

Files in the repository:

.gitignore
README.md
create_dataset.py
data_process_PCA.ipynb
data_process_catboost.ipynb
data_process_catboost_frequence_encode.ipynb
data_process_lightgbm.ipynb
data_process_simple_stacking.ipynb
data_process_simple_stacking_frequence_encode.ipynb
data_processed_frequence_encoded_catboost.ipynb
data_processed_frequence_encoded_extra_tree.ipynb
data_processed_frequence_encoded_lightgbm.ipynb
data_processed_frequence_encoded_random_forest.ipynb
data_processed_frequence_encoded_ridge_regression.ipynb
data_processed_frequence_encoded_simple_avg.ipynb
data_processed_frequence_encoded_simple_stacking.ipynb
data_processed_frequence_encoded_xgboost.ipynb
eg_CNN.ipynb
eg_MLP.ipynb
eg_catboost.ipynb
eg_catboost_frequence_encode.ipynb
eg_extra_tree.ipynb
eg_lightgbm.ipynb
eg_logistic_regression.ipynb
eg_random_forest.ipynb
eg_ridge_regression.ipynb
eg_simple_avg.ipynb
eg_simple_stacking.ipynb
eg_simple_stacking_fulldata.ipynb
eg_two_layer_stacking_v1.ipynb
eg_two_layer_stacking_v2.ipynb
eg_xg_boost.ipynb
feature_definitions.csv
pca_visualization.png
plot.ipynb
requirements.txt
sample_submission.csv
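The naming convention described under Notebook File Names can be expressed as a small lookup, which may help when scanning the file list. This is only a sketch: the function name notebook_group and the group labels are hypothetical, and anything not matching the two documented prefixes is reported as "other".

```python
def notebook_group(filename):
    """Classify a notebook by the prefix conventions described in this README."""
    if filename.startswith("data_processed_frequence_encoded"):
        return "255 features, frequency encoded"
    if filename.startswith("eg_"):
        return "limited features"
    # In-between files and non-notebook files are not classified further.
    return "other"


if __name__ == "__main__":
    for name in ("data_processed_frequence_encoded_xgboost.ipynb",
                 "eg_MLP.ipynb",
                 "data_process_PCA.ipynb"):
        print(name, "->", notebook_group(name))
```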
