Tsihan/567Project_HomeCredit

Environment Settings

To run these notebooks, we recommend using conda to manage a virtual environment. The recommended Python version is 3.11.x. Assuming you use Anaconda, run conda create -n HomeCredit python=3.11 -y and then conda activate HomeCredit. (If you use an IDE such as PyCharm, you can do this through its graphical interface.)

Also remember to install the required libraries listed in the requirements.txt file with this command: pip install -r requirements.txt. Some of the libraries are unnecessary, but for simplicity we did not filter them out.
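As a quick sanity check after activating the environment, a small script can confirm that the interpreter matches the recommended version. This is only a sketch: the helper name is_recommended_python is hypothetical and is not part of the repository.

```python
import sys

# Recommended major.minor version, as stated in this README (3.11.x).
RECOMMENDED = (3, 11)


def is_recommended_python(version_info=sys.version_info):
    """Return True when the major.minor version equals 3.11."""
    return tuple(version_info[:2]) == RECOMMENDED


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    if is_recommended_python():
        print(f"Python {current} matches the recommended 3.11.x")
    else:
        print(f"Python {current} differs from the recommended 3.11.x")
```

Running it inside the HomeCredit environment should report a 3.11.x interpreter if the conda command above was used.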

Run the code

You can run the notebooks in either VS Code or PyCharm. If you want to run the neural network model, pay attention to your OS: comment and uncomment the relevant code snippets to use a GPU or MPS (Metal Performance Shaders), or just use the CPU for a simple test.
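As an alternative to commenting and uncommenting code, the device could be chosen at runtime. The sketch below assumes the neural network notebooks use PyTorch, which this README does not state; the helper name pick_device is hypothetical. It falls back to the CPU when PyTorch is not installed or no accelerator is found.

```python
def pick_device():
    """Return 'cuda', 'mps', or 'cpu' depending on what is available."""
    try:
        import torch  # assumption: the notebooks are built on PyTorch
    except ImportError:
        return "cpu"

    # NVIDIA GPU (typically Linux or Windows).
    if torch.cuda.is_available():
        return "cuda"

    # Metal Performance Shaders (Apple hardware, PyTorch 1.12 or later).
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"

    # Safe default for a simple test.
    return "cpu"


if __name__ == "__main__":
    print("Selected device:", pick_device())
```

With PyTorch, the returned string can be passed to torch.device(...) and then to model.to(...) and tensor.to(...), so the same notebook runs unchanged on each OS.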

How to get the data

The data is available here: https://www.kaggle.com/competitions/home-credit-credit-risk-model-stability/data. If you don't want to register, we also provide it here: https://drive.google.com/file/d/17u2HrtrU8T3aG50jjeKn5xqRwa-6B24w/view?usp=drive_link. We use the CSV files for the competition; the complete dataset is about 3 GB compressed and about 22 GB uncompressed. Check the file paths in the code and make sure to put the files in the correct place (the test folder, the train folder, and these notebooks should sit side by side in the same directory). The feature_definitions.csv and sample_submission.csv files are not used by the model; they only help you understand the features defined in the CSV files and the submission format.
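Before opening a notebook, the layout described above can be checked with a few lines of Python. This is a minimal sketch: the folder names train and test are assumed from the description above, and the helper missing_data_dirs is hypothetical, so adjust both to match the paths actually used in the notebooks.

```python
from pathlib import Path

# Assumed folder names, taken from the README's description of the layout.
EXPECTED_DIRS = ("train", "test")


def missing_data_dirs(project_root="."):
    """Return the expected data folders that are absent from project_root."""
    root = Path(project_root)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_data_dirs()
    if missing:
        print("Missing data folders next to the notebooks:", ", ".join(missing))
    else:
        print("Found the train and test folders next to the notebooks.")
```

Run it from the directory that contains the notebooks; an empty result means both folders sit alongside them.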

Notebook File Names

Notebooks whose names start with data_processed_frequence_encoded use 255 features and apply frequency encoding. Notebooks whose names start with eg use only a limited set of features. There are also some in-between files whose purpose can be understood from their names.
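For readers unfamiliar with frequency encoding, the idea is to replace each category with how often it occurs in the column. The sketch below uses only the standard library and encodes relative frequencies; whether the notebooks use relative frequencies or raw counts is not stated here, and the function name frequency_encode is hypothetical.

```python
from collections import Counter


def frequency_encode(values):
    """Replace each category with its relative frequency in the column."""
    counts = Counter(values)
    total = len(values)
    # An empty input yields an empty list, so no division by zero occurs.
    return [counts[value] / total for value in values]


if __name__ == "__main__":
    column = ["a", "b", "a", "a"]
    print(frequency_encode(column))  # [0.75, 0.25, 0.75, 0.75]
```

In practice the frequencies would be computed on the training data and then reused to encode the test data, so that both sets share the same mapping.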

About

This is the group project for CSCI 567, Spring 2024.

