This is the GitHub repository for the final project of the course DS-GA 1019 - Advanced Python for Data Science at New York University (Spring 2023).
Authors:
- Jennifer Rodriguez-Trujillo (NYU CDS 2023 - jr5951@nyu.edu)
- Joseph Schuman (NYU CDS 2023 - js12580@nyu.edu)
- Khevna Parikh (NYU CDS 2023 - kp2936@nyu.edu)
- Kristin Mullaney (NYU CDS 2023 - kmm9492@nyu.edu)
- Rodrigo Kreis de Paula (NYU CDS 2023 - rk4197@nyu.edu)
- Sarvesh Patki (NYU CDS 2023 - ssp6603@nyu.edu)
Summary:
This project applies code optimization techniques to speed up a logistic regression model. The model predicts whether a customer will default on their credit card payments within 120 days, which is crucial for managing risk in a consumer lending business. Faster code makes it practical to analyze large, complex datasets and identify likely defaults quickly, which can lead to cost savings for businesses and consumers. The techniques used include line profiling, reducing repetitive for-loops, vectorization with NumPy, JAX, and Numba. The focus of the project is not on the model itself but on how well the code can be optimized to process the dataset efficiently.
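As an illustration of the loop-reduction and NumPy vectorization techniques named in the summary, here is a minimal sketch. It assumes the standard gradient of the mean log-loss for logistic regression. The function names (`gradient_loop`, `gradient_vectorized`) and the synthetic data are hypothetical and are not taken from the project's code or dataset.

```python
import numpy as np


def sigmoid(z):
    """Logistic function mapping real-valued scores to probabilities."""
    return 1.0 / (1.0 + np.exp(-z))


def gradient_loop(X, y, w):
    """Gradient of the mean log-loss, written with explicit Python loops."""
    n_samples, n_features = X.shape
    grad = np.zeros(n_features)
    for i in range(n_samples):
        # Dot product of row i with the weights, one feature at a time.
        z = 0.0
        for j in range(n_features):
            z += X[i, j] * w[j]
        error = sigmoid(z) - y[i]
        for j in range(n_features):
            grad[j] += error * X[i, j]
    return grad / n_samples


def gradient_vectorized(X, y, w):
    """Same gradient, expressed as two matrix-vector products."""
    return X.T @ (sigmoid(X @ w) - y) / X.shape[0]


# Synthetic stand-in data (assumption: not the project's credit dataset).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))
y = (rng.random(500) < 0.3).astype(float)
w = rng.normal(size=8)

# Both versions should agree up to floating-point rounding.
assert np.allclose(gradient_loop(X, y, w), gradient_vectorized(X, y, w))
```

The vectorized version moves the per-sample and per-feature loops into NumPy's compiled routines, which is typically where most of the speedup from this kind of rewrite comes from. Timing both functions (for example with `timeit` or a line profiler) on the real data would be needed to quantify the gain.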
Keywords:
Parallel Computing - Python - Numba - Cython - NumPy - JAX
Licence:
This repo is licenced under the MIT Licence. See "LICENCE" for more details.