GitHub - Suneetha-2022/Capstone_Project: This Capstone Project uses the following technologies to manage an ETL process for a Loan Application dataset and a Credit Card dataset: Python (Pandas, advanced modules such as Matplotlib), MariaDB, Apache Spark (Spark Core, Spark SQL), and Python visualization and analytics libraries.

Title: Credit Card and Loan Application Data Processing

Project Definition:

This project demonstrates an ETL process for a Loan Application dataset and a Credit Card dataset.

Environment:

1. Python 3.10.6 with modules such as Pandas and Matplotlib, MariaDB, HeidiSQL, Apache Spark (Spark Core, Spark SQL), VS Code, and Git with Git Bash for source control.

2. Set up a Python virtual environment, install all the required packages, and add the dataset files, the virtual environment files, and any files containing usernames and passwords to .gitignore.

Modules: The whole project is divided into four modules.

1. ETL: Extract the Credit Card dataset, transform it according to the mapping document, and load it into the MariaDB database.

2. API: Get data as a response from the given API link, transform it according to the mapping document, and load it into the MariaDB database.

3. Front End: After the data is loaded into the MariaDB database, a front-end console displays the data as per the requirements.

4. Visualizations: This module has two submodules.

4.1. Customer Visualizations: Plot customer data as per the requirements using the data loaded into the MariaDB database.

4.2. Loan Application Visualizations: Plot loan application data as per the requirements using the data loaded into the MariaDB database.

ETL Module:

The Credit Card data is JSON data from Google Drive. It is extracted, cleansed, and transformed using Python and PySpark data frames, then loaded into the MariaDB database.

The Capstone_project database is created in MariaDB using HeidiSQL, and the transformed JSON data is loaded into it using Spark data frames.
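
The ETL flow described above could look roughly like the following. This is a minimal sketch, not the project's actual code: the helper names `jdbc_options` and `load_json_to_mariadb`, the `FIRST_NAME` column, and the `jdbc:mysql://` URL form are assumptions for illustration. It also assumes a MySQL/MariaDB JDBC driver JAR is available to Spark and that the real transformations come from the mapping document.

```python
def jdbc_options(database, table, user, password, host="localhost", port=3306):
    """Build the option dict passed to Spark's JDBC reader/writer.

    The jdbc:mysql:// URL form is an assumption; adjust it to match
    the JDBC driver that is actually installed.
    """
    return {
        "url": f"jdbc:mysql://{host}:{port}/{database}",
        "dbtable": table,
        "user": user,
        "password": password,
    }


def load_json_to_mariadb(json_path, options):
    """Read a JSON file, apply one example transformation, and append it to a table."""
    # Imported lazily so jdbc_options() can be used without Spark installed.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("capstone_etl").getOrCreate()

    # By default spark.read.json expects one JSON object per line.
    df = spark.read.json(json_path)

    # Hypothetical transformation: title-case a name column.
    # The real rules are defined by the mapping document.
    df = df.withColumn("FIRST_NAME", F.initcap(F.col("FIRST_NAME")))

    df.write.format("jdbc").options(**options).mode("append").save()
```

A call might look like `load_json_to_mariadb("customers.json", jdbc_options("Capstone_project", "customer", user, password))`, where the file name and table name are placeholders and the credentials are read from a file kept out of source control.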

API:

Loan Application data is extracted from the REST API in Python as a list using the response.json() method.

The resulting list is converted to a Spark data frame and loaded into the Capstone_project database.
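
As a rough illustration of these two steps, the sketch below fetches the JSON list and then writes it through Spark. The function names are hypothetical, the API URL is not given in this README, and `options` is assumed to be a dict of Spark JDBC options (`url`, `dbtable`, `user`, `password`). The HTTP call is passed in as a parameter so the fetch logic can be exercised without network access.

```python
def fetch_records(url, http_get):
    """Call the API with the given getter and return the JSON body as a list."""
    response = http_get(url, timeout=30)
    response.raise_for_status()  # fail early on HTTP errors
    records = response.json()
    if not isinstance(records, list):
        raise ValueError("expected a JSON array of records")
    return records


def load_records(records, options):
    """Convert a list of dicts to a Spark data frame and append it to a table."""
    # Imported lazily so fetch_records() works without Spark installed.
    from pyspark.sql import Row, SparkSession

    spark = SparkSession.builder.appName("capstone_api").getOrCreate()
    df = spark.createDataFrame([Row(**record) for record in records])
    df.write.format("jdbc").options(**options).mode("append").save()
```

With the `requests` package installed, usage would be along the lines of `records = fetch_records(api_url, requests.get)` followed by `load_records(records, options)`.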

Front End:

The front-end menu program is developed in Python using the PyInputPlus package and the regex package, with input validation. When the user selects a menu item, the corresponding data is displayed as per the requirements.
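
A validated menu of this kind might be sketched as follows. The menu entries and the 5-digit zip code rule are invented for illustration, since the actual requirements are not listed here. `inputMenu` and `inputRegex` are PyInputPlus functions that keep prompting until the input is valid.

```python
import re

# Hypothetical validation rule: exactly five digits.
ZIP_REGEX = r"^\d{5}$"


def is_valid_zip(text):
    """Return True if text (ignoring surrounding spaces) is a 5-digit zip code."""
    return re.fullmatch(ZIP_REGEX, text.strip()) is not None


def run_menu():
    """Show a numbered menu and collect validated input for the chosen item."""
    # Imported lazily so is_valid_zip() works without PyInputPlus installed.
    import pyinputplus as pyip

    choices = ["Transactions by zip code", "Exit"]  # hypothetical menu items
    while True:
        choice = pyip.inputMenu(choices, numbered=True)
        if choice == "Exit":
            break
        # inputRegex re-prompts until the whole input matches the pattern.
        zip_code = pyip.inputRegex(ZIP_REGEX, prompt="Enter a 5-digit zip code: ")
        print(f"Would query transactions for zip code {zip_code}")


if __name__ == "__main__":
    run_menu()
```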

Visualizations:

Data is extracted from the database as Spark data frames using Spark and SQL, then converted to pandas data frames and plotted using the matplotlib module in Python.

A few of the visualizations are shown below.
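
The Spark-to-pandas-to-matplotlib path can be sketched as below. The function names, table name, and column name are placeholders, and `options` is assumed to hold the JDBC `url`, `user`, and `password`. The aggregation is pushed to the database by passing a subquery as the JDBC `dbtable` option, so only the grouped counts are converted to pandas.

```python
def grouped_count_query(table, column):
    """Build a subquery usable as Spark's JDBC 'dbtable' option."""
    return (
        f"(SELECT {column}, COUNT(*) AS total "
        f"FROM {table} GROUP BY {column}) AS grouped"
    )


def plot_counts(options, table, column, out_path="counts.png"):
    """Read grouped counts through Spark, convert to pandas, and save a bar chart."""
    # Imported lazily so grouped_count_query() works without these packages.
    import matplotlib.pyplot as plt
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("capstone_viz").getOrCreate()
    spark_df = (
        spark.read.format("jdbc")
        .options(**options)
        .option("dbtable", grouped_count_query(table, column))
        .load()
    )

    # toPandas() collects the (small) aggregated result to the driver.
    pandas_df = spark_df.toPandas().sort_values("total", ascending=False)

    fig, ax = plt.subplots()
    ax.bar(pandas_df[column].astype(str), pandas_df["total"])
    ax.set_xlabel(column)
    ax.set_ylabel("Count")
    fig.tight_layout()
    fig.savefig(out_path)
```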

Challenges:

1. Installing Apache Spark and PySpark and setting up environment variables.

2. Using Spark data frame methods during transformations in the ETL process.

3. Developing the front-end menu according to the requirements for both data display and visualizations.

How to run:

1. Run each Python file in the ETL folder and the API folder to transform the data and load it into the database.

2. Run each file containing function definitions for the front end and visualizations.

3. Run the Front End menu and the Visualization menu for the required output.
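
Step 1 can be automated with a small runner script. The sketch below is an assumption about how one might do it, not part of the repository: it collects the `.py` files in the ETL and API folders and runs them one at a time with the current interpreter. Files run in folder order and then alphabetically, so if the scripts depend on a specific order, run them by hand instead.

```python
import subprocess
import sys
from pathlib import Path


def find_scripts(root, folders=("ETL", "API")):
    """Return the .py files in each folder, in folder order and then by name."""
    scripts = []
    for folder in folders:
        scripts.extend(sorted((Path(root) / folder).glob("*.py")))
    return scripts


def run_scripts(scripts):
    """Run each script with the current Python interpreter."""
    for script in scripts:
        print(f"Running {script}")
        # check=True raises CalledProcessError at the first failing script.
        subprocess.run([sys.executable, str(script)], check=True)


if __name__ == "__main__":
    run_scripts(find_scripts("."))
```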

Folders and files:

API
Cutomer_datavis
ETL
Front End
Loan_Visualization
.gitignore
README.md
requirements.txt
ve_instructions.txt
