This project demonstrates an end-to-end ETL process: extracting data from the open Rick and Morty API, transforming it, and loading the transformed data into an Amazon RDS MySQL database. The project leverages AWS services such as Lambda for automation and S3 for storage. The data is then connected to HeidiSQL for easy querying and analysis.
- Prerequisites
- Environment Setup
- Data Extraction
- Data Transformation
- Data Loading
- Connecting to RDS using HeidiSQL
- Error Handling and Troubleshooting
- Conclusion
## Prerequisites

Before starting, ensure you have the following:
- AWS Account: Access to an AWS account with permissions to create and manage S3, RDS, and Lambda.
- Python: Python 3.8 or higher installed.
- AWS CLI: Installed and configured with your credentials.
- HeidiSQL: Installed on your local machine.
## Environment Setup

- Clone the Repository:

  ```bash
  git clone https://github.com/yourusername/data-engineering.git
  cd data-engineering
  ```

- Create a Virtual Environment:

  ```bash
  python3 -m venv Mustard
  source Mustard/bin/activate
  ```

- Install Dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- AWS Configuration: Run `aws configure` to set up your AWS credentials. Provide your AWS Access Key, Secret Key, Region, and Output format.
## Data Extraction

The data is extracted from the Rick and Morty API's three main endpoints: Characters, Locations, and Episodes. The data is saved as CSV files in an S3 bucket.
Steps:

- Extract Data:
  - The script iterates over the API pages to collect data from each endpoint.
  - Data from each endpoint is collected into three separate DataFrames and written out as `characters.csv`, `locations.csv`, and `episodes.csv`.
- Save Data to S3:
  - The files are saved to the `Rick&Morty/Untransformed/` folder in your S3 bucket.
## Data Transformation

The transformation involves cleaning and restructuring the data to prepare it for loading into the RDS database.
Steps:

- Load Data from S3:
  - The script reads the raw CSV files from the S3 bucket.
- Transform Data:
  - Extracts necessary fields from complex data types (e.g., nested JSON structures).
  - Creates additional DataFrames such as `appearances.csv` to normalize the data.
- Save Transformed Data:
  - The transformed data is saved back to the S3 bucket in the `Rick&Morty/Transformed/` folder.
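To illustrate the normalization step, here is a hedged sketch of how an appearances table can be derived with pandas. Each character record from the API carries a list of episode URLs; exploding that list yields one row per character/episode pair. The column names (`character_id`, `episode_id`) are assumptions for illustration, not the repo's exact schema.

```python
import pandas as pd

def build_appearances(characters):
    """One row per (character, episode) pair, derived from the API's episode URL lists."""
    appearances = characters[["id", "episode"]].explode("episode")
    # Episode references look like .../api/episode/28; keep only the trailing numeric id.
    appearances["episode_id"] = appearances["episode"].str.rsplit("/", n=1).str[1].astype(int)
    return appearances.rename(columns={"id": "character_id"})[["character_id", "episode_id"]]

# Tiny in-memory example standing in for characters.csv:
characters = pd.DataFrame({
    "id": [1, 2],
    "episode": [
        ["https://rickandmortyapi.com/api/episode/1",
         "https://rickandmortyapi.com/api/episode/2"],
        ["https://rickandmortyapi.com/api/episode/1"],
    ],
})
appearances = build_appearances(characters)
# appearances now has three rows: (1, 1), (1, 2), (2, 1)
```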
## Data Loading

This script loads the transformed data into the RDS MySQL database.
Steps:

- Connect to RDS:
  - Uses the `mysql-connector-python` library to connect to the RDS instance.
- Create Tables:
  - Executes SQL scripts to create tables (`Character_Table`, `Episode_Table`, `Appearance_Table`, `Location_Table`) in the database.
- Insert Data:
  - Iterates over the DataFrames and inserts the data into the corresponding tables.
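The load steps can be sketched as below. The DDL columns, connection details, and helper names are illustrative placeholders, not the repo's exact code; `load_all` is shown but deliberately not executed here because it needs a live RDS endpoint.

```python
def insert_sql(table, columns):
    """Build a parameterized INSERT statement for cursor.executemany()."""
    return "INSERT INTO {} ({}) VALUES ({})".format(
        table, ", ".join(columns), ", ".join(["%s"] * len(columns)))

# Illustrative DDL; the real Character_Table likely has more columns.
CREATE_CHARACTER_TABLE = """
CREATE TABLE IF NOT EXISTS Character_Table (
    id INT PRIMARY KEY,
    name VARCHAR(255),
    status VARCHAR(50),
    species VARCHAR(100)
)
"""

def load_dataframe(conn, table, df):
    """Insert every row of a DataFrame into the given table."""
    cursor = conn.cursor()
    cursor.executemany(insert_sql(table, list(df.columns)),
                       [tuple(row) for row in df.itertuples(index=False)])
    conn.commit()

def load_all(host, user, password, frames):
    """Connect to RDS and load each {table_name: DataFrame} entry; not run in this sketch."""
    import mysql.connector  # pip install mysql-connector-python
    conn = mysql.connector.connect(host=host, user=user, password=password,
                                   database="rick-and-morty-db")
    conn.cursor().execute(CREATE_CHARACTER_TABLE)
    for table, df in frames.items():
        load_dataframe(conn, table, df)
    conn.close()
```

Using `executemany` with `%s` placeholders lets the MySQL connector escape values itself, which avoids building SQL by string concatenation.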
## Connecting to RDS using HeidiSQL

- Open HeidiSQL.
- Create a New Session:
  - Select `MySQL` as the connection type.
  - Fill in the connection details:
    - Hostname/IP: Your RDS endpoint.
    - User: `admin` (or your RDS username).
    - Password: Your RDS password.
    - Database: `rick-and-morty-db`.
- Connect:
  - Once connected, you can view the tables and execute queries against your Rick and Morty data.
## Error Handling and Troubleshooting

- NoCredentialsError: Ensure your AWS credentials are correctly configured using `aws configure`.
- AccessDeniedError: Attach the necessary IAM policies to your user or role to allow S3 and RDS operations.
- KeyError: Ensure that the data fields being referenced exist in your DataFrames after transformation.
## Conclusion

This project showcases a complete ETL pipeline using AWS services and Python. The data extracted from the Rick and Morty API is transformed and stored in a MySQL database hosted on AWS RDS. Finally, the data is accessible and queryable through HeidiSQL, making it a robust solution for data analysis.