
felipeyoshi/modern-data-stack

Modern Data Stack with AdventureWorks

Project Overview

This project is part of a data engineering course offered by FIA Business School and is designed to give students hands-on experience with the modern data stack. Students will learn how to ingest, transform, model, and visualize data using a variety of tools and platforms.

Objectives

  • Ingest data from CSV files into a PostgreSQL database.
  • Transfer data from PostgreSQL to Snowflake using Airbyte.
  • Model data in Snowflake using a three-layer architecture (staging, dimension/fact, marts).
  • Create visualizations using Metabase, Plotly, or Streamlit.

Data

We are using the AdventureWorks dataset for this project. The dataset describes a fictional company and covers customers, products and product categories, sales, returns, sales territories, and a calendar lookup.

Files

The dataset consists of the following files:

  • AdventureWorks Calendar Lookup.csv
  • AdventureWorks Customer Lookup.csv
  • AdventureWorks Product Categories Lookup.csv
  • AdventureWorks Product Lookup.csv
  • AdventureWorks Product Subcategories Lookup.csv
  • AdventureWorks Returns Data.csv
  • AdventureWorks Sales Data 2020.csv
  • AdventureWorks Territory Lookup.csv
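The spaces in these file names make them awkward to use directly as PostgreSQL table names. The sketch below shows one possible convention for deriving snake_case table names. The `table_name_for` helper and the resulting names are hypothetical and may not match the SQL scripts used in the course.

```python
import re
from pathlib import Path

# The eight source files listed above.
SOURCE_FILES = [
    "AdventureWorks Calendar Lookup.csv",
    "AdventureWorks Customer Lookup.csv",
    "AdventureWorks Product Categories Lookup.csv",
    "AdventureWorks Product Lookup.csv",
    "AdventureWorks Product Subcategories Lookup.csv",
    "AdventureWorks Returns Data.csv",
    "AdventureWorks Sales Data 2020.csv",
    "AdventureWorks Territory Lookup.csv",
]


def table_name_for(filename: str) -> str:
    """Hypothetical naming convention: drop the extension and the
    'AdventureWorks' prefix, then convert the rest to snake_case."""
    stem = Path(filename).stem                      # "AdventureWorks Sales Data 2020"
    stem = re.sub(r"^AdventureWorks\s+", "", stem)  # "Sales Data 2020"
    return re.sub(r"\W+", "_", stem).strip("_").lower()  # "sales_data_2020"


if __name__ == "__main__":
    for name in SOURCE_FILES:
        print(f"{name:50} -> {table_name_for(name)}")
```

With this convention, for example, `AdventureWorks Product Subcategories Lookup.csv` becomes `product_subcategories_lookup`, and every file maps to a distinct table name.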

Environment Setup

Requirements

  • PostgreSQL
  • Snowflake
  • Airbyte
  • Metabase/Plotly/Streamlit

Installation

Detailed steps to install and configure the necessary software and tools will be provided in separate documents or during classroom sessions.

Database Schema

SQL scripts for creating tables and ingesting data into PostgreSQL are provided. Students are expected to create similar schemas in Snowflake as part of the data modeling exercise.
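If you want to see how such a script might be produced or checked, the minimal sketch below derives a `CREATE TABLE` statement from a CSV header plus a sample of rows, and builds a psql `\copy` command to load the file. The type-inference rules (BIGINT, then NUMERIC, then ISO-format DATE, else TEXT), the function names, and the sample columns in the usage block are assumptions for illustration. They are not the course's provided scripts, and non-ISO dates or currency strings deliberately fall back to TEXT.

```python
import csv
import io
from datetime import date
from itertools import islice


def infer_pg_type(values: list[str]) -> str:
    """Return the narrowest PostgreSQL type that fits every non-empty value."""
    non_empty = [v.strip() for v in values if v.strip()]
    if not non_empty:
        return "TEXT"

    def all_parse(parse) -> bool:
        try:
            for v in non_empty:
                parse(v)
        except ValueError:
            return False
        return True

    if all_parse(int):
        return "BIGINT"
    if all_parse(float):
        return "NUMERIC"
    if all_parse(date.fromisoformat):  # only ISO dates such as 2020-01-31
        return "DATE"
    return "TEXT"  # fallback, e.g. "1/31/2020" or "$90,000"


def build_ddl(table: str, csv_text: str, sample_rows: int = 1000) -> str:
    """Build a CREATE TABLE statement from a CSV header and a sample of rows."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    rows = list(islice(reader, sample_rows))
    columns = []
    for i, name in enumerate(header):
        values = [row[i] if i < len(row) else "" for row in rows]
        # Quoted identifiers preserve the CSV's original column casing.
        columns.append(f'    "{name}" {infer_pg_type(values)}')
    return f"CREATE TABLE {table} (\n" + ",\n".join(columns) + "\n);"


def copy_command(table: str, path: str) -> str:
    """psql meta-command that reads the CSV from the client machine.
    No escaping is done: assumes the path contains no single quotes."""
    return f"\\copy {table} FROM '{path}' WITH (FORMAT csv, HEADER true)"


if __name__ == "__main__":
    # Invented sample; real AdventureWorks columns and values may differ.
    sample = "CustomerKey,FirstName,AnnualIncome\n11000,Jon,90000.50\n11001,Eugene,60000\n"
    print(build_ddl("customer_lookup", sample))
    print(copy_command("customer_lookup", "AdventureWorks Customer Lookup.csv"))
```

For the invented sample, `CustomerKey` is inferred as BIGINT, `FirstName` as TEXT, and `AnnualIncome` as NUMERIC, because one value has a decimal part. Unlike server-side `COPY`, psql's `\copy` reads the file from the machine running psql, which is usually what you want when the CSVs sit on a student laptop.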

Data Transformation and Loading

We will use Airbyte to replicate the PostgreSQL tables into Snowflake and then transform the data in Snowflake following the three-layer architecture: staging, dimension/fact, and marts.
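The three layers can be sketched in plain SQL. This minimal example uses Python's built-in `sqlite3` module with an in-memory database as a stand-in for Snowflake, so it runs without a Snowflake account. The `raw_*`, `stg_*`, `dim_*`, `fct_*` and `mart_*` names, the columns, and the rows are all invented for illustration, loosely modelled on the Product Lookup and Sales Data files. In Snowflake each layer could live in its own schema, and dialect details such as types and date functions would differ.

```python
import sqlite3

# Raw layer: invented stand-ins for tables Airbyte would land in Snowflake.
RAW_DDL = """
CREATE TABLE raw_products (ProductKey INTEGER, ProductName TEXT, ProductPrice REAL);
CREATE TABLE raw_sales (OrderDate TEXT, ProductKey INTEGER, OrderQuantity INTEGER);
"""

# Layer 1, staging: rename to snake_case, trim and cast; no business logic.
STAGING = """
CREATE VIEW stg_products AS
SELECT ProductKey AS product_key,
       TRIM(ProductName) AS product_name,
       CAST(ProductPrice AS REAL) AS product_price
FROM raw_products;
CREATE VIEW stg_sales AS
SELECT DATE(OrderDate) AS order_date,
       ProductKey AS product_key,
       CAST(OrderQuantity AS INTEGER) AS order_quantity
FROM raw_sales;
"""

# Layer 2, dimension/fact: one row per product, one row per sales line.
DIM_FACT = """
CREATE VIEW dim_product AS
SELECT product_key, product_name, product_price FROM stg_products;
CREATE VIEW fct_sales AS
SELECT s.order_date, s.product_key, s.order_quantity,
       s.order_quantity * p.product_price AS revenue
FROM stg_sales AS s
JOIN stg_products AS p ON p.product_key = s.product_key;
"""

# Layer 3, marts: aggregates shaped for a dashboard.
MART = """
CREATE VIEW mart_revenue_by_product AS
SELECT p.product_name,
       SUM(f.order_quantity) AS total_units,
       SUM(f.revenue) AS total_revenue
FROM fct_sales AS f
JOIN dim_product AS p ON p.product_key = f.product_key
GROUP BY p.product_name;
"""


def load_sample_data(conn: sqlite3.Connection) -> None:
    """Create and fill the raw tables with a few invented rows."""
    conn.executescript(RAW_DDL)
    conn.executemany("INSERT INTO raw_products VALUES (?, ?, ?)",
                     [(1, " Road Bike ", 1000.0), (2, "Helmet", 35.0)])
    conn.executemany("INSERT INTO raw_sales VALUES (?, ?, ?)",
                     [("2020-01-05", 1, 2), ("2020-01-06", 2, 4), ("2020-01-07", 1, 1)])


def build_layers(conn: sqlite3.Connection) -> None:
    """Create each layer in order; every layer reads only from the one below."""
    for script in (STAGING, DIM_FACT, MART):
        conn.executescript(script)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_sample_data(conn)
    build_layers(conn)
    query = "SELECT * FROM mart_revenue_by_product ORDER BY total_revenue DESC"
    for row in conn.execute(query):
        print(row)
```

With the invented rows, the mart returns `('Road Bike', 3, 3000.0)` followed by `('Helmet', 4, 140.0)`. Keeping each layer as a separate set of views makes the dependency order explicit. A cleaning rule changed in staging, such as the `TRIM` on product names, flows through to the facts and marts without editing them.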

Data Visualization

Students will use Metabase, Plotly, or Streamlit to create dashboards or visual reports that provide insights into the data. Specific requirements for the visualizations will be provided in the project guidelines.

Contribution

Students are encouraged to contribute to the project by suggesting improvements or identifying bugs. Contributions should be submitted as pull requests to the repository.

Contact Information

For more information or questions about the project, please contact Felipe Yoshimoto at https://www.linkedin.com/in/felipe-yoshimoto-252a04204/.
