CoreSheep/netflix-dbt-demo

🎬 Netflix dbt Demo - A Complete Data Engineering Journey! 🚀

Screenshot: dbt Models Overview

Welcome to the Netflix dbt Demo - your gateway to mastering modern data engineering with Snowflake, AWS S3, and dbt! This project demonstrates how to build a robust, scalable data pipeline using industry-standard tools. 🎯

🌟 What You'll Learn

This project is your complete guide to:

  • 🏗️ Data Staging: Load data from AWS S3 into Snowflake
  • 🗄️ Schema Design: Create tables, views, and define proper schemas in Snowflake
  • 🔧 dbt Mastery: Explore all popular dbt features including:
    • 📊 Models (staging, dimension, fact, and mart layers)
    • 🌱 Seeds (static reference data)
    • 📸 Snapshots (slowly changing dimensions)
    • ⚑ Macros (reusable SQL logic)
    • 🧪 Tests (data quality validation)
    • 📚 Documentation (auto-generated docs)
    • 🎨 Lineage (visual data flow)

📊 Dataset: MovieLens 20M

We're using the famous MovieLens 20M Dataset which includes:

  • 🎭 27,000 movies with rich metadata
  • 👥 138,000 users and their preferences
  • ⭐ 20 million ratings (the heart of our analysis!)
  • 🏷️ 465,000 tag applications for content categorization
  • 🧬 Tag genome data with 12 million relevance scores across 1,100 tags

Perfect for learning data modeling and analytics! 🎪
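Before the staging step can load anything into Snowflake, the MovieLens CSVs need to sit in an S3 bucket. The sketch below assumes you have already downloaded and unzipped the ml-20m archive from GroupLens and that the AWS CLI is installed and configured; the helper name `upload_movielens` and the bucket URI are hypothetical placeholders. Setting `DRY_RUN=1` prints the commands instead of running them.

```shell
#!/bin/sh
# Hypothetical helper: copy the ML-20M CSV files from a local folder to S3.
# Usage: upload_movielens <local_dir> <s3_uri>
#   e.g. upload_movielens ./ml-20m s3://your-bucket/movielens  (placeholder bucket)
# With DRY_RUN=1 the aws commands are printed instead of executed.
upload_movielens() {
  src_dir="$1"
  s3_uri="$2"
  for f in movies.csv ratings.csv tags.csv links.csv genome-scores.csv genome-tags.csv; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "aws s3 cp $src_dir/$f $s3_uri/$f"
    else
      # Stop at the first failed upload.
      aws s3 cp "$src_dir/$f" "$s3_uri/$f" || return 1
    fi
  done
}
```

`ratings.csv` is by far the largest file, so that copy takes the longest; `aws s3 ls s3://your-bucket/movielens/` lists what has arrived.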

🏗️ Project Architecture

📁 netflix_dbt/
├── 🗂️ models/
│   ├── 📊 staging/     # Raw data transformation
│   ├── 🎯 dim/         # Dimension tables
│   ├── 📈 fact/        # Fact tables
│   └── 🎪 mart/        # Business-ready datasets
├── 🌱 seeds/           # Reference data
├── 📸 snapshots/       # SCD tracking
├── ⚑ macros/          # Reusable logic
├── 🧪 tests/           # Data quality tests
└── 📚 docs/            # Auto-generated documentation
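If you want to start from an empty folder rather than cloning, the layout above can be recreated with a few `mkdir -p` calls. This is a minimal sketch; `scaffold_dbt_layout` is a hypothetical helper name, and `dbt init` remains the usual way to create a new project together with its `dbt_project.yml`.

```shell
#!/bin/sh
# Hypothetical helper: create the folder layout from the tree above.
# Usage: scaffold_dbt_layout [root]   (root defaults to netflix_dbt)
scaffold_dbt_layout() {
  root="${1:-netflix_dbt}"
  # Model layers first, then the supporting dbt folders.
  for dir in models/staging models/dim models/fact models/mart \
             seeds snapshots macros tests docs; do
    mkdir -p "$root/$dir" || return 1
  done
}
```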

🖼️ Screenshots Gallery

🎯 dbt Models & Lineage & πŸ“š Auto-Generated Documentation

Screenshot: dbt Docs Server. Interactive documentation with column descriptions, tests, and lineage graphs.

❄️ Snowflake Data Warehouse

Screenshot: Snowflake Interface. Your data warehouse in action: tables, views, and query results.

🚀 Quick Start Guide

Prerequisites

  • 🍎 macOS (this guide is Mac-focused)
  • 💻 VS Code or Cursor (we recommend Cursor for the best dbt experience!)
  • 🐍 Python 3 with pip (used to install dbt below)
  • ☁️ AWS Account with S3 access
  • ❄️ Snowflake Account (free trial available)

1. 🔑 Environment Setup

AWS Configuration

# Set your AWS credentials
export AWS_ACCESS_KEY_ID="your_access_key_here"
export AWS_SECRET_ACCESS_KEY="your_secret_key_here"
export AWS_DEFAULT_REGION="us-east-1"  # or your preferred region
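A missing or empty variable usually surfaces later as a confusing connection error. The hypothetical `require_env` helper below fails fast instead, and it works for the Snowflake variables in the next step too.

```shell
#!/bin/sh
# Hypothetical helper: return non-zero (and list the culprits on stderr)
# if any of the named environment variables is unset or empty.
require_env() {
  missing=0
  for name in "$@"; do
    # Indirect lookup of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "Missing environment variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `require_env AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_DEFAULT_REGION && aws sts get-caller-identity` checks both that the variables exist and that AWS accepts the credentials.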

Snowflake Configuration

# Set your Snowflake credentials
export SNOWFLAKE_USER="your_username"
export SNOWFLAKE_PASSWORD="your_password"
export SNOWFLAKE_ACCOUNT="your_account_identifier"
export SNOWFLAKE_WAREHOUSE="COMPUTE_WH"
export SNOWFLAKE_DATABASE="MOVIELENS"
export SNOWFLAKE_SCHEMA="DEV"
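dbt does not pick up these variables on its own; a `profiles.yml` has to reference them through dbt's `env_var()` function. The sketch below writes such a file. The profile name `netflix_dbt` is an assumption (it must match the `profile:` key in the project's `dbt_project.yml`), as are the helper name and `threads: 4`.

```shell
#!/bin/sh
# Hypothetical helper: write profiles.yml into the given folder without
# embedding any secrets; dbt resolves the env_var() calls at runtime.
write_snowflake_profile() {
  profiles_dir="$1"
  target="$profiles_dir/profiles.yml"
  # Refuse to clobber an existing profile file.
  if [ -e "$target" ]; then
    echo "$target already exists; not overwriting" >&2
    return 1
  fi
  mkdir -p "$profiles_dir"
  # The quoted 'EOF' stops the shell from expanding anything in the heredoc.
  cat > "$target" <<'EOF'
netflix_dbt:
  target: dev
  outputs:
    dev:
      type: snowflake
      account: "{{ env_var('SNOWFLAKE_ACCOUNT') }}"
      user: "{{ env_var('SNOWFLAKE_USER') }}"
      password: "{{ env_var('SNOWFLAKE_PASSWORD') }}"
      warehouse: "{{ env_var('SNOWFLAKE_WAREHOUSE') }}"
      database: "{{ env_var('SNOWFLAKE_DATABASE') }}"
      schema: "{{ env_var('SNOWFLAKE_SCHEMA') }}"
      threads: 4
EOF
}
```

`write_snowflake_profile "$HOME/.dbt"` writes to dbt's default location; writing to another folder and exporting `DBT_PROFILES_DIR` to point at it also works. `dbt debug` then confirms the connection.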

2. 🛠️ dbt Installation

For VS Code Users:

# Install dbt
pip install dbt-snowflake

# Install the dbt Power User extension
# Search for "dbt Power User" in VS Code extensions

For Cursor Users (Recommended! 🎯):

# Install dbt
pip install dbt-snowflake

# Cursor is built on VS Code, so extensions such as dbt Power User work here too
# Just open your dbt project and start coding

3. 📦 Project Setup

# Clone and navigate to the project
git clone <your-repo-url>
cd netflix-dbt-demo/netflix_dbt

# Install dbt packages
dbt deps

# Test your connection
dbt debug

# Load seed (reference) data, then build all models
dbt seed
dbt run

# Run tests
dbt test

# Generate documentation
dbt docs generate
dbt docs serve
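`dbt run` on its own builds models only; seeds, snapshots, and tests each have their own command. The hypothetical wrapper below runs them in one sequence: seed data before models, and models before snapshots, on the assumption that the snapshot selects from a staging model. With `DRY_RUN=1` it only prints the commands. dbt's built-in `dbt build` is the alternative: it runs seeds, models, snapshots, and tests together in dependency order.

```shell
#!/bin/sh
# Hypothetical wrapper: run the dbt steps one after another, stopping at the
# first failure. Set DRY_RUN=1 to print the commands instead of running them.
run_pipeline() {
  for step in "deps" "debug" "seed" "run" "snapshot" "test" "docs generate"; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "dbt $step"
    else
      # $step is left unquoted on purpose so "docs generate" becomes two arguments.
      dbt $step || return 1
    fi
  done
}
```

`dbt docs serve` is left out because it keeps running until you stop it.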

🎯 Key dbt Features Demonstrated

📊 Models (The Heart of dbt)

  • Staging Layer: Clean and standardize raw data
  • Dimension Tables: dim_movies, dim_users, dim_genome_tags
  • Fact Tables: fact_ratings, fact_genome_scores
  • Mart Layer: mart_movie_releases for business analytics
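To make the staging layer concrete, here is a sketch of one staging model and the source definition it reads from, written out with a shell heredoc. Every name in it is an assumption for illustration: the source `netflix`, the raw table `raw_movies` in a `RAW` schema, and the model `stg_movies`. Only `movieId`, `title`, and `genres` come from MovieLens itself (they are the columns of `movies.csv`).

```shell
#!/bin/sh
# Hypothetical example: write a sources file plus one staging model.
write_staging_example() {
  dir="${1:-models/staging}"
  mkdir -p "$dir"

  # Declares where the raw data lives so models can use {{ source(...) }}.
  cat > "$dir/sources.yml" <<'EOF'
version: 2

sources:
  - name: netflix
    database: MOVIELENS
    schema: RAW
    tables:
      - name: raw_movies
EOF

  # Staging model: select from the source and rename columns to snake_case.
  cat > "$dir/stg_movies.sql" <<'EOF'
with raw_movies as (
    select * from {{ source('netflix', 'raw_movies') }}
)

select
    movieId as movie_id,
    title,
    genres
from raw_movies
EOF
}
```

Downstream dimension and fact models would then select from `{{ ref('stg_movies') }}` instead of the raw table, which is what lets dbt draw the lineage graph.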

🌱 Seeds

  • Reference Data: Movie release dates and other static data
  • Perfect for lookup tables and configuration data
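`dbt seed` loads every CSV in `seeds/` as a table, so a malformed row makes the load fail. The hypothetical check below catches ragged rows beforehand. It is a sketch that splits on plain commas, so it will miscount rows whose quoted fields contain commas.

```shell
#!/bin/sh
# Hypothetical helper: verify a seed CSV has a header row and that every
# following row has the same number of comma-separated fields.
# Limitation: naive comma splitting; quoted fields containing commas are miscounted.
check_seed_csv() {
  awk -F',' -v f="$1" '
    NR == 1 { cols = NF; next }
    NF != cols {
      printf "%s: line %d has %d fields, expected %d\n", f, NR, NF, cols
      bad = 1
    }
    END {
      if (NR == 0) { print f ": file is empty"; bad = 1 }
      exit bad + 0
    }
  ' "$1"
}
```

For example, `for f in seeds/*.csv; do check_seed_csv "$f"; done` checks every seed file.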

📸 Snapshots

  • SCD Type 2: Track changes to tags over time
  • Keep earlier versions of each record instead of overwriting them
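A snapshot is a SELECT wrapped in a `{% snapshot %}` block; on each `dbt snapshot` run, dbt compares the result with what it stored before and records row versions using `dbt_valid_from` and `dbt_valid_to` columns. The sketch below assumes a staging model `stg_tags` with a unique `tag_id` column and a `tag_timestamp` timestamp column; all three names are hypothetical. MovieLens stores tag times as Unix seconds, so such a staging model would have to convert them to a timestamp first.

```shell
#!/bin/sh
# Hypothetical example: write a timestamp-strategy snapshot of tags.
write_tags_snapshot() {
  dir="${1:-snapshots}"
  mkdir -p "$dir"
  cat > "$dir/snap_tags.sql" <<'EOF'
{% snapshot snap_tags %}

{{
    config(
        target_schema='snapshots',
        unique_key='tag_id',
        strategy='timestamp',
        updated_at='tag_timestamp'
    )
}}

select * from {{ ref('stg_tags') }}

{% endsnapshot %}
EOF
}
```

When a row's `tag_timestamp` moves forward, the next `dbt snapshot` closes the old version by filling in its `dbt_valid_to` and inserts the new version, whose `dbt_valid_to` stays null.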

⚑ Macros

  • Reusable Logic: Jinja macros that generate SQL and are called like functions
  • DRY Principle: Don't repeat yourself!
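A macro is a Jinja template that expands into SQL wherever it is called. As a sketch, the hypothetical macro below builds a query that returns rows where any column is null, using dbt's `adapter.get_columns_in_relation`; a singular test can then apply it to any model, here the project's `dim_movies`. The macro and file names are illustrative.

```shell
#!/bin/sh
# Hypothetical example: a macro that returns rows with a null in any column,
# plus a singular test that applies it to dim_movies.
write_null_check() {
  root="${1:-.}"
  mkdir -p "$root/macros" "$root/tests"

  cat > "$root/macros/no_nulls_in_columns.sql" <<'EOF'
{% macro no_nulls_in_columns(model) %}
    select *
    from {{ model }}
    where
    {%- for col in adapter.get_columns_in_relation(model) %}
        {{ col.column }} is null or
    {%- endfor %}
        false
{% endmacro %}
EOF

  # The test passes when the rendered query returns zero rows.
  cat > "$root/tests/dim_movies_no_nulls.sql" <<'EOF'
{{ no_nulls_in_columns(ref('dim_movies')) }}
EOF
}
```

The trailing `false` closes the chain of `or` conditions, so the macro renders valid SQL whatever the number of columns.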

🧪 Tests

  • Data Quality: Ensure data integrity
  • Business Rules: Validate rating ranges, uniqueness
  • Custom Tests: Tailored to your specific needs
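Built-in generic tests (`unique`, `not_null`, `accepted_values`, `relationships`) are declared in YAML, while a range rule like the rating check fits a singular test: a SQL file in `tests/` that passes when it returns no rows. MovieLens 20M ratings use a 0.5 to 5.0 star scale in half-star steps. The column names `user_id`, `movie_id`, and `rating` on `fact_ratings`, and the file names, are assumptions in this sketch.

```shell
#!/bin/sh
# Hypothetical example: a YAML file with generic tests and a singular range test.
write_rating_tests() {
  root="${1:-.}"
  mkdir -p "$root/models/fact" "$root/tests"

  # Generic tests are attached to columns in YAML.
  cat > "$root/models/fact/fact_ratings.yml" <<'EOF'
version: 2

models:
  - name: fact_ratings
    columns:
      - name: user_id
        tests:
          - not_null
      - name: movie_id
        tests:
          - not_null
          - relationships:
              to: ref('dim_movies')
              field: movie_id
      - name: rating
        tests:
          - not_null
EOF

  # Singular test: every row it returns counts as a failure.
  cat > "$root/tests/assert_rating_in_range.sql" <<'EOF'
select *
from {{ ref('fact_ratings') }}
where rating < 0.5 or rating > 5.0
EOF
}
```

`dbt test --select fact_ratings` then runs both the YAML tests and the singular test, because the singular test refs the model.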

📚 Documentation

  • Auto-Generated: From your YAML files
  • Interactive: Click through lineage graphs
  • Rich Descriptions: Column-level documentation
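Descriptions can be written inline in YAML or kept in Markdown `{% docs %}` blocks and pulled in with `doc()`; both show up after `dbt docs generate`. The description text and file names below are placeholders for illustration, not the project's actual documentation.

```shell
#!/bin/sh
# Hypothetical example: model docs via a Markdown doc block plus inline column text.
write_movie_docs() {
  dir="${1:-models/dim}"
  mkdir -p "$dir"

  # Longer prose lives in a docs block that YAML can reference by name.
  cat > "$dir/dim_movies_docs.md" <<'EOF'
{% docs dim_movies %}
One row per movie from the MovieLens catalog, with its title and genres.
{% enddocs %}
EOF

  cat > "$dir/dim_movies.yml" <<'EOF'
version: 2

models:
  - name: dim_movies
    description: '{{ doc("dim_movies") }}'
    columns:
      - name: movie_id
        description: "MovieLens movie identifier (movieId in movies.csv)."
      - name: genres
        description: "Genres as provided in movies.csv (pipe-separated)."
EOF
}
```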

🎪 Fun Features to Explore

  1. 🎬 Movie Analytics: Find the most popular movies by genre
  2. ⭐ Rating Patterns: Analyze user rating behaviors
  3. 🏷️ Tag Evolution: Track how movie tags change over time
  4. 📊 Data Quality: Ensure your data is clean and reliable
  5. 🔄 Incremental Processing: Handle large datasets efficiently
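Incremental processing in dbt means materializing a model as `incremental`: the first run builds the whole table, and later runs only insert rows that pass an `is_incremental()` filter. A sketch, assuming a hypothetical `stg_ratings` staging model with a `rating_timestamp` column; the model name `fact_ratings_incremental` is also illustrative. No `unique_key` is set, so new rows are simply appended, which assumes ratings are inserted but never updated.

```shell
#!/bin/sh
# Hypothetical example: an append-only incremental model over ratings.
write_incremental_ratings() {
  dir="${1:-models/fact}"
  mkdir -p "$dir"
  cat > "$dir/fact_ratings_incremental.sql" <<'EOF'
{{ config(materialized='incremental', on_schema_change='fail') }}

select
    user_id,
    movie_id,
    rating,
    rating_timestamp
from {{ ref('stg_ratings') }}

{% if is_incremental() %}
-- Only rows newer than the latest one already in this table.
where rating_timestamp > (select max(rating_timestamp) from {{ this }})
{% endif %}
EOF
}
```

`dbt run --full-refresh --select fact_ratings_incremental` rebuilds the table from scratch when the logic changes. Rows that arrive late with a timestamp at or below the current maximum are skipped by this filter, the usual trade-off of a timestamp watermark.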

🛠️ Common Commands

# Run specific models
dbt run --select staging
dbt run --select dim_movies

# Run tests
dbt test
dbt test --select fact_ratings

# Generate and serve docs
dbt docs generate
dbt docs serve

# Create snapshots
dbt snapshot

# Seed data
dbt seed

🎓 Learning Resources

This project is inspired by the excellent work of Darshil Parmar and his comprehensive YouTube course:

🎥 Complete dbt Course on YouTube

A huge shoutout to Darshil for creating such an amazing learning resource! 🙏

🤝 Contributing

Found a bug? Have an idea for improvement? We'd love your contribution!

  1. 🍴 Fork the repository
  2. 🌿 Create a feature branch
  3. 💻 Make your changes
  4. 🧪 Test thoroughly
  5. 📝 Submit a pull request

📄 License

This project is open source and available under the MIT License.

🎉 Acknowledgments

  • 🎬 MovieLens Research Group for the amazing dataset
  • 🎓 Darshil Parmar for the incredible dbt course
  • ❄️ Snowflake for the powerful data warehouse
  • πŸ”§ dbt Labs for the amazing transformation tool
  • ☁️ AWS for the cloud infrastructure

Happy Data Engineering! 🚀✨

Remember: The best way to learn data engineering is by building real projects. This demo gives you hands-on experience with industry-standard tools and best practices. Start building, keep learning, and most importantly - have fun! 🎪
