Welcome to the Netflix dbt Demo - your gateway to mastering modern data engineering with Snowflake, AWS S3, and dbt! This project demonstrates how to build a robust, scalable data pipeline using industry-standard tools.
This project is your complete guide to:
- Data Staging: Load data from AWS S3 into Snowflake
- Schema Design: Create tables, views, and define proper schemas in Snowflake
- dbt Mastery: Explore the most popular dbt features, including:
  - Models (staging, dimension, fact, and mart layers)
  - Seeds (static reference data)
  - Snapshots (slowly changing dimensions)
  - Macros (reusable SQL logic)
  - Tests (data quality validation)
  - Documentation (auto-generated docs)
  - Lineage (visual data flow)
We're using the famous MovieLens 20M Dataset, which includes:
- 27,000 movies with rich metadata
- 138,000 users and their preferences
- 20 million ratings (the heart of our analysis!)
- 465,000 tag applications for content categorization
- Tag genome data with 12 million relevance scores across 1,100 tags

Perfect for learning data modeling and analytics!
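Getting the raw files into S3 is the first step of the pipeline. A minimal sketch, assuming the AWS CLI, `curl`, and `unzip` are installed; the S3 bucket path is hypothetical, and the archive URL is the GroupLens download location for MovieLens 20M. The `run` helper and its `DRY_RUN=1` switch are illustrative additions that print commands instead of executing them.

```bash
#!/usr/bin/env bash
set -eu

# Source archive and (hypothetical) S3 destination; override via environment.
MOVIELENS_URL="${MOVIELENS_URL:-https://files.grouplens.org/datasets/movielens/ml-20m.zip}"
S3_TARGET="${S3_TARGET:-s3://your-bucket/movielens/}"  # hypothetical bucket

# Execute a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Download MovieLens 20M, unpack it, and upload only the CSV files to S3.
stage_movielens() {
  local workdir="${1:-./data}"
  run mkdir -p "$workdir"
  run curl -fSL -o "$workdir/ml-20m.zip" "$MOVIELENS_URL"
  run unzip -o "$workdir/ml-20m.zip" -d "$workdir"
  # The archive unpacks into ml-20m/; copy movies, ratings, tags, links, genome-* CSVs.
  run aws s3 cp "$workdir/ml-20m/" "$S3_TARGET" --recursive --exclude "*" --include "*.csv"
}
```

Running `DRY_RUN=1 stage_movielens ./data` prints the four commands without touching the network; drop `DRY_RUN` to execute them. Loading the files from S3 into Snowflake (stage plus `COPY INTO`) happens on the Snowflake side afterwards.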
```text
netflix_dbt/
├── models/
│   ├── staging/    # Raw data transformation
│   ├── dim/        # Dimension tables
│   ├── fact/       # Fact tables
│   └── mart/       # Business-ready datasets
├── seeds/          # Reference data
├── snapshots/      # SCD tracking
├── macros/         # Reusable logic
├── tests/          # Data quality tests
└── docs/           # Auto-generated documentation
```
Interactive documentation with column descriptions, tests, and lineage graphs
Your data warehouse in action - tables, views, and query results
Prerequisites:
- macOS (this guide is Mac-focused)
- VS Code or Cursor (we recommend Cursor for the best dbt experience!)
- Python 3 with pip (to install dbt-snowflake)
- AWS account with S3 access
- Snowflake account (free trial available)
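dbt does not read the `SNOWFLAKE_*` variables exported during setup on its own: connection settings come from a `profiles.yml` file, which can pull values from the environment with dbt's `env_var()` function. A minimal sketch, assuming the project's `dbt_project.yml` declares `profile: netflix_dbt` (check the actual name) and password authentication; `write_dbt_profile` is a hypothetical helper that writes to `$DBT_PROFILES_DIR` if set, otherwise `~/.dbt/`.

```bash
#!/usr/bin/env bash
set -eu

# Write a dbt profile whose Snowflake settings all come from environment variables.
# "netflix_dbt" is an assumed profile name; it must match `profile:` in dbt_project.yml.
write_dbt_profile() {
  local dir="${DBT_PROFILES_DIR:-$HOME/.dbt}"
  mkdir -p "$dir"
  # The quoted 'EOF' stops the shell from expanding anything; dbt renders
  # the {{ env_var(...) }} calls itself every time it runs.
  cat > "$dir/profiles.yml" <<'EOF'
netflix_dbt:
  target: dev
  outputs:
    dev:
      type: snowflake
      account: "{{ env_var('SNOWFLAKE_ACCOUNT') }}"
      user: "{{ env_var('SNOWFLAKE_USER') }}"
      password: "{{ env_var('SNOWFLAKE_PASSWORD') }}"
      warehouse: "{{ env_var('SNOWFLAKE_WAREHOUSE') }}"
      database: "{{ env_var('SNOWFLAKE_DATABASE') }}"
      schema: "{{ env_var('SNOWFLAKE_SCHEMA') }}"
      threads: 4
EOF
  # Print the path so callers know where the profile landed.
  echo "$dir/profiles.yml"
}
```

Because the values are resolved at run time, the file never contains your password, and `dbt debug` reports a clear error if one of the variables is missing. Note that this overwrites any existing `profiles.yml` in that directory.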
```bash
# Set your AWS credentials
export AWS_ACCESS_KEY_ID="your_access_key_here"
export AWS_SECRET_ACCESS_KEY="your_secret_key_here"
export AWS_DEFAULT_REGION="us-east-1"  # or your preferred region
```

```bash
# Set your Snowflake credentials
export SNOWFLAKE_USER="your_username"
export SNOWFLAKE_PASSWORD="your_password"
export SNOWFLAKE_ACCOUNT="your_account_identifier"
export SNOWFLAKE_WAREHOUSE="COMPUTE_WH"
export SNOWFLAKE_DATABASE="MOVIELENS"
export SNOWFLAKE_SCHEMA="DEV"
```

If you use VS Code:

```bash
# Install dbt
pip install dbt-snowflake
# Install the dbt Power User extension:
# search for "dbt Power User" in VS Code extensions
```

If you use Cursor:

```bash
# Install dbt
pip install dbt-snowflake
# Cursor has excellent dbt support built-in!
# Just open your dbt project and start coding
```

Then run the project:

```bash
# Clone and navigate to the project
git clone <your-repo-url>
cd netflix_dbt_demo/netflix_dbt
# Install dbt packages
dbt deps
# Test your connection
dbt debug
# Load seed data first, so models that ref() a seed can build
dbt seed
# Run the full pipeline
dbt run
# Run tests
dbt test
# Generate documentation
dbt docs generate
dbt docs serve
```

Models:
- Staging Layer: Clean and standardize raw data
- Dimension Tables: `dim_movies`, `dim_users`, `dim_genome_tags`
- Fact Tables: `fact_ratings`, `fact_genome_scores`
- Mart Layer: `mart_movie_releases` for business analytics
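Because each layer builds on the one before it, you can rebuild one layer, or one mart together with everything it depends on, using dbt's node selection. A sketch with illustrative helper names: `path:` is dbt's directory-based selector, the `+` prefix selects a model plus all of its upstream parents, and the folder names follow the project tree. The `run` helper and `DRY_RUN=1` switch only print the command.

```bash
#!/usr/bin/env bash
set -eu

# Execute a command, or only print it when DRY_RUN=1 (illustrative helper).
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then echo "+ $*"; else "$@"; fi
}

# Build a single layer by its folder under models/.
build_layer() {
  case "$1" in
    staging|dim|fact|mart) run dbt build --select "path:models/$1" ;;
    *) echo "unknown layer: $1" >&2; return 2 ;;
  esac
}

# Build a model and every model, seed, and snapshot upstream of it.
build_with_parents() {
  run dbt build --select "+$1"
}
```

For example, `build_with_parents mart_movie_releases` rebuilds the mart and its staging, dimension, and fact inputs in dependency order. `dbt build` also runs the tests attached to the selected nodes; swap in `dbt run` if you only want to materialize models.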
Seeds:
- Reference Data: Movie release dates and other static data
- Perfect for lookup tables and configuration data
Snapshots:
- SCD Type 2: Track changes to tags over time
- Preserve history: Keep previous versions of each record instead of overwriting them
Macros:
- Reusable Logic: Jinja macros that generate SQL you can call from any model
- DRY Principle: Don't repeat yourself!
Tests:
- Data Quality: Ensure data integrity
- Business Rules: Validate rating ranges and key uniqueness
- Custom Tests: Tailored to your specific needs
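The rating-range rule can also be sanity-checked on the raw file before it reaches Snowflake. This is a sketch, not the project's dbt test: it assumes the MovieLens `ratings.csv` layout (`userId,movieId,rating,timestamp` with a header row) and the dataset's 0.5 to 5.0 half-star scale. `check_rating_range` is a hypothetical helper that prints each offending line and exits non-zero if any are found.

```bash
#!/usr/bin/env bash
set -eu

# Flag ratings outside 0.5..5.0 or not on a half-star step.
# Assumes column 3 holds the rating and line 1 is the header.
check_rating_range() {
  awk -F',' '
    NR == 1 { next }                       # skip the header row
    {
      r = $3 + 0                           # coerce the field to a number
      if (r < 0.5 || r > 5.0 || (r * 2) != int(r * 2)) {
        bad++
        printf "line %d: invalid rating %s\n", NR, $3
      }
    }
    END { exit (bad > 0) }                 # non-zero exit if anything was flagged
  ' "$1"
}
```

Running it as `check_rating_range data/ml-20m/ratings.csv` before the upload catches malformed rows early; the dbt tests remain the authoritative check once the data is loaded.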
Documentation:
- Auto-Generated: Built from the descriptions in your YAML files
- Interactive: Click through lineage graphs
- Rich Descriptions: Column-level documentation
What you can explore:
- Movie Analytics: Find the most popular movies by genre
- Rating Patterns: Analyze user rating behaviors
- Tag Evolution: Track how movie tags change over time
- Data Quality: Ensure your data is clean and reliable
- Incremental Processing: Handle large datasets efficiently
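Before running the commands below, a quick preflight check can catch a missing credential earlier than `dbt debug` would. A minimal sketch: `require_env` and `SNOWFLAKE_VARS` are hypothetical names, and the variable list mirrors the Snowflake exports in the setup steps.

```bash
#!/usr/bin/env bash
set -eu

# Report every listed variable that is unset or empty in the environment.
# Returns non-zero if any are missing.
require_env() {
  local missing=0 name
  for name in "$@"; do
    # printenv only sees exported variables, which is also all dbt can see.
    if [ -z "$(printenv "$name" || true)" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# The Snowflake variables exported in the setup step.
SNOWFLAKE_VARS="SNOWFLAKE_USER SNOWFLAKE_PASSWORD SNOWFLAKE_ACCOUNT SNOWFLAKE_WAREHOUSE SNOWFLAKE_DATABASE SNOWFLAKE_SCHEMA"
```

Use it as `require_env $SNOWFLAKE_VARS && dbt debug` (left unquoted on purpose so the list splits into separate names). A variable assigned without `export` is reported as missing, matching what dbt itself would see.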
Common commands:

```bash
# Run specific models
dbt run --select staging
dbt run --select dim_movies
# Run tests
dbt test
dbt test --select fact_ratings
# Generate and serve docs
dbt docs generate
dbt docs serve
# Create snapshots
dbt snapshot
# Seed data
dbt seed
```

This project is inspired by the excellent work of Darshil Parmar and his comprehensive YouTube course:
Complete dbt Course on YouTube
A huge shoutout to Darshil for creating such an amazing learning resource!
Found a bug? Have an idea for improvement? We'd love your contribution!
- Fork the repository
- Create a feature branch
- Make your changes
- Test thoroughly
- Submit a pull request
This project is open source and available under the MIT License.
Acknowledgements:
- MovieLens Research Group for the amazing dataset
- Darshil Parmar for the incredible dbt course
- Snowflake for the powerful data warehouse
- dbt Labs for the amazing transformation tool
- AWS for the cloud infrastructure
Happy Data Engineering!
Remember: The best way to learn data engineering is by building real projects. This demo gives you hands-on experience with industry-standard tools and best practices. Start building, keep learning, and most importantly - have fun!
