Skip to content

A dbt project to build analysis ready star schema Includes Jupyter notebooks for analysis and ERD visualization.

License

Notifications You must be signed in to change notification settings

sshossen/laptop_dbt

Repository files navigation

🖥️ Laptop Data Modeling with dbt

📌 Project Idea

This project takes a dataset of laptop specifications and prices, and transforms it into a star schema using dbt.
The goal is to showcase how to go from raw data → preprocessing → database → dbt models → clean data warehouse design.

It’s a great starting project for anyone learning dbt and data modeling.


🔄 Workflow

  1. Raw dataset

    • laptops.csv (original dataset with messy formats)
  2. Preprocessing (Python/Jupyter)

    • Converted RAM, weight, memory, etc. into numeric columns
    • Extracted CPU/GPU brands, display flags (IPS, Touchscreen, Retina), resolutions, PPI
    • Normalized OS types
    • Exported as laptop_cutted.csv
  3. Database (Postgres / pgAdmin)

    • Loaded preprocessed data into a staging table: stg_laptops
  4. dbt source definition

    • Declared stg_laptops in sources.yml
    • Example:
      {{ source('base', 'stg_laptops') }}
  5. Staging model

    • stg_laptops_clean.sql → cleans and standardizes raw staging data.
  6. Dimension models

    • dim_company.sql
    • dim_product.sql
    • dim_cpu.sql
    • dim_gpu.sql
    • dim_os.sql
    • dim_display.sql
    • dim_storage.sql

    Each dimension creates a surrogate key (SK) using md5() and stores cleaned attributes.

  7. Fact table

    • fact_laptop.sql
    • Grain: one row per laptop
    • Holds foreign keys to each dimension + measures (price, RAM, weight, etc.)
  8. Tests

    • Defined in schema.yml
    • Ensures:
      • SKs are unique & not null
      • Fact table foreign keys correctly map to dimensions
    • ✅ All tests passed
  9. ERD

    • Star schema diagram created with DBML
    • File: docs/laptops_erd.dbml
    • Rendered ERD (example below):

    Star Schema ERD


📊 Schema Overview

  • Fact Table

    • fact_laptop → One row per laptop. Links to all dimensions and contains measures (price, RAM, weight, etc.).
  • Dimension Tables

    • dim_company → Laptop brand/manufacturer (Apple, Dell, HP, etc.).
    • dim_product → Product model and category (MacBook Pro, Ultrabook, Notebook, etc.).
    • dim_cpu → CPU brand and generation/family (Intel i5, i7, Ryzen, etc.).
    • dim_gpu → GPU brand and type (NVIDIA GeForce, Intel Iris, AMD Radeon, etc.).
    • dim_os → Operating system (Windows, macOS, Linux, No OS).
    • dim_display → Screen attributes (size, resolution, IPS, Retina, Touchscreen).
    • dim_storage → Storage breakdown (HDD, SSD, Hybrid, Flash capacities).

Together they form a star schema for analyzing laptops by company, product, CPU/GPU, OS, storage, and display.


🧪 Data Quality

Tests included:

  • Unique & not null constraints on all dimension SKs
  • Fact → Dim relationships for referential integrity
  • Fact grain check: laptop_id_nat unique & not null

All tests passed, ensuring the schema is clean and reliable.



🚀 How to Run

  1. Clone this repo
    git clone https://github.com/sshossen/laptop_dbt.git
    cd laptop_dbt
  2. Create a virtual environment and install dependencies
    python3 -m venv venv
    source venv/bin/activate    # Mac/Linux
    # or: venv\Scripts\activate  # Windows
    pip install -r requirements.txt
  3. Configure dbt connection (edit your profiles.yml)
    Example for Postgres:
laptop_dbt:
  target: dev
  outputs:
    dev:
      type: postgres
      host: localhost
      user: your_user
      password: your_password
      port: 5432
      dbname: your_database
      schema: public

4.Run dbt commands

dbt run             # build models
dbt test            # run tests
dbt docs generate   # build docs locally
dbt docs serve      # serve docs locally

About

A dbt project to build analysis ready star schema Includes Jupyter notebooks for analysis and ERD visualization.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages