This project takes a dataset of laptop specifications and prices, and transforms it into a star schema using dbt.
The goal is to showcase how to go from raw data → preprocessing → database → dbt models → clean data warehouse design.
It’s a great starting project for anyone learning dbt and data modeling.
- Raw dataset
  - `laptops.csv` (original dataset with messy formats)
- Preprocessing (Python/Jupyter)
  - Converted RAM, weight, memory, etc. into numeric columns
  - Extracted CPU/GPU brands, display flags (IPS, Touchscreen, Retina), resolutions, PPI
  - Normalized OS types
  - Exported as `laptop_cutted.csv`
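The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration rather than the notebook's actual code: the raw value formats (`"8GB"`, `"1.37kg"`, a free-text screen description containing `2560x1600`) and all function and key names are assumptions made for the example.

```python
import math
import re


def parse_ram_gb(raw: str) -> int:
    """Turn an assumed raw value such as '8GB' into the integer 8."""
    match = re.fullmatch(r"\s*(\d+)\s*GB\s*", raw, flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"unrecognized RAM value: {raw!r}")
    return int(match.group(1))


def parse_weight_kg(raw: str) -> float:
    """Turn an assumed raw value such as '1.37kg' into the float 1.37."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*kg\s*", raw, flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"unrecognized weight value: {raw!r}")
    return float(match.group(1))


def parse_display(raw: str, inches: float) -> dict:
    """Extract display flags, resolution, and PPI from a free-text description."""
    match = re.search(r"(\d+)\s*x\s*(\d+)", raw)
    if match is None:
        raise ValueError(f"no resolution found in: {raw!r}")
    width, height = int(match.group(1)), int(match.group(2))
    lowered = raw.lower()
    return {
        "is_ips": "ips" in lowered,
        "is_touchscreen": "touchscreen" in lowered,
        "is_retina": "retina" in lowered,
        "res_width": width,
        "res_height": height,
        # PPI = diagonal length in pixels divided by diagonal length in inches
        "ppi": round(math.hypot(width, height) / inches, 2),
    }
```

Raising on unrecognized values, instead of silently returning a default, makes messy rows visible before they reach the staging table.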
- Database (Postgres / pgAdmin)
  - Loaded preprocessed data into a staging table: `stg_laptops`
- dbt source definition
  - Declared `stg_laptops` in `sources.yml`
  - Example: `{{ source('base', 'stg_laptops') }}`
- Staging model
  - `stg_laptops_clean.sql` → cleans and standardizes raw staging data.
- Dimension models
  - `dim_company.sql`, `dim_product.sql`, `dim_cpu.sql`, `dim_gpu.sql`, `dim_os.sql`, `dim_display.sql`, `dim_storage.sql`
  - Each dimension creates a surrogate key (SK) using `md5()` and stores cleaned attributes.
- Fact table
  - `fact_laptop.sql`
  - Grain: one row per laptop
  - Holds foreign keys to each dimension + measures (price, RAM, weight, etc.)
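To make the surrogate-key idea concrete, here is a small Python sketch of how an MD5-based key can tie a dimension row to the fact rows that reference it. It mirrors the pattern only. The actual models do this in SQL with `md5()`, and the normalization rule, column names, and sample rows below are made up for illustration.

```python
import hashlib


def surrogate_key(*parts) -> str:
    """Build a deterministic key by hashing the normalized natural-key parts."""
    normalized = "|".join(
        "" if part is None else str(part).strip().lower() for part in parts
    )
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


# Illustrative staging rows (not taken from the real dataset).
staging_rows = [
    {"laptop_id_nat": 1, "company": "Apple", "price": 1000.0, "ram_gb": 8},
    {"laptop_id_nat": 2, "company": "Dell", "price": 700.0, "ram_gb": 16},
    {"laptop_id_nat": 3, "company": " apple ", "price": 1500.0, "ram_gb": 16},
]

# Dimension: one row per distinct company, keyed by its surrogate key.
dim_company = {}
for row in staging_rows:
    sk = surrogate_key(row["company"])
    dim_company.setdefault(
        sk, {"company_sk": sk, "company_name": row["company"].strip()}
    )

# Fact: one row per laptop, holding the foreign key plus the measures.
fact_laptop = [
    {
        "laptop_id_nat": row["laptop_id_nat"],
        "company_sk": surrogate_key(row["company"]),
        "price": row["price"],
        "ram_gb": row["ram_gb"],
    }
    for row in staging_rows
]
```

Because the key is derived from the attribute values themselves, the dimension and the fact table can compute it independently and still agree, which is what lets each model be built from the staging data without a lookup step.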
- Tests
  - Defined in `schema.yml`
  - Ensures:
    - SKs are unique & not null
    - Fact table foreign keys correctly map to dimensions
  - ✅ All tests passed
- ERD
  - Star schema diagram created with DBML
  - File: `docs/laptops_erd.dbml`
  - Rendered ERD (example below):
- Fact Table
  - `fact_laptop` → One row per laptop. Links to all dimensions and contains measures (price, RAM, weight, etc.).
- Dimension Tables
  - `dim_company` → Laptop brand/manufacturer (Apple, Dell, HP, etc.).
  - `dim_product` → Product model and category (MacBook Pro, Ultrabook, Notebook, etc.).
  - `dim_cpu` → CPU brand and generation/family (Intel i5, i7, Ryzen, etc.).
  - `dim_gpu` → GPU brand and type (NVIDIA GeForce, Intel Iris, AMD Radeon, etc.).
  - `dim_os` → Operating system (Windows, macOS, Linux, No OS).
  - `dim_display` → Screen attributes (size, resolution, IPS, Retina, Touchscreen).
  - `dim_storage` → Storage breakdown (HDD, SSD, Hybrid, Flash capacities).
Together they form a star schema for analyzing laptops by company, product, CPU/GPU, OS, storage, and display.
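As a rough illustration of the kind of analysis this layout supports, the sketch below builds a tiny in-memory copy of the fact table and two dimensions with Python's `sqlite3` and runs a typical star join. The column names (`company_sk`, `os_sk`, `company_name`, `os_name`, `price`) and the sample rows are assumptions for the example. The real models live in Postgres and may name things differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dim_company (company_sk TEXT PRIMARY KEY, company_name TEXT);
    CREATE TABLE dim_os (os_sk TEXT PRIMARY KEY, os_name TEXT);
    CREATE TABLE fact_laptop (
        laptop_id_nat INTEGER PRIMARY KEY,
        company_sk TEXT,
        os_sk TEXT,
        price REAL
    );

    -- Illustrative rows only, not real data
    INSERT INTO dim_company VALUES ('c1', 'Apple'), ('c2', 'Dell');
    INSERT INTO dim_os VALUES ('o1', 'macOS'), ('o2', 'Windows');
    INSERT INTO fact_laptop VALUES
        (1, 'c1', 'o1', 1000.0),
        (2, 'c1', 'o1', 2000.0),
        (3, 'c2', 'o2', 500.0);
    """
)

# Star join: the fact table sits in the middle, each dimension joins on its SK.
star_query = """
    SELECT c.company_name, o.os_name, COUNT(*) AS laptops, AVG(f.price) AS avg_price
    FROM fact_laptop AS f
    JOIN dim_company AS c ON c.company_sk = f.company_sk
    JOIN dim_os AS o ON o.os_sk = f.os_sk
    GROUP BY c.company_name, o.os_name
    ORDER BY c.company_name, o.os_name
"""
results = conn.execute(star_query).fetchall()
for company_name, os_name, laptops, avg_price in results:
    print(company_name, os_name, laptops, avg_price)
```

Adding another dimension to the analysis is just one more `JOIN` on its surrogate key; the fact table itself does not change.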
Tests included:
- Unique & not null constraints on all dimension SKs
- Fact → Dim relationships for referential integrity
- Fact grain check: `laptop_id_nat` unique & not null
All tests passed, ensuring the schema is clean and reliable.
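For readers new to dbt tests, the checks listed above boil down to queries that look for offending rows, where a test passes when the query returns nothing. The sketch below reproduces that idea with Python's `sqlite3`. It is not dbt's own implementation: the helper names are hypothetical, the `company_sk` column is assumed, and the sample data includes a deliberately orphaned foreign key to show a failure.

```python
import sqlite3


def not_null_failures(conn, table, column):
    """Rows where the column is NULL (an empty result means the check passes)."""
    sql = f"SELECT {column} FROM {table} WHERE {column} IS NULL"
    return conn.execute(sql).fetchall()


def unique_failures(conn, table, column):
    """Values that appear more than once in the column."""
    sql = (
        f"SELECT {column}, COUNT(*) FROM {table} "
        f"WHERE {column} IS NOT NULL GROUP BY {column} HAVING COUNT(*) > 1"
    )
    return conn.execute(sql).fetchall()


def relationship_failures(conn, child, child_col, parent, parent_col):
    """Child keys that have no matching row in the parent (dimension) table."""
    sql = (
        f"SELECT c.{child_col} FROM {child} AS c "
        f"LEFT JOIN {parent} AS p ON p.{parent_col} = c.{child_col} "
        f"WHERE c.{child_col} IS NOT NULL AND p.{parent_col} IS NULL"
    )
    return conn.execute(sql).fetchall()


# Table and column names are interpolated directly, so pass trusted names only.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dim_company (company_sk TEXT);
    CREATE TABLE fact_laptop (laptop_id_nat INTEGER, company_sk TEXT);
    INSERT INTO dim_company VALUES ('a'), ('b');
    -- 'zzz' is a deliberate orphan: it has no row in dim_company
    INSERT INTO fact_laptop VALUES (1, 'a'), (2, 'b'), (3, 'zzz');
    """
)

grain_null = not_null_failures(conn, "fact_laptop", "laptop_id_nat")
grain_dupes = unique_failures(conn, "fact_laptop", "laptop_id_nat")
orphans = relationship_failures(
    conn, "fact_laptop", "company_sk", "dim_company", "company_sk"
)
```

Here the grain checks come back empty while `orphans` contains the `'zzz'` key, which is the situation the Fact → Dim relationship tests are meant to catch.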
- Clone this repo

      git clone https://github.com/sshossen/laptop_dbt.git
      cd laptop_dbt

- Create a virtual environment and install dependencies

      python3 -m venv venv
      source venv/bin/activate   # Mac/Linux
      # or: venv\Scripts\activate   # Windows
      pip install -r requirements.txt

- Configure dbt connection (edit your `profiles.yml`)

  Example for Postgres:

      laptop_dbt:
        target: dev
        outputs:
          dev:
            type: postgres
            host: localhost
            user: your_user
            password: your_password
            port: 5432
            dbname: your_database
            schema: public

- Run dbt commands

      dbt run            # build models
      dbt test           # run tests
      dbt docs generate  # build docs locally
      dbt docs serve     # serve docs locally
