A hands-on data science workshop that demonstrates building an end-to-end machine learning pipeline using Kedro. This project predicts Singapore HDB (Housing & Development Board) resale prices based on proximity to MRT stations and shopping malls.
- Data Engineering: Extract, clean, and transform housing, transport, and geolocation data
- Feature Engineering: Calculate distances to amenities using geographical coordinates
- Machine Learning: Train a linear regression model to predict property prices
- Data Visualization: Create interactive maps showing Singapore's housing and transport infrastructure
Option A: Using uv (Recommended)

```bash
uv sync
```

Option B: Using pip

```bash
pip install -r requirements.txt
```

To run the complete pipeline:

```bash
kedro run
```

This will:
- Extract and clean HDB resale data, MRT stations, and mall locations
- Generate geographical features (distances to nearest amenities)
- Train a linear regression model
- Create visualizations and performance reports
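Under the hood, `kedro run` executes the project's default pipeline, which chains the four stages together. As a rough sketch (module and function names here are assumptions, not the project's actual source), the pipeline registry might look like:

```python
# src/<package>/pipeline_registry.py -- illustrative sketch, not the actual project file
from kedro.pipeline import Pipeline

from .pipelines import extract, clean, transform, model  # hypothetical pipeline modules


def register_pipelines() -> dict[str, Pipeline]:
    """Register the project's pipelines so `kedro run --pipeline <name>` works."""
    extract_pipeline = extract.create_pipeline()
    clean_pipeline = clean.create_pipeline()
    transform_pipeline = transform.create_pipeline()
    model_pipeline = model.create_pipeline()
    return {
        "extract": extract_pipeline,
        "clean": clean_pipeline,
        "transform": transform_pipeline,
        "model": model_pipeline,
        # `kedro run` with no arguments executes the default pipeline
        "__default__": extract_pipeline + clean_pipeline + transform_pipeline + model_pipeline,
    }
```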
Pipeline Visualization: View the interactive pipeline graph:
```bash
kedro viz
```

Open your browser to see the data flow, pipeline dependencies, and execution status.
Interactive Map: Open the Jupyter notebook to view Singapore's housing locations:
```bash
kedro jupyter notebook
```

Navigate to notebooks/map_view.ipynb to see HDB locations (red), MRT stations (blue), and malls (green) on an interactive map.
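If you want to recreate the map outside the notebook, here is a minimal sketch using folium (an assumption about the mapping library; the file paths and column names are also illustrative):

```python
import folium
import pandas as pd

# Hypothetical cleaned datasets; paths and column names are assumptions
hdb = pd.read_csv("data/02_intermediate/hdb_clean.csv")
mrt = pd.read_csv("data/02_intermediate/mrt_clean.csv")
malls = pd.read_csv("data/02_intermediate/malls_clean.csv")

# Centre the map on Singapore
m = folium.Map(location=[1.3521, 103.8198], zoom_start=12)

# One colour per layer: HDB red, MRT blue, malls green
for df, colour in [(hdb, "red"), (mrt, "blue"), (malls, "green")]:
    for _, row in df.iterrows():
        folium.CircleMarker(
            location=[row["latitude"], row["longitude"]],
            radius=3,
            color=colour,
            fill=True,
        ).add_to(m)

m.save("singapore_map.html")  # or display `m` directly in a notebook cell
```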
Model Outputs: Check the data/08_reporting/ folder for:
- Model performance metrics
- Accessibility heatmap visualization
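The metrics come from a plain linear regression evaluated on held-out data. A hedged sketch of how such a report could be produced (feature columns, file paths, and the train/test split are illustrative assumptions, not the project's exact code):

```python
import json

import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Hypothetical feature table produced by the transform pipeline
features = pd.read_csv("data/04_feature/hdb_features.csv")
X = features[["dist_to_nearest_mrt", "dist_to_nearest_mall"]]  # assumed column names
y = features["resale_price"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

# Kedro would normally route this output through the Data Catalog;
# it is written directly here for clarity
metrics = {
    "mae": float(mean_absolute_error(y_test, y_pred)),
    "r2": float(r2_score(y_test, y_pred)),
}
with open("data/08_reporting/model_metrics.json", "w") as f:
    json.dump(metrics, f, indent=2)
```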
The project is organized into four modular pipelines:

- Extract Pipeline: Fetches HDB resale prices, MRT station data, and mall geodata
- Clean Pipeline: Validates and standardizes the datasets
- Transform Pipeline: Calculates geographical features and distances (see the sketch after this list)
- Model Pipeline: Trains and evaluates the price prediction model
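The transform step's distance features boil down to the great-circle (haversine) distance between coordinate pairs. A minimal sketch, with illustrative names only:

```python
import numpy as np


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * 6371.0 * np.arcsin(np.sqrt(a))  # Earth radius ~6371 km


# e.g. distance from an HDB flat in Bishan to Bishan MRT station
print(haversine_km(1.3508, 103.8486, 1.3513, 103.8492))
```

Each flat's nearest-amenity feature is then simply the minimum of this distance over all MRT stations (or malls).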
Run individual pipelines:

```bash
kedro run --pipeline extract    # Data extraction only
kedro run --pipeline clean      # Data cleaning only
kedro run --pipeline transform  # Feature engineering only
kedro run --pipeline model      # Model training only
```

This workshop showcases several data science and engineering best practices:
- Modular Pipeline Design: Code is organized into reusable, testable pipeline components (extract, clean, transform, model)
- Data Catalog: Centralized data management with automatic loading/saving and format handling (a sample entry is sketched after this list)
- Data Versioning: Automatic versioning of model outputs and datasets for reproducibility
- Configuration Management: Parameters separated from code using YAML configuration files
- Environment Isolation: Dependencies managed with `uv.lock` for reproducible environments
- Testing: Unit tests for pipeline components to ensure code quality
- Documentation: Clear separation between raw, cleaned, and processed data layers
- Visualization: Interactive pipeline exploration with Kedro Viz
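For example, a catalog entry and a parameter block might look like the following (dataset names, paths, and values are illustrative; the project's real configuration lives under conf/base/). Note that the dataset class is spelled `pandas.CSVDataSet` on older Kedro versions:

```yaml
# conf/base/catalog.yml (illustrative entry)
hdb_resale_prices:
  type: pandas.CSVDataset
  filepath: data/01_raw/hdb_resale_prices.csv

# conf/base/parameters.yml (illustrative)
model_options:
  test_size: 0.2
  random_state: 42
```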
Run the unit tests with:

```bash
pytest
```
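A unit test for the haversine helper sketched earlier might look like this (illustrative only; the module path is hypothetical):

```python
# tests/test_distances.py -- illustrative unit test for the assumed haversine_km helper
import pytest

from hdb_workshop.distances import haversine_km  # hypothetical module path


def test_haversine_zero_for_identical_points():
    assert haversine_km(1.3521, 103.8198, 1.3521, 103.8198) == pytest.approx(0.0)


def test_haversine_known_distance():
    # Changi Airport to Jurong East is roughly 28 km as the crow flies
    assert haversine_km(1.3644, 103.9915, 1.3329, 103.7436) == pytest.approx(27.8, abs=2.0)
```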