bbl-dres/kennwerte-db

kennwerte-db — PDFs to structured construction cost benchmarks

kennwerte-db

License: MIT · SQLite · JavaScript · Python · MapLibre GL JS · GitHub Pages

Open-source construction cost benchmark database for Swiss public buildings.
It collects, structures, and presents cost benchmarks (Kennwerte) from realised projects: CHF/m² GF (gross floor area, Geschossfläche), CHF/m³ GV (building volume, Gebäudevolumen), and BKP/eBKP-H cost breakdowns.
The goal is to support early-stage cost estimation and portfolio-level cost analysis.

Live Demo

Screenshots: gallery view, detail view

Features

  • Gallery, List, Map, Dashboard — four views for browsing and comparing construction projects
  • Detail view — SIA 416 volumes/areas, eBKP-H and BKP cost breakdowns, peer comparison box plots, image gallery with lightbox
  • Scatter plot — GF vs. CHF/m² GF across the full dataset, colored by construction type
  • Filters — by source, category, canton, construction type, country, data quality, year range, GF range
  • Click-to-filter tags — click any tag on a card or detail view to filter the dataset
  • Cost estimator — quick benchmark-based estimates using filtered comparison sets
  • Full-text search across projects, locations, and architects
  • Runs entirely in the browser — no server, no build step
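
The cost estimator described above can be illustrated with a small sketch. This is not the code in `js/detail.js`. It assumes the estimate is simply the target gross floor area multiplied by the quartiles of CHF/m² GF in the filtered comparison set. The function name `estimate_cost` and all figures are made up for illustration.

```python
from statistics import median, quantiles

def estimate_cost(gf_m2, peer_chf_per_m2):
    """Return a (low, mid, high) cost estimate in CHF.

    Assumption: low/mid/high are the 25th, 50th and 75th percentile of the
    peers' CHF/m2 GF, scaled by the target gross floor area.
    """
    if len(peer_chf_per_m2) < 2:
        raise ValueError("need at least two comparison projects")
    q1, _, q3 = quantiles(peer_chf_per_m2, n=4, method="inclusive")
    mid = median(peer_chf_per_m2)
    return (gf_m2 * q1, gf_m2 * mid, gf_m2 * q3)

# Illustrative comparison set (CHF/m2 GF), not real project data.
peers = [3000.0, 3500.0, 4000.0, 4500.0, 5000.0]
low, mid, high = estimate_cost(2000.0, peers)
print(f"{low:,.0f} / {mid:,.0f} / {high:,.0f} CHF")
```

The same quartiles are what a box plot of the comparison set would show, so an estimate of this kind lines up with the peer comparison box plots in the detail view.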

Data Sources

Data is extracted from publicly available Bautendokumentationen (building documentation reports) published by Swiss federal, cantonal, and municipal building authorities.

| Source | Organisation | Documents | URL |
|---|---|---|---|
| BBL | Bundesamt für Bauten und Logistik | ~144 | bbl.admin.ch |
| armasuisse | armasuisse Immobilien (VBS) | ~53 | ar.admin.ch |
| Stadt Zürich | Hochbaudepartement | ~36 | stadt-zuerich.ch |
| Stadt Bern | Hochbau Stadt Bern | ~55 | bern.ch |
| Stadt St. Gallen | Hochbauamt | ~78 | stadt.sg.ch |
| Kanton Aargau | Immobilien Aargau | ~7 | ag.ch |

Extraction Pipeline

The pipeline converts PDF Bautendokumentationen into structured benchmark data in two stages:

Stage 1: PDF to Markdown — Each PDF is converted to clean Markdown using PyMuPDF4LLM (for text-layer PDFs) or Docling (IBM, for scanned/complex-layout PDFs). This preserves table structure — critical for BKP cost breakdowns — and produces human-readable intermediate files.
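
A minimal sketch of how Stage 1 might route a PDF to one of the two converters. It is not the contents of `scripts/pdf_to_markdown.py`. The `choose_converter` heuristic, its `min_chars` threshold, and the function names are assumptions for illustration. The two library calls (`pymupdf4llm.to_markdown` and Docling's `DocumentConverter().convert(...).document.export_to_markdown()`) follow the libraries' documented basic usage and are imported lazily, so the snippet loads without either package installed.

```python
def choose_converter(text_chars_per_page, min_chars=200):
    """Pick a converter from per-page character counts of the PDF text layer.

    Hypothetical heuristic: if more than half the pages carry almost no
    extractable text, treat the document as scanned and use Docling.
    """
    if not text_chars_per_page:
        return "docling"
    sparse = sum(1 for n in text_chars_per_page if n < min_chars)
    if sparse > len(text_chars_per_page) / 2:
        return "docling"
    return "pymupdf4llm"

def convert_pdf(pdf_path, converter):
    """Convert one PDF to a Markdown string with the chosen library."""
    if converter == "pymupdf4llm":
        import pymupdf4llm  # third-party, imported only when needed
        return pymupdf4llm.to_markdown(str(pdf_path))
    from docling.document_converter import DocumentConverter  # third-party
    result = DocumentConverter().convert(str(pdf_path))
    return result.document.export_to_markdown()

# A mostly text-based document versus a mostly scanned one.
print(choose_converter([1500, 1800, 40]))
print(choose_converter([0, 12, 900]))
```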

Stage 2: Markdown to structured data — Markdown tables are parsed to extract BKP/eBKP-H costs, SIA 416 quantities (GF, GV, NGF, HNF), project metadata (architect, client, timeline), and benchmarks (CHF/m², CHF/m³). Multi-language support for German, French, and Italian documents.
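
To illustrate the kind of parsing Stage 2 involves, the sketch below reads BKP rows from a Markdown pipe table and converts Swiss-formatted amounts such as `8'400'000` to numbers. It is a simplified example, not the logic in `scripts/extract_from_markdown.py`. The column layout (code, label, amount), the helper names, and the sample figures are assumptions, and real documents will vary far more than this.

```python
import re

def parse_chf(text):
    """Parse amounts like "1'234'567" or "CHF 1 234 567.50"; None if not numeric."""
    cleaned = re.sub(r"[’'\s]|CHF|Fr\.", "", text)
    if re.fullmatch(r"-?\d+(\.\d+)?", cleaned):
        return float(cleaned)
    return None

def parse_bkp_table(markdown):
    """Return (code, label, amount) for table rows whose first cell is a BKP code."""
    rows = []
    for line in markdown.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue
        cells = [c.strip() for c in line.strip("|").split("|")]
        # Skips header and separator rows, which have no numeric code.
        if len(cells) < 3 or not re.fullmatch(r"\d{1,3}", cells[0]):
            continue
        amount = parse_chf(cells[-1])
        if amount is not None:
            rows.append((cells[0], cells[1], amount))
    return rows

# Illustrative table, not taken from a real Bautendokumentation.
sample = """
| BKP | Bezeichnung | Kosten CHF |
|-----|-------------|------------|
| 1 | Vorbereitungsarbeiten | 250'000 |
| 2 | Gebäude | 8'400'000 |
| 4 | Umgebung | 350'000 |
"""
bkp_rows = parse_bkp_table(sample)
print(bkp_rows)
```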

The extracted data is loaded into a SQLite database that runs entirely in the browser via sql.js.
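
As a rough illustration of the loading step, the sketch below writes two projects into an in-memory SQLite database and derives the CHF/m² GF and CHF/m³ GV benchmarks with plain SQL. The table and column names are hypothetical (the actual schema is described in docs/DATAMODEL.md), and the project rows are invented. Because sql.js is SQLite compiled to WebAssembly, a query of this form should run in the browser as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical, simplified schema for illustration only.
conn.execute("""
    CREATE TABLE project (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        canton TEXT,
        gf_m2 REAL,           -- gross floor area (SIA 416 GF)
        gv_m3 REAL,           -- building volume (SIA 416 GV)
        total_cost_chf REAL
    )
""")
conn.executemany(
    "INSERT INTO project (name, canton, gf_m2, gv_m3, total_cost_chf) "
    "VALUES (?, ?, ?, ?, ?)",
    [
        ("Example school", "BE", 2000.0, 8000.0, 9_000_000.0),
        ("Example office", "ZH", 5000.0, 20000.0, 20_000_000.0),
    ],
)

# Benchmarks are cost divided by the reference quantity.
rows = conn.execute("""
    SELECT name,
           total_cost_chf / gf_m2 AS chf_per_m2_gf,
           total_cost_chf / gv_m3 AS chf_per_m3_gv
    FROM project
    WHERE gf_m2 > 0 AND gv_m3 > 0
    ORDER BY name
""").fetchall()

for name, per_m2, per_m3 in rows:
    print(f"{name}: {per_m2:.0f} CHF/m2 GF, {per_m3:.0f} CHF/m3 GV")
```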

See docs/PIPELINE.md for full technical details.

Tech Stack

Frontend

| Library | Purpose | License |
|---|---|---|
| Vanilla JS (ES6+) | Application logic, no framework | |
| CSS Custom Properties | Design token system | |
| sql.js | SQLite compiled to WebAssembly, runs in browser | MIT |
| MapLibre GL JS | Interactive vector map with clustering | BSD-3 |
| Google Material Icons | UI iconography | Apache 2.0 |

Extraction Pipeline

| Library | Purpose | License |
|---|---|---|
| PyMuPDF | PDF text and image extraction | AGPL |
| PyMuPDF4LLM | PDF to Markdown conversion | AGPL |
| Docling | Advanced PDF parsing with table extraction (IBM) | MIT |
| Tesseract OCR | OCR for scanned pages (DE/FR/IT) | Apache 2.0 |
| Pillow | Image processing | MIT-like |

APIs

| Service | Purpose |
|---|---|
| geo.admin.ch | Geocoding Swiss addresses (free, no key) |
| Geoapify | Geocoding international addresses |
| CARTO | Base map tiles via MapLibre |
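
A hedged sketch of how a Swiss address could be geocoded through geo.admin.ch. It assumes the public `SearchServer` endpoint with `type=locations`, and a JSON response whose `results[].attrs` carry `lat` and `lon`. The helper names and the sample payload are invented, and the repository's own scripts may call the service differently. No request is sent here; the snippet only builds the URL and reads a response-shaped dictionary.

```python
from urllib.parse import urlencode

SEARCH_URL = "https://api3.geo.admin.ch/rest/services/api/SearchServer"

def build_search_url(address, limit=1):
    """Build a location search URL for a free-text address."""
    params = {"searchText": address, "type": "locations", "limit": limit}
    return f"{SEARCH_URL}?{urlencode(params)}"

def first_coordinates(payload):
    """Return (lat, lon) of the first hit, or None if there is none."""
    results = payload.get("results", [])
    if not results:
        return None
    attrs = results[0].get("attrs", {})
    if "lat" not in attrs or "lon" not in attrs:
        return None
    return (attrs["lat"], attrs["lon"])

url = build_search_url("Bundesplatz 3, 3003 Bern")
print(url)

# Response-shaped sample with made-up coordinates.
sample_payload = {"results": [{"attrs": {"label": "Example", "lat": 46.9, "lon": 7.4}}]}
print(first_coordinates(sample_payload))
```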

Project Structure

index.html                  Single-page application
css/
  tokens.css                Design tokens (colors, spacing, typography)
  styles.css                Component styles
js/
  db.js                     Database queries
  utils.js                  Shared state, formatting, tag helpers
  views.js                  Gallery, list, map, dashboard renderers
  detail.js                 Detail view, cost tables, carousel, estimator
  main.js                   App init, routing, filters
data/
  kennwerte.db              SQLite database
  pdfs/                     Source PDFs (gitignored)
  markdown/                 Converted markdown (gitignored)
scripts/
  pdf_to_markdown.py        Stage 1: PDF to Markdown
  extract_from_markdown.py  Stage 2: Markdown to structured data
  extract.py                Legacy: direct PDF to DB (v1 regex)
  extract_all.py            Batch wrapper
  download_*.sh/py          Per-source download scripts
docs/
  PIPELINE.md               Extraction pipeline architecture
  DATAMODEL.md              Entity model and field definitions
  SOURCES.md                Data source inventory
  REQUIREMENTS.md           Requirements specification

Documentation

  • PIPELINE.md — extraction architecture, tool comparison, processing flow
  • DATAMODEL.md — entity model, field definitions, SIA 416 quantities
  • SOURCES.md — data source inventory with URLs and status
  • REQUIREMENTS.md — functional and non-functional requirements

License

MIT — Digital Real Estate and Support