Open-source construction cost benchmark database for Swiss public buildings.
Collects, structures, and presents cost Kennwerte (CHF/m² GF, CHF/m³ GV, BKP/eBKP-H breakdowns)
from realised projects to support early-stage cost estimation and portfolio-level cost analysis.
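
As a rough illustration of how the two headline Kennwerte relate to the underlying quantities, the sketch below divides total cost by GF and by GV. The `Project` class, its field names, and the figures are invented for this example and do not reflect the project's actual data model (see docs/DATAMODEL.md for that).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Project:
    """Hypothetical minimal project record; field names are illustrative only."""
    name: str
    cost_chf: float                # total construction cost in CHF
    gf_m2: Optional[float] = None  # Geschossflaeche (gross floor area) in m²
    gv_m3: Optional[float] = None  # Gebaeudevolumen (building volume) in m³

    def chf_per_m2_gf(self) -> Optional[float]:
        """Cost per m² GF, or None when GF is missing or zero."""
        return self.cost_chf / self.gf_m2 if self.gf_m2 else None

    def chf_per_m3_gv(self) -> Optional[float]:
        """Cost per m³ GV, or None when GV is missing or zero."""
        return self.cost_chf / self.gv_m3 if self.gv_m3 else None


# Invented figures, chosen only to keep the arithmetic easy to follow.
school = Project("Example school", cost_chf=24_000_000, gf_m2=6_000, gv_m3=25_000)
print(school.chf_per_m2_gf())  # 4000.0
print(school.chf_per_m3_gv())  # 960.0
```

Returning `None` for missing quantities keeps incomplete projects out of averages instead of silently counting them as zero.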
- Gallery, List, Map, Dashboard — four views for browsing and comparing construction projects
- Detail view — SIA 416 volumes/areas, eBKP-H and BKP cost breakdowns, peer comparison box plots, image gallery with lightbox
- Scatter plot — GF vs. CHF/m² GF across the full dataset, colored by construction type
- Filters — by source, category, canton, construction type, country, data quality, year range, GF range
- Click-to-filter tags — click any tag on a card or detail view to filter the dataset
- Cost estimator — quick benchmark-based estimates using filtered comparison sets
- Full-text search across projects, locations, and architects
- Runs entirely in the browser — no server, no build step
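
One plausible way a benchmark-based estimate can be derived from a filtered comparison set is to multiply the planned GF by the quartiles of the peers' CHF/m² GF values. The sketch below assumes exactly that; the function name, the choice of quartiles, and the sample values are assumptions for illustration, and the application itself implements its estimator in JavaScript, possibly with a different method.

```python
from statistics import quantiles


def estimate_cost(gf_m2, peer_chf_per_m2):
    """Return (low, mid, high) cost estimates in CHF.

    low/mid/high correspond to the 1st quartile, median, and 3rd quartile
    of the comparison set's CHF/m² GF values, scaled by the planned GF.
    """
    if len(peer_chf_per_m2) < 2:
        raise ValueError("need at least two comparison projects")
    q1, median, q3 = quantiles(peer_chf_per_m2, n=4, method="inclusive")
    return gf_m2 * q1, gf_m2 * median, gf_m2 * q3


# Invented benchmark values for five comparable projects (CHF/m² GF).
peers = [3000.0, 3500.0, 4000.0, 4500.0, 5000.0]
low, mid, high = estimate_cost(2000.0, peers)
print(low, mid, high)  # 7000000.0 8000000.0 9000000.0
```

Reporting a range rather than a single number reflects the spread of the comparison set, which is the same information the peer comparison box plots show.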
Data is extracted from publicly available Bautendokumentationen published by Swiss federal, cantonal, and municipal building authorities.
| Source | Organisation | Documents | URL |
|---|---|---|---|
| BBL | Bundesamt für Bauten und Logistik | ~144 | bbl.admin.ch |
| armasuisse | armasuisse Immobilien (VBS) | ~53 | ar.admin.ch |
| Stadt Zürich | Hochbaudepartement | ~36 | stadt-zuerich.ch |
| Stadt Bern | Hochbau Stadt Bern | ~55 | bern.ch |
| Stadt St. Gallen | Hochbauamt | ~78 | stadt.sg.ch |
| Kanton Aargau | Immobilien Aargau | ~7 | ag.ch |
The pipeline converts PDF Bautendokumentationen into structured benchmark data in two stages:
Stage 1: PDF to Markdown — Each PDF is converted to clean Markdown using PyMuPDF4LLM (for text-layer PDFs) or Docling (IBM, for scanned/complex-layout PDFs). This preserves table structure — critical for BKP cost breakdowns — and produces human-readable intermediate files.
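
The choice between the two converters could look roughly like the sketch below. The text-layer heuristic (character count per page, thresholds of 50 characters and 50% of pages) and the function names are assumptions made for this example; the actual rule used by scripts/pdf_to_markdown.py may differ. The library calls require PyMuPDF, PyMuPDF4LLM, and Docling to be installed and are imported lazily so the heuristic can be used on its own.

```python
def has_text_layer(chars_per_page, min_chars=50, min_share=0.5):
    """Assumed heuristic: a PDF counts as text-based when at least
    `min_share` of its pages yield `min_chars` or more extractable characters."""
    if not chars_per_page:
        return False
    rich_pages = sum(1 for n in chars_per_page if n >= min_chars)
    return rich_pages / len(chars_per_page) >= min_share


def pdf_to_markdown(pdf_path):
    """Convert one PDF to Markdown, picking the converter by text-layer check."""
    import pymupdf  # older PyMuPDF releases expose this module as `fitz`

    with pymupdf.open(pdf_path) as doc:
        counts = [len(page.get_text()) for page in doc]

    if has_text_layer(counts):
        import pymupdf4llm
        return pymupdf4llm.to_markdown(str(pdf_path))

    # Scanned or layout-heavy documents go through Docling instead.
    from docling.document_converter import DocumentConverter
    result = DocumentConverter().convert(str(pdf_path))
    return result.document.export_to_markdown()
```

Keeping the decision in a small pure function makes it easy to tune the thresholds against a handful of known scanned and text-layer documents.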
Stage 2: Markdown to structured data — Markdown tables are parsed to extract BKP/eBKP-H costs, SIA 416 quantities (GF, GV, NGF, HNF), project metadata (architect, client, timeline), and benchmarks (CHF/m², CHF/m³). The parser handles German, French, and Italian documents.
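
To make the table-parsing step concrete, here is a minimal sketch that reads one-digit to three-digit BKP codes and Swiss-formatted amounts from a Markdown table. The row layout (code in the first column, amount in the last), the function names, and the sample figures are assumptions for illustration; the real parser in scripts/extract_from_markdown.py has to cope with far more layout variation.

```python
import re


def parse_chf(text):
    """Parse a Swiss-formatted amount such as "1'234'567.50"; None if not numeric."""
    cleaned = re.sub(r"['’\s]", "", text.replace("CHF", ""))
    try:
        return float(cleaned)
    except ValueError:
        return None


def parse_bkp_table(markdown):
    """Return {bkp_code: amount_chf} for rows shaped like '| 2 | Gebäude | 1'000 |'."""
    costs = {}
    for line in markdown.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        # Header and separator rows have no numeric code in the first cell.
        if len(cells) < 3 or not re.fullmatch(r"\d{1,3}", cells[0]):
            continue
        amount = parse_chf(cells[-1])
        if amount is not None:
            costs[cells[0]] = amount
    return costs


# Invented sample table in the style of a Bautendokumentation.
sample = """
| BKP | Bezeichnung | CHF |
|---|---|---|
| 1 | Vorbereitungsarbeiten | 350'000 |
| 2 | Gebäude | 12'400'000 |
| 4 | Umgebung | 900'000 |
"""
costs = parse_bkp_table(sample)
print(costs)  # {'1': 350000.0, '2': 12400000.0, '4': 900000.0}
```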
The extracted data is loaded into a SQLite database that runs entirely in the browser via sql.js.
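
On the pipeline side, writing extracted records into SQLite needs nothing beyond Python's standard library. The sketch below uses an in-memory database and a deliberately simplified `projects` table; the table name, columns, and rows are hypothetical, and the real schema is described in docs/DATAMODEL.md.

```python
import sqlite3

# In-memory database for the example; the pipeline would write a file instead.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE projects (
        id       INTEGER PRIMARY KEY,
        name     TEXT NOT NULL,
        canton   TEXT,
        gf_m2    REAL,
        cost_chf REAL
    )
    """
)

# Invented rows, for illustration only.
rows = [
    ("Example school", "ZH", 6000.0, 24_000_000.0),
    ("Example depot", "BE", 2000.0, 5_000_000.0),
]
conn.executemany(
    "INSERT INTO projects (name, canton, gf_m2, cost_chf) VALUES (?, ?, ?, ?)",
    rows,
)
conn.commit()

# The benchmark is derived at query time rather than stored redundantly.
result = conn.execute(
    "SELECT name, cost_chf / gf_m2 AS chf_per_m2 "
    "FROM projects WHERE gf_m2 > 0 ORDER BY chf_per_m2 DESC"
).fetchall()
print(result)  # [('Example school', 4000.0), ('Example depot', 2500.0)]
```

Because sql.js reads standard SQLite files, the same kind of query can then be issued from the browser against the shipped database.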
See docs/PIPELINE.md for full technical details.
| Library | Purpose | License |
|---|---|---|
| Vanilla JS (ES6+) | Application logic, no framework | — |
| CSS Custom Properties | Design token system | — |
| sql.js | SQLite compiled to WebAssembly, runs in browser | MIT |
| MapLibre GL JS | Interactive vector map with clustering | BSD-3 |
| Google Material Icons | UI iconography | Apache 2.0 |
| Library | Purpose | License |
|---|---|---|
| PyMuPDF | PDF text and image extraction | AGPL |
| PyMuPDF4LLM | PDF to Markdown conversion | AGPL |
| Docling | Advanced PDF parsing with table extraction (IBM) | MIT |
| Tesseract OCR | OCR for scanned pages (DE/FR/IT) | Apache 2.0 |
| Pillow | Image processing | MIT-CMU |
| Service | Purpose |
|---|---|
| geo.admin.ch | Geocoding Swiss addresses (free, no key) |
| Geoapify | Geocoding international addresses |
| CARTO | Base map tiles via MapLibre |
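
A geocoding request against geo.admin.ch can be sketched as below. This assumes the public SearchServer endpoint with `type=locations` and a response whose results carry `lat` and `lon` under `attrs`; verify both against the official API documentation before relying on them. The helper names and the sample payload are invented for the example, and the HTTP call itself is left out so the snippet runs offline.

```python
from urllib.parse import urlencode

# Assumed endpoint of the geo.admin.ch search service.
SEARCH_URL = "https://api3.geo.admin.ch/rest/services/api/SearchServer"


def build_search_url(address):
    """Build a location search URL for a free-text Swiss address."""
    params = {"searchText": address, "type": "locations", "limit": 1}
    return f"{SEARCH_URL}?{urlencode(params)}"


def first_coordinates(payload):
    """Return (lat, lon) of the first hit in a decoded JSON response, or None."""
    results = payload.get("results") or []
    if not results:
        return None
    attrs = results[0].get("attrs", {})
    if "lat" not in attrs or "lon" not in attrs:
        return None
    return float(attrs["lat"]), float(attrs["lon"])


url = build_search_url("Bundesplatz 3, Bern")

# Illustrative payload in the assumed response shape, not a recorded response.
sample_payload = {"results": [{"attrs": {"label": "Bundesplatz 3 Bern", "lat": 46.9466, "lon": 7.4442}}]}
coords = first_coordinates(sample_payload)
```

Returning `None` for empty or incomplete responses lets the caller fall back to another geocoder, for instance for addresses outside Switzerland.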

    index.html                    Single-page application
    css/
      tokens.css                  Design tokens (colors, spacing, typography)
      styles.css                  Component styles
    js/
      db.js                       Database queries
      utils.js                    Shared state, formatting, tag helpers
      views.js                    Gallery, list, map, dashboard renderers
      detail.js                   Detail view, cost tables, carousel, estimator
      main.js                     App init, routing, filters
    data/
      kennwerte.db                SQLite database
      pdfs/                       Source PDFs (gitignored)
      markdown/                   Converted markdown (gitignored)
    scripts/
      pdf_to_markdown.py          Stage 1: PDF to Markdown
      extract_from_markdown.py    Stage 2: Markdown to structured data
      extract.py                  Legacy: direct PDF to DB (v1 regex)
      extract_all.py              Batch wrapper
      download_*.sh/py            Per-source download scripts
    docs/
      PIPELINE.md                 Extraction pipeline architecture
      DATAMODEL.md                Entity model and field definitions
      SOURCES.md                  Data source inventory
      REQUIREMENTS.md             Requirements specification

- PIPELINE.md — extraction architecture, tool comparison, processing flow
- DATAMODEL.md — entity model, field definitions, SIA 416 quantities
- SOURCES.md — data source inventory with URLs and status
- REQUIREMENTS.md — functional and non-functional requirements
MIT — Digital Real Estate and Support


