LienBosmans · Sep 28, 2025 · Sep 27, 2025
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
@@ -33,49 +33,51 @@ Do you have an idea to extend PyStack't with extra functionality? Awesome! Pleas
 -   We try to limit the number of external dependencies.
     -   For performant data transformations, please use [DuckDB](https://duckdb.org/docs/stable/) (SQL) or [Polars](https://docs.pola.rs/) (DataFrame).
     -   For (interactive) visualizations, please use [Matplotlib](https://matplotlib.org/) or [Dash](https://dash.plotly.com/).
--   Every PyStack't function reads from or writes to a DuckDB database file that uses the [Stack't relational schema](#stackt-relational-schema).
+-   Every PyStack't function reads from or writes to a DuckDB database file that uses the [Stack't relational schema](/docs/content/explained/pystackt_design.md).
     -   New functionality must also be compatible with DuckDB files with the Stack't relational schema.
     -   If the Stack't relational schema does not fit your use-case, and you want to propose an improvement, please reach out directly to [Lien Bosmans](mailto:lienbosmans@live.com).
 
 
 ### Data extractors
 
-Example: GitHub extractor [ [code](/src/pystackt/extractors/github/) | [docs](/docs/extract/get_github_log.md) ]
+Example: GitHub extractor [ [code](/src/pystackt/extractors/github/) | [docs](/docs/content/reference/extract/get_github_log.md) ]
 
 What is expected:
 1.  Choose a publicly available data source that contains real-life event data.
 1.  Figure out how the source data is structured, how the API works, ...
-1.  Map the data to the [Stack't relational schema](#stackt-relational-schema).
+1.  Map the data to the [Stack't relational schema](/docs/content/explained/pystackt_design.md).
 1.  Clean up your code. Save it in a new subfolder of [/src/pystackt/extractors/](/src/pystackt/extractors/).
     -   Re-use existing functionality when possible.
     -   Write modular functions.
     -   Include error handling. 
     -   Use doc strings and in-line comments.
 1.  Test your code.
-1.  Write end-user documentation. Add it as a markdown file in the folder [/docs/extract/](/docs/extract/). The documentation should include
-    -   code snippet with example
-    -   table that explains all parameters of the function
-    -   explanation on how to generate credentials to connect to the data source (if relevant)
-    -   description of which data is extracted
-    -   (link to) explanation of how the extracted data is allowed to be used
+1.  Write reference documentation. Add it as markdown files in the folder [/docs/content/reference/extract/](/docs/content/reference/extract/).  
+    -   The function documentation should include
+        -   code snippet with example
+        -   table that explains all parameters of the function
+        -   explanation on how to generate credentials to connect to the data source (if relevant)
+        -   (link to) explanation of how the extracted data is allowed to be used
+    -   The output data documentation should include
+        -   description of which data is extracted
 
 
 ### Data exporters
 
-Example: OCEL 2.0 [ [code](/src/pystackt/exporters/ocel2/) | [docs](/docs/export/export_to_ocel2.md) ]
+Example: OCEL 2.0 [ [code](/src/pystackt/exporters/ocel2/) | [docs](/docs/content/reference/export/export_to_ocel2.md) ]
 
 Please note that the exported data format should be **object-centric** and **supported by at least one tool** (software, application, Python package, script, ...) that is open-source (*preferred*) or offers a free license for developers / students / personal use.
 
 What is expected:
 1.  Choose an object-centric event data format.
-1.  Map the [Stack't relational schema](#stackt-relational-schema) to your chosen data format.
+1.  Map the [Stack't relational schema](/docs/content/explained/pystackt_design.md) to your chosen data format.
 1.  Clean up your code. Save it in a new subfolder of [/src/pystackt/exporters/](/src/pystackt/exporters/).
     -   Re-use existing functionality when possible.
     -   Write modular functions.
     -   Include error handling. 
     -   Use doc strings and in-line comments.
 1.  Test your code.
-1.  Write end-user documentation. Add it as a markdown file in the folder [/docs/export/](/docs/export/). The documentation should include
+1.  Write end-user documentation. Add it as a markdown file in the folder [/docs/content/reference/export/](/docs/content/reference/export/). The documentation should include
     -   code snippet with example
     -   table that explains all parameters of the function
     -   overview of any information loss that happens when exporting to this format
@@ -86,7 +88,7 @@ What is expected:
 Data preparation is definitely more than simply extracting and exporting data, so we also welcome additional functionality that supports activities like data exploration, data cleaning, data filtering, ...
 
 The previously discussed items still apply:
-1.  Start from the [Stack't relational schema](#stackt-relational-schema) in a DuckDB file.
+1.  Start from the [Stack't relational schema](/docs/content/explained/pystackt_design.md) in a DuckDB file.
     -   If the Stack't relational schema does not work with the application you have in mind, include a function to prepare the data first. ([example](/src/pystackt/exploration/graph/data_prep/))
 1.  Clean up your code. Document your code. Test your code.
 1.  Write end-user documentation.
@@ -106,89 +108,3 @@ Simply create a pull request (PR)! Some good practices to consider:
         -   documentation of one function + code improvements of another function
 -   Write meaningful commit messages. 
 -   Don't combine independent changes in the same commit.
-
-
-## Stack't relational schema
-
-The Stack't relational schema describes how to store object-centric event data in a relational database using a fixed set of tables and table columns. This absence of any schema changes makes the format well-suited to act as a central data hub, enabling the modular design of PyStack't.
-
-An overview of the tables and columns is included in this document. For more information on the design choices and the proof-of-concept implementation [Stack't](https://github.com/LienBosmans/stack-t), we recommend reading the paper [Dynamic and Scalable Data Preparation for Object-Centric Process Mining](https://arxiv.org/abs/2410.00596).
-
-![PyStack't has a modular design.](/docs/pystackt_architecture.png)
-
-**Event-related tables**. To maintain flexibility and support dynamic changes, event types and their attribute definitions are stored in rows rather than being defined by table and column names. This approach enables the use of the exact same tables across all processes, reducing the impact of schema modifications. Changing an event type involves updating foreign keys rather than moving data to different tables, and attributes can be added or removed without altering the schema.
-- Table `event_types` contains an entry for each unique event type. \
-    Columns: 
-    -   `id` is the primary key.
-    -   `description` should be human-readable.
--   Table `event_attributes` stores entries for each unique event attribute. \
-    Columns:
-    -   `id` is the primary key.
-    -   `event_type_id` is a foreign key referencing table `event_types`.
-    -   `description` should be human-readable.
-    -   `datatype` of the attribute (integer, varchar, timestamp, ...).
--   Table `events` records details for each event. \
-    Columns:
-    -   `id` is the primary key.
-    -   `event_type_id` is a foreign key referencing table `event_types`.
-    -   `timestamp`, preferably using UTC time zone.
-    -   `description` should be human-readable.
--   Table `event_attribute_values` stores all attribute values for different events. This setup decouples events and their attributes by storing each attribute value in a new row, facilitating support for late-arriving data points. \
-    Columns:
-    -   `id` is the primary key.
-    -   `event_id` is a foreign key referencing table `events`.
-    -   `event_attribute_id` is a foreign key referencing table `event_attributes`.
-    -   `attribute_value` is the value of the attribute. This value should match the datatype of the attribute.
-
-**Object-related tables** also leverage row-based storage to manage attributes independently. This approach reduces the number of duplicate or NULL values significantly when attributes are updated asynchronously and frequently.
--   Table `object_types` records entries for each unique object type.\
-    Columns:
-    -   `id` is the primary key.
-    -   `description` should be human-readable
--   Table `object_attributes` contains entries for each unique object attribute. \
-    Columns:
-    -   `id` is the primary key.
-    -   `object_type_id` is a foreign key referencing table `object_types`.
-    -   `description` should be human-readable.
-    -   `datatype` (integer, varchar, timestamp, ...) of the attribute.
--   Table `objects` stores details for each object.\
-    Columns:
-    -   `id` is the primary key.
-    -   `object_type_id` is a foreign key referencing table `object_types`.
-    -   `description` should be human-readable.
--   Table `object_attribute_values` records attribute values for objects.\
-    Columns: 
-    -   `id` is the primary key.
-    -   `object_id` is a foreign key referencing table `objects`.
-    -   `object_attribute_id` is a foreign key referencing table `object_attributes`.
-    -   `timestamp` indicates when the attribute was updated. Timestamps are preferably stored using the UTC time zone.
-    -   `attribute_value` is the updated value of the attribute. This value should match the datatype of the attribute.
-
-**Relation-related tables** serve as bridging tables to manage the different many-to-many relations between events and objects. The qualifier definitions are stored separately to minimize the impact of renaming them in case of changing business requirements.
--   Table `relation_qualifiers` stores qualifier definitions. In cases where relation qualifiers are not available in the source data, a dummy qualifier can be introduced.\
-    Columns
-    -   `id` is the primary key.
-    -   `description` should be human-readable.
-    -   `datatype` (integer, varchar, timestamp, ...) of the attribute.
--   Table `object_to_object` stores (dynamic) relations between objects.\
-    Columns:
-    -   `id` is the primary key.
-    -   `source_object_id` is a foreign key referencing table `objects`.
-    -   `target_object_id` is a foreign key referencing table `objects`.
-    -   `timestamp` indicates when the relationship became active. To signify the end of an object-to-object relationship, a NULL value is used for the qualifier value, rather than an end timestamp. This design choice facilitates append-only data ingestion. Timestamps are preferably stored using the UTC time zone.
-    -   `qualifier_id` is a foreign key referencing table `qualifiers`.
-    -   `qualifier_value` provides additional relationship details. This value should match the datatype of the qualifier.
--   Table `event_to_object` stores relations between events and objects.\
-    Columns:
-    -   `id` is the primary key.
-    -   `event_id` is a foreign key referencing table `events`.
-    -   `object_id` is a foreign key referencing table `objects`.
-    -   `qualifier_id` is a foreign key referencing table `qualifiers`.
-    -   `qualifier_value` provides additional relationship details. This value should match the datatype of the qualifier.
--   Table `event_to_object_attribute_value` stores relations between events and changes to object attributes.\
-    Columns:
-    -   `id` is the primary key.
-    -   `event_id` is a foreign key referencing table `events`.
-    -   `object_attribute_value_id` is a foreign key referencing table `object_attribute_values`.
-    -   `qualifier_id` is a foreign key referencing table `qualifiers`.
-    -   `qualifier_value` provides additional relationship details. This value should match the datatype of the qualifier.
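The event-related tables described in the schema section above can be sketched as follows. This is a minimal illustration under stated assumptions, not PyStack't's actual table definitions: it uses Python's built-in `sqlite3` as a stand-in for DuckDB so that it runs without extra dependencies, and the column types, ids, and sample rows are hypothetical. It shows the point made in the text: adding an attribute is a row insert, not a schema change.

```python
import sqlite3

# Assumption: sqlite3 stands in for DuckDB; column types are illustrative only.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE event_types (id TEXT PRIMARY KEY, description TEXT);
CREATE TABLE event_attributes (
    id TEXT PRIMARY KEY,
    event_type_id TEXT REFERENCES event_types(id),
    description TEXT,
    datatype TEXT
);
CREATE TABLE events (
    id TEXT PRIMARY KEY,
    event_type_id TEXT REFERENCES event_types(id),
    timestamp TEXT,
    description TEXT
);
CREATE TABLE event_attribute_values (
    id TEXT PRIMARY KEY,
    event_id TEXT REFERENCES events(id),
    event_attribute_id TEXT REFERENCES event_attributes(id),
    attribute_value TEXT
);
""")

# Hypothetical sample data: one event type, one event, one attribute.
con.execute("INSERT INTO event_types VALUES ('et1', 'issue opened')")
con.execute("INSERT INTO event_attributes VALUES ('ea1', 'et1', 'title', 'varchar')")
con.execute("INSERT INTO events VALUES ('e1', 'et1', '2025-01-01T12:00:00Z', 'issue 1 opened')")
con.execute("INSERT INTO event_attribute_values VALUES ('eav1', 'e1', 'ea1', 'Fix typo')")

# A late-arriving attribute only needs new rows; no table is altered.
con.execute("INSERT INTO event_attributes VALUES ('ea2', 'et1', 'label', 'varchar')")
con.execute("INSERT INTO event_attribute_values VALUES ('eav2', 'e1', 'ea2', 'bug')")

# Reassemble each event with its type and attribute values via the foreign keys.
rows = con.execute("""
    SELECT e.id, et.description, ea.description, v.attribute_value
    FROM event_attribute_values AS v
    JOIN events AS e ON e.id = v.event_id
    JOIN event_types AS et ON et.id = e.event_type_id
    JOIN event_attributes AS ea ON ea.id = v.event_attribute_id
    ORDER BY ea.id
""").fetchall()
print(rows)
```

The object-related tables follow the same pattern, with an extra `timestamp` column on `object_attribute_values` to record when each value was updated.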
diff --git a/README.md b/README.md
@@ -9,28 +9,15 @@ PyStack't is published on [PyPi](https://pypi.org/project/pystackt/) and can be
 pip install pystackt
 ```
 
-## [📖 Documentation](https://lienbosmans.github.io/pystackt/)
+## 📖 Documentation
 
 -   [Extensive documentation](https://lienbosmans.github.io/pystackt/) is available via GitHub pages. 
 -   A [demo video on YouTube](https://youtu.be/AS8wI90wRM8) walks you through the different functionalities.
-
-## 🔍 Viewing Data  
-PyStack't creates **DuckDB database files**. From DuckDB version 1.2.1 onwards, you can explore them using the [**UI extension**](https://duckdb.org/docs/stable/extensions/ui.html). Below code will load the UI by navigating to `http://localhost:4213` in your default browser.
-
-```python
-import duckdb
-
-with duckdb.connect("./stackt.duckdb") as quack:
-    quack.sql("CALL start_ui()")
-    input("Press Enter to close the connection...")
-```
-
-Alternatively, you can use a database manager. You can follow this [DuckDB guide](https://duckdb.org/docs/guides/sql_editors/dbeaver.html) to download and install **DBeaver** for easy access.
-
+-   Our BPM 2025 demo paper [PyStack't: Real-Life Data for Object-Centric Process Mining](https://ceur-ws.org/Vol-4032/paper-28.pdf) is available on CEUR.
 
 ## 📝 Examples
 
-### ⛏️🐙 Extract object-centric event log from GitHub repo ([`get_github_log`](https://lienbosmans.github.io/pystackt/extract/get_github_log.html))
+### ⛏️🐙 Extract object-centric event log from GitHub repo ([`get_github_log`](https://lienbosmans.github.io/pystackt/content/reference/extract/get_github_log.html))
 ```python
 from pystackt import *
 
@@ -44,7 +31,7 @@ get_github_log(
 )
 ```
 
-### 📈 Interactive data exploration ([`start_visualization_app`](https://lienbosmans.github.io/pystackt/exploration/interactive_data_visualization_app.html))
+### 📈 Interactive data exploration ([`start_visualization_app`](https://lienbosmans.github.io/pystackt/content/reference/exploration/interactive_data_visualization_app.html))
 
 ```python
 from pystackt import *
@@ -61,7 +48,7 @@ start_visualization_app(
 )
 ```
 
-### 📤 Export to OCEL 2.0 ([`export_to_ocel2`](https://lienbosmans.github.io/pystackt/export/export_to_ocel2.html))
+### 📤 Export to OCEL 2.0 ([`export_to_ocel2`](https://lienbosmans.github.io/pystackt/content/reference/export/export_to_ocel2.html))
 ```python
 from pystackt import *
 

diff --git a/docs/README.md b/docs/README.md
@@ -1,52 +1,30 @@
 # PyStack't Documentation
 
-PyStack't (`pip install pystackt`) is a Python package that supports data preparation for object-centric process mining. It covers extraction of object-centric event data, storage of that data, (visual) data exploration, and export to OCED formats.
+PyStack't is a Python package that supports data preparation for object-centric process mining. It covers extraction of object-centric event data, storage of that data, (visual) data exploration, and export to popular OCED formats.
 
-[Source code](https://github.com/LienBosmans/pystackt) | [PyPi](https://pypi.org/project/pystackt/) | [Contributing Guide](https://github.com/LienBosmans/pystackt/blob/main/CONTRIBUTING.md)
+The documentation is organized into four parts:
+-   [Tutorials](#-tutorials-start-here): hands-on lessons for beginners
+-   [Reference material](#-reference-material): technical descriptions
+-   [How-to guides](#-how-to-guides): practical directions
+-   [Behind-the-scenes](#-behind-the-scenes): context and background
 
+## 📚 Tutorials (start here)
 
-## Data Storage
+-   [Extracting your first object-centric event log from a GitHub repository](content/tutorials/tutorial_extracting_OCED.md)
 
-PyStack't uses the Stack't relational schema to store object-centric event data. This schema was created specifically to support the data preparation stage, taking into account data engineering best practices. For more information on the design of Stack't, we recommend the paper [Dynamic and Scalable Data Preparation for Object-Centric Process Mining](https://arxiv.org/abs/2410.00596).
+## 📖 Reference material
+### Functions
+-   [⛏️ get_github_log](content/reference/extract/get_github_log.md)
+-   [📤 export_to_ocel2](content/reference/export/export_to_ocel2.md)
+-   [📤 export_to_promg](content/reference/export/export_to_promg.md)
+-   [📈 create_statistics_views](content/reference/exploration/create_statistics_views.md)
+-   [📈 interactive data visualization app](content/reference/exploration/interactive_data_visualization_app.md)
 
-![PyStack't has a modular design.](/docs/pystackt_architecture.png)
+### Output data
+-   [🗺️ Overview of `get_github_log` output](content/reference/extract/github_OCED.md)
 
-While any relational database can be used to store data in the Stack't relational schema, PyStack't uses [DuckDB](https://duckdb.org/) because it's open-source, fast and simple to use. (Think SQLite but for analytical workloads.)
+## ❓ How-to guides
+-   [How to view DuckDB files?](content/howto/view_duckdb_files.md)
 
-From DuckDB version 1.2.1 onwards, you can explore them using the [**UI extension**](https://duckdb.org/docs/stable/extensions/ui.html). Below code will load the UI by navigating to `http://localhost:4213` in your default browser.
-
-```python
-import duckdb
-
-with duckdb.connect("./stackt.duckdb") as quack:
-    quack.sql("CALL start_ui()")
-    input("Press Enter to close the connection...")
-```
-
-Alternatively, you can use a database manager. You can follow this [DuckDB guide](https://duckdb.org/docs/guides/sql_editors/dbeaver.html) to download and install **DBeaver** for easy access.
-
-
-## Data extraction
-
-Extracting data from different systems is an important part of data preparation. While PyStack't does not include all functionality that a data stack offers (incremental ingests, scheduling refreshes, monitoring data pipelines...), it aims to provide simple-to-use methods to get real-life data for your object-centric process mining adventures.
-
-### ⛏️ List of data extraction functionality
-- [`get_github_log`](extract/get_github_log.md)
-
-
-## Data export
-
-The Stack't relational schema is intended as an intermediate storage hub. PyStack't provides export functionality to export the data to specific OCED formats that can be used by process mining applications and algorithms. This decoupled set-up has as main advantage that any future data source can be exported to all supported data formats, and any future OCED format can be combined with existing data extraction functionality.
-
-### 📤 List of data export functionality
-- [`export_to_ocel2`](export/export_to_ocel2.md)
-- [`export_to_promg`](export/export_to_promg.md)
-
-
-## Data exploration
-
-Dispersing process data across multiple tables makes exploring object-centric event data less straightforward compared to traditional process mining. PyStack't aims to bridge this gap by providing dedicated data exploration functionality. Notably, the latest release includes an interactive data exploration app that runs locally and works out-of-the-box with any OCED data structured in the Stack't relational schema.
-
-### 📈 List of data exploration functionality
-- [`create_statistics_views`](exploration/create_statistics_views.md)
-- [`interactive data visualization app`](exploration/interactive_data_visualization_app.md)
+## 💡 Behind-the-scenes
+-   [About the design of PyStack't](content/explained/pystackt_design.md)
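Since this change moves many documentation files, a quick check that relative links still resolve can be useful. The sketch below is not part of PyStack't; the function name `find_broken_links` and the convention that a leading `/` means "relative to the repository root" are assumptions made for illustration. The demo builds a throwaway directory rather than reading the real repository.

```python
import re
import tempfile
from pathlib import Path

# Matches the target of inline markdown links such as [text](target).
LINK_TARGET = re.compile(r"\]\(([^)\s]+)\)")


def find_broken_links(markdown_text: str, file_dir: Path, repo_root: Path) -> list[str]:
    """Return relative or root-relative link targets that do not exist on disk."""
    broken = []
    for target in LINK_TARGET.findall(markdown_text):
        if target.startswith(("http://", "https://", "mailto:", "#")):
            continue  # external links and in-page anchors are not checked
        path = target.split("#", 1)[0]
        # Assumption: a leading "/" means "relative to the repository root".
        if path.startswith("/"):
            resolved = repo_root / path.lstrip("/")
        else:
            resolved = file_dir / path
        if not resolved.exists():
            broken.append(target)
    return broken


# Demo on a temporary directory that mimics part of the new docs layout.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    howto_dir = root / "docs" / "content" / "howto"
    howto_dir.mkdir(parents=True)
    (howto_dir / "view_duckdb_files.md").write_text("# How to view DuckDB files?")
    text = (
        "- [How-to](content/howto/view_duckdb_files.md)\n"
        "- [Old path](/docs/extract/get_github_log.md)\n"
        "- [Site](https://lienbosmans.github.io/pystackt/)\n"
    )
    broken = find_broken_links(text, file_dir=root / "docs", repo_root=root)

print(broken)
```

In the demo, only the link that still points at the old `/docs/extract/` location is reported.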
diff --git a/docs/_config.yml b/docs/_config.yml
@@ -0,0 +1,17 @@
+theme: jekyll-theme-minimal
+
+title: PyStack't Documentation
+description: "Real-life data for object-centric process mining"
+logo: /assets/images/pystackt_logo_black_circle_small.png
+
+paper_url: https://ceur-ws.org/Vol-4032/paper-28.pdf
+paper_title: "L. Bosmans, J. Peeperkorn, J. De Smedt, PyStack’t: Real-Life Data for Object-Centric Process Mining"
+
+demo_url: https://www.youtube.com/watch?v=AS8wI90wRM8&feature=youtu.be
+demo_title: "PyStack't Demo BPM 2025"
+
+pypi_url: https://pypi.org/project/pystackt/
+pypi_title: "pip install pystackt"
+
+contributing_url: https://github.com/LienBosmans/pystackt/blob/main/CONTRIBUTING.md
+contributing_title: "Contributing guide"