This is a pico-example of a metadata-driven lakehouse for Microsoft Fabric. For a full-scale solution, I recommend twoday's AquaVilla best practices.
/Setup.ipynb contains everything needed to get started. It is a single notebook that sets up the full example.
NB! 2025-02-07: A concurrency issue has been fixed, where transformations could leak data to other transformations. The risk of having been affected is very small. Re-running the /Setup.ipynb notebook applies the fix.
AquaShack follows a traditional medallion architecture. For historical reasons, the Bronze, Silver and Gold layers are named Landing, Base and Curated.
Once installed, the notebook 1_AquaShack_Landing_To_Base moves example data from Landing to Base, and the notebook 2_AquaShack_Base_To_Curated moves data from Base to Curated.
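As a rough illustration of the layer naming and notebook order described above, here is a minimal plain-Python sketch. Only the layer names and the two notebook names come from this README. The `STEPS` metadata list and the `plan` function are hypothetical and are not part of AquaShack, whose actual notebooks run on Spark in Fabric.

```python
# Layer names as described in this README: medallion name -> AquaShack name.
MEDALLION_TO_AQUASHACK = {"Bronze": "Landing", "Silver": "Base", "Gold": "Curated"}

# Hypothetical metadata (an assumption for illustration): one entry per step.
STEPS = [
    {"notebook": "2_AquaShack_Base_To_Curated", "source": "Base", "target": "Curated"},
    {"notebook": "1_AquaShack_Landing_To_Base", "source": "Landing", "target": "Base"},
]


def plan(steps):
    """Order notebooks so that each source layer is filled before it is read."""
    layer_order = list(MEDALLION_TO_AQUASHACK.values())  # Landing, Base, Curated
    ordered = sorted(steps, key=lambda step: layer_order.index(step["source"]))
    return [step["notebook"] for step in ordered]


run_order = plan(STEPS)
print(run_order)
```

The point of the sketch is only that the run order follows from metadata about source and target layers rather than from hard-coded calls, which is the general idea behind a metadata-driven lakehouse.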
Delta and Parquet: Integer, GUID/UUID or SHA256 as ID?
Spark SQL: Why the choice of language doesn’t impact performance
Data Architecture: Data capture time and event time in medallion architecture
Microsoft Fabric: Building Pseudo Identity Columns Without monotonically_increasing_id() in Spark
Lakehousing: Removing one of the biggest performance killers in Silver-to-Gold processing
