From cff7bbc36fbc005ed658fbf97da78da2ccb1d698 Mon Sep 17 00:00:00 2001
From: Dmytro Yurchuk
Date: Sun, 3 Sep 2023 22:01:17 +0300
Subject: [PATCH 1/4] rfc_SSAS_ingestion

---
 active/000-SSAS-ingestion.md | 91 ++++++++++++++++++++++++++++++++++++
 1 file changed, 91 insertions(+)
 create mode 100644 active/000-SSAS-ingestion.md

diff --git a/active/000-SSAS-ingestion.md b/active/000-SSAS-ingestion.md
new file mode 100644
index 0000000..88ddde4
--- /dev/null
+++ b/active/000-SSAS-ingestion.md
@@ -0,0 +1,91 @@
+- Start Date: (2013-09-01)
+- RFC PR: (after opening the RFC PR, update this with a link to it and update the file name)
+- Discussion Issue: (None)
+- Implementation PR(s): (leave this empty)
+
+# SSAS Ingestion Module
+
+## Summary
+
+Ingesting MSSQL OLAP metadata into DataHub provides a more comprehensive view of the data landscape and enables better data discovery and analysis.
+The company I work for has developed an MVP ingestion module that supports both tabular and multidimensional SSAS. We are considering contributing it to DataHub, but I have a couple of questions about the process.
+
+## Motivation
+
+By ingesting OLAP metadata from MSSQL, DataHub can provide users with a better understanding of the data stored in MSSQL OLAP cubes, including information about dimensions, hierarchies, measures, and calculations.
+
+Ingesting MSSQL OLAP metadata into DataHub can help improve data governance and data quality. Metadata can be used to build full data lineage and to improve data discovery and analysis. By having a centralized view of the OLAP metadata, DataHub can help ensure that data is being used correctly and consistently across the organization.
+
+
+## Requirements
+
+- Ingest metadata from SSAS Tabular models
+- Ingest metadata from SSAS Multidimensional models
+
+
+### Extensibility
+
+- Build lineage to/from SSAS models
+
+## Detailed design
+
+General information about [OLAP cubes](https://learn.microsoft.com/en-us/system-center/scsm/olap-cubes-overview?view=sc-sm-2022).
+
+
+The interaction with SSAS (SQL Server Analysis Services) is carried out through [Microsoft's HTTP access to Analysis Services on IIS](https://learn.microsoft.com/en-us/analysis-services/instances/configure-http-access-to-analysis-services-on-iis-8-0?view=asallproducts-allversions).
+
+Arguments in favor of such a solution:
+- Cross-platform compatibility.
+- A single, standardized entry point for working with SSAS.
+
+
+General scheme:
+```mermaid
+graph LR;
+ id1[DataHub]---id2[IIS web server];
+ id2[IIS web server]---id3[SSAS1];
+ id2[IIS web server]---id4[SSAS2];
+```
+Data exchange occurs using XMLA queries wrapped in HTTP.
+- For multidimensional SSAS servers, a [DISCOVER_XML_METADATA](https://learn.microsoft.com/en-us/openspecs/sql_server_protocols/ms-ssas/51647299-75c7-471d-896f-a691e4114b18) type query is used.
+- For tabular SSAS servers, [DMV](https://learn.microsoft.com/en-us/analysis-services/instances/use-dynamic-management-views-dmvs-to-monitor-analysis-services?view=asallproducts-allversions) (Dynamic Management View) queries are used.
+
+
+
+The following scheme was proposed for entity mapping:
+```mermaid
+graph TB;
+ c1---b1;
+ b1---a1;
+ b1---a2;
+ subgraph s1[DataSet];
+ a1["Dimension"];
+ a2["Measure"];
+ end;
+ subgraph s2[DataJob];
+ b1["Cube"];
+ end;
+ subgraph s3[DataFlow];
+ c1["Catalog(database)"];
+ end;
+```
+## How we teach this
+
+We should create/update user guides to educate users for:
+ - Search & discovery experience (how to find SSAS models in DataHub)
+ - Lineage experience (how to find the entities connected to SSAS models)
+
+## Rollout / Adoption Strategy
+
+If the module is standalone, only users who want it will use it, so no migration tools are needed and there are no breaking changes.
+
+## Future Work
+
+Establish a complete data lineage from the data source to the analytical models.
+
+## Unresolved questions
+
+- Would it be better to create this module as a standalone one, focused solely on SSAS, or should it be integrated into the existing MSSQL module?
+- Is it relevant to add SSAS entities (catalog, cube, dimension, measure) to DataHub?
+- Does the proposed communication method with SSAS align with the project's needs?
+- Does the proposed entity mapping approach for SSAS entities suit the project's requirements?
\ No newline at end of file

From 02ba8cda9e0364eed3b0f5006327279924cb1cbb Mon Sep 17 00:00:00 2001
From: Dmytro Yurchuk
Date: Wed, 6 Sep 2023 11:32:25 +0300
Subject: [PATCH 2/4] better mapping

---
 active/000-SSAS-ingestion.md | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/active/000-SSAS-ingestion.md b/active/000-SSAS-ingestion.md
index 88ddde4..c3bab7f 100644
--- a/active/000-SSAS-ingestion.md
+++ b/active/000-SSAS-ingestion.md
@@ -58,17 +58,21 @@ graph TB;
  c1---b1;
  b1---a1;
  b1---a2;
- subgraph s1[DataSet];
+ subgraph s1[Properties];
  a1["Dimension"];
  a2["Measure"];
  end;
- subgraph s2[DataJob];
+ subgraph s2[DataSet];
  b1["Cube"];
  end;
- subgraph s3[DataFlow];
+ subgraph s3[Container];
  c1["Catalog(database)"];
  end;
 ```
+- Server maps to a container.
+- Catalog maps to a container (and is hierarchically nested within the server container).
+- Cube is mapped as a dataset.
+- Dimension and measure become properties of the dataset.
 ## How we teach this
 
 We should create/update user guides to educate users for:

From ed37435a562ae9fc3c7022710ab0bb1ea9e8a2e4 Mon Sep 17 00:00:00 2001
From: Dmytro Yurchuk
Date: Fri, 8 Sep 2023 09:48:12 +0300
Subject: [PATCH 3/4] file name

---
 active/{000-SSAS-ingestion.md => 4-SSAS-ingestion.md} | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
 rename active/{000-SSAS-ingestion.md => 4-SSAS-ingestion.md} (97%)

diff --git a/active/000-SSAS-ingestion.md b/active/4-SSAS-ingestion.md
similarity index 97%
rename from active/000-SSAS-ingestion.md
rename to active/4-SSAS-ingestion.md
index c3bab7f..8a70c3a 100644
--- a/active/000-SSAS-ingestion.md
+++ b/active/4-SSAS-ingestion.md
@@ -1,5 +1,5 @@
 - Start Date: (2013-09-01)
-- RFC PR: (after opening the RFC PR, update this with a link to it and update the file name)
+- RFC PR: [https://github.com/datahub-project/rfcs/pull/4](https://github.com/datahub-project/rfcs/pull/4)
 - Discussion Issue: (None)
 - Implementation PR(s): (leave this empty)
 

From 3dd9f86adb9187a7d82e4b07709543053ac08617 Mon Sep 17 00:00:00 2001
From: Dmytro Yurchuk
Date: Mon, 15 Apr 2024 16:04:28 +0300
Subject: [PATCH 4/4] PR

---
 active/4-SSAS-ingestion.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/active/4-SSAS-ingestion.md b/active/4-SSAS-ingestion.md
index 8a70c3a..f6525ad 100644
--- a/active/4-SSAS-ingestion.md
+++ b/active/4-SSAS-ingestion.md
@@ -1,7 +1,7 @@
 - Start Date: (2013-09-01)
 - RFC PR: [https://github.com/datahub-project/rfcs/pull/4](https://github.com/datahub-project/rfcs/pull/4)
 - Discussion Issue: (None)
-- Implementation PR(s): (leave this empty)
+- Implementation PR(s): [https://github.com/datahub-project/datahub/pull/10286](https://github.com/datahub-project/datahub/pull/10286)
 
 # SSAS Ingestion Module
 
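
The RFC in this series describes data exchange with SSAS as XMLA queries wrapped in HTTP: a `DISCOVER_XML_METADATA` request for multidimensional servers and DMV queries for tabular ones. The sketch below is one way those two request bodies might be built; it is not taken from the implementation PR. The helper names (`build_discover`, `build_execute`, `post_xmla`), the empty restriction list, and the omission of authentication are all assumptions made for illustration. The SOAP and XMLA namespaces and the `SOAPAction` values follow the XMLA protocol.

```python
import urllib.request
import xml.etree.ElementTree as ET
from typing import Optional

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
XMLA_NS = "urn:schemas-microsoft-com:xml-analysis"


def _new_call(method: str):
    """Create a SOAP envelope holding one XMLA method element (Discover or Execute)."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{XMLA_NS}}}{method}")
    return envelope, call


def _add_properties(call: ET.Element, catalog: Optional[str]) -> None:
    """Append the Properties/PropertyList block, optionally selecting a catalog."""
    properties = ET.SubElement(call, f"{{{XMLA_NS}}}Properties")
    property_list = ET.SubElement(properties, f"{{{XMLA_NS}}}PropertyList")
    if catalog is not None:
        ET.SubElement(property_list, f"{{{XMLA_NS}}}Catalog").text = catalog


def build_discover(request_type: str, catalog: Optional[str] = None) -> bytes:
    """Build a Discover request, e.g. DISCOVER_XML_METADATA (multidimensional case)."""
    envelope, call = _new_call("Discover")
    ET.SubElement(call, f"{{{XMLA_NS}}}RequestType").text = request_type
    # Restrictions are left empty in this sketch; a real module would likely narrow them.
    restrictions = ET.SubElement(call, f"{{{XMLA_NS}}}Restrictions")
    ET.SubElement(restrictions, f"{{{XMLA_NS}}}RestrictionList")
    _add_properties(call, catalog)
    return ET.tostring(envelope, encoding="utf-8")


def build_execute(statement: str, catalog: Optional[str] = None) -> bytes:
    """Build an Execute request carrying a DMV query (tabular case)."""
    envelope, call = _new_call("Execute")
    command = ET.SubElement(call, f"{{{XMLA_NS}}}Command")
    ET.SubElement(command, f"{{{XMLA_NS}}}Statement").text = statement
    _add_properties(call, catalog)
    return ET.tostring(envelope, encoding="utf-8")


def post_xmla(url: str, payload: bytes, method: str) -> bytes:
    """POST an envelope to the IIS endpoint; authentication is deliberately omitted."""
    request = urllib.request.Request(
        url,
        data=payload,
        headers={
            "Content-Type": "text/xml",
            "SOAPAction": f'"{XMLA_NS}:{method}"',
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.read()
```

A call might look like `post_xmla(pump_url, build_execute("SELECT * FROM $System.DBSCHEMA_CATALOGS"), "Execute")`, where `pump_url` is a placeholder for the address of the IIS-hosted endpoint. Keeping envelope construction separate from the HTTP call lets the request bodies be checked without a live SSAS server.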