
Planning-Inspectorate/odw-synapse-workspace


Introduction

This repo contains all the artifacts and infrastructure code for the PINS Operational Data Warehouse (ODW). It consists of the following:

  • Infrastructure - Contains the root Terraform module for deploying the ODW environment
  • Pipelines - Contains Azure DevOps Pipeline definitions and steps
  • Workspace - Contains development data artifacts ingested into the development Azure Synapse Workspace
  • odw - Contains ETL code and utility functions that are installed on Synapse spark pools

Reference Documentation

Getting Started

The following steps outline how to get up and running with this repo on your own system:

  1. Environment access
    1. GitHub access - if you're reading this README, you probably already have this
    2. Azure DevOps access to the operational-data-warehouse Azure DevOps project
    3. Azure Portal access - additional access is required to the Azure Portal and to the corresponding Azure resources in each environment
  2. Application Installation - the following desktop applications are optional but make it easier to work with some of the Azure resources. PINS Azure authentication policy restricts access to PINS devices only, so non-PINS devices must be whitelisted before they can use these applications
    1. Install Visual Studio Code or an equivalent IDE - for editing and committing code artifacts
    2. Install Azure Data Studio - for connecting to Azure SQL instances and managing and committing data notebooks
    3. Install Microsoft Azure Storage Explorer
  3. Clone Repo
    1. Create a Personal Access Token in GitHub or use another authentication method, e.g. SSH
    2. Clone the repo in VSCode/Azure Data Studio to a local folder

Environments

The ODW environment is deployed to three Azure subscriptions as follows:

| Environment Name | Subscription Name | Subscription ID |
|---|---|---|
| Development | pins-odw-data-dev-sub | ff442a29-fc06-4a13-8e3e-65fd5da513b3 |
| Pre-Production | pins-odw-data-preprod-sub | 6b18ba9d-2399-48b5-a834-e0f267be122d |
| Production | pins-odw-data-prod-sub | a82fd28d-5989-4e06-a0bb-1a5d859f9e0c |
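When scripting against these environments it can help to keep the subscription details in one place. The Python sketch below is illustrative only: the short keys `dev`, `preprod` and `prod`, the `OdwSubscription` class and the `get_subscription` helper are hypothetical and not part of this repo; only the environment names, subscription names and IDs come from the table above.

```python
from dataclasses import dataclass


# Hypothetical record type; the field values below are copied from the Environments table.
@dataclass(frozen=True)
class OdwSubscription:
    environment: str
    name: str
    subscription_id: str


# The short keys ("dev", "preprod", "prod") are assumed labels, not names defined by the repo.
SUBSCRIPTIONS = {
    "dev": OdwSubscription(
        "Development", "pins-odw-data-dev-sub", "ff442a29-fc06-4a13-8e3e-65fd5da513b3"
    ),
    "preprod": OdwSubscription(
        "Pre-Production", "pins-odw-data-preprod-sub", "6b18ba9d-2399-48b5-a834-e0f267be122d"
    ),
    "prod": OdwSubscription(
        "Production", "pins-odw-data-prod-sub", "a82fd28d-5989-4e06-a0bb-1a5d859f9e0c"
    ),
}


def get_subscription(env: str) -> OdwSubscription:
    """Return the subscription for a short environment key, failing loudly on unknown keys."""
    try:
        return SUBSCRIPTIONS[env.lower()]
    except KeyError:
        raise ValueError(
            f"Unknown environment {env!r}; expected one of {sorted(SUBSCRIPTIONS)}"
        ) from None
```

For example, the value of `get_subscription("preprod").subscription_id` could be passed to `az account set --subscription <id>` before running Azure CLI commands against that environment.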

Within each subscription, the infrastructure is split into several resource groups, aligned to the data landing zone architecture:

| Resource Group Name | Description |
|---|---|
| pins-rg-data-odw-{env}-{region} | Contains the Data Lake and Synapse Workspace resources |
| pins-rg-data-odw-{env}-{region}-synapse-managed | Managed resource group for the Synapse Workspace |
| pins-rg-datamgmt-odw-{env}-{region} | Contains data management resources such as Purview and Bastion VM(s) |
| pins-rg-datamgmt-odw-{env}-{region}-purview-managed | Managed resource group for the Purview Account |
| pins-rg-devops-odw-{env}-{region} | Contains Azure DevOps agents for deployments into the private network |
| pins-rg-monitoring-odw-{env}-{region} | Contains monitoring resources such as Log Analytics and App Insights |
| pins-rg-network-odw-{env}-global | Contains private DNS zones for private-link-enabled resources |
| pins-rg-network-odw-{env}-{region} | Contains the virtual network, network security groups and private endpoints |
| pins-rg-shir-odw-{env}-{region} | Contains self-hosted integration runtime VM(s) used by the Synapse Workspace |
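The resource group names follow a fixed pattern, so they can be generated from the templates in the table above. This is a minimal sketch, assuming `env` and `region` are plain short strings; the `resource_group_names` helper and the example values `dev` and `uks` are hypothetical, so check the Terraform configuration under Infrastructure for the values actually used.

```python
from __future__ import annotations

# Templates copied from the resource group table; "{env}" and "{region}" are placeholders.
RESOURCE_GROUP_TEMPLATES = [
    "pins-rg-data-odw-{env}-{region}",
    "pins-rg-data-odw-{env}-{region}-synapse-managed",
    "pins-rg-datamgmt-odw-{env}-{region}",
    "pins-rg-datamgmt-odw-{env}-{region}-purview-managed",
    "pins-rg-devops-odw-{env}-{region}",
    "pins-rg-monitoring-odw-{env}-{region}",
    "pins-rg-network-odw-{env}-global",
    "pins-rg-network-odw-{env}-{region}",
    "pins-rg-shir-odw-{env}-{region}",
]


def resource_group_names(env: str, region: str) -> list[str]:
    """Expand every template for one environment/region pair (hypothetical helper).

    str.format ignores unused keyword arguments, so the "-global" network
    group, whose template has no {region} placeholder, also expands correctly.
    """
    return [template.format(env=env, region=region) for template in RESOURCE_GROUP_TEMPLATES]
```

Calling `resource_group_names("dev", "uks")` returns nine names, with the private DNS group rendered as `pins-rg-network-odw-dev-global` because its template has no region placeholder.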

Some of the key resources used in the deployment are:

| Resource Name | Description |
|---|---|
| Synapse Workspace | Analytics product for loading, transforming and analysing data using SQL and/or Spark |
| ADLS Storage Account | Hierarchical-namespace-enabled Storage Account that acts as the data lake |
| Key Vault | Secrets storage for connection strings, passwords, etc. for connected services |
| Log Analytics | Storage for activity and metric diagnostic logs, queryable using KQL |
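Because the data lake is a hierarchical-namespace (ADLS Gen2) storage account, Spark code on the Synapse pools typically addresses it with `abfss://` URIs. The helper below is a hypothetical sketch of building those URIs; it is not taken from the odw package, and the account, container and path names in the usage example are placeholders.

```python
def abfss_uri(account: str, container: str, path: str = "") -> str:
    """Return an ADLS Gen2 URI: abfss://<container>@<account>.dfs.core.windows.net/<path>.

    Hypothetical helper for illustration; not part of the odw package.
    """
    if not account or not container:
        raise ValueError("account and container are required")
    # Strip leading slashes so "/raw/x" and "raw/x" produce the same URI.
    return f"abfss://{container}@{account}.dfs.core.windows.net/{path.lstrip('/')}"
```

For example, inside a Synapse notebook, `spark.read.parquet(abfss_uri("examplestorageacct", "raw", "some/folder"))` would read Parquet files from that folder, assuming the workspace identity has been granted access to the storage account.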
