9 changes: 9 additions & 0 deletions .vscode/launch.json
@@ -21,6 +21,15 @@
"console": "integratedTerminal",
"args": ["--config-path", "./example/pipeline_parking_sensors.json", "--working-dir", "./tmp", "--show-result", "--cleanup-database"],
"justMyCode": true
},
{
"name": "Python: main.py JDBC",
"type": "python",
"request": "launch",
"program": "src/main.py",
"console": "integratedTerminal",
"args": ["--config-path", "./example/pipeline_fruit_batch_JDBC.json", "--working-dir", "./tmp", "--show-result", "--cleanup-database"],
"justMyCode": true
},
{
"name": "Python: main.py ADLS",
70 changes: 70 additions & 0 deletions doc/ADLS_ingestion.md
@@ -0,0 +1,70 @@
# ADLS/JDBC Data Ingestion

## To use ADLS or JDBC as a pipeline data source, you need to set up a service principal, Azure Key Vault, and a Databricks secret scope

### Create an Azure Data Lake Storage account

- The Data Lake Storage account that the service principal will use must be Gen2. Azure Data Lake Gen1 storage accounts are not supported.
- The Data Lake Storage account must have hierarchical namespace enabled.
- You need admin permissions for your Azure tenant if you have to create a new service principal.

### Create an Azure service principal for Customer Insights

#### Look for an existing service principal

1. Go to the Azure admin portal and sign in to your organization.
2. From Azure services, select Azure Active Directory.
3. Under Manage, select App registrations.
4. If you find a matching record, it means that the service principal already exists. Grant permissions for the service principal to access the storage account.

#### Create a new service principal

1. Go to the Azure admin portal and sign in to your organization.
2. From Azure services, select Azure Active Directory.
3. Under Manage, select App registrations.
4. Select New registration.
![new app](images/new_app.png)
5. Record the following information:
- Application (client) ID
- Directory (tenant) ID
- Client credentials
![new app](images/app_info.png)

#### Grant permissions to the service principal to access the storage account

1. Go to the ADLS account in the Azure admin portal.
2. Select Access control (IAM).
3. Select Add role assignment.
![new app](images/app_adls01.png)
4. On the Role tab, select Contributor, then select Next.
![new app](images/app_adls02.png)
5. On the Members tab, select User, group, or service principal.
6. Search for and select the service principal you just created.
7. Select Review + assign.
![new app](images/app_adls03.png)

#### Add ADLS and JDBC information to Azure Key Vault

- In the key vault, go to Secrets => Generate/Import.
- Add the service principal credentials and the JDBC user name and password as secrets.
![new app](images/key_vault.png)

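#### Add a Databricks secret scope to access Key Vault
A Databricks-backed secret scope is stored in (backed by) an encrypted database owned and managed by Azure Databricks, whereas an Azure Key Vault-backed scope, which this guide uses, reads its secrets from your key vault. In either case, the secret scope name: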

- Must be unique within a workspace.
- Must consist of alphanumeric characters, dashes, underscores, @, and periods, and may not exceed 128 characters.

The names are considered non-sensitive and are readable by all users in the workspace.
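
The naming rules above can be checked before you create a scope. The following is a minimal sketch in Python; the function name `is_valid_scope_name` is hypothetical, and it assumes "alphanumeric" means ASCII letters and digits. Uniqueness within the workspace cannot be checked locally and is not covered.

```python
import re

# Assumption: "alphanumeric" means ASCII letters and digits.
# Allowed characters: letters, digits, dashes, underscores, @, and periods.
# Allowed length: 1 to 128 characters.
_SCOPE_NAME = re.compile(r"[A-Za-z0-9_@.\-]{1,128}")


def is_valid_scope_name(name: str) -> bool:
    """Return True if `name` satisfies the character and length rules above."""
    return _SCOPE_NAME.fullmatch(name) is not None
```

A name containing a space or longer than 128 characters is rejected, while a name such as `kv_scope-dev@team.1` passes.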

#### Create an Azure Key Vault-backed secret scope using the UI

- Go to `https://<databricks-instance>#secrets/createScope`. This URL is case sensitive; the `S` in `createScope` must be uppercase.
![new app](images/dbs_secret_scope.png)
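
Because the URL is case sensitive, it can help to build it programmatically rather than type it. This is a small sketch, assuming you know your workspace host name; the helper name `create_scope_url` and the host name in the usage note are placeholders, not part of this project.

```python
def create_scope_url(databricks_instance: str) -> str:
    """Build the Create Secret Scope URL for a Databricks workspace host."""
    # Accept the host with or without a scheme or trailing slash.
    host = databricks_instance.strip().rstrip("/")
    if not host.startswith("https://"):
        host = "https://" + host
    # The fragment is case sensitive: "createScope" with an uppercase "S".
    return host + "#secrets/createScope"
```

For example, `create_scope_url("adb-123.4.azuredatabricks.net")` returns `https://adb-123.4.azuredatabricks.net#secrets/createScope`.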

Enter the scope name and the Azure Key Vault DNS Name and Resource ID.

You can get the DNS Name and Resource ID from the Azure portal: open your key vault, then select Properties.
![new app](images/key_vault02.png)

Your account is now ready, and you can access ADLS and JDBC from Databricks.
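
Once the scope exists, a notebook can read the service principal values from it and configure Spark to reach ADLS Gen2. The sketch below builds the ABFS OAuth settings for a service principal as a plain dictionary. The function name `adls_oauth_conf`, the storage account name, the scope name, and the secret key names in the usage comment are all placeholders chosen for illustration; use the names you stored in Key Vault.

```python
def adls_oauth_conf(storage_account: str, client_id: str,
                    client_secret: str, tenant_id: str) -> dict:
    """Return Spark settings for service principal (OAuth) access to ADLS Gen2."""
    suffix = f"{storage_account}.dfs.core.windows.net"
    return {
        f"fs.azure.account.auth.type.{suffix}": "OAuth",
        f"fs.azure.account.oauth.provider.type.{suffix}":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        f"fs.azure.account.oauth2.client.id.{suffix}": client_id,
        f"fs.azure.account.oauth2.client.secret.{suffix}": client_secret,
        f"fs.azure.account.oauth2.client.endpoint.{suffix}":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }


# In a Databricks notebook, where `spark` and `dbutils` are predefined,
# usage might look like this (all names below are placeholders):
#
#   scope = "my-scope"
#   conf = adls_oauth_conf(
#       "mystorageaccount",
#       dbutils.secrets.get(scope=scope, key="sp-client-id"),
#       dbutils.secrets.get(scope=scope, key="sp-client-secret"),
#       dbutils.secrets.get(scope=scope, key="sp-tenant-id"),
#   )
#   for key, value in conf.items():
#       spark.conf.set(key, value)
```

Keeping the settings in a pure function means the secret values are only read inside the notebook, and the key names can be checked without a cluster.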
Binary file added doc/images/app_adls01.png
Binary file added doc/images/app_adls02.png
Binary file added doc/images/app_adls03.png
Binary file added doc/images/app_info.png
Binary file added doc/images/dbs_secret_scope.png
Binary file added doc/images/key_vault.png
Binary file added doc/images/key_vault02.png
Binary file added doc/images/new_app.png
99 changes: 97 additions & 2 deletions example/pipeline_fruit_batch_ADLS.json
@@ -69,7 +69,44 @@
}
],
"type": "struct"
}
},
"sampleData": [
{
"ID": 1,
"Amount": 12,
"TS": "2022-01-10T00:00:00.000+08:00"
},
{
"ID": 2,
"Amount": 13,
"TS": "2022-01-10T00:00:00.000+08:00"
},
{
"ID": 3,
"Amount": 14,
"TS": "2022-01-10T00:00:00.000+08:00"
},
{
"ID": 4,
"Amount": 15,
"TS": "2022-01-10T00:00:00.000+08:00"
},
{
"ID": 5,
"Amount": 16,
"TS": "2022-01-10T00:00:00.000+08:00"
},
{
"ID": 6,
"Amount": 17,
"TS": "2022-01-10T00:00:00.000+08:00"
},
{
"ID": 7,
"Amount": 18,
"TS": "2022-01-10T00:00:00.000+08:00"
}
]
},
{
"name": "price_ingestion",
@@ -175,7 +212,65 @@
}
],
"type": "struct"
}
},
"sampleData": [
{
"ID": 1,
"Fruit": "Red Grape",
"Color": "Red",
"Price": 2,
"Start_TS": "2015-01-01T00:00:00.000+08:00",
"End_TS": "2099-12-31T23:59:59.000+08:00"
},
{
"ID": 2,
"Fruit": "Peach",
"Color": "Yellow",
"Price": 3,
"Start_TS": "2015-01-01T00:00:00.000+08:00",
"End_TS": "2099-12-31T23:59:59.000+08:00"
},
{
"ID": 3,
"Fruit": "Orange",
"Color": "Orange",
"Price": 2,
"Start_TS": "2015-01-01T00:00:00.000+08:00",
"End_TS": "2099-12-31T23:59:59.000+08:00"
},
{
"ID": 4,
"Fruit": "Green Apple",
"Color": "Green",
"Price": 3,
"Start_TS": "2015-01-01T00:00:00.000+08:00",
"End_TS": "2099-12-31T23:59:59.000+08:00"
},
{
"ID": 5,
"Fruit": "Fiji Apple",
"Color": "Red",
"Price": 3.5,
"Start_TS": "2015-01-01T00:00:00.000+08:00",
"End_TS": "2099-12-31T23:59:59.000+08:00"
},
{
"ID": 6,
"Fruit": "Banana",
"Color": "Yellow",
"Price": 1,
"Start_TS": "2015-01-01T00:00:00.000+08:00",
"End_TS": "2099-12-31T23:59:59.000+08:00"
},
{
"ID": 7,
"Fruit": "Green Grape",
"Color": "Green",
"Price": 2,
"Start_TS": "2015-01-01T00:00:00.000+08:00",
"End_TS": "2099-12-31T23:59:59.000+08:00"
}
]
}
],
"standard": [