Commit 531b7ad

pipelines: use lakeflow-pipelines template (#4237)
## Changes

Change the template used by "pipelines init" from "cli-pipelines" to "lakeflow-pipelines". Remove the code for the "cli-pipelines" template and point "cli-pipelines" at "lakeflow-pipelines" in case anything relies on the undocumented hidden template. There are a few minor problems with the lakeflow-pipelines template that we will address in a follow-up; for example, we create job parameters but never use them.

## Why

The lakeflow-pipelines template follows the recommended layout for SDP projects in DABs and is consistent with how other templates are structured.

## Tests

Updated acceptance tests.
1 parent 5963b41 commit 531b7ad

File tree

85 files changed, +615 −1097 lines changed


acceptance/pipelines/e2e/output.txt

Lines changed: 30 additions & 23 deletions
```diff
@@ -2,21 +2,26 @@
 === E2E Test: Complete pipeline lifecycle (init, deploy, run, stop, destroy)
 === Initialize pipeline project
 >>> [PIPELINES] init --output-dir output
+Welcome to the template for Lakeflow Declarative Pipelines!
 
-Welcome to the template for pipelines!
+Please answer the below to tailor your project to your preferences.
+You can always change your mind and change your configuration in the databricks.yml file later.
 
+Note that [DATABRICKS_URL] is used for initialization
+(see https://docs.databricks.com/dev-tools/cli/profiles.html for how to change your profile).
 
-Your new project has been created in the 'my_project' directory!
+Your new project has been created in the 'lakeflow_project' directory!
 
-Refer to the README.md file for "getting started" instructions!
+Please refer to the README.md file for "getting started" instructions.
 
 === Deploy pipeline
 >>> [PIPELINES] deploy
-Uploading bundle files to /Workspace/Users/[USERNAME]/.bundle/my_project/dev/files...
+Uploading bundle files to /Workspace/Users/[USERNAME]/.bundle/lakeflow_project/dev/files...
 Deploying resources...
 Updating deployment state...
 Deployment complete!
-View your pipeline my_project_pipeline here: [DATABRICKS_URL]/pipelines/[UUID]?o=[NUMID]
+View your job sample_job here: [DATABRICKS_URL]/jobs/[NUMID]?o=[NUMID]
+View your pipeline lakeflow_project_etl here: [DATABRICKS_URL]/pipelines/[UUID]?o=[NUMID]
 
 === Run pipeline
 >>> [PIPELINES] run
@@ -31,31 +36,32 @@ Pipeline configurations for this update:
 
 === Edit project by creating and running a new second pipeline
 >>> [PIPELINES] deploy
-Uploading bundle files to /Workspace/Users/[USERNAME]/.bundle/my_project/dev/files...
+Uploading bundle files to /Workspace/Users/[USERNAME]/.bundle/lakeflow_project/dev/files...
 Deploying resources...
 Updating deployment state...
 Deployment complete!
-View your pipeline my_project_pipeline here: [DATABRICKS_URL]/pipelines/[UUID]?o=[NUMID]
-View your pipeline my_project_pipeline_2 here: [DATABRICKS_URL]/pipelines/[UUID]?o=[NUMID]
+View your job sample_job here: [DATABRICKS_URL]/jobs/[NUMID]?o=[NUMID]
+View your pipeline lakeflow_project_etl here: [DATABRICKS_URL]/pipelines/[UUID]?o=[NUMID]
+View your pipeline lakeflow_project_etl_2 here: [DATABRICKS_URL]/pipelines/[UUID]?o=[NUMID]
 
 === Assert the second pipeline is created
 >>> [CLI] pipelines get [UUID]
 {
   "creator_user_name":"[USERNAME]",
   "last_modified":[UNIX_TIME_MILLIS],
-  "name":"[dev [USERNAME]] my_project_pipeline_2",
+  "name":"[dev [USERNAME]] lakeflow_project_etl_2",
   "pipeline_id":"[UUID]",
   "run_as_user_name":"[USERNAME]",
   "spec": {
     "channel":"CURRENT",
     "deployment": {
       "kind":"BUNDLE",
-      "metadata_file_path":"/Workspace/Users/[USERNAME]/.bundle/my_project/dev/state/metadata.json"
+      "metadata_file_path":"/Workspace/Users/[USERNAME]/.bundle/lakeflow_project/dev/state/metadata.json"
     },
     "development":true,
     "edition":"ADVANCED",
     "id":"[UUID]",
-    "name":"[dev [USERNAME]] my_project_pipeline_2",
+    "name":"[dev [USERNAME]] lakeflow_project_etl_2",
     "storage":"dbfs:/pipelines/[UUID]",
     "tags": {
       "dev":"[USERNAME]"
@@ -64,7 +70,7 @@ View your pipeline my_project_pipeline_2 here: [DATABRICKS_URL]/pipelines/[UUID]
   "state":"IDLE"
 }
 
->>> [PIPELINES] run my_project_pipeline_2
+>>> [PIPELINES] run lakeflow_project_etl_2
 Update URL: [DATABRICKS_URL]/#joblist/pipelines/[UUID]/updates/[UUID]
 
 Update ID: [UUID]
@@ -75,26 +81,27 @@ Pipeline configurations for this update:
 • All tables are refreshed
 
 === Stop both pipelines before destroy
->>> [PIPELINES] stop my_project_pipeline
-Stopping my_project_pipeline...
-my_project_pipeline has been stopped.
+>>> [PIPELINES] stop lakeflow_project_etl
+Stopping lakeflow_project_etl...
+lakeflow_project_etl has been stopped.
 
->>> [PIPELINES] stop my_project_pipeline_2
-Stopping my_project_pipeline_2...
-my_project_pipeline_2 has been stopped.
+>>> [PIPELINES] stop lakeflow_project_etl_2
+Stopping lakeflow_project_etl_2...
+lakeflow_project_etl_2 has been stopped.
 
 === Destroy project
 >>> [PIPELINES] destroy --auto-approve
 The following resources will be deleted:
-  delete resources.pipelines.my_project_pipeline
-  delete resources.pipelines.my_project_pipeline_2
+  delete resources.jobs.sample_job
+  delete resources.pipelines.lakeflow_project_etl
+  delete resources.pipelines.lakeflow_project_etl_2
 
 This action will result in the deletion of the following Lakeflow Declarative Pipelines along with the
 Streaming Tables (STs) and Materialized Views (MVs) managed by them:
-  delete resources.pipelines.my_project_pipeline
-  delete resources.pipelines.my_project_pipeline_2
+  delete resources.pipelines.lakeflow_project_etl
+  delete resources.pipelines.lakeflow_project_etl_2
 
-All files and directories at the following location will be deleted: /Workspace/Users/[USERNAME]/.bundle/my_project/dev
+All files and directories at the following location will be deleted: /Workspace/Users/[USERNAME]/.bundle/lakeflow_project/dev
 
 Deleting files...
 Destroy complete!
```

acceptance/pipelines/e2e/output/my_project/.vscode/__builtins__.pyi renamed to acceptance/pipelines/e2e/output/lakeflow_project/.vscode/__builtins__.pyi

File renamed without changes.

acceptance/pipelines/e2e/output/my_project/.vscode/extensions.json renamed to acceptance/pipelines/e2e/output/lakeflow_project/.vscode/extensions.json

Lines changed: 2 additions & 2 deletions
```diff
@@ -1,7 +1,7 @@
 {
   "recommendations": [
     "databricks.databricks",
-    "ms-python.vscode-pylance",
-    "redhat.vscode-yaml"
+    "redhat.vscode-yaml",
+    "charliermarsh.ruff"
   ]
 }
```
Lines changed: 39 additions & 0 deletions
```diff
@@ -0,0 +1,39 @@
+{
+  "jupyter.interactiveWindow.cellMarker.codeRegex": "^# COMMAND ----------|^# Databricks notebook source|^(#\\s*%%|#\\s*\\<codecell\\>|#\\s*In\\[\\d*?\\]|#\\s*In\\[ \\])",
+  "jupyter.interactiveWindow.cellMarker.default": "# COMMAND ----------",
+  "python.testing.pytestArgs": [
+    "."
+  ],
+  "files.exclude": {
+    "**/*.egg-info": true,
+    "**/__pycache__": true,
+    ".pytest_cache": true,
+    "dist": true,
+  },
+  "files.associations": {
+    "**/.gitkeep": "markdown"
+  },
+
+  // Pylance settings (VS Code)
+  // Set typeCheckingMode to "basic" to enable type checking!
+  "python.analysis.typeCheckingMode": "off",
+  "python.analysis.extraPaths": ["src", "lib", "resources"],
+  "python.analysis.diagnosticMode": "workspace",
+  "python.analysis.stubPath": ".vscode",
+
+  // Pyright settings (Cursor)
+  // Set typeCheckingMode to "basic" to enable type checking!
+  "cursorpyright.analysis.typeCheckingMode": "off",
+  "cursorpyright.analysis.extraPaths": ["src", "lib", "resources"],
+  "cursorpyright.analysis.diagnosticMode": "workspace",
+  "cursorpyright.analysis.stubPath": ".vscode",
+
+  // General Python settings
+  "python.defaultInterpreterPath": "./.venv/bin/python",
+  "python.testing.unittestEnabled": false,
+  "python.testing.pytestEnabled": true,
+  "[python]": {
+    "editor.defaultFormatter": "charliermarsh.ruff",
+    "editor.formatOnSave": true,
+  },
+}
```
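The cell-marker regex added in these settings is easy to sanity-check. A minimal Python snippet (illustrative only, not part of the commit) confirming which lines the pattern treats as interactive-cell boundaries:

```python
import re

# Cell-marker pattern from the generated editor settings (JSON escaping removed).
CELL_MARKER = (
    r"^# COMMAND ----------"
    r"|^# Databricks notebook source"
    r"|^(#\s*%%|#\s*\<codecell\>|#\s*In\[\d*?\]|#\s*In\[ \])"
)

pattern = re.compile(CELL_MARKER)

# Lines that should start a new cell:
for marker in ("# COMMAND ----------", "# Databricks notebook source", "# %%", "# In[12]"):
    assert pattern.match(marker), marker

# Ordinary code lines should not match:
assert pattern.match("print('hello')") is None
```

The third alternative also accepts Jupyter-style markers such as `# <codecell>` and `# In[ ]`, so notebooks exported in several formats split into cells consistently.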
Lines changed: 54 additions & 0 deletions
```diff
@@ -0,0 +1,54 @@
+# lakeflow_project
+
+The 'lakeflow_project' project was generated by using the lakeflow-pipelines template.
+
+* `src/`: Python source code for this project.
+* `resources/`: Resource configurations (jobs, pipelines, etc.)
+
+## Getting started
+
+Choose how you want to work on this project:
+
+(a) Directly in your Databricks workspace, see
+    https://docs.databricks.com/dev-tools/bundles/workspace.
+
+(b) Locally with an IDE like Cursor or VS Code, see
+    https://docs.databricks.com/dev-tools/vscode-ext.html.
+
+(c) With command line tools, see https://docs.databricks.com/dev-tools/cli/databricks-cli.html
+
+# Using this project with the CLI
+
+The Databricks workspace and IDE extensions provide a graphical interface for working
+with this project. It's also possible to interact with it directly using the CLI:
+
+1. Authenticate to your Databricks workspace, if you have not done so already:
+   ```
+   $ databricks configure
+   ```
+
+2. To deploy a development copy of this project, type:
+   ```
+   $ databricks bundle deploy --target dev
+   ```
+   (Note that "dev" is the default target, so the `--target` parameter
+   is optional here.)
+
+   This deploys everything that's defined for this project.
+   For example, the default template would deploy a pipeline called
+   `[dev yourname] lakeflow_project_etl` to your workspace.
+   You can find that resource by opening your workspace and clicking on **Jobs & Pipelines**.
+
+3. Similarly, to deploy a production copy, type:
+   ```
+   $ databricks bundle deploy --target prod
+   ```
+   Note that the default template includes a job that runs the pipeline every day
+   (defined in resources/sample_job.job.yml). The schedule
+   is paused when deploying in development mode (see
+   https://docs.databricks.com/dev-tools/bundles/deployment-modes.html).
+
+4. To run a job or pipeline, use the "run" command:
+   ```
+   $ databricks bundle run
+   ```
```

acceptance/pipelines/init/error-cases/output/my_project/databricks.yml renamed to acceptance/pipelines/e2e/output/lakeflow_project/databricks.yml

Lines changed: 8 additions & 12 deletions
```diff
@@ -1,46 +1,42 @@
-# This is a Databricks pipelines definition for my_project.
+# This is a Databricks asset bundle definition for lakeflow_project.
 # See https://docs.databricks.com/dev-tools/bundles/index.html for documentation.
 bundle:
-  name: my_project
+  name: lakeflow_project
   uuid: [UUID]
 
 include:
   - resources/*.yml
   - resources/*/*.yml
-  - ./*.yml
 
 # Variable declarations. These variables are assigned in the dev/prod targets below.
 variables:
   catalog:
     description: The catalog to use
   schema:
     description: The schema to use
-  notifications:
-    description: The email addresses to use for failure notifications
 
 targets:
   dev:
     # The default target uses 'mode: development' to create a development copy.
-    # - Deployed pipelines get prefixed with '[dev my_user_name]'
+    # - Deployed resources get prefixed with '[dev my_user_name]'
+    # - Any job schedules and triggers are paused by default.
+    # See also https://docs.databricks.com/dev-tools/bundles/deployment-modes.html.
     mode: development
     default: true
    workspace:
      host: [DATABRICKS_URL]
    variables:
      catalog: hive_metastore
      schema: ${workspace.current_user.short_name}
-      notifications: []
-
  prod:
    mode: production
    workspace:
      host: [DATABRICKS_URL]
      # We explicitly deploy to /Workspace/Users/[USERNAME] to make sure we only have a single copy.
      root_path: /Workspace/Users/[USERNAME]/.bundle/${bundle.name}/${bundle.target}
+    variables:
+      catalog: hive_metastore
+      schema: prod
    permissions:
      - user_name: [USERNAME]
        level: CAN_MANAGE
-    variables:
-      catalog: hive_metastore
-      schema: default
-      notifications: [[USERNAME]]
```
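The per-target `variables` blocks above are how the template parameterizes catalog and schema per environment. Purely as an illustration (not part of this commit), an additional target could assign the same variables the same way; the host below is a placeholder:

```yaml
targets:
  staging:
    mode: production
    workspace:
      host: https://example.cloud.databricks.com  # placeholder, not a real workspace
    variables:
      catalog: hive_metastore
      schema: staging
```

Resources that reference `${var.catalog}` or `${var.schema}` then resolve per target, which is what lets one pipeline definition serve dev and prod.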

acceptance/pipelines/e2e/output/my_project/out.gitignore renamed to acceptance/pipelines/e2e/output/lakeflow_project/out.gitignore

Lines changed: 2 additions & 0 deletions
```diff
@@ -4,5 +4,7 @@ dist/
 __pycache__/
 *.egg-info
 .venv/
+scratch/**
+!scratch/README.md
 **/explorations/**
 **/!explorations/README.md
```
Lines changed: 34 additions & 0 deletions
```diff
@@ -0,0 +1,34 @@
+[project]
+name = "lakeflow_project"
+version = "0.0.1"
+authors = [{ name = "[USERNAME]" }]
+requires-python = ">=3.10,<3.13"
+dependencies = [
+    # Any dependencies for jobs and pipelines in this project can be added here
+    # See also https://docs.databricks.com/dev-tools/bundles/library-dependencies
+    #
+    # LIMITATION: for pipelines, dependencies are cached during development;
+    # add dependencies to the 'environment' section of your pipeline.yml file instead
+]
+
+[dependency-groups]
+dev = [
+    "pytest",
+    "ruff",
+    "databricks-dlt",
+    "databricks-connect>=15.4,<15.5",
+    "ipykernel",
+]
+
+[project.scripts]
+main = "lakeflow_project.main:main"
+
+[build-system]
+requires = ["hatchling"]
+build-backend = "hatchling.build"
+
+[tool.hatch.build.targets.wheel]
+packages = ["src"]
+
+[tool.ruff]
+line-length = 120
```
Lines changed: 21 additions & 0 deletions
```diff
@@ -0,0 +1,21 @@
+# The main pipeline for lakeflow_project
+
+resources:
+  pipelines:
+    lakeflow_project_etl:
+      name: lakeflow_project_etl
+      # Catalog is required for serverless compute
+      catalog: main
+      schema: ${var.schema}
+      serverless: true
+      root_path: "../src/lakeflow_project_etl"
+
+      libraries:
+        - glob:
+            include: ../src/lakeflow_project_etl/transformations/**
+
+      environment:
+        dependencies:
+          # We include every dependency defined by pyproject.toml by defining an editable environment
+          # that points to the folder where pyproject.toml is deployed.
+          - --editable ${workspace.file_path}
```
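Per the LIMITATION note in pyproject.toml, pipeline dependencies belong in this `environment` section rather than in the project's `dependencies` list, since pipeline dependencies are cached during development. A sketch of what adding one extra package might look like (the `pandas` pin is purely illustrative, not part of the template):

```yaml
environment:
  dependencies:
    - --editable ${workspace.file_path}
    # Hypothetical extra dependency for the pipeline, added here
    # instead of pyproject.toml per the LIMITATION note:
    - pandas==2.2.*
```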
Lines changed: 4 additions & 0 deletions
```diff
@@ -0,0 +1,4 @@
+resources:
+  pipelines:
+    lakeflow_project_etl_2:
+      name: lakeflow_project_etl_2
```
