Commit 06fa7e5
Config remote sync command (#4289)
## Changes

New experimental command `databricks bundle config-remote-sync`. The command fetches the latest remote changes to resources and compares them to the deployed state. With the `--save` flag, it also writes the changes back to the YAML files. The command leverages the diffing algorithm of the `bundle plan` command.

Also note that a new dependency is added for YAML patching that preserves comments.

This is a first PR; I will continue fixing the limitations below in follow-up PRs.

Current limitations:

- The command only works in direct mode. For the TF engine, the plan is to prepare a deploy-time conversion of Terraform state to direct-engine state (behind the env flag `DATABRICKS_BUNDLE_ENABLE_EXPERIMENTAL_YAML_SYNC`).
- Server-side defaults are hardcoded for now, until they are better supported in `bundle plan`.
- Selectors like `tasks[task_key='my_task']` need to be tested for cases where the object or its parent doesn't exist.
- CLI transformations (dev prefixes, path translations, resource mutators, etc.) and variables are not properly handled, so bugs are possible when the diff affects these fields.
- Tested only for jobs.

No changelog entries are needed, as the command is intended to be private for now and we don't want to encourage users to use it, given the unstable API.

## Why

This command serves as the backend for the new visual authoring feature in DABs in the Workspace. The user edits a job or pipeline in the Workspace UI and clicks the "Sync" button; the new command then applies the diff to the source configuration files. The user may then accept or reject these changes in the editor.

## Tests

Currently only unit tests, to speed up the dev loop; once I have full functionality, I'll add proper acceptance test coverage.

Acceptance tests are failing because of the change in formatting (even though the command is hidden).
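The six drift cases exercised in the acceptance test imply a simple per-field resolution rule when saving: a remote value that diverged from the deployed state wins; otherwise the local edit is kept. A minimal sketch of that rule (my inference from the test cases, not the CLI's actual implementation; `sync_field` and `MISSING` are hypothetical names):

```python
MISSING = object()  # sentinel for "field is absent"

def sync_field(deployed, local, remote):
    # A remote value that diverged from the deployed state wins and is
    # written back to the config; otherwise the local edit is kept.
    if remote != deployed:
        return remote
    return local

# Case 4: updated locally (3) and remotely (5) -> remote wins.
print(sync_field(1, 3, 5))  # 5
# Case 2: added locally only; remote still matches deployed state -> local edit kept.
print(sync_field(MISSING, 3600, MISSING))  # 3600
# Case 5: updated locally but removed remotely -> the field is removed.
print(sync_field("dev", "config-production", MISSING) is MISSING)  # True
```

This also explains Case 6 (added on both sides with the same value): the remote differs from the deployed state, but the winning remote value equals the local one, so no visible change is saved.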
1 parent 4f0b92a commit 06fa7e5
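For illustration, the selector syntax mentioned in the limitations (e.g. `tasks[task_key='my_task']`) can be read as a dotted path whose parts may carry a key-match filter over a list. A rough stdlib sketch (`resolve` is a hypothetical helper, not part of the CLI):

```python
import re

def resolve(node, path):
    # Walk a dotted path; a part like name[key='value'] selects the
    # list element whose `key` field equals `value`.
    for part in path.split("."):
        m = re.fullmatch(r"(\w+)\[(\w+)='([^']*)'\]", part)
        if m:
            field, key, value = m.groups()
            node = next(x for x in node[field] if x.get(key) == value)
        else:
            node = node[part]
    return node

job = {"tasks": [{"task_key": "main", "notebook_task": {"notebook_path": "/nb"}}]}
print(resolve(job, "tasks[task_key='main'].notebook_task"))  # {'notebook_path': '/nb'}
```

The limitation above is precisely what this sketch ignores: `next(...)` raises `StopIteration` when no element matches, and `node[field]` raises `KeyError` when the parent is absent.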

73 files changed (+3433, -13 lines)

acceptance/bin/edit_resource.py

Lines changed: 8 additions & 0 deletions
```diff
@@ -32,6 +32,14 @@ def set(self, job_id, value):
         return run([CLI, "jobs", "reset", job_id, "--json", json.dumps(payload)])
 
 
+class pipelines:
+    def get(self, pipeline_id):
+        return run_json([CLI, "pipelines", "get", pipeline_id])["spec"]
+
+    def set(self, pipeline_id, value):
+        return run([CLI, "pipelines", "update", pipeline_id, "--json", json.dumps(value)])
+
+
 def main():
     parser = argparse.ArgumentParser()
     parser.add_argument("type")
```
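The acceptance scripts pipe small Python snippets into `edit_resource.py`, with the resource spec bound to `r`. A plausible sketch of how such a snippet is applied (`apply_edit` is a hypothetical helper; the real script fetches and pushes the spec via the CLI around this step):

```python
import json

def apply_edit(spec: dict, snippet: str) -> dict:
    # Bind the current spec to `r`, let the snippet mutate it in place,
    # and return the edited spec for pushing back to the workspace.
    scope = {"r": spec}
    exec(snippet, scope)
    return spec

spec = {"max_concurrent_runs": 1, "tags": {"env": "dev"}}
apply_edit(spec, 'r["max_concurrent_runs"] = 5\nr["tags"].pop("env", None)')
print(json.dumps(spec))  # {"max_concurrent_runs": 5, "tags": {}}
```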
Lines changed: 43 additions & 0 deletions

```yaml
bundle:
  name: test-bundle-$UNIQUE_NAME

resources:
  jobs:
    my_job:
      tasks:
        - task_key: main
          notebook_task:
            notebook_path: /Users/{{workspace_user_name}}/notebook
          new_cluster:
            spark_version: $DEFAULT_SPARK_VERSION
            node_type_id: $NODE_TYPE_ID
            num_workers: 1

targets:
  default:
    resources:
      jobs:
        my_job:
          email_notifications:
            on_success:
              - success@example.com
          parameters:
            - name: catalog
              default: main
            - name: env
              default: dev
          trigger:
            periodic:
              interval: 1
              unit: DAYS
          tags:
            env: dev
            version: v1
            team: data-team
          max_concurrent_runs: 1
          environments:
            - environment_key: default
              spec:
                environment_version: "3"
                dependencies:
                  - ./*.whl
```

acceptance/bundle/config-remote-sync/config_edits/out.test.toml

Lines changed: 8 additions & 0 deletions
Lines changed: 62 additions & 0 deletions

```
Uploading dummy.whl...
Uploading bundle files to /Workspace/Users/[USERNAME]/.bundle/test-bundle-[UNIQUE_NAME]/default/files...
Deploying resources...
Updating deployment state...
Deployment complete!

=== Case 1: Added locally, updated remotely with different value

=== Case 2: Added locally, removed remotely

=== Case 3: Removed locally, removed remotely

=== Case 4: Updated locally, updated remotely with different value

=== Case 5: Updated locally, removed remotely

=== Case 6: Added locally and remotely with same value (no drift expected)

=== Edit job remotely

=== Detect and save changes
Detected changes in 1 resource(s):

Resource: resources.jobs.my_job
  email_notifications.on_failure[0]: update
  max_concurrent_runs: update
  tags['env']: update


=== Configuration changes

>>> diff.py databricks.yml.backup databricks.yml
--- databricks.yml.backup
+++ databricks.yml
@@ -1,5 +1,4 @@
 bundle:
   name: test-bundle-[UNIQUE_NAME]
-
 resources:
   jobs:
@@ -13,5 +12,4 @@
             node_type_id: [NODE_TYPE_ID]
             num_workers: 1
-
 targets:
   default:
@@ -24,5 +22,5 @@
               - success@example.com
             on_failure:
-              - config-failure@example.com
+              - remote-failure@example.com
           parameters:
             - name: catalog
@@ -35,7 +33,6 @@
               unit: DAYS
           tags:
-            env: config-production
            team: data-team
-          max_concurrent_runs: 3
+          max_concurrent_runs: 5
           timeout_seconds: 3600
         environments:
```
Lines changed: 122 additions & 0 deletions

```bash
#!/bin/bash

envsubst < databricks.yml.tmpl > databricks.yml

touch dummy.whl
$CLI bundle deploy
job_id="$(read_id.py my_job)"

title "Case 1: Added locally, updated remotely with different value"
echo
old=$(cat <<'EOF'
          email_notifications:
            on_success:
              - success@example.com
EOF
)
new=$(cat <<'EOF'
          email_notifications:
            on_success:
              - success@example.com
            on_failure:
              - config-failure@example.com
EOF
)
update_file.py databricks.yml "$old" "$new"
read -r -d '' case1 <<'EOF' || true
r["email_notifications"]["on_failure"] = ["remote-failure@example.com"]
EOF

title "Case 2: Added locally, removed remotely"
echo
old=$(cat <<'EOF'
          max_concurrent_runs: 1
EOF
)
new=$(cat <<'EOF'
          max_concurrent_runs: 1
          timeout_seconds: 3600
EOF
)
update_file.py databricks.yml "$old" "$new"
read -r -d '' case2 <<'EOF' || true
r.pop("timeout_seconds", None)
EOF

title "Case 3: Removed locally, removed remotely"
echo
old=$(cat <<'EOF'
          tags:
            env: dev
            version: v1
            team: data-team
EOF
)
new=$(cat <<'EOF'
          tags:
            env: dev
            team: data-team
EOF
)
update_file.py databricks.yml "$old" "$new"
read -r -d '' case3 <<'EOF' || true
r["tags"].pop("version", None)
EOF

title "Case 4: Updated locally, updated remotely with different value"
echo
update_file.py databricks.yml 'max_concurrent_runs: 1' 'max_concurrent_runs: 3'
read -r -d '' case4 <<'EOF' || true
r["max_concurrent_runs"] = 5
EOF

title "Case 5: Updated locally, removed remotely"
echo
update_file.py databricks.yml 'env: dev' 'env: config-production'
read -r -d '' case5 <<'EOF' || true
r["tags"].pop("env", None)
EOF

title "Case 6: Added locally and remotely with same value (no drift expected)"
echo
old=$(cat <<'EOF'
        my_job:
          email_notifications:
EOF
)
new=$(cat <<'EOF'
        my_job:
          description: A test job
          email_notifications:
EOF
)
update_file.py databricks.yml "$old" "$new"
read -r -d '' case6 <<'EOF' || true
r["description"] = "A test job"
EOF

title "Edit job remotely"
echo
edit_resource.py jobs $job_id <<EOF
$case1

$case2

$case3

$case4

$case5

$case6
EOF

title "Detect and save changes"
echo
cp databricks.yml databricks.yml.backup
$CLI bundle config-remote-sync --save

title "Configuration changes"
echo
trace diff.py databricks.yml.backup databricks.yml
rm databricks.yml.backup
```
Lines changed: 11 additions & 0 deletions

```toml
Cloud = true

RecordRequests = false

Ignore = [".databricks", "dummy.whl", "databricks.yml", "databricks.yml.backup"]

[Env]
DATABRICKS_BUNDLE_ENABLE_EXPERIMENTAL_YAML_SYNC = "true"

[EnvMatrix]
DATABRICKS_BUNDLE_ENGINE = ["direct"]
```
Lines changed: 25 additions & 0 deletions

```yaml
# Top-level comment about the bundle
bundle:
  name: test-bundle-$UNIQUE_NAME

# Resources section with extra spacing
resources:
  jobs:
    my_job:
      # Comment about max concurrent runs
      max_concurrent_runs: 1

      # Task configuration
      tasks:
        - task_key: main
          notebook_task:
            notebook_path: /Users/{{workspace_user_name}}/notebook
          new_cluster:
            spark_version: $DEFAULT_SPARK_VERSION
            node_type_id: $NODE_TYPE_ID
            num_workers: 1 # inline comment about workers

      # Tags for categorization
      tags:
        env: dev # environment tag
        team: data-eng
```
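The point of this fixture is that comments, blank lines, and layout must survive a value update. As a naive line-level illustration of comment-preserving scalar patching (the CLI uses a dedicated comment-preserving YAML-patching dependency, not this regex; `patch_scalar` is a hypothetical helper):

```python
import re

def patch_scalar(yaml_text: str, key: str, new_value) -> str:
    # Rewrite only the scalar after `key:`, leaving indentation,
    # inline comments, and all surrounding lines untouched.
    pattern = re.compile(rf"^(\s*{re.escape(key)}:\s*)\S+(.*)$", re.MULTILINE)
    return pattern.sub(rf"\g<1>{new_value}\g<2>", yaml_text)

doc = "# Comment about max concurrent runs\nmax_concurrent_runs: 1  # inline comment\n"
print(patch_scalar(doc, "max_concurrent_runs", 5))
```

A real implementation must also handle block scalars, quoting, and keys that occur at multiple nesting levels, which is why a proper round-tripping YAML library is used instead.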

acceptance/bundle/config-remote-sync/formatting_preserved/out.test.toml

Lines changed: 8 additions & 0 deletions
Lines changed: 38 additions & 0 deletions

```
Uploading bundle files to /Workspace/Users/[USERNAME]/.bundle/test-bundle-[UNIQUE_NAME]/default/files...
Deploying resources...
Updating deployment state...
Deployment complete!

=== Modify max_concurrent_runs from 1 to 5
=== Detect and save changes
Detected changes in 1 resource(s):

Resource: resources.jobs.my_job
  max_concurrent_runs: update


=== Configuration changes

>>> diff.py databricks.yml.backup databricks.yml
--- databricks.yml.backup
+++ databricks.yml
@@ -2,5 +2,4 @@
 bundle:
   name: test-bundle-[UNIQUE_NAME]
-
 # Resources section with extra spacing
 resources:
@@ -8,6 +7,5 @@
     my_job:
       # Comment about max concurrent runs
-      max_concurrent_runs: 1
-
+      max_concurrent_runs: 5
       # Task configuration
       tasks:
@@ -19,5 +17,4 @@
             node_type_id: [NODE_TYPE_ID]
             num_workers: 1 # inline comment about workers
-
       # Tags for categorization
       tags:
```
Lines changed: 22 additions & 0 deletions

```bash
#!/bin/bash

envsubst < databricks.yml.tmpl > databricks.yml

touch dummy.whl
$CLI bundle deploy
job_id="$(read_id.py my_job)"

title "Modify max_concurrent_runs from 1 to 5"
edit_resource.py jobs $job_id <<EOF
r["max_concurrent_runs"] = 5
EOF

title "Detect and save changes"
echo
cp databricks.yml databricks.yml.backup
$CLI bundle config-remote-sync --save

title "Configuration changes"
echo
trace diff.py databricks.yml.backup databricks.yml
rm databricks.yml.backup
```
