Add WAL for direct deployment state recovery #4106

Open
varundeepsaini wants to merge 9 commits into databricks:main from varundeepsaini:feature/deploy-append-log
@varundeepsaini
Contributor

@varundeepsaini varundeepsaini commented Dec 6, 2025

Closes: #4090

Changes

Add a write-ahead log (WAL) that records state changes during direct deployment, so that partial state can be recovered if the deployment is interrupted.

Added an offset to KillCaller: it now starts killing the process only after Offset successful requests to the endpoint.
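The offset behaviour described above can be pictured with a small counter. This is only a sketch of the idea, not the PR's KillCaller code: the type `killAfter`, its fields, and `ShouldKill` are hypothetical names chosen for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// killAfter is a hypothetical helper: it lets the first `offset` requests
// through and reports that the process should be killed on every later one.
type killAfter struct {
	mu     sync.Mutex
	offset int // number of requests allowed to succeed
	seen   int // successful requests observed so far
}

// ShouldKill returns false for the first `offset` calls and true afterwards.
func (k *killAfter) ShouldKill() bool {
	k.mu.Lock()
	defer k.mu.Unlock()
	if k.seen < k.offset {
		k.seen++
		return false
	}
	return true
}

func main() {
	k := &killAfter{offset: 2}
	for i := 1; i <= 4; i++ {
		fmt.Printf("request %d: kill=%v\n", i, k.ShouldKill())
	}
}
```

With an offset of 0 this degenerates to killing on the first request, which is presumably how the helper behaved before the offset was introduced.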

Why

Today, if a deployment is interrupted before Finalize(), no state is saved and the resources created so far become orphaned. The WAL writes each state change to disk immediately and replays the changes on restart.

Tests

Tests added for WAL save/replay, delete/replay, finalize cleanup, and edge cases.

@varundeepsaini force-pushed the feature/deploy-append-log branch 2 times, most recently from b51a199 to 7f67cb2 (December 6, 2025 19:08)
@varundeepsaini force-pushed the feature/deploy-append-log branch from 7f67cb2 to baf371e (January 11, 2026 20:01)
@varundeepsaini marked this pull request as draft (January 11, 2026 20:50)
@varundeepsaini force-pushed the feature/deploy-append-log branch from 86b90ce to 98a1893 (January 15, 2026 16:39)
@varundeepsaini marked this pull request as ready for review (January 15, 2026 16:39)
@varundeepsaini force-pushed the feature/deploy-append-log branch from 98a1893 to 5cb5da4 (January 15, 2026 16:40)
@varundeepsaini
Contributor Author

@denik @andrewnester the pr is ready for review

@varundeepsaini
Contributor Author

@denik @andrewnester bumping again ^^

@varundeepsaini force-pushed the feature/deploy-append-log branch from 5cb5da4 to 0e7c9fa (January 20, 2026 14:11)
@varundeepsaini
Contributor Author

@denik fixed the build failures, could you approve the workflows

@varundeepsaini force-pushed the feature/deploy-append-log branch from 1b3af30 to 34e0f37 (January 23, 2026 19:22)
@varundeepsaini
Contributor Author

@denik could you approve the workflow

@varundeepsaini
Contributor Author

@andrewnester @denik bump ^^

@varundeepsaini force-pushed the feature/deploy-append-log branch 2 times, most recently from b6f651d to f509784 (February 2, 2026 14:37)
Contributor

@denik denik left a comment

Thanks - looks good. Left a few comments.

@varundeepsaini
Contributor Author

@denik could you approve the workflows

@eng-dev-ecosystem-bot
Collaborator

eng-dev-ecosystem-bot commented Feb 9, 2026

Commit: 6b1c292

Run: 23617448383

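| | Env | 💚 RECOVERED | 🙈 SKIP | ✅ pass | 🙈 skip | Time |
| --- | --- | --- | --- | --- | --- | --- |
| 💚 | aws linux | 7 | 10 | 270 | 817 | 7:25 |
| 💚 | aws windows | 7 | 10 | 272 | 815 | 5:26 |
| 💚 | aws-ucws linux | 7 | 10 | 366 | 733 | 7:28 |
| 💚 | aws-ucws windows | 7 | 10 | 368 | 731 | 4:24 |
| 💚 | azure linux | 1 | 12 | 273 | 815 | 8:09 |
| 💚 | azure windows | 1 | 12 | 275 | 813 | 6:28 |
| 💚 | azure-ucws linux | 1 | 12 | 371 | 729 | 9:44 |
| 💚 | azure-ucws windows | 1 | 12 | 373 | 727 | 7:16 |
| 💚 | gcp linux | 1 | 12 | 269 | 818 | 9:08 |
| 💚 | gcp windows | 1 | 12 | 271 | 816 | 7:31 |
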
17 interesting tests: 10 SKIP, 7 RECOVERED

| | Test Name | aws linux | aws windows | aws-ucws linux | aws-ucws windows | azure linux | azure windows | azure-ucws linux | azure-ucws windows | gcp linux | gcp windows |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 💚 | TestAccept | 💚R | 💚R | 💚R | 💚R | 💚R | 💚R | 💚R | 💚R | 💚R | 💚R |
| 🙈 | TestAccept/bundle/resources/permissions | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S |
| 💚 | TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions | 💚R | 💚R | 💚R | 💚R | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S |
| 💚 | TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions/DATABRICKS_BUNDLE_ENGINE=direct | 💚R | 💚R | 💚R | 💚R | | | | | | |
| 💚 | TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions/DATABRICKS_BUNDLE_ENGINE=terraform | 💚R | 💚R | 💚R | 💚R | | | | | | |
| 💚 | TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions | 💚R | 💚R | 💚R | 💚R | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S |
| 💚 | TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions/DATABRICKS_BUNDLE_ENGINE=direct | 💚R | 💚R | 💚R | 💚R | | | | | | |
| 💚 | TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions/DATABRICKS_BUNDLE_ENGINE=terraform | 💚R | 💚R | 💚R | 💚R | | | | | | |
| 🙈 | TestAccept/bundle/resources/postgres_branches/basic | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S |
| 🙈 | TestAccept/bundle/resources/postgres_branches/recreate | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S |
| 🙈 | TestAccept/bundle/resources/postgres_branches/update_protected | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S |
| 🙈 | TestAccept/bundle/resources/postgres_branches/without_branch_id | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S |
| 🙈 | TestAccept/bundle/resources/postgres_endpoints/basic | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S |
| 🙈 | TestAccept/bundle/resources/postgres_endpoints/recreate | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S |
| 🙈 | TestAccept/bundle/resources/postgres_projects/update_display_name | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S |
| 🙈 | TestAccept/bundle/resources/synced_database_tables/basic | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S |
| 🙈 | TestAccept/ssh/connection | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S |

Top 24 slowest tests (at least 2 minutes):

| duration | env | testname |
| --- | --- | --- |
| 5:55 | gcp linux | TestSecretsPutSecretStringValue |
| 5:52 | gcp windows | TestSecretsPutSecretStringValue |
| 4:53 | gcp linux | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform |
| 4:24 | azure linux | TestSecretsPutSecretStringValue |
| 4:15 | gcp windows | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform |
| 4:11 | gcp windows | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct |
| 3:50 | azure-ucws linux | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform |
| 3:46 | azure linux | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct |
| 3:17 | azure-ucws windows | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform |
| 3:12 | aws linux | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct |
| 3:08 | gcp linux | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct |
| 2:55 | azure windows | TestSecretsPutSecretStringValue |
| 2:48 | aws linux | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform |
| 2:47 | azure linux | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform |
| 2:44 | aws-ucws windows | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform |
| 2:44 | azure windows | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform |
| 2:42 | aws windows | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform |
| 2:41 | aws-ucws linux | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform |
| 2:40 | aws windows | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct |
| 2:40 | azure-ucws windows | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct |
| 2:38 | aws-ucws windows | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct |
| 2:34 | azure windows | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct |
| 2:34 | aws-ucws linux | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct |
| 2:10 | azure-ucws linux | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct |

@varundeepsaini requested a review from denik (February 11, 2026 08:06)
@varundeepsaini force-pushed the feature/deploy-append-log branch from 1485ae4 to 1154360 (February 11, 2026 16:50)
@varundeepsaini
Contributor Author

tests finally ran
lets goo 🚀 🫡 !!!

@varundeepsaini
Contributor Author

@denik bump ^^

@varundeepsaini force-pushed the feature/deploy-append-log branch from 1154360 to 552e3e5 (March 24, 2026 18:58)
@varundeepsaini force-pushed the feature/deploy-append-log branch 4 times, most recently from f34406a to 63b765b (March 24, 2026 19:50)
@varundeepsaini
Contributor Author

@denik could you please run the ci
Thanksss !!

@varundeepsaini requested a review from denik (March 26, 2026 10:07)
@varundeepsaini force-pushed the feature/deploy-append-log branch from 63b765b to 79e3593 (March 26, 2026 10:07)
Varun Deep Saini and others added 8 commits March 26, 2026 20:36
Signed-off-by: Varun Deep Saini <varun.23bcs10048@ms.sst.scaler.com>
Signed-off-by: Varun Deep Saini <varun.23bcs10048@ms.sst.scaler.com>
Signed-off-by: Varun Deep Saini <varun.23bcs10048@ms.sst.scaler.com>
Signed-off-by: Varun Deep Saini <varun.23bcs10048@ms.sst.scaler.com>
Signed-off-by: Varun Deep Saini <varun.23bcs10048@ms.sst.scaler.com>
Signed-off-by: Varun Deep Saini <varun.23bcs10048@ms.sst.scaler.com>
Signed-off-by: Varun Deep Saini <deepsainivarun@gmail.com>
Signed-off-by: Varun Deep Saini <deepsainivarun@gmail.com>
@varundeepsaini force-pushed the feature/deploy-append-log branch from 79e3593 to 6e12ee1 (March 26, 2026 15:06)
@varundeepsaini
Contributor Author

@denik I looked at the test failures; rebasing should resolve them, and I have done that.
Could you run the tests again?

@github-actions

An authorized user can trigger integration tests manually by following the instructions below:

Trigger:
go/deco-tests-run/cli

Inputs:

  • PR number: 4106
  • Commit SHA: 6b1c29263f1c5f3cbc87635342fc57f74a6f9995

Checks will be approved automatically on success.

@varundeepsaini
Contributor Author

@denik
could you run the ci again. Thanks

if len(plan.Plan) == 0 {
	// Avoid creating state file if nothing to deploy
	if b.StateDB.RecoveredFromWAL() {
		if err := b.StateDB.Finalize(); err != nil {
Contributor

Why do we need to call Finalize() here? We already called DeploymentState.Open() which already saved the state, right?

]

[[Repls]]
Old = 'Updating deployment state...\n'
Contributor

@denik denik Mar 27, 2026

When I said we should not have this difference I meant it should not be printed in the first place, not that we should hide it :)

@denik
Contributor

denik commented Mar 27, 2026

Hi @varundeepsaini thanks a lot, this is great work. There are additional considerations, e.g. recently added state migration. I'm going to take over this PR as I'd like to have WAL functionality in.

@varundeepsaini
Contributor Author

Sure @denik
Thanks

@simonfaltum removed their request for review (April 8, 2026 12:27)
Development

Successfully merging this pull request may close these issues.

State file append log, avoid orphaned resources on interrupted deploy

3 participants