88 commits
d959f83
Add WAL for direct deployment state recovery
Jan 11, 2026
fe9c2cc
Updated tests and enhanced kill caller with an offset
Jan 12, 2026
3663508
Updated existing tests
Jan 23, 2026
79c630e
test fixes
Feb 2, 2026
045324f
Fixes
Feb 7, 2026
fc413f6
fixed tests
Feb 9, 2026
967a3a2
updated tests
varundeepsaini Mar 24, 2026
047c96c
dedup
varundeepsaini Mar 24, 2026
0abb2a8
Update WAL corrupted entry outputs
varundeepsaini Mar 26, 2026
64f8ac8
WIP
denik Mar 27, 2026
e701d91
Updated tests and enhanced kill caller with an offset
Jan 12, 2026
77c94f2
Updated existing tests
Jan 23, 2026
8f1f66c
Merge simplified WAL handling into state.go
denik Mar 27, 2026
feb74a7
fixes
denik Mar 27, 2026
2536569
fixes
denik Mar 27, 2026
04271ae
rm unnecessary assert
denik Mar 27, 2026
2668e55
Centralize state open/close lifecycle for direct engine
denik Mar 28, 2026
6663374
lint
denik Mar 28, 2026
b3f4c9a
fixes
denik Apr 29, 2026
616c6a4
lint
denik Apr 29, 2026
932dd16
restore test
denik Apr 30, 2026
0fa6503
Skip state file write when WAL has no resource entries
denik Apr 30, 2026
f53fa90
Revert per-engine test splits for no-resource deploys
denik Apr 30, 2026
c93e363
fmt
denik Apr 30, 2026
7837c72
Maintain stateIDs as single source of truth for resource IDs
denik Apr 30, 2026
8dba305
Remove defer Close from processBundleRetInternal; align with main app…
denik Apr 30, 2026
352b1a9
Rename Close to Finalize; make plan a local var in processBundleRetIn…
denik Apr 30, 2026
e9f921e
Restore process.go structure to match main more closely
denik Apr 30, 2026
0534025
Fix migration count, remove unnecessary defer Finalize, fix errcheck
denik Apr 30, 2026
62747df
Fix WAL validation: lowercase suffix, partial recovery, directory cre…
denik Apr 30, 2026
6ad895e
restore non-material changes: assertions and comment
denik Apr 30, 2026
0d50d50
deduplicate UpgradeToWrite+defer Finalize in Deploy
denik May 1, 2026
450ea4f
update out.test.toml
denik May 1, 2026
37465ef
fix compilation in configsync/variables.go
denik May 4, 2026
d253db6
use OpenWithData+UpgradeToWrite in migrate to avoid disk roundtrip
denik May 6, 2026
ea08040
use OpenWithData+UpgradeToWrite in uploadStateForYamlSync
denik May 7, 2026
857ddae
remove redundant defer Finalize in Deploy
denik May 10, 2026
e31dc25
move Finalize into destroyCore before files.Delete
denik May 10, 2026
5691aeb
remove noise comment from bundle_apply.go
denik May 10, 2026
cf6ec4e
fix gofumpt and test output
denik May 10, 2026
9a197ea
shrink chain-10-jobs to chain-3-jobs
denik May 10, 2026
b620a88
fix test names in state_test.go: Close -> Finalize, restore SaveFinalize
denik May 10, 2026
fd82708
clean up WAL acceptance tests
denik May 10, 2026
1913749
fix crash-after-create: handle Linux exit code 1 after KillCaller
denik May 10, 2026
b7fd655
update selftest
denik May 10, 2026
819e5ba
fix WAL acceptance test hygiene
denik May 11, 2026
5a55d3b
destroyCore: warn on Finalize failure instead of aborting
denik May 11, 2026
cedbf69
deployCore: use Finalize return value instead of re-opening state
denik May 11, 2026
3600791
statemgmt.Load: accept state directly instead of engine
denik May 11, 2026
cb79c68
fmt
denik May 11, 2026
c955dab
deployCore: move ParseResourcesState before PushResourcesState
denik May 11, 2026
1165051
simplify test
denik May 11, 2026
cb1eca9
update outputs
denik May 11, 2026
5434e40
fix Windows replacement for process kill during deployment
denik May 11, 2026
70f0021
formatting
denik May 11, 2026
bccddc4
clean up
denik May 11, 2026
0ea9229
rm unnecessarial SERIAL replacement
denik May 11, 2026
c712ab4
rm noop replacement for lineage
denik May 11, 2026
f76a342
clean up
denik May 11, 2026
56fd4a6
testserver: replace KillCaller config with HTTP kill API
denik May 11, 2026
db5c68d
remove blank line
denik May 11, 2026
6e285b4
wal tests: remove redundant server stubs covered by default handlers
denik May 11, 2026
a2973ff
wal tests: move test.toml comments to script, remove empty test.toml …
denik May 11, 2026
a8e6799
Add databricks.yml
May 11, 2026
afec54d
clean up
May 11, 2026
66c05fe
add replace_ids.py
May 11, 2026
b3b493d
clean up
May 11, 2026
fa57e20
test more commands for validation
May 11, 2026
871e2de
remove normal-deploy test
May 11, 2026
9627ae6
clean up
May 11, 2026
95bf98b
test recover in plan/deploy/summary
May 11, 2026
bd3b806
clean up
May 11, 2026
8c24d6f
add assert_*.py
May 11, 2026
c31fb7f
corrupted-wal-entry: use envsubst + template file for WAL generation
May 11, 2026
70a624f
kill_caller selftests: move test.toml comments to script, remove empt…
May 11, 2026
6261894
formatting
May 12, 2026
627bbeb
fix CI: commit missing test.tomls and fix assert_*.py permissions
May 12, 2026
3bec1fa
fix: use TOML basic strings with \n escapes in Repls to avoid CRLF on…
May 12, 2026
31d7134
refactor: merge duplicate IsDirect() blocks in dashboard.go
May 13, 2026
5df004f
refactor: merge duplicate IsDirect() blocks in deploy.go
May 13, 2026
89adf92
restore comment: Finalize is called even on Apply failure to save par…
May 13, 2026
fa54fa8
update NEXT_CHANGELOG.md
May 13, 2026
6baf10b
rm databricksyyml
May 13, 2026
1f149e0
add a warning on Close error
May 18, 2026
3dbc7ea
comment fix
May 18, 2026
6dacaa9
update max entry to 10MB
May 18, 2026
5ae2920
update tests after rebase
May 18, 2026
796d00d
clean up NEXT_CHANGELOG
May 18, 2026
1 change: 1 addition & 0 deletions NEXT_CHANGELOG.md
@@ -11,5 +11,6 @@

### Bundles
* Make sure warnings asking for approval are understood by agents ([#5239](https://github.com/databricks/cli/pull/5239))
* engine/direct: Changes to the state file are now persisted to a .wal file immediately instead of being saved at the end ([#5149](https://github.com/databricks/cli/pull/5149))

### Dependency updates
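The changelog entry describes the core idea of this PR: each state change is appended to a `.wal` file as soon as it happens rather than written once at the end. A minimal Python sketch of that append pattern follows, assuming the line-delimited JSON layout visible in the acceptance-test fixtures in this diff (a header line carrying `lineage` and `serial`, then one `{"k": ..., "v": {"__id__": ..., "state": ...}}` line per resource). The helper names `wal_append`, `wal_start`, and `wal_record_resource` are hypothetical; the real implementation is Go code in the CLI, and whether it calls `fsync` on every entry is not shown here.

```python
import json
import os


def wal_append(path: str, record: dict) -> None:
    """Append one record as a single JSON line and force it to disk."""
    # Compact separators keep each record on one line, like the
    # {"k":...,"v":...} lines in the acceptance-test fixtures.
    line = json.dumps(record, separators=(",", ":")) + "\n"
    with open(path, "a", encoding="utf-8") as f:
        f.write(line)
        f.flush()             # push Python's buffer to the OS
        os.fsync(f.fileno())  # ask the OS to persist it before returning


def wal_start(path: str, lineage: str, serial: int) -> None:
    # Hypothetical header: the first line identifies which state the WAL extends.
    wal_append(path, {"lineage": lineage, "serial": serial})


def wal_record_resource(path: str, key: str, resource_id: str, state: dict) -> None:
    # One line per resource; intended to be called right after the API call succeeds.
    wal_append(path, {"k": key, "v": {"__id__": resource_id, "state": state}})
```

Because every entry is durable before the next resource is touched, a crash loses at most the entry being written, which is exactly the partial-line case the `corrupted-wal-entry` test exercises.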
12 changes: 12 additions & 0 deletions acceptance/bin/assert_exists.py
@@ -0,0 +1,12 @@
#!/usr/bin/env python3
import os, sys

errors = 0

for filename in sys.argv[1:]:
    if not os.path.exists(filename):
        sys.stderr.write(f"Unexpected: {filename} does not exist.\n")
        errors += 1

if errors:
    sys.exit(1)
12 changes: 12 additions & 0 deletions acceptance/bin/assert_not_exists.py
@@ -0,0 +1,12 @@
#!/usr/bin/env python3
import os, sys

errors = 0

for filename in sys.argv[1:]:
    if os.path.exists(filename):
        sys.stderr.write(f"Unexpected: {filename} exists.\n")
        errors += 1

if errors:
    sys.exit(1)
39 changes: 39 additions & 0 deletions acceptance/bin/kill_after.py
@@ -0,0 +1,39 @@
#!/usr/bin/env python3
"""Set up a kill rule on the testserver for the current test token.

Usage: kill_after.py PATTERN OFFSET TIMES

PATTERN HTTP method and path, e.g. "POST /api/2.2/jobs/create"
OFFSET number of requests to let through before killing starts
TIMES number of times to kill the caller

The rule is scoped to the current DATABRICKS_TOKEN so it only affects
the test that registers it, even when tests share a server.
"""

import json
import os
import sys
import urllib.request

host = os.environ.get("DATABRICKS_HOST", "")
token = os.environ.get("DATABRICKS_TOKEN", "")

if not host:
    print("DATABRICKS_HOST not set", file=sys.stderr)
    sys.exit(1)

if len(sys.argv) != 4:
    print(f"usage: {sys.argv[0]} PATTERN OFFSET TIMES", file=sys.stderr)
    sys.exit(1)

pattern, offset, times = sys.argv[1], int(sys.argv[2]), int(sys.argv[3])

data = json.dumps({"pattern": pattern, "offset": offset, "times": times}).encode()
req = urllib.request.Request(
    f"{host}/__testserver/kill",
    data=data,
    headers={"Content-Type": "application/json", "Authorization": f"Bearer {token}"},
    method="POST",
)
urllib.request.urlopen(req)
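`kill_after.py` only registers a rule; the counting happens inside the testserver, whose Go implementation is not part of this excerpt. A hypothetical Python sketch of the semantics the docstring describes: the first OFFSET matching requests pass, the next TIMES matching requests are killed, and any later ones pass again. Rules are keyed by token to mirror the per-`DATABRICKS_TOKEN` scoping. `KillRule`, `register`, and `check` are illustrative names, and exact string matching on `"METHOD /path"` is an assumption.

```python
from dataclasses import dataclass


@dataclass
class KillRule:
    pattern: str   # "METHOD /path", e.g. "POST /api/2.2/jobs/create"
    offset: int    # matching requests allowed through before killing starts
    times: int     # number of matching requests to kill after that
    seen: int = 0  # matching requests observed so far

    def should_kill(self, method: str, path: str) -> bool:
        if f"{method} {path}" != self.pattern:
            return False
        self.seen += 1
        # Requests 1..offset pass; requests offset+1..offset+times are killed.
        return self.offset < self.seen <= self.offset + self.times


# Rules keyed by bearer token, so one test's rule never fires for another.
rules: dict[str, list[KillRule]] = {}


def register(token: str, pattern: str, offset: int, times: int) -> None:
    rules.setdefault(token, []).append(KillRule(pattern, offset, times))


def check(token: str, method: str, path: str) -> bool:
    kill = False
    # Evaluate every rule (no short-circuit) so each one keeps its own count.
    for rule in rules.get(token, []):
        if rule.should_kill(method, path):
            kill = True
    return kill
```

With `offset=2, times=1`, as in the `chain-3-jobs` script, the first two `jobs/create` calls succeed and the third is the one that kills the caller.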
37 changes: 37 additions & 0 deletions acceptance/bundle/deploy/wal/chain-3-jobs/databricks.yml
@@ -0,0 +1,37 @@
bundle:
  name: wal-chain-test

resources:
  jobs:
    # Linear chain: job_01 -> job_02 -> job_03
    # Execution order: job_01 first, job_03 last
    job_01:
      name: "job-01"
      description: "first in chain"
      tasks:
        - task_key: "task"
          spark_python_task:
            python_file: ./test.py
          new_cluster:
            spark_version: 15.4.x-scala2.12
            node_type_id: i3.xlarge
    job_02:
      name: "job-02"
      description: "depends on ${resources.jobs.job_01.id}"
      tasks:
        - task_key: "task"
          spark_python_task:
            python_file: ./test.py
          new_cluster:
            spark_version: 15.4.x-scala2.12
            node_type_id: i3.xlarge
    job_03:
      name: "job-03"
      description: "depends on ${resources.jobs.job_02.id}"
      tasks:
        - task_key: "task"
          spark_python_task:
            python_file: ./test.py
          new_cluster:
            spark_version: 15.4.x-scala2.12
            node_type_id: i3.xlarge
3 changes: 3 additions & 0 deletions acceptance/bundle/deploy/wal/chain-3-jobs/out.test.toml


110 changes: 110 additions & 0 deletions acceptance/bundle/deploy/wal/chain-3-jobs/output.txt
@@ -0,0 +1,110 @@
=== First deploy (crashes on job_03) ===

>>> errcode [CLI] bundle deploy
Uploading bundle files to /Workspace/Users/[USERNAME]/.bundle/wal-chain-test/default/files...
Deploying resources...
[PROCESS_KILLED]

Exit code: [KILLED]

=== WAL content after crash ===
{
  "cli_version": "[DEV_VERSION]",
  "lineage": "[UUID]",
  "serial": 1,
  "state_version": 2
}
{
  "k": "resources.jobs.job_01",
  "v": {
    "__id__": "[JOB_01_ID]",
    "state": {
      "deployment": {
        "kind": "BUNDLE",
        "metadata_file_path": "/Workspace/Users/[USERNAME]/.bundle/wal-chain-test/default/state/metadata.json"
      },
      "description": "first in chain",
      "edit_mode": "UI_LOCKED",
      "format": "MULTI_TASK",
      "max_concurrent_runs": 1,
      "name": "job-01",
      "queue": {
        "enabled": true
      },
      "tasks": [
        {
          "new_cluster": {
            "node_type_id": "[NODE_TYPE_ID]",
            "spark_version": "15.4.x-scala2.12"
          },
          "spark_python_task": {
            "python_file": "/Workspace/Users/[USERNAME]/.bundle/wal-chain-test/default/files/test.py"
          },
          "task_key": "task"
        }
      ]
    }
  }
}
{
  "k": "resources.jobs.job_02",
  "v": {
    "__id__": "[JOB_02_ID]",
    "depends_on": [
      {
        "label": "${resources.jobs.job_01.id}",
        "node": "resources.jobs.job_01"
      }
    ],
    "state": {
      "deployment": {
        "kind": "BUNDLE",
        "metadata_file_path": "/Workspace/Users/[USERNAME]/.bundle/wal-chain-test/default/state/metadata.json"
      },
      "description": "depends on [JOB_01_ID]",
      "edit_mode": "UI_LOCKED",
      "format": "MULTI_TASK",
      "max_concurrent_runs": 1,
      "name": "job-02",
      "queue": {
        "enabled": true
      },
      "tasks": [
        {
          "new_cluster": {
            "node_type_id": "[NODE_TYPE_ID]",
            "spark_version": "15.4.x-scala2.12"
          },
          "spark_python_task": {
            "python_file": "/Workspace/Users/[USERNAME]/.bundle/wal-chain-test/default/files/test.py"
          },
          "task_key": "task"
        }
      ]
    }
  }
}

=== Number of jobs saved in WAL ===
2

=== Bundle summary (reads from WAL) ===
Name: wal-chain-test
Target: default
Workspace:
  User: [USERNAME]
  Path: /Workspace/Users/[USERNAME]/.bundle/wal-chain-test/default
Resources:
  Jobs:
    job_01:
      Name: job-01
      URL: [DATABRICKS_URL]/jobs/[JOB_01_ID]?o=[NUMID]
    job_02:
      Name: job-02
      URL: [DATABRICKS_URL]/jobs/[JOB_02_ID]?o=[NUMID]
    job_03:
      Name: job-03
      URL: (not deployed)

=== WAL after successful deploy ===
WAL deleted (expected)
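The `bundle summary` output above lists job_01 and job_02 with URLs and job_03 as not deployed, so the WAL entries are evidently applied on top of whatever state file exists. A minimal Python sketch of such a replay follows, assuming the first WAL line is a header, each later line is a `{"k", "v"}` entry, a later entry for the same key wins, and the header must match the state file's `lineage` with a `serial` exactly one higher. That serial rule is an inference from the fixtures (a fresh deploy writes serial 1; `corrupted-wal-entry` pairs a state file at serial 5 with a WAL header at serial 6), not something this diff confirms. `replay_wal` is a hypothetical name, and the sketch does not model resource deletion.

```python
import json


def replay_wal(base: dict, wal_lines: list[str]) -> dict:
    """Apply WAL entries on top of a base state dict; returns a new dict."""
    header = json.loads(wal_lines[0])
    # Assumed validation: the WAL must extend this exact state file.
    if "lineage" in base and header["lineage"] != base["lineage"]:
        raise ValueError("WAL lineage does not match state file")
    if header["serial"] != base.get("serial", 0) + 1:
        raise ValueError("WAL serial is not state serial + 1")

    resources = dict(base.get("state", {}))  # shallow copy; base is untouched
    for line in wal_lines[1:]:
        entry = json.loads(line)
        resources[entry["k"]] = entry["v"]  # a later entry for a key wins
    return {**base, "lineage": header["lineage"], "serial": header["serial"], "state": resources}
```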
24 changes: 24 additions & 0 deletions acceptance/bundle/deploy/wal/chain-3-jobs/script
Original file line number Diff line number Diff line change
@@ -0,0 +1,24 @@
# Linear chain: job_01 -> job_02 -> job_03
# Let first 2 jobs/create succeed, then kill on the 3rd
kill_after.py "POST /api/2.2/jobs/create" 2 1

echo "=== First deploy (crashes on job_03) ==="
trace errcode $CLI bundle deploy

echo ""
echo "=== WAL content after crash ==="
jq -S . .databricks/bundle/default/resources.json.wal 2>/dev/null || echo "No WAL file"

echo ""
echo "=== Number of jobs saved in WAL ==="
grep -c '"k":"resources.jobs' .databricks/bundle/default/resources.json.wal 2>/dev/null || echo "0"

echo ""
echo "=== Bundle summary (reads from WAL) ==="
$CLI bundle summary

echo ""
echo "=== WAL after successful deploy ==="
cat .databricks/bundle/default/resources.json.wal 2>/dev/null || echo "WAL deleted (expected)"

replace_ids.py
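This script depends on the deploy order implied by the `${resources.jobs.<key>.id}` references in `databricks.yml`: job_02 cannot be created until job_01's ID is known, and the WAL records that edge as `depends_on`. A small Python sketch of deriving that order with the standard library's `graphlib.TopologicalSorter`, assuming only `${resources.jobs.<key>.id}` references create edges. The CLI's real graph builder is Go code and is not shown here; `deploy_order` and `JOB_REF` are illustrative names.

```python
import re
from graphlib import TopologicalSorter

# Matches references such as ${resources.jobs.job_01.id}.
JOB_REF = re.compile(r"\$\{resources\.jobs\.(\w+)\.id\}")


def _strings(value):
    """Yield every string nested anywhere inside dicts/lists."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for v in value.values():
            yield from _strings(v)
    elif isinstance(value, list):
        for v in value:
            yield from _strings(v)


def deploy_order(jobs: dict) -> list:
    # Each job depends on every job whose ID it references.
    graph = {key: {ref for s in _strings(spec) for ref in JOB_REF.findall(s)}
             for key, spec in jobs.items()}
    # static_order() yields a node only after all of its dependencies.
    return list(TopologicalSorter(graph).static_order())
```

For the linear chain the only valid order is job_01, job_02, job_03, so letting two `jobs/create` calls through and killing the third always interrupts job_03, which is why the WAL holds exactly two job entries.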
1 change: 1 addition & 0 deletions acceptance/bundle/deploy/wal/chain-3-jobs/test.py
@@ -0,0 +1 @@
print("test")
23 changes: 23 additions & 0 deletions acceptance/bundle/deploy/wal/corrupted-wal-entry/databricks.yml
@@ -0,0 +1,23 @@
bundle:
  name: wal-corrupted-test

resources:
  jobs:
    valid_job:
      name: "valid-job"
      tasks:
        - task_key: "task-a"
          spark_python_task:
            python_file: ./test.py
          new_cluster:
            spark_version: 15.4.x-scala2.12
            node_type_id: i3.xlarge
    another_valid:
      name: "another-valid"
      tasks:
        - task_key: "task-b"
          spark_python_task:
            python_file: ./test.py
          new_cluster:
            spark_version: 15.4.x-scala2.12
            node_type_id: i3.xlarge


34 changes: 34 additions & 0 deletions acceptance/bundle/deploy/wal/corrupted-wal-entry/output.txt
@@ -0,0 +1,34 @@

>>> cat .databricks/bundle/default/resources.json.wal
{"lineage":"test-lineage-123","serial":6}
{"k":"resources.jobs.valid_job","v":{"__id__":"","state":{"name":"valid-job"}}}
{"k":"resources.jobs.another_valid","v":{"__id__":"","state":{"name":"another-valid"}}}
{"k":"resources.jobs.partial_write","v":{"__id__":"33","state":{"name":"partial-

>>> [CLI] bundle deploy
Warn: Skipping corrupted WAL entry at [TEST_TMP_DIR]/.databricks/bundle/default/resources.json.wal:4: unexpected end of JSON input
Warn: Saved 1 corrupted WAL entries to [TEST_TMP_DIR]/.databricks/bundle/default/resources.json.wal.corrupted
Uploading bundle files to /Workspace/Users/[USERNAME]/.bundle/wal-corrupted-test/default/files...
Deploying resources...
Updating deployment state...
Deployment complete!

>>> [CLI] bundle summary
Name: wal-corrupted-test
Target: default
Workspace:
  User: [USERNAME]
  Path: /Workspace/Users/[USERNAME]/.bundle/wal-corrupted-test/default
Resources:
  Jobs:
    another_valid:
      Name: another-valid
      URL: [DATABRICKS_URL]/jobs/[NUMID]?o=[NUMID]
    valid_job:
      Name: valid-job
      URL: [DATABRICKS_URL]/jobs/[NUMID]?o=[NUMID]

>>> cat .databricks/bundle/default/resources.json.wal.corrupted
{"k":"resources.jobs.partial_write","v":{"__id__":"33","state":{"name":"partial-
=== WAL after successful deploy ===
WAL deleted (expected)
@@ -0,0 +1,7 @@
{
"state_version": 1,
"cli_version": "0.0.0",
"lineage": "test-lineage-123",
"serial": 5,
"state": {}
}
@@ -0,0 +1,4 @@
{"lineage":"test-lineage-123","serial":6}
{"k":"resources.jobs.valid_job","v":{"__id__":"$JOB1","state":{"name":"valid-job"}}}
{"k":"resources.jobs.another_valid","v":{"__id__":"$JOB2","state":{"name":"another-valid"}}}
{"k":"resources.jobs.partial_write","v":{"__id__":"33","state":{"name":"partial-
22 changes: 22 additions & 0 deletions acceptance/bundle/deploy/wal/corrupted-wal-entry/script
@@ -0,0 +1,22 @@
# Create pre-existing jobs in the testserver so WAL recovery triggers DoUpdate (reset) instead of DoCreate
JOB1=$($CLI jobs create --json '{"name":"valid-job"}' | jq -r '.job_id')
JOB2=$($CLI jobs create --json '{"name":"another-valid"}' | jq -r '.job_id')
echo "$JOB1:JOB1_ID" >> ACC_REPLS
echo "$JOB2:JOB2_ID" >> ACC_REPLS

mkdir -p .databricks/bundle/default
cp resources.json .databricks/bundle/default/

envsubst < resources.json.wal.tmpl > .databricks/bundle/default/resources.json.wal

trace cat .databricks/bundle/default/resources.json.wal
trace $CLI bundle deploy
trace $CLI bundle summary
trace cat .databricks/bundle/default/resources.json.wal.corrupted

printf "\n=== WAL after successful deploy ===\n"
if [ -f ".databricks/bundle/default/resources.json.wal" ]; then
  echo "WAL exists (unexpected)"
else
  echo "WAL deleted (expected)"
fi
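The `corrupted-wal-entry` test ends the WAL with a truncated line, as a crash in the middle of a write would. The expected output shows the CLI skipping that line with a warning and copying it to `resources.json.wal.corrupted` while still deploying the valid entries. A minimal Python sketch of that tolerant read follows; `split_wal` is a hypothetical helper, and Python's `json` error text differs from the Go message (`unexpected end of JSON input`) recorded in `output.txt`.

```python
import json
import sys


def split_wal(path: str) -> tuple:
    """Return (parsed entries, raw corrupted lines) for a WAL file."""
    good, bad = [], []
    with open(path, encoding="utf-8") as f:
        for lineno, raw in enumerate(f, start=1):
            line = raw.rstrip("\n")
            if not line:
                continue
            try:
                good.append(json.loads(line))
            except json.JSONDecodeError as err:
                # Skip the entry but keep going, like the warning in output.txt.
                print(f"Warn: Skipping corrupted WAL entry at {path}:{lineno}: {err}", file=sys.stderr)
                bad.append(line)
    if bad:
        # Preserve the raw lines next to the WAL for later inspection.
        with open(path + ".corrupted", "w", encoding="utf-8") as out:
            out.write("\n".join(bad) + "\n")
        print(f"Warn: Saved {len(bad)} corrupted WAL entries to {path}.corrupted", file=sys.stderr)
    return good, bad
```

Keeping the raw bytes instead of discarding them means a partially recorded resource ID (here `"33"`) can still be inspected by hand if the deploy that follows does not reconcile it.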
1 change: 1 addition & 0 deletions acceptance/bundle/deploy/wal/corrupted-wal-entry/test.py
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
print("test")
25 changes: 25 additions & 0 deletions acceptance/bundle/deploy/wal/crash-after-create/databricks.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,25 @@
bundle:
  name: wal-crash-test

resources:
  jobs:
    job_a:
      name: "test-job-a"
      description: "first job"
      tasks:
        - task_key: "task-a"
          spark_python_task:
            python_file: ./test.py
          new_cluster:
            spark_version: 15.4.x-scala2.12
            node_type_id: i3.xlarge
    job_b:
      name: "test-job-b"
      description: "depends on ${resources.jobs.job_a.id}"
      tasks:
        - task_key: "task-b"
          spark_python_task:
            python_file: ./test.py
          new_cluster:
            spark_version: 15.4.x-scala2.12
            node_type_id: i3.xlarge
4 changes: 4 additions & 0 deletions acceptance/bundle/deploy/wal/crash-after-create/out.test.toml
