Commit 665c353

Parallelize file uploads in fs cp command. (#4132)
## What changes are proposed in this pull request?

This PR improves the performance of the `databricks fs cp` command when copying directories by parallelizing file uploads. The command uses 8 concurrent workers by default, but the number can be controlled via `--concurrency`.

Implementation details:

- **No ordering guarantee:** Files are now copied in parallel with no guaranteed order (previously sequential).
- **Fail-fast on errors:** If any file copy fails, the context is cancelled and remaining operations are stopped (first error is returned).
- **Retry responsibility:** The implementation does not retry failed operations; this remains the responsibility of the underlying `Filer` implementation as before.

**Why `--concurrency`?** No strong preference here; there does not seem to be an existing pattern in the CLI for controlling concurrency in other places. This is the flag name used in most Go tools, but I'm happy to use something else.

## How is this tested?

Added acceptance tests to exercise most code paths, plus unit tests to validate that context cancellation and propagation work properly.

---------

Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
1 parent 2e6dca4 commit 665c353

26 files changed: +530 -39 lines changed

NEXT_CHANGELOG.md

Lines changed: 3 additions & 0 deletions
@@ -6,6 +6,9 @@
 
 ### CLI
 
+* Improve performance of `databricks fs cp` command by parallelizing file uploads when
+copying directories with the `--recursive` flag.
+
 ### Bundles
 
 ### Dependency updates
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+file1 content
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+file2 content

acceptance/cmd/fs/cp/dir-to-dir/out.test.toml

Lines changed: 6 additions & 0 deletions
Generated file; not rendered by default.
Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
+
+>>> [CLI] fs cp -r localdir dbfs:/Volumes/main/default/fs-cp-test-[UNIQUE_NAME]/uploaded-dir
+localdir/file1.txt -> dbfs:/Volumes/main/default/fs-cp-test-[UNIQUE_NAME]/uploaded-dir/file1.txt
+localdir/file2.txt -> dbfs:/Volumes/main/default/fs-cp-test-[UNIQUE_NAME]/uploaded-dir/file2.txt
+
+>>> [CLI] fs cat dbfs:/Volumes/main/default/fs-cp-test-[UNIQUE_NAME]/uploaded-dir/file1.txt
+file1 content
+
+>>> [CLI] fs cat dbfs:/Volumes/main/default/fs-cp-test-[UNIQUE_NAME]/uploaded-dir/file2.txt
+file2 content
Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
+CATALOG_NAME="main"
+SCHEMA_NAME="default"
+VOLUME_NAME="fs-cp-test-${UNIQUE_NAME}"
+
+cleanup() {
+  $CLI volumes delete "${CATALOG_NAME}.${SCHEMA_NAME}.${VOLUME_NAME}" 2>/dev/null || true
+}
+trap cleanup EXIT
+
+# Create volume for testing.
+$CLI volumes create "${CATALOG_NAME}" "${SCHEMA_NAME}" "${VOLUME_NAME}" MANAGED >/dev/null
+
+# Create parent directory.
+$CLI fs mkdir dbfs:/Volumes/${CATALOG_NAME}/${SCHEMA_NAME}/${VOLUME_NAME}
+
+# Recursive directory copy (output sorted for deterministic ordering).
+trace $CLI fs cp -r localdir dbfs:/Volumes/${CATALOG_NAME}/${SCHEMA_NAME}/${VOLUME_NAME}/uploaded-dir 2>&1 | sort
+
+# Verify files were uploaded correctly.
+trace $CLI fs cat dbfs:/Volumes/${CATALOG_NAME}/${SCHEMA_NAME}/${VOLUME_NAME}/uploaded-dir/file1.txt
+trace $CLI fs cat dbfs:/Volumes/${CATALOG_NAME}/${SCHEMA_NAME}/${VOLUME_NAME}/uploaded-dir/file2.txt
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+Local = true
+Cloud = true
+RequiresUnityCatalog = true
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+hello world!

acceptance/cmd/fs/cp/file-to-dir/out.test.toml

Lines changed: 6 additions & 0 deletions
Generated file; not rendered by default.
Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+
+>>> [CLI] fs cp local.txt dbfs:/Volumes/main/default/fs-cp-test-[UNIQUE_NAME]/mydir/
+local.txt -> dbfs:/Volumes/main/default/fs-cp-test-[UNIQUE_NAME]/mydir/local.txt
+
+>>> [CLI] fs cat dbfs:/Volumes/main/default/fs-cp-test-[UNIQUE_NAME]/mydir/local.txt
+hello world!
