@bmiddha bmiddha commented Sep 16, 2025

Summary

zipsync is a single-purpose tool for packing and unpacking build cache entries as zip archives.

Details

Unpack

load archive -> parse central dir -> read metadata
scan filesystem & delete extraneous entries
for each entry (except metadata):
  if unchanged (sha1 matches) => skip
  else extract (decompress if needed)
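
The unpack decision above can be sketched as a pure planning step. This is only an illustration, not the zipsync implementation: the `planUnpack` name, the `UnpackPlan` shape, and the representation of the metadata as a path-to-SHA-1 map are all assumptions made for the example.

```typescript
import { createHash } from 'node:crypto';

// Hypothetical result shape: what to delete, what to leave alone, what to extract.
interface UnpackPlan {
  toDelete: string[];
  toSkip: string[];
  toExtract: string[];
}

function sha1Hex(data: Uint8Array): string {
  return createHash('sha1').update(data).digest('hex');
}

// expected: path -> SHA-1 recorded in the archive metadata (assumed shape).
// onDisk:   path -> current file contents in the target directory.
function planUnpack(expected: Map<string, string>, onDisk: Map<string, Uint8Array>): UnpackPlan {
  const plan: UnpackPlan = { toDelete: [], toSkip: [], toExtract: [] };

  // Anything on disk that the archive does not know about is extraneous.
  onDisk.forEach((_contents, path) => {
    if (!expected.has(path)) plan.toDelete.push(path);
  });

  // Files whose hash already matches are skipped; everything else is extracted.
  expected.forEach((sha1, path) => {
    const current = onDisk.get(path);
    if (current !== undefined && sha1Hex(current) === sha1) {
      plan.toSkip.push(path);
    } else {
      plan.toExtract.push(path);
    }
  });

  return plan;
}
```

Separating the plan from the file I/O keeps the comparison logic easy to test without touching the filesystem.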

Pack

for each file F
  write LocalFileHeader(F)
  stream chunks:
    read -> hash + crc + maybe compress -> write
  finalize compressor
  write DataDescriptor(F)
add metadata entry (same pattern)
write central directory records
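
A rough sketch of the per-file bookkeeping in the pack loop follows. The CRC-32 and sizes are only known once all chunks have been read, which is why the zip format allows them to be written in a data descriptor after the file data. The `packEntry` function and `PackedEntry` shape are hypothetical, and for brevity the example compresses once at the end with `deflateRawSync` rather than feeding a streaming compressor chunk by chunk.

```typescript
import { createHash } from 'node:crypto';
import { deflateRawSync } from 'node:zlib';

// Standard CRC-32 lookup table (reflected polynomial 0xEDB88320), as used by zip.
const CRC_TABLE: number[] = [];
for (let n = 0; n < 256; n++) {
  let c = n;
  for (let k = 0; k < 8; k++) c = c & 1 ? 0xedb88320 ^ (c >>> 1) : c >>> 1;
  CRC_TABLE.push(c >>> 0);
}

// Folds one chunk into a running CRC-32. Start with crc = 0.
function crc32Update(crc: number, chunk: Uint8Array): number {
  let c = (crc ^ 0xffffffff) >>> 0;
  for (let i = 0; i < chunk.length; i++) {
    c = CRC_TABLE[(c ^ chunk[i]) & 0xff] ^ (c >>> 8);
  }
  return (c ^ 0xffffffff) >>> 0;
}

// Hypothetical result shape for one packed file.
interface PackedEntry {
  sha1: string;
  crc32: number;
  uncompressedSize: number;
  compressed: Buffer;
  dataDescriptor: Buffer;
}

function packEntry(chunks: Uint8Array[]): PackedEntry {
  const hash = createHash('sha1');
  let crc = 0;
  let size = 0;
  for (const chunk of chunks) {
    hash.update(chunk); // content hash, used later to detect unchanged files
    crc = crc32Update(crc, chunk); // checksum required by the zip format
    size += chunk.length;
  }

  // Simplification: a streaming compressor would consume each chunk in the loop above.
  const compressed = deflateRawSync(Buffer.concat(chunks), { level: 9 });

  // Data descriptor: signature, CRC-32, compressed size, uncompressed size (little-endian).
  const dataDescriptor = Buffer.alloc(16);
  dataDescriptor.writeUInt32LE(0x08074b50, 0);
  dataDescriptor.writeUInt32LE(crc, 4);
  dataDescriptor.writeUInt32LE(compressed.length, 8);
  dataDescriptor.writeUInt32LE(size, 12);

  return { sha1: hash.digest('hex'), crc32: crc, uncompressedSize: size, compressed, dataDescriptor };
}
```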

Supported compression types are store (no compression), deflate (level 9), and auto (switches between store and deflate based on file extension).
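
One way the auto mode could choose a method per file is sketched below. The extension list and the `resolveCompression` name are assumptions for illustration only; the actual list used by zipsync is not given here.

```typescript
import { extname } from 'node:path';

type CompressionMode = 'store' | 'deflate' | 'auto';
type CompressionMethod = 'store' | 'deflate';

// Hypothetical list of formats that are already compressed and gain little from deflate.
const ALREADY_COMPRESSED_EXTENSIONS = new Set<string>(['.png', '.jpg', '.gz', '.zip', '.woff2']);

function resolveCompression(filePath: string, mode: CompressionMode): CompressionMethod {
  // Explicit modes apply to every file.
  if (mode !== 'auto') return mode;
  // In auto mode, store files that are already compressed and deflate the rest.
  const extension = extname(filePath).toLowerCase();
  return ALREADY_COMPRESSED_EXTENSIONS.has(extension) ? 'store' : 'deflate';
}
```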

Constraints

Archives created by zipsync can be read by other zip-compatible programs, but the reverse does not hold. zipsync implements only a subset of zip features, in exchange for better performance, so it cannot be relied on to read archives produced by other tools.

What's wrong with the current setup?

The current setup cleans the target directories and then unpacks the build cache entry. This ends up deleting and rewriting many files whose contents have not changed.

Pros

With tar + gzip, files are archived first and compressed second. This lets the compression work across file boundaries, so content duplicated across files compresses efficiently.

Cons

Because compression is the last step, the whole archive must be decompressed before its contents can be enumerated or inspected.
Unpacking does not clean the target directory, so a separate rm -rf step is required.

Requirements

zipsync was created with the following constraints in mind:

Optimize for partial unpack scenario

Optimize for unpack performance. Most of the files in a build cache entry already exist on disk, and there is a good chance that they are already in the expected state.

Only write files when needed

This will minimize the number of write syscalls. Also, if the kernel has already cached the file from a recent read, the cache remains intact if we don't needlessly delete and rewrite the file.
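
A minimal sketch of the write-only-when-needed idea, assuming the expected SHA-1 of each file is known from the archive metadata. The `writeIfChanged` helper is hypothetical and is not taken from zipsync.

```typescript
import { createHash } from 'node:crypto';
import { existsSync, mkdtempSync, readFileSync, writeFileSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

// Writes `content` only if the file is missing or its SHA-1 differs from `expectedSha1`.
// Returns true when a write happened.
function writeIfChanged(filePath: string, content: Buffer, expectedSha1: string): boolean {
  if (existsSync(filePath)) {
    const currentSha1 = createHash('sha1').update(readFileSync(filePath)).digest('hex');
    if (currentSha1 === expectedSha1) return false; // already up to date, leave it untouched
  }
  writeFileSync(filePath, content);
  return true;
}

// Usage: the second call finds the file already in the expected state and skips the write.
const demoDir = mkdtempSync(join(tmpdir(), 'zipsync-sketch-'));
const demoTarget = join(demoDir, 'index.js');
const demoContent = Buffer.from('console.log(1);\n');
const demoSha1 = createHash('sha1').update(demoContent).digest('hex');
const firstWrite = writeIfChanged(demoTarget, demoContent, demoSha1);
const secondWrite = writeIfChanged(demoTarget, demoContent, demoSha1);
```

The trade-off is one read and one hash per existing file in exchange for avoiding a delete and rewrite when nothing has changed.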

Clean extra files and directories

This removes the need to run rm -rf on the target directories, which saves more time.
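
The cleaning step could look roughly like the sketch below. The `cleanExtraneous` function is hypothetical, and removing directories that end up empty is an assumption about the desired behavior rather than something stated above.

```typescript
import { existsSync, mkdirSync, mkdtempSync, readdirSync, rmSync, rmdirSync, writeFileSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

// Deletes every file under `root` that is not listed in `expected` (paths relative to
// `root`, '/'-separated), then removes directories left empty. Returns what was deleted.
function cleanExtraneous(root: string, expected: Set<string>, relativeDir: string = ''): string[] {
  const deleted: string[] = [];
  for (const entry of readdirSync(join(root, relativeDir), { withFileTypes: true })) {
    const relativePath = relativeDir ? `${relativeDir}/${entry.name}` : entry.name;
    if (entry.isDirectory()) {
      deleted.push(...cleanExtraneous(root, expected, relativePath));
      if (readdirSync(join(root, relativePath)).length === 0) {
        rmdirSync(join(root, relativePath)); // assumption: empty directories are extraneous
        deleted.push(`${relativePath}/`);
      }
    } else if (!expected.has(relativePath)) {
      rmSync(join(root, relativePath));
      deleted.push(relativePath);
    }
  }
  return deleted;
}

// Usage: only lib/a.js is part of the cache entry, everything else is removed.
const cleanRoot = mkdtempSync(join(tmpdir(), 'zipsync-clean-'));
mkdirSync(join(cleanRoot, 'lib'));
mkdirSync(join(cleanRoot, 'temp'));
writeFileSync(join(cleanRoot, 'lib', 'a.js'), 'a');
writeFileSync(join(cleanRoot, 'lib', 'stale.js'), 'stale');
writeFileSync(join(cleanRoot, 'temp', 'x.txt'), 'x');
const deletedPaths = cleanExtraneous(cleanRoot, new Set<string>(['lib/a.js'])).sort();
```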

Disallow symlinks

Symlinks in build cache entries are not supported. This will remove the need to scan the target directories for symlinks before running tar.

Why zip

zip was picked because:

  • It is a well understood format. This will keep malware scanning happy.
  • Build cache entries are easy to inspect. The built-in OS zip tools can be used to browse or extract files.
  • Files are compressed individually and then added to the archive. The archive's contents can therefore be listed without decompressing everything, and individual files can be extracted selectively, which makes unpacking efficient.
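
To illustrate why a zip archive can be enumerated cheaply: the end of central directory record at the tail of the file says where the central directory is and how many entries it holds, so no file data has to be decompressed to list the contents. The sketch below follows the standard record layout; the function and interface names are made up for this example, and ZIP64 archives are not handled.

```typescript
// Fields of interest from the end of central directory (EOCD) record.
interface EndOfCentralDirectory {
  entryCount: number;
  centralDirSize: number;
  centralDirOffset: number;
}

const EOCD_SIGNATURE = 0x06054b50;
const EOCD_MIN_SIZE = 22; // size of the record when the archive comment is empty

function findEndOfCentralDirectory(archive: Buffer): EndOfCentralDirectory {
  // Scan backwards from the end, since an optional comment may follow the record.
  for (let pos = archive.length - EOCD_MIN_SIZE; pos >= 0; pos--) {
    if (archive.readUInt32LE(pos) === EOCD_SIGNATURE) {
      return {
        entryCount: archive.readUInt16LE(pos + 10), // total number of entries
        centralDirSize: archive.readUInt32LE(pos + 12), // size of the central directory in bytes
        centralDirOffset: archive.readUInt32LE(pos + 16) // where the central directory starts
      };
    }
  }
  throw new Error('End of central directory record not found');
}
```

With the offset and size in hand, a reader can load just the central directory and learn every entry's name, sizes, and position in the archive.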

How it was tested

  • zipsync tests
  • rebuild then build with node apps/rush/lib/start-dev.js --debug build --verbose -t module-minifier
==[ @rushstack/heft-typescript-plugin (build) ]==================[ 19 of 25 ]==

Build cache hit.
Cache key: rushstack+heft-typescript-plugin-_phase_build-1ce783d05b5d5382cf8f9652ff0f14416482830c
Using zipsync to restore cached folders.
Restored 55 files from cache.
Skipped 55 files that were already up to date.
Successfully restored output from the build cache.
Invoking: heft run --only build -- --clean
 ---- build started ----
[build:clean] Deleted 0 files and 3 folders
[build:typescript] Using TypeScript version 5.8.2
[build:typescript] Copied 2 files and linked 0 files
[build:lint] Using ESLint version 9.25.1
[build:api-extractor] Using API Extractor version 7.52.9
[build:api-extractor] Analysis will use the bundled TypeScript version 5.8.2
 ---- build finished (3.138s) ----
-------------------- Finished (3.142s) --------------------
pnpm-sync: Starting operation for /Users/bharatmiddha/code/rushstack-zipsync/heft-plugins/heft-jest-plugin/node_modules/.pnpm-sync.json
pnpm-sync: Synced 54 files in 8 ms
"@rushstack/heft-typescript-plugin (build)" was restored from the build cache.

Benchmark Results

This document contains performance measurements for packing and unpacking a synthetic dataset using tar, zip, and zipsync.

The dataset consists of two directory trees (subdir1, subdir2) populated with 1000 text files each.

zipsync scenarios

  • "all-existing": unpack directory is fully populated with existing files
  • "none-existing": unpack directory is empty
  • "partial-existing": unpack directory contains half of the files

The zip and tar scenarios clean the unpack directory before unpacking. This time is included in the measurements because zipsync handles cleaning internally as part of its operation.

System

| OS | Arch | Node | CPU | Logical Cores | Memory |
| --- | --- | --- | --- | --- | --- |
| linux 6.8.0-1030-azure | x64 | v22.16.0 | AMD EPYC 7763 64-Core Processor | 16 | 62.8 GB |

Iterations: 100

Compressed (baseline: tar-gz)

Unpack Phase

| Archive | min (ms) | mean (ms) | p95 (ms) | max (ms) | std (ms) | speed (x) |
| --- | --- | --- | --- | --- | --- | --- |
| tar-gz | 266.30 | 270.50 | 274.37 | 280.90 | 2.55 | 1.00x |
| zip-deflate | 399.53 | 406.79 | 419.26 | 446.22 | 7.23 | 0.66x |
| zipsync-zstd-all-existing | 109.66 | 110.27 | 111.22 | 112.41 | 0.53 | 2.45x |
| zipsync-zstd-none-existing | 103.49 | 107.21 | 106.00 | 400.57 | 29.49 | 2.52x |
| zipsync-zstd-partial-existing | 106.37 | 108.72 | 108.71 | 248.10 | 14.03 | 2.49x |
| zipsync-deflate-all-existing | 109.50 | 111.76 | 113.31 | 158.65 | 5.26 | 2.42x |
| zipsync-deflate-none-existing | 103.94 | 107.07 | 106.68 | 308.18 | 20.29 | 2.53x |
| zipsync-deflate-partial-existing | 106.98 | 109.24 | 109.68 | 203.16 | 9.52 | 2.48x |
| zipsync-auto-all-existing | 109.40 | 110.70 | 111.83 | 115.55 | 0.82 | 2.44x |
| zipsync-auto-none-existing | 103.69 | 106.55 | 106.03 | 280.99 | 17.60 | 2.54x |
| zipsync-auto-partial-existing | 107.24 | 109.23 | 109.68 | 200.99 | 9.27 | 2.48x |

Pack Phase

| Archive | min (ms) | mean (ms) | p95 (ms) | max (ms) | std (ms) | speed (x) | size |
| --- | --- | --- | --- | --- | --- | --- | --- |
| tar-gz | 356.77 | 362.24 | 369.07 | 374.05 | 3.34 | 1.00x | 184 KB |
| zip-deflate | 335.16 | 338.37 | 343.46 | 361.64 | 3.78 | 1.07x | 553 KB |
| zipsync-zstd-all-existing | 308.51 | 357.24 | 571.98 | 1203.73 | 129.28 | 1.01x | 411 KB |
| zipsync-zstd-none-existing | 307.43 | 323.88 | 341.90 | 571.49 | 43.61 | 1.12x | 411 KB |
| zipsync-zstd-partial-existing | 308.10 | 327.44 | 343.94 | 572.20 | 46.03 | 1.11x | 411 KB |
| zipsync-deflate-all-existing | 383.12 | 397.67 | 409.68 | 418.62 | 7.62 | 0.91x | 535 KB |
| zipsync-deflate-none-existing | 375.56 | 386.97 | 395.66 | 421.82 | 6.79 | 0.94x | 535 KB |
| zipsync-deflate-partial-existing | 374.72 | 384.00 | 396.75 | 403.25 | 6.31 | 0.94x | 535 KB |
| zipsync-auto-all-existing | 375.61 | 386.74 | 397.81 | 418.54 | 7.20 | 0.94x | 535 KB |
| zipsync-auto-none-existing | 377.37 | 391.10 | 401.92 | 424.10 | 6.81 | 0.93x | 535 KB |
| zipsync-auto-partial-existing | 378.48 | 388.57 | 396.22 | 399.01 | 6.12 | 0.93x | 535 KB |

Uncompressed (baseline: tar)

Unpack Phase

| Archive | min (ms) | mean (ms) | p95 (ms) | max (ms) | std (ms) | speed (x) |
| --- | --- | --- | --- | --- | --- | --- |
| tar | 187.76 | 194.85 | 199.68 | 204.63 | 2.90 | 1.00x |
| zip-store | 466.15 | 472.24 | 481.09 | 523.13 | 6.49 | 0.41x |
| zipsync-store-all-existing | 135.16 | 138.58 | 141.28 | 156.08 | 2.36 | 1.41x |
| zipsync-store-none-existing | 131.30 | 134.84 | 134.81 | 249.28 | 12.26 | 1.45x |
| zipsync-store-partial-existing | 134.01 | 137.91 | 141.63 | 191.66 | 6.02 | 1.41x |

Pack Phase

| Archive | min (ms) | mean (ms) | p95 (ms) | max (ms) | std (ms) | speed (x) | size |
| --- | --- | --- | --- | --- | --- | --- | --- |
| tar | 71.30 | 73.12 | 74.24 | 76.60 | 0.91 | 1.00x | 54.5 MB |
| zip-store | 258.91 | 261.80 | 268.43 | 275.49 | 2.86 | 0.28x | 53.5 MB |
| zipsync-store-all-existing | 197.01 | 200.60 | 203.57 | 245.36 | 5.18 | 0.36x | 53.6 MB |
| zipsync-store-none-existing | 197.13 | 199.42 | 201.82 | 207.10 | 1.75 | 0.37x | 53.6 MB |
| zipsync-store-partial-existing | 197.79 | 200.58 | 206.07 | 212.59 | 2.67 | 0.36x | 53.6 MB |
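
The speed (x) column is not defined in the text, but the values are consistent with the baseline mean divided by the archive's mean, so values above 1.00x are faster than the baseline. A small sketch of that reading, with a hypothetical helper name:

```typescript
// Relative speed as it appears to be reported in the tables: baseline mean / archive mean.
function speedVsBaseline(baselineMeanMs: number, meanMs: number): string {
  return `${(baselineMeanMs / meanMs).toFixed(2)}x`;
}

// Compressed unpack phase, baseline tar-gz (mean 270.50 ms):
const zipDeflateSpeed = speedVsBaseline(270.5, 406.79); // zip-deflate
const zipsyncZstdSpeed = speedVsBaseline(270.5, 110.27); // zipsync-zstd-all-existing
```

Under this reading the two example rows reproduce the 0.66x and 2.45x reported for them.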

@github-project-automation github-project-automation bot moved this to Needs triage in Bug Triage Sep 16, 2025
@bmiddha bmiddha force-pushed the bmiddha/zipsync-3 branch 5 times, most recently from 620afe6 to 3e3a75d Compare September 16, 2025 20:25
@bmiddha bmiddha changed the title [OperationBuildCache] Add new build cache engine - zipsync [rush] Add new build cache engine - zipsync Sep 16, 2025
@bmiddha bmiddha force-pushed the bmiddha/zipsync-3 branch 2 times, most recently from 2636c4a to de527a2 Compare September 17, 2025 19:03
bmiddha commented Sep 26, 2025

I've removed the integration into the build cache. We would need to redesign some things to use a worker pool. Without a worker pool, zipsync ends up slower than tar+gzip because of the overhead of booting up a worker and the Node require calls.

@bmiddha bmiddha enabled auto-merge (squash) September 26, 2025 23:22
@bmiddha bmiddha disabled auto-merge September 26, 2025 23:24
@bmiddha bmiddha changed the title [rush] Add new build cache engine - zipsync [zipsync] Add new tool to efficiently pack and unpack cache entries Sep 26, 2025
@bmiddha bmiddha enabled auto-merge (squash) September 26, 2025 23:25
@bmiddha bmiddha disabled auto-merge September 26, 2025 23:26
@dmichon-msft dmichon-msft left a comment

Would love to see more JSDoc and code comments around the binary-heavy parts, especially.

@dmichon-msft dmichon-msft left a comment

Couple minor notes left.

@bmiddha bmiddha merged commit 674aa9d into main Sep 27, 2025
8 checks passed
@bmiddha bmiddha deleted the bmiddha/zipsync-3 branch September 27, 2025 01:18
@github-project-automation github-project-automation bot moved this from Needs triage to Closed in Bug Triage Sep 27, 2025