feat(replicache): Add bulk insertion optimization with putMany #5380
Open: arv wants to merge 3 commits into main from arv/basic-repl-btree-opt
Add efficient bulk insertion methods (putMany) to BTree and database layers, significantly improving performance for large batch operations like sync patches.

Core Changes
- Add putMany() to BTreeWrite with fast path for empty trees and slow path for merging
- Add putMany() to DataNodeImpl and InternalNodeImpl for node-level bulk operations
- Add Write.putMany() in database layer with index update support
- Add optimizePatch() to eliminate redundant operations in sync patches
- Extract binarySearchFrom() to enable optimized searching from start index

Performance
Benchmark results (putMany vs sequential put):
- **100 entries (small values)**: 3.36x faster
- **1,000 entries (small values)**: 5.30x faster
- **10,000 entries (small values)**: 4.15x faster
- **Construction only (10,000 entries)**: 53.73x faster
- **Update operations (1,000 entries)**: 4.47x faster

Additional benefits:
- Reduces chunk writes through optimal tree construction
- Minimizes redundant operations through patch optimization

Testing
- Add comprehensive test suite covering bulk operations, rebalancing, and edge cases
- Add performance benchmarks comparing sequential put() vs putMany()
- Add patch optimization tests with 24 scenarios

Implementation Details
Fast path (empty tree):
- Builds tree bottom-up using optimal partitioning
- Constructs ideal tree structure in single pass
- Reuses arrays to minimize allocations

Slow path (existing tree):
- Groups entries by affected child nodes
- Performs batch rebalancing per group
- Uses restricted binary search for sorted input

Patch optimization:
- Drops operations before last clear
- Merges consecutive operations on same key
- Removes pointless deletes after clear
- Sorts operations for optimal bulk loading

No breaking changes. Additive optimization compatible with V6 and V7 formats.
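The patch optimization rules listed above can be illustrated with a minimal sketch. This is not the PR's `optimizePatch()` implementation: the `PatchOp` type and the `optimizePatchSketch` name are hypothetical, the sketch assumes a patch contains only `clear`, `put`, and `del` operations (so the last operation on a key decides the outcome), and it sorts keys by plain string comparison, which may differ from the ordering the real code uses.

```typescript
// Hypothetical patch operation shape, for illustration only.
type PatchOp =
  | {op: 'clear'}
  | {op: 'put'; key: string; value: unknown}
  | {op: 'del'; key: string};

function optimizePatchSketch(patch: readonly PatchOp[]): PatchOp[] {
  // Rule 1: everything before the last clear has no effect, so drop it.
  let lastClear = -1;
  patch.forEach((p, i) => {
    if (p.op === 'clear') {
      lastClear = i;
    }
  });
  const hasClear = lastClear !== -1;
  const rest = patch.slice(lastClear + 1);

  // Rule 2: merge operations on the same key. With only put and del,
  // the last operation on a key wins.
  const latest = new Map<string, PatchOp>();
  for (const p of rest) {
    if (p.op === 'clear') {
      continue; // Cannot happen after slicing; narrows the type.
    }
    latest.set(p.key, p);
  }

  // Rule 4: sort by key so the remaining puts can be bulk loaded in order.
  const sorted = [...latest.entries()].sort(([a], [b]) =>
    a < b ? -1 : a > b ? 1 : 0,
  );

  const result: PatchOp[] = hasClear ? [{op: 'clear'}] : [];
  for (const [, p] of sorted) {
    // Rule 3: a del right after a clear removes nothing, so skip it.
    if (hasClear && p.op === 'del') {
      continue;
    }
    result.push(p);
  }
  return result;
}
```

Because at most one operation per key survives the merge step, reordering the survivors by key does not change the final state, which is what makes the sort safe in this simplified model.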
| Branch | arv/basic-repl-btree-opt |
| Testbed | self-hosted |
| Benchmark | Result, ops/s (Δ%) | Baseline, ops/s | Lower boundary, ops/s (limit %) |
|---|---|---|---|
| src/client/custom.bench.ts > big schema | 132,508.00 (-75.42%) | 538,987.49 | -342,400.26 (-258.40%) |
| src/client/zero.bench.ts > basics > All 1000 rows x 10 columns (numbers) | 2,433.03 (-4.59%) | 2,550.10 | 2,251.90 (92.56%) |
| src/client/zero.bench.ts > pk compare > pk = N | 60,904.00 (-6.65%) | 65,239.87 | 58,983.27 (96.85%) |
| src/client/zero.bench.ts > with filter > Lower rows 500 x 10 columns (numbers) | 3,672.00 (-5.80%) | 3,898.09 | 3,530.74 (96.15%) |
| Branch | arv/basic-repl-btree-opt |
| Testbed | Linux |
| Benchmark | Result, KB (Δ%) | Baseline, KB | Upper boundary, KB (limit %) |
|---|---|---|---|
| zero-package.tgz | 1,779.16 (+0.21%) | 1,775.44 | 1,810.95 (98.24%) |
| zero.js | 243.81 (+0.75%) | 242.00 | 246.84 (98.77%) |
| zero.js.br | 66.90 (+0.68%) | 66.45 | 67.78 (98.71%) |
We've now been using this build of Replicache in our Expo React Native mobile app using the sqlite kvStore against op-sqlite@15.2 for quite a while and are very happy with it. It is saving us over half of our initial sync snapshot time (download complete to ready).
Overview
This PR adds bulk insertion optimization to Replicache's BTree and database layer, significantly improving performance for large batch operations like sync patches.
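To illustrate why a sorted batch can be merged more cheaply than a sequence of independent inserts, here is a minimal sketch of merging sorted entries into an already sorted node, using a binary search that resumes from the previous insertion point. The PR names a `binarySearchFrom()` helper, but its real signature is not shown here, so the `Entry` type, both function names, and their parameters are assumptions made for illustration. The sketch also assumes the batch is sorted with unique keys and ignores node size limits and rebalancing.

```typescript
// Hypothetical entry shape: a key/value pair kept sorted by key.
type Entry = readonly [key: string, value: unknown];

// Binary search restricted to the range [start, entries.length).
// Returns the index of the first entry whose key is >= the given key.
function binarySearchFromSketch(
  entries: readonly Entry[],
  key: string,
  start: number,
): number {
  let lo = start;
  let hi = entries.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (entries[mid][0] < key) {
      lo = mid + 1;
    } else {
      hi = mid;
    }
  }
  return lo;
}

// Merges a sorted batch into sorted existing entries in a single pass.
// Entries in the batch replace existing entries with the same key.
function mergeSortedSketch(
  existing: readonly Entry[],
  batch: readonly Entry[],
): Entry[] {
  const out: Entry[] = [];
  let from = 0;
  for (const entry of batch) {
    // Because the batch is sorted, the search never needs to look
    // before the previous insertion point.
    const i = binarySearchFromSketch(existing, entry[0], from);
    out.push(...existing.slice(from, i)); // Untouched existing entries.
    out.push(entry);
    const replaced = i < existing.length && existing[i][0] === entry[0];
    from = replaced ? i + 1 : i;
  }
  out.push(...existing.slice(from)); // Remaining tail.
  return out;
}
```

Each existing entry is copied once, and each search covers only the part of the node that earlier batch entries have not already passed, whereas repeated single inserts would search and shift the array once per entry.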
Changes
Core BTree Changes
packages/replicache/src/btree/node.ts
- Add `putMany()` method to `DataNodeImpl` for efficient merging of sorted entries
- Add `putMany()` method to `InternalNodeImpl` with child grouping and rebalancing
- Add `putManyMergeAndPartition()` helper for node rebalancing during bulk operations
- Extract `binarySearchFrom()` to enable optimized searching from a start index
- Change `readTreeData()` to accept `getEntrySize` parameter (test helper improvement)

packages/replicache/src/btree/write.ts
- Add `BTreeWrite.putMany()` method with two optimized paths:
  - Fast path for empty trees
  - Slow path for merging into an existing tree

Database Layer
packages/replicache/src/db/write.ts
- Add `Write.putMany()` method that delegates to `BTreeWrite.putMany()`
- […] `put()` semantics

Sync Layer Optimization
packages/replicache/src/sync/patch.ts
- Add `optimizePatch()` function to eliminate redundant operations:
  - Drops operations before last `clear`
  - Merges consecutive operations on same key
  - Removes pointless `del` operations after `clear`
- Change `apply()` to use optimized patches with bulk loading
- Add `mergeUpdate()` helper for update operation handling
- Add `bulkLoadPuts()` to handle consecutive put operations efficiently

Performance Impact
The optimization targets common sync patterns:
Benchmark Results
Comparison of `putMany()` vs sequential `put()` operations:
Key findings:
Additional benefits:
Testing
New test files:
- `packages/replicache/src/btree/write.bench.ts` - Performance benchmarks comparing sequential put() vs putMany()
- `packages/replicache/src/btree/node.test.ts` - 17 new tests for putMany() behavior
- `packages/replicache/src/db/write.test.ts` - 3 new tests for database-level putMany()
- `packages/replicache/src/sync/patch.test.ts` - 24 new tests for patch optimization

Test scenarios covered:
Compatibility
- `put()` and `del()` methods remain unchanged
- `putMany()` is an additive optimization that can be adopted incrementally

Implementation Details
Key Algorithm Improvements
Memory Efficiency
Future Work
- `delMany()` for bulk deletions