-
Notifications
You must be signed in to change notification settings - Fork 113
[server] Global RT DIV: Fix VT DIV Sync Order #2340
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
KaiSernLim
merged 19 commits into
linkedin:main
from
KaiSernLim:global-rt-div-fix-vt-div-sync-order
Jan 5, 2026
Merged
[server] Global RT DIV: Fix VT DIV Sync Order #2340
KaiSernLim
merged 19 commits into
linkedin:main
from
KaiSernLim:global-rt-div-fix-vt-div-sync-order
Jan 5, 2026
+466
−72
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
… server, during ingestion phase (before EOP) and after ingestion phase (after EOP, and consuming RT), and finally verifies that all ingested data can be successfully queried (no data loss). ❤️🩹
92c099d to
f4ae940
Compare
lluwm
reviewed
Dec 10, 2025
Contributor
lluwm
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this change, @KaiSernLim. It looks very promising. Left a few comments for you to consider.
...ts/da-vinci-client/src/main/java/com/linkedin/davinci/kafka/consumer/StoreIngestionTask.java
Show resolved
Hide resolved
...ient/src/main/java/com/linkedin/davinci/kafka/consumer/LeaderFollowerStoreIngestionTask.java
Show resolved
Hide resolved
.../src/test/java/com/linkedin/davinci/kafka/consumer/LeaderFollowerStoreIngestionTaskTest.java
Outdated
Show resolved
Hide resolved
...enice-test-common/src/integrationTest/java/com/linkedin/venice/endToEnd/TestGlobalRtDiv.java
Show resolved
Hide resolved
ee7014c to
f351d86
Compare
… post-processing actions and enhance VT DIV sync logic. 👧
lluwm
reviewed
Dec 24, 2025
...ts/da-vinci-client/src/main/java/com/linkedin/davinci/kafka/consumer/StoreBufferService.java
Outdated
Show resolved
Hide resolved
lluwm
approved these changes
Jan 5, 2026
Contributor
lluwm
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks awesome! Thank you!
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Problem Statement
This PR fixes 3 important issues discovered in certification testing:
EndOfPushdue to the VT DIV being synced to drainer beforeEndOfPushbeing sent to drainer. This will causeOffsetRecord.isEndOfPushReceived()to be set asfalse, while the DIV has already seen the EOP message. The checkpointed position will be at the EOP message, but it will be skipped because the DIV has already seen the EOP message, so theisEndOfPushReceived()will never be set astrue. The first push will succeed, but it will become a problem on restart and the host will never finish ingestion.DeleteValues inshouldSyncOffsetFromSnapshot()MessageType:GLOBAL_RT_DIV. A commit extended this special key message type forDELETEmessage values, but code paths were not updated from previously, when allGLOBAL_RT_DIVwould containPUTmessage values.VeniceKafkaInputTTLFilter.skipRmdRecord()and they do not have replication metadata fields populated, which causes issues. Skipping it inPubSubSplitIteratorshould cause it to be omitted from repushes.Solution
Read the previous section.
Code changes
Concurrency-Specific Checks
Both reviewer and PR author to verify
synchronized,RWLock) are used where needed.ConcurrentHashMap,CopyOnWriteArrayList).How was this PR tested?
Added new parametrized integration test
testServerRestart().Does this PR introduce any user-facing or breaking changes?