Skip to content

Conversation

@maheshrbapatu
Copy link
Collaborator

Description

Implement partial success handling in BatchOffload for both StorageBackendAdaptor and OffsetAllocatorStorageBackend. When some keys fail but others succeed, continue processing and report only successful keys to master via complete_handler. This prevents inconsistency where storage backend has written keys but master wasn't notified.

Type of Change

  • Types
    • Bug fix
    • New feature
      • Transfer Engine
      • Mooncake Store
      • Mooncake EP
      • Integration
      • P2P Store
      • Python Wheel
    • Breaking change
    • CI/CD
    • Documentation update
    • Other

How Has This Been Tested?

  • Added PartialSuccess tests for both Adaptor and OffsetAllocator.

Checklist

  • I have performed a self-review of my own code.
  • I have updated the documentation.
  • I have added tests to prove my changes are effective.

…sting

- Add SetTestFailurePredicate() virtual method to StorageBackendInterface
- Implement test failure injection in StorageBackendAdaptor and OffsetAllocatorStorageBackend
- Refactor TestPartialSuccessBehavior helper to use predicate-based injection
- Remove repetitive key creation code (use loops)
- Add PartialSuccess tests for both Adaptor and OffsetAllocator backends
- Update OutOfSpace test to handle partial success semantics (value==0)

This enables deterministic testing of partial success behavior without
relying on approximate space exhaustion, making tests more reliable.
@gemini-code-assist
Copy link
Contributor

Summary of Changes

Hello @maheshrbapatu, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the robustness of the storage backend's batch offload operations. By introducing partial success handling, the system can now gracefully manage scenarios where individual key operations within a batch might fail. Instead of aborting the entire batch, the process continues, ensuring that all possible successful operations are completed and accurately reported. This improvement prevents data inconsistencies and makes the storage system more resilient to transient errors during batch writes.

Highlights

  • Partial Success Handling: Implemented partial success handling in the BatchOffload method for both StorageBackendAdaptor and OffsetAllocatorStorageBackend. This means that if some keys fail during a batch operation, the process will continue for other keys, and only the successfully processed keys will be reported.
  • Error Reporting Consistency: The change ensures that the master is only notified of keys that were successfully written to the storage backend, preventing inconsistencies where keys might be written but the master remains unaware due to a partial batch failure.
  • Testability Enhancements: Added a SetTestFailurePredicate method to StorageBackendInterface and its concrete implementations. This allows for deterministic injection of failures during testing, making it easier to verify the partial success behavior.
  • Improved Error Logging: Modified error logging within BatchOffload to provide more specific details about which key failed and the nature of the error, while still allowing the batch process to continue.
  • New Test Cases: Introduced new test cases (AdaptorBatchOffload_PartialSuccess and OffsetAllocatorStorageBackend_PartialSuccess) to thoroughly validate the new partial success logic using the deterministic failure injection mechanism.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request implements partial success handling for BatchOffload in StorageBackendAdaptor and OffsetAllocatorStorageBackend. Instead of failing the entire batch on a single key failure, the operation now continues, and only successfully offloaded keys are reported. This is a good improvement for robustness. The changes are well-tested, with a new test helper TestPartialSuccessBehavior and failure injection mechanism (SetTestFailurePredicate) to deterministically test partial success scenarios.

My feedback includes a suggestion to improve efficiency in OffsetAllocatorStorageBackend by using break instead of continue on space exhaustion errors, and a couple of suggestions to improve code clarity and maintainability in the tests.

@maheshrbapatu maheshrbapatu requested review from ykwd and removed request for ykwd January 4, 2026 10:15
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR implements partial success handling in BatchOffload for both StorageBackendAdaptor and OffsetAllocatorStorageBackend. The key improvement is that when some keys fail during batch processing but others succeed, the operation continues processing remaining keys and reports only the successful keys to the master via the complete_handler. This prevents inconsistency where the storage backend has written keys but the master wasn't notified.

  • Changed BatchOffload to continue processing on individual key failures instead of aborting the entire batch
  • Added test failure injection mechanism to enable deterministic testing of partial success scenarios
  • Return the count of successfully processed keys instead of always returning 0

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File Description
mooncake-store/tests/storage_backend_test.cpp Adds TestPartialSuccessBehavior helper function and two new test cases (AdaptorBatchOffload_PartialSuccess and OffsetAllocatorStorageBackend_PartialSuccess) to verify partial success handling; also updates OutOfSpace test to handle partial success return values
mooncake-store/src/storage_backend.cpp Implements partial success logic in BatchOffload for both StorageBackendAdaptor and OffsetAllocatorStorageBackend by continuing on individual failures and returning the count of successful keys; adds test failure injection support
mooncake-store/include/storage_backend.h Adds SetTestFailurePredicate virtual method to StorageBackendInterface and implements it in StorageBackendAdaptor and OffsetAllocatorStorageBackend with corresponding test_failure_predicate_ member variables

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@ykwd ykwd requested a review from zhangzuo21 January 5, 2026 03:16
@zhangzuo21
Copy link
Collaborator

Thanks for your contribution! The code is well implemented from my perspective. Feel free to check for the review generated by gemini code agent.

@maheshrbapatu
Copy link
Collaborator Author

Thanks for taking a look at this @zhangzuo21. I have adressed all the review comments now

@codecov-commenter
Copy link

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

❌ Patch coverage is 80.00000% with 21 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
mooncake-store/src/storage_backend.cpp 54.83% 14 Missing ⚠️
mooncake-store/tests/storage_backend_test.cpp 91.30% 6 Missing ⚠️
mooncake-store/include/storage_backend.h 80.00% 1 Missing ⚠️

📢 Thoughts on this report? Let us know!

@stmatengss
Copy link
Collaborator

It is a very useful feature. LGTM. Thx.

@stmatengss stmatengss merged commit 4591728 into kvcache-ai:main Jan 7, 2026
14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants