HDDS-13919. S3 Conditional Writes (PutObject) #9815

Draft
peterxcli wants to merge 27 commits into apache:master from peterxcli:HDDS-13919-conditional-writes

@peterxcli peterxcli commented Feb 24, 2026

What changes were proposed in this pull request?

  • Add createKeyIfNotExists and rewriteKeyIfMatch to the RpcClient API, keeping their logic separate from the existing rewriteKey
  • Have the Object Endpoint PUT handler dispatch to createKeyIfNotExists, rewriteKeyIfMatch, or createKey
  • Add OM-side ETag-matching validation to the key creation and commit requests
  • Translate OM exceptions into S3 errors for conditional requests
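
A minimal sketch of how the PUT handler might choose among the three write paths. Everything here is hypothetical: the class `ConditionalPutDispatch`, the `WriteMode` enum, and the decision to reject a request carrying both headers (or an `If-None-Match` value other than `*`) are assumptions for illustration, not code from this PR.

```java
/** Hypothetical sketch: pick a write path from the conditional request headers. */
final class ConditionalPutDispatch {

  /** One constant per write path named in the PR description. */
  enum WriteMode { CREATE_KEY, CREATE_KEY_IF_NOT_EXISTS, REWRITE_KEY_IF_MATCH }

  /** Selects the write path; a null or blank header value is treated as absent. */
  static WriteMode select(String ifNoneMatch, String ifMatch) {
    boolean hasNoneMatch = ifNoneMatch != null && !ifNoneMatch.trim().isEmpty();
    boolean hasMatch = ifMatch != null && !ifMatch.trim().isEmpty();
    if (hasNoneMatch && hasMatch) {
      // Assumption: treat the combination as a client error.
      throw new IllegalArgumentException("If-Match and If-None-Match are mutually exclusive");
    }
    if (hasNoneMatch) {
      // Assumption: only the wildcard form is accepted for create-only writes.
      if (!"*".equals(stripQuotes(ifNoneMatch))) {
        throw new IllegalArgumentException("If-None-Match only supports *");
      }
      return WriteMode.CREATE_KEY_IF_NOT_EXISTS;
    }
    return hasMatch ? WriteMode.REWRITE_KEY_IF_MATCH : WriteMode.CREATE_KEY;
  }

  /** Removes one pair of surrounding double quotes, if present. */
  static String stripQuotes(String value) {
    String t = value.trim();
    if (t.length() >= 2 && t.startsWith("\"") && t.endsWith("\"")) {
      return t.substring(1, t.length() - 1);
    }
    return t;
  }
}
```

For the `REWRITE_KEY_IF_MATCH` path, the unquoted `If-Match` value (via `stripQuotes`) would be the expected ETag passed along with the write.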

## Specification
### AWS S3 Conditional Write Specification
#### If-None-Match Header
```
If-None-Match: "*"
```
- Succeeds only if object does NOT exist
- Returns `412 Precondition Failed` if object exists
- Primary use case: Create-only semantics
#### If-Match Header
```
If-Match: "<etag>"
```
- Succeeds only if object EXISTS and ETag matches
- Returns `412 Precondition Failed` if object doesn't exist or ETag mismatches
- Primary use case: Atomic updates (compare-and-swap)
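
The two header rules above can be condensed into one pure function. This is a sketch only: `ConditionalWriteSemantics` and `evaluate` are hypothetical names, header values are assumed to be already unquoted, and a `null` current ETag stands for "object does not exist".

```java
/** Hypothetical sketch of the conditional-write rules listed above. */
final class ConditionalWriteSemantics {
  static final int OK = 200;
  static final int PRECONDITION_FAILED = 412;

  /**
   * Returns the HTTP status a conditional PutObject would produce.
   *
   * @param ifNoneMatch unquoted If-None-Match value, or null if the header is absent
   * @param ifMatch     unquoted If-Match value, or null if the header is absent
   * @param currentETag ETag of the existing object, or null if no object exists
   */
  static int evaluate(String ifNoneMatch, String ifMatch, String currentETag) {
    boolean exists = currentETag != null;
    // If-None-Match: * succeeds only when the object does NOT exist.
    if ("*".equals(ifNoneMatch) && exists) {
      return PRECONDITION_FAILED;
    }
    // If-Match succeeds only when the object exists AND the ETag matches.
    if (ifMatch != null && (!exists || !ifMatch.equals(currentETag))) {
      return PRECONDITION_FAILED;
    }
    return OK;
  }
}
```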

Should be merged after #9332

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-13919

How was this patch tested?

Clean CI: https://github.com/peterxcli/ozone/actions/runs/23552109067/job/68568728541

  • Unit tests for the affected components
  • RPC client integration tests for the two newly introduced bucket APIs
  • S3 SDK tests
  • Robot tests

…ion handling

- Implemented tests for `createKey` and `rewriteKey` methods to validate behavior when using the `EXPECTED_GEN_CREATE_IF_NOT_EXISTS` constant.
- Added scenarios for key creation when the key is absent and when it already exists.
- Enhanced the `rewriteFailsWhenKeyExists` test to cover cases for both committed and uncommitted keys.
- Updated error handling to ensure correct responses for key existence checks.
@peterxcli peterxcli changed the title from "Hdds 13919 conditional writes" to "HDDS-13919. S3 Conditional Writes (PutObject)" Feb 24, 2026

@jojochuang jojochuang left a comment

From a quick glance, it looks mostly reasonable.

.addAllMetadataGdpr(metadata)
.addAllTags(tags)
.setLatestVersionLocation(getLatestVersionLocation)
.setExpectedETag(expectedETag);

Does it call setExpectedDataGeneration here?

No, please take a look at the design doc:

#### If-Match Implementation
To optimize performance and reduce latency, we avoid a pre-flight check (GetS3KeyDetails) and instead validate the ETag during the OM Write operation.
This requires adding an optional `expectedETag` field to `KeyArgs`. This approach optimizes the "happy path" (successful match) by removing an extra network round trip.
Failing requests still incur the cost of a write RPC and a Raft log entry, but this is acceptable under optimistic concurrency control assumptions.
##### S3 Gateway Layer
1. Parse `If-Match: "<etag>"` header.
2. Populate `KeyArgs` with the parsed `expectedETag`.
3. Send the write request (CreateKey) to OM.
##### OM Create Phase
Validation is performed within the `validateAndUpdateCache` method to ensure atomicity within the Ratis state machine application.
1. **Locking**: The OM acquires the write lock for the bucket/key.
2. **Key Lookup**: Retrieve the existing key from `KeyTable`.
3. **Validation**:
   - **Key Not Found**: If the key does not exist, throw `KEY_NOT_FOUND` (maps to S3 412).
   - **No ETag Metadata**: If the existing key (e.g., uploaded via OFS) does not have an ETag property, throw `ETAG_NOT_AVAILABLE` (maps to S3 412). The precondition cannot be evaluated, so we must fail rather than silently proceed.
   - **ETag Mismatch**: Compare `existingKey.ETag` with `expectedETag`. If they do not match, throw `ETAG_MISMATCH` (maps to S3 412).
4. **Extract Generation**: If ETag matches, extract `existingKey.updateID`.
5. **Create Open Key**: Create open key entry with `expectedDataGeneration = existingKey.updateID`.
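
The validation steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the OM code: `IfMatchValidationSketch`, `KeyInfo`, and `PreconditionException` are hypothetical stand-ins, a plain `Map` plays the role of `KeyTable`, and locking (step 1) is omitted.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of the If-Match validation steps; not the actual OM classes. */
final class IfMatchValidationSketch {

  /** Result codes named in the design doc; all three map to S3 412. */
  enum Result { KEY_NOT_FOUND, ETAG_NOT_AVAILABLE, ETAG_MISMATCH }

  /** Carries the result code of a failed precondition. */
  static final class PreconditionException extends RuntimeException {
    final Result result;
    PreconditionException(Result result) {
      super(result.name());
      this.result = result;
    }
  }

  /** Simplified stand-in for an existing key entry. */
  static final class KeyInfo {
    final String eTag;   // null when the key was written without an ETag (e.g. via OFS)
    final long updateID;
    KeyInfo(String eTag, long updateID) {
      this.eTag = eTag;
      this.updateID = updateID;
    }
  }

  /**
   * Steps 2-4: look up the key, validate the ETag, and return the updateID
   * that the open key entry would use as its expectedDataGeneration.
   */
  static long validate(Map<String, KeyInfo> keyTable, String key, String expectedETag) {
    KeyInfo existing = keyTable.get(key);
    if (existing == null) {
      throw new PreconditionException(Result.KEY_NOT_FOUND);
    }
    if (existing.eTag == null) {
      // The precondition cannot be evaluated, so fail instead of proceeding.
      throw new PreconditionException(Result.ETAG_NOT_AVAILABLE);
    }
    if (!existing.eTag.equals(expectedETag)) {
      throw new PreconditionException(Result.ETAG_MISMATCH);
    }
    return existing.updateID;
  }
}
```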

@@ -187,8 +188,18 @@ public Response put(
throw newError(S3ErrorTable.NO_SUCH_BUCKET, bucketName, ex);
} else if (ex.getResult() == ResultCodes.FILE_ALREADY_EXISTS) {
throw newError(S3ErrorTable.NO_OVERWRITE, keyPath, ex);
} else if (ex.getResult() == ResultCodes.KEY_ALREADY_EXISTS) {
throw newError(PRECOND_FAILED, keyPath, ex);

Would you consider using different error messages for the different cases?
