

@shrutisinghania
Collaborator

Update CONFIGURATION.md to document which settings are effective for HTTP_API_CLIENT, STORAGE_CLIENT, and Bidi mode.

@gemini-code-assist

Summary of Changes


This pull request focuses on enhancing the clarity and accuracy of the CONFIGURATION.md documentation. The updates provide precise details on how various GCS connector configurations behave and apply differently based on the chosen client type (HTTP_API_CLIENT or STORAGE_CLIENT gRPC) and the activation of Bidi mode. The changes aim to improve user understanding of configuration effectiveness, ensuring that developers can correctly configure their GCS connector for optimal performance and functionality.

Highlights

  • Client Type Specificity for Rewrite Operations: Clarified the effectiveness of rewrite configurations (fs.gs.copy.with.rewrite.enable, fs.gs.rewrite.max.chunk.size), detailing when they apply to HTTP_API_CLIENT versus STORAGE_CLIENT (gRPC).
  • Batching and Parallel Execution Differences: Updated documentation for fs.gs.max.requests.per.batch and fs.gs.batch.threads to clearly distinguish their behavior and applicability between HTTP_API_CLIENT and STORAGE_CLIENT.
  • Custom HTTP Headers and CSEK Support: Explained how custom HTTP headers (fs.gs.storage.http.headers) are handled as request metadata by STORAGE_CLIENT (gRPC) and added a note on Customer-Supplied Encryption Keys (CSEK) support limitations for STORAGE_CLIENT (gRPC) with Bidi Reads and write operations.
  • Input/Output Stream Configuration Effectiveness: Provided clarity on which client type (HTTP_API_CLIENT or STORAGE_CLIENT) and Bidi mode (fs.gs.bidi.enable) affect various input/output stream configurations, including fast-fail-on-not-found, GZIP encoding support, pipe types, buffer sizes, upload chunk sizes, upload cache, and direct uploads.
  • Vectored Read and Bidi Configuration Details: Updated documentation for vectored read configurations (fs.gs.vectored.read.merge.enable, fs.gs.vectored.read.merged.range.max.size, fs.gs.vectored.read.threads), specifying their support across client types and Bidi mode, and introduced fs.gs.bidi.thread.count for Bidi-enabled scenarios. Also clarified that all Bidi configurations are only effective when STORAGE_CLIENT (gRPC) is used.
  • Requester Pays Configuration Applicability: Added a note specifying that Requester Pays configurations are only effective with HTTP_API_CLIENT.
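As a rough sketch of the two client-scoped rules in the last two highlights, the hypothetical `ClientScopedKeys` helper below (not part of the GCS connector) reports whether a key would be ignored by the chosen `fs.gs.client.type`. It assumes that all Requester Pays keys share the `fs.gs.requester.pays.` prefix (for example the assumed key `fs.gs.requester.pays.mode`) and that all Bidi keys share the `fs.gs.bidi.` prefix; only `fs.gs.bidi.enable` and `fs.gs.bidi.thread.count` actually appear in this PR.

```java
import java.util.List;

// Hypothetical helper, not part of the GCS connector: it encodes the
// Requester Pays and Bidi applicability rules from the PR summary.
final class ClientScopedKeys {
    // Assumption: all Requester Pays keys share this prefix.
    private static final String REQUESTER_PAYS_PREFIX = "fs.gs.requester.pays.";
    // Assumption: all Bidi keys share this prefix (fs.gs.bidi.enable, fs.gs.bidi.thread.count).
    private static final String BIDI_PREFIX = "fs.gs.bidi.";

    private ClientScopedKeys() {}

    /** Returns false when the key would be ignored by the given fs.gs.client.type. */
    static boolean isEffective(String key, String clientType) {
        if (key.startsWith(REQUESTER_PAYS_PREFIX)) {
            // Requester Pays is only honored by HTTP_API_CLIENT.
            return "HTTP_API_CLIENT".equals(clientType);
        }
        if (key.startsWith(BIDI_PREFIX)) {
            // Bidi settings only take effect with STORAGE_CLIENT (gRPC).
            return "STORAGE_CLIENT".equals(clientType);
        }
        // Other keys: no client-type restriction is modeled here.
        return true;
    }

    public static void main(String[] args) {
        for (String key : List.of("fs.gs.requester.pays.mode", "fs.gs.bidi.thread.count")) {
            System.out.println(key + " effective with STORAGE_CLIENT: " + isEffective(key, "STORAGE_CLIENT"));
        }
    }
}
```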

@shrutisinghania
Collaborator Author

/gcbrun


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request significantly improves the configuration documentation by clarifying the effectiveness of various settings for HTTP_API_CLIENT, STORAGE_CLIENT, and Bidi mode. The changes are clear and valuable. I've provided a few suggestions to further enhance readability, consistency, and completeness in some of the descriptions. Overall, this is a great update.


Whether to perform copy operation using Rewrite requests which allows copy
files between different locations and storage classes.
files between different locations and storage classes. Is effective only if `fs.gs.client.type` is set to `HTTP_API_CLIENT`. For `STORAGE_CLIENT` (gRPC), rewrite is always enabled.


medium

For better readability, consider rephrasing this sentence. Starting with 'This setting is...' can make the purpose clearer. I've noticed this pattern in a few other places in this file (lines 305, 323, 347, 370, 374). It would be great to apply this change consistently for a more natural flow.

Suggested change
files between different locations and storage classes. Is effective only if `fs.gs.client.type` is set to `HTTP_API_CLIENT`. For `STORAGE_CLIENT` (gRPC), rewrite is always enabled.
files between different locations and storage classes. This setting is effective only if `fs.gs.client.type` is set to `HTTP_API_CLIENT`. For `STORAGE_CLIENT` (gRPC), rewrite is always enabled.
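A minimal sketch of the rule this sentence documents, written as a hypothetical helper (`RewriteCopyRule`, not part of the connector) that takes the values of `fs.gs.client.type` and `fs.gs.copy.with.rewrite.enable` and reports whether copies are expected to use Rewrite requests:

```java
// Hypothetical helper, not part of the GCS connector: it mirrors the
// documented rule for fs.gs.copy.with.rewrite.enable.
final class RewriteCopyRule {
    private RewriteCopyRule() {}

    /**
     * @param clientType value of fs.gs.client.type
     * @param rewriteEnableSetting value of fs.gs.copy.with.rewrite.enable
     * @return whether copies are expected to use Rewrite requests
     */
    static boolean copiesUseRewrite(String clientType, boolean rewriteEnableSetting) {
        switch (clientType) {
            case "HTTP_API_CLIENT":
                // The flag is only consulted for HTTP_API_CLIENT.
                return rewriteEnableSetting;
            case "STORAGE_CLIENT":
                // gRPC always uses rewrite, so the flag is ignored.
                return true;
            default:
                throw new IllegalArgumentException("Unknown fs.gs.client.type: " + clientType);
        }
    }

    public static void main(String[] args) {
        System.out.println(copiesUseRewrite("STORAGE_CLIENT", false));  // true: flag ignored
        System.out.println(copiesUseRewrite("HTTP_API_CLIENT", false)); // false
    }
}
```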


Maximum size of object chunk that will be rewritten in a single rewrite
request when `fs.gs.copy.with.rewrite.enable` is set to `true`.
request. For `HTTP_API_CLIENT`, this is effective only when `fs.gs.copy.with.rewrite.enable` is `true`. For `STORAGE_CLIENT` (gRPC), rewrite is always enabled, so this configuration is always effective.


medium

To improve clarity and reduce redundancy, you could rephrase this sentence. The phrase 'so this configuration is always effective' is redundant given the preceding context.

Suggested change
request. For `HTTP_API_CLIENT`, this is effective only when `fs.gs.copy.with.rewrite.enable` is `true`. For `STORAGE_CLIENT` (gRPC), rewrite is always enabled, so this configuration is always effective.
request. For `HTTP_API_CLIENT`, this is effective only when `fs.gs.copy.with.rewrite.enable` is `true`. For `STORAGE_CLIENT` (gRPC), this configuration is always effective as rewrite is always enabled.
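To make the chunk-size setting concrete: when a copy has to move bytes (for example across locations or storage classes), each Rewrite request copies at most `fs.gs.rewrite.max.chunk.size` bytes, so the copy needs at least ceil(size / chunk) requests. The sketch below is a back-of-the-envelope estimate under that assumption, not connector code; copies that need no data movement may finish in a single request, and the one-request floor for empty objects is also an assumption.

```java
// Back-of-the-envelope sketch, not connector code: if each Rewrite request
// copies at most maxChunkSize bytes, moving objectSize bytes needs at least
// ceil(objectSize / maxChunkSize) requests.
final class RewriteCallEstimate {
    private RewriteCallEstimate() {}

    static long minRewriteCalls(long objectSize, long maxChunkSize) {
        if (objectSize < 0 || maxChunkSize <= 0) {
            throw new IllegalArgumentException("objectSize must be >= 0 and maxChunkSize > 0");
        }
        // Ceiling division written to avoid overflow on very large sizes.
        long calls = objectSize / maxChunkSize + (objectSize % maxChunkSize == 0 ? 0 : 1);
        // Assumption: at least one request is always issued, even for an empty object.
        return Math.max(1, calls);
    }

    public static void main(String[] args) {
        long oneGiB = 1L << 30;
        long chunk = 512L << 20; // 512 MiB, an illustrative value only
        System.out.println(minRewriteCalls(oneGiB, chunk)); // 2
    }
}
```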


These threads are used to execute the Class A, Class B and Free Cloud Storage
operations as copy, list, delete, etc. These operations are part of typical
`hdfs` CLI commands such as `hdfs mv`, `hdfs cp`, etc. Depending on the number of requests the connector evenly distributes the number of requests across batch threads.


medium

The sentence 'Depending on the number of requests the connector evenly distributes the number of requests across batch threads.' seems to be specific to the HTTP_API_CLIENT's batching behavior. For STORAGE_CLIENT, this property just controls the thread pool size for parallel operations. To improve clarity, consider moving this sentence to be part of the HTTP_API_CLIENT description on lines 122-123.
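For the HTTP_API_CLIENT case described here, "evenly distributes" can be read as: N requests spread over T batch threads, with per-thread counts differing by at most one. The toy `EvenSplit` class below illustrates only that arithmetic; it is not the connector's scheduler and does not model `fs.gs.max.requests.per.batch`.

```java
import java.util.Arrays;

// Toy illustration of "evenly distributes the requests across batch threads".
// This is NOT the connector's scheduling code.
final class EvenSplit {
    private EvenSplit() {}

    /** Returns how many requests each of the given threads would receive. */
    static int[] split(int requests, int threads) {
        if (requests < 0 || threads <= 0) {
            throw new IllegalArgumentException("requests must be >= 0 and threads > 0");
        }
        int[] perThread = new int[threads];
        int base = requests / threads;
        int remainder = requests % threads;
        for (int i = 0; i < threads; i++) {
            // The first `remainder` threads take one extra request.
            perThread[i] = base + (i < remainder ? 1 : 0);
        }
        return perThread;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split(10, 4))); // [3, 3, 2, 2]
    }
}
```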

[`MediaHttpUploader` class](https://cloud.google.com/java/docs/reference/google-api-client/latest/com.google.api.client.googleapis.media.MediaHttpUploader).
This is used only for JSON API and for best performance should be a multiple
of 8 MiB.
Not supported by `STORAGE_CLIENT` (gRPC) when `fs.gs.bidi.enable` is `true` and for best performance should be a multiple of 8 MiB.


medium

The updated description is more accurate but could be more comprehensive. It clarifies when the setting is not supported but doesn't explain when it is supported for STORAGE_CLIENT. Consider clarifying that this setting is used for HTTP_API_CLIENT and for STORAGE_CLIENT when using CHUNK_UPLOAD, but not with Bidi enabled.

Suggested change
Not supported by `STORAGE_CLIENT` (gRPC) when `fs.gs.bidi.enable` is `true` and for best performance should be a multiple of 8 MiB.
This setting is used for `HTTP_API_CLIENT` resumable uploads and for `STORAGE_CLIENT` (gRPC) with `CHUNK_UPLOAD`. It is not supported by `STORAGE_CLIENT` when `fs.gs.bidi.enable` is `true`. For best performance, this should be a multiple of 8 MiB.
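The suggested wording reads as a small decision rule. The hypothetical `UploadChunkSizeRule` below (not connector code) encodes it together with the "multiple of 8 MiB" recommendation. The `uploadType` string stands in for whatever upload-type setting selects `CHUNK_UPLOAD`; its exact key is not shown in this diff, and the sketch assumes HTTP_API_CLIENT uploads always take the resumable path.

```java
// Hypothetical helper, not part of the GCS connector: it encodes the
// suggested applicability rule for the upload chunk size setting.
final class UploadChunkSizeRule {
    private static final long EIGHT_MIB = 8L * 1024 * 1024;

    private UploadChunkSizeRule() {}

    static boolean isEffective(String clientType, String uploadType, boolean bidiEnabled) {
        switch (clientType) {
            case "HTTP_API_CLIENT":
                // Assumption: used by every HTTP resumable upload; Bidi does not apply.
                return true;
            case "STORAGE_CLIENT":
                // gRPC: only with CHUNK_UPLOAD, and not when fs.gs.bidi.enable is true.
                return "CHUNK_UPLOAD".equals(uploadType) && !bidiEnabled;
            default:
                throw new IllegalArgumentException("Unknown fs.gs.client.type: " + clientType);
        }
    }

    /** True when the size is a positive multiple of 8 MiB, as the docs recommend. */
    static boolean isRecommendedSize(long chunkSizeBytes) {
        return chunkSizeBytes > 0 && chunkSizeBytes % EIGHT_MIB == 0;
    }

    public static void main(String[] args) {
        System.out.println(isEffective("STORAGE_CLIENT", "CHUNK_UPLOAD", true)); // false
        System.out.println(isRecommendedSize(24L * 1024 * 1024));                // true
    }
}
```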

It controls the parallel processing of range request. These threads will be
shared across all readVectored invocation. If the task queue of range request
is overloaded do consider increasing this value.
Supported by `HTTP_API_CLIENT`, `STORAGE_CLIENT` (gRPC) only when `fs.gs.bidi.enable` is `false`. If `fs.gs.bidi.enable` is `true`, `fs.gs.bidi.thread.count` is used instead.


medium

The comma after HTTP_API_CLIENT could create ambiguity. To improve clarity, consider rephrasing so that it is clear the condition 'only when fs.gs.bidi.enable is false' applies only to STORAGE_CLIENT.

Suggested change
Supported by `HTTP_API_CLIENT`, `STORAGE_CLIENT` (gRPC) only when `fs.gs.bidi.enable` is `false`. If `fs.gs.bidi.enable` is `true`, `fs.gs.bidi.thread.count` is used instead.
Supported by `HTTP_API_CLIENT` and by `STORAGE_CLIENT` (gRPC) only when `fs.gs.bidi.enable` is `false`. If `fs.gs.bidi.enable` is `true`, `fs.gs.bidi.thread.count` is used instead.
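With the suggested wording, exactly one key sizes the range-request thread pool for each combination of client type and Bidi mode. A hypothetical lookup (not connector code) that returns that key:

```java
// Hypothetical helper, not part of the GCS connector: it returns which key
// controls the thread pool for vectored range requests.
final class VectoredReadThreadsRule {
    private VectoredReadThreadsRule() {}

    static String threadCountKey(String clientType, boolean bidiEnabled) {
        switch (clientType) {
            case "HTTP_API_CLIENT":
                // Bidi only applies to gRPC, so the flag is irrelevant here.
                return "fs.gs.vectored.read.threads";
            case "STORAGE_CLIENT":
                // With Bidi enabled, fs.gs.bidi.thread.count is used instead.
                return bidiEnabled ? "fs.gs.bidi.thread.count" : "fs.gs.vectored.read.threads";
            default:
                throw new IllegalArgumentException("Unknown fs.gs.client.type: " + clientType);
        }
    }

    public static void main(String[] args) {
        System.out.println(threadCountKey("STORAGE_CLIENT", true)); // fs.gs.bidi.thread.count
    }
}
```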

@shrutisinghania
Collaborator Author

/gcbrun

@codecov

codecov bot commented Jan 7, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 81.88%. Comparing base (24b6408) to head (dce0ef4).
⚠️ Report is 1 commits behind head on master.

Additional details and impacted files
@@             Coverage Diff              @@
##             master    #1606      +/-   ##
============================================
- Coverage     81.90%   81.88%   -0.02%     
+ Complexity     2425     2424       -1     
============================================
  Files           128      128              
  Lines         10819    10819              
  Branches       1302     1302              
============================================
- Hits           8861     8859       -2     
- Misses         1415     1417       +2     
  Partials        543      543              
Flag Coverage Δ
integrationtest 66.82% <ø> (-0.13%) ⬇️
unittest 72.34% <ø> (ø)

Flags with carried forward coverage won't be shown.

