3237: docs: add Iceberg public API documentation to contributor guide #40
martin-augment wants to merge 5 commits into `main`
Conversation
Add documentation detailing all Comet classes and methods that form the public API used by Apache Iceberg. This helps contributors understand which APIs may affect Iceberg integration and need backward compatibility considerations. The documentation covers:

- org.apache.comet.parquet: FileReader, RowGroupReader, ReadOptions, ParquetColumnSpec, column readers, BatchReader, Native JNI methods
- org.apache.comet: CometSchemaImporter
- org.apache.comet.vector: CometVector
- org.apache.comet.shaded.arrow: RootAllocator, ValueVector

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add a custom Java annotation @IcebergApi to mark all classes, methods, constructors, and fields that form the public API used by Apache Iceberg. This makes it easy to identify which APIs need backward compatibility considerations when making changes. The annotation is applied to:

- org.apache.comet.parquet: FileReader, RowGroupReader, ReadOptions, WrappedInputFile, ParquetColumnSpec, AbstractColumnReader, ColumnReader, BatchReader, MetadataColumnReader, ConstantColumnReader, Native, TypeUtil, Utils
- org.apache.comet: CometSchemaImporter
- org.apache.comet.vector: CometVector

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add annotations to:

- AbstractColumnReader.nativeHandle (protected field accessed by Iceberg subclasses)
- AbstractCometSchemaImporter.close() (called by Iceberg)

Also update documentation to include these APIs.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add a new Maven module containing dedicated unit tests for all @IcebergApi annotated classes, ensuring the public API contract with Apache Iceberg remains stable and tested. Key changes:

- Add iceberg-public-api module with 169 tests covering all @IcebergApi classes
- Fix CometVector constructor visibility (protected -> public) to match API annotation
- Add IcebergApiVerificationTest for reflection-based API verification
- Add tests for FileReader, BatchReader, ColumnReader, Native, TypeUtil, Utils
- Add tests for CometVector, CometSchemaImporter, WrappedInputFile

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Walkthrough

This pull request introduces a public API surface for the Comet-Iceberg integration by creating the `@IcebergApi` annotation, documenting it in the contributor guide, and adding a dedicated test module.
Summary of Changes (Gemini Code Assist)

This pull request significantly enhances the maintainability and clarity of the Comet-Iceberg integration by formally defining and documenting the public API contract. By introducing a dedicated annotation, comprehensive documentation, and an automated test suite, it ensures that critical API elements are explicitly identified, understood, and protected against unintentional breaking changes, thereby improving the stability and future evolution of the integration.
Pull Request Review: Iceberg Public API Documentation

Summary

This PR adds comprehensive documentation and test coverage for the Comet public API used by Apache Iceberg. The changes include a new `iceberg-public-api` test module.

Positive Aspects ✅

1. Excellent Documentation
2. Well-Designed Annotation
3. Comprehensive Test Coverage
4. Proper Module Structure
5. Minimal Code Changes
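The "well-designed annotation" point can be illustrated with a toy sketch of a marker annotation like `@IcebergApi` (all names and targets here are assumptions modeled on this PR, not Comet's actual source). RUNTIME retention is what lets the verification tests find the marker via reflection:

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

class AnnotationSketch {
  @IcebergApi
  public void close() {}

  // True if the named public method carries the marker annotation.
  static boolean isPublicApi(String methodName) {
    try {
      return AnnotationSketch.class.getMethod(methodName).isAnnotationPresent(IcebergApi.class);
    } catch (NoSuchMethodException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println(isPublicApi("close")); // prints "true"
  }
}

// Hypothetical marker annotation: RUNTIME retention makes it visible to reflection.
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.TYPE, ElementType.METHOD, ElementType.CONSTRUCTOR, ElementType.FIELD})
@interface IcebergApi {}
```

A marker with SOURCE or CLASS retention would compile fine but be invisible to the reflection-based tests this PR adds, which is why RUNTIME retention matters for this design.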
Issues and Concerns 🔍

1. Missing Import Ordering Fix
| Aspect | Rating | Notes |
|---|---|---|
| Documentation | ⭐⭐⭐⭐⭐ | Excellent, comprehensive coverage |
| Test Coverage | ⭐⭐⭐⭐ | Good API verification, could use more edge cases |
| Code Quality | ⭐⭐⭐⭐⭐ | Clean, minimal changes |
| Architecture | ⭐⭐⭐⭐⭐ | Well-designed annotation system |
| Security | ⭐⭐⭐ | Needs documentation on encryption handling |
Performance Considerations ✓
No performance concerns - this is purely documentation and testing infrastructure. The annotation overhead at runtime is negligible.
Recommendation
APPROVE with minor fixes
This is an excellent PR that addresses an important need for documenting the Iceberg integration API. The issues identified are minor and mostly related to documentation clarity and test coverage edge cases.
Required Changes:
- Fix the import ordering in `AbstractCometSchemaImporter.java`
Recommended Changes (can be addressed in follow-up PRs):
- Add `@IcebergApi` to protected fields mentioned in documentation
- Clarify shaded vs non-shaded Arrow class usage
- Add error handling and security sections to documentation
- Consider adding thread safety documentation
- Add deprecation/versioning policy
Great work on this comprehensive documentation effort! This will significantly help maintain compatibility with Iceberg going forward. 🎉
🤖 Augment PR Summary

Summary: This PR formalizes and documents the subset of Comet's JVM APIs that Apache Iceberg relies on for its hybrid Parquet reader integration.

Changes:
Technical Notes: The API is designed to be shading-safe (e.g., using `Object`-typed parameters such as `WrappedInputFile(Object)` rather than depending on Iceberg classes directly).
```java
// Methods used by Iceberg
public void setRequestedSchemaFromSpecs(List<ParquetColumnSpec> specs)
public RowGroupReader readNextRowGroup() throws IOException
public void skipNextRowGroup()
```
The guide documents skipNextRowGroup() as returning void, but the actual @IcebergApi method returns boolean (and BatchReader.nextBatch(int) also returns boolean). Keeping these signatures consistent in the guide will prevent confusion when validating the Iceberg-facing contract.
Other Locations
docs/source/contributor-guide/iceberg_public_api.md:162
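One way to keep documented signatures from drifting is a reflection check of return types. A self-contained sketch follows; `FileReaderStandIn` is a hypothetical stand-in, since the real check would target `org.apache.comet.parquet.FileReader`:

```java
import java.lang.reflect.Method;

class SignatureCheck {
  // True if the named zero-arg method exists on the class and returns boolean.
  static boolean returnsBoolean(Class<?> cls, String name) {
    try {
      Method m = cls.getMethod(name);
      return m.getReturnType() == boolean.class;
    } catch (NoSuchMethodException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    // Would catch doc drift such as "void skipNextRowGroup()" vs the actual boolean return.
    System.out.println(returnsBoolean(FileReaderStandIn.class, "skipNextRowGroup")); // prints "true"
  }
}

// Stand-in for the real FileReader, used only for illustration.
class FileReaderStandIn {
  public boolean skipNextRowGroup() { return true; }
}
```

A check like this could live next to the existing reflection-based API tests so the guide and the annotated code cannot silently disagree.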
```java
// Constructor
public WrappedInputFile(org.apache.iceberg.io.InputFile inputFile)
```
The WrappedInputFile constructor is documented as taking org.apache.iceberg.io.InputFile, but the implemented API is WrappedInputFile(Object) to avoid shading/type-coupling. Consider aligning the rest of the signatures in this guide with the shading-safe API (e.g., Utils.getColumnReader using ParquetColumnSpec).
Other Locations
docs/source/contributor-guide/iceberg_public_api.md:228
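The shading-safe `WrappedInputFile(Object)` pattern mentioned above can be sketched as follows. This is a hypothetical illustration, not Comet's source: accept `Object` and look up methods reflectively so there is no compile-time (or relocation-sensitive) dependency on `org.apache.iceberg` classes.

```java
import java.lang.reflect.Method;

// Hypothetical sketch of the shading-safe wrapper pattern.
class WrappedInputFileSketch {
  private final Object wrapped;

  WrappedInputFileSketch(Object inputFile) {
    this.wrapped = inputFile;
  }

  long getLength() {
    try {
      Method m = wrapped.getClass().getDeclaredMethod("getLength");
      m.setAccessible(true);
      return (long) m.invoke(wrapped);
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("wrapped object has no getLength()", e);
    }
  }

  public static void main(String[] args) {
    System.out.println(new WrappedInputFileSketch(new FakeInputFile()).getLength()); // prints "42"
  }
}

// Stand-in for an Iceberg InputFile implementation.
class FakeInputFile {
  public long getLength() { return 42L; }
}
```

The trade-off is that signature mismatches surface at runtime instead of compile time, which is exactly why the PR's reflection-based API tests are valuable.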
```java
@After
public void tearDown() throws IOException {
  if (tempDir != null && Files.exists(tempDir)) {
    Files.walk(tempDir).sorted(Comparator.reverseOrder()).map(Path::toFile).forEach(File::delete);
  }
}
```
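For reference, the cleanup pattern quoted above can be made failure-aware. A runnable sketch (an assumption; the PR's test uses the silent `File::delete` form):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

class CleanupDemo {
  // Walks the tree and deletes deepest entries first so directories are empty
  // when their turn comes; individual failures are reported, not silently dropped.
  static boolean deleteRecursively(Path root) {
    try (Stream<Path> paths = Files.walk(root)) {
      paths.sorted(Comparator.reverseOrder())
          .forEach(p -> {
            try {
              Files.deleteIfExists(p);
            } catch (IOException e) {
              System.err.println("could not delete " + p + ": " + e);
            }
          });
    } catch (IOException e) {
      return false;
    }
    return !Files.exists(root);
  }

  // Creates a small temp tree and cleans it up; returns true on full deletion.
  static boolean demo() {
    try {
      Path tempDir = Files.createTempDirectory("cleanup-demo");
      Files.createFile(tempDir.resolve("a.txt"));
      return deleteRecursively(tempDir);
    } catch (IOException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println(demo()); // prints "true"
  }
}
```

The reverse-sorted walk is the key detail: children are deleted before their parent directory.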
```java
    return Modifier.isPublic(modifiers) || Modifier.isProtected(modifiers);
  }
```

```java
/** Checks if native library is available. */
```
```java
 * @deprecated since 0.10.0, will be removed in 0.11.0.
 * @see <a href="https://github.com/apache/datafusion-comet/issues/2079">Comet Issue #2079</a>
 */
@IcebergApi
```
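The hunk above pairs a Javadoc `@deprecated` tag with `@IcebergApi`. A toy illustration of why adding the `@Deprecated` annotation as well is useful (class and method names here are made up): the Javadoc tag only informs readers, while the annotation has RUNTIME retention and is therefore visible to both the compiler and reflection-based API tests.

```java
class DeprecationDemo {
  // True if the named public method of LegacyApi carries @Deprecated.
  static boolean isDeprecated(String name) {
    try {
      return LegacyApi.class.getMethod(name).isAnnotationPresent(Deprecated.class);
    } catch (NoSuchMethodException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println(isDeprecated("convertLegacy")); // prints "true"
    System.out.println(isDeprecated("convert")); // prints "false"
  }
}

// Illustrative class: the Javadoc tag documents the deprecation, the annotation
// enforces it (compiler warnings for callers, reflection visibility for tests).
class LegacyApi {
  /** @deprecated since 0.10.0, use {@link #convert()} instead. */
  @Deprecated
  public static int convertLegacy() { return convert(); }

  public static int convert() { return 7; }
}
```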
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
common/src/main/java/org/apache/comet/parquet/RowGroupReader.java (1)
41-45: Add `@IcebergApi` annotation to the public constructor.

The public constructor `RowGroupReader(long rowCount, long rowIndexOffset)` should be annotated with `@IcebergApi` to align with similar public constructors in the codebase (e.g., `CometSchemaImporter`, `WrappedInputFile`). Since this class is part of the Iceberg API surface and `FileReader.readNextRowGroup()` (an `@IcebergApi` method) directly instantiates this constructor, the constructor must be explicitly marked as part of the public API.

common/src/main/java/org/apache/comet/parquet/ParquetColumnSpec.java (1)

89-91: Annotate `isRepeated()` if it's part of the Iceberg-facing API.

It's a public getter but is the only one without `@IcebergApi`, which could unintentionally exclude it from the documented API surface.

✅ Suggested fix

```diff
-  public boolean isRepeated() {
+  @IcebergApi
+  public boolean isRepeated() {
     return isRepeated;
   }
```
🤖 Fix all issues with AI agents
In `@docs/source/contributor-guide/iceberg_public_api.md`:
- Around line 308-321: Update the docs to use the actual public API names:
replace the nonexistent CometTypeUtils with TypeUtil when describing conversion
of ColumnDescriptor to ParquetColumnSpec, and change the incorrect
delegate().currentBatch() reference to call currentBatch() directly on the
appropriate reader type (e.g., ColumnReader, BatchReader or
AbstractColumnReader) so the flow reads that you retrieve results via
currentBatch() on the column/batch reader instead of via delegate().
- Around line 250-259: The doc currently lists AbstractCometSchemaImporter under
package org.apache.arrow.c but at runtime the shaded artifact relocates it to
org.apache.comet.shaded.arrow.c; update the text around
AbstractCometSchemaImporter to clarify both names (source vs runtime) or replace
the package with the runtime package org.apache.comet.shaded.arrow.c so API
consumers can import AbstractCometSchemaImporter correctly, and mention that JNI
classes remain under org.apache.arrow.* as an exception.
In
`@iceberg-public-api/src/test/java/org/apache/comet/iceberg/api/parquet/BatchReaderApiTest.java`:
- Around line 85-89: The test BatchReaderApiTest.testCloseMethodExists only
asserts the existence of BatchReader.close() but does not verify the `@IcebergApi`
annotation; update this test to mirror
AbstractColumnReaderApiTest.testCloseMethodExists by retrieving the close method
via BatchReader.class.getMethod("close") and asserting that
method.isAnnotationPresent(IcebergApi.class) (or equivalent assertion using the
test framework) so the test both checks the method exists and has the
`@IcebergApi` annotation.
🧹 Nitpick comments (11)
common/src/main/java/org/apache/comet/parquet/ReadOptions.java (1)
151-175: Consider annotating the Builder constructor for API consistency.

The `Builder(Configuration conf)` constructor is `public` but lacks the `@IcebergApi` annotation. While users are expected to use `ReadOptions.builder(conf)`, direct constructor access is possible. For consistency with other annotated constructors in this PR (e.g., `CometSchemaImporter`, `WrappedInputFile`), consider either:

- Adding `@IcebergApi` to the constructor, or
- Reducing the constructor visibility to package-private if it's only meant to be accessed via the factory method
common/src/main/java/org/apache/comet/parquet/TypeUtil.java (1)
39-46: Deprecated method annotated as Iceberg API - consider adding `@Deprecated` annotation.

The method is documented as deprecated via Javadoc (`@deprecated since 0.10.0, will be removed in 0.11.0`), but the `@Deprecated` Java annotation is missing. Adding both ensures:
- Compiler warnings for callers
- IDE visibility of deprecation status
♻️ Suggested change
```diff
  * @deprecated since 0.10.0, will be removed in 0.11.0.
  * @see <a href="https://github.com/apache/datafusion-comet/issues/2079">Comet Issue #2079</a>
  */
+ @Deprecated
  @IcebergApi
  public static ColumnDescriptor convertToParquet(StructField field) {
```

common/src/main/java/org/apache/comet/parquet/WrappedInputFile.java (1)
44-65: Minor: Trailing comment artifacts and reflection accessibility note.
Lines 47 and 58 have trailing `//` comments that appear to be artifacts and should be removed.

The `setAccessible(true)` calls may trigger warnings or failures under Java's module system (Java 9+) when the target class is in a non-open module. If Iceberg's InputFile implementations are in packages not opened to Comet, this could fail at runtime. This is likely acceptable given the specific integration context, but worth documenting.

♻️ Remove trailing comment artifacts

```diff
- Method targetMethod = wrapped.getClass().getDeclaredMethod("getLength"); //
+ Method targetMethod = wrapped.getClass().getDeclaredMethod("getLength");
```

```diff
- Method targetMethod = wrapped.getClass().getDeclaredMethod("newStream"); //
+ Method targetMethod = wrapped.getClass().getDeclaredMethod("newStream");
```

iceberg-public-api/src/test/java/org/apache/comet/iceberg/api/parquet/UtilsApiTest.java (1)
242-274: Consider adding assertion for `isRepeated` field in round-trip test.

The round-trip conversion test validates most fields but doesn't explicitly assert that `isRepeated` is preserved. While the original spec sets `isRepeated` to `false` (Line 255), the conversion logic in `Utils.descriptorToParquetColumnSpec` derives this from the repetition type, which should match.

♻️ Suggested enhancement

```diff
   assertThat(roundTrippedSpec.getLogicalTypeName()).isEqualTo(originalSpec.getLogicalTypeName());
+  assertThat(roundTrippedSpec.isRepeated()).isEqualTo(originalSpec.isRepeated());
 }
```

iceberg-public-api/src/test/java/org/apache/comet/iceberg/api/AbstractApiTest.java (1)
37-54: Consider handling deletion failures in tearDown.

The current cleanup uses `File::delete`, which returns `false` on failure but doesn't throw, so failures are silent. For test cleanup this is usually acceptable, but you may want to log or handle failures for debugging purposes.

♻️ Optional: Add failure handling

```diff
 @After
 public void tearDown() throws IOException {
   if (tempDir != null && Files.exists(tempDir)) {
-    Files.walk(tempDir).sorted(Comparator.reverseOrder()).map(Path::toFile).forEach(File::delete);
+    Files.walk(tempDir)
+        .sorted(Comparator.reverseOrder())
+        .forEach(
+            path -> {
+              try {
+                Files.deleteIfExists(path);
+              } catch (IOException e) {
+                // Log or ignore cleanup failures in tests
+              }
+            });
   }
 }
```

iceberg-public-api/src/test/java/org/apache/comet/iceberg/api/IcebergApiVerificationTest.java (1)
266-338: Consider adding class-level annotation checks for consistency.

The tests for `Utils`, `TypeUtil`, and `Native` classes only verify method existence but don't check for class-level `@IcebergApi` annotation presence, unlike other individual class tests (e.g., `FileReader`, `BatchReader`). This inconsistency may be intentional if these are utility classes without class-level annotations, but adding a comment would clarify the intent.

♻️ Suggested: Add consistency or clarifying comments

```diff
 @Test
 public void testUtilsHasExpectedPublicApi() {
   Class<?> clazz = Utils.class;
+  // Utils is a utility class with only static methods annotated, not the class itself
   // Check for expected static methods
   assertStaticMethodExists(
```

Or, if the class should be annotated:

```diff
 @Test
 public void testUtilsHasExpectedPublicApi() {
   Class<?> clazz = Utils.class;
+  assertThat(isPublic(clazz)).isTrue();
   // Check for expected static methods
```

iceberg-public-api/src/test/java/org/apache/comet/iceberg/api/vector/CometVectorApiTest.java (1)
111-178: Inconsistent `@IcebergApi` annotation verification for inherited methods.

Several method existence tests (e.g., `testCloseMethodExists`, `testIsNullAtMethodExists`, and the getter methods) don't verify the `@IcebergApi` annotation, unlike the tests for `setNumNulls`, `setNumValues`, `numValues`, `getValueVector`, and `slice`. If these methods are intentionally excluded from the Iceberg public API surface (e.g., inherited from `ColumnVector`), this is fine. Otherwise, consider adding annotation verification for consistency.

iceberg-public-api/src/test/java/org/apache/comet/iceberg/api/parquet/WrappedInputFileApiTest.java (1)
127-131: Potential integer overflow in test mock (minor).

Casting `length` from `long` to `int` at line 129 could overflow for values exceeding `Integer.MAX_VALUE`. Since test inputs are controlled small values, this is not an issue in practice, but consider using `Math.toIntExact(length)` to fail fast if someone adds a test with a large length value.

♻️ Suggested improvement

```diff
 public InputStream newStream() {
   // Return a simple ByteArrayInputStream for testing
-  byte[] data = new byte[(int) length];
+  byte[] data = new byte[Math.toIntExact(length)];
   return new ByteArrayInputStream(data);
 }
```

iceberg-public-api/src/test/java/org/apache/comet/iceberg/api/parquet/RowGroupReaderApiTest.java (1)
128-154: Deduplicate MockInputFile across API tests.

This helper duplicates the one in `FileReaderApiTest`; consider extracting a shared test utility to reduce maintenance drift.

iceberg-public-api/src/test/java/org/apache/comet/iceberg/api/parquet/ColumnReaderApiTest.java (1)
75-92: Consider asserting `@IcebergApi` on readBatch/currentBatch/close for consistency.

Other API tests verify annotations on public methods; adding the checks here tightens the contract.

♻️ Proposed update

```diff
 @Test
 public void testReadBatchMethodExists() throws NoSuchMethodException {
   Method method = ColumnReader.class.getMethod("readBatch", int.class);
   assertThat(method).isNotNull();
+  assertThat(hasIcebergApiAnnotation(method)).isTrue();
 }

 @Test
 public void testCurrentBatchMethodExists() throws NoSuchMethodException {
   Method method = ColumnReader.class.getMethod("currentBatch");
   assertThat(method).isNotNull();
+  assertThat(hasIcebergApiAnnotation(method)).isTrue();
   assertThat(method.getReturnType().getSimpleName()).isEqualTo("CometVector");
 }

 @Test
 public void testCloseMethodExists() throws NoSuchMethodException {
   Method method = ColumnReader.class.getMethod("close");
   assertThat(method).isNotNull();
+  assertThat(hasIcebergApiAnnotation(method)).isTrue();
 }
```

iceberg-public-api/src/test/java/org/apache/comet/iceberg/api/parquet/AbstractColumnReaderApiTest.java (1)
57-59: Optional: redundant null assertions.

The `getMethod()` and `getDeclaredField()` reflection methods throw `NoSuchMethodException`/`NoSuchFieldException` if the element doesn't exist; they never return `null`. The `assertThat(...).isNotNull()` calls are technically redundant since the test would already fail with an exception.

This is a minor nit and can be kept for readability if preferred.
Also applies to: 64-66, 71-72, 78-79, 86-88, 94-95
## Package: `org.apache.arrow.c`

### AbstractCometSchemaImporter

Base class for `CometSchemaImporter`.

```java
// Methods used by Iceberg
public void close()
```
🧩 Analysis chain
🏁 Script executed:

```shell
rg -n "class AbstractCometSchemaImporter" -g '*.java'
```

Repository: martin-augment/datafusion-comet
Length of output: 206

🏁 Script executed:

```shell
rg -n "package.*arrow" -g '*.java' -A 2 | head -50
```

Repository: martin-augment/datafusion-comet
Length of output: 645

🏁 Script executed:

```shell
cat pom.xml | grep -A 20 "shade\|arrow" | head -60
```

Repository: martin-augment/datafusion-comet
Length of output: 3252

🏁 Script executed:

```shell
rg -n "relocation|shadedPattern|includes|excludes" pom.xml -B 2 -A 2 | grep -A 5 "arrow"
```

Repository: martin-augment/datafusion-comet
Length of output: 57

🏁 Script executed:

```shell
fd "pom.xml" -x cat {} \; | grep -B 5 -A 15 "<relocations>"
```

Repository: martin-augment/datafusion-comet
Length of output: 2282
Clarify the package name for AbstractCometSchemaImporter based on context of use.
The source code resides in org.apache.arrow.c, but the shade plugin relocates all org.apache.arrow.* classes to org.apache.comet.shaded.arrow.* (except JNI classes). Since AbstractCometSchemaImporter is not a JNI class, users importing from the shaded JAR will need to reference org.apache.comet.shaded.arrow.c. Update the documentation to specify which package name applies (source vs. runtime) or use the runtime package name if this guide is for API consumers.
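To illustrate the relocation point, a tiny runtime check of a class's package can confirm which (possibly shaded) name a consumer actually sees. A JDK class is used as a neutral stand-in here, since the shaded Comet jar is not on the classpath in this sketch:

```java
class PackageCheck {
  // Reports the runtime package of a class. Under a shade plugin's relocation,
  // an Arrow class would report org.apache.comet.shaded.arrow.* instead of
  // org.apache.arrow.*.
  static String packageOf(Class<?> cls) {
    return cls.getPackageName();
  }

  public static void main(String[] args) {
    System.out.println(packageOf(java.util.ArrayList.class)); // prints "java.util"
  }
}
```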
1. Iceberg creates a `WrappedInputFile` from its `InputFile`
2. Creates `ReadOptions` via builder pattern
3. Instantiates `FileReader` with the wrapped input file
4. Converts Parquet `ColumnDescriptor`s to `ParquetColumnSpec`s using `CometTypeUtils`
5. Calls `setRequestedSchemaFromSpecs()` to specify which columns to read
6. Iterates through row groups via `readNextRowGroup()` and `skipNextRowGroup()`

### Column Reading Flow

1. Creates `CometSchemaImporter` with a `RootAllocator`
2. Uses `Utils.getColumnReader()` to create appropriate column readers
3. Calls `reset()` and `setPageReader()` for each row group
4. Uses `BatchReader` to coordinate reading batches across all columns
5. Retrieves results via `delegate().currentBatch()`
🧩 Analysis chain
🏁 Script executed:

```shell
#!/bin/bash
# Check whether these identifiers exist in this repo.
rg -n "CometTypeUtils|delegate\(\)\.currentBatch" -g '*.java'
```

Repository: martin-augment/datafusion-comet
Length of output: 57

🏁 Script executed:

```shell
#!/bin/bash
# Search for similar utility class names and column reader methods
rg -n "class.*TypeUtil|class.*Utils" -g '*.java' | head -20
rg -n "currentBatch\(\)" -g '*.java' | head -20
rg -n "ColumnReader" -g '*.java' | head -10
```

Repository: martin-augment/datafusion-comet
Length of output: 4195

🏁 Script executed:

```shell
#!/bin/bash
# Read the documentation file to see the full context
cat -n docs/source/contributor-guide/iceberg_public_api.md | sed -n '300,330p'
```

Repository: martin-augment/datafusion-comet
Length of output: 1483

🏁 Script executed:

```shell
#!/bin/bash
# Check TypeUtil methods
cat -n common/src/main/java/org/apache/comet/parquet/TypeUtil.java | head -60
echo "---"
# Check TypeUtilApiTest to see correct usage
cat -n iceberg-public-api/src/test/java/org/apache/comet/iceberg/api/parquet/TypeUtilApiTest.java
```

Repository: martin-augment/datafusion-comet
Length of output: 12938
Correct API symbol names in documentation flow.
Line 311 references CometTypeUtils, which doesn't exist in the codebase. The correct class is TypeUtil. Additionally, line 321 references delegate().currentBatch(), which also doesn't exist. The correct method is currentBatch() called directly on the appropriate column reader class (e.g., ColumnReader, BatchReader, or AbstractColumnReader). Update these references so contributors can accurately map the documented flows to the actual public API.
```java
@Test
public void testCloseMethodExists() throws NoSuchMethodException {
  Method method = BatchReader.class.getMethod("close");
  assertThat(method).isNotNull();
}
```
Missing @IcebergApi annotation verification for close() method.
In AbstractColumnReaderApiTest.testCloseMethodExists(), the test verifies that close() has the @IcebergApi annotation. For consistency, this test should also verify the annotation on BatchReader.close().
Proposed fix

```diff
 @Test
 public void testCloseMethodExists() throws NoSuchMethodException {
   Method method = BatchReader.class.getMethod("close");
   assertThat(method).isNotNull();
+  assertThat(hasIcebergApiAnnotation(method)).isTrue();
 }
```
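Annotation-coverage checks like the one proposed above can be generalized into an audit that lists every public method missing the marker. The following is a self-contained sketch using a local toy annotation and stand-in class (not Comet's actual `@IcebergApi` or `CometVector`):

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

class CoverageAudit {
  // Names of declared public methods that are missing the marker annotation.
  static List<String> unannotatedPublicMethods(Class<?> cls) {
    List<String> missing = new ArrayList<>();
    for (Method m : cls.getDeclaredMethods()) {
      if (Modifier.isPublic(m.getModifiers()) && !m.isAnnotationPresent(IcebergApi.class)) {
        missing.add(m.getName());
      }
    }
    return missing;
  }

  public static void main(String[] args) {
    System.out.println(unannotatedPublicMethods(VectorStandIn.class)); // prints "[close]"
  }
}

// Local stand-ins: a toy marker annotation plus a class with one annotated
// and one intentionally unannotated public method.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface IcebergApi {}

class VectorStandIn {
  @IcebergApi
  public int numValues() { return 0; }

  public void close() {} // intentionally unannotated
}
```

An audit test asserting that this list is empty (or matches a known allowlist) would make the per-method consistency nitpicks above mechanical instead of manual.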
Code Review
This pull request introduces a new @IcebergApi annotation to formally define the public API used for Apache Iceberg integration. The changes are systematically applied across the codebase, and a new documentation file (iceberg_public_api.md) is added to describe this API. Crucially, a new test module (iceberg-public-api) is included, which uses reflection to verify the API contract, preventing accidental breaking changes in the future. The changes are well-structured and significantly improve the maintainability and clarity of the Iceberg integration. My review identified a few minor inaccuracies in the new documentation, for which I've provided suggestions.
```diff
@@ -0,0 +1,342 @@
+<!--
```
This is a great addition to the documentation! I have a few suggestions to improve its accuracy and clarity:
- `Utils.getColumnReader` signature: The signature for `Utils.getColumnReader` on line 226 seems incorrect. The actual annotated method in `Utils.java` is `getColumnReader(DataType, ParquetColumnSpec, ...)`. The documented one uses `ColumnDescriptor` and has a different set of boolean flags. Could you please update the documentation to reflect the correct signature of the public API method?
- `MetadataColumnReader.nativeHandle`: On line 183, `protected long nativeHandle` is listed under `MetadataColumnReader`. This field is inherited from `AbstractColumnReader` and not declared in `MetadataColumnReader` itself. To avoid confusion, it might be better to remove it from the `MetadataColumnReader` section.
- `CometTypeUtils` typo: On line 311, `CometTypeUtils` is mentioned. This seems to be a typo and should probably be `Utils`, as `Utils.descriptorToParquetColumnSpec` is the relevant method in Comet's API.
- Ambiguous `reset()` call: On line 319, the documentation mentions `Calls reset() ... for each row group`. The method `reset()` is not on any of the reader classes. It seems to refer to `Native.resetBatch()`, which is called internally. It would be clearer if this step was rephrased, for example: "Reading a new batch will reset the internal state."