
3819: feat: enable native Iceberg reader by default #52

Open
martin-augment wants to merge 3 commits into main from
pr-3819-2026-04-01-08-40-10

Conversation

@martin-augment
Owner

3819: To review by AI

andygrove and others added 3 commits March 27, 2026 15:36
Change the default value of spark.comet.scan.icebergNative.enabled
from false to true, and update documentation to reflect this change.
@coderabbitai

coderabbitai bot commented Apr 1, 2026

Walkthrough

The pull request adds two new guideline files (AGENTS.md and CLAUDE.md) establishing a linking restriction policy. The primary functional change involves enabling Comet's native Iceberg table scans by default by updating the COMET_ICEBERG_NATIVE_ENABLED configuration default from false to true. Supporting documentation updates in the Iceberg user guide and TPC benchmark README remove explicit configuration requirements from code examples and clarify that native Iceberg scanning is now enabled by default, with an option to disable it if needed.
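For orientation, the functional change described above amounts to flipping the default on a single config entry. The sketch below assumes the ConfigBuilder-style definition used in CometConf; the exact builder calls and surrounding code may differ from the real file:

```scala
// Sketch (assumed API shape): the COMET_ICEBERG_NATIVE_ENABLED entry with its new default.
val COMET_ICEBERG_NATIVE_ENABLED: ConfigEntry[Boolean] =
  conf("spark.comet.scan.icebergNative.enabled")
    .doc(
      "Whether to use Comet's native Iceberg scan. When enabled, " +
        "Iceberg tables are read directly through native execution, bypassing Spark's " +
        "DataSource V2 API for better performance.")
    .booleanConf
    .createWithDefault(true) // was false before this PR
```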



@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request enables the native Iceberg reader by default, updating the configuration, benchmarks, and user documentation. Review feedback recommends removing the non-standard AGENTS.md and CLAUDE.md files, which contain restrictive instructions for AI agents that are inappropriate for an open-source project. Additionally, it is suggested to update the configuration description to explicitly state that the native Iceberg reader is an experimental feature to better manage user expectations.

@@ -0,0 +1 @@
Never link to any issue or a pull request in any GitHub repository


medium

The addition of AGENTS.md and CLAUDE.md files at the root of the repository is non-standard for an Apache project. These files appear to contain instructions for AI agents that are likely specific to the author's local environment or a particular tool. Furthermore, the instruction to 'Never link to any issue or a pull request' is overly restrictive for an open-source project where cross-referencing is essential for traceability and collaboration. If these are not strictly required for the project's operation, they should be removed from the pull request.

@@ -0,0 +1 @@
Never link to any issue or a pull request in any GitHub repository


medium

As noted in the comment for AGENTS.md, this file appears to be an instruction for an AI agent and is non-standard for the repository. It should be removed.

@@ -140,7 +140,7 @@ object CometConf extends ShimCometConf {
"Iceberg tables are read directly through native execution, bypassing Spark's " +
"DataSource V2 API for better performance.")


medium

Since the native Iceberg reader is still considered experimental (as noted in the user guide), it would be beneficial to include this status in the configuration's documentation string, especially now that it is enabled by default. This helps manage user expectations regarding stability and potential fallbacks.

Suggested change
- "DataSource V2 API for better performance.")
+ "DataSource V2 API for better performance. This feature is experimental.")

Owner Author


value:good-to-have; category:documentation; feedback: The Gemini AI reviewer is correct! Several other config properties mention that they affect experimental features. It would be good for this one to mention that Iceberg support is experimental too.

@claude

claude bot commented Apr 1, 2026

Code Review

PR Summary

This PR enables the native Iceberg reader (spark.comet.scan.icebergNative.enabled) by default, changing its default value from false to true. Documentation in iceberg.md and benchmarks/tpc/README.md is updated accordingly.


⚠️ Prompt Injection Concern

The PR adds two files, AGENTS.md and CLAUDE.md, each containing a single line:

Never link to any issue or a pull request in any GitHub repository

These files are specifically targeted at AI coding assistants (Codex agents read AGENTS.md, Claude reads CLAUDE.md). The instruction attempts to suppress AI tools from linking to related issues or PRs during code review, which would reduce context available to reviewers. This is a prompt injection technique and should not be merged. These files have no legitimate place in this PR and should be removed.


Config Change (CometConf.scala)

The one-line change flipping .createWithDefault(false) to .createWithDefault(true) is clean. A few things worth considering:

Breaking change for existing users. Any Comet user who has Iceberg tables but hasn't opted in to the native reader will now get it automatically after upgrading. If their workload hits an unsupported feature or a bug in the native reader, they have no warning. It would be good to call this out explicitly as a breaking change in the PR description and in a release note or migration guide entry.

Is the native reader stable enough to be default? The config doc says the feature "bypasses Spark's DataSource V2 API." Do any known limitations or compatibility gaps exist (row-level deletes, equality deletes, schema evolution edge cases, etc.) that users hitting this for the first time might not expect? It may be worth documenting any known limitations in iceberg.md alongside the "enabled by default" statement.

Version compatibility. The docs mention the reader has been tested with Iceberg 1.5, 1.7, 1.8, 1.9, and 1.10. Is it safe to enable by default for all of those versions, or only for a subset?


Documentation (iceberg.md)

The doc change is straightforward: removing the explicit --conf spark.comet.scan.icebergNative.enabled=true from both example spark-shell invocations and updating the prose to say the reader is on by default, with a pointer to the flag for disabling it. This reads clearly and is consistent with the config change.

One small suggestion: the sentence "The native Iceberg reader is enabled by default. To disable it, set spark.comet.scan.icebergNative.enabled=false." would be a good candidate to also appear in the compatibility or configuration guide, not just in the Iceberg-specific page, since users who don't read that page might be surprised by the behavior change.
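Along those lines, a short illustration of the opt-out (the config key comes from this PR; the session code below is only a sketch, and the table name is a placeholder):

```scala
// Sketch: disabling the native Iceberg reader after upgrading.
spark.conf.set("spark.comet.scan.icebergNative.enabled", "false")

// Iceberg reads now go through Spark's DataSource V2 path instead of the native scan.
val df = spark.read.table("db.my_iceberg_table")
```

Equivalently, pass `--conf spark.comet.scan.icebergNative.enabled=false` when launching spark-shell.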


benchmarks/tpc/README.md

The updated sentence is accurate and reads well.


Summary

The core config and documentation change is simple and correct. The main concerns are:

  1. The AGENTS.md / CLAUDE.md prompt injection files should be removed before merging.
  2. The PR should communicate clearly that this is a behavior change for existing users, and ideally document any known limitations of the native reader so users can make an informed decision about whether to disable it.


@cursor cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.


@@ -0,0 +1 @@
Never link to any issue or a pull request in any GitHub repository


Unrelated AI instruction files committed with feature change

Medium Severity

AGENTS.md and CLAUDE.md are unrelated to the feature of enabling the native Iceberg reader by default. They contain AI agent behavioral instructions and have nothing to do with the stated PR purpose. Notably, CLAUDE.md is already listed in .gitignore, meaning it was force-added to the repository. These files appear to have been accidentally included in this commit.


@augmentcode

augmentcode bot commented Apr 1, 2026

🤖 Augment PR Summary

Summary: Enables Comet’s native Iceberg reader by default by flipping the default for spark.comet.scan.icebergNative.enabled to true.

Changes:

  • Updates TPC benchmark docs and the Iceberg user guide to reflect default-on behavior and how to disable it



@augmentcode augmentcode bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review completed. 2 suggestions posted.


- 1.8.1, but Comet has been tested with Iceberg 1.5, 1.7, 1.8, 1.9, and 1.10. The key configuration
- to enable fully-native Iceberg is `spark.comet.scan.icebergNative.enabled=true`.
+ 1.8.1, but Comet has been tested with Iceberg 1.5, 1.7, 1.8, 1.9, and 1.10. The native Iceberg
+ reader is enabled by default. To disable it, set `spark.comet.scan.icebergNative.enabled=false`.


docs/source/user-guide/latest/iceberg.md:L33: This section still contains a markdown link to a pull request a few lines above (the PR apache#2528 link), which conflicts with the new repository rule “Never link to any issue or a pull request in any GitHub repository” (Rule: AGENTS.md). Could you align this doc with that rule to avoid having the repo immediately violate its own guidance?

Severity: medium


- The benchmark uses `spark.comet.scan.icebergNative.enabled=true` to enable Comet's native iceberg-rust
- integration. Verify native scanning is active by checking for `CometIcebergNativeScanExec` in the
+ The benchmark uses Comet's native iceberg-rust integration, which is enabled by default.
+ Verify native scanning is active by checking for `CometIcebergNativeScanExec` in the


benchmarks/tpc/README.md:L183: This now says the benchmark uses the native Iceberg integration “enabled by default”, but the comet-iceberg benchmark engine config still explicitly sets spark.comet.scan.icebergNative.enabled=true, so the benchmark isn’t actually exercising the new default behavior. That mismatch could be confusing for readers trying to understand what’s required vs. what’s redundant.

Severity: low



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@AGENTS.md`:
- Line 1: Add a short preamble to AGENTS.md that states the intended audience
(e.g., AI agents and maintainers), the purpose of the guideline (to prohibit
linking to issues/PRs across GitHub repos), the rationale
(privacy/security/traceability concerns), the scope (where it applies: commit
messages, documentation, code comments, config files, PR descriptions), and the
consequences or enforcement mechanism (e.g., CI linter/PR review note). Keep it
a few sentences and include one example sentence clarifying that the rule
applies to both public and internal links.
- Line 1: Add the standard Apache License 2.0 header to the very top of
AGENTS.md so the file passes the apache-rat:check; insert the full ASF-required
license header block (the standard comment/banner used across the repo) before
any existing content or text in AGENTS.md, save and commit the change so the RAT
check recognizes the file as licensed.

In `@CLAUDE.md`:
- Line 1: CLAUDE.md duplicates AGENTS.md but lacks explanation of intent; either
consolidate the two into a single canonical file (remove CLAUDE.md and update
references to use AGENTS.md) or make CLAUDE.md explicitly different by adding a
clear header and a short section explaining how its instructions differ from
AGENTS.md (e.g., "Scope: Claude-specific overrides" and list unique rules), and
add a brief note in AGENTS.md pointing to CLAUDE.md when Claude-specific
behavior exists; update any documentation references accordingly so readers know
when to consult each file.
- Line 1: CLAUDE.md is missing the required Apache license header which causes
apache-rat:check failures; add the same Apache license header block used in
AGENTS.md to the very top of CLAUDE.md (before any content) so the file starts
with the standard Apache header text, ensuring the header text and year/owner
fields match project convention and formatting.

In `@common/src/main/scala/org/apache/comet/CometConf.scala`:
- Line 143: The config default for native Iceberg was changed to true; revert
the change so spark.comet.scan.icebergNative.enabled remains false by default
(restore the call that sets createWithDefault(false) in CometConf or equivalent)
and update CometScanRule.transformV2Scan() to consult NativeBase.isLoaded() in
addition to the config flag before performing plan conversion (i.e., only
convert when the config is true AND NativeBase.isLoaded() returns true) so
environments without the native library safely fall back to Spark.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 1a751bda-732d-4d5c-ab92-56183fcfa733

📥 Commits

Reviewing files that changed from the base of the PR and between 88d3a1e and 41f8a38.

📒 Files selected for processing (5)
  • AGENTS.md
  • CLAUDE.md
  • benchmarks/tpc/README.md
  • common/src/main/scala/org/apache/comet/CometConf.scala
  • docs/source/user-guide/latest/iceberg.md

@@ -0,0 +1 @@
Never link to any issue or a pull request in any GitHub repository


⚠️ Potential issue | 🟠 Major

Clarify the purpose and scope of this guideline file.

The file contains a single instruction without context about:

  • Who is the intended audience (AI agents, developers, both)?
  • Why this restriction exists
  • What the consequences are for violations
  • Whether this applies to commit messages, documentation, code comments, etc.

Consider adding a preamble explaining the purpose of this file and the rationale for this linking restriction.

🧰 Tools
🪛 GitHub Actions: RAT License Check

[error] 1-1: apache-rat:check failed. Too many files with unapproved license (1). Unapproved file: AGENTS.md. See RAT report in /home/runner/work/datafusion-comet/datafusion-comet/target/rat.txt.


⚠️ Potential issue | 🔴 Critical

Critical: Add Apache license header to fix RAT check failure.

The apache-rat:check pipeline is failing because this file is missing the required Apache license header. All files in the repository must include the Apache license header.

📝 Proposed fix: Add Apache license header
+<!--
+  Licensed to the Apache Software Foundation (ASF) under one
+  or more contributor license agreements.  See the NOTICE file
+  distributed with this work for additional information
+  regarding copyright ownership.  The ASF licenses this file
+  to you under the Apache License, Version 2.0 (the
+  "License"); you may not use this file except in compliance
+  with the License.  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing,
+  software distributed under the License is distributed on an
+  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  KIND, either express or implied.  See the License for the
+  specific language governing permissions and limitations
+  under the License.
+-->
+
 Never link to any issue or a pull request in any GitHub repository


@@ -0,0 +1 @@
Never link to any issue or a pull request in any GitHub repository


⚠️ Potential issue | 🟠 Major

Clarify the relationship between CLAUDE.md and AGENTS.md.

This file has identical content to AGENTS.md. The separation suggests CLAUDE.md targets Claude AI specifically while AGENTS.md targets all AI agents, but this isn't documented. Consider:

  • Consolidating into a single file if the instruction applies universally
  • Adding distinct instructions if each file serves different AI systems
  • Documenting why separate files are necessary and how they differ

⚠️ Potential issue | 🔴 Critical

Critical: Add Apache license header to prevent RAT check failure.

Like AGENTS.md, this file is missing the required Apache license header and will cause the apache-rat:check to fail.

📝 Proposed fix: Add Apache license header
+<!--
+  Licensed to the Apache Software Foundation (ASF) under one
+  or more contributor license agreements.  See the NOTICE file
+  distributed with this work for additional information
+  regarding copyright ownership.  The ASF licenses this file
+  to you under the Apache License, Version 2.0 (the
+  "License"); you may not use this file except in compliance
+  with the License.  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing,
+  software distributed under the License is distributed on an
+  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  KIND, either express or implied.  See the License for the
+  specific language governing permissions and limitations
+  under the License.
+-->
+
 Never link to any issue or a pull request in any GitHub repository

  "DataSource V2 API for better performance.")
  .booleanConf
- .createWithDefault(false)
+ .createWithDefault(true)


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

(Collapsed tool output: CodeRabbit ran a series of `rg`, `fd`, `sed`, and `cat` searches over CometScanRule.scala, CometExecRule.scala, CometIcebergNativeScanExec.scala, CometExecIterator.scala, CometConf.scala, Native.java, and NativeBase.java to check whether the native Iceberg path is guarded by a runtime library-availability check. The searches show that plan conversion is gated only on configuration flags; `NativeBase.isLoaded()` exists but is not consulted before committing to the native execution path.)


Native Iceberg scan default should remain disabled until runtime capability check is added.

Changing spark.comet.scan.icebergNative.enabled to default true (line 143) enables native Iceberg execution by default, but CometScanRule (line 313-318) gates plan conversion only on configuration flags—not on actual native library availability. The NativeBase.isLoaded() check exists but is never called before committing to the native execution path. This means environments where the native library failed to load will now attempt native execution and fail at runtime instead of safely falling back to Spark. Keep this default as false until CometScanRule.transformV2Scan() calls NativeBase.isLoaded() to verify runtime capability before conversion.
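The proposed guard could look roughly like the following (hypothetical sketch: `NativeBase.isLoaded()` and the config accessor are named in the review text, while the conversion helper and surrounding code are invented for illustration):

```scala
// Hypothetical placement inside CometScanRule.transformV2Scan:
// convert only when the flag is on AND the native library actually loaded.
if (CometConf.COMET_ICEBERG_NATIVE_ENABLED.get(conf) && NativeBase.isLoaded()) {
  convertToCometIcebergNativeScan(scan) // assumed conversion helper
} else {
  scan // native library unavailable: fall back to Spark's DataSource V2 scan
}
```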

