-
Notifications
You must be signed in to change notification settings - Fork 3k
Spark 4.1: Display write metrics on SQL UI #15104
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR ports write metrics display functionality from Spark 4.0 to Spark 4.1, enabling write operation metrics to be shown in the Spark SQL UI. The changes introduce custom metric classes for tracking various write operations and integrate them with Spark's connector API.
Changes:
- Added 25 new custom metric classes extending
CustomSumMetricto track data files, delete files, records, and file sizes for added/removed/total categories - Integrated metrics reporting into
SparkWriteandSparkPositionDeltaWriteby implementingreportDriverMetrics()andsupportedCustomMetrics() - Enhanced
BaseTableto support combining multiple metrics reporters
Reviewed changes
Copilot reviewed 28 out of 28 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| TotalRecords.java | Defines metric for tracking total record count |
| TotalPositionalDeletes.java | Defines metric for tracking total positional deletes |
| TotalFileSizeInBytes.java | Defines metric for tracking total file size |
| TotalEqualityDeletes.java | Defines metric for tracking total equality deletes |
| TotalDeleteFiles.java | Defines metric for tracking total delete files |
| TotalDataFiles.java | Defines metric for tracking total data files |
| RemovedRecords.java | Defines metric for tracking removed records |
| RemovedPositionalDeletes.java | Defines metric for tracking removed positional deletes |
| RemovedPositionalDeleteFiles.java | Defines metric for tracking removed positional delete files |
| RemovedFileSizeInBytes.java | Defines metric for tracking removed file size |
| RemovedEqualityDeletes.java | Defines metric for tracking removed equality deletes |
| RemovedEqualityDeleteFiles.java | Defines metric for tracking removed equality delete files |
| RemovedDeleteFiles.java | Defines metric for tracking removed delete files |
| RemovedDataFiles.java | Defines metric for tracking removed data files |
| AddedRecords.java | Defines metric for tracking added records |
| AddedPositionalDeletes.java | Defines metric for tracking added positional deletes |
| AddedPositionalDeleteFiles.java | Defines metric for tracking added positional delete files |
| AddedFileSizeInBytes.java | Defines metric for tracking added file size |
| AddedEqualityDeletes.java | Defines metric for tracking added equality deletes |
| AddedEqualityDeleteFiles.java | Defines metric for tracking added equality delete files |
| AddedDeleteFiles.java | Defines metric for tracking added delete files |
| AddedDataFiles.java | Defines metric for tracking added data files |
| SparkWriteBuilder.java | Adds custom metrics support to write builder |
| SparkWrite.java | Integrates metrics reporter and implements reportDriverMetrics() |
| SparkPositionDeltaWrite.java | Integrates metrics reporter and implements custom metrics methods |
| SparkWriteUtil.java | Provides utility methods for creating custom metrics and task metrics |
| InMemoryMetricsReporter.java | Adds commitReport() method to retrieve commit metrics |
| BaseTable.java | Adds combineMetricsReporter() method for combining metrics reporters |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
...k/v4.1/spark/src/main/java/org/apache/iceberg/spark/source/metrics/AddedFileSizeInBytes.java
Outdated
Show resolved
Hide resolved
...v4.1/spark/src/main/java/org/apache/iceberg/spark/source/metrics/RemovedFileSizeInBytes.java
Outdated
Show resolved
Hide resolved
| return (ScanReport) metricsReport; | ||
| } | ||
|
|
||
| @Nullable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why the deviation from the behavior defined in scanReport()?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
commitReport is called in every Spark write while only some of them could have commits to Iceberg tables.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah that's why you should either end up with a null report or an actual commit report
...k/v4.1/spark/src/main/java/org/apache/iceberg/spark/source/metrics/AddedFileSizeInBytes.java
Outdated
Show resolved
Hide resolved
...v4.1/spark/src/main/java/org/apache/iceberg/spark/source/metrics/RemovedFileSizeInBytes.java
Outdated
Show resolved
Hide resolved
spark/v4.1/spark/src/main/java/org/apache/iceberg/spark/source/SparkWrite.java
Show resolved
Hide resolved
1c63813 to
2da3d86
Compare
No description provided.