
3628: docs: expand profiling guide with JVM and async-profiler coverage#43

Open
martin-augment wants to merge 2 commits into main from pr-3628-2026-03-06-11-41-39

Conversation

@martin-augment
Owner

3628: To review by AI

Rename profiling_native_code.md to profiling.md and add sections for
async-profiler (unified JVM + native flame graphs), Java Flight Recorder,
a tool comparison table, and practical tips for profiling Comet's mixed
JVM/Rust execution.
@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the documentation for profiling Comet applications by consolidating and expanding existing guides into a single, comprehensive resource. It aims to provide developers with clear instructions and recommendations for using various profiling tools across JVM and native Rust components, improving the ability to diagnose performance issues and optimize code effectively.

Highlights

  • Expanded Profiling Guide: A new comprehensive profiling guide has been added, consolidating information on various tools for Comet development, which involves both JVM (Spark) and native (Rust) code.
  • Unified Profiling with async-profiler: The guide introduces async-profiler as the recommended tool for end-to-end Comet profiling, capable of generating unified flame graphs that include both JVM and native frames.
  • JVM Profiling with Java Flight Recorder (JFR): Detailed instructions and event types for using Java Flight Recorder (JFR) are provided for JVM-level investigations, such as GC pressure, allocations, and thread contention.
  • Native Rust Profiling with cargo-flamegraph: The guide includes updated information on profiling isolated Rust code using cargo-flamegraph for micro-benchmarks.
  • Consolidated Documentation: The previous 'Profiling Native Code' guide has been removed, and its relevant content, along with new comprehensive profiling strategies, has been integrated into the new 'Profiling' document.
Changelog
  • docs/source/contributor-guide/index.md
    • Updated the link for the profiling guide to point to the new, broader 'Profiling' document.
  • docs/source/contributor-guide/profiling.md
    • Added a new, comprehensive guide covering profiling tools and techniques for Comet development, including async-profiler, Java Flight Recorder (JFR), and cargo-flamegraph.
    • Included detailed installation instructions, usage examples for attaching to Spark applications, and options for passing profiler flags.
    • Provided guidance on choosing event types (cpu, wall, alloc, lock) and output formats for async-profiler.
    • Added platform-specific notes for Linux and macOS regarding profiler setup.
    • Detailed JFR usage, including adding flags to spark-submit, dynamic recording, viewing recordings, and useful JFR events for Comet debugging.
    • Explained how to use cargo-flamegraph for profiling native Rust code in isolation, including running micro-benchmarks.
    • Included tips for profiling Comet, such as using wall-clock profiling for JNI boundary overhead, alloc profiling for Arrow FFI, and correlating JVM and native frames.
  • docs/source/contributor-guide/profiling_native_code.md
    • Removed the outdated and less comprehensive guide on profiling native code.
Activity
  • The pull request was submitted for AI review.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@claude

claude bot commented Mar 6, 2026

PR Review: docs: expand profiling guide with JVM and async-profiler coverage

This is a well-written, substantial improvement to the contributor documentation. Replacing the narrow profiling_native_code.md with a unified profiling.md that covers async-profiler, JFR, and cargo-flamegraph is exactly the right approach for a mixed JVM/Rust codebase. The tool comparison table and practical tips section are particularly valuable additions.

I verified that the --async-profiler and --jfr flags referenced in the doc actually exist in benchmarks/tpc/run.py and match the described behavior.

A few issues to address:


Bug: Broken relative links (appears twice)

The link ../../benchmarks/tpc/README.md appears on lines 123 and 177. From the location of the file (docs/source/contributor-guide/profiling.md), two ../ levels up lands at docs/, not the project root — so this resolves to docs/benchmarks/tpc/README.md, which does not exist.

The correct relative path is ../../../benchmarks/tpc/README.md.

```diff
-See [benchmarks/tpc/README.md](../../benchmarks/tpc/README.md) for details.
+See [benchmarks/tpc/README.md](../../../benchmarks/tpc/README.md) for details.
```
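The resolution described above can be checked mechanically with a throwaway directory tree (the `/tmp/repo` paths below are illustrative, not the real repository):

```shell
# Recreate the relevant layout to see where each relative link resolves
mkdir -p /tmp/repo/docs/source/contributor-guide /tmp/repo/benchmarks/tpc
touch /tmp/repo/benchmarks/tpc/README.md
cd /tmp/repo/docs/source/contributor-guide
# Two ../ levels land at docs/, where no such file exists:
realpath -m ../../benchmarks/tpc/README.md
# Three ../ levels reach the repo root, where the file does exist:
realpath -m ../../../benchmarks/tpc/README.md
```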

Bug: Missing mkdir -p before tar extraction

The installation snippet extracts into /opt/async-profiler with --strip-components=1, but that directory doesn't exist yet. The tar command will fail without creating it first:

```diff
+mkdir -p /opt/async-profiler
 tar xzf async-profiler-3.0-linux-x64.tar.gz -C /opt/async-profiler --strip-components=1
```

Nit: Hardcoded version in download URL

The install example pins v3.0, which will silently become outdated as new releases are made. Consider adding a note or using a variable:

```shell
# Replace 3.0 with the latest version from the releases page
ASPROF_VERSION=3.0
wget https://github.com/async-profiler/async-profiler/releases/download/v${ASPROF_VERSION}/async-profiler-${ASPROF_VERSION}-linux-x64.tar.gz
tar xzf async-profiler-${ASPROF_VERSION}-linux-x64.tar.gz -C /opt/async-profiler --strip-components=1
```

Suggestion: Note on distributed Spark output file locations

The -agentpath: spark-submit example writes output to a fixed local path (file=executor.html). In a multi-node Spark cluster, each executor writes to its own node's local filesystem, so these files won't be directly accessible from the driver. A brief note here would prevent confusion:

Note: In a distributed Spark cluster, executor output files are written to each executor's local filesystem. You'll need to collect them from each node (e.g., via HDFS, S3, or scp) after the run. The benchmarks/tpc/run.py --async-profiler integration handles this automatically.
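When per-process output files are still wanted on each node, async-profiler's `file=` option accepts placeholders such as `%p` (process id) and `%t` (timestamp), per its documentation, which keeps multiple executor JVMs on one host from clobbering each other. A hedged sketch (paths and flag values follow the example above; placeholder support depends on the installed version):

```shell
spark-submit \
  --conf "spark.executor.extraJavaOptions=-agentpath:/opt/async-profiler/lib/libasyncProfiler.so=start,event=cpu,file=/tmp/profiles/executor-%p-%t.html" \
  ...
```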


Minor: cd native tip could be clearer

In the "Isolate Rust-only performance issues" tip:

```shell
cd native
cargo flamegraph --root --bench <benchmark_name>
```

This works correctly since native/ is a Cargo workspace that includes both core and spark-expr (both have benches). A small clarifying comment like # native/ is the Cargo workspace root would help readers understand why this is the right directory.


Overall this is a great addition to the contributor guide — the async-profiler section in particular fills an important gap. Once the two broken links and the missing mkdir -p are fixed, this should be good to merge.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request significantly improves the profiling documentation by consolidating information about native code profiling and expanding it to cover JVM profiling with JFR and unified profiling with async-profiler. The new profiling.md guide is comprehensive and provides clear instructions and examples for different tools and scenarios. I've made a couple of minor suggestions to improve clarity and correct a grammatical error in the new documentation.

For continuous recording without a fixed duration:

```shell
--conf "spark.driver.extraJavaOptions=-XX:StartFlightRecording=disk=true,maxsize=500m,filename=driver.jfr"
```


Severity: medium

This example for continuous JFR recording is presented as a single --conf flag, which is an incomplete command. For clarity and consistency with the preceding example, it would be better to show the full spark-submit command structure. This also provides an opportunity to include the corresponding spark.executor.extraJavaOptions which is important for profiling Spark applications.

Suggested change

```diff
- --conf "spark.driver.extraJavaOptions=-XX:StartFlightRecording=disk=true,maxsize=500m,filename=driver.jfr"
+ spark-submit \
+   --conf "spark.driver.extraJavaOptions=-XX:StartFlightRecording=disk=true,maxsize=500m,filename=driver.jfr" \
+   --conf "spark.executor.extraJavaOptions=-XX:StartFlightRecording=disk=true,maxsize=500m,filename=executor.jfr" \
+   ...
```
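For the dynamic-recording flow the guide also covers, JFR can be started on an already-running JVM with the JDK's `jcmd` tool. A sketch (the recording name `comet` is illustrative, and `<pid>` stands for the target JVM's process id):

```shell
# <pid> is the Spark driver or executor JVM process id (e.g. found via jps)
jcmd <pid> JFR.start name=comet settings=profile
jcmd <pid> JFR.dump name=comet filename=comet.jfr
jcmd <pid> JFR.stop name=comet
```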

```toml
harness = false
```

These benchmarks are useful when for comparing performance between releases or between feature branches and the


Severity: medium

There's a minor grammatical error here. The phrase 'when for' should likely be just 'for'.

Suggested change

```diff
-These benchmarks are useful when for comparing performance between releases or between feature branches and the
+These benchmarks are useful for comparing performance between releases or between feature branches and the
```

Owner Author


value:good-to-have; category:documentation; feedback: The Gemini AI reviewer is correct! There is a minor typo in the sentence that should be corrected to fix the grammar.

@augmentcode

augmentcode bot commented Mar 6, 2026

🤖 Augment PR Summary

Summary: This PR consolidates and expands the contributor profiling documentation into a single cross-stack guide for Comet.

Changes:

  • Replaces the old “Profiling Native Code” page with a new unified Profiling page in the contributor guide.
  • Adds an async-profiler walkthrough for end-to-end profiling that includes both JVM (Spark) and native (Rust) frames.
  • Documents async-profiler installation, attach/agent usage, common event types, output formats, and platform-specific notes.
  • Adds a Java Flight Recorder (JFR) section focused on JVM-only diagnostics (GC, allocation, locks, I/O) including spark-submit and jcmd examples.
  • Preserves Rust-only profiling guidance via cargo bench and cargo-flamegraph, and adds practical profiling tips for Comet-specific workflows.

Why: Provide a clearer “choose the right profiler” reference for Comet development, covering JVM + native hot paths as well as native-only micro-benchmark work.



@augmentcode augmentcode bot left a comment


Review completed. 4 suggestions posted.


Comment `augment review` to trigger a new review at any time.

```text
Adding a New Expression <adding_a_new_expression>
Tracing <tracing>
Profiling Native Code <profiling_native_code>
Profiling <profiling>
```


Since this renames/removes the old profiling_native_code page, it’s worth double-checking there aren’t any remaining references (internal links, bookmarks) to profiling_native_code that would now become broken in the built docs.

Severity: low


Owner Author


value:good-to-have; category:documentation; feedback: The Augment AI reviewer is correct! Since a documentation page is removed, it is good to check whether that page is referenced by any other page. I double-checked and there are no other references.

```shell
# Linux x64
wget https://github.com/async-profiler/async-profiler/releases/download/v3.0/async-profiler-3.0-linux-x64.tar.gz
tar xzf async-profiler-3.0-linux-x64.tar.gz -C /opt/async-profiler --strip-components=1
```


tar ... -C /opt/async-profiler will fail if /opt/async-profiler doesn’t already exist (and writing to /opt often requires root), so readers following this verbatim may hit an error.

Severity: medium

Fix This in Augment


Owner Author


value:good-to-have; category:documentation; feedback: The Augment AI reviewer is correct! The /opt folder may not exist, or may exist but usually requires root permissions. To avoid confusing readers, it would be better to use a folder in the user's $HOME.

```shell
spark-submit \
  --conf "spark.driver.extraJavaOptions=-agentpath:/opt/async-profiler/lib/libasyncProfiler.so=start,event=cpu,file=driver.html" \
  --conf "spark.executor.extraJavaOptions=-agentpath:/opt/async-profiler/lib/libasyncProfiler.so=start,event=cpu,file=executor.html" \
```


Using fixed output filenames like executor.html/executor.jfr can lead to profiles being overwritten if multiple executors/JVMs write into the same working directory on a host.

Severity: low

Other Locations
  • docs/source/contributor-guide/profiling.md:138

Fix This in Augment


Owner Author


value:good-but-wont-fix; category:documentation; feedback: The Augment AI reviewer is correct! The profiling output files would be overwritten, but usually this is the desired behavior: the developer makes an improvement and reloads the page in the browser to see whether it had a good effect. If the developer wants to compare results, then they need to generate unique file names, e.g. with a timestamp.
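If unique names are wanted for comparisons, a minimal sketch of the timestamp approach (the variable names are illustrative):

```shell
# Build a timestamped output name so repeated profiling runs don't overwrite each other
ts=$(date +%Y%m%d-%H%M%S)
out="executor-${ts}.html"
echo "$out"
```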

```markdown
### Integrated benchmark profiling

The TPC benchmark scripts in `benchmarks/tpc/` have built-in async-profiler support via
the `--async-profiler` flag. See [benchmarks/tpc/README.md](../../benchmarks/tpc/README.md)
```


The relative link to ../../benchmarks/tpc/README.md points outside the docs source tree, which may render as a broken link in the published documentation build (even though it works in the repo).

Severity: medium

Other Locations
  • docs/source/contributor-guide/profiling.md:177


Owner Author

@martin-augment Mar 6, 2026


value:useful; category:documentation; feedback: The Augment AI reviewer is correct! The link is broken. It should go up three levels (`../../../`) to point to the wanted document. Fixing this prevents releasing documentation with broken links.

@martin-augment
Owner Author

martin-augment commented Mar 6, 2026

Bug: Broken relative links (appears twice)

The link ../../benchmarks/tpc/README.md appears on lines 123 and 177. From the location of the file (docs/source/contributor-guide/profiling.md), two ../ levels up lands at docs/, not the project root — so this resolves to docs/benchmarks/tpc/README.md, which does not exist.

The correct relative path is ../../../benchmarks/tpc/README.md.

```diff
-See [benchmarks/tpc/README.md](../../benchmarks/tpc/README.md) for details.
+See [benchmarks/tpc/README.md](../../../benchmarks/tpc/README.md) for details.
```

value:useful; category:documentation; feedback: The Claude AI reviewer is correct! The link is broken. It should go up three levels (`../../../`) to point to the wanted document. Fixing this prevents releasing documentation with broken links.

@martin-augment
Owner Author

Bug: Missing mkdir -p before tar extraction

The installation snippet extracts into /opt/async-profiler with --strip-components=1, but that directory doesn't exist yet. The tar command will fail without creating it first:

```diff
+mkdir -p /opt/async-profiler
 tar xzf async-profiler-3.0-linux-x64.tar.gz -C /opt/async-profiler --strip-components=1
```

value:good-to-have; category:documentation; feedback: The Claude AI reviewer is correct! The /opt folder may not exist, or may exist but usually requires root permissions. To avoid confusing readers, it would be better to use a folder in the user's $HOME.
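Following that suggestion, a sketch of a user-local install (the install directory is illustrative, and a stand-in tarball is built here so the commands are self-contained; substitute the real downloaded release archive):

```shell
# Install into a user-writable prefix instead of /opt
INSTALL_DIR="$HOME/.local/async-profiler"
mkdir -p "$INSTALL_DIR"   # tar -C fails if the target directory is missing

# Stand-in for the downloaded release archive (replace with the real file)
mkdir -p /tmp/async-profiler-3.0-linux-x64/lib
touch /tmp/async-profiler-3.0-linux-x64/lib/libasyncProfiler.so
tar czf /tmp/ap.tar.gz -C /tmp async-profiler-3.0-linux-x64

# --strip-components=1 drops the versioned top-level directory
tar xzf /tmp/ap.tar.gz -C "$INSTALL_DIR" --strip-components=1
ls "$INSTALL_DIR/lib"
```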
