Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion docs/source/contributor-guide/index.md
Original file line number Diff line number Diff line change
Expand Up @@ -35,7 +35,7 @@ Benchmarking Guide <benchmarking>
Adding a New Operator <adding_a_new_operator>
Adding a New Expression <adding_a_new_expression>
Tracing <tracing>
Profiling Native Code <profiling_native_code>
Profiling <profiling>
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this renames/removes the old profiling_native_code page, it’s worth double-checking there aren’t any remaining references (internal links, bookmarks) to profiling_native_code that would now become broken in the built docs.

Severity: low

Fix This in Augment

🤖 Was this useful? React with 👍 or 👎, or 🚀 if it prevented an incident/outage.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

value:good-to-have; category:documentation; feedback: The Augment AI reviewer is correct! Since a documentation page is removed it is good to check whether that page is referred by another page! I just double checked and there are no other references.

Spark SQL Tests <spark-sql-tests.md>
SQL File Tests <sql-file-tests.md>
Roadmap <roadmap.md>
Expand Down
293 changes: 293 additions & 0 deletions docs/source/contributor-guide/profiling.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,293 @@
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->

# Profiling

This guide covers profiling tools and techniques for Comet development. Because Comet
spans JVM (Spark) and native (Rust) code, choosing the right profiler depends on what
you are investigating.

## Choosing a Profiling Tool

| Tool | JVM Frames | Native (Rust) Frames | Install Required | Best For |
| ------------------------------------------------------------------------------ | ---------- | -------------------- | ---------------- | ------------------------------------------------------------------------------ |
| [async-profiler](https://github.com/async-profiler/async-profiler) | Yes | Yes | Yes | End-to-end Comet profiling with unified JVM + native flame graphs |
| [Java Flight Recorder (JFR)](https://docs.oracle.com/en/java/javase/17/jfapi/) | Yes | No | No (JDK 11+) | GC pressure, allocations, thread contention, I/O — any JVM-level investigation |
| [cargo-flamegraph](https://github.com/flamegraph-rs/flamegraph) | No | Yes | Yes | Isolated Rust micro-benchmarks without a JVM |

**Recommendation:** Use **async-profiler** when profiling Spark queries with Comet enabled —
it is the only tool that shows both JVM and native frames in a single flame graph.
Use **JFR** when you need rich JVM event data (GC, locks, I/O) without installing anything.
Use **cargo-flamegraph** when working on native code in isolation via `cargo bench`.

## Profiling with async-profiler

[async-profiler](https://github.com/async-profiler/async-profiler) captures JVM and
native code in the same flame graph by using Linux `perf_events` or macOS `dtrace`.
This makes it ideal for profiling Comet, where hot paths cross the JNI boundary between
Spark and Rust.

### Installation

Download a release from the [async-profiler releases page](https://github.com/async-profiler/async-profiler/releases):

```shell
# Linux x64
wget https://github.com/async-profiler/async-profiler/releases/download/v3.0/async-profiler-3.0-linux-x64.tar.gz
tar xzf async-profiler-3.0-linux-x64.tar.gz -C /opt/async-profiler --strip-components=1
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tar ... -C /opt/async-profiler will fail if /opt/async-profiler doesn’t already exist (and writing to /opt often requires root), so readers following this verbatim may hit an error.

Severity: medium

Fix This in Augment

🤖 Was this useful? React with 👍 or 👎, or 🚀 if it prevented an incident/outage.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

value:good-to=have; category:documentation; feedback: The Augment AI reviewer is correct! The /opt folder may not exist or may exist but usually requires root permissions. To prevent any confusion in the readers it would be better to use some folder in the user's $HOME

export ASYNC_PROFILER_HOME=/opt/async-profiler
```

On macOS, download the appropriate `macos` archive instead.

### Attaching to a running Spark application

Use the `asprof` launcher to attach to a running JVM by PID:

```shell
# Start CPU profiling for 30 seconds, output an HTML flame graph
$ASYNC_PROFILER_HOME/bin/asprof -d 30 -f flamegraph.html <pid>

# Wall-clock profiling
$ASYNC_PROFILER_HOME/bin/asprof -e wall -d 30 -f flamegraph.html <pid>

# Start profiling (no duration limit), then stop later
$ASYNC_PROFILER_HOME/bin/asprof start -e cpu <pid>
# ... run your query ...
$ASYNC_PROFILER_HOME/bin/asprof stop -f flamegraph.html <pid>
```

Find the Spark driver/executor PID with `jps` or `pgrep -f SparkSubmit`.

### Passing profiler flags via spark-submit

You can also attach async-profiler as a Java agent at JVM startup:

```shell
spark-submit \
--conf "spark.driver.extraJavaOptions=-agentpath:/opt/async-profiler/lib/libasyncProfiler.so=start,event=cpu,file=driver.html" \
--conf "spark.executor.extraJavaOptions=-agentpath:/opt/async-profiler/lib/libasyncProfiler.so=start,event=cpu,file=executor.html" \
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using fixed output filenames like executor.html/executor.jfr can lead to profiles being overwritten if multiple executors/JVMs write into the same working directory on a host.

Severity: low

Other Locations
  • docs/source/contributor-guide/profiling.md:138

Fix This in Augment

🤖 Was this useful? React with 👍 or 👎, or 🚀 if it prevented an incident/outage.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

value:good-but-wont-fix; category:documentation; feedback: The Augment AI reviewer is correct! The profiling output files would be overwritten but usually this is the needed behavior. The developer makes some improvement and reloads the page in the browser to see whether it has a good effect. If the developer wants to compare results then (s)he need to generate unique file names, e.g. with a timestamp.

...
```

### Choosing an event type

| Event | When to use |
| ------- | --------------------------------------------------------------------------------------------------------- |
| `cpu` | Default. Shows where CPU cycles are spent. Use for compute-bound queries. |
| `wall` | Wall-clock time including blocked/waiting threads. Use to find JNI boundary overhead and I/O stalls. |
| `alloc` | Heap allocation profiling. Use to find JVM allocation hotspots around Arrow FFI and columnar conversions. |
| `lock` | Lock contention. Use when threads appear to spend time waiting on synchronized blocks or locks. |

### Output formats

| Format | Flag | Description |
| ---------------- | ------------------ | -------------------------------------------------- |
| HTML flame graph | `-f out.html` | Interactive flame graph (default and most useful). |
| JFR | `-f out.jfr` | Viewable in JDK Mission Control or IntelliJ. |
| Collapsed stacks | `-f out.collapsed` | For use with Brendan Gregg's FlameGraph scripts. |
| Text summary | `-o text` | Flat list of hot methods. |

### Platform notes

**Linux:** Set `perf_event_paranoid` to allow profiling:

```shell
sudo sysctl kernel.perf_event_paranoid=1 # or 0 / -1 for full access
sudo sysctl kernel.kptr_restrict=0 # optional: enable kernel symbols
```

**macOS:** async-profiler uses `dtrace` on macOS, which requires running as root or
with SIP (System Integrity Protection) adjustments. Native Rust frames may not be fully
resolved on macOS; Linux is recommended for the most complete flame graphs.

### Integrated benchmark profiling

The TPC benchmark scripts in `benchmarks/tpc/` have built-in async-profiler support via
the `--async-profiler` flag. See [benchmarks/tpc/README.md](../../benchmarks/tpc/README.md)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The relative link to ../../benchmarks/tpc/README.md points outside the docs source tree, which may render as a broken link in the published documentation build (even though it works in the repo).

Severity: medium

Other Locations
  • docs/source/contributor-guide/profiling.md:177

Fix This in Augment

🤖 Was this useful? React with 👍 or 👎, or 🚀 if it prevented an incident/outage.

Copy link
Copy Markdown
Owner Author

@martin-augment martin-augment Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

value:useful; category: documentation; feedback: The Augment AI reviewer is correct! The link is broken. It should use three "go up"s to point to the wanted document. Prevents releasing documentation with broken links.

for details.

## Profiling with Java Flight Recorder

[Java Flight Recorder (JFR)](https://docs.oracle.com/en/java/javase/17/jfapi/) is built
into JDK 11+ and collects detailed JVM runtime data with very low overhead. It does not
see native Rust frames, but is excellent for diagnosing GC pressure, thread contention,
I/O latency, and JVM-level allocation patterns.

### Adding JFR flags to spark-submit

```shell
spark-submit \
--conf "spark.driver.extraJavaOptions=-XX:StartFlightRecording=duration=120s,filename=driver.jfr" \
--conf "spark.executor.extraJavaOptions=-XX:StartFlightRecording=duration=120s,filename=executor.jfr" \
...
```

For continuous recording without a fixed duration:

```shell
--conf "spark.driver.extraJavaOptions=-XX:StartFlightRecording=disk=true,maxsize=500m,filename=driver.jfr"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This example for continuous JFR recording is presented as a single --conf flag, which is an incomplete command. For clarity and consistency with the preceding example, it would be better to show the full spark-submit command structure. This also provides an opportunity to include the corresponding spark.executor.extraJavaOptions which is important for profiling Spark applications.

Suggested change
--conf "spark.driver.extraJavaOptions=-XX:StartFlightRecording=disk=true,maxsize=500m,filename=driver.jfr"
spark-submit \
--conf "spark.driver.extraJavaOptions=-XX:StartFlightRecording=disk=true,maxsize=500m,filename=driver.jfr" \
--conf "spark.executor.extraJavaOptions=-XX:StartFlightRecording=disk=true,maxsize=500m,filename=executor.jfr" \
...

```

You can also start and stop recording dynamically using `jcmd`:

```shell
jcmd <pid> JFR.start name=profile
# ... run your query ...
jcmd <pid> JFR.stop name=profile filename=recording.jfr
```

### Viewing recordings

- **[JDK Mission Control (JMC)](https://jdk.java.net/jmc/)** — the most comprehensive viewer.
Shows flame graphs, GC timeline, thread activity, I/O, and allocation hot spots.
- **IntelliJ IDEA** — open `.jfr` files directly in the built-in profiler
(Run → Open Profiler Snapshot).
- **`jfr` CLI** — quick summaries from the command line: `jfr summary driver.jfr`

### Useful JFR events for Comet debugging

| Event | What it shows |
| ------------------------------------------------------------------- | --------------------------------------------------------------------------- |
| `jdk.GCPhasePause` | GC pause durations — helps identify memory pressure from Arrow allocations. |
| `jdk.ObjectAllocationInNewTLAB` / `jdk.ObjectAllocationOutsideTLAB` | Allocation hot spots. |
| `jdk.JavaMonitorWait` / `jdk.ThreadPark` | Thread contention and lock waits. |
| `jdk.FileRead` / `jdk.FileWrite` / `jdk.SocketRead` | I/O latency. |
| `jdk.ExecutionSample` | CPU sampling (method profiling, similar to a flame graph). |

### Integrated benchmark profiling

The TPC benchmark scripts support `--jfr` for automatic JFR recording during benchmark
runs. See [benchmarks/tpc/README.md](../../benchmarks/tpc/README.md) for details.

## Profiling Native Code with cargo-flamegraph

For profiling Rust code in isolation — without a JVM — use `cargo bench` with
[cargo-flamegraph](https://github.com/flamegraph-rs/flamegraph).

### Running micro benchmarks with cargo bench

When implementing a new operator or expression, it is good practice to add a new microbenchmark under `core/benches`.

It is often easiest to copy an existing benchmark and modify it for the new operator or expression. It is also
necessary to add a new section to the `Cargo.toml` file, such as:

```toml
[[bench]]
name = "shuffle_writer"
harness = false
```

These benchmarks are useful when for comparing performance between releases or between feature branches and the
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

There's a minor grammatical error here. The phrase 'when for' should likely be just 'for'.

Suggested change
These benchmarks are useful when for comparing performance between releases or between feature branches and the
These benchmarks are useful for comparing performance between releases or between feature branches and the

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

value:good-to-have; category:documentation; feedback: The Gemini AI reviewer is correct! There is a minor typo in the sentence that should be corrected to prevent an incorrect grammar.

main branch to help prevent regressions in performance when adding new features or fixing bugs.

Individual benchmarks can be run by name with the following command.

```shell
cargo bench shuffle_writer
```

Here is some sample output from running this command.

```
Running benches/shuffle_writer.rs (target/release/deps/shuffle_writer-e37b59e37879cce7)
Gnuplot not found, using plotters backend
shuffle_writer/shuffle_writer
time: [2.0880 ms 2.0989 ms 2.1118 ms]
Found 9 outliers among 100 measurements (9.00%)
3 (3.00%) high mild
6 (6.00%) high severe
```

### Profiling with cargo-flamegraph

Install cargo-flamegraph:

```shell
cargo install flamegraph
```

Follow the instructions in [cargo-flamegraph](https://github.com/flamegraph-rs/flamegraph) for your platform for
running flamegraph.

Here is a sample command for running `cargo-flamegraph` on MacOS.

```shell
cargo flamegraph --root --bench shuffle_writer
```

This will produce output similar to the following.

```
dtrace: system integrity protection is on, some features will not be available

dtrace: description 'profile-997 ' matched 1 probe
Gnuplot not found, using plotters backend
Testing shuffle_writer/shuffle_writer
Success

dtrace: pid 66402 has exited
writing flamegraph to "flamegraph.svg"
```

The generated flamegraph can now be opened in a browser that supports svg format.

Here is the flamegraph for this example:

![flamegraph](../_static/images/flamegraph.png)

## Tips for Profiling Comet

### Use wall-clock profiling to spot JNI boundary overhead

When profiling Comet with async-profiler, `wall` mode is often more revealing than `cpu`
because it captures time spent crossing the JNI boundary, waiting for native results,
and blocked on I/O — none of which show up in CPU-only profiles.

```shell
$ASYNC_PROFILER_HOME/bin/asprof -e wall -d 60 -f wall-profile.html <pid>
```

### Use alloc profiling around Arrow FFI

JVM allocation profiling can identify hotspots in the Arrow FFI path where temporary
objects are created during data transfer between JVM and native code:

```shell
$ASYNC_PROFILER_HOME/bin/asprof -e alloc -d 60 -f alloc-profile.html <pid>
```

Look for allocations in `CometExecIterator`, `CometBatchIterator`, and Arrow vector
classes.

### Isolate Rust-only performance issues

If a flame graph shows the hot path is entirely within native code, switch to
`cargo-flamegraph` to get better symbol resolution and avoid JVM noise:

```shell
cd native
cargo flamegraph --root --bench <benchmark_name>
```

### Correlating JVM and native frames

In async-profiler flame graphs, native Rust frames appear below JNI entry points like
`Java_org_apache_comet_Native_*`. Look for these transition points to understand how
time is split between Spark's JVM code and Comet's native execution.
94 changes: 0 additions & 94 deletions docs/source/contributor-guide/profiling_native_code.md

This file was deleted.