@@ -38,17 +38,21 @@ All benchmarks are run via `run.py`:
 python3 run.py --engine <engine> --benchmark <tpch|tpcds> [options]
 ```
 
-| Option         | Description                                              |
-| -------------- | -------------------------------------------------------- |
-| `--engine`     | Engine name (matches a TOML file in `engines/`)          |
-| `--benchmark`  | `tpch` or `tpcds`                                        |
-| `--iterations` | Number of iterations (default: 1)                        |
-| `--output`     | Output directory (default: `.`)                          |
-| `--query`      | Run a single query number                                |
-| `--no-restart` | Skip Spark master/worker restart                         |
-| `--dry-run`    | Print the spark-submit command without executing         |
-| `--jfr`        | Enable Java Flight Recorder profiling                    |
-| `--jfr-dir`    | Directory for JFR output files (default: `/results/jfr`) |
+| Option                    | Description                                                                     |
+| ------------------------- | ------------------------------------------------------------------------------- |
+| `--engine`                | Engine name (matches a TOML file in `engines/`)                                 |
+| `--benchmark`             | `tpch` or `tpcds`                                                               |
+| `--iterations`            | Number of iterations (default: 1)                                               |
+| `--output`                | Output directory (default: `.`)                                                 |
+| `--query`                 | Run a single query number                                                       |
+| `--no-restart`            | Skip Spark master/worker restart                                                |
+| `--dry-run`               | Print the spark-submit command without executing                                |
+| `--jfr`                   | Enable Java Flight Recorder profiling                                           |
+| `--jfr-dir`               | Directory for JFR output files (default: `/results/jfr`)                        |
+| `--async-profiler`        | Enable async-profiler (profiles Java + native code)                             |
+| `--async-profiler-dir`    | Directory for async-profiler output (default: `/results/async-profiler`)        |
+| `--async-profiler-event`  | Event type: `cpu`, `wall`, `alloc`, `lock`, etc. (default: `cpu`)               |
+| `--async-profiler-format` | Output format: `flamegraph`, `jfr`, `collapsed`, `text` (default: `flamegraph`) |
 
 Available engines: `spark`, `comet`, `comet-iceberg`, `gluten`
 
@@ -392,3 +396,88 @@ docker compose -f benchmarks/tpc/infra/docker/docker-compose.yml \
 
 Open the `.jfr` files with [JDK Mission Control](https://jdk.java.net/jmc/),
 IntelliJ IDEA's profiler, or `jfr` CLI tool (`jfr summary driver.jfr`).
+
+## async-profiler Profiling
+
+Use the `--async-profiler` flag to capture profiles with
+[async-profiler](https://github.com/async-profiler/async-profiler). Unlike JFR,
+async-profiler can profile **both Java and native (Rust/C++) code** in the same
+flame graph, making it especially useful for profiling Comet workloads.
+
+### Prerequisites
+
+async-profiler must be installed on every node where the driver or executors run.
+Set `ASYNC_PROFILER_HOME` to the installation directory:
+
+```shell
+# Download and extract (Linux x64 example)
+wget https://github.com/async-profiler/async-profiler/releases/download/v3.0/async-profiler-3.0-linux-x64.tar.gz
+# tar -C requires the target directory to exist, so create it first
+mkdir -p /opt/async-profiler
+tar xzf async-profiler-3.0-linux-x64.tar.gz -C /opt/async-profiler --strip-components=1
+export ASYNC_PROFILER_HOME=/opt/async-profiler
+```
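Before launching a run, it can help to confirm that `ASYNC_PROFILER_HOME` points at a usable installation. The following is a minimal Python sketch and is not part of `run.py`: the helper name `find_profiler_library` is hypothetical, and the `lib/libasyncProfiler.so` location (`.dylib` on macOS) is an assumption based on the layout of the 3.0 release archive.

```python
import os
from pathlib import Path
from typing import Optional


def find_profiler_library(home: Optional[str]) -> Optional[Path]:
    """Return the async-profiler shared library under `home`, or None if absent.

    Assumption: the library lives in `<home>/lib/`, as in the 3.0 release
    archive. Adjust the candidates if your installation differs.
    """
    if not home:
        return None
    for name in ("libasyncProfiler.so", "libasyncProfiler.dylib"):
        candidate = Path(home) / "lib" / name
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    library = find_profiler_library(os.environ.get("ASYNC_PROFILER_HOME"))
    if library is None:
        print("async-profiler library not found; check ASYNC_PROFILER_HOME")
    else:
        print(f"found {library}")
```

Running the same check on every worker node catches a missing installation before the benchmark starts.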
+
+On Linux, `kernel.perf_event_paranoid` must be 1 or lower to allow `cpu` and
+`wall` profiling:
+
+```shell
+sudo sysctl kernel.perf_event_paranoid=1  # or 0 / -1 for full access
+sudo sysctl kernel.kptr_restrict=0        # optional: enable kernel symbols
+```
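The current setting can also be checked programmatically before a run. This is a small sketch, assuming the standard Linux procfs path for the sysctl; the function names are illustrative and do not come from `run.py`.

```python
from pathlib import Path
from typing import Optional

# Standard procfs location of the kernel.perf_event_paranoid sysctl on Linux.
PARANOID_PATH = Path("/proc/sys/kernel/perf_event_paranoid")


def read_paranoid_level(path: Path = PARANOID_PATH) -> Optional[int]:
    """Return the current setting, or None if it cannot be read (e.g. not Linux)."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


def allows_profiling(level: Optional[int], max_level: int = 1) -> bool:
    """True when the setting is known and at or below the threshold of 1."""
    return level is not None and level <= max_level


if __name__ == "__main__":
    level = read_paranoid_level()
    print(f"perf_event_paranoid={level}, ok={allows_profiling(level)}")
```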
+
+### Basic usage
+
+```shell
+python3 run.py --engine comet --benchmark tpch --async-profiler
+```
+
+This produces HTML flame graphs in `/results/async-profiler/` by default
+(`driver.html` and `executor.html`).
+
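How `run.py` attaches the profiler is not shown in this section. One common approach, sketched below purely as an assumption, is to load async-profiler as a JVM agent through Spark's `spark.driver.extraJavaOptions` and `spark.executor.extraJavaOptions` settings. The function names are hypothetical, and the library path in the usage comment assumes the install location from the prerequisites.

```python
from typing import List


def agent_option(lib_path: str, event: str, output_file: str) -> str:
    """Build a JVM -agentpath option that starts async-profiler with the JVM."""
    return f"-agentpath:{lib_path}=start,event={event},file={output_file}"


def spark_profiler_conf(
    lib_path: str,
    event: str = "cpu",
    out_dir: str = "/results/async-profiler",
) -> List[str]:
    """Return spark-submit arguments for the driver and executor JVMs.

    Assumption: output names follow the driver.html / executor.html
    convention described above.
    """
    driver = agent_option(lib_path, event, f"{out_dir}/driver.html")
    executor = agent_option(lib_path, event, f"{out_dir}/executor.html")
    return [
        "--conf", f"spark.driver.extraJavaOptions={driver}",
        "--conf", f"spark.executor.extraJavaOptions={executor}",
    ]


# Example call (path is an assumption):
# spark_profiler_conf("/opt/async-profiler/lib/libasyncProfiler.so")
```

To see what `run.py` actually passes to Spark, combine `--async-profiler` with `--dry-run`, which prints the spark-submit command without executing it.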
+### Choosing events and output format
+
+```shell
+# Wall-clock profiling (includes time spent waiting/sleeping)
+python3 run.py --engine comet --benchmark tpch \
+  --async-profiler --async-profiler-event wall
+
+# Allocation profiling with JFR output
+python3 run.py --engine comet --benchmark tpch \
+  --async-profiler --async-profiler-event alloc --async-profiler-format jfr
+
+# Lock contention profiling
+python3 run.py --engine comet --benchmark tpch \
+  --async-profiler --async-profiler-event lock
+```
+
+| Event   | Description                                         |
+| ------- | --------------------------------------------------- |
+| `cpu`   | On-CPU time (default). Shows where CPU cycles go.   |
+| `wall`  | Wall-clock time. Includes threads that are blocked. |
+| `alloc` | Heap allocation profiling.                          |
+| `lock`  | Lock contention profiling.                          |
+
+| Format       | Extension | Description                              |
+| ------------ | --------- | ---------------------------------------- |
+| `flamegraph` | `.html`   | Interactive HTML flame graph (default).  |
+| `jfr`        | `.jfr`    | JFR format, viewable in JMC or IntelliJ. |
+| `collapsed`  | `.txt`    | Collapsed stacks for FlameGraph scripts. |
+| `text`       | `.txt`    | Flat text summary of hot methods.        |
+
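The format table implies a simple mapping from `--async-profiler-format` to the output file name. The sketch below illustrates that mapping; `profile_path` is a hypothetical helper, and the `driver`/`executor` base names are taken from the default output described under basic usage, so the real naming in `run.py` may differ.

```python
from pathlib import PurePosixPath

# Format-to-extension mapping, copied from the table above.
FORMAT_EXTENSIONS = {
    "flamegraph": ".html",
    "jfr": ".jfr",
    "collapsed": ".txt",
    "text": ".txt",
}


def profile_path(out_dir: str, role: str, fmt: str = "flamegraph") -> PurePosixPath:
    """Return the expected output path for a role such as 'driver' or 'executor'."""
    if fmt not in FORMAT_EXTENSIONS:
        known = ", ".join(sorted(FORMAT_EXTENSIONS))
        raise ValueError(f"unknown format {fmt!r}; expected one of: {known}")
    return PurePosixPath(out_dir) / f"{role}{FORMAT_EXTENSIONS[fmt]}"
```

Note that `collapsed` and `text` share the `.txt` extension, so the extension alone does not identify the format of an output file.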
+### Docker usage
+
+The Docker image includes async-profiler pre-installed at
+`/opt/async-profiler`. The `ASYNC_PROFILER_HOME` environment variable is
+already set in the compose files, so no extra configuration is needed:
+
+```shell
+docker compose -f benchmarks/tpc/infra/docker/docker-compose.yml \
+  run --rm bench \
+  python3 /opt/benchmarks/run.py \
+  --engine comet --benchmark tpch --output /results --no-restart --async-profiler
+```
+
+Output files are collected in `$RESULTS_DIR/async-profiler/` on the host.
+
+**Note:** On Linux, `cpu` and `wall` events require the container to run with
+`--privileged` or the `SYS_PTRACE` capability, and the host to have
+`perf_event_paranoid <= 1`. Allocation (`alloc`) and lock (`lock`) events work
+without special privileges.