
Add benchmark configuration output to support posting to DB #239

Open
misiugodfrey wants to merge 26 commits into main from misiug/benchmarkjsonconfig

Conversation

Contributor

@misiugodfrey misiugodfrey commented Feb 16, 2026

This PR consolidates benchmark result reporting, unifies engine/GPU detection across Docker and SLURM environments, and refactors shared test infrastructure into a common module. The goal is to provide an interface for automatically posting benchmark results to the benchmarking DB via the post_results.py script.

Run context collection (run_context.py):
Automatically gathers engine type, GPU model, scale factor, worker count, and num_drivers at benchmark time and writes them into the "context" section of benchmark_result.json, which post_results.py can then parse. E.g.:

  "context": {
    "iterations_count": 5,
    "schema_name": "sf1",
    "scale_factor": 1,
    "n_workers": 1,
    "kind": "single-node",
    "engine": "java",
    "gpu_count": 0,
    "gpu_name": "NA",
    "num_drivers": 32,
    "execution_number": 1,
    "timestamp": "2026-03-04T23:51:21Z",
    "benchmark": "tpch"
  },

Unified engine and GPU detection:
It was necessary to gather information about a running cluster to put into the benchmark context. To that end, some changes were needed to provide the appropriate information at runtime (rather than trying to predict context from things like image names) and to unify how we present this information across the Docker and SLURM scripts.
Engine detection via cluster-tag: All variants (GPU, CPU, Java) now consistently set cluster-tag in the coordinator config. run_context.py queries the Presto /v1/cluster API to determine the engine type, replacing the previous approach of parsing Docker image names or SLURM logs.
GPU model detection via worker logs: Docker worker containers now run nvidia-smi -L at startup and write output to worker_logs/worker_<id>.log, matching the existing SLURM behavior. run_context.py reads these logs uniformly regardless of the deployment environment.
launch_presto_servers.sh rewritten to capture nvidia-smi output and redirect presto_server stdout/stderr to per-worker log files. Mounted as a read-only volume so changes don't require image rebuilds.
worker_logs volume mount added to all worker services (GPU, CPU, Java) in Docker Compose files.

Contributor

@TomAugspurger TomAugspurger left a comment


The output sample you shared looks good to me.

Contributor

@paul-aiyedun paul-aiyedun left a comment


The overall idea makes sense to me. However, I had questions about some of the implementation details, Slurm specific logic, and assumptions being made.

Also, please update the PR title to reflect this update.

coordinator=false
# Worker REST/HTTP port for internal and admin endpoints.
http-server.http.port=8080
http-server.http.port=10000
Contributor


Is this an intentional update?

Contributor Author

@misiugodfrey misiugodfrey Mar 4, 2026


Yes. The worker's default port should not be the same as the coordinator's port. This change makes things clearer, and the defaults stay in sync with both the multi-worker case and the SLURM case (where these values will be overwritten).

Without this change, you can't access the worker node's API for single worker sessions, which can still be necessary to get some information.

Contributor Author


After our discussion, I've reverted this change, although I may need to make further changes once testing can resume on SLURM clusters.

-m, --metrics Collect detailed metrics from Presto REST API after each query.
Metrics are stored in query-specific directories.

ENVIRONMENT:
Contributor


Can we use command line arguments instead of environment variables?

Contributor Author


I've only left the DEBUG option as it would be tedious to pass that through the entire chain of scripts. If we want to change that further, we can, but I think this is cleaner.

Contributor


What is the chain of scripts in this case (I believe run_benchmark.sh is a top-level script)? Besides the debug variable, there is also LOGS below.

Contributor Author


run_benchmark.sh is the top-level script, but the PRESTO_BENCHMARK_DEBUG context is used by the python scripts that are called by run_benchmark.sh.

As for LOGS, this value is supposed to reflect the hard-coded directory structure that the Docker containers mount (although it can be changed in the SLURM context). For now I'm removing this comment because it should not be changeable unless we can also change it in the Docker templates. Easier to just have one hard-coded path for now.

PRESTO_BENCHMARK_DEBUG Set to 1 to print debug logs for worker/engine detection
(e.g. node URIs, reachability, metrics, Docker containers).
Use when engine is misdetected or the run fails.
Docker In Docker setups, engine is inferred from running worker
Contributor


Can you please expand on this?

Contributor Author

@misiugodfrey misiugodfrey Mar 4, 2026


What's happening here is that some of the details of the run are inferred from the name of the image that is running. If presto-native-worker-gpu is running, it infers that this is a velox-gpu run. If Docker is not running, it checks whether it's in a SLURM environment, in which case it uses other techniques based on how the SLURM scripts work.

To clarify, this isn't an environment variable, just a note that the environment in which it runs will affect this path. I can make the comments clearer.

@@ -32,3 +32,12 @@ class BenchmarkKeys(str, Enum):
CONTEXT_KEY = "context"
ITERATIONS_COUNT_KEY = "iterations_count"
SCHEMA_NAME_KEY = "schema_name"
SCALE_FACTOR_KEY = "scale_factor"
Contributor


Can we copy over the metadata.json file from the dataset instead of duplicating some of the details here?

Contributor Author


Do you mean copying the metadata.json file directly, or referencing the data inside it? Right now the scale factor can be overridden in our options when running benchmarks, so we either take the override if provided, or read the scale factor from metadata.json.
I've removed this key since it is no longer used. The code that extracts the scale factor is in run_context.py:get_scale_factor_from_schema()

"benchmark": benchmark_types[0] if len(benchmark_types) == 1 else benchmark_types,
**run_config,
}
with open(f"{bench_output_dir}/benchmark_config.json", "w") as file:
Contributor


The existing JSON file has a "context" field. Can we extend that instead of writing to a new file?

Contributor Author


Sounds good.


def get_gpu_name_from_slurm_logs() -> str | None:
"""
When running under SLURM, workers run nvidia-smi -L and write to LOGS/worker_<id>.log.
Contributor


If there is a need to log the output of nvidia-smi, then we should probably do this consistently (and not just for slurm).

Contributor Author


The current approach gets the GPU details by running nvidia-smi in both contexts; it's just that in the Docker context we run it on the host, whereas in the SLURM context the login nodes don't have nvidia-smi installed, so you need to run it on the worker node (i.e. in the worker's container).

We could potentially consolidate this, but it's more an issue of how the SLURM clusters encapsulate things, which required extra steps in that context.


def get_engine_from_slurm() -> str | None:
"""
Infer engine when running under SLURM from nvidia-smi -L output in LOGS/worker_0.log.
Contributor


Can we standardize how this is done instead of having a slurm specific solution?


def get_gpu_name() -> str | None:
"""
Return GPU model name. Under SLURM, read from LOGS/worker_<id>.log if LOGS is set;
Contributor


Same comment as above.


def get_worker_image() -> str | None:
"""Return worker image name from env (set by cluster/container setup)."""
return os.environ.get("WORKER_IMAGE")
Contributor


Where is this set?

Contributor Author


This is only set in the SLURM scripts. I'm removing this section since it doesn't make much sense to specify a WORKER_IMAGE name when we are detecting the engine/variant through other means.

return os.environ.get("WORKER_IMAGE")


def _current_username() -> str:
Contributor


We plan on adding the ability to run presto with existing images. I don't think we should make assumptions about image/tag names.

Contributor Author


I've changed this to determine the engine/variant through the presto API, so this is all removed now.

@misiugodfrey misiugodfrey changed the title Misiug/benchmarkjsonconfig Add benchmark configuration output to support posting to DB Mar 4, 2026
@misiugodfrey misiugodfrey requested a review from mattgara March 5, 2026 00:34
- ../../config/generated/gpu/etc_worker/config_native.properties:/opt/presto-server/etc/config.properties
- ../../config/generated/gpu/etc_worker/catalog/hive.properties:/opt/presto-server/etc/catalog/hive.properties
- ../../worker_logs:/opt/presto-server/logs
- ../../launch_presto_servers.sh:/opt/launch_presto_servers.sh:ro
Contributor Author


Mounting this file was necessary so that older images can still benefit from the new version.

Contributor

@TomAugspurger TomAugspurger left a comment


The changes to post_results.py look good.

Contributor

@paul-aiyedun paul-aiyedun left a comment


Changes overall look good to me. However, I had a few questions, code cleanup comments, and comments about logging.

└── result_dir
└── benchmark_result.json
└── logs # optional
└── slurm-4575179.out
Contributor


Nit: Should this still be Slurm specific?

SCHEMA_NAME_KEY = "schema_name"
# Run configuration (from run context; written to context in benchmark_result.json)
TIMESTAMP_KEY = "timestamp"
NUM_WORKERS_KEY = "n_workers"
Contributor


Nit: Can this be worker_count (consistent with other keys)?

- ../../config/generated/gpu/etc_worker/node.properties:/opt/presto-server/etc/node.properties
- ../../config/generated/gpu/etc_worker/config_native.properties:/opt/presto-server/etc/config.properties
- ../../config/generated/gpu/etc_worker/catalog/hive.properties:/opt/presto-server/etc/catalog/hive.properties
- ../../worker_logs:/opt/presto-server/logs
Contributor


Is there a reason for limiting this to just worker logs?

- ../../config/generated/gpu/etc_worker_{{ gpu_id }}/node.properties:/opt/presto-server/etc/node.properties
- ../../config/generated/gpu/etc_worker_{{ gpu_id }}/config_native.properties:/opt/presto-server/etc/config.properties
- ../../config/generated/gpu/etc_worker_{{ gpu_id }}/catalog/hive.properties:/opt/presto-server/etc/catalog/hive.properties
- ../../worker_logs:/opt/presto-server/logs
Contributor


Instead of duplicating this update in all the variant docker-compose files, can this be set in docker-compose.common.yml?

Contributor Author


I've moved this to presto-base-volumes so that it is shared now.

pass


def _parse_gpu_name_from_text(line: str) -> str | None:
Contributor


Can we avoid this by directly printing the GPU name i.e. nvidia-smi --query-gpu=name --format=csv,noheader?

Contributor Author


We could do this, but keep in mind that this needs to be run in the container, so the Docker logs would still need to be parsed. With that suggestion it would be less obvious that we are looking at nvidia-smi output (there would be no "GPU N:" anchor), and we would have to assume that the first line gives us this information.
I think it's cleaner to use the nvidia-smi -L output: it's clear what it is, and if it's missing, the parse will fail, rather than assuming the first line of the logs will be a GPU name.

}


def _get_num_drivers(engine: str) -> int | None:
Contributor


Can we also get this from the logs?


if n_workers is not None:
ctx["n_workers"] = n_workers
ctx["kind"] = "single-node" if n_workers == 1 else f"{n_workers}-node"
Contributor


Out of curiosity, what is the purpose of the kind field?

Contributor Author


It is one of the columns specified in the database. I believe the intention is to separate single-node from multi-node runs. With the information we are providing it is a little redundant (since we have n_workers), but it may be more necessary for other platforms.
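The n_workers/kind derivation shown in the diff above can be isolated as a small helper (the function name here is just for illustration):

```python
def kind_from_workers(n_workers: int) -> str:
    """Mirror the context logic: 'single-node' for one worker,
    '<n>-node' otherwise, matching the DB's kind column."""
    return "single-node" if n_workers == 1 else f"{n_workers}-node"
```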

return ctx


def pytest_sessionfinish(session, exitstatus):
Contributor


Is this duplicating logic in pytest_sessionfinish(session, exitstatus)?

@@ -15,27 +15,28 @@
expected directory structure is:

../benchmark-root/
Contributor


What script is creating a directory with this structure?

Contributor Author


This occurs in conftest.py:pytest_sessionfinish(). The top-level script would just be run_benchmark.sh.

Contributor

@paul-aiyedun paul-aiyedun left a comment


I had a couple of comments, but changes look good to me.

if hasattr(session, "benchmark_results"):
benchmark_types = list(session.benchmark_results.keys())
json_result[BenchmarkKeys.CONTEXT_KEY]["benchmark"] = (
benchmark_types[0] if len(benchmark_types) == 1 else benchmark_types
Contributor


Is there a reason not to have the value be consistently a list?

Comment on lines +57 to +67
hostname = session.config.getoption("--hostname")
port = session.config.getoption("--port")
user = session.config.getoption("--user")
schema_name = session.config.getoption("--schema-name")

ctx = gather_run_context(
hostname=hostname,
port=port,
user=user,
schema_name=schema_name,
)
Contributor


These are Presto-specific options. I think we can set run_context in a Presto-specific fixture (similar to benchmark_result_collector(request)).

@misiugodfrey misiugodfrey requested a review from a team as a code owner March 14, 2026 00:47

3 participants