Add LLM benchmarking framework to staging #2405

Closed
kubraaksux wants to merge 4 commits into apache:main from kubraaksux:llm-benchmark

@kubraaksux

Generic LLM benchmark suite for evaluating inference performance across different backends (vLLM, Ollama, OpenAI, MLX).

Features:

  • Multiple workload categories: math (GSM8K), reasoning (BoolQ, LogiQA), summarization (XSum, CNN/DM), JSON extraction
  • Pluggable backend architecture for different inference engines
  • Performance metrics: latency, throughput, memory usage
  • Accuracy evaluation per workload type
  • HTML report generation
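As a rough illustration of the "pluggable backend" and "performance metrics" items above, a minimal Python sketch could look like the following. The names `Backend`, `EchoBackend`, `RunStats`, and `run_benchmark` are hypothetical and are not taken from this PR; a real adapter would wrap vLLM, Ollama, OpenAI, or MLX instead of the stand-in used here, and memory usage is left out.

```python
import statistics
import time
from dataclasses import dataclass
from typing import Protocol


class Backend(Protocol):
    """Hypothetical interface each inference-engine adapter would implement."""

    name: str

    def generate(self, prompt: str) -> str: ...


class EchoBackend:
    """Stand-in backend for illustration only; it performs no real inference."""

    name = "echo"

    def generate(self, prompt: str) -> str:
        return prompt.upper()


@dataclass
class RunStats:
    backend: str
    requests: int
    mean_latency_s: float
    throughput_rps: float


def run_benchmark(backend: Backend, prompts: list[str]) -> RunStats:
    """Send prompts sequentially and record per-request latency and overall throughput."""
    if not prompts:
        raise ValueError("prompts must not be empty")
    latencies = []
    start = time.perf_counter()
    for prompt in prompts:
        t0 = time.perf_counter()
        backend.generate(prompt)
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    return RunStats(
        backend=backend.name,
        requests=len(prompts),
        mean_latency_s=statistics.mean(latencies),
        # Guard against a zero elapsed time on very coarse clocks.
        throughput_rps=len(prompts) / elapsed if elapsed > 0 else float("inf"),
    )
```

Because `run_benchmark` depends only on the `generate` method and the `name` attribute, swapping engines in this sketch would mean passing a different adapter object, with no change to the measurement loop.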

This framework can be used to evaluate SystemDS LLM inference components once they are developed.
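The "accuracy evaluation per workload type" feature could be sketched as a table of scoring functions keyed by workload. Everything below is an assumption made for illustration: the scorer names, the workload keys, and the matching rules are not taken from this PR, and summarization is omitted because it would typically need an overlap metric such as ROUGE.

```python
import json
import re
from typing import Callable


def score_math(prediction: str, reference: str) -> bool:
    """Compare the last number in the prediction with the reference answer."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", prediction.replace(",", ""))
    return bool(numbers) and float(numbers[-1]) == float(reference)


def score_boolean(prediction: str, reference: str) -> bool:
    """Match a yes/no style answer, ignoring case and surrounding whitespace."""
    return prediction.strip().lower() == reference.strip().lower()


def score_json(prediction: str, reference: str) -> bool:
    """Parse both sides as JSON and compare the resulting objects."""
    try:
        return json.loads(prediction) == json.loads(reference)
    except json.JSONDecodeError:
        return False


# Hypothetical workload keys; the PR's actual category names may differ.
SCORERS: dict[str, Callable[[str, str], bool]] = {
    "math": score_math,
    "reasoning": score_boolean,
    "json_extraction": score_json,
}


def accuracy(workload: str, pairs: list[tuple[str, str]]) -> float:
    """Return the fraction of (prediction, reference) pairs scored as correct."""
    scorer = SCORERS[workload]
    correct = sum(scorer(pred, ref) for pred, ref in pairs)
    return correct / len(pairs)
```

Keeping one scorer per workload means a math answer is judged by its final number while a JSON answer is judged structurally, so key order and whitespace do not affect the JSON result.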

- Connection.java: Changed loadModel(modelName) to loadModel(modelName, workerScriptPath)
- Connection.java: Removed findPythonScript() method
- LLMCallback.java: Added Javadoc for generate() method
- JMLCLLMInferenceTest.java: Updated to pass script path to loadModel()
@kubraaksux kubraaksux closed this Feb 13, 2026
@kubraaksux kubraaksux deleted the llm-benchmark branch February 13, 2026 16:51
@github-project-automation github-project-automation bot moved this from In Progress to Done in SystemDS PR Queue Feb 13, 2026