kaito benchmark blog #5497
base: master
Pull request overview
This pull request introduces a comprehensive blog post about benchmarking KAITO's Retrieval-Augmented Generation (RAG) service on Azure Kubernetes Service (AKS). The post presents detailed performance comparisons between RAG and baseline LLM approaches across two distinct use cases: document question answering and code modification tasks.
Key Changes
- Added new "benchmarking" tag to the blog taxonomy for performance testing content
- Added new author "Bangqi Zhu" to the authors registry
- Published detailed technical blog post with benchmark methodologies, results, and insights demonstrating RAG's 68% improvement on document Q&A and 20% improvement on code modification tasks
Reviewed changes
Copilot reviewed 3 out of 7 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| website/blog/tags.yml | Added new "benchmarking" tag with label, permalink, and description for performance testing content |
| website/blog/authors.yml | Added author profile for Bangqi Zhu including LinkedIn, GitHub links, and profile image |
| website/blog/2025-12-08-kaito-rag-benchmarking/index.md | Comprehensive blog post covering RAG benchmarking methodology, results showing significant performance improvements, detailed implementation guides, and best practices for running benchmarks on AKS |
| <!-- truncate --> | ||
|
|
||
| ## Why Benchmark RAG? |
Copilot
AI
Dec 8, 2025
Missing hero image after the truncate marker. According to the blog post guidelines, a hero image should be placed immediately after <!-- truncate --> using the pattern . The required structure is:
<!-- truncate -->

## First Section
Consider adding a hero image that visually represents the blog post's main topic (benchmarking KAITO RAG).
| <!-- truncate --> | ||
|
|
||
| ## Why Benchmark RAG? |
Copilot
AI
Dec 8, 2025
Missing hero image after the truncate marker. According to the blog post guidelines, "Hero image: Use ./hero-image.png for same-directory assets" should be placed immediately after <!-- truncate -->. The pattern should be:
<!-- truncate -->

## Section 1: Problem/Context
Currently, the post jumps directly to section content without a hero image.
Signed-off-by: Bangqi Zhu <bangqizhu@microsoft.com>
a19f855 to
80e993c
|
|
||
| - **How much does RAG improve answer quality?** Traditional LLMs rely solely on pre-trained knowledge, which can be outdated or incomplete for domain-specific queries. | ||
| - **Is RAG cost-effective?** Token usage directly impacts operational costs at scale. | ||
| - **Where does RAG struggle?** Understanding failure modes guides system improvements. |
| - **Where does RAG struggle?** Understanding failure modes guides system improvements. | |
| - **Where does RAG experience bottlenecks?** Understanding different failure modes guides system improvements. |
|
|
||
| Let's dive into each benchmark and its results. | ||
|
|
||
| --- |
| --- |
|
|
||
| Our comprehensive benchmarking reveals nuanced insights about RAG performance: | ||
|
|
||
| ✅ **RAG Excels At:** |
| ✅ **RAG Excels At:** | |
| **RAG Excels At:** |
| - Automatic context retrieval with high precision | ||
| - Reducing hallucination on domain-specific queries | ||
|
|
||
| 💡 **Optimization Insights:** |
| 💡 **Optimization Insights:** | |
| **Optimization Insights:** |
| --output comparison_report.json | ||
| ``` | ||
|
|
||
| --- |
| --- |
| 4. **Better coverage**: RAG considers all indexed files, not just obvious candidates | ||
|
|
||
| :::tip Benchmark Validation | ||
| RAG's 60% success validates the TOP-4 filtering approach! This proves that: |
| RAG's 60% success validates the TOP-4 filtering approach! This proves that: | |
| RAG's 60% success validates the TOP-4 filtering approach, demonstrating that: |
| Higher token usage with RAG is expected since we include retrieved context. The trade-off between accuracy gain and cost must be evaluated for your specific use case. | ||
| ::: | ||
|
|
||
| --- |
| --- |
| --rag-url http://localhost:5000 \ | ||
| --llm-url http://your-llm-api.com \ | ||
| --judge-url http://your-llm-api.com \ | ||
| --llm-model "deepseek-v3.1" \ | ||
| --judge-model "deepseek-v3.1" \ |
Can you expand on the context of these inputs and how they're used for benchmarking against the chosen LLM?
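One plausible reading of the flags in the quoted command, offered as an assumption rather than a description of the actual benchmark script: `--rag-url` points at the RAG service, `--llm-url` at the baseline LLM that answers without retrieval, and `--judge-url` at the LLM that scores both answers. The sketch below only mirrors the five flag names shown in the hunk; the parser, the help strings, and the roles they describe are hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags quoted in the diff hunk."""
    parser = argparse.ArgumentParser(
        description="Illustrative RAG-vs-baseline benchmark arguments (assumed roles)"
    )
    # Assumed role: endpoint of the RAG service under test.
    parser.add_argument("--rag-url", required=True)
    # Assumed role: endpoint of the baseline LLM queried without retrieval.
    parser.add_argument("--llm-url", required=True)
    # Assumed role: endpoint of the LLM that grades both answers.
    parser.add_argument("--judge-url", required=True)
    # Model names sent to the baseline and judge endpoints, respectively.
    parser.add_argument("--llm-model", required=True)
    parser.add_argument("--judge-model", required=True)
    return parser


# Values copied from the quoted command.
args = build_parser().parse_args([
    "--rag-url", "http://localhost:5000",
    "--llm-url", "http://your-llm-api.com",
    "--judge-url", "http://your-llm-api.com",
    "--llm-model", "deepseek-v3.1",
    "--judge-model", "deepseek-v3.1",
])
print(vars(args))
```

In the quoted command the baseline and the judge share the same URL and model, so under this reading the same model both answers and grades; the post could state whether that is intentional.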
sdesai345
left a comment
Included feedback
Pull request overview
Copilot reviewed 3 out of 7 changed files in this pull request and generated 5 comments.
|
|
||
| **Architecture Flow:** | ||
|
|
||
|  |
Copilot
AI
Dec 10, 2025
The image reference is missing descriptive alt text. According to the blog post guidelines, all images MUST have descriptive alt text for accessibility and SEO. The alt text should describe what the image shows, not just repeat the caption.
|  | |
|  |
| | **Overall Average** | 8.15/10 | 4.85/10 | **+68%** | | ||
| | **Token Usage** | Variable | Baseline | Context-dependent | | ||
|
|
||
|  |
Copilot
AI
Dec 10, 2025
The image reference is missing descriptive alt text. According to the blog post guidelines, all images MUST have descriptive alt text for accessibility and SEO. The alt text should describe what the image shows, not just repeat the caption.
|  | |
|  |
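As a sanity check on the results row quoted in this hunk, the +68% figure is consistent with a relative improvement of the first score (8.15, assumed to be the RAG average) over the second (4.85, assumed to be the baseline average). A minimal sketch; the function name is hypothetical and not taken from the benchmark code.

```python
def relative_improvement(new_score: float, baseline_score: float) -> float:
    """Return the percentage gain of new_score over baseline_score."""
    if baseline_score == 0:
        raise ValueError("baseline_score must be non-zero")
    return (new_score - baseline_score) / baseline_score * 100


# Overall averages from the quoted table row (column roles are assumed).
gain = relative_improvement(8.15, 4.85)
print(f"{gain:.1f}%")
```

The gain is computed relative to the baseline, not as a difference in points on the 10-point scale, which would be 3.3.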
|
|
||
| ### Architecture Flow | ||
|
|
||
|  |
Copilot
AI
Dec 10, 2025
The image reference is missing descriptive alt text. According to the blog post guidelines, all images MUST have descriptive alt text for accessibility and SEO. The alt text should describe what the image shows, not just repeat the caption.
|  | |
|  |
| ✓ Tests passed (3/5 issues succeed, 2/5 fail) | ||
| ``` | ||
|
|
||
|  |
Copilot
AI
Dec 10, 2025
The image reference is missing descriptive alt text. According to the blog post guidelines, all images MUST have descriptive alt text for accessibility and SEO. The alt text should describe what the image shows, not just repeat the caption.
|  | |
|  |
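The alt-text comments above all flag the same pattern, so a small pre-review check could catch it. The sketch below assumes the offending images are written in Markdown with an empty or whitespace-only alt field (for example `![](./diagram.png)`); the function name and sample text are hypothetical.

```python
import re

# Markdown image whose alt text is empty or whitespace-only.
# Group 1 captures the image path; an optional title after it is skipped.
EMPTY_ALT = re.compile(r"!\[\s*\]\(([^)\s]+)[^)]*\)")


def find_images_missing_alt(markdown: str) -> list[tuple[int, str]]:
    """Return (line number, image path) for every image lacking alt text."""
    hits = []
    for lineno, line in enumerate(markdown.splitlines(), start=1):
        for match in EMPTY_ALT.finditer(line):
            hits.append((lineno, match.group(1)))
    return hits


sample = "![](./a.png)\n![Diagram of the request flow](./b.png)\n"
for lineno, path in find_images_missing_alt(sample):
    print(f"line {lineno}: {path} has no alt text")
```

This only detects missing alt text. Whether existing alt text is descriptive, as the guideline requires, still needs a human reviewer.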
| --create-namespace | ||
| ``` | ||
|
|
||
| 1. **Index Your Content**: |
Copilot
AI
Dec 10, 2025
The numbered list formatting is inconsistent. Item 2 "RAG Engine Deployed" should be numbered as "2." to continue the sequence. However, there's a list item "1. Index Your Content" starting at line 297, which appears to be a third item but is numbered as 1. Please renumber this as "3." to maintain proper list sequence.
| 1. **Index Your Content**: | |
| 3. **Index Your Content**: |
|
|
||
| ### Prerequisites | ||
|
|
||
| 1. **AKS Cluster with KAITO**: Follow [KAITO installation guide](https://kaito-project.github.io/kaito/docs/installation) |
Why is this referencing OSS installation instructions and not the KAITO add-on?
|
|
||
| --- | ||
|
|
||
| ## Key Takeaways |
Please remove the emojis. This screams "AI generated" at me :)
| - **Document Benchmark**: [`rag_benchmark_docs/`](https://github.com/kaito-project/kaito/tree/main/rag_benchmark_docs) | ||
| - Quick start: [`RAG_BENCHMARK_DOCS_README.md`](https://github.com/kaito-project/kaito/blob/main/rag_benchmark_docs/RAG_BENCHMARK_DOCS_README.md) | ||
| - Complete guide: [`RAG_BENCHMARK_DOCS_GUIDE.md`](https://github.com/kaito-project/kaito/blob/main/rag_benchmark_docs/RAG_BENCHMARK_DOCS_GUIDE.md) | ||
|
|
||
| - **Code Benchmark**: [`code_benchmark/`](https://github.com/kaito-project/kaito/pull/1678) (PR pending merge) | ||
| - Quick start: [`GETTING_STARTED.md`](https://github.com/kaito-project/kaito/pull/1678/files#diff-d5b183b0a8f37a07a826b64ccfa966be89d3c80c948265bd66be8c53f7dd4f00) | ||
| - Complete guide: [`CODE_BENCHMARK_GUIDE.md`](https://github.com/kaito-project/kaito/pull/1678/files#diff-9a5ff0d2cd3c7b140aab1d0c9a6f4bfb0f3c91bf0e55fd31b57669289958056c) |
These are not really descriptive link text.
No description provided.