@bangqipropel
No description provided.

Contributor

Copilot AI left a comment

Pull request overview

This pull request introduces a comprehensive blog post about benchmarking KAITO's Retrieval-Augmented Generation (RAG) service on Azure Kubernetes Service (AKS). The post presents detailed performance comparisons between RAG and baseline LLM approaches across two distinct use cases: document question answering and code modification tasks.

Key Changes

  • Added new "benchmarking" tag to the blog taxonomy for performance testing content
  • Added new author "Bangqi Zhu" to the authors registry
  • Published detailed technical blog post with benchmark methodologies, results, and insights demonstrating RAG's 68% improvement on document Q&A and 20% improvement on code modification tasks

Reviewed changes

Copilot reviewed 3 out of 7 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| website/blog/tags.yml | Added new "benchmarking" tag with label, permalink, and description for performance testing content |
| website/blog/authors.yml | Added author profile for Bangqi Zhu including LinkedIn, GitHub links, and profile image |
| website/blog/2025-12-08-kaito-rag-benchmarking/index.md | Comprehensive blog post covering RAG benchmarking methodology, results showing significant performance improvements, detailed implementation guides, and best practices for running benchmarks on AKS |

Comment on lines +18 to +20
<!-- truncate -->

## Why Benchmark RAG?

Copilot AI Dec 8, 2025

Missing hero image after the truncate marker. According to the blog post guidelines, a hero image should be placed immediately after <!-- truncate --> using the pattern ![Hero Image](./hero-image.png). The required structure is:

<!-- truncate -->

![Hero Image Description](./hero-image.png)

## First Section

Consider adding a hero image that visually represents the blog post's main topic (benchmarking KAITO RAG).

Copilot generated this review using guidance from repository custom instructions.
Comment on lines +18 to +20
<!-- truncate -->

## Why Benchmark RAG?

Copilot AI Dec 8, 2025

Missing hero image after the truncate marker. According to the blog post guidelines, "Hero image: Use ./hero-image.png for same-directory assets" should be placed immediately after <!-- truncate -->. The pattern should be:

<!-- truncate -->

![Hero Image](./hero-image.png)

## Section 1: Problem/Context

Currently, the post jumps directly to section content without a hero image.

Copilot generated this review using guidance from repository custom instructions.
Signed-off-by: Bangqi Zhu <bangqizhu@microsoft.com>

- **How much does RAG improve answer quality?** Traditional LLMs rely solely on pre-trained knowledge, which can be outdated or incomplete for domain-specific queries.
- **Is RAG cost-effective?** Token usage directly impacts operational costs at scale.
- **Where does RAG struggle?** Understanding failure modes guides system improvements.
Contributor

Suggested change
- **Where does RAG struggle?** Understanding failure modes guides system improvements.
- **Where does RAG experience bottlenecks?** Understanding different failure modes guides system improvements.


Let's dive into each benchmark and its results.

---
Contributor

Suggested change
---


Our comprehensive benchmarking reveals nuanced insights about RAG performance:

✅ **RAG Excels At:**
Contributor

Suggested change
✅ **RAG Excels At:**
**RAG Excels At:**

- Automatic context retrieval with high precision
- Reducing hallucination on domain-specific queries

💡 **Optimization Insights:**
Contributor

Suggested change
💡 **Optimization Insights:**
**Optimization Insights:**

--output comparison_report.json
```

---
Contributor

Suggested change
---

4. **Better coverage**: RAG considers all indexed files, not just obvious candidates

:::tip Benchmark Validation
RAG's 60% success validates the TOP-4 filtering approach! This proves that:
Contributor

Suggested change
RAG's 60% success validates the TOP-4 filtering approach! This proves that:
RAG's 60% success validates the TOP-4 filtering approach, demonstrating that:

Higher token usage with RAG is expected since we include retrieved context. The trade-off between accuracy gain and cost must be evaluated for your specific use case.
:::

---
Contributor

Suggested change
---

Comment on lines +101 to +105
--rag-url http://localhost:5000 \
--llm-url http://your-llm-api.com \
--judge-url http://your-llm-api.com \
--llm-model "deepseek-v3.1" \
--judge-model "deepseek-v3.1" \
Contributor

Can you expand on the context of these inputs and how they're used for benchmarking against the chosen LLM?
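
As context for the question above, here is a minimal sketch of how flags like these might be parsed, assuming the benchmark script is a Python CLI built on `argparse`. The flag names and example values come from the quoted diff. The parser itself, and the reading of each flag (RAG service endpoint, baseline LLM endpoint, LLM judge endpoint), are assumptions and not the PR's actual implementation:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the five flags in the quoted diff."""
    parser = argparse.ArgumentParser(
        description="RAG vs. baseline LLM benchmark (illustrative sketch)"
    )
    # Assumed: endpoint of the RAG service under test.
    parser.add_argument("--rag-url", required=True)
    # Assumed: endpoint of the baseline LLM that answers without retrieval.
    parser.add_argument("--llm-url", required=True)
    # Assumed: endpoint of the LLM that scores both sets of answers.
    parser.add_argument("--judge-url", required=True)
    # Assumed: model names sent to the baseline and judge endpoints.
    parser.add_argument("--llm-model", required=True)
    parser.add_argument("--judge-model", required=True)
    return parser


# Parse the same values that appear in the quoted diff.
args = build_parser().parse_args([
    "--rag-url", "http://localhost:5000",
    "--llm-url", "http://your-llm-api.com",
    "--judge-url", "http://your-llm-api.com",
    "--llm-model", "deepseek-v3.1",
    "--judge-model", "deepseek-v3.1",
])
print(args.rag_url, args.judge_model)
```

Under this reading, the baseline and the judge can point at the same endpoint and model, as they do in the quoted lines, while the RAG service is addressed separately.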

Contributor

@sdesai345 sdesai345 left a comment

Included feedback

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 7 changed files in this pull request and generated 5 comments.


**Architecture Flow:**

![Document Q&A Benchmark Flow](./document-benchmark-flow.png)

Copilot AI Dec 10, 2025

The image reference is missing descriptive alt text. According to the blog post guidelines, all images MUST have descriptive alt text for accessibility and SEO. The alt text should describe what the image shows, not just repeat the caption.

Suggested change
![Document Q&A Benchmark Flow](./document-benchmark-flow.png)
![Flow diagram showing the Document Q&A Benchmark process: documents are indexed, test questions are generated, both RAG and baseline LLM answer the questions, an LLM judge scores the answers, and results are analyzed.](./document-benchmark-flow.png)

Copilot uses AI. Check for mistakes.
| **Overall Average** | 8.15/10 | 4.85/10 | **+68%** |
| **Token Usage** | Variable | Baseline | Context-dependent |
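
The +68% in the Overall Average row is consistent with a relative improvement over the baseline score: (8.15 - 4.85) / 4.85 ≈ 0.68. A minimal sketch of that arithmetic, assuming the percentage is computed relative to the pure-LLM score (the quoted rows do not state the formula, and `relative_improvement` is a hypothetical helper name):

```python
def relative_improvement(rag_score: float, baseline_score: float) -> float:
    """Return the percentage gain of rag_score over baseline_score."""
    if baseline_score == 0:
        raise ValueError("baseline_score must be non-zero")
    return (rag_score - baseline_score) / baseline_score * 100.0


# Scores taken from the Overall Average row above.
gain = relative_improvement(8.15, 4.85)
print(f"{gain:+.0f}%")  # prints +68%
```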

![Performance Comparison: Document Q&A vs Code Modification](./performance-comparison.png)

Copilot AI Dec 10, 2025

The image reference is missing descriptive alt text. According to the blog post guidelines, all images MUST have descriptive alt text for accessibility and SEO. The alt text should describe what the image shows, not just repeat the caption.

Suggested change
![Performance Comparison: Document Q&A vs Code Modification](./performance-comparison.png)
![Bar chart comparing RAG and pure LLM performance scores for closed and open document Q&A questions, showing RAG significantly outperforming pure LLM, especially on closed questions.](./performance-comparison.png)

### Architecture Flow

![Code Modification Benchmark Flow](./code-benchmark-flow.png)

Copilot AI Dec 10, 2025

The image reference is missing descriptive alt text. According to the blog post guidelines, all images MUST have descriptive alt text for accessibility and SEO. The alt text should describe what the image shows, not just repeat the caption.

Suggested change
![Code Modification Benchmark Flow](./code-benchmark-flow.png)
![Diagram showing the KAITO RAG code modification benchmark flow, including user query, document retrieval, top-4 file selection, LLM response generation, and evaluation steps](./code-benchmark-flow.png)

✓ Tests passed (3/5 issues succeed, 2/5 fail)
```

![TOP-4 Relevance Filtering Process](./top4-filtering-diagram.png)

Copilot AI Dec 10, 2025

The image reference is missing descriptive alt text. According to the blog post guidelines, all images MUST have descriptive alt text for accessibility and SEO. The alt text should describe what the image shows, not just repeat the caption.

Suggested change
![TOP-4 Relevance Filtering Process](./top4-filtering-diagram.png)
![Diagram showing how RAG filters retrieved files by relevance score and selects the top 4 files for code modification tasks](./top4-filtering-diagram.png)

--create-namespace
```

1. **Index Your Content**:

Copilot AI Dec 10, 2025

The numbered list formatting is inconsistent. Item 2 "RAG Engine Deployed" should be numbered as "2." to continue the sequence. However, there's a list item "1. Index Your Content" starting at line 297, which appears to be a third item but is numbered as 1. Please renumber this as "3." to maintain proper list sequence.

Suggested change
1. **Index Your Content**:
3. **Index Your Content**:

### Prerequisites

1. **AKS Cluster with KAITO**: Follow [KAITO installation guide](https://kaito-project.github.io/kaito/docs/installation)
Contributor

Why is this referencing OSS installation instructions and not the KAITO add-on?


---

## Key Takeaways
Contributor

Please remove the emojis. This screams "AI generated" at me :)

Comment on lines +371 to +377
- **Document Benchmark**: [`rag_benchmark_docs/`](https://github.com/kaito-project/kaito/tree/main/rag_benchmark_docs)
  - Quick start: [`RAG_BENCHMARK_DOCS_README.md`](https://github.com/kaito-project/kaito/blob/main/rag_benchmark_docs/RAG_BENCHMARK_DOCS_README.md)
  - Complete guide: [`RAG_BENCHMARK_DOCS_GUIDE.md`](https://github.com/kaito-project/kaito/blob/main/rag_benchmark_docs/RAG_BENCHMARK_DOCS_GUIDE.md)

- **Code Benchmark**: [`code_benchmark/`](https://github.com/kaito-project/kaito/pull/1678) (PR pending merge)
  - Quick start: [`GETTING_STARTED.md`](https://github.com/kaito-project/kaito/pull/1678/files#diff-d5b183b0a8f37a07a826b64ccfa966be89d3c80c948265bd66be8c53f7dd4f00)
  - Complete guide: [`CODE_BENCHMARK_GUIDE.md`](https://github.com/kaito-project/kaito/pull/1678/files#diff-9a5ff0d2cd3c7b140aab1d0c9a6f4bfb0f3c91bf0e55fd31b57669289958056c)
Contributor

This link text is not really descriptive.

4 participants