docs: add MCP tool limitations, report_feedback example, and description quality guidelines#112
Merged
firstdata-dev merged 7 commits intomainfrom Mar 31, 2026
Merged
Conversation
…ion quality guidelines ## What this PR does Adds comprehensive Limitations documentation for all 5 MCP tools based on verified testing and schema analysis. Also adds the missing Example for report_feedback (the only tool without one) and establishes a 6-dimension description quality checklist for future tool additions. ## Changes ### SKILL.md — MCP Tools Reference (new section) - Common Limitations: authentication, daily quota, network dependency - search_source: 200 max results, keyword substring matching behavior, space-in-keyword pitfall, domain substring matching, no boolean operators - get_source: silent error behavior (isError:false with error objects), recommended batch size - ask_agent: query constraints, non-idempotent, 2-8s response time, web_search trigger warning - get_access_guide: incomplete instruction coverage, 3-20s response time, operation specificity requirement - report_feedback: message length, non-idempotent, two usage examples (broken link + outdated content) ### Description Quality Guidelines (new section) - Core principle: 'Write it right before writing it all' - 6-dimension checklist for PR review ### mcp-tool-descriptions-draft.md (new file) - Server-side description text ready to paste into Python code - Verification evidence table with test results and schema references ## Verification Evidence Every limitation is backed by schema analysis or live testing: - search_source limit 200: inputSchema maximum:200 - Keywords not auto-tokenized: tested ['中国 GDP']→0, ['中国','GDP']→173 - get_source silent error: tested invalid ID returns error object, isError:false - ask_agent timing: 3 runs measured 1.8s, 2.9s, 7.4s - get_access_guide timing: 3 runs measured 3.0s, 17.6s, 19.1s - Token quota: TokenVerifyResponse schema has quota_allowed/remaining_daily - Trial quota 30/day: verified via /api/trial/session-info ## 6-Dimension Self-Assessment (post-change) | Dimension | search_source | get_source | ask_agent | get_access_guide | report_feedback | |-----------|:---:|:---:|:---:|:---:|:---:| | Purpose | ✅ | ✅ | ✅ | ✅ | ✅ | | Guidelines | ✅ | ✅ | ✅ | ✅ | ✅ | | Examples | ✅ | ✅ | ✅ | ✅ | ✅ (NEW) | | Limitations | ✅ (NEW) | ✅ (NEW) | ✅ (NEW) | ✅ (NEW) | ✅ (NEW) | | Parameters | ✅ | ✅ | ✅ | ✅ | ✅ | | Return Format | ✅ | ✅ | ✅ | ✅ | ✅ | Target: 5/5 tools × 6/6 dimensions = 30/30 ✅ Refs: MCP Search Quality Research #5, arXiv 2602.14878, arXiv 2602.18914
mingcha-dev
reviewed
Mar 31, 2026
Address review feedback:
1. Keyword space behavior: reworded from restrictive ('NOT auto-tokenized')
to guiding ('pass each term as a separate array element'), with 'New Zealand'
design rationale per 明鉴's suggestion
2. Token quota: added explicit note that no client-facing API exists to query
remaining quota at runtime, per 明鉴's question
firstdata-dev
commented
Mar 31, 2026
Collaborator
Author
firstdata-dev
left a comment
There was a problem hiding this comment.
✅ LGTM. MCP 工具文档质量提升——5 个工具全部补充 Limitations,report_feedback 补了 Example,6 维度自评 30/30。
基于真实测试数据(schema 分析 + API 调用验证),文档质量很扎实。建议合并。
Draft had shortened versions of the examples; now both files have identical text as required by the draft file's own header.
1. Examples: unified to short version per review (server-side descriptions should be concise) 2. Quota: replaced 'no client-facing API' with actual mechanism — Token verification API (POST /api/token/verify) returns remaining_daily, but this is a separate HTTP call, not available via MCP tool invocation
Critical fix: - Multiple keywords use OR logic, NOT AND. Verified: GDP=100, health=78, GDP+health=138 (>max → OR) trade=123, agriculture=45, trade+agriculture=131 (>max → OR) - Draft header: 'must remain identical' → 'condensed from SKILL.md, semantics must match' - Added OR logic verification to evidence table
Per 明鉴 review: Agent needs response time info for all tools, not just the slow ones, to make informed tool selection decisions.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What this PR does
Adds comprehensive Limitations documentation for all 5 MCP tools based on verified testing and schema analysis. Also adds the missing Example for
report_feedback(the only tool without one) and establishes a 6-dimension description quality checklist for future tool additions.Background: MCP Search Quality Research #5 found 97.1% of MCP tool descriptions contain at least one 'smell'. FirstData scored well on Purpose/Guidelines/Examples but had a systematic gap in Limitations (0.5/5). This PR fixes that.
Changes
SKILL.md — MCP Tools Reference (new section)
isError:falsewith error objects), recommended batch size ≤20Description Quality Guidelines (new section)
mcp-tool-descriptions-draft.md (new file)
Verification Evidence
Every limitation is backed by schema analysis or live API testing:
maximum: 200["中国 GDP"]→ 0 results;["中国", "GDP"]→ 173 results["中国GDP"]→ 1 result;["GDP"]→ 100 results{"error":"Not found"}withisError: falsequota_allowed,remaining_daily/api/trial/session-info:total_calls: 306-Dimension Self-Assessment (post-change)
Target: 30/30 ✅
Deployment Note
This PR updates documentation only (SKILL.md). The server-side MCP tool descriptions need a separate deployment — the exact text is provided in
mcp-tool-descriptions-draft.mdfor copy-paste into server code after review approval.Refs: arXiv 2602.14878, arXiv 2602.18914