From 82a0fcf02504da3430ecda78fb1701d84835e4bb Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E6=98=93=E5=A4=A9=E8=8E=B2?= <91449279+yitianlian@users.noreply.github.com>
Date: Mon, 9 Mar 2026 20:45:26 +0800
Subject: [PATCH] Add codebase and dataset research rules for skills

---
 .agents/skills/deep-research/SKILL.md         |   2 +
 .../codebase-and-data-research-rules.md       | 126 ++++++++++++++++++
 .../references/output-template.md             |  10 ++
 .agents/skills/research-plan/SKILL.md         |   2 +
 .../codebase-and-data-research-rules.md       | 103 ++++++++++++++
 .../references/research-plan-template.md      |   6 +
 6 files changed, 249 insertions(+)
 create mode 100644 .agents/skills/deep-research/references/codebase-and-data-research-rules.md
 create mode 100644 .agents/skills/research-plan/references/codebase-and-data-research-rules.md

diff --git a/.agents/skills/deep-research/SKILL.md b/.agents/skills/deep-research/SKILL.md
index 1da6f64..ebe5e08 100644
--- a/.agents/skills/deep-research/SKILL.md
+++ b/.agents/skills/deep-research/SKILL.md
@@ -46,6 +46,8 @@ Iterate until evidence quality is sufficient:
 8. Run contradiction/counter-evidence checks.
 9. Synthesize and produce final report.
 
+When the topic has implementation, benchmark, reproduction, or planning implications, also apply [references/codebase-and-data-research-rules.md](references/codebase-and-data-research-rules.md).
+
 ## Re-entry Policy (Mid-Run)
 
 When called during an ongoing run (not only at run start):
diff --git a/.agents/skills/deep-research/references/codebase-and-data-research-rules.md b/.agents/skills/deep-research/references/codebase-and-data-research-rules.md
new file mode 100644
index 0000000..9cac32c
--- /dev/null
+++ b/.agents/skills/deep-research/references/codebase-and-data-research-rules.md
@@ -0,0 +1,126 @@
+# Codebase And Data Research Rules
+
+Use this reference when the research topic has an implementation, benchmark, reproduction, or planning component.
+
+## 1. Goal
+
+Convert vague "look at the repo/data" work into a traceable evidence pass:
+
+1. identify the most relevant codebases
+2. identify the most relevant datasets or workloads
+3. determine whether the target plan should align with prior work or intentionally deviate
+4. record evidence and uncertainty explicitly
+
+## 2. Codebase Search Order
+
+Search in this order unless the user gives a stricter constraint:
+
+1. local target repo or currently checked out project
+2. official implementation linked by the paper, project page, or organization
+3. author-maintained GitHub repositories
+4. widely used third-party reproductions only if official code is missing or incomplete
+
+Do not treat arbitrary GitHub repos as evidence equivalent to official implementations.
+
+## 3. Codebase Search Rules
+
+For each high-relevance method or paper, check whether an open-source codebase exists.
+
+Minimum checks:
+
+1. repository URL
+2. owner type: official org, author, or third party
+3. activity signals: last commit date, issues, stars (stars are only a weak signal)
+4. reproducibility signals: environment file, training script, eval script, config examples, checkpoints, README completeness
+5. scope match: full training, inference only, eval only, or partial reproduction
+6. license and usage constraints
+
+If the work is highly relevant and no code is open-sourced, record that absence explicitly instead of silently skipping it.
+
+## 4. What To Extract From A Relevant Repo
+
+When a codebase is relevant, extract only the implementation facts needed for reasoning:
+
+1. entry points for training, evaluation, and inference
+2. config system and major knobs
+3. data manifest or preprocessing path
+4. benchmark or metric implementation
+5. dependency and runtime assumptions
+6. checkpoint availability
+7. any mismatch between the paper claim and the released code
+
+Prefer concrete file paths, script names, and config names over generic summaries.
+
+## 5. Dataset And Workload Search Rules
+
+For each candidate dataset, workload, trace, or benchmark, record:
+
+1. what prior work used it
+2. whether it is the main comparison target in the literature
+3. task definition and label space
+4. train/validation/test split policy
+5. scale, domain, and freshness
+6. license, access, and usage restrictions
+7. known contamination, leakage, or annotation-quality concerns
+8. preprocessing conventions used by the most relevant prior work
+
+Do not choose datasets only because they are easy to download.
+
+## 6. Dataset Alignment Decision Rule
+
+Default rule:
+
+1. if the task claims improvement over prior work, align first with the datasets or workloads used by the strongest relevant baselines
+2. if the literature has no stable comparison set, propose a primary benchmark set and explain why
+3. if the target project serves a different domain or product requirement, keep one literature-aligned benchmark and add one target-domain benchmark
+
+You must state which of the following applies:
+
+1. `fully aligned with prior work`
+2. `partially aligned with one added domain-specific benchmark`
+3. `intentionally different from prior work`
+
+If intentionally different, explain what comparability is lost and what practical validity is gained.
+
+## 7. Comparison Set Construction
+
+When planning evaluations, define the comparison set in this order:
+
+1. strongest official or canonical baseline from the literature
+2. strongest practical open-source baseline that can actually be run
+3. target-project incumbent or current production baseline when applicable
+4. ablations of the proposed method
+
+If the canonical baseline cannot be reproduced, say why and choose the closest defensible substitute.
+
+## 8. Evidence Quality Rules
+
+Treat evidence tiers in this order:
+
+1. paper plus official repo plus released configs or checkpoints
+2. paper plus author repo
+3. paper only
+4. third-party reproduction
+5. blog posts, tweets, or issue comments as weak supporting evidence only
+
+Do not present tier 4 or 5 evidence as if it were definitive.
+
+## 9. Required Output When This Reference Applies
+
+Include a compact `Codebase and Data Audit` block in the research output:
+
+1. target repo inspected or not
+2. related official repos found or not
+3. strongest runnable baseline repo
+4. dataset/workload alignment choice
+5. main reproducibility risks
+6. unresolved gaps
+
+## 10. Failure And Gap Handling
+
+If repository or dataset evidence is incomplete:
+
+1. say what was searched
+2. say what was not found
+3. state whether the gap changes the recommendation materially
+4. provide the least risky fallback
diff --git a/.agents/skills/deep-research/references/output-template.md b/.agents/skills/deep-research/references/output-template.md
index ccb8500..acc0cb0 100644
--- a/.agents/skills/deep-research/references/output-template.md
+++ b/.agents/skills/deep-research/references/output-template.md
@@ -46,6 +46,15 @@ Then add short narrative:
 - For `conflict-resolution`: dispute map and source-tier arbitration.
 - For `idea-exploration`: landscape and opportunity boundaries.
 
+## Codebase and Data Audit (When Applicable)
+
+- target repo inspected or not
+- related official repos found or not
+- strongest runnable baseline repo
+- dataset/workload alignment choice
+- main reproducibility risks
+- unresolved gaps
+
 ## Research Trail Summary
 
 - queries_run=
@@ -101,3 +110,4 @@ Rules:
 4. Match output language to user language.
 5. If topic is paper-centric, do not skip `Key Works Deep Dive`.
 6. If degradation is used, include explicit degradation metadata and `Degrade Log`.
+7. If `references/codebase-and-data-research-rules.md` applies, do not skip `Codebase and Data Audit`.
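[Editor's note, not part of the patch: the evidence tiers and the required `Codebase and Data Audit` block in the deep-research rules above can be sketched as a small structured record. This is an illustrative sketch only; the function and field names are assumptions, and only the tier ordering and field list come from the rules themselves.]

```python
from dataclasses import dataclass, field

# Evidence tiers from "Evidence Quality Rules", strongest first.
# Index 0 corresponds to tier 1.
EVIDENCE_TIERS = [
    "paper + official repo + released configs or checkpoints",  # tier 1
    "paper + author repo",                                      # tier 2
    "paper only",                                               # tier 3
    "third-party reproduction",                                 # tier 4
    "blog posts, tweets, or issue comments",                    # tier 5
]


def is_definitive(tier: int) -> bool:
    """Tiers 4 and 5 may support a claim but must never be presented as definitive."""
    return 1 <= tier <= 3


@dataclass
class CodebaseAndDataAudit:
    """Compact audit block required by "Required Output When This Reference Applies"."""
    target_repo_inspected: bool
    official_repos_found: bool
    strongest_runnable_baseline: str
    dataset_alignment: str  # one of the three alignment labels in the rules
    reproducibility_risks: list[str] = field(default_factory=list)
    unresolved_gaps: list[str] = field(default_factory=list)
```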
diff --git a/.agents/skills/research-plan/SKILL.md b/.agents/skills/research-plan/SKILL.md
index 6a6df0b..34afa5f 100644
--- a/.agents/skills/research-plan/SKILL.md
+++ b/.agents/skills/research-plan/SKILL.md
@@ -102,6 +102,8 @@ Follow this order unless the user explicitly asks for a lighter output:
 9. Specify the implementation foundation.
 10. Specify expected outcomes and decision rules.
 
+When the plan depends on repository choice, reproducibility, benchmark selection, or dataset alignment, apply [references/codebase-and-data-research-rules.md](references/codebase-and-data-research-rules.md).
+
 ## Experiment Design Rules
 
 Do not produce vague items such as "run some experiments" or "evaluate performance."
diff --git a/.agents/skills/research-plan/references/codebase-and-data-research-rules.md b/.agents/skills/research-plan/references/codebase-and-data-research-rules.md
new file mode 100644
index 0000000..adb7d8d
--- /dev/null
+++ b/.agents/skills/research-plan/references/codebase-and-data-research-rules.md
@@ -0,0 +1,103 @@
+# Codebase And Data Research Rules
+
+Use this reference when the plan depends on an implementation foundation, benchmark choice, or reproduction strategy.
+
+## 1. Goal
+
+Turn "codebase and dataset requirements" into explicit planning decisions:
+
+1. which repo to build on
+2. which related repos to compare against
+3. which datasets or workloads to align with
+4. what reproducibility risks are accepted
+
+## 2. Codebase Selection Order
+
+Choose the implementation foundation in this order unless the user overrides it:
+
+1. the current target repo when it already matches the project objective
+2. official code linked by the key paper or project page
+3. author-maintained GitHub repositories
+4. strong third-party reproductions only when official code is missing or unusable
+
+State which repo is the primary foundation and why it wins on relevance, maintainability, and reproducibility.
+
+## 3. Codebase Audit Checklist
+
+For each primary or comparison repo, check:
+
+1. owner type: official org, author, or third party
+2. repository scope: full training, inference only, eval only, or partial
+3. entry points for training, evaluation, and inference
+4. config system and experiment launch pattern
+5. dataset manifests and preprocessing scripts
+6. metric implementation and benchmark coverage
+7. environment reproducibility: lock files, Docker, conda, or install docs
+8. checkpoint availability
+9. license constraints
+10. maintenance signals such as last meaningful commit
+
+Do not name a repo in the plan without explaining whether it is actually runnable for the target objective.
+
+## 4. GitHub Search Rule
+
+For every highly relevant prior work, check whether there is an open-source repository.
+
+Record at least:
+
+1. repo URL
+2. owner
+3. official or unofficial status
+4. what part of the paper it covers
+5. whether it is suitable as a baseline, implementation reference, or both
+
+If no public repo exists for an important work, say so explicitly.
+
+## 5. Dataset And Workload Alignment Rule
+
+Default planning rule:
+
+1. align first with the datasets or workloads used by the strongest relevant prior work
+2. if the target deployment domain differs, keep one literature-aligned benchmark and add one target-domain benchmark
+3. if prior work is fragmented, define a benchmark set that covers the main comparison axis and explain the tradeoff
+
+Do not choose datasets only because they are convenient.
+
+## 6. Dataset Audit Checklist
+
+For each chosen dataset, workload, trace, or benchmark, specify:
+
+1. which related works use it
+2. task definition and label space
+3. split policy
+4. scale and domain coverage
+5. freshness or temporal boundary when relevant
+6. license and access restrictions
+7. preprocessing and normalization conventions
+8. contamination, leakage, or quality risks
+
+## 7. Planning Decision Labels
+
+Every plan must label dataset/workload strategy as one of:
+
+1. `aligned with prior work`
+2. `aligned plus target-domain extension`
+3. `new benchmark strategy`
+
+Every plan must label codebase strategy as one of:
+
+1. `extend current repo`
+2. `reuse official repo`
+3. `port ideas into target repo`
+4. `build minimal new repo`
+
+## 8. Required Plan Content
+
+When this reference applies, the plan must make these items explicit:
+
+1. primary implementation repo
+2. comparison repos considered and why they were rejected or kept
+3. strongest runnable baseline
+4. dataset/workload alignment choice
+5. benchmark comparability risks
+6. reproducibility blockers and fallback path
diff --git a/.agents/skills/research-plan/references/research-plan-template.md b/.agents/skills/research-plan/references/research-plan-template.md
index 4760e4d..aee1ee9 100644
--- a/.agents/skills/research-plan/references/research-plan-template.md
+++ b/.agents/skills/research-plan/references/research-plan-template.md
@@ -44,6 +44,9 @@ For each experiment, fill:
 
 - Language / framework:
 - Existing repo to reuse:
+- Codebase strategy label (`extend current repo` / `reuse official repo` / `port ideas into target repo` / `build minimal new repo`):
+- Comparison repos considered:
+- Strongest runnable baseline repo:
 - Experiment or execution entry points:
 - Evaluation entry points:
 - Config system:
@@ -63,10 +66,13 @@ For each experiment, fill:
 
 ### Data / Workloads / Inputs
 
 - Primary data source / workload / benchmark:
+- Dataset/workload strategy label (`aligned with prior work` / `aligned plus target-domain extension` / `new benchmark strategy`):
+- Which related works use the chosen data:
 - Validation / comparison scope:
 - Robustness input:
 - License / access:
 - Preprocessing / normalization:
+- Comparability risks:
 
 ### Additional Experiment Variants
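[Editor's note, not part of the patch: the two mandatory label sets in "Planning Decision Labels" are closed vocabularies, so a plan-linting step could check them mechanically. The sketch below is illustrative; the validator name is an assumption, while the label strings are copied verbatim from the rules above.]

```python
# Closed label vocabularies from "Planning Decision Labels".
DATASET_STRATEGY_LABELS = {
    "aligned with prior work",
    "aligned plus target-domain extension",
    "new benchmark strategy",
}

CODEBASE_STRATEGY_LABELS = {
    "extend current repo",
    "reuse official repo",
    "port ideas into target repo",
    "build minimal new repo",
}


def validate_plan_labels(dataset_label: str, codebase_label: str) -> None:
    """Reject a plan that omits or misspells either required strategy label."""
    if dataset_label not in DATASET_STRATEGY_LABELS:
        raise ValueError(f"unknown dataset strategy label: {dataset_label!r}")
    if codebase_label not in CODEBASE_STRATEGY_LABELS:
        raise ValueError(f"unknown codebase strategy label: {codebase_label!r}")
```

A plan template filled with `aligned with prior work` and `reuse official repo` would pass this check; a free-form label such as `mostly aligned` would be rejected.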