From 82a0fcf02504da3430ecda78fb1701d84835e4bb Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E6=98=93=E5=A4=A9=E8=8E=B2?= <91449279+yitianlian@users.noreply.github.com>
Date: Mon, 9 Mar 2026 20:45:26 +0800
Subject: [PATCH] Add codebase and dataset research rules for skills

---
 .agents/skills/deep-research/SKILL.md         |   2 +
 .../codebase-and-data-research-rules.md       | 126 ++++++++++++++++++
 .../references/output-template.md             |  10 ++
 .agents/skills/research-plan/SKILL.md         |   2 +
 .../codebase-and-data-research-rules.md       | 103 ++++++++++++++
 .../references/research-plan-template.md      |   6 +
 6 files changed, 249 insertions(+)
 create mode 100644 .agents/skills/deep-research/references/codebase-and-data-research-rules.md
 create mode 100644 .agents/skills/research-plan/references/codebase-and-data-research-rules.md

diff --git a/.agents/skills/deep-research/SKILL.md b/.agents/skills/deep-research/SKILL.md
index 1da6f64..ebe5e08 100644
--- a/.agents/skills/deep-research/SKILL.md
+++ b/.agents/skills/deep-research/SKILL.md
@@ -46,6 +46,8 @@ Iterate until evidence quality is sufficient:
 8. Run contradiction/counter-evidence checks.
 9. Synthesize and produce final report.
 
+When the topic has implementation, benchmark, reproduction, or planning implications, also apply [references/codebase-and-data-research-rules.md](references/codebase-and-data-research-rules.md).
+
 ## Re-entry Policy (Mid-Run)
 
 When called during an ongoing run (not only at run start):
diff --git a/.agents/skills/deep-research/references/codebase-and-data-research-rules.md b/.agents/skills/deep-research/references/codebase-and-data-research-rules.md
new file mode 100644
index 0000000..9cac32c
--- /dev/null
+++ b/.agents/skills/deep-research/references/codebase-and-data-research-rules.md
@@ -0,0 +1,126 @@
+# Codebase And Data Research Rules
+
+Use this reference when the research topic has an implementation, benchmark, reproduction, or planning component.
+
+## 1. Goal
+
+Convert vague "look at the repo/data" work into a traceable evidence pass:
+
+1. identify the most relevant codebases
+2. identify the most relevant datasets or workloads
+3. determine whether the target plan should align with prior work or intentionally deviate
+4. record evidence and uncertainty explicitly
+
+## 2. Codebase Search Order
+
+Search in this order unless the user gives a stricter constraint:
+
+1. local target repo or currently checked out project
+2. official implementation linked by the paper, project page, or organization
+3. author-maintained GitHub repositories
+4. widely used third-party reproductions only if official code is missing or incomplete
+
+Do not treat arbitrary GitHub repos as evidence equivalent to official implementations.
+
+## 3. Codebase Search Rules
+
+For each high-relevance method or paper, check whether an open-source codebase exists.
+
+Minimum checks:
+
+1. repository URL
+2. owner type: official org, author, or third party
+3. activity signals: last commit date, issues, stars (stars are only a weak signal)
+4. reproducibility signals: environment file, training script, eval script, config examples, checkpoints, README completeness
+5. scope match: full training, inference only, eval only, or partial reproduction
+6. license and usage constraints
+
+If the work is highly relevant and no code is open-sourced, record that absence explicitly instead of silently skipping it.
+
+## 4. What To Extract From A Relevant Repo
+
+When a codebase is relevant, extract only the implementation facts needed for reasoning:
+
+1. entry points for training, evaluation, and inference
+2. config system and major knobs
+3. data manifest or preprocessing path
+4. benchmark or metric implementation
+5. dependency and runtime assumptions
+6. checkpoint availability
+7. any mismatch between the paper claim and the released code
+
+Prefer concrete file paths, script names, and config names over generic summaries.
+
+## 5. Dataset And Workload Search Rules
+
+For each candidate dataset, workload, trace, or benchmark, record:
+
+1. what prior work used it
+2. whether it is the main comparison target in the literature
+3. task definition and label space
+4. train/validation/test split policy
+5. scale, domain, and freshness
+6. license, access, and usage restrictions
+7. known contamination, leakage, or annotation-quality concerns
+8. preprocessing conventions used by the most relevant prior work
+
+Do not choose datasets only because they are easy to download.
+
+## 6. Dataset Alignment Decision Rule
+
+Default rule:
+
+1. if the task claims improvement over prior work, align first with the datasets or workloads used by the strongest relevant baselines
+2. if the literature has no stable comparison set, propose a primary benchmark set and explain why
+3. if the target project serves a different domain or product requirement, keep one literature-aligned benchmark and add one target-domain benchmark
+
+You must state which of the following applies:
+
+1. `fully aligned with prior work`
+2. `partially aligned with one added domain-specific benchmark`
+3. `intentionally different from prior work`
+
+If intentionally different, explain what comparability is lost and what practical validity is gained.
+
+## 7. Comparison Set Construction
+
+When planning evaluations, define the comparison set in this order:
+
+1. strongest official or canonical baseline from the literature
+2. strongest practical open-source baseline that can actually be run
+3. target-project incumbent or current production baseline when applicable
+4. ablations of the proposed method
+
+If the canonical baseline cannot be reproduced, say why and choose the closest defensible substitute.
+
+## 8. Evidence Quality Rules
+
+Treat evidence tiers in this order:
+
+1. paper plus official repo plus released configs or checkpoints
+2. paper plus author repo
+3. paper only
+4. third-party reproduction
+5. blog posts, tweets, or issue comments as weak supporting evidence only
+
+Do not present tier 4 or 5 evidence as if it were definitive.
+
+## 9. Required Output When This Reference Applies
+
+Include a compact `Codebase and Data Audit` block in the research output:
+
+1. target repo inspected or not
+2. related official repos found or not
+3. strongest runnable baseline repo
+4. dataset/workload alignment choice
+5. main reproducibility risks
+6. unresolved gaps
+
+## 10. Failure And Gap Handling
+
+If repository or dataset evidence is incomplete:
+
+1. say what was searched
+2. say what was not found
+3. state whether the gap changes the recommendation materially
+4. provide the least risky fallback
diff --git a/.agents/skills/deep-research/references/output-template.md b/.agents/skills/deep-research/references/output-template.md
index ccb8500..acc0cb0 100644
--- a/.agents/skills/deep-research/references/output-template.md
+++ b/.agents/skills/deep-research/references/output-template.md
@@ -46,6 +46,15 @@ Then add short narrative:
 - For `conflict-resolution`: dispute map and source-tier arbitration.
 - For `idea-exploration`: landscape and opportunity boundaries.
 
+## Codebase and Data Audit (When Applicable)
+
+- target repo inspected or not
+- related official repos found or not
+- strongest runnable baseline repo
+- dataset/workload alignment choice
+- main reproducibility risks
+- unresolved gaps
+
 ## Research Trail Summary
 
 - queries_run=
@@ -101,3 +110,4 @@ Rules:
 4. Match output language to user language.
 5. If topic is paper-centric, do not skip `Key Works Deep Dive`.
 6. If degradation is used, include explicit degradation metadata and `Degrade Log`.
+7. If `references/codebase-and-data-research-rules.md` applies, do not skip `Codebase and Data Audit`.
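[Editor's note, not part of the patch: the evidence tiers and the required `Codebase and Data Audit` block in the deep-research rules above can be sketched as a small structured record. This is an illustrative sketch only; the function and field names are assumptions, and only the tier ordering and field list come from the rules themselves.]

```python
from dataclasses import dataclass, field

# Evidence tiers from "Evidence Quality Rules", strongest first.
# Index 0 corresponds to tier 1.
EVIDENCE_TIERS = [
    "paper + official repo + released configs or checkpoints",  # tier 1
    "paper + author repo",                                      # tier 2
    "paper only",                                               # tier 3
    "third-party reproduction",                                 # tier 4
    "blog posts, tweets, or issue comments",                    # tier 5
]


def is_definitive(tier: int) -> bool:
    """Tiers 4 and 5 may support a claim but must never be presented as definitive."""
    return 1 <= tier <= 3


@dataclass
class CodebaseAndDataAudit:
    """Compact audit block required by "Required Output When This Reference Applies"."""
    target_repo_inspected: bool
    official_repos_found: bool
    strongest_runnable_baseline: str
    dataset_alignment: str  # one of the three alignment labels in the rules
    reproducibility_risks: list[str] = field(default_factory=list)
    unresolved_gaps: list[str] = field(default_factory=list)
```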
diff --git a/.agents/skills/research-plan/SKILL.md b/.agents/skills/research-plan/SKILL.md
index 6a6df0b..34afa5f 100644
--- a/.agents/skills/research-plan/SKILL.md
+++ b/.agents/skills/research-plan/SKILL.md
@@ -102,6 +102,8 @@ Follow this order unless the user explicitly asks for a lighter output:
 9. Specify the implementation foundation.
 10. Specify expected outcomes and decision rules.
 
+When the plan depends on repository choice, reproducibility, benchmark selection, or dataset alignment, apply [references/codebase-and-data-research-rules.md](references/codebase-and-data-research-rules.md).
+
 ## Experiment Design Rules
 
 Do not produce vague items such as "run some experiments" or "evaluate performance."
diff --git a/.agents/skills/research-plan/references/codebase-and-data-research-rules.md b/.agents/skills/research-plan/references/codebase-and-data-research-rules.md
new file mode 100644
index 0000000..adb7d8d
--- /dev/null
+++ b/.agents/skills/research-plan/references/codebase-and-data-research-rules.md
@@ -0,0 +1,103 @@
+# Codebase And Data Research Rules
+
+Use this reference when the plan depends on an implementation foundation, benchmark choice, or reproduction strategy.
+
+## 1. Goal
+
+Turn "codebase and dataset requirements" into explicit planning decisions:
+
+1. which repo to build on
+2. which related repos to compare against
+3. which datasets or workloads to align with
+4. what reproducibility risks are accepted
+
+## 2. Codebase Selection Order
+
+Choose the implementation foundation in this order unless the user overrides it:
+
+1. the current target repo when it already matches the project objective
+2. official code linked by the key paper or project page
+3. author-maintained GitHub repositories
+4. strong third-party reproductions only when official code is missing or unusable
+
+State which repo is the primary foundation and why it wins on relevance, maintainability, and reproducibility.
+
+## 3. Codebase Audit Checklist
+
+For each primary or comparison repo, check:
+
+1. owner type: official org, author, or third party
+2. repository scope: full training, inference only, eval only, or partial
+3. entry points for training, evaluation, and inference
+4. config system and experiment launch pattern
+5. dataset manifests and preprocessing scripts
+6. metric implementation and benchmark coverage
+7. environment reproducibility: lock files, Docker, conda, or install docs
+8. checkpoint availability
+9. license constraints
+10. maintenance signals such as last meaningful commit
+
+Do not name a repo in the plan without explaining whether it is actually runnable for the target objective.
+
+## 4. GitHub Search Rule
+
+For every highly relevant prior work, check whether there is an open-source repository.
+
+Record at least:
+
+1. repo URL
+2. owner
+3. official or unofficial status
+4. what part of the paper it covers
+5. whether it is suitable as a baseline, implementation reference, or both
+
+If no public repo exists for an important work, say so explicitly.
+
+## 5. Dataset And Workload Alignment Rule
+
+Default planning rule:
+
+1. align first with the datasets or workloads used by the strongest relevant prior work
+2. if the target deployment domain differs, keep one literature-aligned benchmark and add one target-domain benchmark
+3. if prior work is fragmented, define a benchmark set that covers the main comparison axis and explain the tradeoff
+
+Do not choose datasets only because they are convenient.
+
+## 6. Dataset Audit Checklist
+
+For each chosen dataset, workload, trace, or benchmark, specify:
+
+1. which related works use it
+2. task definition and label space
+3. split policy
+4. scale and domain coverage
+5. freshness or temporal boundary when relevant
+6. license and access restrictions
+7. preprocessing and normalization conventions
+8. contamination, leakage, or quality risks
+
+## 7. Planning Decision Labels
+
+Every plan must label dataset/workload strategy as one of:
+
+1. `aligned with prior work`
+2. `aligned plus target-domain extension`
+3. `new benchmark strategy`
+
+Every plan must label codebase strategy as one of:
+
+1. `extend current repo`
+2. `reuse official repo`
+3. `port ideas into target repo`
+4. `build minimal new repo`
+
+## 8. Required Plan Content
+
+When this reference applies, the plan must make these items explicit:
+
+1. primary implementation repo
+2. comparison repos considered and why they were rejected or kept
+3. strongest runnable baseline
+4. dataset/workload alignment choice
+5. benchmark comparability risks
+6. reproducibility blockers and fallback path
diff --git a/.agents/skills/research-plan/references/research-plan-template.md b/.agents/skills/research-plan/references/research-plan-template.md
index 4760e4d..aee1ee9 100644
--- a/.agents/skills/research-plan/references/research-plan-template.md
+++ b/.agents/skills/research-plan/references/research-plan-template.md
@@ -44,6 +44,9 @@ For each experiment, fill:
 
 - Language / framework:
 - Existing repo to reuse:
+- Codebase strategy label (`extend current repo` / `reuse official repo` / `port ideas into target repo` / `build minimal new repo`):
+- Comparison repos considered:
+- Strongest runnable baseline repo:
 - Experiment or execution entry points:
 - Evaluation entry points:
 - Config system:
@@ -63,10 +66,13 @@ For each experiment, fill:
 
 ### Data / Workloads / Inputs
 
 - Primary data source / workload / benchmark:
+- Dataset/workload strategy label (`aligned with prior work` / `aligned plus target-domain extension` / `new benchmark strategy`):
+- Which related works use the chosen data:
 - Validation / comparison scope:
 - Robustness input:
 - License / access:
 - Preprocessing / normalization:
+- Comparability risks:
 
 ### Additional Experiment Variants
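[Editor's note, not part of the patch: the two mandatory label sets in "Planning Decision Labels" are closed vocabularies, so a plan-linting step could check them mechanically. The sketch below is illustrative; the validator name is an assumption, while the label strings are copied verbatim from the rules above.]

```python
# Closed label vocabularies from "Planning Decision Labels".
DATASET_STRATEGY_LABELS = {
    "aligned with prior work",
    "aligned plus target-domain extension",
    "new benchmark strategy",
}

CODEBASE_STRATEGY_LABELS = {
    "extend current repo",
    "reuse official repo",
    "port ideas into target repo",
    "build minimal new repo",
}


def validate_plan_labels(dataset_label: str, codebase_label: str) -> None:
    """Reject a plan that omits or misspells either required strategy label."""
    if dataset_label not in DATASET_STRATEGY_LABELS:
        raise ValueError(f"unknown dataset strategy label: {dataset_label!r}")
    if codebase_label not in CODEBASE_STRATEGY_LABELS:
        raise ValueError(f"unknown codebase strategy label: {codebase_label!r}")
```

A plan template filled with `aligned with prior work` and `reuse official repo` would pass this check; a free-form label such as `mostly aligned` would be rejected.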