This document describes how pull requests and reviews are filtered before analysis.
The include-bots input controls whether bot accounts are included in statistics. Bot detection is based on the GitHub user type (Bot) or login suffixes ([bot], -bot).
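The detection rule above can be sketched as follows. This is a minimal illustration, not the actual implementation; the `User` shape is an assumption based on the GitHub user fields the text names.

```typescript
// Hypothetical user shape: `type` and `login` as reported by GitHub.
interface User {
  type: string;  // e.g. "User" or "Bot"
  login: string;
}

// A user counts as a bot if GitHub reports the Bot type,
// or if the login ends in "[bot]" or "-bot".
function isBot(user: User): boolean {
  if (user.type === "Bot") return true;
  return user.login.endsWith("[bot]") || user.login.endsWith("-bot");
}
```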
Two independent filters are applied:
- Author filter - PRs where `authorIsBot` is `true` are skipped entirely. All reviews on that PR are also excluded, even if the reviewers are human.
- Reviewer filter - Individual reviews where `reviewerIsBot` is `true` are excluded from metrics, even on human-authored PRs.
Both filters must pass for a review to be counted. A human review on a bot-authored PR is not counted.
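The two-stage rule can be sketched like this, assuming illustrative `PR` and `Review` shapes with the `authorIsBot` and `reviewerIsBot` flags described above:

```typescript
interface Review { reviewerIsBot: boolean; state: string; }
interface PR { authorIsBot: boolean; reviews: Review[]; }

// Reviews that count toward metrics when include-bots is false.
// A bot-authored PR contributes nothing, even if its reviewers are human.
function countableReviews(pr: PR): Review[] {
  if (pr.authorIsBot) return [];                    // author filter: skip the whole PR
  return pr.reviews.filter(r => !r.reviewerIsBot);  // reviewer filter: drop bot reviews
}
```

Note the asymmetry: the author filter discards the PR and everything on it, while the reviewer filter only drops individual reviews.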
Modules that honor the flag apply no include-bots filtering beyond these two filters. The ai-patterns module keeps its documented split: bot observability metrics ignore include-bots and use the full dataset (with aiCoAuthoredPRs limited to PRs with observable commit metadata), while humanReviewBurden excludes traditional bot-authored PRs and PRs whose AI classification is not observable at the cutoff from its comparison cohort.
| Module | Author filter | Reviewer filter | Notes |
|---|---|---|---|
| per-user-stats | Yes | Yes | Skips entire PR if author is bot; skips individual bot reviews |
| bias-detector | Yes | Yes | Same as per-user-stats |
| merge-correlation | Yes | Yes | Bot reviews excluded from review counts on merged PRs |
| ai-patterns | Mixed | Mixed | Top-level bot observability metrics ignore include-bots and use the full dataset; only aiCoAuthoredPRs is limited to PRs with observable commit metadata. humanReviewBurden always excludes traditional bot-authored PRs, PRs whose AI classification is not observable at the cutoff, and bot reviews from the comparison metrics |
| html-report KPIs: Pull Requests, PR Authors | Yes | N/A | Uses the author-filtered PR list; reviewer identities are not part of these counts |
| html-report KPIs: Unique PR Reviews, Active Reviewers | Yes | Yes | Derived from userStats; when include-bots is false, bot-authored PRs are skipped entirely and bot reviewer reviews are excluded. PENDING reviews are always excluded; self-reviews are excluded only when both identities are known (ghost is exempt). |
| html-report KPI: Avg Reviewers/PR | Mixed | Mixed | Numerator is Unique PR Reviews from userStats; denominator is Pull Requests from the author-filtered PR list |
| html-report KPI: Gini Coefficient | Yes | Yes | Derived from bias-detector |
| html-report KPI: Data Completeness | N/A | N/A | Reports collection completeness, not a post-filtering count |
| time-series | Yes | N/A | Receives the pre-filtered PR list from html-report. When include-bots is false, bot-authored PRs are excluded there; when true, all PRs are included. Bot reviews and self-reviews are not excluded, so the review count reflects all non-PENDING review activity on that input list. |
For ai-patterns, this split is intentional: bot observability (botReviewers, botReviewPercentage, aiCoAuthoredPRs, totalPRs) ignores include-bots and uses the full dataset, while aiCoAuthoredPRs only counts PRs with observable commit metadata. humanReviewBurden uses a comparison cohort that excludes traditional bot-authored PRs regardless of include-bots and excludes PRs whose AI classification is not observable at the cutoff.
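The ai-patterns split can be sketched as two separate selections over the full dataset. Field names here (`aiObservable`, `isAiCoAuthored`) are illustrative assumptions, not the real schema:

```typescript
interface PRInfo {
  authorIsBot: boolean;     // traditional bot author (e.g. Dependabot)
  aiObservable: boolean;    // commit metadata observable at the cutoff
  isAiCoAuthored: boolean;  // AI co-authorship detected in commit metadata
}

// humanReviewBurden comparison cohort: drop traditional bot-authored PRs
// and unobservable PRs, regardless of include-bots.
function burdenCohort(prs: PRInfo[]): PRInfo[] {
  return prs.filter(pr => !pr.authorIsBot && pr.aiObservable);
}

// Bot observability uses the full dataset; only aiCoAuthoredPRs is
// restricted to PRs with observable commit metadata.
function aiCoAuthoredPRs(prs: PRInfo[]): number {
  return prs.filter(pr => pr.aiObservable && pr.isAiCoAuthored).length;
}
```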
Bot-authored PRs (e.g., Dependabot) are excluded entirely because:
- They do not reflect human team review workload.
- Including human reviews on bot PRs would inflate reviewer counts and distort bias detection.
- The `ai-patterns` module separately tracks bot activity for observability while excluding traditional bot-authored PRs and unobservable AI classifications from the AI-vs-human burden comparison.
Reviews with `state === "PENDING"` are draft, unsubmitted reviews. They are excluded from the following modules:
- per-user-stats
- bias-detector
- merge-correlation
- time-series
- ai-patterns (human review burden metrics only - `getQualifyingHumanReviews` excludes PENDING)
Note: The ai-patterns module's top-level metrics (totalReviews, botReviewPercentage) intentionally include PENDING reviews to capture the full scope of bot activity. See statistics.md for details. As a result, botReviewPercentage has a different denominator than metrics in other modules.
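As a sketch, the PENDING exclusion applied by the modules listed above is a single filter step (the `Review` shape is illustrative):

```typescript
interface Review { state: string; }

// Drop draft/unsubmitted reviews before computing per-module metrics.
// ai-patterns top-level metrics skip this step by design.
function submittedReviews(reviews: Review[]): Review[] {
  return reviews.filter(r => r.state !== "PENDING");
}
```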
Reviews where the reviewer is the PR author are excluded from:
- per-user-stats
- bias-detector
- merge-correlation
- ai-patterns (human review burden metrics)
These do not represent peer review activity. In merge-correlation specifically, self-reviews must not count toward avgReviewsBeforeMerge or affect zeroReviewMerges, as these metrics measure whether a PR received independent peer review before merging.
Exception: ghost placeholder - When GraphQL returns null for a deleted user account, the normalizer substitutes the shared placeholder ghost. The self-review exclusion is skipped when either the reviewer or the author login is ghost, to avoid incorrectly collapsing two unrelated deleted users onto the same identity. This guard applies to all modules listed above.
In bias-detector, this same exception also applies when constructing the Gini matrix domain: the ghost -> ghost diagonal remains an eligible cell and is not subtracted as a structurally impossible self-review pair.
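The self-review check with the ghost exemption can be sketched as:

```typescript
// "ghost" is the shared placeholder GitHub substitutes for deleted accounts,
// so two ghost logins may belong to two different (deleted) users.
const GHOST = "ghost";

// True only when this review should be excluded as a self-review.
// When either identity is the ghost placeholder, we cannot know whether
// reviewer and author are the same person, so the exclusion is skipped.
function isExcludedSelfReview(reviewerLogin: string, authorLogin: string): boolean {
  if (reviewerLogin === GHOST || authorLogin === GHOST) return false;
  return reviewerLogin === authorLogin;
}
```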