Filtering Specification

This document describes how pull requests and reviews are filtered before analysis.

`include-bots` (default: `false`)

The `include-bots` input controls whether bot accounts are included in statistics. An account is treated as a bot when its GitHub user type is `Bot` or when its login ends in `[bot]` or `-bot`.
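The detection rule can be sketched as follows. This is a minimal illustration, not the project's implementation: the `Account` shape and the `isBotAccount` name are hypothetical, and the exact, case-sensitive suffix match is an assumption.

```typescript
// Hypothetical account shape; only `login` and `type` matter for this sketch.
interface Account {
  login: string;
  type?: string; // GitHub user type, for example "User" or "Bot"
}

// Assumption: suffixes are matched exactly and case-sensitively.
const BOT_LOGIN_SUFFIX = /(\[bot\]|-bot)$/;

// True when the account matches either signal: user type or login suffix.
function isBotAccount(account: Account): boolean {
  if (account.type === "Bot") {
    return true;
  }
  return BOT_LOGIN_SUFFIX.test(account.login);
}
```

Checking the user type first means an account flagged as `Bot` by GitHub is caught even when its login carries no recognizable suffix.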

When `include-bots` is `false`

Two independent filters are applied:

Author filter - PRs where authorIsBot is true are skipped entirely. All reviews on that PR are also excluded, even if the reviewers are human.
Reviewer filter - Individual reviews where reviewerIsBot is true are excluded from metrics, even on human-authored PRs.

Both filters must pass for a review to be counted. A human review on a bot-authored PR is not counted.
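A minimal sketch of how the two filters combine is shown below. The `authorIsBot` and `reviewerIsBot` flags come from this document; the interfaces and the `countableReviews` function are hypothetical names used only for illustration.

```typescript
// Hypothetical shapes; the real data model may carry more fields.
interface Review {
  reviewer: string;
  reviewerIsBot: boolean;
}

interface PullRequest {
  author: string;
  authorIsBot: boolean;
  reviews: Review[];
}

// Returns the reviews that survive both filters for a given flag value.
function countableReviews(prs: PullRequest[], includeBots: boolean): Review[] {
  const kept: Review[] = [];
  for (const pr of prs) {
    // Author filter: a bot-authored PR is skipped with all of its reviews.
    if (!includeBots && pr.authorIsBot) {
      continue;
    }
    for (const review of pr.reviews) {
      // Reviewer filter: bot reviews are dropped even on human-authored PRs.
      if (!includeBots && review.reviewerIsBot) {
        continue;
      }
      kept.push(review);
    }
  }
  return kept;
}
```

Because the author check runs before the inner loop, a human review on a bot-authored PR never reaches the reviewer filter, which matches the rule that both filters must pass.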

When `include-bots` is `true`

Modules that honor the flag apply no additional `include-bots` filtering. The ai-patterns module keeps its documented split. Its bot observability metrics ignore `include-bots` and use the full dataset, with aiCoAuthoredPRs limited to PRs with observable commit metadata. Its humanReviewBurden metric still excludes two groups from the comparison cohort: traditional bot-authored PRs, and PRs whose AI classification is not observable at the cutoff.

Per-module behavior

| Module | Author filter | Reviewer filter | Notes |
| --- | --- | --- | --- |
| per-user-stats | Yes | Yes | Skips entire PR if author is bot; skips individual bot reviews |
| bias-detector | Yes | Yes | Same as per-user-stats |
| merge-correlation | Yes | Yes | Bot reviews excluded from review counts on merged PRs |
| ai-patterns | Mixed | Mixed | Top-level bot observability metrics ignore `include-bots` and use the full dataset; only `aiCoAuthoredPRs` is limited to PRs with observable commit metadata. `humanReviewBurden` always excludes traditional bot-authored PRs, PRs whose AI classification is not observable at the cutoff, and bot reviews from the comparison metrics |
| html-report KPIs: Pull Requests, PR Authors | Yes | N/A | Uses the author-filtered PR list; reviewer identities are not part of these counts |
| html-report KPIs: Unique PR Reviews, Active Reviewers | Yes | Yes | Derived from `userStats`; when `include-bots` is `false`, bot-authored PRs are skipped entirely and bot reviewer reviews are excluded. PENDING reviews are always excluded; self-reviews are excluded only when both identities are known (`ghost` is exempt). |
| html-report KPI: Avg Reviewers/PR | Mixed | Mixed | Numerator is Unique PR Reviews from `userStats`; denominator is Pull Requests from the author-filtered PR list |
| html-report KPI: Gini Coefficient | Yes | Yes | Derived from `bias-detector` |
| html-report KPI: Data Completeness | N/A | N/A | Reports collection completeness, not a post-filtering count |
| time-series | Yes | N/A | Receives the pre-filtered PR list from html-report. When `include-bots` is `false`, bot-authored PRs are excluded there; when `true`, all PRs are included. Bot reviews and self-reviews are not excluded, so the review count reflects all non-PENDING review activity on that input list. |

For ai-patterns, this split is intentional. The bot observability metrics (botReviewers, botReviewPercentage, aiCoAuthoredPRs, totalPRs) ignore include-bots and use the full dataset, although aiCoAuthoredPRs counts only PRs with observable commit metadata. humanReviewBurden uses a comparison cohort that, regardless of include-bots, excludes traditional bot-authored PRs and PRs whose AI classification is not observable at the cutoff.

Rationale

Bot-authored PRs (e.g., Dependabot) are excluded entirely because:

They do not reflect human team review workload.
Including human reviews on bot PRs would inflate reviewer counts and distort bias detection.
The ai-patterns module separately tracks bot activity for observability while excluding traditional bot-authored PRs and unobservable AI classifications from the AI-vs-human burden comparison.

Additional filters (always applied)

PENDING reviews

Reviews with state === "PENDING" are draft/unsubmitted reviews. They are excluded from the following modules:

per-user-stats
bias-detector
merge-correlation
time-series
ai-patterns (human review burden metrics only - getQualifyingHumanReviews excludes PENDING)
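The PENDING exclusion amounts to a single predicate on the review state. The sketch below assumes a hypothetical `Review` shape and helper name; it is not the code of any module listed above.

```typescript
// Hypothetical review shape; `state` mirrors the value compared in the text.
interface Review {
  reviewer: string;
  state: string; // for example "APPROVED", "COMMENTED" or "PENDING"
}

// Drops draft/unsubmitted reviews before any metric is computed.
function withoutPendingReviews(reviews: Review[]): Review[] {
  return reviews.filter((review) => review.state !== "PENDING");
}
```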

Note

The ai-patterns module's top-level metrics (totalReviews, botReviewPercentage) intentionally include PENDING reviews to capture the full scope of bot activity. See statistics.md for details. As a result, botReviewPercentage has a different denominator than metrics in other modules.

Self-reviews

Reviews where the reviewer is the PR author are excluded from:

per-user-stats
bias-detector
merge-correlation
ai-patterns (human review burden metrics)

These do not represent peer review activity. In merge-correlation specifically, self-reviews must not count toward avgReviewsBeforeMerge or affect zeroReviewMerges, as these metrics measure whether a PR received independent peer review before merging.

Exception: ghost placeholder - When GraphQL returns null for a deleted user account, the normalizer substitutes the shared placeholder ghost. The self-review exclusion is skipped when either the reviewer or the author login is ghost, to avoid incorrectly collapsing two unrelated deleted users onto the same identity. This guard applies to all modules listed above.
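The self-review rule and its ghost guard can be sketched as one predicate. The function and constant names are hypothetical; only the `ghost` login and the rule itself come from this document.

```typescript
// Shared placeholder substituted for deleted accounts, per the text above.
const GHOST_LOGIN = "ghost";

// True when a review should be dropped as a self-review.
function isExcludedSelfReview(reviewerLogin: string, authorLogin: string): boolean {
  // Guard: two ghost identities may be different deleted users, so a match
  // involving ghost is never treated as a self-review.
  if (reviewerLogin === GHOST_LOGIN || authorLogin === GHOST_LOGIN) {
    return false;
  }
  return reviewerLogin === authorLogin;
}
```

With this guard, a ghost review on a ghost-authored PR is kept, at the cost of occasionally counting a genuine self-review by a deleted user.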

In bias-detector, this same exception also applies when constructing the Gini matrix domain: the ghost -> ghost diagonal remains an eligible cell and is not subtracted as a structurally impossible self-review pair.
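As an illustration of the matrix-domain rule, the sketch below counts eligible reviewer-to-author cells. It assumes the domain is the set of (reviewer, author) pairs minus structurally impossible self-review pairs; the function name and that framing are hypothetical, since the actual bias-detector construction is not shown here.

```typescript
const GHOST = "ghost";

// Counts (reviewer, author) cells that can legitimately hold reviews.
function eligibleCellCount(reviewers: string[], authors: string[]): number {
  let count = 0;
  for (const reviewer of reviewers) {
    for (const author of authors) {
      // A diagonal cell is impossible only when the identity is known;
      // the ghost -> ghost cell stays eligible.
      const impossible = reviewer === author && reviewer !== GHOST;
      if (!impossible) {
        count += 1;
      }
    }
  }
  return count;
}
```

For reviewers and authors `["alice", "ghost"]`, the four candidate cells lose only alice -> alice, leaving three eligible cells.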
