Skip to content

User/esg/phase0 fix regression pr#827

Open
esgama107 wants to merge 3 commits intomicrosoft:mainfrom
esgama107:user/esg/phase0-fix-regression-pr
Open

User/esg/phase0 fix regression pr#827
esgama107 wants to merge 3 commits intomicrosoft:mainfrom
esgama107:user/esg/phase0-fix-regression-pr

Conversation

@esgama107
Copy link
Copy Markdown
Collaborator

@esgama107 esgama107 commented Mar 27, 2026

Fix regression detection: enable Method 2, add key mapping, restore summary columns

TL;DR

Regression detection has been non-functional due to compounding bugs. This PR fixes them. No algorithm changes — just wiring fixes to restore what was already implemented.


Background

After a refactored in test IDs from the old format (tput-up-quic) to the new scenario-based format (scenario-upload-quic). This rename broke the key lookup between the regression JSON and the consumer (secnetperf-helpers.psm1), and the regression columns were commented out with:

# TODO: Regression detection is heinously broken. Let's reduce the noise.

Investigation revealed compounding bugs, not just the key mismatch:

Bug: quic.yml — Method 2 never invoked

The regression-detection job runs python regression.py with no --featureint flag. The argparse default is 1, so singular_datapoint_method() is what runs in production. The working sliding_window() method (featureint=2) was implemented but never enabled.

Fix: Pass --featureint 2 to use the sliding window method.

Bug: Test ID key mismatch (3 naming conventions)

The DB, the watermark, and the consumer all use different key formats:

Consumer Key Old DB ID New DB ID
upload-quic tput-up-quic scenario-upload-quic
download-quic tput-down-quic scenario-download-quic
hps-quic hps-conns-100-quic scenario-hps-quic
rps-quic rps-up-512-down-4000-quic scenario-rps-quic
... ... ...

0/12 consumer keys match any DB key format. The consumer wraps the lookup in try/catch and silently returns "no regression" when the key is missing.

Fix: Added DB_TO_CONSUMER_KEY mapping dict in regression.py and a remap_to_consumer_keys() function that translates DB IDs → consumer keys before writing the JSON output. This avoids touching either the DB schema or the consumer code.

Uncomment Regression columns

The summary table doesn't display it. The CumulativeResult, Baseline, and BestResult columns, let' review the data with the new key-mappings.


Changes

File What changed
pipeline/regression.py Added DB_TO_CONSUMER_KEY mapping (26 entries), remap_to_consumer_keys() function, applied remap before JSON output in sliding_window().
.github/workflows/quic.yml regression-detection job now fetches regression.py from main via wget and runs with --featureint 2.
.github/workflows/generate-summary.ps1 Uncommented regression columns (Avg, Noise, Best Ever) in all 4 summary row builders (throughput, download, HPS, RPS) and restored matching table headers.

Discovery: sqlite branch architecture concern

During investigation, we discovered an important architectural detail that affects how changes to regression.py take effect:

The regression-detection job checks out ref: sqlite, not main. This means:

  • The sqlite branch has its own copy of regression.py (the old unmaintained version)
  • The save-test-results job already solves this for sql.py by fetching it from main via wget

This PR adopts the same pattern for regression.py:

- name: Fetch regression.py from main branch
  run: wget https://raw.githubusercontent.com/microsoft/netperf/main/pipeline/regression.py -O regression.py
- run: python regression.py --featureint 2

Branch role summary:

Branch Purpose What gets committed
main Source of truth for code Pipeline scripts, workflows, dashboard source
sqlite Data + generated artifacts netperf.sqlite, regression.json, watermark_regression.json, full_latencies/
deploy Dashboard deployment Flattened JSON files for the React dashboard

The stale regression.py on sqlite remains but is now bypassed by the wget.


Questions for the team

  1. Should regression.py be removed from the sqlite branch entirely? Now that we fetch it from main, the sqlite copy is dead code that could cause confusion. Same question for any other .py files on sqlite.

esgama107 and others added 2 commits March 26, 2026 18:00
…ummary columns

Regression detection has been completely non-functional since Nov 2024 due to
4 compounding bugs. This commit fixes all of them:

1. regression.py Method 1 path bug (line 81): fixed file.json/file.json -> file.json
2. regression.py key mapping: added DB_TO_CONSUMER_KEY dict that translates
   both old-format (tput-up-quic) and new scenario-* format (scenario-upload-quic)
   DB test IDs to consumer keys (upload-quic) expected by secnetperf.ps1
3. quic.yml: regression-detection job now fetches regression.py from main via
   wget (instead of stale sqlite branch copy) and runs with --featureint 2
   to enable the sliding window method
4. generate-summary.ps1: uncommented regression columns (Avg, Noise, Best Ever)
   in all summary tables and added matching column headers

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@esgama107
Copy link
Copy Markdown
Collaborator Author

esgama107 commented Mar 27, 2026 via email

@esgama107 esgama107 marked this pull request as draft March 27, 2026 22:18
@esgama107 esgama107 marked this pull request as ready for review March 27, 2026 23:21
@esgama107 esgama107 marked this pull request as draft March 27, 2026 23:23
wrong assessment about path name and folder
@esgama107 esgama107 marked this pull request as ready for review April 2, 2026 18:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant