Conversation
@control_center_review_bot please review this pr
muditbhutani
left a comment
Merge Confidence: 4.0/5 - Confident
"Looks good - minor concerns only"
Assessment
- ✓ Well-scoped addition of visual regression testing infrastructure with 42 snapshots covering auth, homepage, and operations flows
- ✓ CI optimization with browser caching should reduce test execution time by 2-3 minutes per run
- ✓ Docker-based snapshot generation ensures consistency between local development and CI environment (ubuntu-latest)
- ✓ Test isolation improved with unique email generation and proper error handling for retries
- ⚠ Visual regression tests add maintenance burden—42 snapshots require updates whenever UI styling changes
- ⚠ Error handling uses broad substring matching ('already exists') which could mask unrelated failures
- ⚠ Cache key for Playwright browsers only depends on package-lock.json, not Playwright version—could cause version mismatch
- ⚠ Hardcoded timezone ('Asia/Kolkata') in date assertion may not match backend timezone configuration
Review: Add Playwright visual regression testing with 42 UI snapshots
This PR migrates test infrastructure from Cypress to Playwright and adds comprehensive visual regression testing. The implementation includes 42 snapshots across authentication flows, homepage variations, and operations pages (payments, refunds, payouts, disputes, customers). Docker-based snapshot generation ensures CI consistency. Main risks: snapshot maintenance burden when UI changes, error handling using broad string matching, and CI cache configuration gaps.
@@ -0,0 +1,174 @@
#!/bin/bash
Review visual regression testing implementation
Risk: MEDIUM
This PR adds comprehensive visual regression testing using Playwright's screenshot comparison. Three new test suites cover authentication flows, homepage variations, and operations pages (payments, refunds, payouts, disputes, customers).
What changed: New playwright-tests/visual-testing/ directory with 42 PNG snapshots across auth (10), homepage (4), and operations (28) pages. Tests use toHaveScreenshot() with maxDiffPixelRatio: 0.01 for 1% tolerance on dynamic content.
Key implementation details:
- Snapshots generated in Docker container (`update_snapshots.sh`) to match CI environment (ubuntu-latest)
- Viewport standardized to 1620x1080 (increased from 1440x1005)
- Uses `mockV2MerchantList()` to prevent API dependency and ensure consistent state
- Tests wait for `networkidle` + 500ms before capturing
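The 1% tolerance has a concrete pixel budget on the standardized viewport. A quick sketch of the arithmetic (viewport size and ratio taken from this PR; the calculation mirrors how a ratio-based threshold applies to the total pixel count of the compared images):

```typescript
// Pixel budget implied by maxDiffPixelRatio on the standardized viewport.
// Values come from this PR; this is illustrative arithmetic, not Playwright code.
const viewport = { width: 1620, height: 1080 };
const maxDiffPixelRatio = 0.01; // 1% tolerance used by these visual tests

const totalPixels = viewport.width * viewport.height; // 1,749,600
const allowedDiffPixels = Math.floor(totalPixels * maxDiffPixelRatio);

console.log(allowedDiffPixels); // pixels allowed to differ before a test fails
```

Roughly 17,000 pixels may differ before a comparison fails — a visible spinner or timestamp widget can exceed that budget on its own, which is why the tests wait for `networkidle` plus a settle delay before capturing.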
Primary risks:
- Snapshot flakiness: Dynamic content (timestamps, merchant IDs, randomly generated data) could cause false positives in CI
- Maintenance burden: 42 snapshots require updates whenever UI styling changes
- Cross-platform consistency: Snapshots generated on macOS locally won't match ubuntu-latest CI without using the Docker script
Verification checklist:
- Confirm all snapshots pass in CI environment (ubuntu-latest)
- Verify `update_snapshots.sh` script runs successfully on both x86_64 and arm64
- Check that `mockV2MerchantList()` prevents real API calls during visual tests
- Validate that tests properly isolate state (unique emails per test run)
Diagram:
sequenceDiagram
participant Dev as Developer
participant Docker as Docker Container
participant PW as Playwright
participant BE as Backend Services
participant FS as Filesystem
Dev->>Docker: ./update_snapshots.sh
Docker->>Docker: Verify backend running
Docker->>FS: Mount project dir
Docker->>BE: Health check
Docker->>PW: npm run build:test
Docker->>PW: Start FE server
PW->>BE: Create test user
PW->>PW: Navigate to pages
PW->>PW: Wait networkidle + 500ms
PW->>FS: Capture screenshots
PW->>FS: Compare with existing
alt Differences found
PW-->>FS: Update snapshots
end
FS-->>Dev: Modified .png files
@@ -548,12 +548,28 @@ const ssoBaseUrl = process.env.PLAYWRIGHT_SSO_BASE_URL;
let authId = "";
Verify test isolation and retry safety
Risk: MEDIUM
Multiple tests now include error handling for "already exists" conditions to support test retries. The createAuth() and signupUser() functions wrap API calls in try-catch blocks that ignore duplicate resource errors.
What changed:
- `createAuth()` now accepts optional `ownerId` and `emailDomain` parameters for test isolation
- SSO and auth tests catch "already exists" errors and continue execution
- Visual testing suite uses unique email generation but shares SSO auth config
Retry safety implementation:
Tests silently ignore "already exists" errors by checking errorMsg.includes("already exists"). This allows Playwright's retry mechanism to work when tests fail for other reasons.
Risks:
- Hidden failures: Real errors containing "already exists" substring get swallowed
- State pollution: Retried tests may encounter stale data from previous attempts
- Inconsistent behavior: Some tests use this pattern, others don't—lacks standardization
Verification:
- Test retry scenario: Force a test to fail after user creation, verify retry succeeds
- Confirm error messages from backend API actually contain "already exists" string
- Check for race conditions when multiple tests create the same SSO config in parallel
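For the unique-email part of the isolation story, a per-run generator is the usual pattern. The sketch below is hypothetical — the helper name and format are assumptions for illustration, not the PR's actual code:

```typescript
// Hypothetical per-run unique email generator for test isolation.
// The name, prefix, and domain are assumptions, not this PR's helper.
function uniqueTestEmail(prefix = "pw-user", domain = "example.com"): string {
  const stamp = Date.now().toString(36);               // varies per run
  const rand = Math.random().toString(36).slice(2, 8); // varies per call
  return `${prefix}-${stamp}-${rand}@${domain}`;
}

console.log(uniqueTestEmail()); // e.g. pw-user-<stamp>-<rand>@example.com
```

Because the address differs on every attempt, a retried test creates a fresh user instead of colliding with the previous attempt's data — which also reduces how often the "already exists" catch path is exercised at all.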
Diagram:
stateDiagram-v2
[*] --> CreateUser
CreateUser --> CheckError: API call
CheckError --> UserExists: Error contains "already exists"
CheckError --> RealError: Other error
CheckError --> Success: No error
UserExists --> Continue: Ignore error
RealError --> [*]: Rethrow
Success --> Continue
Continue --> RunTest
RunTest --> [*]
@@ -16,10 +16,7 @@ jobs:
with:
Review CI workflow optimizations
Risk: LOW
The Playwright CI workflow now includes browser caching and consolidated artifact uploads to reduce CI time and storage.
Changes:
- Browser caching: New cache step for `~/.cache/ms-playwright` keyed by `package-lock.json` hash
- Artifact consolidation: Merged separate "failures" and "report" uploads into a single "playwright-artifacts" upload
- Removed Docker cache pruning: Deleted the `docker builder prune -af` step
Expected benefits:
- Cache hit saves ~2-3 minutes installing Playwright browsers on repeat runs
- Single artifact upload simplifies failure investigation
- Removing Docker pruning prevents clearing potentially useful build layers
Risks:
- Cache invalidation: Browser cache persists across Playwright version upgrades if `package-lock.json` doesn't change
- Storage costs: Combined artifacts now include both failures and the full HTML report (could be large)
- Docker disk usage: Without pruning, CI runner disk could fill over time
Verification:
- Check CI run times before/after to confirm caching improves performance
- Monitor artifact sizes—combined uploads shouldn't exceed GitHub's limits
- Verify cache invalidates when Playwright version changes in package.json
Diagram:
flowchart TB
Start([CI Trigger]) --> Checkout[Checkout Code]
Checkout --> Cache{Cache Hit?}
Cache -->|Yes| SkipInstall[Skip Browser Install]
Cache -->|No| Install[Install Browsers]
Install --> SaveCache[Save Cache]
SaveCache --> RunTests
SkipInstall --> RunTests[Run Tests]
RunTests --> TestResult{Tests Pass?}
TestResult -->|Pass| Success([✓ Complete])
TestResult -->|Fail| Artifacts[Upload Artifacts]
Artifacts --> Failure([✗ Failed])
subgraph Artifacts Upload
Artifacts --> FailedTests[test-results/**/*-failed-*]
Artifacts --> Retries[test-results/**/*-retry*/]
Artifacts --> Report[playwright-report/]
end
@@ -65,7 +65,9 @@ test.describe("Volume based routing", () => {
"Smart routing configuration",
Review timezone handling in date generation
Risk: LOW
The PaymentRouting.spec.ts test now uses timezone-aware date formatting to prevent CI failures due to server timezone differences.
What changed:
Replaced new Date().toISOString().split("T")[0] with new Date().toLocaleDateString("en-CA", { timeZone: "Asia/Kolkata" }).
Why this matters:
The test verifies that auto-generated routing configuration names match the expected format "Volume Based Routing-YYYY-MM-DD". Without explicit timezone, CI servers in different timezones could generate different dates at midnight boundaries, causing test failures.
Hardcoded timezone risk:
Using "Asia/Kolkata" (UTC+5:30) hardcodes an assumption about where tests run or when data is generated. If the backend generates these names using a different timezone, tests will fail.
Questions to verify:
- Does the backend API generate routing names using "Asia/Kolkata" timezone?
- Are there other date-based assertions in tests that need similar timezone handling?
- Should this use UTC instead of hardcoding a specific region?
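The midnight-boundary failure mode is easy to reproduce with a fixed instant (the date below is illustrative):

```typescript
// 20:00 UTC on Jan 15 is already past midnight in IST (UTC+5:30),
// so the same instant formats to two different calendar dates.
const instant = new Date("2024-01-15T20:00:00Z");

// The "en-CA" locale yields ISO-like YYYY-MM-DD output.
const utcDate = instant.toLocaleDateString("en-CA", { timeZone: "UTC" });
const istDate = instant.toLocaleDateString("en-CA", { timeZone: "Asia/Kolkata" });

console.log(utcDate, istDate); // 2024-01-15 2024-01-16
```

For 5.5 hours of every day, a UTC backend and an IST-pinned assertion disagree on the date — exactly the window in which this test would fail in CI.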
Diagram:
sequenceDiagram
participant Test as Playwright Test
participant BE as Backend API
participant Assert as Assertion
Note over Test: Generate expected date
Test->>Test: new Date().toLocaleDateString("en-CA", {timeZone: "Asia/Kolkata"})
Test->>Test: Format: "2024-01-15"
Test->>BE: Create routing config
BE->>BE: Generate name with server timezone
BE-->>Test: "Volume Based Routing-2024-01-15"
Test->>Assert: Compare expected vs actual
alt Timezone mismatch
Assert-->>Test: ❌ Expected "2024-01-15", got "2024-01-14"
else Timezone matches
Assert-->>Test: ✓ Dates match
end
@@ -294,6 +294,147 @@ export async function createPaymentAPI(
return await response.json();
Review new payout API test helpers
Risk: LOW
Two new test helper functions added to support payout operations testing: createPayoutConnectorAPI() and createPayoutAPI().
Implementation details:
- `createPayoutConnectorAPI()`: Creates Adyen payout connector with hardcoded test credentials
- `createPayoutAPI()`: Creates EUR 123.45 payout to test customer with card details
- Both use dummy data: `api_key: "test_key"`, card number 4111111111111111
What this enables:
Visual regression tests for payout operations pages (empty state and with-payouts state). Previously, only payment operations had test coverage.
Hardcoded data concerns:
- All test connectors use identical credentials (`test_key`, `test_key`, `test_key`)
- Card number 4111111111111111 is Visa's test card but hardcoded in payout data
- Fixed EUR 123.45 amount could mask amount-formatting bugs
Verify:
- Confirm Adyen test mode accepts these dummy credentials without actual validation
- Check if payout creation actually hits external services or is fully mocked
- Validate that test payouts are properly cleaned up (or exist in isolated test merchant accounts)
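On the fixed-amount concern: the API amount 12345 is in minor units (EUR 123.45). A minimal conversion sketch shows why a single value exercises only one formatting path — the helper below is hypothetical, for illustration only, and assumes a two-decimal currency (true for EUR, not for e.g. JPY):

```typescript
// Minor-unit conversion behind the fixed test amount (12345 => EUR 123.45).
// minorUnitsToDisplay is a hypothetical helper, not code from this PR;
// it assumes a two-decimal currency.
function minorUnitsToDisplay(amountMinor: number, currency: string): string {
  return `${currency} ${(amountMinor / 100).toFixed(2)}`;
}

console.log(minorUnitsToDisplay(12345, "EUR")); // EUR 123.45
```

Varying the amount across tests (e.g. 1, 100, 99999) would exercise rounding and separator paths that the fixed 123.45 never hits.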
Diagram:
sequenceDiagram
participant Test as Test Suite
participant Helper as createPayoutAPI
participant API as Hyperswitch API
participant Adyen as Adyen (Test Mode)
Test->>Helper: createPayoutConnectorAPI(merchantId)
Helper->>API: POST /account/{merchantId}/connectors
Note over Helper,API: connector_type: "payout_processor"<br/>connector_name: "adyen"<br/>test_mode: true<br/>credentials: "test_key"
API-->>Helper: Connector created
Test->>Helper: createPayoutAPI(merchantId)
Helper->>API: POST /payouts/create
Note over Helper,API: amount: 12345 (EUR 123.45)<br/>card: 4111111111111111<br/>confirm: true
API->>Adyen: Process payout (test mode)
Adyen-->>API: Mock success response
API-->>Helper: Payout created
Helper-->>Test: Payout data
});

- test.describe("Sign in", () => {
+ test.describe.serial("Sign in", () => {
Feedback: Serial execution unnecessarily increases test time
Severity: 🔵 Nice to Have
Agent Claude:
Problem: Marking the entire "Sign in" test suite as .serial() forces sequential execution of all tests, which increases test runtime unnecessarily.
Risk: If one sign-in test hangs or takes longer than expected, all subsequent tests in the suite are delayed.
Recommendation: Only mark tests as .serial() when they share state or have explicit ordering dependencies. These sign-in tests appear independent (each navigates to a fresh page)—consider removing .serial() to allow parallel execution.
} catch (error) {
  const errorMsg = error instanceof Error ? error.message : String(error);
  // Ignore "already exists" error on retries, rethrow all others
  if (!errorMsg.includes("already exists")) {
Feedback: Substring matching too broad in error handling
Severity: 🔵 Nice to Have
Agent System:
Tests catch errors and check if message includes 'already exists', which could match unrelated error messages
Evidence: errorMsg.includes("already exists") pattern used in auth.spec.ts lines 560, 569
} catch (error) {
  const errorMsg = error instanceof Error ? error.message : String(error);
  // Ignore "already exists" error on retries, rethrow all others
  if (!errorMsg.includes("already exists")) {
Feedback: Error message substring matching too broad
Severity: 🔵 Nice to Have
Agent Claude:
Problem: This error handling pattern is fragile—any error message containing "already exists" as a substring gets silently ignored, including unrelated errors.
Example: If the backend returns "User creation failed: database already exists and is corrupted", this would be swallowed.
Recommendation: Check for exact error codes or more specific messages. If the API returns structured errors, match on error code instead of string matching.
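A hedged sketch of that recommendation — the `ApiError` shape and the `"DUPLICATE_RESOURCE"` code below are assumptions for illustration, not the actual Hyperswitch API contract:

```typescript
// Prefer a structured error code over substring matching.
// ApiError and "DUPLICATE_RESOURCE" are assumed shapes, not the real API.
interface ApiError {
  code?: string;
  message: string;
}

function isDuplicateResourceError(error: unknown): boolean {
  const err = error as ApiError;
  // Best case: the backend returns a stable machine-readable code.
  if (err.code !== undefined) {
    return err.code === "DUPLICATE_RESOURCE";
  }
  // Fallback: anchor the match so unrelated messages that merely contain
  // the substring (e.g. "database already exists and is corrupted")
  // are still rethrown.
  return /^(user|resource) .*already exists/i.test(err.message);
}
```

With this shape, the corrupted-database example from above is rethrown instead of swallowed, while a genuine duplicate-user response on retry is still ignored.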
);

- const currentDate = new Date().toISOString().split("T")[0];
+ const currentDate = new Date().toLocaleDateString("en-CA", {
Feedback: Hardcoded timezone may not match backend
Severity: 🔵 Nice to Have
Agent Claude:
Problem: Hardcoding "Asia/Kolkata" timezone assumes the backend generates routing configuration names using IST. If backend uses UTC or server local time, this assertion will fail.
Why this matters: Tests become environment-dependent and may pass locally but fail in CI, or vice versa.
Recommendation: Either:
- Use UTC (`timeZone: "UTC"`) if the backend uses UTC for name generation, or
- Make the timezone configurable via an environment variable to match the backend configuration
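The configurable option could look like the sketch below. `TEST_TIMEZONE` is a hypothetical environment variable and `expectedDateString` an illustrative helper — neither is defined by this PR:

```typescript
// Sketch: derive the expected date from a configurable zone instead of
// hardcoding "Asia/Kolkata". TEST_TIMEZONE is a hypothetical env var.
function expectedDateString(now: Date, timeZone: string): string {
  // The "en-CA" locale produces YYYY-MM-DD, matching the routing-name format.
  return now.toLocaleDateString("en-CA", { timeZone });
}

// In a spec this might read:
//   const tz = process.env.TEST_TIMEZONE ?? "UTC";
const currentDate = expectedDateString(new Date(), "UTC");
console.log(currentDate);
```

Defaulting to UTC keeps CI deterministic, while a non-UTC backend deployment can be matched by setting the variable rather than editing the test.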
await signinPage.passwordInput.fill(password);
await signinPage.signinButton.click();
await signinPage.skip2FAButton.click();
await expect(
Feedback: Brittle assertion on welcome message text
Severity: 🔵 Nice to Have
Agent Claude:
Problem: Adding a hardcoded string assertion in the login helper makes every test that uses loginUI() dependent on this exact welcome message text.
Risk: If the welcome message text changes (e.g., localization, A/B testing, copy updates), 15+ tests will fail even though login actually succeeded.
Recommendation: Either:
- Remove this assertion and let calling tests verify post-login state as needed, or
- Check for a URL change (`expect(page).toHaveURL(/.*dashboard\/home/)`), which is more stable
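The stability argument reduces to matching on URL shape rather than copy. A small sketch (the sample URLs are illustrative):

```typescript
// URL-shape check: survives welcome-message copy changes, localization,
// and A/B tests, because it only depends on the post-login route.
const postLoginUrl = /.*dashboard\/home/;

console.log(postLoginUrl.test("https://app.example.com/dashboard/home")); // true
console.log(postLoginUrl.test("https://app.example.com/login"));          // false
```

Any string assertion inside a shared `loginUI()` helper multiplies across every caller; the route is the more stable contract.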
@hs-pm-bot what do you think
Analysis

This PR introduces Playwright visual regression testing to the Control Center, alongside infrastructure improvements to the existing E2E setup. It's a meaningful addition to the testing story — visual regressions on auth, homepage, and operations flows are exactly the kind of thing unit/E2E tests miss.

What's actually in the diff

Visual test coverage (new):
Infrastructure:
Concerns worth flagging

1. Viewport change has blast radius. Bumping to 1620x1080 (from 1440x1005) changes rendering for the existing E2E suite as well, not just the new visual tests.
2. Snapshot flakiness risk. Visual tests on dynamic dashboards (operations, payments) are notoriously flaky — timestamps, pagination, random IDs, chart animations, font rendering differences between local and CI Linux runners. I don't see masking or freezing of dynamic regions in the tests.
3. Timezone pin is a workaround, not a fix. Hardcoding "Asia/Kolkata" papers over a test/backend timezone mismatch rather than resolving it.
4. Rename is load-bearing.
5. PR hygiene. Description is empty (no Motivation, no "How did you test it", no checkboxes). For a 1149-line PR touching CI and adding 35 snapshot binaries, that's a gap — especially for open-source contributors reviewing later. Please fill in the description template.
6. Snapshot maintenance cost. 35 PNGs × future UI changes = every frontend PR that touches styling will need a snapshot regeneration run.

Recommendation

Direction is right — visual tests are a real gap. Before merging:
Tag @prajwalnl for description fill-in; suggest a second pair of eyes from someone who's run these snapshots locally on a non-IST machine to sanity-check cross-platform stability.
Type of Change
Description
Add comprehensive visual regression testing using Playwright screenshot comparisons for key UI flows in the Hyperswitch Control Center. This PR introduces 42 baseline snapshots covering authentication, homepage, and operations modules to detect unintended UI changes.
Changes:
Motivation and Context
Visual regression testing is essential for catching unintended UI changes before they reach production. As the control center grows in complexity, manual visual inspection becomes unreliable and time-consuming. This PR establishes a baseline of screenshot tests that will automatically flag any pixel-level differences in critical user flows, improving confidence in deployments and reducing manual QA overhead.
How did you test it?
Tests were validated locally using npx playwright test. Screenshots were generated on the Playwright Docker image with an environment similar to CI, using Chromium. CI workflow tested via GitHub Actions.
Where to test it?
Checklist
npm run re:build