Conversation
Add End-to-End Behavioral Simulator script using Python Playwright.
🎯 What: The testing gap addressed
The client-side `showScreen` DOM manipulation function lacked formal verification. Since standard Node.js DOM mocking is structurally rejected, testing this requires a true browser environment to assert actual CSS painting and layout geometry.
📊 Coverage: What scenarios are now tested
This implementation introduces a Python Playwright script that spins up a local server and performs strict assertions against the DOM. It tests:
1. Initial visibility states on application load.
2. DOM transitions triggered by user interaction (clicking the consent button).
3. DOM transitions triggered by programmatic function calls (`showScreen('trial')`).
It uses Playwright's `is_visible()` to mathematically verify that screens are properly hidden or painted.
✨ Result: The improvement in test coverage
The application's core navigation logic is now verified end-to-end within a real Chromium browser engine without modifying the vanilla black-box state of `code/experiment.js`, ensuring UI state changes meet behavioral science reliability standards.
Co-authored-by: hashexplaindata <221828969+hashexplaindata@users.noreply.github.com>
|
👋 Jules, reporting for duty! I'm here to lend a hand with this pull request. When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down. I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job! For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with New to Jules? Learn more at jules.google/docs. For security, I will only act on instructions from the user who triggered this task. |
|
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard. |
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly improves the reliability of the application's user interface by implementing a robust end-to-end testing framework. It addresses a critical gap in verifying client-side DOM manipulations, ensuring that screen transitions and visibility states behave as expected across various user and programmatic interactions within a real browser environment. Highlights
Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces a valuable end-to-end test using Playwright to verify client-side navigation logic. The implementation is a great step towards ensuring UI reliability. My review focuses on improving the robustness and maintainability of the new test script. Specifically, I've suggested replacing fixed-time waits (asyncio.sleep) with Playwright's built-in, auto-waiting assertions (expect) to prevent flaky tests. I've also pointed out some minor maintainability improvements like removing unused imports and handling hardcoded values.
| # Wait a bit for the server to start | ||
| await asyncio.sleep(1) |
There was a problem hiding this comment.
Using a fixed asyncio.sleep(1) to wait for the server to start can lead to flaky tests, as the server might take more or less time to initialize depending on system load. A more robust approach is to actively poll the server to check if it's ready, for instance by trying to establish a connection in a loop.
# Wait for the server to be ready by polling
for _ in range(20): # Try for 2 seconds
try:
_, writer = await asyncio.open_connection('localhost', 8000)
writer.close()
await writer.wait_closed()
break
except ConnectionRefusedError:
await asyncio.sleep(0.1)
else:
raise RuntimeError("Server did not start in time.")| # Wait for the setTimeout in showScreen to complete (50ms) | ||
| await asyncio.sleep(0.1) | ||
|
|
||
| # 3. State Verification: Screen 1 hidden, Screen 2 visible | ||
| # Check classes | ||
| screen1_classes = await screen1.get_attribute('class') | ||
| screen2_classes = await screen2.get_attribute('class') | ||
|
|
||
| assert 'active' not in screen1_classes, "Screen 1 should NOT have 'active' class after transition" | ||
| assert 'active' in screen2_classes, "Screen 2 should have 'active' class after transition" | ||
|
|
||
| # Check actual visibility | ||
| assert not await screen1.is_visible(), "Screen 1 should be hidden after transition" | ||
| assert await screen2.is_visible(), "Screen 2 should be visible after transition" |
There was a problem hiding this comment.
Using asyncio.sleep() for waiting on UI updates is a common source of test flakiness. The subsequent manual assertions don't wait, so they might run before the UI has finished updating. By using expect(), Playwright will automatically wait for the condition to be true before proceeding, making the test much more reliable. This suggestion replaces the sleep and the manual assertions.
# 3. State Verification: Screen 1 hidden, Screen 2 visible
# Using expect() for robust, auto-waiting assertions.
await expect(screen1).not_to_have_class(re.compile(r'\bactive\b'))
await expect(screen2).to_have_class(re.compile(r'\bactive\b'))
await expect(screen1).not_to_be_visible()
await expect(screen2).to_be_visible()| # Wait for the setTimeout in showScreen to complete | ||
| await asyncio.sleep(0.1) | ||
|
|
||
| screen_trial = page.locator('#screen-trial') | ||
|
|
||
| # Check classes | ||
| screen2_classes = await screen2.get_attribute('class') | ||
| screen_trial_classes = await screen_trial.get_attribute('class') | ||
|
|
||
| assert 'active' not in screen2_classes, "Screen 2 should NOT have 'active' class after transition to trial" | ||
| assert 'active' in screen_trial_classes, "Screen trial should have 'active' class after transition" | ||
|
|
||
| # Check actual visibility | ||
| assert not await screen2.is_visible(), "Screen 2 should be hidden after transition to trial" | ||
| assert await screen_trial.is_visible(), "Screen trial should be visible after transition to trial" |
There was a problem hiding this comment.
Similar to the previous transition, this asyncio.sleep() and manual assertion block should be replaced with auto-waiting expect() calls to improve test reliability and prevent flakiness.
screen_trial = page.locator('#screen-trial')
# Check classes and visibility using auto-waiting assertions
await expect(screen2).not_to_have_class(re.compile(r'\bactive\b'))
await expect(screen_trial).to_have_class(re.compile(r'\bactive\b'))
await expect(screen2).not_to_be_visible()
await expect(screen_trial).to_be_visible()| @@ -0,0 +1,94 @@ | |||
| import asyncio | |||
| from playwright.async_api import async_playwright | |||
There was a problem hiding this comment.
To write more robust, non-flaky tests, it's best to use Playwright's auto-waiting assertions via the expect function. This requires importing it, along with the re module for more precise class name assertions.
| from playwright.async_api import async_playwright | |
| from playwright.async_api import async_playwright, expect | |
| import re |
| @@ -0,0 +1,94 @@ | |||
| import asyncio | |||
| from playwright.async_api import async_playwright | |||
| import time | |||
|
|
||
| async def main(): | ||
| # Start the local server | ||
| server_process = subprocess.Popen(['python3', '-m', 'http.server', '8000'], stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL) |
| # Check classes | ||
| screen1_classes = await screen1.get_attribute('class') | ||
| assert 'active' in screen1_classes, "Screen 1 should have 'active' class initially" | ||
|
|
||
| # Check actual visibility | ||
| assert await screen1.is_visible(), "Screen 1 should be visible initially" | ||
| assert not await screen2.is_visible(), "Screen 2 should be hidden initially" |
There was a problem hiding this comment.
These initial state assertions can be converted to use expect() for consistency and robustness. expect() will wait for the elements to be in the asserted state, which is safer than asserting immediately after navigation.
# Check classes and visibility using auto-waiting assertions
await expect(screen1).to_have_class(re.compile(r'\bactive\b'))
await expect(screen1).to_be_visible()
await expect(screen2).not_to_be_visible()
await expect(screen2).not_to_have_class(re.compile(r'\bactive\b'))There was a problem hiding this comment.
Pull request overview
Adds an end-to-end browser-based navigation verification script to formally test the client-side showScreen screen-transition behavior in a real Chromium environment.
Changes:
- Introduces an async Python Playwright script that starts a local web server and drives Chromium against
code/index.html. - Adds assertions for initial screen visibility, click-driven navigation (consent button), and programmatic navigation (
showScreen('trial')).
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
| await page.goto("http://localhost:8000/code/index.html?condition=control", wait_until="commit") | ||
|
|
There was a problem hiding this comment.
page.goto(..., wait_until="commit") can return before DOMContentLoaded and before experiment.js attaches the click handler / defines showScreen. That makes the subsequent page.click('#btn-consent') and page.evaluate("showScreen('trial')") race-prone and can fail nondeterministically. Prefer waiting for domcontentloaded (or load) and/or explicitly wait_for_function that window.showScreen exists before interacting.
| await page.goto("http://localhost:8000/code/index.html?condition=control", wait_until="commit") | |
| await page.goto("http://localhost:8000/code/index.html?condition=control", wait_until="domcontentloaded") | |
| # Ensure the experiment script has defined showScreen before interacting | |
| await page.wait_for_function("window.showScreen !== undefined") |
| screen1_classes = await screen1.get_attribute('class') | ||
| assert 'active' in screen1_classes, "Screen 1 should have 'active' class initially" | ||
|
|
There was a problem hiding this comment.
get_attribute('class') can return None when the attribute is missing; 'active' in screen1_classes (and similar checks below) will then raise a TypeError. Consider defaulting to an empty string when the attribute is None, or use evaluate/locator.evaluate to check element.classList.contains('active').
| # Wait for the setTimeout in showScreen to complete (50ms) | ||
| await asyncio.sleep(0.1) | ||
|
|
There was a problem hiding this comment.
The fixed asyncio.sleep(0.1) waits for the setTimeout-driven transition are timing-dependent and can be flaky on slower CI machines. It's more reliable to wait on an actual condition (e.g., wait for #screen-2 to become visible / have active, or wait for #screen-1 to become hidden) using wait_for_function or Playwright's locator assertions.
| # Start the local server | ||
| server_process = subprocess.Popen(['python3', '-m', 'http.server', '8000'], stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL) | ||
|
|
||
| # Wait a bit for the server to start | ||
| await asyncio.sleep(1) |
There was a problem hiding this comment.
The local server is started with a hard-coded command (python3 -m http.server 8000) and a fixed port. This can fail on Windows/CI (no python3) or when port 8000 is already in use, and the failures are hard to debug because stdout/stderr are discarded. Consider using sys.executable, choosing an ephemeral free port, and serving the repo root explicitly (via cwd or --directory) so the script works no matter where it's launched from.
| import time | ||
| import subprocess | ||
| import os | ||
| import signal |
There was a problem hiding this comment.
Imports time, os, and signal are unused in this script. Removing them avoids lint noise and makes dependencies clearer.
| import time | |
| import subprocess | |
| import os | |
| import signal | |
| import subprocess |
| browser = await p.chromium.launch(headless=True) | ||
| context = await browser.new_context() | ||
| page = await context.new_page() | ||
|
|
||
| # Intercept and abort external requests to prevent hanging | ||
| await page.route("**/*", lambda route: route.abort() if route.request.url.startswith("http") and not route.request.url.startswith("http://localhost") else route.continue_()) | ||
|
|
||
| # Navigate to the experiment | ||
| print("Navigating to the experiment...") | ||
| await page.goto("http://localhost:8000/code/index.html?condition=control", wait_until="commit") | ||
|
|
||
| # Wait for the initial screen to be ready | ||
| await page.wait_for_selector('#screen-1', state="attached") | ||
|
|
||
| print("Testing initial state...") | ||
| # 1. Visibility Assertion: Screen 1 should be visible, others hidden | ||
| screen1 = page.locator('#screen-1') | ||
| screen2 = page.locator('#screen-2') | ||
|
|
||
| # Check classes | ||
| screen1_classes = await screen1.get_attribute('class') | ||
| assert 'active' in screen1_classes, "Screen 1 should have 'active' class initially" | ||
|
|
||
| # Check actual visibility | ||
| assert await screen1.is_visible(), "Screen 1 should be visible initially" | ||
| assert not await screen2.is_visible(), "Screen 2 should be hidden initially" | ||
|
|
||
| print("Testing transition to Screen 2...") | ||
| # 2. Transition Assertion: Click consent button | ||
| await page.click('#btn-consent') | ||
|
|
||
| # Wait for the setTimeout in showScreen to complete (50ms) | ||
| await asyncio.sleep(0.1) | ||
|
|
||
| # 3. State Verification: Screen 1 hidden, Screen 2 visible | ||
| # Check classes | ||
| screen1_classes = await screen1.get_attribute('class') | ||
| screen2_classes = await screen2.get_attribute('class') | ||
|
|
||
| assert 'active' not in screen1_classes, "Screen 1 should NOT have 'active' class after transition" | ||
| assert 'active' in screen2_classes, "Screen 2 should have 'active' class after transition" | ||
|
|
||
| # Check actual visibility | ||
| assert not await screen1.is_visible(), "Screen 1 should be hidden after transition" | ||
| assert await screen2.is_visible(), "Screen 2 should be visible after transition" | ||
|
|
||
| print("Testing programmatic transition to Screen 'trial'...") | ||
| # 4. Trigger a programmatic transition to 'trial' | ||
| await page.evaluate("showScreen('trial')") | ||
|
|
||
| # Wait for the setTimeout in showScreen to complete | ||
| await asyncio.sleep(0.1) | ||
|
|
||
| screen_trial = page.locator('#screen-trial') | ||
|
|
||
| # Check classes | ||
| screen2_classes = await screen2.get_attribute('class') | ||
| screen_trial_classes = await screen_trial.get_attribute('class') | ||
|
|
||
| assert 'active' not in screen2_classes, "Screen 2 should NOT have 'active' class after transition to trial" | ||
| assert 'active' in screen_trial_classes, "Screen trial should have 'active' class after transition" | ||
|
|
||
| # Check actual visibility | ||
| assert not await screen2.is_visible(), "Screen 2 should be hidden after transition to trial" | ||
| assert await screen_trial.is_visible(), "Screen trial should be visible after transition to trial" | ||
|
|
||
| print("✅ All navigation and visibility assertions passed successfully.") | ||
|
|
||
| await browser.close() | ||
|
|
There was a problem hiding this comment.
If any assertion or Playwright call raises inside the async with async_playwright() block, the browser/context may not be closed because await browser.close() is not in a finally. Wrapping browser/context creation in a try/finally (or using async with p.chromium.launch(...) as browser if supported) helps avoid orphaned processes on failure.
| browser = await p.chromium.launch(headless=True) | |
| context = await browser.new_context() | |
| page = await context.new_page() | |
| # Intercept and abort external requests to prevent hanging | |
| await page.route("**/*", lambda route: route.abort() if route.request.url.startswith("http") and not route.request.url.startswith("http://localhost") else route.continue_()) | |
| # Navigate to the experiment | |
| print("Navigating to the experiment...") | |
| await page.goto("http://localhost:8000/code/index.html?condition=control", wait_until="commit") | |
| # Wait for the initial screen to be ready | |
| await page.wait_for_selector('#screen-1', state="attached") | |
| print("Testing initial state...") | |
| # 1. Visibility Assertion: Screen 1 should be visible, others hidden | |
| screen1 = page.locator('#screen-1') | |
| screen2 = page.locator('#screen-2') | |
| # Check classes | |
| screen1_classes = await screen1.get_attribute('class') | |
| assert 'active' in screen1_classes, "Screen 1 should have 'active' class initially" | |
| # Check actual visibility | |
| assert await screen1.is_visible(), "Screen 1 should be visible initially" | |
| assert not await screen2.is_visible(), "Screen 2 should be hidden initially" | |
| print("Testing transition to Screen 2...") | |
| # 2. Transition Assertion: Click consent button | |
| await page.click('#btn-consent') | |
| # Wait for the setTimeout in showScreen to complete (50ms) | |
| await asyncio.sleep(0.1) | |
| # 3. State Verification: Screen 1 hidden, Screen 2 visible | |
| # Check classes | |
| screen1_classes = await screen1.get_attribute('class') | |
| screen2_classes = await screen2.get_attribute('class') | |
| assert 'active' not in screen1_classes, "Screen 1 should NOT have 'active' class after transition" | |
| assert 'active' in screen2_classes, "Screen 2 should have 'active' class after transition" | |
| # Check actual visibility | |
| assert not await screen1.is_visible(), "Screen 1 should be hidden after transition" | |
| assert await screen2.is_visible(), "Screen 2 should be visible after transition" | |
| print("Testing programmatic transition to Screen 'trial'...") | |
| # 4. Trigger a programmatic transition to 'trial' | |
| await page.evaluate("showScreen('trial')") | |
| # Wait for the setTimeout in showScreen to complete | |
| await asyncio.sleep(0.1) | |
| screen_trial = page.locator('#screen-trial') | |
| # Check classes | |
| screen2_classes = await screen2.get_attribute('class') | |
| screen_trial_classes = await screen_trial.get_attribute('class') | |
| assert 'active' not in screen2_classes, "Screen 2 should NOT have 'active' class after transition to trial" | |
| assert 'active' in screen_trial_classes, "Screen trial should have 'active' class after transition" | |
| # Check actual visibility | |
| assert not await screen2.is_visible(), "Screen 2 should be hidden after transition to trial" | |
| assert await screen_trial.is_visible(), "Screen trial should be visible after transition to trial" | |
| print("✅ All navigation and visibility assertions passed successfully.") | |
| await browser.close() | |
| async with p.chromium.launch(headless=True) as browser: | |
| async with browser.new_context() as context: | |
| page = await context.new_page() | |
| # Intercept and abort external requests to prevent hanging | |
| await page.route("**/*", lambda route: route.abort() if route.request.url.startswith("http") and not route.request.url.startswith("http://localhost") else route.continue_()) | |
| # Navigate to the experiment | |
| print("Navigating to the experiment...") | |
| await page.goto("http://localhost:8000/code/index.html?condition=control", wait_until="commit") | |
| # Wait for the initial screen to be ready | |
| await page.wait_for_selector('#screen-1', state="attached") | |
| print("Testing initial state...") | |
| # 1. Visibility Assertion: Screen 1 should be visible, others hidden | |
| screen1 = page.locator('#screen-1') | |
| screen2 = page.locator('#screen-2') | |
| # Check classes | |
| screen1_classes = await screen1.get_attribute('class') | |
| assert 'active' in screen1_classes, "Screen 1 should have 'active' class initially" | |
| # Check actual visibility | |
| assert await screen1.is_visible(), "Screen 1 should be visible initially" | |
| assert not await screen2.is_visible(), "Screen 2 should be hidden initially" | |
| print("Testing transition to Screen 2...") | |
| # 2. Transition Assertion: Click consent button | |
| await page.click('#btn-consent') | |
| # Wait for the setTimeout in showScreen to complete (50ms) | |
| await asyncio.sleep(0.1) | |
| # 3. State Verification: Screen 1 hidden, Screen 2 visible | |
| # Check classes | |
| screen1_classes = await screen1.get_attribute('class') | |
| screen2_classes = await screen2.get_attribute('class') | |
| assert 'active' not in screen1_classes, "Screen 1 should NOT have 'active' class after transition" | |
| assert 'active' in screen2_classes, "Screen 2 should have 'active' class after transition" | |
| # Check actual visibility | |
| assert not await screen1.is_visible(), "Screen 1 should be hidden after transition" | |
| assert await screen2.is_visible(), "Screen 2 should be visible after transition" | |
| print("Testing programmatic transition to Screen 'trial'...") | |
| # 4. Trigger a programmatic transition to 'trial' | |
| await page.evaluate("showScreen('trial')") | |
| # Wait for the setTimeout in showScreen to complete | |
| await asyncio.sleep(0.1) | |
| screen_trial = page.locator('#screen-trial') | |
| # Check classes | |
| screen2_classes = await screen2.get_attribute('class') | |
| screen_trial_classes = await screen_trial.get_attribute('class') | |
| assert 'active' not in screen2_classes, "Screen 2 should NOT have 'active' class after transition to trial" | |
| assert 'active' in screen_trial_classes, "Screen trial should have 'active' class after transition" | |
| # Check actual visibility | |
| assert not await screen2.is_visible(), "Screen 2 should be hidden after transition to trial" | |
| assert await screen_trial.is_visible(), "Screen trial should be visible after transition to trial" | |
| print("✅ All navigation and visibility assertions passed successfully.") |
| await page.route("**/*", lambda route: route.abort() if route.request.url.startswith("http") and not route.request.url.startswith("http://localhost") else route.continue_()) | ||
|
|
There was a problem hiding this comment.
page.route handler uses a synchronous lambda that calls route.abort() / route.continue_() without awaiting them. In Playwright's async API these are coroutines; this will raise "coroutine was never awaited" and the routing will not behave reliably. Use an async def route handler and await route.abort() / await route.continue_() (or remove routing if not needed).
| await page.route("**/*", lambda route: route.abort() if route.request.url.startswith("http") and not route.request.url.startswith("http://localhost") else route.continue_()) | |
| async def handle_route(route): | |
| if route.request.url.startswith("http") and not route.request.url.startswith("http://localhost"): | |
| await route.abort() | |
| else: | |
| await route.continue_() | |
| await page.route("**/*", handle_route) |
|
Closing: Fixes have already been incorporated directly into the master branch. |
Understood. Acknowledging that this work is now obsolete and stopping work on this task. |
🎯 What: The testing gap addressed
The client-side
showScreenDOM manipulation function lacked formal verification. Since standard Node.js DOM mocking is structurally rejected, testing this requires a true browser environment to assert actual CSS painting and layout geometry.📊 Coverage: What scenarios are now tested
This implementation introduces a Python Playwright script that spins up a local server and performs strict assertions against the DOM. It tests:
showScreen('trial')).It uses Playwright's
is_visible()to mathematically verify that screens are properly hidden or painted.✨ Result: The improvement in test coverage
The application's core navigation logic is now verified end-to-end within a real Chromium browser engine without modifying the vanilla black-box state of
code/experiment.js, ensuring UI state changes meet behavioral science reliability standards.PR created automatically by Jules for task 4415292946061797070 started by @hashexplaindata