[FEAT] Generic Testing Harness + "Take All" tests #44

Avni2000 · 2026-02-06T02:46:42Z

Added "Take All" tests, which run in 3 separate git repos and VS Code instances
Introduced integrationUtils.ts for shared utility functions related to conflict resolution and UI interactions.
Improved testHelpers.ts to accommodate new parameters for test configurations.
Updated .gitignore to exclude output.txt.
Updated the conflict resolution logic in resolver.ts to prioritize uniform "take all" actions based on the preferred side (base, current, or incoming).

Summary by CodeRabbit

New Features
- Added "Accept All" bulk-resolve actions with three header buttons (All Base, All Current, All Incoming).
Refactor
- Removed the "Use Both" option and consolidated resolution choices to a shared type.
- Added uniform-take-all resolution behavior and made session startup non-blocking.
Tests
- Reworked integration runner for sequential multi-case runs; added a test harness, reusable test utilities, and end-to-end bulk-resolve tests.
Chores
- Added output.txt to .gitignore.

coderabbitai · 2026-02-06T02:47:04Z


Walkthrough

Replaces single-run integration flow with a TEST_CASES-driven sequential runner, adds shared Playwright/VSCode test harness and helpers, introduces a "Take All" E2E test and bulk-accept UI buttons, removes the 'both' resolution option, and adds uniform "take all" resolver behavior.

Changes

Cohort / File(s)	Summary
Integration runner & tests: `src/tests/runIntegrationTest.ts`, `src/tests/takeAllButtons.test.ts`	Adds a sequential TEST_CASES runner with per-case configs and workspace lifecycle; adds the Playwright E2E `takeAllButtons` test driven by temp config JSON.
Test harness & helpers: `src/tests/testHelpers.ts`, `src/tests/testHarness.ts`	New centralized test utilities: types (TestCaseDefinition, TestConfig, ExpectedCell), health/session polling, server/session orchestration, file-write waiting, setupConflictResolver, applyResolutionAndReadNotebook, and notebook assertion helpers.
Playwright UI helpers: `src/tests/integrationUtils.ts`	New Playwright-focused utilities and types (MergeSide, ConflictChoice, ConflictChoiceResolver) plus functions for extracting/verifying UI cells, resolved-count polling, waiting for all resolved, and collecting ExpectedCell[] from the UI.
VS Code integration test refactor: `src/tests/vscodeIntegration.test.ts`	Refactored to use shared testHelpers/integrationUtils/testHarness; removed inline health/session/polling and ad-hoc parsing logic in favor of centralized utilities.
Client UI (bulk actions & row changes): `src/web/client/ConflictResolver.tsx`, `src/web/client/MergeRow.tsx`	Adds `handleAcceptAll(choice)` and three header buttons (All Base/All Current/All Incoming); removes "Use Both" UI and local 'both' handling; relies on shared ResolutionChoice type.
Types centralization: `src/types.ts`, `src/web/client/types.ts`, `src/web/webTypes.ts`	Removes `'both'` from `ResolutionChoice` and updates client/web types to import the shared `ResolutionChoice`.
Resolver logic: `src/resolver.ts`	Implements uniform "take all" handling by deriving a preferred non-delete side, reordering rows, and using preferred-side cells across rows; removes handling for the `'both'` case.
WebSocket change: `src/web/WebConflictPanel.ts`	Switches from awaiting openSession to non-blocking call with then/catch handlers for session initialization and asynchronous send/log flow.
Misc: `.gitignore`	Adds `output.txt` to `.gitignore`.

Sequence Diagrams

sequenceDiagram
    actor User
    participant UI as ConflictResolver UI
    participant State as Component State
    participant FS as Notebook File System

    User->>UI: Click "All Base"/"All Current"/"All Incoming"
    UI->>State: handleAcceptAll(choice)
    loop each conflict row
        State->>State: set resolvedContent & choice (or delete)
    end
    UI->>User: render updated resolved rows
    User->>UI: Click "Apply Resolution"
    UI->>FS: write resolved notebook
    FS-->>User: notebook updated

sequenceDiagram
    participant Runner as Integration Runner
    participant FS as File System
    participant Server as Merge-NB Server
    participant VS as VS Code Test Runner
    participant Browser as Playwright Browser

    Runner->>Runner: iterate TEST_CASES
    loop per test case
        Runner->>FS: createMergeConflictRepo(base,current,incoming)
        Runner->>Server: waitForServer(portFile)
        Server-->>Runner: port available
        Runner->>VS: launch test module with per-case config
        VS->>Browser: run Playwright test (open workspace, UI interactions)
        Browser->>FS: read/write notebook files
        VS-->>Runner: report pass/fail
        Runner->>FS: cleanup workspace & config
    end
    Runner->>Runner: exit with aggregated result

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

[FEAT] Create a Testing Harness #34: Overlaps test infrastructure and Playwright tests (runIntegrationTest, testHelpers, testHarness, takeAllButtons).
Optimize WebConflictPanel rendering for large notebooks via virtualization and lazy MathJax #23: Touches the same UI components and ResolutionChoice/type changes (ConflictResolver, MergeRow).
[FIX] Rebuild notebook from unified rows + resolved text areas #30: Modifies resolver/resolution plumbing that intersects the new uniform "take all" logic in src/resolver.ts.

Poem

🐰 I hopped through branches, cells in a row,

Clicked "All Base", "All Current", "All Incoming" — watch them go.
Playwright and servers whisper tests through the night,
Rows resolved, files written — the merge looks right.
A cheerful rabbit twitches its nose at merged-code delight.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title '[FEAT] Generic Testing Harness + "Take All" tests' accurately describes the main changes: introduction of a generic testing harness and bulk-resolve "take all" functionality.


✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch generic-tester


coderabbitai

Actionable comments posted: 3

🤖 Fix all issues with AI agents

In `@src/tests/takeAllButtons.test.ts`:
- Around line 270-312: The capture logic assumes "current" was chosen when
building expectedCells for conflict rows; instead inspect the resolved-cell DOM
to derive which side was applied (e.g., check for an active/selected class or
which resolved-choice button is toggled on the resolvedCell element) and set a
variable like resolvedSide = 'current' | 'incoming' (default to 'current' or
'code' if not found); then replace direct calls to getColumnCellType(row,
'current') and getColumnCell(row, 'current', i) with getColumnCellType(row,
resolvedSide) and getColumnCell(row, resolvedSide, i) so metadata and cellType
come from the actual chosen side before pushing expectedCells. Ensure you still
treat deleted/textarea branches the same and handle absence of a referenceCell
safely.

In `@src/tests/testHelpers.ts`:
- Around line 51-59: The checkHealth function currently reads only
res.statusCode and never consumes the HTTP response body, which can keep the
socket open; modify checkHealth (the Promise created in checkHealth) so that in
the http.get callback you consume or drain the response stream (e.g., call
res.resume() or read data and listen for 'end') before resolving based on
res.statusCode, and ensure existing error and timeout handlers still resolve
false and cleanup the request; this guarantees the response body is drained and
the socket is freed.
- Around line 161-177: The mtime check in waitForFileWrite is hardcoded to 10
seconds, which can miss a file written before polling begins; update
waitForFileWrite to record an anchor timestamp at function start (e.g., const
start = Date.now()) and replace the fixed 10000 ms mtime window with a comparison
anchored to that timestamp (for example, treat the file as written if
stat.mtimeMs >= start, or if Date.now() - stat.mtimeMs <= timeoutMs), so the
mtime window is relative to when waitForFileWrite was called and honors the
timeoutMs parameter; keep the maxAttempts behavior and reference the start
timestamp and timeoutMs when checking stat.mtimeMs.
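For the checkHealth drain, a minimal sketch might look like the following. It assumes the helper issues GET /health against 127.0.0.1 and resolves a plain boolean; the timeoutMs parameter is an illustrative addition, not the PR's actual signature.

```typescript
import * as http from 'http';

/**
 * Resolve true if GET /health answers 200, false on any other status,
 * connection error, or timeout. The body is always drained with
 * res.resume() so the underlying socket can be released.
 */
export function checkHealth(port: number, timeoutMs = 1000): Promise<boolean> {
    return new Promise((resolve) => {
        const req = http.get(`http://127.0.0.1:${port}/health`, { timeout: timeoutMs }, (res) => {
            // Discard the body; an unread response keeps the socket allocated.
            res.resume();
            resolve(res.statusCode === 200);
        });
        req.on('error', () => resolve(false));
        // The 'timeout' event does not abort the request by itself, so destroy it.
        req.on('timeout', () => {
            req.destroy();
            resolve(false);
        });
    });
}
```

The same res.resume() call belongs on every early-return path of any helper that inspects only the status code.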

🧹 Nitpick comments (8)

src/web/client/ConflictResolver.tsx (2)

272-298: handleAcceptAll silently overwrites user-edited resolutions.

The MergeRow component (see MergeRow.tsx lines 106–116) shows a warning when a user has modified content and tries to switch branches. handleAcceptAll bypasses that guard by directly setting choices state, discarding any per-row edits without confirmation. Consider showing a confirmation dialog when any row has been manually edited before applying the bulk action.
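One way to add that guard, sketched as pure functions so the logic can be tested outside React. The RowState shape, isManuallyEdited and acceptAll are hypothetical; the real component keeps choices and resolved content in its own state, and confirm() would typically be window.confirm or a custom dialog. Rows whose chosen side has no cell become deletions, matching the "(or delete)" step in the bulk-resolve sequence diagram.

```typescript
type ResolutionChoice = 'base' | 'current' | 'incoming' | 'delete';
type MergeSide = Exclude<ResolutionChoice, 'delete'>;

/** Hypothetical per-row state; the real component's state shape may differ. */
interface RowState {
    base?: string;
    current?: string;
    incoming?: string;
    choice?: ResolutionChoice;
    resolvedContent?: string;
}

/** True when the resolved text no longer matches the side the user picked. */
function isManuallyEdited(row: RowState): boolean {
    const choice = row.choice;
    if (choice === undefined || choice === 'delete' || row.resolvedContent === undefined) {
        return false;
    }
    return row.resolvedContent !== row[choice];
}

/**
 * Apply a bulk "take all" choice. If any row carries manual edits, ask
 * confirm() first and return the rows unchanged when it declines.
 */
function acceptAll(
    rows: RowState[],
    side: MergeSide,
    confirm: (editedCount: number) => boolean,
): RowState[] {
    const edited = rows.filter(isManuallyEdited).length;
    if (edited > 0 && !confirm(edited)) {
        return rows;
    }
    return rows.map((row): RowState => {
        const content = row[side];
        // A row with no cell on the chosen side resolves to a deletion.
        return content === undefined
            ? { ...row, choice: 'delete', resolvedContent: '' }
            : { ...row, choice: side, resolvedContent: content };
    });
}
```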

461-504: Consider extracting inline button styles to CSS classes.

All three "Accept All" buttons share the same structure with only background and border varying by side. Extracting to CSS classes (e.g., .btn-accept-all-base, .btn-accept-all-current, .btn-accept-all-incoming) would reduce JSX verbosity and keep styling consistent with the rest of the component.

src/tests/testHelpers.ts (1)

84-88: getCellSource duplicates normalizeCellSource from src/notebookUtils.ts.

normalizeCellSource (in src/notebookUtils.ts lines 14–19) already handles string | string[] normalization. getCellSource adds a null-cell guard on top. Consider reusing normalizeCellSource here to avoid divergence (e.g., if join separator ever changes).
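A sketch of the suggested reuse. The normalizeCellSource below is a local stand-in written only for illustration; it assumes array sources are joined with an empty string, as nbformat intends (each line already carries its trailing newline). In the real code it would be imported from src/notebookUtils.ts. Returning '' for a missing cell is also an assumption about the existing guard.

```typescript
/** Minimal stand-in for an nbformat cell; only `source` matters here. */
interface NotebookCell {
    cell_type: string;
    source: string | string[];
}

/** Stand-in for normalizeCellSource from src/notebookUtils.ts (assumed behavior). */
function normalizeCellSource(source: string | string[]): string {
    return Array.isArray(source) ? source.join('') : source;
}

/** Test helper: keep only the null-cell guard and delegate normalization. */
export function getCellSource(cell: NotebookCell | null | undefined): string {
    return cell ? normalizeCellSource(cell.source) : '';
}
```

With this shape, a change to the join logic in one place is picked up by the tests automatically.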

src/tests/runIntegrationTest.ts (1)

85-153: Shared config file path could collide if multiple runners execute in parallel.

configPath at Line 88 uses a fixed well-known path in os.tmpdir(). If two runner processes ever execute simultaneously (e.g., parallel CI jobs on the same machine), they would overwrite each other's config. Consider including a unique identifier (e.g., PID or random suffix) in the config filename.

This is fine for the current sequential design but worth noting for future-proofing.
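A sketch of a collision-free config path, assuming the runner can hand each test case an arbitrary path. createRunnerConfigPath and the config.json file name are hypothetical; fs.mkdtempSync appends random characters, so the directory is unique even if a PID is reused.

```typescript
import * as fs from 'fs';
import * as os from 'os';
import * as path from 'path';

/**
 * Create a private temp directory for this runner process and return a
 * config path inside it. The PID in the prefix helps when debugging; the
 * random suffix from mkdtempSync is what guarantees uniqueness.
 */
export function createRunnerConfigPath(prefix = 'mergenb-test-'): string {
    const dir = fs.mkdtempSync(path.join(os.tmpdir(), `${prefix}${process.pid}-`));
    return path.join(dir, 'config.json');
}
```

The runner's existing cleanup step would then remove path.dirname(configPath) instead of a single well-known file.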

src/tests/vscodeIntegration.test.ts (2)

237-242: getColumnCell is duplicated between this file and takeAllButtons.test.ts.

This local function (lines 237–242) has the same implementation as the module-level function in takeAllButtons.test.ts (lines 30–35). Consider moving it into testHelpers.ts alongside the other shared utilities so both tests use a single copy.

49-49: executeCommand is fire-and-forget — command failure won't surface clearly.

vscode.commands.executeCommand('merge-nb.findConflicts') is not awaited. If the command fails (e.g., extension not activated), the only symptom is a waitForServer timeout with a generic message. Consider awaiting the promise and wrapping in a try/catch for a clearer failure diagnostic.
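A small wrapper that awaits a setup step and rewraps any failure with the step name. runStep is a hypothetical helper; the commented usage assumes merge-nb.findConflicts resolves once the panel is open rather than after the conflict is resolved, since otherwise awaiting it would block the test.

```typescript
/**
 * Await one setup step and rethrow failures with a label, so a failed
 * command surfaces as "<label> failed: ..." instead of a later timeout.
 */
export async function runStep<T>(label: string, step: () => PromiseLike<T>): Promise<T> {
    try {
        return await step();
    } catch (err) {
        const detail = err instanceof Error ? err.message : String(err);
        throw new Error(`${label} failed: ${detail}`);
    }
}

// Usage inside the VS Code test (assumes the command returns promptly):
// await runStep('merge-nb.findConflicts', () =>
//     vscode.commands.executeCommand('merge-nb.findConflicts'));
```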

src/tests/takeAllButtons.test.ts (2)

432-471: Cross-check against 04_current.ipynb — hardcoded path depends on test runner convention.

The path ../../test (Line 435) and filename 04_current.ipynb (Line 436) are derived from __dirname, which assumes the compiled output structure matches the source layout. This is fragile if the build output directory changes. Consider reading the notebook path from the test config or the TestCaseDefinition to keep it in sync with the runner.

Also, the cross-check treats mismatches as warnings (Line 464–468), which is appropriate given cell-matching may rearrange sources.

106-162: Setup boilerplate is nearly identical to vscodeIntegration.test.ts.

Lines 112–162 (config read, port file cleanup, command execution, server/session wait, browser launch, header check) are duplicated almost verbatim from vscodeIntegration.test.ts lines 31–72. Consider extracting a shared setupTestBrowser(configPath) helper into testHelpers.ts that returns { page, browser, serverPort, conflictFile }.

src/tests/takeAllButtons.test.ts

src/tests/testHelpers.ts

coderabbitai · 2026-02-06T02:51:19Z

src/tests/testHelpers.ts

+/**
+ * Wait for the conflict file to be written (mtime within last 10 seconds).
+ * Returns true if confirmed, false otherwise.
+ */
+export async function waitForFileWrite(filePath: string, fs: typeof import('fs'), timeoutMs = 10000): Promise<boolean> {
+    const maxAttempts = Math.ceil(timeoutMs / 500);
+    for (let attempt = 0; attempt < maxAttempts; attempt++) {
+        await new Promise(r => setTimeout(r, 500));
+        try {
+            const stat = fs.statSync(filePath);
+            if (Date.now() - stat.mtimeMs < 10000) {
+                return true;
+            }
+        } catch { /* continue */ }
+    }
+    return false;
+}


⚠️ Potential issue | 🟡 Minor

waitForFileWrite mtime window is hardcoded to 10 seconds regardless of timeoutMs.

The 10-second mtime check on Line 171 means if the file was written more than 10 seconds before the first poll, it will never be detected even though the overall timeout hasn't expired. Consider making the mtime window relative to when waitForFileWrite was first called, or document this constraint.

🛡️ Proposed fix — anchor to call time

 export async function waitForFileWrite(filePath: string, fs: typeof import('fs'), timeoutMs = 10000): Promise<boolean> {
+    const callTime = Date.now();
     const maxAttempts = Math.ceil(timeoutMs / 500);
     for (let attempt = 0; attempt < maxAttempts; attempt++) {
         await new Promise(r => setTimeout(r, 500));
         try {
             const stat = fs.statSync(filePath);
-            if (Date.now() - stat.mtimeMs < 10000) {
+            if (stat.mtimeMs >= callTime) {
                 return true;
             }
         } catch { /* continue */ }
     }
     return false;
 }
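An anchor at call time can still miss a write that completes before waitForFileWrite is called, for example when the click that triggers the write only returns after the file is saved. A sketch of one alternative, assuming a hypothetical notBeforeMs parameter that the caller captures just before triggering the write (the real helper takes fs as an argument and uses a fixed 500 ms interval):

```typescript
import * as fs from 'fs';

/**
 * Poll until filePath has an mtime at or after notBeforeMs, or until
 * timeoutMs elapses. Callers should pass the time captured just before
 * triggering the write (e.g. before clicking "Apply Resolution").
 */
export async function waitForFileWrite(
    filePath: string,
    timeoutMs = 10000,
    pollMs = 500,
    notBeforeMs = Date.now(),
): Promise<boolean> {
    const deadline = Date.now() + timeoutMs;
    while (Date.now() < deadline) {
        try {
            if (fs.statSync(filePath).mtimeMs >= notBeforeMs) {
                return true;
            }
        } catch {
            // File may not exist yet; keep polling.
        }
        await new Promise<void>((resolve) => setTimeout(resolve, pollMs));
    }
    return false;
}
```

Filesystem timestamps can be coarser than Date.now() (FAT, for instance, stores modification times in 2-second steps), so a small slack subtracted from notBeforeMs may be needed on such filesystems.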


Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

coderabbitai

Actionable comments posted: 1

🤖 Fix all issues with AI agents

In `@src/tests/testHelpers.ts`:
- Around line 62-83: In getHealthInfo, when the response status is not 200 the
response body is not consumed which leaks the socket; update the non-200 branch
inside the http.get response handler (the callback in getHealthInfo) to drain
the response by calling res.resume() (or otherwise consume the stream) before
resolving null, mirroring the fix used in checkHealth, so the socket is
released.

coderabbitai · 2026-02-06T04:29:11Z

src/tests/testHelpers.ts

+/** Get detailed health info from the web server */
+export function getHealthInfo(port: number): Promise<HealthResponse | null> {
+    return new Promise((resolve) => {
+        const req = http.get(`http://127.0.0.1:${port}/health`, { timeout: 1000 }, (res) => {
+            if (res.statusCode !== 200) {
+                resolve(null);
+                return;
+            }
+            let data = '';
+            res.on('data', (chunk: string) => data += chunk);
+            res.on('end', () => {
+                try {
+                    resolve(JSON.parse(data));
+                } catch {
+                    resolve(null);
+                }
+            });
+        });
+        req.on('error', () => resolve(null));
+        req.on('timeout', () => { req.destroy(); resolve(null); });
+    });
+}


⚠️ Potential issue | 🟡 Minor

Missing res.resume() on the non-200 path leaks the socket.

The same drain fix applied to checkHealth (Line 54) is missing here. When statusCode !== 200, the response body is never consumed, keeping the underlying socket allocated.

🐛 Proposed fix — drain on non-200

         const req = http.get(`http://127.0.0.1:${port}/health`, { timeout: 1000 }, (res) => {
             if (res.statusCode !== 200) {
+                res.resume();
                 resolve(null);
                 return;
             }


- Updated the conflict resolution logic in resolver.ts to prioritize uniform "take all" actions based on the preferred side (base, current, or incoming).
- Added new test cases in runIntegrationTest.ts to cover scenarios for each preferred side action.
- Refactored takeAllButtons.test.ts to streamline the testing process and ensure proper verification of resolved cells.
- Introduced integrationUtils.ts for shared utility functions related to conflict resolution and UI interactions.
- Improved testHelpers.ts to accommodate new parameters for test configurations.
- Updated .gitignore to exclude output.txt.

Avni2000 · 2026-02-06T05:52:18Z

@coderabbitai a lot of things have changed since the last time you reviewed the code. Could you review it again?

Also, please ensure the resolver.ts change is working as expected or if that needs to be documented better or what

coderabbitai · 2026-02-06T05:54:13Z

🧩 Analysis chain

@Avni2000, I'll review the changes again and specifically examine the resolver.ts logic. Let me first read the resolver.ts file to understand the implementation details.

🏁 Script executed:

#!/bin/bash
# Read the resolver.ts file to examine the uniform-resolution heuristic changes
cat src/resolver.ts

Length of output: 17076

Let me also check the ConflictResolver.tsx to understand how the UI sends resolutions:

🏁 Script executed:

#!/bin/bash
# Read the ConflictResolver.tsx to understand how handleAcceptAll works
cat src/web/client/ConflictResolver.tsx | head -n 250

Length of output: 9941

🏁 Script executed:

#!/bin/bash
# Get the handleAcceptAll function implementation
cat src/web/client/ConflictResolver.tsx | grep -A 50 "handleAcceptAll"

Length of output: 6056

@Avni2000, I've thoroughly analyzed the resolver.ts changes and their interaction with the UI. I've identified several areas where the logic could be clearer and better documented. Here's my detailed analysis:

Issues Found in resolver.ts

1. Type Inconsistency (Line 274)

const preferredSide = (uniqueChoices.size === 1
    ? [...uniqueChoices][0]
    : undefined) as ('base' | 'current' | 'incoming' | 'both' | undefined);

The type assertion includes 'both', but the reordering logic (lines 277-287) only handles 'base' | 'current' | 'incoming'. This is inconsistent. Either:

- Remove 'both' from the type assertion, or
- Add handling for 'both' in the reordering logic.

Recommendation: Remove 'both' since it doesn't make sense for uniform "take all" scenarios.

2. Unclear Fallback Behavior (Lines 323-327)

} else if (preferredSide === 'base' || preferredSide === 'current' || preferredSide === 'incoming') {
    // For uniform "take all", only include cells that exist on the preferred side.
    if (preferredSide === 'base') cellToUse = baseCell;
    else if (preferredSide === 'current') cellToUse = currentCell;
    else if (preferredSide === 'incoming') cellToUse = incomingCell;
}

This fallback triggers for rows without explicit resolutions when a preferredSide exists. However:

- The UI's handleAcceptAll sets resolutions for all conflict rows.
- This means the fallback only applies to non-conflict rows (identical cells).
- But for identical rows, the existing behavior (lines 329-330) already works.

Questions:

- Is this fallback actually needed?
- When would a preferredSide exist but some rows lack resolutions?
- Should this filter out cells that don't exist on the preferred side for identical rows too?

3. Missing Documentation

The code lacks explanation for:

- Why reordering happens: The comment says "prefer that side for ordering" but doesn't explain the purpose. Is it to:
  - Preserve the chosen side's cell order in the final notebook?
  - Ensure proper indexing?
  - Something else?
- The uniform resolution concept: What does "uniform take all" mean? Add a comment like:

// Detect uniform "take all" action (e.g., when user clicks "All Base", "All Current", or "All Incoming"
// in the UI). If all non-delete resolutions choose the same side, we consider it a uniform resolution
// and preserve that side's cell ordering in the final notebook.

- Fallback trigger conditions: When does the fallback apply vs. the default path?
- Relationship to UI: No mention that this logic handles the "All Base/Current/Incoming" buttons from the UI.

4. Suggested Documentation Improvements

Here's how I'd document this section:

// Detect uniform "take all" action (e.g., when user clicks "All Base", "All Current", or "All Incoming").
// If all non-delete resolutions choose the same side, we:
// 1. Reorder rows to preserve that side's original cell ordering
// 2. For rows without explicit resolutions, only include cells that exist on that side
const nonDeleteChoices = resolvedRows
    .map(r => r.resolution?.choice)
    .filter((c): c is 'base' | 'current' | 'incoming' => !!c && c !== 'delete');

const uniqueChoices = new Set(nonDeleteChoices);
const preferredSide = (uniqueChoices.size === 1
    ? [...uniqueChoices][0]
    : undefined) as ('base' | 'current' | 'incoming' | undefined);

let rowsForResolution = resolvedRows;
if (preferredSide === 'base' || preferredSide === 'current' || preferredSide === 'incoming') {
    // Reorder rows to match the preferred side's original cell order.
    // This ensures the final notebook preserves the structure of the chosen side.
    const indexKey = preferredSide === 'base'
        ? 'baseCellIndex'
        : preferredSide === 'current'
            ? 'currentCellIndex'
            : 'incomingCellIndex';

    const withIndex = resolvedRows
        .filter(r => (r as any)[indexKey] !== undefined)
        .sort((a, b) => (a as any)[indexKey] - (b as any)[indexKey]);
    const withoutIndex = resolvedRows.filter(r => (r as any)[indexKey] === undefined);
    rowsForResolution = [...withIndex, ...withoutIndex];
}

Recommendations

- Add comprehensive comments explaining the uniform resolution logic.
- Clarify or remove the fallback behavior (lines 323-327) if it's not needed.
- Fix the type inconsistency by removing 'both' from line 274.
- Add a function header comment explaining that this function handles both individual resolutions and bulk "take all" operations.
- Consider extracting the uniform resolution detection into a separate function with a descriptive name like detectUniformResolution().
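Following that recommendation, a sketch of the extracted detection combined with a typed index accessor in place of `(r as any)[indexKey]`. ResolvedRow here is a simplified stand-in holding only the fields the sketch reads; detectUniformResolution, cellIndexOn and orderRowsForSide are illustrative names, not existing MergeNB functions.

```typescript
type ResolutionChoice = 'base' | 'current' | 'incoming' | 'delete';
type MergeSide = Exclude<ResolutionChoice, 'delete'>;

/** Simplified stand-in for ResolvedRow. */
interface ResolvedRow {
    baseCellIndex?: number;
    currentCellIndex?: number;
    incomingCellIndex?: number;
    resolution?: { choice: ResolutionChoice };
}

/**
 * Detect a uniform "take all": every explicit non-delete resolution picks
 * the same side. Returns undefined for mixed or delete-only resolutions.
 */
export function detectUniformResolution(rows: ResolvedRow[]): MergeSide | undefined {
    const sides = new Set<MergeSide>();
    for (const row of rows) {
        const choice = row.resolution?.choice;
        if (choice && choice !== 'delete') {
            sides.add(choice);
        }
    }
    return sides.size === 1 ? Array.from(sides)[0] : undefined;
}

/** Typed accessor: a renamed ResolvedRow field becomes a compile error. */
function cellIndexOn(row: ResolvedRow, side: MergeSide): number | undefined {
    switch (side) {
        case 'base': return row.baseCellIndex;
        case 'current': return row.currentCellIndex;
        case 'incoming': return row.incomingCellIndex;
    }
}

/** Rows present on `side`, in that side's order, then the rest in original order. */
export function orderRowsForSide(rows: ResolvedRow[], side: MergeSide): ResolvedRow[] {
    const withIndex = rows
        .filter((r) => cellIndexOn(r, side) !== undefined)
        .sort((a, b) => cellIndexOn(a, side)! - cellIndexOn(b, side)!);
    const withoutIndex = rows.filter((r) => cellIndexOn(r, side) === undefined);
    return [...withIndex, ...withoutIndex];
}
```

Because MergeSide excludes 'delete' and the union no longer contains 'both', the detected side can only be one the reordering handles.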

I'll now trigger a full review of the PR as requested.

🧠 Learnings used

Learnt from: CR
Repo: Avni2000/MergeNB PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-02-03T02:45:32.680Z
Learning: Applies to src/**/resolver.ts : Use the `merge-nb.findConflicts` command as the single unified entry point to find notebooks with merge conflicts (both textual and semantic) and bring up the conflict resolution panel

Learnt from: CR
Repo: Avni2000/MergeNB PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-02-03T02:39:45.034Z
Learning: Applies to src/test/**/*.ipynb : Include three-way semantic conflict test cases with base, current, and incoming notebook versions for testing the merge algorithm

Learnt from: CR
Repo: Avni2000/MergeNB PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-02-03T02:39:45.034Z
Learning: Applies to src/conflictDetector.ts : Implement `analyzeNotebookConflicts` function for detection and `resolveAllConflicts` function for resolution

✅ Actions performed

Full review triggered.

coderabbitai

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

src/resolver.ts (1)

347-349: ⚠️ Potential issue | 🟡 Minor

Empty resolvedContent silently drops the cell — is this intentional?

If a user edits a resolved cell's content to be empty (but did not choose "delete"), this continue drops the cell from the output. This could surprise users who intended to keep an empty code cell. The explicit delete choice already handles deletion at Line 344. Consider limiting this guard to only the delete case, or documenting that empty content is treated as deletion.
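If the guard is narrowed to the delete case, the decision could look like this sketch. buildResolvedCell and the ResolvedCell shape are hypothetical (real nbformat cells carry more fields); the point is that only an explicit 'delete' returns null, while an empty string becomes an empty cell.

```typescript
type ResolutionChoice = 'base' | 'current' | 'incoming' | 'delete';

/** Simplified output cell; real nbformat cells also carry metadata, outputs, etc. */
interface ResolvedCell {
    cell_type: 'code' | 'markdown' | 'raw';
    source: string;
}

function buildResolvedCell(
    choice: ResolutionChoice,
    resolvedContent: string,
    cellType: ResolvedCell['cell_type'],
): ResolvedCell | null {
    // Only an explicit delete removes the cell from the output notebook.
    if (choice === 'delete') {
        return null;
    }
    // Empty content is kept as an empty cell instead of being treated as deletion.
    return { cell_type: cellType, source: resolvedContent };
}
```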

🤖 Fix all issues with AI agents

In `@src/resolver.ts`:
- Around line 293-303: preferredSide can become 'both' because nonDeleteChoices
includes 'both', but downstream code only handles 'base'|'current'|'incoming';
update the detection to exclude 'both' or explicitly handle it. Fix by changing
the filter that builds nonDeleteChoices to remove 'both' (so nonDeleteChoices
only contains 'base'|'current'|'incoming') or, if you prefer to support 'both',
add handling for preferredSide === 'both' in the reordering and fallback logic
used with resolvedRows (the same branches that currently check preferredSide for
'base'|'current'|'incoming'); refer to nonDeleteChoices, preferredSide, and the
downstream reorder/fallback code that consumes preferredSide and ensure
consistency with handleAcceptAll which only sends base|current|incoming.

In `@src/tests/takeAllButtons.test.ts`:
- Around line 200-204: The log replacement uses the regex /\\n/g, which
matches a literal backslash followed by n instead of real newline characters; update the two
replace calls where expected.source and actual.source are logged (in the failing
branch that increments sourceMismatches) to use /\n/g, and change the replacement
string from '\\\\n' to '\\n', so each actual newline appears in the log as the
two-character sequence \n. Make this change in takeAllButtons.test.ts wherever
expected.source.replace(/\\n/g, '\\\\n') and actual.source.replace(/\\n/g, '\\\\n')
are used.
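The two patterns behave differently on a real multi-line cell source, which this small sketch demonstrates (the function names are illustrative):

```typescript
/** Fixed version: turns each real newline into the two characters "\" and "n". */
function escapeNewlines(text: string): string {
    return text.replace(/\n/g, '\\n');
}

/** Original version: only matches a literal backslash followed by "n". */
function escapeLiteralBackslashN(text: string): string {
    return text.replace(/\\n/g, '\\\\n');
}
```

On a source such as 'x = 1\nprint(x)', escapeLiteralBackslashN returns the string unchanged (the newline survives and breaks the log line), while escapeNewlines yields a single line.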

🧹 Nitpick comments (7)

.gitignore (1)

6-6: Consider a more specific pattern if output.txt is generated in limited locations.

The pattern output.txt will ignore any file with that name anywhere in the repository tree. If the test output files are only generated at the root level, consider using /output.txt (matches root only), or if they're only in test directories, use a pattern like **/test/**/output.txt or integration/**/output.txt. This reduces the risk of accidentally ignoring important files with the same name elsewhere.

That said, the current pattern is acceptable if output.txt may be generated in multiple locations throughout the repository.

src/web/client/ConflictResolver.tsx (1)

461-504: Consider extracting repeated inline button styles into a shared object or CSS class.

All three buttons share the same fontSize, padding, color, and differ only in background/border. This is a minor readability nit — the inline approach is consistent with the rest of the component.

src/resolver.ts (1)

313-316: as any bypasses type safety on ResolvedRow; use a typed accessor instead.

The dynamic property access via (r as any)[indexKey] is fragile. If ResolvedRow field names change, the compiler won't catch this. A small helper or inline accessor eliminates the cast.

♻️ Proposed refactor: typed index accessor
+        const getCellIndex = (r: import('./web/webTypes').ResolvedRow): number | undefined => {
+            if (preferredSide === 'base') return r.baseCellIndex;
+            if (preferredSide === 'current') return r.currentCellIndex;
+            return r.incomingCellIndex;
+        };
+
-        const withIndex = resolvedRows
-            .filter(r => (r as any)[indexKey] !== undefined)
-            .sort((a, b) => (a as any)[indexKey] - (b as any)[indexKey]);
-        const withoutIndex = resolvedRows.filter(r => (r as any)[indexKey] === undefined);
+        const withIndex = resolvedRows
+            .filter(r => getCellIndex(r) !== undefined)
+            .sort((a, b) => getCellIndex(a)! - getCellIndex(b)!);
+        const withoutIndex = resolvedRows.filter(r => getCellIndex(r) === undefined);
         rowsForResolution = [...withIndex, ...withoutIndex];

src/tests/integrationUtils.ts (2)

15-21: getColumnCellType defaults to 'code' for absent cells and doesn't account for 'raw' cells.

If a raw cell type is present in the notebook, the classList.contains('code-cell') check would return false, and this function would classify it as 'markdown'. This may or may not matter depending on whether raw cells appear in practice — just something to be aware of.
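If raw cells do need handling, one option is to classify from the element's class attribute, which a Playwright test can read with `await locator.getAttribute('class')`. 'code-cell' is the class the current helper checks; 'raw-cell' is a hypothetical class the UI would have to emit. The sketch keeps the existing defaults ('code' for an absent cell, 'markdown' otherwise), and cellTypeFromClassName is an illustrative name.

```typescript
type NotebookCellType = 'code' | 'markdown' | 'raw';

/** Classify a rendered cell from its class attribute (null = no rendered cell). */
function cellTypeFromClassName(className: string | null): NotebookCellType {
    // No rendered cell: keep the existing helper's 'code' default.
    if (className === null) {
        return 'code';
    }
    const classes = className.split(/\s+/);
    if (classes.indexOf('code-cell') !== -1) {
        return 'code';
    }
    // Hypothetical class; the UI would have to add it for raw cells.
    if (classes.indexOf('raw-cell') !== -1) {
        return 'raw';
    }
    return 'markdown';
}
```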

104-120: Redundant initial poll before the loop body.

Line 111 reads the count, then Line 113 immediately reads it again at the start of the first iteration. The initial read is never checked against the exit condition. Consider either checking the initial read or removing it.

♻️ Minor simplification
 export async function waitForAllConflictsResolved(
     page: Page,
     timeoutMs = 5000,
     pollMs = 200,
 ): Promise<{ resolved: number; total: number }> {
     const start = Date.now();
-    let last = await getResolvedCount(page);
+    let last = { resolved: 0, total: 0 };
     while (Date.now() - start < timeoutMs) {
         last = await getResolvedCount(page);
         if (last.total > 0 && last.resolved === last.total) {
             return last;
         }
         await new Promise(r => setTimeout(r, pollMs));
     }
     return last;
 }

src/tests/takeAllButtons.test.ts (1)

93-93: executeCommand result is not awaited — errors from the command are silently lost.

If merge-nb.findConflicts throws, the error won't propagate. The test will wait for the server timeout and fail with a less informative error. This matches the existing pattern in vscodeIntegration.test.ts, so it's consistent, but worth noting.
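
src/tests/runIntegrationTest.ts (1)

40-42: Git config values have extraneous double-quote wrapping.

Whether the inner quotes end up in git config depends on how the git helper runs the command. If it passes an argument array to execFileSync (no shell), '"test@mergenb.test"' is stored with the double-quote characters included. If it joins the arguments into one command string for execSync, the shell strips the quotes instead, and they are what keep a value with spaces such as 'MergeNB Test' in a single argument; removing them there would split the name into two arguments. Benign for tests either way, but the quoting should match the helper.

♻️ Proposed fix (for an argument-array helper): remove inner quotes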
-    git(tmpDir, 'config', 'user.email', '"test@mergenb.test"');
-    git(tmpDir, 'config', 'user.name', '"MergeNB Test"');
+    git(tmpDir, 'config', 'user.email', 'test@mergenb.test');
+    git(tmpDir, 'config', 'user.name', 'MergeNB Test');
Same applies to commit messages at Lines 47, 55, 62:
-    git(tmpDir, 'commit', '-m', '"base"');
+    git(tmpDir, 'commit', '-m', 'base');
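Alternatively, the quoting question disappears if the helper avoids the shell entirely. A sketch, assuming the runner can switch its git helper to child_process.execFileSync with an argument array; run and git are illustrative names, not the runner's current helpers.

```typescript
import { execFileSync } from 'child_process';

/**
 * Run a program with an argument array. No shell is involved, so every
 * element reaches the program verbatim: spaces need no quoting and quote
 * characters are never stripped.
 */
function run(command: string, cwd: string, ...args: string[]): string {
    return execFileSync(command, args, { cwd, encoding: 'utf8' });
}

/** Hypothetical git wrapper built on run(). */
function git(cwd: string, ...args: string[]): string {
    return run('git', cwd, ...args);
}

// Usage: plain values, no inner quotes needed.
// git(tmpDir, 'config', 'user.name', 'MergeNB Test');
// git(tmpDir, 'commit', '-m', 'base');
```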

src/resolver.ts

coderabbitai · 2026-02-06T05:59:23Z

src/tests/takeAllButtons.test.ts

+            if (expected.source !== actual.source) {
+                sourceMismatches++;
+                console.log(`Source mismatch at cell ${i}:`);
+                console.log(`  Expected: "${expected.source.substring(0, 80).replace(/\\n/g, '\\\\n')}..."`);
+                console.log(`  Actual:   "${actual.source.substring(0, 80).replace(/\\n/g, '\\\\n')}..."`);


⚠️ Potential issue | 🟡 Minor

Regex escapes literal \n two-char sequences instead of actual newline characters.

/\\n/g matches a literal backslash followed by n in the string, not newline characters (\n). The logged mismatch output will still contain raw newlines making it hard to read. Compare with vscodeIntegration.test.ts Line 390 which correctly uses /\n/g.

🐛 Proposed fix

-                console.log(`  Expected: "${expected.source.substring(0, 80).replace(/\\n/g, '\\\\n')}..."`);
-                console.log(`  Actual:   "${actual.source.substring(0, 80).replace(/\\n/g, '\\\\n')}..."`);
+                console.log(`  Expected: "${expected.source.substring(0, 80).replace(/\n/g, '\\n')}..."`);
+                console.log(`  Actual:   "${actual.source.substring(0, 80).replace(/\n/g, '\\n')}..."`);

🤖 Prompt for AI Agents

In `@src/tests/takeAllButtons.test.ts` around lines 200 - 204, The log replacement is using the regex /\\n/g which matches a literal backslash+n instead of real newline characters; update the two replace calls where expected.source and actual.source are logged (in the failing branch that increments sourceMismatches) to use /\n/g so actual newline characters are escaped in the output (keep the replacement string as '\\n' so the log shows the two-character sequence). This change should be made in the test code in takeAllButtons.test.ts wherever expected.source.replace(/\\n/g, '\\\\n') and actual.source.replace(/\\n/g, '\\\\n') are used.
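A quick way to see the difference between the two patterns. escapeNewlines and buggyEscape are illustrative names, not helpers from the PR:

```typescript
// Escape real newline characters so each log entry stays on one line.
export function escapeNewlines(s: string): string {
  return s.replace(/\n/g, '\\n');
}

// The original variant: /\\n/g looks for a backslash followed by "n",
// which a cell source containing real line breaks usually does not have.
export function buggyEscape(s: string): string {
  return s.replace(/\\n/g, '\\\\n');
}

const sample = 'x = 1\ny = 2';
console.log(escapeNewlines(sample)); // prints: x = 1\ny = 2 (on one line)
console.log(buggyEscape(sample));    // prints two lines: "x = 1" and "y = 2"
```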

- Removed 'both' from possible cell resolutions

- Tests that prior resolutions hold

coderabbitai

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

src/types.ts (1)
103-111: ⚠️ Potential issue | 🟡 Minor

Stale JSDoc: both is still documented but no longer in the union.

Line 108 describes both: Include both current and incoming content, but 'both' was removed from ResolutionChoice. Update the doc block to match the current union.
Proposed fix
 /**
  * Resolution choices for conflict resolution.
  * - base: Use the base version (pre-merge common ancestor)
  * - current: Use the current branch version
  * - incoming: Use the incoming branch version  
- * - both: Include both current and incoming content
  * - delete: Remove the cell entirely
  */
 export type ResolutionChoice = 'base' | 'current' | 'incoming' | 'delete';
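Beyond the doc comment, an exhaustive switch lets the compiler find code that still expects 'both'. Here is a sketch, with ResolutionChoice redeclared locally so it compiles on its own. describeChoice and its labels are hypothetical:

```typescript
type ResolutionChoice = 'base' | 'current' | 'incoming' | 'delete';

// Hypothetical helper mapping each choice to a UI label. Assigning to `never`
// in the default branch makes the compiler reject this function if a member is
// added to the union without a case, and a leftover `case 'both':` is rejected
// because 'both' is no longer comparable to ResolutionChoice.
export function describeChoice(choice: ResolutionChoice): string {
  switch (choice) {
    case 'base':
      return 'Use base';
    case 'current':
      return 'Use current';
    case 'incoming':
      return 'Use incoming';
    case 'delete':
      return 'Delete cell';
    default: {
      const unreachable: never = choice;
      throw new Error(`Unhandled resolution choice: ${String(unreachable)}`);
    }
  }
}
```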

🤖 Fix all issues with AI agents

In `@src/tests/testHarness.ts`:
- Around line 69-70: The temp-directory used to build portFilePath is
inconsistent with readTestConfig: replace the manual TMPDIR/TMP fallback used
when computing tmpDir/portFilePath with os.tmpdir() so both use the same
platform API; update the code that sets tmpDir (and thus portFilePath) to call
os.tmpdir() directly (keeping options.portFileName or default
'mergenb-server-port') so portFilePath and readTestConfig reference the same
directory.

In `@src/tests/vscodeIntegration.test.ts`:
- Around line 95-97: The parseInt result for testId can be NaN if data-testid is
missing/malformed; update the logic around row.getAttribute('data-testid'),
testId and rowIndex so that after computing rowIndex = parseInt(..., 10) you
check Number.isNaN(rowIndex) and if so set rowIndex = conflictIdx (or another
safe fallback) before using modulo checks (% 7, % 2); ensure the guard handles
empty strings and non-numeric suffixes so downstream branch logic (odd: prefer
current) does not silently get selected for every row.
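For the testHarness.ts item above, here is a minimal sketch of deriving both paths from os.tmpdir(). getPortFilePath and getTestConfigPath are hypothetical helper names; the file names are the defaults quoted in the review:

```typescript
import * as os from 'os';
import * as path from 'path';

// Both paths come from os.tmpdir(), so the port file and the test config
// always land in the same directory, whatever environment variables the
// platform consults to pick its temp directory.
export function getPortFilePath(portFileName = 'mergenb-server-port'): string {
  return path.join(os.tmpdir(), portFileName);
}

export function getTestConfigPath(): string {
  return path.join(os.tmpdir(), 'mergenb-test-config.json');
}
```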

🧹 Nitpick comments (11)

src/types.ts (1)
179-182: SemanticConflictResolution.choice duplicates the ResolutionChoice union inline.

This should use the ResolutionChoice type alias for consistency, especially since the rest of this PR centralizes that type.
Proposed fix
 export interface SemanticConflictResolution {
     conflict: SemanticConflict;
-    choice: 'base' | 'current' | 'incoming' | 'delete';
+    choice: ResolutionChoice;
     /** The resolved content from the editable text area (source of truth) */
     resolvedContent?: string;
 }
src/web/webTypes.ts (2)
184-188: ResolutionChoice | 'base' is redundant — 'base' is already in ResolutionChoice.

ResolutionChoice is 'base' | 'current' | 'incoming' | 'delete', so ResolutionChoice | 'base' simplifies to just ResolutionChoice. This looks like a leftover from before the type centralization.
Proposed fix
 export interface WebResolutionData {
     type: 'semantic';
     resolutions: Array<{
         index: number;
-        choice: ResolutionChoice | 'base';
+        choice: ResolutionChoice;
         customContent?: string;
     }>;
162-175: BrowserToExtensionMessage uses choice: string instead of ResolutionChoice.

Line 168 types the resolution choice as a raw string, bypassing the type safety you've just established everywhere else. Consider narrowing this to ResolutionChoice for consistency and compile-time safety.
Proposed fix
 export type BrowserToExtensionMessage =
     | { 
         command: 'resolve'; 
         type: 'semantic';
         resolutions: Array<{
             index: number;
-            choice: string;
+            choice: ResolutionChoice;
             customContent?: string;
         }>;
src/web/client/types.ts (1)
65-71: Verbose import('../../types').ResolutionChoice when the type is already re-exported in this file.

ResolutionChoice is re-exported on line 19, so you can reference it directly in these interfaces instead of using the inline import(...) syntax. Same applies to NotebookCell usages on lines 29-31, 79-83.
Example for ConflictChoice and ResolvedRow
 export interface ConflictChoice {
     index: number;
-    choice: import('../../types').ResolutionChoice;
+    choice: ResolutionChoice;
     resolvedContent: string;
 }
 
 export interface ResolvedRow {
-    baseCell?: import('../../types').NotebookCell;
-    currentCell?: import('../../types').NotebookCell;
-    incomingCell?: import('../../types').NotebookCell;
+    baseCell?: NotebookCell;
+    currentCell?: NotebookCell;
+    incomingCell?: NotebookCell;
     baseCellIndex?: number;
     currentCellIndex?: number;
     incomingCellIndex?: number;
     resolution?: {
-        choice: import('../../types').ResolutionChoice;
+        choice: ResolutionChoice;
         resolvedContent: string;
     };
 }
Also applies to: 77-95
src/web/client/ConflictResolver.tsx (2)

271-301: handleAcceptAll skips already-resolved conflicts — consider a UX hint or override option.

The logic is correct and the "respect prior resolutions" behavior matches the commit intent. However, if a user manually resolves a few conflicts, then clicks "All Incoming", the already-resolved ones silently remain unchanged. This could be confusing since the button label suggests all conflicts are affected.

Consider one of:

A visual indicator (e.g., toast/badge) showing how many conflicts were skipped.

A modifier behavior (e.g., holding Shift to override all, or a separate "Reset & Accept All" option).

This is not blocking — just a UX polish consideration.
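One way to support both suggestions is to have the bulk-accept logic report what it skipped. Here is a sketch as a pure function, assuming conflicts are indexed 0..n-1 and prior choices live in a map. acceptAll, `override`, and the result shape are hypothetical; the real handleAcceptAll in ConflictResolver.tsx may be structured differently:

```typescript
type MergeSide = 'base' | 'current' | 'incoming';
type ResolutionChoice = MergeSide | 'delete';

interface AcceptAllResult {
  choices: Map<number, ResolutionChoice>;
  skipped: number; // conflicts left untouched because they were already resolved
}

// Fills every unresolved conflict with `side`. Prior resolutions are kept
// unless `override` is set; the input map is never mutated.
export function acceptAll(
  existing: ReadonlyMap<number, ResolutionChoice>,
  conflictCount: number,
  side: MergeSide,
  override = false,
): AcceptAllResult {
  const choices = new Map(existing);
  let skipped = 0;
  for (let i = 0; i < conflictCount; i++) {
    if (choices.has(i) && !override) {
      skipped++;
      continue;
    }
    choices.set(i, side);
  }
  return { choices, skipped };
}
```

The UI could show `skipped` in a toast (e.g. "2 already-resolved conflicts were kept"), and a Shift-click could pass `override = true`.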

464-507: Bulk-accept buttons look good; minor accessibility note.

The buttons properly use CSS variables for theming and include title attributes. For improved accessibility, consider adding aria-label attributes in addition to title, since screen readers may not always announce title content.
src/tests/testHarness.ts (2)
49-56: readTestConfig lacks error context on failure.

If the config file is missing or contains invalid JSON, the raw fs.readFileSync / JSON.parse errors won't indicate what the test harness was trying to do. A small wrapper message would save debugging time.
💡 Suggested improvement
 export function readTestConfig(): TestConfig {
     const configPath = path.join(os.tmpdir(), 'mergenb-test-config.json');
-    return JSON.parse(fs.readFileSync(configPath, 'utf8'));
+    try {
+        return JSON.parse(fs.readFileSync(configPath, 'utf8'));
+    } catch (err) {
+        throw new Error(`Failed to read test config from ${configPath}: ${err}`);
+    }
 }
211-218: JSON.stringify metadata comparison is key-order sensitive.

If the resolver emits metadata with keys in a different insertion order than the reference cell, JSON.stringify will produce different strings even when the contents are semantically equal. This is unlikely to bite in practice (both sides typically come from the same JSON parse), but worth noting for future-proofing.
src/tests/integrationUtils.ts (3)
5-6: ConflictChoice name shadows the interface in src/web/client/types.ts.

src/web/client/types.ts exports a ConflictChoice interface (with index, choice, resolvedContent fields), while this file exports a simple union type MergeSide | 'delete' under the same name. They serve different purposes and live in different modules, but the naming overlap could confuse developers navigating the codebase. Consider a more specific name like ResolutionAction if this causes confusion down the line.

117-132: Redundant initial getResolvedCount call on Line 123.

The last variable is initialized on Line 123 but immediately overwritten on Line 125 in the first loop iteration. The initial call can be removed.
💡 Minor cleanup
     const start = Date.now();
-    let last = await getResolvedCount(page);
+    let last = { resolved: 0, total: 0 };
     while (Date.now() - start < timeoutMs) {
         last = await getResolvedCount(page);
134-238: collectExpectedCellsFromUI is comprehensive but lengthy.

The function correctly handles identical rows, resolved conflicts, deleted conflicts, and unresolved (skipped) rows. The conflictIdx tracking is properly decoupled from the overall row index.

One thing to keep in mind: Line 229 hardcodes hasOutputs: false for all conflict cells, meaning output content is never verified for resolved conflicts. This aligns with a known gap noted in prior learnings.

Consider extracting the identical-row block (Lines 155–174) and the conflict-row block (Lines 176–234) into named helpers to reduce cognitive load, but this is optional.

src/tests/testHarness.ts

coderabbitai · 2026-02-06T18:22:57Z

src/tests/vscodeIntegration.test.ts

            const testId = await row.getAttribute('data-testid') || '';
            const rowIndex = parseInt(testId.replace('conflict-row-', '').replace('row-', ''), 10);
-            
+


⚠️ Potential issue | 🟡 Minor

parseInt on a missing/malformed data-testid silently produces NaN.

If the data-testid attribute is absent or doesn't contain a numeric suffix, parseInt returns NaN. The downstream modulo checks (% 7, % 2) all evaluate to false/NaN, silently funneling every such row into the "odd: prefer current" branch. Consider adding a guard or falling back to conflictIdx when parsing fails.

🛡️ Suggested guard

             const testId = await row.getAttribute('data-testid') || '';
-            const rowIndex = parseInt(testId.replace('conflict-row-', '').replace('row-', ''), 10);
+            const parsed = parseInt(testId.replace('conflict-row-', '').replace('row-', ''), 10);
+            const rowIndex = Number.isNaN(parsed) ? conflictIdx : parsed;

🤖 Prompt for AI Agents

In `@src/tests/vscodeIntegration.test.ts` around lines 95 - 97, The parseInt result for testId can be NaN if data-testid is missing/malformed; update the logic around row.getAttribute('data-testid'), testId and rowIndex so that after computing rowIndex = parseInt(..., 10) you check Number.isNaN(rowIndex) and if so set rowIndex = conflictIdx (or another safe fallback) before using modulo checks (% 7, % 2); ensure the guard handles empty strings and non-numeric suffixes so downstream branch logic (odd: prefer current) does not silently get selected for every row.
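The same guard, pulled into a small testable function. parseRowIndex is a hypothetical name. Playwright's getAttribute resolves to null when the attribute is absent, which the `??` handles:

```typescript
// Strip the known prefixes, parse the numeric suffix, and fall back when
// the data-testid attribute is missing or has no numeric suffix.
export function parseRowIndex(testId: string | null, fallback: number): number {
  const suffix = (testId ?? '').replace('conflict-row-', '').replace('row-', '');
  const parsed = parseInt(suffix, 10);
  return Number.isNaN(parsed) ? fallback : parsed;
}
```

parseInt still accepts a numeric prefix, so "3x" yields 3. If trailing junk should also trigger the fallback, check the suffix with a stricter pattern such as /^\d+$/ first.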

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

coderabbitai

Actionable comments posted: 2

🤖 Fix all issues with AI agents

In `@src/tests/testHarness.ts`:
- Around line 128-134: The current code reads and JSON-parses the notebook even
when waitForFileWrite(conflictFile, fs, options.writeTimeoutMs) returns false,
risking stale data; update the function (the block handling fileWritten in
testHarness.ts) to not proceed on false — either throw a descriptive Error
(e.g., "Timed out waiting for file write for {conflictFile}") or return an
explicit failure value (e.g., null/undefined or a result object with
success:false) so callers can detect and handle the timeout instead of parsing
stale notebookContent; ensure references to conflictFile, fs,
options.writeTimeoutMs and waitForFileWrite are used so the change is applied in
the correct location.
- Around line 82-106: After launching Chromium with chromium.launch (creating
the browser variable), wrap all subsequent post-launch steps (page = await
browser.newPage(); page.goto(sessionUrl); page.waitForSelector('.header-title');
the header check, sleeps, etc.) in a try/catch so that any thrown error will
first await browser.close() before rethrowing; specifically, keep the
successful-return path returning the browser and page as-is, but on any failure
in the sequence (page.goto, page.waitForSelector, title check, etc.) call await
browser.close() in the catch and then throw the original error so the browser
process is not leaked.
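Here is a sketch of the cleanup-on-failure pattern from the second item. It is written generically so it runs without Playwright. withCleanupOnFailure is a hypothetical helper:

```typescript
// Runs setup steps against an already-acquired resource. If any step throws,
// the resource is closed before the original error is rethrown, so it is not
// leaked. On success the resource is left open for the caller.
export async function withCleanupOnFailure<R, T>(
  resource: R,
  close: (r: R) => Promise<void>,
  steps: (r: R) => Promise<T>,
): Promise<T> {
  try {
    return await steps(resource);
  } catch (err) {
    // A failure while closing must not mask the original error.
    await close(resource).catch(() => undefined);
    throw err;
  }
}
```

With Playwright, `resource` would be the browser returned by chromium.launch(), and `close` would be `(b) => b.close()`. `steps` would be an async function that calls b.newPage(), page.goto(sessionUrl), and page.waitForSelector('.header-title') before returning the page.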

🧹 Nitpick comments (4)

src/tests/testHarness.ts (4)
137-154: Outputs content is not captured for comparison.

buildExpectedCellsFromNotebook records hasOutputs (boolean) but does not capture the actual outputs content. This means assertNotebookMatches can only verify whether outputs exist, not whether they are correct. This aligns with a known verification gap in the test suite. Consider capturing and comparing output content if output correctness matters for "Take All" scenarios.

Based on learnings: "The integration test has verification gaps: it never directly checks the outputs array content (only execution_count), uses getCellSource() normalization which masks source format inconsistencies (string vs array)."

211-218: JSON.stringify metadata comparison is property-order–dependent.

If the expected and actual metadata objects have the same keys/values but in different property order (common after serialization round-trips), this comparison produces a false mismatch. Use a deep-equal utility or sort keys before comparing.
💡 Quick fix — sort keys for stable comparison
         if (options.compareMetadata) {
             const expectedMetadata = expected.metadata || {};
             const actualMetadata = actual.metadata || {};
-            if (JSON.stringify(expectedMetadata) !== JSON.stringify(actualMetadata)) {
+            const stableStringify = (obj: Record<string, unknown>) =>
+                JSON.stringify(obj, Object.keys(obj).sort());
+            if (stableStringify(expectedMetadata) !== stableStringify(actualMetadata)) {
                 metadataMismatches++;
                 console.log(`Metadata mismatch at cell ${i}`);
             }
         }
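Note: when JSON.stringify receives an array as its second argument, that array is an allowlist of property names applied at every nesting level, not just the top. Nested keys that do not also appear at the top level are silently dropped. For example, { a: { x: 1 } } and { a: { y: 2 } } both serialize to {"a":{}} and compare as equal. For nested metadata, use a recursive key-sort or a library like deep-equal instead.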
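Here is a sketch of the recursive key-sort mentioned above. It sorts keys at every depth instead of passing a replacer array to JSON.stringify. canonicalize and stableStringify are hypothetical names:

```typescript
// Rebuild objects with keys inserted in sorted order at every depth; arrays
// keep their element order, and primitives pass through unchanged.
function canonicalize(value: unknown): unknown {
  if (Array.isArray(value)) {
    return value.map(canonicalize);
  }
  if (value !== null && typeof value === 'object') {
    const source = value as Record<string, unknown>;
    const sorted: Record<string, unknown> = {};
    for (const key of Object.keys(source).sort()) {
      sorted[key] = canonicalize(source[key]);
    }
    return sorted;
  }
  return value;
}

export function stableStringify(value: unknown): string {
  return JSON.stringify(canonicalize(value));
}
```

In Node, util.isDeepStrictEqual(expectedMetadata, actualMetadata) is a ready-made alternative that also ignores key order.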
16-24: Consider adding a teardown / cleanup helper for ConflictSession.

Callers must remember to close the browser (and potentially the page) in a finally block. A small closeSession(session: ConflictSession) helper would centralize cleanup and prevent browser process leaks across multiple test files.
💡 Example helper
export async function closeSession(session: ConflictSession): Promise<void> {
    try {
        await session.page.close();
    } catch { /* ignore */ }
    try {
        await session.browser.close();
    } catch { /* ignore */ }
}
58-67: Hardcoded conflict.ipynb filename limits flexibility.

Line 63 assumes every test workspace has a file named conflict.ipynb. If future test cases use different filenames, this will need changing. Consider accepting the filename via SetupOptions or TestConfig.params.

src/tests/testHarness.ts

coderabbitai · 2026-02-06T18:36:27Z

src/tests/testHarness.ts

+    const fileWritten = await waitForFileWrite(conflictFile, fs, options.writeTimeoutMs);
+    if (!fileWritten) {
+        console.log('Warning: Could not confirm file write, proceeding anyway');
+    }
+
+    const notebookContent = fs.readFileSync(conflictFile, 'utf8');
+    return JSON.parse(notebookContent);


⚠️ Potential issue | 🟡 Minor

Proceeding after unconfirmed file write can silently produce wrong test results.

When waitForFileWrite returns false (Line 129), the function logs a warning but still reads the file. This reads stale (pre-resolution) notebook content, causing downstream assertions to pass or fail for the wrong reasons. Consider throwing instead, or at minimum returning a flag so callers can decide.

💡 Suggested alternative — fail explicitly

     const fileWritten = await waitForFileWrite(conflictFile, fs, options.writeTimeoutMs);
     if (!fileWritten) {
-        console.log('Warning: Could not confirm file write, proceeding anyway');
+        throw new Error(`File write not confirmed for ${conflictFile} within timeout`);
     }


🤖 Prompt for AI Agents

In `@src/tests/testHarness.ts` around lines 128 - 134, The current code reads and JSON-parses the notebook even when waitForFileWrite(conflictFile, fs, options.writeTimeoutMs) returns false, risking stale data; update the function (the block handling fileWritten in testHarness.ts) to not proceed on false — either throw a descriptive Error (e.g., "Timed out waiting for file write for {conflictFile}") or return an explicit failure value (e.g., null/undefined or a result object with success:false) so callers can detect and handle the timeout instead of parsing stale notebookContent; ensure references to conflictFile, fs, options.writeTimeoutMs and waitForFileWrite are used so the change is applied in the correct location.
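The prompt also mentions returning an explicit failure value instead of throwing. Here is a sketch of that alternative, with the file reader injected so it runs without touching disk. ReadResult and readResolvedNotebook are hypothetical:

```typescript
// Discriminated union: callers must check `ok` before using the notebook.
export type ReadResult<T> = { ok: true; value: T } | { ok: false; reason: string };

// Refuses to parse when the write was not confirmed, so stale
// pre-resolution content is never handed to the assertions.
export function readResolvedNotebook(
  conflictFile: string,
  fileWritten: boolean,
  readFile: (filePath: string) => string,
): ReadResult<unknown> {
  if (!fileWritten) {
    return { ok: false, reason: `Timed out waiting for file write for ${conflictFile}` };
  }
  return { ok: true, value: JSON.parse(readFile(conflictFile)) };
}
```

In the harness, `readFile` would be `(p) => fs.readFileSync(p, 'utf8')`.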

Avni2000 · 2026-02-06T18:44:21Z

@coderabbitai the CI keeps failing whilst all my tests work locally. Could you check that out for yourself?

coderabbitai · 2026-02-06T18:49:15Z

🧩 Analysis chain

@Avni2000, I'll investigate the CI failures for you. Let me check the GitHub Actions runs and logs for this PR.

🏁 Script executed:

#!/bin/bash

# Check the status of CI runs for this PR
echo "=== CI Run Status for PR #44 ==="
gh pr checks 44 --repo Avni2000/MergeNB

echo -e "\n=== Recent workflow runs ==="
gh run list --repo Avni2000/MergeNB --branch generic-tester --limit 5

echo -e "\n=== Latest run details ==="
LATEST_RUN=$(gh run list --repo Avni2000/MergeNB --branch generic-tester --limit 1 --json databaseId --jq '.[0].databaseId')
if [ -n "$LATEST_RUN" ]; then
    echo "Latest run ID: $LATEST_RUN"
    gh run view $LATEST_RUN --repo Avni2000/MergeNB --log-failed
fi

Length of output: 1121

🏁 Script executed:

#!/bin/bash

# Check workflow configuration files
echo "=== GitHub Actions Workflow Files ==="
fd -e yml -e yaml . .github/workflows --exec cat {}

echo -e "\n=== Package.json scripts ==="
cat package.json | jq '.scripts'

Length of output: 2216

Let me check the logs from the failed CI runs to identify the specific error:

🏁 Script executed:

#!/bin/bash

echo "=== Checking failed run 21761566674 ==="
gh run view 21761566674 --repo Avni2000/MergeNB --log-failed

echo -e "\n\n=== Checking failed run 21761092673 ==="
gh run view 21761092673 --repo Avni2000/MergeNB --log-failed

Length of output: 178946

🏁 Script executed:

#!/bin/bash

# Let's also check the successful run to compare
echo "=== Checking successful run 21760729202 ==="
gh run view 21760729202 --repo Avni2000/MergeNB --log

Length of output: 267607

@Avni2000 Thanks for the details — I pulled the CI logs for PR #44. The failures are all the same symptom:

Web UI server starts, prints “Opening browser to: http://localhost:/?session=…”
Then: Browser connection timeout — no WebSocket connection within 30 seconds
Then the harness errors: No session was created within timeout

In the last green run (today ~18:05 UTC), the sequence included “Session created …” and “WebSocket connected …” quickly, and 02_perCellResolution + All Base + All Current passed. So the code path works; the flake is the Playwright-to-server handshake racing against a 30s window on CI.

Concrete stabilization plan:

Make the WS handshake timeout configurable and longer in CI:
- In the extension (WebConflictPanel/server), read MERGENB_WS_TIMEOUT_MS (default 30000). On timeout, log a warning but keep listening rather than failing.
Increase the test harness waitForSession window:
- Read MERGENB_SESSION_WAIT_MS (default 60000) instead of the current hardcoded 15000.
Ensure Playwright launches in a CI-friendly way:
- Force headless Chromium and add args: --no-sandbox, --disable-dev-shm-usage, --disable-gpu.

If you’re good with that, I’ll prep a tiny patch touching just the timeouts and Playwright launch, plus set the env in the workflow.

Extension: make browser WS-connect timeout configurable and non-fatal
Tests: bump waitForSession default and read from env; force Playwright headless with CI-safe flags
Workflow: set env overrides for CI

Would you like me to proceed with this patch now? I can also split it: just the workflow/env + test harness change first (no behavior change in extension), if you prefer.

If you’d rather not change behavior, an alternative is simply increasing the harness wait and setting the CI env vars — that should already reduce flakes.
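Here is a sketch of how the proposed MERGENB_WS_TIMEOUT_MS and MERGENB_SESSION_WAIT_MS overrides could be read, falling back to the default on missing or malformed values. readTimeoutMs is a hypothetical helper; the variable names and defaults come from the plan above:

```typescript
// Reads a timeout in milliseconds from the environment. Missing, empty,
// non-integer, or non-positive values fall back to the given default.
export function readTimeoutMs(
  name: string,
  defaultMs: number,
  env: Record<string, string | undefined> = process.env,
): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === '') {
    return defaultMs;
  }
  const parsed = Number(raw);
  return Number.isInteger(parsed) && parsed > 0 ? parsed : defaultMs;
}
```

For example: `const sessionWaitMs = readTimeoutMs('MERGENB_SESSION_WAIT_MS', 60000);`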

🧠 Learnings used

Learnt from: CR
Repo: Avni2000/MergeNB PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-02-03T02:45:32.680Z
Learning: Applies to src/**/resolver.ts : Use the `merge-nb.findConflicts` command as the single unified entry point to find notebooks with merge conflicts (both textual and semantic) and bring up the conflict resolution panel

Learnt from: CR
Repo: Avni2000/MergeNB PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-02-03T02:39:45.034Z
Learning: Applies to src/conflictDetector.ts : Implement `analyzeNotebookConflicts` function for detection and `resolveAllConflicts` function for resolution

Learnt from: CR
Repo: Avni2000/MergeNB PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-02-03T02:39:45.034Z
Learning: Applies to src/test/**/*.ipynb : Include three-way semantic conflict test cases with base, current, and incoming notebook versions for testing the merge algorithm

Learnt from: CR
Repo: Avni2000/MergeNB PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-02-03T02:45:32.680Z
Learning: Applies to src/webview/**/*.{ts,tsx} : Provide a unified conflict resolution UI in `ConflictResolverPanel` webview supporting both textual and semantic conflict resolution with 3-way diff view

Learnt from: CR
Repo: Avni2000/MergeNB PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-02-03T02:39:45.034Z
Learning: Applies to src/conflictDetector.ts : Implement cell-level diffing—show side-by-side or inline diffs for conflicting cells, not raw JSON lines

Learnt from: CR
Repo: Avni2000/MergeNB PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-02-03T02:39:45.034Z
Learning: Use VSCode Extension API for implementing the notebook merge conflict resolver extension

Learnt from: CR
Repo: Avni2000/MergeNB PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-02-03T02:39:45.034Z
Learning: Applies to src/web/**/*.ts : Implement web-based conflict resolution UI using React and communicate via HTTP/WebSocket server

💡 Use @coderabbitai plan to create a plan for code changes

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Avni2000 · 2026-02-06T18:51:25Z

@coderabbitai branch off this PR and make a PR to main (if possible) for that task; I will happily review it. I didn't know those were options. The current implementation and last fix did seem to solve it, though.

coderabbitai

Actionable comments posted: 1

🤖 Fix all issues with AI agents

In `@src/web/WebConflictPanel.ts`:
- Around line 85-98: The .then() callback after server.openSession can run after
the panel has been disposed, causing _sendConflictData() to send to a closed
session; add an _isDisposed boolean on WebConflictPanel, set it true in
dispose(), and at the top of the .then() callback check if (this._isDisposed)
return; before calling this._sendConflictData() and logging; ensure dispose()
clears any session state as currently done for _sessionId/_conflict to fully
guard against post-dispose execution.

🧹 Nitpick comments (3)

src/tests/testHarness.ts (3)
204-209: Trailing ... is always appended, even for short sources.

substring(0, 80) + "..." prints "hello..." for a 5-char source. This is a cosmetic nit but can be confusing when debugging short cells.
Optional fix
-            console.log(`  Expected: "${expected.source.substring(0, 80).replace(/\n/g, '\\n')}..."`);
-            console.log(`  Actual:   "${actualSource.substring(0, 80).replace(/\n/g, '\\n')}..."`);
+            const trunc = (s: string) => s.length > 80 ? s.substring(0, 80).replace(/\n/g, '\\n') + '...' : s.replace(/\n/g, '\\n');
+            console.log(`  Expected: "${trunc(expected.source)}"`);
+            console.log(`  Actual:   "${trunc(actualSource)}"`);
216-223: JSON.stringify metadata comparison is key-order dependent.

If the expected and actual metadata objects have identical keys/values but in a different insertion order, this comparison produces a false mismatch. This is unlikely with notebook files parsed from JSON (V8 preserves insertion order), but a deep-equal utility would be more robust.

58-111: Consider providing a teardown / cleanup helper alongside setupConflictResolver.

The harness creates a Playwright browser in setupConflictResolver but doesn't export a corresponding cleanup function. Every caller must remember to call browser.close() (and potentially clean up the VS Code documents). A thin helper would centralize teardown and prevent resource leaks in test code:
export async function teardownConflictSession(session: ConflictSession): Promise<void> {
    await session.browser.close();
}

coderabbitai · 2026-02-06T18:52:17Z

src/web/WebConflictPanel.ts

+        // Do not await the WebSocket connection here to avoid deadlocking tests
+        // that need to open the session after the command returns.
+        void server.openSession(
+            this._sessionId,
+            '', // No HTML content needed - server generates shell
+            (message: unknown) => this._handleMessage(message)
+        ).then(() => {
            // Send conflict data to browser once connected
            this._sendConflictData();
-
            logger.info(`[WebConflictPanel] Opened conflict resolver in browser, session: ${this._sessionId}`);
-        } catch (error) {
+        }).catch((error) => {
            logger.error('[WebConflictPanel] Failed to open browser session:', error);
            vscode.window.showErrorMessage(`Failed to open conflict resolver in browser: ${error}`);
-        }
+        });


⚠️ Potential issue | 🟡 Minor

Guard against post-dispose execution in the .then() callback.

Since openSession is now fire-and-forget, there's a window where dispose() can run before .then() fires (e.g., rapid cancel or re-creation via createOrShow). When that happens, _sendConflictData() still executes because it only checks _sessionId and _conflict—neither of which is cleared in dispose(). This sends data to an already-closed session.

Add an _isDisposed guard at the top of the .then() callback:

Proposed fix

         void server.openSession(
             this._sessionId,
             '', // No HTML content needed - server generates shell
             (message: unknown) => this._handleMessage(message)
         ).then(() => {
+            if (this._isDisposed) return;
             // Send conflict data to browser once connected
             this._sendConflictData();
             logger.info(`[WebConflictPanel] Opened conflict resolver in browser, session: ${this._sessionId}`);
         }).catch((error) => {
+            if (this._isDisposed) return;
             logger.error('[WebConflictPanel] Failed to open browser session:', error);
             vscode.window.showErrorMessage(`Failed to open conflict resolver in browser: ${error}`);
         });


🤖 Prompt for AI Agents

In `@src/web/WebConflictPanel.ts` around lines 85 - 98, The .then() callback after server.openSession can run after the panel has been disposed, causing _sendConflictData() to send to a closed session; add an _isDisposed boolean on WebConflictPanel, set it true in dispose(), and at the top of the .then() callback check if (this._isDisposed) return; before calling this._sendConflictData() and logging; ensure dispose() clears any session state as currently done for _sessionId/_conflict to fully guard against post-dispose execution.
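Here is a minimal model of the fire-and-forget pattern with the disposed guard, runnable without VS Code. PanelSketch and the `sent` counter are hypothetical stand-ins for WebConflictPanel and _sendConflictData():

```typescript
export class PanelSketch {
  private _isDisposed = false;
  public sent = 0; // counts calls that stand in for this._sendConflictData()

  open(openSession: () => Promise<void>): void {
    // Not awaited, so the callbacks may run after dispose().
    void openSession()
      .then(() => {
        if (this._isDisposed) return; // panel is gone: do not touch the closed session
        this.sent++;
      })
      .catch(() => {
        if (this._isDisposed) return;
        // error logging and user notification would go here
      });
  }

  dispose(): void {
    this._isDisposed = true;
  }
}
```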

coderabbitai · 2026-02-06T18:53:09Z

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

@Avni2000, I appreciate your willingness to review! I cannot directly create branches or PRs, but I can plan the code changes using our planning system, which will allow another agent to execute the changes. Let me plan all the CI stability improvements right now.

CI Stability Improvements Plan

This plan addresses the WebSocket connection timeout flakes in CI by:

Making WS handshake timeout configurable in the extension
Increasing test harness wait times and adding CI-friendly Playwright flags
Setting appropriate environment variables in the CI workflow

This plan will create all the necessary changes to stabilize the CI tests. Once executed, it should eliminate the WebSocket connection timeout flakes you're seeing.

🏁 Script executed:

#!/bin/bash
# Find files related to WebSocket connection and timeout handling
echo "=== WebSocket and server-related files ==="
fd -e ts -e js . src | xargs rg -l "WebSocket|websocket|ws\." | head -20

echo -e "\n=== Files with timeout handling ==="
rg -l "timeout.*30000|30000.*timeout|setTimeout.*30.*1000" src --type ts

echo -e "\n=== Server/panel initialization files ==="
fd -e ts "panel|server" src

Length of output: 494

🏁 Script executed:

#!/bin/bash
# Find where Playwright browser is launched
echo "=== Playwright launch configuration ==="
rg -A 10 "chromium.launch|firefox.launch|webkit.launch" src/tests --type ts

Length of output: 823

🏁 Script executed:

#!/bin/bash
# Read the current workflow file
cat .github/workflows/vscode-integration-tests.yml

Length of output: 78

🏁 Script executed:

#!/bin/bash
# Find the actual workflow file
echo "=== GitHub Actions workflow files ==="
fd -e yml -e yaml . .github/workflows

echo -e "\n=== Check webServer.ts for timeout implementation ==="
cat src/web/webServer.ts | head -100

echo -e "\n=== Check for session/WebSocket timeout in webServer.ts ==="
rg -A 5 -B 5 "30000|timeout" src/web/webServer.ts

Length of output: 5578

🏁 Script executed:

#!/bin/bash
# Read the actual workflow file
echo "=== Current workflow configuration ==="
cat .github/workflows/vscode-integration-test.yml

echo -e "\n=== Full context around the timeout in webServer.ts ==="
rg -A 15 -B 5 "Create a pending connection promise" src/web/webServer.ts

Length of output: 2279

Perfect! I now have all the information needed. Here's the complete plan with specific code changes:

CI Stability Improvements Plan

This plan addresses the WebSocket connection timeout flakes in CI by making timeouts configurable and adding CI-friendly Playwright settings.

This plan should resolve the CI flakiness by:

- Extending timeout windows (30s → 60s for WS, 15s → 60s for session wait)
- Adding CI-friendly browser flags (no sandbox, no GPU, reduced shared memory)
- Making timeouts configurable so they can be adjusted without code changes

The changes are minimal, focused, and backwards-compatible (defaults work locally, extended timeouts only activate in CI).
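A sketch of the CI-friendly browser flags, assuming CI is detected through the `CI` environment variable (GitHub Actions sets `CI=true`). `ciLaunchOptions` is a hypothetical helper; `headless` and `args` are real fields of Playwright's launch options, and the three switches are standard Chromium command-line flags, though which ones the plan actually uses is not shown here.

```typescript
// Subset of Playwright's launch options used here; declared locally so the
// sketch has no dependency on the playwright package.
interface BrowserLaunchOptions {
  headless: boolean;
  args: string[];
}

function ciLaunchOptions(env: Record<string, string | undefined>): BrowserLaunchOptions {
  const isCI = env.CI === 'true' || env.CI === '1';
  const args = isCI
    ? [
        '--no-sandbox',            // sandbox setup can fail on some container/CI runners
        '--disable-gpu',           // hosted runners have no GPU
        '--disable-dev-shm-usage', // use /tmp instead of a small /dev/shm
      ]
    : [];
  return { headless: true, args };
}
```

Passing the result to `chromium.launch(ciLaunchOptions(process.env))` keeps local runs on Chromium's defaults and only adds the flags in CI. `--no-sandbox` weakens process isolation, which is tolerable on a throwaway runner but a poor local default.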

🧠 Learnings used

Learnt from: CR
Repo: Avni2000/MergeNB PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-02-03T02:45:32.680Z
Learning: Applies to src/webview/**/*.{ts,tsx} : Provide a unified conflict resolution UI in `ConflictResolverPanel` webview supporting both textual and semantic conflict resolution with 3-way diff view

Learnt from: CR
Repo: Avni2000/MergeNB PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-02-03T02:39:45.034Z
Learning: Applies to src/conflictDetector.ts : Implement `analyzeNotebookConflicts` function for detection and `resolveAllConflicts` function for resolution

Learnt from: CR
Repo: Avni2000/MergeNB PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-02-03T02:45:32.680Z
Learning: Applies to src/**/resolver.ts : Use the `merge-nb.findConflicts` command as the single unified entry point to find notebooks with merge conflicts (both textual and semantic) and bring up the conflict resolution panel

💡 Use @coderabbitai plan to create a plan for code changes

coderabbitai · 2026-02-06T18:53:34Z

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash

# Check the current CI status for PR `#44`
echo "=== Current CI Status for PR `#44` ==="
gh pr checks 44 --repo Avni2000/MergeNB

echo -e "\n=== Latest CI runs on generic-tester branch ==="
gh run list --repo Avni2000/MergeNB --branch generic-tester --limit 3 --json databaseId,status,conclusion,createdAt,displayTitle

echo -e "\n=== Most recent run details ==="
LATEST_RUN=$(gh run list --repo Avni2000/MergeNB --branch generic-tester --limit 1 --json databaseId --jq '.[0].databaseId')
if [ -n "$LATEST_RUN" ]; then
    gh run view $LATEST_RUN --repo Avni2000/MergeNB
fi

echo -e "\n=== Recent commits on generic-tester ==="
gh api /repos/Avni2000/MergeNB/commits?sha=generic-tester --jq '.[:3] | .[] | "\(.commit.author.date) - \(.commit.message | split("\n")[0]) by \(.commit.author.name)"'

Length of output: 1326

@Avni2000, great news! 🎉 CI is now passing; your recent fixes worked:

✓ Latest run: 21761915415 succeeded (completed ~8 minutes ago)

These fixes appear to have resolved the WebSocket handshake timeout issues:

- [FIX] Test CI
- [FIX] closes browser on failure
- [FIX] Always use os.tmpdir()
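For reference, a common way to get "closes browser on failure" behavior is a `try`/`finally` wrapper. This sketch is not taken from the commit; `ClosableBrowser` and `withBrowser` are hypothetical names, relying only on the fact that Playwright's `Browser` exposes an async `close()`.

```typescript
// Anything with an async close(), e.g. a Playwright Browser.
interface ClosableBrowser {
  close(): Promise<void>;
}

// Launch a browser, run the test body, and close the browser whether the
// body resolves or throws, so a failing test does not leak the process.
async function withBrowser<B extends ClosableBrowser, T>(
  launch: () => Promise<B>,
  body: (browser: B) => Promise<T>,
): Promise<T> {
  const browser = await launch();
  try {
    return await body(browser);
  } finally {
    await browser.close();
  }
}
```

The original error still propagates to the caller; `finally` only guarantees cleanup runs first.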

Regarding creating a separate PR: I can't directly commit code or create branches (per my constraints), but I can help in two ways:

Option 1: Plan optional robustness improvements
Even though CI is passing, the timeout/Playwright hardening I mentioned could still add defense-in-depth for future CI flakiness. If you'd like, I can use `` to prepare:

- Configurable WS timeout (env-driven)
- Longer test harness waits
- CI-optimized Playwright launch args

You'd then review the plan and another agent would implement it.

Option 2: Move forward as-is
Since your fixes work and CI is green, you could merge this PR and only revisit those improvements if flakiness returns.

Which would you prefer? If you want Option 1, just say "plan the improvements" and I'll prepare the requirements. Otherwise, we're good to go! 🚀

🐰✨
(Celebrating your successful fix!)

Avni2000 added 2 commits February 5, 2026 20:06

[ENHANCEMENT] Adds take all current/incoming/base

bf2edcd

[ENHANCEMENT] More generic testing harness

2e9c509

coderabbitai bot reviewed Feb 6, 2026

[FIX] Free socket

04554ec

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

coderabbitai bot reviewed Feb 6, 2026

Avni2000 changed the title ~~[ENHANCEMENT] More generic testing harness~~ [FEAT] Generic Testing Harness + "Take All" tests Feb 6, 2026

Avni2000 added 3 commits February 6, 2026 12:01

[Refactor] types

8aca4b4

- Removed 'both' from possible cell resolutions
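A hypothetical sketch of what this refactor and the uniform "take all" behavior from the PR summary might look like. `ResolutionChoice`, `ConflictResolution`, and `uniformChoice` are illustrative names, and treating "uniform" as "every conflict resolved to the same side" is an assumption, not MergeNB's actual `resolver.ts` logic.

```typescript
// Shared resolution choice; 'both' is no longer a valid value.
type ResolutionChoice = 'base' | 'current' | 'incoming';

interface ConflictResolution {
  conflictIndex: number;
  choice: ResolutionChoice;
}

// Return the side chosen for every conflict if one side was picked uniformly
// (e.g. via an "All Incoming" button); otherwise undefined.
// `totalConflicts` keeps a partial resolution from counting as uniform.
function uniformChoice(
  resolutions: readonly ConflictResolution[],
  totalConflicts: number,
): ResolutionChoice | undefined {
  const distinct = new Set(resolutions.map((r) => r.conflictIndex)).size;
  if (totalConflicts === 0 || resolutions.length !== totalConflicts || distinct !== totalConflicts) {
    return undefined;
  }
  const first = resolutions[0].choice;
  return resolutions.every((r) => r.choice === first) ? first : undefined;
}
```

With one union type shared by the webview and the resolver, any leftover `'both'` literal typed as `ResolutionChoice` becomes a compile-time error instead of a silently unhandled case.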

[FIX] Respect prior resolutions

ae39870

- Tests prior resolutions hold

[REFACTOR] Duplicated code

52b3a2d

coderabbitai bot reviewed Feb 6, 2026

[FIX] Always use os.tmpdir()

da90c2d

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

coderabbitai bot reviewed Feb 6, 2026

[FIX] closes browser on failure

5f3de7a

[FIX] Test CI

890dbe8

coderabbitai bot reviewed Feb 6, 2026

Avni2000 merged commit 6061e75 into main Feb 6, 2026
2 checks passed

const testId = await row.getAttribute('data-testid') || '';
const rowIndex = parseInt(testId.replace('conflict-row-', '').replace('row-', ''), 10);
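The fragment above yields `NaN` when `data-testid` is missing or carries an unexpected prefix. A stricter alternative, sketched under the assumption that row test IDs always look like `conflict-row-<n>` or `row-<n>`, returns `null` instead so callers can skip or fail explicitly; `parseRowIndex` is a hypothetical name.

```typescript
// Extract the numeric row index from a data-testid such as "conflict-row-3"
// or "row-3". Returns null (rather than NaN) for missing or malformed IDs.
function parseRowIndex(testId: string | null): number | null {
  const match = /^(?:conflict-)?row-(\d+)$/.exec(testId ?? '');
  return match ? Number(match[1]) : null;
}
```

Because Playwright's `Locator.getAttribute()` resolves to `string | null`, the call site could become `parseRowIndex(await row.getAttribute('data-testid'))` with no `|| ''` fallback.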

[FEAT] Generic Testing Harness + "Take All" tests #44

Avni2000 commented Feb 6, 2026 • edited by coderabbitai bot

Summary by CodeRabbit

coderabbitai bot commented Feb 6, 2026 • edited

Walkthrough

Changes

Sequence Diagrams

Estimated code review effort

Possibly related PRs

Poem

coderabbitai bot left a comment

coderabbitai bot Feb 6, 2026

coderabbitai bot left a comment

coderabbitai bot Feb 6, 2026

Avni2000 commented Feb 6, 2026

coderabbitai bot commented Feb 6, 2026

Issues Found in resolver.ts

1. Type Inconsistency (Line 274)

2. Unclear Fallback Behavior (Lines 323-327)

3. Missing Documentation

4. Suggested Documentation Improvements

Recommendations

coderabbitai bot left a comment

coderabbitai bot Feb 6, 2026

coderabbitai bot left a comment

coderabbitai bot Feb 6, 2026

coderabbitai bot left a comment

coderabbitai bot Feb 6, 2026

Avni2000 commented Feb 6, 2026

coderabbitai bot commented Feb 6, 2026

Avni2000 commented Feb 6, 2026 • edited

coderabbitai bot left a comment

coderabbitai bot Feb 6, 2026

coderabbitai bot commented Feb 6, 2026

CI Stability Improvements Plan

coderabbitai bot commented Feb 6, 2026

Avni2000 commented Feb 6, 2026 • edited by coderabbitai bot

coderabbitai bot commented Feb 6, 2026 • edited

Avni2000 commented Feb 6, 2026 • edited