@luccas-harbour (Contributor) commented Dec 30, 2025

What’s in this PR

  • Adds a diffing extension and a computeDiff entry point that returns document-level diffs plus comment diffs.
  • Implements a generic diff pipeline based on Myers diff with hooks to map operations into “added / deleted / modified” payloads.
  • Adds paragraph-specific diffing that flattens inline content, detects attribute changes, and emits inline text/node diffs.
  • Adds comment diffing that normalizes comment identity, parses comment text, and diffs comment bodies and metadata.
  • Provides utilities for attribute/mark diffs, similarity scoring, and insertion-position calculations.
  • Includes unit tests covering sequence diffing, paragraph/inline behavior, comment diffing, and computeDiff integration.
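
To make the attribute-diff utility listed above more concrete, here is a minimal sketch of how two attribute objects could be normalized into an "added / deleted / modified" payload. The names `Attrs`, `AttrDiff`, and `diffAttrs` are hypothetical and are not taken from the PR; the real utility may also handle nested values and marks, which this sketch does not.

```typescript
// Hypothetical types for illustration; the PR's actual payload shape may differ.
type AttrValue = string | number | boolean | null;
type Attrs = Record<string, AttrValue>;

interface AttrDiff {
  added: Attrs;
  deleted: Attrs;
  modified: Record<string, { from: AttrValue; to: AttrValue }>;
}

// Compares two flat attribute objects key by key.
function diffAttrs(oldAttrs: Attrs, newAttrs: Attrs): AttrDiff {
  const diff: AttrDiff = { added: {}, deleted: {}, modified: {} };

  for (const key of Object.keys(oldAttrs)) {
    if (!(key in newAttrs)) {
      // Key exists only in the old attributes.
      diff.deleted[key] = oldAttrs[key];
    } else if (oldAttrs[key] !== newAttrs[key]) {
      // Key exists on both sides with different values.
      diff.modified[key] = { from: oldAttrs[key], to: newAttrs[key] };
    }
  }

  for (const key of Object.keys(newAttrs)) {
    if (!(key in oldAttrs)) {
      // Key exists only in the new attributes.
      diff.added[key] = newAttrs[key];
    }
  }

  return diff;
}

const attrDiff = diffAttrs(
  { align: "left", indent: 1, styleId: "Body" },
  { align: "center", indent: 1, keepNext: true },
);
console.log(JSON.stringify(attrDiff));
```

In this example `align` lands in `modified`, `styleId` in `deleted`, and `keepNext` in `added`, while the unchanged `indent` is omitted.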

How the diffing computation works

  • The diff pipeline first normalizes the old and new ProseMirror documents into ordered node lists.
  • It runs Myers diff on those sequences to identify equal, insert, and delete operations.
  • A reordering step pairs delete/insert operations so they can be treated as modifications where appropriate (e.g., similar paragraphs).
  • For paragraph nodes, the module tokenizes inline content into characters and inline nodes, diffs those tokens, and groups contiguous changes into readable inline diffs (including attributes and marks).
  • For non‑paragraph nodes, it compares attributes to detect modifications and emits additions/deletions with JSON payloads.
  • Comment diffing follows the same sequence-based approach, matching by stable ids and diffing both metadata and comment body content.
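
The sequence-diff and pairing steps described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's code: for brevity it uses a simple longest-common-subsequence table instead of Myers diff (both yield equal/insert/delete operations), and the names `diffSequences`, `pairIntoModifications`, and the `similar` predicate are hypothetical.

```typescript
type Op<T> =
  | { kind: "equal"; oldItem: T; newItem: T }
  | { kind: "delete"; oldItem: T }
  | { kind: "insert"; newItem: T };

type Change<T> =
  | { kind: "added"; item: T }
  | { kind: "deleted"; item: T }
  | { kind: "modified"; oldItem: T; newItem: T };

// LCS-based diff (a stand-in for Myers diff in this sketch).
function diffSequences<T>(oldSeq: T[], newSeq: T[], same: (a: T, b: T) => boolean): Op<T>[] {
  const n = oldSeq.length;
  const m = newSeq.length;
  // lcs[i][j] = length of the LCS of oldSeq[i..] and newSeq[j..].
  const lcs: number[][] = Array.from({ length: n + 1 }, () => new Array<number>(m + 1).fill(0));
  for (let i = n - 1; i >= 0; i--) {
    for (let j = m - 1; j >= 0; j--) {
      lcs[i][j] = same(oldSeq[i], newSeq[j])
        ? lcs[i + 1][j + 1] + 1
        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
    }
  }

  const ops: Op<T>[] = [];
  let i = 0;
  let j = 0;
  while (i < n && j < m) {
    if (same(oldSeq[i], newSeq[j])) {
      ops.push({ kind: "equal", oldItem: oldSeq[i], newItem: newSeq[j] });
      i++;
      j++;
    } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
      ops.push({ kind: "delete", oldItem: oldSeq[i] });
      i++;
    } else {
      ops.push({ kind: "insert", newItem: newSeq[j] });
      j++;
    }
  }
  while (i < n) ops.push({ kind: "delete", oldItem: oldSeq[i++] });
  while (j < m) ops.push({ kind: "insert", newItem: newSeq[j++] });
  return ops;
}

// Turns an adjacent delete/insert pair into one "modified" change when the
// caller-supplied predicate considers the two items similar enough.
function pairIntoModifications<T>(ops: Op<T>[], similar: (a: T, b: T) => boolean): Change<T>[] {
  const changes: Change<T>[] = [];
  for (let k = 0; k < ops.length; k++) {
    const op = ops[k];
    const next = ops[k + 1];
    if (op.kind === "delete" && next !== undefined && next.kind === "insert" && similar(op.oldItem, next.newItem)) {
      changes.push({ kind: "modified", oldItem: op.oldItem, newItem: next.newItem });
      k++; // The insert has been consumed by the pair.
    } else if (op.kind === "delete") {
      changes.push({ kind: "deleted", item: op.oldItem });
    } else if (op.kind === "insert") {
      changes.push({ kind: "added", item: op.newItem });
    }
  }
  return changes;
}

const ops = diffSequences(
  ["alpha", "beta", "gamma"],
  ["alpha", "betta", "gamma", "delta"],
  (a, b) => a === b,
);
// Toy similarity rule for the example only: same first letter.
const changes = pairIntoModifications(ops, (a, b) => a[0] === b[0]);
console.log(changes.map((c) => c.kind));
```

Here "beta" and "betta" are reported as a single modification, while "delta" is reported as an addition. The pairing step only looks at directly adjacent delete/insert operations, which is a simplification of the reordering step the description mentions.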

Notes / rationale

  • Diffs are designed to be applied in reverse order using the pos anchor to reconstruct the updated document.
  • Paragraph matching uses paraId when available, and a similarity heuristic as a fallback to reduce noisy add/delete pairs.
  • Attribute and mark diffs are normalized to structured “added/deleted/modified” payloads for easy consumption.
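
The reverse-order note above can be illustrated with a small sketch. It assumes, as one of the commit messages states, that every `pos` refers to the old document; a plain string stands in for the document, and the `TextDiff` shape and `applyDiffs` helper are hypothetical simplifications rather than the PR's API.

```typescript
// Hypothetical, simplified diff shape: positions index into the OLD text.
type TextDiff =
  | { action: "added"; pos: number; text: string }
  | { action: "deleted"; pos: number; length: number };

// Applies diffs from the highest position to the lowest, so that an edit
// never shifts the positions of the edits that are still to be applied.
function applyDiffs(oldText: string, diffs: TextDiff[]): string {
  const ordered = [...diffs].sort((a, b) => b.pos - a.pos);
  let result = oldText;
  for (const d of ordered) {
    if (d.action === "added") {
      result = result.slice(0, d.pos) + d.text + result.slice(d.pos);
    } else {
      result = result.slice(0, d.pos) + result.slice(d.pos + d.length);
    }
  }
  return result;
}

const updated = applyDiffs("the quick fox", [
  { action: "added", pos: 0, text: "see " },
  { action: "deleted", pos: 4, length: 6 }, // removes "quick "
]);
console.log(updated);
```

Applying the deletion at position 4 before the insertion at position 0 keeps both anchors valid against the old text. The sketch does not handle several diffs that share the same `pos`, which a real implementation would need to order explicitly.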

Testing

  • Adds focused unit tests for sequence diffing, attribute/mark changes, inline grouping, paragraph behaviors, and comment diffs.
  • Includes integration tests that diff real document fixtures and verify expected operations.

Luccas Correa added 30 commits December 30, 2025 15:17
This function can then be reused when diffing paragraphs and runs. It
helps identify modifications instead of delete/insert pairs.
Always maps starting/ending positions to the old document instead of the
new one.
linear bot commented Dec 30, 2025

@luccas-harbour luccas-harbour marked this pull request as ready for review December 30, 2025 18:43
@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Comment on lines +136 to +140
    if (isParagraphNodeInfo(oldNodeInfo) && isParagraphNodeInfo(newNodeInfo)) {
      return shouldProcessEqualParagraphsAsModification(oldNodeInfo, newNodeInfo);
    }
    return JSON.stringify(oldNodeInfo.node.attrs) !== JSON.stringify(newNodeInfo.node.attrs);
  }

P1: Non-paragraph content changes never emitted

For non-paragraph nodes the comparator falls through to true whenever the type matches, and this block only flags an "equal" pair as modified when their attrs JSON differs. Any content edits inside a same-typed node that keeps its attributes (e.g., changing the text of a heading) will therefore be treated as unchanged and no diff is emitted, so callers will miss real document changes. Consider comparing child content (or the node’s JSON) instead of only attrs for non-paragraph nodes.

Contributor Author

That is not the case because all of the descendants are iterated, not only container nodes. If a child of a node was modified, the diffing algorithm will pick it up.
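
A rough sketch of the point made in this reply: when every descendant is flattened into the sequence being diffed, a heading whose attributes are unchanged still has a text child that differs. The `DocNode` shape and the `flatten` helper are hypothetical stand-ins for ProseMirror nodes and the PR's normalization step, and paragraph-specific handling is left out.

```typescript
// Hypothetical plain-object stand-in for a ProseMirror node.
interface DocNode {
  type: string;
  attrs?: Record<string, unknown>;
  text?: string;
  content?: DocNode[];
}

interface NodeInfo {
  node: DocNode;
  depth: number;
}

// Depth-first, document-order flattening of a node and all its descendants.
function flatten(root: DocNode): NodeInfo[] {
  const out: NodeInfo[] = [];
  const visit = (node: DocNode, depth: number): void => {
    out.push({ node, depth });
    for (const child of node.content ?? []) {
      visit(child, depth + 1);
    }
  };
  visit(root, 0);
  return out;
}

const heading = (text: string): DocNode => ({
  type: "doc",
  content: [{ type: "heading", attrs: { level: 1 }, content: [{ type: "text", text }] }],
});

const oldFlat = flatten(heading("Intro"));
const newFlat = flatten(heading("Overview"));

// The heading entries compare equal on attrs alone...
const headingAttrsEqual =
  JSON.stringify(oldFlat[1].node.attrs) === JSON.stringify(newFlat[1].node.attrs);
// ...but the text entries that follow them differ, so the change is visible.
const textChanged = oldFlat[2].node.text !== newFlat[2].node.text;
console.log(headingAttrsEqual, textChanged);
```

Under these assumptions the attrs-only comparison on the heading reports no change, and the edit surfaces at the text node instead.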
