@sid597
Collaborator

@sid597 sid597 commented Nov 24, 2025

duplicate node alert on page

add to node create dialog, only show small list, fix lint errors, fix bug to show in all open pages

Summary by CodeRabbit

  • New Features

    • Added vector similarity matching to identify and surface similar pages based on content
    • Integrated collapsible panel displaying matched pages with similarity scores
    • Users can navigate directly to matched pages from results
    • Feature respects Suggestive Mode setting
  • Refactor

    • Consolidated reference handling and observer logic for improved code organization


https://www.loom.com/share/cb366c09942b4030a9933118ad281bc5

@linear

linear bot commented Nov 24, 2025

@supabase

supabase bot commented Nov 24, 2025

This pull request has been ignored for the connected project zytfjzqyijgagqxrzbmz because there are no changes detected in packages/database/supabase directory. You can change this behaviour in Project Integrations Settings ↗︎.


Preview Branches by Supabase.
Learn more about Supabase Branching ↗︎.

Collaborator Author

sid597 commented Nov 24, 2025

@sid597 sid597 changed the title duplicate node alert on page ENG-1074 Prod duplicate node alert on page Nov 24, 2025
@sid597 sid597 marked this pull request as ready for review November 24, 2025 10:18
@sid597
Collaborator Author

sid597 commented Nov 24, 2025

@coderabbitai full review

@coderabbitai
Contributor

coderabbitai bot commented Nov 24, 2025

✅ Actions performed

Full review triggered.

@coderabbitai
Contributor

coderabbitai bot commented Nov 24, 2025

📝 Walkthrough

Walkthrough

Adds vector similarity matching for Roam pages through a new VectorDuplicateMatches component that fetches and displays semantically similar nodes. Introduces vector search utility findSimilarNodesVectorOnly, integrates it into page title observers, and refactors rendering functions to consolidate Discourse-related UI logic with explicit uid parameters.
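
As a rough illustration of what "vector similarity matching" means in this walkthrough, the sketch below ranks candidate pages against a query embedding. It assumes cosine similarity, and the names `Candidate`, `cosineSimilarity`, and `rankMatches` are made up for the example. In the PR the ranking happens server-side in a Supabase RPC, and the metric that function uses is not shown in this conversation.

```typescript
type Candidate = { uid: string; text: string; embedding: number[] };
type ScoredMatch = { uid: string; text: string; score: number };

// Cosine similarity of two vectors. Returns 0 for empty, mismatched, or zero vectors.
const cosineSimilarity = (a: number[], b: number[]): number => {
  if (a.length === 0 || a.length !== b.length) return 0;
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
};

// Hypothetical client-side stand-in for the RPC: score, filter by threshold,
// sort by descending score, and keep at most `limit` results.
const rankMatches = ({
  query,
  candidates,
  threshold,
  limit,
}: {
  query: number[];
  candidates: Candidate[];
  threshold: number;
  limit: number;
}): ScoredMatch[] =>
  candidates
    .map(({ uid, text, embedding }) => ({
      uid,
      text,
      score: cosineSimilarity(query, embedding),
    }))
    .filter((match) => match.score >= threshold)
    .sort((x, y) => y.score - x.score)
    .slice(0, limit);
```

The `threshold` and `limit` parameters play the same role as the threshold and result count that the review later shows being passed to the RPC.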

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Vector-based duplicate matching component: apps/roam/src/components/VectorDuplicateMatches.tsx | New React component that displays collapsible panel of similar pages using vector embeddings; fetches matches via findSimilarNodesVectorOnly with debouncing, manages loading/results state, and renders clickable match links with similarity scores. |
| Vector search utility: apps/roam/src/utils/hyde.ts | Added findSimilarNodesVectorOnly function that creates embeddings and calls Supabase RPC to find vector-similar nodes; updated SearchFunc type parameter from CandidateNodeWithEmbedding[] to Result[]; added VectorMatch type export. |
| Page title observer integration: apps/roam/src/utils/initializeObserversAndListeners.ts | Modified pageTitleObserver to detect Discourse nodes and invoke renderPossibleDuplicates when Suggestive Mode is enabled; replaced linkedReferencesObserver with consolidated renderDiscourseContextAndCanvasReferences call. |
| Content extraction utility: apps/roam/src/utils/extractContentFromTitle.ts | New module exporting extractContentFromTitle function that parses page titles using node format patterns and returns extracted content or original title if no match. |
| Discourse context rendering refactor: apps/roam/src/utils/renderLinkedReferenceAdditions.ts | Renamed renderLinkedReferenceAdditions to renderDiscourseContextAndCanvasReferences; changed signature to accept explicit uid parameter and return void (removed async); removed conditional gating logic, now renders components unconditionally. |

Sequence Diagram

sequenceDiagram
    participant User
    participant VectorDuplicateMatches as VectorDuplicateMatches<br/>Component
    participant findSimilarNodesVectorOnly as findSimilarNodesVectorOnly<br/>Function
    participant Embedding as Embedding<br/>Service
    participant Supabase as Supabase RPC
    participant UI as Roam UI

    User->>UI: Open/Navigate to Page
    UI->>VectorDuplicateMatches: Mount Component
    VectorDuplicateMatches->>VectorDuplicateMatches: Debounce (500ms)
    VectorDuplicateMatches->>findSimilarNodesVectorOnly: Call with text/context
    findSimilarNodesVectorOnly->>Embedding: Create embedding from text
    Embedding-->>findSimilarNodesVectorOnly: Return vector
    findSimilarNodesVectorOnly->>Supabase: RPC call for similar nodes
    Supabase-->>findSimilarNodesVectorOnly: Return VectorMatch[] results
    findSimilarNodesVectorOnly-->>VectorDuplicateMatches: Results with scores
    VectorDuplicateMatches->>VectorDuplicateMatches: Update state & render matches
    VectorDuplicateMatches-->>UI: Display matches panel
    User->>UI: Click match result
    UI->>User: Open node in sidebar

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • VectorDuplicateMatches component: React state management, debouncing logic, async data fetching, and DOM injection patterns require careful review
  • findSimilarNodesVectorOnly: Embedding creation, Supabase RPC integration, and error handling with toast notifications need verification
  • Integration in initializeObserversAndListeners: New callback logic branches and interaction with renderPossibleDuplicates timing
  • renderLinkedReferenceAdditions refactoring: Signature change from async to sync and removal of internal uid resolution; verify all call sites are updated correctly
  • extractContentFromTitle: Placeholder parsing and regex matching edge cases

Possibly related PRs

Pre-merge checks

✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The pull request title 'ENG-1074 Prod duplicate node alert on page' directly aligns with the main objective of adding a duplicate node alert functionality on pages, as described in the PR summary and implemented across multiple files. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (6)
apps/roam/src/utils/initializeObserversAndListeners.ts (1)

55-57: Discourse-node duplicate rendering integration looks correct; consider guarding against empty UIDs.

The flow of resolving the page UID via getPageUidByPageTitle and gating renderPossibleDuplicates behind isDiscourseNode(uid) is sound and matches how discourse-node UIDs are handled elsewhere. As a small polish, you could avoid calling isDiscourseNode when Roam returns an empty UID:

-      const uid = getPageUidByPageTitle(title);
-      if (isDiscourseNode(uid)) {
+      const uid = getPageUidByPageTitle(title);
+      if (uid && isDiscourseNode(uid)) {
         renderPossibleDuplicates(h1, title);
       }

Based on learnings, using getPageUidByPageTitle synchronously here is correct.

Also applies to: 89-92

apps/roam/src/utils/useNodeContext.ts (2)

13-36: Content extraction from formatted titles is well-aligned with placeholder semantics.

The logic of deriving placeholders from node.format, using getDiscourseNodeFormatExpression, and preferring a {content} placeholder (falling back to the first capture) is a good match for how format expressions are built. If you find yourself needing the same behavior outside Roam (e.g., the Obsidian util), consider lifting this into a shared extractContentFromTitle helper to avoid divergence between apps.
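
The placeholder-based extraction described above can be sketched roughly as follows. This is not the repository's implementation: `compileFormat` and `extractContentSketch` are hypothetical names, and the example format strings (`[[CLM]] - {content}`) are assumptions about what a node format looks like.

```typescript
// Escape characters that have a special meaning in a regular expression.
const escapeRegex = (s: string): string =>
  s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

// Turn a format such as "[[CLM]] - {content}" into an anchored regex with one
// capture group per {placeholder}, remembering the placeholder order.
const compileFormat = (
  format: string,
): { regex: RegExp; placeholders: string[] } => {
  const placeholders: string[] = [];
  const pattern = format
    .split(/(\{[a-zA-Z]+\})/)
    .map((part) => {
      const placeholder = /^\{([a-zA-Z]+)\}$/.exec(part);
      if (!placeholder) return escapeRegex(part);
      placeholders.push(placeholder[1] ?? "");
      return "(.*?)";
    })
    .join("");
  return { regex: new RegExp(`^${pattern}$`), placeholders };
};

// Prefer the {content} capture, fall back to the first capture, and return the
// original title when the format does not match.
const extractContentSketch = (
  title: string,
  node: { format: string },
): string => {
  if (!node.format) return title;
  const { regex, placeholders } = compileFormat(node.format);
  const match = regex.exec(title);
  if (!match) return title;
  const contentIndex = placeholders.indexOf("content");
  const captured = match[(contentIndex >= 0 ? contentIndex : 0) + 1];
  return captured?.trim() || title;
};
```

Under these assumptions, a title that matches the format yields only the content portion, while any other title is passed through unchanged, so callers can embed the result without checking for a match first.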


38-69: useNodeContext matching strategy looks solid; UID-first then title-based fallback.

Resolving the discourse node by pageUid via findDiscourseNode and then falling back to matchDiscourseNode({ ...node, title: pageTitle }) gives a sensible priority order, and returning null when nothing matches lets callers cleanly short‑circuit (as VectorDuplicateMatches does). If you expect empty or transient titles, you could optionally short‑circuit early in the effect when !pageTitle.trim() to skip getDiscourseNodes() work, but the current implementation is correct as is.

Based on learnings, using the Roam page UID with findDiscourseNode (where DiscourseNode.type is the UID field) is consistent with existing conventions.

apps/roam/src/components/VectorDuplicateMatches.tsx (2)

8-21: Avoid duplicating the vector match shape; import the shared type from hyde.ts.

VectorMatchItem mirrors the VectorMatch type exposed from hyde.ts, and the cast const vectorSearch = findSimilarNodesVectorOnly as (...) => Promise<VectorMatchItem[]>; will silently drift if the underlying type ever changes. To keep things aligned, consider reusing the exported type instead of re-declaring and casting:

-import type { Result } from "~/utils/types";
-import { findSimilarNodesVectorOnly } from "../utils/hyde";
+import type { Result } from "~/utils/types";
+import { findSimilarNodesVectorOnly, type VectorMatch } from "../utils/hyde";
@@
-type VectorMatchItem = {
-  node: Result;
-  score: number;
-};
+type VectorMatchItem = VectorMatch & { node: Result };
@@
-const vectorSearch = findSimilarNodesVectorOnly as (
-  params: VectorSearchParams,
-) => Promise<VectorMatchItem[]>;
+const vectorSearch = (params: VectorSearchParams) =>
+  findSimilarNodesVectorOnly(params);

95-157: UI rendering and DOM mounting for duplicate matches are consistent with the Roam environment.

The conditional rendering (no panel when activeContext is null, spinner vs. "No matches found" vs. list) reads cleanly, and the list items correctly open the matched node in the right sidebar. The renderPossibleDuplicates helper’s use of a dedicated container class plus data-page-title to reuse or recreate the mount point, and repositioning relative to the H1’s container, should keep the injected UI stable across title changes and re-renders in Roam’s header.

Also applies to: 160-199

apps/roam/src/utils/hyde.ts (1)

59-63: RPC signatures verified; defensive UID filtering remains a good practice.

The parameter names and JSON serialization approach are correct—p_query_embedding (string-serialized vector) and p_subset_roam_uids (text array) match the SQL function definition, and JSON.stringify() is the established pattern across the codebase for vector RPC calls. The findSimilarNodesVectorOnly function uses the same serialization strategy with match_content_embeddings.

The defensive filter suggestion still holds as a safeguard against unexpected falsy values:

-  const subsetRoamUids = indexData.map((node) => node.uid);
+  const subsetRoamUids = indexData
+    .map((node) => node.uid)
+    .filter((uid): uid is string => !!uid);
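
A minimal, self-contained sketch of the two points in this comment: serializing the embedding with JSON.stringify() and dropping falsy UIDs with a type predicate. The parameter names `p_query_embedding` and `p_subset_roam_uids` come from the comment above; `IndexNode`, `SubsetRpcParams`, and `buildSubsetRpcParams` are hypothetical names used only for illustration.

```typescript
type IndexNode = { uid?: string | null };

type SubsetRpcParams = {
  p_query_embedding: string; // vector serialized as a JSON array string
  p_subset_roam_uids: string[];
};

// Build the RPC arguments. The type predicate in filter() narrows
// (string | null | undefined)[] to string[] while removing empty values.
const buildSubsetRpcParams = ({
  embedding,
  indexData,
}: {
  embedding: number[];
  indexData: IndexNode[];
}): SubsetRpcParams => ({
  p_query_embedding: JSON.stringify(embedding),
  p_subset_roam_uids: indexData
    .map((node) => node.uid)
    .filter((uid): uid is string => !!uid),
});
```

Without the predicate, TypeScript would still type the filtered array as possibly containing null or undefined, so the guard helps both at runtime and at compile time.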
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 434df94 and d574e8d.

📒 Files selected for processing (5)
  • apps/roam/src/components/CreateNodeDialog.tsx (3 hunks)
  • apps/roam/src/components/VectorDuplicateMatches.tsx (1 hunks)
  • apps/roam/src/utils/hyde.ts (3 hunks)
  • apps/roam/src/utils/initializeObserversAndListeners.ts (2 hunks)
  • apps/roam/src/utils/useNodeContext.ts (1 hunks)
🧰 Additional context used
🧠 Learnings (8)
📚 Learning: 2025-06-22T10:40:52.752Z
Learnt from: sid597
Repo: DiscourseGraphs/discourse-graph PR: 232
File: apps/roam/src/utils/getAllDiscourseNodesSince.ts:18-31
Timestamp: 2025-06-22T10:40:52.752Z
Learning: In apps/roam/src/utils/getAllDiscourseNodesSince.ts, the user confirmed that querying for `?title` with `:node/title` and mapping it to the `text` field in the DiscourseGraphContent type is the correct implementation for retrieving discourse node content from Roam Research, despite it appearing to query page titles rather than block text content.

Applied to files:

  • apps/roam/src/components/CreateNodeDialog.tsx
  • apps/roam/src/utils/useNodeContext.ts
  • apps/roam/src/utils/hyde.ts
  • apps/roam/src/utils/initializeObserversAndListeners.ts
📚 Learning: 2025-08-25T15:53:21.799Z
Learnt from: sid597
Repo: DiscourseGraphs/discourse-graph PR: 372
File: apps/roam/src/components/DiscourseNodeMenu.tsx:116-116
Timestamp: 2025-08-25T15:53:21.799Z
Learning: In apps/roam/src/components/DiscourseNodeMenu.tsx, when handling tag insertion, multiple leading hashtags (like ##foo) should be preserved as they represent user intent, not normalized to a single hashtag. The current regex /^#/ is correct as it only removes one leading # before adding one back, maintaining any additional hashtags the user intended.

Applied to files:

  • apps/roam/src/components/CreateNodeDialog.tsx
📚 Learning: 2025-06-17T23:37:45.289Z
Learnt from: maparent
Repo: DiscourseGraphs/discourse-graph PR: 220
File: apps/roam/src/utils/conceptConversion.ts:42-56
Timestamp: 2025-06-17T23:37:45.289Z
Learning: In the DiscourseNode interface from apps/roam/src/utils/getDiscourseNodes.ts, the field `type` serves as the unique identifier field, not a type classification field. The interface has no `uid` or `id` field, making `node.type` the correct field to use for UID-related operations.

Applied to files:

  • apps/roam/src/utils/useNodeContext.ts
📚 Learning: 2025-06-17T23:37:45.289Z
Learnt from: maparent
Repo: DiscourseGraphs/discourse-graph PR: 220
File: apps/roam/src/utils/conceptConversion.ts:42-56
Timestamp: 2025-06-17T23:37:45.289Z
Learning: In the DiscourseNode interface from apps/roam/src/utils/getDiscourseNodes.ts, the field `node.type` serves as the UID field rather than having a conventional `node.uid` field. This is an unusual naming convention where the type field actually contains the unique identifier.

Applied to files:

  • apps/roam/src/utils/useNodeContext.ts
📚 Learning: 2025-11-05T21:57:14.909Z
Learnt from: maparent
Repo: DiscourseGraphs/discourse-graph PR: 534
File: apps/roam/src/utils/createReifiedBlock.ts:40-48
Timestamp: 2025-11-05T21:57:14.909Z
Learning: In the discourse-graph repository, the function `getPageUidByPageTitle` from `roamjs-components/queries/getPageUidByPageTitle` is a synchronous function that returns a string directly (the page UID or an empty string if not found), not a Promise. It should be called without `await`.

Applied to files:

  • apps/roam/src/utils/useNodeContext.ts
  • apps/roam/src/utils/initializeObserversAndListeners.ts
📚 Learning: 2025-05-30T14:49:24.016Z
Learnt from: maparent
Repo: DiscourseGraphs/discourse-graph PR: 182
File: apps/website/app/utils/supabase/dbUtils.ts:22-28
Timestamp: 2025-05-30T14:49:24.016Z
Learning: In apps/website/app/utils/supabase/dbUtils.ts, expanding the KNOWN_EMBEDDINGS and DEFAULT_DIMENSIONS mappings to support additional embedding models requires corresponding database model changes (creating new embedding tables), which should be scoped as separate work from API route implementations.

Applied to files:

  • apps/roam/src/utils/hyde.ts
📚 Learning: 2025-05-20T14:04:19.632Z
Learnt from: maparent
Repo: DiscourseGraphs/discourse-graph PR: 165
File: packages/database/supabase/schemas/embedding.sql:66-95
Timestamp: 2025-05-20T14:04:19.632Z
Learning: In the `match_embeddings_for_subset_nodes` SQL function in packages/database/supabase/schemas/embedding.sql, the number of results is implicitly limited by the length of the input array parameter `p_subset_roam_uids` since the function filters content using `WHERE c.source_local_id = ANY(p_subset_roam_uids)`.

Applied to files:

  • apps/roam/src/utils/hyde.ts
📚 Learning: 2025-06-25T22:56:17.522Z
Learnt from: maparent
Repo: DiscourseGraphs/discourse-graph PR: 0
File: :0-0
Timestamp: 2025-06-25T22:56:17.522Z
Learning: In the Roam discourse-graph system, the existence of the configuration page (identified by DISCOURSE_CONFIG_PAGE_TITLE) and its corresponding UID is a system invariant. The code can safely assume this page will always exist, so defensive null checks are not needed when using `getPageUidByPageTitle(DISCOURSE_CONFIG_PAGE_TITLE)`.

Applied to files:

  • apps/roam/src/utils/initializeObserversAndListeners.ts
🧬 Code graph analysis (5)
apps/roam/src/components/VectorDuplicateMatches.tsx (3)
apps/roam/src/utils/types.ts (1)
  • Result (42-46)
apps/roam/src/utils/hyde.ts (1)
  • findSimilarNodesVectorOnly (539-590)
apps/roam/src/utils/useNodeContext.ts (2)
  • NodeContext (8-11)
  • useNodeContext (38-69)
apps/roam/src/components/CreateNodeDialog.tsx (1)
apps/roam/src/components/VectorDuplicateMatches.tsx (1)
  • VectorDuplicateMatches (23-158)
apps/roam/src/utils/useNodeContext.ts (2)
apps/obsidian/src/utils/extractContentFromTitle.ts (1)
  • extractContentFromTitle (3-10)
apps/obsidian/src/utils/getDiscourseNodeFormatExpression.ts (1)
  • getDiscourseNodeFormatExpression (1-9)
apps/roam/src/utils/hyde.ts (2)
apps/roam/src/utils/types.ts (1)
  • Result (42-46)
apps/roam/src/utils/supabaseContext.ts (1)
  • getLoggedInClient (95-112)
apps/roam/src/utils/initializeObserversAndListeners.ts (1)
apps/roam/src/components/VectorDuplicateMatches.tsx (1)
  • renderPossibleDuplicates (160-199)
🔇 Additional comments (3)
apps/roam/src/components/CreateNodeDialog.tsx (1)

15-15: Debounced duplicate search integration into the create dialog looks good.

The 500 ms debounced debouncedTitle and its use as both pageTitle and explicit text for VectorDuplicateMatches is a sensible way to surface duplicate suggestions without firing vector searches on every keystroke, and using limit={5} keeps the UI compact in the dialog context.

Also applies to: 37-52, 150-154

apps/roam/src/components/VectorDuplicateMatches.tsx (1)

37-82: Search triggering and cancellation logic is well-structured.

Using useNodeContext(pageTitle) combined with an override when text is provided, resetting hasSearched on activeContext.searchText changes, and guarding the fetch on isOpen && !hasSearched provides a clear, debounced search lifecycle. The isCancelled flag in the effect is a good safeguard against setting state after unmount or context changes; no changes needed here.
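
The stale-result guard mentioned above can be shown without React. The sketch below is a framework-free analogue of the isCancelled flag, not the component's code: `createLatestRequestGuard` and `RequestToken` are hypothetical names, and each new request invalidates the tokens handed out before it.

```typescript
type RequestToken = { isCurrent: () => boolean };

// Hands out tokens; only the most recently issued token reports isCurrent().
const createLatestRequestGuard = (): {
  begin: () => RequestToken;
  cancel: () => void;
} => {
  let latest = 0;
  return {
    begin: () => {
      const id = ++latest;
      return { isCurrent: () => id === latest };
    },
    // Invalidate every outstanding token, e.g. when the panel unmounts.
    cancel: () => {
      latest++;
    },
  };
};

// Intended use inside an async search (shown as comments to keep the sketch synchronous):
//   const token = guard.begin();
//   const results = await search(text);
//   if (token.isCurrent()) setResults(results); // ignore stale responses
```

The effect is the same as a per-effect isCancelled flag: a response that arrives after the search text has changed, or after the panel has closed, is dropped instead of overwriting newer state.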

apps/roam/src/utils/hyde.ts (1)

534-590: Embedding is passed to the RPC as a JSON string.

Based on my verification against Supabase documentation and pgvector standards, the embedding should be passed directly to the rpc() method, not stringified. Supabase documentation examples show passing query_embedding: embedding directly, and the Supabase client automatically serializes parameters as JSON. The JSON.stringify() call converts the embedding array to a string, which will cause a type mismatch with the SQL function's vector type parameter.

Remove JSON.stringify() from the embedding parameter in the RPC call

Change line 558:

const { data, error } = await supabase.rpc("match_content_embeddings", {
  query_embedding: queryEmbedding,
  match_threshold: threshold,
  match_count: limit,
});
⛔ Skipped due to learnings
Learnt from: maparent
Repo: DiscourseGraphs/discourse-graph PR: 503
File: apps/roam/src/utils/syncDgNodesToSupabase.ts:304-360
Timestamp: 2025-10-18T12:05:23.834Z
Learning: When inserting pgvector embeddings via Supabase (PostgREST) in TypeScript, use JSON.stringify(vector) to serialize the number[] array to a string, as PostgREST does not natively support the pgvector type. This is the official workaround for working with vector columns through the REST API.
Learnt from: maparent
Repo: DiscourseGraphs/discourse-graph PR: 182
File: apps/website/app/utils/supabase/dbUtils.ts:22-28
Timestamp: 2025-05-30T14:49:24.016Z
Learning: In apps/website/app/utils/supabase/dbUtils.ts, expanding the KNOWN_EMBEDDINGS and DEFAULT_DIMENSIONS mappings to support additional embedding models requires corresponding database model changes (creating new embedding tables), which should be scoped as separate work from API route implementations.
Learnt from: maparent
Repo: DiscourseGraphs/discourse-graph PR: 165
File: packages/database/supabase/schemas/embedding.sql:66-95
Timestamp: 2025-05-20T14:04:19.632Z
Learning: In the `match_embeddings_for_subset_nodes` SQL function in packages/database/supabase/schemas/embedding.sql, the number of results is implicitly limited by the length of the input array parameter `p_subset_roam_uids` since the function filters content using `WHERE c.source_local_id = ANY(p_subset_roam_uids)`.

@sid597 sid597 requested a review from mdroidian November 24, 2025 10:40
Contributor

@mdroidian mdroidian left a comment


Good start! But I think it would be worth taking another look at useNodeContext: there is a lot going on there that seems unnecessary, and it will result in longer page load times.

@sid597 sid597 force-pushed the eng-1074-prod-duplicate-node-alert-using-vector-search branch 2 times, most recently from 49170f6 to 9c559c3 Compare December 7, 2025 07:35
@sid597 sid597 force-pushed the eng-1074-prod-duplicate-node-alert-using-vector-search branch from 9c559c3 to 74f63a3 Compare December 18, 2025 14:04
@sid597
Collaborator Author

sid597 commented Dec 18, 2025

@coderabbitai full review

@coderabbitai
Contributor

coderabbitai bot commented Dec 18, 2025

✅ Actions performed

Full review triggered.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (2)
apps/roam/src/utils/initializeObserversAndListeners.ts (1)

91-116: Performance concern: heavy queries on every page.

This code runs getDiscourseNodes() and findDiscourseNode() on every page title observer callback, which is potentially expensive. It was flagged in a previous review, and ticket ENG-1095 was created to optimize it before this PR merges.

Additionally, getPageUidByPageTitle is called twice - once for the config page (line 93) and once for the current page (line 98). The second call result could potentially be reused if findDiscourseNode were refactored to accept an optional pre-fetched uid.
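
One possible way to limit the repeated lookups described above is to memoize the per-page result by UID. The sketch below is only an illustration under that assumption: `createUidCache` and its `resolve` callback are hypothetical, and cache invalidation (for example when node formats change) is left out.

```typescript
// Memoize an expensive per-page lookup by UID so repeated observer callbacks
// for the same page do not redo the work.
const createUidCache = <T,>(
  resolve: (uid: string) => T,
): { get: (uid: string) => T; clear: () => void } => {
  const cache = new Map<string, T>();
  return {
    get: (uid) => {
      if (cache.has(uid)) return cache.get(uid) as T;
      const value = resolve(uid);
      cache.set(uid, value);
      return value;
    },
    clear: () => cache.clear(),
  };
};
```

Using has() before get() keeps cached falsy results (such as "not a discourse node") from being recomputed on every callback.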

apps/roam/src/utils/hyde.ts (1)

554-555: Silent failure when Supabase is not configured.

When getLoggedInClient() returns null (user hasn't set up Supabase/embeddings), the function silently returns an empty array. This was discussed in a previous review - users may think no duplicates exist when actually the search wasn't performed.

Consider distinguishing between "no matches found" and "search not available" states, or at minimum logging this condition for debugging.
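
One way to make that distinction explicit is a discriminated union for the search outcome, so the UI cannot confuse an empty result with a search that never ran. This is a sketch under assumptions: `SearchOutcome`, `toOutcome`, and `describeOutcome` are hypothetical names, and the message strings other than "No matches found" are invented for the example.

```typescript
type Match = { uid: string; text: string; score: number };

type SearchOutcome =
  | { status: "unavailable"; reason: string }
  | { status: "ok"; matches: Match[] };

// Wrap the raw result so a missing client is reported instead of
// being collapsed into an empty list.
const toOutcome = ({
  clientAvailable,
  matches,
}: {
  clientAvailable: boolean;
  matches: Match[];
}): SearchOutcome =>
  clientAvailable
    ? { status: "ok", matches }
    : { status: "unavailable", reason: "Supabase client not configured" };

// Early returns keep the three user-visible states separate.
const describeOutcome = (outcome: SearchOutcome): string => {
  if (outcome.status === "unavailable") {
    return `Duplicate search unavailable: ${outcome.reason}`;
  }
  if (outcome.matches.length === 0) return "No matches found";
  return `${outcome.matches.length} possible duplicate(s)`;
};
```

A lighter alternative, also mentioned in the comment, is to keep returning an empty array and only log the unavailable case.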

🧹 Nitpick comments (3)
apps/roam/src/utils/initializeObserversAndListeners.ts (1)

106-115: Null safety: querySelector may return null.

document.querySelector called with a class selector is typed as returning Element | null. The as HTMLDivElement assertion on line 108 removes null from the type, so the code is safe only because of the truthy check on line 109; if .rm-reference-main does not exist, linkedReferencesDiv is null at runtime despite its declared type.

Consider passing the element type as a generic argument instead, which keeps null in the type:

-        const linkedReferencesDiv = document.querySelector(
-          ".rm-reference-main",
-        ) as HTMLDivElement;
-        if (linkedReferencesDiv) {
+        const linkedReferencesDiv = document.querySelector<HTMLDivElement>(
+          ".rm-reference-main",
+        );
+        if (linkedReferencesDiv) {
apps/roam/src/components/VectorDuplicateMatches.tsx (2)

44-45: Handle an undefined pageTitle explicitly in the extractContentFromTitle call.

Line 44 passes (pageTitle || "", node), which matches the signature extractContentFromTitle(title: string, node: { format: string }): pageTitle is the title to extract content from, and node supplies the format. The argument order is therefore correct.

The remaining issue is the fallback. When pageTitle is undefined, the call passes an empty string, no format match occurs, and the function returns the empty string. Consider handling the undefined case explicitly.

-  const searchText = extractContentFromTitle(pageTitle || "", node);
+  const searchText = pageTitle ? extractContentFromTitle(pageTitle, node) : "";

18-28: Add explicit return types per coding guidelines.

Per the coding guidelines for TypeScript files, functions should have explicit return types.

🔎 Suggested changes:
 export const VectorDuplicateMatches = ({
   pageTitle,
   text,
   limit = 15,
   node,
 }: {
   pageTitle?: string;
   text?: string;
   limit?: number;
   node: DiscourseNode;
-}) => {
+}): JSX.Element | null => {
-  const handleSuggestionClick = async (node: VectorMatch["node"]) => {
+  const handleSuggestionClick = async (node: VectorMatch["node"]): Promise<void> => {

Also applies to: 95-104

📜 Review details

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7b4cb58 and 612a9fa.

📒 Files selected for processing (5)
  • apps/roam/src/components/VectorDuplicateMatches.tsx (1 hunks)
  • apps/roam/src/utils/extractContentFromTitle.ts (1 hunks)
  • apps/roam/src/utils/hyde.ts (3 hunks)
  • apps/roam/src/utils/initializeObserversAndListeners.ts (3 hunks)
  • apps/roam/src/utils/renderLinkedReferenceAdditions.ts (1 hunks)
🧰 Additional context used
📓 Path-based instructions (5)
**/*.{ts,tsx}

📄 CodeRabbit inference engine (.cursor/rules/main.mdc)

**/*.{ts,tsx}: Use Tailwind CSS for styling where possible
When refactoring inline styles, use tailwind classes
Prefer type over interface in TypeScript
Use explicit return types for functions
Avoid any types when possible
Prefer arrow functions over regular function declarations
Use named parameters (object destructuring) when a function has more than 2 parameters
Use PascalCase for components and types
Use camelCase for variables and functions
Use UPPERCASE for constants
Function names should describe their purpose clearly
Prefer early returns over nested conditionals for better readability

Files:

  • apps/roam/src/utils/initializeObserversAndListeners.ts
  • apps/roam/src/components/VectorDuplicateMatches.tsx
  • apps/roam/src/utils/renderLinkedReferenceAdditions.ts
  • apps/roam/src/utils/extractContentFromTitle.ts
  • apps/roam/src/utils/hyde.ts
apps/roam/**/*.{js,ts,tsx,jsx,json}

📄 CodeRabbit inference engine (.cursor/rules/roam.mdc)

Prefer existing dependencies from package.json when working on the Roam Research extension

Files:

  • apps/roam/src/utils/initializeObserversAndListeners.ts
  • apps/roam/src/components/VectorDuplicateMatches.tsx
  • apps/roam/src/utils/renderLinkedReferenceAdditions.ts
  • apps/roam/src/utils/extractContentFromTitle.ts
  • apps/roam/src/utils/hyde.ts
apps/roam/**/*.{ts,tsx,jsx,js,css,scss}

📄 CodeRabbit inference engine (.cursor/rules/roam.mdc)

Use BlueprintJS 3 components and Tailwind CSS for platform-native UI in the Roam Research extension

Files:

  • apps/roam/src/utils/initializeObserversAndListeners.ts
  • apps/roam/src/components/VectorDuplicateMatches.tsx
  • apps/roam/src/utils/renderLinkedReferenceAdditions.ts
  • apps/roam/src/utils/extractContentFromTitle.ts
  • apps/roam/src/utils/hyde.ts
apps/roam/**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (.cursor/rules/roam.mdc)

apps/roam/**/*.{ts,tsx,js,jsx}: Use the roamAlphaApi docs from https://roamresearch.com/#/app/developer-documentation/page/tIaOPdXCj when implementing Roam functionality
Use Roam Depot/Extension API docs from https://roamresearch.com/#/app/developer-documentation/page/y31lhjIqU when implementing extension functionality

Files:

  • apps/roam/src/utils/initializeObserversAndListeners.ts
  • apps/roam/src/components/VectorDuplicateMatches.tsx
  • apps/roam/src/utils/renderLinkedReferenceAdditions.ts
  • apps/roam/src/utils/extractContentFromTitle.ts
  • apps/roam/src/utils/hyde.ts
apps/roam/**

📄 CodeRabbit inference engine (.cursor/rules/roam.mdc)

Implement the Discourse Graph protocol in the Roam Research extension

Files:

  • apps/roam/src/utils/initializeObserversAndListeners.ts
  • apps/roam/src/components/VectorDuplicateMatches.tsx
  • apps/roam/src/utils/renderLinkedReferenceAdditions.ts
  • apps/roam/src/utils/extractContentFromTitle.ts
  • apps/roam/src/utils/hyde.ts
🧠 Learnings (15)
📚 Learning: 2025-06-22T10:40:52.752Z
Learnt from: sid597
Repo: DiscourseGraphs/discourse-graph PR: 232
File: apps/roam/src/utils/getAllDiscourseNodesSince.ts:18-31
Timestamp: 2025-06-22T10:40:52.752Z
Learning: In apps/roam/src/utils/getAllDiscourseNodesSince.ts, the user confirmed that querying for `?title` with `:node/title` and mapping it to the `text` field in the DiscourseGraphContent type is the correct implementation for retrieving discourse node content from Roam Research, despite it appearing to query page titles rather than block text content.

Applied to files:

  • apps/roam/src/utils/initializeObserversAndListeners.ts
  • apps/roam/src/components/VectorDuplicateMatches.tsx
  • apps/roam/src/utils/extractContentFromTitle.ts
  • apps/roam/src/utils/hyde.ts
📚 Learning: 2025-12-07T20:54:20.007Z
Learnt from: maparent
Repo: DiscourseGraphs/discourse-graph PR: 559
File: apps/roam/src/utils/findDiscourseNode.ts:37-39
Timestamp: 2025-12-07T20:54:20.007Z
Learning: In apps/roam/src/utils/findDiscourseNode.ts, the function findDiscourseNodeByTitleAndUid accepts both uid and title parameters where uid is primarily used for cache access (as the cache key) while title is used for the actual matching via matchDiscourseNode. This design reflects the pattern where downstream, the uid is mostly used to fetch the title, so the function caches by uid but matches by title.

Applied to files:

  • apps/roam/src/utils/initializeObserversAndListeners.ts
  • apps/roam/src/components/VectorDuplicateMatches.tsx
  • apps/roam/src/utils/renderLinkedReferenceAdditions.ts
  • apps/roam/src/utils/extractContentFromTitle.ts
📚 Learning: 2025-11-05T21:57:14.909Z
Learnt from: maparent
Repo: DiscourseGraphs/discourse-graph PR: 534
File: apps/roam/src/utils/createReifiedBlock.ts:40-48
Timestamp: 2025-11-05T21:57:14.909Z
Learning: In the discourse-graph repository, the function `getPageUidByPageTitle` from `roamjs-components/queries/getPageUidByPageTitle` is a synchronous function that returns a string directly (the page UID or an empty string if not found), not a Promise. It should be called without `await`.

Applied to files:

  • apps/roam/src/utils/initializeObserversAndListeners.ts
  • apps/roam/src/components/VectorDuplicateMatches.tsx
  • apps/roam/src/utils/renderLinkedReferenceAdditions.ts
📚 Learning: 2025-11-25T00:52:41.934Z
Learnt from: CR
Repo: DiscourseGraphs/discourse-graph PR: 0
File: .cursor/rules/roam.mdc:0-0
Timestamp: 2025-11-25T00:52:41.934Z
Learning: Applies to apps/roam/**/*.{ts,tsx,js,jsx} : Use the roamAlphaApi docs from https://roamresearch.com/#/app/developer-documentation/page/tIaOPdXCj when implementing Roam functionality

Applied to files:

  • apps/roam/src/components/VectorDuplicateMatches.tsx
  • apps/roam/src/utils/hyde.ts
📚 Learning: 2025-08-25T15:53:21.799Z
Learnt from: sid597
Repo: DiscourseGraphs/discourse-graph PR: 372
File: apps/roam/src/components/DiscourseNodeMenu.tsx:116-116
Timestamp: 2025-08-25T15:53:21.799Z
Learning: In apps/roam/src/components/DiscourseNodeMenu.tsx, when handling tag insertion, multiple leading hashtags (like ##foo) should be preserved as they represent user intent, not normalized to a single hashtag. The current regex /^#/ is correct as it only removes one leading # before adding one back, maintaining any additional hashtags the user intended.

Applied to files:

  • apps/roam/src/components/VectorDuplicateMatches.tsx
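
The behaviour described in this learning can be checked with a short sketch. `toTag` is a hypothetical helper name; only the `/^#/` regex comes from the learning itself.

```typescript
// Strip exactly one leading "#" (if present), then add one back.
// Any additional leading hashtags the user typed are left untouched.
const toTag = (text: string): string => `#${text.replace(/^#/, "")}`;

const examples = ["foo", "#foo", "##foo"].map(toTag);
// examples is ["#foo", "#foo", "##foo"]
```

A regex such as `/^#+/` would instead collapse `##foo` to `#foo`, which is the normalization the learning says to avoid.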
📚 Learning: 2025-06-23T11:49:45.457Z
Learnt from: maparent
Repo: DiscourseGraphs/discourse-graph PR: 220
File: apps/roam/src/utils/conceptConversion.ts:11-40
Timestamp: 2025-06-23T11:49:45.457Z
Learning: In the DiscourseGraphs/discourse-graph codebase, direct access to `window.roamAlphaAPI` is the established pattern throughout the codebase. The team prefers to maintain this pattern consistently rather than making piecemeal changes, and plans to address dependency injection as a global refactor when scheduled.

Applied to files:

  • apps/roam/src/components/VectorDuplicateMatches.tsx
📚 Learning: 2025-06-17T23:37:45.289Z
Learnt from: maparent
Repo: DiscourseGraphs/discourse-graph PR: 220
File: apps/roam/src/utils/conceptConversion.ts:42-56
Timestamp: 2025-06-17T23:37:45.289Z
Learning: In the DiscourseNode interface from apps/roam/src/utils/getDiscourseNodes.ts, the field `type` serves as the unique identifier field, not a type classification field. The interface has no `uid` or `id` field, making `node.type` the correct field to use for UID-related operations.

Applied to files:

  • apps/roam/src/utils/renderLinkedReferenceAdditions.ts
📚 Learning: 2025-06-17T23:37:45.289Z
Learnt from: maparent
Repo: DiscourseGraphs/discourse-graph PR: 220
File: apps/roam/src/utils/conceptConversion.ts:42-56
Timestamp: 2025-06-17T23:37:45.289Z
Learning: In the DiscourseNode interface from apps/roam/src/utils/getDiscourseNodes.ts, the field `node.type` serves as the UID field rather than having a conventional `node.uid` field. This is an unusual naming convention where the type field actually contains the unique identifier.

Applied to files:

  • apps/roam/src/utils/renderLinkedReferenceAdditions.ts
📚 Learning: 2025-11-25T00:52:41.934Z
Learnt from: CR
Repo: DiscourseGraphs/discourse-graph PR: 0
File: .cursor/rules/roam.mdc:0-0
Timestamp: 2025-11-25T00:52:41.934Z
Learning: Applies to apps/roam/**/*.{ts,tsx,js,jsx} : Use Roam Depot/Extension API docs from https://roamresearch.com/#/app/developer-documentation/page/y31lhjIqU when implementing extension functionality

Applied to files:

  • apps/roam/src/utils/hyde.ts
📚 Learning: 2025-11-25T00:52:41.934Z
Learnt from: CR
Repo: DiscourseGraphs/discourse-graph PR: 0
File: .cursor/rules/roam.mdc:0-0
Timestamp: 2025-11-25T00:52:41.934Z
Learning: Applies to apps/roam/**/*.{js,ts,tsx,jsx,json} : Prefer existing dependencies from package.json when working on the Roam Research extension

Applied to files:

  • apps/roam/src/utils/hyde.ts
📚 Learning: 2025-10-24T11:44:27.528Z
Learnt from: sid597
Repo: DiscourseGraphs/discourse-graph PR: 514
File: apps/roam/src/components/settings/SuggestiveModeSettings.tsx:105-123
Timestamp: 2025-10-24T11:44:27.528Z
Learning: In the discourse-graph codebase (especially in Roam app components using roamjs-components Toast), toast intents are passed as string literals (e.g., "primary", "danger", "success") rather than using the Blueprint Intent enum, as this is the established pattern used in >95% of the codebase.

Applied to files:

  • apps/roam/src/utils/hyde.ts
📚 Learning: 2025-05-30T14:49:24.016Z
Learnt from: maparent
Repo: DiscourseGraphs/discourse-graph PR: 182
File: apps/website/app/utils/supabase/dbUtils.ts:22-28
Timestamp: 2025-05-30T14:49:24.016Z
Learning: In apps/website/app/utils/supabase/dbUtils.ts, expanding the KNOWN_EMBEDDINGS and DEFAULT_DIMENSIONS mappings to support additional embedding models requires corresponding database model changes (creating new embedding tables), which should be scoped as separate work from API route implementations.

Applied to files:

  • apps/roam/src/utils/hyde.ts
📚 Learning: 2025-10-18T18:58:16.100Z
Learnt from: maparent
Repo: DiscourseGraphs/discourse-graph PR: 504
File: apps/roam/src/utils/syncDgNodesToSupabase.ts:523-531
Timestamp: 2025-10-18T18:58:16.100Z
Learning: In `apps/roam/src/utils/syncDgNodesToSupabase.ts`, partial successes from `upsertNodesToSupabaseAsContent` and `addMissingEmbeddings` (indicated by numeric return values showing the count of successful operations) should NOT trigger backoff. Only complete failures (false) should trigger the exponential backoff mechanism. This design allows the sync process to continue making progress even when some items fail.

Applied to files:

  • apps/roam/src/utils/hyde.ts
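
The backoff rule in this learning might look like the sketch below. All names (`SyncOutcome`, `shouldBackOff`, `nextDelayMs`) and the delay values are assumptions for illustration; the actual sync code may structure this differently.

```typescript
// number = count of successful operations (possibly partial); false = complete failure.
type SyncOutcome = number | boolean;

// Only a complete failure triggers backoff. A numeric count, even a low one,
// is treated as progress.
const shouldBackOff = (outcome: SyncOutcome): boolean => outcome === false;

// Hypothetical exponential backoff: double the delay on failure (up to a cap),
// reset to the base delay on any success.
const nextDelayMs = (
  outcome: SyncOutcome,
  currentDelayMs: number,
  baseDelayMs = 1_000,
  maxDelayMs = 60_000,
): number =>
  shouldBackOff(outcome)
    ? Math.min(currentDelayMs * 2, maxDelayMs)
    : baseDelayMs;
```

The strict `=== false` comparison matters here: a truthiness check would also back off on a count of `0`, which the learning classifies as a numeric (non-failure) result.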
📚 Learning: 2025-06-19T19:43:43.380Z
Learnt from: sid597
Repo: DiscourseGraphs/discourse-graph PR: 226
File: apps/roam/src/components/settings/HomePersonalSettings.tsx:123-149
Timestamp: 2025-06-19T19:43:43.380Z
Learning: The "Fetch Embeddings for nodes" button in HomePersonalSettings.tsx is for testing purposes only, so it doesn't require production-level error handling or user feedback improvements.

Applied to files:

  • apps/roam/src/utils/hyde.ts
📚 Learning: 2025-05-20T14:04:19.632Z
Learnt from: maparent
Repo: DiscourseGraphs/discourse-graph PR: 165
File: packages/database/supabase/schemas/embedding.sql:66-95
Timestamp: 2025-05-20T14:04:19.632Z
Learning: In the `match_embeddings_for_subset_nodes` SQL function in packages/database/supabase/schemas/embedding.sql, the number of results is implicitly limited by the length of the input array parameter `p_subset_roam_uids` since the function filters content using `WHERE c.source_local_id = ANY(p_subset_roam_uids)`.

Applied to files:

  • apps/roam/src/utils/hyde.ts
🧬 Code graph analysis (4)
apps/roam/src/utils/initializeObserversAndListeners.ts (4)
apps/roam/src/utils/getExportSettings.ts (1)
  • getUidAndBooleanSetting (56-62)
apps/roam/src/utils/renderNodeConfigPage.ts (1)
  • DISCOURSE_CONFIG_PAGE_TITLE (29-29)
apps/roam/src/components/VectorDuplicateMatches.tsx (1)
  • renderPossibleDuplicates (168-177)
apps/roam/src/utils/renderLinkedReferenceAdditions.ts (1)
  • renderDiscourseContextAndCanvasReferences (7-40)
apps/roam/src/components/VectorDuplicateMatches.tsx (2)
apps/roam/src/utils/hyde.ts (2)
  • findSimilarNodesVectorOnly (540-592)
  • VectorMatch (535-538)
apps/roam/src/utils/handleTitleAdditions.ts (1)
  • handleTitleAdditions (7-58)
apps/roam/src/utils/extractContentFromTitle.ts (2)
apps/obsidian/src/utils/extractContentFromTitle.ts (1)
  • extractContentFromTitle (3-10)
apps/obsidian/src/utils/getDiscourseNodeFormatExpression.ts (1)
  • getDiscourseNodeFormatExpression (1-9)
apps/roam/src/utils/hyde.ts (2)
apps/roam/src/utils/types.ts (1)
  • Result (42-46)
apps/roam/src/utils/supabaseContext.ts (1)
  • getLoggedInClient (97-114)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
🔇 Additional comments (6)
apps/roam/src/utils/extractContentFromTitle.ts (1)

3-26: LGTM! Well-structured content extraction utility.

The function correctly handles the {content} placeholder lookup and falls back appropriately. The implementation is more robust than the Obsidian counterpart (shown in relevant snippets) by supporting named placeholders beyond just the first capture group.

One minor note: the parameter order (title, node) differs from the Obsidian version (format, title). This is fine since they're separate implementations, but worth noting if cross-platform consistency becomes a goal.

apps/roam/src/utils/renderLinkedReferenceAdditions.ts (1)

7-40: Clean refactor with explicit uid parameter.

The refactor properly separates concerns - the caller now determines if this is a discourse node and provides the uid explicitly. The idempotency check via data-roamjs-discourse-context attribute prevents duplicate rendering.
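
For readers unfamiliar with the pattern, an attribute-based idempotency guard of the kind mentioned above can be sketched as follows. Only the `data-roamjs-discourse-context` attribute name comes from the review; `AttrHost` and `renderOnce` are hypothetical, and the PR's actual function may differ.

```typescript
// Minimal structural type so the sketch runs outside a browser;
// a real HTMLElement satisfies it.
type AttrHost = {
  hasAttribute: (name: string) => boolean;
  setAttribute: (name: string, value: string) => void;
};

const MARKER = "data-roamjs-discourse-context";

// Runs `render` at most once per container and reports whether it ran.
const renderOnce = (container: AttrHost, render: () => void): boolean => {
  if (container.hasAttribute(MARKER)) return false; // already rendered
  container.setAttribute(MARKER, "true");
  render();
  return true;
};
```

Marking the container before rendering means a repeated observer callback for the same element becomes a no-op.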

One question: Line 25 passes results: [] to DiscourseContext. Is this intentional? If DiscourseContext fetches its own results internally, this is fine. If not, this may result in an empty context display.

apps/roam/src/utils/hyde.ts (2)

572-578: Result mapping simplified as requested.

The result mapping now follows the cleaner pattern suggested in the past review.


581-590: Toast notification on error addresses UX feedback.

The toast notification provides user-visible feedback on search failures, addressing the previous review concern about silent errors.

apps/roam/src/components/VectorDuplicateMatches.tsx (2)

168-177: Integration with handleTitleAdditions aligns with codebase direction.

As noted in a previous review comment, this will be combined with other title additions. The current implementation correctly uses handleTitleAdditions utility which provides the proper DOM injection pattern shown in the relevant code snippets.


112-164: Well-structured UI with proper loading states and Tailwind styling.

The component properly handles:

  • Loading state with spinner
  • Empty state messaging
  • Result list with clickable items
  • Collapsible section with count badge

Uses BlueprintJS components (Collapse, Spinner, Icon) and Tailwind CSS as per coding guidelines for the Roam extension.

@sid597
Collaborator Author

sid597 commented Dec 18, 2025

@mdroidian I accidentally merged some changes intended for this branch into the observer PR ENG-1095 (#591). To fix that I did some Graphite stuff, which ruined #591, so I closed it. Should I create a separate PR for the observer, or is this one fine?

https://www.loom.com/share/f57d858ec5eb4663bbe0ad2286f9bd6f
