Skip to content

⚡ Bolt: Optimized document metadata fetching#84

Open
nkissick-del wants to merge 1 commit intomainfrom
bolt/optimize-doc-metadata-fetching-2784656229378864432
Open

⚡ Bolt: Optimized document metadata fetching#84
nkissick-del wants to merge 1 commit intomainfrom
bolt/optimize-doc-metadata-fetching-2784656229378864432

Conversation

@nkissick-del
Copy link
Copy Markdown
Owner

  • 💡 What: Optimized DocMetadataService to filter by document IDs at the database query level instead of fetching all metadata for a Knowledge Base and filtering in memory.
  • 🎯 Why: Fetching metadata for thousands of documents when only a few (e.g., 10) are needed for pagination causes significant performance overhead and memory usage.
  • 📊 Impact: Reduces the number of documents fetched from Elasticsearch/Infinity from O(N) (total docs in KB) to O(k) (page size), where k << N.
  • 🔬 Measurement: Verified that ESConnection.search and OSConnection.search receive the ids filter and DocumentService passes the IDs correctly.

PR created automatically by Jules for task 2784656229378864432 started by @nkissick-del

- Updated `DocumentService.get_list` to pass `doc_ids` to `DocMetadataService.get_metadata_for_documents`.
- Updated `DocMetadataService` methods to accept `doc_ids` and pass them to `_search_metadata` via `condition`.
- Updated `ESConnection.search` and `OSConnection.search` to handle `id` in `condition` using `ids` query.
- Prevents fetching all document metadata when only a page is needed.

Co-authored-by: nkissick-del <236767245+nkissick-del@users.noreply.github.com>
@google-labs-jules
Copy link
Copy Markdown

👋 Jules, reporting for duty! I'm here to lend a hand with this pull request.

When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down.

I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job!

For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with @jules. You can find this option in the Pull Request section of your global Jules UI settings. You can always switch back!

New to Jules? Learn more at jules.google/docs.


For security, I will only act on instructions from the user who triggered this task.

Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3 issues found across 4 files

Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="rag/utils/opensearch_conn.py">

<violation number="1" location="rag/utils/opensearch_conn.py:157">
P2: Missing falsy value check for 'id' condition. If `v` is an empty list or empty string, this will add an ids filter with empty/invalid values, potentially causing unexpected query results (returning zero documents). Add a guard `if not v: continue` to match the existing pattern used for other condition keys.</violation>
</file>

<file name="api/db/services/document_service.py">

<violation number="1" location="api/db/services/document_service.py:104">
P2: Guard the metadata lookup when there are no page results; an empty doc_ids list triggers a full KB metadata fetch in DocMetadataService and negates the performance optimization on empty pages.</violation>
</file>

<file name="rag/utils/es_conn.py">

<violation number="1" location="rag/utils/es_conn.py:129">
P2: Missing empty list check for `id` filter. When `v` is an empty list, `Q("ids", values=[])` will match no documents, which may cause unexpected zero results. The `delete` method in this file explicitly handles this case by skipping the filter when the list is empty. Consider adding a similar check here for consistency.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

Comment on lines +157 to +162
if k == "id":
if isinstance(v, list):
bqry.filter.append(Q("ids", values=v))
elif isinstance(v, str):
bqry.filter.append(Q("ids", values=[v]))
continue
Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai bot Feb 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: Missing falsy value check for 'id' condition. If v is an empty list or empty string, this will add an ids filter with empty/invalid values, potentially causing unexpected query results (returning zero documents). Add a guard if not v: continue to match the existing pattern used for other condition keys.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At rag/utils/opensearch_conn.py, line 157:

<comment>Missing falsy value check for 'id' condition. If `v` is an empty list or empty string, this will add an ids filter with empty/invalid values, potentially causing unexpected query results (returning zero documents). Add a guard `if not v: continue` to match the existing pattern used for other condition keys.</comment>

<file context>
@@ -154,6 +154,12 @@ def search(
         bqry = Q("bool", must=[])
         condition["kb_id"] = knowledgebaseIds
         for k, v in condition.items():
+            if k == "id":
+                if isinstance(v, list):
+                    bqry.filter.append(Q("ids", values=v))
</file context>
Suggested change
if k == "id":
if isinstance(v, list):
bqry.filter.append(Q("ids", values=v))
elif isinstance(v, str):
bqry.filter.append(Q("ids", values=[v]))
continue
if k == "id":
if not v:
continue
if isinstance(v, list):
bqry.filter.append(Q("ids", values=v))
elif isinstance(v, str):
bqry.filter.append(Q("ids", values=[v]))
continue
Fix with Cubic

docs_list = list(docs.dicts())
metadata_map = DocMetadataService.get_metadata_for_documents(None, kb_id)
doc_ids = [d["id"] for d in docs_list]
metadata_map = DocMetadataService.get_metadata_for_documents(doc_ids, kb_id)
Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai bot Feb 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: Guard the metadata lookup when there are no page results; an empty doc_ids list triggers a full KB metadata fetch in DocMetadataService and negates the performance optimization on empty pages.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At api/db/services/document_service.py, line 104:

<comment>Guard the metadata lookup when there are no page results; an empty doc_ids list triggers a full KB metadata fetch in DocMetadataService and negates the performance optimization on empty pages.</comment>

<file context>
@@ -100,7 +100,8 @@ def get_list(cls, kb_id, page_number, items_per_page, orderby, desc, keywords, i
         docs_list = list(docs.dicts())
-        metadata_map = DocMetadataService.get_metadata_for_documents(None, kb_id)
+        doc_ids = [d["id"] for d in docs_list]
+        metadata_map = DocMetadataService.get_metadata_for_documents(doc_ids, kb_id)
         for doc in docs_list:
             doc["meta_fields"] = metadata_map.get(doc["id"], {})
</file context>
Suggested change
metadata_map = DocMetadataService.get_metadata_for_documents(doc_ids, kb_id)
metadata_map = DocMetadataService.get_metadata_for_documents(doc_ids, kb_id) if doc_ids else {}
Fix with Cubic

Comment on lines +129 to +133
if isinstance(v, list):
bool_query.filter.append(Q("ids", values=v))
elif isinstance(v, str):
bool_query.filter.append(Q("ids", values=[v]))
continue
Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai bot Feb 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: Missing empty list check for id filter. When v is an empty list, Q("ids", values=[]) will match no documents, which may cause unexpected zero results. The delete method in this file explicitly handles this case by skipping the filter when the list is empty. Consider adding a similar check here for consistency.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At rag/utils/es_conn.py, line 129:

<comment>Missing empty list check for `id` filter. When `v` is an empty list, `Q("ids", values=[])` will match no documents, which may cause unexpected zero results. The `delete` method in this file explicitly handles this case by skipping the filter when the list is empty. Consider adding a similar check here for consistency.</comment>

<file context>
@@ -125,6 +125,12 @@ def search(
         condition["kb_id"] = knowledgebase_ids
         for k, v in condition.items():
+            if k == "id":
+                if isinstance(v, list):
+                    bool_query.filter.append(Q("ids", values=v))
+                elif isinstance(v, str):
</file context>
Suggested change
if isinstance(v, list):
bool_query.filter.append(Q("ids", values=v))
elif isinstance(v, str):
bool_query.filter.append(Q("ids", values=[v]))
continue
if isinstance(v, list) and v:
bool_query.filter.append(Q("ids", values=v))
elif isinstance(v, str) and v:
bool_query.filter.append(Q("ids", values=[v]))
continue
Fix with Cubic

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant