⚡ Bolt: Optimized document metadata fetching#84
Conversation
- Updated `DocumentService.get_list` to pass `doc_ids` to `DocMetadataService.get_metadata_for_documents`. - Updated `DocMetadataService` methods to accept `doc_ids` and pass them to `_search_metadata` via `condition`. - Updated `ESConnection.search` and `OSConnection.search` to handle `id` in `condition` using `ids` query. - Prevents fetching all document metadata when only a page is needed. Co-authored-by: nkissick-del <236767245+nkissick-del@users.noreply.github.com>
|
👋 Jules, reporting for duty! I'm here to lend a hand with this pull request. When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down. I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job! For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with New to Jules? Learn more at jules.google/docs. For security, I will only act on instructions from the user who triggered this task. |
There was a problem hiding this comment.
3 issues found across 4 files
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them.
<file name="rag/utils/opensearch_conn.py">
<violation number="1" location="rag/utils/opensearch_conn.py:157">
P2: Missing falsy value check for 'id' condition. If `v` is an empty list or empty string, this will add an ids filter with empty/invalid values, potentially causing unexpected query results (returning zero documents). Add a guard `if not v: continue` to match the existing pattern used for other condition keys.</violation>
</file>
<file name="api/db/services/document_service.py">
<violation number="1" location="api/db/services/document_service.py:104">
P2: Guard the metadata lookup when there are no page results; an empty doc_ids list triggers a full KB metadata fetch in DocMetadataService and negates the performance optimization on empty pages.</violation>
</file>
<file name="rag/utils/es_conn.py">
<violation number="1" location="rag/utils/es_conn.py:129">
P2: Missing empty list check for `id` filter. When `v` is an empty list, `Q("ids", values=[])` will match no documents, which may cause unexpected zero results. The `delete` method in this file explicitly handles this case by skipping the filter when the list is empty. Consider adding a similar check here for consistency.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
| if k == "id": | ||
| if isinstance(v, list): | ||
| bqry.filter.append(Q("ids", values=v)) | ||
| elif isinstance(v, str): | ||
| bqry.filter.append(Q("ids", values=[v])) | ||
| continue |
There was a problem hiding this comment.
P2: Missing falsy value check for 'id' condition. If v is an empty list or empty string, this will add an ids filter with empty/invalid values, potentially causing unexpected query results (returning zero documents). Add a guard if not v: continue to match the existing pattern used for other condition keys.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At rag/utils/opensearch_conn.py, line 157:
<comment>Missing falsy value check for 'id' condition. If `v` is an empty list or empty string, this will add an ids filter with empty/invalid values, potentially causing unexpected query results (returning zero documents). Add a guard `if not v: continue` to match the existing pattern used for other condition keys.</comment>
<file context>
@@ -154,6 +154,12 @@ def search(
bqry = Q("bool", must=[])
condition["kb_id"] = knowledgebaseIds
for k, v in condition.items():
+ if k == "id":
+ if isinstance(v, list):
+ bqry.filter.append(Q("ids", values=v))
</file context>
| if k == "id": | |
| if isinstance(v, list): | |
| bqry.filter.append(Q("ids", values=v)) | |
| elif isinstance(v, str): | |
| bqry.filter.append(Q("ids", values=[v])) | |
| continue | |
| if k == "id": | |
| if not v: | |
| continue | |
| if isinstance(v, list): | |
| bqry.filter.append(Q("ids", values=v)) | |
| elif isinstance(v, str): | |
| bqry.filter.append(Q("ids", values=[v])) | |
| continue |
| docs_list = list(docs.dicts()) | ||
| metadata_map = DocMetadataService.get_metadata_for_documents(None, kb_id) | ||
| doc_ids = [d["id"] for d in docs_list] | ||
| metadata_map = DocMetadataService.get_metadata_for_documents(doc_ids, kb_id) |
There was a problem hiding this comment.
P2: Guard the metadata lookup when there are no page results; an empty doc_ids list triggers a full KB metadata fetch in DocMetadataService and negates the performance optimization on empty pages.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At api/db/services/document_service.py, line 104:
<comment>Guard the metadata lookup when there are no page results; an empty doc_ids list triggers a full KB metadata fetch in DocMetadataService and negates the performance optimization on empty pages.</comment>
<file context>
@@ -100,7 +100,8 @@ def get_list(cls, kb_id, page_number, items_per_page, orderby, desc, keywords, i
docs_list = list(docs.dicts())
- metadata_map = DocMetadataService.get_metadata_for_documents(None, kb_id)
+ doc_ids = [d["id"] for d in docs_list]
+ metadata_map = DocMetadataService.get_metadata_for_documents(doc_ids, kb_id)
for doc in docs_list:
doc["meta_fields"] = metadata_map.get(doc["id"], {})
</file context>
| metadata_map = DocMetadataService.get_metadata_for_documents(doc_ids, kb_id) | |
| metadata_map = DocMetadataService.get_metadata_for_documents(doc_ids, kb_id) if doc_ids else {} |
| if isinstance(v, list): | ||
| bool_query.filter.append(Q("ids", values=v)) | ||
| elif isinstance(v, str): | ||
| bool_query.filter.append(Q("ids", values=[v])) | ||
| continue |
There was a problem hiding this comment.
P2: Missing empty list check for id filter. When v is an empty list, Q("ids", values=[]) will match no documents, which may cause unexpected zero results. The delete method in this file explicitly handles this case by skipping the filter when the list is empty. Consider adding a similar check here for consistency.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At rag/utils/es_conn.py, line 129:
<comment>Missing empty list check for `id` filter. When `v` is an empty list, `Q("ids", values=[])` will match no documents, which may cause unexpected zero results. The `delete` method in this file explicitly handles this case by skipping the filter when the list is empty. Consider adding a similar check here for consistency.</comment>
<file context>
@@ -125,6 +125,12 @@ def search(
condition["kb_id"] = knowledgebase_ids
for k, v in condition.items():
+ if k == "id":
+ if isinstance(v, list):
+ bool_query.filter.append(Q("ids", values=v))
+ elif isinstance(v, str):
</file context>
| if isinstance(v, list): | |
| bool_query.filter.append(Q("ids", values=v)) | |
| elif isinstance(v, str): | |
| bool_query.filter.append(Q("ids", values=[v])) | |
| continue | |
| if isinstance(v, list) and v: | |
| bool_query.filter.append(Q("ids", values=v)) | |
| elif isinstance(v, str) and v: | |
| bool_query.filter.append(Q("ids", values=[v])) | |
| continue |
DocMetadataServiceto filter by document IDs at the database query level instead of fetching all metadata for a Knowledge Base and filtering in memory.ESConnection.searchandOSConnection.searchreceive theidsfilter andDocumentServicepasses the IDs correctly.PR created automatically by Jules for task 2784656229378864432 started by @nkissick-del