Update embedding model defaults for Gemini API compatibility #46

Open
Asesino-K wants to merge 1 commit into dbpedia:main from Asesino-K:update-gemini-embedding-defaults

Asesino-K commented Mar 16, 2026

This PR updates the default embedding model names used in the current GSoC25 NEF workflow to match the models that the Gemini embedding API currently supports.

While testing the predicate artifact generation and retrieval workflow locally, I hit model resolution errors with the older embedding model defaults. This patch updates the endpoint and the default model settings to a Gemini embedding model that currently works, so that the local setup path matches the API's current behavior.

This is intended as a small compatibility/setup fix for the current pipeline.

Summary by CodeRabbit

Release Notes

  • Updates
    • Updated the embedding model to the latest version for improved consistency and compatibility.
    • Enhanced command-line interface support for selecting different embedding models during pipeline configuration.

coderabbitai bot commented Mar 16, 2026

📝 Walkthrough

Updates default embedding model identifiers from "embedding-001" to "gemini-embedding-001" across embedding and retrieval modules, including REST batch endpoints, request payloads, and PredicateEmbeddingRetriever initialization signature, with explicit CLI assignment of the model parameter.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Embedding model identifier updates (GSoC25/NEF/Emeddings.py, GSoC25/NEF/NEF.py) | Updated default embedding model from "embedding-001" to "gemini-embedding-001" in Gemini REST batch endpoint, per-request model field, and PredicateEmbeddingRetriever's public initialization signature; CLI now explicitly assigns embed_model to the retriever after pipeline construction. |
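As a rough illustration of where the model identifier appears in a batch embedding call, here is a minimal sketch. It assumes the module targets the Gemini API's public `batchEmbedContents` REST method; the helper name `build_batch_embed_request` and the sample texts are hypothetical and not taken from the repository. It only builds the URL and payload and makes no network call.

```python
import json

# Public Gemini API base URL (v1beta); assumed here, not confirmed by the PR.
GEMINI_API_BASE = "https://generativelanguage.googleapis.com/v1beta"


def build_batch_embed_request(texts, model="gemini-embedding-001"):
    """Return (url, payload) for a batchEmbedContents call.

    Hypothetical helper: the model name appears twice, once in the
    endpoint path and once in each per-request "model" field.
    """
    url = f"{GEMINI_API_BASE}/models/{model}:batchEmbedContents"
    payload = {
        "requests": [
            {"model": f"models/{model}", "content": {"parts": [{"text": t}]}}
            for t in texts
        ]
    }
    return url, payload


url, payload = build_batch_embed_request(["birthPlace", "author"])
print(url)
print(json.dumps(payload["requests"][0], indent=2))
```

Because the identifier shows up in both the path and the request body, changing the default in only one of the two places would leave the request internally inconsistent, which is presumably why the PR touches both.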

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately and concisely describes the main change: updating embedding model defaults to use Gemini-compatible model names (embedding-001 → gemini-embedding-001) across two files. |

coderabbitai bot left a comment

🧹 Nitpick comments (1)
GSoC25/NEF/NEF.py (1)

449-450: Prefer constructor wiring over post-init mutation for embed_model.

Right now the retriever is initialized with a hardcoded model and then overridden later. Passing embed_model through EnhancedNEFPipeline.__init__ avoids split configuration and future ordering bugs.

♻️ Proposed refactor
 class EnhancedNEFPipeline:
     def __init__(
         self,
         client: "genai.Client",
         embeddings_path: Optional[str] = None,
         predicates_path: Optional[str] = None,
+        embed_model: str = "gemini-embedding-001",
         llm_model: str = "gemini-2.5-flash",
         predicate_threshold: float = 0.5,
         new_predicate_namespace: str = "http://nef.local/rel/",
         redis_host: Optional[str] = None,
         redis_port: Optional[int] = None,
@@
         self.pred = PredicateEmbeddingRetriever(
             client=self.client,
             embeddings_path=embeddings_path,
             predicates_path=predicates_path,
-            embed_model="gemini-embedding-001",
+            embed_model=embed_model,
             verbose=verbose,
         )
@@
         pipe = EnhancedNEFPipeline(
             client=client,
             embeddings_path=args.embeddings,
             predicates_path=args.predicates,
+            embed_model=args.embed_model,
             llm_model=args.llm_model,
             predicate_threshold=args.predicate_threshold,
             new_predicate_namespace=args.new_predicate_namespace,
             redis_host=args.redis_host,
             redis_port=args.redis_port,
             redis_password=args.redis_password,
             verbose=args.verbose,
         )
-        # ensure retriever uses requested embed model
-        pipe.pred.embed_model = args.embed_model

Also applies to: 777-777
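The ordering risk behind this nitpick can be sketched with stand-in classes. Everything below is hypothetical: `Retriever` is a simplified substitute for PredicateEmbeddingRetriever, and the derived `endpoint` attribute is an assumption added only to show how state computed in the constructor can go stale. The real retriever may not derive anything from `embed_model` at construction time, in which case post-init mutation happens to work today.

```python
class Retriever:
    """Hypothetical stand-in for PredicateEmbeddingRetriever."""

    def __init__(self, embed_model: str = "gemini-embedding-001"):
        self.embed_model = embed_model
        # Assumed for illustration: state derived from the model name
        # is computed once, at construction time.
        self.endpoint = f"models/{embed_model}:batchEmbedContents"


class MutatedPipeline:
    """Hardcodes the model, expecting callers to overwrite it later."""

    def __init__(self):
        self.pred = Retriever(embed_model="gemini-embedding-001")


class WiredPipeline:
    """Accepts the model in the constructor and passes it through."""

    def __init__(self, embed_model: str = "gemini-embedding-001"):
        self.pred = Retriever(embed_model=embed_model)


mutated = MutatedPipeline()
mutated.pred.embed_model = "custom-model"  # post-init override
wired = WiredPipeline(embed_model="custom-model")

print(mutated.pred.endpoint)  # models/gemini-embedding-001:batchEmbedContents (stale)
print(wired.pred.endpoint)    # models/custom-model:batchEmbedContents
```

In the mutated variant the attribute and the derived endpoint disagree after the override, which is the kind of split configuration the constructor-wiring refactor is meant to rule out.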

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@GSoC25/NEF/NEF.py` around lines 449 - 450, The code currently constructs the
retriever with a hardcoded embed_model and mutates it afterward; update
EnhancedNEFPipeline to accept embed_model as a constructor parameter and wire it
into retriever creation (instead of setting embed_model post-init).
Specifically, add an embed_model parameter to EnhancedNEFPipeline.__init__,
propagate that value into the retriever initialization (the place where
embed_model="gemini-embedding-001" is passed), and remove any subsequent
post-construction assignment to retriever.embed_model to ensure configuration is
done only in the constructor.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 95fbd87b-18c3-414c-859e-a4a79437f696

📥 Commits

Reviewing files that changed from the base of the PR and between 3d07047 and 641f25e.

📒 Files selected for processing (2)
  • GSoC25/NEF/Emeddings.py
  • GSoC25/NEF/NEF.py
