Update embedding model defaults for Gemini API compatibility by Asesino-K · Pull Request #46 · dbpedia/neural-extraction-framework

Asesino-K · 2026-03-16T16:50:58Z

This PR updates the default embedding model names used in the current GSoC25 NEF workflow to match the currently supported Gemini embedding API.

While testing the predicate artifact generation and retrieval workflow locally, I encountered model resolution errors with the older embedding model defaults. This patch updates the endpoint and default model settings to the currently working Gemini embedding model so that the local setup path is consistent with the active API behavior.

This is intended as a small compatibility/setup fix for the current pipeline.

Summary by CodeRabbit

Release Notes

Updates
- Updated the default embedding model from "embedding-001" to "gemini-embedding-001" for compatibility with the current Gemini embedding API.
- Enhanced command-line interface support for selecting different embedding models during pipeline configuration.
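
A minimal sketch of how the embedding model could be exposed as a CLI option, assuming an `argparse`-based entry point. The flag name `--embed-model` and the helper `build_parser` are hypothetical; the review diff in this PR only shows that the parsed value is read as `args.embed_model`.

```python
import argparse

# Default model name introduced by this PR.
DEFAULT_EMBED_MODEL = "gemini-embedding-001"


def build_parser() -> argparse.ArgumentParser:
    """Build an illustrative parser; only the embed-model option is sketched."""
    parser = argparse.ArgumentParser(description="NEF pipeline (illustrative sketch)")
    # Assumption: the flag spelling is not shown in the PR, only args.embed_model.
    parser.add_argument(
        "--embed-model",
        dest="embed_model",
        default=DEFAULT_EMBED_MODEL,
        help="Embedding model name passed to the predicate retriever",
    )
    return parser


# Parse an explicit argument list so the example does not depend on sys.argv.
args = build_parser().parse_args(["--embed-model", "gemini-embedding-001"])
print(args.embed_model)
```

Keeping the default in one constant means the CLI and any library code can share it rather than repeating the string.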

coderabbitai · 2026-03-16T16:51:17Z

Walkthrough

Updates default embedding model identifiers from "embedding-001" to "gemini-embedding-001" across embedding and retrieval modules, including REST batch endpoints, request payloads, and PredicateEmbeddingRetriever initialization signature, with explicit CLI assignment of the model parameter.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Embedding model identifier updates: `GSoC25/NEF/Emeddings.py`, `GSoC25/NEF/NEF.py` | Updated default embedding model from "embedding-001" to "gemini-embedding-001" in Gemini REST batch endpoint, per-request model field, and PredicateEmbeddingRetriever's public initialization signature; CLI now explicitly assigns embed_model to the retriever after pipeline construction. |
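
To make the two touch points in the summary concrete (the batch endpoint and the per-request model field), here is a hedged sketch that only builds the URL and JSON payload and sends nothing. The helper names are hypothetical, and the URL and payload shape reflect the public Gemini REST `batchEmbedContents` method as I understand it, not the actual code in `Emeddings.py`, which is not shown in this PR page.

```python
from typing import Any, Dict, List

# Assumption: public Gemini REST base URL; the repository may configure this differently.
GEMINI_API_BASE = "https://generativelanguage.googleapis.com/v1beta"
DEFAULT_EMBED_MODEL = "gemini-embedding-001"


def batch_embed_url(model: str = DEFAULT_EMBED_MODEL) -> str:
    """Return the batch embedding endpoint for the given model name."""
    return f"{GEMINI_API_BASE}/models/{model}:batchEmbedContents"


def batch_embed_payload(texts: List[str], model: str = DEFAULT_EMBED_MODEL) -> Dict[str, Any]:
    """Build one request entry per text, each carrying its own model field."""
    return {
        "requests": [
            {"model": f"models/{model}", "content": {"parts": [{"text": text}]}}
            for text in texts
        ]
    }


url = batch_embed_url()
payload = batch_embed_payload(["birth place", "author of"])
print(url)
print(payload["requests"][0]["model"])
```

Because the model name appears both in the URL path and in every request entry, deriving both from a single argument avoids the mismatch that an outdated default can cause.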

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 25.00%, which is below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately and concisely describes the main change: updating embedding model defaults to use Gemini-compatible model names (embedding-001 → gemini-embedding-001) across two files. |


coderabbitai

🧹 Nitpick comments (1)

GSoC25/NEF/NEF.py (1)

449-450: Prefer constructor wiring over post-init mutation for embed_model.

Right now the retriever is initialized with a hardcoded model and then overridden later. Passing embed_model through EnhancedNEFPipeline.__init__ avoids split configuration and future ordering bugs.

♻️ Proposed refactor

 class EnhancedNEFPipeline:
     def __init__(
         self,
         client: "genai.Client",
         embeddings_path: Optional[str] = None,
         predicates_path: Optional[str] = None,
+        embed_model: str = "gemini-embedding-001",
         llm_model: str = "gemini-2.5-flash",
         predicate_threshold: float = 0.5,
         new_predicate_namespace: str = "http://nef.local/rel/",
         redis_host: Optional[str] = None,
         redis_port: Optional[int] = None,
@@
         self.pred = PredicateEmbeddingRetriever(
             client=self.client,
             embeddings_path=embeddings_path,
             predicates_path=predicates_path,
-            embed_model="gemini-embedding-001",
+            embed_model=embed_model,
             verbose=verbose,
         )
@@
         pipe = EnhancedNEFPipeline(
             client=client,
             embeddings_path=args.embeddings,
             predicates_path=args.predicates,
+            embed_model=args.embed_model,
             llm_model=args.llm_model,
             predicate_threshold=args.predicate_threshold,
             new_predicate_namespace=args.new_predicate_namespace,
             redis_host=args.redis_host,
             redis_port=args.redis_port,
             redis_password=args.redis_password,
             verbose=args.verbose,
         )
-        # ensure retriever uses requested embed model
-        pipe.pred.embed_model = args.embed_model

Also applies to: 777-777
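
The "future ordering bugs" concern can be illustrated with a toy example. The classes `ToyRetriever` and `ToyPipeline` below are hypothetical stand-ins, not the project's `PredicateEmbeddingRetriever` or `EnhancedNEFPipeline`; the sketch assumes a retriever that derives some state from `embed_model` at construction time, which the real class may or may not do.

```python
class ToyRetriever:
    """Hypothetical retriever that caches a value derived from embed_model."""

    def __init__(self, embed_model: str = "gemini-embedding-001") -> None:
        self.embed_model = embed_model
        # Derived once at construction; later attribute changes do not update it.
        self.endpoint = f"models/{embed_model}:batchEmbedContents"


class ToyPipeline:
    """Hypothetical pipeline that wires embed_model through its constructor."""

    def __init__(self, embed_model: str = "gemini-embedding-001") -> None:
        self.pred = ToyRetriever(embed_model=embed_model)


# Post-init mutation: the attribute changes, but the derived endpoint is stale.
mutated = ToyPipeline()
mutated.pred.embed_model = "other-model"

# Constructor wiring: every derived value sees the requested model.
wired = ToyPipeline(embed_model="other-model")

print(mutated.pred.endpoint)
print(wired.pred.endpoint)
```

With constructor wiring there is a single place where the model is chosen, so any state computed during initialization stays consistent with it.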

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@GSoC25/NEF/NEF.py` around lines 449 - 450, The code currently constructs the
retriever with a hardcoded embed_model and mutates it afterward; update
EnhancedNEFPipeline to accept embed_model as a constructor parameter and wire it
into retriever creation (instead of setting embed_model post-init).
Specifically, add an embed_model parameter to EnhancedNEFPipeline.__init__,
propagate that value into the retriever initialization (the place where
embed_model="gemini-embedding-001" is passed), and remove any subsequent
post-construction assignment to retriever.embed_model to ensure configuration is
done only in the constructor.


ℹ️ Review info


📥 Commits

Reviewing files that changed from the base of the PR and between 3d07047 and 641f25e.

📒 Files selected for processing (2)

GSoC25/NEF/Emeddings.py
GSoC25/NEF/NEF.py

Update embedding model defaults for Gemini API compatibility

641f25e

coderabbitai bot reviewed Mar 16, 2026

Asesino-K wants to merge 1 commit into dbpedia:main from Asesino-K:update-gemini-embedding-defaults
