Replaced brittle prompt with pydantic schema validation #30
Nossks wants to merge 1 commit into dbpedia:main
📝 Walkthrough
Replaces ad-hoc JSON parsing with a Pydantic schema.
Sequence Diagram(s)
sequenceDiagram
participant User
participant Disambiguator as LLMDisambiguator
participant LLM as LanguageModel
participant Validator as DisambiguationResult
participant Repo as CandidatePools
User->>Disambiguator: request disambiguation (prompt + candidate pools)
Disambiguator->>LLM: generate response (with generation_config & schema)
LLM-->>Disambiguator: JSON response
Disambiguator->>Validator: DisambiguationResult.model_validate_json(response)
alt validation succeeds
Validator-->>Disambiguator: typed fields (subject_index, predicate_uri, object_index)
Disambiguator->>Repo: map indices → URIs (clamp & collision checks)
Disambiguator->>Disambiguator: detect role-swaps/duplicates, adjust
Disambiguator-->>User: result (meta: "candidate")
else validation fails
Validator-->>Disambiguator: validation error
Disambiguator->>Disambiguator: set predicate=None, subject_index=0, object_index=0
Disambiguator->>Repo: fallback mapping (safe defaults)
Disambiguator-->>User: result (meta: "hallucination_rejected")
end
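The "map indices → URIs (clamp & collision checks)" step in the diagram could look roughly like the sketch below. This is an illustration only: `clamp_index` and `map_indices` are hypothetical names, and the collision rule (pick the first object candidate that differs from the subject) is an assumption, not necessarily what NEF.py does.

```python
from typing import Sequence, Tuple


def clamp_index(idx: int, pool_size: int) -> int:
    """Clamp idx into the valid range [0, pool_size - 1]."""
    if pool_size <= 0:
        raise ValueError("candidate pool must not be empty")
    return max(0, min(idx, pool_size - 1))


def map_indices(
    subject_pool: Sequence[str],
    object_pool: Sequence[str],
    s_idx: int,
    o_idx: int,
) -> Tuple[str, str]:
    """Map model-chosen indices to URIs, guarding against bad indices."""
    s_uri = subject_pool[clamp_index(s_idx, len(subject_pool))]
    o_uri = object_pool[clamp_index(o_idx, len(object_pool))]
    if s_uri == o_uri:
        # Assumed collision rule: subject and object must differ, so take
        # the first object candidate that is not the subject URI.
        for candidate in object_pool:
            if candidate != s_uri:
                o_uri = candidate
                break
    return s_uri, o_uri
```

Out-of-range indices are pulled back into the pool instead of raising, which matches the diagram's intent of always returning a result to the caller.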
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ Passed checks (3 passed)
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@GSoC25/NEF/NEF.py`:
- Around line 387-390: The meta dictionary sets label incorrectly as
"hallucinaiton_rejected"; update the string to the correct spelling
"hallucination_rejected" where meta is created (look for the meta = {...}
assignment that uses chosen_sim, sim_map, pred_uri and sets "label") so
downstream filters get the correct label.
- Around line 357-360: Wrap the call to
DisambiguationResult.model_validate_json(resp.text) and the subsequent accesses
to data.predicate_uri / data.subject_index / data.object_index in a try/except
that catches ValidationError and ValueError; on exception log the error (using
the existing logger in this module or self.logger), and return or yield a safe
fallback (e.g., None or a sentinel) so the pipeline can continue instead of
crashing. Ensure the try block contains the lines that assign to data, pred_uri,
s_idx, o_idx, and the except block sets pred_uri, s_idx, o_idx to safe defaults
and emits a clear log message including resp.text or the exception for
debugging.
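A minimal sketch of the guarded validation described above, assuming Pydantic v2. The field names come from the review comment; the field types, the `parse_response` helper, and the returned success flag are assumptions made for illustration.

```python
import logging
from typing import Optional, Tuple

from pydantic import BaseModel, ValidationError

logger = logging.getLogger(__name__)


class DisambiguationResult(BaseModel):
    # Field names are taken from the review; the types are assumed.
    subject_index: int
    predicate_uri: Optional[str] = None
    object_index: int


def parse_response(text: str) -> Tuple[Optional[str], int, int, bool]:
    """Return (pred_uri, s_idx, o_idx, ok), falling back to safe defaults."""
    try:
        data = DisambiguationResult.model_validate_json(text)
    except (ValidationError, ValueError) as exc:
        # Log the raw text so a rejected response can be debugged later.
        logger.warning("invalid model response %r: %s", text, exc)
        return None, 0, 0, False
    return data.predicate_uri, data.subject_index, data.object_index, True
```

In Pydantic v2, `ValidationError` is itself a subclass of `ValueError`, so listing both is redundant but harmless and mirrors the wording of the comment.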
🧹 Nitpick comments (1)
GSoC25/NEF/NEF.py (1)
12-12: If GSoC25/NEF becomes a package, the import will need to be relative.
The absolute import `from output_schema import DisambiguationResult` currently works because `output_schema.py` is in the same directory. However, if GSoC25/NEF is converted to a package (by adding `__init__.py` to match the pattern of other GSoC25 submodules), this import will fail when the module is imported as `GSoC25.NEF.NEF`. Convert to a relative import and add `__init__.py`:
Suggested change (if package usage is intended)
-from output_schema import DisambiguationResult
+from .output_schema import DisambiguationResult
Also add an empty `GSoC25/NEF/__init__.py` file.
9765fdf to 28ff871
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@GSoC25/NEF/NEF.py`:
- Line 12: NEF.py currently uses a relative import "from .output_schema import
DisambiguationResult" which fails when run as a script; change it to a safe
import that works in both package and script contexts by replacing the relative
import with a fallback/absolute form—for example try importing
DisambiguationResult from output_schema first (or use an absolute import path)
and fall back to the package-style import in an except ImportError block; update
the import that references DisambiguationResult/output_schema accordingly so CLI
execution (python NEF/NEF.py ...) no longer raises "attempted relative import
with no known parent package".
28ff871 to c9c5be9
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@GSoC25/NEF/NEF.py`:
- Around line 348-357: The generation_config is currently a plain dict but must
be an instance of types.GenerateContentConfig; replace the dict assigned to
generation_config with
types.GenerateContentConfig(response_mime_type="application/json",
response_schema=DisambiguationResult) and pass that object into the call to
self.client.models.generate_content (the model=self.model_name, contents=prompt
call remains the same).
c9c5be9 to f1908da
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@GSoC25/NEF/NEF.py`:
- Around line 348-351: Replace the incorrect parameter name response_scheme with
response_schema when constructing the types.GenerateContentConfig so the
structured Pydantic output (DisambiguationResult) is honored; update the
generation_config assignment in NEF.py (the types.GenerateContentConfig call
that sets response_mime_type and the Pydantic class) to use
response_schema=DisambiguationResult.
f1908da to 606d7e9
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@GSoC25/NEF/NEF.py`:
- Around line 371-375: Replace the broad except block that catches Exception
around the validation logic with a specific except for ValidationError and
ensure you use the existing _safe_print/verbose pattern for logging instead of
print; for example, in the try/except that currently sets pred_uri=None,
s_idx=0, o_idx=0, change the handler to except ValidationError as e and call
_safe_print(f"validation error: {e}", verbose) (or the module's standard logging
helper), then preserve setting pred_uri, s_idx, o_idx as before. Also add an
import for ValidationError alongside DisambiguationResult in the module imports
so the exception type is available.
606d7e9 to ea527ae
Hi there, just a quick update: All CI/CD checks are now passing (I resolved the import issues for both CLI and Package modes). The new DisambiguationResult schema is successfully catching and rejecting hallucinations locally. I know this touches the core generation logic, so I am happy to make adjustments if you prefer a different fallback strategy. Ready for review whenever you have time.
This PR replaces the legacy index-based prompt selection logic in disambiguate_triple with Gemini's native Structured Output (response_schema). It also introduces a dedicated output_schema.py for type definitions and fixes a critical data integrity issue where invalid predicates defaulted to allowed[0].
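To illustrate the data integrity fix mentioned above, the sketch below rejects a predicate that is not in the allowed list instead of silently substituting `allowed[0]`. The helper names `resolve_predicate` and `label_for` are hypothetical; the two labels are the ones shown in the sequence diagram, and the actual logic in NEF.py may differ.

```python
from typing import Optional, Sequence


def resolve_predicate(pred_uri: Optional[str], allowed: Sequence[str]) -> Optional[str]:
    """Return pred_uri only if it is an allowed predicate, otherwise None."""
    if pred_uri is not None and pred_uri in allowed:
        return pred_uri
    # Previously an invalid predicate would have become allowed[0], which
    # turns a hallucinated answer into a plausible-looking but wrong triple.
    return None


def label_for(pred_uri: Optional[str]) -> str:
    """Pick the meta label based on whether a predicate survived validation."""
    return "candidate" if pred_uri is not None else "hallucination_rejected"
```

Returning `None` keeps the rejection visible to downstream filters, which can then drop or flag the triple based on the label.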