
Refactor: Centralize LLM connection logic and improve error handling#25

Open
mallasiddharthreddy wants to merge 5 commits into dbpedia:main from mallasiddharthreddy:refactor/unify-extraction-logic

Conversation


@mallasiddharthreddy mallasiddharthreddy commented Jan 26, 2026

Summary
Addresses the code duplication issue between IndIE and llm_IE modules by implementing a unified architecture for LLM interactions.

Key Changes

  • New Core Service (src/llm_core.py): Created a centralized LLMService that handles Ollama connections.
    • Implemented robust availability checks that auto-pull models from the registry if missing.
    • Added defensive parsing to handle variable response formats from the Ollama API (list vs dict).
    • Integrated standard retry logic with exponential backoff.
  • Refactor (IndIE/llm_extractor.py): Removed duplicate connection code; migrated to use LLMService.
  • Adapter Implementation (llm_IE/llm_interface.py): Rewrote the interface to act as an adapter, routing legacy requests through the new core service while maintaining API compatibility.
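The unified architecture described above can be sketched roughly as follows. This is a hedged illustration, not the PR's actual code: field names on ModelConfig are guesses beyond the ones mentioned in this thread (top_p, num_predict), and the Ollama client is injected here for testability, whereas the real service presumably constructs its own.

```python
import logging
from dataclasses import dataclass


@dataclass
class ModelConfig:
    """Illustrative config; the PR's actual ModelConfig fields may differ."""
    model_name: str = "llama3"
    temperature: float = 0.7
    top_p: float = 0.8
    num_predict: int = 1500


class LLMService:
    """Sketch of the centralized service: availability check + generation."""

    def __init__(self, config: ModelConfig, client):
        self.config = config
        self.client = client  # e.g. ollama.Client(); injected here for testability
        self.logger = logging.getLogger(__name__)

    def _ensure_model_available(self):
        raw = self.client.list()
        # Defensive parsing: the Ollama API may return {"models": [...]} or a bare list.
        models = raw.get("models", []) if isinstance(raw, dict) else raw
        names = {m.get("name", "") for m in models if isinstance(m, dict)}
        if self.config.model_name not in names:
            self.logger.info("Model %s missing; pulling from registry", self.config.model_name)
            self.client.pull(self.config.model_name)

    def generate_response(self, messages):
        self._ensure_model_available()
        # Retry/backoff elided for brevity.
        return self.client.chat(model=self.config.model_name, messages=messages)
```

Both IndIE and llm_IE can then construct one ModelConfig and delegate all Ollama traffic to this single class, which is the duplication the PR removes.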

Testing

  • Validated that model availability checks correctly identify and pull missing models.
  • Verified imports and initialization across both modules.

Summary by CodeRabbit

  • Refactor

    • LLM handling consolidated into a shared core service; extraction API behavior remains compatible.
  • New Features

    • Shared service adds unified model availability checks, fallback verification, and retry/backoff for more reliable responses.
  • Configuration

    • Default generation settings adjusted (top_p=0.8, num_predict=1500) to improve output quality and length.



coderabbitai bot commented Jan 26, 2026

📝 Walkthrough

Adds a shared LLM core (src/llm_core.py) exposing ModelConfig and LLMService, and refactors local modules to use this service for model availability and generation instead of bespoke Ollama-specific logic.

Changes

Cohort / File(s) — Summary

  • Shared LLM core (GSoC25_H/src/llm_core.py): New module introducing the ModelConfig dataclass and LLMService. Handles model listing/pulling/verification and generate_response with retry/backoff and logging.
  • LLM extractor (GSoC25_H/IndIE/llm_extractor.py): Removed the local ModelConfig and LLMInterface. Adds a parent-path adjustment and imports the shared ModelConfig/LLMService; constructs a shared config and delegates generation to LLMService (defaults updated: top_p=0.8, num_predict=1500).
  • LLM interface adapter (GSoC25_H/llm_IE/llm_interface.py): Replaced Ollama-specific HTTP methods with calls to LLMService.generate_response. Builds standard message payloads, extracts response["message"]["content"], and preserves parsing; removed _is_available, _pull_model, and _generate_text.

Sequence Diagram(s)

sequenceDiagram
    participant Extractor as Extractor / Adapter
    participant Service as LLMService
    participant Ollama as Ollama API

    rect rgba(100,149,237,0.5)
    Extractor->>Service: instantiate(ModelConfig) / call generate_response(messages)
    end

    rect rgba(60,179,113,0.5)
    Service->>Ollama: list models
    Ollama-->>Service: models list
    alt model missing
        Service->>Ollama: pull model
        Ollama-->>Service: pull result
    end
    Service->>Ollama: chat(messages) with retry/backoff
    Ollama-->>Service: response {message: {content: ...}}
    end

    Service-->>Extractor: standardized response
    Extractor->>Extractor: parse raw_output

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning — Docstring coverage is 57.14%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)
  • Description Check ✅ Passed — Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed — The title accurately summarizes the main change: centralizing LLM connection logic into a shared LLMService and improving error handling with retry logic and better model availability checks.



@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@GSoC25_H/llm_IE/llm_interface.py`:
- Around line 29-44: The base_url passed into OllamaInterface.__init__ is
ignored; add a host (or base_url) field to SharedConfig in
GSoC25_H/src/llm_core.py, populate that field from the base_url parameter in
OllamaInterface.__init__ (i.e., thread base_url → SharedConfig.host), and update
LLMService.__init__ to use that field when constructing the Ollama client (call
ollama.Client(host=self.config.host, timeout=...) instead of relying on the
default). Ensure SharedConfig and LLMService reference the same host attribute
name.
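The requested fix amounts to threading the constructor argument through the shared config. A minimal sketch, assuming the field and class names from the bot's comment (SharedConfig, host) rather than the PR's actual definitions:

```python
from dataclasses import dataclass


@dataclass
class SharedConfig:
    """Illustrative shape; the actual config in llm_core.py may differ."""
    model_name: str = "llama3"
    host: str = "http://localhost:11434"


class OllamaInterface:
    def __init__(self, base_url: str = "http://localhost:11434",
                 model_name: str = "llama3"):
        # Thread base_url -> SharedConfig.host so LLMService can build its
        # client as ollama.Client(host=self.config.host, timeout=...)
        # instead of silently falling back to the library default.
        self.config = SharedConfig(model_name=model_name, host=base_url)
```

With this in place, a caller passing a non-default base_url actually reaches the remote Ollama instance instead of localhost.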

In `@GSoC25_H/src/llm_core.py`:
- Around line 120-140: The retry loop currently prints to stdout and always
sleeps after catching an exception even when no retries remain; update the
except block to log via self.logger (e.g., self.logger.warning or
self.logger.error) instead of print, and only call time.sleep(wait_time) when
retries < self.config.max_retries (i.e., when another retry will occur). Also
replace the final print at loop exit with self.logger.error to record the
permanent failure; keep using self.client.chat and self.config fields as in the
current code.
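The corrected loop could look like the following sketch. It is a standalone helper, not the PR's actual method; the key points from the comment are logging via a logger instead of print, and sleeping only when another retry will actually occur:

```python
import logging
import time


def chat_with_retry(client, model, messages, max_retries=3, logger=None):
    """Call client.chat with exponential backoff; no sleep after the final failure."""
    logger = logger or logging.getLogger(__name__)
    for attempt in range(max_retries + 1):
        try:
            return client.chat(model=model, messages=messages)
        except Exception as exc:
            if attempt < max_retries:
                wait_time = 2 ** attempt  # backoff: 1s, 2s, 4s, ...
                logger.warning("Attempt %d failed (%s); retrying in %ds",
                               attempt + 1, exc, wait_time)
                time.sleep(wait_time)
            else:
                # Record the permanent failure; do not sleep, just surface it.
                logger.error("Giving up after %d attempts: %s", attempt + 1, exc)
                raise
```

The else branch is what the review asks for: the old code slept even when no retry remained and reported progress on stdout.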
🧹 Nitpick comments (2)
GSoC25_H/src/llm_core.py (1)

31-92: Consolidate model list parsing to avoid drift.

The list normalization and name extraction are duplicated before/after pull; consider a small helper (e.g., _normalize_models_list() + _extract_model_names()) to keep response-shape handling in one place.

GSoC25_H/IndIE/llm_extractor.py (1)

4-11: Consider avoiding sys.path mutation at import time.

This pattern can create import-order side effects; prefer a proper package layout or configure the module path at the entrypoint (e.g., editable install / PYTHONPATH).

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@GSoC25_H/llm_IE/llm_interface.py`:
- Around line 47-53: The parameter sentence in extract_relations is currently
unused; either incorporate it into the request messages or mark it intentionally
unused to silence lint warnings. If sentence should influence the LLM call,
update messages (in extract_relations) to include it (e.g., combine sentence
with prompt or add a separate user/system message containing sentence) so the
shared service receives the text; otherwise rename the parameter to _sentence or
add a clear in-code marker/comment (e.g., "# unused" or a lint directive) to
indicate it's intentionally unused. Ensure changes are made inside the
extract_relations function to reference the symbol extract_relations and
messages.
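If the sentence should influence the call, the message-building step could be factored out as below. This is a sketch of one reasonable payload layout, not the PR's actual extract_relations body:

```python
def build_messages(prompt: str, sentence: str):
    """Include the sentence in the payload so it actually reaches the LLM."""
    return [
        {"role": "system", "content": prompt},
        {"role": "user", "content": sentence},
    ]
```

extract_relations would then pass build_messages(prompt, sentence) to the shared service; alternatively, if the sentence is already embedded in the prompt upstream, renaming the parameter to _sentence documents that it is intentionally unused.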
🧹 Nitpick comments (2)
GSoC25_H/src/llm_core.py (1)

43-43: Remove the unused f-string prefix (Line 43).
Minor style nit; no placeholders here.

♻️ Suggested change
-                self.logger.warning(f"Could not parse model list response. Attempting direct model check...")
+                self.logger.warning("Could not parse model list response. Attempting direct model check...")
GSoC25_H/llm_IE/llm_interface.py (1)

7-10: Prefer package imports over mutating sys.path.
This can hide import issues and behave differently in tooling. Consider a proper package layout or entry-point PYTHONPATH instead.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@GSoC25_H/llm_IE/llm_interface.py`:
- Around line 34-42: The SharedConfig instantiation is using the wrong attribute
name and a hardcoded retry value; change the num_predict lookup to use
model_config.num_predict (e.g., getattr(model_config, 'num_predict', 2000))
instead of 'max_tokens', and read max_retries from the model_config (e.g.,
getattr(model_config, 'max_retries', 3)) instead of hardcoding 3; update the
SharedConfig call in the llm_interface.py block where SharedConfig(...) is
created to use these corrected attribute lookups.
🧹 Nitpick comments (2)
GSoC25_H/llm_IE/llm_interface.py (2)

7-12: Path manipulation works but is fragile.

The sys.path.append approach is common but can cause issues with relative imports in different execution contexts. Consider using a proper package structure with __init__.py files or a pyproject.toml to define the project layout.


69-70: Response structure access assumes dict format.

The Ollama Python client may return a response object with attribute access (e.g., response.message.content) rather than dict access. Consider adding defensive handling similar to _ensure_model_available in llm_core.py.

♻️ Proposed fix for defensive access
         # Extract text content from the Ollama response dictionary
-        raw_output = response.get("message", {}).get("content", "").strip()
+        # Handle both dict and object response formats
+        if hasattr(response, 'message'):
+            message = response.message
+            raw_output = (message.content if hasattr(message, 'content') else message.get("content", "")).strip()
+        elif isinstance(response, dict):
+            raw_output = response.get("message", {}).get("content", "").strip()
+        else:
+            raw_output = ""

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@GSoC25_H/llm_IE/llm_interface.py`:
- Around line 34-42: The SharedConfig construction in llm_interface.py is using
getattr(model_config, 'num_predict', 2000) which ignores the
ModelConfig.max_tokens field; change the mapping so SharedConfig.num_predict is
populated from model_config.max_tokens (e.g., use getattr(model_config,
'max_tokens', 1024)) so the local ModelConfig value drives num_predict; update
the SharedConfig call site (symbol: SharedConfig) to pass
num_predict=getattr(model_config, 'max_tokens', 1024) and remove reliance on
'num_predict' from model_config.
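The mapping the bot asks for can be sketched as a small translation function. The legacy field names (max_tokens, max_retries) come from this thread; everything else is illustrative:

```python
from dataclasses import dataclass


@dataclass
class LegacyModelConfig:
    """Illustrative legacy config from llm_IE; actual fields may differ."""
    model_name: str = "llama3"
    max_tokens: int = 1024
    max_retries: int = 3


def shared_config_kwargs(model_config):
    """Map the legacy config onto the shared service's parameter names."""
    return {
        "model_name": model_config.model_name,
        # The legacy field is max_tokens; it must drive the shared num_predict.
        "num_predict": getattr(model_config, "max_tokens", 1024),
        "max_retries": getattr(model_config, "max_retries", 3),
    }
```

SharedConfig(**shared_config_kwargs(model_config)) then honors the legacy caller's token budget instead of a hardcoded default, which is the compatibility issue raised across the last two review rounds.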
🧹 Nitpick comments (1)
GSoC25_H/llm_IE/llm_interface.py (1)

1-12: Path manipulation is fragile but acceptable for this refactor.

The sys.path.append approach on lines 7-8 works but is fragile in larger projects. Consider using proper package structure with __init__.py files and relative imports, or installing the package in editable mode (pip install -e .) for a more robust solution.

@mallasiddharthreddy
Author

Hi DBpedia Community,

I have addressed all feedback from the automated review to ensure the refactor is robust (handling latest tags, legacy config compatibility, and connection logic).

A quick introduction: I am a final-year BE student specializing in AIML and a Global Finalist in the AWS AI Agent Hackathon. I am very interested in the "Stabilizing Hindi DBpedia Pipeline" project idea for GSoC 2026.

I saw the proposal mentioning the need to "reduce code duplication across IndIE and llm_IE," so I decided to tackle this as a warm-up task to familiarize myself with the architecture.

(I tried replying on the Forum, but my new account is temporarily on hold, so I am reaching out here directly!)

The code is now stable and lint-free. I am ready for the next "Predicate Linking" task while this is under review!

Best, Siddharth
