Refactor: Centralize LLM connection logic and improve error handling#25
mallasiddharthreddy wants to merge 5 commits into dbpedia:main from
Conversation
📝 Walkthrough

Adds a shared LLM core.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Extractor as Extractor / Adapter
    participant Service as LLMService
    participant Ollama as Ollama API
    rect rgba(100,149,237,0.5)
        Extractor->>Service: instantiate(ModelConfig) / call generate_response(messages)
    end
    rect rgba(60,179,113,0.5)
        Service->>Ollama: list models
        Ollama-->>Service: models list
        alt model missing
            Service->>Ollama: pull model
            Ollama-->>Service: pull result
        end
        Service->>Ollama: chat(messages) with retry/backoff
        Ollama-->>Service: response {message: {content: ...}}
    end
    Service-->>Extractor: standardized response
    Extractor->>Extractor: parse raw_output
```
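The flow in the diagram above can be sketched roughly as follows. This is a minimal illustration only: the method names beyond `generate_response`, the retry policy, and the dict-shaped responses are assumptions (the real `ollama` client's response shapes vary by version), not the PR's actual code.

```python
import time

class OllamaFlowSketch:
    """Illustration of the diagrammed flow: list models, pull if missing,
    then chat with retry/backoff. `client` is any object exposing
    list()/pull()/chat(), like the ollama Python client."""

    def __init__(self, client, model, max_retries=3):
        self.client = client
        self.model = model
        self.max_retries = max_retries

    def ensure_model(self):
        # "list models" step; pull only when the model is missing.
        names = [m.get("name") for m in self.client.list().get("models", [])]
        if self.model not in names:
            self.client.pull(self.model)

    def generate_response(self, messages):
        self.ensure_model()
        for attempt in range(self.max_retries):
            try:
                response = self.client.chat(model=self.model, messages=messages)
                return response["message"]["content"]
            except Exception:
                if attempt < self.max_retries - 1:
                    time.sleep(2 ** attempt)  # exponential backoff between retries
        return None
```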
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@GSoC25_H/llm_IE/llm_interface.py`:
- Around line 29-44: The base_url passed into OllamaInterface.__init__ is
ignored; add a host (or base_url) field to SharedConfig in
GSoC25_H/src/llm_core.py, populate that field from the base_url parameter in
OllamaInterface.__init__ (i.e., thread base_url → SharedConfig.host), and update
LLMService.__init__ to use that field when constructing the Ollama client (call
ollama.Client(host=self.config.host, timeout=...) instead of relying on the
default). Ensure SharedConfig and LLMService reference the same host attribute
name.
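The threading described above could look like the sketch below. The `ollama.Client(host=...)` keyword is real, but the exact field layout of `SharedConfig` and the constructor signatures are assumptions for illustration (the client construction is left commented out so the sketch stays self-contained):

```python
from dataclasses import dataclass

@dataclass
class SharedConfig:
    # Other existing fields elided; `host` is the new field the review suggests.
    model_name: str = "llama3"
    host: str = "http://localhost:11434"  # default Ollama endpoint

class LLMService:
    def __init__(self, config: SharedConfig):
        self.config = config
        # Use the configured host instead of relying on the client default:
        # self.client = ollama.Client(host=self.config.host, timeout=120)

class OllamaInterface:
    def __init__(self, model_name: str, base_url: str = "http://localhost:11434"):
        # base_url is no longer ignored: it populates SharedConfig.host.
        self.config = SharedConfig(model_name=model_name, host=base_url)
        self.service = LLMService(self.config)
```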
In `@GSoC25_H/src/llm_core.py`:
- Around line 120-140: The retry loop currently prints to stdout and always
sleeps after catching an exception even when no retries remain; update the
except block to log via self.logger (e.g., self.logger.warning or
self.logger.error) instead of print, and only call time.sleep(wait_time) when
retries < self.config.max_retries (i.e., when another retry will occur). Also
replace the final print at loop exit with self.logger.error to record the
permanent failure; keep using self.client.chat and self.config fields as in the
current code.
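A sketch of the corrected retry loop might look like this. The `chat_fn`, `max_retries`, and `backoff` parameters stand in for `self.client.chat` and the `self.config` fields; the loop shape is an assumption, the point is logging via a logger and sleeping only when another retry will actually run:

```python
import logging
import time

class RetrySketch:
    def __init__(self, chat_fn, max_retries=3, backoff=2.0, logger=None):
        self.chat_fn = chat_fn          # stand-in for self.client.chat
        self.max_retries = max_retries  # stand-in for self.config.max_retries
        self.backoff = backoff
        self.logger = logger or logging.getLogger(__name__)

    def generate_response(self, messages):
        for attempt in range(self.max_retries + 1):
            try:
                return self.chat_fn(messages)
            except Exception as exc:
                # Log instead of printing to stdout.
                self.logger.warning("Attempt %d failed: %s", attempt + 1, exc)
                # Sleep only if another retry will actually occur.
                if attempt < self.max_retries:
                    time.sleep(self.backoff * (attempt + 1))
        # Record the permanent failure instead of printing it.
        self.logger.error("All %d attempts failed.", self.max_retries + 1)
        return None
```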
🧹 Nitpick comments (2)
GSoC25_H/src/llm_core.py (1)
31-92: Consolidate model list parsing to avoid drift.

The list normalization and name extraction are duplicated before/after pull; consider a small helper (e.g., `_normalize_models_list()` + `_extract_model_names()`) to keep response-shape handling in one place.

GSoC25_H/IndIE/llm_extractor.py (1)

4-11: Consider avoiding `sys.path` mutation at import time.

This pattern can create import-order side effects; prefer a proper package layout or configure the module path at the entrypoint (e.g., editable install / `PYTHONPATH`).
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@GSoC25_H/llm_IE/llm_interface.py`:
- Around line 47-53: The parameter sentence in extract_relations is currently
unused; either incorporate it into the request messages or mark it intentionally
unused to silence lint warnings. If sentence should influence the LLM call,
update messages (in extract_relations) to include it (e.g., combine sentence
with prompt or add a separate user/system message containing sentence) so the
shared service receives the text; otherwise rename the parameter to _sentence or
add a clear in-code marker/comment (e.g., "# unused" or a lint directive) to
indicate it's intentionally unused. Ensure changes are made inside the
extract_relations function to reference the symbol extract_relations and
messages.
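If `sentence` should influence the call, the first option described above might be sketched as follows. The message layout (system prompt plus a user message carrying the sentence) is an assumption for illustration, not the PR's final code:

```python
def extract_relations(sentence, prompt, service):
    """Thread `sentence` into the messages sent to the shared service so the
    LLM actually sees the text it is asked to extract relations from."""
    messages = [
        {"role": "system", "content": prompt},
        {"role": "user", "content": sentence},  # sentence is no longer unused
    ]
    return service.generate_response(messages)
```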
🧹 Nitpick comments (2)
GSoC25_H/src/llm_core.py (1)
43-43: Remove the unused f-string prefix (Line 43).

Minor style nit; no placeholders here.

♻️ Suggested change

```diff
- self.logger.warning(f"Could not parse model list response. Attempting direct model check...")
+ self.logger.warning("Could not parse model list response. Attempting direct model check...")
```

GSoC25_H/llm_IE/llm_interface.py (1)

7-10: Prefer package imports over mutating `sys.path`.

This can hide import issues and behave differently in tooling. Consider a proper package layout or an entry-point `PYTHONPATH` instead.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@GSoC25_H/llm_IE/llm_interface.py`:
- Around line 34-42: The SharedConfig instantiation is using the wrong attribute
name and a hardcoded retry value; change the num_predict lookup to use
model_config.num_predict (e.g., getattr(model_config, 'num_predict', 2000))
instead of 'max_tokens', and read max_retries from the model_config (e.g.,
getattr(model_config, 'max_retries', 3)) instead of hardcoding 3; update the
SharedConfig call in the llm_interface.py block where SharedConfig(...) is
created to use these corrected attribute lookups.
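The corrected lookups described above could be sketched as a small builder. Returning a plain dict keeps the sketch self-contained; in the PR these values would feed the `SharedConfig(...)` call, and the defaults (2000, 3) come from the review comment itself:

```python
def build_shared_config(model_config):
    """Map a legacy ModelConfig onto SharedConfig keyword arguments,
    per the review: read num_predict (not 'max_tokens') and stop
    hardcoding max_retries."""
    return dict(
        num_predict=getattr(model_config, "num_predict", 2000),  # was 'max_tokens'
        max_retries=getattr(model_config, "max_retries", 3),     # was hardcoded 3
    )
```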
🧹 Nitpick comments (2)
GSoC25_H/llm_IE/llm_interface.py (2)
7-12: Path manipulation works but is fragile.

The `sys.path.append` approach is common but can cause issues with relative imports in different execution contexts. Consider using a proper package structure with `__init__.py` files or a `pyproject.toml` to define the project layout.

69-70: Response structure access assumes dict format.

The Ollama Python client may return a response object with attribute access (e.g., `response.message.content`) rather than dict access. Consider adding defensive handling similar to `_ensure_model_available` in `llm_core.py`.

♻️ Proposed fix for defensive access

```diff
  # Extract text content from the Ollama response dictionary
- raw_output = response.get("message", {}).get("content", "").strip()
+ # Handle both dict and object response formats
+ if hasattr(response, 'message'):
+     message = response.message
+     raw_output = (message.content if hasattr(message, 'content') else message.get("content", "")).strip()
+ elif isinstance(response, dict):
+     raw_output = response.get("message", {}).get("content", "").strip()
+ else:
+     raw_output = ""
```
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@GSoC25_H/llm_IE/llm_interface.py`:
- Around line 34-42: The SharedConfig construction in llm_interface.py is using
getattr(model_config, 'num_predict', 2000) which ignores the
ModelConfig.max_tokens field; change the mapping so SharedConfig.num_predict is
populated from model_config.max_tokens (e.g., use getattr(model_config,
'max_tokens', 1024)) so the local ModelConfig value drives num_predict; update
the SharedConfig call site (symbol: SharedConfig) to pass
num_predict=getattr(model_config, 'max_tokens', 1024) and remove reliance on
'num_predict' from model_config.
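This follow-up comment reverses the mapping from the earlier round: the local `ModelConfig` exposes `max_tokens`, and that value should drive `SharedConfig.num_predict`. A sketch of the revised mapping (field names and the 1024 default follow the comment; the dict stands in for the `SharedConfig(...)` keywords):

```python
def build_shared_config_v2(model_config):
    """Populate num_predict from the legacy max_tokens field, so the
    local ModelConfig value drives generation length."""
    return dict(
        # Do not rely on a 'num_predict' attribute on model_config.
        num_predict=getattr(model_config, "max_tokens", 1024),
    )
```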
🧹 Nitpick comments (1)
GSoC25_H/llm_IE/llm_interface.py (1)
1-12: Path manipulation is fragile but acceptable for this refactor.

The `sys.path.append` approach on lines 7-8 works but is fragile in larger projects. Consider using a proper package structure with `__init__.py` files and relative imports, or installing the package in editable mode (`pip install -e .`) for a more robust solution.
Hi DBpedia Community,

I have addressed all feedback from the automated review to ensure the refactor is robust (handling latest tags, legacy config compatibility, and connection logic).

A quick introduction: I am a final-year BE student specializing in AIML and a Global Finalist in the AWS AI Agent Hackathon. I am very interested in the "Stabilizing Hindi DBpedia Pipeline" project idea for GSoC 2026. I saw the proposal mentioning the need to "reduce code duplication across IndIE and llm_IE," so I decided to tackle this as a warm-up task to familiarize myself with the architecture. (I tried replying on the Forum, but my new account is temporarily on hold, so I am reaching out here directly!)

The code is now stable and lint-free. I am ready for the next "Predicate Linking" task while this is under review!

Best,
Siddharth
Summary
Addresses the code duplication issue between the `IndIE` and `llm_IE` modules by implementing a unified architecture for LLM interactions.

Key Changes

- `src/llm_core.py`: Created a centralized `LLMService` that handles Ollama connections.
- `IndIE/llm_extractor.py`: Removed duplicate connection code; migrated to use `LLMService`.
- `llm_IE/llm_interface.py`: Rewrote the interface to act as an adapter, routing legacy requests through the new core service while maintaining API compatibility.

Testing
Summary by CodeRabbit
Refactor
New Features
Configuration