feat(text_tasks): add external LM model discovery cache helpers#883
feat(text_tasks): add external LM model discovery cache helpers#8831larity wants to merge 3 commits intoace-step:mainfrom
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds provider-aware external language-model discovery and a disk-backed cache for discovered model identifiers, including TTL enforcement, robust JSON handling, targeted invalidation, provider-specific request headers, and unit tests exercising persistence, TTL, discovery parsing, and error paths. Changes
Sequence Diagram(s)sequenceDiagram
participant App as Application
participant Cache as Cache Module
participant Disk as Disk Storage
participant Discovery as Discovery Module
participant API as External API
App->>Cache: load_cached_external_models(provider, protocol, base_url)
alt Entry exists & not stale
Cache-->>App: models (cached)
else Cache miss or stale
App->>Discovery: discover_external_models(provider, protocol, base_url, api_key, timeout)
Discovery->>Discovery: validate URL scheme
Discovery->>Discovery: build candidate URLs & auth headers
loop Try each candidate URL
Discovery->>API: HTTP GET (with headers)
API-->>Discovery: JSON response
Discovery->>Discovery: extract & normalize model IDs
end
Discovery-->>App: models (discovered) or ExternalModelDiscoveryError
alt models found
App->>Cache: save_cached_external_models(..., models)
Cache->>Disk: write JSON with updated_at
Disk-->>Cache: Path
Cache-->>App: persisted
end
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~28 minutes Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 4
🧹 Nitpick comments (1)
acestep/text_tasks/external_lm_model_cache.py (1)
89-93: Avoid fully silent best-effort failures during invalidation.Returning on
OSErrorwith no log makes permission and filesystem issues invisible in production.
A debug/error log here keeps behavior best-effort while preserving observability.Proposed refactor
import json import os import time from pathlib import Path +from loguru import logger @@ - except OSError: + except OSError as exc: + logger.debug(f"External model cache unlink failed: {exc}") return @@ - except OSError: + except OSError as exc: + logger.debug(f"External model cache write failed: {exc}") returnAs per coding guidelines: "Error handling: Log errors with
loguru.logger(notprint())."Also applies to: 99-100
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@acestep/text_tasks/external_lm_model_cache.py` around lines 89 - 93, The current best-effort invalidation silently returns on OSError (around the path.exists()/path.unlink() block), hiding permission/filesystem problems; change the except OSError to capture the exception as e and log it with loguru.logger (e.g., logger.error or logger.exception) including a descriptive message and the exception, then return to preserve best-effort behavior; apply the same change to the other identical block at lines ~99-100 so both path.unlink() failure cases are logged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@acestep/core/generation/handler/init_service_test.py`:
- Line 969: Remove the unresolved merge marker and duplicate test definitions:
delete the literal merge artifact lines (e.g., "<<<<<<< HEAD", "=======",
">>>>>>> ...") and keep only a single definition of
test_load_text_encoder_uses_cpu_safe_dtype_when_offloaded; also resolve the
other conflict fragment that created a duplicated block (the second copy of the
same test) so the module contains one valid test function and no merge markers,
ensuring the file parses as valid Python.
In `@acestep/text_tasks/external_lm_model_cache_test.py`:
- Around line 139-158: The teardown in _PatchedEnv.__exit__ currently always
pops ACESTEP_EXTERNAL_MODEL_CACHE_TTL_SEC which can remove a pre-existing value;
change __enter__ to save the original TTL value (e.g., self.original_ttl =
os.environ.get("ACESTEP_EXTERNAL_MODEL_CACHE_TTL_SEC")) and modify __exit__ to
restore that saved value: if self.original_ttl is None then pop the env var,
else set ACESTEP_EXTERNAL_MODEL_CACHE_TTL_SEC back to self.original_ttl; keep
restoring XDG_DATA_HOME as implemented.
In `@acestep/text_tasks/external_lm_model_cache.py`:
- Around line 35-37: The TTL environment parsing is brittle:
int(os.getenv("ACESTEP_EXTERNAL_MODEL_CACHE_TTL_SEC", "43200")) can raise
ValueError and crash load_cached_external_models; change ttl_sec parsing in
load_cached_external_models (where ttl_sec is set) to safely parse the env var
(catch ValueError/TypeError), falling back to the default 43200 (or a sentinel
like -1) when malformed, then keep the existing expiry check using updated_at
unchanged; alternatively extract safe parsing into a small helper (e.g.,
safe_int_env) and use that to set ttl_sec.
In `@acestep/text_tasks/external_lm_model_discovery.py`:
- Around line 39-41: Before calling request.urlopen(req, ...) validate the URL
scheme of the 'url' variable (the discovery base_url) using
urllib.parse.urlparse and allow only 'http' or 'https'; if the parsed scheme is
not 'http' or 'https' reject the input (raise/return/log an error) and do not
call request.urlopen. Update the code around the request.Request(...) /
request.urlopen(...) block in external_lm_model_discovery.py to perform this
scheme check (referencing the 'url' variable and the request.urlopen call) and
handle invalid schemes safely.
---
Nitpick comments:
In `@acestep/text_tasks/external_lm_model_cache.py`:
- Around line 89-93: The current best-effort invalidation silently returns on
OSError (around the path.exists()/path.unlink() block), hiding
permission/filesystem problems; change the except OSError to capture the
exception as e and log it with loguru.logger (e.g., logger.error or
logger.exception) including a descriptive message and the exception, then return
to preserve best-effort behavior; apply the same change to the other identical
block at lines ~99-100 so both path.unlink() failure cases are logged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 28e16bda-832c-4427-8fa4-05c1e0b396d8
📒 Files selected for processing (5)
acestep/core/generation/handler/init_service_test.pyacestep/text_tasks/external_lm_model_cache.pyacestep/text_tasks/external_lm_model_cache_test.pyacestep/text_tasks/external_lm_model_discovery.pyacestep/text_tasks/external_lm_model_discovery_test.py
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
acestep/text_tasks/external_lm_model_discovery.py (1)
15-23: Expand the public function docstring to include Args/Returns/Raises.
discover_external_modelsis public and raises a domain exception, so the docstring should document inputs, output, and raised errors in the required format.As per coding guidelines, "Docstrings: Mandatory for all modules, classes, and public functions. Use concise format with Args, Returns, and exception documentation."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@acestep/text_tasks/external_lm_model_discovery.py` around lines 15 - 23, Update the docstring for the public function discover_external_models to follow the project's docstring standard: add an Args section describing provider (str), protocol (str), base_url (str), api_key (str), and timeout_sec (int) with defaults, add a Returns section describing the returned list[str] of model identifiers, and add a Raises section that documents the specific domain exception class actually raised in this function (replace "DOMAIN_EXCEPTION" with the real exception class name used in the code) and the conditions under which it is raised; keep the description concise and use the existing one-line summary as the first line.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@acestep/text_tasks/external_lm_model_discovery.py`:
- Around line 72-87: The discovery logic in external_lm_model_discovery.py
misses cases where base_url already points to a model-list endpoint (e.g.,
"/v1/models" or "/api/tags"), so modify the candidate generation in the
function(s) that build candidates (the block using variables root and candidates
and the similar block around lines 103-104) to treat existing model-list
endpoints as valid: if root.endswith("/models") or root.endswith("/api/tags")
then candidates.append(root) (ensure you normalize/trail-strip before adding and
avoid duplicates). Do this in both candidate-generation locations so the
function will try the base URL itself when it's already the correct model-list
endpoint.
---
Nitpick comments:
In `@acestep/text_tasks/external_lm_model_discovery.py`:
- Around line 15-23: Update the docstring for the public function
discover_external_models to follow the project's docstring standard: add an Args
section describing provider (str), protocol (str), base_url (str), api_key
(str), and timeout_sec (int) with defaults, add a Returns section describing the
returned list[str] of model identifiers, and add a Raises section that documents
the specific domain exception class actually raised in this function (replace
"DOMAIN_EXCEPTION" with the real exception class name used in the code) and the
conditions under which it is raised; keep the description concise and use the
existing one-line summary as the first line.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: d40438cd-4ce4-4d55-b141-e5fb671bfa4b
📒 Files selected for processing (4)
acestep/text_tasks/external_lm_model_cache.pyacestep/text_tasks/external_lm_model_cache_test.pyacestep/text_tasks/external_lm_model_discovery.pyacestep/text_tasks/external_lm_model_discovery_test.py
✅ Files skipped from review due to trivial changes (1)
- acestep/text_tasks/external_lm_model_cache.py
🚧 Files skipped from review as they are similar to previous changes (2)
- acestep/text_tasks/external_lm_model_discovery_test.py
- acestep/text_tasks/external_lm_model_cache_test.py
|
Addressed the real |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
acestep/text_tasks/external_lm_model_discovery.py (1)
23-23: Expand docstrings with Args/Returns/Raises for guideline compliance.Line 23 and helper docstrings on Lines 70, 110, 125, and 148 are currently summary-only. The repo guideline asks for key inputs/outputs and raised exceptions where relevant.
Proposed docstring pattern (apply similarly to helpers)
def discover_external_models( @@ ) -> list[str]: - """Discover model identifiers from a provider's model-list endpoint.""" + """Discover model identifiers from provider model-list endpoints. + + Args: + provider: Provider name used for endpoint heuristics. + protocol: Protocol token used for endpoint/header heuristics. + base_url: User-configured provider base URL or endpoint URL. + api_key: API key used for authenticated discovery calls. + timeout_sec: Request timeout in seconds. + + Returns: + Ordered, de-duplicated model identifiers. + + Raises: + ExternalModelDiscoveryError: If no valid endpoint yields model IDs. + """As per coding guidelines, "Docstrings are mandatory for all new/modified Python modules, classes, and functions. Docstrings must be concise and include purpose plus key inputs/outputs and raised exceptions when relevant."
Also applies to: 70-70, 110-110, 125-125, 148-148
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@acestep/text_tasks/external_lm_model_discovery.py` at line 23, Update the module-level docstring and the helper function docstrings in acestep/text_tasks/external_lm_model_discovery.py to follow the project's docstring pattern: for each docstring (module-level and the helpers located near lines 70, 110, 125, 148) add brief "Args:" describing parameters, "Returns:" describing return values, and "Raises:" for exceptions that can be raised; ensure each docstring still starts with the one-line summary and then lists the key inputs/outputs and any exceptions thrown (use the exact function/method names in each helper when editing those docstrings).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@acestep/text_tasks/external_lm_model_discovery.py`:
- Line 23: Update the module-level docstring and the helper function docstrings
in acestep/text_tasks/external_lm_model_discovery.py to follow the project's
docstring pattern: for each docstring (module-level and the helpers located near
lines 70, 110, 125, 148) add brief "Args:" describing parameters, "Returns:"
describing return values, and "Raises:" for exceptions that can be raised;
ensure each docstring still starts with the one-line summary and then lists the
key inputs/outputs and any exceptions thrown (use the exact function/method
names in each helper when editing those docstrings).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 47724750-ce87-40e2-8c15-2374dbbd7050
📒 Files selected for processing (2)
acestep/text_tasks/external_lm_model_discovery.pyacestep/text_tasks/external_lm_model_discovery_test.py
✅ Files skipped from review due to trivial changes (1)
- acestep/text_tasks/external_lm_model_discovery_test.py
86221ea to
fabd6c0
Compare
There was a problem hiding this comment.
🧹 Nitpick comments (1)
acestep/text_tasks/external_lm_model_discovery.py (1)
11-23: Expand docstrings to include Args/Returns/Raises sections for readiness compliance.Docstrings exist, but current one-liners don’t yet include key inputs/outputs and raised exceptions where relevant.
As per coding guidelines, "Docstrings are mandatory for all new or modified Python modules, classes, and functions. Docstrings must be concise and include purpose plus key inputs/outputs and raised exceptions when relevant."
Also applies to: 69-70, 109-110, 124-125, 147-148
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@acestep/text_tasks/external_lm_model_discovery.py` around lines 11 - 23, Update the docstrings for ExternalModelDiscoveryError and discover_external_models to include concise Args, Returns, and Raises sections: for discover_external_models document provider, protocol, base_url, api_key, timeout_sec parameter types and meanings, return type (list[str]) and what the list contains, and raise ExternalModelDiscoveryError (or other exceptions) and under what conditions; for ExternalModelDiscoveryError add a short description and typical cause in the Raises section of callers if relevant. Apply the same docstring style to the other functions/classes in this module (the additional functions modified in this PR) so each has purpose plus key inputs/outputs and raised exceptions where relevant.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@acestep/text_tasks/external_lm_model_discovery.py`:
- Around line 11-23: Update the docstrings for ExternalModelDiscoveryError and
discover_external_models to include concise Args, Returns, and Raises sections:
for discover_external_models document provider, protocol, base_url, api_key,
timeout_sec parameter types and meanings, return type (list[str]) and what the
list contains, and raise ExternalModelDiscoveryError (or other exceptions) and
under what conditions; for ExternalModelDiscoveryError add a short description
and typical cause in the Raises section of callers if relevant. Apply the same
docstring style to the other functions/classes in this module (the additional
functions modified in this PR) so each has purpose plus key inputs/outputs and
raised exceptions where relevant.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 2612803a-ef24-457c-84eb-e8d59d09cfd2
📒 Files selected for processing (4)
acestep/text_tasks/external_lm_model_cache.pyacestep/text_tasks/external_lm_model_cache_test.pyacestep/text_tasks/external_lm_model_discovery.pyacestep/text_tasks/external_lm_model_discovery_test.py
✅ Files skipped from review due to trivial changes (1)
- acestep/text_tasks/external_lm_model_cache.py
🚧 Files skipped from review as they are similar to previous changes (2)
- acestep/text_tasks/external_lm_model_cache_test.py
- acestep/text_tasks/external_lm_model_discovery_test.py
fabd6c0 to
8742892
Compare
8742892 to
577bfbd
Compare
Summary
This PR adds the self-contained helpers used to discover external provider model IDs and cache the results locally.
Scope
In scope:
Out of scope:
What Changed
external_lm_model_cache.pyfor local model-list caching and invalidationexternal_lm_model_discovery.pyfor provider-aware model-list endpoint discovery and parsingTesting
Ran 7 tests, all passing.
Risk / Compatibility
This PR is based directly on
main. It stays inside isolated cache/discovery helpers underacestep/text_tasksand does not introduce runtime secret handling or live request logging.Related Upstream Context
This PR is one small slice of the broader external-LM work that was originally bundled into #808.
Related upstream references:
This slice is limited to model discovery and cache helpers for external providers. It does not include prompt construction, response parsing, caption enhancement behavior, lyric generation behavior, or UI wiring.
Summary by CodeRabbit
New Features
Tests