feat(text_tasks): add external LM captioning helpers#882
feat(text_tasks): add external LM captioning helpers#8821larity wants to merge 4 commits intoace-step:mainfrom
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds a new helper module for external LM captioning that provides caption normalization, retry detection, user metadata override/coercion and aliasing, fallback caption construction, and formatted request-intent assembly; includes unit tests exercising these helpers. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~22 minutes Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment Tip CodeRabbit can approve the review once all CodeRabbit's comments are resolved.Enable the |
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
acestep/core/generation/handler/init_service_test.py (1)
969-1071:⚠️ Potential issue | 🔴 CriticalUnresolved merge conflict and duplicate test must be removed.
The file contains an unresolved merge conflict marker at line 969 (
<<<<<<< HEAD), which is invalid Python syntax and will cause the test suite to fail immediately. Additionally,test_load_text_encoder_uses_cpu_safe_dtype_when_offloadedis defined twice (lines 989–1029 and lines 1031–1071); in Python the second definition shadows the first, so only one copy would actually run.Please resolve the merge conflict by:
- Removing the
<<<<<<< HEADmarker at line 969.- Keeping only one copy of the test method (the original at lines 989–1029 appears correct).
- Removing the duplicate block at lines 1031–1071.
Proposed fix
-<<<<<<< HEAD def test_get_vae_dtype_returns_float16_on_pre_ampere_cuda(self): ... - - def test_load_text_encoder_uses_cpu_safe_dtype_when_offloaded(self): - """It casts the text encoder to the CPU-safe dtype during CPU offload.""" - host = _Host(project_root="K:/fake_root", device="cuda") - host.offload_to_cpu = True - host.dtype = torch.bfloat16 - - class _FakeEncoder: - def __init__(self): - self.to_calls = [] - self.eval_called = False - - def to(self, value): - self.to_calls.append(value) - return self - - def eval(self): - self.eval_called = True - return self - - fake_encoder = _FakeEncoder() - fake_tokenizer = object() - fake_transformers = types.SimpleNamespace( - AutoModel=types.SimpleNamespace(from_pretrained=Mock(return_value=fake_encoder)), - AutoTokenizer=types.SimpleNamespace(from_pretrained=Mock(return_value=fake_tokenizer)), - ) - - with patch("os.path.exists", return_value=True): - with patch.dict("sys.modules", {"transformers": fake_transformers}): - result = host._load_text_encoder_and_tokenizer( - checkpoint_dir="K:/fake_root/checkpoints/acestep-v15-turbo", - device="cuda", - ) - - self.assertEqual( - result, - os.path.join("K:/fake_root/checkpoints/acestep-v15-turbo", "Qwen3-Embedding-0.6B"), - ) - self.assertIs(host.text_encoder, fake_encoder) - self.assertIs(host.text_tokenizer, fake_tokenizer) - self.assertEqual(fake_encoder.to_calls, ["cpu", torch.float32]) - self.assertTrue(fake_encoder.eval_called) - if __name__ == "__main__": unittest.main()🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@acestep/core/generation/handler/init_service_test.py` around lines 969 - 1071, Remove the leftover merge marker and duplicate test: delete the unresolved marker "<<<<<<< HEAD" and any conflict noise, keep a single definition of test_load_text_encoder_uses_cpu_safe_dtype_when_offloaded (the existing correct version that calls _Host._load_text_encoder_and_tokenizer and asserts fake_encoder.to_calls and eval_called), and remove the second, duplicated test block so there are no duplicate test definitions or merge markers left in the file.
🧹 Nitpick comments (1)
acestep/text_tasks/external_lm_captioning.py (1)
32-40: Avoid silent metadata parse failures; log invalid values for traceability.Lines 34-35 and 39-40 swallow invalid inputs without observability, which makes bad user metadata hard to diagnose.
Suggested low-noise logging for conversion failures
from __future__ import annotations from typing import Any +from loguru import logger @@ if user_metadata.get("bpm") not in (None, ""): try: plan.bpm = int(user_metadata["bpm"]) except (TypeError, ValueError): - pass + logger.debug("Ignoring invalid bpm override: {}", user_metadata.get("bpm")) if user_metadata.get("duration") not in (None, ""): try: plan.duration = float(user_metadata["duration"]) except (TypeError, ValueError): - pass + logger.debug("Ignoring invalid duration override: {}", user_metadata.get("duration"))As per coding guidelines: "Error handling: Log errors with
loguru.logger(notprint())."🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@acestep/text_tasks/external_lm_captioning.py` around lines 32 - 40, The try/except blocks around assigning plan.bpm and plan.duration silently swallow invalid user_metadata values; update the exception handlers in the code that sets plan.bpm and plan.duration to log conversion failures using loguru.logger (not print), including the offending value (user_metadata["bpm"] or user_metadata["duration"]) and the caught exception, so invalid inputs are traceable while still falling back to the existing pass/ignore behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@acestep/text_tasks/external_lm_captioning_test.py`:
- Around line 48-54: The test fixture constructs a SimpleNamespace named plan
with non-canonical field names (key_scale, time_signature) that differ from the
runtime metadata keys (keyscale, timesignature), masking real override bugs;
update the SimpleNamespace instantiations in this file (the plan declarations
around the shown diff and the second occurrence at lines ~69-71) to use the
canonical field names keyscale and timesignature so the test exercises the real
runtime keys used by the code.
In `@acestep/text_tasks/external_lm_captioning.py`:
- Around line 41-46: The metadata override is writing attributes with snake_case
variants (plan.key_scale, plan.time_signature) that many plan classes expect as
keyscale/timesignature, and may miss language on plan types that use a different
field; update the override logic to set both common variants for each field:
when user_metadata contains "keyscale" set both plan.keyscale and plan.key_scale
to the stripped string; when it contains "timesignature" set both
plan.timesignature and plan.time_signature; when it contains "language" set both
plan.vocal_language and plan.language (or any other language field used by the
plan type) so user constraints are applied regardless of the plan object's
attribute naming.
---
Outside diff comments:
In `@acestep/core/generation/handler/init_service_test.py`:
- Around line 969-1071: Remove the leftover merge marker and duplicate test:
delete the unresolved marker "<<<<<<< HEAD" and any conflict noise, keep a
single definition of test_load_text_encoder_uses_cpu_safe_dtype_when_offloaded
(the existing correct version that calls _Host._load_text_encoder_and_tokenizer
and asserts fake_encoder.to_calls and eval_called), and remove the second,
duplicated test block so there are no duplicate test definitions or merge
markers left in the file.
---
Nitpick comments:
In `@acestep/text_tasks/external_lm_captioning.py`:
- Around line 32-40: The try/except blocks around assigning plan.bpm and
plan.duration silently swallow invalid user_metadata values; update the
exception handlers in the code that sets plan.bpm and plan.duration to log
conversion failures using loguru.logger (not print), including the offending
value (user_metadata["bpm"] or user_metadata["duration"]) and the caught
exception, so invalid inputs are traceable while still falling back to the
existing pass/ignore behavior.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 0a8b1c74-b5eb-40ea-a2f8-f22500189c25
📒 Files selected for processing (3)
acestep/core/generation/handler/init_service_test.pyacestep/text_tasks/external_lm_captioning.pyacestep/text_tasks/external_lm_captioning_test.py
|
Reviewed the @CodeRabbit findings against the current branch. Applied in
Verified as not applicable to the current code:
Validation rerun: /media/stellaris/Linux AI/Acestep/acestep_git/ACE-Step-1.5/ACE-Step-1.5/.venv/bin/python -m unittest \
acestep.text_tasks.external_lm_captioning_testRan 4 tests, all passing. |
|
Tip For best results, initiate chat on the files or code changes.
All 4 tests passing ✅. ✏️ Learnings added
🧠 Learnings usedIf you found this review helpful, would you consider giving us a shout-out on X? Thank you for using CodeRabbit! |
e0297dc to
b11fd1a
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@acestep/text_tasks/external_lm_captioning.py`:
- Line 65: The code sets source = (caption or "music piece").strip().rstrip(".")
which can become empty when caption is whitespace; after computing source, add a
check like "if not source: source = 'music piece'" to ensure whitespace-only
captions fallback to "music piece" (refer to the variable source and the input
caption in external_lm_captioning.py).
- Around line 102-105: The loop that appends metadata to intent_parts (iterating
keys and using user_metadata.get(key)) doesn't normalize string values before
checking for "unknown", so variants like "Unknown" or " unknown " slip through;
update the logic in the loop that handles value (the value variable) to, when
it's a string, call strip() and lower() for normalization and then check against
(None, "", "unknown") before appending to intent_parts so non-string values are
preserved and only normalized empty/unknown strings are filtered out.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 4d264909-1347-4785-a2ed-2e14d11f41ac
📒 Files selected for processing (2)
acestep/text_tasks/external_lm_captioning.pyacestep/text_tasks/external_lm_captioning_test.py
✅ Files skipped from review due to trivial changes (1)
- acestep/text_tasks/external_lm_captioning_test.py
|
Addressed the two new Included updates:
|
There was a problem hiding this comment.
🧹 Nitpick comments (1)
acestep/text_tasks/external_lm_captioning.py (1)
77-84: Consider filtering "unknown" metadata values for consistency withbuild_format_request_intent.Lines 104-113 filter out
"unknown"(case-insensitive) before appending metadata to the intent string. If the same filtering were applied here, it would prevent awkward output like "The groove stays anchored around unknown BPM." in edge cases whereuser_metadata["bpm"]is literally"unknown".💡 Optional: add "unknown" filtering
- if bpm not in (None, ""): + if bpm not in (None, "") and str(bpm).strip().lower() != "unknown": parts.append(f"The groove stays anchored around {bpm} BPM.") - if timesignature: + if timesignature and str(timesignature).strip().lower() != "unknown": parts.append(f"The arrangement holds a {timesignature} pulse throughout.") - if keyscale: + if keyscale and str(keyscale).strip().lower() != "unknown": parts.append(f"The harmony centers on {keyscale}.") - if duration not in (None, ""): + if duration not in (None, "") and str(duration).strip().lower() != "unknown": parts.append(f"The structure is paced for roughly {duration} seconds.")🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@acestep/text_tasks/external_lm_captioning.py` around lines 77 - 84, The metadata append block in external_lm_captioning.py can emit sentences for literal "unknown" values (e.g., bpm, timesignature, keyscale, duration); mirror the filtering used by build_format_request_intent by treating any value equal to "unknown" (case-insensitive) as absent before appending to parts. Update the conditional checks around bpm, timesignature, keyscale, and duration to skip when the value is None, empty string, or when str(value).strip().lower() == "unknown", so the parts list only receives meaningful metadata.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@acestep/text_tasks/external_lm_captioning.py`:
- Around line 77-84: The metadata append block in external_lm_captioning.py can
emit sentences for literal "unknown" values (e.g., bpm, timesignature, keyscale,
duration); mirror the filtering used by build_format_request_intent by treating
any value equal to "unknown" (case-insensitive) as absent before appending to
parts. Update the conditional checks around bpm, timesignature, keyscale, and
duration to skip when the value is None, empty string, or when
str(value).strip().lower() == "unknown", so the parts list only receives
meaningful metadata.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: a0e77d79-f9f9-45de-8fde-f8669adec090
📒 Files selected for processing (2)
acestep/text_tasks/external_lm_captioning.pyacestep/text_tasks/external_lm_captioning_test.py
✅ Files skipped from review due to trivial changes (1)
- acestep/text_tasks/external_lm_captioning_test.py
77d7b34 to
8b7f62b
Compare
There was a problem hiding this comment.
🧹 Nitpick comments (2)
acestep/text_tasks/external_lm_captioning_test.py (1)
45-75: Add one negative-path test for invalid numeric overrides.Consider adding a case where
bpm/durationare invalid strings (e.g.,"abc"), asserting no exception and unchanged numeric fields. That would lock in the parse-failure branch behavior.Based on learnings: "PR Readiness: New/updated tests cover changed behavior and edge cases. No unrelated refactor/formatting churn."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@acestep/text_tasks/external_lm_captioning_test.py` around lines 45 - 75, Add a negative-path test in acestep/text_tasks/external_lm_captioning_test.py that exercises apply_user_metadata_overrides with invalid numeric overrides (e.g., user_metadata contains "bpm": "abc" and "duration": "xyz"), asserting that no exception is raised and that numeric fields on the returned plan (bpm and duration) remain unchanged from the original plan; reference the existing test_apply_user_metadata_overrides_preserves_constrained_values to mirror structure and assert other fields (keyscale/key_scale, timesignature/time_signature, language/vocal_language) still apply when valid.acestep/text_tasks/external_lm_captioning.py (1)
68-84: Normalize fallback metadata values before rendering clauses.
build_fallback_caption()currently inserts raw metadata forbpm/duration/keyscale/timesignature. Whitespace-padded or sentinel values (e.g." unknown "," ") can leak into user-facing text. Consider normalizing string inputs here the same way intent-building already does.💡 Proposed refactor
- bpm = user_metadata.get("bpm") - duration = user_metadata.get("duration") - keyscale = user_metadata.get("keyscale") - timesignature = user_metadata.get("timesignature") + bpm = user_metadata.get("bpm") + duration = user_metadata.get("duration") + keyscale = user_metadata.get("keyscale") + timesignature = user_metadata.get("timesignature") + + def _normalize(value: Any) -> Any: + if isinstance(value, str): + normalized = value.strip() + if normalized.lower() in {"", "unknown"}: + return None + return normalized + return value + + bpm = _normalize(bpm) + duration = _normalize(duration) + keyscale = _normalize(keyscale) + timesignature = _normalize(timesignature)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@acestep/text_tasks/external_lm_captioning.py` around lines 68 - 84, The build_fallback_caption() block currently uses raw user_metadata values (bpm, duration, keyscale, timesignature) and may render whitespace-only or sentinel strings; normalize and validate each value before appending clauses: trim strings and treat empty/whitespace or known sentinels (e.g. "unknown", "") as missing, and only append parts when the cleaned value is non-empty; update the variables bpm, duration, keyscale, timesignature at the top of the function (or introduce cleaned_bpm/cleaned_duration/etc.) and use those cleaned identifiers when building the parts list and formatting the clause strings.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@acestep/text_tasks/external_lm_captioning_test.py`:
- Around line 45-75: Add a negative-path test in
acestep/text_tasks/external_lm_captioning_test.py that exercises
apply_user_metadata_overrides with invalid numeric overrides (e.g.,
user_metadata contains "bpm": "abc" and "duration": "xyz"), asserting that no
exception is raised and that numeric fields on the returned plan (bpm and
duration) remain unchanged from the original plan; reference the existing
test_apply_user_metadata_overrides_preserves_constrained_values to mirror
structure and assert other fields (keyscale/key_scale,
timesignature/time_signature, language/vocal_language) still apply when valid.
In `@acestep/text_tasks/external_lm_captioning.py`:
- Around line 68-84: The build_fallback_caption() block currently uses raw
user_metadata values (bpm, duration, keyscale, timesignature) and may render
whitespace-only or sentinel strings; normalize and validate each value before
appending clauses: trim strings and treat empty/whitespace or known sentinels
(e.g. "unknown", "") as missing, and only append parts when the cleaned value is
non-empty; update the variables bpm, duration, keyscale, timesignature at the
top of the function (or introduce cleaned_bpm/cleaned_duration/etc.) and use
those cleaned identifiers when building the parts list and formatting the clause
strings.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 86d4e79e-bf51-4bad-b535-a3391fa03afc
📒 Files selected for processing (2)
acestep/text_tasks/external_lm_captioning.pyacestep/text_tasks/external_lm_captioning_test.py
Summary
This PR adds the small helper module used to build external-LM formatting intents, detect low-value echoed captions, and apply local fallback metadata shaping.
Scope
In scope:
Out of scope:
What Changed
external_lm_captioning.pyfor formatting-intent construction and local caption fallback helpersexternal_lm_captioning_test.pyto cover retry heuristics, metadata overrides, fallback captions, and format-intent assemblyTesting
Ran 4 tests, all passing.
Risk / Compatibility
This PR is based directly on
main. It stays inside a single helper module underacestep/text_tasksand does not introduce live provider requests or secret handling.Related Upstream Context
This PR is one small slice of the broader external-LM work that was originally bundled into #808.
Related upstream references:
This slice focuses only on captioning helper behavior and metadata handling. It does not add external provider transport, request execution, or UI wiring.
Summary by CodeRabbit