feat(text_tasks): add external LM captioning helpers by 1larity · Pull Request #882 · ace-step/ACE-Step-1.5

1larity · 2026-03-20T00:52:03Z

Summary

This PR adds the small helper module used to build external-LM formatting intents, detect low-value echoed captions, and apply local fallback metadata shaping.

Scope

In scope:

caption retry heuristics
local metadata override helpers
fallback caption generation
format-intent string construction
focused unit tests for the helper module

Out of scope:

runtime credential access
HTTP request execution
task orchestration
UI wiring
docs

What Changed

add external_lm_captioning.py for formatting-intent construction and local caption fallback helpers
add external_lm_captioning_test.py to cover retry heuristics, metadata overrides, fallback captions, and format-intent assembly

Testing

/media/stellaris/Linux AI/Acestep/acestep_git/ACE-Step-1.5/ACE-Step-1.5/.venv/bin/python -m unittest \
  acestep.text_tasks.external_lm_captioning_test

Ran 4 tests, all passing.

Risk / Compatibility

This PR is based directly on main. It stays inside a single helper module under acestep/text_tasks and does not introduce live provider requests or secret handling.

Related Upstream Context

This PR is one small slice of the broader external-LM work that was originally bundled into #808.

Related upstream references:

#717 Running LM models via ollama
#808 Legacy GPU stability, external AI generation, and better progress UX
#677 "Enhance Caption" is worse than useless!
#681 "AI Auto labelling" hallucinates lyrics
#472 lora auto-label metadata is horrible

This slice focuses only on captioning helper behavior and metadata handling. It does not add external provider transport, request execution, or UI wiring.

Summary by CodeRabbit

New Features
- Caption-processing utilities: caption normalization, retry heuristics, user-metadata overrides (with value coercion and alias synchronization), fallback caption generation, and assembling formatted requests that include caption, lyrics, and relevant metadata.
Tests
- Added tests validating retry behavior, metadata override/alias propagation, fallback caption content, and formatted-request output (including omission of empty/unknown metadata).

coderabbitai · 2026-03-20T00:52:21Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Walkthrough

Adds a new helper module for external LM captioning that provides caption normalization, retry detection, user metadata override/coercion and aliasing, fallback caption construction, and formatted request-intent assembly; includes unit tests exercising these helpers.

Changes

Cohort / File(s)	Summary
External LM Captioning Module `acestep/text_tasks/external_lm_captioning.py`	New module adding: `normalized_caption(text)`, `caption_needs_retry(original_caption, generated_caption)`, `apply_user_metadata_overrides(plan, user_metadata)` (with type coercion and alias propagation), `build_fallback_caption(caption, user_metadata)`, and `build_format_request_intent(caption, lyrics, user_metadata)`; implements caption canonicalization, retry heuristics, metadata sanitization, and request-building utilities.
Unit Tests `acestep/text_tasks/external_lm_captioning_test.py`	New unittest suite validating retry logic, metadata override and alias propagation, fallback caption composition for empty vs. populated inputs, and formatted request intent structure including inclusion/omission rules for metadata values.
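The thread does not show how the retry heuristic is implemented, so the following is only a rough sketch of what "detecting an echoed caption" could look like. The function names `canonical_caption` and `looks_like_echo` are hypothetical stand-ins, not the PR's `normalized_caption` / `caption_needs_retry`, and the normalization rules (lowercasing, dropping punctuation, collapsing whitespace) are assumptions.

```python
import re


def canonical_caption(text):
    """Assumed canonical form: lowercase, punctuation removed, whitespace collapsed."""
    lowered = (text or "").lower()
    no_punct = re.sub(r"[^\w\s]", " ", lowered)
    return " ".join(no_punct.split())


def looks_like_echo(original_caption, generated_caption):
    """Return True when the generated caption is empty or merely repeats the input."""
    generated = canonical_caption(generated_caption)
    if not generated:
        # Nothing usable came back, so a retry is warranted.
        return True
    # An echo differs from the input only in case, punctuation, or spacing.
    return generated == canonical_caption(original_caption)
```

Under these assumptions, `looks_like_echo("Dreamy synth pop", "dreamy synth-pop.")` is treated as an echo, while a caption that adds new descriptive words is not.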

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Suggested reviewers

ChuxiJ

Poem

🐰 A rabbit taps keys, neat and quick,

trimming captions with a soft, calm flick.
Metadata corners gently aligned,
fallback lines stitched, all neatly defined.
Hooray — tidy captions hop off the stick!

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'feat(text_tasks): add external LM captioning helpers' directly and clearly describes the main change: adding new helper functions for external language-model captioning workflows to the text_tasks module.
Docstring Coverage	✅ Passed	Docstring coverage is 83.33% which is sufficient. The required threshold is 80.00%.

coderabbitai

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

acestep/core/generation/handler/init_service_test.py (1)

969-1071: ⚠️ Potential issue | 🔴 Critical

Unresolved merge conflict and duplicate test must be removed.

The file contains an unresolved merge conflict marker at line 969 (<<<<<<< HEAD), which is invalid Python syntax and will cause the test suite to fail immediately. Additionally, test_load_text_encoder_uses_cpu_safe_dtype_when_offloaded is defined twice (lines 989–1029 and lines 1031–1071); in Python the second definition shadows the first, so only one copy would actually run.

Please resolve the merge conflict by:

Removing the <<<<<<< HEAD marker at line 969.
Keeping only one copy of the test method (the original at lines 989–1029 appears correct).
Removing the duplicate block at lines 1031–1071.
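The shadowing behavior mentioned above can be reproduced in isolation. This is a minimal illustration with a made-up `ShadowingExample` class, not code from `init_service_test.py`: when a class body defines the same method name twice, only the later definition remains, so `unittest` collects a single test.

```python
import unittest


class ShadowingExample(unittest.TestCase):
    def test_value(self):
        # First definition: replaced as soon as the class body reaches the second one.
        self.fail("never runs")

    def test_value(self):  # noqa: F811 - intentional redefinition for illustration
        # Second definition: the only one left in the class namespace.
        self.assertTrue(True)


def collected_test_names(case):
    """Return the test method names unittest would collect for a TestCase class."""
    return list(unittest.TestLoader().getTestCaseNames(case))
```

Running the collected suite passes, because the failing first definition no longer exists on the class.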

Proposed fix

-<<<<<<< HEAD
 def test_get_vae_dtype_returns_float16_on_pre_ampere_cuda(self):
     ...
-
-    def test_load_text_encoder_uses_cpu_safe_dtype_when_offloaded(self):
-        """It casts the text encoder to the CPU-safe dtype during CPU offload."""
-        host = _Host(project_root="K:/fake_root", device="cuda")
-        host.offload_to_cpu = True
-        host.dtype = torch.bfloat16
-
-        class _FakeEncoder:
-            def __init__(self):
-                self.to_calls = []
-                self.eval_called = False
-
-            def to(self, value):
-                self.to_calls.append(value)
-                return self
-
-            def eval(self):
-                self.eval_called = True
-                return self
-
-        fake_encoder = _FakeEncoder()
-        fake_tokenizer = object()
-        fake_transformers = types.SimpleNamespace(
-            AutoModel=types.SimpleNamespace(from_pretrained=Mock(return_value=fake_encoder)),
-            AutoTokenizer=types.SimpleNamespace(from_pretrained=Mock(return_value=fake_tokenizer)),
-        )
-
-        with patch("os.path.exists", return_value=True):
-            with patch.dict("sys.modules", {"transformers": fake_transformers}):
-                result = host._load_text_encoder_and_tokenizer(
-                    checkpoint_dir="K:/fake_root/checkpoints/acestep-v15-turbo",
-                    device="cuda",
-                )
-
-        self.assertEqual(
-            result,
-            os.path.join("K:/fake_root/checkpoints/acestep-v15-turbo", "Qwen3-Embedding-0.6B"),
-        )
-        self.assertIs(host.text_encoder, fake_encoder)
-        self.assertIs(host.text_tokenizer, fake_tokenizer)
-        self.assertEqual(fake_encoder.to_calls, ["cpu", torch.float32])
-        self.assertTrue(fake_encoder.eval_called)
-
 if __name__ == "__main__":
     unittest.main()

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@acestep/core/generation/handler/init_service_test.py` around lines 969 -
1071, Remove the leftover merge marker and duplicate test: delete the unresolved
marker "<<<<<<< HEAD" and any conflict noise, keep a single definition of
test_load_text_encoder_uses_cpu_safe_dtype_when_offloaded (the existing correct
version that calls _Host._load_text_encoder_and_tokenizer and asserts
fake_encoder.to_calls and eval_called), and remove the second, duplicated test
block so there are no duplicate test definitions or merge markers left in the
file.

🧹 Nitpick comments (1)

acestep/text_tasks/external_lm_captioning.py (1)

32-40: Avoid silent metadata parse failures; log invalid values for traceability.

Lines 34-35 and 39-40 swallow invalid inputs without observability, which makes bad user metadata hard to diagnose.

Suggested low-noise logging for conversion failures

 from __future__ import annotations
 
 from typing import Any
+from loguru import logger
@@
     if user_metadata.get("bpm") not in (None, ""):
         try:
             plan.bpm = int(user_metadata["bpm"])
         except (TypeError, ValueError):
-            pass
+            logger.debug("Ignoring invalid bpm override: {}", user_metadata.get("bpm"))
     if user_metadata.get("duration") not in (None, ""):
         try:
             plan.duration = float(user_metadata["duration"])
         except (TypeError, ValueError):
-            pass
+            logger.debug("Ignoring invalid duration override: {}", user_metadata.get("duration"))

As per coding guidelines: "Error handling: Log errors with loguru.logger (not print())."

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@acestep/text_tasks/external_lm_captioning.py` around lines 32 - 40, The
try/except blocks around assigning plan.bpm and plan.duration silently swallow
invalid user_metadata values; update the exception handlers in the code that
sets plan.bpm and plan.duration to log conversion failures using loguru.logger
(not print), including the offending value (user_metadata["bpm"] or
user_metadata["duration"]) and the caught exception, so invalid inputs are
traceable while still falling back to the existing pass/ignore behavior.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@acestep/text_tasks/external_lm_captioning_test.py`:
- Around line 48-54: The test fixture constructs a SimpleNamespace named plan
with non-canonical field names (key_scale, time_signature) that differ from the
runtime metadata keys (keyscale, timesignature), masking real override bugs;
update the SimpleNamespace instantiations in this file (the plan declarations
around the shown diff and the second occurrence at lines ~69-71) to use the
canonical field names keyscale and timesignature so the test exercises the real
runtime keys used by the code.

In `@acestep/text_tasks/external_lm_captioning.py`:
- Around line 41-46: The metadata override is writing attributes with snake_case
variants (plan.key_scale, plan.time_signature) that many plan classes expect as
keyscale/timesignature, and may miss language on plan types that use a different
field; update the override logic to set both common variants for each field:
when user_metadata contains "keyscale" set both plan.keyscale and plan.key_scale
to the stripped string; when it contains "timesignature" set both
plan.timesignature and plan.time_signature; when it contains "language" set both
plan.vocal_language and plan.language (or any other language field used by the
plan type) so user constraints are applied regardless of the plan object's
attribute naming.
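One way to apply the dual-write suggestion is a small alias table, sketched below. The helper name `apply_alias_overrides` and the `_ALIASES` mapping are hypothetical; only the attribute pairs (`keyscale`/`key_scale`, `timesignature`/`time_signature`, `language`/`vocal_language`) come from the review comment, and the actual change in the PR may be structured differently.

```python
from types import SimpleNamespace
from typing import Any, Mapping

# Hypothetical table: metadata key -> plan attributes that should all receive the value.
_ALIASES = {
    "keyscale": ("keyscale", "key_scale"),
    "timesignature": ("timesignature", "time_signature"),
    "language": ("language", "vocal_language"),
}


def apply_alias_overrides(plan: Any, user_metadata: Mapping[str, Any]) -> Any:
    """Write each supplied string override to every attribute variant on the plan."""
    for key, attrs in _ALIASES.items():
        value = user_metadata.get(key)
        if value in (None, ""):
            continue  # nothing supplied for this key
        text = str(value).strip()
        if not text:
            continue  # whitespace-only values are treated as absent
        for attr in attrs:
            setattr(plan, attr, text)
    return plan


# Usage: a bare namespace stands in for whatever plan type the runtime uses.
example_plan = apply_alias_overrides(SimpleNamespace(), {"keyscale": " C major "})
```

Keeping the pairs in one table means a new alias only needs one added entry, at the cost of setting attributes a given plan type may never read.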

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 0a8b1c74-b5eb-40ea-a2f8-f22500189c25

📥 Commits

Reviewing files that changed from the base of the PR and between 89d5379 and 48a3a0c.

📒 Files selected for processing (3)

acestep/core/generation/handler/init_service_test.py
acestep/text_tasks/external_lm_captioning.py
acestep/text_tasks/external_lm_captioning_test.py

1larity · 2026-03-20T01:00:47Z

Reviewed the @CodeRabbit findings against the current branch.

Applied in e0297dc:

fixed apply_user_metadata_overrides() so it writes both field-name variants for key scale, time signature, and language
updated the captioning test fixture to use the canonical runtime keys instead of the alias-only names
added loguru.logger.debug() logging for ignored invalid bpm / duration metadata conversions

Verified as not applicable to the current code:

the init_service_test.py merge-marker report does not match the current tree; the file is 758 lines long, contains no <<<<<<< markers, and has a single test_load_text_encoder_uses_cpu_safe_dtype_when_offloaded definition at line 715

Validation rerun:

/media/stellaris/Linux AI/Acestep/acestep_git/ACE-Step-1.5/ACE-Step-1.5/.venv/bin/python -m unittest \
  acestep.text_tasks.external_lm_captioning_test

Ran 4 tests, all passing.

coderabbitai · 2026-03-20T01:04:24Z

@1larity: Thank you for the thorough follow-up. Everything looks good:

apply_user_metadata_overrides() dual-field writes — confirmed addressed; both canonical and alias field names (e.g., keyscale/key_scale, timesignature/time_signature, language/vocal_language) are now written together.
Test fixture canonical keys — confirmed updated to match the runtime plan schema.
loguru.logger.debug() on conversion errors — confirmed added for the bpm/duration parse failures.
init_service_test.py merge-marker report — acknowledged as a diff-artifact false positive; the working tree is clean (758 lines, no conflict markers, single definition at line 715).

All 4 tests passing ✅.

✏️ Learnings added

Learnt from: 1larity
URL: https://github.com/ace-step/ACE-Step-1.5/pull/882

Timestamp: 2026-03-20T01:04:19.198Z
Learning: In `acestep/text_tasks/external_lm_captioning.py`, `apply_user_metadata_overrides()` intentionally writes both the canonical field name and its runtime alias in a single pass (e.g., `plan.keyscale` + `plan.key_scale`, `plan.timesignature` + `plan.time_signature`, `plan.language` + `plan.vocal_language`). Invalid `bpm`/`duration` conversion errors are swallowed and logged via `loguru.logger.debug()`. Do not flag either of these patterns as bugs in future reviews.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

🧠 Learnings used

Learnt from: 1larity
Repo: ace-step/ACE-Step-1.5 PR: 808
File: acestep/inference.py:25-26
Timestamp: 2026-03-11T10:56:30.827Z
Learning: In ace-step/ACE-Step-1.5, acestep/inference.py’s lyric-density guard now uses a Unicode-aware tokenizer (_extract_lyric_word_tokens with str.isalnum()) and the counting helpers (_count_lyric_words/_count_lyric_syllables) ignore tag lines. A non‑Latin regression test exists, so do not re‑flag ASCII-only tokenization for this path in future reviews.

Learnt from: 1larity
Repo: ace-step/ACE-Step-1.5 PR: 774
File: acestep/ui/gradio/interfaces/training_dataset_tab_label_preview.py:70-95
Timestamp: 2026-03-05T23:12:28.095Z
Learning: In ace-step/ACE-Step-1.5, the hardcoded English placeholder strings in `acestep/ui/gradio/interfaces/training_dataset_tab_label_preview.py` (e.g., `"Music description..."` for `edit_caption`, `"pop, electronic, dance..."` for `edit_genre`, `Dropdown` choices `["Use Global Ratio", "Caption", "Genre"]` and its default value, and the lyrics placeholder `"[Verse 1]\nLyrics here...\n\n[Chorus]\n..."`) are pre-existing localization gaps carried forward from before the decomposition refactor. They are deferred to a follow-up PR and should not be flagged as new issues introduced by decomposition PRs.

Learnt from: 1larity
Repo: ace-step/ACE-Step-1.5 PR: 774
File: acestep/ui/gradio/interfaces/training_dataset_tab_scan_settings.py:59-95
Timestamp: 2026-03-05T23:12:37.350Z
Learning: In `acestep/ui/gradio/interfaces/training_dataset_tab_scan_settings.py`, the `gr.Dataframe` headers (`["#", "Filename", "Duration", "Lyrics", "Labeled", "BPM", "Key", "Caption"]`) and the `format_lyrics` / `transcribe_lyrics` checkbox label/info strings are intentionally hardcoded in English as a pre-existing issue, not introduced by this PR. Do not flag these as new i18n violations during code review.

Learnt from: 1larity
Repo: ace-step/ACE-Step-1.5 PR: 837
File: acestep/core/generation/handler/init_service_loader_components.py:44-47
Timestamp: 2026-03-14T10:35:13.950Z
Learning: In `acestep/core/generation/handler/init_service_loader.py` (origin/main, pre-extraction), `_load_text_encoder_and_tokenizer()` already used `self.text_encoder.to("cpu").to(self.dtype)` in the CPU-offload branch — using `self.dtype` (which can be bfloat16) instead of `torch.float32`. This is a pre-existing behavior, not introduced by PR `#837`. PR `#837` moves it unchanged into `init_service_loader_components.py` and intends to fix it there. Do not flag the original `init_service_loader.py` pattern as a new regression in future reviews.

Learnt from: CR
Repo: ace-step/ACE-Step-1.5 PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-02-28T14:14:55.765Z
Learning: AI-Agent Workflow: Add/update focused tests. Self-review only changed hunks for regressions and scope creep. Summarize risk, validation, and non-target impact in PR notes.

Learnt from: 1larity
Repo: ace-step/ACE-Step-1.5 PR: 811
File: docs/en/GRADIO_GUIDE.md:0-0
Timestamp: 2026-03-11T12:03:00.308Z
Learning: In ace-step/ACE-Step-1.5 GRADIO_GUIDE.md, the "How It Integrates With 5Hz LM" section explicitly notes that when external mode is active, some LM-assisted workflows (notably Create Sample) still check `llm_handler.llm_initialized` in both the Gradio handler path and `create_sample()`, and therefore may still require local 5Hz LM initialization until the remaining routing is completed. The doc describes the external-mode paths as "intended behavior" rather than currently fully functional. Do not flag this remaining initialization requirement as a new bug in documentation reviews.

Learnt from: 1larity
Repo: ace-step/ACE-Step-1.5 PR: 774
File: acestep/ui/gradio/interfaces/training.py:40-47
Timestamp: 2026-03-05T23:13:27.540Z
Learning: In `acestep/ui/gradio/interfaces/training.py`, the hardcoded English strings in the top-level `gr.HTML(...)` header block (i.e., `<h2>🎵 LoRA Training for ACE-Step</h2>` and its `<p>` subtitle) are a pre-existing localization gap carried forward unchanged from before the decomposition refactor. They are out of scope for decomposition PRs and should not be flagged as new i18n violations during code review.

Learnt from: 1larity
Repo: ace-step/ACE-Step-1.5 PR: 774
File: acestep/ui/gradio/interfaces/training_lora_tab_run_export.py:91-95
Timestamp: 2026-03-05T23:13:00.508Z
Learning: In `acestep/ui/gradio/interfaces/training_lora_tab_run_export.py`, the hardcoded English strings for `resume_checkpoint_dir` (`label="Resume Checkpoint"` and `info="Directory of a saved LoRA checkpoint to resume from"`) are a pre-existing localization gap carried forward and are out of scope for decomposition PRs. Do not flag these as new i18n violations during code review.

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@acestep/text_tasks/external_lm_captioning.py`:
- Line 65: The code sets source = (caption or "music piece").strip().rstrip(".")
which can become empty when caption is whitespace; after computing source, add a
check like "if not source: source = 'music piece'" to ensure whitespace-only
captions fallback to "music piece" (refer to the variable source and the input
caption in external_lm_captioning.py).
- Around line 102-105: The loop that appends metadata to intent_parts (iterating
keys and using user_metadata.get(key)) doesn't normalize string values before
checking for "unknown", so variants like "Unknown" or " unknown " slip through;
update the logic in the loop that handles value (the value variable) to, when
it's a string, call strip() and lower() for normalization and then check against
(None, "", "unknown") before appending to intent_parts so non-string values are
preserved and only normalized empty/unknown strings are filtered out.
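Both findings come down to normalizing before testing for emptiness. The sketch below shows that idea under stated assumptions: the helper names `fallback_source`, `usable_metadata_value`, and `metadata_lines` are invented for illustration, and the `key: value` output format is a placeholder, since the thread does not show how `intent_parts` entries are actually formatted.

```python
from typing import Any, Optional


def fallback_source(caption: Optional[str]) -> str:
    """Strip the caption and fall back to 'music piece' if nothing is left."""
    source = (caption or "music piece").strip().rstrip(".")
    return source or "music piece"


def usable_metadata_value(value: Any) -> Optional[Any]:
    """Return a cleaned value, or None when a string is empty or 'unknown'."""
    if isinstance(value, str):
        cleaned = value.strip()
        if cleaned.lower() in ("", "unknown"):
            return None
        return cleaned
    return value  # non-string values (e.g. an int bpm) pass through unchanged


def metadata_lines(user_metadata: dict, keys=("bpm", "duration", "keyscale", "timesignature")) -> list:
    """Collect one placeholder-formatted line per meaningful metadata value."""
    lines = []
    for key in keys:
        value = usable_metadata_value(user_metadata.get(key))
        if value is None:
            continue
        lines.append(f"{key}: {value}")
    return lines
```

With this shape, `" Unknown "` and `"unknown"` are filtered the same way, and a numeric `bpm` is never coerced to a string before the check.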

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 4d264909-1347-4785-a2ed-2e14d11f41ac

📥 Commits

Reviewing files that changed from the base of the PR and between e0297dc and b11fd1a.

📒 Files selected for processing (2)

acestep/text_tasks/external_lm_captioning.py
acestep/text_tasks/external_lm_captioning_test.py

✅ Files skipped from review due to trivial changes (1)

acestep/text_tasks/external_lm_captioning_test.py

1larity · 2026-03-20T11:49:49Z

Addressed the two new #882 review findings in 0e00321.

Included updates:

whitespace-only captions now fall back to "music piece" in build_fallback_caption()
string metadata values are normalized before the unknown filter in build_format_request_intent()
added regression coverage for both behaviors in external_lm_captioning_test.py

Validation:

python -m unittest acestep.text_tasks.external_lm_captioning_test

6 tests passed.

coderabbitai

🧹 Nitpick comments (1)

acestep/text_tasks/external_lm_captioning.py (1)

77-84: Consider filtering "unknown" metadata values for consistency with build_format_request_intent.

Lines 104-113 filter out "unknown" (case-insensitive) before appending metadata to the intent string. If the same filtering were applied here, it would prevent awkward output like "The groove stays anchored around unknown BPM." in edge cases where user_metadata["bpm"] is literally "unknown".

💡 Optional: add "unknown" filtering

-    if bpm not in (None, ""):
+    if bpm not in (None, "") and str(bpm).strip().lower() != "unknown":
         parts.append(f"The groove stays anchored around {bpm} BPM.")
-    if timesignature:
+    if timesignature and str(timesignature).strip().lower() != "unknown":
         parts.append(f"The arrangement holds a {timesignature} pulse throughout.")
-    if keyscale:
+    if keyscale and str(keyscale).strip().lower() != "unknown":
         parts.append(f"The harmony centers on {keyscale}.")
-    if duration not in (None, ""):
+    if duration not in (None, "") and str(duration).strip().lower() != "unknown":
         parts.append(f"The structure is paced for roughly {duration} seconds.")

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@acestep/text_tasks/external_lm_captioning.py` around lines 77 - 84, The
metadata append block in external_lm_captioning.py can emit sentences for
literal "unknown" values (e.g., bpm, timesignature, keyscale, duration); mirror
the filtering used by build_format_request_intent by treating any value equal to
"unknown" (case-insensitive) as absent before appending to parts. Update the
conditional checks around bpm, timesignature, keyscale, and duration to skip
when the value is None, empty string, or when str(value).strip().lower() ==
"unknown", so the parts list only receives meaningful metadata.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: a0e77d79-f9f9-45de-8fde-f8669adec090

📥 Commits

Reviewing files that changed from the base of the PR and between b11fd1a and 0e00321.

📒 Files selected for processing (2)

acestep/text_tasks/external_lm_captioning.py
acestep/text_tasks/external_lm_captioning_test.py

✅ Files skipped from review due to trivial changes (1)

acestep/text_tasks/external_lm_captioning_test.py

coderabbitai

🧹 Nitpick comments (2)

acestep/text_tasks/external_lm_captioning_test.py (1)

45-75: Add one negative-path test for invalid numeric overrides.

Consider adding a case where bpm/duration are invalid strings (e.g., "abc"), asserting no exception and unchanged numeric fields. That would lock in the parse-failure branch behavior.

Based on learnings: "PR Readiness: New/updated tests cover changed behavior and edge cases. No unrelated refactor/formatting churn."

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@acestep/text_tasks/external_lm_captioning_test.py` around lines 45 - 75, Add
a negative-path test in acestep/text_tasks/external_lm_captioning_test.py that
exercises apply_user_metadata_overrides with invalid numeric overrides (e.g.,
user_metadata contains "bpm": "abc" and "duration": "xyz"), asserting that no
exception is raised and that numeric fields on the returned plan (bpm and
duration) remain unchanged from the original plan; reference the existing
test_apply_user_metadata_overrides_preserves_constrained_values to mirror
structure and assert other fields (keyscale/key_scale,
timesignature/time_signature, language/vocal_language) still apply when valid.
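A negative-path test along these lines might look like the sketch below. To stay self-contained it tests `apply_numeric_overrides`, a hypothetical stand-in that mirrors only the `bpm`/`duration` parsing shown in the earlier review diff; a real test would import `apply_user_metadata_overrides` from `acestep.text_tasks.external_lm_captioning` instead.

```python
import unittest
from types import SimpleNamespace


def apply_numeric_overrides(plan, user_metadata):
    """Stand-in for the numeric branch of apply_user_metadata_overrides()."""
    for field, cast in (("bpm", int), ("duration", float)):
        raw = user_metadata.get(field)
        if raw in (None, ""):
            continue
        try:
            setattr(plan, field, cast(raw))
        except (TypeError, ValueError):
            # The PR logs this case at debug level; the stand-in just ignores it.
            pass
    return plan


class InvalidNumericOverrideTest(unittest.TestCase):
    def test_invalid_values_leave_plan_unchanged(self):
        """Unparseable bpm/duration must not raise and must not alter the plan."""
        plan = SimpleNamespace(bpm=96, duration=180.0)
        result = apply_numeric_overrides(plan, {"bpm": "abc", "duration": "xyz"})
        self.assertEqual(result.bpm, 96)
        self.assertEqual(result.duration, 180.0)
```

The assertions pin the "ignore and keep the existing value" behavior, so a later change that raises or resets the field to a default would fail the test.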

acestep/text_tasks/external_lm_captioning.py (1)

68-84: Normalize fallback metadata values before rendering clauses.

build_fallback_caption() currently inserts raw metadata for bpm/duration/keyscale/timesignature. Whitespace-padded or sentinel values (e.g. " unknown ", " ") can leak into user-facing text. Consider normalizing string inputs here the same way intent-building already does.

💡 Proposed refactor

-    bpm = user_metadata.get("bpm")
-    duration = user_metadata.get("duration")
-    keyscale = user_metadata.get("keyscale")
-    timesignature = user_metadata.get("timesignature")
+    bpm = user_metadata.get("bpm")
+    duration = user_metadata.get("duration")
+    keyscale = user_metadata.get("keyscale")
+    timesignature = user_metadata.get("timesignature")
+
+    def _normalize(value: Any) -> Any:
+        if isinstance(value, str):
+            normalized = value.strip()
+            if normalized.lower() in {"", "unknown"}:
+                return None
+            return normalized
+        return value
+
+    bpm = _normalize(bpm)
+    duration = _normalize(duration)
+    keyscale = _normalize(keyscale)
+    timesignature = _normalize(timesignature)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@acestep/text_tasks/external_lm_captioning.py` around lines 68 - 84, The
build_fallback_caption() block currently uses raw user_metadata values (bpm,
duration, keyscale, timesignature) and may render whitespace-only or sentinel
strings; normalize and validate each value before appending clauses: trim
strings and treat empty/whitespace or known sentinels (e.g. "unknown", "") as
missing, and only append parts when the cleaned value is non-empty; update the
variables bpm, duration, keyscale, timesignature at the top of the function (or
introduce cleaned_bpm/cleaned_duration/etc.) and use those cleaned identifiers
when building the parts list and formatting the clause strings.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 86d4e79e-bf51-4bad-b535-a3391fa03afc

📥 Commits

Reviewing files that changed from the base of the PR and between 0e00321 and 8b7f62b.

📒 Files selected for processing (2)

acestep/text_tasks/external_lm_captioning.py
acestep/text_tasks/external_lm_captioning_test.py

ChuxiJ mentioned this pull request Mar 20, 2026

feat(text_tasks): add external AI helper modules #884

Open

1larity force-pushed the split/external-lm-captioning-main branch from e0297dc to b11fd1a Compare March 20, 2026 11:42

1larity mentioned this pull request Mar 20, 2026

Feature Spotlight: External AI Integration, Legacy GPU Support & Better Generation UX #914

Open

1larity added 3 commits March 20, 2026 19:08

feat(text_tasks): add external LM captioning helpers

9abeb0f

fix(text_tasks): harden external LM captioning overrides

d30560e

fix(text_tasks): harden external LM caption fallback

8b7f62b

1larity force-pushed the split/external-lm-captioning-main branch from 77d7b34 to 8b7f62b Compare March 20, 2026 19:09

Merge branch 'main' into split/external-lm-captioning-main

14baf63
