Conversation
Replace the implicit torchaudio/ffmpeg MP3 export path with an explicit ffmpeg/libmp3lame encode step that forces 320 kbps CBR at 44.1 kHz. Why: - The previous MP3 export path did not set an explicit bitrate, so the final output depended on backend defaults and could end up around 128 kbps. - 128 kbps MP3 is a major audible bottleneck for music generation quality. - 44.1 kHz is the native and more appropriate distribution target for music. It fully covers the audible band and avoids spending bitrate budget on ultrasonic bandwidth that is not perceptually useful in MP3 delivery. - 48 kHz is mainly beneficial for video, broadcast, and production pipelines, but at a fixed lossy bitrate it does not provide a meaningful quality advantage for music distribution and can be slightly less efficient than 44.1 kHz. - For MP3 specifically, raising bitrate from 128 kbps to 320 kbps yields a much larger real-world quality improvement than keeping a 48 kHz sample rate. What changed: - Resample MP3 output to 44.1 kHz before encoding. - Export MP3 through an explicit ffmpeg subprocess using libmp3lame. - Force 320 kbps CBR instead of relying on backend defaults. - Keep the existing save path for other formats unchanged.
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughReplaces a minimal audio utility with a full AudioSaver implementation (multi-format saving including MP3 with ffmpeg, WAV32 fallback), adds helpers (apply_fade, normalize_audio, hashing/UUID), batch/convert utilities, MP3-focused tests, and forwards mp3 export options through generation flow. Changes
Sequence Diagram(s)sequenceDiagram
participant Orchestrator as "generate_music (inference)"
participant DiT as "dit_handler.generate_music"
participant AudioSaver as "AudioSaver.save_audio"
participant Backend as "torchaudio / soundfile"
participant FFmpeg as "ffmpeg subprocess"
participant FS as "FileSystem"
Orchestrator->>Orchestrator: assemble generation params (incl. mp3_bitrate/mp3_sample_rate)
Orchestrator->>DiT: call generate_music(filtered_kwargs)
DiT-->>Orchestrator: returns audio tensor
Orchestrator->>AudioSaver: save_audio(audio_tensor, format, mp3_* settings)
AudioSaver->>Backend: prepare/write intermediate WAV or direct write
alt format == "mp3"
AudioSaver->>FFmpeg: run ffmpeg (libmp3lame, bitrate, sample_rate)
FFmpeg->>FS: write .mp3
else
Backend->>FS: write target audio file
end
FS-->>AudioSaver: saved path
AudioSaver-->>Orchestrator: saved path / status
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment Tip CodeRabbit can suggest fixes for GitHub Check annotations.Configure the |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
acestep/audio_utils.py (1)
189-201: Make the MP3 override visible in the public API docs.
save_audiostill reads as if the caller-controlledsample_rateis always honored, but the MP3 branch now always emits 44.1 kHz/320 kbps and can raiseRuntimeErrorwhenffmpegis missing or the encode fails. Please capture that in the method and wrapper docstrings so callers do not have to learn it from implementation details.As per coding guidelines, "Docstrings are mandatory for all new/modified modules, classes, and functions. Docstrings must be concise and include purpose plus key inputs/outputs and raised exceptions when relevant."
Also applies to: 555-567
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@acestep/audio_utils.py` around lines 189 - 201, Update the save_audio docstring (and the corresponding public wrapper docstrings noted around the later section) to clearly state that when format == 'mp3' the function overrides the requested sample_rate and forces encoding at 44.1 kHz and 320 kbps, and that this branch may raise RuntimeError if ffmpeg is not available or the MP3 encode fails; mention the inputs/outputs as already present and add the raised exceptions line for clarity. Reference the save_audio function name in the docs so callers see the MP3 behavior and error conditions in the public API documentation.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@acestep/audio_utils.py`:
- Around line 1-571: This file is too large and mixes DSP helpers, I/O,
batch/export, and hashing/UUID utilities; split it into focused modules (e.g.,
audio_dsp.py for apply_fade/normalize_audio, audio_io.py for
AudioSaver/_save_mp3/save_audio/save_batch/convert_audio, and audio_hash.py for
get_lora_weights_hash/get_audio_file_hash/generate_uuid_from_params/generate_uuid_from_audio_data)
and keep the original top-level module as a thin facade that re-exports the key
symbols (AudioSaver, save_audio, apply_fade, normalize_audio,
get_lora_weights_hash, generate_uuid_from_params, generate_uuid_from_audio_data)
so existing imports remain stable; update any internal imports to reference the
new modules and ensure the global _default_saver instance is preserved in the
facade and that AudioSaver._save_mp3 logic is moved alongside AudioSaver in
audio_io.py.
- Around line 245-313: The outer except currently catches errors from
_save_mp3() and other ffmpeg-backed saves and then attempts a soundfile
fallback, but ends up masking the original ffmpeg RuntimeError; change the logic
so that if the failing format is an ffmpeg-backed format (e.g., "mp3", "opus",
"aac" or when _save_mp3() is used) you immediately re-raise the original
exception instead of trying the soundfile fallback, whereas for PCM containers
("wav", "flac", "wav32") keep the current soundfile fallback path; locate
references to _save_mp3, the format variable checks, and the outer except block
to implement conditional re-raise for ffmpeg formats and only perform the
fallback for PCM formats.
---
Nitpick comments:
In `@acestep/audio_utils.py`:
- Around line 189-201: Update the save_audio docstring (and the corresponding
public wrapper docstrings noted around the later section) to clearly state that
when format == 'mp3' the function overrides the requested sample_rate and forces
encoding at 44.1 kHz and 320 kbps, and that this branch may raise RuntimeError
if ffmpeg is not available or the MP3 encode fails; mention the inputs/outputs
as already present and add the raised exceptions line for clarity. Reference the
save_audio function name in the docs so callers see the MP3 behavior and error
conditions in the public API documentation.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 6016d9b1-7de7-4874-b18c-58401b2921db
📒 Files selected for processing (2)
acestep/audio_utils.pyacestep/audio_utils_test.py
…ormats This follow-up fix corrects the MP3 export error handling introduced in the previous patch. Changes: - preserve the original ffmpeg error for MP3 export failures - prevent the generic `soundfile` fallback from overriding explicit MP3 failures - keep fallback behavior only for PCM/container formats where it actually makes sense - add/update tests to verify that MP3 export errors are re-raised correctly Why: The previous MP3 patch correctly forced `44.1 kHz / 320 kbps CBR`, but its outer exception handling could still catch the explicit MP3 failure and retry through a generic fallback path. That behavior risked hiding the real ffmpeg/libmp3lame error and made debugging harder. This follow-up ensures compressed ffmpeg-backed formats fail loudly and correctly.
There was a problem hiding this comment.
Actionable comments posted: 4
♻️ Duplicate comments (1)
acestep/audio_utils.py (1)
1-577: 🛠️ Refactor suggestion | 🟠 MajorThis module is still far above the repository’s Python LOC cap.
This remains a large multi-responsibility module; please split by responsibility and keep
acestep.audio_utilsas a stable facade re-export.As per coding guidelines, "Prefer single-responsibility modules with clear boundaries. Target module size: optimal <= 150 LOC, hard cap 200 LOC." and "Preserve stable facade imports when splitting large files so external callers are not broken." Based on learnings, module-size concerns should be raised when Python files exceed 200 LOC.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@acestep/audio_utils.py` around lines 1 - 577, This file is too large and must be split by responsibility while keeping acestep.audio_utils as a stable facade re-exporting the originals; create smaller modules (e.g., acestep.audio_processing with apply_fade and normalize_audio, acestep.io with AudioSaver, save_audio, save_batch, convert_audio, and acestep.hashing with get_audio_file_hash, generate_uuid_from_params, generate_uuid_from_audio_data, get_lora_weights_hash) and move related helper functions/constants (MP3_SAMPLE_RATE, MP3_BITRATE, _save_mp3) accordingly; then update the original module to import and re-export the same symbols (AudioSaver, save_audio, apply_fade, normalize_audio, get_lora_weights_hash, get_audio_file_hash, generate_uuid_from_params, generate_uuid_from_audio_data, save_batch, convert_audio) so external callers remain unchanged. Ensure tests/imports still reference the same names and that any internal relative imports (torchaudio, soundfile usage) are preserved in the new modules.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@acestep/audio_utils_test.py`:
- Around line 382-395: The test test_save_audio_mp3_preserves_ffmpeg_failure
references missing fixtures (self.temp_dir and self.sample_audio) defined in a
different test setup; add local setup for this test (or ensure the test class
setUp defines them) so the RuntimeError assertion runs instead of raising
AttributeError: initialize self.temp_dir (e.g., create a temp dir via
tempfile.mkdtemp()/TemporaryDirectory) and initialize self.sample_audio to the
same kind of audio tensor/array used elsewhere in tests (matching
self.sample_rate and AudioSaver expectations) before calling saver.save_audio.
In `@acestep/audio_utils.py`:
- Around line 224-233: The current heuristic that transposes when
audio_tensor.dim() == 2 and audio_tensor.shape[0] > audio_tensor.shape[1] can
mis-detect short clips; remove this shape-based transpose and instead trust the
channels_first flag: when channels_first is True treat incoming numpy as
[channels, samples] and convert directly, and when channels_first is False treat
incoming numpy as [samples, channels] and do not auto-transpose; update the
block that creates audio_tensor (references: channels_first, audio_data,
audio_tensor) to eliminate the heuristic transpose so orientation is
deterministic.
- Around line 359-361: Guard against deleting the converted output by checking
that input_path and output_path are not the same file before unlinking: inside
the block that handles remove_input (the branch referencing remove_input,
input_path, output_path and logger.debug("[AudioSaver]...")), compare the
resolved paths (or use input_path.samefile/output_path.exist checks) and only
call input_path.unlink() when they differ and the file exists; if they are the
same, skip deletion and optionally log that deletion was skipped to avoid
removing the freshly converted output.
- Around line 167-174: The ffmpeg subprocess call in the ffmpeg MP3 export block
currently calls subprocess.run(...) without a timeout; modify the call to
include timeout=120 and add an except subprocess.TimeoutExpired handler
(alongside the existing FileNotFoundError and subprocess.CalledProcessError
handlers) that raises a RuntimeError indicating the ffmpeg command timed out and
include any available output (stdout/stderr) from the exception. Update the
subprocess.run invocation and add the TimeoutExpired except branch in the same
try/except that wraps the ffmpeg invocation to ensure hung ffmpeg processes
don't block save requests.
---
Duplicate comments:
In `@acestep/audio_utils.py`:
- Around line 1-577: This file is too large and must be split by responsibility
while keeping acestep.audio_utils as a stable facade re-exporting the originals;
create smaller modules (e.g., acestep.audio_processing with apply_fade and
normalize_audio, acestep.io with AudioSaver, save_audio, save_batch,
convert_audio, and acestep.hashing with get_audio_file_hash,
generate_uuid_from_params, generate_uuid_from_audio_data, get_lora_weights_hash)
and move related helper functions/constants (MP3_SAMPLE_RATE, MP3_BITRATE,
_save_mp3) accordingly; then update the original module to import and re-export
the same symbols (AudioSaver, save_audio, apply_fade, normalize_audio,
get_lora_weights_hash, get_audio_file_hash, generate_uuid_from_params,
generate_uuid_from_audio_data, save_batch, convert_audio) so external callers
remain unchanged. Ensure tests/imports still reference the same names and that
any internal relative imports (torchaudio, soundfile usage) are preserved in the
new modules.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: c81479c7-f945-4d5c-935f-a2d187bc02b8
📒 Files selected for processing (2)
acestep/audio_utils.pyacestep/audio_utils_test.py
- add ffmpeg timeout handling - remove fragile channel transpose heuristic - prevent deleting output when input == output - fix broken MP3 failure test fixtures - add regression coverage for the above cases
ChuxiJ
left a comment
There was a problem hiding this comment.
There are still some other issues with 44.1kHz.
We still recommend 48kHz, and suggest making it user-selectable here instead of hardcoding it.
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
acestep/audio_utils.py (1)
1-590:⚠️ Potential issue | 🟠 MajorModule is still far above the repo hard cap (already reported).
This file remains a multi-responsibility module at ~590 LOC. Please split by responsibility (or provide a concrete deferred split plan in PR notes) before merge.
As per coding guidelines, "Prefer single-responsibility modules with clear boundaries. Target module size: optimal <= 150 LOC, hard cap 200 LOC."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@acestep/audio_utils.py` around lines 1 - 590, The file is too large and mixes responsibilities; split it into smaller modules: move audio processing functions (apply_fade, normalize_audio, generate_uuid_from_audio_data) into a new audio_processing module, move I/O and saving logic (AudioSaver class, _save_mp3, save_audio, save_batch, convert_audio and the global _default_saver) into an audio_io module, and move hashing/utility functions (get_lora_weights_hash, get_audio_file_hash, generate_uuid_from_params) into a utils/hashing module; update imports/exports so callers use the new modules (or re-export them from a lightweight aceitstep.audio_utils shim) and include a short deferred split plan in PR notes if you cannot split in this PR. Ensure each new module has tests and preserved behavior and adjust any relative imports that referenced AudioSaver or the helper functions.
🧹 Nitpick comments (1)
acestep/audio_utils_test.py (1)
234-260: Strengthen the success-path MP3 test by assertingsubprocess.runkwargs.This test validates command arguments, but it does not pin
timeout/check/capture_output. A regression there could slip through.💡 Suggested assertion additions
mock_run.assert_called_once() cmd = mock_run.call_args[0][0] + run_kwargs = mock_run.call_args[1] self.assertIn('ffmpeg', cmd) self.assertIn('libmp3lame', cmd) self.assertIn('44100', cmd) self.assertIn('320k', cmd) + self.assertEqual(run_kwargs.get("timeout"), 120) + self.assertTrue(run_kwargs.get("check")) + self.assertTrue(run_kwargs.get("capture_output")) self.assertTrue(result.endswith('.mp3'))🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@acestep/audio_utils_test.py` around lines 234 - 260, The MP3 export test should also assert the subprocess.run keyword arguments to prevent regressions; in test_save_audio_mp3_uses_explicit_ffmpeg_export_settings, after obtaining mock_run.call_args inspect call_args[1] and add assertions that 'check' is True, 'capture_output' is True, and that 'timeout' exists and is a positive integer (e.g. assert mock_run.call_args[1].get('timeout') is not None and > 0) so the test pins these important kwargs for the AudioSaver.save_audio ffmpeg invocation.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@acestep/audio_utils.py`:
- Around line 429-450: The docstring incorrectly states MD5 while the
implementation uses SHA-256; update the function docstring to say SHA-256 (or
"SHA-256 hash") and adjust any descriptive text to match (e.g., "Compute a
SHA-256 hash identifying the currently loaded LoRA adapter weights" and "returns
hex digest string"). Locate the docstring above the function that uses
dit_handler, lora_service and hash_obj = hashlib.sha256() and change the
algorithm name and any related wording to accurately reflect SHA-256.
---
Duplicate comments:
In `@acestep/audio_utils.py`:
- Around line 1-590: The file is too large and mixes responsibilities; split it
into smaller modules: move audio processing functions (apply_fade,
normalize_audio, generate_uuid_from_audio_data) into a new audio_processing
module, move I/O and saving logic (AudioSaver class, _save_mp3, save_audio,
save_batch, convert_audio and the global _default_saver) into an audio_io
module, and move hashing/utility functions (get_lora_weights_hash,
get_audio_file_hash, generate_uuid_from_params) into a utils/hashing module;
update imports/exports so callers use the new modules (or re-export them from a
lightweight aceitstep.audio_utils shim) and include a short deferred split plan
in PR notes if you cannot split in this PR. Ensure each new module has tests and
preserved behavior and adjust any relative imports that referenced AudioSaver or
the helper functions.
---
Nitpick comments:
In `@acestep/audio_utils_test.py`:
- Around line 234-260: The MP3 export test should also assert the subprocess.run
keyword arguments to prevent regressions; in
test_save_audio_mp3_uses_explicit_ffmpeg_export_settings, after obtaining
mock_run.call_args inspect call_args[1] and add assertions that 'check' is True,
'capture_output' is True, and that 'timeout' exists and is a positive integer
(e.g. assert mock_run.call_args[1].get('timeout') is not None and > 0) so the
test pins these important kwargs for the AudioSaver.save_audio ffmpeg
invocation.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 9ebc6175-4669-4548-8e84-96544a98c3e4
📒 Files selected for processing (2)
acestep/audio_utils.pyacestep/audio_utils_test.py
| """Compute an MD5 hash identifying the currently loaded LoRA adapter weights. | ||
|
|
||
| Iterates over the handler's LoRA service registry to find adapter weight | ||
| file paths, then hashes each file to produce a combined fingerprint. | ||
|
|
||
| Args: | ||
| dit_handler: DiT handler instance with LoRA state attributes. | ||
|
|
||
| Returns: | ||
| Hex digest string uniquely identifying the loaded LoRA weights, | ||
| or empty string if no LoRA is active. | ||
| """ | ||
| if not getattr(dit_handler, "lora_loaded", False): | ||
| return "" | ||
| if not getattr(dit_handler, "use_lora", False): | ||
| return "" | ||
|
|
||
| lora_service = getattr(dit_handler, "_lora_service", None) | ||
| if lora_service is None or not lora_service.registry: | ||
| return "" | ||
|
|
||
| hash_obj = hashlib.sha256() |
There was a problem hiding this comment.
Hash algorithm docstring is inaccurate.
The docstring states MD5, but the implementation uses SHA-256 (hashlib.sha256). Please update the docstring to avoid API confusion.
💡 Proposed docstring fix
-def get_lora_weights_hash(dit_handler) -> str:
- """Compute an MD5 hash identifying the currently loaded LoRA adapter weights.
+def get_lora_weights_hash(dit_handler) -> str:
+ """Compute a SHA-256 hash identifying the currently loaded LoRA adapter weights.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@acestep/audio_utils.py` around lines 429 - 450, The docstring incorrectly
states MD5 while the implementation uses SHA-256; update the function docstring
to say SHA-256 (or "SHA-256 hash") and adjust any descriptive text to match
(e.g., "Compute a SHA-256 hash identifying the currently loaded LoRA adapter
weights" and "returns hex digest string"). Locate the docstring above the
function that uses dit_handler, lora_service and hash_obj = hashlib.sha256() and
change the algorithm name and any related wording to accurately reflect SHA-256.
That makes sense. |
Keep explicit MP3 export parameters provided by the UI all the way to final export. If the UI sends mp3_bitrate and mp3_sample_rate, those values are now preserved and used for MP3 output. If they are not provided, export behavior remains unchanged and the existing default MP3 settings are used.
|
@ChuxiJ the export path now honors explicit MP3 bitrate and sample-rate values when they are provided by the UI.
|
There was a problem hiding this comment.
♻️ Duplicate comments (3)
acestep/audio_utils.py (3)
435-436:⚠️ Potential issue | 🟡 MinorHash algorithm docstring is incorrect.
The docstring says MD5, but implementation uses
hashlib.sha256()(Line 456). Please align the docstring to SHA-256.💡 Proposed fix
-def get_lora_weights_hash(dit_handler) -> str: - """Compute an MD5 hash identifying the currently loaded LoRA adapter weights. +def get_lora_weights_hash(dit_handler) -> str: + """Compute a SHA-256 hash identifying the currently loaded LoRA adapter weights.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@acestep/audio_utils.py` around lines 435 - 436, The docstring for the function that "Compute an MD5 hash identifying the currently loaded LoRA adapter weights" is incorrect: the implementation uses hashlib.sha256() (see the function that computes the adapter weights hash), so update the docstring to state that the function computes a SHA-256 hash (not MD5) and, if present, adjust any descriptive text to reference SHA-256 and its properties consistently with the use of hashlib.sha256().
377-380:⚠️ Potential issue | 🟠 MajorGuard against deleting the converted output when paths match.
remove_input=Truecurrently unlinks the input unconditionally; if input and output resolve to the same file, this deletes the final output.💡 Proposed fix
- output_path = self.save_audio( + output_path = Path(self.save_audio( audio_tensor, output_path, sample_rate=sample_rate, format=output_format, channels_first=True - ) + )) @@ - if remove_input: - input_path.unlink() - logger.debug(f"[AudioSaver] Removed input file: {input_path}") + if remove_input and input_path.exists(): + if input_path.resolve() != output_path.resolve(): + input_path.unlink() + logger.debug(f"[AudioSaver] Removed input file: {input_path}")🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@acestep/audio_utils.py` around lines 377 - 380, Guard against deleting the converted output by checking paths before unlinking: in the routine that handles saving/converting audio (refer to variables input_path, output_path and the remove_input logic in AudioSaver or the function that contains this unlink block), only call input_path.unlink() when remove_input is True AND input_path.resolve() != output_path.resolve(); if they are equal, skip deletion and log a debug/warning that removal was skipped to avoid deleting the output.
238-255:⚠️ Potential issue | 🟡 MinorRemove shape-based transpose heuristic for
channels_first=False.The
shape[0] > shape[1]heuristic can still mis-orient valid short clips. Make orientation deterministic fromchannels_firstonly.💡 Proposed fix
if isinstance(audio_data, np.ndarray): - if channels_first: - # numpy already [channels, samples] - audio_tensor = torch.from_numpy(audio_data).float() - else: - # numpy [samples, channels] -> tensor [samples, channels] -> [channels, samples] (if transposed) - audio_tensor = torch.from_numpy(audio_data).float() - if audio_tensor.dim() == 2 and audio_tensor.shape[0] > audio_tensor.shape[1]: - # Assume [samples, channels] if dim0 > dim1 (heuristic) - audio_tensor = audio_tensor.T + audio_tensor = torch.from_numpy(audio_data).float() + if not channels_first and audio_tensor.dim() == 2: + audio_tensor = audio_tensor.transpose(0, 1) else: # torch tensor audio_tensor = audio_data.cpu().float() if not channels_first and audio_tensor.dim() == 2: - # [samples, channels] -> [channels, samples] - if audio_tensor.shape[0] > audio_tensor.shape[1]: - audio_tensor = audio_tensor.T + audio_tensor = audio_tensor.transpose(0, 1)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@acestep/audio_utils.py` around lines 238 - 255, Remove the shape-based transpose heuristic and make orientation deterministic from channels_first only: stop using audio_tensor.shape[0] > audio_tensor.shape[1] to decide transposition and instead transpose solely based on the channels_first flag (for 2-D inputs). Concretely, in the numpy branch (handling audio_data as np.ndarray) and the torch branch (audio_data as tensor), if channels_first is True and audio_tensor.dim() == 2 then transpose audio_tensor; otherwise do not transpose—delete the conditional checks that compare shape[0] and shape[1]. Ensure you update both places referencing audio_data/audio_tensor so orientation is driven exclusively by channels_first.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@acestep/audio_utils.py`:
- Around line 435-436: The docstring for the function that "Compute an MD5 hash
identifying the currently loaded LoRA adapter weights" is incorrect: the
implementation uses hashlib.sha256() (see the function that computes the adapter
weights hash), so update the docstring to state that the function computes a
SHA-256 hash (not MD5) and, if present, adjust any descriptive text to reference
SHA-256 and its properties consistently with the use of hashlib.sha256().
- Around line 377-380: Guard against deleting the converted output by checking
paths before unlinking: in the routine that handles saving/converting audio
(refer to variables input_path, output_path and the remove_input logic in
AudioSaver or the function that contains this unlink block), only call
input_path.unlink() when remove_input is True AND input_path.resolve() !=
output_path.resolve(); if they are equal, skip deletion and log a debug/warning
that removal was skipped to avoid deleting the output.
- Around line 238-255: Remove the shape-based transpose heuristic and make
orientation deterministic from channels_first only: stop using
audio_tensor.shape[0] > audio_tensor.shape[1] to decide transposition and
instead transpose solely based on the channels_first flag (for 2-D inputs).
Concretely, in the numpy branch (handling audio_data as np.ndarray) and the
torch branch (audio_data as tensor), if channels_first is True and
audio_tensor.dim() == 2 then transpose audio_tensor; otherwise do not
transpose—delete the conditional checks that compare shape[0] and shape[1].
Ensure you update both places referencing audio_data/audio_tensor so orientation
is driven exclusively by channels_first.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 48f64c76-1b68-4fd7-a254-93aeab3551e4
📒 Files selected for processing (3)
acestep/audio_utils.pyacestep/audio_utils_test.pyacestep/inference.py
ChuxiJ
left a comment
There was a problem hiding this comment.
Code Review
Verdict: Needs Changes ❌
The MP3 export feature itself is well-implemented (_save_mp3() with proper temp file cleanup, input validation, timeout), but the inference.py refactoring introduces a functional regression.
Critical: Dropped global_caption parameter
The refactoring converts the dit_handler.generate_music() call from explicit keywords to a dict + inspect.signature() filter. However, the new dict omits global_caption=params.global_caption which is present in the current main branch and accepted by the handler. The user's global_caption value will be silently ignored — this is a functional regression.
chunk_mask_mode is also missing from the dict, though its default "auto" matches the previous getattr fallback, making it benign.
Medium: inspect.signature() pattern is fragile
This pattern silently drops parameters without error, making bugs like the above easy to introduce. A misspelled key would be silently filtered out instead of raising TypeError. The logger.debug for dropped keys is easy to miss. The previous explicit-keyword approach was safer.
Required changes
- Add
global_captionandchunk_mask_modeback todit_generate_kwargs - Consider raising a warning (not just debug) when parameters are dropped by the signature filter
Other suggestions (non-blocking)
- Forward
mp3_bitrate/mp3_sample_rateinconvert_audio(),save_batch(), and the module-levelsave_audio()for consistency - Change
logger.infoat the export log line tologger.debugto reduce noise in batch generation - Minimize the
audio_utils.pydiff to only actual semantic changes (current diff is a full-file replacement)
This fix restores the missing global_caption forwarding in acestep/inference.py, which was accidentally dropped during the dict + inspect.signature() refactor and caused a functional regression. It also restores chunk_mask_mode in the forwarded kwargs and changes the log for filtered-out parameters from debug to warning, so unexpected drops are more visible during development and review. Scope intentionally kept minimal: only the file already involved in the PR was updated.
There was a problem hiding this comment.
🧹 Nitpick comments (1)
acestep/inference.py (1)
200-201: Consider documenting the new fields in the class docstring.The new
mp3_bitrateandmp3_sample_ratefields are functional with sensible defaults aligned with PR objectives (48kHz as the default). However, theGenerationConfigdocstring (lines 179-192) doesn't document these new attributes.📝 Suggested docstring addition
Add to the
Attributessection around line 191:audio_format: Output audio format, one of "mp3", "wav", "flac", "wav32", "opus", "aac". Default: "flac" + mp3_bitrate: MP3 encoding bitrate (e.g., "128k", "192k", "320k"). Only used when audio_format="mp3". + mp3_sample_rate: MP3 output sample rate in Hz (e.g., 44100, 48000). Only used when audio_format="mp3". """🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@acestep/inference.py` around lines 200 - 201, The GenerationConfig class docstring is missing descriptions for the newly added attributes mp3_bitrate and mp3_sample_rate; update the GenerationConfig docstring (Attributes section) to document mp3_bitrate: str (e.g., "MP3 bitrate for output, default '128k'") and mp3_sample_rate: int (sample rate in Hz, default 48000), briefly stating units/format and default values so readers and auto-doc tools see the new fields; locate GenerationConfig and add these two lines to its Attributes block near the other parameter docs.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@acestep/inference.py`:
- Around line 200-201: The GenerationConfig class docstring is missing
descriptions for the newly added attributes mp3_bitrate and mp3_sample_rate;
update the GenerationConfig docstring (Attributes section) to document
mp3_bitrate: str (e.g., "MP3 bitrate for output, default '128k'") and
mp3_sample_rate: int (sample rate in Hz, default 48000), briefly stating
units/format and default values so readers and auto-doc tools see the new
fields; locate GenerationConfig and add these two lines to its Attributes block
near the other parameter docs.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 00fd3d5b-f249-496a-a0cb-4728ba1ff1aa
📒 Files selected for processing (1)
acestep/inference.py
ChuxiJ
left a comment
There was a problem hiding this comment.
Code Review
Overall: Well-implemented MP3 export with proper error handling, input validation, and test coverage.
Key Items to Address
-
Medium - API gap for
convert_audioandsave_batch: These methods (and the module-levelsave_audioconvenience function) do not accept the newmp3_bitrate/mp3_sample_rateparameters. Users calling these functions cannot control MP3 export settings. At minimum, thread the parameters throughconvert_audioandsave_batch, or document this limitation. -
Medium - Whole-file rewrite inflates diff: The diff shows -516/+596 lines for
audio_utils.py, but the actual changes are ~60 lines. This was likely caused by re-formatting. A cleaner diff would make review easier. -
Low -
logger.infoin hot path: The newlogger.infoat line 743 ininference.pylogs on every audio export. For batch generation this is noisy. Consider usinglogger.debugto match the existing logging style insave_audio. -
Low - Test formatting: Multi-
patchdecorators on single lines are hard to read. Consider stacking or usingwithblocks.
Good Practices
subprocess.runwith list args prevents shell injection- Input validation on bitrate and sample rate with sensible defaults
timeout=120on subprocess prevents runaway ffmpeg processes- Temp file cleanup in
finallyblock inspect.signaturefiltering ininference.pyis a clever forward-compatibility mechanism
No blocking issues.
Thread mp3_bitrate and mp3_sample_rate through save_audio, convert_audio, and save_batch so callers can control MP3 export settings consistently. Also reduce noisy export logging in inference.py from info to debug, improve readability of multi-patch test formatting, and add coverage to verify MP3 parameter forwarding.
ChuxiJ
left a comment
There was a problem hiding this comment.
Code Review
Thanks for the PR! The feature concept (selectable MP3 export format) is useful. However:
Concern: Scope
The diff rewrites the entire audio_utils.py file (516 lines deleted, 618 added). For adding MP3 format selection, the changes should be much more targeted — primarily adding parameters to the UI and passing them through to torchaudio.save().
Could you scope this down to only the changes needed for MP3 format selection? A full rewrite of audio_utils.py makes it very hard to review and introduces regression risk.
Specific feedback:
- The batch processing and audio effects features (fade, normalization) mentioned in the release notes appear to already exist in the current codebase. Please ensure you're not duplicating existing functionality.
- CodeRabbit has paused review due to rapid commits — please stabilize the branch before requesting final review.
Suggestion:
- Rebase on latest
main - Keep only the MP3 bitrate/sample rate selection changes
- Add a test verifying the new MP3 export options work correctly
Code reviewThe core Found 2 issues:
Minor notes:
🤖 Generated with Claude Code - If this code review was useful, please react with 👍. Otherwise, react with 👎. |
ChuxiJ
left a comment
There was a problem hiding this comment.
Code Review: Selectable MP3 export format
Thanks for the contribution! The MP3 export configurability is useful.
Concerns
-
Full file rewrite of
audio_utils.py: This PR replaces the entire 516-line file with a 618-line rewrite (+777/-559). A full rewrite makes it very difficult to review what actually changed vs what was just reformatted. Could you restructure this as incremental changes on top of the existing file? This would make the diff much clearer and reduce merge conflict risk. -
Missing changes in Gradio UI: The PR description says "If the UI does not support these settings, generation will default to the previous settings" — but without UI controls, users can't actually select MP3 bitrate/sample rate. Is there a follow-up PR planned for the Gradio UI integration?
-
inference.pychanges: The +58/-43 diff ininference.pyalso appears to be partially a reformat. Please separate formatting changes from functional changes.
Suggestions
- Break this into two PRs: (1) the MP3 export feature with minimal changes, (2) any refactoring/cleanup separately
- Add MP3 bitrate/sample rate options to the Gradio UI and API request schema
- Add a test that verifies MP3 output with non-default bitrate
The feature itself is welcome — just needs a cleaner diff. Happy to re-review once restructured.
- pass repaint params through dit_generate_kwargs - set MP3 defaults to 320k and 44100 - force true CBR export with ffmpeg -abr 0 - update tests for new MP3 defaults - remove redundant fallback access and unused imports
Add a minimal backend-only implementation for configurable MP3 export. This change keeps the diff intentionally small and focused on the core feature: - add optional MP3 bitrate and sample rate settings in the backend - pass those settings through the generation pipeline - add targeted tests for default and custom MP3 export behavior The Gradio UI was not modified on purpose, to leave lead developers full freedom to integrate these settings where and how they see fit.

Replaces the implicit MP3 export path in torchaudio/ffmpeg by adding the option to select the MP3 format.
If the UI does not support these settings, generation will default to the previous settings.
Summary by CodeRabbit
Release Notes