Utility ComfyUI nodes for ACE-Step (1.0 and 1.5) music models.
- Some of this stuff is pretty niche/experimental, especially the LM sampling bits, and may change or break workflows.
- You may need a fairly recent Python version (3.12+ should be fine).
- This repo has no affiliation with the official ACE-Step project.
All nodes have the ACETricks prefix for easy searching.
See the node descriptions and input/output tooltips for more information.
- AudioAsLatent - Rearranges AUDIO to look like a LATENT. Can be useful for using stuff like latent-only blend nodes. For conversion in the other direction, see LatentAsAudio.
- AudioBlend - Applies a blend function to AUDIO.
- AudioFromBatch - Extracts items from batches of AUDIO.
- AudioLevels - Can be used to normalize the values in AUDIO.
- LatentAsAudio - See AudioAsLatent.
- MonoToStereo - Converts mono AUDIO to stereo. Safe to use if it's already stereo.
- SetAudioDtype - Sets the dtype for AUDIO tensors.
- WaveForm - Generates a waveform image from AUDIO.
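MonoToStereo and AudioLevels are simple enough to sketch. As far as I know, ComfyUI's AUDIO type is a dict holding a waveform tensor shaped batch × channels × samples plus a sample_rate. To keep this self-contained, the sketch below works on a single batch item as plain Python lists. Peak normalization is only one assumed interpretation of "normalize"; it is not necessarily what AudioLevels does.

```python
from typing import List

# One batch item as [channel][sample]; a plain-list stand-in for the
# waveform tensor (assumption: the real nodes operate on torch tensors).
Channels = List[List[float]]


def mono_to_stereo(channels: Channels) -> Channels:
    """Duplicate a single channel into two; leave anything else untouched."""
    if len(channels) == 1:
        return [list(channels[0]), list(channels[0])]
    return channels  # already stereo (or more channels): no-op


def peak_normalize(channels: Channels, target_peak: float = 1.0) -> Channels:
    """Scale every channel so the loudest absolute sample equals target_peak.

    Hypothetical helper illustrating one kind of level normalization.
    """
    peak = max((abs(s) for ch in channels for s in ch), default=0.0)
    if peak == 0.0:
        return [list(ch) for ch in channels]  # pure silence: nothing to scale
    gain = target_peak / peak
    return [[s * gain for s in ch] for ch in channels]


if __name__ == "__main__":
    stereo = mono_to_stereo([[0.1, -0.5, 0.25]])
    print(stereo)
    print(peak_normalize(stereo))
```

Returning the input unchanged when it already has more than one channel mirrors the "safe to use if it's already stereo" behavior described above.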
- CondJoinLyrics - Joins split lyrics/conditioning into CONDITIONING.
- CondSplitOutLyrics - Can split out lyrics (and some other metadata like audio codes) from CONDITIONING.
- SilentLatent - Can generate latents full of silence for ACE-Step 1.0 and 1.5 (if given a 1.5 latent as reference). Can be interesting for initial generations if you set denoise to something lower than 1.0 (or multiply the SIGMAS). You can also use ComfyUI's built-in LatentMultiply node to multiply by -1.0 and make stuff louder!
- VisualizeLatent - Can output an IMAGE representation of ACE-Step 1.0 and 1.5 latents. Extended version of the previewing in ComfyUI-bleh mentioned in the Integrations section below.
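Why lowering denoise (or multiplying the SIGMAS) matters with SilentLatent: assuming ACE-Step is sampled with ComfyUI's flow-matching style noise scaling, where the starting point is sigma * noise + (1 - sigma) * latent, the first sigma decides how much of the silent latent survives. The sketch below uses hypothetical stand-in latent values (real silent latents are not these numbers), and scale_sigmas and latent_multiply are illustrative helpers, not this node pack's code.

```python
import random
from typing import List


def flow_noise_start(latent: List[float], noise: List[float], sigma: float) -> List[float]:
    """Sampler starting point under assumed flow-matching noise scaling."""
    return [sigma * n + (1.0 - sigma) * x for x, n in zip(latent, noise)]


def scale_sigmas(sigmas: List[float], factor: float) -> List[float]:
    """Multiply a whole sigma schedule by a constant (hypothetical helper)."""
    return [s * factor for s in sigmas]


def latent_multiply(latent: List[float], multiplier: float) -> List[float]:
    """Elementwise scalar multiply, the same idea as a latent-multiply node."""
    return [x * multiplier for x in latent]


if __name__ == "__main__":
    silent = [0.3, -0.2, 0.1, 0.0]  # hypothetical stand-in for a silent latent
    noise = [random.gauss(0.0, 1.0) for _ in silent]
    sigmas = scale_sigmas([1.0, 0.75, 0.5, 0.25, 0.0], 0.8)
    # With a first sigma of 0.8, 20% of the silent latent remains in the input.
    print(flow_noise_start(silent, noise, sigmas[0]))
    print(latent_multiply(silent, -1.0))
```

Under that assumption, a first sigma of 1.0 discards the silent latent entirely, which is why SilentLatent only has an effect at partial denoise.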
- EncodeLyrics - ACE 1.0-specific node for encoding conditioning with extended features.
- Mask - Can be used to generate masks for 1.0 latents. Does not currently support 1.5.
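The Mask node's exact options aren't listed here, so this is only a hypothetical illustration of the underlying idea: a mask along the latent's time axis that is 1.0 for frames that should be resampled and 0.0 elsewhere. time_range_mask is a made-up helper; in practice the frame indices could come from something like TimeOffset, and the mask would be expanded over the latent's other dimensions.

```python
from typing import List


def time_range_mask(num_frames: int, start: int, end: int) -> List[float]:
    """Hypothetical helper: 1.0 for frames in [start, end), 0.0 elsewhere.

    Out-of-range indices are clamped; an empty or inverted range gives all zeros.
    """
    start = max(0, min(start, num_frames))
    end = max(start, min(end, num_frames))
    return [1.0 if start <= i < end else 0.0 for i in range(num_frames)]


if __name__ == "__main__":
    print(time_range_mask(8, 2, 5))
```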
- Ace15CompressDuplicateAudioCodes - Can collapse sequences of duplicate audio codes in CONDITIONING.
- Ace15LLMInference - (experimental) Advanced node for doing LLM inference (primarily) with ACE-Step 1.5's LLMs, used for generating audio codes. It is configured with YAML; see the default YAML text for descriptions of the parameters.
- EmptyAce15LatentFromConditioning - Creates an empty LATENT to match the time covered by the audio codes in CONDITIONING.
- ModelPatchAce15Use4dLatent - (experimental) Patches an ACE-Step 1.5 model with a wrapper to handle 4D latent inputs. This is a horrible hack and may not be compatible with everything. I personally use it for all my generations and it currently works pretty well. See the description for the SqueezeUnsqueezeLatentDimension node, which you will need to use as well.
- RawTextEncodeAce15 - (experimental) Advanced node for encoding ACE-Step 1.5 CONDITIONING which gives you complete control over the exact text used for the following items: DiT prompt (actual sampling), lyrics, LLM positive, LLM negative and audio codes. It is configured with YAML; see the default YAML text for descriptions of the parameters.
- SqueezeUnsqueezeLatentDimension - Mostly useful with ACE-Step 1.5, since it uses 3D latents while a lot of nodes expect 4D or higher. Unsqueezing adds an empty dimension; squeezing removes an empty dimension. For ACE 1.5, leave dimension on the default of 2. Unsqueeze to add the dimension; toggle unsqueeze mode off to remove it. Normal usage would look something like: Empty latent node → unsqueeze → sampler → squeeze. Important: You will also need to patch the model to deal with 4D latents. See the ModelPatchAce15Use4dLatent node.
- TextEncodeAce15 - Simpler conditioning node for ACE-Step 1.5. Has some extended features but is mostly superseded by the RawTextEncodeAce15 and Ace15LLMInference nodes.
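The squeeze/unsqueeze round trip is just shape bookkeeping. The sketch below works on shape tuples and follows the semantics of torch.unsqueeze and torch.squeeze(x, dim). It assumes ACE-Step 1.5 latents are laid out as batch × channels × time, and the concrete sizes used are hypothetical placeholders. Unsqueezing at dimension 2 then gives batch × channels × 1 × time, which 4D-only nodes can treat like an image with a height of 1.

```python
from typing import Tuple

Shape = Tuple[int, ...]


def unsqueeze_shape(shape: Shape, dim: int) -> Shape:
    """Insert a size-1 dimension at position dim (torch.unsqueeze semantics)."""
    if dim < 0:
        dim += len(shape) + 1
    if not 0 <= dim <= len(shape):
        raise IndexError(f"dim {dim} out of range for a {len(shape)}D shape")
    return shape[:dim] + (1,) + shape[dim:]


def squeeze_shape(shape: Shape, dim: int) -> Shape:
    """Remove dimension dim if it has size 1 (torch.squeeze(x, dim) semantics)."""
    if dim < 0:
        dim += len(shape)
    if not 0 <= dim < len(shape):
        raise IndexError(f"dim {dim} out of range for a {len(shape)}D shape")
    if shape[dim] != 1:
        return shape  # non-empty dimensions are left alone
    return shape[:dim] + shape[dim + 1:]


if __name__ == "__main__":
    latent_3d = (1, 8, 100)  # hypothetical (batch, channels, time) sizes
    latent_4d = unsqueeze_shape(latent_3d, 2)  # what the sampler would see
    print(latent_3d, "->", latent_4d, "->", squeeze_shape(latent_4d, 2))
```

Squeezing a non-empty dimension is a no-op here, matching PyTorch, so squeezing a latent that was never unsqueezed leaves it untouched.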
- TimeOffset - Can be used to calculate offsets into the latent time dimension for 1.0 and 1.5 latents.
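The arithmetic behind a time offset is a conversion between seconds and latent frames. The frame rate is a parameter here because this README doesn't state it. The 48000 Hz and 1920 samples-per-frame figures in the demo are an assumption for ACE-Step 1.5, not confirmed values, so check them against the TimeOffset node's output. All three functions are hypothetical helpers.

```python
def frames_per_second(sample_rate: int, samples_per_frame: int) -> float:
    """Latent frame rate from audio sample rate and the VAE's downsampling."""
    return sample_rate / samples_per_frame


def seconds_to_frame(seconds: float, fps: float) -> int:
    """Nearest latent frame index for a time in seconds."""
    return round(seconds * fps)  # round() avoids float truncation errors


def frame_to_seconds(frame: int, fps: float) -> float:
    """Time in seconds at which a latent frame starts."""
    return frame / fps


if __name__ == "__main__":
    fps = frames_per_second(48000, 1920)  # assumed ACE-Step 1.5 figures
    print(fps, seconds_to_frame(12.5, fps), frame_to_seconds(100, fps))
```

Rounding to the nearest frame instead of truncating avoids off-by-one offsets caused by floating-point products such as 0.29 * 100 coming out slightly below 29.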
These can integrate with some of my other projects:
- ComfyUI-sonar - custom noise types.
- ComfyUI-bleh - Many additional blend functions. Not precisely an integration, but the ComfyUI-bleh node pack can also show you graphical previews while sampling.
I think these are all 1.5-specific. In no particular order:
- Being able to handle the DiT prompt and the audio codes (or, put another way, the LM prompts) separately is very powerful. If you're having trouble with conformance, you can adjust the lyrics (nothing too extreme) while leaving the audio codes the same. You can also get interesting effects by using one key signature/time signature/BPM for the audio codes and a different one when you sample.
- The Ace15LLMInference node integrates with exotic noise types from ComfyUI-sonar, so if you want to do something like temperature sampling with noise from the Cauchy distribution, you can! It can also use blend modes from ComfyUI-bleh, and fun fact: CFG is just a blend mode.
- Use the SqueezeUnsqueezeLatentDimension and ModelPatchAce15Use4dLatent nodes to make ACE-Step 1.5 work with nodes that don't support 3D latents (e.g. ComfyUI-sonar).
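The "CFG is just a blend mode" remark is easy to check. Classic classifier-free guidance, uncond + scale * (cond - uncond), is exactly linear interpolation (extrapolation when scale > 1) from the unconditional output toward the conditional one, with the guidance scale as the blend factor. For an LLM, these would be the positive and negative logits. The sketch below is a plain-list illustration with hypothetical helper names. It is not how Ace15LLMInference or ComfyUI-bleh implement it; the point is that swapping the blend argument swaps the guidance rule.

```python
from typing import Callable, List

Vector = List[float]
# A blend takes (a, b, t) and returns a mix of a and b controlled by t.
Blend = Callable[[Vector, Vector, float], Vector]


def lerp(a: Vector, b: Vector, t: float) -> Vector:
    """Linear interpolation/extrapolation: a + t * (b - a)."""
    return [x + t * (y - x) for x, y in zip(a, b)]


def cfg(cond: Vector, uncond: Vector, scale: float) -> Vector:
    """Classic classifier-free guidance: uncond + scale * (cond - uncond)."""
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]


def guided(cond: Vector, uncond: Vector, scale: float, blend: Blend = lerp) -> Vector:
    """Guidance as a blend from uncond toward cond, using scale as the factor."""
    return blend(uncond, cond, scale)


if __name__ == "__main__":
    cond_logits = [1.0, 2.0, -0.5]  # hypothetical positive-prompt logits
    uncond_logits = [0.5, -1.0, 0.0]  # hypothetical negative-prompt logits
    print(cfg(cond_logits, uncond_logits, 3.0))
    print(guided(cond_logits, uncond_logits, 3.0))  # same values via lerp
```

A scale of 1.0 returns the conditional output unchanged and 0.0 returns the unconditional one, which is the usual behavior of a lerp-style blend at its endpoints.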