A GitHub Action that generates music from text prompts using Ace-Step 1.5 via the native acestep.cpp engine.
Text + optional lyrics in, stereo 48 kHz WAV out.
No Python. No PyTorch. No waiting.
The pre-built Docker image ships with compiled ace-qwen3/dit-vae binaries and all ~7.7 GB of pre-quantized GGUF models baked in — action execution starts immediately.
- 🎵 Generate high-quality music from a text caption
- 🖊️ Optional lyrics — or let the LLM write them for you
- 🔍 Analyze existing audio with ace-understand: extract caption, lyrics, BPM, key, duration, and language
- ⚡ Native C++17 / GGML engine: lightweight, no GPU required
- 🐳 Pre-built Docker image with models included — zero download wait
- 🎲 Reproducible generation with optional seed
- 🔧 Easy integration with GitHub Actions workflows
name: Generate Audio
on: [push]
jobs:
  generate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Generate audio
        id: audio
        uses: audiohacking/acestep-action@main
        with:
          caption: 'upbeat electronic chiptune music'
      - name: Use generated audio
        run: echo "Audio saved to ${{ steps.audio.outputs.audio_file }}"

- name: Generate audio
  id: audio
  uses: audiohacking/acestep-action@main
  with:
    caption: 'calm ambient piano melody, lo-fi, warm'
    lyrics: |
      [Verse]
      Floating on a cloud of sound
      Melodies that go around
    duration: '30'
    seed: '42'
    output_path: 'generated_music.wav'

- name: Upload audio
  uses: actions/upload-artifact@v4
  with:
    name: generated-audio
    path: ${{ steps.audio.outputs.audio_file }}

Supply a local file path or a URL (http/https) to an MP3 or WAV file via the understand input.
The action uses the file directly if a path is given, or downloads it first if a URL is provided.
It then runs ace-understand and returns the analysis as JSON in the understand_result output.
Audio generation is skipped when understand is set.
# From a URL
- name: Analyze audio (URL)
  id: analyze
  uses: audiohacking/acestep-action@main
  with:
    understand: 'https://example.com/song.mp3'

# Alternatively, from a local path (e.g. a file already in the workspace).
# Use only one of these two steps per job: step ids must be unique within a job.
- name: Analyze audio (local file)
  id: analyze
  uses: audiohacking/acestep-action@main
  with:
    understand: '/github/workspace/output.wav'

- name: Show analysis
  run: echo '${{ steps.analyze.outputs.understand_result }}'

| Input | Description | Required | Default |
|---|---|---|---|
| caption | Text description for music generation | No | chiptune |
| lyrics | Lyrics (empty = LLM auto-generates) | No | (empty) |
| duration | Duration in seconds | No | 20 |
| seed | Random seed for reproducible generation | No | (random) |
| inference_steps | Number of DiT inference steps | No | 8 |
| shift | Flow-matching shift parameter | No | 3 |
| vocal_language | Vocal language code (en, fr, …) | No | en |
| output_path | Output path for the generated WAV file | No | output.wav |
| understand | Local file path or URL (http/https) to an MP3 or WAV file to analyze (activates understand mode, skips generation) | No | (empty) |

| Output | Description |
|---|---|
| audio_file | Path to the generated WAV audio file |
| generation_time | Time taken to generate the audio in seconds |
| understand_result | JSON from ace-understand: caption, lyrics, BPM, key, duration, language |
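
Since the action is described as producing a stereo 48 kHz WAV, a later `run:` step could sanity-check the file behind `audio_file` before uploading it. The following is a minimal sketch, not part of the action: the helper names `le_uint` and `check_wav` are made up for illustration, and it assumes the common WAV layout in which the `fmt ` chunk directly follows the RIFF header.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Read an unsigned little-endian integer of NBYTES bytes at byte OFFSET in FILE.
le_uint() {
  local file=$1 offset=$2 nbytes=$3 value=0 mult=1 byte
  # od prints each byte as an unsigned decimal; combine them low byte first.
  for byte in $(od -An -v -tu1 -j "$offset" -N "$nbytes" "$file"); do
    value=$(( value + byte * mult ))
    mult=$(( mult * 256 ))
  done
  printf '%s\n' "$value"
}

# Succeed only if FILE looks like a stereo 48 kHz WAV.
# Assumption: canonical header, so the channel count sits at byte 22
# and the sample rate at byte 24.
check_wav() {
  local file=$1 channels rate
  [ "$(head -c 4 "$file")" = RIFF ] || { echo "not a RIFF file: $file" >&2; return 1; }
  channels=$(le_uint "$file" 22 2)
  rate=$(le_uint "$file" 24 4)
  echo "channels=$channels sample_rate=$rate"
  [ "$channels" -eq 2 ] && [ "$rate" -eq 48000 ]
}
```

In a workflow, the path would typically be passed in through an environment variable set from `steps.audio.outputs.audio_file`, followed by a call such as `check_wav "$AUDIO_FILE"`. Files that carry extra chunks before `fmt ` would need a real chunk parser instead of fixed offsets.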
The action runs as a pre-built Docker container published to GitHub Container Registry. The image is built once (by build-docker.yml) and contains everything needed:

| What | Where in image |
|---|---|
| ace-qwen3 binary (Qwen3 causal LM) | /action/bin/ace-qwen3 |
| dit-vae binary (DiT + Oobleck VAE) | /action/bin/dit-vae |
| ace-understand binary (reverse pipeline) | /action/bin/ace-understand |
| Qwen3-Embedding-0.6B-Q8_0.gguf | /action/models/ |
| acestep-5Hz-lm-4B-Q8_0.gguf | /action/models/ |
| acestep-v15-turbo-Q8_0.gguf | /action/models/ |
| vae-BF16.gguf | /action/models/ |

At runtime the entrypoint (src/entrypoint.sh):

Generation mode (default, when understand is not set):
- Builds a request JSON from the inputs
- Runs ace-qwen3 (LLM stage: caption → enriched JSON with lyrics + audio codes)
- Runs dit-vae (DiT + VAE stage: JSON → stereo 48 kHz WAV)
- Moves the output WAV to the requested path in $GITHUB_WORKSPACE
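
The first generation step can be sketched in shell. This is an illustration under stated assumptions, not the contents of src/entrypoint.sh: the JSON field names and the `json_escape` and `build_request` helpers are hypothetical, and the ace-qwen3 and dit-vae invocations are left out because their command-line arguments are not documented here. It relies only on the fact that a Docker action receives each input as an `INPUT_<NAME>` environment variable, with defaults taken from the inputs table.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Escape a string for use inside a JSON string literal.
# Covers backslash, double quote, newline, carriage return and tab only.
json_escape() {
  local s=$1
  s=${s//\\/\\\\}
  s=${s//\"/\\\"}
  s=${s//$'\n'/\\n}
  s=${s//$'\r'/\\r}
  s=${s//$'\t'/\\t}
  printf '%s' "$s"
}

# Print a request JSON built from the INPUT_* variables.
# NOTE: the field names are hypothetical; the real schema belongs to acestep.cpp.
build_request() {
  local caption=${INPUT_CAPTION:-chiptune}
  local lyrics=${INPUT_LYRICS:-}
  local duration=${INPUT_DURATION:-20}
  local steps=${INPUT_INFERENCE_STEPS:-8}
  local shift_val=${INPUT_SHIFT:-3}
  local lang=${INPUT_VOCAL_LANGUAGE:-en}
  local seed=${INPUT_SEED:-}

  # Numeric inputs are emitted unquoted, so reject anything that is not a number.
  local n re='^-?[0-9]+([.][0-9]+)?$'
  for n in "$duration" "$steps" "$shift_val" ${seed:+"$seed"}; do
    [[ $n =~ $re ]] || { echo "not a number: $n" >&2; return 1; }
  done

  printf '{"caption":"%s","lyrics":"%s","duration":%s,"inference_steps":%s,"shift":%s,"vocal_language":"%s"' \
    "$(json_escape "$caption")" "$(json_escape "$lyrics")" \
    "$duration" "$steps" "$shift_val" "$(json_escape "$lang")"
  # Only pin the seed when one was supplied; otherwise generation stays random.
  if [ -n "$seed" ]; then printf ',"seed":%s' "$seed"; fi
  printf '}\n'
}
```

Escaping matters mostly for the lyrics input, which is usually multi-line. A real entrypoint would more likely delegate JSON construction to a dedicated tool; the hand-rolled escaping here just keeps the sketch dependency-free.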

Understand mode (when understand is provided):
- If a URL (http/https/ftp/file) is given, downloads the audio file; if a local path is given, uses it directly
- Runs ace-understand (VAE encode → FSQ tokenize → LM understand → JSON)
- Emits the resulting JSON as the understand_result action output
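
The understand-mode steps above could look roughly like the following shell sketch. The helper names (`classify_source`, `resolve_audio`, `emit_output`) and the download file name are assumptions made for illustration, not taken from src/entrypoint.sh. The multi-line `name<<DELIMITER` form written to `$GITHUB_OUTPUT` is the standard GitHub Actions mechanism for step outputs, which suits a JSON result that may span several lines.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print "url" for the schemes named in the step list, "path" otherwise.
classify_source() {
  case "$1" in
    http://*|https://*|ftp://*|file://*) echo url ;;
    *) echo path ;;
  esac
}

# Print a local file path for the input, downloading it first when it is a URL.
# The download file name is a placeholder chosen for this sketch.
resolve_audio() {
  local src=$1 workdir=$2 dest
  if [ "$(classify_source "$src")" = url ]; then
    dest="$workdir/understand_input"
    curl --fail --silent --show-error --location --output "$dest" "$src"
    printf '%s\n' "$dest"
  else
    [ -f "$src" ] || { echo "file not found: $src" >&2; return 1; }
    printf '%s\n' "$src"
  fi
}

# Append a (possibly multi-line) value to $GITHUB_OUTPUT using the
# name<<DELIMITER ... DELIMITER form that GitHub Actions accepts.
emit_output() {
  local name=$1 value=$2
  local delim="EOF_${RANDOM}_${RANDOM}"
  {
    printf '%s<<%s\n' "$name" "$delim"
    printf '%s\n' "$value"
    printf '%s\n' "$delim"
  } >> "$GITHUB_OUTPUT"
}

# Example wiring (the ace-understand arguments are not documented in this
# README, so that call is left as a placeholder):
#   audio=$(resolve_audio "$INPUT_UNDERSTAND" "$(mktemp -d)")
#   result=$(... run ace-understand on "$audio" ...)
#   emit_output understand_result "$result"
```

The randomized delimiter reduces the chance that the JSON itself contains the terminator line, which would otherwise truncate the output.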
Image location: ghcr.io/audiohacking/acestep-action:latest
acestep-action/
├── action.yml # Docker action definition
├── Dockerfile # Image: build binaries + download models
├── src/
│ └── entrypoint.sh # Generation shell script (Docker entrypoint)
└── .github/
└── workflows/
├── build-docker.yml # Build and publish image to GHCR
└── test.yml # CI test workflow
To test locally with the Dockerfile instead of the GHCR image, change action.yml:

runs:
  using: 'docker'
  image: 'Dockerfile' # instead of 'docker://ghcr.io/...'

Then build and run the container manually:

docker build -t acestep-action .
docker run --rm \
  -e INPUT_CAPTION="upbeat chiptune" \
  -e INPUT_DURATION="10" \
  -e GITHUB_WORKSPACE=/out \
  -e GITHUB_OUTPUT=/dev/stdout \
  -v /tmp/out:/out \
  acestep-action

Contributions are welcome! Please feel free to submit a Pull Request.
See the LICENSE file for details.