Ace-Step Audio Generation Action

A GitHub Action that generates music from text prompts using Ace-Step 1.5 via the native acestep.cpp engine.
Text + optional lyrics in, stereo 48 kHz WAV out.

No Python. No PyTorch. No waiting.
The pre-built Docker image ships with the compiled ace-qwen3, dit-vae, and ace-understand binaries and all ~7.7 GB of pre-quantized GGUF models baked in, so there is nothing to compile or download once the container starts.

Features

🎵 Generate high-quality music from a text caption
🖊️ Optional lyrics — or let the LLM write them for you
🔍 Analyze existing audio with ace-understand — extract caption, lyrics, BPM, key, duration, and language
⚡ Native C++17 / GGML engine — lightweight, no GPU required
🐳 Pre-built Docker image with models included — zero download wait
🎲 Reproducible generation with optional seed
🔧 Easy integration with GitHub Actions workflows

Usage

Basic Example

name: Generate Audio
on: [push]

jobs:
  generate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Generate audio
        id: audio
        uses: audiohacking/acestep-action@main
        with:
          caption: 'upbeat electronic chiptune music'

      - name: Use generated audio
        run: echo "Audio saved to ${{ steps.audio.outputs.audio_file }}"

With lyrics and seed

- name: Generate audio
  id: audio
  uses: audiohacking/acestep-action@main
  with:
    caption: 'calm ambient piano melody, lo-fi, warm'
    lyrics: |
      [Verse]
      Floating on a cloud of sound
      Melodies that go around
    duration: '30'
    seed: '42'
    output_path: 'generated_music.wav'
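
One way to check that a fixed seed behaves as expected is to generate twice with identical inputs and compare the two files. The sketch below is illustrative only: `same_audio` is a hypothetical helper, and byte-identical output across runs or runner types is an assumption, not something this README guarantees.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when two audio files are byte-for-byte identical.
same_audio() {
  # cmp -s prints nothing and returns 0 only if the files match exactly.
  cmp -s "$1" "$2"
}

# Example use in a later `run:` step (file names are placeholders):
#   same_audio first.wav second.wav && echo "seeded runs match"
```

If the comparison fails, that does not necessarily mean the seed was ignored; floating-point differences between machines could also change the bytes.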

Upload the result as an artifact

- name: Upload audio
  uses: actions/upload-artifact@v4
  with:
    name: generated-audio
    path: ${{ steps.audio.outputs.audio_file }}

Analyze an existing audio file

Supply a local file path or a URL (http/https) to an MP3 or WAV file via the understand input.
The action uses the file directly if a path is given, or downloads it first if a URL is provided.
It then runs ace-understand and returns the analysis as JSON in the understand_result output.
Audio generation is skipped when understand is set.

# From a URL
- name: Analyze audio (URL)
  id: analyze
  uses: audiohacking/acestep-action@main
  with:
    understand: 'https://example.com/song.mp3'

# From a local path (e.g. a file already in the workspace).
# Use this step instead of the URL step above: step ids must be unique within a job.
- name: Analyze audio (local file)
  id: analyze
  uses: audiohacking/acestep-action@main
  with:
    understand: '/github/workspace/output.wav'

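- name: Show analysis
  env:
    # Passing the JSON through an environment variable keeps quotes in the
    # result (for example apostrophes in lyrics) from breaking the shell command.
    UNDERSTAND_RESULT: ${{ steps.analyze.outputs.understand_result }}
  run: echo "$UNDERSTAND_RESULT"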
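
To pick individual values out of the analysis, a later `run:` step can feed the JSON to jq, which is preinstalled on GitHub-hosted Ubuntu runners. The helpers below are a minimal sketch and assume only that understand_result is a JSON object. The exact key names that ace-understand emits are not documented here, so `list_fields` is included to discover them. `list_fields` and `get_field` are hypothetical names.

```shell
#!/usr/bin/env bash
# Requires jq. Both helpers take the JSON text as their first argument.

# Print the top-level keys of a JSON object, one per line.
list_fields() {
  printf '%s' "$1" | jq -r 'keys[]'
}

# Print one top-level field; the key is passed in rather than hard-coded.
get_field() {
  printf '%s' "$1" | jq -r --arg k "$2" '.[$k]'
}

# Example with made-up data (not real ace-understand output):
#   sample='{"bpm": 120, "language": "en"}'
#   list_fields "$sample"
#   get_field "$sample" bpm
```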

Inputs

| Input | Description | Required | Default |
| --- | --- | --- | --- |
| `caption` | Text description for music generation | No | `chiptune` |
| `lyrics` | Lyrics (empty = LLM auto-generates) | No | (empty) |
| `duration` | Duration in seconds | No | `20` |
| `seed` | Random seed for reproducible generation | No | (random) |
| `inference_steps` | Number of DiT inference steps | No | `8` |
| `shift` | Flow-matching shift parameter | No | `3` |
| `vocal_language` | Vocal language code (`en`, `fr`, …) | No | `en` |
| `output_path` | Output path for the generated WAV file | No | `output.wav` |
| `understand` | Local file path or URL (http/https) to an MP3 or WAV file to analyze; setting it activates understand mode and skips generation | No | (empty) |

Outputs

| Output | Description |
| --- | --- |
| `audio_file` | Path to the generated WAV audio file |
| `generation_time` | Time taken to generate the audio, in seconds |
| `understand_result` | JSON from `ace-understand`: caption, lyrics, BPM, key, duration, language |
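
Docker actions publish outputs by appending to the file named in `$GITHUB_OUTPUT`. The sketch below shows that general mechanism, including the delimiter form needed for multi-line values such as pretty-printed JSON. It is not copied from src/entrypoint.sh; the helper names and the delimiter format are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; not copied from src/entrypoint.sh.

# Single-line output, written as: name=value
set_output() {
  printf '%s=%s\n' "$1" "$2" >> "$GITHUB_OUTPUT"
}

# Multi-line output, written as: name<<DELIMITER, the value, then DELIMITER.
set_multiline_output() {
  local delim="EOF_$$_$RANDOM"   # must not occur on a line of its own in the value
  {
    printf '%s<<%s\n' "$1" "$delim"
    printf '%s\n' "$2"
    printf '%s\n' "$delim"
  } >> "$GITHUB_OUTPUT"
}

# Example:
#   set_output audio_file "output.wav"
#   set_multiline_output understand_result "$json"
```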

How it works

The action runs as a pre-built Docker container published to GitHub Container Registry. The image is built once (by build-docker.yml) and contains everything needed:

| What | Where in image |
| --- | --- |
| `ace-qwen3` binary (Qwen3 causal LM) | `/action/bin/ace-qwen3` |
| `dit-vae` binary (DiT + Oobleck VAE) | `/action/bin/dit-vae` |
| `ace-understand` binary (reverse pipeline) | `/action/bin/ace-understand` |
| `Qwen3-Embedding-0.6B-Q8_0.gguf` | `/action/models/` |
| `acestep-5Hz-lm-4B-Q8_0.gguf` | `/action/models/` |
| `acestep-v15-turbo-Q8_0.gguf` | `/action/models/` |
| `vae-BF16.gguf` | `/action/models/` |

At runtime the entrypoint (src/entrypoint.sh) runs in one of two modes.

Generation mode (default — when understand is not set):

1. Builds a request JSON from the inputs
2. Runs ace-qwen3 (LLM stage: caption → enriched JSON with lyrics + audio codes)
3. Runs dit-vae (DiT + VAE stage: JSON → stereo 48 kHz WAV)
4. Moves the output WAV to the requested path in $GITHUB_WORKSPACE
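
GitHub passes each `with:` input to a Docker action as an `INPUT_<NAME>` environment variable, which is why the local `docker run` example sets `INPUT_CAPTION`. The following sketch shows how an entrypoint might read those variables and fall back to the defaults listed under Inputs. It is an assumption about the approach, not the contents of src/entrypoint.sh; `resolve_inputs`, `workspace_output`, and the lowercase variable names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: read action inputs, applying the documented defaults.
resolve_inputs() {
  caption="${INPUT_CAPTION:-chiptune}"
  duration="${INPUT_DURATION:-20}"
  inference_steps="${INPUT_INFERENCE_STEPS:-8}"
  shift_param="${INPUT_SHIFT:-3}"
  vocal_language="${INPUT_VOCAL_LANGUAGE:-en}"
  output_path="${INPUT_OUTPUT_PATH:-output.wav}"
  lyrics="${INPUT_LYRICS:-}"   # empty means the LLM writes the lyrics
  seed="${INPUT_SEED:-}"       # empty means a random seed
}

# Final location of the WAV inside the mounted workspace
# (assumes output_path is relative).
workspace_output() {
  printf '%s/%s\n' "${GITHUB_WORKSPACE:-.}" "$output_path"
}
```

Inside a workflow the defaults declared in action.yml are normally already applied by GitHub, so fallbacks like these mostly matter when running the container by hand.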

Understand mode (when understand is provided):

1. If a URL (http/https/ftp/file) is given, downloads the audio file; if a local path is given, uses it directly
2. Runs ace-understand (VAE encode → FSQ tokenize → LM understand → JSON)
3. Emits the resulting JSON as the understand_result action output
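
The URL-versus-path decision in the first step can be sketched as a simple prefix match. This is an illustration under stated assumptions, not the actual entrypoint code: `is_url` and `resolve_audio` are hypothetical names, and using curl for the download is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: decide whether the `understand` value is a URL or a path.
is_url() {
  case "$1" in
    http://*|https://*|ftp://*|file://*) return 0 ;;
    *) return 1 ;;
  esac
}

# Resolve the value to a local file, downloading only when it is a URL.
resolve_audio() {
  if is_url "$1"; then
    local tmp
    tmp="$(mktemp)"
    # curl flags: -f fail on HTTP errors, -sS quiet but report errors,
    # -L follow redirects, -o write to the given file.
    curl -fsSL -o "$tmp" "$1" && printf '%s\n' "$tmp"
  else
    printf '%s\n' "$1"
  fi
}

# Example:
#   audio="$(resolve_audio "$INPUT_UNDERSTAND")"
```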

Image location: ghcr.io/audiohacking/acestep-action:latest

Project structure

acestep-action/
├── action.yml                      # Docker action definition
├── Dockerfile                      # Image: build binaries + download models
├── src/
│   └── entrypoint.sh              # Docker entrypoint (generation and understand modes)
└── .github/
    └── workflows/
        ├── build-docker.yml       # Build and publish image to GHCR
        └── test.yml               # CI test workflow

Local development

To test locally with the Dockerfile instead of the GHCR image, change action.yml:

runs:
  using: 'docker'
  image: 'Dockerfile'   # instead of 'docker://ghcr.io/...'

Then build and run the container manually:

docker build -t acestep-action .

docker run --rm \
  -e INPUT_CAPTION="upbeat chiptune" \
  -e INPUT_DURATION="10" \
  -e GITHUB_WORKSPACE=/out \
  -e GITHUB_OUTPUT=/dev/stdout \
  -v /tmp/out:/out \
  acestep-action
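
After a local run it can be useful to confirm that the container actually produced a WAV file. The helper below is a minimal sketch that checks only the RIFF/WAVE magic bytes; it does not verify the sample rate or channel count. `is_wav` is a hypothetical name, and the example path assumes the default output_path together with the /tmp/out mount used above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if the file starts with a RIFF/WAVE header.
is_wav() {
  local riff wave
  riff="$(head -c 4 "$1")"                # bytes 0-3 should be "RIFF"
  wave="$(tail -c +9 "$1" | head -c 4)"   # bytes 8-11 should be "WAVE"
  [ "$riff" = "RIFF" ] && [ "$wave" = "WAVE" ]
}

# Example:
#   is_wav /tmp/out/output.wav && echo "looks like a WAV file"
```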

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

See LICENSE file for details.
