Ace-Step Audio Generation Action

A GitHub Action that generates music from text prompts using Ace-Step 1.5 via the native acestep.cpp engine.
Text + optional lyrics in, stereo 48 kHz WAV out.

No Python. No PyTorch. No waiting.
The pre-built Docker image ships with compiled ace-qwen3/dit-vae binaries and all ~7.7 GB of pre-quantized GGUF models baked in — action execution starts immediately.

Features

  • 🎵 Generate high-quality music from a text caption
  • 🖊️ Optional lyrics — or let the LLM write them for you
  • 🔍 Analyze existing audio with ace-understand — extract caption, lyrics, BPM, key, duration, and language
  • ⚡ Native C++17 / GGML engine — lightweight, no GPU required
  • 🐳 Pre-built Docker image with models included — zero download wait
  • 🎲 Reproducible generation with optional seed
  • 🔧 Easy integration with GitHub Actions workflows

Usage

Basic Example

```yaml
name: Generate Audio
on: [push]

jobs:
  generate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Generate audio
        id: audio
        uses: audiohacking/acestep-action@main
        with:
          caption: 'upbeat electronic chiptune music'

      - name: Use generated audio
        run: echo "Audio saved to ${{ steps.audio.outputs.audio_file }}"
```

With lyrics and seed

```yaml
- name: Generate audio
  id: audio
  uses: audiohacking/acestep-action@main
  with:
    caption: 'calm ambient piano melody, lo-fi, warm'
    lyrics: |
      [Verse]
      Floating on a cloud of sound
      Melodies that go around
    duration: '30'
    seed: '42'
    output_path: 'generated_music.wav'
```

Upload the result as an artifact

```yaml
- name: Upload audio
  uses: actions/upload-artifact@v4
  with:
    name: generated-audio
    path: ${{ steps.audio.outputs.audio_file }}
```

Analyze an existing audio file

Supply a local file path or a URL (http/https) to an MP3 or WAV file via the understand input.
The action uses the file directly if a path is given, or downloads it first if a URL is provided.
It then runs ace-understand and returns the analysis as JSON in the understand_result output.
Audio generation is skipped when understand is set.

```yaml
# From a URL
- name: Analyze audio (URL)
  id: analyze
  uses: audiohacking/acestep-action@main
  with:
    understand: 'https://example.com/song.mp3'

# From a local path (e.g. a file already in the workspace)
- name: Analyze audio (local file)
  id: analyze
  uses: audiohacking/acestep-action@main
  with:
    understand: '/github/workspace/output.wav'

- name: Show analysis
  run: echo '${{ steps.analyze.outputs.understand_result }}'
```
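Since `understand_result` is a single JSON string, a follow-up step usually needs to pull individual fields out of it. A minimal sketch using POSIX `sed` (the sample JSON and the exact `bpm` field name are illustrative assumptions based on the output description below; `jq` on a GitHub-hosted runner would work equally well):

```shell
# Illustrative sample of what understand_result might look like;
# the real schema is defined by ace-understand, not by this sketch.
RESULT='{"caption":"lo-fi piano","bpm":84,"key":"C minor","duration":30}'

# Extract the numeric "bpm" field with POSIX sed.
bpm=$(printf '%s' "$RESULT" | sed -n 's/.*"bpm":\([0-9][0-9]*\).*/\1/p')
echo "Detected tempo: ${bpm} BPM"
```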

Inputs

| Input | Description | Required | Default |
|---|---|---|---|
| `caption` | Text description for music generation | No | `chiptune` |
| `lyrics` | Lyrics (empty = LLM auto-generates) | No | (empty) |
| `duration` | Duration in seconds | No | `20` |
| `seed` | Random seed for reproducible generation | No | (random) |
| `inference_steps` | Number of DiT inference steps | No | `8` |
| `shift` | Flow-matching shift parameter | No | `3` |
| `vocal_language` | Vocal language code (en, fr, …) | No | `en` |
| `output_path` | Output path for the generated WAV file | No | `output.wav` |
| `understand` | Local file path or URL (http/https) to an MP3 or WAV file to analyze; activates understand mode and skips generation | No | (empty) |

Outputs

| Output | Description |
|---|---|
| `audio_file` | Path to the generated WAV audio file |
| `generation_time` | Time taken to generate the audio, in seconds |
| `understand_result` | JSON from `ace-understand`: caption, lyrics, BPM, key, duration, language |
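Inside the container, these outputs are emitted via the standard `$GITHUB_OUTPUT` file protocol. A minimal sketch with placeholder values (the variable names are assumptions; the heredoc-style delimiter form is what GitHub requires for multi-line values such as the JSON result):

```shell
# Fallback path so the sketch also runs outside of Actions.
: "${GITHUB_OUTPUT:=/tmp/github_output.txt}"

# Placeholder values; in the real entrypoint these come from the
# generation / understand stages.
out_path="generated_music.wav"
elapsed=42
result_json='{"caption":"chiptune","bpm":120}'

{
  echo "audio_file=${out_path}"
  echo "generation_time=${elapsed}"
  # Multi-line values (like JSON) must use the delimiter syntax:
  echo "understand_result<<EOF"
  echo "${result_json}"
  echo "EOF"
} >> "$GITHUB_OUTPUT"
```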

How it works

The action runs as a pre-built Docker container published to GitHub Container Registry. The image is built once (by build-docker.yml) and contains everything needed:

| What | Where in image |
|---|---|
| `ace-qwen3` binary (Qwen3 causal LM) | `/action/bin/ace-qwen3` |
| `dit-vae` binary (DiT + Oobleck VAE) | `/action/bin/dit-vae` |
| `ace-understand` binary (reverse pipeline) | `/action/bin/ace-understand` |
| `Qwen3-Embedding-0.6B-Q8_0.gguf` | `/action/models/` |
| `acestep-5Hz-lm-4B-Q8_0.gguf` | `/action/models/` |
| `acestep-v15-turbo-Q8_0.gguf` | `/action/models/` |
| `vae-BF16.gguf` | `/action/models/` |

At runtime the entrypoint (src/entrypoint.sh) does the following:

Generation mode (default — when understand is not set):

  1. Builds a request JSON from inputs
  2. Runs ace-qwen3 (LLM stage: caption → enriched JSON with lyrics + audio codes)
  3. Runs dit-vae (DiT + VAE stage: JSON → stereo 48 kHz WAV)
  4. Moves the output WAV to the requested path in $GITHUB_WORKSPACE
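The four steps above can be sketched in shell. The binary paths match the table above, but the request-JSON field names and the binaries' CLI flags are not documented here, so they are left as assumptions or comments rather than invented:

```shell
# Hypothetical sketch of generation mode in entrypoint.sh.
# The JSON field names below are assumptions, not the real schema.
build_request() {
  # $1=caption  $2=lyrics  $3=duration  $4=seed
  printf '{"caption":"%s","lyrics":"%s","duration":%s,"seed":%s}' \
    "$1" "$2" "$3" "$4"
}

generate() {
  # 1. Build the request JSON from the action inputs
  build_request "$INPUT_CAPTION" "$INPUT_LYRICS" \
    "${INPUT_DURATION:-20}" "${INPUT_SEED:-0}" > /tmp/request.json
  # 2. /action/bin/ace-qwen3 -- LLM stage (real flags not shown here)
  # 3. /action/bin/dit-vae   -- DiT + VAE stage (real flags not shown here)
  # 4. Move the result into the workspace under the requested name
  mv /tmp/output.wav "$GITHUB_WORKSPACE/${INPUT_OUTPUT_PATH:-output.wav}"
}
```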

Understand mode (when understand is provided):

  1. If a URL (http/https/ftp/file) is given, downloads the audio file; if a local path is given, uses it directly
  2. Runs ace-understand (VAE encode → FSQ tokenize → LM understand → JSON)
  3. Emits the resulting JSON as the understand_result action output
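Step 1's dispatch between URL and local path can be sketched with a POSIX `case` statement (the function name, `curl` usage, and the temp path are assumptions about the image, not the actual entrypoint code):

```shell
# Hypothetical sketch: resolve the `understand` input to a local
# file path -- download URLs, pass local paths through unchanged.
resolve_audio() {
  case "$1" in
    http://*|https://*|ftp://*|file://*)
      dest=/tmp/understand-input
      curl -fsSL -o "$dest" "$1" && printf '%s\n' "$dest"
      ;;
    *)
      printf '%s\n' "$1"
      ;;
  esac
}
```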

Image location: ghcr.io/audiohacking/acestep-action:latest

Project structure

```
acestep-action/
├── action.yml                      # Docker action definition
├── Dockerfile                      # Image: build binaries + download models
├── src/
│   └── entrypoint.sh              # Entrypoint script (generation + understand modes)
└── .github/
    └── workflows/
        ├── build-docker.yml       # Build and publish image to GHCR
        └── test.yml               # CI test workflow
```

Local development

To test locally with the Dockerfile instead of the GHCR image, change action.yml:

```yaml
runs:
  using: 'docker'
  image: 'Dockerfile'   # instead of 'docker://ghcr.io/...'
```

Then build and run the container manually:

```shell
docker build -t acestep-action .

docker run --rm \
  -e INPUT_CAPTION="upbeat chiptune" \
  -e INPUT_DURATION="10" \
  -e GITHUB_WORKSPACE=/out \
  -e GITHUB_OUTPUT=/dev/stdout \
  -v /tmp/out:/out \
  acestep-action
```
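Understand mode can be exercised the same way; the `INPUT_UNDERSTAND` variable follows GitHub's standard `INPUT_<NAME>` mapping for action inputs (a local-testing sketch, assuming a WAV from a previous run exists at the mounted path):

```shell
docker run --rm \
  -e INPUT_UNDERSTAND="/out/output.wav" \
  -e GITHUB_WORKSPACE=/out \
  -e GITHUB_OUTPUT=/dev/stdout \
  -v /tmp/out:/out \
  acestep-action
```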

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

See LICENSE file for details.