Auden: Audio & Multimodal Understanding Research Toolbox

A comprehensive toolbox for audio and multimodal understanding tasks, including ASR, CLAP, audio captioning, speaker identification, speech-LLM, and more.

📖 Read the Tutorial | 💡 Examples | 🤗 Models

🔥 What's New

  • 2026-01: Released pretrained ASR models AudenAI/auden-asr-zh-stream and AudenAI/auden-asr-zh-en
  • 2026-01: Added WenetTransformer & WhisperEncoder
  • 2025-12: Released AzeroS - Speech-LLM understanding both semantics and paralinguistics (gender, age, emotion)
  • 2025-12: Released TagSpeech - Unified E2E multi-speaker ASR and diarization with timestamps and speaker attribution
  • 2025-10: Released TTA - Multilingual transcribe, translate, and align model with cross-lingual speech retrieval
  • 2025-10: Released Voice - General-purpose voice encoder for speaker verification, emotion, gender, and age classification
  • 2025-09: Initial release with foundation models for ASR, CLAP, audio captioning, audio tagging, LALMs, and more

Quick Start

Install Auden from source:

git clone https://github.com/AudenAI/Auden.git
cd Auden
pip install -e .

Some examples under examples/ have additional installation requirements; see the README in each example directory for details.
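
After installing, a quick import verifies the package is available (a minimal check, assuming the editable install above succeeded):

python -c "import auden"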

Features

  • 🎯 Multiple foundation audio tasks: ASR, captioning, contrastive learning, audio classification, speaker identification etc.
  • 🤖 Multimodal LLM support (e.g. speech-LLM, asr-llm)
  • 🚀 Pre-trained models and encoders with Hugging Face Hub support and easy fine-tuning
  • 🔧 Modular design for custom workflows
  • 📊 Comprehensive evaluation metrics

📖 Tutorial

New to Auden? Check out our comprehensive tutorial to understand the design philosophy and learn how to build your own projects:

👉 Auden Design Philosophy and Usage Guide

The tutorial covers:

  • Design principles and core architecture
  • How to use Auto* APIs for model/config loading
  • Step-by-step guide to building custom projects
  • Best practices for training, data processing, and deployment
  • Common FAQs and troubleshooting

Usage

Available Built-in Model Types from AutoModel

from auden.auto import list_available_models

# Print the registered model-type names that AutoModel can load
print(list_available_models())

Custom Model Registration

Important: If you want to load custom models with AutoModel, you must register them first:

from auden.auto import register_model, register_config

# Register your custom model and config
register_model("my-model", "examples.my_model.model", "MyModel")
register_config("my-model", "examples.my_model.config", "MyConfig")

# Now you can use it with AutoModel
from auden.auto import AutoModel
model = AutoModel.from_pretrained("path/to/my-model")

Note: if you don't want to use AutoModel, you can skip registration entirely and load your model any way you like.
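
For reference, a registrable model/config pair might look like the sketch below. This is hypothetical: the exact base classes and required fields depend on Auden's internals, so treat MyConfig, MyModel, and the model_type field as assumptions that merely mirror the registration call above, not a verbatim template.

# examples/my_model/config.py (hypothetical layout matching the registration above)
from dataclasses import dataclass

@dataclass
class MyConfig:
    model_type: str = "my-model"  # assumed to match the name passed to register_config
    hidden_size: int = 512

# examples/my_model/model.py (assumes Auden models are torch.nn.Module subclasses)
import torch

class MyModel(torch.nn.Module):
    def __init__(self, config: MyConfig):
        super().__init__()
        self.config = config
        self.proj = torch.nn.Linear(config.hidden_size, config.hidden_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Placeholder forward pass; a real model would implement its task here
        return self.proj(x)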

Loading Models

Auden provides a HuggingFace-like interface for loading models:

from auden.auto import AutoModel

# Load from HuggingFace Hub
model = AutoModel.from_pretrained("your-org/your-model")

# Load from local checkpoint
model = AutoModel.from_pretrained("path/to/model")

# Load from configuration (creates an EMPTY model)
from auden.auto import AutoConfig
config = AutoConfig.from_pretrained("path/to/config_or_model_dir")

# Important: from_config(...) constructs an EMPTY model (random init) from the config only
# It does NOT load weights. To load weights, use from_pretrained(...)
model = AutoModel.from_config(config)
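
Once a model is loaded, the usual PyTorch idioms apply, assuming Auden models subclass torch.nn.Module (an assumption based on the toolkit's PyTorch lineage, not something this README states):

import torch
from auden.auto import AutoModel

# Hypothetical post-loading setup for inference
model = AutoModel.from_pretrained("path/to/model")
model.eval()  # disable dropout etc. for inference
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)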

Loading Configurations

from auden.auto import AutoConfig

# Load from various sources
config = AutoConfig.from_pretrained("your-org/model")
config = AutoConfig.from_pretrained("path/to/config.json")

# Create config for specific model type
config = AutoConfig.for_model("zipformer", hidden_size=512)
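
The two config entry points compose with AutoModel.from_config to build a fresh, randomly initialized model for training from scratch. A short sketch using only the APIs shown above ("zipformer" and hidden_size are taken from the example on this page):

from auden.auto import AutoConfig, AutoModel

# Build a config for a built-in model type, then construct an
# EMPTY (randomly initialized) model from it -- no weights are loaded
config = AutoConfig.for_model("zipformer", hidden_size=512)
model = AutoModel.from_config(config)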

Examples

Check examples/ for task-specific tutorials.

License

The code and weights in this repository are released under the terms in the LICENSE file. The repository also includes a NOTICE file with third-party attributions (e.g., Transformers, Icefall).

Acknowledgements

  • This project draws inspiration from the design and user experience of Hugging Face Transformers, especially around Auto* APIs and configuration patterns.
  • Many ASR components and utilities (e.g., Zipformer encoder variants, WER tooling) are adapted from or inspired by k2-fsa/icefall. We thank the Icefall authors and contributors for their excellent work and open-source spirit.
  • Parts of the data pipeline and dataset handling build upon Lhotse. We thank the Lhotse authors and contributors for their work and open-source efforts.
