A modern, modular real-time Automatic Speech Recognition (ASR) interface with support for various ASR backends, built with FastAPI and WebRTC.
- 🎤 Real-time ASR: Live transcription via WebRTC audio streaming
- 📁 File Upload: Upload and transcribe audio files
- 🔄 Modular Architecture: Easy integration of custom ASR backends
- 🌐 Web Interface: Modern, responsive web UI for configuration and monitoring
- ⚡ High Performance: Optimized for low-latency, real-time processing, with WebRTC streaming provided by Hugging Face's FastRTC library
- MLX Whisper (Apple Silicon optimized, word-level timestamps)
- Whisper (OpenAI official) (segment-level timestamps, permissive license)
To use the standard OpenAI Whisper backend, set `backend = "whisper"` in your configuration (see below).
- Python 3.10 or higher
- uv
```bash
# Clone the repository
git clone https://github.com/Diabolocom-Research/HISI-interface.git
cd HISI-interface

# Install uv if you haven't already
pip install uv

# Create a virtual environment and install dependencies
uv sync

# Activate the virtual environment
source .venv/bin/activate

# Install the package in development mode
uv pip install -e .
```

```bash
# Using the CLI
hisi-interface serve

# Or with uvicorn
uvicorn asr_interface.web.server:create_app --reload
```

- Open http://localhost:8000 in your browser
- Configure your ASR model settings
- Start recording or upload an audio file
- View real-time transcriptions and segments
The HISI Interface follows a clean, modular architecture based on the legacy whisper-streaming system:
```
asr_interface/
├── core/        # Core protocols and configuration
├── backends/    # ASR backends and model loaders
├── handlers/    # Real-time audio processing
├── web/         # FastAPI web server
├── utils/       # Audio processing utilities
└── cli/         # Command-line interface
```
- `ASRBase`: Abstract base class for ASR backends (Whisper, MLX Whisper, etc.)
- `OnlineASRProcessor`: Main processor that manages audio buffering and hypothesis stabilization
- `ModelLoader`: Protocol for loading ASR models and creating processors
- `RealTimeASRHandler`: WebRTC audio stream handler
- `ASRComponentsStore`: Thread-safe state management
- Model Loading: `ModelLoader` creates an `ASRBase` backend and wraps it with `OnlineASRProcessor`
- Audio Processing: `OnlineASRProcessor` manages audio buffering, calls the ASR backend, and stabilizes transcripts
- Real-time Streaming: `RealTimeASRHandler` receives WebRTC audio and feeds it to the processor
- Output: Stabilized transcripts are returned to the client
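In code, this loop is driven through the `ASRProcessor` protocol (its methods are shown in the advanced integration example later in this README). A minimal sketch of the consuming side, assuming a `processor` obtained from a registered loader and an iterable of mono float32 audio chunks:

```python
def run_stream(processor, chunks):
    """Drive any ASRProcessor over a stream of audio chunks.

    `chunks` yields numpy float32 arrays, e.g. resampled WebRTC frames.
    Each process_iter() call may return a stabilized (start, end, text)
    tuple, or None if nothing new is ready yet.
    """
    processor.init(offset=0.0)
    for chunk in chunks:
        processor.insert_audio_chunk(chunk)
        result = processor.process_iter()
        if result is not None:
            start, end, text = result
            print(f"[{start:6.2f}s - {end:6.2f}s] {text}")

    # Flush whatever audio remains in the buffer
    tail = processor.finish()
    if tail is not None:
        start, end, text = tail
        print(f"[{start:6.2f}s - {end:6.2f}s] {text}")
```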
- `POST /load_model` - Load an ASR model with the given configuration
- `POST /upload_and_transcribe` - Upload and transcribe an audio file
- `POST /evaluate_model` - Evaluate model performance
- `GET /` - Web interface
- `GET /transcript` - Get the transcript for a WebRTC session
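For illustration, here is how a client might call these endpoints with `requests`. The exact request and response payloads are defined by the server in `asr_interface/web`; the JSON fields and the `file` form field below are assumptions mirroring the `ASRConfig` fields used elsewhere in this README:

```python
import requests

BASE = "http://localhost:8000"

# Load a model; the JSON body is assumed to mirror ASRConfig fields.
resp = requests.post(
    f"{BASE}/load_model",
    json={"backend": "whisper", "lan": "en", "model": "small"},
)
resp.raise_for_status()

# Upload an audio file for transcription; the form field name "file"
# is an assumption, so check the endpoint's signature.
with open("sample.wav", "rb") as fh:
    resp = requests.post(f"{BASE}/upload_and_transcribe", files={"file": fh})
print(resp.json())
```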
```bash
# Start the web server
hisi-interface serve [--host HOST] [--port PORT] [--reload]

# Show project info
hisi-interface info
```

The modular architecture makes it easy to add custom ASR backends. There are two integration paths:
For most use cases, you can provide your own ASR backend and reuse the existing `OnlineASRProcessor`:
```python
from asr_interface.core.protocols import ModelLoader, ASRProcessor
from asr_interface.core.config import ASRConfig
from asr_interface.backends import ASRBase, OnlineASRProcessor


class MyCustomASR(ASRBase):
    """Your custom ASR backend implementing ASRBase."""

    def load_model(self, modelsize=None, cache_dir=None, model_dir=None):
        # Load your model
        pass

    def transcribe(self, audio, init_prompt=""):
        # Transcribe audio and return a result with segments
        pass

    def ts_words(self, result):
        # Extract word-level timestamps
        pass

    def segments_end_ts(self, result):
        # Extract segment end timestamps
        pass

    def use_vad(self):
        # Enable VAD if supported
        pass


class MyCustomLoader(ModelLoader):
    def load(self, config: ASRConfig) -> tuple[ASRProcessor, dict]:
        # Create your ASR backend
        asr_backend = MyCustomASR(
            lan=config.lan,
            modelsize=config.model,
            cache_dir=config.model_cache_dir,
            model_dir=config.model_dir,
        )

        # Wrap it with OnlineASRProcessor
        processor = OnlineASRProcessor(
            asr=asr_backend,
            buffer_trimming=(config.buffer_trimming, int(config.buffer_trimming_sec)),
            min_chunk_sec=config.min_chunk_size,
        )

        metadata = {"separator": asr_backend.sep, "model_type": "my_custom"}
        return processor, metadata


# Register your loader
from asr_interface.backends.registry import register_loader

register_loader("my_backend", MyCustomLoader())
```

For complete control over the real-time processing pipeline:
```python
from typing import Optional, Tuple

import numpy as np

from asr_interface.core.protocols import ASRProcessor


class MyCustomRealTimeEngine(ASRProcessor):
    def init(self, offset: float = 0.0):
        # Reset state for a new stream
        pass

    def insert_audio_chunk(self, audio: np.ndarray):
        # Add audio to the buffer
        pass

    def process_iter(self) -> Optional[Tuple[float, float, str]]:
        # Process the buffer and return any newly stabilized output
        pass

    def finish(self) -> Optional[Tuple[float, float, str]]:
        # Process any remaining audio
        pass
```

For detailed integration guides, see docs/INTEGRATION_GUIDE.md.
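Either path plugs into the same loading flow: since `MyCustomRealTimeEngine` already implements `ASRProcessor`, a `ModelLoader` can return it directly rather than wrapping an `ASRBase` backend. A minimal sketch, assuming the engine above and a no-argument constructor:

```python
from asr_interface.core.config import ASRConfig
from asr_interface.core.protocols import ASRProcessor, ModelLoader
from asr_interface.backends.registry import register_loader


class MyEngineLoader(ModelLoader):
    def load(self, config: ASRConfig) -> tuple[ASRProcessor, dict]:
        # No OnlineASRProcessor wrapper: the engine handles its own
        # buffering and hypothesis stabilization.
        engine = MyCustomRealTimeEngine()  # defined in the sketch above
        return engine, {"model_type": "my_custom_engine"}


register_loader("my_engine", MyEngineLoader())
```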
In your configuration (e.g., `ASRConfig`):

```python
backend = "whisper"              # Standard OpenAI Whisper
# or
backend = "mlx_whisper"          # MLX Whisper
# or
backend = "whisper_timestamped"  # Whisper Timestamped
```
```bash
# Install development dependencies
uv sync --group dev

# Run tests
pytest

# Format code
black asr_interface tests

# Lint code
ruff check asr_interface tests

# Type checking
mypy asr_interface
```

```
HISI-interface/
├── asr_interface/        # Main package
│   ├── core/             # Core protocols and configuration
│   ├── backends/         # ASR model loaders
│   ├── handlers/         # Real-time processing handlers
│   ├── web/              # Web server and API
│   ├── utils/            # Utility functions
│   └── cli/              # Command-line interface
├── tests/                # Test suite
├── docs/                 # Documentation
├── pyproject.toml        # Project configuration
└── README.md             # This file
```
We welcome contributions! Please see our Contributing Guide for details.
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests
- Submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.
- Built with FastAPI for the web framework
- Uses WebRTC for real-time audio streaming
- Supports Whisper and other ASR models
- Audio processing powered by librosa
- Real-time streaming implementation inspired by whisper_streaming
- Icons provided by Icons8
- WebRTC implementation powered by FastRTC
Make sure you have Python 3.10 or higher:
```bash
python --version
```

If you encounter permission errors or import issues:

```bash
# Deactivate any existing environment
deactivate

# Remove and recreate the virtual environment
rm -rf .venv
uv sync

# Activate it again
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
```

If `uv sync` fails:
```bash
# Clear the uv cache
uv cache clean

# Try again
uv sync
```

If you encounter dependency conflicts:
```bash
# With uv
uv sync --reinstall

# With pip
pip install --upgrade pip
pip install -r requirements.txt --force-reinstall
```