
HISI Interface

A modern, modular real-time Automatic Speech Recognition (ASR) interface with support for various ASR backends, built with FastAPI and WebRTC.

Python 3.10+ · License: MIT · Code style: black

Features

  • 🎤 Real-time ASR: Live transcription via WebRTC audio streaming
  • 📁 File Upload: Upload and transcribe audio files
  • 🔄 Modular Architecture: Easy integration of custom ASR backends
  • 🌐 Web Interface: Modern, responsive web UI for configuration and monitoring
  • ⚡ High Performance: Optimized for low-latency, real-time processing using WebRTC, implemented via the Hugging Face FastRTC library

Supported ASR Backends

  • MLX Whisper (Apple Silicon optimized, word-level timestamps)
  • Whisper (OpenAI official) (segment-level timestamps, permissive license)

To use the standard OpenAI Whisper backend, set backend: "whisper" in your configuration (see below).

Quick Start

Installation

Prerequisites

  • Python 3.10 or higher
  • uv

Using uv (Recommended)

# Clone the repository
git clone https://github.com/Diabolocom-Research/HISI-interface.git
cd HISI-interface

# Install uv if you haven't already
pip install uv

# Create virtual environment and install dependencies
uv sync

# Activate the virtual environment
source .venv/bin/activate

# Install the package in development mode
uv pip install -e .

Start the Server

# Using the CLI
hisi-interface serve

# Or with uvicorn
uvicorn asr_interface.web.server:create_app --factory --reload

Use the Web Interface

  1. Open http://localhost:8000 in your browser
  2. Configure your ASR model settings
  3. Start recording or upload an audio file
  4. View real-time transcriptions and segments

Architecture

The HISI Interface follows a clean, modular architecture based on the legacy whisper-streaming system:

asr_interface/
├── core/           # Core protocols and configuration
├── backends/       # ASR backends and model loaders
├── handlers/       # Real-time audio processing
├── web/            # FastAPI web server
├── utils/          # Audio processing utilities
└── cli/           # Command-line interface

Key Components

  • ASRBase: Abstract base class for ASR backends (Whisper, MLX Whisper, etc.)
  • OnlineASRProcessor: Main processor that manages audio buffering and hypothesis stabilization
  • ModelLoader: Protocol for loading ASR models and creating processors
  • RealTimeASRHandler: WebRTC audio stream handler
  • ASRComponentsStore: Thread-safe state management
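The hypothesis stabilization performed by OnlineASRProcessor can be illustrated with a small sketch. In whisper_streaming-style systems, a common approach is to commit only the prefix on which two consecutive transcription passes agree; the `stable_prefix` helper below is illustrative and is not the actual OnlineASRProcessor implementation.

```python
def stable_prefix(prev_words, new_words):
    """Return the longest common prefix of two word-level hypotheses.

    Only words that appear identically in two consecutive transcription
    passes are considered "stable" and safe to emit to the client.
    """
    committed = []
    for prev, new in zip(prev_words, new_words):
        if prev != new:
            break  # hypotheses diverge here; stop committing
        committed.append(new)
    return committed

# Two consecutive hypotheses over a growing audio buffer:
h1 = ["the", "quick", "brown", "fax"]
h2 = ["the", "quick", "brown", "fox", "jumps"]
print(stable_prefix(h1, h2))  # → ['the', 'quick', 'brown']
```

Words after the divergence point ("fax" vs "fox") stay in the pending buffer and are re-evaluated on the next pass.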

Architecture Flow

  1. Model Loading: ModelLoader creates an ASRBase backend and wraps it with OnlineASRProcessor
  2. Audio Processing: OnlineASRProcessor manages audio buffering, calls the ASR backend, and stabilizes transcripts
  3. Real-time Streaming: RealTimeASRHandler receives WebRTC audio and feeds it to the processor
  4. Output: Stabilized transcripts are returned to the client
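The four steps above can be sketched with a stubbed processor. The real classes live in asr_interface; `StubProcessor` and its behavior here are illustrative only.

```python
import numpy as np

class StubProcessor:
    """Minimal stand-in for OnlineASRProcessor: buffers audio chunks and
    returns a (start, end, text) tuple per processing iteration."""
    SAMPLE_RATE = 16_000  # assumed sample rate for this sketch

    def __init__(self):
        self.buffer = np.zeros(0, dtype=np.float32)
        self.offset = 0.0

    def insert_audio_chunk(self, audio):
        # Step 3: the WebRTC handler feeds audio into the processor
        self.buffer = np.concatenate([self.buffer, audio])

    def process_iter(self):
        # Step 2: a real processor would call the ASR backend here and
        # stabilize the hypothesis; we just report the buffer extent.
        start = self.offset
        end = self.offset + len(self.buffer) / self.SAMPLE_RATE
        return (start, end, "<partial transcript>")

proc = StubProcessor()
proc.insert_audio_chunk(np.zeros(16_000, dtype=np.float32))  # 1 s of silence
print(proc.process_iter())  # → (0.0, 1.0, '<partial transcript>')
```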

API Reference

Web API

  • POST /load_model - Load ASR model with configuration
  • POST /upload_and_transcribe - Upload and transcribe audio file
  • POST /evaluate_model - Evaluate model performance
  • GET / - Web interface
  • GET /transcript - Get transcript for WebRTC session

CLI Commands

# Start web server
hisi-interface serve [--host HOST] [--port PORT] [--reload]


# Show project info
hisi-interface info

Adding Custom ASR Backends

The modular architecture makes it easy to add custom ASR backends. There are two integration paths:

Path 1: Using Existing Real-Time Engine (Recommended)

For most use cases, you can provide your own ASR backend and reuse the existing OnlineASRProcessor:

from asr_interface.core.protocols import ModelLoader, ASRProcessor
from asr_interface.core.config import ASRConfig
from asr_interface.backends import ASRBase, OnlineASRProcessor

class MyCustomASR(ASRBase):
    """Your custom ASR backend implementing ASRBase."""

    def load_model(self, modelsize=None, cache_dir=None, model_dir=None):
        # Load your model
        pass

    def transcribe(self, audio, init_prompt=""):
        # Transcribe audio and return result with segments
        pass

    def ts_words(self, result):
        # Extract word-level timestamps
        pass

    def segments_end_ts(self, result):
        # Extract segment end timestamps
        pass

    def use_vad(self):
        # Enable VAD if supported
        pass

class MyCustomLoader(ModelLoader):
    def load(self, config: ASRConfig) -> tuple[ASRProcessor, dict]:
        # Create your ASR backend
        asr_backend = MyCustomASR(
            lan=config.lan,
            modelsize=config.model,
            cache_dir=config.model_cache_dir,
            model_dir=config.model_dir
        )

        # Wrap with OnlineASRProcessor
        processor = OnlineASRProcessor(
            asr=asr_backend,
            buffer_trimming=(config.buffer_trimming, int(config.buffer_trimming_sec)),
            min_chunk_sec=config.min_chunk_size
        )

        metadata = {"separator": asr_backend.sep, "model_type": "my_custom"}
        return processor, metadata

# Register your loader
from asr_interface.backends.registry import register_loader
register_loader("my_backend", MyCustomLoader())

Path 2: Complete Custom Real-Time Engine (Advanced)

For complete control over the real-time processing pipeline:

import numpy as np
from typing import Optional, Tuple

from asr_interface.core.protocols import ASRProcessor

class MyCustomRealTimeEngine(ASRProcessor):
    def init(self, offset: float = 0.0):
        # Reset state for new stream
        pass

    def insert_audio_chunk(self, audio: np.ndarray):
        # Add audio to buffer
        pass

    def process_iter(self) -> Optional[Tuple[float, float, str]]:
        # Process buffer and return results
        pass

    def finish(self) -> Optional[Tuple[float, float, str]]:
        # Process remaining audio
        pass

For detailed integration guides, see docs/INTEGRATION_GUIDE.md.

Example: Selecting a Backend

In your configuration (e.g., ASRConfig):

backend = "whisper"  # For standard OpenAI Whisper
# or
backend = "mlx_whisper"  # For MLX Whisper
# or
backend = "whisper_timestamped"  # For Whisper Timestamped
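The exact field set is defined by asr_interface.core.config.ASRConfig. Based on the fields used in the loader example above, a configuration object might look roughly like the stand-in below (`ExampleConfig` and its defaults are illustrative, not the real class):

```python
from dataclasses import dataclass

@dataclass
class ExampleConfig:
    """Illustrative stand-in mirroring the ASRConfig fields used in this
    README; see asr_interface.core.config.ASRConfig for the real class."""
    backend: str = "whisper"
    model: str = "base"
    lan: str = "en"
    min_chunk_size: float = 1.0
    buffer_trimming: str = "segment"
    buffer_trimming_sec: float = 15.0

cfg = ExampleConfig(backend="mlx_whisper", lan="fr")
print(cfg.backend, cfg.lan)  # → mlx_whisper fr
```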

Development

Setup Development Environment

# Install development dependencies
uv sync --group dev

# Run tests
pytest

# Format code
black asr_interface tests

# Lint code
ruff check asr_interface tests

# Type checking
mypy asr_interface

Project Structure

HISI-interface/
├── asr_interface/          # Main package
│   ├── core/              # Core protocols and configuration
│   ├── backends/          # ASR model loaders
│   ├── handlers/          # Real-time processing handlers
│   ├── web/               # Web server and API
│   ├── utils/             # Utility functions
│   └── cli/               # Command-line interface
├── tests/                 # Test suite
├── docs/                  # Documentation
├── pyproject.toml         # Project configuration
└── README.md             # This file

Interface

HISI Interface Screenshot

Contributing

We welcome contributions! Please see our Contributing Guide for details.

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests
  5. Submit a pull request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

  • Built with FastAPI for the web framework
  • Uses WebRTC for real-time audio streaming
  • Supports Whisper and other ASR models
  • Audio processing powered by librosa
  • Real-time streaming implementation inspired by whisper_streaming
  • Icons provided by Icons8
  • WebRTC implementation powered by FastRTC

Troubleshooting

Common Issues

Python Version

Make sure you have Python 3.10 or higher:

python --version

Virtual Environment Issues

If you encounter permission errors or import issues:

# Deactivate any existing environment
deactivate

# Remove and recreate the virtual environment (uv places it in .venv)
rm -rf .venv
uv venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

uv Issues

If uv sync fails:

# Clear uv cache
uv cache clean

# Try again
uv sync

Dependencies Issues

If you encounter dependency conflicts:

# With uv
uv sync --reinstall

# With pip
pip install --upgrade pip
pip install -e . --force-reinstall
