Name		Name	Last commit message	Last commit date
Latest commit History 99 Commits
.github/workflow		.github/workflow
audio		audio
backend		backend
src		src
ui		ui
.flake8		.flake8
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
config.yaml		config.yaml
requirement.txt		requirement.txt
setup.py		setup.py
temp.py		temp.py

Repository files navigation

🎙️✨ OpenVoice-Audio Summarizer

🎯 Overview

OpenVoice Audio Summarizer is an open-source platform designed to transform meeting recordings and audio files into concise summaries and actionable insights. The system helps improve meeting productivity by organizing key discussion points and reducing the time required to review recordings. It primarily operates using open-source end-to-end models, eliminating the need for external API services. As a result, the entire process runs locally on the user’s machine, ensuring zero API cost and better data privacy.

The OpenVoice Audio Summarizer is built using pre-trained models such as Falcon3 with 1 billion parameters and the Whisper Medium speech recognition model. Whisper is used to convert audio into text, while the Falcon model generates summaries and insights from the transcribed content.

To improve computational efficiency, the system runs on an NVIDIA GeForce RTX 2070 GPU. Additionally, 4-bit quantization is applied to the Falcon model to optimize memory usage and performance while maintaining model accuracy.

✨ Core Features

Audio File Processing
Audio File Preprocssing
Automatic Transcription
Automatic Speech Recognition (ASR)
Intent and Topic Detection
Context-Aware Summarization
Real Time Internal Logs
Local Model Execution

📸 OpenVoice Audio Summarizer Output (UI Preview)

OpenVoice Log Output

🏗️ Architecture

The system is follows a layered, interface-driven architecture, designed to ensure modularity, maintainability, and extensibility. Fundamental design principles include dependency injection, interface-based abstraction, separation of concerns, and a plugin-oriented tool architecture.

Key Architectural Highlights:

Dependencies are injected via a centralized container, promoting loose coupling and easy testing.
Core components define interfaces to enforcing contracts and enabling polymorphic behavior.
Singleton patterns are used for global utilities like logging.
The architecture defines clear layers: UI/API, Components (business logic), Infrastructure (integrations), Interfaces (contracts), and Container (dependency management).

┌─────────────────────────────────────┐
│       UI Layer (Streamlit)          │
│       Backend API (FastAPI)         │
├─────────────────────────────────────┤
│    Components (Business Logic)      │
│  ├─ Chat Generation Service         │
│  ├─ Chat Connection Service         │
│  ├─ Transcript Generation Service   │
│  └── Voice Connection Service       │
├─────────────────────────────────────┤
│    Infrastructure (Integrations)    │
│  ├─ FalconAI Service                │
|  ├─ WhisperAI Service               │
|  ├─ Audio Service                   │
│  ├─ Chat Prompt                     │
│  ├─ Environment & Configuration     │
│  └─ AI Models                       │
├─────────────────────────────────────┤
│    Interfaces (Contracts)           │
│  ├─ Chat Interfaces                 │
│  ├─ Audio Interfaces                │
│  ├─ Infrastructure Interfaces       │
│  ├─ Logging Interfaces              │
│  └─ LLM Interfaces                  │
├─────────────────────────────────────┤
│    Container (Dependency Injection) │
│    Dependency Wiring & Factory      │
├─────────────────────────────────────┤
│    Core (Utilities & Patterns)      │
│    Singleton Meta-class             │
└─────────────────────────────────────┘

Layer Responsibilities

UI/API Layer: Facilitates user interactions and exposes RESTful endpoints.
Components Layer: Orchestrates business logic and AI-driven chat operations.
Infrastructure Layer: Manages external integrations, Audio services, and system utilities.
Interfaces Layer: Defines abstract contracts to ensure modularity and testability.
Container LayerL: Oversees dependency injection and management of singleton instances.

📁 Project Structure

open-voice-audio-summarizer/
│
├── audio/                            # Audio files for testing
│   ├── meeting_audio.wav
│   └── team_meeting.mp3
│
├── backend/                          # FastAPI API Layer
│   ├── dependencies/                 # Dependency providers for routes
│   │   └── dependencies.py
│   │
│   ├── routes/                       # API endpoints
│   │   └── chat.py
│   │
│   └── main.py                       # FastAPI app initialization with lifespan
│
├── src/                              # Application Core Layer
│   ├── dto/                          # Internal DTOs (service layer contracts)
│   │   └── audio_data.py             # Audio data representation
│   │
│   │
│   ├── components/                   # Business logic orchestration
│   │   ├── chat_generation.py        # summary generation
│   │   ├── chat_connection.py        # falcon3 client management
│   │   ├── voice_transcription.py    # audio to raw text generation
│   │   └── voice_connection.py       # whisper client management
│   │
│   │
│   ├── container/                    # Dependency Injection
│   │   └── openvoice_container.py    # Factory for FalconAI and WhisperAI
│   │
│   │
│   ├── core/                         # Core utilities
│   │   └── singleton_meta.py         # Singleton pattern implementation
│   │
│   │
│   ├── infrastructure/                # External integrations
│   │   ├── falconai_service.py        # Falcon3 client wrapper
│   │   ├── falconai_client.py         # Low-level FalconAI integration
│   │   ├── whisperai_service.py       # Whisper client wrapper
│   │   ├── whisperai_client.py        # Low-level WhisperAI integration
│   │   ├── falcon_prompt.py           # Prompt management
│   │   ├── config_provider.py         # Configuration loader
│   │   └── audio_processor.py         # Audio preprocessing services
│   │
│   ├── interfaces/                          # Abstract contracts
│   │   ├── audio/                           # Audio interfaces
│   │   │   └── audio_service_interface.py
│   │   │
│   │   ├── chat/                            # Chat interfaces
│   │   │   └── chat_service_interface.py
│   │   │
│   │   ├── infra/                           # Infrastructure interfaces
│   │   │   └── config_provider_interface.py
│   │   │
│   │   ├── logging/                         # Logging interfaces
│   │   │   └── logger_interface.py
│   │   │
│   │   └── llm/                             # LLM interfaces
│   │       ├── falcon_interface.py
│   │       └── whisper_interface.py
│   │
│   │
│   └── logs/
│       ├── logger_singleton.py        # Singleton logger
│       └── logger_streamlit.py        # Streamlit logger
│
│
├── ui/                                # Frontend Layer
│   └── app.py                         # Streamlit openvoice interface
│
│
├── config.yaml                        # Application configuration
├── setup.py                           # Package setup
├── requirements.txt                   # Dependencies
└── README.md                          # Documentation

🔄 How It Works

Request Flow

User Upload Audio Input (Streamlit UI)
        │
        ▼
FastAPI Backend (async)
        │
        ├─► (/chat/user_upload)
        │   
        │
        ├─► (/chat/streamlit_logs)
        │   
        │
        ▼
OpenVoiceContainer ──────► VoiceConnectionService & ChatConnectionService
        │
        ├─► VoiceTranscriptionService
        │  	 (Create and return an audio transcription)
        │
        ├─► ChatGenerationService
        │   (Generation and retrun the audio summary)
        │
        ▼
AI Model & Audio Process
        │
        ├─► Audio Processor
        │      (Provide preprocess audio waveform)
        │      └─ Stream logs to UI
        │
        ├─► Whisper Audio Transcription
        │      (Transcribe audio data to text)
        │      └─ Stream logs to UI
        │
        ├─► Prompt Provider
        │      (Provide system prompts and user prompt with transcribed data)
        │      └─ Stream logs to UI
        │
        ├─► Falcon3 Text to Summary
        │      (Generate a response for given prompt)
        │      └─ Stream logs to UI
        │
        └─ If response generated:
        │        │
        │        └─► Return to UI
        │              ├─ Display final result to user
        │─────────►├─ Streaming logs continue

📌 Prerequisites

A system with GPU and CUDA support
Minimum GPU memory: 6 GB
Conda installed (for environment management)
Python 3.10 recommended

💻 Installation

Create a Conda environment:

conda create -n llm-openvoice python=3.10

Activate the environment:

conda activate llm-openvoice

Install PyTorch, TorchAudio, and CUDA support:

conda install pytorch torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia

Install dependencies

pip install -r requirement.txt

Download required models:

Falcon3-1B-Instruct → save to a local folder, e.g., G:/LLMs/Falcon3-1B-Instruct
- Download tiiuae/Falcon3-1B-Instruct
Whisper-Medium → save to a local folder, e.g., G:/LLMs/whisper_medium_en
- Download openai/whisper-medium.en
After downloading the models, update config.yaml with the paths where you saved the models

models:
  Falcon3-1B-Instruct:
    location: "YOUR_LOCAL_PATH/Falcon3-1B-Instruct"
  Whisper-Medium:
    location: "YOUR_LOCAL_PATH/whisper_medium_en"

🤝 Contributing

We welcome contributions related to:

Additional language support
AI & Prompt Engineering
Architecture Improvements
Backend Enhancements
UI Improvements
Testing & Quality Assurance

Contribution Steps

🍴 Fork the repository
🌿 Create a feature/* branch
🛠️ Commit changes with clear messages
📤 Open a Pull Request

🔀 Git Flow Workflow

The project follows a Git Flow–inspired workflow:

🌿 master — Stable, production-ready releases
🌱 develop — Active development branch
✨ feature/* — New feature branches

Typical Workflow

Pull latest changes from develop
Create a feature/* branch
Implement and test changes
Open PR → Merge into develop
Release from develop → Merge into master

This ensures stability while enabling safe feature development.

About

OpenVoice Audio Summarizer – An open-source platform that turns audio recordings into quick summaries and actionable insights, built with Pytorch, Falcon, Whsiper, Quantization, FastAPI and Streamlit.

python solid accelerator pytorch falcon quantization fastapi streamlit openai-whisper

Report repository

Releases

No releases published

Packages

Contributors

Languages

Python 100.0%