diff --git a/README.md b/README.md index 2ddea46..6684028 100644 --- a/README.md +++ b/README.md @@ -1,7 +1,7 @@ # LIRA: Local Inference tool for Realtime Audio

- LIRA logo + LIRA logo

**Local, efficient speech recognition. @@ -11,28 +11,52 @@ LIRA is a **CLI-first, developer-friendly tool**: run and serve ASR models local --- + +## 🧩 Supported Model Architectures & Runtimes + +LIRA supports multiple speech-model architectures. Runtime support depends on the exported model and chosen runtime. + +| Model | Typical use case | Runs on | Supported datatypes | +|----------------------|-----------------------------------------|-----------------|------------------------------------| +| whisper-base | Low-latency, resource-constrained | CPU, GPU, NPU | FP32, BFP16 | +| whisper-small | Balanced accuracy and performance | CPU, GPU, NPU | FP32, BFP16 | +| whisper-medium | Higher accuracy for challenging audio | CPU, GPU, NPU | FP32, BFP16 | +| whisper-large-v3-turbo | Highest accuracy (more compute) | CPU, GPU, NPU | FP32, BFP16 | +| Zipformer | Streaming / low-latency ASR encoder | CPU, GPU, NPU | FP32, BFP16 | + +_NPU support depends on available Vitis AI export artifacts and target hardware._ + +--- + ## πŸš€ Getting Started **Prerequisites:** - **Python 3.10** is required. - We recommend using **conda** for environment management. -- For Ryzenβ„’ AI NPU flow, follow the [Ryzen AI installation instructions](https://ryzenai.docs.amd.com/en/latest/inst.html) and verify drivers/runtime for your device. Ensure that you have a Ryzen AI 300 Series machine to nebale NPU use cases. -- Current recommended Ryzen AI Version: RAI 1.5.1 with 32.0.203.280 driver. +- To use Ryzen AI-powered NPUs, you must have a Ryzen AI 300 Series laptop. +- For the Ryzen AI NPU flow, follow the [Ryzen AI installation instructions](https://ryzenai.docs.amd.com/en/latest/inst.html) and verify drivers/runtime for your device. +- Requires **Ryzen AI 1.6.0 or later**. **Minimal install steps:** -1. **Clone the repo and change directory:** +1. **Activate your conda environment:** + If you have followed the Ryzen AI setup instructions and installed the latest Ryzen AI, you'll have the latest environment available in `conda env list`. + ```bash - git clone https://github.com/aigdat/LIRA.git - cd LIRA + conda activate ``` - -2. **Activate your conda environment:** + For example, activate the latest Ryzen AI environment: ```bash conda activate ryzen-ai-1.6.0 ``` +2. **Clone the repo and change directory:** + ```bash + git clone https://github.com/aigdat/LIRA.git + cd LIRA + ``` + 3. **Install LIRA in editable mode:** ```bash pip install -e . @@ -66,8 +90,6 @@ lira run whisper --model-type whisper-base --export --device cpu --audio audio_f lira serve --backend openai --model whisper-base --device cpu --host 0.0.0.0 --port 5000 ``` ---- - ## πŸ–₯️ LIRA Server LIRA includes a FastAPI-based HTTP server for rapid integration with your applications. The server offers **OpenAI API compatibility** for real-time speech recognition. @@ -75,24 +97,52 @@ LIRA includes a FastAPI-based HTTP server for rapid integration with your applic **Start the server:** - **CPU acceleration:** + ```bash lira serve --backend openai --model whisper-base --device cpu --host 0.0.0.0 --port 5000 ``` - **NPU acceleration:** + ```bash lira serve --backend openai --model whisper-base --device npu --host 0.0.0.0 --port 5000 ``` +
+Test & Debug Your LIRA Server with a Sample Audio File + +Open a new command prompt and run the following `curl` command to verify your server is working. Replace `audio_files\test.wav` with your actual audio file path if needed: + +```bash +curl -X POST "http://localhost:5000/v1/audio/transcriptions" ^ + -H "accept: application/json" ^ + -H "Content-Type: multipart/form-data" ^ + -F "file=@audio_files\test.wav" ^ + -F "model=whisper-onnx" +``` + +If everything is set up correctly, you should receive a JSON response containing the transcribed text. +
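For reference, the same check can be scripted in Python. The snippet below is a minimal sketch rather than part of the LIRA codebase: it assumes the server is running locally on port 5000 as above, that `audio_files/test.wav` exists, and that the third-party `requests` package is installed (`pip install requests`).

```python
# Minimal sketch: POST a WAV file to LIRA's OpenAI-compatible endpoint.
# Assumes a LIRA server on http://localhost:5000 and `pip install requests`.
import requests

url = "http://localhost:5000/v1/audio/transcriptions"

with open("audio_files/test.wav", "rb") as f:
    response = requests.post(
        url,
        files={"file": ("test.wav", f, "audio/wav")},  # multipart file field
        data={"model": "whisper-onnx"},  # server accepts "whisper-1" or "whisper-onnx"
    )

response.raise_for_status()
print(response.json()["text"])  # transcription returned by the server
```

This is the same request shape an OpenAI-compatible client would send to `/v1/audio/transcriptions`, though only this route is exposed by the server.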
+ +**Notes:** +- Models are configured in `config/model_config.json`. +- For protected backends, set API keys as environment variables. -> Interested in more server features? -> Try the **LIRA server demo** with Open WebUI. -> See [docs/OpenWebUI_README.md](docs/OpenWebUI_README.md) for setup instructions. +
-- Configure models via `config/model_config.json`. -- Set API keys (dummy) as environment variables for protected backends. +#### 🌐 **LIRA Server Demo: Try Open WebUI with LIRA** + +Interested in more? Try the **LIRA server demo** with Open WebUI for interactive transcription, monitoring, and management. + +

+ + πŸ‘‰ [Get Started: Step-by-step Setup Guide](docs/OpenWebUI_README.md) + +

--- -## πŸƒ Running Models with `lira run` +
+ +## πŸ—£οΈ Running Models with `lira run` To run a model using the CLI: ```bash @@ -110,51 +160,56 @@ _Tip: run `lira run --help` for model-specific flags._ --- -### πŸ—£οΈ Running Whisper - -Whisper supports export/optimization and model-specific flags. +### Running Whisper Locally using `lira run whisper` -**Example:** -```bash -# Export Whisper base model to ONNX, optimize and run on NPU -lira run whisper --model-type whisper-base --export --device npu --audio --use-kv-cache +This section shows how to export, optimize, compile, and run Whisper models locally. -# Run inference on a sample audio file -lira run whisper -m exported_models/whisper_base --device cpu --audio "audio_files/test.wav" -``` +Check out `lira run whisper --help` for more details on running Whisper and Whisper-specific CLI flags. **Key Whisper flags:** -- `--model-type` β€” Hugging Face model id +- `--model-type` β€” Hugging Face model ID - `--export` β€” export/prepare Whisper model to ONNX - `--export-dir` β€” output path for export - `--force` β€” overwrite existing export - `--use-kv-cache` β€” enable KV-cache decoding - `--static` β€” request static shapes during export - `--opset` β€” ONNX opset version -- `--eval-dir` / `--results-dir` β€” run dataset evaluation +**Examples:** + +- **Run `whisper-base` on CPU (auto-download, export, and run):** + ```bash + # Export Whisper base model to ONNX, optimize, and run on CPU with KV caching enabled + lira run whisper --model-type whisper-base --export --device cpu --audio audio_files/test.wav --use-kv-cache + ``` + +- **Run Whisper on NPU (encoder and decoder on NPU):** + ```bash + # Export Whisper base model to ONNX, optimize, and run on NPU + lira run whisper --model-type whisper-base --export --device npu --audio + ``` + *Note:* + KV caching is not supported on NPU. If you use `--use-kv-cache` with `--device npu`, only the encoder runs on NPU; the decoder runs on CPU. + +- **Run a locally exported Whisper ONNX model:** + ```bash + lira run whisper -m --device cpu --audio "audio_files/test.wav" + ``` --- -### πŸ”„ Running Zipformer +### πŸ”„ Running Zipformer using `lira run zipformer` + +Zipformer enables streaming, low-latency transcription. -Zipformer enables streaming, low-latency transcription. +Run `lira run zipformer -h` for +more details on running Zipformer. **Example:** +Using the Zipformer English model published on the AMD Hugging Face hub: ```bash -lira run zipformer -m --device cpu --audio "audio_files/stream_sample.wav" +lira run zipformer -m aigdat/AMD-zipformer-en --device cpu --audio "audio_files/test.wav" ``` -**Common CLI Flags:** -- `-m`, `--model` β€” exported Zipformer model directory -- `--device` β€” target device -- `--audio` β€” input audio file (WAV) -- `--cache` β€” cache directory (optional) -- `--profile` β€” enable profiling - -_Tip: Run `lira run zipformer --help` for all options._ - ---- - ## βš™οΈ Configuration Model and runtime configs live in `config/`: @@ -166,19 +221,6 @@ You can point to custom config files or modify those in the repo. --- -## 🧩 Supported Model Architectures & Runtimes - -LIRA supports multiple speech-model architectures. Runtime support depends on the exported model and chosen runtime. 
- -| Model | Typical use case | Runs on | Supported datatypes | -|----------------------|-----------------------------------------|-----------------|------------------------------------| -| Whisper (small) | Low-latency, resource-constrained | CPU, GPU, NPU* | FP32, BFP16 | -| Whisper (base) | Balanced accuracy and performance | CPU, GPU, NPU* | FP32, BFP16 | -| Whisper (medium) | Higher accuracy for challenging audio | CPU, GPU, NPU* | FP32, BFP16 | -| Whisper (large) | Highest accuracy (more compute) | CPU, GPU | FP32, BFP16 | -| Zipformer | Streaming / low-latency ASR encoder | CPU, GPU, NPU* | FP32, BFP16 | - -*NPU support depends on available Vitis AI export artifacts and target hardware. ## πŸ§ͺ Early Access & Open Source Intentions @@ -239,5 +281,3 @@ This project is licensed under the terms of the MIT license. See the [LICENSE](LICENSE) file for details. Copyright (C) 2025 Advanced Micro Devices, Inc. All rights reserved. - -SPDX-License-Identifier: MIT diff --git a/lira/models/whisper/transcribe.py b/lira/models/whisper/transcribe.py index 1e3849b..c86d5c1 100644 --- a/lira/models/whisper/transcribe.py +++ b/lira/models/whisper/transcribe.py @@ -241,7 +241,7 @@ def parse_cli(subparsers): whisper_parser.add_argument( "--device", default="cpu", - choices=["cpu", "npu", "igpu"], + choices=["cpu", "npu", "gpu"], help="Device to run the model on (default: cpu)", ) whisper_parser.add_argument( diff --git a/lira/server/openai_server.py b/lira/server/openai_server.py index e961c31..7ddf782 100644 --- a/lira/server/openai_server.py +++ b/lira/server/openai_server.py @@ -1,7 +1,4 @@ -# Copyright (C) 2025, Advanced Micro Devices, Inc. All rights reserved. -# SPDX-License-Identifier: MIT - -from fastapi import FastAPI, UploadFile, Form +from fastapi import FastAPI, UploadFile, Form, Depends from fastapi.responses import JSONResponse import tempfile import os @@ -9,94 +6,91 @@ from pathlib import Path from lira.utils.config import get_provider, get_cache_dir -app = FastAPI() -_model = None - - -def load_model(): - global _model - model_dir = os.getenv("LIRA_MODEL_DIR") - if not model_dir: - raise RuntimeError("LIRA_MODEL_DIR not set") - - encoder = os.path.join(model_dir, "encoder_model.onnx") - decoder = os.path.join(model_dir, "decoder_model.onnx") - decoder_init = os.path.join(model_dir, "decoder_init_model.onnx") - decoder_past = os.path.join(model_dir, "decoder_with_past_model.onnx") - device = "cpu" - - _model = WhisperONNX( - encoder_path=encoder, - decoder_path=decoder, - decoder_init_path=decoder_init, - decoder_past_path=decoder_past, - encoder_provider=get_provider( - device, - "whisper", - "encoder", - ), - decoder_provider=get_provider( - device, - "whisper", - "decoder", - ), - decoder_init_provider=get_provider( - device, - "whisper", - "decoder_init", - ), - use_kv_cache=True, - ) - - -@app.on_event("startup") -def _startup(): - load_model() - - -@app.post("/v1/audio/transcriptions") -async def transcribe(file: UploadFile, model: str = Form(...)): - if model not in ["whisper-1", "whisper-onnx"]: - return JSONResponse({"error": "Unsupported model"}, status_code=400) - - with tempfile.NamedTemporaryFile(delete=False, suffix=".wav") as tmp: - tmp.write(await file.read()) - tmp_path = tmp.name - - try: - transcription, _ = _model.transcribe(tmp_path) - return JSONResponse({"text": transcription}) - finally: - os.remove(tmp_path) - - -def get_app(): - return app +class WhisperService: + def __init__(self, model_type: str, device: str): + self.model_type = model_type + self.device = 
device + self.model_dir = None + self.model = None + self._setup_model_dir_and_export() -def setup_openai_server(model_type, device): - if model_type.startswith("whisper-"): + def _setup_model_dir_and_export(self): # Use centralized cache directory cache_dir = get_cache_dir() / "models" cache_dir.mkdir(parents=True, exist_ok=True) - - output_dir = cache_dir / model_type - print(f"Checking cache for model {model_type} at {output_dir}...") + output_dir = cache_dir / self.model_type if not output_dir.exists(): print( - f"Exporting model {model_type} for device {device} to {output_dir}..." + f"Exporting model {self.model_type} for device {self.device} to {output_dir}..." ) export_whisper_model( - model_name=model_type, + model_name=self.model_type, output_dir=str(output_dir), opset=17, static=True, ) else: - print(f"Model {model_type} already exists in cache. Reusing cached model.") + print( + f"Model {self.model_type} already exists in cache. Reusing cached model." + ) + self.model_dir = str(output_dir) + self._load_model() + + def _load_model(self): + encoder = os.path.join(self.model_dir, "encoder_model.onnx") + decoder = os.path.join(self.model_dir, "decoder_model.onnx") + decoder_init = os.path.join(self.model_dir, "decoder_init_model.onnx") + decoder_past = os.path.join(self.model_dir, "decoder_with_past_model.onnx") + + self.model = WhisperONNX( + encoder_path=encoder, + decoder_path=decoder, + decoder_init_path=decoder_init, + decoder_past_path=decoder_past, + encoder_provider=get_provider(self.device, "whisper", "encoder"), + decoder_provider=get_provider(self.device, "whisper", "decoder"), + decoder_init_provider=get_provider(self.device, "whisper", "decoder_init"), + use_kv_cache=True, + ) + + def transcribe(self, wav_path: str): + transcription, _ = self.model.transcribe(wav_path) + return transcription + + +def create_app(model_type: str, device: str): + app = FastAPI() + whisper_service = WhisperService(model_type, device) + + def get_whisper_service(): + return whisper_service + + @app.post("/v1/audio/transcriptions") + async def transcribe( + file: UploadFile, + model: str = Form(...), + svc: WhisperService = Depends(get_whisper_service), + ): + if model not in ["whisper-1", "whisper-onnx"]: + return JSONResponse({"error": "Unsupported model"}, status_code=400) + + with tempfile.NamedTemporaryFile(delete=False, suffix=".wav") as tmp: + tmp.write(await file.read()) + tmp_path = tmp.name + + try: + text = svc.transcribe(tmp_path) + return JSONResponse({"text": text}) + finally: + os.remove(tmp_path) + + return app + - os.environ["LIRA_MODEL_DIR"] = str(output_dir) - return get_app() - else: +# Entry function (to match your OpenAI-style signature) +def setup_openai_server(model_type: str, device: str): + if not model_type.startswith("whisper-"): raise ValueError(f"Unsupported model type: {model_type}") + return create_app(model_type=model_type, device=device) diff --git a/lira/server/server.py b/lira/server/server.py index 26a8818..feac1f7 100644 --- a/lira/server/server.py +++ b/lira/server/server.py @@ -12,7 +12,7 @@ def serve(): parser.add_argument("serve", help="Run server", nargs="?") parser.add_argument( "--backend", - choices=["openai", "deepgram"], + choices=["openai"], required=True, help="Choose which API style to expose", ) diff --git a/lira/utils/config.py b/lira/utils/config.py index 5285988..f31c397 100644 --- a/lira/utils/config.py +++ b/lira/utils/config.py @@ -51,6 +51,7 @@ def get_provider( tuple: A tuple containing the list of providers and provider options. 
""" # Normalize model name for variations like whisper-base, whisper-small, etc. + if model.startswith("whisper"): cache_key_startswith = model.replace("-", "_") model = "whisper" @@ -92,5 +93,8 @@ def get_provider( return [("VitisAIExecutionProvider", options)] + if device == "gpu": + return ["DmlExecutionProvider"] + # Simplify CPU case to return only the list of providers return ["CPUExecutionProvider"] diff --git a/setup.py b/setup.py index 0529491..2d36b8b 100644 --- a/setup.py +++ b/setup.py @@ -19,15 +19,15 @@ "soundfile", "gradio", "jiwer", - #dev tools + # dev tools "onnx", - "optimum==1.26.1" + "optimum==1.26.1", ], python_requires=">=3.7", entry_points={ "console_scripts": [ "run_asr=lira.scripts.run_asr:main", - "lira=lira.cli:main", + "lira=lira.cli:main", ], }, ) diff --git a/tests/test_openai_server.py b/tests/test_openai_server.py index 90c6a87..0252e37 100644 --- a/tests/test_openai_server.py +++ b/tests/test_openai_server.py @@ -11,10 +11,9 @@ class TestOpenAIServerReal(unittest.TestCase): @classmethod def setUpClass(cls): - + cls.app = setup_openai_server("whisper-base.en", "cpu") cls.client = TestClient(cls.app) - openai_server.load_model() def test_transcribe_sample_wav(self): sample_path = os.path.join("audio_files", "test.wav") diff --git a/tests/test_whisper_cpu.py b/tests/test_whisper_cpu.py index 6e37418..2a0fa02 100644 --- a/tests/test_whisper_cpu.py +++ b/tests/test_whisper_cpu.py @@ -7,6 +7,7 @@ import os import torchaudio + class TestWhisperONNX(unittest.TestCase): @classmethod def setUpClass(cls): @@ -17,7 +18,7 @@ def setUpClass(cls): output_dir=cls.export_dir, opset=17, static=True, - force=True + force=True, ) cls._encoder = os.path.join(cls.export_dir, "encoder_model.onnx") @@ -34,7 +35,7 @@ def test_01_whisper_base_transcribe_cpu(self): decoder_path=self._decoder, encoder_provider=["CPUExecutionProvider"], decoder_provider=["CPUExecutionProvider"], - decoder_init_provider=["CPUExecutionProvider"] + decoder_init_provider=["CPUExecutionProvider"], ) transcription, _ = whisper.transcribe(audio_path) print(transcription) @@ -52,12 +53,13 @@ def test_02_whisper_base_transcribe_cpu_kv_cache(self): encoder_provider=["CPUExecutionProvider"], decoder_provider=["CPUExecutionProvider"], decoder_init_provider=["CPUExecutionProvider"], - use_kv_cache=True + use_kv_cache=True, ) transcription, _ = whisper.transcribe(audio_path) print(transcription) self.assertIsInstance(transcription, str) self.assertGreater(len(transcription), 0) + if __name__ == "__main__": unittest.main() diff --git a/tests/test_whisper_npu.py b/tests/test_whisper_npu.py index a507aaa..5e55eba 100644 --- a/tests/test_whisper_npu.py +++ b/tests/test_whisper_npu.py @@ -8,6 +8,7 @@ from lira.models.whisper.export import export_whisper_model from lira.utils.config import get_provider + class TestWhisperONNX(unittest.TestCase): @classmethod def setUpClass(cls): @@ -19,7 +20,7 @@ def setUpClass(cls): output_dir=cls.export_dir, opset=17, static=True, - force=True + force=True, ) cls._encoder = os.path.join(cls.export_dir, "encoder_model.onnx") @@ -40,15 +41,25 @@ def test_02_whisper_base_transcribe_npu_kv_cache(self): decoder_path=self._decoder, decoder_init_path=self._decoder_init, decoder_past_path=self._decoder_past, - encoder_provider=get_provider(device, model, "encoder", cache_dir=self.export_dir + "_vitisai_cache"), - decoder_provider=get_provider("cpu", model, "decoder", cache_dir=self.export_dir + "_vitisai_cache"), - decoder_init_provider=get_provider("cpu", model, "decoder_init", 
cache_dir=self.export_dir + "_vitisai_cache"), - use_kv_cache=True + encoder_provider=get_provider( + device, model, "encoder", cache_dir=self.export_dir + "_vitisai_cache" + ), + decoder_provider=get_provider( + "cpu", model, "decoder", cache_dir=self.export_dir + "_vitisai_cache" + ), + decoder_init_provider=get_provider( + "cpu", + model, + "decoder_init", + cache_dir=self.export_dir + "_vitisai_cache", + ), + use_kv_cache=True, ) transcription, _ = whisper.transcribe(audio_path) print(transcription) self.assertIsInstance(transcription, str) self.assertGreater(len(transcription), 0) + if __name__ == "__main__": unittest.main() diff --git a/tests/test_zipformer_cpu.py b/tests/test_zipformer_cpu.py index e051222..2a6399f 100644 --- a/tests/test_zipformer_cpu.py +++ b/tests/test_zipformer_cpu.py @@ -6,6 +6,7 @@ from huggingface_hub import snapshot_download import os + class TestZipformerONNX(unittest.TestCase): @classmethod def setUpClass(cls): @@ -22,12 +23,13 @@ def test_transcribe_cpu(self): decoder_path=self.decoder_path, joiner_path=self.joiner_path, tokens=self.tokens_path, - device="cpu" + device="cpu", ) transcription = zipformer.transcribe("audio_files/test.wav") self.assertIsInstance(transcription, str) self.assertGreater(len(transcription), 0) + if __name__ == "__main__": unittest.main() diff --git a/tests/test_zipformer_npu.py b/tests/test_zipformer_npu.py index 73d9446..eee22d1 100644 --- a/tests/test_zipformer_npu.py +++ b/tests/test_zipformer_npu.py @@ -6,6 +6,7 @@ from huggingface_hub import snapshot_download import os + class TestZipformerONNX(unittest.TestCase): @classmethod def setUpClass(cls): @@ -22,12 +23,13 @@ def test_transcribe_npu(self): decoder_path=self.decoder_path, joiner_path=self.joiner_path, tokens=self.tokens_path, - device="npu" + device="npu", ) transcription = zipformer.transcribe("audio_files/test.wav") self.assertIsInstance(transcription, str) self.assertGreater(len(transcription), 0) + if __name__ == "__main__": unittest.main()
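As a closing note, the refactored app factory can also be exercised end to end in-process, mirroring the updated `tests/test_openai_server.py`. The snippet below is an illustrative sketch, not project code: it assumes LIRA is installed (`pip install -e .`), that `audio_files/test.wav` exists, and that the `whisper-base` export can be created or reused from the local cache.

```python
# Sketch: drive the new setup_openai_server()/create_app() factory without
# starting uvicorn. The model export happens inside WhisperService.__init__.
from fastapi.testclient import TestClient

from lira.server.openai_server import setup_openai_server

app = setup_openai_server("whisper-base", "cpu")  # builds WhisperService and routes
client = TestClient(app)

with open("audio_files/test.wav", "rb") as f:
    resp = client.post(
        "/v1/audio/transcriptions",
        files={"file": ("test.wav", f, "audio/wav")},
        data={"model": "whisper-onnx"},
    )

assert resp.status_code == 200, resp.text
print(resp.json()["text"])
```

On a fresh machine the first call triggers the whisper-base ONNX export into the cache directory, so expect the initial run to take noticeably longer than subsequent ones.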