feat: add Qwen3-ASR engine with timestamp support and unit tests (#16)
`app/engines/stt/qwen3asr/engine.py` (outdated):

```python
# Process with Qwen3-ASR
processing_start = time.time()
results = self._model.transcribe(
```
Does this model not support streaming?
Pull request overview
Adds a new STT provider to VoiceCore by integrating a Qwen3-ASR engine (batch + streaming) with optional word-level timestamp extraction, plus configuration and a dedicated unit test suite.
Changes:
- Introduces `Qwen3ASREngine` and `Qwen3ASRConfig` under `app/engines/stt/qwen3asr/`.
- Adds a `qwen3-asr` uv dependency group and a sample `engines.yaml` entry.
- Adds unit tests covering config validation, lifecycle, batch transcription, streaming, and metrics.
Reviewed changes
Copilot reviewed 6 out of 8 changed files in this pull request and generated 11 comments.
| File | Description |
|---|---|
| `app/engines/stt/qwen3asr/engine.py` | New Qwen3-ASR STT engine implementation (batch + streaming, metrics, optional timestamps). |
| `app/engines/stt/qwen3asr/config.py` | New Pydantic config model for Qwen3-ASR/vLLM and streaming parameters. |
| `app/engines/stt/qwen3asr/__init__.py` | Exports the new engine and config. |
| `engines.yaml` | Adds a sample Qwen3-ASR engine configuration entry. |
| `pyproject.toml` | Adds a `qwen3-asr` dependency group for `qwen-asr[vllm]`. |
| `tests/unit/engines/stt/qwen3asr/test_qwen3asr_engine.py` | New unit tests for Qwen3-ASR engine behavior and timestamps. |
| `tests/unit/engines/stt/qwen3asr/__init__.py` | Test package marker for the new unit test directory. |
```python
    ),
    time_to_first_token_ms=time_to_first_token_ms,
    total_stream_duration_ms=total_duration_ms,
    total_chunks=int(len(audio_array) / chunk_size_samples) + 1,
```

`total_chunks` is computed as `int(len(audio_array) / chunk_size_samples) + 1`, which is off by one when `len(audio_array)` is an exact multiple of `chunk_size_samples`. Track the actual number of yielded `STTChunk`s (or use a ceiling division) to avoid incorrect metrics.

Suggested change:

```python
    total_chunks=(len(audio_array) + chunk_size_samples - 1) // chunk_size_samples,
```
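The off-by-one is easy to demonstrate in isolation; a quick sketch with a hypothetical 16 000-sample chunk size:

```python
def chunks_buggy(n_samples: int, chunk_size: int) -> int:
    # Original computation: always adds 1, over-counting when
    # n_samples is an exact multiple of chunk_size
    return int(n_samples / chunk_size) + 1


def chunks_ceil(n_samples: int, chunk_size: int) -> int:
    # Ceiling division: counts a trailing partial chunk without
    # over-counting exact multiples
    return (n_samples + chunk_size - 1) // chunk_size


print(chunks_buggy(32000, 16000))  # 3, but only 2 chunks are yielded
print(chunks_ceil(32000, 16000))   # 2
print(chunks_buggy(33000, 16000))  # 3 (trailing partial chunk: happens to match)
print(chunks_ceil(33000, 16000))   # 3
```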
```python
yield STTResponse(
    text=final_text,
    language=detected_language,
    segments=None,  # Streaming doesn't return full segments list usually
    performance_metrics=metrics,
)
```

The final streaming `STTResponse` hard-codes `segments=None`, but the PR description and the new unit tests expect word-level timestamps in streaming mode. Either populate `segments` when `forced_aligner`/timestamps are enabled, or update the tests/docs to reflect that streaming does not return segments.
```python
from unittest.mock import MagicMock, patch

import numpy as np
import pytest
```

These unit tests patch `qwen_asr.Qwen3ASRModel`, which will raise `ModuleNotFoundError` if the optional `qwen-asr` dependency group isn't installed (e.g., after `uv sync --dev`). Add a module-level `pytest.importorskip("qwen_asr", ...)` (or similar guarding) so the overall test suite can run without optional deps.

Suggested change:

```python
import pytest

pytest.importorskip("qwen_asr", reason="qwen-asr package is not installed")
```
```python
async def test_transcribe_stream_returns_segments_with_timestamps(
    self, qwen3asr_config, mock_qwen3asr_model
):
    """Should extract and return segments in streaming mode"""
    from app.engines.stt.qwen3asr.engine import Qwen3ASREngine

    engine = Qwen3ASREngine(qwen3asr_config)

    mock_result = MagicMock()
    mock_result.text = "Hello world"
    mock_result.language = "English"

    # Mock qwen-asr ForcedAlignItem objects
    item1 = MagicMock()
    item1.text = "Hello"
    item1.start_time = 0.0
    item1.end_time = 0.5

    item2 = MagicMock()
    item2.text = "world"
    item2.start_time = 0.5
    item2.end_time = 1.0

    mock_result.time_stamps = [item1, item2]
    mock_qwen3asr_model.transcribe.return_value = [mock_result]

    with patch.object(engine, "_audio_processor") as mock_processor:
        mock_processor.to_numpy.return_value = (np.array([0.1, 0.2]), 16000)
        mock_processor.get_duration_ms.return_value = 1000.0

        chunks = []
        async for item in engine.transcribe_stream(np.array([0.1, 0.2])):
            chunks.append(item)

        final_response = chunks[-1]
        assert final_response.segments is not None
        assert len(final_response.segments) == 2
        assert final_response.segments[0].text == "Hello"
        assert final_response.segments[0].start == 0.0
        assert final_response.segments[0].end == 0.5
```

`Qwen3ASREngine.transcribe_stream()` never calls `self._model.transcribe()`, and the implementation currently yields a final `STTResponse` with `segments=None`. This test sets up `mock_qwen3asr_model.transcribe.return_value` and then asserts `final_response.segments` contains timestamps, which will not be true with the current streaming implementation. Update the test to reflect actual streaming behavior, or extend the engine to extract/return segments in streaming mode.
```python
    )
except ImportError as e:
    raise EngineNotReadyError(
        "qwen-asr package not installed. Run: pip install qwen-asr[vllm]"
```

The `ImportError` message suggests `pip install qwen-asr[vllm]`, but this repo's install flow uses uv dependency groups (see `engines.yaml` / `pyproject.toml`). Consider updating the message to also mention `uv sync --group qwen3-asr` so users follow the project's standard setup path.

Suggested change:

```python
        "qwen-asr package not installed. Install via: uv sync --group qwen3-asr (recommended) or pip install qwen-asr[vllm]"
```
```python
async def transcribe(
    self, audio_data: AudioInput, language: str | None = None, **kwargs
) -> STTResponse:
```

`**kwargs` is accepted but never used. If callers pass engine-specific params via `engine_params`, they'll be silently ignored. Consider forwarding supported kwargs into `self._model.transcribe(...)` (and/or streaming calls) or remove `**kwargs` to avoid a misleading API.
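One way to avoid silently swallowing parameters is to filter kwargs against an explicit allow-list and warn on the rest. A sketch; the supported key names here are purely hypothetical, the real names would come from the `qwen-asr` `transcribe()` signature:

```python
import warnings

# Hypothetical allow-list of engine-specific parameter names
SUPPORTED_TRANSCRIBE_KWARGS = {"context", "hotwords"}


def filter_engine_kwargs(kwargs: dict) -> dict:
    """Keep only supported engine params; warn instead of silently dropping."""
    unknown = set(kwargs) - SUPPORTED_TRANSCRIBE_KWARGS
    if unknown:
        warnings.warn(f"Ignoring unsupported engine params: {sorted(unknown)}")
    return {k: v for k, v in kwargs.items() if k in SUPPORTED_TRANSCRIBE_KWARGS}
```

`transcribe()` could then call `self._model.transcribe(..., **filter_engine_kwargs(kwargs))`, or drop `**kwargs` from the signature entirely.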
```python
# Capture first token time
if first_token_time is None:
    first_token_time = time.time()
```

In streaming mode, `first_token_time` is set on the first processed chunk regardless of whether any text was produced. This makes `time_to_first_token_ms` inaccurate when initial chunks yield empty text; align with the Whisper engine behavior by setting this timestamp only when `state.text` first becomes non-empty (or changes from empty).
```python
chunk_latency_ms = (time.time() - processing_start) * 1000
yield STTChunk(
    text=current_text,
    timestamp=float(pos) / sample_rate,
    confidence=None,
    chunk_latency_ms=chunk_latency_ms,
)
```

`chunk_latency_ms` is computed from `processing_start`, so it grows cumulatively over the stream. The `STTChunk.chunk_latency_ms` field is described as per-chunk latency; measure just the time spent processing the current chunk (similar to the Whisper streaming implementation).
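Per-chunk latency can be measured by resetting the timer inside the loop rather than reusing the stream-wide `processing_start`. A sketch, where `process_chunk` stands in for the engine's per-chunk transcription step:

```python
import time


def stream_with_latency(chunks, process_chunk):
    """Yield (text, per-chunk latency in ms) for each audio chunk."""
    for chunk in chunks:
        chunk_start = time.time()  # restart the timer for every chunk
        text = process_chunk(chunk)
        chunk_latency_ms = (time.time() - chunk_start) * 1000
        yield text, chunk_latency_ms
```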
```yaml
qwen3-asr:
  enabled: true
  engine_class: "app.engines.stt.qwen3asr.engine.Qwen3ASREngine"
```

This config enables qwen3-asr by default. Since `qwen-asr` is in an optional dependency group, startup will log an initialization error (and the engine won't register) unless users install the extra group. Consider setting `enabled: false` by default and leaving this as a sample config users can opt into.
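A disabled-by-default sample entry might look like this (same keys as the snippet above):

```yaml
qwen3-asr:
  enabled: false  # opt in after installing the extra: uv sync --group qwen3-asr
  engine_class: "app.engines.stt.qwen3asr.engine.Qwen3ASREngine"
```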
```python
    gt=0,
    description="Maximum batch size for inference. -1 for unlimited.",
)
```

`max_inference_batch_size` is documented as supporting -1 for unlimited, but the field is validated with `gt=0`, which rejects -1. Either allow -1 in validation (e.g., `ge=-1` with a custom validator) or adjust the description/tests to match the enforced constraint.

Suggested change:

```python
    ge=-1,
    description="Maximum batch size for inference. -1 for unlimited.",
)

@field_validator("max_inference_batch_size")
@classmethod
def validate_max_inference_batch_size(cls, v: int) -> int:
    """
    Ensure max_inference_batch_size is either -1 (for unlimited) or a positive integer.
    """
    if v == -1 or v > 0:
        return v
    raise ValueError(
        "max_inference_batch_size must be -1 (for unlimited) or a positive integer"
    )
```
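The combined constraint is straightforward to check outside Pydantic as well; this plain-Python sketch mirrors the validator's logic:

```python
def check_max_inference_batch_size(v: int) -> int:
    # -1 means unlimited; any other accepted value must be a positive integer
    if v == -1 or v > 0:
        return v
    raise ValueError(
        "max_inference_batch_size must be -1 (for unlimited) or a positive integer"
    )


check_max_inference_batch_size(-1)  # accepted: unlimited
check_max_inference_batch_size(8)   # accepted
# check_max_inference_batch_size(0) would raise ValueError
```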
Description
Integrates the Qwen3-ASR engine into VoiceCore, supporting high-accuracy multilingual speech recognition (52 languages). This engine supports both batch and streaming transcription modes and includes word-level timestamp support via the Qwen3-ForcedAligner model.
Key changes:
- New Qwen3-ASR engine implementation under `app/engines/stt/qwen3asr/`.
- Fixes an `AttributeError` when extracting timestamps from `ForcedAlignItem` (the `qwen-asr` library updated these from dictionaries to dataclasses).
- Updates `engines.yaml` with a sample configuration for Qwen3-ASR.

Type of Change
Checklist
- Code formatted (`make format`)
- Linting passes (`make lint`)
- Tests pass (`make test`)

Related Issues
Closes #
Testing & Verification
Automated Tests
- New suite: `tests/unit/engines/stt/qwen3asr/test_qwen3asr_engine.py` (42 tests passed).
- Full test suite passes (`make test`).

Manual Verification
Verified on a GPU-enabled environment:
- Timestamp extraction works (the `AttributeError` is resolved).
- Word-level timestamps are returned as `Segment` objects.

API Endpoints Tested
- Batch transcription (`POST /api/v1/stt/transcribe`)
- Streaming transcription (`POST /api/v1/stt/transcribe/stream`)

Engine-Specific Tests
Security Impact