Merged

59 commits
011d489
first stab at STT evals
AkhileshNegi Jan 30, 2026
7777290
Merge branch 'main' of github.com:ProjectTech4DevAI/kaapi-backend int…
AkhileshNegi Jan 30, 2026
d8df80c
Merge branch 'main' of github.com:ProjectTech4DevAI/kaapi-backend int…
AkhileshNegi Jan 31, 2026
f1df7f9
fix migration naming
AkhileshNegi Jan 31, 2026
cda0611
fixing endpoints
AkhileshNegi Jan 31, 2026
ad5779f
update dataset endpoint
AkhileshNegi Jan 31, 2026
01e2beb
update types
AkhileshNegi Jan 31, 2026
1637007
updated dataset with URL
AkhileshNegi Jan 31, 2026
36af7e9
added few more testcases
AkhileshNegi Jan 31, 2026
78fd206
added storage to core for easy reuse
AkhileshNegi Jan 31, 2026
4ac2ca6
cleanup for audio duration
AkhileshNegi Jan 31, 2026
d8b531c
first stab at fixing celery task to cron
AkhileshNegi Jan 31, 2026
2295da5
added gemini as provider
AkhileshNegi Feb 2, 2026
25e6002
moving to batch job in gemini
AkhileshNegi Feb 2, 2026
db2512e
code refactoring, using batch requests and files similar to OpenAI
AkhileshNegi Feb 2, 2026
ff29ddd
few cleanups
AkhileshNegi Feb 2, 2026
cd979fd
updated migration
AkhileshNegi Feb 3, 2026
b6c633a
cleanup config for batch
AkhileshNegi Feb 3, 2026
b6e6649
moved documentation to separate folder
AkhileshNegi Feb 3, 2026
719584d
updated score format in stt result
AkhileshNegi Feb 3, 2026
bf0b4c2
cleaner dataset sample count
AkhileshNegi Feb 3, 2026
68e6821
got rid of redundant sample count
AkhileshNegi Feb 3, 2026
2247faa
removed deadcode
AkhileshNegi Feb 3, 2026
056612c
removing more redundant code
AkhileshNegi Feb 3, 2026
13bb9cc
clean few more cruds
AkhileshNegi Feb 3, 2026
7bbf811
more free from dead code
AkhileshNegi Feb 3, 2026
04e419c
cleanup batch request code
AkhileshNegi Feb 3, 2026
09deab2
cleanup batch
AkhileshNegi Feb 3, 2026
f6bf0c2
got rid of processed_samples as well
AkhileshNegi Feb 3, 2026
d20084b
cleanup provider_metadata from results
AkhileshNegi Feb 3, 2026
4afdd2d
cleanup optimize results
AkhileshNegi Feb 4, 2026
3e62a98
cleanup queries
AkhileshNegi Feb 4, 2026
63de270
cleanup leftovers
AkhileshNegi Feb 4, 2026
c95c044
added validation for provider
AkhileshNegi Feb 4, 2026
9aa6858
updated test suite
AkhileshNegi Feb 4, 2026
4a92416
coderabbit suggestions
AkhileshNegi Feb 4, 2026
e204416
added few more testcases
AkhileshNegi Feb 4, 2026
0210dab
added more testcases for coverage
AkhileshNegi Feb 4, 2026
cce5f11
moving to file table
AkhileshNegi Feb 5, 2026
497427e
Merge branch 'main' into feature/stt-evaluation
AkhileshNegi Feb 6, 2026
0d5a0f8
Merge branch 'feature/stt-evaluation' of github.com:ProjectTech4DevAI…
AkhileshNegi Feb 6, 2026
066f645
update migration
AkhileshNegi Feb 6, 2026
a3428df
updating with language id
AkhileshNegi Feb 6, 2026
5dcf743
updated testcases
AkhileshNegi Feb 6, 2026
d07f6fa
cleanup code
AkhileshNegi Feb 6, 2026
7f8cfaa
removed language_id from evaluation run
AkhileshNegi Feb 6, 2026
e357949
updated provider as gemini
AkhileshNegi Feb 6, 2026
0dabf82
added support for multiple provider
AkhileshNegi Feb 8, 2026
3cabb49
updated doc
AkhileshNegi Feb 8, 2026
5455139
merging with master
AkhileshNegi Feb 9, 2026
9587d44
updating migration
AkhileshNegi Feb 9, 2026
ee9fbc8
updated testcase
AkhileshNegi Feb 9, 2026
b9a92d0
updated testcase
AkhileshNegi Feb 9, 2026
d02b702
cleanup few things
AkhileshNegi Feb 9, 2026
94c4000
cleaned up unnecessary wrapper
AkhileshNegi Feb 9, 2026
ececf9a
reusing same status for stt results
AkhileshNegi Feb 9, 2026
2588bdf
updated routes
AkhileshNegi Feb 9, 2026
e3f4fec
update routs
AkhileshNegi Feb 9, 2026
f11bfea
coderabbit cleanups
AkhileshNegi Feb 9, 2026
472 changes: 472 additions & 0 deletions backend/app/alembic/versions/045_add_stt_evaluation_tables.py

Large diffs are not rendered by default.

5 changes: 5 additions & 0 deletions backend/app/api/docs/stt_evaluation/create_dataset.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
Create a new STT evaluation dataset with audio samples.

Each sample requires:
- **object_store_url**: S3 URL of the audio file (from /evaluations/stt/files endpoint)
- **ground_truth**: Reference transcription (optional, for WER/CER metrics)
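
A dataset-creation body matching the fields above might look like the following sketch. The endpoint path and the `name`/`description`/`language_id` fields are assumptions drawn from the routes and `STTDatasetCreate` usage elsewhere in this PR; only `object_store_url` and `ground_truth` are stated in this doc:

```python
# Hypothetical body for POST /evaluations/stt/datasets (path assumed).
payload = {
    "name": "hindi-call-recordings-v1",
    "description": "Field recordings for WER benchmarking",
    "language_id": 1,  # optional; validated against the languages table
    "samples": [
        {
            # S3 URL returned by the /evaluations/stt/files endpoint
            "object_store_url": "s3://bucket/audio/sample-001.wav",
            # Optional reference transcription, used for WER/CER metrics
            "ground_truth": "namaste, aap kaise hain?",
        },
        {
            "object_store_url": "s3://bucket/audio/sample-002.wav",
            # ground_truth omitted: sample is review-only, no WER/CER
        },
    ],
}
```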
1 change: 1 addition & 0 deletions backend/app/api/docs/stt_evaluation/get_dataset.md
@@ -0,0 +1 @@
Get an STT dataset with its samples.
1 change: 1 addition & 0 deletions backend/app/api/docs/stt_evaluation/get_result.md
@@ -0,0 +1 @@
Get a single STT transcription result.
1 change: 1 addition & 0 deletions backend/app/api/docs/stt_evaluation/get_run.md
@@ -0,0 +1 @@
Get an STT evaluation run with its results.
1 change: 1 addition & 0 deletions backend/app/api/docs/stt_evaluation/list_datasets.md
@@ -0,0 +1 @@
List all STT evaluation datasets for the current project.
1 change: 1 addition & 0 deletions backend/app/api/docs/stt_evaluation/list_runs.md
@@ -0,0 +1 @@
List all STT evaluation runs for the current project.
8 changes: 8 additions & 0 deletions backend/app/api/docs/stt_evaluation/start_evaluation.md
@@ -0,0 +1,8 @@
Start an STT evaluation run on a dataset.

The evaluation will:
1. Process each audio sample through the specified providers
2. Generate transcriptions using Gemini Batch API
3. Store results for human review

**Supported providers:** gemini-2.5-pro
Comment on lines +1 to +8
⚠️ Potential issue | 🟡 Minor

Documentation uses "providers" but the API model uses "models".

STTEvaluationRunCreate defines the field as models: list[str], but this doc references "providers" (lines 4 and 8). Update the terminology to match the API contract to avoid confusing consumers.

Proposed fix
 Start an STT evaluation run on a dataset.
 
 The evaluation will:
-1. Process each audio sample through the specified providers
+1. Process each audio sample through the specified models
 2. Generate transcriptions using Gemini Batch API
 3. Store results for human review
 
-**Supported providers:** gemini-2.5-pro
+**Supported models:** gemini-2.5-pro
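
Putting the review's point into practice, a run-creation body would use the `models` field that `STTEvaluationRunCreate` defines as `models: list[str]`. The endpoint path and `dataset_id` field name below are illustrative assumptions; `gemini-2.5-pro` is the only model this PR lists as supported:

```python
# Hypothetical body for POST /evaluations/stt/runs against dataset 42.
run_request = {
    "dataset_id": 42,
    # STTEvaluationRunCreate field: models: list[str]
    "models": ["gemini-2.5-pro"],
}
```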

5 changes: 5 additions & 0 deletions backend/app/api/docs/stt_evaluation/update_feedback.md
@@ -0,0 +1,5 @@
Update human feedback on an STT transcription result.

**Fields:**
- **is_correct**: Boolean indicating if the transcription is correct
- **comment**: Optional feedback comment explaining issues or observations
7 changes: 7 additions & 0 deletions backend/app/api/docs/stt_evaluation/upload_audio.md
@@ -0,0 +1,7 @@
Upload a single audio file to S3 for STT evaluation.

**Supported formats:** mp3, wav, flac, m4a, ogg, webm

**Maximum file size:** 200 MB

Returns the S3 URL which can be used when creating an STT dataset.
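
A client-side pre-check mirroring the documented limits can save a round trip; this is only a convenience sketch, since the server enforces its own validation:

```python
# Formats and size limit taken from the doc above.
ALLOWED_FORMATS = {"mp3", "wav", "flac", "m4a", "ogg", "webm"}
MAX_FILE_BYTES = 200 * 1024 * 1024  # 200 MB


def can_upload(filename: str, size_bytes: int) -> bool:
    """Return True if the file passes the documented format and size limits."""
    ext = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    return ext in ALLOWED_FORMATS and 0 < size_bytes <= MAX_FILE_BYTES
```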
5 changes: 2 additions & 3 deletions backend/app/api/main.py
Expand Up @@ -25,7 +25,7 @@
model_evaluation,
collection_job,
)
from app.api.routes.evaluations import dataset as evaluation_dataset, evaluation
from app.api.routes import evaluations
from app.core.config import settings

api_router = APIRouter()
Expand All @@ -38,8 +38,7 @@
api_router.include_router(cron.router)
api_router.include_router(documents.router)
api_router.include_router(doc_transformation_job.router)
api_router.include_router(evaluation_dataset.router)
api_router.include_router(evaluation.router)
api_router.include_router(evaluations.router)
api_router.include_router(languages.router)
api_router.include_router(llm.router)
api_router.include_router(login.router)
Expand Down
12 changes: 12 additions & 0 deletions backend/app/api/routes/evaluations/__init__.py
@@ -0,0 +1,12 @@
"""Main router for evaluation API routes."""

from fastapi import APIRouter

from app.api.routes.evaluations import dataset, evaluation
from app.api.routes.stt_evaluations.router import router as stt_router

router = APIRouter()

router.include_router(evaluation.router)
router.include_router(dataset.router)
router.include_router(stt_router)
5 changes: 5 additions & 0 deletions backend/app/api/routes/stt_evaluations/__init__.py
@@ -0,0 +1,5 @@
"""STT Evaluation API routes."""

from .router import router

__all__ = ["router"]
193 changes: 193 additions & 0 deletions backend/app/api/routes/stt_evaluations/dataset.py
@@ -0,0 +1,193 @@
"""STT dataset API routes."""

import logging

from fastapi import APIRouter, Body, Depends, HTTPException, Query

from app.api.deps import AuthContextDep, SessionDep
from app.api.permissions import Permission, require_permission
from app.crud.file import get_files_by_ids
from app.crud.language import get_language_by_id
from app.crud.stt_evaluations import (
get_stt_dataset_by_id,
list_stt_datasets,
get_samples_by_dataset_id,
)
from app.models.stt_evaluation import (
STTDatasetCreate,
STTDatasetPublic,
STTDatasetWithSamples,
STTSamplePublic,
)
from app.services.stt_evaluations.dataset import upload_stt_dataset
from app.utils import APIResponse, load_description

logger = logging.getLogger(__name__)

router = APIRouter()


@router.post(
"/datasets",
response_model=APIResponse[STTDatasetPublic],
dependencies=[Depends(require_permission(Permission.REQUIRE_PROJECT))],
summary="Create STT dataset",
description=load_description("stt_evaluation/create_dataset.md"),
)
def create_dataset(
_session: SessionDep,
auth_context: AuthContextDep,
dataset_create: STTDatasetCreate = Body(...),
) -> APIResponse[STTDatasetPublic]:
"""Create an STT evaluation dataset."""
# Validate language_id if provided
if dataset_create.language_id is not None:
language = get_language_by_id(
session=_session, language_id=dataset_create.language_id
)
if not language:
raise HTTPException(
status_code=400, detail="Invalid language_id: language not found"
)

dataset, samples = upload_stt_dataset(
session=_session,
name=dataset_create.name,
samples=dataset_create.samples,
organization_id=auth_context.organization_.id,
project_id=auth_context.project_.id,
description=dataset_create.description,
language_id=dataset_create.language_id,
)

return APIResponse.success_response(
data=STTDatasetPublic(
id=dataset.id,
name=dataset.name,
description=dataset.description,
type=dataset.type,
language_id=dataset.language_id,
object_store_url=dataset.object_store_url,
dataset_metadata=dataset.dataset_metadata,
sample_count=len(samples),
organization_id=dataset.organization_id,
project_id=dataset.project_id,
inserted_at=dataset.inserted_at,
updated_at=dataset.updated_at,
)
)


@router.get(
"/datasets",
response_model=APIResponse[list[STTDatasetPublic]],
dependencies=[Depends(require_permission(Permission.REQUIRE_PROJECT))],
summary="List STT datasets",
description=load_description("stt_evaluation/list_datasets.md"),
)
def list_datasets(
_session: SessionDep,
auth_context: AuthContextDep,
limit: int = Query(50, ge=1, le=100, description="Maximum results to return"),
offset: int = Query(0, ge=0, description="Number of results to skip"),
) -> APIResponse[list[STTDatasetPublic]]:
"""List STT evaluation datasets."""
datasets, total = list_stt_datasets(
session=_session,
org_id=auth_context.organization_.id,
project_id=auth_context.project_.id,
limit=limit,
offset=offset,
)

return APIResponse.success_response(
data=datasets,
metadata={"total": total, "limit": limit, "offset": offset},
)


@router.get(
"/datasets/{dataset_id}",
response_model=APIResponse[STTDatasetWithSamples],
dependencies=[Depends(require_permission(Permission.REQUIRE_PROJECT))],
summary="Get STT dataset",
description=load_description("stt_evaluation/get_dataset.md"),
)
def get_dataset(
_session: SessionDep,
auth_context: AuthContextDep,
dataset_id: int,
include_samples: bool = Query(True, description="Include samples in response"),
sample_limit: int = Query(100, ge=1, le=1000, description="Max samples to return"),
sample_offset: int = Query(0, ge=0, description="Sample offset"),
) -> APIResponse[STTDatasetWithSamples]:
"""Get an STT evaluation dataset."""
dataset = get_stt_dataset_by_id(
session=_session,
dataset_id=dataset_id,
org_id=auth_context.organization_.id,
project_id=auth_context.project_.id,
)

if not dataset:
raise HTTPException(status_code=404, detail="Dataset not found")

samples = []
samples_total = (dataset.dataset_metadata or {}).get("sample_count", 0)

if include_samples:
sample_records = get_samples_by_dataset_id(
session=_session,
dataset_id=dataset_id,
org_id=auth_context.organization_.id,
project_id=auth_context.project_.id,
limit=sample_limit,
offset=sample_offset,
)

# Fetch file records to get object_store_url
file_ids = [s.file_id for s in sample_records]
file_records = get_files_by_ids(
session=_session,
file_ids=file_ids,
organization_id=auth_context.organization_.id,
project_id=auth_context.project_.id,
)
file_map = {f.id: f for f in file_records}

samples = [
STTSamplePublic(
id=s.id,
file_id=s.file_id,
object_store_url=file_map.get(s.file_id).object_store_url
if s.file_id in file_map
else None,
language_id=s.language_id,
ground_truth=s.ground_truth,
sample_metadata=s.sample_metadata,
dataset_id=s.dataset_id,
organization_id=s.organization_id,
project_id=s.project_id,
inserted_at=s.inserted_at,
updated_at=s.updated_at,
)
for s in sample_records
]

return APIResponse.success_response(
data=STTDatasetWithSamples(
id=dataset.id,
name=dataset.name,
description=dataset.description,
type=dataset.type,
language_id=dataset.language_id,
object_store_url=dataset.object_store_url,
dataset_metadata=dataset.dataset_metadata,
organization_id=dataset.organization_id,
project_id=dataset.project_id,
inserted_at=dataset.inserted_at,
updated_at=dataset.updated_at,
samples=samples,
),
metadata={"samples_total": samples_total},
)
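
The `limit`/`offset` contract in `list_datasets` above implies a simple client-side pagination loop. This sketch simulates it against an in-memory list, with the HTTP call replaced by a slice and the `{"total": ..., "limit": ..., "offset": ...}` metadata reduced to a returned total; names here are illustrative, not part of the PR:

```python
def fetch_page(all_rows: list, limit: int, offset: int) -> tuple[list, int]:
    """Stand-in for GET /datasets?limit=&offset= — returns (data, total)."""
    return all_rows[offset : offset + limit], len(all_rows)


def fetch_all(all_rows: list, limit: int = 50) -> list:
    """Walk pages until the offset passes the reported total."""
    out: list = []
    offset = 0
    while True:
        page, total = fetch_page(all_rows, limit, offset)
        out.extend(page)
        offset += limit
        if offset >= total or not page:
            return out
```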