TTS/STT: Gemini-TTS Model Integration with unified API #574
Prajna1999 wants to merge 41 commits into main
Conversation
📝 Walkthrough

This PR introduces comprehensive LLM call tracking, Google Gemini integration for STT/TTS, and partial config version updates. It adds database schema for logging LLM calls, extends models to support multimodal inputs/outputs, implements a Google AI provider with audio handling, refactors config versioning to allow partial updates, and updates services to wire LLM call creation/updates into the job execution flow.
Sequence Diagram(s)

sequenceDiagram
participant Client
participant API as API Endpoint
participant JobService as Job Service
participant InputResolver as Input Resolver
participant CRUD as LLM CRUD
participant Database as Database
participant Provider as LLM Provider
Client->>API: POST /api/llm/execute
API->>JobService: start_job(request)
JobService->>InputResolver: resolve_input(query.input)
InputResolver-->>JobService: resolved_input_path
JobService->>CRUD: create_llm_call(..., request, job_id)
CRUD->>Database: INSERT llm_call
Database-->>CRUD: llm_call_id
CRUD-->>JobService: LlmCall
JobService->>Provider: execute(config, query, resolved_input)
Provider->>Provider: _execute_[stt|tts](...)
Provider-->>JobService: LLMCallResponse
JobService->>CRUD: update_llm_call_response(llm_call_id, response)
CRUD->>Database: UPDATE llm_call
Database-->>CRUD: updated LlmCall
JobService->>InputResolver: cleanup_temp_file(resolved_input_path)
InputResolver-->>JobService: cleanup_done
JobService-->>API: job_result
API-->>Client: response
sequenceDiagram
participant Client
participant API as Config API
participant VersionCRUD as Version CRUD
participant Database as Database
Client->>API: POST /api/configs/{id}/versions (partial)
API->>VersionCRUD: create_from_partial_or_raise(ConfigVersionUpdatePartial)
VersionCRUD->>VersionCRUD: _get_latest_version()
VersionCRUD->>Database: SELECT ConfigVersion WHERE NOT deleted
Database-->>VersionCRUD: latest_version
VersionCRUD->>VersionCRUD: _deep_merge(latest.config_blob, partial.config_blob)
VersionCRUD->>VersionCRUD: _validate_immutable_fields(latest, merged)
VersionCRUD->>VersionCRUD: ConfigBlob.model_validate(merged)
VersionCRUD->>Database: INSERT ConfigVersion
Database-->>VersionCRUD: new_version
VersionCRUD-->>API: ConfigVersion
API-->>Client: version_response
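For orientation, here is a minimal sketch of the partial-version flow above, assuming plain dict config blobs; the helper names `_deep_merge` and `_validate_immutable_fields` mirror the diagram, but the field layout (`completion.provider`, `completion.type`) is an assumption, not the repository's exact schema:

```python
from typing import Any


def _deep_merge(base: dict[str, Any], partial: dict[str, Any]) -> dict[str, Any]:
    """Recursively overlay `partial` onto `base`, returning a new dict."""
    merged = dict(base)
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = _deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def _validate_immutable_fields(latest: dict[str, Any], merged: dict[str, Any]) -> None:
    """Reject partial updates that try to change the completion provider or type."""
    for field in ("provider", "type"):
        old = latest.get("completion", {}).get(field)
        new = merged.get("completion", {}).get(field)
        if old is not None and new != old:
            raise ValueError(f"Cannot change config {field}: {old!r} -> {new!r}")
```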
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~75 minutes
🚥 Pre-merge checks: ✅ Passed checks (3 passed)
Codecov Report: ❌ Patch coverage is …
Actionable comments posted: 9
Note
Due to the large number of review comments, Critical and Major severity comments were prioritized as inline comments.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (4)
backend/app/tests/api/routes/test_llm.py (1)
145-167: ⚠️ Potential issue | 🟡 Minor
Test may validate the wrong error path.
This test intends to verify that an invalid provider is rejected, but the payload at lines 153-155 is missing the now-required `type` field. With discriminated-union validation on `type`, the 422 may be triggered by the missing `type` rather than the invalid provider. The assertion still passes but doesn't test what the docstring claims.
Proposed fix
      "config": {
          "blob": {
              "completion": {
                  "provider": "invalid-provider",
+                 "type": "text",
                  "params": {"model": "gpt-4"},
              }
          }
      },
backend/app/core/langfuse/langfuse.py (1)
241-243: ⚠️ Potential issue | 🟠 Major
Extract the `query.input` value before passing to Langfuse, or verify serialization is safe.
`query.input` can be a `str`, `TextInput`, or `AudioInput` (discriminated union). Passing complex `TextInput`/`AudioInput` objects directly to Langfuse's `trace()` and `generation()` `input` fields may serialize unexpectedly (lines 241, 250). Either extract a loggable representation similar to the pattern used in `crud/llm.py` with isinstance checks, or verify Langfuse gracefully handles non-string input.
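As a rough illustration of that extraction pattern (the `type` and `content` attribute names follow the request models discussed in this review and are assumptions here):

```python
def loggable_input(query_input: object) -> str:
    """Return a plain-string representation that is safe to send to Langfuse."""
    input_type = getattr(query_input, "type", None)
    if input_type == "text":
        # Log the actual text for text inputs.
        return query_input.content.value
    if input_type == "audio":
        # Never log base64 audio payloads; log only lightweight metadata.
        mime = getattr(query_input.content, "mime_type", "audio/*")
        return f"<audio input, mime_type={mime}>"
    return str(query_input)
```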
backend/app/services/llm/jobs.py (2)
156-158: ⚠️ Potential issue | 🟡 Minor
Trailing comma in log message.
The f-string on lines 157-158 ends with a trailing comma and space: `"...task_id={task_id}, "`. Remove the trailing comma.
Proposed fix
- f"[execute_job] Starting LLM job execution | job_id={job_id}, task_id={task_id}, "
+ f"[execute_job] Starting LLM job execution | job_id={job_id}, task_id={task_id}"
345-374: ⚠️ Potential issue | 🟠 Major
Output guardrails assume `TextOutput` — will break for audio responses.
Lines 347, 360-362, and 374 access `response.response.output.content.value`, which works for `TextOutput` but will fail for `AudioOutput` (where `content` is `AudioContent` with base64 audio data, not meaningful text). If guardrails are configured for a TTS response, this path will crash or produce nonsensical results.
Consider gating guardrail application on the output type:
Proposed fix
  if response:
      if output_guardrails:
+         if response.response.output.type != "text":
+             logger.warning(
+                 f"[execute_job] Output guardrails skipped for non-text output type: {response.response.output.type}"
+             )
+         else:
-             output_text = response.response.output.content.value
+             output_text = response.response.output.content.value
(Then indent the rest of the guardrails block under the `else`.)
🤖 Fix all issues with AI agents
In `@backend/app/alembic/versions/045_add_llm_call_table.py`:
- Line 20: Add explicit return type annotations to the alembic migration
functions by updating the function signatures for upgrade() and downgrade() to
include a return type (e.g., change "def upgrade():" and "def downgrade():" to
"def upgrade() -> None:" and "def downgrade() -> None:"); locate both
occurrences (the definitions named upgrade and downgrade in this file, including
the other occurrence around the referenced line ~172) and ensure the signatures
use -> None to satisfy the project type-hinting guideline.
In `@backend/app/core/audio_utils.py`:
- Around line 30-45: The convert_pcm_to_ogg function swallows exceptions without
logging details; update its except block to log the exception and context before
returning. Locate convert_pcm_to_ogg and add or use a module logger (e.g.,
logger = logging.getLogger(__name__)), then replace the bare except handling
with logger.exception("Failed to convert PCM to OGG for sample_rate=%s",
sample_rate) (or similar) so the stack trace and error are recorded, and then
return None, str(e) as before.
- Around line 15-27: The convert_pcm_to_mp3 function should log failures and
include a docstring: add a short docstring matching convert_pcm_to_ogg
describing inputs/outputs, and in the except block call the module logger (use
logger.error or logger.exception) to log the error prefixed with
"[convert_pcm_to_mp3]" and include the exception details before returning (e.g.,
logger.exception(f"[convert_pcm_to_mp3] Failed to convert PCM to MP3: %s", e)).
Ensure you reference the existing logger variable and keep the returned tuple
behavior unchanged.
In `@backend/app/services/llm/jobs.py`:
- Around line 196-212: The block in execute_job that imports select, queries
recent jobs, and logs Recent jobs in DB is debug scaffolding and should be
removed; replace the entire "if not job:" branch so it only logs a concise error
using logger.error with the missing job_id (e.g., logger.error(f"[execute_job]
Job not found | job_id={job_id}")) and avoid importing sqlmodel.select or
iterating over other Job entries; keep the else branch that logs the found job
status intact.
- Around line 170-185: The guardrails handling assumes text payloads by directly
reading/writing request.query.input.content.value (TextInput) which will crash
for AudioInput/AudioContent; update the logic in the block around safe_input
handling to first check the input type (e.g., inspect request.query.input.type
or isinstance against TextInput) and only mutate
request.query.input.content.value for text inputs, otherwise skip guardrail
text-rewrite and skip/reject or return an appropriate APIResponse via
APIResponse.failure_response + handle_job_error for non-text types (or leave
audio untouched), ensuring branches reference the same identifiers (safe_input,
request.query.input, APIResponse.failure_response, handle_job_error,
request.callback_url, job_id).
- Around line 340-343: The cleanup currently compares resolved_input (str) to
request.query.input (QueryInput) which will never be equal and causes cleanup to
run for non-temp text inputs; change the finally block in jobs.py to only call
cleanup_temp_file(resolved_input) when the original input was an audio type or
when you explicitly created a temp file: e.g., check
isinstance(request.query.input, AudioInput) (or use a boolean flag like
temp_file_created set where you generate the temp file) before calling
cleanup_temp_file; reference resolved_input, request.query.input,
AudioInput/TextInput, and cleanup_temp_file to locate and fix the logic.
In `@backend/app/services/llm/mappers.py`:
- Around line 32-39: The call to litellm.supports_reasoning currently uses model
= kaapi_params.get("model") which can be None and causes runtime errors; update
the logic so you only call litellm.supports_reasoning(model=f"openai/{model}")
when model is truthy (e.g., after validating model or inside an if model: block)
and otherwise set support_reasoning to a safe default (False or None) so
downstream code using support_reasoning doesn't crash; locate the usage around
the variables model, support_reasoning and the litellm.supports_reasoning call
and move or guard that call accordingly.
In `@backend/app/services/llm/providers/registry.py`:
- Around line 75-159: Remove the ad-hoc testing block starting at the `if
__name__ == "__main__":` section in registry.py (this whole simulated test that
imports dotenv, reads GEMINI_API_KEY, constructs mock_credentials, calls
LLMProvider.get_provider_class, creates a client/instance, builds
NativeCompletionConfig and QueryParams, executes instance.execute, and writes
audio files) so no debug/test code is left in the module; also replace the
unnecessary f-string f"✅ TTS Success!" with a plain string "✅ TTS Success!" (or
remove the f prefix) to satisfy static analysis.
In `@backend/pyproject.toml`:
- Around line 39-40: The Docker image is missing system-level ffmpeg/libav
needed by pydub (used by convert_pcm_to_mp3 and convert_pcm_to_ogg in gai.py),
so update the Dockerfile's apt-get install line to include ffmpeg (or
libav-tools) so audio conversions succeed at runtime; ensure the package is
added to the same RUN apt-get install step and the image build still cleans up
apt caches as before.
🟡 Minor comments (14)
backend/app/services/evaluations/evaluation.py-145-152 (1)
145-152: ⚠️ Potential issue | 🟡 Minor
Unhandled `KeyError` if `config.completion.type` is unexpected.
If `config.completion.type` holds a value not in the `param_models` dict, line 151 raises a bare `KeyError`, which is caught by the generic `except` on line 169 but produces a cryptic error message. Consider using `.get()` with an explicit error.
Additionally, evaluations currently only support the OpenAI provider (line 109), but there's no guard against non-`text` types. An STT/TTS config would pass provider validation yet produce invalid evaluation JSONL for `/v1/responses`.
Proposed fix
- model_class = param_models[config.completion.type]
+ model_class = param_models.get(config.completion.type)
+ if model_class is None:
+     raise HTTPException(
+         status_code=422,
+         detail=f"Unsupported completion type for evaluation: '{config.completion.type}'. "
+         f"Only 'text' is currently supported.",
+     )
  validated_params = model_class.model_validate(config.completion.params)
backend/app/tests/api/routes/configs/test_version.py-477-477 (1)
477-477: ⚠️ Potential issue | 🟡 Minor
Unused import: `TextLLMParams`.
`TextLLMParams` is imported but never used in this test function.
- from app.models.llm.request import KaapiCompletionConfig, TextLLMParams
+ from app.models.llm.request import KaapiCompletionConfig
backend/app/models/llm/response.py-7-8 (1)
7-8: ⚠️ Potential issue | 🟡 Minor
Unused imports: `base64` and `model_validator`.
Neither `base64` nor `model_validator` is referenced anywhere in this file.
-import base64
-from pydantic import model_validator
 from sqlmodel import SQLModel, Field
backend/app/tests/utils/test_data.py-339-358 (1)
339-358: ⚠️ Potential issue | 🟡 Minor
`config_type` can be `None` if the `"type"` key is missing from the stored config blob.
Line 342 uses `.get("type")`, which returns `None` if the key is absent. Passing `None` to `NativeCompletionConfig(type=config_type)` or `KaapiCompletionConfig(type=config_type)` will fail validation since `type` is `Literal["text", "stt", "tts"]`.
Proposed fix — add a default
- config_type = completion_config.get("type")
+ config_type = completion_config.get("type", "text")
backend/app/services/llm/providers/registry.py-71-71 (1)
71-71: ⚠️ Potential issue | 🟡 Minor
Log message missing function name prefix.
Per coding guidelines, log messages should be prefixed with the function name in square brackets.
Proposed fix
- logger.error(f"Failed to initialize {provider_type} client: {e}", exc_info=True) + logger.error(f"[get_llm_provider] Failed to initialize {provider_type} client: {e}", exc_info=True)As per coding guidelines, "Prefix all log messages with the function name in square brackets."
backend/app/services/llm/providers/gai.py-118-118 (1)
118-118: ⚠️ Potential issue | 🟡 Minor
Remove extraneous `f` prefix from f-strings with no placeholders.
Static analysis (Ruff F541) flags lines 118 and 272 as f-strings that contain no interpolated expressions.
Proposed fix
- f"[GoogleAIProvider._execute_stt] Response missing usage_metadata, using zeros" + "[GoogleAIProvider._execute_stt] Response missing usage_metadata, using zeros"- f"[GoogleAIProvider._execute_tts] Response missing usage_metadata, using zeros" + "[GoogleAIProvider._execute_tts] Response missing usage_metadata, using zeros"Also applies to: 272-272
backend/app/crud/llm.py-39-47 (1)
39-47: ⚠️ Potential issue | 🟡 Minor
`size_bytes` records the base64 string length, not the actual audio byte count.
`len(query_input.content.value)`, where `value` is a base64-encoded string, gives the character count of the encoding, which is ~33% larger than the actual audio data. If the intent is to store the real audio size, decode first (or multiply by 3/4). If the intent is just to store the payload size, rename to `base64_length` to avoid confusion with the decoded-size calculation in `update_llm_call_response` (lines 186-187).
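For reference, a small snippet showing the difference between the two measurements (the helper name is illustrative):

```python
import base64


def payload_sizes(b64_value: str) -> tuple[int, int]:
    """Return (base64 string length, decoded audio byte count)."""
    decoded_size = len(base64.b64decode(b64_value))
    # The encoded length is roughly 4/3 of the decoded size (plus padding).
    return len(b64_value), decoded_size
```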
backend/app/models/llm/request.py-180-182 (1)
180-182: ⚠️ Potential issue | 🟡 Minor
Stale description: `"LLM provider (openai)"` should reflect the expanded provider set.
The field description still says `(openai)` but the `Literal` now includes `"google"` as well.
Proposed fix
  provider: Literal["openai", "google"] = Field(
-     ..., description="LLM provider (openai)"
+     ..., description="LLM provider (openai, google)"
  )
backend/app/crud/llm.py-78-80 (1)
78-80: ⚠️ Potential issue | 🟡 Minor
`getattr` fallback on a `dict` will never resolve `"type"` correctly.
`completion_config.params` is `dict[str, Any]` (per both `NativeCompletionConfig` and `KaapiCompletionConfig`). `getattr(a_dict, "type", "text")` checks for a Python attribute named `type` on the dict object, not a dict key — it would always return `"text"` regardless of the dict's contents. Per a retrieved learning, the `type` field on `completion_config` is mandatory, so `completion_config.type` should always be set and this fallback is effectively dead code. If the fallback is still desired for safety, use `.get()`.
Proposed fix
- completion_type = completion_config.type or getattr(
-     completion_config.params, "type", "text"
- )
+ completion_type = completion_config.type
Based on learnings: "the `type` field in `config_blob.completion` (for both NativeCompletionConfig and KaapiCompletionConfig) is compulsory/mandatory during config creation and validation ensures it will never be missing."
backend/app/tests/crud/test_llm.py-42-46 (1)
42-46: ⚠️ Potential issue | 🟡 Minor
Missing return type annotation on the `test_job` fixture.
All other fixtures and test functions include return type hints. This fixture is missing one.
As per coding guidelines: "Always add type hints to all function parameters and return values in Python code."
Proposed fix
  @pytest.fixture
- def test_job(db: Session):
+ def test_job(db: Session) -> Any:
      """Create a test job for LLM call tests."""
      crud = JobCrud(db)
      return crud.create(job_type=JobType.LLM_API, trace_id="test-llm-trace")
Replace `Any` with the actual `Job` type if available — likely `from app.models import Job` would be the proper return type.
backend/app/services/llm/mappers.py-98-102 (1)
98-102: ⚠️ Potential issue | 🟡 Minor
`model` is unconditionally set (possibly to `None`), unlike the OpenAI mapper which guards with `if model`.
Line 102 sets `google_params["model"] = kaapi_params.get("model")`, which could store `None`. The OpenAI mapper (lines 61-62) only sets model when truthy. Consider aligning the behavior or adding a guard.
Proposed fix
- # Model is present in all param types
- google_params["model"] = kaapi_params.get("model")
+ model = kaapi_params.get("model")
+ if not model:
+     raise ValueError("Missing required 'model' parameter in kaapi_params")
+ google_params["model"] = model
backend/app/crud/llm.py-95-99 (1)
95-99: ⚠️ Potential issue | 🟡 Minor
`hasattr` on a `dict` won't find the `"model"` key — always falls to the `else` branch.
`completion_config.params` is a plain `dict`. `hasattr(dict_instance, "model")` checks for a Python attribute, not a dict key, so it will always be `False`. The `else` branch with `.get("model", "")` is the one that actually executes. Simplify to just the `.get()` call.
Proposed fix
- model = (
-     completion_config.params.model
-     if hasattr(completion_config.params, "model")
-     else completion_config.params.get("model", "")
- )
+ model = completion_config.params.get("model", "")
backend/app/models/llm/request.py-9-11 (1)
9-11: ⚠️ Potential issue | 🟡 Minor
Duplicate imports: `Field` and `SQLModel` are imported twice.
Line 4 imports `Field, SQLModel` from `sqlmodel`, and line 11 reimports them along with `Index, text`. Ruff (F811) correctly flags this. Consolidate into a single import.
Proposed fix
-from sqlmodel import Field, SQLModel
 from pydantic import Discriminator, model_validator, HttpUrl
 from datetime import datetime
 from app.core.util import now
 import sqlalchemy as sa
 from sqlalchemy.dialects.postgresql import JSONB
-from sqlmodel import Field, SQLModel, Index, text
+from sqlmodel import Field, SQLModel, Index, text
backend/app/services/llm/providers/gai.py-230-257 (1)
230-257: ⚠️ Potential issue | 🟡 Minor
Fix misleading comment about Gemini TTS output format.
Line 231 incorrectly states the output is "PCM (WAV format)" — Gemini TTS returns raw PCM bytes without a WAV container. The WAV header is added client-side if needed. Change the comment to clarify: "Gemini TTS natively outputs 24kHz 16-bit PCM (raw audio data, no WAV header)". The conversion logic itself is correct and properly handles raw PCM input.
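If a WAV container is ever needed on the consumer side, raw 24kHz 16-bit mono PCM can be wrapped with the standard-library `wave` module; this is an illustrative sketch, not code from the PR:

```python
import io
import wave


def pcm_to_wav(pcm_bytes: bytes, sample_rate: int = 24000) -> bytes:
    """Wrap raw 16-bit mono PCM samples in a WAV container."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)        # mono
        wav_file.setsampwidth(2)        # 16-bit samples
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm_bytes)
    return buffer.getvalue()
```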
🧹 Nitpick comments (23)
backend/app/tests/seed_data/seed_data.py (1)
90-97: Consider updating log message prefixes to use individual function names.
All log messages currently use the generic `[tests.seed_data]` prefix. This makes it harder to filter logs by specific functions during debugging. As per coding guidelines, each function should prefix logs with its own name: `[load_seed_data]`, `[create_organization]`, `[clear_database]`, etc.
Example fix for load_seed_data
  except FileNotFoundError:
      logging.error(
-         f"[tests.seed_data]Error: Seed data file not found at {json_path}"
+         f"[load_seed_data] Error: Seed data file not found at {json_path}"
      )
      raise
  except json.JSONDecodeError as e:
      logging.error(
-         f"[tests.seed_data]Error: Failed to decode JSON from {json_path}: {e}"
+         f"[load_seed_data] Error: Failed to decode JSON from {json_path}: {e}"
      )
As per coding guidelines: "Prefix all log messages with the function name in square brackets."
Also applies to: 110-110, 137-137, 157-157, 212-212, 254-254, 293-293, 344-344, 359-359, 389-389, 418-418, 438-438, 479-479, 481-481
backend/app/services/llm/input_resolver.py (1)
33-33: Remove or replace the `# important!!` comment.
This comment doesn't convey any useful information. Either remove it or replace it with a meaningful note about why this function is critical.
Proposed fix
-# important!!
 def resolve_input(query_input: QueryInput) -> tuple[str, str | None]:
backend/app/tests/services/llm/providers/test_registry.py (1)
23-26: Consider adding a test for the `google-native` registry entry.
`test_registry_contains_openai` explicitly asserts `"openai-native"` is in the registry and maps to `OpenAIProvider`. With the new `google-native` provider, a parallel assertion for `GoogleAIProvider` would prevent regressions on the new provider wiring.
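A parallel test could look roughly like this; the import path for `GoogleAIProvider` is inferred from the files touched in this PR and may need adjusting:

```python
from app.services.llm.providers.gai import GoogleAIProvider
from app.services.llm.providers.registry import LLMProvider


def test_registry_contains_google_native() -> None:
    """The google-native key should resolve to the Google AI provider class."""
    provider_class = LLMProvider.get_provider_class("google-native")
    assert provider_class is GoogleAIProvider
```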
backend/app/tests/api/routes/configs/test_config.py (1)
49-59: Consider asserting `max_tokens` is stripped.
The comment on line 49 states that "invalid fields like max_tokens are stripped," but the assertions only verify that valid fields are present. Adding a negative assertion (e.g., `assert "max_tokens" not in ...params`) would strengthen this test.
backend/app/models/config/version.py (1)
99-116: Missing empty `config_blob` validation.
`ConfigVersionBase` validates that `config_blob` is non-empty via `validate_blob_not_empty`, but `ConfigVersionUpdatePartial` doesn't inherit from it and lacks this check. A request with `config_blob: {}` would pass model validation. Consider adding the same validator or inheriting it.
Proposed fix
+from pydantic import field_validator
+
 class ConfigVersionUpdatePartial(SQLModel):
     ...
     config_blob: dict[str, Any] = Field(
         description="Partial config blob. Only include fields you want to update. "
         "Provider and type are inherited from existing config and cannot be changed.",
     )
     commit_message: str | None = Field(
         default=None,
         max_length=512,
         description="Optional message describing the changes in this version",
     )
+
+    @field_validator("config_blob")
+    def validate_blob_not_empty(cls, value: dict[str, Any]) -> dict[str, Any]:
+        if not value:
+            raise ValueError("config_blob cannot be empty")
+        return value
backend/app/api/routes/config/version.py (1)
25-30: Missing return type annotation on `create_version`.
As per coding guidelines, all function parameters and return values should have type hints.
Proposed fix
  def create_version(
      config_id: UUID,
      version_create: ConfigVersionUpdatePartial,
      current_user: AuthContextDep,
      session: SessionDep,
- ):
+ ) -> APIResponse[ConfigVersionPublic]:
backend/app/services/llm/providers/PLAN.md (1)
1-88: Consider whether this planning document should be committed to the repository.
This file reads like an internal implementation specification / prompt for development. If it's meant to be permanent documentation, it should live in a `docs/` directory with proper formatting. If it was used only during development, consider removing it before merge.
A few text issues if keeping it:
- Line 84: thorugh → through
- Line 84: providers.gai.py → providers/gai.py (wrong separator)
- Line 88: over abstracted → over-abstracted
backend/app/tests/api/routes/configs/test_version.py (1)
522-620: Inconsistent assertion depth across type-change tests.
`test_create_version_cannot_change_type_from_text_to_stt` (lines 516-519) verifies the error message content (`"cannot change config type"`, `"text"`, `"stt"`), but the stt→tts and tts→text variants only assert `status_code == 400` without checking the error detail. Consider adding the same error message assertions for consistency and to catch regressions in error messaging.
backend/app/alembic/versions/045_add_llm_call_table.py (2)
161-168: Redundant `alter_column` on a table created in the same migration.
Lines 22-138 create `llm_call` with the `provider` comment set to `"AI provider: openai, google, anthropic"`, then lines 161-168 immediately alter it to a different comment. Set the correct comment directly in `create_table` and remove this `alter_column`. Similarly, the downgrade (lines 182-189) alters this column's comment right before dropping the table — also unnecessary.
Proposed fix in create_table
  sa.Column(
      "provider",
      sa.String(),
      nullable=False,
-     comment="AI provider: openai, google, anthropic",
+     comment="AI provider as sent by user (e.g openai, -native, google)",
  ),
Then remove the `alter_column` blocks for `llm_call.provider` in both `upgrade()` and `downgrade()`.
22-152: Consider adding an index on `project_id` for query performance.
The table has foreign keys on `job_id`, `project_id`, and `organization_id`, but only `job_id` and `conversation_id` are indexed. If LLM calls are frequently queried by project (e.g., listing calls for a project), a partial index on `project_id` (filtered by `deleted_at IS NULL`) would improve performance.
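As a sketch, such a partial index could be created in the migration roughly as follows (index name and placement are illustrative):

```python
import sqlalchemy as sa
from alembic import op


def upgrade() -> None:
    # Partial index on project_id, limited to non-deleted rows.
    op.create_index(
        "ix_llm_call_project_id_active",
        "llm_call",
        ["project_id"],
        postgresql_where=sa.text("deleted_at IS NULL"),
    )
```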
backend/app/tests/services/llm/test_mappers.py (2)
9-15: `TTSLLMParams` is imported but unused in this test file.
No tests exercise TTS-specific parameter mapping for Google (e.g., the `voice` and `language` fields). Given this PR introduces Gemini TTS, consider adding a test using `TTSLLMParams` to verify `map_kaapi_to_google_params` correctly maps `voice` and `language` parameters.
Example test for TTS param mapping
def test_tts_params_mapping(self):
    """Test TTS-specific parameter mapping (voice, language)."""
    kaapi_params = TTSLLMParams(
        model="gemini-2.5-pro-preview-tts",
        voice="Kore",
        language="hi-IN",
    )
    result, warnings = map_kaapi_to_google_params(
        kaapi_params.model_dump(exclude_none=True)
    )
    assert result["model"] == "gemini-2.5-pro-preview-tts"
    assert result["voice"] == "Kore"
    assert result["language"] == "hi-IN"
    assert warnings == []
457-496: Consider adding a TTS-specific transform test for Google.
The Google transform tests cover `type="stt"` (line 461) and `type="text"` (line 482) but not `type="tts"`, which is the primary use case this PR introduces.
backend/app/tests/services/llm/test_jobs.py (1)
18-22: Remove the commented-out import.
Line 20 has `# KaapiLLMParams` left as a comment. Clean up dead code.
Proposed fix
      TextOutput,
      TextContent,
-     # KaapiLLMParams,
      KaapiCompletionConfig,
  )
backend/app/services/llm/jobs.py (1)
253-255: Remove `else: pass` — it's dead code.
Proposed fix
      request.request_metadata.setdefault("warnings", []).extend(warnings)
- else:
-     pass
  except Exception as e:
23-23: Remove unused `KaapiLLMParams` import.
The code constructs configs with dict-based `params` directly. `KaapiLLMParams` is imported but never referenced in the file.
Proposed fix
-from app.models.llm import KaapiLLMParams, KaapiCompletionConfig, NativeCompletionConfig
+from app.models.llm import KaapiCompletionConfig, NativeCompletionConfig
10-17: Nit: group stdlib imports together.
`base64` and `json` (stdlib) are separated from `logging`, `UUID`, etc. by third-party imports. Minor readability nit.
Proposed grouping
 import logging
+import base64
+import json
 from typing import Any, Literal
-
 from uuid import UUID
+
 from sqlmodel import Session, select
 from app.core.util import now
-import base64
-import json
 from app.models.llm import LlmCall, LLMCallRequest, ConfigBlob
backend/app/crud/config/version.py (2)
291-341: `_validate_config_type_unchanged` duplicates `_get_latest_version`.
Lines 300-311 replicate the same query that `_get_latest_version()` (lines 155-168) already encapsulates. Consider reusing `_get_latest_version()` here.
Proposed refactor
  def _validate_config_type_unchanged(
      self, version_create: ConfigVersionCreate
  ) -> None:
-     # Get the latest version
-     stmt = (
-         select(ConfigVersion)
-         .where(
-             and_(
-                 ConfigVersion.config_id == self.config_id,
-                 ConfigVersion.deleted_at.is_(None),
-             )
-         )
-         .order_by(ConfigVersion.version.desc())
-         .limit(1)
-     )
-     latest_version = self.session.exec(stmt).first()
+     latest_version = self._get_latest_version()
      # If this is the first version, no validation needed
      if latest_version is None:
          return
34-76: Both `create_or_raise` and `create_from_partial_or_raise` share identical version-creation boilerplate.
The try/except block that calls `_get_next_version`, constructs a `ConfigVersion`, commits, and handles errors is duplicated across both methods (lines 45-76 and 122-153). Consider extracting a private `_persist_version(config_blob_dict, commit_message)` helper.
Also applies to: 78-153
backend/app/services/llm/mappers.py (1)
42-50: Temperature is always suppressed when the model supports reasoning, even if no `reasoning` param was provided.
If a user sends only `temperature` (no `reasoning` param) for a reasoning-capable model, the temperature is silently dropped with a warning. This may be intentional per OpenAI API constraints, but it could surprise users who didn't opt into reasoning. Consider only suppressing temperature when reasoning is actually set.
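A hedged sketch of the narrower condition, reusing the variable names from the snippet shown earlier in this review (the `warnings` list is assumed to be the mapper's warning accumulator):

```python
# Only drop temperature when the caller actually opted into reasoning.
if support_reasoning and reasoning is not None and temperature is not None:
    warnings.append("temperature is ignored when reasoning is enabled")
    temperature = None
```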
backend/app/services/llm/providers/gai.py (4)
196-198: Move `from google.genai import types` to module-level imports.
Inline imports inside a method body can hide dependencies and incur repeated import overhead on each call. Since `google.genai` is already imported at the module level (line 3), move this import to the top of the file.
Proposed fix
At the top of the file:
 from google import genai
 from google.genai.types import GenerateContentResponse
+from google.genai import types
Inside `_execute_tts`, remove:
- from google.genai import types
25-32: Redundant `self.client = client` — already set by `super().__init__(client)`.
`BaseProvider.__init__` already assigns `self.client = client` (per the base class snippet). The explicit reassignment on line 32 is unnecessary.
Proposed fix
  def __init__(self, client: genai.Client):
      super().__init__(client)
-     self.client = client
340-350: Broad `except Exception` swallows the original error message from the caller.
Line 346 sets `error_message = "Unexpected error occurred"`, discarding the actual exception text. While the detailed error is logged, the caller only sees a generic message. This makes debugging harder for API consumers. Consider including a sanitized error summary.
91-97: Uploaded file resource is never cleaned up.
`self.client.files.upload(file=resolved_input)` uploads the audio file to Google's storage, but there's no `finally` block or cleanup call (e.g., `self.client.files.delete(name=gemini_file.name)`) after the transcription completes. Over time this could accumulate orphaned files in the Google AI storage.
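A minimal sketch of the suggested cleanup, using the `files.delete` call mentioned above; the exact placement inside `_execute_stt` would differ:

```python
gemini_file = None
try:
    gemini_file = self.client.files.upload(file=resolved_input)
    # ... run transcription with gemini_file ...
finally:
    if gemini_file is not None:
        try:
            self.client.files.delete(name=gemini_file.name)
        except Exception as cleanup_error:
            logger.warning(
                f"[_execute_stt] Failed to delete uploaded file: {cleanup_error}"
            )
```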
depends_on = None


def upgrade():
🛠️ Refactor suggestion | 🟠 Major
Add return type annotations to upgrade() and downgrade().
Per coding guidelines, all functions should have type hints.
-def upgrade():
+def upgrade() -> None:

-def downgrade():
+def downgrade() -> None:

As per coding guidelines: "Always add type hints to all function parameters and return values in Python code."
Also applies to: 172-172
🤖 Prompt for AI Agents
In `@backend/app/alembic/versions/045_add_llm_call_table.py` at line 20, Add
explicit return type annotations to the alembic migration functions by updating
the function signatures for upgrade() and downgrade() to include a return type
(e.g., change "def upgrade():" and "def downgrade():" to "def upgrade() ->
None:" and "def downgrade() -> None:"); locate both occurrences (the definitions
named upgrade and downgrade in this file, including the other occurrence around
the referenced line ~172) and ensure the signatures use -> None to satisfy the
project type-hinting guideline.
def convert_pcm_to_mp3(
    pcm_bytes: bytes, sample_rate: int = 24000
) -> tuple[bytes | None, str | None]:
    try:
        audio = AudioSegment(
            data=pcm_bytes, sample_width=2, frame_rate=sample_rate, channels=1
        )

        output_buffer = io.BytesIO()
        audio.export(output_buffer, format="mp3", bitrate="192k")
        return output_buffer.getvalue(), None
    except Exception as e:
        return None, str(e)
🛠️ Refactor suggestion | 🟠 Major
Log errors before returning them.
The logger is initialized (Line 12) but never used. When conversion fails, the exception is silently returned as a string. Per coding guidelines, log messages should be prefixed with the function name.
Also, convert_pcm_to_mp3 is missing its docstring (unlike convert_pcm_to_ogg).
Proposed fix
  def convert_pcm_to_mp3(
      pcm_bytes: bytes, sample_rate: int = 24000
  ) -> tuple[bytes | None, str | None]:
+     """Convert raw PCM to MP3 format."""
      try:
          audio = AudioSegment(
              data=pcm_bytes, sample_width=2, frame_rate=sample_rate, channels=1
          )
          output_buffer = io.BytesIO()
          audio.export(output_buffer, format="mp3", bitrate="192k")
          return output_buffer.getvalue(), None
      except Exception as e:
+         logger.error(f"[convert_pcm_to_mp3] Failed to convert PCM to MP3: {e}")
          return None, str(e)
As per coding guidelines, prefix all log messages with the function name in square brackets: logger.info(f"[function_name] Message").
🤖 Prompt for AI Agents
In `@backend/app/core/audio_utils.py` around lines 15 - 27, The convert_pcm_to_mp3
function should log failures and include a docstring: add a short docstring
matching convert_pcm_to_ogg describing inputs/outputs, and in the except block
call the module logger (use logger.error or logger.exception) to log the error
prefixed with "[convert_pcm_to_mp3]" and include the exception details before
returning (e.g., logger.exception(f"[convert_pcm_to_mp3] Failed to convert PCM
to MP3: %s", e)). Ensure you reference the existing logger variable and keep the
returned tuple behavior unchanged.
def convert_pcm_to_ogg(
    pcm_bytes: bytes, sample_rate: int = 24000
) -> tuple[bytes | None, str | None]:
    """Convert raw PCM to OGG with Opus codec."""
    try:
        audio = AudioSegment(
            data=pcm_bytes, sample_width=2, frame_rate=sample_rate, channels=1
        )

        output_buffer = io.BytesIO()
        audio.export(
            output_buffer, format="ogg", codec="libopus", parameters=["-b:a", "64k"]
        )
        return output_buffer.getvalue(), None
    except Exception as e:
        return None, str(e)
🛠️ Refactor suggestion | 🟠 Major
Same logging gap in convert_pcm_to_ogg.
Proposed fix
except Exception as e:
+ logger.error(f"[convert_pcm_to_ogg] Failed to convert PCM to OGG: {e}")
      return None, str(e)

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

def convert_pcm_to_ogg(
    pcm_bytes: bytes, sample_rate: int = 24000
) -> tuple[bytes | None, str | None]:
    """Convert raw PCM to OGG with Opus codec."""
    try:
        audio = AudioSegment(
            data=pcm_bytes, sample_width=2, frame_rate=sample_rate, channels=1
        )

        output_buffer = io.BytesIO()
        audio.export(
            output_buffer, format="ogg", codec="libopus", parameters=["-b:a", "64k"]
        )
        return output_buffer.getvalue(), None
    except Exception as e:
        logger.error(f"[convert_pcm_to_ogg] Failed to convert PCM to OGG: {e}")
        return None, str(e)
🤖 Prompt for AI Agents
In `@backend/app/core/audio_utils.py` around lines 30 - 45, The convert_pcm_to_ogg
function swallows exceptions without logging details; update its except block to
log the exception and context before returning. Locate convert_pcm_to_ogg and
add or use a module logger (e.g., logger = logging.getLogger(__name__)), then
replace the bare except handling with logger.exception("Failed to convert PCM to
OGG for sample_rate=%s", sample_rate) (or similar) so the stack trace and error
are recorded, and then return None, str(e) as before.
        elif safe_input["success"]:
-           request.query.input = safe_input["data"]["safe_text"]
+           # Update the text value within the QueryInput structure
+           request.query.input.content.value = safe_input["data"]["safe_text"]

            if safe_input["data"]["rephrase_needed"]:
                callback_response = APIResponse.failure_response(
-                   error=request.query.input,
+                   error=safe_input["data"]["safe_text"],
                    metadata=request.request_metadata,
                )
                return handle_job_error(
                    job_id, request.callback_url, callback_response
                )
        else:
-           request.query.input = safe_input["error"]
+           # Update the text value with error message
+           request.query.input.content.value = safe_input["error"]
Input guardrails assume TextInput — will crash for AudioInput.
Lines 173 and 185 access request.query.input.content.value, which assumes TextInput. For AudioInput, the .content is AudioContent (base64 data), not text. While it's unlikely guardrails would be configured for audio inputs today, there's no guard preventing it.
Consider adding a type check or skipping guardrails for non-text inputs:
Proposed guard
  if input_guardrails:
+     from app.models.llm.request import TextInput
+     if not isinstance(request.query.input, TextInput):
+         logger.warning("[execute_job] Input guardrails skipped for non-text input type")
+     else:
-         safe_input = call_guardrails(input_query, input_guardrails, job_id)
+         safe_input = call_guardrails(input_query, input_guardrails, job_id)
(Then indent the remaining guardrail handling under the `else`.)
🤖 Prompt for AI Agents
In `@backend/app/services/llm/jobs.py` around lines 170 - 185, The guardrails
handling assumes text payloads by directly reading/writing
request.query.input.content.value (TextInput) which will crash for
AudioInput/AudioContent; update the logic in the block around safe_input
handling to first check the input type (e.g., inspect request.query.input.type
or isinstance against TextInput) and only mutate
request.query.input.content.value for text inputs, otherwise skip guardrail
text-rewrite and skip/reject or return an appropriate APIResponse via
APIResponse.failure_response + handle_job_error for non-text types (or leave
audio untouched), ensuring branches reference the same identifiers (safe_input,
request.query.input, APIResponse.failure_response, handle_job_error,
request.callback_url, job_id).
| logger.info(f"[execute_job] Attempting to fetch job | job_id={job_id}") | ||
| job = session.get(Job, job_id) | ||
| if not job: | ||
| # Log all jobs to see what's in the database | ||
| from sqlmodel import select | ||
|
|
||
| all_jobs = session.exec( | ||
| select(Job).order_by(Job.created_at.desc()).limit(5) | ||
| ).all() | ||
| logger.error( | ||
| f"[execute_job] Job not found! | job_id={job_id} | " | ||
| f"Recent jobs in DB: {[(j.id, j.status) for j in all_jobs]}" | ||
| ) | ||
| else: | ||
| logger.info( | ||
| f"[execute_job] Found job | job_id={job_id}, status={job.status}" | ||
| ) |
🛠️ Refactor suggestion | 🟠 Major
Remove debug diagnostic code before merging.
This block queries recent jobs and logs their IDs on every "job not found" case. This is debug scaffolding (inline from sqlmodel import select, querying unrelated jobs) that shouldn't be in production. A simple error log on the missing job_id is sufficient.
Proposed simplification
logger.info(f"[execute_job] Attempting to fetch job | job_id={job_id}")
job = session.get(Job, job_id)
if not job:
- # Log all jobs to see what's in the database
- from sqlmodel import select
-
- all_jobs = session.exec(
- select(Job).order_by(Job.created_at.desc()).limit(5)
- ).all()
logger.error(
- f"[execute_job] Job not found! | job_id={job_id} | "
- f"Recent jobs in DB: {[(j.id, j.status) for j in all_jobs]}"
+ f"[execute_job] Job not found! | job_id={job_id}"
)
else:🤖 Prompt for AI Agents
In `@backend/app/services/llm/jobs.py` around lines 196 - 212, The block in
execute_job that imports select, queries recent jobs, and logs Recent jobs in DB
is debug scaffolding and should be removed; replace the entire "if not job:"
branch so it only logs a concise error using logger.error with the missing
job_id (e.g., logger.error(f"[execute_job] Job not found | job_id={job_id}"))
and avoid importing sqlmodel.select or iterating over other Job entries; keep
the else branch that logs the found job status intact.
    finally:
        # Clean up temp files for audio inputs
        if resolved_input and resolved_input != request.query.input:
            cleanup_temp_file(resolved_input)
Bug: Temp file cleanup condition compares str to QueryInput object — always true, may delete unrelated files.
resolved_input is a str, while request.query.input is a QueryInput (either TextInput or AudioInput — a Pydantic model). They will never be equal, so cleanup_temp_file(resolved_input) executes for every request, including text inputs where resolved_input is the user's text — not a temp file path. With missing_ok=True in cleanup_temp_file, this silently attempts to unlink a path matching the user text, which is unintended and potentially destructive.
Proposed fix — check input type instead
finally:
# Clean up temp files for audio inputs
- if resolved_input and resolved_input != request.query.input:
+ from app.models.llm.request import AudioInput
+ if resolved_input and isinstance(request.query.input, AudioInput):
            cleanup_temp_file(resolved_input)

🤖 Prompt for AI Agents
In `@backend/app/services/llm/jobs.py` around lines 340 - 343, The cleanup
currently compares resolved_input (str) to request.query.input (QueryInput)
which will never be equal and causes cleanup to run for non-temp text inputs;
change the finally block in jobs.py to only call
cleanup_temp_file(resolved_input) when the original input was an audio type or
when you explicitly created a temp file: e.g., check
isinstance(request.query.input, AudioInput) (or use a boolean flag like
temp_file_created set where you generate the temp file) before calling
cleanup_temp_file; reference resolved_input, request.query.input,
AudioInput/TextInput, and cleanup_temp_file to locate and fix the logic.
    model = kaapi_params.get("model")
    reasoning = kaapi_params.get("reasoning")
    temperature = kaapi_params.get("temperature")
    instructions = kaapi_params.get("instructions")
    knowledge_base_ids = kaapi_params.get("knowledge_base_ids")
    max_num_results = kaapi_params.get("max_num_results")

    support_reasoning = litellm.supports_reasoning(model=f"openai/{model}")
🧩 Analysis chain
🏁 Script executed:
# First, let's examine the mappers.py file to see the full context
cat -n backend/app/services/llm/mappers.py | head -100

Repository: ProjectTech4DevAI/kaapi-backend
Length of output: 4282
🏁 Script executed:
# Check if there's type hint information on the function parameter
ast-grep --pattern 'function map_kaapi_to_openai_params($$$) {
$$$
}'

Repository: ProjectTech4DevAI/kaapi-backend
Length of output: 57
🏁 Script executed:
# Also search for how kaapi_params is defined/documented
rg "kaapi_params" backend/app/services/llm/mappers.py -B 3 -A 3 | head -50Repository: ProjectTech4DevAI/kaapi-backend
Length of output: 1645
🏁 Script executed:
# Search for type hints on the function signature
rg "def map_kaapi_to_openai_params" backend/app/services/llm/ -A 5Repository: ProjectTech4DevAI/kaapi-backend
Length of output: 580
🌐 Web query:
litellm supports_reasoning function behavior with None model parameter
💡 Result:
litellm.supports_reasoning(...) is designed to be called with a model name string (it relies on LiteLLM’s “model info” utilities, which take model: str). If you pass model=None, it’s not a supported input and (in practice) you should expect a Python runtime error (e.g., TypeError / AttributeError from trying to do string operations on None) rather than a meaningful True/False result. [1]
If you need “safe” behavior, wrap it like:
import litellm
def safe_supports_reasoning(model):
    return bool(model) and litellm.supports_reasoning(model)

Notes:
- In LiteLLM model metadata, `supports_reasoning` is a boolean capability flag and generally defaults to `False` when missing/`None` in structured model-info objects — but that's after you have a valid model record, not when the model argument itself is `None`. [2]
Sources:
[1] ReadmeX “Utility Functions (litellm/utils.py)” (describes model-info utilities taking model: str) https://readmex.com/en-US/BerriAI/litellm/page-3d4d91461-decc-4cb6-a010-ba823069a2b8
[2] ModelGroupInfo schema showing supports_reasoning defaulting to False and normalizing None bools to False https://huggingface.co/spaces/Shyamnath/inferencing-llm/blob/c40c75ae1fe76d0d34f6eb6fe5fdc1c4f26dab0e/litellm/types/router.py
litellm.supports_reasoning will fail at runtime when "model" key is absent from kaapi_params.
Line 39 calls litellm.supports_reasoning(model=f"openai/{model}") where model is obtained via .get("model") and can be None. This results in passing "openai/None" to litellm, which will raise a Python runtime error (TypeError or AttributeError) since litellm expects a valid model name string and performs string operations on the input. The validation check if model: at line 61 occurs after this call, too late to prevent the error.
Consider validating that model is present before calling litellm.supports_reasoning.
🤖 Prompt for AI Agents
In `@backend/app/services/llm/mappers.py` around lines 32 - 39, The call to
litellm.supports_reasoning currently uses model = kaapi_params.get("model")
which can be None and causes runtime errors; update the logic so you only call
litellm.supports_reasoning(model=f"openai/{model}") when model is truthy (e.g.,
after validating model or inside an if model: block) and otherwise set
support_reasoning to a safe default (False or None) so downstream code using
support_reasoning doesn't crash; locate the usage around the variables model,
support_reasoning and the litellm.supports_reasoning call and move or guard that
call accordingly.
# ad hoc testing code
if __name__ == "__main__":
    import os
    import base64
    from pathlib import Path
    from dotenv import load_dotenv
    from app.models.llm import NativeCompletionConfig, QueryParams

    load_dotenv()

    # 1. Simulate environment/credentials
    GEMINI_KEY = os.getenv("GEMINI_API_KEY")
    if not GEMINI_KEY:
        print("Set GEMINI_API_KEY environment variable first.")
        exit(1)

    # This dictionary mimics what get_provider_credential would return from the DB
    mock_credentials = {"api_key": GEMINI_KEY}

    # 2. Idiomatic Initialization via Registry
    provider_type = "google-native"

    print(f"Initializing provider: {provider_type}...")

    # This block mimics the core logic of your get_llm_provider function
    ProviderClass = LLMProvider.get_provider_class(provider_type)
    client = ProviderClass.create_client(credentials=mock_credentials)
    instance = ProviderClass(client=client)

    # 3. Setup TTS Config and Query
    test_config = NativeCompletionConfig(
        provider="google-native",
        type="tts",
        params={
            "model": "gemini-2.5-pro-preview-tts",
            "voice": "Kore",
            "language": "hi-IN",
            "response_format": "mp3",
        },
    )

    test_text = (
        "हाँ आपको एक बार डॉक्यूमेंट जमा करने के बाद पाँच से सात दिन का टाइम लगता है"
    )
    test_query = QueryParams(input=test_text)

    # 4. Execution
    print(f"Executing TTS for text: '{test_text}'...")
    result, error = instance.execute(
        completion_config=test_config, query=test_query, resolved_input=test_text
    )

    if error:
        print(f"❌ Error: {error}")
    else:
        logger.error(
            f"[get_llm_provider] Unsupported provider type requested: {provider_type}"
        )
        raise ValueError(f"Provider '{provider_type}' is not supported.")
        print(f"✅ TTS Success!")
        print(f"   Provider Response ID: {result.response.provider_response_id}")
        print(f"   Model: {result.response.model}")
        print(f"   Usage: {result.usage.total_tokens} tokens")

        # 5. Save audio to file
        output_dir = Path("speech_output")
        output_dir.mkdir(exist_ok=True)

        # Get audio data (already base64 encoded by LLMOutput validator)
        if result.response.output.content.value:
            audio_bytes = base64.b64decode(result.response.output.content.value)

            # Use the actual format from the response
            mime = result.response.output.content.mime_type or "audio/wav"
            audio_format = mime.split("/")[-1] if "/" in mime else mime
            output_file = (
                output_dir
                / f"tts_test_{result.response.provider_response_id[:8]}.{audio_format}"
            )

            with open(output_file, "wb") as f:
                f.write(audio_bytes)

            return provider_class(client=client)
            print(f"   Audio saved to: {output_file}")
            print(f"   Audio size: {len(audio_bytes)} bytes")
            print(f"   Audio format: {audio_format.upper()}")
            print(f"\n🎵 Play the audio with: afplay {output_file}")
        else:
            print("   ⚠️ No audio data in response")
Remove ad-hoc testing block before merging.
The if __name__ == "__main__": block (lines 75–159) is debug/testing code. The PR description acknowledges this will be removed, but flagging to ensure it doesn't slip through. It also contains an f-string without placeholders on line 130 (f"✅ TTS Success!"), as noted by static analysis.
🧰 Tools
🪛 Ruff (0.14.14)
[error] 130-130: f-string without any placeholders
Remove extraneous f prefix
(F541)
🤖 Prompt for AI Agents
In `@backend/app/services/llm/providers/registry.py` around lines 75 - 159, Remove
the ad-hoc testing block starting at the `if __name__ == "__main__":` section in
registry.py (this whole simulated test that imports dotenv, reads
GEMINI_API_KEY, constructs mock_credentials, calls
LLMProvider.get_provider_class, creates a client/instance, builds
NativeCompletionConfig and QueryParams, executes instance.execute, and writes
audio files) so no debug/test code is left in the module; also replace the
unnecessary f-string f"✅ TTS Success!" with a plain string "✅ TTS Success!" (or
remove the f prefix) to satisfy static analysis.
| "google-genai>=1.59.0", | ||
| "pydub>=0.25.1", |
🧩 Analysis chain
🌐 Web query:
google-genai 1.59.0 pypi package
💡 Result:
google-genai (Google Gen AI Python SDK) version 1.59.0 is published on PyPI and was released on Jan 15, 2026. The PyPI release history also shows newer versions (e.g., 1.62.0 on Feb 4, 2026). [1]
Install exactly 1.59.0:
pip install google-genai==1.59.0

Package page / release history (and docs link shown there): [1]
🏁 Script executed:
# Find gai.py file and check what it imports from google.genai
find . -name "gai.py" -o -name "*gai*" -type f | head -20Repository: ProjectTech4DevAI/kaapi-backend
Length of output: 173
🏁 Script executed:
# Search for imports from google.genai in the codebase
rg "from google\.genai|import google\.genai" -A 5Repository: ProjectTech4DevAI/kaapi-backend
Length of output: 1384
🏁 Script executed:
# Check for Dockerfile or deployment configuration files
find . -name "Dockerfile*" -o -name "docker-compose*" -o -name "requirements*" | head -10Repository: ProjectTech4DevAI/kaapi-backend
Length of output: 116
🏁 Script executed:
cat ./backend/Dockerfile

Repository: ProjectTech4DevAI/kaapi-backend
Length of output: 1551
🏁 Script executed:
cat -n ./backend/app/services/llm/providers/gai.py | head -100

Repository: ProjectTech4DevAI/kaapi-backend
Length of output: 4364
Add ffmpeg to Dockerfile system dependencies.
The google-genai>=1.59.0 version is available on PyPI and correctly exports GenerateContentResponse, SpeechConfig, and VoiceConfig used in gai.py. However, pydub (imported on line 5 of gai.py) requires ffmpeg or libav installed at the system level. The current Dockerfile does not install either, which will cause audio conversion operations (via convert_pcm_to_mp3/convert_pcm_to_ogg) to fail at runtime. Add ffmpeg to the apt-get install step in the Dockerfile.
🤖 Prompt for AI Agents
In `@backend/pyproject.toml` around lines 39 - 40, The Docker image is missing
system-level ffmpeg/libav needed by pydub (used by convert_pcm_to_mp3 and
convert_pcm_to_ogg in gai.py), so update the Dockerfile's apt-get install line
to include ffmpeg (or libav-tools) so audio conversions succeed at runtime;
ensure the package is added to the same RUN apt-get install step and the image
build still cleans up apt caches as before.
Summary
Target issue is #528
Integrate Gemini-2.5-pro-preview-tts models with the unified API
Checklist
Before submitting a pull request, please ensure that you mark these tasks.
fastapi run --reload app/main.py or docker compose up in the repository root and test.
Summary by CodeRabbit
New Features