Description
When sending a request to a reranker model such as BAAI/bge-reranker-v2-m3, the worker processes the rerank request successfully but fails when it tries to return the result, so the client times out after 60 seconds. The error indicates that RerankReturnType objects from the infinity-emb library are not JSON serializable.
Error Message
Error while returning job result. | Object of type RerankReturnType is not JSON serializable
Environment
- RunPod Serverless: Yes
- infinity-emb version: 0.0.76
- runpod version: ~1.7.0
- Base Image: nvidia/cuda:12.4.1-cudnn-runtime-ubuntu22.04
- Python: 3.11
Steps to Reproduce
- Deploy the worker-infinity-embedding to RunPod Serverless
- Send a rerank request with the following structure:
    {
      "input": {
        "query": "your search query",
        "docs": ["doc1", "doc2", "doc3"],
        "model": "your-rerank-model",
        "return_docs": true
      }
    }
- Worker processes successfully but crashes when returning results
- Client receives timeout after 60 seconds
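For anyone reproducing this from a script, the request above could be built roughly as follows. This is only a sketch: the `https://api.runpod.ai/v2/<endpoint_id>/runsync` URL shape and the bearer-token header are assumptions about the deployment, `build_rerank_request` is a hypothetical helper name, and `ENDPOINT_ID` / `API_KEY` are placeholders. The request is constructed but not sent.

```python
import json
import urllib.request


def build_rerank_request(endpoint_id: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a rerank request matching the payload in this report."""
    payload = {
        "input": {
            "query": "your search query",
            "docs": ["doc1", "doc2", "doc3"],
            "model": "your-rerank-model",
            "return_docs": True,
        }
    }
    return urllib.request.Request(
        # Assumed URL shape for a synchronous serverless call.
        url=f"https://api.runpod.ai/v2/{endpoint_id}/runsync",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


# Placeholders, not real credentials.
req = build_rerank_request("ENDPOINT_ID", "API_KEY")
# To actually send it: urllib.request.urlopen(req, timeout=90)
```

Using a client timeout longer than 60 seconds makes it easier to tell the worker-side failure apart from a local timeout.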
Root Cause
handler.py returns the result of embedding_service.infinity_rerank() directly. That result is a RerankReturnType Pydantic model object, but RunPod's serverless framework requires JSON-serializable return values such as plain Python dictionaries.
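The error text has the same form as the message Python's standard `json` module produces for objects it cannot encode, so the failure mode can be illustrated without RunPod or infinity-emb. The sketch below uses a hypothetical stand-in class, `FakeRerankResult`, not the library's real type:

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class FakeRerankResult:
    """Hypothetical stand-in for a non-dict result object."""
    relevance_score: float
    index: int


def try_serialize(obj):
    """Return the JSON string, or the TypeError message if encoding fails."""
    try:
        return json.dumps(obj)
    except TypeError as exc:
        return f"TypeError: {exc}"


result = FakeRerankResult(relevance_score=0.9, index=0)
failed = try_serialize(result)       # object passed as-is: encoding fails
ok = try_serialize(asdict(result))   # plain dict: encoding succeeds
print(failed)
print(ok)
```

Passing the object as-is fails, while the plain-dict form of the same data serializes cleanly, which is the behaviour described above.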
The issue is in handler.py at this code block:
    if job_input.get("query"):
        call_fn, kwargs = embedding_service.infinity_rerank, {
            "query": job_input.get("query"),
            "docs": job_input.get("docs"),
            "return_docs": job_input.get("return_docs"),
            "model_name": job_input.get("model"),
        }

And later:
    try:
        out = await call_fn(**kwargs)
        return out  # ❌ This returns a Pydantic model, not a dict
    except Exception as e:
        return create_error_response(str(e)).model_dump()

Proposed Solution
Convert all Pydantic model responses to dictionaries before returning:
    try:
        out = await call_fn(**kwargs)
        # Convert Pydantic models to dicts
        if hasattr(out, 'model_dump'):
            return out.model_dump()
        elif hasattr(out, 'dict'):
            return out.dict()
        return out
    except Exception as e:
        return create_error_response(str(e)).model_dump()

Alternatively, ensure each route handler explicitly converts its response:
    if job_input.get("query"):
        call_fn, kwargs = embedding_service.infinity_rerank, {
            "query": job_input.get("query"),
            "docs": job_input.get("docs"),
            "return_docs": job_input.get("return_docs"),
            "model_name": job_input.get("model"),
        }
        result = await call_fn(**kwargs)
        return result.model_dump() if hasattr(result, 'model_dump') else result

Additional Context
- Embedding requests work fine (possibly because they return serializable structures)
- Error responses work correctly (they use .model_dump())
- This affects all rerank operations
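The conversion proposed above could also be factored into one recursive helper, so the handler does not depend on exactly which kind of object comes back. This is a sketch under assumptions, not the worker's actual code: `to_jsonable` is a hypothetical name, and it also handles dataclasses and nested lists in case the returned object is not a top-level Pydantic model (not verified here). `FakeScore` and `FakeResponse` are stand-ins for demonstration only.

```python
import dataclasses
import json
from typing import Any


def to_jsonable(obj: Any) -> Any:
    """Recursively convert model-like objects into plain dicts and lists."""
    if isinstance(obj, dict):
        return {key: to_jsonable(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [to_jsonable(item) for item in obj]
    if hasattr(obj, "model_dump"):  # Pydantic v2 style
        return to_jsonable(obj.model_dump())
    if dataclasses.is_dataclass(obj) and not isinstance(obj, type):
        return to_jsonable(dataclasses.asdict(obj))
    if callable(getattr(obj, "dict", None)):  # Pydantic v1 style
        return to_jsonable(obj.dict())
    return obj  # str, int, float, bool, None, or something json will reject


# Hypothetical stand-ins, for demonstration only.
@dataclasses.dataclass
class FakeScore:
    relevance_score: float
    index: int


class FakeResponse:
    def model_dump(self):
        # Deliberately leaves a nested non-dict object in the output.
        return {"scores": [FakeScore(0.5, 1)]}


converted = to_jsonable(FakeResponse())
encoded = json.dumps(converted)
```

In the handler, the `return out` line would then become `return to_jsonable(out)`, leaving the existing `except` branch unchanged.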