RerankReturnType not JSON serializable - Worker crashes when returning rerank results #37

@Khalid-J02

@michaelfeil

Description

When sending requests to reranker models such as BAAI/bge-reranker-v2-m3, the worker processes the rerank request successfully but crashes while returning the result, so clients time out after 60 seconds. The error indicates that the RerankReturnType objects returned by the infinity-emb library are not JSON serializable.

Error Message

Error while returning job result. | Object of type RerankReturnType is not JSON serializable

Environment

  • RunPod Serverless: Yes
  • infinity-emb version: 0.0.76
  • runpod version: ~1.7.0
  • Base Image: nvidia/cuda:12.4.1-cudnn-runtime-ubuntu22.04
  • Python: 3.11

Steps to Reproduce

  1. Deploy the worker-infinity-embedding to RunPod Serverless
  2. Send a rerank request with the following structure:
{
  "input": {
    "query": "your search query",
    "docs": ["doc1", "doc2", "doc3"],
    "model": "your-rerank-model",
    "return_docs": true
  }
}
  3. The worker processes the request successfully but crashes when returning the result
  4. The client times out after 60 seconds

Root Cause

handler.py returns the result of embedding_service.infinity_rerank() directly, and that result is a RerankReturnType Pydantic model object rather than a plain dict. RunPod's serverless framework serializes the handler's return value to JSON, so it must be a JSON-serializable value such as a plain Python dictionary.
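
The failure can be reproduced without RunPod or infinity-emb: Python's json.dumps raises the same TypeError for any object it does not know how to encode, which matches the error text above (this assumes the worker's serialization step is json-based). A minimal sketch, using a hypothetical stand-in dataclass named RerankReturnType rather than the real infinity-emb class:

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


# Hypothetical stand-in for infinity-emb's RerankReturnType; the field names
# are illustrative assumptions, not the library's definition.
@dataclass
class RerankReturnType:
    relevance_score: float
    index: int
    document: Optional[str] = None


result = RerankReturnType(relevance_score=0.92, index=0, document="doc1")

error_message = ""
try:
    # Mirrors the serialization step that fails when the worker returns `result`.
    json.dumps(result)
except TypeError as exc:
    error_message = str(exc)

# Converting to a plain dict first makes the payload serializable.
payload = json.dumps(asdict(result))
print(error_message)
print(payload)
```

The same reasoning applies to a Pydantic model: json.dumps only knows dicts, lists, strings, numbers, booleans, and None, so the object has to be converted before it is returned.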

The issue is in handler.py at this code block:

if job_input.get("query"):
    call_fn, kwargs = embedding_service.infinity_rerank, {
        "query": job_input.get("query"),
        "docs": job_input.get("docs"),
        "return_docs": job_input.get("return_docs"),
        "model_name": job_input.get("model"),
    }

And later:

try:
    out = await call_fn(**kwargs)
    return out  # ❌ This returns a Pydantic model, not a dict
except Exception as e:
    return create_error_response(str(e)).model_dump()

Proposed Solution

Convert all Pydantic model responses to dictionaries before returning:

try:
    out = await call_fn(**kwargs)
    # Convert Pydantic models to dicts
    if hasattr(out, 'model_dump'):
        return out.model_dump()
    elif hasattr(out, 'dict'):
        return out.dict()
    return out
except Exception as e:
    return create_error_response(str(e)).model_dump()
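
The hasattr check above converts only the top-level object. If infinity_rerank() instead returns a list of result objects, or nests them inside a dict, the inner items would still fail to serialize. A more defensive variant is sketched below; the helper name to_jsonable is hypothetical, and it assumes results are Pydantic v2 models, Pydantic v1 models, dataclasses, or containers of those:

```python
import dataclasses
from typing import Any


def to_jsonable(obj: Any) -> Any:
    """Recursively convert models, dataclasses, and containers into JSON-safe values."""
    if isinstance(obj, dict):
        return {key: to_jsonable(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [to_jsonable(item) for item in obj]
    if hasattr(obj, "model_dump"):  # Pydantic v2 models
        return to_jsonable(obj.model_dump())
    if hasattr(obj, "dict") and callable(obj.dict):  # Pydantic v1 models
        return to_jsonable(obj.dict())
    if dataclasses.is_dataclass(obj) and not isinstance(obj, type):  # dataclass instances
        return to_jsonable(dataclasses.asdict(obj))
    return obj  # str, int, float, bool, None pass through unchanged


# Illustrative stand-ins (hypothetical, not infinity-emb classes).
class FakePydanticResult:
    def model_dump(self) -> dict:
        return {"index": 0, "relevance_score": 0.9}


@dataclasses.dataclass
class FakeDataclassResult:
    index: int
    relevance_score: float


converted = to_jsonable([FakePydanticResult(), FakeDataclassResult(index=1, relevance_score=0.4)])
print(converted)
```

In the handler, return to_jsonable(out) would replace the if/elif chain. With Pydantic v2, model_dump(mode="json") is another option when fields include types such as datetime.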

Alternatively, ensure each route handler explicitly converts its response:

if job_input.get("query"):
    call_fn, kwargs = embedding_service.infinity_rerank, {
        "query": job_input.get("query"),
        "docs": job_input.get("docs"),
        "return_docs": job_input.get("return_docs"),
        "model_name": job_input.get("model"),
    }
    result = await call_fn(**kwargs)
    return result.model_dump() if hasattr(result, 'model_dump') else result
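
Because the snippet above awaits the call inside the if branch, an exception there may bypass the later try/except. A self-contained sketch that converts at the route boundary while keeping the error path is shown below. FakeEmbeddingService, RerankResult, and the plain-dict error response are hypothetical stand-ins for the worker's embedding_service, the Pydantic response model, and create_error_response().

```python
import asyncio
import json
from typing import Any


class RerankResult:
    """Hypothetical stand-in for a Pydantic response model."""

    def __init__(self, scores: list[float]) -> None:
        self.scores = scores

    def model_dump(self) -> dict[str, Any]:
        return {"results": [{"index": i, "relevance_score": s} for i, s in enumerate(self.scores)]}


class FakeEmbeddingService:
    """Hypothetical stub; the real worker calls infinity-emb here."""

    async def infinity_rerank(self, query, docs, return_docs, model_name):
        # Score each doc by naive word overlap with the query, purely for illustration.
        words = set(query.split())
        return RerankResult([float(len(words & set(d.split()))) for d in docs])


embedding_service = FakeEmbeddingService()


async def handler(job: dict[str, Any]) -> Any:
    job_input = job["input"]
    try:
        if job_input.get("query"):
            result = await embedding_service.infinity_rerank(
                query=job_input.get("query"),
                docs=job_input.get("docs"),
                return_docs=job_input.get("return_docs"),
                model_name=job_input.get("model"),
            )
            # Convert inside the try so both success and failure return plain dicts.
            return result.model_dump() if hasattr(result, "model_dump") else result
        return {"error": "unsupported request"}
    except Exception as e:
        # Stand-in for create_error_response(str(e)).model_dump()
        return {"error": str(e)}


job = {"input": {"query": "gpu pricing", "docs": ["gpu cost", "cpu pricing", "gpu pricing table"],
                 "model": "your-rerank-model", "return_docs": True}}
output = asyncio.run(handler(job))
print(json.dumps(output))
```

Converting inside each route keeps the conversion next to the call that produces the model, at the cost of repeating it for every route; the central conversion in the first proposal avoids that repetition.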

Additional Context

  • Embedding requests work fine (possibly because they return serializable structures)
  • Error responses work correctly (they use .model_dump())
  • This affects all rerank operations

Metadata

Assignees

No one assigned

    Labels

    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests