Performance expectations #5

@Sopamo

Thanks for working on this!

I've been testing embeddings in a RunPod serverless environment, but the performance is below what I expected. For bge-m3, we're seeing an end-to-end latency of ~600ms, while RunPod itself reports around 100ms delay time and around 110ms processing time.

I also ran bge-m3 locally on my machine (on a GeForce RTX 4080, directly via Python using BGEM3FlagModel). The first embedding has a very high latency there as well (~180ms), but subsequent embeddings are very fast, as expected: around 4-5ms for simple text.

I don't see an obvious reason why requests after the first one on a running worker would still take 100+ms. Can this be improved? I would be willing to contribute, but would first like to ask whether this performance is expected or whether there is room to improve it.

I would also like to ask about the 100ms delay time. What could cause it to be this high even though the worker is already running?

We are using European data centers. Could the requests somehow be routed through the US?
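
To make the gap concrete, here is a rough breakdown using only the approximate figures quoted above. The helper name `unaccounted_ms` is made up for illustration, and the result is only as precise as those rounded inputs.

```python
def unaccounted_ms(end_to_end_ms, delay_ms, processing_ms):
    """Return the latency not explained by the platform-reported figures."""
    return end_to_end_ms - (delay_ms + processing_ms)


# Approximate values from this issue: ~600ms end to end,
# ~100ms reported delay time, ~110ms reported processing time.
remainder = unaccounted_ms(end_to_end_ms=600, delay_ms=100, processing_ms=110)
print(f"Latency outside the reported delay and processing time: ~{remainder} ms")
```

So besides the reported delay and processing time, most of the end-to-end latency is spent somewhere that the reported numbers do not cover, which is part of why I'm asking about routing.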

This is the Python script I used for testing locally:

import time

from FlagEmbedding import BGEM3FlagModel

# Load the model once; full precision (no fp16).
model = BGEM3FlagModel('BAAI/bge-m3', use_fp16=False)

sentences = ["What is BGE M3?"]
sentences2 = ["More text"]
sentences3 = ["<<< More text"]


def get_embeddings(inputs):
    # Time a single encode call for the dense embeddings.
    start_time = time.time()
    model.encode(inputs)['dense_vecs']
    end_time = time.time()

    execution_time = end_time - start_time
    print(f"Execution time: {execution_time} seconds")


# The first call includes one-time warm-up overhead; the next two do not.
get_embeddings(sentences)
get_embeddings(sentences2)
get_embeddings(sentences3)

Output:

Execution time: 0.214857816696167 seconds
Execution time: 0.004781007766723633 seconds
Execution time: 0.00433349609375 seconds
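
Three single calls are a small sample, so a slightly more robust way to take the same measurement might look like the sketch below. It discards warm-up calls and reports the minimum, median, and maximum over repeated runs. The names `benchmark` and `summarize` are hypothetical helpers, not part of FlagEmbedding or RunPod, and `fake_encode` is a stand-in so the sketch runs without a GPU.

```python
import statistics
import time


def benchmark(fn, inputs, warmup=1, runs=20):
    """Call fn(inputs) repeatedly and return per-call latencies in milliseconds.

    The first `warmup` calls are executed but not timed.
    """
    for _ in range(warmup):
        fn(inputs)

    latencies_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        fn(inputs)
        latencies_ms.append((time.perf_counter() - start) * 1000)
    return latencies_ms


def summarize(latencies_ms):
    """Reduce a list of latencies to min, median, and max."""
    return {
        "min_ms": min(latencies_ms),
        "median_ms": statistics.median(latencies_ms),
        "max_ms": max(latencies_ms),
    }


def fake_encode(texts):
    # Stand-in workload; replace with the real model call when available.
    return [len(text) for text in texts]


stats = summarize(benchmark(fake_encode, ["What is BGE M3?"]))
print(stats)
```

With the model from the script above, `fake_encode` could be swapped for `lambda texts: model.encode(texts)['dense_vecs']`.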
