Optimize to speed up inference evaluation tests #8

@betogaona7

Description

ATM, the evaluation runs sequentially: we load a model, process the 1000 prompts from the db one by one, then unload the model to release memory before loading the next one.

Instead of processing prompts one at a time, we could explore adding support for batched processing, aiming for a speedup.
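A minimal sketch of what batched evaluation could look like. The `generate` callable and `batch_size` default are assumptions, stand-ins for whatever batched inference call the real model backend exposes; the point is only the chunking loop replacing the per-prompt loop:

```python
from typing import Callable, Iterator, List

def chunked(prompts: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield successive fixed-size batches of prompts."""
    for i in range(0, len(prompts), batch_size):
        yield prompts[i:i + batch_size]

def evaluate_batched(
    prompts: List[str],
    generate: Callable[[List[str]], List[str]],  # hypothetical batched inference call
    batch_size: int = 32,
) -> List[str]:
    """Run inference batch-by-batch instead of prompt-by-prompt."""
    outputs: List[str] = []
    for batch in chunked(prompts, batch_size):
        outputs.extend(generate(batch))
    return outputs

# Usage with a stand-in generate function (echoes its input uppercased):
results = evaluate_batched(
    [f"prompt {i}" for i in range(1000)],
    generate=lambda batch: [p.upper() for p in batch],
    batch_size=32,
)
```

The model would still be loaded and unloaded once per model as today; only the inner per-prompt loop changes, so memory behavior between models stays the same.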

Metadata

Assignees

Labels

enhancement (New feature or request)

