At the moment, the evaluation is done sequentially: we load a model, process the 1000 prompts from the db one at a time, and then unload the model to release the memory so we can load the next one.
Instead of processing prompt after prompt, we could explore adding support for processing the prompts in batches, aiming for a speed-up.
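
As a rough illustration, the change would mostly live in the evaluation loop: the model is still loaded once and unloaded at the end, but the prompts are grouped into fixed-size batches and sent to the model in a single call per batch. A minimal sketch, assuming a Hugging Face `transformers`-style model/tokenizer interface and a prompt list already fetched from the db (the function name, `batch_size`, and `max_new_tokens` are placeholders, not the actual code):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def evaluate_in_batches(prompts, model_name, batch_size=16, max_new_tokens=128):
    """Run all prompts through one model in fixed-size batches.

    The model is loaded once, every batch is processed, then the model is
    released so the next one can be loaded (same flow as the sequential run).
    """
    device = "cuda" if torch.cuda.is_available() else "cpu"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name).to(device)

    # Left padding so generation continues right after each prompt in the batch.
    tokenizer.padding_side = "left"
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token

    outputs = []
    for start in range(0, len(prompts), batch_size):
        batch = prompts[start:start + batch_size]
        inputs = tokenizer(batch, return_tensors="pt", padding=True).to(device)
        with torch.no_grad():
            generated = model.generate(**inputs, max_new_tokens=max_new_tokens)
        # Keep only the newly generated tokens, dropping the padded prompts.
        new_tokens = generated[:, inputs["input_ids"].shape[1]:]
        outputs.extend(tokenizer.batch_decode(new_tokens, skip_special_tokens=True))

    # Unload the model to release memory before the next model is evaluated.
    del model
    if device == "cuda":
        torch.cuda.empty_cache()
    return outputs
```

The `batch_size` would likely need to be tuned (or made configurable) per model, since larger batches trade memory for throughput.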