Stress test the text evals by monitoring:
- Maximum CSV size (in line items) that a batch run can process
- P95 batch processing time for CSV files of various line item counts, e.g. 20, 50, 100, 500, 1000, etc.
- Indicative average (ballpark) cost of each batch run
- Server-side errors, if any, including but not limited to:
  4.1 OpenAI rate limiting
  4.2 Langfuse rate limiting
  4.3 Kaapi proxy server rate limiting, etc.
  4.4 Timeout/server unavailable errors (504, 529, 429 status codes)
- Which service is the processing bottleneck, e.g. Langfuse traces, OpenAI embeddings, or the Kaapi backend?
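
The P95 and error-count items above could be computed from raw run records along these lines. This is only a minimal sketch: the record shape `(line_items, seconds, status_code)` and the helper names `p95` and `summarize` are assumptions for illustration, not part of the existing tooling, and the percentile uses the simple nearest-rank method.

```python
from collections import Counter, defaultdict


def p95(samples):
    """Nearest-rank 95th percentile of a non-empty list of numbers."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    # 1-based nearest rank: ceil(0.95 * n), done in integer arithmetic
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def summarize(runs):
    """Group runs by CSV size and tally error status codes.

    runs: iterable of (line_items, seconds, status_code) tuples
    (hypothetical record shape, one tuple per batch run).
    """
    durations = defaultdict(list)
    errors = Counter()
    for line_items, seconds, status in runs:
        durations[line_items].append(seconds)
        if status >= 400:  # count any 4xx/5xx response, e.g. 429, 504, 529
            errors[status] += 1
    p95_by_size = {size: p95(vals) for size, vals in sorted(durations.items())}
    return p95_by_size, dict(errors)
```

Calling `summarize` on the collected runs returns the P95 time per line item count and a count of each error status code seen, which can then be compared across the 20, 50, 100, 500, and 1000 line item runs.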