Evaluation: Check Evaluation Quality of Life Metrics #41

@Prajna1999

Description

Stress test the text evals by monitoring:

  1. Maximum CSV size (in line items) that batch processing can handle.
  2. P95 batch processing time for CSV files of various line item counts, e.g. 20, 50, 100, 500, 1000, etc.
  3. Indicative average (ballpark) cost of each batch run.
  4. Server-side errors, if any, including but not limited to:
    4.1 OpenAI rate limiting
    4.2 Langfuse rate limiting
    4.3 Kaapi proxy server rate limiting, etc.
    4.4 Timeout/server unavailable errors (504, 529, 429 status codes)
  5. Which service is the processing bottleneck, i.e. Langfuse traces, OpenAI embeddings, or the Kaapi backend?
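
A rough sketch of how items 2 and 4 could be aggregated once the runs have been timed. It assumes each run is recorded as a plain dict; the `line_items`, `seconds`, and `status_codes` keys and the `summarize` helper are hypothetical, not part of the Kaapi codebase. P95 here uses the nearest-rank method, which is one of several common percentile definitions.

```python
from collections import Counter, defaultdict

# Status codes called out in item 4.4 of this issue.
WATCHED_STATUS_CODES = {429, 504, 529}


def p95(values):
    """Nearest-rank 95th percentile of a non-empty list of numbers."""
    ordered = sorted(values)
    # Integer ceiling of 0.95 * n, giving a 1-based rank.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def summarize(runs):
    """Group run records by CSV line-item count.

    Each run is a dict (hypothetical shape) with keys:
      "line_items"   -> int, number of rows in the CSV
      "seconds"      -> float, wall-clock batch processing time
      "status_codes" -> list of int HTTP statuses seen during the run
    """
    durations = defaultdict(list)
    errors = defaultdict(Counter)
    for run in runs:
        size = run["line_items"]
        durations[size].append(run["seconds"])
        for code in run["status_codes"]:
            if code in WATCHED_STATUS_CODES:
                errors[size][code] += 1
    return {
        size: {
            "runs": len(secs),
            "p95_seconds": p95(secs),
            "errors": dict(errors[size]),
        }
        for size, secs in sorted(durations.items())
    }
```

Calling `summarize(runs)` returns one entry per line-item count, so P95 time and watched error counts can be compared across the 20, 50, 100, 500, and 1000 row cases.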
