Better Score schema #28

---
Related work in deepeval (https://github.com/confident-ai/deepeval/blob/main/deepeval/tracing/api.py):

- MetricData: deepeval/tracing/api.py:30-42
- TestRun: deepeval/test_run/test_run.py:126-409
- LLMApiTestCase: deepeval/test_run/api.py:9-97

The TestRun.save() method (deepeval/test_run/test_run.py:389-396) saves data to: …
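To make the comparison concrete, here is a rough sketch of the per-metric record those models describe. The field list is my approximation from reading the linked lines, not a verbatim copy of deepeval's code:

```python
from typing import Optional

from pydantic import BaseModel, Field

# Approximate shape of deepeval's MetricData (deepeval/tracing/api.py:30-42).
# Field names and types are paraphrased from the linked source, not copied
# verbatim -- check the file for the exact model.
class MetricData(BaseModel):
    name: str
    score: Optional[float] = None
    threshold: Optional[float] = None
    success: Optional[bool] = None
    reason: Optional[str] = None
    evaluation_model: Optional[str] = Field(default=None, alias="evaluationModel")
    error: Optional[str] = None
    evaluation_cost: Optional[float] = Field(default=None, alias="evaluationCost")
```

Note how much of this is per-metric bookkeeping (model, cost, error) rather than the score itself; as far as I can tell, TestRun and LLMApiTestCase mostly wrap lists of these records plus the inputs and outputs.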
---
Here is an example of the result object from promptfoo (when using …):

```jsonc
// ...
{
  "cost": 0,
  "gradingResult": {
    "pass": true,
    "score": 1,
    "reason": "No assertions",
    "tokensUsed": {
      "total": 0,
      "prompt": 0,
      "completion": 0,
      "cached": 0,
      "numRequests": 0
    }
  },
  "id": "29176d50-dc54-4f5a-88f0-c88dd8c044f3",
  "latencyMs": 7,
  "namedScores": {},
  "prompt": {
    "raw": "Write a tweet about bananas",
    "label": "Write a tweet about {{topic}}"
  },
  "promptId": "add16627d8dbb348b8b3ac175c8b96107d26a4b08b5be0262962f8ec5b18ec9e",
  "promptIdx": 0,
  "provider": {
    "id": "openrouter:google/gemini-2.5-flash-lite",
    "label": ""
  },
  "response": {
    "output": "Here are a few options for a tweet about bananas, choose the one that best fits your vibe!\n\n**Option 1 (Simple & Sweet):**\n\n> Just a friendly reminder that bananas are nature's perfect snack. 🍌 Delicious, convenient, and packed with goodness! #banana #healthysnack #fruit\n\n**Option 2 (Playful & Fun):**\n\n> Officially declaring today \"Banana Appreciation Day\"! 🤩 Who else is a huge fan of this amazing yellow fruit? Let's go bananas! 🤪 #banana #fruity #love\n\n**Option 3 (Focus on Benefits):**\n\n> Feeling that afternoon slump? Reach for a banana! ⚡️ Great for energy and a good source of potassium. Your body will thank you. 🙏 #banana #energyboost #potassium #healthy\n\n**Option 4 (Short & Punchy):**\n\n> Banana vibes. 🍌 Simple perfection. #banana\n\n**Option 5 (Engaging Question):**\n\n> What's your favorite way to eat a banana? Smoothie, plain, or baked? 🤔 I'm curious! 👇 #banana #foodie #snackideas\n\n**Remember to add a banana emoji (🍌) for extra visual appeal!**",
    "tokenUsage": {
      "cached": 259,
      "total": 259
    },
    "cached": true,
    "finishReason": "stop"
  },
  "score": 1,
  "success": true,
  "testCase": {
    "vars": {
      "topic": "bananas"
    },
    "assert": [],
    "options": {},
    "metadata": {}
  },
  "testIdx": 0,
  "vars": {
    "topic": "bananas"
  },
  "metadata": {
    "_promptfooFileMetadata": {}
  },
  "failureReason": 0
}
// ...
```
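For comparing schemas programmatically, the result above maps onto a small typed structure. A sketch in Python, with key names transcribed directly from the sample (this is not an official promptfoo type):

```python
from typing import Any, TypedDict

# "pass" is a Python keyword, so GradingResult uses the functional form.
GradingResult = TypedDict(
    "GradingResult",
    {"pass": bool, "score": float, "reason": str, "tokensUsed": dict[str, int]},
    total=False,
)

# Key names transcribed from the promptfoo result sample above; a convenience
# type for comparison, not an official promptfoo schema.
class PromptfooResult(TypedDict, total=False):
    cost: float
    gradingResult: GradingResult
    id: str
    latencyMs: int
    namedScores: dict[str, float]
    prompt: dict[str, str]    # raw and label
    promptId: str
    promptIdx: int
    provider: dict[str, str]  # id and label
    response: dict[str, Any]  # output, tokenUsage, cached, finishReason
    score: float
    success: bool
    testCase: dict[str, Any]  # vars, assert, options, metadata
    testIdx: int
    vars: dict[str, Any]
    metadata: dict[str, Any]
    failureReason: int
```

Worth noting for the schema discussion: promptfoo keeps a flat top-level score/success pair and pushes the detail down into gradingResult and namedScores.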
---
In lm-evaluation-harness (https://github.com/EleutherAI/lm-evaluation-harness/blob/main/lm_eval/evaluator.py), the relevant code is at lm_eval/evaluator.py:634-659. Each sample saved to disk has this structure: { … }. Samples are saved to samples_{task_name}_{timestamp}.jsonl files.
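Since those are plain JSONL files, reading the per-sample scores back out is a short loop. A minimal sketch that assumes only the filename pattern above; the keys inside each record vary by task, so they are left as opaque dicts:

```python
import glob
import json

def load_samples(output_dir: str) -> list[dict]:
    """Load all per-sample records written by lm-evaluation-harness.

    Assumes only the samples_{task_name}_{timestamp}.jsonl naming shown
    above; the record keys differ from task to task.
    """
    records: list[dict] = []
    for path in sorted(glob.glob(f"{output_dir}/samples_*.jsonl")):
        with open(path, encoding="utf-8") as f:
            for line in f:
                if line.strip():
                    records.append(json.loads(line))
    return records
```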
---
Our score schema is too verbose
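For what it's worth, the intersection of the three schemas above is small: each tool carries a numeric score, a pass/fail flag, and a reason, with cost/latency/tokens as optional extras. A hedged sketch of what a slimmer schema could keep (the names are mine, purely illustrative):

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Score:
    """Minimal common denominator of the deepeval, promptfoo, and
    lm-evaluation-harness records above. Illustrative only."""
    value: float                  # promptfoo/deepeval "score"
    passed: bool                  # promptfoo/deepeval "success"
    reason: Optional[str] = None  # e.g. "No assertions" in the sample above
    extras: dict = field(default_factory=dict)  # cost, latencyMs, tokenUsage, ...
```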