[Feature Request] 'Vibe' Benchmark #92
Open
Description
A big part of OpenClaw's user experience is the 'vibe': how authentically human-like an agent is.
- Does it tell funny jokes?
- Is it sassy (as much as SOUL.md tells it to be)?
- Do the responses feel 'human'?
- Does it use casual punctuation?
Because of how important this is, it would be useful to have a 'Vibe' benchmark on PinchBench.
The evals for this might require human attention, since 'success' on many of these factors is an inherently subjective human opinion.
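As a rough sketch of how human ratings could be turned into a score: the snippet below averages 1-5 ratings per dimension and then averages across dimensions. Everything here is an assumption for illustration. The dimension names are just the questions above turned into labels, and `aggregate_vibe` is a hypothetical helper, not anything that exists in PinchBench.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rubric dimensions, derived from the questions in this issue.
DIMENSIONS = ("humor", "sass", "humanlike", "casual_punctuation")


def aggregate_vibe(ratings):
    """Average 1-5 human ratings per dimension, then across dimensions.

    `ratings` is a list of (rater_id, dimension, score) tuples.
    Returns (per_dimension_means, overall_mean); overall is None if empty.
    """
    by_dim = defaultdict(list)
    for _rater, dim, score in ratings:
        if dim not in DIMENSIONS:
            raise ValueError(f"unknown dimension: {dim}")
        if not 1 <= score <= 5:
            raise ValueError(f"score out of range: {score}")
        by_dim[dim].append(score)

    # Mean score for each dimension that received at least one rating.
    per_dim = {dim: mean(scores) for dim, scores in by_dim.items()}
    # Unweighted mean over dimensions, so heavily rated ones do not dominate.
    overall = mean(list(per_dim.values())) if per_dim else None
    return per_dim, overall


if __name__ == "__main__":
    sample = [("alice", "humor", 4), ("bob", "humor", 2), ("alice", "sass", 5)]
    print(aggregate_vibe(sample))
```

Averaging per dimension first is one possible design choice; it keeps a dimension with many raters from outweighing one with few. Rater disagreement would likely need to be reported separately, given how subjective these judgments are.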