[Feature Request] 'Vibe' Benchmark #92

@joshavant

Description

A big part of OpenClaw's user experience is the 'vibe': how authentically human-like an agent feels.

  • Does it tell funny jokes?
  • Is it sassy (as much as SOUL.md tells it to be)?
  • Do the responses feel 'human'?
  • Does it use casual punctuation?

Because of how important this is, it would be useful to have a 'Vibe' benchmark on PinchBench.

The evals for this might require human attention, since 'success' for many of these factors is inherently a matter of subjective human opinion.
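To make the human-rated idea a bit more concrete, here is a rough sketch of how per-criterion ratings could be averaged into a single score. Everything in it is an assumption for illustration: the criterion names are just paraphrases of the bullets above, the 1-5 scale is arbitrary, and `aggregate_vibe` is a hypothetical helper, not anything that exists in PinchBench today.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical criteria, paraphrased from the bullet list in this issue.
CRITERIA = ("humor", "sass", "human_feel", "casual_punctuation")


def aggregate_vibe(ratings):
    """Average human ratings per criterion, then across criteria.

    ratings: iterable of (rater_id, criterion, score) tuples,
             where score is an integer from 1 to 5 (assumed scale).
    Returns (per_criterion_means, overall_score); overall_score is None
    when no ratings were supplied.
    """
    by_criterion = defaultdict(list)
    for _rater_id, criterion, score in ratings:
        if criterion not in CRITERIA:
            raise ValueError(f"unknown criterion: {criterion!r}")
        if not 1 <= score <= 5:
            raise ValueError(f"score out of range: {score!r}")
        by_criterion[criterion].append(score)

    # Mean per criterion first, so a heavily rated criterion does not
    # dominate the overall score.
    per_criterion = {c: mean(s) for c, s in by_criterion.items()}
    overall = mean(list(per_criterion.values())) if per_criterion else None
    return per_criterion, overall
```

Averaging per criterion before averaging overall is only one possible choice; a real benchmark would probably also want to track rater disagreement, since these judgments are subjective.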
