feat: model comparison — cost/quality across providers #3

@jpleva91

Problem

We need a cost/quality comparison when evaluating DeepSeek, Haiku, and Ollama against each other.

Acceptance Criteria

  • llmint compare <run1> <run2> compares two bench runs
  • Reports: cost, solve rate, cost per solve
  • Markdown table output
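
A minimal sketch of the comparison logic the criteria above describe, for discussion only. It assumes each bench run can be reduced to a total cost, a task count, and a solved count; the `RunSummary` type, its field names, and `compare_markdown` are hypothetical and not part of llmint. How `<run1>` and `<run2>` are actually loaded from disk is left out because the issue does not specify a run format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunSummary:
    """Hypothetical summary of one bench run (field names are assumptions)."""
    name: str
    total_cost_usd: float
    tasks: int
    solved: int

    @property
    def solve_rate(self) -> float:
        # Fraction of tasks solved; 0.0 for an empty run to avoid dividing by zero.
        return self.solved / self.tasks if self.tasks else 0.0

    @property
    def cost_per_solve(self) -> Optional[float]:
        # Undefined when nothing was solved, so return None instead of infinity.
        return self.total_cost_usd / self.solved if self.solved else None


def compare_markdown(a: RunSummary, b: RunSummary) -> str:
    """Render cost, solve rate, and cost per solve for two runs as a Markdown table."""
    def fmt_cps(run: RunSummary) -> str:
        cps = run.cost_per_solve
        return "n/a" if cps is None else f"${cps:.4f}"

    rows = [
        "| Run | Cost (USD) | Solve rate | Cost per solve |",
        "| --- | ---: | ---: | ---: |",
    ]
    for run in (a, b):
        rows.append(
            f"| {run.name} | ${run.total_cost_usd:.4f} "
            f"| {run.solve_rate:.1%} | {fmt_cps(run)} |"
        )
    return "\n".join(rows)


if __name__ == "__main__":
    # Made-up placeholder numbers, only to show the output shape.
    print(compare_markdown(
        RunSummary("run1", total_cost_usd=1.0, tasks=10, solved=5),
        RunSummary("run2", total_cost_usd=0.0, tasks=10, solved=0),
    ))
```

Reporting cost per solve as `n/a` when a run solved nothing is one possible choice; a free local model with zero solves would otherwise show up as a division by zero.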

Generated by /forge cascade

Labels

  • agent:claimed: Agent dispatched, do not re-dispatch
  • enhancement: New feature or request
  • sprint: Current sprint priority
