feat: support custom OpenAI-compatible endpoints #84

Open
kodee2k wants to merge 1 commit into pinchbench:main from kodee2k:feat/custom-base-url

kodee2k commented Apr 1, 2026

Adds --base-url and --api-key flags to benchmark.py so you can point PinchBench
at any OpenAI-compatible endpoint instead of only going through OpenRouter.

Useful for testing against local inference servers (vLLM, ollama, llama.cpp),
hosted providers like Together or Fireworks, or just hitting the OpenAI API directly.

What changed

scripts/benchmark.py

  • New --base-url arg for the API endpoint URL
  • New --api-key arg (falls back to $OPENAI_API_KEY if not provided)
  • Skips OpenRouter model validation when a custom base URL is set
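
For reviewers, the flag handling described above could look roughly like the sketch below. It is not the code in this PR: `parse_args` and `should_validate_with_openrouter` are hypothetical names, and only the `--model`, `--base-url`, and `--api-key` flags and the `$OPENAI_API_KEY` fallback come from the description.

```python
import argparse
import os


def parse_args(argv=None, env=None):
    """Parse the benchmark flags; --api-key falls back to $OPENAI_API_KEY."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(description="benchmark flag sketch")
    parser.add_argument("--model", required=True)
    parser.add_argument("--base-url", default=None,
                        help="OpenAI-compatible API endpoint URL")
    parser.add_argument("--api-key", default=None,
                        help="API key (defaults to $OPENAI_API_KEY)")
    args = parser.parse_args(argv)
    if args.api_key is None:
        # Fallback only applies when the flag was omitted.
        args.api_key = env.get("OPENAI_API_KEY")
    return args


def should_validate_with_openrouter(args):
    """Hypothetical helper: validate the model only on the default OpenRouter path."""
    return args.base_url is None
```

Passing `env` explicitly is just a convenience for testing the fallback without touching the real environment.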

scripts/lib_agent.py

  • ensure_agent_exists now accepts base_url and api_key keyword args
  • When a base URL is given, writes a custom provider into the bench agent's
    models.json using the OpenClaw provider config format (openai-completions api type)
  • When no base URL is given, the existing OpenRouter flow is unchanged
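
As a rough illustration of the provider registration, here is a minimal sketch of merging a "custom" entry into a models.json file. The JSON layout and key names (`providers`, `baseUrl`, `apiKey`, `contextWindow`, `maxTokens`) are assumptions made for this example and may not match the actual OpenClaw schema; only the provider name "custom", the openai-completions api type, and the 200k / 8192 defaults are taken from this description. The function names are hypothetical.

```python
import json
from pathlib import Path


def build_custom_provider(model, base_url, api_key):
    """Build a provider entry. Key names are assumed, not the confirmed schema."""
    return {
        "baseUrl": base_url,
        "apiKey": api_key,
        "api": "openai-completions",
        "models": [
            {
                "id": model,
                "contextWindow": 200_000,  # default stated in the PR description
                "maxTokens": 8192,         # default stated in the PR description
            }
        ],
    }


def write_custom_provider(models_path, model, base_url, api_key):
    """Merge the 'custom' provider into models.json, keeping other entries."""
    path = Path(models_path)
    config = json.loads(path.read_text()) if path.exists() else {}
    providers = config.setdefault("providers", {})
    providers["custom"] = build_custom_provider(model, base_url, api_key)
    path.write_text(json.dumps(config, indent=2))
    return config
```

Merging into the existing file rather than overwriting it keeps any providers that are already configured for the agent.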

Usage

# local server
python benchmark.py --model my-model --base-url http://localhost:8000/v1

# hosted provider with explicit key
python benchmark.py --model meta-llama/llama-3-70b \
  --base-url https://api.together.xyz/v1 --api-key tgr_xxx

# key from env (default if --api-key is omitted)
export OPENAI_API_KEY=sk-xxx
python benchmark.py --model gpt-4o --base-url https://api.openai.com/v1

The provider gets registered as "custom" in models.json with sane defaults
(200k context window, 8192 max output tokens). These work fine for most
endpoints but can be adjusted later if needed.

No changes to the grading, upload, or reporting paths.

Add --base-url and --api-key flags to benchmark.py for targeting
any OpenAI-compatible API instead of only OpenRouter.