This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
An MCP (Model Context Protocol) server that lets Claude Desktop query other LLMs through a unified interface. It uses LiteLLM as the multi-provider backend and FastMCP for the server framework. Providers are configured via environment variables, and models are discovered dynamically via provider APIs.
```bash
uv sync                                              # Install dependencies
PROVIDER_OPENAI="openai;sk-test" uv run ask-another  # Run the server locally
```

Tests can be run with:

```bash
uv run --with pytest python -m pytest tests/ -v
```

The entire server lives in a single file: `src/ask_another/server.py`. It exposes these MCP tools:
- `search_families` — discovers model families across configured providers, with optional substring search
- `search_models` — finds specific model identifiers with optional substring search; enriches results with metadata (Elo, knowledge cutoff, context length, pricing, notes) from the annotations file
- `completion` — proxies a completion request to a specified LLM via LiteLLM; supports full model identifiers or favourite shorthand; tracks usage in the annotations file
- `annotate_models` — adds or updates a personal note on a model; notes appear in search results and server instructions
- `refresh_models` — forces a re-scan of all providers and re-fetches benchmark data from LMArena arena-catalog (GitHub JSON) and LMArena metadata (HuggingFace CSV)
- `feedback` — collects usability issues from the LLM client into a JSONL log file (`~/.ask-another-feedback.jsonl` by default, configurable via the `FEEDBACK_LOG` env var)
- `start_research` — starts a deep research task that runs in the background via a lifespan task group. Supports two paths: OpenRouter (Perplexity/OpenAI via `litellm.completion`) and Gemini deep research (via `litellm.interactions.create` with polling). Blocks until results arrive or timeout, then returns results or a job handle. If interrupted (user hits escape), the research continues in the background.
- `check_research` — lists all research jobs as a markdown table, or retrieves full results for a specific `job_id`
- `cancel_research` — cancels a running research task by its `job_id`
- `generate_image` — generates an image from a text prompt. Automatically routes between two LiteLLM paths: `litellm.image_generation()` for dedicated image models (gpt-image-1, dall-e-3, imagen-4) and `litellm.completion()` with `modalities=["image", "text"]` for native image-output models (Gemini Nano Banana family). Returns images inline via MCP `ImageContent` and saves to disk.
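The `generate_image` routing described above can be sketched as a small pure helper. The prefix-matching rule and function name here are illustrative assumptions, not the server's actual logic:

```python
# Decide which LiteLLM path to use for a given image model.
# The model-name list comes from the families named above; the
# matching rule itself is an assumption for illustration.
DEDICATED_IMAGE_MODELS = ("gpt-image-1", "dall-e-3", "imagen-4")

def image_route(model: str) -> str:
    """Return 'image_generation' for dedicated image models,
    'completion' for native image-output chat models."""
    name = model.split("/")[-1]  # strip the provider prefix
    if any(name.startswith(m) for m in DEDICATED_IMAGE_MODELS):
        return "image_generation"
    # Everything else (e.g. Gemini image-output models) would go
    # through litellm.completion() with modalities=["image", "text"].
    return "completion"
```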
Providers are configured via `PROVIDER_*` environment variables with the format `provider-name;api-key`. These are parsed at module import time into a `_provider_registry` dict mapping provider names to API keys.
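A minimal sketch of that parsing step, assuming the `name;key` split shown above (the function name is hypothetical):

```python
import os

def parse_provider_registry(environ=os.environ) -> dict:
    """Build {provider_name: api_key} from PROVIDER_* variables.

    Each value has the form "provider-name;api-key", e.g.
    PROVIDER_OPENAI="openai;sk-test". Malformed values are skipped.
    """
    registry = {}
    for var, value in environ.items():
        if not var.startswith("PROVIDER_"):
            continue
        name, _, key = value.partition(";")
        if name and key:
            registry[name] = key
    return registry
```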
Model discovery uses `litellm.get_valid_models(check_provider_endpoint=True)` by default, with exception handlers for providers where LiteLLM listing is unsupported (OpenRouter). A generic normalisation rule ensures all model IDs use the `provider/model-name` format. Results are cached in memory with a configurable TTL.
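The normalisation rule might look like the following sketch (the function name is an assumption):

```python
def normalise_model_id(provider: str, model_id: str) -> str:
    """Ensure a model ID is in provider/model-name form.

    Providers sometimes return bare names ("gpt-4o") and sometimes
    already-prefixed ones ("openai/gpt-4o"); this makes them uniform.
    """
    if model_id.startswith(provider + "/"):
        return model_id
    return f"{provider}/{model_id}"
```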
`~/.ask-another-annotations.json` (configurable via the `ANNOTATIONS_FILE` env var) is the single source of truth for model metadata, usage data, and personal notes. Schema:

```json
{
  "openai/gpt-5.2": {
    "metadata": {
      "arena_elo": 1486,
      "knowledge_cutoff": "2025/6",
      "organization": "OpenAI",
      "license": "Proprietary",
      "first_seen": "2026-01-15T08:00:00Z",
      "last_updated": "2026-03-12T10:30:00Z"
    },
    "usage": {
      "call_count": 47,
      "last_used": "2026-03-12T14:20:00Z"
    },
    "annotations": {
      "note": "Fast, good for code review"
    }
  }
}
```

- Favourites are derived automatically from the top 5 models by `usage.call_count`. No configuration needed — just use the MCP and favourites emerge from actual usage.
- Notes are set via the `annotate_models` tool. They appear in search results and server instructions.
- Metadata is populated automatically on startup: the server scans all providers for available models, then fetches Elo ratings from LMArena arena-catalog (GitHub JSON) and knowledge cutoff dates, organization, and license from LMArena metadata (HuggingFace CSV). OpenRouter models additionally get pricing and context length from the OpenRouter API.
- `first_seen` is stamped when a model is first discovered.
- Refresh happens automatically when `metadata.last_updated` exceeds the TTL (`CACHE_TTL_MINUTES`, default 360 = 6 hours). It can also be triggered manually via `refresh_models`.
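The favourites derivation can be sketched against the schema above; the function name and the zero-usage filter are assumptions:

```python
def favourites(annotations: dict, top_n: int = 5) -> list:
    """Return the top_n model IDs ranked by usage.call_count.

    `annotations` follows the JSON schema above: each value may carry
    a "usage" dict with a "call_count". Models never used are excluded
    (an assumption; unused models make poor favourites).
    """
    def count(entry: dict) -> int:
        return entry.get("usage", {}).get("call_count", 0)

    ranked = sorted(annotations.items(), key=lambda kv: count(kv[1]), reverse=True)
    return [model for model, entry in ranked[:top_n] if count(entry) > 0]
```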
File-based debug logging can be enabled via environment variables. When disabled (default), no handlers are attached and there is zero overhead.
| Var | Default | Notes |
|---|---|---|
| `LOG_LEVEL` | (empty = disabled) | `DEBUG`, `INFO`, `WARNING`, `ERROR` |
| `LOG_FILE` | `~/.ask-another.log` | Path to log file |
| `LOG_FILE_SIZE` | `5` | Max file size in MB |
| `LOG_FILE_COUNT` | `2` | Number of backup files to keep |
| `IMAGE_OUTPUT_DIR` | `~/Pictures/ask-another` | Directory for saved generated images |
Uses `RotatingFileHandler` — files rotate at `LOG_FILE_SIZE` MB, keeping `LOG_FILE_COUNT` backups (e.g. `.log`, `.log.1`, `.log.2`).
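A minimal sketch of that setup, using the env vars from the table above (the logger name `ask_another` and the format string are assumptions):

```python
import logging
import os
from logging.handlers import RotatingFileHandler

def setup_logging() -> None:
    """Attach a rotating file handler only when LOG_LEVEL is set.

    When LOG_LEVEL is empty or unset, no handler is attached, so
    logging stays effectively free, matching the "zero overhead when
    disabled" behaviour described above.
    """
    level = os.environ.get("LOG_LEVEL", "")
    if not level:
        return  # logging disabled by default
    handler = RotatingFileHandler(
        os.path.expanduser(os.environ.get("LOG_FILE", "~/.ask-another.log")),
        maxBytes=int(os.environ.get("LOG_FILE_SIZE", "5")) * 1024 * 1024,
        backupCount=int(os.environ.get("LOG_FILE_COUNT", "2")),
    )
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s")
    )
    logger = logging.getLogger("ask_another")
    logger.setLevel(level)
    logger.addHandler(handler)
```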
The entrypoint is `ask_another.server:main` (defined in `pyproject.toml` under `[project.scripts]`), which calls `mcp.run()` on the FastMCP instance.
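The corresponding `pyproject.toml` fragment would look roughly like this (a sketch: the script name matches the `uv run ask-another` command shown earlier, but treat it as illustrative, not the exact file contents):

```toml
[project.scripts]
ask-another = "ask_another.server:main"
```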
- `mcp` (FastMCP) — MCP server framework
- `litellm` — unified LLM API client (imported lazily inside tools)
- `uv` — package manager (Python >= 3.10)