This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
LLM Council Plus is a 3-stage deliberation system where multiple LLMs collaboratively answer user questions through:
- Stage 1: Individual model responses (with optional web search context)
- Stage 2: Anonymous peer review/ranking to prevent bias
- Stage 3: Chairman synthesis of collective wisdom
Key Innovation: Hybrid architecture supporting OpenRouter (cloud), Ollama (local), Groq (fast inference), direct provider connections, and custom OpenAI-compatible endpoints.
Quick Start:
```sh
./start.sh
```
Manual Start:
```sh
# Backend (from project root)
uv run python -m backend.main

# Frontend (in new terminal)
cd frontend
npm run dev
```
Ports:
- Backend: http://localhost:8001 (NOT 8000 - avoid conflicts)
- Frontend: http://localhost:5173
Network Access:
```sh
# Backend already listens on 0.0.0.0:8001
# Frontend with network access:
cd frontend && npm run dev -- --host
```
Installing Dependencies:
```sh
# Backend
uv sync

# Frontend
cd frontend
npm install
```
Important: If switching between Intel/Apple Silicon Macs with iCloud sync:
```sh
rm -rf frontend/node_modules && cd frontend && npm install
```
This fixes binary incompatibilities (e.g., @rollup/rollup-darwin-* variants).
Provider System (backend/providers/)
- Base: `base.py` - Abstract interface for all LLM providers
- Implementations: `openrouter.py`, `ollama.py`, `groq.py`, `openai.py`, `anthropic.py`, `google.py`, `mistral.py`, `deepseek.py`, `custom_openai.py`
- Auto-routing: Model IDs with a prefix (e.g., `openai:gpt-4.1`, `ollama:llama3`, `custom:model-name`) route to the correct provider
- Routing logic: `council.py:get_provider_for_model()` handles prefix parsing
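The routing idea can be sketched as follows (a minimal sketch: the prefix set and the `openrouter` fallback are assumptions here; the authoritative logic is `council.py:get_provider_for_model()`):

```python
# Illustrative prefix set; the real provider registry lives in backend/providers/.
KNOWN_PREFIXES = {"openrouter", "ollama", "groq", "openai", "anthropic",
                  "google", "mistral", "deepseek", "custom"}

def get_provider_for_model(model_id: str) -> tuple[str, str]:
    """Split 'prefix:model-name' into (provider, bare model id)."""
    prefix, sep, rest = model_id.partition(":")  # split on the FIRST colon only
    if sep and prefix in KNOWN_PREFIXES:
        return prefix, rest
    # No recognized prefix: fall back to a default provider (an assumption)
    return "openrouter", model_id

print(get_provider_for_model("ollama:llama3.1:latest"))  # ('ollama', 'llama3.1:latest')
```

Note that `partition` splits only on the first colon, so Ollama tags like `llama3.1:latest` survive intact.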
Core Modules

| Module | Purpose |
|---|---|
| `council.py` | Orchestration: stage 1/2/3 collection, rankings, title generation |
| `search.py` | Web search: DuckDuckGo, Tavily, Brave with Jina Reader content fetch |
| `settings.py` | Config management, persisted to `data/settings.json` |
| `prompts.py` | Default system prompts for all stages |
| `main.py` | FastAPI app with streaming SSE endpoint |
| `storage.py` | Conversation persistence in `data/conversations/{id}.json` |
Frontend Components

| Component | Purpose |
|---|---|
| `App.jsx` | Main orchestration, SSE streaming, conversation state |
| `ChatInterface.jsx` | User input, web search toggle, execution mode |
| `Stage1.jsx` | Tab view of individual model responses |
| `Stage2.jsx` | Peer rankings with de-anonymization, aggregate scores |
| `Stage3.jsx` | Chairman synthesis (final answer) |
| `CouncilGrid.jsx` | Visual grid of council members with provider icons |
| `Settings.jsx` | 5-section settings: LLM API Keys, Council Config, System Prompts, Search Providers, Backup & Reset |
| `Sidebar.jsx` | Conversation list with inline delete confirmation |
| `SearchableModelSelect.jsx` | Searchable dropdown for model selection |
Styling: "Midnight Glass" dark theme with glassmorphic effects. Primary colors: blue (#3b82f6) and cyan (#06b6d4) gradients. Font: Merriweather 15px/1.7 for content, JetBrains Mono for errors.
ALWAYS use relative imports in backend modules:
```python
from .config import ...
from .council import ...
```
NEVER use absolute imports like `from backend.config import ...`.
Run the backend as a module from the project root:
```sh
uv run python -m backend.main   # Correct
cd backend && python main.py    # WRONG - breaks imports
```
Model ID prefixes:
```
openrouter:anthropic/claude-sonnet-4 → Cloud via OpenRouter
ollama:llama3.1:latest               → Local via Ollama
groq:llama3-70b-8192                 → Fast inference via Groq
openai:gpt-4.1                       → Direct OpenAI connection
anthropic:claude-sonnet-4            → Direct Anthropic connection
custom:model-name                    → Custom OpenAI-compatible endpoint
```
Use this pattern in Stage components to handle both `/` and `:` delimiters:
```javascript
const getShortModelName = (modelId) => {
  if (!modelId) return 'Unknown';
  if (modelId.includes('/')) return modelId.split('/').pop();
  if (modelId.includes(':')) return modelId.split(':').pop();
  return modelId;
};
```
Check prefixes FIRST before name-based detection to avoid mismatches:
```javascript
const getProviderInfo = (modelId) => {
  const id = modelId.toLowerCase();
  // Check prefixes FIRST (order matters!)
  if (id.startsWith('custom:')) return PROVIDER_CONFIG.custom;
  if (id.startsWith('ollama:')) return PROVIDER_CONFIG.ollama;
  if (id.startsWith('groq:')) return PROVIDER_CONFIG.groq;
  // Then check name-based patterns...
};
```
The Stage 2 ranking prompt enforces a strict format for parsing:
1. Individual evaluations
2. Blank line
3. "FINAL RANKING:" header (all caps, with colon)
4. Numbered list: "1. Response C", "2. Response A", etc.
A fallback regex extracts "Response X" patterns if the format is not followed.
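A parsing sketch under these rules (a hypothetical helper, not the repo's actual parser: the strict path reads the numbered list after "FINAL RANKING:", the fallback scans for "Response X" mentions in order of first appearance):

```python
import re

def parse_ranking(text: str, expected_labels: list[str]) -> list[str]:
    """Parse the 'FINAL RANKING:' list; fall back to bare 'Response X' mentions."""
    _, sep, tail = text.partition("FINAL RANKING:")
    if sep:
        # Strict path: numbered lines like "1. Response C" after the header
        ordered = re.findall(r"^\s*\d+\.\s*Response\s+([A-Z])", tail, re.MULTILINE)
        if ordered:
            return [f"Response {label}" for label in ordered]
    # Fallback: any "Response X" patterns, deduplicated in order of appearance
    seen: list[str] = []
    for label in re.findall(r"Response\s+([A-Z])", text):
        name = f"Response {label}"
        if name not in seen and name in expected_labels:
            seen.append(name)
    return seen
```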
Client Disconnect Handling:
- Backend checks `request.is_disconnected()` inside loops
- Frontend aborts via an AbortController signal
- Critical: Always inject the raw `Request` object into streaming endpoints (Pydantic models lack `is_disconnected()`)
```jsx
<div className="markdown-content">
  <ReactMarkdown>
    {typeof content === 'string' ? content : String(content || '')}
  </ReactMarkdown>
</div>
```
Always wrap in a `.markdown-content` div and ensure string type (some providers return arrays/objects).
In Stage1/Stage2, auto-adjust activeTab when it goes out of bounds during streaming:
```javascript
useEffect(() => {
  if (activeTab >= responses.length && responses.length > 0) {
    setActiveTab(responses.length - 1);
  }
}, [responses.length]);
```
Common Pitfalls:
- Port Conflicts: Backend uses 8001 (not 8000). Update `backend/main.py` and `frontend/src/api.js` together.
- CORS Errors: Frontend origins must match the `main.py` CORS middleware (localhost:5173 and :3000).
- Missing Metadata: `label_to_model` and `aggregate_rankings` are ephemeral - present only in API responses, not stored.
- Duplicate Tabs: Use immutable state updates (spread operator), not mutations. StrictMode runs effects twice.
- Search Rate Limits: DuckDuckGo can rate-limit. Retry logic in `search.py` handles this.
- Jina Reader 451 Errors: Many news sites block AI scrapers. Use Tavily/Brave or set `full_content_results` to 0.
- Model Deduplication: When multiple sources provide the same model, use Map-based deduplication preferring direct connections.
- Binary Dependencies: `node_modules` synced through iCloud can break between Mac architectures. Delete and reinstall.
- Custom Endpoint Icons: Models from custom endpoints may match name patterns (e.g., "claude"). Check the `custom:` prefix first.
Request Flow:
```
User Query (+ optional web search)
        ↓
[Web Search: DuckDuckGo/Tavily/Brave + Jina Reader]
        ↓
Stage 1: Parallel queries → Stream individual responses
        ↓
Stage 2: Anonymize → Parallel peer rankings → Parse rankings
        ↓
Calculate aggregate rankings
        ↓
Stage 3: Chairman synthesis → Stream final answer
        ↓
Save conversation (stage1, stage2, stage3 only)
```
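The Stage 1 fan-out can be sketched with `asyncio.gather` (hypothetical `query_model`; real calls go through the provider classes), illustrating the graceful-degradation rule that one failing model doesn't abort the stage:

```python
import asyncio

async def query_model(model_id: str, prompt: str) -> str:
    # Stand-in for a real provider call
    if model_id == "broken:model":
        raise RuntimeError("provider error")
    return f"{model_id} answered"

async def stage1(models: list[str], prompt: str) -> dict[str, str]:
    results = await asyncio.gather(
        *(query_model(m, prompt) for m in models), return_exceptions=True
    )
    # Keep successes, drop failures - one bad model never blocks the council
    return {m: r for m, r in zip(models, results) if not isinstance(r, Exception)}

answers = asyncio.run(stage1(["openai:gpt-4.1", "broken:model"], "hi"))
```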
Three modes control deliberation depth:
- Chat Only: Stage 1 only (quick responses)
- Chat + Ranking: Stages 1 & 2 (peer review without synthesis)
- Full Deliberation: All 3 stages (default)
```sh
# Check Ollama models
curl http://localhost:11434/api/tags

# Test custom endpoint
curl https://your-endpoint.com/v1/models -H "Authorization: Bearer $API_KEY"

# View logs: watch the terminal running backend/main.py
```
Providers: DuckDuckGo (free), Tavily (API), Brave (API)
Full Content Fetching: Jina Reader (https://r.jina.ai/{url}) extracts article text for top N results (configurable 0-10, default 3). Falls back to summary if fetch fails or yields <500 chars. 25-second timeout per article, 60-second total search budget.
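The fallback logic above can be sketched like this (a hypothetical helper with an injectable `fetcher`; the real implementation lives in `search.py`):

```python
def full_content_or_summary(url: str, summary: str, fetcher) -> str:
    """Return full article text via Jina Reader, or fall back to the search summary."""
    try:
        # Real code fetches https://r.jina.ai/{url} with a 25-second timeout
        text = fetcher(f"https://r.jina.ai/{url}", timeout=25)
    except Exception:
        return summary  # network error, timeout, or HTTP 451 from blocked sites
    if not text or len(text) < 500:
        return summary  # too short to be real article content
    return text
```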
Search Query Processing:
- Direct (default): Send exact query to search engine
- YAKE: Extract keywords first (useful for long prompts)
UI Sections (sidebar navigation):
- LLM API Keys: OpenRouter, Groq, Ollama, Direct providers, Custom endpoint
- Council Config: Model selection with Remote/Local toggles, temperature controls, "I'm Feeling Lucky" randomizer
- System Prompts: Stage 1/2/3 prompts with reset-to-default
- Search Providers: DuckDuckGo, Tavily, Brave + Jina full content settings
- Backup & Reset: Import/Export config, reset to defaults
Auto-Save Behavior:
- Credentials auto-save: API keys and URLs save immediately on successful test
- Configs require manual save: Model selections, prompts, temperatures
- UX flow: Test → Success → Auto-save → Clear input → "Settings saved!"
Temperature Controls:
- Council Heat: Stage 1 creativity (default: 0.5)
- Chairman Heat: Stage 3 synthesis (default: 0.4)
- Stage 2 Heat: Peer ranking consistency (default: 0.3)
Rate Limit Warnings:
- Formula: `(council_members × 2) + 2` requests per council run
- OpenRouter free tier: 20 RPM, 50 requests/day
- Groq: 30 RPM, 14,400 requests/day
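As a quick sanity check of the formula (the breakdown into stage 1 + stage 2 + synthesis + title is an assumption):

```python
def requests_per_run(council_members: int) -> int:
    # N stage-1 answers + N stage-2 rankings + 1 synthesis + 1 title (assumed breakdown)
    return council_members * 2 + 2

print(requests_per_run(4))  # 10 - half of OpenRouter's free 20 RPM in a single run
print(requests_per_run(9))  # 20 - would saturate the free-tier RPM limit
```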
Storage: data/settings.json
- Graceful Degradation: Single model failure doesn't block entire council
- Transparency: All raw outputs inspectable via tabs
- De-anonymization: Models receive "Response A/B/C", frontend displays real names
- Progress Indicators: "X/Y completed" during streaming
- Provider Flexibility: Mix cloud, local, and custom endpoints freely
Communication:
- NEVER make assumptions when requirements are vague - ask for clarification
- Provide options with pros/cons for different approaches
- Confirm understanding before significant changes
Code Safety:
- NEVER use placeholders like `// ...` in edits - this deletes code
- Always provide full content when writing/editing files
- FastAPI: Inject the raw `Request` object to access `is_disconnected()`
- React: Use spread operators for immutable state updates (StrictMode runs effects twice)
Future Enhancements:
- Model performance analytics over time
- Export conversations to markdown/PDF
- Custom ranking criteria (beyond accuracy/insight)
- Backend caching for repeated queries
- Multiple custom endpoints support