Codename: Quorum
Status: Design Phase
Author: Colin Johnson / Solvely
License: Open Source (TBD: MIT or Apache 2.0)
An open-source framework that orchestrates multiple AI systems into a deliberative council: weighing, challenging, and synthesizing their outputs to produce the best possible outcome, free from single-model bias.
No single AI gets the final say. Every answer is earned through structured debate.
This is not a coding tool. It's a thinking tool. Code is a heavy use case, but Quorum works for anything: writing, prompt engineering, research, idea clarification, decision-making, strategy. If a question has a better answer hiding behind bias, this finds it.
- Developers already using AI for code gen, review, and architecture
- Vibe coders: building with AI as a core workflow, not a side tool
- AI power users: people who prompt multiple models and compare outputs manually
These are people who already know one model isn't enough. Quorum automates what they're doing by hand: asking multiple AIs, comparing, picking the best parts.
Today's AI workflows have a fundamental flaw: single-model dependency. You pick one AI, trust its output, and ship it. But every model has blind spots, training biases, and failure modes. What if the answer you got was just the first answer, not the best one?
Current "multi-model" approaches are either:
- Naive voting: majority rules, no reasoning
- Sequential chains: one model feeds the next, compounding errors
- Human-in-the-loop: doesn't scale, and is still biased by which model the human trusts
Quorum creates a structured deliberation process across multiple AI systems. Think of it as a panel of experts who must debate, defend, and refine their positions before a decision is made.
```
Input (prompt/task)
        │
┌──────────────┬─────────────┐
│ 1. DIVERGE   │ Independent │  Each AI generates its own response
│              │ Generation  │  in isolation (no cross-contamination)
├──────────────┼─────────────┤
│ 2. CHALLENGE │ Adversarial │  Each response is critiqued by the
│              │ Review      │  other AIs (find flaws, gaps, biases)
├──────────────┼─────────────┤
│ 3. DEFEND    │ Rebuttal    │  Original authors respond to critiques,
│              │             │  revise or hold their ground
├──────────────┼─────────────┤
│ 4. CONVERGE  │ Synthesis   │  A synthesizer model (or algorithm)
│              │ & Scoring   │  merges the best elements into a
│              │             │  final output with confidence scores
└──────────────┴─────────────┘
        │
Output (weighted, challenged, refined)
```
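To make the four phases concrete, here is a minimal TypeScript sketch of the loop (TypeScript per the decision log). The `Provider` interface, the `deliberate` function, and the prompt strings are illustrative assumptions, not a final API; a real implementation would add scoring, blind review, and round control.

```typescript
// Hypothetical minimal shape of a council member. Real adapters would wrap
// OpenAI, Anthropic, Ollama, etc. behind this interface.
interface Provider {
  name: string;
  complete(prompt: string): Promise<string>;
}

async function deliberate(task: string, council: Provider[]): Promise<string> {
  // 1. DIVERGE: each model answers in isolation (no cross-contamination).
  const drafts = await Promise.all(
    council.map(async (p) => ({ author: p.name, text: await p.complete(task) }))
  );

  // 2. CHALLENGE: every other model critiques each draft.
  const critiques = await Promise.all(
    drafts.map(async (d) => ({
      draft: d,
      notes: await Promise.all(
        council
          .filter((p) => p.name !== d.author)
          .map((p) => p.complete(`Critique this response:\n${d.text}`))
      ),
    }))
  );

  // 3. DEFEND: each author revises (or defends) in light of the critiques.
  const revised = await Promise.all(
    critiques.map((c) => {
      const author = council.find((p) => p.name === c.draft.author)!;
      return author.complete(
        `Revise your answer given these critiques:\n${c.notes.join("\n")}\n\nOriginal:\n${c.draft.text}`
      );
    })
  );

  // 4. CONVERGE: a synthesizer merges the revised drafts. Fixed to the first
  // provider here purely for brevity; the design calls for rotation.
  const synthesizer = council[0];
  return synthesizer.complete(
    `Synthesize the best final answer from:\n${revised.join("\n---\n")}`
  );
}
```

Note the deliberate structure: each phase consumes only the previous phase's output, so the audit trail falls naturally out of the intermediate arrays.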
Every participating AI gets equal standing. No model is hardcoded as "the smart one." Weighting is earned through performance, not reputation.
Initial responses are generated independently. No model sees another's output until the challenge phase. This prevents anchoring bias.
Critique isn't optional β it's structural. Every response gets challenged. This surfaces weaknesses that consensus-seeking misses.
Every step is logged: who said what, what was challenged, what survived. The user sees the deliberation trail, not just the final answer.
Works for code, research, writing, strategy, architecture β any task where quality matters more than speed.
| Domain | Example |
|---|---|
| Code | Generate implementations from multiple models → cross-review for bugs, edge cases, performance → synthesize best approach |
| Research | Multiple AIs research a topic independently → challenge each other's sources and conclusions → produce balanced synthesis |
| Architecture | Propose system designs → adversarial review for scalability, security, cost → refined architecture |
| Prompting | Optimize a prompt by having AIs critique and improve each other's prompt engineering |
| Decision Making | Evaluate options with multiple AIs playing devil's advocate on each |
```
┌──────────────────────────────────────────────┐
│               Quorum CLI / API               │
├──────────────────────────────────────────────┤
│                 Orchestrator                 │
│  ┌─────────┐  ┌─────────┐  ┌─────────┐       │
│  │ Session │  │ Scoring │  │  Round  │       │
│  │ Manager │  │ Engine  │  │ Control │       │
│  └─────────┘  └─────────┘  └─────────┘       │
├──────────────────────────────────────────────┤
│              Provider Adapters               │
│  ┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐ │
│  │ OpenAI │ │ Claude │ │ Gemini │ │ Local  │ │
│  │        │ │        │ │        │ │ Ollama │ │
│  └────────┘ └────────┘ └────────┘ └────────┘ │
│  ┌─────────┐ ┌──────────┐ ┌────────┐         │
│  │ Mistral │ │ DeepSeek │ │ Custom │         │
│  └─────────┘ └──────────┘ └────────┘         │
├──────────────────────────────────────────────┤
│                 Output Layer                 │
│  ┌───────────┐ ┌────────────┐ ┌───────┐      │
│  │ Synthesis │ │ Confidence │ │ Audit │      │
│  │  Report   │ │   Scores   │ │ Trail │      │
│  └───────────┘ └────────────┘ └───────┘      │
└──────────────────────────────────────────────┘
```
Orchestrator: Controls the deliberation flow (rounds, timing, termination conditions).
Provider Adapters: Uniform interface to any AI backend. OpenAI, Anthropic, Google, Mistral, DeepSeek, local models via Ollama, or anything else with a chat API.
Session Manager: Tracks the full deliberation: inputs, each model's responses, critiques, rebuttals, and final synthesis.
Scoring Engine: Weights outputs based on configurable criteria:
- Factual accuracy (verifiable claims)
- Reasoning quality (logical coherence)
- Completeness (coverage of edge cases)
- Novelty (unique insights not raised by others)
- Consensus (agreement across models)
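As a minimal sketch, those criteria could combine into a single weighted score like this (the function shape and normalization step are assumptions, not a specified API):

```typescript
// Criterion names mirror the scoring config; each criterion is scored 0..1.
type Criterion = "accuracy" | "reasoning" | "completeness" | "novelty" | "consensus";
type Scores = Record<Criterion, number>;

function weightedScore(scores: Scores, weights: Scores): number {
  // Normalize by the weight total so user-edited weights that don't sum
  // to exactly 1 still produce a 0..1 score.
  const total = Object.values(weights).reduce((a, b) => a + b, 0);
  const weighted = (Object.keys(weights) as Criterion[]).reduce(
    (sum, k) => sum + scores[k] * weights[k],
    0
  );
  return weighted / total;
}
```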
Round Control: Manages how many deliberation rounds occur. Can be fixed (e.g., 2 rounds) or dynamic (converge when delta drops below threshold).
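A dynamic stopping rule might look like this sketch, combining a hard round cap with the configured convergence threshold (function and parameter names are illustrative):

```typescript
// Decide whether deliberation should stop, given the consensus score
// recorded after each completed round.
function shouldStop(
  history: number[],   // consensus score (0..1) per completed round
  maxRounds: number,   // hard cap, e.g. rounds: 2 in quorum.yaml
  threshold: number    // e.g. convergence_threshold: 0.85
): boolean {
  if (history.length >= maxRounds) return true; // hit the round cap
  const last = history[history.length - 1];
  return last !== undefined && last >= threshold; // converged early
}
```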
- Blind Review: During the challenge phase, critics don't know which model produced the original (prevents brand bias)
- Rotating Synthesizer: The model that produces the final synthesis rotates; no single model always gets the last word
- Historical Calibration: Track model accuracy over time; weights adjust based on track record per domain
- Minority Report: Dissenting opinions are preserved in the output, not silenced by the majority
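The blind review safeguard above can be sketched as a simple anonymization step: strip author identity before critique and keep a private key to re-attach it afterwards. The label scheme and types are hypothetical.

```typescript
interface Draft {
  author: string; // e.g. "claude", "gpt"
  text: string;
}

// Replace author names with neutral labels ("Response A", "Response B", ...)
// and return a private map so the orchestrator can restore attribution later.
function anonymize(drafts: Draft[]): {
  blinded: { label: string; text: string }[];
  key: Map<string, string>;
} {
  const key = new Map<string, string>();
  const blinded = drafts.map((d, i) => {
    const label = `Response ${String.fromCharCode(65 + i)}`;
    key.set(label, d.author);
    return { label, text: d.text };
  });
  return { blinded, key };
}
```

Only the `blinded` array is ever shown to the critiquing models; the `key` stays inside the orchestrator for source attribution in the final report.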
Each output includes:
- Consensus Score (0-1): How much agreement across models
- Confidence Score (0-1): Strength of evidence/reasoning
- Controversy Flag: Highlights where models fundamentally disagreed
- Source Attribution: Which model contributed which elements
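For illustration, a session result carrying those fields might serialize like this (all field names and values are hypothetical, not a defined schema):

```json
{
  "output": "...synthesized answer...",
  "consensus_score": 0.82,
  "confidence_score": 0.91,
  "controversy": [
    { "topic": "index strategy", "positions": ["claude: B-tree", "gpt: LSM tree"] }
  ],
  "attribution": {
    "claude": ["sections 1, 3"],
    "gpt": ["section 2"]
  }
}
```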
```yaml
# quorum.yaml
council:
  providers:
    - name: claude
      model: claude-sonnet-4-20250514
      provider: anthropic
    - name: gpt
      model: gpt-4o
      provider: openai
    - name: gemini
      model: gemini-2.0-flash
      provider: google
    - name: local
      model: qwen2.5:14b
      provider: ollama
deliberation:
  rounds: 2                    # Max deliberation rounds
  convergence_threshold: 0.85  # Stop early if consensus > this
  isolation: true              # Independent generation (no peeking)
  blind_review: true           # Hide model identity during critique
scoring:
  weights:
    accuracy: 0.3
    reasoning: 0.25
    completeness: 0.2
    novelty: 0.15
    consensus: 0.1
output:
  include_audit_trail: true
  include_minority_report: true
  format: markdown             # markdown | json | html
```

First run experience is critical. The framework should discover and guide, not assume.
```
$ quorum init

Welcome to Quorum.
Let me figure out what AI tools you have available...

Scanning...
  ✔ ollama  – found (models: qwen2.5:14b, llama3.2)
  ✔ claude  – CLI detected (claude)
  ✔ openai  – API key found (OPENAI_API_KEY)
  ✘ gemini  – not found

You've got 3 providers ready. That's enough for a solid council.

Want to add more?
  [1] Add an API key (OpenAI, Anthropic, Google, Mistral, DeepSeek...)
  [2] Connect a local model (Ollama, llama.cpp, LM Studio)
  [3] Done, let's go
>
```
- CLI scan: check PATH for known tools (`ollama`, `claude`, `aichat`, `gemini`, etc.)
- Env vars: check for `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, `GOOGLE_API_KEY`, etc.
- Local services: probe localhost ports for Ollama (11434), LM Studio (1234), etc.
- Guided add: walk the user through adding any provider they have credentials for
- Test connection: verify each provider actually works before saving
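The env-var step of that scan might be as simple as this sketch (the key list follows the doc; the function shape is an assumption):

```typescript
// Map well-known environment variables to provider names.
const KNOWN_KEYS: Record<string, string> = {
  OPENAI_API_KEY: "openai",
  ANTHROPIC_API_KEY: "anthropic",
  GOOGLE_API_KEY: "google",
};

// Return the providers whose API keys are present in the given environment.
// Pass process.env in real use; a plain object keeps this testable.
function detectFromEnv(env: Record<string, string | undefined>): string[] {
  return Object.entries(KNOWN_KEYS)
    .filter(([key]) => Boolean(env[key]))
    .map(([, provider]) => provider);
}
```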
Config gets written to `~/.quorum/config.yaml` (or project-local `quorum.yaml`).
Users can re-run `quorum init` anytime to add/remove providers, or run `quorum providers add <name>`.
```bash
# Onboarding
quorum init

# Simple query
quorum ask "What's the best database for time-series data?"

# Code generation
quorum code "Build a rate limiter in Go" --lang go

# Research
quorum research "Compare CRDT vs OT for real-time collaboration"

# Architecture review
quorum review ./architecture.md

# With specific providers
quorum ask "..." --providers claude,gpt,gemini

# Interactive deliberation (watch the debate)
quorum ask "..." --interactive

# Manage providers
quorum providers list
quorum providers add openai
quorum providers test
```

| Feature | Quorum | Simple Routing (e.g., OpenRouter) | Agent Swarms |
|---|---|---|---|
| Independent generation | ✅ | ❌ (picks one model) | Varies |
| Adversarial challenge | ✅ | ❌ | Rarely |
| Blind review | ✅ | N/A | ❌ |
| Audit trail | ✅ | ❌ | Sometimes |
| Minority report | ✅ | N/A | ❌ |
| Domain-agnostic | ✅ | ✅ | Usually narrow |
| Provider-agnostic | ✅ | ✅ | Varies |
| Open source | ✅ | Some | Some |
Quorum is not tied to a programming language or domain. It's a deliberation framework. The task could be:
- Writing production code
- Crafting a better prompt
- Clarifying a vague idea into a sharp spec
- Researching a topic with conflicting sources
- Naming a product
- Designing a system architecture
The framework doesn't care. It runs the same diverge → challenge → converge loop regardless of domain.
Users configure whichever providers and models they have access to. Could be all cloud APIs, all local models, or a mix. Quorum is the orchestration layer β it doesn't gatekeep which AIs participate.
Multi-model = multi-cost. The framework provides:
- Transparency: estimated token usage before and actual usage after each session
- Presets: quick (2 models, 1 round), thorough (3-4 models, 2 rounds), exhaustive (all models, converge-until-stable)
- Caps: optional per-session token/cost limits
- Local-first option: run entirely on local models (Ollama, llama.cpp) for $0
But the framework never restricts. If someone wants to throw 8 models at a problem, that's their choice.
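The transparency estimate could be backed by a rough calculator like this sketch (the per-token prices and token counts below are placeholders, not real provider rates):

```typescript
// USD per million tokens for a single model; values must come from the
// user's provider pricing, they are NOT bundled with the framework.
interface ModelCost {
  inputPerMTok: number;
  outputPerMTok: number;
}

// Rough session estimate: assume every model reads the prompt and writes
// a response in every round. Real deliberation adds critique/rebuttal
// traffic, so treat this as a lower bound.
function estimateCost(
  models: ModelCost[],
  rounds: number,
  inputTokens: number,
  outputTokens: number
): number {
  return models.reduce(
    (sum, m) =>
      sum +
      (rounds * (inputTokens * m.inputPerMTok + outputTokens * m.outputPerMTok)) / 1_000_000,
    0
  );
}
```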
Instead of hardcoded deliberation logic, each use case gets an agent file: a config that defines how the council behaves for that domain:
```yaml
# agents/code-review.yaml
name: Code Review Council
rounds: 2
focus:
  - correctness
  - edge_cases
  - performance
  - readability
challenge_style: adversarial  # adversarial | collaborative | socratic
scoring_weights:
  accuracy: 0.35
  completeness: 0.30
  reasoning: 0.20
  novelty: 0.15
```

```yaml
# agents/brainstorm.yaml
name: Brainstorm Council
rounds: 3
focus:
  - creativity
  - feasibility
  - uniqueness
challenge_style: socratic  # Push ideas further, don't tear down
scoring_weights:
  novelty: 0.40
  feasibility: 0.30
  reasoning: 0.20
  consensus: 0.10
```

Users create, share, and remix agent files. The framework ships with sensible defaults; the community builds the rest.
The deliberation log isn't a debug artifact; it's a first-class output. Seeing how the council reached its conclusion (who argued what, what got challenged, what survived) is often more valuable than the final answer. Every session produces:
- The synthesized output
- The full deliberation trail (who said what, round by round)
- Confidence scores and minority reports
- A "decision rationale" summary
"Quorum" is the working codename. Final name TBD β will be brainstormed using the framework itself as a first real test.
Build solo first. Get the core solid and working. Then open source it. No design-by-committee on the foundation.
Launch a project coin on Bags tied to Quorum. Potential mechanics:
- Access tiers: Hold X tokens for premium agent files, priority hosted API, advanced analytics
- Governance: Token holders vote on roadmap priorities, default agent file configs, which models to benchmark
- Bounties: Fund community contributions (new agent files, provider adapters, scoring plugins) with token rewards
- Staking for compute: Stake tokens to access a hosted deliberation API (offsets infrastructure costs)
- Early supporter upside: Early backers get tokens before the framework gains traction; value grows with adoption
- Community alignment: Token holders are invested in the project's success, not just paying customers
- Open source compatible: The framework stays free; the token adds a value layer on top without paywalling the core
- Funding without VC: Bootstrap development through token launch, not pitch decks
- Synthesis Strategy: Should the synthesizer be a dedicated model call, or algorithmic merging?
- Streaming: Can we stream the deliberation in real-time for interactive mode?
- Plugin System: Should providers and scoring be pluggable from day one?
| Date | Decision | Rationale |
|---|---|---|
| 2026-02-11 | TypeScript for implementation | Async-native, first-class AI SDKs, npm distribution (npx), large contributor pool. Validated by CodeBot. Future: language-agnostic protocol layer so other runtimes can plug in. |
Build locally, prove the loop works before anything goes public.
- Pick implementation language (best for scale + ease of use)
- Core orchestrator: diverge → challenge → converge
- Provider adapters: 2-3 providers minimum
- `quorum init` onboarding (auto-detect + guided setup)
- `quorum ask` with basic output + audit trail
- First test: Council names itself
- Blind review system
- Historical calibration / model performance tracking
- Configurable scoring weights
- `code` and `research` specialized modes
- Interactive mode (watch the debate)
- Plugin system for custom providers and scorers
- API server mode (not just CLI)
- Web UI for deliberation visualization
- Integration with CI/CD (code review council)
- Community scoring benchmarks
- Constitutional AI (Anthropic): AI critiquing AI outputs
- Mixture of Experts: routing to specialized models
- Debate (OpenAI research): adversarial AI alignment technique
- Ensemble Methods (ML): combining multiple models for better predictions
- Judicial Systems: adversarial process → better truth-finding
This is a living document. It will evolve as the design solidifies.