This document describes the implementation of the thinking budget feature for LEANN, which allows users to control the computational effort for reasoning models like GPT-Oss:20b.
The thinking budget feature provides three levels of computational effort for reasoning models:
low: Fast responses, basic reasoning (default for simple queries)medium: Balanced speed and reasoning depthhigh: Maximum reasoning effort, best for complex analytical questions
Added --thinking-budget parameter to both CLI and RAG examples:
# LEANN CLI
leann ask my-index --llm ollama --model gpt-oss:20b --thinking-budget high
# RAG Examples
python apps/email_rag.py --llm ollama --llm-model gpt-oss:20b --thinking-budget high
python apps/document_rag.py --llm openai --llm-model o3 --thinking-budget mediumdef ask(self, prompt: str, **kwargs) -> str:
# Handle thinking budget for reasoning models
options = kwargs.copy()
thinking_budget = kwargs.get("thinking_budget")
if thinking_budget:
options.pop("thinking_budget", None)
if thinking_budget in ["low", "medium", "high"]:
options["reasoning"] = {"effort": thinking_budget, "exclude": False}API Format: Uses Ollama's reasoning parameter with effort and exclude fields.
def ask(self, prompt: str, **kwargs) -> str:
# Handle thinking budget for reasoning models
thinking_budget = kwargs.get("thinking_budget")
if thinking_budget and thinking_budget in ["low", "medium", "high"]:
# Check if this is an o-series model
o_series_models = ["o3", "o3-mini", "o4-mini", "o1", "o3-pro", "o3-deep-research"]
if any(model in self.model for model in o_series_models):
params["reasoning_effort"] = thinking_budgetAPI Format: Uses OpenAI's reasoning_effort parameter for o-series models.
The thinking budget parameter is properly propagated through the LEANN architecture:
- CLI (
packages/leann-core/src/leann/cli.py): Captures--thinking-budgetargument - Base RAG (
apps/base_rag_example.py): Adds parameter to argument parser - LeannChat (
packages/leann-core/src/leann/api.py): Passesllm_kwargsto LLM - LLM Interface: Handles the parameter in backend-specific implementations
packages/leann-core/src/leann/chat.py: Added thinking budget support to OllamaChat and OpenAIChatpackages/leann-core/src/leann/cli.py: Added--thinking-budgetargumentapps/base_rag_example.py: Added thinking budget parameter to RAG examples
README.md: Added thinking budget parameter to usage examplesdocs/configuration-guide.md: Added detailed documentation and usage guidelines
examples/thinking_budget_demo.py: Comprehensive demo script with usage examples
# High reasoning effort for complex questions
leann ask my-index --llm ollama --model gpt-oss:20b --thinking-budget high
# Medium reasoning for balanced performance
leann ask my-index --llm openai --model gpt-4o --thinking-budget medium
# Low reasoning for fast responses
leann ask my-index --llm ollama --model gpt-oss:20b --thinking-budget low# Email RAG with high reasoning
python apps/email_rag.py --llm ollama --llm-model gpt-oss:20b --thinking-budget high
# Document RAG with medium reasoning
python apps/document_rag.py --llm openai --llm-model gpt-4o --thinking-budget medium- GPT-Oss:20b: Primary target model with reasoning capabilities
- Other reasoning models: Any Ollama model that supports the
reasoningparameter
- o3, o3-mini, o4-mini, o1: o-series reasoning models with
reasoning_effortparameter - GPT-OSS models: Models that support reasoning capabilities
The implementation includes comprehensive testing:
- Parameter handling verification
- Backend-specific API format validation
- CLI argument parsing tests
- Integration with existing LEANN architecture