Skip to content

Latest commit

 

History

History
123 lines (89 loc) · 4.35 KB

File metadata and controls

123 lines (89 loc) · 4.35 KB

Thinking Budget Feature Implementation

Overview

This document describes the implementation of the thinking budget feature for LEANN, which allows users to control the computational effort for reasoning models like GPT-Oss:20b.

Feature Description

The thinking budget feature provides three levels of computational effort for reasoning models:

  • low: Fast responses, basic reasoning (default for simple queries)
  • medium: Balanced speed and reasoning depth
  • high: Maximum reasoning effort, best for complex analytical questions

Implementation Details

1. Command Line Interface

Added --thinking-budget parameter to both CLI and RAG examples:

# LEANN CLI
leann ask my-index --llm ollama --model gpt-oss:20b --thinking-budget high

# RAG Examples
python apps/email_rag.py --llm ollama --llm-model gpt-oss:20b --thinking-budget high
python apps/document_rag.py --llm openai --llm-model o3 --thinking-budget medium

2. LLM Backend Support

Ollama Backend (packages/leann-core/src/leann/chat.py)

def ask(self, prompt: str, **kwargs) -> str:
    # Handle thinking budget for reasoning models
    options = kwargs.copy()
    thinking_budget = kwargs.get("thinking_budget")
    if thinking_budget:
        options.pop("thinking_budget", None)
        if thinking_budget in ["low", "medium", "high"]:
            options["reasoning"] = {"effort": thinking_budget, "exclude": False}

API Format: Uses Ollama's reasoning parameter with effort and exclude fields.

OpenAI Backend (packages/leann-core/src/leann/chat.py)

def ask(self, prompt: str, **kwargs) -> str:
    # Handle thinking budget for reasoning models
    thinking_budget = kwargs.get("thinking_budget")
    if thinking_budget and thinking_budget in ["low", "medium", "high"]:
        # Check if this is an o-series model
        o_series_models = ["o3", "o3-mini", "o4-mini", "o1", "o3-pro", "o3-deep-research"]
        if any(model in self.model for model in o_series_models):
            params["reasoning_effort"] = thinking_budget

API Format: Uses OpenAI's reasoning_effort parameter for o-series models.

3. Parameter Propagation

The thinking budget parameter is properly propagated through the LEANN architecture:

  1. CLI (packages/leann-core/src/leann/cli.py): Captures --thinking-budget argument
  2. Base RAG (apps/base_rag_example.py): Adds parameter to argument parser
  3. LeannChat (packages/leann-core/src/leann/api.py): Passes llm_kwargs to LLM
  4. LLM Interface: Handles the parameter in backend-specific implementations

Files Modified

Core Implementation

  • packages/leann-core/src/leann/chat.py: Added thinking budget support to OllamaChat and OpenAIChat
  • packages/leann-core/src/leann/cli.py: Added --thinking-budget argument
  • apps/base_rag_example.py: Added thinking budget parameter to RAG examples

Documentation

  • README.md: Added thinking budget parameter to usage examples
  • docs/configuration-guide.md: Added detailed documentation and usage guidelines

Examples

  • examples/thinking_budget_demo.py: Comprehensive demo script with usage examples

Usage Examples

Basic Usage

# High reasoning effort for complex questions
leann ask my-index --llm ollama --model gpt-oss:20b --thinking-budget high

# Medium reasoning for balanced performance
leann ask my-index --llm openai --model gpt-4o --thinking-budget medium

# Low reasoning for fast responses
leann ask my-index --llm ollama --model gpt-oss:20b --thinking-budget low

RAG Examples

# Email RAG with high reasoning
python apps/email_rag.py --llm ollama --llm-model gpt-oss:20b --thinking-budget high

# Document RAG with medium reasoning
python apps/document_rag.py --llm openai --llm-model gpt-4o --thinking-budget medium

Supported Models

Ollama Models

  • GPT-Oss:20b: Primary target model with reasoning capabilities
  • Other reasoning models: Any Ollama model that supports the reasoning parameter

OpenAI Models

  • o3, o3-mini, o4-mini, o1: o-series reasoning models with reasoning_effort parameter
  • GPT-OSS models: Models that support reasoning capabilities

Testing

The implementation includes comprehensive testing:

  • Parameter handling verification
  • Backend-specific API format validation
  • CLI argument parsing tests
  • Integration with existing LEANN architecture