Merged
@@ -53,7 +53,7 @@ SPECIALIZED MISSING:
❌ codellama:7b-instruct - Code analysis optimization
❌ phi3:3.8b - Fast inference model
❌ qwen2.5:7b - Enhanced reasoning
- mistral:7b-instruct - Alternative reasoning
+ qwen3.5:9b - Default reasoning model
```

---
@@ -120,7 +120,7 @@ ollama pull codellama:7b-instruct # For code-specific tasks

# Optional advanced models
ollama pull qwen2.5:7b # For general reasoning
- ollama pull mistral:7b-instruct # Alternative reasoning model
+ ollama pull qwen3.5:9b # Default reasoning model
```
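After pulling, a quick pre-flight check helps confirm every required model is actually present before services start. A minimal sketch, assuming the model list below (taken from the pull commands above) is the authoritative set:

```python
# Illustrative required set, mirroring the `ollama pull` commands above.
REQUIRED_MODELS = [
    "codellama:7b-instruct",  # code-specific tasks
    "qwen2.5:7b",             # general reasoning
    "qwen3.5:9b",             # default reasoning model
]

def missing_models(installed: list[str]) -> list[str]:
    """Return required models that the installed list does not contain."""
    have = set(installed)
    return [m for m in REQUIRED_MODELS if m not in have]
```

Feed it the name column of `ollama list`; a non-empty result means some pulls are still outstanding.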

### 2. Update Configuration Files
10 changes: 5 additions & 5 deletions autobot-shared/ssot_config.py
@@ -228,7 +228,7 @@ def get_ollama_endpoint_for_model(self, model_name: str) -> str:
"""Route Ollama requests to GPU or CPU endpoint by model (#1070).

Args:
- model_name: Ollama model name (e.g. 'mistral:7b-instruct')
+ model_name: Ollama model name (e.g. 'qwen3.5:9b')

Returns:
Ollama base URL (no /api suffix)
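The GPU/CPU routing this docstring describes can be sketched as a simple set lookup. The model set and endpoint URLs here are illustrative assumptions (the GPU host borrows the AI Stack VM address used elsewhere in these docs), not the actual implementation:

```python
# Illustrative: heavy models served by the GPU Ollama instance, rest on CPU.
GPU_MODELS = {"qwen3.5:9b", "deepseek-r1:14b", "codellama:13b"}

def get_ollama_endpoint_for_model(
    model_name: str,
    gpu_url: str = "http://172.16.168.24:11434",  # assumed GPU host
    cpu_url: str = "http://127.0.0.1:11434",      # assumed CPU fallback
) -> str:
    """Route a model to the GPU or CPU Ollama endpoint (no /api suffix)."""
    return gpu_url if model_name in GPU_MODELS else cpu_url
```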
@@ -316,7 +316,7 @@ def get_model_for_agent(self, agent_id: str) -> str:
agent_id: Agent identifier (e.g., 'orchestrator', 'research', 'code_analysis')

Returns:
- Model name (e.g., 'gpt-4', 'claude-3-opus', 'mistral:7b-instruct')
+ Model name (e.g., 'gpt-4', 'claude-3-opus', 'qwen3.5:9b')

Example:
# In .env:
@@ -898,7 +898,7 @@ class AutoBotConfig(BaseSettings):
config = get_config()
backend = config.backend_url # http://172.16.168.20:8001
redis = config.redis_url # redis://172.16.168.23:6379
- model = config.llm.default_model # mistral:7b-instruct
+ model = config.llm.default_model # qwen3.5:9b
"""

model_config = SettingsConfigDict(
@@ -1223,7 +1223,7 @@ def get_agent_llm_config_explicit(agent_id: str) -> dict:
Each agent MUST have its own provider, endpoint, and model via environment variables:
- AUTOBOT_{AGENT_ID}_PROVIDER (e.g., AUTOBOT_ORCHESTRATOR_PROVIDER=ollama)
- AUTOBOT_{AGENT_ID}_ENDPOINT (e.g., AUTOBOT_ORCHESTRATOR_ENDPOINT=http://127.0.0.1:11434)
- - AUTOBOT_{AGENT_ID}_MODEL (e.g., AUTOBOT_ORCHESTRATOR_MODEL=mistral:7b-instruct)
+ - AUTOBOT_{AGENT_ID}_MODEL (e.g., AUTOBOT_ORCHESTRATOR_MODEL=qwen3.5:9b)

Raises AgentConfigurationError if any setting is missing.
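A sketch of how such an explicit per-agent lookup can be assembled; the function and error names mirror the docstring, but the returned dict shape is an assumption for illustration:

```python
import os

class AgentConfigurationError(RuntimeError):
    """Raised when an agent is missing a required LLM setting."""

def get_agent_llm_config_explicit(agent_id: str, env=None) -> dict:
    """Read AUTOBOT_{AGENT_ID}_{PROVIDER,ENDPOINT,MODEL}; fail loudly if unset."""
    env = os.environ if env is None else env
    prefix = f"AUTOBOT_{agent_id.upper()}_"
    config = {}
    for key in ("PROVIDER", "ENDPOINT", "MODEL"):
        value = env.get(prefix + key)
        if not value:
            raise AgentConfigurationError(
                f"Agent '{agent_id}' is missing {prefix + key}; set it in .env"
            )
        config[key.lower()] = value
    return config
```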

@@ -1306,7 +1306,7 @@ def get_agent_model_explicit(agent_id: str) -> str:
raise AgentConfigurationError(
f"Agent '{agent_id}' requires explicit LLM model configuration. "
f"Set {env_key} in .env file. "
f"Example: {env_key}=mistral:7b-instruct"
f"Example: {env_key}=qwen3.5:9b"
)
return model

10 changes: 5 additions & 5 deletions docs/ROADMAP_2025.md
@@ -90,7 +90,7 @@ This section documents key architectural decisions where the original plan was r

**Current Implementation** (Temporary):

- - **Mistral 7B Instruct** (`mistral:7b-instruct`) - Used for ALL task types:
+ - **Qwen 3.5 9B** (`qwen3.5:9b`) - Used for ALL task types:
- Default LLM, Embedding, Classification, Reasoning
- RAG, Coding, Orchestrator, Agent tasks
- Research, Analysis, Planning
@@ -112,10 +112,10 @@ This section documents key architectural decisions where the original plan was r
**Current Configuration** (from `.env`):

```bash
- AUTOBOT_DEFAULT_LLM_MODEL=mistral:7b-instruct
- AUTOBOT_EMBEDDING_MODEL=mistral:7b-instruct
- AUTOBOT_CLASSIFICATION_MODEL=mistral:7b-instruct # TODO: Use 1B model
- AUTOBOT_REASONING_MODEL=mistral:7b-instruct
+ AUTOBOT_DEFAULT_LLM_MODEL=qwen3.5:9b
+ AUTOBOT_EMBEDDING_MODEL=qwen3.5:9b
+ AUTOBOT_CLASSIFICATION_MODEL=qwen3.5:9b # TODO: Use 1B model
+ AUTOBOT_REASONING_MODEL=qwen3.5:9b
# Future: tiered model distribution for specialized agents
```

6 changes: 3 additions & 3 deletions docs/api/environment-variables.md
@@ -17,7 +17,7 @@ AutoBot supports comprehensive configuration through environment variables with

| Variable | Default | Description |
|----------|---------|-------------|
- | `AUTOBOT_DEFAULT_LLM_MODEL` | `mistral:7b-instruct` | **Primary** - Default LLM model for all tasks |
+ | `AUTOBOT_DEFAULT_LLM_MODEL` | `qwen3.5:9b` | **Primary** - Default LLM model for all tasks |
| `AUTOBOT_OLLAMA_HOST` | `172.16.168.24` | Ollama server host (AI Stack VM) |
| `AUTOBOT_OLLAMA_PORT` | `11434` | Ollama server port |
| `AUTOBOT_OLLAMA_ENDPOINT` | `http://${HOST}:${PORT}/api/generate` | Ollama API endpoint |
@@ -119,8 +119,8 @@ The frontend uses Vite environment variables with the `VITE_` prefix:

### Setting Default LLM Model
```bash
export AUTOBOT_DEFAULT_LLM_MODEL="mistral:7b-instruct"
export AUTOBOT_ORCHESTRATOR_LLM="mistral:7b-instruct"
export AUTOBOT_DEFAULT_LLM_MODEL="qwen3.5:9b"
export AUTOBOT_ORCHESTRATOR_LLM="qwen3.5:9b"
```

### Using Different Backend Port
10 changes: 5 additions & 5 deletions docs/architecture/EFFICIENT_INFERENCE_DESIGN.md
@@ -43,7 +43,7 @@ This document describes a **latency-focused** inference optimization architecture
AutoBot's LLM infrastructure:
- **Ollama** (primary) - Local inference at `127.0.0.1:11434`
- **vLLM** - High-performance inference with prefix caching
- - **Default model:** `mistral:7b-instruct`
+ - **Default model:** `qwen3.5:9b`
- **Current latency:** ~500ms first token

### Problem with AirLLM Approach
@@ -547,17 +547,17 @@ QUANTIZED_MODEL_REGISTRY = {
Ollama models already support quantization via GGUF format:

```bash
- # Current: mistral:7b-instruct (FP16, ~14GB)
+ # Current: qwen3.5:9b (FP16, ~14GB)
# Optimized options:
- ollama pull mistral:7b-instruct-q4_K_M # 4-bit, ~4GB, slight quality loss
- ollama pull mistral:7b-instruct-q8_0 # 8-bit, ~8GB, minimal quality loss
+ ollama pull qwen3.5:9b-q4_K_M # 4-bit, ~4GB, slight quality loss
+ ollama pull qwen3.5:9b-q8_0 # 8-bit, ~8GB, minimal quality loss
```

**Update `.env` for quantized Ollama models:**

```bash
# Use quantized model for better performance
- AUTOBOT_DEFAULT_LLM_MODEL=mistral:7b-instruct-q8_0
+ AUTOBOT_DEFAULT_LLM_MODEL=qwen3.5:9b-q8_0
```
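The size figures in the comments above follow from bits-per-weight arithmetic. A back-of-envelope helper, ignoring KV cache and activation overhead (so real memory use will be somewhat higher):

```python
def weight_memory_gb(params_billion: float, bits_per_weight: int) -> float:
    """Approximate weight memory in GB: params × (bits / 8) bytes each."""
    return params_billion * bits_per_weight / 8

# e.g. a 7B model: FP16 ≈ 14 GB, 8-bit ≈ 7 GB, 4-bit ≈ 3.5 GB
```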

---
8 changes: 4 additions & 4 deletions docs/architecture/SSOT_CONFIGURATION_ARCHITECTURE.md
@@ -205,7 +205,7 @@ AUTOBOT_MAIN_MACHINE_IP=172.16.168.20
AUTOBOT_FRONTEND_VM_IP=172.16.168.21
AUTOBOT_REDIS_VM_IP=172.16.168.23
AUTOBOT_BACKEND_PORT=8001
- AUTOBOT_DEFAULT_LLM_MODEL=mistral:7b-instruct
+ AUTOBOT_DEFAULT_LLM_MODEL=qwen3.5:9b

Layer 2: Frozen Code Defaults (Emergency Fallback)
--------------------------------------------------
@@ -314,7 +314,7 @@ AUTOBOT_PORT_GRAFANA=3000
# -----------------------------------------------------------------------------
# LLM CONFIGURATION
# -----------------------------------------------------------------------------
- AUTOBOT_LLM_DEFAULT_MODEL=mistral:7b-instruct
+ AUTOBOT_LLM_DEFAULT_MODEL=qwen3.5:9b
AUTOBOT_LLM_EMBEDDING_MODEL=nomic-embed-text:latest
AUTOBOT_LLM_CLASSIFICATION_MODEL=gemma2:2b
AUTOBOT_LLM_PROVIDER=ollama
@@ -411,7 +411,7 @@ class PortConfig(BaseSettings):

class LLMConfig(BaseSettings):
"""LLM configuration"""
- default_model: str = Field(alias="AUTOBOT_LLM_DEFAULT_MODEL", default="mistral:7b-instruct")
+ default_model: str = Field(alias="AUTOBOT_LLM_DEFAULT_MODEL", default="qwen3.5:9b")
embedding_model: str = Field(alias="AUTOBOT_LLM_EMBEDDING_MODEL", default="nomic-embed-text:latest")
provider: str = Field(alias="AUTOBOT_LLM_PROVIDER", default="ollama")
timeout: int = Field(alias="AUTOBOT_LLM_TIMEOUT", default=120)
@@ -589,7 +589,7 @@ export function getConfig(): AutoBotConfig {
};

const llm: LLMConfig = {
- defaultModel: getEnv('VITE_LLM_DEFAULT_MODEL', 'mistral:7b-instruct'),
+ defaultModel: getEnv('VITE_LLM_DEFAULT_MODEL', 'qwen3.5:9b'),
embeddingModel: getEnv('VITE_LLM_EMBEDDING_MODEL', 'nomic-embed-text:latest'),
provider: getEnv('VITE_LLM_PROVIDER', 'ollama'),
timeout: getEnvNumber('VITE_LLM_TIMEOUT', 120),
2 changes: 1 addition & 1 deletion docs/developer/04-configuration.md
@@ -268,7 +268,7 @@ AutoBot supports environment variable overrides using the `AUTOBOT_` prefix:
|----------|-------------|---------|---------|
| `AUTOBOT_BACKEND_PORT` | `backend.server_port` | Backend server port | `8002` |
| `AUTOBOT_BACKEND_HOST` | `backend.server_host` | Backend bind address | `127.0.0.1` |
- | `AUTOBOT_DEFAULT_LLM_MODEL` | `llm_config.ollama.model` | **Primary** - Default LLM model | `mistral:7b-instruct` |
+ | `AUTOBOT_DEFAULT_LLM_MODEL` | `llm_config.ollama.model` | **Primary** - Default LLM model | `qwen3.5:9b` |
| `AUTOBOT_OLLAMA_HOST` | `llm_config.ollama.host` | Ollama server URL | `http://ollama:11434` |
| `AUTOBOT_OLLAMA_PORT` | `llm_config.ollama.port` | Ollama server port | `11434` |
| `AUTOBOT_ORCHESTRATOR_LLM` | `llm_config.orchestrator_llm` | Orchestrator LLM | `gpt-4` |
2 changes: 1 addition & 1 deletion docs/developer/DISTRIBUTED_TRACING.md
@@ -114,7 +114,7 @@ Examples:
#### LLM Spans
```python
"llm.provider": "ollama",
"llm.model": "mistral:7b-instruct",
"llm.model": "qwen3.5:9b",
"llm.streaming": True,
"llm.temperature": 0.7,
"llm.prompt_messages": 3,
2 changes: 1 addition & 1 deletion docs/developer/ROLES.md
@@ -375,7 +375,7 @@ These conflicts drive the default fleet layout:
| **External deps** | — |
| **Ansible playbook** | `playbooks/deploy_role.yml` |
| **Source path** | — (binary install from ollama.ai) |
- | **GPU models** | mistral:7b-instruct, deepseek-r1:14b, codellama:13b |
+ | **GPU models** | qwen3.5:9b, deepseek-r1:14b, codellama:13b |
| **Concurrency** | max_loaded=5, num_parallel=4, keep_alive=10m |
| **Special hardware** | NVIDIA GPU required. Auto-detected via nvidia-smi. |
| **Degraded without** | Large model inference — system falls back to CPU models or cloud providers |
2 changes: 1 addition & 1 deletion docs/developer/SSOT_CONFIG_GUIDE.md
@@ -415,7 +415,7 @@ All infrastructure configuration (IPs, ports, hosts) is in `.env`:
AUTOBOT_BACKEND_HOST=172.16.168.20
AUTOBOT_REDIS_HOST=172.16.168.23
AUTOBOT_OLLAMA_HOST=127.0.0.1
- AUTOBOT_DEFAULT_LLM_MODEL=mistral:7b-instruct
+ AUTOBOT_DEFAULT_LLM_MODEL=qwen3.5:9b
```

### What Goes Where?
6 changes: 3 additions & 3 deletions docs/developer/THINKING_TOOLS_CONFIGURATION.md
@@ -205,7 +205,7 @@ def query(self, ...):
**Ensure Mistral is Default Model** (Required for Tool Calling):
```bash
# In .env file:
- AUTOBOT_DEFAULT_LLM_MODEL=mistral:7b-instruct
+ AUTOBOT_DEFAULT_LLM_MODEL=qwen3.5:9b
```

**Why Mistral?**
@@ -217,7 +217,7 @@ AUTOBOT_DEFAULT_LLM_MODEL=mistral:7b-instruct
**Verify**:
```bash
grep "AUTOBOT_DEFAULT_LLM_MODEL" .env
- # Should output: AUTOBOT_DEFAULT_LLM_MODEL=mistral:7b-instruct
+ # Should output: AUTOBOT_DEFAULT_LLM_MODEL=qwen3.5:9b
```

---
@@ -324,7 +324,7 @@ Here's the implementation plan...
2. **Verify model is Mistral**:
```bash
grep "AUTOBOT_DEFAULT_LLM_MODEL" .env
- # Should be: mistral:7b-instruct
+ # Should be: qwen3.5:9b
```

3. **Check system prompt loaded**:
12 changes: 6 additions & 6 deletions docs/developer/TIERED_MODEL_ROUTING.md
@@ -24,7 +24,7 @@ Tiered Model Routing automatically selects the most appropriate LLM model based

**Default Models:**
- Simple Tier: `gemma2:2b` (fast, low resource)
- - Complex Tier: `mistral:7b-instruct` (capable, comprehensive)
+ - Complex Tier: `qwen3.5:9b` (capable, comprehensive)
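The two-tier selection reduces to a single threshold comparison. A minimal sketch, assuming the default model names and 3.0 threshold from this document's configuration section (the function name is illustrative):

```python
def select_tier_model(score: float, tier_config: dict) -> str:
    """Pick the simple or complex tier model from a complexity score."""
    models = tier_config.get("models", {})
    simple = models.get("simple", "gemma2:2b")
    complex_model = models.get("complex", "qwen3.5:9b")
    threshold = tier_config.get("complexity_threshold", 3.0)
    # Scores below the threshold route to the fast, low-resource model.
    return simple if score < threshold else complex_model
```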

### Complexity Scoring

@@ -105,7 +105,7 @@ AUTOBOT_COMPLEXITY_THRESHOLD=3.0

# Model assignments
AUTOBOT_MODEL_TIER_SIMPLE=gemma2:2b
- AUTOBOT_MODEL_TIER_COMPLEX=mistral:7b-instruct
+ AUTOBOT_MODEL_TIER_COMPLEX=qwen3.5:9b

# Fallback behavior (default: true)
# If simple tier fails, automatically retry with complex tier
@@ -131,7 +131,7 @@ enabled = tier_config.get("enabled", True)

# Get models
simple_model = tier_config.get("models", {}).get("simple", "gemma2:2b")
- complex_model = tier_config.get("models", {}).get("complex", "mistral:7b-instruct")
+ complex_model = tier_config.get("models", {}).get("complex", "qwen3.5:9b")

# Get threshold
threshold = tier_config.get("complexity_threshold", 3.0)
@@ -208,7 +208,7 @@ Get current tiered routing configuration.
"complexity_threshold": 3.0,
"models": {
"simple": "gemma2:2b",
"complex": "mistral:7b-instruct"
"complex": "qwen3.5:9b"
},
"fallback_to_complex": true,
"logging": {
@@ -321,10 +321,10 @@ curl -X POST http://localhost:8001/api/llm/tiered-routing/config \
When `log_routing_decisions` is enabled, routing decisions are logged:

```
- INFO - Tiered routing: mistral:7b-instruct -> gemma2:2b
+ INFO - Tiered routing: qwen3.5:9b -> gemma2:2b
(score=1.8, tier=simple, reason=Low complexity request with minimal indicators)

- INFO - Tiered routing: selected mistral:7b-instruct
+ INFO - Tiered routing: selected qwen3.5:9b
(score=5.4, tier=complex)

WARNING - Tiered routing fallback triggered: simple -> complex tier