Merged
@@ -53,7 +53,7 @@ SPECIALIZED MISSING:
❌ codellama:7b-instruct - Code analysis optimization
❌ phi3:3.8b - Fast inference model
❌ qwen2.5:7b - Enhanced reasoning
- mistral:7b-instruct - Alternative reasoning
+ qwen3.5:9b - Default reasoning model
```

---
@@ -120,7 +120,7 @@ ollama pull codellama:7b-instruct # For code-specific tasks

# Optional advanced models
ollama pull qwen2.5:7b # For general reasoning
- ollama pull mistral:7b-instruct # Alternative reasoning model
+ ollama pull qwen3.5:9b # Default reasoning model
```
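After pulling, a quick pre-flight check helps confirm every required model is actually present before services start. A minimal sketch, assuming the model list below (taken from the pull commands above) is the authoritative set:

```python
# Illustrative required set, mirroring the `ollama pull` commands above.
REQUIRED_MODELS = [
    "codellama:7b-instruct",  # code-specific tasks
    "qwen2.5:7b",             # general reasoning
    "qwen3.5:9b",             # default reasoning model
]

def missing_models(installed: list[str]) -> list[str]:
    """Return required models that the installed list does not contain."""
    have = set(installed)
    return [m for m in REQUIRED_MODELS if m not in have]
```

Feed it the name column of `ollama list`; a non-empty result means some pulls are still outstanding.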

### 2. Update Configuration Files
10 changes: 5 additions & 5 deletions autobot-shared/ssot_config.py
@@ -228,7 +228,7 @@ def get_ollama_endpoint_for_model(self, model_name: str) -> str:
"""Route Ollama requests to GPU or CPU endpoint by model (#1070).

Args:
- model_name: Ollama model name (e.g. 'mistral:7b-instruct')
+ model_name: Ollama model name (e.g. 'qwen3.5:9b')

Returns:
Ollama base URL (no /api suffix)
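The GPU/CPU routing this docstring describes can be sketched as a simple set lookup. The model set and endpoint URLs here are illustrative assumptions (the GPU host borrows the AI Stack VM address used elsewhere in these docs), not the actual implementation:

```python
# Illustrative: heavy models served by the GPU Ollama instance, rest on CPU.
GPU_MODELS = {"qwen3.5:9b", "deepseek-r1:14b", "codellama:13b"}

def get_ollama_endpoint_for_model(
    model_name: str,
    gpu_url: str = "http://172.16.168.24:11434",  # assumed GPU host
    cpu_url: str = "http://127.0.0.1:11434",      # assumed CPU fallback
) -> str:
    """Route a model to the GPU or CPU Ollama endpoint (no /api suffix)."""
    return gpu_url if model_name in GPU_MODELS else cpu_url
```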
@@ -316,7 +316,7 @@ def get_model_for_agent(self, agent_id: str) -> str:
agent_id: Agent identifier (e.g., 'orchestrator', 'research', 'code_analysis')

Returns:
- Model name (e.g., 'gpt-4', 'claude-3-opus', 'mistral:7b-instruct')
+ Model name (e.g., 'gpt-4', 'claude-3-opus', 'qwen3.5:9b')

Example:
# In .env:
@@ -898,7 +898,7 @@ class AutoBotConfig(BaseSettings):
config = get_config()
backend = config.backend_url # http://172.16.168.20:8001
redis = config.redis_url # redis://172.16.168.23:6379
- model = config.llm.default_model # mistral:7b-instruct
+ model = config.llm.default_model # qwen3.5:9b
"""

model_config = SettingsConfigDict(
@@ -1223,7 +1223,7 @@ def get_agent_llm_config_explicit(agent_id: str) -> dict:
Each agent MUST have its own provider, endpoint, and model via environment variables:
- AUTOBOT_{AGENT_ID}_PROVIDER (e.g., AUTOBOT_ORCHESTRATOR_PROVIDER=ollama)
- AUTOBOT_{AGENT_ID}_ENDPOINT (e.g., AUTOBOT_ORCHESTRATOR_ENDPOINT=http://127.0.0.1:11434)
- - AUTOBOT_{AGENT_ID}_MODEL (e.g., AUTOBOT_ORCHESTRATOR_MODEL=mistral:7b-instruct)
+ - AUTOBOT_{AGENT_ID}_MODEL (e.g., AUTOBOT_ORCHESTRATOR_MODEL=qwen3.5:9b)

Raises AgentConfigurationError if any setting is missing.
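A sketch of how such an explicit per-agent lookup can be assembled; the function and error names mirror the docstring, but the returned dict shape is an assumption for illustration:

```python
import os

class AgentConfigurationError(RuntimeError):
    """Raised when an agent is missing a required LLM setting."""

def get_agent_llm_config_explicit(agent_id: str, env=None) -> dict:
    """Read AUTOBOT_{AGENT_ID}_{PROVIDER,ENDPOINT,MODEL}; fail loudly if unset."""
    env = os.environ if env is None else env
    prefix = f"AUTOBOT_{agent_id.upper()}_"
    config = {}
    for key in ("PROVIDER", "ENDPOINT", "MODEL"):
        value = env.get(prefix + key)
        if not value:
            raise AgentConfigurationError(
                f"Agent '{agent_id}' is missing {prefix + key}; set it in .env"
            )
        config[key.lower()] = value
    return config
```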

@@ -1306,7 +1306,7 @@ def get_agent_model_explicit(agent_id: str) -> str:
raise AgentConfigurationError(
f"Agent '{agent_id}' requires explicit LLM model configuration. "
f"Set {env_key} in .env file. "
f"Example: {env_key}=mistral:7b-instruct"
f"Example: {env_key}=qwen3.5:9b"
)
return model

10 changes: 5 additions & 5 deletions docs/ROADMAP_2025.md
@@ -90,7 +90,7 @@ This section documents key architectural decisions where the original plan was r

**Current Implementation** (Temporary):

- - **Mistral 7B Instruct** (`mistral:7b-instruct`) - Used for ALL task types:
+ - **Qwen 3.5 9B** (`qwen3.5:9b`) - Used for ALL task types:
- Default LLM, Embedding, Classification, Reasoning
- RAG, Coding, Orchestrator, Agent tasks
- Research, Analysis, Planning
@@ -112,10 +112,10 @@ This section documents key architectural decisions where the original plan was r
**Current Configuration** (from `.env`):

```bash
- AUTOBOT_DEFAULT_LLM_MODEL=mistral:7b-instruct
- AUTOBOT_EMBEDDING_MODEL=mistral:7b-instruct
- AUTOBOT_CLASSIFICATION_MODEL=mistral:7b-instruct # TODO: Use 1B model
- AUTOBOT_REASONING_MODEL=mistral:7b-instruct
+ AUTOBOT_DEFAULT_LLM_MODEL=qwen3.5:9b
+ AUTOBOT_EMBEDDING_MODEL=qwen3.5:9b
+ AUTOBOT_CLASSIFICATION_MODEL=qwen3.5:9b # TODO: Use 1B model
+ AUTOBOT_REASONING_MODEL=qwen3.5:9b
# Future: tiered model distribution for specialized agents
```

6 changes: 3 additions & 3 deletions docs/api/environment-variables.md
@@ -17,7 +17,7 @@ AutoBot supports comprehensive configuration through environment variables with

| Variable | Default | Description |
|----------|---------|-------------|
- | `AUTOBOT_DEFAULT_LLM_MODEL` | `mistral:7b-instruct` | **Primary** - Default LLM model for all tasks |
+ | `AUTOBOT_DEFAULT_LLM_MODEL` | `qwen3.5:9b` | **Primary** - Default LLM model for all tasks |
| `AUTOBOT_OLLAMA_HOST` | `172.16.168.24` | Ollama server host (AI Stack VM) |
| `AUTOBOT_OLLAMA_PORT` | `11434` | Ollama server port |
| `AUTOBOT_OLLAMA_ENDPOINT` | `http://${HOST}:${PORT}/api/generate` | Ollama API endpoint |
@@ -119,8 +119,8 @@ The frontend uses Vite environment variables with the `VITE_` prefix:

### Setting Default LLM Model
```bash
export AUTOBOT_DEFAULT_LLM_MODEL="mistral:7b-instruct"
export AUTOBOT_ORCHESTRATOR_LLM="mistral:7b-instruct"
export AUTOBOT_DEFAULT_LLM_MODEL="qwen3.5:9b"
export AUTOBOT_ORCHESTRATOR_LLM="qwen3.5:9b"
```

### Using Different Backend Port
10 changes: 5 additions & 5 deletions docs/architecture/EFFICIENT_INFERENCE_DESIGN.md
@@ -43,7 +43,7 @@ This document describes a **latency-focused** inference optimization architecture
AutoBot's LLM infrastructure:
- **Ollama** (primary) - Local inference at `127.0.0.1:11434`
- **vLLM** - High-performance inference with prefix caching
- - **Default model:** `mistral:7b-instruct`
+ - **Default model:** `qwen3.5:9b`
- **Current latency:** ~500ms first token

### Problem with AirLLM Approach
@@ -547,17 +547,17 @@ QUANTIZED_MODEL_REGISTRY = {
Ollama models already support quantization via GGUF format:

```bash
- # Current: mistral:7b-instruct (FP16, ~14GB)
+ # Current: qwen3.5:9b (FP16, ~14GB)
# Optimized options:
- ollama pull mistral:7b-instruct-q4_K_M # 4-bit, ~4GB, slight quality loss
- ollama pull mistral:7b-instruct-q8_0 # 8-bit, ~8GB, minimal quality loss
+ ollama pull qwen3.5:9b-q4_K_M # 4-bit, ~4GB, slight quality loss
+ ollama pull qwen3.5:9b-q8_0 # 8-bit, ~8GB, minimal quality loss
```

**Update `.env` for quantized Ollama models:**

```bash
# Use quantized model for better performance
- AUTOBOT_DEFAULT_LLM_MODEL=mistral:7b-instruct-q8_0
+ AUTOBOT_DEFAULT_LLM_MODEL=qwen3.5:9b-q8_0
```
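The size figures in the comments above follow from bits-per-weight arithmetic. A back-of-envelope helper, ignoring KV cache and activation overhead (so real memory use will be somewhat higher):

```python
def weight_memory_gb(params_billion: float, bits_per_weight: int) -> float:
    """Approximate weight memory in GB: params × (bits / 8) bytes each."""
    return params_billion * bits_per_weight / 8

# e.g. a 7B model: FP16 ≈ 14 GB, 8-bit ≈ 7 GB, 4-bit ≈ 3.5 GB
```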

---
8 changes: 4 additions & 4 deletions docs/architecture/SSOT_CONFIGURATION_ARCHITECTURE.md
@@ -205,7 +205,7 @@ AUTOBOT_MAIN_MACHINE_IP=172.16.168.20
AUTOBOT_FRONTEND_VM_IP=172.16.168.21
AUTOBOT_REDIS_VM_IP=172.16.168.23
AUTOBOT_BACKEND_PORT=8001
- AUTOBOT_DEFAULT_LLM_MODEL=mistral:7b-instruct
+ AUTOBOT_DEFAULT_LLM_MODEL=qwen3.5:9b

Layer 2: Frozen Code Defaults (Emergency Fallback)
--------------------------------------------------
@@ -314,7 +314,7 @@ AUTOBOT_PORT_GRAFANA=3000
# -----------------------------------------------------------------------------
# LLM CONFIGURATION
# -----------------------------------------------------------------------------
- AUTOBOT_LLM_DEFAULT_MODEL=mistral:7b-instruct
+ AUTOBOT_LLM_DEFAULT_MODEL=qwen3.5:9b
AUTOBOT_LLM_EMBEDDING_MODEL=nomic-embed-text:latest
AUTOBOT_LLM_CLASSIFICATION_MODEL=gemma2:2b
AUTOBOT_LLM_PROVIDER=ollama
@@ -411,7 +411,7 @@ class PortConfig(BaseSettings):

class LLMConfig(BaseSettings):
"""LLM configuration"""
- default_model: str = Field(alias="AUTOBOT_LLM_DEFAULT_MODEL", default="mistral:7b-instruct")
+ default_model: str = Field(alias="AUTOBOT_LLM_DEFAULT_MODEL", default="qwen3.5:9b")
embedding_model: str = Field(alias="AUTOBOT_LLM_EMBEDDING_MODEL", default="nomic-embed-text:latest")
provider: str = Field(alias="AUTOBOT_LLM_PROVIDER", default="ollama")
timeout: int = Field(alias="AUTOBOT_LLM_TIMEOUT", default=120)
@@ -589,7 +589,7 @@ export function getConfig(): AutoBotConfig {
};

const llm: LLMConfig = {
- defaultModel: getEnv('VITE_LLM_DEFAULT_MODEL', 'mistral:7b-instruct'),
+ defaultModel: getEnv('VITE_LLM_DEFAULT_MODEL', 'qwen3.5:9b'),
embeddingModel: getEnv('VITE_LLM_EMBEDDING_MODEL', 'nomic-embed-text:latest'),
provider: getEnv('VITE_LLM_PROVIDER', 'ollama'),
timeout: getEnvNumber('VITE_LLM_TIMEOUT', 120),
2 changes: 1 addition & 1 deletion docs/developer/04-configuration.md
@@ -268,7 +268,7 @@ AutoBot supports environment variable overrides using the `AUTOBOT_` prefix:
|----------|-------------|---------|---------|
| `AUTOBOT_BACKEND_PORT` | `backend.server_port` | Backend server port | `8002` |
| `AUTOBOT_BACKEND_HOST` | `backend.server_host` | Backend bind address | `127.0.0.1` |
- | `AUTOBOT_DEFAULT_LLM_MODEL` | `llm_config.ollama.model` | **Primary** - Default LLM model | `mistral:7b-instruct` |
+ | `AUTOBOT_DEFAULT_LLM_MODEL` | `llm_config.ollama.model` | **Primary** - Default LLM model | `qwen3.5:9b` |
| `AUTOBOT_OLLAMA_HOST` | `llm_config.ollama.host` | Ollama server URL | `http://ollama:11434` |
| `AUTOBOT_OLLAMA_PORT` | `llm_config.ollama.port` | Ollama server port | `11434` |
| `AUTOBOT_ORCHESTRATOR_LLM` | `llm_config.orchestrator_llm` | Orchestrator LLM | `gpt-4` |
2 changes: 1 addition & 1 deletion docs/developer/DISTRIBUTED_TRACING.md
@@ -114,7 +114,7 @@ Examples:
#### LLM Spans
```python
"llm.provider": "ollama",
"llm.model": "mistral:7b-instruct",
"llm.model": "qwen3.5:9b",
"llm.streaming": True,
"llm.temperature": 0.7,
"llm.prompt_messages": 3,
2 changes: 1 addition & 1 deletion docs/developer/ROLES.md
@@ -375,7 +375,7 @@ These conflicts drive the default fleet layout:
| **External deps** | — |
| **Ansible playbook** | `playbooks/deploy_role.yml` |
| **Source path** | — (binary install from ollama.ai) |
- | **GPU models** | mistral:7b-instruct, deepseek-r1:14b, codellama:13b |
+ | **GPU models** | qwen3.5:9b, deepseek-r1:14b, codellama:13b |
| **Concurrency** | max_loaded=5, num_parallel=4, keep_alive=10m |
| **Special hardware** | NVIDIA GPU required. Auto-detected via nvidia-smi. |
| **Degraded without** | Large model inference — system falls back to CPU models or cloud providers |
2 changes: 1 addition & 1 deletion docs/developer/SSOT_CONFIG_GUIDE.md
@@ -415,7 +415,7 @@ All infrastructure configuration (IPs, ports, hosts) is in `.env`:
AUTOBOT_BACKEND_HOST=172.16.168.20
AUTOBOT_REDIS_HOST=172.16.168.23
AUTOBOT_OLLAMA_HOST=127.0.0.1
- AUTOBOT_DEFAULT_LLM_MODEL=mistral:7b-instruct
+ AUTOBOT_DEFAULT_LLM_MODEL=qwen3.5:9b
```

### What Goes Where?
6 changes: 3 additions & 3 deletions docs/developer/THINKING_TOOLS_CONFIGURATION.md
@@ -205,7 +205,7 @@ def query(self, ...):
**Ensure Mistral is Default Model** (Required for Tool Calling):
```bash
# In .env file:
- AUTOBOT_DEFAULT_LLM_MODEL=mistral:7b-instruct
+ AUTOBOT_DEFAULT_LLM_MODEL=qwen3.5:9b
```

**Why Mistral?**
@@ -217,7 +217,7 @@ AUTOBOT_DEFAULT_LLM_MODEL=mistral:7b-instruct
**Verify**:
```bash
grep "AUTOBOT_DEFAULT_LLM_MODEL" .env
- # Should output: AUTOBOT_DEFAULT_LLM_MODEL=mistral:7b-instruct
+ # Should output: AUTOBOT_DEFAULT_LLM_MODEL=qwen3.5:9b
```

---
@@ -324,7 +324,7 @@ Here's the implementation plan...
2. **Verify model is Mistral**:
```bash
grep "AUTOBOT_DEFAULT_LLM_MODEL" .env
- # Should be: mistral:7b-instruct
+ # Should be: qwen3.5:9b
```

3. **Check system prompt loaded**:
12 changes: 6 additions & 6 deletions docs/developer/TIERED_MODEL_ROUTING.md
@@ -24,7 +24,7 @@ Tiered Model Routing automatically selects the most appropriate LLM model based

**Default Models:**
- Simple Tier: `gemma2:2b` (fast, low resource)
- - Complex Tier: `mistral:7b-instruct` (capable, comprehensive)
+ - Complex Tier: `qwen3.5:9b` (capable, comprehensive)
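The two-tier selection reduces to a single threshold comparison. A minimal sketch, assuming the default model names and 3.0 threshold from this document's configuration section (the function name is illustrative):

```python
def select_tier_model(score: float, tier_config: dict) -> str:
    """Pick the simple or complex tier model from a complexity score."""
    models = tier_config.get("models", {})
    simple = models.get("simple", "gemma2:2b")
    complex_model = models.get("complex", "qwen3.5:9b")
    threshold = tier_config.get("complexity_threshold", 3.0)
    # Scores below the threshold route to the fast, low-resource model.
    return simple if score < threshold else complex_model
```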

### Complexity Scoring

@@ -105,7 +105,7 @@ AUTOBOT_COMPLEXITY_THRESHOLD=3.0

# Model assignments
AUTOBOT_MODEL_TIER_SIMPLE=gemma2:2b
- AUTOBOT_MODEL_TIER_COMPLEX=mistral:7b-instruct
+ AUTOBOT_MODEL_TIER_COMPLEX=qwen3.5:9b

# Fallback behavior (default: true)
# If simple tier fails, automatically retry with complex tier
@@ -131,7 +131,7 @@ enabled = tier_config.get("enabled", True)

# Get models
simple_model = tier_config.get("models", {}).get("simple", "gemma2:2b")
- complex_model = tier_config.get("models", {}).get("complex", "mistral:7b-instruct")
+ complex_model = tier_config.get("models", {}).get("complex", "qwen3.5:9b")

# Get threshold
threshold = tier_config.get("complexity_threshold", 3.0)
@@ -208,7 +208,7 @@ Get current tiered routing configuration.
"complexity_threshold": 3.0,
"models": {
"simple": "gemma2:2b",
"complex": "mistral:7b-instruct"
"complex": "qwen3.5:9b"
},
"fallback_to_complex": true,
"logging": {
@@ -321,10 +321,10 @@ curl -X POST http://localhost:8001/api/llm/tiered-routing/config \
When `log_routing_decisions` is enabled, routing decisions are logged:

```
- INFO - Tiered routing: mistral:7b-instruct -> gemma2:2b
+ INFO - Tiered routing: qwen3.5:9b -> gemma2:2b
(score=1.8, tier=simple, reason=Low complexity request with minimal indicators)

- INFO - Tiered routing: selected mistral:7b-instruct
+ INFO - Tiered routing: selected qwen3.5:9b
(score=5.4, tier=complex)

WARNING - Tiered routing fallback triggered: simple -> complex tier