diff --git a/docs/en/llama_stack/install.mdx b/docs/en/llama_stack/install.mdx
index 735e59c..60a709d 100644
--- a/docs/en/llama_stack/install.mdx
+++ b/docs/en/llama_stack/install.mdx
@@ -31,7 +31,8 @@ violet push --platform-address=platform-access-address --platform-username=platf
 After the operator is installed, deploy Llama Stack Server by creating a `LlamaStackDistribution` custom resource:
 
 > **Note:** Prepare the following in advance; otherwise the distribution may not become ready:
-> - **Secret**: Create a Secret (e.g., `deepseek-api`) in the same namespace with the LLM API token. Example: `kubectl create secret generic deepseek-api -n default --from-literal=token=<LLM_API_KEY>`.
+> - **Inference URL**: `VLLM_URL` must point at a **vLLM OpenAI-compatible** HTTP base URL (for example, an in-cluster vLLM Service or a KServe InferenceService) that serves the target model.
+> - **Secret (optional)**: `VLLM_API_TOKEN` is only needed when the vLLM endpoint requires authentication. If vLLM has no auth, do not set it. When required, create a Secret in the same namespace and reference it from `containerSpec.env` (see the commented example in the manifest below).
 > - **Storage Class**: Ensure the `default` Storage Class exists in the cluster; otherwise the PVC cannot be bound and the resource will not become ready.
 
 ```yaml
@@ -48,23 +49,28 @@ spec:
   replicas: 1                                      # Number of server replicas
   server:
     containerSpec:
+      name: llama-stack
+      port: 8321
       env:
         - name: VLLM_URL
-          value: "https://api.deepseek.com/v1"     # URL of the LLM API provider
+          value: "http://vllm-predictor.default.svc.cluster.local/v1"   # vLLM OpenAI-compatible base URL
         - name: VLLM_MAX_TOKENS
           value: "8192"                            # Maximum output tokens
-        - name: VLLM_API_TOKEN                     # Load LLM API token from secret
-          valueFrom:
-            secretKeyRef:                          # Create this Secret in the same namespace beforehand, e.g. kubectl create secret generic deepseek-api -n default --from-literal=token=<LLM_API_KEY>
-              key: token
-              name: deepseek-api
-      name: llama-stack
-      port: 8321
+
+        # Optional: VLLM_API_TOKEN — add only when the vLLM endpoint requires authentication.
+        # If vLLM is deployed without auth, omit the entire block below (do not set VLLM_API_TOKEN).
+        # Create the Secret first, for example: kubectl create secret generic vllm-api-token -n default --from-literal=token=<TOKEN>
+        # - name: VLLM_API_TOKEN
+        #   valueFrom:
+        #     secretKeyRef:
+        #       key: token
+        #       name: vllm-api-token
+
     distribution:
       name: starter                                # Distribution name (options: starter, postgres-demo, meta-reference-gpu)
     storage:
       mountPath: /home/lls/.lls
-      size: 20Gi                                   # Requires the "default" Storage Class to be configured beforehand
+      size: 1Gi                                    # Requires the "default" Storage Class to be configured beforehand
 ```
 
 After deployment, the Llama Stack Server will be available within the cluster. The access URL is displayed in `status.serviceURL`, for example:
@@ -74,3 +80,16 @@ status:
   phase: Ready
   serviceURL: http://demo-service.default.svc.cluster.local:8321
 ```
+
+## Tool calling with vLLM on KServe
+
+The following applies to the **vLLM predictor** on KServe, not to the `LlamaStackDistribution` manifest. For agent flows that use **tools** (client-side tools or MCP), the vLLM process must expose tool-call support. Add predictor container `args` as required by upstream vLLM, for example:
+
+```yaml
+args:
+  - --enable-auto-tool-choice
+  - --tool-call-parser
+  - hermes
+```
+
+Choose `--tool-call-parser` (and any related flags) according to the **served model** and the vLLM documentation for that model family.
diff --git a/docs/en/llama_stack/quickstart.mdx b/docs/en/llama_stack/quickstart.mdx
index f9ada4c..5c5af90 100644
--- a/docs/en/llama_stack/quickstart.mdx
+++ b/docs/en/llama_stack/quickstart.mdx
@@ -9,10 +9,9 @@ This section provides a quickstart example for creating an AI Agent with Llama S
 ## Prerequisites
 
 - Python 3.12 or higher (if not satisfied, refer to [FAQ: How to prepare Python 3.12 in Notebook](#how-to-prepare-python-312-in-notebook))
-- Llama Stack Server installed and running via Operator (see [Install Llama Stack](./install))
+- Llama Stack Server installed and running via Operator (see [Install Llama Stack](./install)), with **`VLLM_URL` pointing at a vLLM-served model endpoint** (see install notes)
 - Access to a Notebook environment (e.g., Jupyter Notebook, JupyterLab)
-- Python environment with `llama-stack-client` and required dependencies installed
-- API key for the LLM provider (e.g., DeepSeek API key)
+- Python environment with `llama-stack-client`, `fastmcp` (for the MCP section), and other notebook dependencies installed
 
 ## Quickstart Example
 
@@ -24,13 +23,10 @@ Download the notebook and upload it to a Notebook environment to run.
 
 The notebook demonstrates:
 
-- Connecting to Llama Stack Server and client setup
-- Tool definition using the `@client_tool` decorator (weather query tool example)
-- Client connection to Llama Stack Server
-- Model selection and Agent creation with tools and instructions
-- Agent execution with session management and streaming responses
-- Result handling and display
-- Optional FastAPI deployment example
+- **Two tool options:** client-side tools (`@client_tool`) and MCP tools (FastMCP + `toolgroups.register`)
+- **Shared agent flow:** connect to Llama Stack Server, select a model, create an `Agent` with `tools=AGENT_TOOLS`, then run sessions and streaming turns
+- Streaming responses and event logging
+- Optional FastAPI deployment of the `agent`
 
 ## FAQ
 
diff --git a/docs/public/llama-stack/llama-stack_quickstart.ipynb b/docs/public/llama-stack/llama-stack_quickstart.ipynb
index 0339c3b..e7ef89a 100644
--- a/docs/public/llama-stack/llama-stack_quickstart.ipynb
+++ b/docs/public/llama-stack/llama-stack_quickstart.ipynb
@@ -7,7 +7,25 @@
       "source": [
         "# Llama Stack Quick Start Demo\n",
         "\n",
-        "This notebook demonstrates how to use Llama Stack to run an agent with **client-side tools**."
+        "This notebook demonstrates how to use Llama Stack to run an agent with tools in two ways:\n",
+        "\n",
+        "- **Option A (section 2):** define a **client-side** weather tool with `@client_tool`; the cell sets **`AGENT_TOOLS`**.\n",
+        "- **Option B (section 2):** run an **MCP** weather tool with **FastMCP** and register it with the server; the register cell sets **`AGENT_TOOLS`**.\n",
+        "- **Section 3** uses the **same** connect / model selection / `Agent` construction / run flow for both options. The only difference is the value of **`AGENT_TOOLS`** passed into `Agent`.\n",
+        "\n",
+        "### Inference backend (`LlamaStackDistribution`)\n",
+        "\n",
+        "- **`VLLM_URL`** should point at a **vLLM OpenAI-compatible** HTTP API for the model in use.\n",
+        "- For **vLLM on KServe**, enable tool calling on the vLLM container by adding extra args, for example:\n",
+        "\n",
+        "```yaml\n",
+        "args:\n",
+        "  - --enable-auto-tool-choice\n",
+        "  - --tool-call-parser\n",
+        "  - <tool-call-parser-value>   # set from vLLM documentation for the deployed model\n",
+        "```\n",
+        "\n",
+        "**MCP prerequisites:** The server distribution must configure the **tool runtime** (the subsystem that executes tool calls for agents) to include the **`model-context-protocol`** provider so MCP tools can be invoked. The MCP URL must be reachable **from the server** (not only from the notebook).\n"
       ]
     },
     {
@@ -17,7 +35,7 @@
       "source": [
         "## 1. Install Dependencies\n",
         "\n",
-        "**Note:** `llama-stack-client` requires Python 3.12 or higher. If your Python version does not meet this requirement, refer to the FAQ section in the documentation: **How to prepare Python 3.12 in Notebook**."
+        "**Note:** `llama-stack-client` requires Python 3.12 or higher. If your Python version does not meet this requirement, refer to the FAQ section in the documentation: **How to prepare Python 3.12 in Notebook**.\n"
       ]
     },
     {
@@ -30,96 +48,286 @@
         "# Use current kernel's Python so PATH does not point to another env\n",
         "# If download is slow, add: -i https://pypi.tuna.tsinghua.edu.cn/simple\n",
         "import sys\n",
-        "!{sys.executable} -m pip install \"llama-stack-client>=0.4\" \"requests\" \"fastapi\" \"uvicorn\" --target ~/packages"
+        "!{sys.executable} -m pip install \"llama-stack-client>=0.4\" \"requests\" \"fastapi\" \"uvicorn\" \"fastmcp\""
       ]
     },
     {
       "cell_type": "markdown",
-      "id": "9d942699",
+      "id": "baabf4fc",
       "metadata": {},
       "source": [
-        "## 2. Import Libraries"
+        "\n",
+        "## 2. Define Tools\n",
+        "\n",
+        "### Create Llama Stack Client"
       ]
     },
     {
       "cell_type": "code",
       "execution_count": null,
-      "id": "cfd65276",
+      "id": "lls-client-init",
       "metadata": {},
       "outputs": [],
       "source": [
-        "import sys\n",
-        "from pathlib import Path\n",
+        "import os\n",
+        "from llama_stack_client import LlamaStackClient\n",
+        "\n",
+        "# Set LLAMA_STACK_URL to the actual Llama Stack Server URL (cluster Service/Route or port-forward).\n",
+        "# The default below only works when the server is reachable at localhost:8321.\n",
+        "base_url = os.getenv(\"LLAMA_STACK_URL\", \"http://localhost:8321\")\n",
+        "client = LlamaStackClient(base_url=base_url)\n",
+        "print(f\"Llama Stack client created (LLAMA_STACK_URL={base_url})\")\n"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "id": "8d42246c",
+      "metadata": {},
+      "source": [
+        "### Select Tool Option\n",
+        "Set `TOOL_OPTION` first to control which tool path section 3 will use.\n",
         "\n",
-        "user_site_packages = Path.home() / \"packages\"\n",
-        "if str(user_site_packages) not in sys.path:\n",
-        "    sys.path.insert(0, str(user_site_packages))\n",
+        "- `A`: client-side tool via `@client_tool`\n",
+        "- `B`: MCP tool via FastMCP + toolgroup registration\n",
         "\n",
-        "import os\n",
+        "Then run the corresponding setup cells below.\n",
+        "\n"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "id": "tool-option-selector",
+      "metadata": {},
+      "outputs": [],
+      "source": [
+        "# Choose one: \"A\" (client tool) or \"B\" (MCP)\n",
+        "TOOL_OPTION = \"A\"\n",
+        "print(f\"Selected TOOL_OPTION={TOOL_OPTION}\")"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "id": "optA-title-md",
+      "metadata": {},
+      "source": [
+        "### Option A: Define client-side tool\n",
+        "\n",
+        "Run this cell only when `TOOL_OPTION = \"A\"`. It defines `get_weather` and sets `AGENT_TOOLS` for the shared flow in section 3.\n"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "id": "c57f95e5",
+      "metadata": {},
+      "outputs": [],
+      "source": [
         "import requests\n",
         "from typing import Dict, Any\n",
         "from urllib.parse import quote\n",
-        "from llama_stack_client import LlamaStackClient, Agent\n",
         "from llama_stack_client.lib.agents.client_tool import client_tool\n",
-        "from llama_stack_client.lib.agents.event_logger import AgentEventLogger\n",
         "\n",
-        "print('Libraries imported successfully')"
+        "\n",
+        "if globals().get(\"TOOL_OPTION\") != \"A\":\n",
+        "    print('Skip Option A setup (TOOL_OPTION != \"A\")')\n",
+        "else:\n",
+        "    @client_tool\n",
+        "    def get_weather(city: str) -> Dict[str, Any]:\n",
+        "        \"\"\"Get current weather information for a specified city.\n",
+        "\n",
+        "        Uses the wttr.in free weather API to fetch weather data.\n",
+        "\n",
+        "        :param city: City name, e.g., Beijing, Shanghai, Paris\n",
+        "        :returns: Dictionary containing weather information including city, temperature and humidity\n",
+        "        \"\"\"\n",
+        "        try:\n",
+        "            encoded_city = quote(city)\n",
+        "            url = f'https://wttr.in/{encoded_city}?format=j1'\n",
+        "            response = requests.get(url, timeout=10)\n",
+        "            response.raise_for_status()\n",
+        "            data = response.json()\n",
+        "\n",
+        "            current = data['current_condition'][0]\n",
+        "            return {\n",
+        "                'city': city,\n",
+        "                'temperature': f\"{current['temp_C']}°C\",\n",
+        "                'humidity': f\"{current['humidity']}%\",\n",
+        "            }\n",
+        "        except Exception as e:\n",
+        "            return {'error': f'Failed to get weather information: {str(e)}'}\n",
+        "\n",
+        "    AGENT_TOOLS = [get_weather]\n",
+        "    print('Option A: AGENT_TOOLS = [get_weather]')\n"
       ]
     },
     {
       "cell_type": "markdown",
-      "id": "baabf4fc",
+      "id": "1614c719",
       "metadata": {},
       "source": [
-        "## 3. Define Tools\n",
+        "### Option B: MCP tool (FastMCP)\n",
         "\n",
-        "Use the `@client_tool` decorator to define a weather query tool."
+        "Start an MCP server with the **`fastmcp`** package. It serves Streamable HTTP on port **8002** and exposes the tool `get_weather_mcp`. The tool is invoked by the **Llama Stack Server**, which connects to this URL.\n",
+        "\n",
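+        "Set **`MCP_SERVER_URL`** if the Llama Stack Server runs elsewhere (e.g. in-cluster): the URL must be reachable from that server. If unset, the notebook resolves the local hostname to an IP address and uses `http://<ip>:8002/mcp`; if the hostname resolves to a loopback address, it uses `MCP_SERVER_HOST` instead (default `127.0.0.1`).\n"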
       ]
     },
     {
       "cell_type": "code",
       "execution_count": null,
-      "id": "c57f95e5",
+      "id": "e4da207a",
       "metadata": {},
       "outputs": [],
       "source": [
-        "@client_tool\n",
-        "def get_weather(city: str) -> Dict[str, Any]:\n",
-        "    \"\"\"Get current weather information for a specified city.\n",
+        "if globals().get(\"TOOL_OPTION\") != \"B\":\n",
+        "    print('Skip Option B MCP server start (TOOL_OPTION != \"B\")')\n",
+        "else:\n",
+        "    import os\n",
+        "    import socket\n",
+        "    import sys\n",
+        "    import time\n",
+        "    from pathlib import Path\n",
+        "    from subprocess import Popen\n",
+        "\n",
+        "    # Start the MCP server in a separate Python process.\n",
+        "    # This avoids multiprocessing pickle/spawn issues on Windows/macOS.\n",
+        "    server_script = r'''\n",
+        "from urllib.parse import quote\n",
+        "\n",
+        "import requests\n",
+        "from fastmcp import FastMCP\n",
+        "\n",
+        "\n",
+        "mcp = FastMCP(\"demo-weather\")\n",
         "\n",
-        "    Uses the wttr.in free weather API to fetch weather data.\n",
         "\n",
-        "    :param city: City name, e.g., Beijing, Shanghai, Paris\n",
-        "    :returns: Dictionary containing weather information including city, temperature and humidity\n",
-        "    \"\"\"\n",
+        "@mcp.tool()\n",
+        "def get_weather_mcp(city: str) -> dict:\n",
+        "    \"\"\"Get current weather for a city (wttr.in).\"\"\"\n",
         "    try:\n",
-        "        # URL encode the city name to handle spaces and special characters\n",
         "        encoded_city = quote(city)\n",
-        "        url = f'https://wttr.in/{encoded_city}?format=j1'\n",
-        "        response = requests.get(url, timeout=10)\n",
-        "        response.raise_for_status()\n",
-        "        data = response.json()\n",
-        "\n",
-        "        current = data['current_condition'][0]\n",
+        "        url = f\"https://wttr.in/{encoded_city}?format=j1\"\n",
+        "        r = requests.get(url, timeout=10)\n",
+        "        r.raise_for_status()\n",
+        "        data = r.json()\n",
+        "        cur = data[\"current_condition\"][0]\n",
         "        return {\n",
-        "            'city': city,\n",
-        "            'temperature': f\"{current['temp_C']}°C\",\n",
-        "            'humidity': f\"{current['humidity']}%\",\n",
+        "            \"city\": city,\n",
+        "            \"temperature_c\": cur[\"temp_C\"],\n",
+        "            \"humidity\": cur[\"humidity\"],\n",
         "        }\n",
         "    except Exception as e:\n",
-        "        return {'error': f'Failed to get weather information: {str(e)}'}\n",
+        "        return {\"error\": str(e)}\n",
+        "\n",
+        "\n",
+        "if __name__ == \"__main__\":\n",
+        "    mcp.run(transport=\"streamable-http\", host=\"0.0.0.0\", port=8002)\n",
+        "'''\n",
+        "\n",
+        "    script_path = Path(\"/tmp/fastmcp_weather_server.py\")\n",
+        "    script_path.write_text(server_script, encoding=\"utf-8\")\n",
+        "\n",
+        "    # Best-effort stop existing process when re-running\n",
+        "    if \"mcp_proc\" in globals() and mcp_proc and getattr(mcp_proc, \"poll\", None) and mcp_proc.poll() is None:\n",
+        "        try:\n",
+        "            mcp_proc.terminate()\n",
+        "            mcp_proc.wait(timeout=2)\n",
+        "        except Exception:\n",
+        "            pass\n",
+        "\n",
+        "    mcp_proc = Popen([sys.executable, str(script_path)], env=os.environ.copy())\n",
+        "\n",
+        "    # Readiness check: poll until the local port accepts connections (up to 20 s) instead of sleeping for a fixed time\n",
+        "    deadline = time.time() + 20\n",
+        "    last_err = None\n",
+        "    while time.time() < deadline:\n",
+        "        try:\n",
+        "            with socket.create_connection((\"127.0.0.1\", 8002), timeout=1):\n",
+        "                last_err = None\n",
+        "                break\n",
+        "        except Exception as e:\n",
+        "            last_err = e\n",
+        "            time.sleep(0.25)\n",
+        "    if last_err is not None:\n",
+        "        raise RuntimeError(f\"MCP server did not become ready on 127.0.0.1:8002: {last_err}\")\n",
+        "\n",
+        "    MCP_SERVER_URL = os.getenv(\"MCP_SERVER_URL\")\n",
+        "    if not MCP_SERVER_URL:\n",
+        "        _host = socket.gethostbyname(socket.gethostname())\n",
+        "        if _host.startswith(\"127.\"):\n",
+        "            _host = os.getenv(\"MCP_SERVER_HOST\", \"127.0.0.1\")\n",
+        "        MCP_SERVER_URL = f\"http://{_host}:8002/mcp\"\n",
+        "\n",
+        "    os.environ[\"MCP_SERVER_URL\"] = MCP_SERVER_URL\n",
+        "    print(f\"✓ MCP (FastMCP) at {MCP_SERVER_URL} — tool get_weather_mcp\")\n"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "id": "3b9ea887",
+      "metadata": {},
+      "source": [
+        "### Option B: Register MCP tool group\n",
+        "\n",
+        "Uses `toolgroups.register` with `provider_id=\"model-context-protocol\"`. A **timestamp** is appended to the tool group id so re-running the cell avoids duplicate-id errors. This cell sets **`AGENT_TOOLS`** for the MCP path.\n"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "id": "75ffadf0",
+      "metadata": {},
+      "outputs": [],
+      "source": [
+        "import os\n",
+        "import time\n",
+        "\n",
+        "\n",
+        "if globals().get(\"TOOL_OPTION\") != \"B\":\n",
+        "    print('Skip Option B registration (TOOL_OPTION != \"B\")')\n",
+        "else:\n",
+        "\n",
+        "    mcp_server_url = os.getenv(\"MCP_SERVER_URL\")\n",
+        "    if not mcp_server_url:\n",
+        "        raise RuntimeError(\"MCP_SERVER_URL is not set. Run Option B MCP server setup first, or export MCP_SERVER_URL.\")\n",
+        "\n",
+        "    toolgroup_id = f\"mcp::demo-weather-{int(time.time())}\"\n",
+        "    client.toolgroups.register(\n",
+        "        toolgroup_id=toolgroup_id,\n",
+        "        provider_id=\"model-context-protocol\",\n",
+        "        mcp_endpoint={\"uri\": mcp_server_url},\n",
+        "    )\n",
+        "\n",
+        "    AGENT_TOOLS = [\n",
+        "        {\n",
+        "            \"type\": \"mcp\",\n",
+        "            \"server_label\": toolgroup_id,\n",
+        "            \"server_url\": mcp_server_url,\n",
+        "        }\n",
+        "    ]\n",
+        "    print(\"Option B: AGENT_TOOLS configured for MCP\")\n"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "id": "fed3605b",
+      "metadata": {},
+      "source": [
+        "### Troubleshooting (MCP / tool calling)\n",
         "\n",
-        "print('Weather tool defined successfully')"
+        "- **400 / message `content` type errors:** Some inference backends expect string `content` while tool turns use structured content. This is a **server–backend compatibility** issue; **vLLM** with `--enable-auto-tool-choice` and a matching `--tool-call-parser` is the supported path for tools here.\n",
+        "- **Alternative:** Prefer **Option A** (client-side tools) if the MCP HTTP endpoint is not reachable from the Llama Stack Server.\n"
       ]
     },
     {
       "cell_type": "markdown",
-      "id": "05cefded",
+      "id": "sec4-md",
       "metadata": {},
       "source": [
-        "## 4. Connect to Server and Create Agent\n",
+        "## 3. Connect, Create Agent, and Run\n",
         "\n",
-        "Use LlamaStackClient to connect to the running server, create an Agent with the client-side weather tool, and execute tool calls."
+        "Shared flow for both options. This section uses `TOOL_OPTION` + `AGENT_TOOLS` prepared in section 2.\n"
       ]
     },
     {
@@ -129,29 +337,36 @@
       "metadata": {},
       "outputs": [],
       "source": [
-        "base_url = os.getenv('LLAMA_STACK_URL', 'http://localhost:8321')\n",
-        "print(f'Connecting to Server: {base_url}')\n",
+        "import os\n",
+        "from llama_stack_client import Agent\n",
         "\n",
-        "client = LlamaStackClient(base_url=base_url)\n",
+        "\n",
+        "if \"AGENT_TOOLS\" not in globals():\n",
+        "    raise RuntimeError(\"AGENT_TOOLS is missing. Run the matching setup cell(s) in section 2 for the selected TOOL_OPTION.\")\n",
         "\n",
         "models = client.models.list()\n",
         "llm_model = next(\n",
-        "    (m for m in models\n",
-        "        if m.custom_metadata and m.custom_metadata.get('model_type') == 'llm'),\n",
-        "    None\n",
+        "    (m for m in models if m.custom_metadata and m.custom_metadata.get(\"model_type\") == \"llm\"),\n",
+        "    None,\n",
         ")\n",
         "if not llm_model:\n",
-        "    raise Exception('No LLM model found')\n",
+        "    raise RuntimeError(\"No LLM model found\")\n",
+        "\n",
         "model_id = llm_model.id\n",
-        "print(f'Using model: {model_id}\\n')\n",
+        "print(f\"Using model: {model_id}\\n\")\n",
+        "\n",
+        "WEATHER_INSTRUCTIONS = (\n",
+        "    \"You are a helpful weather assistant. When users ask about weather, use the weather tool to query and answer.\"\n",
+        ")\n",
         "\n",
         "agent = Agent(\n",
         "    client,\n",
         "    model=model_id,\n",
-        "    instructions='You are a helpful weather assistant. When users ask about weather, use the weather tool to query and answer.',\n",
-        "    tools=[get_weather],\n",
+        "    instructions=WEATHER_INSTRUCTIONS,\n",
+        "    tools=AGENT_TOOLS,\n",
         ")\n",
-        "print('Agent created successfully')"
+        "\n",
+        "print(\"Agent created successfully\")"
       ]
     },
     {
@@ -159,7 +374,9 @@
       "id": "90c28b81",
       "metadata": {},
       "source": [
-        "## 5. Run the Agent"
+        "### Run the Agent\n",
+        "\n",
+        "Same session and turn flow regardless of Option A or B.\n"
       ]
     },
     {
@@ -200,6 +417,8 @@
       "metadata": {},
       "outputs": [],
       "source": [
+        "from llama_stack_client.lib.agents.event_logger import AgentEventLogger\n",
+        "\n",
         "logger = AgentEventLogger()\n",
         "for printable in logger.log(response_stream):\n",
         "    print(printable, end='', flush=True)\n",
@@ -232,7 +451,6 @@
         "    stream=True,\n",
         ")\n",
         "\n",
-        "logger = AgentEventLogger()\n",
         "for printable in logger.log(response_stream):\n",
         "    print(printable, end='', flush=True)\n",
         "print('\\n')"
@@ -243,9 +461,9 @@
       "id": "6f8d31d0",
       "metadata": {},
       "source": [
-        "## 6. FastAPI Service Example\n",
+        "## 4. FastAPI Service Example\n",
         "\n",
-        "You can also run the agent as a FastAPI web service for production use. This allows you to expose the agent functionality via HTTP API endpoints."
+        "Expose the `llama-stack-client`-based `agent` as a FastAPI web service, so it can be called via HTTP.\n"
       ]
     },
     {
@@ -255,15 +473,17 @@
       "metadata": {},
       "outputs": [],
       "source": [
-        "# Import FastAPI components\n",
+        "import time\n",
         "from fastapi import FastAPI\n",
         "from pydantic import BaseModel\n",
         "from threading import Thread\n",
-        "import time\n",
+        "from llama_stack_client.lib.agents.event_logger import AgentEventLogger\n",
+        "\n",
         "\n",
         "# Create a simple FastAPI app\n",
         "api_app = FastAPI(title=\"Llama Stack Agent API\")\n",
         "\n",
+        "\n",
         "class ChatRequest(BaseModel):\n",
         "    message: str\n",
         "\n",
@@ -288,6 +508,7 @@
         "\n",
         "    return {\"response\": full_response}\n",
         "\n",
+        "\n",
         "print(\"FastAPI app created. Use the next cell to start the server.\")"
       ]
     },
@@ -346,6 +567,8 @@
       "metadata": {},
       "outputs": [],
       "source": [
+        "import requests\n",
+        "\n",
         "# Test the API endpoint\n",
         "response = requests.post(\n",
         "    \"http://127.0.0.1:8000/chat\",\n",
@@ -358,6 +581,16 @@
         "print(response.json().get('response'))"
       ]
     },
+    {
+      "cell_type": "markdown",
+      "id": "cleanup-all-md",
+      "metadata": {},
+      "source": [
+        "## Cleanup\n",
+        "\n",
+        "Run the cleanup cells when finished, especially if Option B was used, because it leaves an MCP server process running."
+      ]
+    },
     {
       "cell_type": "markdown",
       "id": "945a776f",
@@ -373,7 +606,7 @@
       "metadata": {},
       "outputs": [],
       "source": [
-        "# Stop the FastAPI server (section 6)\n",
+        "# Stop the FastAPI server\n",
         "if 'server' in globals() and server.started:\n",
         "    server.should_exit = True\n",
         "    print(\"✓ FastAPI server shutdown requested.\")\n",
@@ -381,12 +614,61 @@
         "    print(\"FastAPI server is not running or has already stopped.\")"
       ]
     },
+    {
+      "cell_type": "markdown",
+      "id": "9d557594",
+      "metadata": {},
+      "source": [
+        "### Stop the MCP server process"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "id": "a1679861",
+      "metadata": {},
+      "outputs": [],
+      "source": [
+        "import os\n",
+        "\n",
+        "stopped = False\n",
+        "\n",
+        "# The Option B launcher stores its subprocess.Popen handle in mcp_proc\n",
+        "if \"mcp_proc\" in globals() and mcp_proc and getattr(mcp_proc, \"poll\", None) and mcp_proc.poll() is None:\n",
+        "    try:\n",
+        "        mcp_proc.terminate()\n",
+        "        mcp_proc.wait(timeout=2)\n",
+        "        stopped = True\n",
+        "    except Exception:\n",
+        "        pass\n",
+        "\n",
+        "# Backward compatibility for older runs that used multiprocessing\n",
+        "if not stopped and \"mcp_process\" in globals() and getattr(mcp_process, \"is_alive\", None) and mcp_process.is_alive():\n",
+        "    try:\n",
+        "        mcp_process.terminate()\n",
+        "        mcp_process.join(timeout=2)\n",
+        "        stopped = True\n",
+        "    except Exception:\n",
+        "        pass\n",
+        "\n",
+        "if stopped:\n",
+        "    print(\"✓ MCP server process stopped.\")\n",
+        "else:\n",
+        "    print(\"MCP server process is not running or has already stopped.\")\n",
+        "\n",
+        "# Clear MCP runtime state for clean re-runs\n",
+        "os.environ.pop(\"MCP_SERVER_URL\", None)\n",
+        "if \"MCP_SERVER_URL\" in globals():\n",
+        "    del MCP_SERVER_URL\n",
+        "print(\"✓ Cleared MCP_SERVER_URL from env/state.\")\n"
+      ]
+    },
     {
       "cell_type": "markdown",
       "id": "a3ebed1f",
       "metadata": {},
       "source": [
-        "## 7. More Resources\n",
+        "## 5. More Resources\n",
         "\n",
         "For more resources on developing AI Agents with Llama Stack, see:\n",
         "\n",
diff --git a/docs/public/llama-stack/llama-stack_quickstart_mcp.ipynb b/docs/public/llama-stack/llama-stack_quickstart_mcp.ipynb
deleted file mode 100644
index 71d2be0..0000000
--- a/docs/public/llama-stack/llama-stack_quickstart_mcp.ipynb
+++ /dev/null
@@ -1,173 +0,0 @@
-{
-  "cells": [
-    {
-      "cell_type": "markdown",
-      "metadata": {},
-      "source": [
-        "# Llama Stack Quick Start — MCP Option (Optional)\n",
-        "\n",
-        "This notebook contains **Option B: MCP tool** only. Use it when the Llama Stack MCP adapter is ready. The main quickstart uses client-side tools only.\n",
-        "\n",
-        "**Prerequisites:** Same as the main quickstart (Section 1–2: install deps, import libs, define `get_weather` is not needed here). Run the **MCP server** below, then **connect and create the agent** with MCP tools. MCP tools are **invoked by the Llama Stack Server (llama-server)**; the MCP server URL must be reachable from where the server runs."
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "metadata": {},
-      "source": [
-        "## Option B: MCP tool\n",
-        "\n",
-        "Run an MCP server that exposes a weather query tool (same capability as the client-side `get_weather`, via MCP). This example uses **Streamable HTTP** (single `/mcp` endpoint; SSE is deprecated). The server is registered with Llama Stack in the next section. *Requires the Llama Stack Server to have `tool_runtime` with the `model-context-protocol` provider.*"
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "metadata": {},
-      "outputs": [],
-      "source": [
-        "# Start the MCP server in a separate process\n",
-        "import os\n",
-        "from multiprocessing import Process\n",
-        "\n",
-        "def _run_mcp_weather_server():\n",
-        "    import logging\n",
-        "    logging.basicConfig(level=logging.DEBUG, format='%(name)s %(levelname)s: %(message)s')\n",
-        "    logging.getLogger(\"mcp\").setLevel(logging.DEBUG)\n",
-        "    from urllib.parse import quote\n",
-        "    import requests\n",
-        "    from mcp.server.fastmcp import FastMCP\n",
-        "    mcp = FastMCP(\"demo-weather\", host=\"0.0.0.0\", port=8002)\n",
-        "    @mcp.tool()\n",
-        "    def get_weather_mcp(city: str) -> str:\n",
-        "        \"\"\"Get current weather information for a specified city.\n",
-        "\n",
-        "        Uses the wttr.in free weather API to fetch weather data.\n",
-        "\n",
-        "        :param city: City name, e.g., Beijing, Shanghai, Paris\n",
-        "        :returns: String with the city, temperature (°C) and humidity, or an error message if the lookup fails\n",
-        "        \"\"\"\n",
-        "        try:\n",
-        "            encoded_city = quote(city)\n",
-        "            url = f\"https://wttr.in/{encoded_city}?format=j1\"\n",
-        "            r = requests.get(url, timeout=10)\n",
-        "            r.raise_for_status()\n",
-        "            data = r.json()\n",
-        "            cur = data[\"current_condition\"][0]\n",
-        "            return f\"City: {city}, Temperature: {cur['temp_C']}°C, Humidity: {cur['humidity']}%\"\n",
-        "        except Exception as e:\n",
-        "            return f\"Error: {e}\"\n",
-        "    # streamable-http: single endpoint; use transport=\"sse\" and /sse if server only supports legacy SSE\n",
-        "    mcp.run(transport=\"streamable-http\")\n",
-        "\n",
-        "mcp_process = Process(target=_run_mcp_weather_server, daemon=True)\n",
-        "mcp_process.start()\n",
-        "import socket\n",
-        "# Prefer env so Llama Stack Server can reach this URL\n",
-        "MCP_SERVER_URL = os.getenv(\"MCP_SERVER_URL\")\n",
-        "if not MCP_SERVER_URL:\n",
-        "    _host = socket.gethostbyname(socket.gethostname())\n",
-        "    if _host.startswith(\"127.\"):\n",
-        "        _host = os.getenv(\"MCP_SERVER_HOST\", \"127.0.0.1\")\n",
-        "    MCP_SERVER_URL = f\"http://{_host}:8002/mcp\"\n",
-        "os.environ[\"MCP_SERVER_URL\"] = MCP_SERVER_URL\n",
-        "print(f\"✓ MCP server running at {MCP_SERVER_URL} (Streamable HTTP, tool: get_weather_mcp, bind 0.0.0.0:8002)\")"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "metadata": {},
-      "source": [
-        "## Connect to Server and Create Agent (MCP tools)\n",
-        "\n",
-        "Register the MCP tool group and create an agent that uses it."
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "metadata": {},
-      "outputs": [],
-      "source": [
-        "import os  # already imported in the MCP server cell; repeated so this cell runs on its own\n",
-        "from llama_stack_client import LlamaStackClient, Agent\n",
-        "from llama_stack_client.lib.agents.event_logger import AgentEventLogger\n",
-        "\n",
-        "base_url = os.getenv('LLAMA_STACK_URL', 'http://localhost:8321')\n",
-        "client = LlamaStackClient(base_url=base_url)\n",
-        "\n",
-        "models = client.models.list()\n",
-        "llm_model = next(\n",
-        "    (m for m in models\n",
-        "        if m.custom_metadata and m.custom_metadata.get('model_type') == 'llm'),\n",
-        "    None\n",
-        ")\n",
-        "if not llm_model:\n",
-        "    raise RuntimeError('No LLM model found')\n",
-        "model_id = llm_model.id\n",
-        "\n",
-        "MCP_TOOLGROUP_ID = \"mcp::demo-weather\"\n",
-        "mcp_server_url = os.getenv(\"MCP_SERVER_URL\", \"http://127.0.0.1:8002/mcp\")\n",
-        "client.toolgroups.register(\n",
-        "    toolgroup_id=MCP_TOOLGROUP_ID,\n",
-        "    provider_id=\"model-context-protocol\",\n",
-        "    mcp_endpoint={\"uri\": mcp_server_url},\n",
-        ")\n",
-        "agent_tools = [{\"type\": \"mcp\", \"server_label\": MCP_TOOLGROUP_ID, \"server_url\": mcp_server_url}]\n",
-        "\n",
-        "agent = Agent(\n",
-        "    client,\n",
-        "    model=model_id,\n",
-        "    instructions='You are a helpful weather assistant. When users ask about weather, use the weather tool to query and answer.',\n",
-        "    tools=agent_tools,\n",
-        ")\n",
-        "print('Agent created with MCP weather tool')"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "metadata": {},
-      "source": [
-        "### Troubleshooting (MCP / 400 error)\n",
-        "\n",
-        "If you see **400 - messages[3]: invalid type: sequence, expected a string**, the inference backend probably expects message `content` to be a string, while the server may send tool-turn content as an array. This is a message-format incompatibility between the server and the backend; it is **not caused by the choice of SSE or Streamable HTTP**. You can:\n",
-        "- Use the main quickstart with **client-side tool** (Option A) instead, or\n",
-        "- Use **stdio** for MCP (configure the server's `tool_runtime` with `command`/`args` so the server spawns the MCP process; no HTTP URL needed), or\n",
-        "- Check your Llama Stack Server and inference backend docs for tool message format compatibility."
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "metadata": {},
-      "source": [
-        "### Stop the MCP server"
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "metadata": {},
-      "outputs": [],
-      "source": [
-        "if 'mcp_process' in globals() and mcp_process.is_alive():\n",
-        "    mcp_process.terminate()\n",
-        "    mcp_process.join(timeout=2)\n",
-        "    print(\"✓ MCP server process stopped.\")\n",
-        "else:\n",
-        "    print(\"MCP server process is not running or has already stopped.\")"
-      ]
-    }
-  ],
-  "metadata": {
-    "kernelspec": {
-      "display_name": "Python (llama-stack-demo)",
-      "language": "python",
-      "name": "llama-stack-demo"
-    },
-    "language_info": {
-      "name": "python",
-      "version": "3.12.11"
-    }
-  },
-  "nbformat": 4,
-  "nbformat_minor": 5
-}