diff --git a/docs/en/llama_stack/install.mdx b/docs/en/llama_stack/install.mdx index 735e59c..60a709d 100644 --- a/docs/en/llama_stack/install.mdx +++ b/docs/en/llama_stack/install.mdx @@ -31,7 +31,8 @@ violet push --platform-address=platform-access-address --platform-username=platf After the operator is installed, deploy Llama Stack Server by creating a `LlamaStackDistribution` custom resource: > **Note:** Prepare the following in advance; otherwise the distribution may not become ready: -> - **Secret**: Create a Secret (e.g., `deepseek-api`) in the same namespace with the LLM API token. Example: `kubectl create secret generic deepseek-api -n default --from-literal=token=`. +> - **Inference URL**: `VLLM_URL` must point at a **vLLM OpenAI-compatible** HTTP base URL (for example an in-cluster vLLM or KServe InferenceService) that serves the target model. +> - **Secret (optional)**: `VLLM_API_TOKEN` is only needed when the vLLM endpoint requires authentication. If vLLM has no auth, do not set it. When required, create a Secret in the same namespace and reference it from `containerSpec.env` (see the commented example in the manifest below). > - **Storage Class**: Ensure the `default` Storage Class exists in the cluster; otherwise the PVC cannot be bound and the resource will not become ready. ```yaml @@ -48,23 +49,28 @@ spec: replicas: 1 # Number of server replicas server: containerSpec: + name: llama-stack + port: 8321 env: - name: VLLM_URL - value: "https://api.deepseek.com/v1" # URL of the LLM API provider + value: "http://vllm-predictor.default.svc.cluster.local/v1" # vLLM OpenAI-compatible base URL - name: VLLM_MAX_TOKENS value: "8192" # Maximum output tokens - - name: VLLM_API_TOKEN # Load LLM API token from secret - valueFrom: - secretKeyRef: # Create this Secret in the same namespace beforehand, e.g. 
kubectl create secret generic deepseek-api -n default --from-literal=token= - key: token - name: deepseek-api - name: llama-stack - port: 8321 + + # Optional: VLLM_API_TOKEN — add only when the vLLM endpoint requires authentication. + # If vLLM is deployed without auth, omit the entire block below (do not set VLLM_API_TOKEN). + # Example after creating: kubectl create secret generic vllm-api-token -n default --from-literal=token= + # - name: VLLM_API_TOKEN + # valueFrom: + # secretKeyRef: + # key: token + # name: vllm-api-token + distribution: name: starter # Distribution name (options: starter, postgres-demo, meta-reference-gpu) storage: mountPath: /home/lls/.lls - size: 20Gi # Requires the "default" Storage Class to be configured beforehand + size: 1Gi # Requires the "default" Storage Class to be configured beforehand ``` After deployment, the Llama Stack Server will be available within the cluster. The access URL is displayed in `status.serviceURL`, for example: @@ -74,3 +80,16 @@ status: phase: Ready serviceURL: http://demo-service.default.svc.cluster.local:8321 ``` + +## Tool calling with vLLM on KServe + +The following applies to the **vLLM predictor** on KServe, not to the `LlamaStackDistribution` manifest. For agent flows that use **tools** (client-side tools or MCP), the vLLM process must expose tool-call support. Add predictor container `args` as required by upstream vLLM, for example: + +```yaml +args: + - --enable-auto-tool-choice + - --tool-call-parser + - hermes +``` + +Choose `--tool-call-parser` (and any related flags) according to the **served model** and the vLLM documentation for that model family. 
diff --git a/docs/en/llama_stack/quickstart.mdx b/docs/en/llama_stack/quickstart.mdx index f9ada4c..5c5af90 100644 --- a/docs/en/llama_stack/quickstart.mdx +++ b/docs/en/llama_stack/quickstart.mdx @@ -9,10 +9,9 @@ This section provides a quickstart example for creating an AI Agent with Llama S ## Prerequisites - Python 3.12 or higher (if not satisfied, refer to [FAQ: How to prepare Python 3.12 in Notebook](#how-to-prepare-python-312-in-notebook)) -- Llama Stack Server installed and running via Operator (see [Install Llama Stack](./install)) +- Llama Stack Server installed and running via Operator (see [Install Llama Stack](./install)), with **`VLLM_URL` pointing at a vLLM-served model endpoint** (see install notes) - Access to a Notebook environment (e.g., Jupyter Notebook, JupyterLab) -- Python environment with `llama-stack-client` and required dependencies installed -- API key for the LLM provider (e.g., DeepSeek API key) +- Python environment with `llama-stack-client`, `fastmcp` (for the MCP section), and other notebook dependencies installed ## Quickstart Example @@ -24,13 +23,10 @@ Download the notebook and upload it to a Notebook environment to run. 
The notebook demonstrates: -- Connecting to Llama Stack Server and client setup -- Tool definition using the `@client_tool` decorator (weather query tool example) -- Client connection to Llama Stack Server -- Model selection and Agent creation with tools and instructions -- Agent execution with session management and streaming responses -- Result handling and display -- Optional FastAPI deployment example +- **Two tool options:** client-side tools (`@client_tool`) and MCP tools (FastMCP + `toolgroups.register`) +- **Shared agent flow:** connect to Llama Stack Server, select a model, create an `Agent` with `tools=AGENT_TOOLS`, then run sessions and streaming turns +- Streaming responses and event logging +- Optional FastAPI deployment of the `agent` ## FAQ diff --git a/docs/public/llama-stack/llama-stack_quickstart.ipynb b/docs/public/llama-stack/llama-stack_quickstart.ipynb index 0339c3b..e7ef89a 100644 --- a/docs/public/llama-stack/llama-stack_quickstart.ipynb +++ b/docs/public/llama-stack/llama-stack_quickstart.ipynb @@ -7,7 +7,25 @@ "source": [ "# Llama Stack Quick Start Demo\n", "\n", - "This notebook demonstrates how to use Llama Stack to run an agent with **client-side tools**." + "This notebook demonstrates how to use Llama Stack to run an agent with tools in two ways:\n", + "\n", + "- **Option A (section 2):** define a **client-side** weather tool with `@client_tool`; the cell sets **`AGENT_TOOLS`**.\n", + "- **Option B (section 2):** run an **MCP** weather tool with **FastMCP** and register it with the server; the register cell sets **`AGENT_TOOLS`**.\n", + "- **Section 3** uses the **same** connect / model selection / `Agent` construction / run flow for both options. 
The only difference is the value of **`AGENT_TOOLS`** passed into `Agent`.\n", + "\n", + "### Inference backend (`LlamaStackDistribution`)\n", + "\n", + "- **`VLLM_URL`** should point at a **vLLM OpenAI-compatible** HTTP API for the model in use.\n", + "- For **vLLM on KServe**, enable tool calling on the vLLM container by adding extra args, for example:\n", + "\n", + "```yaml\n", + "args:\n", + " - --enable-auto-tool-choice\n", + " - --tool-call-parser\n", + " - # set from vLLM documentation for the deployed model\n", + "```\n", + "\n", + "**MCP prerequisites:** The server distribution must configure the **tool runtime** (the subsystem that executes tool calls for agents) to include the **`model-context-protocol`** provider so MCP tools can be invoked. The MCP URL must be reachable **from the server** (not only from the notebook).\n" ] }, { @@ -17,7 +35,7 @@ "source": [ "## 1. Install Dependencies\n", "\n", - "**Note:** `llama-stack-client` requires Python 3.12 or higher. If your Python version does not meet this requirement, refer to the FAQ section in the documentation: **How to prepare Python 3.12 in Notebook**." + "**Note:** `llama-stack-client` requires Python 3.12 or higher. If your Python version does not meet this requirement, refer to the FAQ section in the documentation: **How to prepare Python 3.12 in Notebook**.\n" ] }, { @@ -30,96 +48,286 @@ "# Use current kernel's Python so PATH does not point to another env\n", "# If download is slow, add: -i https://pypi.tuna.tsinghua.edu.cn/simple\n", "import sys\n", - "!{sys.executable} -m pip install \"llama-stack-client>=0.4\" \"requests\" \"fastapi\" \"uvicorn\" --target ~/packages" + "!{sys.executable} -m pip install \"llama-stack-client>=0.4\" \"requests\" \"fastapi\" \"uvicorn\" \"fastmcp\"" ] }, { "cell_type": "markdown", - "id": "9d942699", + "id": "baabf4fc", "metadata": {}, "source": [ - "## 2. Import Libraries" + "\n", + "## 2. 
Define Tools\n", + "\n", + "### Create Llama Stack Client" ] }, { "cell_type": "code", "execution_count": null, - "id": "cfd65276", + "id": "lls-client-init", "metadata": {}, "outputs": [], "source": [ - "import sys\n", - "from pathlib import Path\n", + "import os\n", + "from llama_stack_client import LlamaStackClient\n", + "\n", + "# Set LLAMA_STACK_URL to the actual Llama Stack Server URL (cluster Service/Route or port-forward).\n", + "# The default below only works when the server is reachable at localhost:8321.\n", + "base_url = os.getenv(\"LLAMA_STACK_URL\", \"http://localhost:8321\")\n", + "client = LlamaStackClient(base_url=base_url)\n", + "print(f\"Llama Stack client created (LLAMA_STACK_URL={base_url})\")\n" + ] + }, + { + "cell_type": "markdown", + "id": "8d42246c", + "metadata": {}, + "source": [ + "### Select Tool Option\n", + "Set `TOOL_OPTION` first to control which tool path section 3 will use.\n", "\n", - "user_site_packages = Path.home() / \"packages\"\n", - "if str(user_site_packages) not in sys.path:\n", - " sys.path.insert(0, str(user_site_packages))\n", + "- `A`: client-side tool via `@client_tool`\n", + "- `B`: MCP tool via FastMCP + toolgroup registration\n", "\n", - "import os\n", + "Then run the corresponding setup cells below.\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "tool-option-selector", + "metadata": {}, + "outputs": [], + "source": [ + "# Choose one: \"A\" (client tool) or \"B\" (MCP)\n", + "TOOL_OPTION = \"A\"\n", + "print(f\"Selected TOOL_OPTION={TOOL_OPTION}\")" + ] + }, + { + "cell_type": "markdown", + "id": "optA-title-md", + "metadata": {}, + "source": [ + "### Option A: Define client-side tool\n", + "\n", + "Run this cell only when `TOOL_OPTION = \"A\"`. 
It defines `get_weather` and sets `AGENT_TOOLS` for the shared flow in section 3.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c57f95e5", + "metadata": {}, + "outputs": [], + "source": [ "import requests\n", "from typing import Dict, Any\n", "from urllib.parse import quote\n", - "from llama_stack_client import LlamaStackClient, Agent\n", "from llama_stack_client.lib.agents.client_tool import client_tool\n", - "from llama_stack_client.lib.agents.event_logger import AgentEventLogger\n", "\n", - "print('Libraries imported successfully')" + "\n", + "if globals().get(\"TOOL_OPTION\") != \"A\":\n", + " print('Skip Option A setup (TOOL_OPTION != \"A\")')\n", + "else:\n", + " @client_tool\n", + " def get_weather(city: str) -> Dict[str, Any]:\n", + " \"\"\"Get current weather information for a specified city.\n", + "\n", + " Uses the wttr.in free weather API to fetch weather data.\n", + "\n", + " :param city: City name, e.g., Beijing, Shanghai, Paris\n", + " :returns: Dictionary containing weather information including city, temperature and humidity\n", + " \"\"\"\n", + " try:\n", + " encoded_city = quote(city)\n", + " url = f'https://wttr.in/{encoded_city}?format=j1'\n", + " response = requests.get(url, timeout=10)\n", + " response.raise_for_status()\n", + " data = response.json()\n", + "\n", + " current = data['current_condition'][0]\n", + " return {\n", + " 'city': city,\n", + " 'temperature': f\"{current['temp_C']}°C\",\n", + " 'humidity': f\"{current['humidity']}%\",\n", + " }\n", + " except Exception as e:\n", + " return {'error': f'Failed to get weather information: {str(e)}'}\n", + "\n", + " AGENT_TOOLS = [get_weather]\n", + " print('Option A: AGENT_TOOLS = [get_weather]')\n" ] }, { "cell_type": "markdown", - "id": "baabf4fc", + "id": "1614c719", "metadata": {}, "source": [ - "## 3. Define Tools\n", + "### Option B: MCP tool (FastMCP)\n", "\n", - "Use the `@client_tool` decorator to define a weather query tool." 
+ "Start an MCP server with the **`fastmcp`** package: Streamable HTTP on port **8002**, tool `get_weather_mcp`. Tools are executed by **llama-server** against this URL.\n", + "\n", + "Set **`MCP_SERVER_URL`** if the Llama Stack Server runs elsewhere (e.g. in-cluster): the URL must be reachable from that server. If unset, the notebook derives a LAN IP for port **8002**.\n" ] }, { "cell_type": "code", "execution_count": null, - "id": "c57f95e5", + "id": "e4da207a", "metadata": {}, "outputs": [], "source": [ - "@client_tool\n", - "def get_weather(city: str) -> Dict[str, Any]:\n", - " \"\"\"Get current weather information for a specified city.\n", + "if globals().get(\"TOOL_OPTION\") != \"B\":\n", + " print('Skip Option B MCP server start (TOOL_OPTION != \"B\")')\n", + "else:\n", + " import os\n", + " import socket\n", + " import sys\n", + " import time\n", + " from pathlib import Path\n", + " from subprocess import Popen\n", + "\n", + " # Start the MCP server in a separate Python process.\n", + " # This avoids multiprocessing pickle/spawn issues on Windows/macOS.\n", + " server_script = r'''\n", + "from urllib.parse import quote\n", + "\n", + "import requests\n", + "from fastmcp import FastMCP\n", + "\n", + "\n", + "mcp = FastMCP(\"demo-weather\")\n", "\n", - " Uses the wttr.in free weather API to fetch weather data.\n", "\n", - " :param city: City name, e.g., Beijing, Shanghai, Paris\n", - " :returns: Dictionary containing weather information including city, temperature and humidity\n", - " \"\"\"\n", + "@mcp.tool()\n", + "def get_weather_mcp(city: str) -> dict:\n", + " \"\"\"Get current weather for a city (wttr.in).\"\"\"\n", " try:\n", - " # URL encode the city name to handle spaces and special characters\n", " encoded_city = quote(city)\n", - " url = f'https://wttr.in/{encoded_city}?format=j1'\n", - " response = requests.get(url, timeout=10)\n", - " response.raise_for_status()\n", - " data = response.json()\n", - "\n", - " current = 
data['current_condition'][0]\n", + " url = f\"https://wttr.in/{encoded_city}?format=j1\"\n", + " r = requests.get(url, timeout=10)\n", + " r.raise_for_status()\n", + " data = r.json()\n", + " cur = data[\"current_condition\"][0]\n", " return {\n", - " 'city': city,\n", - " 'temperature': f\"{current['temp_C']}°C\",\n", - " 'humidity': f\"{current['humidity']}%\",\n", + " \"city\": city,\n", + " \"temperature_c\": cur[\"temp_C\"],\n", + " \"humidity\": cur[\"humidity\"],\n", " }\n", " except Exception as e:\n", - " return {'error': f'Failed to get weather information: {str(e)}'}\n", + " return {\"error\": str(e)}\n", + "\n", + "\n", + "if __name__ == \"__main__\":\n", + " mcp.run(transport=\"streamable-http\", host=\"0.0.0.0\", port=8002)\n", + "'''\n", + "\n", + " script_path = Path(\"/tmp/fastmcp_weather_server.py\")\n", + " script_path.write_text(server_script, encoding=\"utf-8\")\n", + "\n", + " # Best-effort stop existing process when re-running\n", + " if \"mcp_proc\" in globals() and mcp_proc and getattr(mcp_proc, \"poll\", None) and mcp_proc.poll() is None:\n", + " try:\n", + " mcp_proc.terminate()\n", + " mcp_proc.wait(timeout=2)\n", + " except Exception:\n", + " pass\n", + "\n", + " mcp_proc = Popen([sys.executable, str(script_path)], env=os.environ.copy())\n", + "\n", + " # Readiness: wait for local port to accept connections (no fixed sleep)\n", + " deadline = time.time() + 20\n", + " last_err = None\n", + " while time.time() < deadline:\n", + " try:\n", + " with socket.create_connection((\"127.0.0.1\", 8002), timeout=1):\n", + " last_err = None\n", + " break\n", + " except Exception as e:\n", + " last_err = e\n", + " time.sleep(0.25)\n", + " if last_err is not None:\n", + " raise RuntimeError(f\"MCP server did not become ready on 127.0.0.1:8002: {last_err}\")\n", + "\n", + " MCP_SERVER_URL = os.getenv(\"MCP_SERVER_URL\")\n", + " if not MCP_SERVER_URL:\n", + " _host = socket.gethostbyname(socket.gethostname())\n", + " if _host.startswith(\"127.\"):\n", + 
" _host = os.getenv(\"MCP_SERVER_HOST\", \"127.0.0.1\")\n", + " MCP_SERVER_URL = f\"http://{_host}:8002/mcp\"\n", + "\n", + " os.environ[\"MCP_SERVER_URL\"] = MCP_SERVER_URL\n", + " print(f\"✓ MCP (FastMCP) at {MCP_SERVER_URL} — tool get_weather_mcp\")\n" + ] + }, + { + "cell_type": "markdown", + "id": "3b9ea887", + "metadata": {}, + "source": [ + "### Option B: Register MCP tool group\n", + "\n", + "Uses `toolgroups.register` with `provider_id=\"model-context-protocol\"`. A **timestamp** is appended to the tool group id so re-running the cell avoids duplicate-id errors. This cell sets **`AGENT_TOOLS`** for the MCP path.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "75ffadf0", + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "import time\n", + "\n", + "\n", + "if globals().get(\"TOOL_OPTION\") != \"B\":\n", + " print('Skip Option B registration (TOOL_OPTION != \"B\")')\n", + "else:\n", + "\n", + " mcp_server_url = os.getenv(\"MCP_SERVER_URL\")\n", + " if not mcp_server_url:\n", + " raise RuntimeError(\"MCP_SERVER_URL is not set. Run Option B MCP server setup first, or export MCP_SERVER_URL.\")\n", + "\n", + " toolgroup_id = f\"mcp::demo-weather-{int(time.time())}\"\n", + " client.toolgroups.register(\n", + " toolgroup_id=toolgroup_id,\n", + " provider_id=\"model-context-protocol\",\n", + " mcp_endpoint={\"uri\": mcp_server_url},\n", + " )\n", + "\n", + " AGENT_TOOLS = [\n", + " {\n", + " \"type\": \"mcp\",\n", + " \"server_label\": toolgroup_id,\n", + " \"server_url\": mcp_server_url,\n", + " }\n", + " ]\n", + " print(\"Option B: AGENT_TOOLS configured for MCP\")\n" + ] + }, + { + "cell_type": "markdown", + "id": "fed3605b", + "metadata": {}, + "source": [ + "### Troubleshooting (MCP / tool calling)\n", "\n", - "print('Weather tool defined successfully')" + "- **400 / message `content` type errors:** Some inference backends expect string `content` while tool turns use structured content. 
This is a **server–backend compatibility** issue; **vLLM** with `--enable-auto-tool-choice` and a matching `--tool-call-parser` is the supported path for tools here.\n", + "- **Alternative:** Prefer **Option A** (client-side tools) if MCP HTTP is not reachable from llama-server.\n" ] }, { "cell_type": "markdown", - "id": "05cefded", + "id": "sec4-md", "metadata": {}, "source": [ - "## 4. Connect to Server and Create Agent\n", + "## 3. Connect, Create Agent, and Run\n", "\n", - "Use LlamaStackClient to connect to the running server, create an Agent with the client-side weather tool, and execute tool calls." + "Shared flow for both options. This section uses `TOOL_OPTION` + `AGENT_TOOLS` prepared in section 2.\n" ] }, { @@ -129,29 +337,36 @@ "metadata": {}, "outputs": [], "source": [ - "base_url = os.getenv('LLAMA_STACK_URL', 'http://localhost:8321')\n", - "print(f'Connecting to Server: {base_url}')\n", + "import os\n", + "from llama_stack_client import Agent\n", "\n", - "client = LlamaStackClient(base_url=base_url)\n", + "\n", + "if \"AGENT_TOOLS\" not in globals():\n", + " raise RuntimeError(\"AGENT_TOOLS is missing. Run the matching setup cell(s) in section 2 for the selected TOOL_OPTION.\")\n", "\n", "models = client.models.list()\n", "llm_model = next(\n", - " (m for m in models\n", - " if m.custom_metadata and m.custom_metadata.get('model_type') == 'llm'),\n", - " None\n", + " (m for m in models if m.custom_metadata and m.custom_metadata.get(\"model_type\") == \"llm\"),\n", + " None,\n", ")\n", "if not llm_model:\n", - " raise Exception('No LLM model found')\n", + " raise RuntimeError(\"No LLM model found\")\n", + "\n", "model_id = llm_model.id\n", - "print(f'Using model: {model_id}\\n')\n", + "print(f\"Using model: {model_id}\\n\")\n", + "\n", + "WEATHER_INSTRUCTIONS = (\n", + " \"You are a helpful weather assistant. 
When users ask about weather, use the weather tool to query and answer.\"\n", + ")\n", "\n", "agent = Agent(\n", " client,\n", " model=model_id,\n", - " instructions='You are a helpful weather assistant. When users ask about weather, use the weather tool to query and answer.',\n", - " tools=[get_weather],\n", + " instructions=WEATHER_INSTRUCTIONS,\n", + " tools=AGENT_TOOLS,\n", ")\n", - "print('Agent created successfully')" + "\n", + "print(\"Agent created successfully\")" ] }, { @@ -159,7 +374,9 @@ "id": "90c28b81", "metadata": {}, "source": [ - "## 5. Run the Agent" + "### Run the Agent\n", + "\n", + "Same session and turn flow regardless of Option A or B.\n" ] }, { @@ -200,6 +417,8 @@ "metadata": {}, "outputs": [], "source": [ + "from llama_stack_client.lib.agents.event_logger import AgentEventLogger\n", + "\n", "logger = AgentEventLogger()\n", "for printable in logger.log(response_stream):\n", " print(printable, end='', flush=True)\n", @@ -232,7 +451,6 @@ " stream=True,\n", ")\n", "\n", - "logger = AgentEventLogger()\n", "for printable in logger.log(response_stream):\n", " print(printable, end='', flush=True)\n", "print('\\n')" @@ -243,9 +461,9 @@ "id": "6f8d31d0", "metadata": {}, "source": [ - "## 6. FastAPI Service Example\n", + "## 4. FastAPI Service Example\n", "\n", - "You can also run the agent as a FastAPI web service for production use. This allows you to expose the agent functionality via HTTP API endpoints." 
+ "Expose the `llama-stack-client`-based `agent` as a FastAPI web service, so it can be called via HTTP.\n" ] }, { @@ -255,15 +473,17 @@ "metadata": {}, "outputs": [], "source": [ - "# Import FastAPI components\n", + "import time\n", "from fastapi import FastAPI\n", "from pydantic import BaseModel\n", "from threading import Thread\n", - "import time\n", + "from llama_stack_client.lib.agents.event_logger import AgentEventLogger\n", + "\n", "\n", "# Create a simple FastAPI app\n", "api_app = FastAPI(title=\"Llama Stack Agent API\")\n", "\n", + "\n", "class ChatRequest(BaseModel):\n", " message: str\n", "\n", @@ -288,6 +508,7 @@ "\n", " return {\"response\": full_response}\n", "\n", + "\n", "print(\"FastAPI app created. Use the next cell to start the server.\")" ] }, @@ -346,6 +567,8 @@ "metadata": {}, "outputs": [], "source": [ + "import requests\n", + "\n", "# Test the API endpoint\n", "response = requests.post(\n", " \"http://127.0.0.1:8000/chat\",\n", @@ -358,6 +581,16 @@ "print(response.json().get('response'))" ] }, + { + "cell_type": "markdown", + "id": "cleanup-all-md", + "metadata": {}, + "source": [ + "## Cleanup\n", + "\n", + "Run cleanup cells when finished (especially if Option B was used)." 
+ ] + }, { "cell_type": "markdown", "id": "945a776f", @@ -373,7 +606,7 @@ "metadata": {}, "outputs": [], "source": [ - "# Stop the FastAPI server (section 6)\n", + "# Stop the FastAPI server\n", "if 'server' in globals() and server.started:\n", " server.should_exit = True\n", " print(\"✓ FastAPI server shutdown requested.\")\n", @@ -381,12 +614,61 @@ " print(\"FastAPI server is not running or has already stopped.\")" ] }, + { + "cell_type": "markdown", + "id": "9d557594", + "metadata": {}, + "source": [ + "### Stop the MCP server process" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a1679861", + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "\n", + "stopped = False\n", + "\n", + "# New launcher uses subprocess.Popen stored in mcp_proc\n", + "if \"mcp_proc\" in globals() and mcp_proc and getattr(mcp_proc, \"poll\", None) and mcp_proc.poll() is None:\n", + " try:\n", + " mcp_proc.terminate()\n", + " mcp_proc.wait(timeout=2)\n", + " stopped = True\n", + " except Exception:\n", + " pass\n", + "\n", + "# Backward compatibility for older runs that used multiprocessing\n", + "if not stopped and \"mcp_process\" in globals() and getattr(mcp_process, \"is_alive\", None) and mcp_process.is_alive():\n", + " try:\n", + " mcp_process.terminate()\n", + " mcp_process.join(timeout=2)\n", + " stopped = True\n", + " except Exception:\n", + " pass\n", + "\n", + "if stopped:\n", + " print(\"✓ MCP server process stopped.\")\n", + "else:\n", + " print(\"MCP server process is not running or has already stopped.\")\n", + "\n", + "# Clear MCP runtime state for clean re-runs\n", + "os.environ.pop(\"MCP_SERVER_URL\", None)\n", + "if \"MCP_SERVER_URL\" in globals():\n", + " del MCP_SERVER_URL\n", + "print(\"✓ Cleared MCP_SERVER_URL from env/state.\")\n" + ] + }, { "cell_type": "markdown", "id": "a3ebed1f", "metadata": {}, "source": [ - "## 7. More Resources\n", + "## 5. 
More Resources\n", "\n", "For more resources on developing AI Agents with Llama Stack, see:\n", "\n", diff --git a/docs/public/llama-stack/llama-stack_quickstart_mcp.ipynb b/docs/public/llama-stack/llama-stack_quickstart_mcp.ipynb deleted file mode 100644 index 71d2be0..0000000 --- a/docs/public/llama-stack/llama-stack_quickstart_mcp.ipynb +++ /dev/null @@ -1,173 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Llama Stack Quick Start — MCP Option (Optional)\n", - "\n", - "This notebook contains **Option B: MCP tool** only. Use it when the Llama Stack MCP adapter is ready. The main quickstart uses client-side tools only.\n", - "\n", - "**Prerequisites:** Same as the main quickstart (Section 1–2: install deps, import libs, define `get_weather` is not needed here). Run the **MCP server** below, then **connect and create the agent** with MCP tools. MCP tools are **invoked by the Llama Stack Server (llama-server)**; the MCP server URL must be reachable from where the server runs." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Option B: MCP tool\n", - "\n", - "Run an MCP server that exposes a weather query tool (same capability as the client-side `get_weather`, via MCP). This example uses **Streamable HTTP** (single `/mcp` endpoint; SSE is deprecated). The server is registered with Llama Stack in the next section. 
*Requires the Llama Stack Server to have `tool_runtime` with the `model-context-protocol` provider.*" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Start the MCP server in a separate process\n", - "import os\n", - "from multiprocessing import Process\n", - "\n", - "def _run_mcp_weather_server():\n", - " import logging\n", - " logging.basicConfig(level=logging.DEBUG, format='%(name)s %(levelname)s: %(message)s')\n", - " logging.getLogger(\"mcp\").setLevel(logging.DEBUG)\n", - " from urllib.parse import quote\n", - " import requests\n", - " from mcp.server.fastmcp import FastMCP\n", - " mcp = FastMCP(\"demo-weather\", host=\"0.0.0.0\", port=8002)\n", - " @mcp.tool()\n", - " def get_weather_mcp(city: str) -> str:\n", - " \"\"\"Get current weather information for a specified city.\n", - "\n", - " Uses the wttr.in free weather API to fetch weather data.\n", - "\n", - " :param city: City name, e.g., Beijing, Shanghai, Paris\n", - " :returns: Dictionary containing weather information including city, temperature and humidity\n", - " \"\"\"\n", - " try:\n", - " encoded_city = quote(city)\n", - " url = f\"https://wttr.in/{encoded_city}?format=j1\"\n", - " r = requests.get(url, timeout=10)\n", - " r.raise_for_status()\n", - " data = r.json()\n", - " cur = data[\"current_condition\"][0]\n", - " return f\"City: {city}, Temperature: {cur['temp_C']}°C, Humidity: {cur['humidity']}%\"\n", - " except Exception as e:\n", - " return f\"Error: {e}\"\n", - " # streamable-http: single endpoint; use transport=\"sse\" and /sse if server only supports legacy SSE\n", - " mcp.run(transport=\"streamable-http\")\n", - "\n", - "mcp_process = Process(target=_run_mcp_weather_server, daemon=True)\n", - "mcp_process.start()\n", - "import socket\n", - "# Prefer env so Llama Stack Server can reach this URL\n", - "MCP_SERVER_URL = os.getenv(\"MCP_SERVER_URL\")\n", - "if not MCP_SERVER_URL:\n", - " _host = 
socket.gethostbyname(socket.gethostname())\n", - " if _host.startswith(\"127.\"):\n", - " _host = os.getenv(\"MCP_SERVER_HOST\", \"127.0.0.1\")\n", - " MCP_SERVER_URL = f\"http://{_host}:8002/mcp\"\n", - "os.environ[\"MCP_SERVER_URL\"] = MCP_SERVER_URL\n", - "print(f\"✓ MCP server running at {MCP_SERVER_URL} (Streamable HTTP, tool: get_weather_mcp, bind 0.0.0.0:8002)\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Connect to Server and Create Agent (MCP tools)\n", - "\n", - "Register the MCP tool group and create an agent that uses it." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from llama_stack_client import LlamaStackClient, Agent\n", - "from llama_stack_client.lib.agents.event_logger import AgentEventLogger\n", - "\n", - "base_url = os.getenv('LLAMA_STACK_URL', 'http://localhost:8321')\n", - "client = LlamaStackClient(base_url=base_url)\n", - "\n", - "models = client.models.list()\n", - "llm_model = next(\n", - " (m for m in models\n", - " if m.custom_metadata and m.custom_metadata.get('model_type') == 'llm'),\n", - " None\n", - ")\n", - "if not llm_model:\n", - " raise Exception('No LLM model found')\n", - "model_id = llm_model.id\n", - "\n", - "MCP_TOOLGROUP_ID = \"mcp::demo-weather\"\n", - "mcp_server_url = os.getenv(\"MCP_SERVER_URL\", \"http://127.0.0.1:8002/mcp\")\n", - "client.toolgroups.register(\n", - " toolgroup_id=MCP_TOOLGROUP_ID,\n", - " provider_id=\"model-context-protocol\",\n", - " mcp_endpoint={\"uri\": mcp_server_url},\n", - ")\n", - "agent_tools = [{\"type\": \"mcp\", \"server_label\": MCP_TOOLGROUP_ID, \"server_url\": mcp_server_url}]\n", - "\n", - "agent = Agent(\n", - " client,\n", - " model=model_id,\n", - " instructions='You are a helpful weather assistant. 
When users ask about weather, use the weather tool to query and answer.',\n", - " tools=agent_tools,\n", - ")\n", - "print('Agent created with MCP weather tool')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Troubleshooting (MCP / 400 error)\n", - "\n", - "If you see **400 - messages[3]: invalid type: sequence, expected a string**: the inference backend often expects message `content` to be a string, but the server may send tool-turn content as an array. This is a message-format compatibility issue between the server and the backend, **not caused by SSE/Streamable HTTP**. You can:\n", - "- Use the main quickstart with **client-side tool** (Option A) instead, or\n", - "- Use **stdio** for MCP (configure the server's `tool_runtime` with `command`/`args` so the server spawns the MCP process; no HTTP URL needed), or\n", - "- Check your Llama Stack Server and inference backend docs for tool message format compatibility." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Stop the MCP server" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "if 'mcp_process' in globals() and mcp_process.is_alive():\n", - " mcp_process.terminate()\n", - " mcp_process.join(timeout=2)\n", - " print(\"✓ MCP server process stopped.\")\n", - "else:\n", - " print(\"MCP server process is not running or has already stopped.\")" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python (llama-stack-demo)", - "language": "python", - "name": "llama-stack-demo" - }, - "language_info": { - "name": "python", - "version": "3.12.11" - } - }, - "nbformat": 4, - "nbformat_minor": 5 -}
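The merged quickstart notebook replaces the old fire-and-forget `multiprocessing` launch with a subprocess plus a readiness loop that polls port 8002. That loop can be factored into a small reusable helper, sketched below under a few assumptions: the name `wait_for_port` and its parameters are hypothetical, and `time.monotonic()` is swapped in for the notebook's `time.time()` so that wall-clock adjustments cannot shorten or extend the deadline.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 20.0, interval: float = 0.25) -> bool:
    """Poll until a TCP connection to host:port succeeds or the timeout expires.

    Returns True as soon as a connection is accepted, False if the deadline passes.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening; close it immediately.
            with socket.create_connection((host, port), timeout=1):
                return True
        except OSError:
            # Connection refused or timed out: wait briefly, then retry.
            time.sleep(interval)
    return False
```

In the Option B cell this would be used as `if not wait_for_port("127.0.0.1", 8002): raise RuntimeError(...)` right after the `Popen` call. Note that an open port only shows the process is listening locally. It does not show that the Llama Stack Server can reach `MCP_SERVER_URL` from inside the cluster.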