A minimal, production-structured single-agent customer support app using FastAPI + local Ollama (llama3).
- Accepts natural-language support questions through a chat UI or API.
- Uses agent logic to decide when to call tools.
- Pulls structured order data from a mock database.
- Returns natural-language responses with order status, shipping updates, and order totals.
- Backend: FastAPI (Python)
- AI Runtime: Ollama (`llama3`, local)
- Tooling: Python function tools (`get_order_status`, `get_shipping_updates`, `get_order_total`)
- Memory: Lightweight in-process conversation memory (last 5 messages)
- Frontend: Static HTML, CSS, and JavaScript
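The in-process memory above can be sketched with a bounded deque — the class name `ConversationMemory` is illustrative, not the actual `memory.py` API:

```python
from collections import deque


class ConversationMemory:
    """Keeps only the most recent messages in process memory."""

    def __init__(self, max_messages: int = 5):
        # A deque with maxlen silently drops the oldest entry on overflow.
        self._messages = deque(maxlen=max_messages)

    def add(self, role: str, content: str) -> None:
        self._messages.append({"role": role, "content": content})

    def recent(self) -> list[dict]:
        return list(self._messages)


memory = ConversationMemory(max_messages=5)
for i in range(7):
    memory.add("user", f"message {i}")
print(len(memory.recent()))  # 5 — the two oldest messages were dropped
```

Because the memory lives in the process, it resets on every server restart; that is acceptable for this mock app but would need an external store in real deployments.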
- `POST /chat` API for user support queries
- Single-agent orchestration with local Ollama model (`llama3` by default)
- Three tools:
  - `get_order_status(order_id)` returns `status`, `location`, and `eta`
  - `get_shipping_updates(order_id)` returns shipping and delay updates
  - `get_order_total(order_id)` returns total order cost (for example `$89.98`)
- Fallback routing when tool-calling is not used
- Automatic retry without tool payload if the current `llama3` tag does not support tools
- In-memory mock order database for a single customer account
- Split frontend pages:
  - Chat page at `GET /`
  - Orders page at `GET /orders`
- Ollama health endpoint and header status chip (`online`, `model_missing`, `offline`)
- Chat loading indicator (`Model is thinking...`)
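The three tools might look like the following sketch. The mock database contents and return shapes are illustrative assumptions — only the `$89.98` total and the `status`/`location`/`eta` fields come from the feature list above; the real data lives in `mock_data.py`:

```python
# Illustrative mock database; field values here are invented for the example.
MOCK_ORDERS = {
    "123": {
        "status": "in_transit",
        "location": "Kansas City, MO",
        "eta": "tomorrow by 8 PM",
        "updates": ["Departed origin facility", "Arrived at regional hub"],
        "total": "$89.98",
    },
}


def get_order_status(order_id: str) -> dict:
    """Return status, location, and eta for an order."""
    order = MOCK_ORDERS.get(order_id)
    if order is None:
        return {"error": f"Order {order_id} not found"}
    return {k: order[k] for k in ("status", "location", "eta")}


def get_shipping_updates(order_id: str) -> dict:
    """Return shipping and delay updates for an order."""
    order = MOCK_ORDERS.get(order_id)
    if order is None:
        return {"error": f"Order {order_id} not found"}
    return {"updates": order["updates"]}


def get_order_total(order_id: str) -> dict:
    """Return the total cost of an order."""
    order = MOCK_ORDERS.get(order_id)
    if order is None:
        return {"error": f"Order {order_id} not found"}
    return {"total": order["total"]}
```

Returning an `error` dict instead of raising keeps tool results serializable, so they can be passed straight back to the model as a tool message.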
- Python 3.10+
- Ollama installed and running locally
- `llama3` pulled locally

```
ollama pull llama3
ollama serve
```

```
python -m venv .venv
.venv\Scripts\activate
pip install -r requirements.txt
```

The app reads environment variables from system env and local .env files.
- `OLLAMA_BASE_URL` default: `http://localhost:11434`
- `OLLAMA_MODEL` default: `llama3`
- `MAX_MEMORY_MESSAGES` default: `5`

Use `.env.example` as your reference template.
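A minimal sketch of how `config.py` might read these settings, assuming the defaults listed above (the .env merge step is shown only as a comment, since it depends on the `python-dotenv` package):

```python
import os

# In the real app, a call like load_dotenv() from python-dotenv would merge
# values from a local .env file before these lookups (assumption).

OLLAMA_BASE_URL = os.getenv("OLLAMA_BASE_URL", "http://localhost:11434")
OLLAMA_MODEL = os.getenv("OLLAMA_MODEL", "llama3")
MAX_MEMORY_MESSAGES = int(os.getenv("MAX_MEMORY_MESSAGES", "5"))
```

System environment variables win over the defaults, so the same code works unchanged in local and containerized runs.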
```
uvicorn app.main:app --reload
```

- API docs: http://127.0.0.1:8000/docs
- Chat UI: http://127.0.0.1:8000/
- Orders UI: http://127.0.0.1:8000/orders
- Orders API: `GET /api/orders`
- Ollama Status API: `GET /api/status/ollama`
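The status API feeds the header chip. Its three chip states (`online`, `model_missing`, `offline`) could be derived from a health check with a small pure helper; the function name and exact mapping below are illustrative assumptions:

```python
def classify_ollama_status(connected: bool, available_models: list[str],
                           model: str = "llama3") -> str:
    """Map a health-check result onto the header chip states."""
    if not connected:
        return "offline"        # Ollama daemon unreachable
    if model not in available_models:
        return "model_missing"  # daemon up, but the model is not pulled
    return "online"


print(classify_ollama_status(True, ["llama3", "mistral"]))  # online
print(classify_ollama_status(True, ["mistral"]))            # model_missing
print(classify_ollama_status(False, []))                    # offline
```

Keeping the classification pure makes it easy to unit-test separately from the HTTP call that queries the Ollama daemon.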
Request:

```
POST /chat
Content-Type: application/json

{
  "message": "Where is my order 123?"
}
```

Response:

```json
{
  "response": "Your order 123 is currently in transit in Kansas City, MO and is expected to arrive tomorrow by 8 PM."
}
```

Ollama status response (`GET /api/status/ollama`):

```json
{
  "connected": true,
  "model": "llama3",
  "model_available": true,
  "status": "ok",
  "detail": "Connected to Ollama and model 'llama3' is available."
}
```

- User message is added to short-term memory.
- Agent sends system prompt + recent messages to Ollama with tool schemas.
- If Ollama reports tool support is unavailable, the agent retries without tools.
- If tools are requested, the agent executes them and sends results back to the model.
- If no tool call is made, fallback routing handles order, shipping, and pricing intents.
- Final assistant response is returned and stored in memory.
```
/project
  /app
    __init__.py
    main.py
    agent.py
    mock_data.py
    tools.py
    memory.py
    config.py
  /static
    index.html
    orders.html
    chat.js
    orders.js
    status.js
    styles.css
  .env.example
  requirements.txt
  README.md
```

