Status: Deprecated/Archived
The Instructure Community site changed its structure, breaking many cited source links and the HTML selectors used by the extractor. CPAL is preserved for reference but is no longer maintained, and answers may reference stale or invalid URLs.
CPAL (Canvas Personal Assistant for Learning) was a Retrieval-Augmented Generation (RAG) chatbot designed to answer Canvas LMS questions with concise, cited responses. It indexed Canvas Community guides and solved forum threads, retrieved relevant passages, and synthesized answers with links to original sources.
Not affiliated with Instructure.
- RAG over Canvas Community docs and solved forum threads
- Query rewriting to improve recall before retrieval
- Answer synthesis with source citations
- CAPTCHA-protected query endpoint; Q/A event logging
- Full-stack: FastAPI backend, Vite/React frontend, single-container deployment
- Backend (FastAPI): serves API and static frontend
  - App: `backend/main.py`, routes: `backend/web/api.py`
  - LLM (Gemini 2.5 Flash): `backend/service/llm.py`
  - Embeddings (MiniLM): `backend/service/embedding.py`
  - Vector DB (Pinecone): `backend/service/vectordb.py`
  - Event logging (SQLModel → Postgres): `backend/service/events.py`, `backend/model/event.py`
  - reCAPTCHA verification: `backend/service/captcha.py`
- Frontend (Vite/React): `frontend/`
- Containerization/Deploy: `Dockerfile`, `fly.toml`
Example flow:
- User question → FastAPI `/api/query`
- Query rewritten (LLM) to expand recall
- Embedding generated → similarity search (Pinecone)
- Top matches filtered by score; forum question chunks replaced with their corresponding answer chunks
- Prompt assembled → LLM generates a markdown answer with sources
- Q/A logged to Postgres; response returned
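The flow above can be sketched as one orchestration function. This is a minimal illustration, not the actual service code: the callable names (`rewrite`, `embed`, `search`, `generate`), the `Match` shape, and the `min_score` threshold are all assumptions.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Match:
    url: str
    title: str
    score: float
    text: str

def answer_query(
    query: str,
    rewrite: Callable[[str], str],
    embed: Callable[[str], list[float]],
    search: Callable[[list[float], int], list[Match]],
    generate: Callable[[str, str], str],
    top_k: int = 5,
    min_score: float = 0.4,  # assumed threshold; the real cutoff is not documented
) -> dict:
    rewritten = rewrite(query)                 # LLM query rewriting for recall
    matches = search(embed(rewritten), top_k)  # similarity search (Pinecone)
    kept = [m for m in matches if m.score >= min_score]
    # (The real service also swaps forum question chunks for their answer chunks here.)
    context = "\n\n".join(m.text for m in kept)
    answer = generate(query, context)          # markdown answer from the LLM
    return {
        "answer": answer,
        "sources": [{"url": m.url, "title": m.title, "score": m.score} for m in kept],
    }
```

Injecting the LLM, embedding, and search steps as callables keeps the orchestration testable without live Gemini or Pinecone credentials.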
Ran manually once to populate the vector index:
- Extract HTML → Markdown + metadata (Go)
  - Entrypoint: `pipeline/cmd/extract/main.go`
  - Logic: `pipeline/internal/extract/extract.go`
  - Config: `pipeline/config.json`
  - Output: `tmp/raw/{id}.md` and `tmp/raw/{id}.json`
- Clean and chunk Markdown → plain-text chunks: `pipeline/python/prepare_data.py` → `tmp/chunks/`
- Generate embeddings for chunks (MiniLM): `pipeline/python/generate_embeddings.py` → `tmp/embeddings/`
- Upsert vectors + metadata to Pinecone: `pipeline/python/store_embeddings.py`

Note: The extractor assumed the legacy Instructure Community DOM; it no longer matches the current site.
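The clean-and-chunk step can be sketched as below. This is an illustration under stated assumptions, not the actual `prepare_data.py` logic: the stripping rules and the 1200-character budget are invented for the example, and oversized single paragraphs are kept whole rather than split.

```python
import re

def chunk_markdown(text: str, max_chars: int = 1200) -> list[str]:
    """Strip common Markdown syntax and split into plain-text chunks
    on paragraph boundaries, packing paragraphs up to max_chars."""
    plain = re.sub(r"\[([^\]]+)\]\([^)]*\)", r"\1", text)  # [text](url) -> text
    plain = re.sub(r"^#{1,6}\s*", "", plain, flags=re.M)   # drop heading markers
    plain = re.sub(r"[*_`]", "", plain)                    # drop emphasis/code markers
    chunks: list[str] = []
    current = ""
    for para in plain.split("\n\n"):
        para = para.strip()
        if not para:
            continue
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks
```

Chunking on paragraph boundaries keeps each chunk semantically coherent, which matters for embedding quality downstream.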
- Hosting: Fly.io (single container serving API + built frontend)
- Database: Supabase (Postgres) via `DATABASE_URL` for Q/A events
- Vector Store: Pinecone via `VECTOR_DB_API_KEY`/`VECTOR_DB_INDEX_NAME`
- LLM: Google Generative AI (Gemini) via `LLM_API_KEY`
- CAPTCHA: Google reCAPTCHA via `CAPTCHA_SITE_KEY`/`CAPTCHA_SITE_SECRET`
- `GET /api/livez` → health
- `GET /api/config` → `{ "captcha": <site_key> }`
- `POST /api/query`
  - Body: `{ "query": string, "captcha_token": string }`
  - Response: `{ "answer": markdown, "sources": [{ "url", "title", "score" }] }`
  - Notes: top-k=5 similarity; score filtering; forum Q→A chunk replacement; Q/A logged
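A minimal client for `POST /api/query` using only the standard library; the base URL assumes a local run on port 8080, and the token placeholder would come from the reCAPTCHA widget in practice.

```python
import json
import urllib.request

API_BASE = "http://localhost:8080"  # assumed local address; adjust as needed

def build_query_payload(query: str, captcha_token: str) -> dict:
    """Request body for POST /api/query, matching the schema above."""
    return {"query": query, "captcha_token": captcha_token}

def post_query(query: str, captcha_token: str) -> dict:
    body = json.dumps(build_query_payload(query, captcha_token)).encode()
    req = urllib.request.Request(
        f"{API_BASE}/api/query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

if __name__ == "__main__":
    data = post_query("How do I weight assignment groups?", "<recaptcha token>")
    print(data["answer"])
    for src in data["sources"]:
        print(src["url"], src["title"], src["score"])
```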
- Set required env vars: `HOST`, `LLM_API_KEY`, `VECTOR_DB_API_KEY`, `VECTOR_DB_INDEX_NAME`, `DATABASE_URL`, `CAPTCHA_SITE_KEY`, `CAPTCHA_SITE_SECRET`.
- Standard FastAPI + Vite flow (backend on 8080, frontend dev server as origin). See `Dockerfile` for the container build.
Example `.env` (placeholders only):

```
HOST=http://localhost:5173
LLM_API_KEY=...
VECTOR_DB_API_KEY=...
VECTOR_DB_INDEX_NAME=cpal-index
DATABASE_URL=postgresql+psycopg2://user:pass@host:5432/db
CAPTCHA_SITE_KEY=...
CAPTCHA_SITE_SECRET=...
```
- Source URLs and selectors are tied to the legacy Community site and may be invalid.
- Project is not actively maintained; content freshness and security are not guaranteed.
```
backend/            FastAPI app, services, models
frontend/           Vite/React UI
pipeline/           Go extractor + Python preparation/embedding/store steps
doc/cpal-demo.png   Screenshot
Dockerfile          Multi-stage build serving static + API
fly.toml            Fly.io deployment config
```
AGPL-3.0 — see LICENSE.
Canvas Community content; FastAPI; React/Vite; Sentence-Transformers.
