CPAL — Canvas Personal Assistant for Learning

Status: Deprecated/Archived

The Instructure Community site changed its structure, breaking many cited source links and the HTML selectors used by the extractor. CPAL is preserved for reference but is no longer maintained, and answers may reference stale or invalid URLs.

Overview

CPAL (Canvas Personal Assistant for Learning) was a Retrieval-Augmented Generation (RAG) chatbot designed to answer Canvas LMS questions with concise, cited responses. It indexed Canvas Community guides and solved forum threads, retrieved relevant passages, and synthesized answers with links to original sources.

Not affiliated with Instructure.

Key Capabilities

RAG over Canvas Community docs and solved forum threads
Query rewriting to improve recall before retrieval
Answer synthesis with source citations
CAPTCHA-protected query endpoint; Q/A event logging
Full-stack: FastAPI backend, Vite/React frontend, single-container deployment

System Architecture

Backend (FastAPI): serves API and static frontend
- App: backend/main.py, routes: backend/web/api.py
- LLM (Gemini 2.5 Flash): backend/service/llm.py
- Embeddings (MiniLM): backend/service/embedding.py
- Vector DB (Pinecone): backend/service/vectordb.py
- Event logging (SQLModel → Postgres): backend/service/events.py, backend/model/event.py
- reCAPTCHA verification: backend/service/captcha.py
Frontend (Vite/React): frontend/
Containerization/Deploy: Dockerfile, fly.toml

Example flow:

1. User question → FastAPI /api/query
2. Query rewritten (LLM) to expand recall
3. Embedding generated → similarity search (Pinecone)
4. Top matches filtered; forum question chunks replaced with corresponding answer chunks
5. Prompt assembled → LLM generates markdown answer with sources
6. Q/A logged to Postgres; response returned
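
The filtering and question-to-answer replacement step is the least obvious part of the flow, so here is a minimal sketch of how it could work. This is not the code in backend/: the `Match` fields, the `postprocess` function, and the 0.5 threshold are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Match:
    """One retrieved chunk. All field names here are hypothetical."""
    chunk_id: str
    score: float
    kind: str                        # "guide", "forum_question", or "forum_answer"
    text: str
    answer_id: Optional[str] = None  # set on forum questions only


def postprocess(matches, answers, min_score=0.5):
    """Filter low-scoring matches and swap forum questions for their answers.

    `answers` maps an answer chunk id to its Match. The 0.5 threshold is an
    illustrative value, not one taken from the project.
    """
    result = []
    seen = set()
    for m in sorted(matches, key=lambda m: m.score, reverse=True):
        if m.score < min_score:
            continue
        if m.kind == "forum_question":
            answer = answers.get(m.answer_id)
            if answer is None:
                continue  # no answer chunk available, so drop the question
            # Keep the question's similarity score so ranking stays meaningful.
            m = Match(answer.chunk_id, m.score, answer.kind, answer.text)
        if m.chunk_id in seen:
            continue  # the same answer may be reached twice; keep the best hit
        seen.add(m.chunk_id)
        result.append(m)
    return result
```

The idea is that a forum question is often the best semantic match for a user's query, but the answer chunk is what belongs in the prompt, so the question is used only as a pointer.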

Data Pipeline (one-time bootstrap)

These steps were run manually, once, to populate the vector index:

1. Extract HTML → Markdown + metadata (Go)
   - Entrypoint: pipeline/cmd/extract/main.go
   - Logic: pipeline/internal/extract/extract.go
   - Config: pipeline/config.json
   - Output: tmp/raw/{id}.md and tmp/raw/{id}.json
2. Clean and chunk Markdown → plain text chunks
   - pipeline/python/prepare_data.py → tmp/chunks/
3. Generate embeddings for chunks (MiniLM)
   - pipeline/python/generate_embeddings.py → tmp/embeddings/
4. Upsert vectors + metadata to Pinecone
   - pipeline/python/store_embeddings.py
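
For readers unfamiliar with the clean-and-chunk step, the sketch below shows one common way to do it: strip Markdown syntax with a few regular expressions, then cut the text into overlapping word windows. It is an assumption about the general technique, not a copy of prepare_data.py; the function names, the window size of 200 words, and the overlap of 40 words are hypothetical.

```python
import re


def clean_markdown(md: str) -> str:
    """Reduce Markdown to plain text with a few simple, lossy rules."""
    text = re.sub(r"!\[[^\]]*\]\([^)]*\)", "", md)               # drop images
    text = re.sub(r"\[([^\]]+)\]\([^)]*\)", r"\1", text)         # keep link text only
    text = re.sub(r"^#{1,6}\s*", "", text, flags=re.MULTILINE)   # strip heading marks
    text = re.sub(r"[*_`]+", "", text)                           # strip emphasis/code marks
    return re.sub(r"\s+", " ", text).strip()                     # collapse whitespace


def chunk_words(text: str, size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into windows of `size` words that overlap by `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks
```

Overlapping windows reduce the chance that a relevant sentence is split across two chunks and missed by retrieval, at the cost of a somewhat larger index.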

Note: Extractor assumed the legacy Instructure Community DOM; it no longer matches the current site.

Infrastructure

Hosting: Fly.io (single container serving API + built frontend)
Database: Supabase (Postgres) via DATABASE_URL for Q/A events
Vector Store: Pinecone via VECTOR_DB_API_KEY / VECTOR_DB_INDEX_NAME
LLM: Google Generative AI (Gemini) via LLM_API_KEY
CAPTCHA: Google reCAPTCHA via CAPTCHA_SITE_KEY / CAPTCHA_SITE_SECRET
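
As an illustration of the CAPTCHA piece, the sketch below verifies a token against Google's reCAPTCHA `siteverify` endpoint using only the standard library. It is not the contents of backend/service/captcha.py; the function names are hypothetical, and it assumes the secret would come from CAPTCHA_SITE_SECRET.

```python
import json
import urllib.parse
import urllib.request

VERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"


def build_verify_request(secret: str, token: str) -> urllib.request.Request:
    """Build the form-encoded POST that siteverify expects."""
    body = urllib.parse.urlencode({"secret": secret, "response": token}).encode()
    return urllib.request.Request(VERIFY_URL, data=body, method="POST")


def is_verified(payload: bytes) -> bool:
    """Interpret the JSON reply; anything other than success=true is a failure."""
    try:
        return json.loads(payload).get("success") is True
    except (ValueError, AttributeError):
        return False  # malformed or unexpected reply is treated as a failure


def verify_captcha(secret: str, token: str, timeout: float = 5.0) -> bool:
    """Perform the network call. Requires a real secret and a fresh token."""
    request = build_verify_request(secret, token)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return is_verified(resp.read())
```

Splitting request construction and response parsing from the network call keeps the logic testable without credentials.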

API Summary

GET /api/livez → health
GET /api/config → { "captcha": <site_key> }
POST /api/query
- Body: { "query": string, "captcha_token": string }
- Response: { "answer": markdown, "sources": [{ "url", "title", "score" }] }
- Notes: top‑k=5 similarity; score filtering; forum Q→A chunk replacement; Q/A logged
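
A minimal client sketch for POST /api/query, built from the body and response shapes listed above. The helper names are hypothetical, the base URL is whatever host serves the API, and the code assumes `score` is numeric. Since the project is archived, treat this as documentation of the contract rather than something to run against a live server.

```python
import json
import urllib.request


def build_query_request(base_url: str, query: str, captcha_token: str) -> urllib.request.Request:
    """Build a JSON POST for /api/query using the documented body fields."""
    body = json.dumps({"query": query, "captcha_token": captcha_token}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def format_answer(response: dict) -> str:
    """Render the documented response shape as answer text plus numbered sources."""
    lines = [response["answer"], "", "Sources:"]
    for i, src in enumerate(response.get("sources", []), start=1):
        lines.append(f"{i}. {src['title']} ({src['url']}) score={src['score']:.2f}")
    return "\n".join(lines)
```

To send the request, pass the result of `build_query_request` to `urllib.request.urlopen` and decode the body with `json.load`.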

Minimal Local Run (optional; not guaranteed due to deprecation)

Set required env vars: HOST, LLM_API_KEY, VECTOR_DB_API_KEY, VECTOR_DB_INDEX_NAME, DATABASE_URL, CAPTCHA_SITE_KEY, CAPTCHA_SITE_SECRET.
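The project follows the standard FastAPI + Vite workflow: the backend listens on port 8080, and the frontend dev server acts as the origin (HOST in the example below points at it). See Dockerfile for the container build.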

Example .env (placeholders only):

HOST=http://localhost:5173
LLM_API_KEY=...
VECTOR_DB_API_KEY=...
VECTOR_DB_INDEX_NAME=cpal-index
DATABASE_URL=postgresql+psycopg2://user:pass@host:5432/db
CAPTCHA_SITE_KEY=...
CAPTCHA_SITE_SECRET=...
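
Because a missing variable tends to surface only when the first request fails, a startup check can help. The sketch below validates the seven variables listed above; `load_settings` is a hypothetical helper and is not part of the backend.

```python
import os

REQUIRED_VARS = (
    "HOST", "LLM_API_KEY", "VECTOR_DB_API_KEY", "VECTOR_DB_INDEX_NAME",
    "DATABASE_URL", "CAPTCHA_SITE_KEY", "CAPTCHA_SITE_SECRET",
)


def load_settings(environ=None) -> dict:
    """Return the required settings, or raise listing every missing variable."""
    env = os.environ if environ is None else environ
    # Treat empty strings as missing, since an empty key is never usable.
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

Calling `load_settings()` with no arguments reads from the process environment; passing a dict makes the check easy to exercise in isolation.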

Limitations

Source URLs and selectors are tied to the legacy Community site and may be invalid.
Project is not actively maintained; content freshness and security are not guaranteed.

Project Layout

backend/           FastAPI app, services, models
frontend/          Vite/React UI
pipeline/          Go extractor + Python preparation/embedding/store steps
doc/cpal-demo.png  Screenshot
Dockerfile         Multi-stage build serving static + API
fly.toml           Fly.io deployment config

License

AGPL-3.0 — see LICENSE.

Acknowledgements

Canvas Community content; FastAPI; React/Vite; Sentence-Transformers.
