Ragussy and LLM Model Lab are now treated as one platform: a local-first RAG + inference stack with operations UI, document ingestion workflows, and OpenAI-compatible runtime endpoints.
- FastAPI backend for local `llama.cpp` control, telemetry, run logging, and OpenAI-compatible APIs
- React/Vite frontend with classic lab pages and `/next/*` Ragussy operations console
- Ragussy bridge/proxy endpoints for RAG and direct chat routing
- Multi-database document management, ingestion progress, restart/resume controls, and history
- Optional Discord bot for chat and status commands in servers
- Local deployment path for new environments (models, runs, env config, build steps)
- Inference control: discover GGUF models, start/stop/warmup `llama-server`, tune sampling settings
- OpenAI compatibility: `GET /v1/models`, `POST /v1/chat/completions`, `POST /v1/embeddings`
- Ragussy integration: provider switching (`Local llama.cpp`, `Ragussy RAG`, `Ragussy Direct`) and Ragussy health checks
- Ops console (`/next/*`): dashboard telemetry, retrieval diagnostics, document workflows, and ingestion history
- Document database profiles: create/switch/rename/delete profiles, forum mode toggle, local/public docs links, session-private access
- Ingestion resiliency: progress polling, throughput + ETA, resumable forum checkpoints, stale-run force-fresh restart
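The provider-switching behavior above can be sketched as a small routing table. This is illustrative only: the provider labels come from this README, and the paths come from the backend/Ragussy endpoint lists below, but the mapping itself is an assumption about how a client might route requests, not the backend's actual code.

```python
# Hypothetical routing table: provider label -> backend chat path.
# Labels and paths are taken from this README; the mapping is illustrative.
ROUTES = {
    "Local llama.cpp": "/api/chat",
    "Ragussy RAG": "/api/ragussy/chat",
    "Ragussy Direct": "/api/ragussy/direct",
}

def chat_path(provider: str) -> str:
    """Return the backend path a chat request should hit for a provider."""
    try:
        return ROUTES[provider]
    except KeyError:
        raise ValueError(f"unknown provider: {provider}") from None
```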
```
User -> Frontend (frontend/) -> Backend (backend/) -> llama.cpp + embeddings -> Ragussy (optional bridge mode)
```
Notes:
- `llama.cpp` handles chat inference for GGUF models.
- Embeddings use the backend embedding path (`bge-m3` by default).
- Ragussy can consume Model Lab through OpenAI-compatible endpoints.
- `backend/`: FastAPI app, API routers, runtime manager, metrics, OpenAI endpoints
- `frontend/`: React + Vite + TypeScript UI (Lab + `/next/*` operations pages)
- `models/`: local GGUF model files
- `runs/`: JSONL event logs and SQLite run index
- `scripts/dev.sh`: quick local dev launcher
- `discord-bot/`: optional Discord integration service
- `docker-compose.yml`: optional containerized stack
```bash
cd backend
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
cp .env.example .env
uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload
```

```bash
cd frontend
npm install
npm run dev -- --host 0.0.0.0 --port 5173
```

Frontend: http://localhost:5173
Backend: http://localhost:8000
- Qdrant for retrieval storage: `docker run -p 6333:6333 qdrant/qdrant`
- Ragussy service running (if using proxy/provider switch modes)
```bash
cd discord-bot
npm install
cp .env.example .env
npm run register
npm run dev
```

By default, the bot targets Model Lab proxy mode at http://localhost:8000/api/ragussy.
Copy backend/.env.example to backend/.env.
Most important keys:
- `MODELS_DIR`, `RUNS_DIR`, `RUNS_DB_PATH`
- `LLAMA_SERVER_PATH`, `LLAMA_PORT`, `DEFAULT_THREADS`, `DEFAULT_CTX`, `DEFAULT_GPU_LAYERS`
- `MODEL_LAB_OPENAI_API_KEY`
- `EMBED_MODE`, `EMBED_MODEL`, `EMBED_DIM`
- `RAGUSSY_BASE_URL`, `RAGUSSY_API_KEY`, `RAGUSSY_ADMIN_URL`
Example Ragussy bridge values:
```
RAGUSSY_BASE_URL=http://localhost:3001
RAGUSSY_API_KEY=<RAGUSSY_API_KEY>
RAGUSSY_ADMIN_URL=http://localhost:5173
```

- `GET /health`
- `GET /api/models`
- `POST /api/server/start`
- `POST /api/server/stop`
- `GET /api/server/status`
- `GET /api/server/health`
- `POST /api/server/warmup`
- `POST /api/chat`
- `GET /api/config`
- `GET /api/runs`
- `GET /api/runs/{run_id}`
- `GET /api/runs/{run_id}/export`
- `GET /api/ragussy/health`
- `POST /api/ragussy/chat`
- `POST /api/ragussy/direct`
- `GET /v1/models`
- `POST /v1/chat/completions`
- `POST /v1/embeddings`
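A minimal sketch of building a request against the chat completions endpoint using only the Python standard library. The model name and API key here are placeholders, and the non-streaming request body follows the standard OpenAI chat completions schema.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # backend address from this README

def build_chat_request(model: str, prompt: str, api_key: str) -> urllib.request.Request:
    """Build a POST /v1/chat/completions request without sending it."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }).encode()
    return urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )

# "my-model.gguf" and "sk-local" are placeholders; send with
# urllib.request.urlopen(req) once the backend is running.
req = build_chat_request("my-model.gguf", "Hello", "sk-local")
```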
- `WS /ws/stream` with channels: `tokens`, `stats`, `console`, `events`
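A client consuming `WS /ws/stream` needs to demultiplex the four channels. The sketch below routes frames by a `{"channel": ..., "data": ...}` payload shape, which is an assumption about the wire format; adjust it to whatever the backend actually sends.

```python
import json

# Channel names come from this README; the frame shape is assumed.
CHANNELS = {"tokens", "stats", "console", "events"}

def route(raw: str, handlers: dict) -> bool:
    """Parse one frame and invoke the handler registered for its channel.
    Returns True if a handler ran, False for unknown/unhandled channels."""
    msg = json.loads(raw)
    channel = msg.get("channel")
    if channel in CHANNELS and channel in handlers:
        handlers[channel](msg.get("data"))
        return True
    return False

# Example: accumulate streamed tokens into a transcript.
transcript: list[str] = []
route('{"channel": "tokens", "data": "Hel"}', {"tokens": transcript.append})
route('{"channel": "tokens", "data": "lo"}', {"tokens": transcript.append})
# transcript is now ["Hel", "lo"]
```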
```bash
git clone https://github.com/mojomast/ragussy.git
cd ragussy

# backend
cd backend
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
cp .env.example .env

# frontend
cd ../frontend
npm install
```

Before deploy/start, verify:
- `llama-server` exists and is reachable via `LLAMA_SERVER_PATH` (or `PATH`)
- `MODELS_DIR` points to real GGUF files
- Ragussy env values are set when bridge mode is enabled
- Qdrant is available for retrieval-enabled workflows
- Frontend production build: `cd frontend && npm run build`
- Backend production run: `cd backend && uvicorn app.main:app --host 0.0.0.0 --port 8000`
- Docker compose option: `docker compose up --build`
- Docker compose with Discord bot: `docker compose --profile with-discord up --build`
- No models listed: check `MODELS_DIR` and `*.gguf` availability
- Server start fails: verify model path and file permissions
- GPU telemetry unavailable: install NVIDIA drivers + `pynvml`
- No token stream: verify `WS /ws/stream` connectivity
- Ragussy errors in UI: verify `RAGUSSY_BASE_URL` and `RAGUSSY_API_KEY`
- Start backend + frontend.
- Confirm `GET /api/models` returns models.
- Start/warmup a model from the UI.
- Send a chat prompt and confirm streamed tokens.
- Switch provider to Ragussy and verify bridge health/chat calls.
- Upload docs, run ingestion, verify history rows and progress behavior.
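The model-listing step above can be checked programmatically. The helper below validates a `GET /api/models` response; the payload shape `{"models": [{"name": ...}, ...]}` is an assumption, so adapt it to the backend's actual response.

```python
def models_listed(payload: dict) -> bool:
    """Return True if the (assumed) /api/models payload has usable entries."""
    models = payload.get("models", [])
    return bool(models) and all(isinstance(m, dict) and "name" in m for m in models)

# e.g. with requests installed:
#   models_listed(requests.get("http://localhost:8000/api/models").json())
```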