- Allen @AllenLeeyn
Guidely is a RAG (Retrieval-Augmented Generation) system designed to answer questions in plain language while citing the internal sources of information. It features a React/Vite frontend, a FastAPI backend, and the Google GenAI API for embedding and response generation. This tool is ideal for building an internal knowledge assistant that helps users find information quickly and accurately from documents such as policies, guides, and FAQs.
Guidely provides a modern interface powered by:
- Semantic search (via FAISS)
- Embeddings + LLM generation (via Google GenAI API)
- Structured storage (SQLite)
- Web UI (React/Vite)
- FastAPI backend
Admin uploads document(s)
↓
Converts files to .txt and annotates for chunking
↓
Chunk, embed, index, and store data in FAISS + SQLite
↓
User asks a question
↓
Embed question and retrieve top-k chunks
↓
Send retrieved chunks + question to the LLM
↓
Return LLM response with references to the user
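The query-time half of this flow can be sketched in a few lines of Python. This is only an illustration: `embed` and `generate` are hypothetical callables standing in for the Google GenAI embedding and generation calls, not Guidely's actual function names.

```python
import numpy as np
import faiss

def answer_question(question, index, chunks, embed, generate, top_k=5):
    """Embed the question, retrieve top-k chunks, and ask the LLM.

    `embed` and `generate` are hypothetical callables standing in for the
    Google GenAI embedding and generation calls.
    """
    # Embed the question into the same vector space as the indexed chunks.
    query_vec = np.array([embed(question)], dtype="float32")

    # Retrieve the top-k most similar chunks from the FAISS index.
    _, ids = index.search(query_vec, top_k)
    context = "\n\n".join(chunks[i] for i in ids[0] if i != -1)

    # Send retrieved chunks + question to the LLM and return its answer.
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    return generate(prompt)
```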
Retrieval-Augmented Generation (RAG) is a technique that enhances large language models (LLMs) by combining information retrieval with text generation. Instead of answering queries based solely on pre-existing training data, a RAG system:
- Retrieves relevant documents from a specified set (e.g., internal policies, manuals, FAQs, or databases)
- Generates answers grounded in those documents
This allows LLMs to:
- Access domain-specific or recent information not included in their training data
- Reduce hallucinations, such as citing nonexistent policies or cases
- Include source citations for transparency and verification
- Avoid expensive retraining when new data is available
In short, RAG blends the generative power of LLMs with the accuracy of targeted document retrieval. RAG was first formally introduced in research in 2020 and has since become a standard approach for knowledge-grounded AI systems, including internal assistants, chatbots, and customer support tools.
Guidely enables users to interact with company knowledge in a simple, intuitive way. Only Admins can upload documents and index them for semantic search, while all users can ask questions and receive concise answers with source references. To use Guidely, a user must first log in or sign up for an account.
Guidely supports two account types with different permissions: Admin and User.
Admins have full control over the document pipeline, including:
- Upload documents in multiple formats: .txt, .md, .pdf, .docx, .html
- Review and edit automatically generated annotated text files
- Update or re-index files at any time after edits
- Trigger indexing, which performs chunking, embedding, and vector storage
All users, including Admins, can:
- Ask questions through the chat interface
- View answers generated by the RAG system
- See source references for each answer (file name, heading, or snippet)
- Access conversation history to review past questions and answers
Guidely’s design ensures that Admins maintain the knowledge base, while all users can leverage it to quickly find accurate information.
Guidely currently supports the following document types for upload and indexing:
- Plain Text (.txt)
- Markdown (.md)
- PDF (.pdf)
- Microsoft Word (.docx)
- HTML (.html)
The uploaded documents are converted to raw text with annotation to mark headings and pages for better chunking and retrieval. However, depending on how the document is formatted, headings and other information may not be perfectly extracted. Admins can edit annotated text files before indexing to ensure accuracy.
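The annotation format itself is an internal detail; purely as an illustration, the sketch below marks Markdown headings with a hypothetical [HEADING] marker so the chunker can split on document structure. Extraction for PDF, DOCX, and HTML works differently, but the idea is the same.

```python
import re

HEADING = re.compile(r"^(#{1,6})\s+(.*)$")

def annotate_markdown(text: str) -> str:
    """Prefix Markdown headings with a hypothetical [HEADING] marker.

    Extraction for PDF/DOCX/HTML is format-specific; this only illustrates
    marking structure in the raw .txt before chunking.
    """
    annotated = []
    for line in text.splitlines():
        match = HEADING.match(line)
        if match:
            annotated.append(f"[HEADING level={len(match.group(1))}] {match.group(2)}")
        else:
            annotated.append(line)
    return "\n".join(annotated)
```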
Guidely is designed to handle small to medium-sized document collections (no more than a few hundred files) and runs efficiently in a local or single-machine environment. React, FastAPI, and SQLite were chosen for their speed and ease of development.
- Frontend: React/Vite
- Backend: FastAPI
- Vector Database: FAISS
- Embeddings & LLM: Google GenAI API
- Relational Database: SQLite
FAISS was chosen for its:
- Excellent performance in local environments
- Simple Python integration
- Mature support for vector similarity search
Guidely currently uses in-memory FAISS indexes for simplicity. This provides fast lookups but has limitations:
- Memory usage scales with the number and size of embeddings
- In-memory design is not suitable for very large datasets
Persistent indexes and Approximate Nearest Neighbor (ANN) search are strategies to improve scalability and search speed, but these are intentionally out of scope for this version.
Other vector stores were considered:
- Weaviate: heavier local deployment, additional tooling required
- Pinecone: excellent managed service but cost was a deciding factor
FAISS offered the best balance of simplicity, speed, and cost for a local-first RAG system.
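To make the in-memory design concrete, here is a minimal sketch of building and querying a flat (exact, non-ANN) FAISS index. The random vectors only keep the example runnable; in Guidely the embeddings come from the embedding API.

```python
import numpy as np
import faiss

# Assume `embeddings` is an (n_chunks, dim) float32 array returned by the
# embedding API; random data is used here only to keep the sketch runnable.
embeddings = np.random.rand(100, 768).astype("float32")

dim = embeddings.shape[1]
index = faiss.IndexFlatL2(dim)   # exact (non-ANN) search, held entirely in RAM
index.add(embeddings)

# Search with a single embedded question vector.
query = np.random.rand(1, dim).astype("float32")
distances, ids = index.search(query, 5)
print(ids[0])  # row indices of the 5 nearest chunks
```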
The project chose Google GenAI for the Large Language Model (LLM) component primarily due to cost-effectiveness and ease of use for development. The specific reasons provided are:
- Generous Free Tier: This is crucial for continuous local development and testing, allowing developers to iterate on the RAG pipeline without immediately incurring costs.
- Predictable and Cost-Effective Pricing: This ensures that as the application scales or sees heavy usage, the operational costs remain transparent and manageable.
- Straightforward Python SDK Integration: A simple and well-documented Software Development Kit (SDK) reduces development time and complexity.
In contrast, OpenAI was ruled out early in the project because of the lack of a free tier and higher anticipated API costs, making it less practical for a local-first development environment.
LLM and embedding API calls are encapsulated within the FaissManager service.
This means:
- Replacing Google GenAI with another provider requires changes in only one module
- The rest of the system (controllers, chunking, DB, UI) remains unaffected
The architecture intentionally supports easy swapping of embedding/LLM providers or vector databases in the future.
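A rough sketch of what such an encapsulation can look like is shown below. The class name and the embedding model are assumptions for illustration, not the actual FaissManager code; the point is that swapping providers means rewriting only this one class.

```python
from google import genai  # google-genai SDK

class ProviderClient:
    """Illustrative wrapper: all provider-specific calls live in one place.

    Names and the embedding model are assumptions, not Guidely's actual
    FaissManager code; swapping providers means rewriting only this class.
    """

    def __init__(self, embed_model="gemini-embedding-001",
                 llm_model="gemini-2.5-flash-lite"):
        # genai.Client() reads GEMINI_API_KEY from the environment.
        self.client = genai.Client()
        self.embed_model = embed_model
        self.llm_model = llm_model

    def embed(self, text: str) -> list[float]:
        resp = self.client.models.embed_content(model=self.embed_model, contents=text)
        return resp.embeddings[0].values

    def generate(self, prompt: str) -> str:
        resp = self.client.models.generate_content(model=self.llm_model, contents=prompt)
        return resp.text
```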
Below is a simple diagram illustrating the main logic layers and their separation of concerns:
┌──────────────┐
│ Frontend │
│ (React/Vite)│
└──────┬───────┘
│ HTTP/JSON
▼
┌──────────────────┐
│ FastAPI │
│ (Controllers) │
└──────┬───────────┘
│ calls
▼
┌──────────────────┐
│ Service Layer │
│ (RAG Pipeline) │
│ - Text extraction│
│ - Chunking │
│ - Embeddings │
│ - FAISS search │
│ - LLM synthesis │
└──────┬───────────┘
│ DB reads/writes
▼
┌───────────────────┐
│ Model Layer │
│ (SQLite + FAISS) │
└───────────────────┘
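As a simplified sketch of how a thin controller delegates to the service layer (hypothetical names, not the actual route code):

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class AskRequest(BaseModel):
    question: str

class RagService:
    """Stand-in for the service layer (retrieval + LLM synthesis)."""
    def answer(self, conversation_id: int, question: str):
        # Real implementation: embed question, FAISS search, LLM call, DB writes.
        return "stub answer", ["README.md"]

rag_service = RagService()

@app.post("/api/ask/{conversation_id}")
def ask(conversation_id: int, body: AskRequest):
    # Controllers stay thin: validate input, then delegate to the service layer.
    answer, sources = rag_service.answer(conversation_id, body.question)
    return {"answer": answer, "sources": sources}
```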
Below is the high-level project structure for Guidely:
guidely/
├── backend/
│ ├── main.py # FastAPI app entrypoint
│ ├── requirements.txt # dependencies for python backend
│ ├── guidely_api.db # Relational DB
│ │
│ ├── routes/ # FastAPI controllers (HTTP endpoints)
│ │ ├── ask.py # Chat + RAG queries
│ │ │ • POST /api/ask/new – create new conversation
│ │ │ • POST /api/ask/{id} – send message in existing conversation
│ │ │ • GET /api/ask/{id} – fetch conversation history
│ │ │
│ │ ├── auth.py # Authentication layer
│ │ │ • POST /api/auth/login
│ │ │ • POST /api/auth/logout
│ │ │ • POST /api/auth/signup
│ │ │ • GET /api/auth/check
│ │ │
│ │ ├── files.py # File ingestion + indexing
│ │ │ • POST /api/file/upload – upload new file
│ │ │ • POST /api/file/index/{file_id} – index uploaded/edited file
│ │ │ • GET /api/file/edit/{file_id} – get annotated content for editing
│ │ │ • POST /api/file/edit/{file_id} – save edited annotated content
│ │ │
│ │ ├── profile.py
│ │ │ • GET /api/profile – get user profile
│ │ │
│ │ └── utils.py
│ │
│ ├── models/
│ │ ├── chunk.py
│ │ ├── conversation.py
│ │ ├── db.py # Initializes the other models and holds their instances
│ │ ├── file.py
│ │ ├── question.py
│ │ ├── session.py
│ │ ├── user.py
│ │ ├── utils.py
│ │ └── guidely_api.sql # DB schema
│ │
│ ├── services/
│ │ ├── annotation.py # File text extraction (PDF, DOCX, HTML, etc.)
│ │ ├── chunking.py # Splits text into chunks for embeddings
│ │ └── faiss.py # FAISS index + Google GenAI embeddings + calls
│ │
│ ├── uploaded_files/ # Raw uploaded files
│ └── data/ # annotated txt files
│
├── frontend/
│ ├── dist/ # Production build (served by FastAPI)
│ ├── public/
│ └── src/
│ ├── pages/ # Main screens
│ │ ├── chat.jsx
│ │ ├── EditFile.jsx
│ │ ├── Login.jsx
│ │ ├── Profile.jsx
│ │ └── Signup.jsx
│ │
│ ├── services/api.js # API wrapper for backend
│ │
│ ├── index.css # global styles
│ └── App.jsx # App routing + layout
│
├── Makefile # Setup, run, and clean commands
└── README.md
Guidely includes a Makefile to simplify installation, environment setup, development, and running the full-stack application.
Instead of manually creating virtual environments, installing npm packages, or running long commands, developers can use short Makefile tasks to streamline the workflow.
To use the Makefile, ensure the following tools are installed on your system:
- Python 3.9+
- npm / Node.js
- make (Linux/macOS; optional on Windows)
- Windows users: Make can be run via WSL, Git Bash, or the GNU Make port for Windows.
The backend dependencies are managed by requirements.txt, and the frontend libraries are listed in package.json.
The Makefile includes the following commands:
- make install: Installs backend Python dependencies and frontend npm packages.
- make run: Builds the frontend and starts the FastAPI server with the bundled UI.
- make clean: Removes the backend virtual environment and frontend node_modules.
- make help: Displays all available commands.
To use Guidely, you need a Google GenAI API key.
The genai.Client() automatically picks up the API key from the environment variable GEMINI_API_KEY.
Below are instructions to set this variable on different operating systems.
On Linux/macOS, temporarily (for the current terminal session):
export GEMINI_API_KEY="your_api_key_here"
Permanently (every terminal session):
- Open ~/.bashrc or ~/.zshrc in a text editor.
- Add:
export GEMINI_API_KEY="your_api_key_here"
- Reload:
source ~/.bashrc
or
source ~/.zshrc
On Windows, temporarily (current PowerShell session):
$env:GEMINI_API_KEY="your_api_key_here"
Permanently (every session):
setx GEMINI_API_KEY "your_api_key_here"
After setx, restart the terminal to pick up the variable.
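To confirm the key is picked up correctly, a quick sanity check with the google-genai SDK can help (the model name here is only an example):

```python
from google import genai

# genai.Client() reads GEMINI_API_KEY from the environment automatically;
# it will raise an error if the key is missing or invalid.
client = genai.Client()

response = client.models.generate_content(
    model="gemini-2.5-flash-lite",  # example model name
    contents="Reply with the single word: ok",
)
print(response.text)
```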
Guidely provides a simple development workflow using:
make run
Building frontend...
cd frontend && npm run build
> frontend@0.0.0 build
> vite build
vite v7.2.2 building client environment for production...
✓ 48 modules transformed.
dist/index.html 0.45 kB │ gzip: 0.29 kB
dist/assets/index-CiLkVGEj.css 7.15 kB │ gzip: 1.85 kB
dist/assets/index-B010lV47.js 279.08 kB │ gzip: 88.19 kB
✓ built in 530ms
Starting full stack app...
cd backend && venv/bin/uvicorn main:app --reload
INFO: Will watch for changes in these directories: [...]
INFO: Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)
INFO: Started reloader process [...]
Admin user already exists: admin
Initialized new FAISS index for source: 'README' (Vectors: 5)
INFO: Waiting for application startup.
INFO: Application startup complete.
This means:
- The frontend successfully built.
- FastAPI is running with auto-reload enabled.
- A default admin user was created (if it didn’t already exist).
- FAISS initialized properly.
After startup, open:
- http://127.0.0.1:8000 (or whatever host/port uvicorn prints)
- You will be greeted by a login page with a link to sign up
Guidely automatically creates an admin user on first startup (in main.py):
username: admin
password: admin123
This behavior is purely for local development convenience. Developers are encouraged to:
- Change these credentials
- Or remove the auto-creation logic if deploying to production
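For reference, the auto-creation behavior typically looks something like the sketch below. This is illustrative, not the exact code in main.py; a real implementation should store a password hash rather than the plain-text default.

```python
# Illustrative sketch only; not the actual code in main.py.
def ensure_default_admin(user_model) -> None:
    """Create a default admin on startup if one does not exist.

    `user_model` stands in for Guidely's user model; a real implementation
    should store a password hash, never the plain-text default.
    """
    if user_model.get_by_username("admin") is None:
        user_model.create(username="admin", password="admin123", role="admin")
        print("Created default admin user: admin")
    else:
        print("Admin user already exists: admin")
```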
- Dataset Size: The system is optimized for small to medium document sets (e.g., hundreds of files). Performance may degrade with very large, multi-terabyte datasets.
- Memory Dependency: The current in-memory FAISS index limits the maximum dataset size to the RAM available on the host machine. Scaling to very large datasets requires persistent indexing and dynamic loading of index partitions, typically using disk-based ANN structures (such as FAISS on disk, externally stored HNSW, or specialized vector databases) to manage data that exceeds memory capacity.
- Input Quality: Basic text extraction may not perfectly capture complex visual elements (e.g., tables, figures) or logical document structure (e.g., deeply nested headings), potentially leading to suboptimal chunking.
- Media Support: There is no processing for non-textual content such as images, tables, or video files.
- Document Update Overhead: Document edits are handled by deleting the old version's chunks and re-chunking and re-indexing the updated document. This process is computationally intensive and introduces brief unavailability for that document during the index rebuild.
- Cache Staleness Risk: Although document version tracking is implemented, the system currently relies on simple question-hash matching for caching (see the sketch after this list). When a document is re-indexed, old cached answers may become factually stale or incorrect if the underlying source data has changed. Because no automatic version validation is performed when retrieving a cached answer, the cache for the re-indexed document must be manually invalidated or flushed to ensure accuracy.
- API Performance and Cost: The system is limited by the Google GenAI API's capabilities, bandwidth, and latency.
- Cost/Quality Trade-off: The use of Gemini 2.5 Flash-Lite ensures low-cost scaling and ultra-low latency, but this prioritization comes with reduced peak performance on highly complex reasoning or multi-step synthesis tasks compared to the larger Flash or Pro models.
- Latent Reliability Issues: The system is subject to external API performance fluctuations (e.g., rate limits, server traffic). Flash-Lite, being highly compressed, may occasionally produce less reliable or truncated output under peak load or with high complexity.
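As referenced in the cache-staleness limitation above, a simplified sketch of question-hash caching that omits any document-version check:

```python
import hashlib

answer_cache: dict[str, str] = {}  # question hash -> cached answer (illustrative)

def cache_key(question: str) -> str:
    # The key depends on the question text only; re-indexing a document does
    # not change the key, which is why stale answers can survive a re-index.
    return hashlib.sha256(question.strip().lower().encode("utf-8")).hexdigest()

def get_cached_answer(question: str):
    return answer_cache.get(cache_key(question))

def store_answer(question: str, answer: str) -> None:
    answer_cache[cache_key(question)] = answer
```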