- Allen @AllenLeeyn
Guidely is a RAG (Retrieval-Augmented Generation) system designed to answer questions in plain language while citing the internal sources of information. It features a React/Vite frontend, a FastAPI backend, and the Google GenAI API for embedding and response generation. This tool is ideal for building an internal knowledge assistant that helps users find information quickly and accurately from documents such as policies, guides, and FAQs.
Guidely provides a modern interface powered by:
- Semantic search (via FAISS)
- Embeddings + LLM generation (via Google GenAI API)
- Structured storage (SQLite)
- Web UI (React/Vite)
- FastAPI backend
Admin uploads document(s)
↓
Converts files to .txt and annotates for chunking
↓
Chunk, embed, index, and store data in FAISS + SQLite
↓
User asks a question
↓
Embed question and retrieve top-k chunks
↓
Send retrieved chunks + question to the LLM
↓
Return LLM response with references to the user
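The query-time half of this flow can be sketched in a few lines of Python. This is only an illustration: `embed` and `generate` are hypothetical callables standing in for the Google GenAI embedding and generation calls, not Guidely's actual function names.

```python
import numpy as np
import faiss

def answer_question(question, index, chunks, embed, generate, top_k=5):
    """Embed the question, retrieve top-k chunks, and ask the LLM.

    `embed` and `generate` are hypothetical callables standing in for the
    Google GenAI embedding and generation calls.
    """
    # Embed the question into the same vector space as the indexed chunks.
    query_vec = np.array([embed(question)], dtype="float32")

    # Retrieve the top-k most similar chunks from the FAISS index.
    _, ids = index.search(query_vec, top_k)
    context = "\n\n".join(chunks[i] for i in ids[0] if i != -1)

    # Send retrieved chunks + question to the LLM and return its answer.
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    return generate(prompt)
```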
Retrieval-Augmented Generation (RAG) is a technique that enhances large language models (LLMs) by combining information retrieval with text generation. Instead of answering queries based solely on pre-existing training data, a RAG system:
- Retrieves relevant documents from a specified set (e.g., internal policies, manuals, FAQs, or databases)
- Generates answers grounded in those documents
This allows LLMs to:
- Access domain-specific or recent information not included in their training data
- Reduce hallucinations, such as citing nonexistent policies or cases
- Include source citations for transparency and verification
- Avoid expensive retraining when new data is available
In short, RAG blends the generative power of LLMs with the accuracy of targeted document retrieval. RAG was first formally introduced in research in 2020 and has since become a standard approach for knowledge-grounded AI systems, including internal assistants, chatbots, and customer support tools.
Guidely enables users to interact with company knowledge in a simple, intuitive way. Only Admins can upload documents and index them for semantic search, while all users can ask questions and receive concise answers with source references. To use Guidely, a user must first log in or sign up for an account.
Guidely supports two account types with different permissions: Admin and User.
Admins have full control over the document pipeline, including:
- Upload documents in multiple formats: .txt, .md, .pdf, .docx, .html
- Review and edit automatically generated annotated text files
- Update or re-index files at any time after edits
- Trigger indexing, which performs chunking, embedding, and vector storage
All users, including Admins, can:
- Ask questions through the chat interface
- View answers generated by the RAG system
- See source references for each answer (file name, heading, or snippet)
- Access conversation history to review past questions and answers
Guidely’s design ensures that Admins maintain the knowledge base, while all users can leverage it to quickly find accurate information.
Guidely currently supports the following document types for upload and indexing:
- Plain Text (.txt)
- Markdown (.md)
- PDF (.pdf)
- Microsoft Word (.docx)
- HTML (.html)
The uploaded documents are converted to raw text with annotation to mark headings and pages for better chunking and retrieval. However, depending on how the document is formatted, headings and other information may not be perfectly extracted. Admins can edit annotated text files before indexing to ensure accuracy.
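The annotation format itself is an internal detail; purely as an illustration, the sketch below marks Markdown headings with a hypothetical [HEADING] marker so the chunker can split on document structure. Extraction for PDF, DOCX, and HTML works differently, but the idea is the same.

```python
import re

HEADING = re.compile(r"^(#{1,6})\s+(.*)$")

def annotate_markdown(text: str) -> str:
    """Prefix Markdown headings with a hypothetical [HEADING] marker.

    Extraction for PDF/DOCX/HTML is format-specific; this only illustrates
    marking structure in the raw .txt before chunking.
    """
    annotated = []
    for line in text.splitlines():
        match = HEADING.match(line)
        if match:
            annotated.append(f"[HEADING level={len(match.group(1))}] {match.group(2)}")
        else:
            annotated.append(line)
    return "\n".join(annotated)
```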
Guidely is designed to handle small to medium-sized document collections (no more than a few hundred files) and runs efficiently in a local or single-machine environment. React, FastAPI, and SQLite were chosen for their speed and ease of development.
- Frontend: React/Vite
- Backend: FastAPI
- Vector Database: FAISS
- Embeddings & LLM: Google GenAI API
- Relational Database: SQLite
FAISS was chosen for its:
- Excellent performance in local environments
- Simple Python integration
- Mature support for vector similarity search
Guidely currently uses in-memory FAISS indexes for simplicity. This provides fast lookups but has limitations:
- Memory usage scales with the number and size of embeddings
- In-memory design is not suitable for very large datasets
Persistent indexes and Approximate Nearest Neighbor (ANN) search are strategies to improve scalability and search speed, but these are intentionally out of scope for this version.
Other vector stores were considered:
- Weaviate: heavier local deployment, additional tooling required
- Pinecone: excellent managed service but cost was a deciding factor
FAISS offered the best balance of simplicity, speed, and cost for a local-first RAG system.
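To make the in-memory design concrete, here is a minimal sketch of building and querying a flat (exact, non-ANN) FAISS index. The random vectors only keep the example runnable; in Guidely the embeddings come from the embedding API.

```python
import numpy as np
import faiss

# Assume `embeddings` is an (n_chunks, dim) float32 array returned by the
# embedding API; random data is used here only to keep the sketch runnable.
embeddings = np.random.rand(100, 768).astype("float32")

dim = embeddings.shape[1]
index = faiss.IndexFlatL2(dim)   # exact (non-ANN) search, held entirely in RAM
index.add(embeddings)

# Search with a single embedded question vector.
query = np.random.rand(1, dim).astype("float32")
distances, ids = index.search(query, 5)
print(ids[0])  # row indices of the 5 nearest chunks
```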
The project chose Google GenAI for the Large Language Model (LLM) component primarily due to cost-effectiveness and ease of use for development. The specific reasons provided are:
- Generous Free Tier: This is crucial for continuous local development and testing, allowing developers to iterate on the RAG pipeline without immediately incurring costs.
- Predictable and Cost-Effective Pricing: This ensures that as the application scales or sees heavy usage, the operational costs remain transparent and manageable.
- Straightforward Python SDK Integration: A simple and well-documented Software Development Kit (SDK) reduces development time and complexity.
In contrast, OpenAI was ruled out early in the project because of the lack of a free tier and higher anticipated API costs, making it less practical for a local-first development environment.
LLM and embedding API calls are encapsulated within the FaissManager service.
This means:
- Replacing Google GenAI with another provider requires changes in only one module
- The rest of the system (controllers, chunking, DB, UI) remains unaffected
The architecture intentionally supports easy swapping of embedding/LLM providers or vector databases in the future.
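A rough sketch of what such an encapsulation can look like is shown below. The class name and the embedding model are assumptions for illustration, not the actual FaissManager code; the point is that swapping providers means rewriting only this one class.

```python
from google import genai  # google-genai SDK

class ProviderClient:
    """Illustrative wrapper: all provider-specific calls live in one place.

    Names and the embedding model are assumptions, not Guidely's actual
    FaissManager code; swapping providers means rewriting only this class.
    """

    def __init__(self, embed_model="gemini-embedding-001",
                 llm_model="gemini-2.5-flash-lite"):
        # genai.Client() reads GEMINI_API_KEY from the environment.
        self.client = genai.Client()
        self.embed_model = embed_model
        self.llm_model = llm_model

    def embed(self, text: str) -> list[float]:
        resp = self.client.models.embed_content(model=self.embed_model, contents=text)
        return resp.embeddings[0].values

    def generate(self, prompt: str) -> str:
        resp = self.client.models.generate_content(model=self.llm_model, contents=prompt)
        return resp.text
```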
Below is a simple diagram illustrating the main logic layers and their separation of concerns:
┌──────────────┐
│ Frontend │
│ (React/Vite)│
└──────┬───────┘
│ HTTP/JSON
▼
┌──────────────────┐
│ FastAPI │
│ (Controllers) │
└──────┬───────────┘
│ calls
▼
┌──────────────────┐
│ Service Layer │
│ (RAG Pipeline) │
│ - Text extraction│
│ - Chunking │
│ - Embeddings │
│ - FAISS search │
│ - LLM synthesis │
└──────┬───────────┘
│ DB reads/writes
▼
┌───────────────────┐
│ Model Layer │
│ (SQLite + FAISS) │
└───────────────────┘
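As a simplified sketch of how a thin controller delegates to the service layer (hypothetical names, not the actual route code):

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class AskRequest(BaseModel):
    question: str

class RagService:
    """Stand-in for the service layer (retrieval + LLM synthesis)."""
    def answer(self, conversation_id: int, question: str):
        # Real implementation: embed question, FAISS search, LLM call, DB writes.
        return "stub answer", ["README.md"]

rag_service = RagService()

@app.post("/api/ask/{conversation_id}")
def ask(conversation_id: int, body: AskRequest):
    # Controllers stay thin: validate input, then delegate to the service layer.
    answer, sources = rag_service.answer(conversation_id, body.question)
    return {"answer": answer, "sources": sources}
```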
Below is the high-level project structure for Guidely:
guidely/
├── backend/
│ ├── main.py # FastAPI app entrypoint
│ ├── requirements.txt # dependencies for python backend
│ ├── guidely_api.db # Relational DB
│ │
│ ├── routes/ # FastAPI controllers (HTTP endpoints)
│ │ ├── ask.py # Chat + RAG queries
│ │ │ • POST /api/ask/new – create new conversation
│ │ │ • POST /api/ask/{id} – send message in existing conversation
│ │ │ • GET /api/ask/{id} – fetch conversation history
│ │ │
│ │ ├── auth.py # Authentication layer
│ │ │ • POST /api/auth/login
│ │ │ • POST /api/auth/logout
│ │ │ • POST /api/auth/signup
│ │ │ • GET /api/auth/check
│ │ │
│ │ ├── files.py # File ingestion + indexing
│ │ │ • POST /api/file/upload – upload new file
│ │ │ • POST /api/file/index/{file_id} – index uploaded/edited file
│ │ │ • GET /api/file/edit/{file_id} – get annotated content for editing
│ │ │ • POST /api/file/edit/{file_id} – save edited annotated content
│ │ │
│ │ ├── profile.py
│ │ │ • GET /api/profile – get user profile
│ │ │
│ │ └── utils.py
│ │
│ ├── models/
│ │ ├── chunk.py
│ │ ├── conversation.py
│ │ ├── db.py # Initializes the other models and holds their instances
│ │ ├── file.py
│ │ ├── question.py
│ │ ├── session.py
│ │ ├── user.py
│ │ ├── utils.py
│ │ └── guidely_api.sql # DB schema
│ │
│ ├── services/
│ │ ├── annotation.py # File text extraction (PDF, DOCX, HTML, etc.)
│ │ ├── chunking.py # Splits text into chunks for embeddings
│ │ └── faiss.py # FAISS index + Google GenAI embeddings + calls
│ │
│ ├── uploaded_files/ # Raw uploaded files
│ └── data/ # annotated txt files
│
├── frontend/
│ ├── dist/ # Production build (served by FastAPI)
│ ├── public/
│ └── src/
│ ├── pages/ # Main screens
│ │ ├── chat.jsx
│ │ ├── EditFile.jsx
│ │ ├── Login.jsx
│ │ ├── Profile.jsx
│ │ └── Signup.jsx
│ │
│ ├── services/api.js # API wrapper for backend
│ │
│ ├── index.css # global styles
│ └── App.jsx # App routing + layout
│
├── Makefile # Setup, run, and clean commands
└── README.md
Guidely includes a Makefile to simplify installation, environment setup, development, and running the full-stack application.
Instead of manually creating virtual environments, installing npm packages, or running long commands, developers can use short Makefile tasks to streamline the workflow.
To use the Makefile, ensure the following tools are installed on your system:
- Python 3.9+
- npm / Node.js
- make (Linux/macOS; optional on Windows)
- Windows users: Make can be run via WSL, Git Bash, or the GNU Make port for Windows.
The backend dependencies are managed by requirements.txt, and the frontend libraries are listed in package.json.
The Makefile includes the following commands:
- make install: Installs backend Python dependencies and frontend npm packages.
- make run: Builds the frontend and starts the FastAPI server with the bundled UI.
- make clean: Removes the backend virtual environment and frontend node_modules.
- make help: Displays all available commands.
To use Guidely, you need a Google GenAI API key.
The genai.Client() automatically picks up the API key from the environment variable GEMINI_API_KEY.
Below are instructions to set this variable on different operating systems.
On Linux/macOS, temporarily (for the current terminal session):
export GEMINI_API_KEY="your_api_key_here"
Permanently (every terminal session):
- Open ~/.bashrc or ~/.zshrc in a text editor.
- Add:
export GEMINI_API_KEY="your_api_key_here"
- Reload:
source ~/.bashrc
or
source ~/.zshrc
On Windows, temporarily (current PowerShell session):
$env:GEMINI_API_KEY="your_api_key_here"
Permanently (every session):
setx GEMINI_API_KEY "your_api_key_here"
After setx, restart the terminal to pick up the variable.
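To confirm the key is picked up correctly, a quick sanity check with the google-genai SDK can help (the model name here is only an example):

```python
from google import genai

# genai.Client() reads GEMINI_API_KEY from the environment automatically;
# it will raise an error if the key is missing or invalid.
client = genai.Client()

response = client.models.generate_content(
    model="gemini-2.5-flash-lite",  # example model name
    contents="Reply with the single word: ok",
)
print(response.text)
```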
Guidely provides a simple development workflow using:
make run
Building frontend...
cd frontend && npm run build
> frontend@0.0.0 build
> vite build
vite v7.2.2 building client environment for production...
✓ 48 modules transformed.
dist/index.html 0.45 kB │ gzip: 0.29 kB
dist/assets/index-CiLkVGEj.css 7.15 kB │ gzip: 1.85 kB
dist/assets/index-B010lV47.js 279.08 kB │ gzip: 88.19 kB
✓ built in 530ms
Starting full stack app...
cd backend && venv/bin/uvicorn main:app --reload
INFO: Will watch for changes in these directories: [...]
INFO: Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)
INFO: Started reloader process [...]
Admin user already exists: admin
Initialized new FAISS index for source: 'README' (Vectors: 5)
INFO: Waiting for application startup.
INFO: Application startup complete.
This means:
- The frontend successfully built.
- FastAPI is running with auto-reload enabled.
- A default admin user was created (if it didn’t already exist).
- FAISS initialized properly.
After startup, open:
- http://127.0.0.1:8000 (or whatever host/port uvicorn prints)
- You will be greeted by a login page with a link to sign up
Guidely automatically creates an admin user on first startup (in main.py):
username: admin
password: admin123
This behavior is purely for local development convenience. Developers are encouraged to:
- Change these credentials
- Or remove the auto-creation logic if deploying to production
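For reference, the auto-creation behavior typically looks something like the sketch below. This is illustrative, not the exact code in main.py; a real implementation should store a password hash rather than the plain-text default.

```python
# Illustrative sketch only; not the actual code in main.py.
def ensure_default_admin(user_model) -> None:
    """Create a default admin on startup if one does not exist.

    `user_model` stands in for Guidely's user model; a real implementation
    should store a password hash, never the plain-text default.
    """
    if user_model.get_by_username("admin") is None:
        user_model.create(username="admin", password="admin123", role="admin")
        print("Created default admin user: admin")
    else:
        print("Admin user already exists: admin")
```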
- Dataset Size: The system is optimized for small to medium document sets (e.g., hundreds of files). Performance may degrade with very large, multi-terabyte datasets.
- Memory Dependency: The current in-memory FAISS index limits the maximum dataset size to the RAM available on the host machine. Scaling to very large datasets requires persistent indexing and dynamic loading of index partitions, typically using disk-based ANN structures (such as FAISS on disk, externally stored HNSW, or specialized vector databases) to manage data that exceeds memory capacity.
- Input Quality: Basic text extraction may not perfectly capture complex visual elements (e.g., tables, figures) or logical document structure (e.g., deeply nested headings), potentially leading to suboptimal chunking.
- Media Support: There is no processing for non-textual content such as images, tables, or video files.
- Document Update Overhead: Document edits are handled by deleting the old version's chunks and re-chunking and re-indexing the updated document. This process is computationally intensive and introduces brief unavailability for that document during the index rebuild.
- Cache Staleness Risk: Although document version tracking is implemented, the system currently relies on simple question-hash matching for caching (see the sketch after this list). When a document is re-indexed, old cached answers may become factually stale or incorrect if the underlying source data has changed. Because no automatic version validation is performed when retrieving a cached answer, the cache for the re-indexed document must be manually invalidated or flushed to ensure accuracy.
- API Performance and Cost: The system is limited by the Google GenAI API's capabilities, bandwidth, and latency.
- Cost/Quality Trade-off: The use of Gemini 2.5 Flash-Lite ensures low-cost scaling and ultra-low latency, but this prioritization comes with reduced peak performance on highly complex reasoning or multi-step synthesis tasks compared to the larger Flash or Pro models.
- Latent Reliability Issues: The system is subject to external API performance fluctuations (e.g., rate limits, server traffic). Flash-Lite, being highly compressed, may occasionally produce less reliable or truncated output under peak load or with high complexity.
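As referenced in the cache-staleness limitation above, a simplified sketch of question-hash caching that omits any document-version check:

```python
import hashlib

answer_cache: dict[str, str] = {}  # question hash -> cached answer (illustrative)

def cache_key(question: str) -> str:
    # The key depends on the question text only; re-indexing a document does
    # not change the key, which is why stale answers can survive a re-index.
    return hashlib.sha256(question.strip().lower().encode("utf-8")).hexdigest()

def get_cached_answer(question: str):
    return answer_cache.get(cache_key(question))

def store_answer(question: str, answer: str) -> None:
    answer_cache[cache_key(question)] = answer
```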