Enterprise Knowledge Base Assistant


A production-ready, intelligent knowledge management system built with LlamaIndex that lets organizations query, analyze, and extract insights from multiple data sources (documents, databases, APIs, web content) through natural language.

Key Features

  • Multi-Source Data Ingestion: Documents (PDF, Word, Markdown), databases, APIs, web content
  • Advanced Query Engines: Sub-question decomposition, SQL generation, router query engine, hybrid search
  • Hybrid Search: Vector similarity + keyword search for better retrieval
  • Query Analytics: Track query patterns, popular topics, knowledge gaps
  • Multi-Tenant Support: Organization-level data isolation
  • Real-Time Updates: Incremental indexing, document versioning
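The hybrid search feature combines vector similarity and keyword results. A common way to merge two ranked result lists is reciprocal rank fusion (RRF); the sketch below is illustrative and not this project's actual implementation.

```python
# Illustrative sketch of hybrid retrieval via reciprocal rank fusion (RRF).
# Function and variable names here are hypothetical, not this project's API.

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    rankings: list of lists of doc IDs, each ordered best-first.
    k: smoothing constant; 60 is the value from the original RRF paper.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["doc_a", "doc_b", "doc_c"]   # from vector similarity search
keyword_hits = ["doc_b", "doc_d", "doc_a"]  # from keyword (BM25-style) search
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
print(fused)  # ['doc_b', 'doc_a', 'doc_d', 'doc_c'] -- doc_b ranks high in both lists
```

Documents that appear in both result lists accumulate score from each, which is why doc_b edges out doc_a even though doc_a tops the vector list.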

Tech Stack

  • LlamaIndex: Data indexing, query engines, RAG
  • Gemini: LLM for query understanding and response generation
  • FastAPI: REST API backend
  • Streamlit: Web UI for querying and management
  • ChromaDB/Pinecone: Vector storage
  • PostgreSQL: Metadata and analytics storage
  • Redis: Query caching
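For the Redis query-caching layer, cache keys need to be stable across tenants and query phrasings. The sketch below shows one plausible keying scheme; the naming and normalization are assumptions, and a plain dict stands in for the Redis client so the example runs without a server.

```python
# Hedged sketch of query-cache keying: hash the organization ID plus a
# normalized query so repeat questions hit the cache. A dict stands in
# for redis.Redis() -- the get/set pattern is the same.
import hashlib
import json

def cache_key(organization_id: str, query: str) -> str:
    # Normalize so trivially different phrasings of the same request collide.
    payload = json.dumps({"org": organization_id, "q": query.strip().lower()})
    return "kbcache:" + hashlib.sha256(payload.encode()).hexdigest()

cache = {}  # stand-in for a Redis connection
key = cache_key("org_123", "What are the key features?")
if key not in cache:
    cache[key] = "cached answer"  # the expensive RAG call happens only on a miss
print(key.startswith("kbcache:"))  # True
```

Because the organization ID is part of the key, one tenant's cached answers can never be served to another.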

Quick Start

1. Install Dependencies

cd enterprise_knowledge_base
pip install -r requirements.txt

2. Set Up Environment

Create a .env file in the project root:

# Copy the example
cp .env.example .env

# Edit .env and add your API keys
GEMINI_API_KEY=your_api_key_here
GEMINI_MODEL=gemini-3-flash-preview

3. Run Backend API

In one terminal:

cd enterprise_knowledge_base
python -m uvicorn backend.api.main:app --reload --port 8000

The API will be available at http://localhost:8000

4. Run Frontend

In another terminal:

cd enterprise_knowledge_base
streamlit run frontend/app.py --server.port 8501

The UI will be available at http://localhost:8501

5. Test the System

  1. Open the Streamlit UI at http://localhost:8501
  2. Go to "Ingest Documents" tab
  3. Upload a PDF or document
  4. Go to "Query" tab
  5. Ask a question about your document

Architecture

User Query
    ↓
[Query Router] → Determines query type
    ↓
    ├─→ [Document Query Engine] → RAG over documents
    ├─→ [SQL Query Engine] → Natural language to SQL
    ├─→ [API Query Engine] → Query external APIs
    └─→ [Hybrid Query Engine] → Combines multiple sources
    ↓
[Response Synthesizer] → Combines results
    ↓
[Citation Generator] → Adds source citations
    ↓
Response + Sources
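The routing step above can be sketched as a dispatch function. The real system would use an LLM-based selector; the heuristics and engine names below are illustrative placeholders only.

```python
# Minimal illustration of the [Query Router] step: choose a query engine
# for an incoming question. Engine names are placeholders; the production
# router would delegate this decision to an LLM selector.

def route_query(query: str) -> str:
    tokens = query.lower().split()
    if any(w in tokens for w in ("sum", "average", "top", "revenue", "count")):
        return "sql_engine"       # analytical questions -> NL-to-SQL
    if any(w in tokens for w in ("api", "status", "live")):
        return "api_engine"       # live data -> external APIs
    return "document_engine"      # default: RAG over ingested documents

print(route_query("What are the top 10 customers by revenue?"))  # sql_engine
print(route_query("Summarize the onboarding guide"))             # document_engine
```

Whatever selector is used, the contract is the same: the router picks exactly one (or, for the hybrid engine, several) downstream engines, and the response synthesizer merges their outputs.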

Project Structure

enterprise_knowledge_base/
├── backend/
│   ├── core/           # Core LlamaIndex setup
│   ├── engines/        # Query engines
│   ├── ingestion/      # Data ingestion
│   ├── api/            # FastAPI endpoints
│   ├── models/         # Database models
│   └── utils/          # Utilities
├── frontend/           # Streamlit UI
├── data/               # Data storage
└── tests/              # Tests

Usage Examples

Query Documents

from backend.core.knowledge_base import KnowledgeBase

kb = KnowledgeBase()
response = kb.query("What are the key features of our product?")
print(response)

Ingest Documents

kb.ingest_document("path/to/document.pdf", organization_id="org_123")
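Passing organization_id enforces the multi-tenant isolation listed under Key Features. One common way to implement that is a separate vector collection per tenant; the sketch below illustrates the idea with an in-memory stand-in for the vector store, and its naming scheme is an assumption, not this project's actual one.

```python
# Hedged sketch of organization-level isolation: one collection per tenant,
# so a query can never read another organization's documents. The naming
# scheme and TenantStore class are illustrative, not this project's API.

def collection_for(organization_id: str) -> str:
    return "kb_" + organization_id.replace("-", "_")

class TenantStore:
    def __init__(self):
        self._collections = {}  # stand-in for per-tenant ChromaDB collections

    def ingest(self, organization_id: str, doc_id: str, text: str):
        coll = self._collections.setdefault(collection_for(organization_id), {})
        coll[doc_id] = text

    def query(self, organization_id: str, term: str):
        docs = self._collections.get(collection_for(organization_id), {})
        return [d for d, t in docs.items() if term.lower() in t.lower()]

store = TenantStore()
store.ingest("org_123", "d1", "Our product supports hybrid search.")
store.ingest("org_456", "d2", "Internal pricing notes.")
print(store.query("org_123", "pricing"))  # [] -- org_456 data is invisible here
```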

Query with SQL Generation

response = kb.query_sql("What are the top 10 customers by revenue?")
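Under the hood, a text-to-SQL engine typically grounds the LLM in the database schema before asking it to write a query. The schema, prompt wording, and function name below are illustrative assumptions, not this project's internals.

```python
# Sketch of the text-to-SQL step: build a schema-grounded prompt for the LLM.
# The SCHEMA dict and prompt wording are illustrative assumptions.

SCHEMA = {
    "customers": ["id", "name", "region"],
    "orders": ["id", "customer_id", "amount", "created_at"],
}

def build_sql_prompt(question: str) -> str:
    tables = "\n".join(
        f"CREATE TABLE {name} ({', '.join(cols)});" for name, cols in SCHEMA.items()
    )
    return (
        "Given the schema below, write a single SQL query answering the question.\n"
        f"{tables}\n"
        f"Question: {question}\nSQL:"
    )

prompt = build_sql_prompt("What are the top 10 customers by revenue?")
print("CREATE TABLE customers" in prompt)  # True
```

The LLM's completion would then be validated and executed against PostgreSQL, with the rows fed back through the response synthesizer.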

Use Cases

  • Enterprise Knowledge Management: Centralized knowledge base for organizations
  • Document Q&A: Natural language querying of documents
  • Data Integration: Query across multiple data sources
  • RAG Applications: Retrieval-augmented generation systems
  • Multi-Tenant Knowledge: Organization-level data isolation

Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

License

This project is licensed under the MIT License - see the LICENSE file for details.
