A production-ready intelligent knowledge management system using LlamaIndex that enables organizations to query, analyze, and extract insights from multiple data sources (documents, databases, APIs, web content) through natural language.
- Multi-Source Data Ingestion: Documents (PDF, Word, Markdown), databases, APIs, web content
- Advanced Query Engines: Sub-question decomposition, SQL generation, router query engine, hybrid search
- Hybrid Search: Vector similarity + keyword search for better retrieval
- Query Analytics: Track query patterns, popular topics, knowledge gaps
- Multi-Tenant Support: Organization-level data isolation
- Real-Time Updates: Incremental indexing, document versioning
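The hybrid search feature blends vector similarity with keyword relevance. As a rough illustration of the idea (not the project's actual implementation), a weighted score-fusion step might look like this, where the function name, `alpha` weight, and score dictionaries are all hypothetical:

```python
def fuse_scores(vector_hits, keyword_hits, alpha=0.7):
    """Blend vector-similarity and keyword (e.g. BM25) scores.

    vector_hits / keyword_hits: dicts mapping doc_id -> score in [0, 1].
    alpha weights the vector side; all names here are illustrative.
    """
    doc_ids = set(vector_hits) | set(keyword_hits)
    fused = {
        doc_id: alpha * vector_hits.get(doc_id, 0.0)
        + (1 - alpha) * keyword_hits.get(doc_id, 0.0)
        for doc_id in doc_ids
    }
    # Highest fused score first
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)

ranked = fuse_scores({"doc1": 0.9, "doc2": 0.4}, {"doc2": 0.8, "doc3": 0.5})
```

With the sample scores above, `doc1` ranks first because its strong vector score outweighs `doc2`'s keyword match at the default weighting.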
- LlamaIndex: Data indexing, query engines, RAG
- Gemini: LLM for query understanding and generation
- FastAPI: REST API backend
- Streamlit: Web UI for querying and management
- ChromaDB/Pinecone: Vector storage
- PostgreSQL: Metadata and analytics storage
- Redis: Query caching
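Redis backs query caching: repeated questions are served from a read-through cache keyed on a hash of the query. A dependency-free sketch of that pattern (the real system would use a Redis client with TTLs; the class and method names here are illustrative):

```python
import hashlib
import time

class QueryCache:
    """Read-through cache; an in-memory stand-in for Redis in this sketch."""

    def __init__(self, ttl_seconds=300):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (expires_at, value)

    def _key(self, query: str, org_id: str) -> str:
        # Namespace keys by organization to preserve tenant isolation
        return hashlib.sha256(f"{org_id}:{query}".encode()).hexdigest()

    def get_or_compute(self, query, org_id, compute):
        key = self._key(query, org_id)
        hit = self._store.get(key)
        if hit and hit[0] > time.time():
            return hit[1]  # cache hit: skip the expensive query pipeline
        value = compute(query)
        self._store[key] = (time.time() + self.ttl, value)
        return value

cache = QueryCache()
answer = cache.get_or_compute("What is X?", "org_123", lambda q: f"answer to {q}")
```

Including the organization ID in the cache key matters for the multi-tenant isolation listed above: two tenants asking the same question must never share a cached answer.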
cd enterprise_knowledge_base
pip install -r requirements.txt
Create a .env file in the project root:
# Copy the example
cp .env.example .env
# Edit .env and add your API keys
GEMINI_API_KEY=your_GEMINI_API_KEY_here
GEMINI_MODEL=gemini-3-flash-preview
In one terminal:
cd enterprise_knowledge_base
python -m uvicorn backend.api.main:app --reload --port 8000
The API will be available at http://localhost:8000
In another terminal:
cd enterprise_knowledge_base
streamlit run frontend/app.py --server.port 8501
The UI will be available at http://localhost:8501
- Open the Streamlit UI at http://localhost:8501
- Go to the "Ingest Documents" tab
- Upload a PDF or document
- Go to "Query" tab
- Ask a question about your document
User Query
↓
[Query Router] → Determines query type
↓
├─→ [Document Query Engine] → RAG over documents
├─→ [SQL Query Engine] → Natural language to SQL
├─→ [API Query Engine] → Query external APIs
└─→ [Hybrid Query Engine] → Combines multiple sources
↓
[Response Synthesizer] → Combines results
↓
[Citation Generator] → Adds source citations
↓
Response + Sources
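The router above dispatches each query to a single engine. A keyword-heuristic stand-in can convey the idea; the actual project presumably uses an LLM-based selector (such as LlamaIndex's router query engine), and the matching rules below are purely illustrative:

```python
def route_query(query: str) -> str:
    """Pick a query engine for a user query by simple heuristics.

    Engine names mirror the architecture diagram; the keyword rules
    are illustrative, not the project's real routing logic.
    """
    q = query.lower()
    if any(w in q for w in ("revenue", "top", "count", "average")):
        return "sql"        # natural-language-to-SQL engine
    if any(w in q for w in ("weather", "stock price", "live")):
        return "api"        # external API engine
    if "compare" in q or " and " in q:
        return "hybrid"     # combine multiple sources
    return "document"       # default: RAG over documents

engine = route_query("What are the top 10 customers by revenue?")
```

An LLM-based router replaces these brittle keyword checks with a prompt that asks the model to choose among engine descriptions, which generalizes far better.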
enterprise_knowledge_base/
├── backend/
│ ├── core/ # Core LlamaIndex setup
│ ├── engines/ # Query engines
│ ├── ingestion/ # Data ingestion
│ ├── api/ # FastAPI endpoints
│ ├── models/ # Database models
│ └── utils/ # Utilities
├── frontend/ # Streamlit UI
├── data/ # Data storage
└── tests/ # Tests
from backend.core.knowledge_base import KnowledgeBase
kb = KnowledgeBase()
response = kb.query("What are the key features of our product?")
print(response)

kb.ingest_document("path/to/document.pdf", organization_id="org_123")

response = kb.query_sql("What are the top 10 customers by revenue?")
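The `KnowledgeBase` calls shown above imply an interface roughly like the following stub. The signatures and `QueryResponse` shape are inferred from the examples, not taken from the actual backend/core/knowledge_base.py:

```python
from dataclasses import dataclass, field

@dataclass
class QueryResponse:
    answer: str
    sources: list = field(default_factory=list)  # citations from the synthesizer

class KnowledgeBase:
    """Minimal stub mirroring the usage examples; hypothetical interface."""

    def __init__(self):
        self._docs = {}  # organization_id -> list of ingested document paths

    def ingest_document(self, path: str, organization_id: str) -> None:
        # Real implementation: parse, chunk, embed, and index the file
        self._docs.setdefault(organization_id, []).append(path)

    def query(self, question: str) -> QueryResponse:
        # Real implementation: route to a query engine and synthesize an answer
        return QueryResponse(answer=f"[stub answer for: {question}]")

    def query_sql(self, question: str) -> QueryResponse:
        # Real implementation: translate the question to SQL and execute it
        return QueryResponse(answer=f"[stub SQL result for: {question}]")

kb = KnowledgeBase()
kb.ingest_document("report.pdf", organization_id="org_123")
```

Keeping ingestion keyed by `organization_id` at the interface level is what makes the multi-tenant isolation above enforceable from a single entry point.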
- Enterprise Knowledge Management: Centralized knowledge base for organizations
- Document Q&A: Natural language querying of documents
- Data Integration: Query across multiple data sources
- RAG Applications: Retrieval-augmented generation systems
- Multi-Tenant Knowledge: Organization-level data isolation
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
This project is licensed under the MIT License - see the LICENSE file for details.