A powerful document search and question-answering system built with AWS Bedrock, LangChain, ChromaDB, and Streamlit. Upload your documents, ask questions in natural language, and get AI-powered answers with source citations.
CLICK HERE TO USE THE STRANDS VERSION
- Multi-format Support: Upload PDF, TXT, and Markdown files
- Intelligent Search: Vector-based similarity search using AWS Bedrock embeddings
- Natural Language Q&A: Ask questions and get contextual answers from your documents
- Source Citations: See which documents were used to generate each answer
- Web Interface: Easy-to-use Streamlit interface
- File Management: Upload, view, and delete documents with ease
- Real-time Indexing: Re-index your knowledgebase whenever you add new documents
```
Documents (PDF/TXT/MD) → Text Extraction → Chunking → Vector Embeddings → ChromaDB
                                                                             ↓
User Question → Similarity Search → Relevant Chunks → AWS Bedrock LLM → Answer + Sources
```
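The flow above can be illustrated with a self-contained toy: a hashed bag-of-words vector stands in for the Bedrock Titan embeddings, and a cosine top-k search stands in for ChromaDB. This is a sketch of the idea, not the app's actual code.

```python
import hashlib
import math

def embed(text, dims=64):
    """Stand-in for Bedrock Titan embeddings: a hashed bag-of-words vector."""
    vec = [0.0] * dims
    for word in text.lower().split():
        h = int(hashlib.md5(word.encode()).hexdigest(), 16)
        vec[h % dims] += 1.0
    return vec

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

# "Indexing": embed each chunk once and keep (chunk, vector) pairs.
chunks = [
    "The invoice must be paid within 30 days.",
    "Employees accrue two vacation days per month.",
    "The server room is on the third floor.",
]
index = [(c, embed(c)) for c in chunks]

# "Query": embed the question and take the most similar chunk.
question = "How many vacation days do employees get?"
qvec = embed(question)
ranked = sorted(index, key=lambda pair: cosine(qvec, pair[1]), reverse=True)
top_chunk = ranked[0][0]
print(top_chunk)
```

The real pipeline swaps `embed` for a Bedrock embeddings call and the sorted list for a ChromaDB collection, but the ranking principle is the same.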
- AWS Account with Bedrock access
- Python 3.8+
- AWS CLI configured or environment variables set
- Clone the repository:

  ```bash
  git clone <your-repo-url>
  cd streamlit-kb
  ```
- Install dependencies:

  Option A: Using pip (traditional)

  ```bash
  pip install -r requirements.txt
  ```

  Option B: Using uv (faster, recommended)

  ```bash
  # Install uv if you haven't already
  curl -LsSf https://astral.sh/uv/install.sh | sh

  # Install dependencies with uv
  uv pip install -r requirements.txt
  ```
- Set up AWS credentials (choose one method):

  Option A: AWS CLI

  ```bash
  aws configure
  ```

  Option B: Environment variables

  ```bash
  export AWS_ACCESS_KEY_ID=your_access_key
  export AWS_SECRET_ACCESS_KEY=your_secret_key
  export AWS_DEFAULT_REGION=us-west-2
  ```

  Option C: .env file

  ```
  # Create .env file in project root
  AWS_ACCESS_KEY_ID=your_access_key
  AWS_SECRET_ACCESS_KEY=your_secret_key
  AWS_DEFAULT_REGION=us-west-2
  ```
- Enable Bedrock Models (in the AWS Console):
  - Go to the AWS Bedrock Console
  - Navigate to "Model access"
  - Enable these models:
    - `amazon.titan-embed-text-v1` (for embeddings)
    - `us.amazon.nova-micro-v1:0` (for Q&A)
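Once the models are enabled, the embedding model is invoked through `bedrock-runtime` with a JSON request body. The request shape below (an `"inputText"` field, with the embedding returned under `"embedding"`) is an assumption based on the Titan text-embedding API; verify it against the current Bedrock model reference for your model version.

```python
import json

AWS_BEDROCK_EMBEDDING_MODEL_ID = "amazon.titan-embed-text-v1"

def titan_embedding_body(text):
    # Titan text embeddings expect a JSON body with an "inputText" field
    # (assumption -- check the Bedrock model reference for your model version).
    return json.dumps({"inputText": text})

body = titan_embedding_body("hello knowledgebase")
print(body)

# With credentials configured, the call would look roughly like:
#   import boto3
#   client = boto3.client("bedrock-runtime", region_name="us-west-2")
#   resp = client.invoke_model(modelId=AWS_BEDROCK_EMBEDDING_MODEL_ID, body=body)
#   embedding = json.loads(resp["body"].read())["embedding"]
```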
- Run the application:

  ```bash
  streamlit run app.py
  ```
- Open your browser to http://localhost:8501
- Navigate to the "Upload Files" tab
- Upload PDF, TXT, or MD files
- Click "Re-index Knowledgebase" to process documents
- Go to the "Ask Questions" tab
- Type your question in natural language
- Click "Generate Answer" to get AI-powered responses
- Review the source documents to verify accuracy
- Use the "Delete Files" tab to remove documents
- Re-index after deleting files to update the search index
You can modify these constants in app.py:
```python
# Embedding model for document vectorization
AWS_BEDROCK_EMBEDDING_MODEL_ID = "amazon.titan-embed-text-v1"

# Language model for question answering
AWS_BEDROCK_LLM_MODEL_ID = "us.amazon.nova-micro-v1:0"

# AWS region
AWS_REGION = "us-west-2"

# Text chunking parameters
chunk_size = 500    # Characters per chunk
chunk_overlap = 50  # Overlap between chunks
```

```
project/
├── app.py            # Main application
├── data/             # Uploaded documents (auto-created)
├── requirements.txt  # Python dependencies
├── .env              # AWS credentials (optional)
└── README.md         # This file
```
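The `chunk_size` and `chunk_overlap` settings can be illustrated with a simplified fixed-width splitter. LangChain's `RecursiveCharacterTextSplitter` (which the app likely uses) is smarter about breaking on paragraph and sentence boundaries; this sketch only shows how the size and overlap parameters interact.

```python
def split_text(text, chunk_size=500, chunk_overlap=50):
    """Naive fixed-width chunking: each chunk starts chunk_size - chunk_overlap
    characters after the previous one, so consecutive chunks share
    chunk_overlap characters of context."""
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size]
            for i in range(0, max(len(text) - chunk_overlap, 1), step)]

doc = "x" * 1200
chunks = split_text(doc)
print([len(c) for c in chunks])  # [500, 500, 300]
```

The overlap means a sentence that straddles a chunk boundary still appears intact in at least one chunk, which helps retrieval quality.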
Create a requirements.txt file with:
```
streamlit>=1.28.0
langchain>=0.1.0
langchain-community>=0.0.10
boto3>=1.34.0
chromadb>=0.4.0
PyPDF2>=3.0.0
python-dotenv>=1.0.0
```

- Text Extraction: PDFs are converted to text using PyPDF2
- Chunking: Documents are split into ~500 character chunks with 50 character overlap
- Vectorization: Each chunk is converted to embeddings using AWS Bedrock Titan
- Storage: Vectors are stored in ChromaDB for fast similarity search
- Query Processing: User question is converted to vector embedding
- Similarity Search: Find the 3 most relevant document chunks
- Context Assembly: Relevant chunks are sent to AWS Bedrock Nova Micro
- Answer Generation: LLM generates answer based on document context
- Source Attribution: Original document chunks are shown for verification
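The context-assembly step can be sketched as a small helper that formats the retrieved chunks into a grounded prompt. The template below is illustrative, not necessarily the one `app.py` uses; the source labels are what makes the later source-attribution step possible.

```python
def build_prompt(question, top_chunks):
    # Label each retrieved chunk so the answer can cite its sources.
    context = "\n\n".join(
        f"[Source {i + 1}]\n{chunk}" for i, chunk in enumerate(top_chunks)
    )
    return (
        "Answer the question using only the context below. "
        "Cite the sources you used.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )

chunks = ["Refunds are issued within 14 days.", "Shipping is free over $50."]
prompt = build_prompt("When are refunds issued?", chunks)
print(prompt)
```

The resulting string is what gets sent to the Nova Micro model, which keeps the answer anchored to the uploaded documents rather than the model's general knowledge.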
- Temporary Storage: ChromaDB uses temporary directories that are cleaned up automatically
- Local Processing: Documents are processed locally before sending to AWS
- AWS IAM: Ensure your AWS credentials have minimal required Bedrock permissions
- Data Privacy: Consider data sensitivity when using cloud AI services
Q: "No module named 'streamlit'"
```bash
pip install streamlit
```

Q: "Unable to locate credentials"
- Verify AWS credentials are configured
- Check AWS region has Bedrock access
- Ensure Bedrock models are enabled in AWS Console
Q: "No valid documents found to index"
- Upload documents first via "Upload Files" tab
- Ensure files are PDF, TXT, or MD format
- Check that files have readable content
Q: "Error generating answer"
- Re-index your knowledgebase
- Verify Bedrock models are enabled
- Check AWS credentials and permissions
Add this to see more detailed logs:
```python
import logging
logging.basicConfig(level=logging.DEBUG)
```

```bash
streamlit run app.py
```

- Streamlit Cloud: Connect your GitHub repo
- AWS EC2: Deploy on EC2 instance with IAM role
- Docker: Containerize the application
```dockerfile
FROM python:3.9-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
EXPOSE 8501
CMD ["streamlit", "run", "app.py", "--server.port=8501", "--server.address=0.0.0.0"]
```

- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- AWS Bedrock for AI models
- LangChain for AI framework
- Streamlit for web interface
- ChromaDB for vector storage
- Issues: GitHub Issues
- Email: patweb99@gmail.com

Star this repo if you find it helpful!