An in-depth, guided notebook that builds a complete RAG (Retrieval-Augmented Generation) chatbot using Amazon DocumentDB as the single store for both source documents and vector embeddings. Unlike a typical quickstart, this notebook explains why each parameter matters and demonstrates the impact of changing them with targeted examples.
Most RAG tutorials show you the "happy path." This notebook goes deeper:
- Single-store architecture - vectors and source text live in the same document, eliminating the multi-database sync problem
- HNSW parameter tuning - measures recall vs. latency across different `m`, `efConstruction`, and `efSearch` values so you can see the tradeoffs
- Chunk size and overlap experiments - demonstrates what happens when chunks split mid-sentence and how overlap preserves context
- RAG vs. no-RAG comparison - side-by-side output showing what the LLM gets right (and wrong) without retrieval
- Production resilience patterns - rate limiting, circuit breakers, and retry logic for Bedrock API calls
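The single-store idea can be illustrated with a minimal, self-contained sketch (plain Python, no DocumentDB connection; the field names `text` and `embedding` are illustrative, not necessarily the notebook's actual schema):

```python
import math

# Each document carries both the source chunk and its vector,
# so one lookup returns everything the LLM prompt needs.
docs = [
    {"text": "DocumentDB supports HNSW vector indexes.", "embedding": [0.9, 0.1, 0.0]},
    {"text": "Gradio provides a simple chat UI.", "embedding": [0.0, 0.2, 0.9]},
]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def retrieve(query_vec, k=1):
    # Brute-force stand-in for the HNSW index: rank by cosine similarity
    # and return the stored text directly -- no second database to sync or query.
    ranked = sorted(docs, key=lambda d: cosine(query_vec, d["embedding"]), reverse=True)
    return [d["text"] for d in ranked[:k]]

print(retrieve([1.0, 0.0, 0.0]))  # closest to the first document
```

In the notebook, the brute-force ranking above is replaced by an HNSW index lookup, but the payoff is the same: the retrieved document already contains the text to feed the prompt.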
Check out the AWS Events YouTube channel for a walkthrough of this demo.
- Amazon DocumentDB (single collection with HNSW index)
- Amazon Titan Text Embeddings V2
- Anthropic Claude Haiku 4.5 via Amazon Bedrock
- Gradio chat interface
- Amazon SageMaker or any Jupyter-compatible environment
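As a sketch of how the Titan embeddings call is shaped (the model ID and request fields below follow Bedrock's documented Titan Text Embeddings V2 interface; treat exact values as assumptions to verify for your region):

```python
import json

MODEL_ID = "amazon.titan-embed-text-v2:0"  # assumed Bedrock model ID

def build_titan_request(text, dimensions=1024):
    # Titan Text Embeddings V2 takes the input text plus an optional
    # output dimension; 1024 is the documented default.
    return json.dumps({"inputText": text, "dimensions": dimensions})

def embed(bedrock_client, text):
    # bedrock_client = boto3.client("bedrock-runtime", region_name=aws_region)
    resp = bedrock_client.invoke_model(modelId=MODEL_ID, body=build_titan_request(text))
    return json.loads(resp["body"].read())["embedding"]
```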
| # | Section | Purpose |
|---|---|---|
| — | Required Configuration | Secret name, AWS region, connection mode |
| 1 | Install Dependencies | Required Python packages |
| 2 | Import Libraries | Core imports |
| 3 | Resilience Utilities | Rate limiter and circuit breaker decorators |
| 4 | Connect to DocumentDB | Secrets Manager credentials, HNSW index creation |
| 5 | Load and Chunk Documents | PDFs, blog posts, and doc pages → chunked text |
| 6 | Generate Embeddings and Insert | Titan embeddings → batch insert into DocumentDB |
| 7 | Single-Store vs. Multi-Store | Benchmarks the single-collection approach against a multi-collection pattern |
| 8 | HNSW Parameter Tuning | Recall vs. latency across index configurations |
| 9 | Configure LLM and Vector Store | Claude Haiku + DocumentDB vector store setup |
| 10 | Chunk Overlap - Why It Matters | Same document chunked with/without overlap, then queried |
| 11 | Prompt Template | System prompt for grounded Q&A |
| 12 | RAG vs No-RAG Comparison | Side-by-side LLM output with and without retrieval |
| 13 | Chat Configuration | History length and retrieval limits |
| 14 | Launch Chatbot | Gradio chat interface with caching and resilience |
| 15 | Cleanup | Close connections |
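Section 10's overlap experiment rests on a simple mechanic: consecutive chunks share a tail of text, so a sentence split at one chunk boundary still appears whole in a neighboring chunk. A minimal character-based sketch (the notebook's actual chunker and sizes may differ):

```python
def chunk_text(text, chunk_size=40, overlap=10):
    # Slide a window of chunk_size characters, stepping by
    # chunk_size - overlap so each chunk repeats the previous chunk's tail.
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

text = "HNSW trades a little recall for much lower query latency at scale."
for c in chunk_text(text):
    print(repr(c))
```

With `overlap=0` the same sentence would be cut cleanly in two, and a query matching only one half would retrieve a chunk missing the other half's context.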
- AWS account with access to Amazon Bedrock (Titan Text Embeddings V2 and Claude Haiku 4.5)
- Amazon DocumentDB cluster (version 5.0+)
- AWS Secrets Manager secret containing your Amazon DocumentDB credentials
- Python 3.10+
This notebook ingests PDF documents as its knowledge base. You can use any PDFs, but to reproduce the demo as shown, download:
Place the PDF files in the same directory as the notebook.
1. Install dependencies:

   ```shell
   pip install -r requirements.txt
   ```
2. Download the Amazon DocumentDB TLS certificate:

   ```shell
   wget https://truststore.pki.rds.amazonaws.com/global/global-bundle.pem
   ```
3. Download the sample PDFs (see Sample Data above) into this directory.
4. Open `docdb-rag-deep-dive.ipynb` and update the Required Configuration cell at the top:
   - `secret_name` — your Secrets Manager secret name
   - `aws_region` — your AWS region
   - `is_bastion` — set to `'y'` if connecting through a bastion host
5. Run the notebook cells sequentially.
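The resilience utilities from Section 3 wrap Bedrock calls so transient throttling doesn't kill a chat session. A minimal retry-with-backoff decorator in the same spirit (the notebook's actual implementation also includes a rate limiter and circuit breaker; the names here are illustrative):

```python
import time
import functools

def with_retries(max_attempts=3, base_delay=0.5):
    # Retry the wrapped call on any exception, sleeping
    # base_delay * 2**attempt between attempts (exponential backoff).
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(max_attempts):
                try:
                    return fn(*args, **kwargs)
                except Exception:
                    if attempt == max_attempts - 1:
                        raise
                    time.sleep(base_delay * 2 ** attempt)
        return wrapper
    return decorator

calls = {"n": 0}

@with_retries(max_attempts=3, base_delay=0.01)
def flaky():
    # Simulated Bedrock call that is throttled twice before succeeding.
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("throttled")
    return "ok"

print(flaky())  # succeeds on the third attempt
```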
