A complete serverless research paper ingestion and search system built on AWS. The system fetches papers from arXiv, generates embeddings using Amazon Titan, stores them in PostgreSQL with pgvector, and provides semantic search with AI-powered chat responses using OpenAI models via AWS Bedrock.
- API Gateway - REST endpoints for ingestion and chat
- AWS Lambda - Serverless compute for paper processing and chat
- Amazon Bedrock - AI services (Titan embeddings + OpenAI chat models)
- Aurora PostgreSQL - Vector database with pgvector extension
- AWS CDK - Infrastructure as code
- Paper Ingestion - Fetch and process papers from arXiv API
- Vector Embeddings - Amazon Titan Text Embeddings v2 (1024 dimensions)
- Semantic Search - pgvector-powered similarity search
- AI Chat - OpenAI GPT models via Bedrock for intelligent responses
- Topic Management - Automatic topic categorization and management
- Deduplication - Prevents duplicate paper storage
- Serverless - Fully managed, auto-scaling infrastructure
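The deduplication step above can be implemented as an upsert keyed on the unique `arxiv_id` column; a minimal sketch (the helper name and column subset are illustrative, not the actual handler code):

```python
def build_paper_upsert(paper: dict) -> tuple[str, tuple]:
    """Return a parameterized INSERT that skips papers already stored.

    Relies on the UNIQUE constraint on papers.arxiv_id: inserting a
    duplicate becomes a no-op instead of an error.
    """
    sql = (
        "INSERT INTO papers (arxiv_id, title, abstract) "
        "VALUES (%s, %s, %s) "
        "ON CONFLICT (arxiv_id) DO NOTHING"
    )
    return sql, (paper["arxiv_id"], paper["title"], paper["abstract"])
```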
├── infrastructure/ # AWS CDK infrastructure code
│ ├── lib/
│ │ └── serverless-rag-stack.ts
│ ├── bin/
│ │ └── app.ts
│ └── package.json
├── lambda_ingest/ # Paper ingestion Lambda function
│ ├── handler.py
│ ├── requirements.txt
│ └── common/
│ ├── bedrock_utils.py
│ └── db_utils.py
├── lambda_chat/ # Chat/search Lambda function
│ ├── handler.py
│ ├── requirements.txt
│ └── common/
│ ├── bedrock_utils.py
│ └── db_utils.py
├── chat_ui/ # Web interface
│ ├── index.html
│ ├── style.css
│ ├── script.js
│ └── README.md
├── lambda_layer/ # Python dependencies layer
│ └── python/
├── common/ # Shared utilities
│ ├── bedrock_utils.py # AI services integration
│ └── db_utils.py # Database operations
└── AWS_DEPLOYMENT_GUIDE.md
- AWS Account with appropriate permissions
- Node.js (v18+) for CDK
- Python (3.11+) for Lambda functions
- AWS CLI configured with credentials
- Clone and install dependencies:

  ```bash
  git clone <repository-url>
  cd Custom-Research/infrastructure
  npm install
  ```

- Deploy the infrastructure:

  ```bash
  cdk deploy --require-approval never
  ```

- Access the deployed system:
  ```
  Outputs:
  ServerlessRagStack.ApiEndpoint = https://xxxxx.execute-api.us-east-1.amazonaws.com/prod/
  ServerlessRagStack.ChatUIUrl = http://ec2-xxx-xxx-xxx-xxx.compute-1.amazonaws.com
  ServerlessRagStack.ChatUIInstanceId = i-1234567890abcdef0
  ```
The Chat UI will be automatically deployed and configured with your API endpoint.
- Optional - Local development:

  ```bash
  cd chat_ui
  python -m http.server 8000
  # Then manually configure the API endpoint in the UI
  ```

The system includes a hosted web interface that's automatically deployed:
- Access the web interface using the ChatUIUrl from deployment outputs
- Start using immediately - the API endpoint is pre-configured
- Ingest papers on topics of interest
- Chat with the AI about your research papers
For development or customization:
- Open `chat_ui/index.html` in your browser
- Configure your API endpoint URL
- Ingest papers on topics of interest
- Chat with the AI about your research papers
The system provides REST API endpoints for paper ingestion and semantic search. Use any HTTP client or build your own interface.
```bash
curl -X POST "https://your-api-endpoint/prod/ingest" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "quantum computing",
    "max_papers": 10
  }'
```

Response:

```json
{
  "message": "Successfully processed 10 papers",
  "total_papers_fetched": 10,
  "processed_count": 10,
  "database_enabled": true,
  "topic": "Quantum Computing",
  "papers": [...]
}
```

```bash
curl -X POST "https://your-api-endpoint/prod/chat" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "what are quantum algorithms?",
    "topic": "Quantum Computing",
    "top_k": 5
  }'
```

Response:
```json
{
  "response": "Quantum algorithms are computational procedures...",
  "sources": [
    {
      "paper_title": "Quantum Algorithm Design",
      "arxiv_id": "2301.12345",
      "similarity": 0.89,
      "url": "https://arxiv.org/abs/2301.12345"
    }
  ],
  "context_chunks": 5
}
```

You can build custom interfaces using the API endpoints:
```python
import requests

# Ingest papers
response = requests.post(
    "https://your-api-endpoint/prod/ingest",
    json={"query": "quantum computing", "max_papers": 5},
)

# Search and chat
response = requests.post(
    "https://your-api-endpoint/prod/chat",
    json={"query": "what are quantum algorithms?", "topic": "Quantum Computing"},
)
```

- `DB_SECRET_ARN` - RDS secret ARN (auto-configured)
- `DB_CLUSTER_ARN` - Aurora cluster ARN (auto-configured)
- `REGION` - AWS region (auto-configured)
- Embeddings: `amazon.titan-embed-text-v2:0` (1024 dimensions)
- Chat: `openai.gpt-oss-20b-1:0` (OpenAI model via Bedrock)
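Generating an embedding with Titan v2 through the Bedrock runtime looks roughly like this; the request-body fields follow the Titan v2 schema, error handling is omitted, and the function names are illustrative:

```python
import json

EMBED_MODEL_ID = "amazon.titan-embed-text-v2:0"

def build_titan_request(text: str, dimensions: int = 1024) -> str:
    """Serialize a Titan v2 embedding request body."""
    return json.dumps({"inputText": text, "dimensions": dimensions, "normalize": True})

def embed_text(text: str, region: str = "us-east-1") -> list[float]:
    """Call Bedrock and return the 1024-dimension embedding vector."""
    import boto3  # requires AWS credentials and Bedrock model access

    client = boto3.client("bedrock-runtime", region_name=region)
    response = client.invoke_model(modelId=EMBED_MODEL_ID, body=build_titan_request(text))
    return json.loads(response["body"].read())["embedding"]
```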
```sql
CREATE TABLE topics (
    id SERIAL PRIMARY KEY,
    name VARCHAR(255) UNIQUE NOT NULL,
    description TEXT,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
```

```sql
CREATE TABLE papers (
    id SERIAL PRIMARY KEY,
    arxiv_id VARCHAR(50) UNIQUE NOT NULL,
    title TEXT NOT NULL,
    authors TEXT,
    abstract TEXT,
    published_date DATE,
    categories TEXT,
    topic_id INTEGER REFERENCES topics(id),
    embedding VECTOR(1024),
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
```

The system uses least-privilege IAM roles with permissions for:
- Bedrock model invocation (Titan embeddings + OpenAI chat)
- RDS cluster access via IAM database authentication
- Secrets Manager for database credentials
- VPC access for Lambda functions
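Under the hood, the `/chat` endpoint's retrieval step reduces to a pgvector nearest-neighbor query over the `papers` schema above. A sketch of that SQL, plus a small helper to format a Python vector as a pgvector literal (helper and query names are illustrative, not the actual handler code):

```python
def to_pgvector_literal(vec: list[float]) -> str:
    """Format a Python list as a pgvector literal, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(f"{x:g}" for x in vec) + "]"

# Top-k search; <=> is pgvector's cosine-distance operator, so
# 1 - distance gives the similarity score returned in the API response.
TOP_K_SQL = """
SELECT title, arxiv_id, 1 - (embedding <=> %s::vector) AS similarity
FROM papers
WHERE topic_id = %s
ORDER BY embedding <=> %s::vector
LIMIT %s
"""
```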
- Lambda - Pay per invocation, cold start optimized
- API Gateway - Pay per request
- Aurora Serverless - Auto-scaling database, pay for what you use
- Bedrock - Pay per token for embeddings and chat
- CloudWatch logs for Lambda functions
- Bedrock usage metrics
- API Gateway metrics and access logs
- Aurora performance insights
- "Model not available" - Ensure Bedrock models are enabled in your region
- Database connection errors - Check VPC configuration and security groups
- Cold start timeouts - Increase Lambda timeout for large paper batches
- Out of memory - Increase Lambda memory for embedding generation
```bash
# Check Lambda logs
aws logs tail "/aws/lambda/ServerlessRagStack-IngestFunction..." --follow

# Test database connectivity
aws rds describe-db-clusters --db-cluster-identifier your-cluster-name

# Verify Bedrock model access
aws bedrock list-foundation-models --query 'modelSummaries[?contains(modelId, `titan`)]'
```

To avoid ongoing costs, destroy the infrastructure when done:

```bash
cd infrastructure
cdk destroy
```

- Multi-modal embeddings for figures and equations
- Real-time paper notifications
- Advanced query parsing and filters
- Paper recommendation system
- Collaboration features
- Citation network analysis
| Component | Your AWS RAG System | ChatGPT Research Mode |
|---|---|---|
| Embedding cost | $0.0001–$0.0004 per 1K tokens (Titan v2) | Included in subscription (no transparency) |
| Vector DB retrieval | $0.000005–$0.00003 | Included |
| LLM inference | $0.0004–$0.02 per 1K tokens (depends on model) | Included (but runs expensive GPT-4/5-level models) |
| API Gateway | $0.000002 per request | N/A |
| Lambda execution | $0.00002–$0.00008 per request | N/A |
| EC2 hosting UI | Fixed cost ~$9/month | Included |
| TOTAL per request (realistic) | $0.002 – $0.015 per query | ~$0.04 – $0.12 per query (hidden cost) |
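The TOTAL row can be sanity-checked with a back-of-envelope estimate; the token counts and default rates below are illustrative mid-range values taken from the table, not measured figures:

```python
def cost_per_query(
    prompt_tokens: int = 2000,      # retrieved context + question
    completion_tokens: int = 500,   # generated answer
    embed_tokens: int = 50,         # embedding the query itself
    embed_rate: float = 0.0002,     # USD per 1K tokens (Titan v2, mid-range)
    llm_rate: float = 0.004,        # USD per 1K tokens (model-dependent)
    fixed: float = 0.000025,        # API Gateway + Lambda + retrieval, per request
) -> float:
    """Rough USD cost of one chat query."""
    embed = embed_tokens / 1000 * embed_rate
    llm = (prompt_tokens + completion_tokens) / 1000 * llm_rate
    return embed + llm + fixed
```

With these defaults the estimate lands inside the $0.002-$0.015 range quoted above; the LLM inference term dominates, so model choice drives the per-query cost.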