A complete serverless research paper ingestion and search system built on AWS. The system fetches papers from arXiv, generates embeddings using Amazon Titan, stores them in PostgreSQL with pgvector, and provides semantic search with AI-powered chat responses using OpenAI models via AWS Bedrock.
- API Gateway - REST endpoints for ingestion and chat
- AWS Lambda - Serverless compute for paper processing and chat
- Amazon Bedrock - AI services (Titan embeddings + OpenAI chat models)
- Aurora PostgreSQL - Vector database with pgvector extension
- AWS CDK - Infrastructure as code
- Paper Ingestion - Fetch and process papers from arXiv API
- Vector Embeddings - Amazon Titan Text Embeddings v2 (1024 dimensions)
- Semantic Search - pgvector-powered similarity search
- AI Chat - OpenAI GPT models via Bedrock for intelligent responses
- Topic Management - Automatic topic categorization and management
- Deduplication - Prevents duplicate paper storage
- Serverless - Fully managed, auto-scaling infrastructure
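The deduplication step above can be implemented as an upsert keyed on the unique `arxiv_id` column; a minimal sketch (the helper name and column subset are illustrative, not the actual handler code):

```python
def build_paper_upsert(paper: dict) -> tuple[str, tuple]:
    """Return a parameterized INSERT that skips papers already stored.

    Relies on the UNIQUE constraint on papers.arxiv_id: inserting a
    duplicate becomes a no-op instead of an error.
    """
    sql = (
        "INSERT INTO papers (arxiv_id, title, abstract) "
        "VALUES (%s, %s, %s) "
        "ON CONFLICT (arxiv_id) DO NOTHING"
    )
    return sql, (paper["arxiv_id"], paper["title"], paper["abstract"])
```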
├── infrastructure/ # AWS CDK infrastructure code
│ ├── lib/
│ │ └── serverless-rag-stack.ts
│ ├── bin/
│ │ └── app.ts
│ └── package.json
├── lambda_ingest/ # Paper ingestion Lambda function
│ ├── handler.py
│ ├── requirements.txt
│ └── common/
│ ├── bedrock_utils.py
│ └── db_utils.py
├── lambda_chat/ # Chat/search Lambda function
│ ├── handler.py
│ ├── requirements.txt
│ └── common/
│ ├── bedrock_utils.py
│ └── db_utils.py
├── chat_ui/ # Web interface
│ ├── index.html
│ ├── style.css
│ ├── script.js
│ └── README.md
├── lambda_layer/ # Python dependencies layer
│ └── python/
├── common/ # Shared utilities
│ ├── bedrock_utils.py # AI services integration
│ └── db_utils.py # Database operations
└── AWS_DEPLOYMENT_GUIDE.md
- AWS Account with appropriate permissions
- Node.js (v18+) for CDK
- Python (3.11+) for Lambda functions
- AWS CLI configured with credentials
- Clone and install dependencies:

  ```bash
  git clone <repository-url>
  cd Custom-Research/infrastructure
  npm install
  ```

- Deploy the infrastructure:

  ```bash
  cdk deploy --require-approval never
  ```

- Access the deployed system:
  ```
  Outputs:
  ServerlessRagStack.ApiEndpoint = https://xxxxx.execute-api.us-east-1.amazonaws.com/prod/
  ServerlessRagStack.ChatUIUrl = http://ec2-xxx-xxx-xxx-xxx.compute-1.amazonaws.com
  ServerlessRagStack.ChatUIInstanceId = i-1234567890abcdef0
  ```
The Chat UI will be automatically deployed and configured with your API endpoint.
- Optional - Local development:

  ```bash
  cd chat_ui
  python -m http.server 8000
  # Then manually configure the API endpoint in the UI
  ```

The system includes a hosted web interface that's automatically deployed:
- Access the web interface using the ChatUIUrl from deployment outputs
- Start using immediately - the API endpoint is pre-configured
- Ingest papers on topics of interest
- Chat with the AI about your research papers
For development or customization:
- Open `chat_ui/index.html` in your browser
- Configure your API endpoint URL
- Ingest papers on topics of interest
- Chat with the AI about your research papers
The system provides REST API endpoints for paper ingestion and semantic search. Use any HTTP client or build your own interface.
```bash
curl -X POST "https://your-api-endpoint/prod/ingest" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "quantum computing",
    "max_papers": 10
  }'
```

Response:

```json
{
  "message": "Successfully processed 10 papers",
  "total_papers_fetched": 10,
  "processed_count": 10,
  "database_enabled": true,
  "topic": "Quantum Computing",
  "papers": [...]
}
```

```bash
curl -X POST "https://your-api-endpoint/prod/chat" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "what are quantum algorithms?",
    "topic": "Quantum Computing",
    "top_k": 5
  }'
```

Response:
```json
{
  "response": "Quantum algorithms are computational procedures...",
  "sources": [
    {
      "paper_title": "Quantum Algorithm Design",
      "arxiv_id": "2301.12345",
      "similarity": 0.89,
      "url": "https://arxiv.org/abs/2301.12345"
    }
  ],
  "context_chunks": 5
}
```

You can build custom interfaces using the API endpoints:
```python
import requests

# Ingest papers
response = requests.post(
    "https://your-api-endpoint/prod/ingest",
    json={"query": "quantum computing", "max_papers": 5},
)

# Search and chat
response = requests.post(
    "https://your-api-endpoint/prod/chat",
    json={"query": "what are quantum algorithms?", "topic": "Quantum Computing"},
)
```

- `DB_SECRET_ARN` - RDS secret ARN (auto-configured)
- `DB_CLUSTER_ARN` - Aurora cluster ARN (auto-configured)
- `REGION` - AWS region (auto-configured)
- Embeddings: `amazon.titan-embed-text-v2:0` (1024 dimensions)
- Chat: `openai.gpt-oss-20b-1:0` (OpenAI model via Bedrock)
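Generating an embedding with Titan v2 through the Bedrock runtime looks roughly like this; the request-body fields follow the Titan v2 schema, error handling is omitted, and the function names are illustrative:

```python
import json

EMBED_MODEL_ID = "amazon.titan-embed-text-v2:0"

def build_titan_request(text: str, dimensions: int = 1024) -> str:
    """Serialize a Titan v2 embedding request body."""
    return json.dumps({"inputText": text, "dimensions": dimensions, "normalize": True})

def embed_text(text: str, region: str = "us-east-1") -> list[float]:
    """Call Bedrock and return the 1024-dimension embedding vector."""
    import boto3  # requires AWS credentials and Bedrock model access

    client = boto3.client("bedrock-runtime", region_name=region)
    response = client.invoke_model(modelId=EMBED_MODEL_ID, body=build_titan_request(text))
    return json.loads(response["body"].read())["embedding"]
```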
```sql
CREATE TABLE topics (
    id SERIAL PRIMARY KEY,
    name VARCHAR(255) UNIQUE NOT NULL,
    description TEXT,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
```

```sql
CREATE TABLE papers (
    id SERIAL PRIMARY KEY,
    arxiv_id VARCHAR(50) UNIQUE NOT NULL,
    title TEXT NOT NULL,
    authors TEXT,
    abstract TEXT,
    published_date DATE,
    categories TEXT,
    topic_id INTEGER REFERENCES topics(id),
    embedding VECTOR(1024),
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
```

The system uses least-privilege IAM roles with permissions for:
- Bedrock model invocation (Titan embeddings + OpenAI chat)
- RDS cluster access via IAM database authentication
- Secrets Manager for database credentials
- VPC access for Lambda functions
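Under the hood, the `/chat` endpoint's retrieval step reduces to a pgvector nearest-neighbor query over the `papers` schema above. A sketch of that SQL, plus a small helper to format a Python vector as a pgvector literal (helper and query names are illustrative, not the actual handler code):

```python
def to_pgvector_literal(vec: list[float]) -> str:
    """Format a Python list as a pgvector literal, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(f"{x:g}" for x in vec) + "]"

# Top-k search; <=> is pgvector's cosine-distance operator, so
# 1 - distance gives the similarity score returned in the API response.
TOP_K_SQL = """
SELECT title, arxiv_id, 1 - (embedding <=> %s::vector) AS similarity
FROM papers
WHERE topic_id = %s
ORDER BY embedding <=> %s::vector
LIMIT %s
"""
```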
- Lambda - Pay per invocation, cold start optimized
- API Gateway - Pay per request
- Aurora Serverless - Auto-scaling database, pay for what you use
- Bedrock - Pay per token for embeddings and chat
- CloudWatch logs for Lambda functions
- Bedrock usage metrics
- API Gateway metrics and access logs
- Aurora performance insights
- "Model not available" - Ensure Bedrock models are enabled in your region
- Database connection errors - Check VPC configuration and security groups
- Cold start timeouts - Increase Lambda timeout for large paper batches
- Out of memory - Increase Lambda memory for embedding generation
```bash
# Check Lambda logs
aws logs tail "/aws/lambda/ServerlessRagStack-IngestFunction..." --follow

# Test database connectivity
aws rds describe-db-clusters --db-cluster-identifier your-cluster-name

# Verify Bedrock model access
aws bedrock list-foundation-models --query 'modelSummaries[?contains(modelId, `titan`)]'
```

To avoid ongoing costs, destroy the infrastructure when done:

```bash
cd infrastructure
cdk destroy
```

- Multi-modal embeddings for figures and equations
- Real-time paper notifications
- Advanced query parsing and filters
- Paper recommendation system
- Collaboration features
- Citation network analysis
| Component | Your AWS RAG System | ChatGPT Research Mode |
|---|---|---|
| Embedding cost | $0.0001–$0.0004 per 1K tokens (Titan v2) | Included in subscription (no transparency) |
| Vector DB retrieval | $0.000005–$0.00003 | Included |
| LLM inference | $0.0004–$0.02 per 1K tokens (depends on model) | Included (but runs expensive GPT-4/5-level models) |
| API Gateway | $0.000002 per request | N/A |
| Lambda execution | $0.00002–$0.00008 per request | N/A |
| EC2 hosting UI | Fixed cost ~$9/month | Included |
| TOTAL per request (realistic) | $0.002 – $0.015 per query | ~$0.04 – $0.12 per query (hidden cost) |
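The TOTAL row can be sanity-checked with a back-of-envelope estimate; the token counts and default rates below are illustrative mid-range values taken from the table, not measured figures:

```python
def cost_per_query(
    prompt_tokens: int = 2000,      # retrieved context + question
    completion_tokens: int = 500,   # generated answer
    embed_tokens: int = 50,         # embedding the query itself
    embed_rate: float = 0.0002,     # USD per 1K tokens (Titan v2, mid-range)
    llm_rate: float = 0.004,        # USD per 1K tokens (model-dependent)
    fixed: float = 0.000025,        # API Gateway + Lambda + retrieval, per request
) -> float:
    """Rough USD cost of one chat query."""
    embed = embed_tokens / 1000 * embed_rate
    llm = (prompt_tokens + completion_tokens) / 1000 * llm_rate
    return embed + llm + fixed
```

With these defaults the estimate lands inside the $0.002-$0.015 range quoted above; the LLM inference term dominates, so model choice drives the per-query cost.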