A simple Retrieval-Augmented Generation (RAG) application that integrates LangChain, OpenAI, and the Pinecone vector database, containerized with Docker.
- LangChain: Orchestrates the RAG pipeline
- OpenAI: Provides embeddings (text-embedding-ada-002) and LLM (GPT-3.5-turbo)
- Pinecone: Vector database for semantic search
- Docker: Easy deployment and environment consistency
```
.
├── src/
│   └── main.py        # Main application code
├── data/              # Data directory (for future use)
├── requirements.txt   # Python dependencies
├── Dockerfile         # Docker container configuration
├── docker-compose.yml # Docker Compose setup
├── .env.example       # Environment variables template
└── README.md          # This file
```
- Docker & Docker Compose installed on your system
- OpenAI API Key - Get it from OpenAI Platform
- Pinecone Account - Sign up at Pinecone
Navigate to the project directory:

```bash
cd "LangChain&RAG_demo"
```

Create a `.env` file from the example:

```bash
cp .env.example .env
```

Edit `.env` and add your credentials:
```
OPENAI_API_KEY=sk-your-actual-openai-api-key
PINECONE_API_KEY=your-actual-pinecone-api-key
PINECONE_ENVIRONMENT=your-pinecone-environment
PINECONE_INDEX_NAME=langchain-demo
```

How to get these values:
- OpenAI API Key: Go to https://platform.openai.com/api-keys
- Pinecone API Key: Go to https://app.pinecone.io/ → API Keys
- Pinecone Environment: Found in your Pinecone console (e.g., us-east-1-aws)
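Inside the application, these variables are typically loaded at startup. A minimal sketch of how `src/main.py` might read them, assuming `python-dotenv` is listed in `requirements.txt` (the repo's actual loading code may differ):

```python
import os

from dotenv import load_dotenv  # assumes python-dotenv is installed

load_dotenv()  # reads key=value pairs from .env into the process environment

OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]    # raises KeyError if missing
PINECONE_API_KEY = os.environ["PINECONE_API_KEY"]
PINECONE_INDEX_NAME = os.getenv("PINECONE_INDEX_NAME", "langchain-demo")
```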
Build and run the application with Docker Compose:

```bash
docker-compose up --build
```

This will:
- Build the Docker image
- Install all dependencies
- Run the demo application
If you prefer to run locally without Docker:
```bash
# Create a virtual environment
python -m venv venv

# Activate the virtual environment
# On Windows:
venv\Scripts\activate
# On Linux/macOS:
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Run the application
python src/main.py
```

The application demonstrates a complete RAG workflow:
- Document Ingestion: Ingests sample documents about AI topics
- Embedding Generation: Creates vector embeddings using OpenAI
- Vector Storage: Stores embeddings in Pinecone
- Semantic Search: Retrieves relevant documents for queries
- Answer Generation: Uses GPT-3.5-turbo to generate answers based on retrieved context
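Condensed into code, the workflow looks roughly like the sketch below. It assumes the `langchain-openai` and `langchain-pinecone` packages and an existing Pinecone index; the actual `src/main.py` wraps the same steps in a class and may use different package versions:

```python
import os

from langchain.chains import RetrievalQA
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_pinecone import PineconeVectorStore

# Steps 1-3: ingest documents, embed them, and store the vectors in Pinecone
embeddings = OpenAIEmbeddings(model="text-embedding-ada-002")
vectorstore = PineconeVectorStore.from_texts(
    texts=["Artificial Intelligence (AI) is the simulation of human intelligence..."],
    embedding=embeddings,
    index_name=os.environ["PINECONE_INDEX_NAME"],
)

# Steps 4-5: retrieve relevant chunks and generate an answer from that context
qa = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model="gpt-3.5-turbo", temperature=0.7),
    retriever=vectorstore.as_retriever(search_kwargs={"k": 3}),
)
print(qa.invoke({"query": "What is Artificial Intelligence?"})["result"])
```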
Example output:

```
Question: What is Artificial Intelligence?
Answer: Artificial Intelligence (AI) is the simulation of human intelligence
processes by machines, especially computer systems...
```
```
User Query
    ↓
LangChain QA Chain
    ↓
OpenAI Embeddings (text-embedding-ada-002)
    ↓
Pinecone Vector Search (semantic similarity)
    ↓
Retrieved Context Documents
    ↓
OpenAI LLM (gpt-3.5-turbo)
    ↓
Generated Answer
```
Edit `src/main.py` and modify the `sample_documents` list:

```python
sample_documents = [
    "Your first document text...",
    "Your second document text...",
    # Add more documents
]
```

Modify the `ChatOpenAI` initialization in `src/main.py`:
```python
self.llm = ChatOpenAI(
    temperature=0.7,
    model="gpt-4",  # or "gpt-4-turbo-preview"
    openai_api_key=self.openai_api_key
)
```

Modify the retriever configuration:
```python
retriever=self.vectorstore.as_retriever(
    search_kwargs={"k": 5}  # Retrieve the top 5 chunks instead of 3
)
```

Pinecone issues:
- Ensure your Pinecone API key is correct
- Check that you're using a valid environment/region
- Free-tier Pinecone accounts limit the number of indexes you can create
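If the index doesn't exist yet, you can create it manually. A sketch assuming the Pinecone v3+ SDK and a serverless index (older pod-based SDKs use the `PINECONE_ENVIRONMENT` value instead of a `ServerlessSpec`, and the demo may already create the index itself):

```python
from pinecone import Pinecone, ServerlessSpec

pc = Pinecone()  # reads PINECONE_API_KEY from the environment
pc.create_index(
    name="langchain-demo",
    dimension=1536,  # output dimension of text-embedding-ada-002
    metric="cosine",
    spec=ServerlessSpec(cloud="aws", region="us-east-1"),
)
```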
OpenAI API issues:
- Verify that your API key is valid
- Check you have credits in your OpenAI account
- Ensure you're not hitting rate limits
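To isolate credential problems from the rest of the pipeline, you can run a quick standalone check. A sketch assuming the `openai` and `pinecone` packages, both of which read their API keys from the environment:

```python
from openai import OpenAI
from pinecone import Pinecone

# Each call raises an authentication error if the corresponding key is invalid
print(OpenAI().models.list().data[0].id)   # checks OPENAI_API_KEY
print(Pinecone().list_indexes().names())   # checks PINECONE_API_KEY
```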
Docker issues:
- Make sure Docker is running
- Try cleaning the Docker cache: `docker system prune -a`
- Check your internet connection
Cost notes:
- OpenAI: Charges for both embedding and LLM API calls
- Pinecone: Free tier available, paid plans for production
- Estimated cost for demo: ~$0.01-0.05 per run
Possible next steps:
- Load documents from files (PDF, TXT, etc.); see the sketch after this list
- Add a web interface with Streamlit or FastAPI
- Implement chat history and conversation memory
- Add document metadata filtering
- Deploy to cloud platforms (AWS, GCP, Azure)
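For the first item, LangChain's document loaders and text splitters would replace the hard-coded list. A sketch assuming `langchain-community` and `langchain-text-splitters` are installed; the file path is illustrative:

```python
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Load a local file from the data/ directory (hypothetical path)
docs = TextLoader("data/example.txt").load()

# Split into overlapping chunks sized for embedding
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_documents(docs)

# chunks can then be passed to PineconeVectorStore.from_documents(...)
```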
MIT License - feel free to use this for learning and projects!