This repository provides a simple introduction to using vector databases with LangChain and Pinecone. The code demonstrates how to load text documents, split them into chunks, create embeddings using OpenAI, and store these embeddings in a Pinecone vector store. A basic RetrievalQA chain is then implemented to answer queries using the vector store.
Vector databases store embeddings generated from text and retrieve them by similarity, so a query can surface related passages even when it shares no exact keywords with them. This project shows a basic implementation with LangChain and Pinecone: creating a vector store, filling it with embeddings, and retrieving information from it using OpenAI's models.
To run this code, you need the following:
- Python 3.8+
- Pinecone account and API key
- OpenAI account and API key
To install and configure the project:
- Clone the repository:
    git clone https://github.com/tanersekmen/intro-vector-db.git
    cd intro-vector-db
- Install the required Python packages:
    pip install langchain langchain_community langchain_openai langchain_pinecone
- Set your Pinecone and OpenAI API keys as environment variables:
    export PINECONE_API_KEY='your-pinecone-api-key'
    export OPENAI_API_KEY='your-openai-api-key'
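Before running anything that calls Pinecone or OpenAI, it can help to confirm that both keys are visible to Python. The check below is a minimal sketch using only the standard library; the helper name `missing_api_keys` is hypothetical and not part of this repository.

```python
import os

# The two variables this README asks you to export.
REQUIRED_KEYS = ("PINECONE_API_KEY", "OPENAI_API_KEY")


def missing_api_keys(environ=None):
    """Return the names of required API keys that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_KEYS if not env.get(name)]
```

Calling `missing_api_keys()` with no arguments inspects the current process environment; an empty list means both keys are set.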
To run the example code:
- Place your text file (e.g., text_file.txt) in the blog/ directory.
- Modify the file_path in the main() function if necessary.
- Run the script:
    python main.py
The script processes the text file, creates embeddings, stores them in Pinecone, and then answers the sample query "veri bilimi nedir? kısaca açıklar mısın?" (Turkish for "What is data science? Can you explain it briefly?").
This function loads the text document using the TextLoader from LangChain.
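A loading step along these lines might look like the sketch below. The function name `load_documents` and the UTF-8 encoding are assumptions for illustration; the repository's own function may differ.

```python
from pathlib import Path


def load_documents(file_path):
    """Load a UTF-8 text file as a list of LangChain Document objects."""
    path = Path(file_path)
    if not path.is_file():
        raise FileNotFoundError(f"No text file found at {path}")

    # Imported here so a missing file is reported before LangChain is loaded.
    from langchain_community.document_loaders import TextLoader

    # TextLoader.load() returns a list of Document objects; for a plain
    # text file the list holds one document with the file's contents.
    return TextLoader(str(path), encoding="utf-8").load()
```

Passing an explicit encoding avoids platform-dependent defaults, which matters for non-ASCII text such as the Turkish sample query.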
This function splits the loaded document into smaller chunks, facilitating better embedding creation and retrieval.
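To illustrate the idea of chunking, here is a standard-library sketch of fixed-size splitting with overlap. It is not the repository's splitter and not LangChain's implementation; LangChain's text splitters expose comparable `chunk_size` and `chunk_overlap` parameters, and the default values below are arbitrary assumptions.

```python
def chunk_text(text, chunk_size=1000, overlap=100):
    """Split text into character windows that share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far each new window advances
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks
```

The overlap keeps a sentence that straddles a boundary present in both neighboring chunks, so retrieval is less likely to miss it.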
This function creates a vector store using Pinecone, storing the embeddings generated from the document chunks.
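One way to write this step is sketched below. It assumes the Pinecone index named by `index_name` already exists with a dimension matching the embedding model, and that both API keys are set in the environment. The function name and the empty-input check are illustrative additions, not taken from the repository.

```python
def create_vector_store(chunks, index_name):
    """Embed document chunks with OpenAI and store them in a Pinecone index."""
    if not chunks:
        raise ValueError("chunks is empty; nothing to embed")

    # Imported lazily; both clients read their API keys from the
    # OPENAI_API_KEY and PINECONE_API_KEY environment variables.
    from langchain_openai import OpenAIEmbeddings
    from langchain_pinecone import PineconeVectorStore

    embeddings = OpenAIEmbeddings()
    # from_documents embeds every chunk and upserts the vectors
    # into the existing index (assumed to be created beforehand).
    return PineconeVectorStore.from_documents(
        chunks, embeddings, index_name=index_name
    )
```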
This function sets up a RetrievalQA chain, using the vector store to retrieve relevant information based on the query.
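A chain of this kind could be assembled roughly as follows. This is a sketch under assumptions: the model name `gpt-4o-mini`, the `k=4` retrieval depth, and the function name are placeholders, and it relies on `RetrievalQA` being importable from `langchain.chains`, which holds for the LangChain releases this sketch targets but may change in newer ones.

```python
def build_qa_chain(vector_store, model_name="gpt-4o-mini", k=4):
    """Build a RetrievalQA chain that answers from the top-k retrieved chunks."""
    if not hasattr(vector_store, "as_retriever"):
        raise TypeError("vector_store must provide an as_retriever() method")

    # Imported lazily so the argument check above runs without LangChain.
    from langchain.chains import RetrievalQA
    from langchain_openai import ChatOpenAI

    # The retriever returns the k chunks most similar to the query.
    retriever = vector_store.as_retriever(search_kwargs={"k": k})
    # "stuff" places all retrieved chunks into a single prompt.
    return RetrievalQA.from_chain_type(
        llm=ChatOpenAI(model=model_name, temperature=0),
        chain_type="stuff",
        retriever=retriever,
    )
```

Given a store, `build_qa_chain(store).invoke({"query": "..."})` returns a dictionary whose `"result"` entry holds the generated answer.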
The main() function ties everything together, loading the documents, creating the vector store, and answering a sample query.
Contributions are welcome! Please feel free to submit a Pull Request.