A Streamlit-based chatbot that interacts with multiple PDF files using Google Generative AI, LangChain, and FAISS for intelligent document querying.

Prayesh13/ChatPDF

📝 Description

The PDF Chat Agent is a Streamlit-based web application for holding interactive conversations with a chatbot about your own documents. Users upload one or more PDF documents, the app extracts their text, and the chatbot answers questions using that extracted content. Users can then chat with it in real time.

🎯 How It Works:

The application follows these steps to provide responses to your questions:

  1. PDF Loading : The app reads multiple PDF documents and extracts their text content.

  2. Text Chunking : The extracted text is divided into smaller chunks that can be processed effectively.

  3. Language Model : The application uses a language model to generate vector representations (embeddings) of the text chunks, so that they can be compared by meaning rather than by exact wording.

  4. Similarity Matching : When you ask a question, the app compares it with the text chunks and identifies the most semantically similar ones.

  5. Response Generation : The selected chunks are passed to the language model, which generates a response based on the relevant content of the PDFs.
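Steps 2 to 4 above can be illustrated with a minimal, standard-library-only sketch. It is not the app's actual code: the bag-of-words `embed` function is a toy stand-in for the embedding model, the linear scan in `top_k` stands in for a FAISS index, and all function names and default sizes here are hypothetical.

```python
import math
import re
from collections import Counter


def chunk_text(text, chunk_size=200, overlap=50):
    """Step 2: split text into overlapping character windows."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


def embed(text):
    """Step 3 (toy stand-in): represent text as a word-count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(question, chunks, k=2):
    """Step 4: return the k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    chunks = [
        "FAISS stores dense vectors for similarity search.",
        "Streamlit renders the chat interface in the browser.",
        "The .env file holds the Google API key.",
    ]
    print(top_k("Where is the API key stored?", chunks, k=1))
```

A real embedding model captures meaning beyond shared words, which is why the app relies on one instead of word counts; the ranking logic, however, follows the same idea.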

📽️ Demo

▶️ Watch the Demo Video

🌟 Requirements

  • Streamlit : A Python library for building web applications with interactive elements.
  • google-generativeai : Google's Python SDK for its generative AI models. It can be used for content generation, dialogue agents, summarization, classification, and similar tasks.
  • python-dotenv : A library for loading environment variables from a .env file. This is commonly used to store configuration settings, API keys, and other sensitive information outside of your code.
  • langchain : An open-source framework for building applications on top of language models. It provides components for conversational retrieval, text splitting, embeddings, vector stores, chat models, and memory.
  • pdfminer.six : A library for extracting text, metadata, and layout information from PDF documents in Python. Unlike PyPDF2, which is geared toward PDF manipulation (such as merging or splitting), pdfminer.six specializes in accurate, fine-grained text extraction. In this app it extracts the raw text of the uploaded PDF files, which is then processed and used to answer user queries.
  • faiss-cpu : The CPU build of FAISS (Facebook AI Similarity Search), a library developed by Facebook for efficient similarity search and clustering of dense vectors. It is widely used for embedding-based information retrieval.
  • langchain_google_genai : A package that integrates LangChain with Google’s generative AI SDK. It contains classes that extend LangChain's Embeddings class and provide methods for generating embeddings. In a multi-PDF chatbot it can be used to embed the text extracted from the PDF documents and to generate responses to user queries.

▶️ Installation

Install the required Python packages:

pip install -r requirements.txt

Get a Google API key from https://makersuite.google.com/app/apikey, then create a .env file in the root directory of the project with the following contents:

GOOGLE_API_KEY=<your-api-key-here>
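Before launching the app, it can help to confirm that the key is actually picked up. The following is a small sketch, not part of the project: the helper name `get_google_api_key` is hypothetical, and the guarded import exists only so the snippet still runs where python-dotenv is not installed.

```python
import os

try:
    # python-dotenv is listed in the requirements; the guarded import only
    # keeps this sketch runnable in environments where it is missing.
    from dotenv import load_dotenv
except ImportError:
    load_dotenv = None


def get_google_api_key():
    """Return GOOGLE_API_KEY, loading .env first when python-dotenv is available."""
    if load_dotenv is not None:
        # Reads KEY=value pairs from a .env file into os.environ
        # (variables that are already set are not overridden by default).
        load_dotenv()
    key = os.getenv("GOOGLE_API_KEY", "").strip()
    if not key or key.startswith("<"):
        raise RuntimeError("GOOGLE_API_KEY is missing or still a placeholder")
    return key


if __name__ == "__main__":
    get_google_api_key()
    print("GOOGLE_API_KEY found")
```

Failing early with a clear message is usually easier to debug than an authentication error raised later, in the middle of a chat session.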

Run the Streamlit app:

streamlit run app.py


💡 Usage

To run the project on your own system, follow these steps:

  1. Make sure you have installed the required dependencies and added the Google API key to the .env file (required).
  2. Run the app.py file using the Streamlit CLI:
    streamlit run app.py
    
  3. The application will launch in your default web browser, displaying the user interface.
  4. In the sidebar, use the uploader labeled "Upload your documents here and click on Process" to select one or more PDF files.
  5. Click the Submit & Process button so the app processes the uploaded files.
  6. Ask questions in natural language about the loaded PDFs using the chat interface.
  7. Type each question in the text input field, then press Enter or click the "Ask" button to submit it.

The application will use conversational AI to provide responses based on the content of the uploaded documents. The responses will be displayed in the chat interface.
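As a rough illustration of how responses can be grounded in the uploaded documents, the retrieved chunks are typically placed in the prompt alongside the question. The sketch below assumes that approach; `build_prompt` and its wording are hypothetical and are not taken from the project's code.

```python
def build_prompt(question, chunks):
    """Combine retrieved chunks and the user's question into one prompt string."""
    # Number each chunk so the context block is easy to read and inspect.
    context = "\n\n".join(
        f"[{i}] {chunk.strip()}" for i, chunk in enumerate(chunks, start=1)
    )
    return (
        "Answer the question using only the context below. "
        "If the answer is not in the context, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question.strip()}\n"
        "Answer:"
    )


if __name__ == "__main__":
    print(build_prompt(
        "What is FAISS?",
        ["FAISS is a similarity search library.", "Streamlit builds web apps."],
    ))
```

Telling the model to admit when the context lacks an answer is a common way to reduce made-up responses, though it does not eliminate them.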
