Research Assistant Tool

This project is designed to assist users in gathering information for their research. It consists of four main components: Scraper, PDF Converter, Chatbot, Speech Module, and a user-friendly Gradio interface to integrate all functionalities.

Components

1. Scraper:

Designed to help users gather information for their research. It performs the following steps:

Takes a research topic as input from the user.
Conducts a Google search to find relevant articles and information.
Scrapes outlines from the retrieved URLs to provide an overview of the content.

2. PDF Converter:

Using "weasyprint" library, this modules enables converting a selected webpage to a PDF document, so users can easily archive and share it and chat about it.

3. Chatbot:

This the heart of the project, where the magic happens. This module is responsible for creating a Conversational Retrieval Chain using langchain:

Instantiate an LLM model, either from local source (Ollama phi model) or from HuggingFace (OpenAssistant)
Load given pdf file
Split it to chunks
Create and apply an embedding function on those chunks
Create a vector database and fill it with the embedded vectors
Create a retriever chain which is "history aware", which given chat history and a question, can reformulate it by adding context to it.
Create a stuff documents chain, used to pass documents into a model
Create the main chain, retrieval chain, which is built over the previous 2 chains

Then, it provides answer_query method, which gets the answer from the retrieval chain and saves chat history.

Chatbot Architecture Overview:

4. Speech Module

Provides voice interaction capabilities with the chatbot, allowing users to use voice commands to interact with their PDFs. This module includes:

Audio to Text: Converts spoken input from the user into text using the audio_to_text method.
Conversational Retrieval Chain: The chatbot processes the text input to retrieve relevant information from the loaded PDF.
Text to Audio: Converts the chatbot's text response back into spoken output using the str_to_audio method.
Voice Interaction: Facilitates continuous voice conversation with the user, providing a seamless and interactive research experience.

5. App:

All the project functionalities are integrated into one Gradio interface, to provide an intuitive and interactive experience for the users. The interface consists of 3 main tabs:

Scrape Google Tab: Allows users to input a research topic and specify the number of links to scrape from Google. Then the user can take a look at the outlines of any of the scraped websites to choose the most relevant for his research.
Convert To PDF Tab: Lets users input a webpage URL to convert it to PDF, or upload a pdf from their device. Then, they can load it to the chatbot.
Chat with your PDF: Represents the retriever chain interface, which accepts questions input from user and displays chatbot answers, preserving the chat history (which can be cleared by the user)
Talk with your PDF: Includes audio input and output components for user-bot conversation. Users can speak into the microphone, and the bot will respond with voice answers.

Getting Started

Installation

Clone the repository:

git clone https://github.com/LaraKhansa/Research_Assistant.git

Install the necessary dependencies. Make sure you have python>=3.9.9 and <=3.13:
```
pip install -r requirements.txt
crawl4ai-setup
```

Running the App

To start the Gradio interface, run the following command:

  python app.py

Usage

Scrape Google Tab: Enter a research topic and specify the number of links to scrape. Review the outlines and choose the relevant articles.
Convert To PDF Tab: Enter a webpage URL to convert to PDF or upload a PDF file from your device.
Chat with your PDF Tab: Load the PDF into the chatbot, ask questions, and get answers while preserving the chat history.
Talk with PDF Tab: Speak into the microphone to interact with the chatbot using voice commands, and receive voice responses from the bot.

Name		Name	Last commit message	Last commit date
Latest commit History 36 Commits
app		app
utils		utils
.gitignore		.gitignore
FineTuning VS RAG.md		FineTuning VS RAG.md
LICENSE		LICENSE
README.md		README.md
chatbot_architecture.jpeg		chatbot_architecture.jpeg
main.py		main.py
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Research Assistant Tool

Components

1. Scraper:

2. PDF Converter:

3. Chatbot:

4. Speech Module

5. App:

Getting Started

Installation

Running the App

Usage

About

Uh oh!

Releases

Packages

Languages

License

LaraKhansa/Research_Assistant

Folders and files

Latest commit

History

Repository files navigation

Research Assistant Tool

Components

1. Scraper:

2. PDF Converter:

3. Chatbot:

4. Speech Module

5. App:

Getting Started

Installation

Running the App

Usage

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages