GitHub - v4nui/Dad-GPT: local machine chatbot

A Streamlit-based web app that generates creative dad jokes using a local language model (Ollama) integrated via LangChain. The app also enhances the experience by fetching a related GIF from Giphy. Additionally, the project includes scripts to collect dad jokes from Reddit using PRAW and perform exploratory data analysis (EDA) to prepare the data for a Retrieval-Augmented Generation (RAG) pipeline.

Features

- 🤓 Generative Dad Jokes: Uses a fine-tuned local language model to generate creative and unique dad jokes.
- 🎞️ Dynamic GIF Integration: Retrieves a related GIF from Giphy based on the generated joke context.
- 👨‍💻 Reddit Data Collection: Utilizes PRAW (Python Reddit API Wrapper) to collect dad jokes from the dadjokes subreddit.
- 📊 Data Preparation & EDA for RAG: Performs exploratory data analysis (EDA) on the collected jokes to clean, analyze, and format the data for use in a Retrieval-Augmented Generation (RAG) system.

🧰 Prerequisites

🐍 Python Version: Python 3.8 – 3.11 (Python 3.12 may cause compatibility issues with some dependencies)
📦 Virtual Environment: Recommended for dependency management
🔑 API Keys:
- Giphy API Key (issued through the Giphy Developers site)
- Reddit API credentials (client ID and client secret, used by PRAW)

⚙️ Installation

Clone the Repository:

```
git clone <repository_url>
cd <repository_folder>
```

Create and Activate a Virtual Environment:

```
python -m venv chatbot-venv
# On Windows:
chatbot-venv\Scripts\activate
# On macOS/Linux:
source chatbot-venv/bin/activate
```

Install Dependencies:
```
pip install -r requirements.txt
```

Configure Environment Variables: Create a .env file in the project root with your API credentials:

```
GIPHY_API_KEY=your_giphy_api_key_here
REDDIT_CLIENT_ID=your_reddit_client_id_here
REDDIT_CLIENT_SECRET=your_reddit_client_secret_here
```
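As a rough illustration of how these values can reach the scripts, the sketch below reads a .env file with the standard library only and reports which credentials are still missing. The function names `load_env_file` and `missing_credentials` are hypothetical and not taken from this repository; the project itself may well rely on a package such as python-dotenv instead.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Parse KEY=VALUE lines from a .env file into os.environ.

    Hypothetical helper: values already set in the environment are kept.
    Returns a dict of everything that was parsed from the file.
    """
    env_path = Path(path)
    if not env_path.exists():
        return {}
    loaded = {}
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip("'\"")  # drop optional surrounding quotes
        loaded[key] = value
        os.environ.setdefault(key, value)
    return loaded


def missing_credentials(
    required=("GIPHY_API_KEY", "REDDIT_CLIENT_ID", "REDDIT_CLIENT_SECRET"),
):
    """Return the names of required variables that are unset or empty."""
    return [name for name in required if not os.environ.get(name)]
```

Calling `load_env_file()` followed by `missing_credentials()` at startup gives an early, readable error instead of a failed API call later on.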

🚀 Usage

1. 🧠 Collecting Dad Jokes from Reddit

Use the provided script (collect_dadjokes.py) to fetch dad jokes from the dadjokes subreddit using PRAW.

```
python collect_dadjokes.py
```

This script will:

- Connect to Reddit using your API credentials.
- Fetch dad jokes from the subreddit.
- Save the jokes locally (e.g., in a JSON or CSV file).
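The three steps above can be sketched roughly as follows. This is not the contents of collect_dadjokes.py: the function names, the CSV column names (`id`, `setup`, `punchline`, `score`), and the choice of top posts are assumptions made for illustration. The `reddit` argument is expected to be an authenticated `praw.Reddit` instance.

```python
import csv


def fetch_jokes(reddit, limit=100):
    """Collect top posts from r/dadjokes as plain dicts.

    Assumes the post title holds the setup and the body holds the punchline.
    """
    rows = []
    for post in reddit.subreddit("dadjokes").top(limit=limit):
        rows.append(
            {
                "id": post.id,
                "setup": post.title.strip(),
                "punchline": post.selftext.strip(),
                "score": post.score,
            }
        )
    return rows


def save_jokes(rows, path):
    """Write the collected jokes to a CSV file and return the row count."""
    fieldnames = ["id", "setup", "punchline", "score"]
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

With PRAW installed, a client can be created with `praw.Reddit(client_id=..., client_secret=..., user_agent=...)` using the values from .env, then passed to `fetch_jokes`. Passing the client in as an argument keeps the functions easy to exercise with a stand-in object when no network access is available.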

2. 📊 Exploratory Data Analysis (EDA)

After collecting the data, run the EDA.ipynb notebook to explore and prepare the jokes for the RAG pipeline.

This step includes:

- Data cleaning and formatting.
- Sentiment analysis and keyword extraction.
- Preparing the jokes for efficient retrieval in RAG.
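To give a feel for the cleaning and formatting part of this step, here is a minimal standard-library sketch. It is not the notebook's actual code: the `setup` and `punchline` field names and the combined `text` field are assumptions, and the notebook may use pandas or other tools. It normalizes whitespace, skips posts whose body is empty or replaced by Reddit's `[removed]` / `[deleted]` placeholders, and drops case-insensitive duplicates.

```python
import re

# Bodies that carry no usable joke text.
REMOVED_MARKERS = {"", "[removed]", "[deleted]"}


def normalize(text):
    """Collapse runs of whitespace and trim the ends."""
    return re.sub(r"\s+", " ", text or "").strip()


def clean_jokes(rows):
    """Return cleaned jokes with a combined `text` field for retrieval."""
    seen = set()
    cleaned = []
    for row in rows:
        setup = normalize(row.get("setup"))
        punchline = normalize(row.get("punchline"))
        # Skip jokes without a setup or with an empty/removed punchline.
        if not setup or punchline.lower() in REMOVED_MARKERS:
            continue
        text = f"{setup} {punchline}"
        key = text.lower()
        if key in seen:  # case-insensitive duplicate
            continue
        seen.add(key)
        cleaned.append({"setup": setup, "punchline": punchline, "text": text})
    return cleaned
```

Keeping one combined `text` field per joke is a common choice when the next stage embeds each record as a single document for retrieval.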

3. 😜 Running Dad-GPT App

To start the Streamlit app, run:

```
streamlit run app.py
```

The app will:

- Display a form where you can type a prompt.
- Generate a creative dad joke using a local LLM via LangChain.
- Display a matching GIF using the Giphy API.
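The GIF lookup in the last step could look something like the sketch below, which queries Giphy's public search endpoint and pulls the first result's URL out of the JSON response. The helper names and the default `limit` and `rating` values are assumptions for illustration; app.py may structure this differently or use a different HTTP client.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

GIPHY_SEARCH_ENDPOINT = "https://api.giphy.com/v1/gifs/search"


def build_giphy_search_url(api_key, query, limit=1, rating="g"):
    """Build the search URL for a query such as a keyword from the joke."""
    params = {"api_key": api_key, "q": query, "limit": limit, "rating": rating}
    return f"{GIPHY_SEARCH_ENDPOINT}?{urlencode(params)}"


def first_gif_url(payload):
    """Return the first GIF URL from a search response, or None if empty."""
    results = payload.get("data") or []
    if not results:
        return None
    return results[0].get("images", {}).get("original", {}).get("url")


def find_gif(api_key, query, timeout=10):
    """Call the Giphy search API and return a GIF URL, or None."""
    url = build_giphy_search_url(api_key, query)
    with urlopen(url, timeout=timeout) as response:
        return first_gif_url(json.load(response))
```

In a Streamlit app, the returned URL can be passed to `st.image(...)`. Returning `None` when nothing matches lets the app show the joke on its own rather than fail.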

📁 Project Structure

.
├── app.py                     # Streamlit app for generating dad jokes
├── collect_dadjokes.py        # Reddit PRAW data collector
├── EDA.ipynb                  # EDA and RAG preparation notebook
├── vector.py                  # Vector database setup
├── chroma_langchain_db        # Vector database
├── chatbot-venv               # My virtual environment
├── data/                      # Data folder
│   ├── cleaned_dadjokes.csv   # Cleaned dad jokes CSV file
│   └── dadjokes_partial.csv   # Raw dad jokes CSV file
├── main.py                    # Terminal app
├── dad-gpt_icon.png           # Logo
├── requirements.txt           # Dependencies
├── load_jokes.py              # Script to collect top jokes from Reddit
├── dadjokes_partial_data.csv  # Raw pulled data
├── .env                       # Environment variables (API keys)
└── README.md                  # Project documentation (this file)

🤝 Contributing

Contributions are welcome! If you have ideas, find bugs, or want to help expand this project, feel free to open an issue or submit a pull request.

🙌 Acknowledgements

Streamlit – For making it easy to build beautiful, interactive web apps in Python.
LangChain – For enabling LLM-based pipelines and integration with local models.
Ollama – For running local LLMs like LLaMA, Mistral, etc., with simplicity and speed.
Giphy Developers – For providing the GIF API that brings visual humor to the app.
PRAW – For simple and powerful access to Reddit’s API, used to collect dad jokes.
r/dadjokes – For the treasure trove of community-sourced dad jokes.
TechWithTim – For helpful tutorials and Python project guidance.
ChatGPT – For assisting in planning and refining project code, documentation and images.

Hope this project brings a smile to your face — happy coding! 🧡
