A Streamlit-based web app that generates creative dad jokes using a local language model (Ollama) integrated via LangChain. The app also enhances the experience by fetching a related GIF from Giphy. Additionally, the project includes scripts to collect dad jokes from Reddit using PRAW and perform exploratory data analysis (EDA) to prepare the data for a Retrieval-Augmented Generation (RAG) pipeline.
- 🤓 Generative Dad Jokes: Uses a fine-tuned local language model to generate creative and unique dad jokes.
- 🎞️ Dynamic GIF Integration: Retrieves a related GIF from Giphy based on the generated joke context.
- 👨‍💻 Reddit Data Collection: Uses PRAW (Python Reddit API Wrapper) to collect dad jokes from the dadjokes subreddit.
- 📊 Data Preparation & EDA for RAG: Performs exploratory data analysis (EDA) on the collected jokes to clean, analyze, and format the data for use in a Retrieval-Augmented Generation (RAG) system.
- 🐍 Python Version: Python 3.8 – 3.11 (Python 3.12 may cause compatibility issues with some dependencies)
- 📦 Virtual Environment: Recommended for dependency management
- 🔑 API Keys:
  - Giphy API key (available from the Giphy Developers dashboard)
  - Reddit API credentials (for PRAW; created by registering an app with your Reddit account)
1. Clone the Repository:
    git clone <repository_url>
    cd <repository_folder>
2. Create and Activate a Virtual Environment:
    python -m venv chatbot-venv
    # On Windows:
    chatbot-venv\Scripts\activate
    # On macOS/Linux:
    source chatbot-venv/bin/activate
3. Install Dependencies:
    pip install -r requirements.txt
4. Configure Environment Variables: Create a .env file in the project root with your API credentials:
GIPHY_API_KEY=your_giphy_api_key_here
REDDIT_CLIENT_ID=your_reddit_client_id_here
REDDIT_CLIENT_SECRET=your_reddit_client_secret_here
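Before running the scripts, it can help to confirm that all three keys are actually set. The following is a minimal, standard-library-only sketch of such a check. The helper names `parse_env_text` and `missing_keys` are illustrative and not part of this project, which may well load the .env file differently (for example with python-dotenv).

```python
from pathlib import Path

# The three keys listed in the .env example above.
REQUIRED_KEYS = ("GIPHY_API_KEY", "REDDIT_CLIENT_ID", "REDDIT_CLIENT_SECRET")


def parse_env_text(text):
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]


if __name__ == "__main__":
    env_path = Path(".env")
    text = env_path.read_text(encoding="utf-8") if env_path.exists() else ""
    missing = missing_keys(parse_env_text(text))
    print("Missing keys:", ", ".join(missing) if missing else "none")
```

Run from the project root, this prints which of the required keys still need a value.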
Use the provided script (collect_dadjokes.py) to fetch dad jokes from the dadjokes subreddit using PRAW:
python collect_dadjokes.py
This script will:
- Connect to Reddit using your API credentials.
- Fetch dad jokes from the subreddit.
- Save the jokes locally (e.g., in a JSON or CSV file).
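The three steps above could be sketched roughly as follows. This is not the contents of collect_dadjokes.py; the function names, the chosen fields, the `top` listing, and the user agent string are all assumptions made for illustration. PRAW is imported lazily so that the CSV helper can be tried without it installed.

```python
import csv

# Columns written to the CSV file (an assumed selection of post fields).
FIELDS = ["id", "title", "selftext", "score"]


def fetch_jokes(client_id, client_secret, limit=100):
    """Yield joke dicts from r/dadjokes (requires the praw package)."""
    import praw  # imported here so save_jokes works without praw installed

    reddit = praw.Reddit(
        client_id=client_id,
        client_secret=client_secret,
        user_agent="dadjokes-collector (illustrative user agent)",
    )
    # On r/dadjokes the title usually holds the setup and selftext the punchline.
    for post in reddit.subreddit("dadjokes").top(limit=limit):
        yield {
            "id": post.id,
            "title": post.title,
            "selftext": post.selftext,
            "score": post.score,
        }


def save_jokes(jokes, path):
    """Write joke dicts to a CSV file and return the number of rows written."""
    count = 0
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        for joke in jokes:
            writer.writerow(joke)
            count += 1
    return count
```

With credentials loaded from the environment, a call such as `save_jokes(fetch_jokes(client_id, client_secret), "jokes.csv")` would stream posts straight into the file (the output file name here is hypothetical).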
After collecting the data, run the EDA.ipynb notebook to explore and prepare the jokes for the RAG pipeline.
This step includes:
- Data cleaning and formatting.
- Sentiment analysis and keyword extraction.
- Preparing the jokes for efficient retrieval in RAG.
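To give a feel for the cleaning and keyword steps, here is a small standard-library sketch. It is not taken from EDA.ipynb: the function names, the stopword list, and the cleaning rules are assumptions, and sentiment analysis is left out because it would need an additional library.

```python
import re
from collections import Counter

# A tiny illustrative stopword list; a real analysis would use a fuller one.
STOPWORDS = {"the", "and", "did", "was", "what", "why", "how", "you", "for", "with"}

# Placeholder bodies Reddit shows for posts that are no longer available.
PLACEHOLDERS = {"[removed]", "[deleted]"}


def clean_jokes(jokes):
    """Normalize whitespace, drop empty or removed posts, and de-duplicate."""
    seen = set()
    cleaned = []
    for joke in jokes:
        text = re.sub(r"\s+", " ", joke).strip()
        key = text.lower()
        if not text or key in PLACEHOLDERS or key in seen:
            continue
        seen.add(key)
        cleaned.append(text)
    return cleaned


def top_keywords(jokes, n=5):
    """Count words longer than two letters that are not stopwords."""
    counts = Counter()
    for joke in jokes:
        for word in re.findall(r"[a-z']+", joke.lower()):
            if len(word) > 2 and word not in STOPWORDS:
                counts[word] += 1
    return counts.most_common(n)
```

De-duplicating on the lowercased text keeps the first spelling of each joke, which matters for retrieval because repeated entries would otherwise crowd the results.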
To start the Streamlit app, run:
streamlit run app.py
The app will:
- Display a form where you can type a prompt.
- Generate a creative dad joke using a local LLM via LangChain.
- Display a matching GIF using the Giphy API.
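The GIF lookup in the last step might look something like the sketch below, which calls the Giphy search endpoint using only the standard library. The function names are hypothetical and app.py may do this differently (for instance with the requests package).

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

GIPHY_SEARCH_ENDPOINT = "https://api.giphy.com/v1/gifs/search"


def build_search_url(api_key, query, limit=1):
    """Build the search URL for a query such as a keyword from the joke."""
    params = {"api_key": api_key, "q": query, "limit": limit}
    return f"{GIPHY_SEARCH_ENDPOINT}?{urlencode(params)}"


def extract_gif_url(payload):
    """Return the first GIF URL from a search response, or None if empty."""
    results = payload.get("data") or []
    if not results:
        return None
    return results[0].get("images", {}).get("original", {}).get("url")


def find_gif(api_key, query):
    """Query Giphy over the network and return a GIF URL or None."""
    with urlopen(build_search_url(api_key, query), timeout=10) as response:
        return extract_gif_url(json.load(response))
```

In a Streamlit app, the returned URL can be passed to `st.image` to show the GIF under the joke. Returning `None` when nothing matches lets the app skip the image instead of failing.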
.
├── app.py # Streamlit app for generating dad jokes
├── collect_dadjokes.py # Reddit PRAW data collector
├── EDA.ipynb # EDA and RAG preparation notebook
├── vector.py # Vector database setup
├── chroma_langchain_db # Vector database
├── chatbot-venv # My virtual environment
├── data/ # Data folder
│ ├── cleaned_dadjokes.csv # Cleaned dad jokes CSV file
│ └── dadjokes_partial.csv # Raw jokes CSV file
├── main.py # Terminal app
├── dad-gpt_icon.png # Logo
├── requirements.txt # Dependencies
├── load_jokes.py # Script to collect top jokes from Reddit
├── dadjokes_partial_data.csv # Raw pulled data
├── .env # Environment variables (API keys)
└── README.md # Project documentation (this file)
Contributions are welcome! If you have ideas, find bugs, or want to help expand this project, feel free to open an issue or submit a pull request.
- Streamlit – For making it easy to build beautiful, interactive web apps in Python.
- LangChain – For enabling LLM-based pipelines and integration with local models.
- Ollama – For running local LLMs like LLaMA, Mistral, etc., with simplicity and speed.
- Giphy Developers – For providing the GIF API that brings visual humor to the app.
- PRAW – For simple and powerful access to Reddit’s API, used to collect dad jokes.
- r/dadjokes – For the treasure trove of community-sourced dad jokes.
- TechWithTim – For helpful tutorials and Python project guidance.
- ChatGPT – For assisting in planning and refining project code, documentation and images.
Hope this project brings a smile to your face — happy coding! 🧡