Dad-GPT Logo

A Streamlit-based web app that generates creative dad jokes using a local language model (Ollama) integrated via LangChain. The app also enhances the experience by fetching a related GIF from Giphy. Additionally, the project includes scripts to collect dad jokes from Reddit using PRAW and perform exploratory data analysis (EDA) to prepare the data for a Retrieval-Augmented Generation (RAG) pipeline.

Features

  • 🤓 Generative Dad Jokes:
    Uses a fine-tuned local language model to generate creative and unique dad jokes.

  • 🎞️ Dynamic GIF Integration:
    Retrieves a related GIF from Giphy based on the generated joke context.

  • 👨‍💻 Reddit Data Collection:
    Uses PRAW (Python Reddit API Wrapper) to collect dad jokes from the dadjokes subreddit.

  • 📊 Data Preparation & EDA for RAG:
    Performs exploratory data analysis (EDA) on the collected jokes to clean, analyze, and format the data for use in a Retrieval-Augmented Generation (RAG) system.

🧰 Prerequisites

  • 🐍 Python Version: Python 3.8 – 3.11 (Python 3.12 may cause compatibility issues with some dependencies)
  • πŸ“¦ Virtual Environment: Recommended for dependency management
  • 🔑 API Keys: A Giphy API key and Reddit API credentials (client ID and client secret)

⚙️ Installation

  1. Clone the Repository:

    git clone <repository_url>
    cd <repository_folder>
  2. Create and Activate a Virtual Environment:

    python -m venv chatbot-venv
    # On Windows:
    chatbot-venv\Scripts\activate
    # On macOS/Linux:
    source chatbot-venv/bin/activate
    
  3. Install Dependencies:

    pip install -r requirements.txt
    

  4. Configure Environment Variables: Create a .env file in the project root with your API credentials:

GIPHY_API_KEY=your_giphy_api_key_here
REDDIT_CLIENT_ID=your_reddit_client_id_here
REDDIT_CLIENT_SECRET=your_reddit_client_secret_here
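A quick way to confirm the credentials are visible to Python before running any script is sketched below. It only inspects the process environment, so the .env file must already have been loaded, for example by python-dotenv's `load_dotenv()` if the project depends on it (an assumption; check requirements.txt). The helper name `missing_env_vars` is hypothetical and not part of the repository.

```python
import os

# Variable names taken from the .env example above.
REQUIRED_VARS = ("GIPHY_API_KEY", "REDDIT_CLIENT_ID", "REDDIT_CLIENT_SECRET")


def missing_env_vars(env=None):
    """Return the required variable names that are unset or blank.

    Hypothetical helper for illustration; `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("All API credentials are set.")
```

Running a check like this first makes a missing key show up as a clear message rather than as an authentication error from Reddit or Giphy later on.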

🚀 Usage

1. 🧠 Collecting Dad Jokes from Reddit

Use the provided script (collect_dadjokes.py) to fetch dad jokes from the dadjokes subreddit using PRAW.

python collect_dadjokes.py

This script will:

  • Connect to Reddit using your API credentials.
  • Fetch dad jokes from the subreddit.
  • Save the jokes locally (e.g., in a JSON or CSV file).
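The three steps above can be sketched roughly as follows. This is not the actual contents of collect_dadjokes.py: the function names, the CSV column names, and the user agent string are assumptions made for illustration. Only the PRAW calls (`praw.Reddit`, `subreddit(...).top(...)`) and the submission attributes used here are real parts of the library.

```python
import csv
import os


def fetch_jokes(limit=100):
    """Yield joke rows from r/dadjokes (needs praw and valid credentials)."""
    import praw  # imported lazily so save_jokes() works without praw installed

    reddit = praw.Reddit(
        client_id=os.environ["REDDIT_CLIENT_ID"],
        client_secret=os.environ["REDDIT_CLIENT_SECRET"],
        user_agent="dad-gpt-collector/0.1",  # hypothetical user agent string
    )
    for post in reddit.subreddit("dadjokes").top(time_filter="all", limit=limit):
        # Assumption: the post title holds the setup and the body the punchline.
        yield {
            "id": post.id,
            "setup": post.title,
            "punchline": post.selftext,
            "score": post.score,
        }


def save_jokes(rows, path):
    """Write joke rows to a CSV file and return the number of rows written."""
    fields = ["id", "setup", "punchline", "score"]
    count = 0
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fields)
        writer.writeheader()
        for row in rows:
            writer.writerow(row)
            count += 1
    return count
```

With credentials in place, a call such as `save_jokes(fetch_jokes(limit=500), "jokes.csv")` would fetch and store the posts in one pass; the output file name here is only a placeholder.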

2. 📊 Exploratory Data Analysis (EDA)

After collecting the data, open and run the EDA.ipynb notebook to explore the jokes and prepare them for the RAG pipeline.

This step includes:

  • Data cleaning and formatting.
  • Sentiment analysis and keyword extraction.
  • Preparing the jokes for efficient retrieval in RAG.
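As an illustration of the cleaning step only, the sketch below normalizes whitespace, drops empty or removed posts, and removes case-insensitive duplicates. It uses just the standard library and is not taken from EDA.ipynb; the notebook may well use pandas and different rules, and the function name `clean_jokes` is hypothetical.

```python
import re

# Placeholder bodies Reddit shows for posts that are no longer available.
REMOVED_MARKERS = {"[removed]", "[deleted]"}


def clean_jokes(raw_jokes):
    """Return jokes with whitespace collapsed, junk dropped, and duplicates removed."""
    seen = set()
    cleaned = []
    for text in raw_jokes:
        # Collapse newlines, tabs, and repeated spaces into single spaces.
        joke = re.sub(r"\s+", " ", text or "").strip()
        if not joke or joke.lower() in REMOVED_MARKERS:
            continue
        key = joke.lower()  # compare case-insensitively when de-duplicating
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(joke)
    return cleaned
```

De-duplicating before building the vector store keeps reposted jokes from crowding out other results at retrieval time.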

3. 😜 Running Dad-GPT App

to start the Streamlit app, run:

streamlit run app.py

The app will:

  • Display a form where you can type a prompt.
  • Generate a creative dad joke using a local LLM via LangChain.
  • Display a matching GIF using the Giphy API.
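The GIF lookup in the last step could look roughly like the sketch below, which queries Giphy's search endpoint and takes the first result. It is an illustration, not the code in app.py: the function names are hypothetical, and the choice of the `original` rendition and a `g` rating are assumptions.

```python
import json
import os
import urllib.parse
import urllib.request

GIPHY_SEARCH_URL = "https://api.giphy.com/v1/gifs/search"


def build_search_url(query, api_key, limit=1, rating="g"):
    """Build the Giphy search URL for a query string."""
    params = {"api_key": api_key, "q": query, "limit": limit, "rating": rating}
    return GIPHY_SEARCH_URL + "?" + urllib.parse.urlencode(params)


def first_gif_url(payload):
    """Return the URL of the first GIF in a search response, or None if empty."""
    results = payload.get("data") or []
    if not results:
        return None
    return results[0]["images"]["original"]["url"]


def fetch_gif_url(query, api_key=None, timeout=10):
    """Search Giphy and return the first matching GIF URL (makes a network call)."""
    api_key = api_key or os.environ["GIPHY_API_KEY"]
    url = build_search_url(query, api_key)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return first_gif_url(json.load(response))
```

In a Streamlit app, the returned URL can be passed to `st.image` to show the GIF under the generated joke. Handling the `None` case matters, since a search may return no results.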

📁 Project Structure

.
├── app.py                     # Streamlit app for generating dad jokes
├── collect_dadjokes.py        # Reddit PRAW data collector
├── EDA.ipynb                  # EDA and RAG preparation notebook
├── vector.py                  # Vector database setup
├── chroma_langchain_db        # Vector database
├── chatbot-venv               # My virtual environment
├── data/                      # Data folder
│   ├── cleaned_dadjokes.csv   # Cleaned dad jokes CSV file
│   └── dadjokes_partial.csv   # Raw dad jokes CSV file
├── main.py                    # Terminal app
├── dad-gpt_icon.png           # Logo
├── requirements.txt           # Dependencies
├── load_jokes.py              # Script to collect top joke from Reddit
├── dadjokes_partial_data.csv  # Raw pulled data
├── .env                       # Environment variables (API keys)
└── README.md                  # Project documentation (this file)

🀝 Contributing

Contributes are welcome! If you have ideas, find bugs, or want to help expand this project, feel free to open an issue or submit a pull request.

🙌 Acknowledgements

  • Streamlit – For making it easy to build beautiful, interactive web apps in Python.
  • LangChain – For enabling LLM-based pipelines and integration with local models.
  • Ollama – For running local LLMs like LLaMA, Mistral, etc., with simplicity and speed.
  • Giphy Developers – For providing the GIF API that brings visual humor to the app.
  • PRAW – For simple and powerful access to Reddit's API, used to collect dad jokes.
  • r/dadjokes – For the treasure trove of community-sourced dad jokes.
  • TechWithTim – For helpful tutorials and Python project guidance.
  • ChatGPT – For assisting in planning and refining project code, documentation and images.


Hope this project brings a smile to your face. Happy coding! 🧡