From d1beb28c41da6426a41cfa8d7ee3673771cdd04c Mon Sep 17 00:00:00 2001
From: maithili74 <maithili.a7@gmail.com>
Date: Mon, 26 May 2025 13:27:40 -0400
Subject: [PATCH] Added gmail automation code
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Pre-commit checks:
All checks passed ✅
---
 tutorial_gmail_automation/README.md           |  32 +
 .../tutorial_gmail_automation.ipynb           | 939 ++++++++++++++++++
 2 files changed, 971 insertions(+)
 create mode 100644 tutorial_gmail_automation/README.md
 create mode 100644 tutorial_gmail_automation/tutorial_gmail_automation.ipynb
diff --git a/tutorial_gmail_automation/README.md b/tutorial_gmail_automation/README.md
new file mode 100644
index 0000000000..22f270de3d
--- /dev/null
+++ b/tutorial_gmail_automation/README.md
@@ -0,0 +1,32 @@
+# Gmail Automation
+
+This project is a Python utility that connects to Gmail, authenticates with OAuth 2.0, and retrieves emails matching a custom search query. The results are displayed in a pandas DataFrame, and the utility also extracts unique email addresses from sender fields and message bodies.
+
+## Features
+
+- OAuth 2.0 authentication using `credentials.json`
+- Retrieve emails using Gmail API with custom search queries
+- Display email fields (sender, subject, date, body) in a DataFrame
+- Extract and list unique email addresses from sender fields and message bodies
+- Designed to work inside a Jupyter Notebook environment
+
+---
+
+## Google Cloud Setup
+
+1. Go to [Google Cloud Console](https://console.cloud.google.com/).
+2. Create a new project (or select an existing one).
+3. Enable the **Gmail API** for the project.
+4. Navigate to **APIs & Services > Credentials**, click **Create Credentials > OAuth 2.0 Client ID**.
+5. Choose **Desktop App** as the application type.
+6. Download the `credentials.json` file and place it in the root directory of your project (same folder as the notebook).
+
+
+## Usage
+
+1. Clone this repository or download the notebook.
+2. Make sure `credentials.json` is in the project directory.
+3. Open `tutorial_gmail_automation.ipynb` in Jupyter Notebook and run all cells:
+    - The first time, you will be prompted to authorize access in a browser.
+    - Once authenticated, a `token.json` file is saved and reused in later sessions.
+4. Modify the `search_query` variable inside the notebook to filter emails (e.g., `subject:invoice` or `after:2024/01/01`).
diff --git a/tutorial_gmail_automation/tutorial_gmail_automation.ipynb b/tutorial_gmail_automation/tutorial_gmail_automation.ipynb
new file mode 100644
index 0000000000..6caacee3a8
--- /dev/null
+++ b/tutorial_gmail_automation/tutorial_gmail_automation.ipynb
@@ -0,0 +1,939 @@
+{
+    "cells": [
+        {
+            "cell_type": "markdown",
+            "metadata": {},
+            "source": [
+                "CONTENTS:\n",
+                "- [Gmail Email Query & Processing](#gmail-email-query-&-processing)\n",
+                "- [Importing all the necessary libraries](#importing-all-the-necessary-libraries)\n",
+                "- [fetch_emails(query: str) supports flexible search queries](#fetch_emails(query:-str)-supports-flexible-search-queries)\n",
+                "- [Trying out with a keyword \"interview\"](#trying-out-with-a-keyword-\"interview\")\n",
+                "- [Trying out with dates](#trying-out-with-a-dates)\n",
+                "- [Cleaning the dataset](#cleaning-the-dataset)\n",
+                "- [Extracting unique email address](#extracting-unique-email-address)"
+            ]
+        },
+        {
+            "cell_type": "markdown",
+            "id": "20c23aeb",
+            "metadata": {},
+            "source": [
+                "<a name='gmail-email-query-&-processing'></a>\n",
+                "### Gmail Email Query & Processing"
+            ]
+        },
+        {
+            "cell_type": "markdown",
+            "id": "dba266e8",
+            "metadata": {},
+            "source": [
+                "This notebook demonstrates a Python-based utility for connecting to Gmail, retrieving emails using flexible search queries and processing them within a Jupyter notebook environment.\n",
+                "\n",
+                "**Features:**\n",
+                "- Authenticate with Gmail using OAuth2 and the Gmail API\n",
+                "- Flexible search queries\n",
+                "- Results displayed in a Pandas DataFrame: Sender, Subject, Date, Body\n",
+                "- Clean email bodies (plain text/HTML)\n",
+                "- Extract unique email addresses from sender and body fields\n",
+                "- Notebook-friendly, modular code"
+            ]
+        },
+        {
+            "cell_type": "code",
+            "execution_count": 1,
+            "id": "0f5f4ab4",
+            "metadata": {},
+            "outputs": [
+                {
+                    "name": "stdout",
+                    "output_type": "stream",
+                    "text": [
+                        "Requirement already satisfied: google-api-python-client in c:\\users\\maithili\\anaconda3\\lib\\site-packages (2.169.0)\n",
+                        "Collecting google-api-python-client\n",
+                        "  Downloading google_api_python_client-2.170.0-py3-none-any.whl (13.5 MB)\n",
+                        "     --------------------------------------- 13.5/13.5 MB 50.1 MB/s eta 0:00:00\n",
+                        "Requirement already satisfied: google-auth-httplib2 in c:\\users\\maithili\\anaconda3\\lib\\site-packages (0.2.0)\n",
+                        "Requirement already satisfied: google-auth-oauthlib in c:\\users\\maithili\\anaconda3\\lib\\site-packages (1.2.2)\n",
+                        "Requirement already satisfied: google-api-core!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.0,<3.0.0,>=1.31.5 in c:\\users\\maithili\\anaconda3\\lib\\site-packages (from google-api-python-client) (2.24.2)\n",
+                        "Requirement already satisfied: uritemplate<5,>=3.0.1 in c:\\users\\maithili\\anaconda3\\lib\\site-packages (from google-api-python-client) (4.1.1)\n",
+                        "Requirement already satisfied: google-auth!=2.24.0,!=2.25.0,<3.0.0,>=1.32.0 in c:\\users\\maithili\\anaconda3\\lib\\site-packages (from google-api-python-client) (2.17.3)\n",
+                        "Requirement already satisfied: httplib2<1.0.0,>=0.19.0 in c:\\users\\maithili\\anaconda3\\lib\\site-packages (from google-api-python-client) (0.22.0)\n",
+                        "Requirement already satisfied: requests-oauthlib>=0.7.0 in c:\\users\\maithili\\anaconda3\\lib\\site-packages (from google-auth-oauthlib) (1.3.1)\n",
+                        "Requirement already satisfied: requests<3.0.0,>=2.18.0 in c:\\users\\maithili\\anaconda3\\lib\\site-packages (from google-api-core!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.0,<3.0.0,>=1.31.5->google-api-python-client) (2.32.3)\n",
+                        "Requirement already satisfied: proto-plus<2.0.0,>=1.22.3 in c:\\users\\maithili\\anaconda3\\lib\\site-packages (from google-api-core!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.0,<3.0.0,>=1.31.5->google-api-python-client) (1.23.0)\n",
+                        "Requirement already satisfied: protobuf!=3.20.0,!=3.20.1,!=4.21.0,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<7.0.0,>=3.19.5 in c:\\users\\maithili\\appdata\\roaming\\python\\python39\\site-packages (from google-api-core!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.0,<3.0.0,>=1.31.5->google-api-python-client) (3.20.3)\n",
+                        "Requirement already satisfied: googleapis-common-protos<2.0.0,>=1.56.2 in c:\\users\\maithili\\anaconda3\\lib\\site-packages (from google-api-core!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.0,<3.0.0,>=1.31.5->google-api-python-client) (1.70.0)\n",
+                        "Requirement already satisfied: pyasn1-modules>=0.2.1 in c:\\users\\maithili\\anaconda3\\lib\\site-packages (from google-auth!=2.24.0,!=2.25.0,<3.0.0,>=1.32.0->google-api-python-client) (0.2.8)\n",
+                        "Requirement already satisfied: rsa<5,>=3.1.4 in c:\\users\\maithili\\anaconda3\\lib\\site-packages (from google-auth!=2.24.0,!=2.25.0,<3.0.0,>=1.32.0->google-api-python-client) (4.9)\n",
+                        "Requirement already satisfied: six>=1.9.0 in c:\\users\\maithili\\anaconda3\\lib\\site-packages (from google-auth!=2.24.0,!=2.25.0,<3.0.0,>=1.32.0->google-api-python-client) (1.16.0)\n",
+                        "Requirement already satisfied: cachetools<6.0,>=2.0.0 in c:\\users\\maithili\\anaconda3\\lib\\site-packages (from google-auth!=2.24.0,!=2.25.0,<3.0.0,>=1.32.0->google-api-python-client) (5.3.0)\n",
+                        "Requirement already satisfied: pyparsing!=3.0.0,!=3.0.1,!=3.0.2,!=3.0.3,<4,>=2.4.2 in c:\\users\\maithili\\anaconda3\\lib\\site-packages (from httplib2<1.0.0,>=0.19.0->google-api-python-client) (3.0.9)\n",
+                        "Requirement already satisfied: oauthlib>=3.0.0 in c:\\users\\maithili\\anaconda3\\lib\\site-packages (from requests-oauthlib>=0.7.0->google-auth-oauthlib) (3.2.2)\n",
+                        "Requirement already satisfied: pyasn1<0.5.0,>=0.4.6 in c:\\users\\maithili\\anaconda3\\lib\\site-packages (from pyasn1-modules>=0.2.1->google-auth!=2.24.0,!=2.25.0,<3.0.0,>=1.32.0->google-api-python-client) (0.4.8)\n",
+                        "Requirement already satisfied: certifi>=2017.4.17 in c:\\users\\maithili\\anaconda3\\lib\\site-packages (from requests<3.0.0,>=2.18.0->google-api-core!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.0,<3.0.0,>=1.31.5->google-api-python-client) (2024.2.2)\n",
+                        "Requirement already satisfied: urllib3<3,>=1.21.1 in c:\\users\\maithili\\anaconda3\\lib\\site-packages (from requests<3.0.0,>=2.18.0->google-api-core!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.0,<3.0.0,>=1.31.5->google-api-python-client) (1.26.20)\n",
+                        "Requirement already satisfied: idna<4,>=2.5 in c:\\users\\maithili\\anaconda3\\lib\\site-packages (from requests<3.0.0,>=2.18.0->google-api-core!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.0,<3.0.0,>=1.31.5->google-api-python-client) (2.6)\n",
+                        "Requirement already satisfied: charset-normalizer<4,>=2 in c:\\users\\maithili\\anaconda3\\lib\\site-packages (from requests<3.0.0,>=2.18.0->google-api-core!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.0,<3.0.0,>=1.31.5->google-api-python-client) (2.0.4)\n",
+                        "Installing collected packages: google-api-python-client\n",
+                        "  Attempting uninstall: google-api-python-client\n",
+                        "    Found existing installation: google-api-python-client 2.169.0\n",
+                        "    Uninstalling google-api-python-client-2.169.0:\n",
+                        "      Successfully uninstalled google-api-python-client-2.169.0\n",
+                        "Successfully installed google-api-python-client-2.170.0\n"
+                    ]
+                }
+            ],
+            "source": [
+                "%pip install --upgrade google-api-python-client google-auth-httplib2 google-auth-oauthlib"
+            ]
+        },
+        {
+            "cell_type": "markdown",
+            "id": "b1f116a0",
+            "metadata": {},
+            "source": [
+                "<a name='importing-all-the-necessary-libraries'></a>\n",
+                "### Importing all the necessary libraries "
+            ]
+        },
+        {
+            "cell_type": "code",
+            "execution_count": 2,
+            "id": "caa0cb7c",
+            "metadata": {},
+            "outputs": [],
+            "source": [
+                "import os.path"
+            ]
+        },
+        {
+            "cell_type": "code",
+            "execution_count": 3,
+            "id": "0a736608",
+            "metadata": {},
+            "outputs": [],
+            "source": [
+                "from google.auth.transport.requests import Request\n",
+                "from google.oauth2.credentials import Credentials\n",
+                "from google_auth_oauthlib.flow import InstalledAppFlow\n",
+                "from googleapiclient.discovery import build\n",
+                "from googleapiclient.errors import HttpError"
+            ]
+        },
+        {
+            "cell_type": "code",
+            "execution_count": 4,
+            "id": "5b1ca840",
+            "metadata": {},
+            "outputs": [
+                {
+                    "name": "stdout",
+                    "output_type": "stream",
+                    "text": [
+                        "Requirement already satisfied: numexpr in c:\\users\\maithili\\anaconda3\\lib\\site-packages (2.10.2)\n",
+                        "Requirement already satisfied: bottleneck in c:\\users\\maithili\\anaconda3\\lib\\site-packages (1.4.2)\n",
+                        "Collecting bottleneck\n",
+                        "  Downloading bottleneck-1.5.0-cp39-cp39-win_amd64.whl (112 kB)\n",
+                        "     -------------------------------------- 112.1/112.1 kB 2.2 MB/s eta 0:00:00\n",
+                        "Requirement already satisfied: numpy>=1.23.0 in c:\\users\\maithili\\anaconda3\\lib\\site-packages (from numexpr) (1.24.4)\n",
+                        "Installing collected packages: bottleneck\n",
+                        "  Attempting uninstall: bottleneck\n",
+                        "    Found existing installation: Bottleneck 1.4.2\n",
+                        "    Uninstalling Bottleneck-1.4.2:\n",
+                        "      Successfully uninstalled Bottleneck-1.4.2\n",
+                        "Successfully installed bottleneck-1.5.0\n",
+                        "Note: you may need to restart the kernel to use updated packages.\n"
+                    ]
+                }
+            ],
+            "source": [
+                "%pip install --upgrade numexpr bottleneck\n",
+                "\n",
+                "# Upgrade numexpr and bottleneck, optional pandas accelerators, to versions compatible with the installed pandas"
+            ]
+        },
+        {
+            "cell_type": "code",
+            "execution_count": 5,
+            "id": "f8426aad",
+            "metadata": {
+                "scrolled": true
+            },
+            "outputs": [],
+            "source": [
+                "import base64\n",
+                "import pandas as pd\n",
+                "from email import policy\n",
+                "import email\n",
+                "\n",
+                "from bs4 import BeautifulSoup\n",
+                "import re"
+            ]
+        },
+        {
+            "cell_type": "code",
+            "execution_count": 6,
+            "id": "916ed81c",
+            "metadata": {},
+            "outputs": [],
+            "source": [
+                "SCOPES = ['https://www.googleapis.com/auth/gmail.readonly']"
+            ]
+        },
+        {
+            "cell_type": "code",
+            "execution_count": 8,
+            "id": "79632735",
+            "metadata": {},
+            "outputs": [
+                {
+                    "name": "stdout",
+                    "output_type": "stream",
+                    "text": [
+                        "Refresh failed, regenerating token from scratch...\n",
+                        "Please visit this URL to authorize this application: https://accounts.google.com/o/oauth2/auth?response_type=code&client_id=265589690150-l4lpc8b29q6nb31afis0k72v7e0nbbld.apps.googleusercontent.com&redirect_uri=http%3A%2F%2Flocalhost%3A62732%2F&scope=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fgmail.readonly&state=PWn22TXVDudt7nzzDCBgmzJz9y01IO&access_type=offline\n"
+                    ]
+                }
+            ],
+            "source": [
+                "import os\n",
+                "from google_auth_oauthlib.flow import InstalledAppFlow\n",
+                "from google.oauth2.credentials import Credentials\n",
+                "from google.auth.transport.requests import Request\n",
+                "\n",
+                "SCOPES = ['https://www.googleapis.com/auth/gmail.readonly']\n",
+                "\n",
+                "def authenticate_gmail():\n",
+                "    creds = None\n",
+                "\n",
+                "    if os.path.exists('token.json'):\n",
+                "        try:\n",
+                "            creds = Credentials.from_authorized_user_file('token.json', SCOPES)\n",
+                "        except Exception as e:\n",
+                "            print(\"Corrupted token.json, deleting and regenerating...\")\n",
+                "            os.remove('token.json')\n",
+                "            creds = None\n",
+                "    \n",
+                "    if not creds or not creds.valid:\n",
+                "        if creds and creds.expired and creds.refresh_token:\n",
+                "            try:\n",
+                "                creds.refresh(Request())\n",
+                "            except Exception as e:\n",
+                "                print(\"Refresh failed, regenerating token from scratch...\")\n",
+                "                os.remove('token.json')\n",
+                "                creds = None\n",
+                "\n",
+                "        if not creds or not creds.valid:\n",
+                "            flow = InstalledAppFlow.from_client_secrets_file('credentials.json', SCOPES)\n",
+                "            creds = flow.run_local_server(port=0)\n",
+                "        with open('token.json', 'w') as token:\n",
+                "            token.write(creds.to_json())\n",
+                "\n",
+                "    return creds\n",
+                "\n",
+                "creds = authenticate_gmail()\n"
+            ]
+        },
+        {
+            "cell_type": "code",
+            "execution_count": 9,
+            "id": "c69601f5",
+            "metadata": {},
+            "outputs": [],
+            "source": [
+                "service = build('gmail', 'v1', credentials=creds)"
+            ]
+        },
+        {
+            "cell_type": "markdown",
+            "id": "2e9d4321",
+            "metadata": {},
+            "source": [
+                "<a name='fetch_emails(query:-str)-supports-flexible-search-queries'></a>\n",
+                "### fetch_emails(query: str) supports flexible search queries"
+            ]
+        },
+        {
+            "cell_type": "code",
+            "execution_count": 10,
+            "id": "89dd5027",
+            "metadata": {},
+            "outputs": [],
+            "source": [
+                "import base64\n",
+                "import pandas as pd\n",
+                "from email import policy\n",
+                "import email\n",
+                "\n",
+                "def fetch_emails(service, query='', max_results=100):\n",
+                "    all_emails = []\n",
+                "    next_page_token = None\n",
+                "    fetched = 0\n",
+                "\n",
+                "    while True:\n",
+                "        response = service.users().messages().list(\n",
+                "            userId='me',\n",
+                "            q=query, \n",
+                "            pageToken=next_page_token,\n",
+                "            maxResults=min(100, max_results - fetched)\n",
+                "        ).execute()\n",
+                "        messages = response.get('messages', [])\n",
+                "        if not messages:\n",
+                "            break\n",
+                "\n",
+                "        for msg_meta in messages:\n",
+                "            try:\n",
+                "                msg = service.users().messages().get(\n",
+                "                    userId='me',\n",
+                "                    id=msg_meta['id'],\n",
+                "                    format='raw'\n",
+                "                ).execute()\n",
+                "                raw_bytes = base64.urlsafe_b64decode(msg['raw'])\n",
+                "                mime_msg = email.message_from_bytes(raw_bytes, policy=policy.default)\n",
+                "                sender = mime_msg.get('From', '')\n",
+                "                subject = mime_msg.get('Subject', '')\n",
+                "                date = mime_msg.get('Date', '')\n",
+                "\n",
+                "                # Extract the body: prefer text/plain, fall back to the first text/html part\n",
+                "                body = \"\"\n",
+                "                if mime_msg.is_multipart():\n",
+                "                    for part in mime_msg.walk():\n",
+                "                        ctype = part.get_content_type()\n",
+                "                        payload = part.get_payload(decode=True)\n",
+                "                        if ctype == 'text/plain' and payload:\n",
+                "                            body = payload.decode(part.get_content_charset() or 'utf-8', errors='replace')\n",
+                "                            break\n",
+                "                        elif ctype == 'text/html' and payload and not body:\n",
+                "                            body = payload.decode(part.get_content_charset() or 'utf-8', errors='replace')\n",
+                "                else:\n",
+                "                    payload = mime_msg.get_payload(decode=True)\n",
+                "                    if payload:\n",
+                "                        body = payload.decode(mime_msg.get_content_charset() or 'utf-8', errors='replace')\n",
+                "\n",
+                "                all_emails.append([sender, subject, date, body])\n",
+                "                fetched += 1\n",
+                "                if fetched >= max_results:\n",
+                "                    break\n",
+                "            except Exception as e:\n",
+                "                print(f\"Error processing message {msg_meta['id']}: {e}\")\n",
+                "                continue\n",
+                "\n",
+                "        if fetched >= max_results:\n",
+                "            break\n",
+                "        next_page_token = response.get('nextPageToken')\n",
+                "        if not next_page_token:\n",
+                "            break\n",
+                "\n",
+                "    df = pd.DataFrame(all_emails, columns=['Sender', 'Subject', 'Date', 'Body'])\n",
+                "    return df"
+            ]
+        },
+        {
+            "cell_type": "markdown",
+            "id": "45acffec",
+            "metadata": {},
+            "source": [
+                "<a name='trying-out-with-a-keyword-\"interview\"'></a>\n",
+                "### Trying out with a keyword \"interview\"\n",
+                "\n",
+                "* This returns a DataFrame of emails whose subject contains \"interview\"."
+            ]
+        },
+        {
+            "cell_type": "code",
+            "execution_count": 10,
+            "id": "f4070e44",
+            "metadata": {},
+            "outputs": [
+                {
+                    "data": {
+                        "text/html": [
+                            "<div>\n",
+                            "<style scoped>\n",
+                            "    .dataframe tbody tr th:only-of-type {\n",
+                            "        vertical-align: middle;\n",
+                            "    }\n",
+                            "\n",
+                            "    .dataframe tbody tr th {\n",
+                            "        vertical-align: top;\n",
+                            "    }\n",
+                            "\n",
+                            "    .dataframe thead th {\n",
+                            "        text-align: right;\n",
+                            "    }\n",
+                            "</style>\n",
+                            "<table border=\"1\" class=\"dataframe\">\n",
+                            "  <thead>\n",
+                            "    <tr style=\"text-align: right;\">\n",
+                            "      <th></th>\n",
+                            "      <th>Sender</th>\n",
+                            "      <th>Subject</th>\n",
+                            "      <th>Date</th>\n",
+                            "      <th>Body</th>\n",
+                            "    </tr>\n",
+                            "  </thead>\n",
+                            "  <tbody>\n",
+                            "    <tr>\n",
+                            "      <th>0</th>\n",
+                            "      <td>Citizens &lt;noreply@mail.modernhire.com&gt;</td>\n",
+                            "      <td>Your Citizens interview link is a click away! ...</td>\n",
+                            "      <td>Fri, 25 Apr 2025 20:47:19 +0000</td>\n",
+                            "      <td>&lt;style type=\"text/css\"&gt;#EmailBody a {color:#02...</td>\n",
+                            "    </tr>\n",
+                            "    <tr>\n",
+                            "      <th>1</th>\n",
+                            "      <td>Quora Digest &lt;english-personalized-digest@quor...</td>\n",
+                            "      <td>How do I get a 3.6 in a Google interview?</td>\n",
+                            "      <td>Thu, 24 Apr 2025 17:09:35 +0000</td>\n",
+                            "      <td>Top stories for Maithili\\r\\n\\r\\n-----\\r\\n\\r\\nQ...</td>\n",
+                            "    </tr>\n",
+                            "    <tr>\n",
+                            "      <th>2</th>\n",
+                            "      <td>Team Unstop &lt;noreply@unstop.news&gt;</td>\n",
+                            "      <td>Top Companies are Hiring [2025]! - Ace Your In...</td>\n",
+                            "      <td>Thu, 27 Feb 2025 13:33:03 +0530</td>\n",
+                            "      <td>&lt;!DOCTYPE html&gt;\\n\\n&lt;html&gt;&lt;head&gt;&lt;title&gt;&lt;/title&gt;...</td>\n",
+                            "    </tr>\n",
+                            "    <tr>\n",
+                            "      <th>3</th>\n",
+                            "      <td>\"Atria Convergence Technologies (ACT)\" &lt;udita....</td>\n",
+                            "      <td>\u2709\ufe0f Walk-in interview | Collection Executive in...</td>\n",
+                            "      <td>Mon, 24 Feb 2025 17:49:37 +0530</td>\n",
+                            "      <td>\\r\\n&lt;!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1...</td>\n",
+                            "    </tr>\n",
+                            "    <tr>\n",
+                            "      <th>4</th>\n",
+                            "      <td>Katie Warren from TopResume &lt;katie@topresume.com&gt;</td>\n",
+                            "      <td>Your interview</td>\n",
+                            "      <td>Thu, 13 Feb 2025 07:13:06 -0500</td>\n",
+                            "      <td>A few minutes of this will improve your chance...</td>\n",
+                            "    </tr>\n",
+                            "  </tbody>\n",
+                            "</table>\n",
+                            "</div>"
+                        ],
+                        "text/plain": [
+                            "                                              Sender  \\\n",
+                            "0             Citizens <noreply@mail.modernhire.com>   \n",
+                            "1  Quora Digest <english-personalized-digest@quor...   \n",
+                            "2                  Team Unstop <noreply@unstop.news>   \n",
+                            "3  \"Atria Convergence Technologies (ACT)\" <udita....   \n",
+                            "4  Katie Warren from TopResume <katie@topresume.com>   \n",
+                            "\n",
+                            "                                             Subject  \\\n",
+                            "0  Your Citizens interview link is a click away! ...   \n",
+                            "1          How do I get a 3.6 in a Google interview?   \n",
+                            "2  Top Companies are Hiring [2025]! - Ace Your In...   \n",
+                            "3  \u2709\ufe0f Walk-in interview | Collection Executive in...   \n",
+                            "4                                     Your interview   \n",
+                            "\n",
+                            "                              Date  \\\n",
+                            "0  Fri, 25 Apr 2025 20:47:19 +0000   \n",
+                            "1  Thu, 24 Apr 2025 17:09:35 +0000   \n",
+                            "2  Thu, 27 Feb 2025 13:33:03 +0530   \n",
+                            "3  Mon, 24 Feb 2025 17:49:37 +0530   \n",
+                            "4  Thu, 13 Feb 2025 07:13:06 -0500   \n",
+                            "\n",
+                            "                                                Body  \n",
+                            "0  <style type=\"text/css\">#EmailBody a {color:#02...  \n",
+                            "1  Top stories for Maithili\\r\\n\\r\\n-----\\r\\n\\r\\nQ...  \n",
+                            "2  <!DOCTYPE html>\\n\\n<html><head><title></title>...  \n",
+                            "3  \\r\\n<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1...  \n",
+                            "4  A few minutes of this will improve your chance...  "
+                        ]
+                    },
+                    "metadata": {},
+                    "output_type": "display_data"
+                }
+            ],
+            "source": [
+                "emails_df = fetch_emails(service, query='subject:interview', max_results=50)\n",
+                "display(emails_df.head())"
+            ]
+        },
+        {
+            "cell_type": "markdown",
+            "id": "912ed1ad",
+            "metadata": {},
+            "source": [
+                "<a name='trying-out-with-a-dates'></a>\n",
+                "### Trying out with dates"
+            ]
+        },
+        {
+            "cell_type": "code",
+            "execution_count": 11,
+            "id": "203be7ea",
+            "metadata": {},
+            "outputs": [
+                {
+                    "data": {
+                        "text/html": [
+                            "<div>\n",
+                            "<style scoped>\n",
+                            "    .dataframe tbody tr th:only-of-type {\n",
+                            "        vertical-align: middle;\n",
+                            "    }\n",
+                            "\n",
+                            "    .dataframe tbody tr th {\n",
+                            "        vertical-align: top;\n",
+                            "    }\n",
+                            "\n",
+                            "    .dataframe thead th {\n",
+                            "        text-align: right;\n",
+                            "    }\n",
+                            "</style>\n",
+                            "<table border=\"1\" class=\"dataframe\">\n",
+                            "  <thead>\n",
+                            "    <tr style=\"text-align: right;\">\n",
+                            "      <th></th>\n",
+                            "      <th>Sender</th>\n",
+                            "      <th>Subject</th>\n",
+                            "      <th>Date</th>\n",
+                            "      <th>Body</th>\n",
+                            "    </tr>\n",
+                            "  </thead>\n",
+                            "  <tbody>\n",
+                            "    <tr>\n",
+                            "      <th>0</th>\n",
+                            "      <td>Caleb Ralphs &lt;caleb.ralphs@valisinsights.com&gt;</td>\n",
+                            "      <td>Data Science Intern - VALIS Insights</td>\n",
+                            "      <td>Wed, 30 Apr 2025 23:24:51 +0000</td>\n",
+                            "      <td>Hello Maithili,\\r\\n\\r\\nThank you for applying ...</td>\n",
+                            "    </tr>\n",
+                            "    <tr>\n",
+                            "      <th>1</th>\n",
+                            "      <td>Workable &lt;noreply@candidates.workablemail.com&gt;</td>\n",
+                            "      <td>Thanks for applying to VALIS Insights</td>\n",
+                            "      <td>Wed, 30 Apr 2025 22:24:30 +0000</td>\n",
+                            "      <td>VALIS Insights\\r\\n\\r\\n------------------------...</td>\n",
+                            "    </tr>\n",
+                            "    <tr>\n",
+                            "      <th>2</th>\n",
+                            "      <td>\"AncestryRecruiting@ancestry.com\" &lt;AncestryRec...</td>\n",
+                            "      <td>Thank you for applying to Ancestry!</td>\n",
+                            "      <td>Wed, 30 Apr 2025 14:34:03 -0700</td>\n",
+                            "      <td>&lt;!doctype html&gt;&lt;html lang=en xmlns=\"http://www...</td>\n",
+                            "    </tr>\n",
+                            "    <tr>\n",
+                            "      <th>3</th>\n",
+                            "      <td>Stack Sports &lt;do-not-reply@mail.paylocity.com&gt;</td>\n",
+                            "      <td>Thank you for applying!</td>\n",
+                            "      <td>Wed, 30 Apr 2025 21:27:12 +0000</td>\n",
+                            "      <td>&lt;html style=\"background-color: #F4F6F8;\"&gt;&lt;head...</td>\n",
+                            "    </tr>\n",
+                            "    <tr>\n",
+                            "      <th>4</th>\n",
+                            "      <td>no-reply@us.greenhouse-mail.io</td>\n",
+                            "      <td>Thank you for applying to DeepIntent</td>\n",
+                            "      <td>Wed, 30 Apr 2025 21:23:04 +0000</td>\n",
+                            "      <td>Maithili,\\r\\n\\r\\nThanks for applying to DeepIn...</td>\n",
+                            "    </tr>\n",
+                            "  </tbody>\n",
+                            "</table>\n",
+                            "</div>"
+                        ],
+                        "text/plain": [
+                            "                                              Sender  \\\n",
+                            "0      Caleb Ralphs <caleb.ralphs@valisinsights.com>   \n",
+                            "1     Workable <noreply@candidates.workablemail.com>   \n",
+                            "2  \"AncestryRecruiting@ancestry.com\" <AncestryRec...   \n",
+                            "3     Stack Sports <do-not-reply@mail.paylocity.com>   \n",
+                            "4                     no-reply@us.greenhouse-mail.io   \n",
+                            "\n",
+                            "                                 Subject                             Date  \\\n",
+                            "0   Data Science Intern - VALIS Insights  Wed, 30 Apr 2025 23:24:51 +0000   \n",
+                            "1  Thanks for applying to VALIS Insights  Wed, 30 Apr 2025 22:24:30 +0000   \n",
+                            "2    Thank you for applying to Ancestry!  Wed, 30 Apr 2025 14:34:03 -0700   \n",
+                            "3                Thank you for applying!  Wed, 30 Apr 2025 21:27:12 +0000   \n",
+                            "4   Thank you for applying to DeepIntent  Wed, 30 Apr 2025 21:23:04 +0000   \n",
+                            "\n",
+                            "                                                Body  \n",
+                            "0  Hello Maithili,\\r\\n\\r\\nThank you for applying ...  \n",
+                            "1  VALIS Insights\\r\\n\\r\\n------------------------...  \n",
+                            "2  <!doctype html><html lang=en xmlns=\"http://www...  \n",
+                            "3  <html style=\"background-color: #F4F6F8;\"><head...  \n",
+                            "4  Maithili,\\r\\n\\r\\nThanks for applying to DeepIn...  "
+                        ]
+                    },
+                    "execution_count": 11,
+                    "metadata": {},
+                    "output_type": "execute_result"
+                }
+            ],
+            "source": [
+                "query = 'after:2025/04/01 before:2025/05/01'\n",
+                "emails_df = fetch_emails(service, query=query, max_results=100)\n",
+                "emails_df.head()"
+            ]
+        },
+        {
+            "cell_type": "markdown",
+            "id": "3a262c3f",
+            "metadata": {},
+            "source": [
+                "<a name='cleaning-the-dataset'></a>\n",
+                "### Cleaning the dataset"
+            ]
+        },
+        {
+            "cell_type": "code",
+            "execution_count": 12,
+            "id": "59d53423",
+            "metadata": {},
+            "outputs": [],
+            "source": [
+                "from bs4 import BeautifulSoup\n",
+                "import re\n",
+                "\n",
+                "def is_css_or_junk(text):\n",
+                "    t = text.strip()\n",
+                "    if not t:  # empty or whitespace-only body\n",
+                "        return True\n",
+                "    if re.match(r'^([.#]?\\w+\\s*\\{)', t):  # starts like a CSS rule, e.g. '.btn {'\n",
+                "        return True\n",
+                "    if len(t) < 40 and all(c in '{};:. \\r\\n\\t' for c in t):  # short run of punctuation only\n",
+                "        return True\n",
+                "    lines = t.splitlines()\n",
+                "    if lines:\n",
+                "        css_lines = [line for line in lines if line.strip().endswith(('{', '}'))]\n",
+                "        if len(css_lines) / len(lines) > 0.7:  # mostly lines that open or close a CSS block\n",
+                "            return True\n",
+                "    return False\n",
+                "\n",
+                "def clean_html_body(html_body):\n",
+                "    soup = BeautifulSoup(html_body, \"html.parser\")\n",
+                "    for s in soup([\"script\", \"style\"]):  # drop non-visible elements\n",
+                "        s.decompose()\n",
+                "    text = soup.get_text(separator=\"\\n\", strip=True)\n",
+                "    text = re.sub(r'\\n+', '\\n', text)  # collapse blank lines\n",
+                "    text = re.sub(r'[ \\t]+', ' ', text)  # collapse runs of spaces and tabs\n",
+                "    return text.strip()\n",
+                "\n",
+                "def clean_body(text):\n",
+                "    if not isinstance(text, str):\n",
+                "        return \"\"\n",
+                "    if is_css_or_junk(text):\n",
+                "        return \"\"\n",
+                "    if any(tag in text.lower() for tag in ('<html', '<body', '<div', '<!doctype')):\n",
+                "        return clean_html_body(text)\n",
+                "    return text.strip()\n",
+                "\n",
+                "def add_cleaned_body_column(df):\n",
+                "    df['Body_Clean'] = df['Body'].apply(clean_body)\n",
+                "    return df\n"
+            ]
+        },
+        {
+            "cell_type": "code",
+            "execution_count": 13,
+            "id": "c099ae10",
+            "metadata": {},
+            "outputs": [
+                {
+                    "data": {
+                        "text/html": [
+                            "<div>\n",
+                            "<style scoped>\n",
+                            "    .dataframe tbody tr th:only-of-type {\n",
+                            "        vertical-align: middle;\n",
+                            "    }\n",
+                            "\n",
+                            "    .dataframe tbody tr th {\n",
+                            "        vertical-align: top;\n",
+                            "    }\n",
+                            "\n",
+                            "    .dataframe thead th {\n",
+                            "        text-align: right;\n",
+                            "    }\n",
+                            "</style>\n",
+                            "<table border=\"1\" class=\"dataframe\">\n",
+                            "  <thead>\n",
+                            "    <tr style=\"text-align: right;\">\n",
+                            "      <th></th>\n",
+                            "      <th>Sender</th>\n",
+                            "      <th>Subject</th>\n",
+                            "      <th>Date</th>\n",
+                            "      <th>Body</th>\n",
+                            "      <th>Body_Clean</th>\n",
+                            "    </tr>\n",
+                            "  </thead>\n",
+                            "  <tbody>\n",
+                            "    <tr>\n",
+                            "      <th>0</th>\n",
+                            "      <td>Caleb Ralphs &lt;caleb.ralphs@valisinsights.com&gt;</td>\n",
+                            "      <td>Data Science Intern - VALIS Insights</td>\n",
+                            "      <td>Wed, 30 Apr 2025 23:24:51 +0000</td>\n",
+                            "      <td>Hello Maithili,\\r\\n\\r\\nThank you for applying ...</td>\n",
+                            "      <td>Hello Maithili,\\r\\n\\r\\nThank you for applying ...</td>\n",
+                            "    </tr>\n",
+                            "    <tr>\n",
+                            "      <th>1</th>\n",
+                            "      <td>Workable &lt;noreply@candidates.workablemail.com&gt;</td>\n",
+                            "      <td>Thanks for applying to VALIS Insights</td>\n",
+                            "      <td>Wed, 30 Apr 2025 22:24:30 +0000</td>\n",
+                            "      <td>VALIS Insights\\r\\n\\r\\n------------------------...</td>\n",
+                            "      <td>VALIS Insights\\r\\n\\r\\n------------------------...</td>\n",
+                            "    </tr>\n",
+                            "    <tr>\n",
+                            "      <th>2</th>\n",
+                            "      <td>\"AncestryRecruiting@ancestry.com\" &lt;AncestryRec...</td>\n",
+                            "      <td>Thank you for applying to Ancestry!</td>\n",
+                            "      <td>Wed, 30 Apr 2025 14:34:03 -0700</td>\n",
+                            "      <td>&lt;!doctype html&gt;&lt;html lang=en xmlns=\"http://www...</td>\n",
+                            "      <td>Hi Maithili,\\nThank you for taking the time to...</td>\n",
+                            "    </tr>\n",
+                            "    <tr>\n",
+                            "      <th>3</th>\n",
+                            "      <td>Stack Sports &lt;do-not-reply@mail.paylocity.com&gt;</td>\n",
+                            "      <td>Thank you for applying!</td>\n",
+                            "      <td>Wed, 30 Apr 2025 21:27:12 +0000</td>\n",
+                            "      <td>&lt;html style=\"background-color: #F4F6F8;\"&gt;&lt;head...</td>\n",
+                            "      <td>Dear Maithili,Thank you for your interest in a...</td>\n",
+                            "    </tr>\n",
+                            "    <tr>\n",
+                            "      <th>4</th>\n",
+                            "      <td>no-reply@us.greenhouse-mail.io</td>\n",
+                            "      <td>Thank you for applying to DeepIntent</td>\n",
+                            "      <td>Wed, 30 Apr 2025 21:23:04 +0000</td>\n",
+                            "      <td>Maithili,\\r\\n\\r\\nThanks for applying to DeepIn...</td>\n",
+                            "      <td>Maithili,\\r\\n\\r\\nThanks for applying to DeepIn...</td>\n",
+                            "    </tr>\n",
+                            "  </tbody>\n",
+                            "</table>\n",
+                            "</div>"
+                        ],
+                        "text/plain": [
+                            "                                              Sender  \\\n",
+                            "0      Caleb Ralphs <caleb.ralphs@valisinsights.com>   \n",
+                            "1     Workable <noreply@candidates.workablemail.com>   \n",
+                            "2  \"AncestryRecruiting@ancestry.com\" <AncestryRec...   \n",
+                            "3     Stack Sports <do-not-reply@mail.paylocity.com>   \n",
+                            "4                     no-reply@us.greenhouse-mail.io   \n",
+                            "\n",
+                            "                                 Subject                             Date  \\\n",
+                            "0   Data Science Intern - VALIS Insights  Wed, 30 Apr 2025 23:24:51 +0000   \n",
+                            "1  Thanks for applying to VALIS Insights  Wed, 30 Apr 2025 22:24:30 +0000   \n",
+                            "2    Thank you for applying to Ancestry!  Wed, 30 Apr 2025 14:34:03 -0700   \n",
+                            "3                Thank you for applying!  Wed, 30 Apr 2025 21:27:12 +0000   \n",
+                            "4   Thank you for applying to DeepIntent  Wed, 30 Apr 2025 21:23:04 +0000   \n",
+                            "\n",
+                            "                                                Body  \\\n",
+                            "0  Hello Maithili,\\r\\n\\r\\nThank you for applying ...   \n",
+                            "1  VALIS Insights\\r\\n\\r\\n------------------------...   \n",
+                            "2  <!doctype html><html lang=en xmlns=\"http://www...   \n",
+                            "3  <html style=\"background-color: #F4F6F8;\"><head...   \n",
+                            "4  Maithili,\\r\\n\\r\\nThanks for applying to DeepIn...   \n",
+                            "\n",
+                            "                                          Body_Clean  \n",
+                            "0  Hello Maithili,\\r\\n\\r\\nThank you for applying ...  \n",
+                            "1  VALIS Insights\\r\\n\\r\\n------------------------...  \n",
+                            "2  Hi Maithili,\\nThank you for taking the time to...  \n",
+                            "3  Dear Maithili,Thank you for your interest in a...  \n",
+                            "4  Maithili,\\r\\n\\r\\nThanks for applying to DeepIn...  "
+                        ]
+                    },
+                    "execution_count": 13,
+                    "metadata": {},
+                    "output_type": "execute_result"
+                }
+            ],
+            "source": [
+                "emails_df = add_cleaned_body_column(emails_df)\n",
+                "emails_df.head()"
+            ]
+        },
+        {
+            "cell_type": "markdown",
+            "id": "d39b50f2",
+            "metadata": {},
+            "source": [
+                "<a name='extracting-unique-email-address'></a>\n",
+                "### Extracting unique email addresses"
+            ]
+        },
+        {
+            "cell_type": "code",
+            "execution_count": 14,
+            "id": "a23559da",
+            "metadata": {},
+            "outputs": [],
+            "source": [
+                "def extract_unique_email(text):\n",
+                "    if not isinstance(text, str):\n",
+                "        return []\n",
+                "    return list(set(re.findall(r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}', text)))\n",
+                "\n",
+                "\n",
+                "def add_extracted_emails_column(df):\n",
+                "    def get_emails(row):\n",
+                "        emails = extract_unique_email(row['Sender']) + extract_unique_email(row['Body_Clean'])\n",
+                "        return list(set(emails))\n",
+                "    df['Extracted_Emails'] = df.apply(get_emails, axis=1)\n",
+                "    return df"
+            ]
+        },
+        {
+            "cell_type": "code",
+            "execution_count": 15,
+            "id": "6fa984f8",
+            "metadata": {},
+            "outputs": [
+                {
+                    "data": {
+                        "text/html": [
+                            "<div>\n",
+                            "<style scoped>\n",
+                            "    .dataframe tbody tr th:only-of-type {\n",
+                            "        vertical-align: middle;\n",
+                            "    }\n",
+                            "\n",
+                            "    .dataframe tbody tr th {\n",
+                            "        vertical-align: top;\n",
+                            "    }\n",
+                            "\n",
+                            "    .dataframe thead th {\n",
+                            "        text-align: right;\n",
+                            "    }\n",
+                            "</style>\n",
+                            "<table border=\"1\" class=\"dataframe\">\n",
+                            "  <thead>\n",
+                            "    <tr style=\"text-align: right;\">\n",
+                            "      <th></th>\n",
+                            "      <th>Sender</th>\n",
+                            "      <th>Subject</th>\n",
+                            "      <th>Date</th>\n",
+                            "      <th>Body</th>\n",
+                            "      <th>Body_Clean</th>\n",
+                            "      <th>Extracted_Emails</th>\n",
+                            "    </tr>\n",
+                            "  </thead>\n",
+                            "  <tbody>\n",
+                            "    <tr>\n",
+                            "      <th>0</th>\n",
+                            "      <td>Caleb Ralphs &lt;caleb.ralphs@valisinsights.com&gt;</td>\n",
+                            "      <td>Data Science Intern - VALIS Insights</td>\n",
+                            "      <td>Wed, 30 Apr 2025 23:24:51 +0000</td>\n",
+                            "      <td>Hello Maithili,\\r\\n\\r\\nThank you for applying ...</td>\n",
+                            "      <td>Hello Maithili,\\r\\n\\r\\nThank you for applying ...</td>\n",
+                            "      <td>[caleb.ralphs@valisinsights.com]</td>\n",
+                            "    </tr>\n",
+                            "    <tr>\n",
+                            "      <th>1</th>\n",
+                            "      <td>Workable &lt;noreply@candidates.workablemail.com&gt;</td>\n",
+                            "      <td>Thanks for applying to VALIS Insights</td>\n",
+                            "      <td>Wed, 30 Apr 2025 22:24:30 +0000</td>\n",
+                            "      <td>VALIS Insights\\r\\n\\r\\n------------------------...</td>\n",
+                            "      <td>VALIS Insights\\r\\n\\r\\n------------------------...</td>\n",
+                            "      <td>[noreply@candidates.workablemail.com, maithili...</td>\n",
+                            "    </tr>\n",
+                            "    <tr>\n",
+                            "      <th>2</th>\n",
+                            "      <td>\"AncestryRecruiting@ancestry.com\" &lt;AncestryRec...</td>\n",
+                            "      <td>Thank you for applying to Ancestry!</td>\n",
+                            "      <td>Wed, 30 Apr 2025 14:34:03 -0700</td>\n",
+                            "      <td>&lt;!doctype html&gt;&lt;html lang=en xmlns=\"http://www...</td>\n",
+                            "      <td>Hi Maithili,\\nThank you for taking the time to...</td>\n",
+                            "      <td>[maithili.a7@gmail.com, AncestryRecruiting@anc...</td>\n",
+                            "    </tr>\n",
+                            "    <tr>\n",
+                            "      <th>3</th>\n",
+                            "      <td>Stack Sports &lt;do-not-reply@mail.paylocity.com&gt;</td>\n",
+                            "      <td>Thank you for applying!</td>\n",
+                            "      <td>Wed, 30 Apr 2025 21:27:12 +0000</td>\n",
+                            "      <td>&lt;html style=\"background-color: #F4F6F8;\"&gt;&lt;head...</td>\n",
+                            "      <td>Dear Maithili,Thank you for your interest in a...</td>\n",
+                            "      <td>[do-not-reply@mail.paylocity.com]</td>\n",
+                            "    </tr>\n",
+                            "    <tr>\n",
+                            "      <th>4</th>\n",
+                            "      <td>no-reply@us.greenhouse-mail.io</td>\n",
+                            "      <td>Thank you for applying to DeepIntent</td>\n",
+                            "      <td>Wed, 30 Apr 2025 21:23:04 +0000</td>\n",
+                            "      <td>Maithili,\\r\\n\\r\\nThanks for applying to DeepIn...</td>\n",
+                            "      <td>Maithili,\\r\\n\\r\\nThanks for applying to DeepIn...</td>\n",
+                            "      <td>[no-reply@us.greenhouse-mail.io]</td>\n",
+                            "    </tr>\n",
+                            "  </tbody>\n",
+                            "</table>\n",
+                            "</div>"
+                        ],
+                        "text/plain": [
+                            "                                              Sender  \\\n",
+                            "0      Caleb Ralphs <caleb.ralphs@valisinsights.com>   \n",
+                            "1     Workable <noreply@candidates.workablemail.com>   \n",
+                            "2  \"AncestryRecruiting@ancestry.com\" <AncestryRec...   \n",
+                            "3     Stack Sports <do-not-reply@mail.paylocity.com>   \n",
+                            "4                     no-reply@us.greenhouse-mail.io   \n",
+                            "\n",
+                            "                                 Subject                             Date  \\\n",
+                            "0   Data Science Intern - VALIS Insights  Wed, 30 Apr 2025 23:24:51 +0000   \n",
+                            "1  Thanks for applying to VALIS Insights  Wed, 30 Apr 2025 22:24:30 +0000   \n",
+                            "2    Thank you for applying to Ancestry!  Wed, 30 Apr 2025 14:34:03 -0700   \n",
+                            "3                Thank you for applying!  Wed, 30 Apr 2025 21:27:12 +0000   \n",
+                            "4   Thank you for applying to DeepIntent  Wed, 30 Apr 2025 21:23:04 +0000   \n",
+                            "\n",
+                            "                                                Body  \\\n",
+                            "0  Hello Maithili,\\r\\n\\r\\nThank you for applying ...   \n",
+                            "1  VALIS Insights\\r\\n\\r\\n------------------------...   \n",
+                            "2  <!doctype html><html lang=en xmlns=\"http://www...   \n",
+                            "3  <html style=\"background-color: #F4F6F8;\"><head...   \n",
+                            "4  Maithili,\\r\\n\\r\\nThanks for applying to DeepIn...   \n",
+                            "\n",
+                            "                                          Body_Clean  \\\n",
+                            "0  Hello Maithili,\\r\\n\\r\\nThank you for applying ...   \n",
+                            "1  VALIS Insights\\r\\n\\r\\n------------------------...   \n",
+                            "2  Hi Maithili,\\nThank you for taking the time to...   \n",
+                            "3  Dear Maithili,Thank you for your interest in a...   \n",
+                            "4  Maithili,\\r\\n\\r\\nThanks for applying to DeepIn...   \n",
+                            "\n",
+                            "                                    Extracted_Emails  \n",
+                            "0                   [caleb.ralphs@valisinsights.com]  \n",
+                            "1  [noreply@candidates.workablemail.com, maithili...  \n",
+                            "2  [maithili.a7@gmail.com, AncestryRecruiting@anc...  \n",
+                            "3                  [do-not-reply@mail.paylocity.com]  \n",
+                            "4                   [no-reply@us.greenhouse-mail.io]  "
+                        ]
+                    },
+                    "execution_count": 15,
+                    "metadata": {},
+                    "output_type": "execute_result"
+                }
+            ],
+            "source": [
+                "emails_df = add_extracted_emails_column(emails_df)\n",
+                "emails_df.head()"
+            ]
+        },
+        {
+            "cell_type": "code",
+            "execution_count": null,
+            "id": "866ef2d9",
+            "metadata": {},
+            "outputs": [],
+            "source": []
+        }
+    ],
+    "metadata": {
+        "kernelspec": {
+            "display_name": "Python 3 (ipykernel)",
+            "language": "python",
+            "name": "python3"
+        },
+        "language_info": {
+            "codemirror_mode": {
+                "name": "ipython",
+                "version": 3
+            },
+            "file_extension": ".py",
+            "mimetype": "text/x-python",
+            "name": "python",
+            "nbconvert_exporter": "python",
+            "pygments_lexer": "ipython3",
+            "version": "3.9.13"
+        }
+    },
+    "nbformat": 4,
+    "nbformat_minor": 5
+}
