This Python script connects to your Gmail account using the Gmail API to analyze and categorize your emails. It processes emails in batches, allowing you to scan your entire mailbox over multiple runs. The script outputs a table summarizing the services or companies that have sent you emails, along with the type of communication and the number of emails received.
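The summary table described above can be built by counting emails per (service, category) pair. This is a minimal sketch, not the script's actual code. It assumes each email has already been reduced to a `(sender_domain, category)` tuple. The names `summarize` and `print_table` and the column headings are hypothetical. `prettytable` (listed in requirements.txt) is imported only inside `print_table`.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def summarize(records: Iterable[Tuple[str, str]]) -> List[Tuple[str, str, int]]:
    """Count emails per (service, category) pair, most frequent first.

    `records` is a hypothetical stream of (sender_domain, category) tuples;
    the real script may structure its data differently.
    """
    counts = Counter(records)
    # Sort by descending count, then alphabetically for a stable order.
    return sorted(
        ((service, category, n) for (service, category), n in counts.items()),
        key=lambda row: (-row[2], row[0], row[1]),
    )


def print_table(rows: List[Tuple[str, str, int]]) -> None:
    """Render the summary rows with prettytable."""
    from prettytable import PrettyTable  # lazy import: summarize() works without it

    table = PrettyTable()
    table.field_names = ["Service", "Type", "Emails"]
    for row in rows:
        table.add_row(list(row))
    print(table)
```

For example, `print_table(summarize([("github.com", "Updates"), ("github.com", "Updates")]))` would print a one-row table with a count of 2.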
- Batch Processing: Processes emails in user-defined batches to manage execution time and resources.
- Email Categorization: Categorizes emails into Personal, Social, Promotions, Updates, Forums, Subscription, or Data Holder.
- Sender Analysis: Extracts sender information and counts the number of emails from each sender.
- Progress Persistence: Remembers processed emails between runs to avoid duplication.
- Customizable: Easily adjust batch sizes and categories as needed.
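Here is a rough sketch of how the categorization and sender analysis above could work. It assumes each email is a message resource fetched with `users().messages().get(..., format="metadata")`, so it carries `labelIds` and `payload.headers`. The five `CATEGORY_*` IDs are Gmail's built-in category labels. Two rules are assumptions for illustration: a `List-Unsubscribe` header marks Subscription, and unlabeled mail counts as Personal. The criteria for Data Holder are not described here, so the sketch does not attempt that category. All function names are hypothetical.

```python
from email.utils import parseaddr
from typing import Optional

# Gmail's built-in category label IDs mapped to this README's category names.
CATEGORY_LABELS = {
    "CATEGORY_PERSONAL": "Personal",
    "CATEGORY_SOCIAL": "Social",
    "CATEGORY_PROMOTIONS": "Promotions",
    "CATEGORY_UPDATES": "Updates",
    "CATEGORY_FORUMS": "Forums",
}


def get_header(message: dict, name: str) -> Optional[str]:
    """Return the first header called `name` (case-insensitive), or None."""
    for header in message.get("payload", {}).get("headers", []):
        if header.get("name", "").lower() == name.lower():
            return header.get("value")
    return None


def sender_domain(message: dict) -> str:
    """Extract the lower-cased domain of the From address, or 'unknown'."""
    _, address = parseaddr(get_header(message, "From") or "")
    if "@" not in address:
        return "unknown"
    return address.rsplit("@", 1)[1].lower()


def categorize(message: dict) -> str:
    """Assign one category to a message (rules are illustrative assumptions)."""
    # Assumption: List-Unsubscribe indicates a subscription, checked first so
    # mailing lists are not folded into Promotions or Updates.
    if get_header(message, "List-Unsubscribe"):
        return "Subscription"
    for label in message.get("labelIds", []):
        if label in CATEGORY_LABELS:
            return CATEGORY_LABELS[label]
    # Assumption: mail without a category label is treated as Personal.
    return "Personal"
```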
- Python 3.6 or higher
- A Google account with Gmail enabled
- Internet connection
Clone the repository:
git clone https://github.com/yourusername/gmail-analyzer.git
cd gmail-analyzer
Create a virtual environment to manage dependencies.
On macOS/Linux:
python3 -m venv venv
On Windows:
python -m venv venv
Activate the virtual environment.
On macOS/Linux:
source venv/bin/activate
On Windows:
venv\Scripts\activate
Install the required Python packages using pip.
pip install -r requirements.txt
Contents of requirements.txt:
google-api-python-client
google-auth-httplib2
google-auth-oauthlib
prettytable
- Go to the Google Cloud Console.
- Create a new project or select an existing one.
- Enable the Gmail API for your project:
  - Navigate to APIs & Services > Library.
  - Search for Gmail API and click Enable.
- Go to APIs & Services > Credentials.
- Click Create Credentials and select OAuth client ID.
- Choose Desktop app as the application type.
- Name your client (e.g., "Gmail Analyzer") and click Create.
- Click Download JSON to get your credentials.json file.
- Place the credentials.json file in the root directory of the cloned repository.
Important: Do not share your credentials.json file or upload it to any public repository.
With everything set up, you can run the script:
python gmail_analyzer.py
On the first run, a browser window will open for you to authorize the application. After authorization, the script will begin processing your emails in batches.
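The first-run authorization described above usually looks something like the sketch below, which follows the pattern of Google's Python quickstart. It is a hedged example, not the script's own code. The `gmail.readonly` scope and the function name `get_gmail_service` are assumptions; the file names match this README. The Google imports are deferred, so a missing `credentials.json` produces a clear error before any library is loaded.

```python
import os
import pickle

# Assumption: read-only access is enough for analysis; the real script's scope may differ.
SCOPES = ["https://www.googleapis.com/auth/gmail.readonly"]


def get_gmail_service(credentials_path: str = "credentials.json",
                      token_path: str = "token.pickle"):
    """Return an authorized Gmail API client, reusing cached tokens when possible."""
    creds = None
    if os.path.exists(token_path):
        # token.pickle holds the credentials saved by a previous run.
        with open(token_path, "rb") as fh:
            creds = pickle.load(fh)

    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            # Access token expired: refresh silently with the stored refresh token.
            from google.auth.transport.requests import Request
            creds.refresh(Request())
        else:
            if not os.path.exists(credentials_path):
                raise FileNotFoundError(
                    f"{credentials_path} not found; download it from the Google Cloud Console"
                )
            from google_auth_oauthlib.flow import InstalledAppFlow
            # Opens a browser window and waits on a free local port for the OAuth redirect.
            flow = InstalledAppFlow.from_client_secrets_file(credentials_path, SCOPES)
            creds = flow.run_local_server(port=0)
        # Cache the (new or refreshed) credentials for the next run.
        with open(token_path, "wb") as fh:
            pickle.dump(creds, fh)

    from googleapiclient.discovery import build
    return build("gmail", "v1", credentials=creds)
```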
You can adjust the batch size by modifying the BATCH_SIZE variable in the script:
BATCH_SIZE = 1000  # Number of emails to process per run
- Incremental Processing: Run the script multiple times to process all emails. Each run picks up where the previous one left off.
- Viewing Results: After each run, the script outputs a table of the services and a summary of categories.
- Resetting Progress: To start over, delete the processed_ids.pickle file.
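Incremental processing can be sketched in two parts. A set of already-processed message IDs is persisted with `pickle`. A loop over `users().messages().list` pages stops once `BATCH_SIZE` new IDs are collected. The file name comes from this README. The helper names are hypothetical, and the real script may store its state differently.

```python
import os
import pickle
from typing import List, Set

PROCESSED_FILE = "processed_ids.pickle"  # file name taken from this README


def load_processed_ids(path: str = PROCESSED_FILE) -> Set[str]:
    """Return message IDs handled in earlier runs (empty set on the first run)."""
    if not os.path.exists(path):
        return set()
    with open(path, "rb") as fh:
        return pickle.load(fh)


def save_processed_ids(ids: Set[str], path: str = PROCESSED_FILE) -> None:
    """Persist processed IDs so the next run can skip them."""
    with open(path, "wb") as fh:
        pickle.dump(ids, fh)


def collect_batch(service, processed: Set[str], batch_size: int) -> List[str]:
    """Page through messages.list until `batch_size` unprocessed IDs are found."""
    batch: List[str] = []
    page_token = None
    while len(batch) < batch_size:
        response = (
            service.users()
            .messages()
            .list(userId="me", maxResults=500, pageToken=page_token)
            .execute()
        )
        for item in response.get("messages", []):
            if item["id"] not in processed:
                batch.append(item["id"])
                if len(batch) == batch_size:
                    break
        page_token = response.get("nextPageToken")
        if not page_token:
            break  # no more pages: the whole mailbox has been listed
    return batch
```

A run would call `load_processed_ids()`, pass the set and `BATCH_SIZE` to `collect_batch`, analyze those messages, add their IDs to the set, and call `save_processed_ids`. Re-listing from the first page on each run keeps the logic simple, at the cost of extra `list` calls on large mailboxes. Storing `nextPageToken` between runs would avoid that, assuming the token is still accepted later.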
credentials.json: Contains your OAuth 2.0 Client ID and Client Secret. Do not commit this file to any public repository.
token.pickle: Stores your access and refresh tokens after authorization. This file should also be kept private.
Ensure that sensitive files are excluded from version control by adding them to your .gitignore file:
credentials.json
token.pickle
processed_ids.pickle
Be mindful of Gmail API usage limits to avoid exceeding quotas. For more information, refer to Google's Gmail API "Usage limits" documentation.
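One common way to stay within quota is to retry rate-limited requests with exponential backoff. The helper below is a generic sketch. `with_backoff`, `is_retryable_http_error`, and the choice of statuses 429, 500 and 503 are assumptions, not anything the script is known to do. It duck-types the `resp.status` attribute found on `googleapiclient.errors.HttpError`. The client library's `execute()` also accepts a `num_retries` argument, which may be sufficient on its own.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

RETRYABLE_STATUSES = {429, 500, 503}  # assumption: statuses worth retrying


def is_retryable_http_error(exc: Exception) -> bool:
    """Duck-typed check for an HttpError-like exception exposing exc.resp.status."""
    status = getattr(getattr(exc, "resp", None), "status", None)
    return status in RETRYABLE_STATUSES


def with_backoff(
    call: Callable[[], T],
    is_retryable: Callable[[Exception], bool],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying retryable errors with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception as exc:
            if attempt == max_attempts - 1 or not is_retryable(exc):
                raise
            # Delay doubles each attempt (1s, 2s, 4s, ...) plus up to 1s of random jitter.
            sleep(base_delay * (2 ** attempt) + random.uniform(0, 1))
    raise RuntimeError("unreachable")  # the loop always returns or raises
```

For instance, `with_backoff(lambda: service.users().messages().get(userId="me", id=msg_id, format="metadata").execute(), is_retryable_http_error)`, where `service` and `msg_id` come from the surrounding script.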
The script processes your personal emails. Ensure that any output data is handled securely. Review and comply with Google's API Services User Data Policy.
This project is licensed under the MIT License. See the LICENSE file for details.