Skip to content

pfptcommunity/senderstats

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Proofpoint Sender Analyzer

PyPI Downloads
This tool helps identify the top senders based on smart search outbound message exports or CSV data.

Requirements:

Installation

SenderStats provides both a command-line interface (CLI) and a graphical user interface (GUI). You may install either or both depending on your use case.


⚠️ Recommended: Use pipx (especially on Ubuntu 24.04+)

Ubuntu 24.04 and other PEP-668–restricted systems prevent system-wide pip install by default.

To avoid errors such as:

error: externally-managed-environment

it is strongly recommended to install SenderStats using pipx, which automatically creates an isolated virtual environment:

pipx install senderstats
pipx install senderstats[gui]   # for GUI support

CLI Installation

From PyPI (preferred)

pipx install senderstats

From GitHub (main branch)

pipx install "git+https://github.com/pfptcommunity/senderstats.git"  

Run the CLI

senderstats --help

GUI Installation

The GUI version requires optional dependencies (tkinterdnd2) and a system package (python3-tk on Linux).

1. Install system dependency (Linux only)

Ubuntu / Debian:

sudo apt install python3-tk

Fedora:

sudo dnf install python3-tkinter

Arch:

sudo pacman -S tk

2. Install the SenderStats GUI

From PyPI

pipx install "senderstats[gui]"

From GitHub (main branch)

pipx install "senderstats[gui]@git+https://github.com/pfptcommunity/senderstats.git"

Launch the GUI

senderstats-gui

Using a Virtual Environment Instead of pipx (advanced option)

If you prefer manual venv management:

python3 -m venv venv
source venv/bin/activate
pip install senderstats
pip install "senderstats[gui]"

Quick Commands Summary

Task Command
Install CLI (PyPI) pipx install senderstats
Install GUI (PyPI) pipx install "senderstats[gui]"
Install CLI (GitHub) pipx install "git+https://github.com/pfptcommunity/senderstats.git#egg=senderstats"
Install GUI (GitHub) pipx install "git+https://github.com/pfptcommunity/senderstats.git#egg=senderstats[gui]"
Run CLI senderstats ...
Run GUI senderstats-gui
Run package module python3 -m senderstats

Use Cases

Outbound message volumes and data transferred by:

  • Envelope sender
  • Header From:
  • Return-Path:
  • Envelope header: From:, MessageID Host, MessageID Domain (helpful to identify original sender)
  • Envelope sender and header From: for SPF alignment purposes
  • Top N subject line sampling to help understand the type of traffic
  • Peak Hourly Volumes

Summarize message volume information:

  • Estimated application email traffic based on sender volume threshold:
    • Estimated application data
    • Estimated application messages
    • Estimated application average size
  • Estimated application email traffic based on probability (optional)
    • Estimated application data
    • Estimated application messages
    • Estimated application average size
  • Total outbound data
    • Total outbound data
    • Total outbound messages
    • Total outbound average size
    • Total outbound peak hourly volume

Processing Behavior

The primary purpose of this tool is to identify sender message volumes and calculate data transfer rates for legitimate emails.

Input Requirements

  • Expected Fields: The input CSV should include at least the envelope sender and message size fields.
  • Exclusions: Messages will be excluded if:
    • The envelope sender is empty (common for bounce replies or calendar actions).
    • The message size is missing or not a valid number (typically rejects that can skew reporting).

Exclusion Rules

  1. Domain-Based Exclusions:

    • Messages from system domains such as ppops.net, pphosted.com, and knowledgefront.com are omitted by default to filter out monitoring messages.
    • To include these messages, use the --no-default-exclude-domains flag.
  2. IP-Based Exclusions:

    • For messages from 127.0.0.1 (e.g., system reports and digests on Proofpoint Protection Gateway), use the --exclude-ips flag to exclude them.
    • This option requires sender IP addresses to be included in the CSV.

Each exclusion step ensures the accuracy of volume and average message size reporting by filtering out unnecessary data.

Exclude / Restrict Domain Behavior

Since restrict / exclude behavior conceptually mean "in domain" / "not in domain" subdomains are coalesced. Specifying "api.example.com", "test.example.com", and "example.com" will only result in "example.com" as all of subdomains will be matched.

If there is a need for exact domain matching, this can be addressed in a future version. Most use cases provided so far have been limited to organizations seeking to limit or exclude data associated with domains they own.

While the idea of supporting free-form regex or pattern matching has been considered, it is currently avoided to prevent unintended consequences. This may be revisited in a future release with appropriate safeguards.

Usage Options

usage: senderstats [-h] [--version] -i <file> [<file> ...] -o <xlsx> [--ip IP]
                   [--mfrom MFrom] [--hfrom HFrom] [--rcpts Rcpts]
                   [--rpath RPath] [--msgid MsgID] [--subject Subject]
                   [--size MsgSz] [--date Date] [--gen-hfrom] [--gen-rpath]
                   [--gen-alignment] [--gen-msgid] [--expand-recipients]
                   [--no-display-name] [--remove-prvs] [--decode-srs]
                   [--normalize-bounces] [--normalize-entropy]
                   [--no-empty-hfrom] [--sample-subject] [--with-probability]
                   [--exclude-ips <ip> [<ip> ...]]
                   [--exclude-domains <domain> [<domain> ...]]
                   [--restrict-domains <domain> [<domain> ...]]
                   [--exclude-senders <sender> [<sender> ...]]
                   [--exclude-dup-msgids] [--date-format DateFmt]
                   [--no-default-exclude-domains] [--no-default-exclude-ips]

This tool helps identify the top senders based on smart search outbound
message exports.

Input / Output arguments (required):
  -i, --input <file> [<file> ...]             Smart search files to read.
  -o, --output <xlsx>                         Output file

Field mapping arguments (optional):
  --ip IP                                     CSV field of the IP address.
                                              (default=Sender_IP_Address)
  --mfrom MFrom                               CSV field of the envelope sender
                                              address. (default=Sender)
  --hfrom HFrom                               CSV field of the header From:
                                              address. (default=Header_From)
  --rcpts Rcpts                               CSV field of the header
                                              recipient addresses.
                                              (default=Recipients)
  --rpath RPath                               CSV field of the Return-Path:
                                              address. (default=Header_Return-
                                              Path)
  --msgid MsgID                               CSV field of the message ID.
                                              (default=Message_ID)
  --subject Subject                           CSV field of the Subject, only
                                              used if --sample-subject is
                                              specified. (default=Subject)
  --size MsgSz                                CSV field of message size.
                                              (default=Message_Size)
  --date Date                                 CSV field of message date/time.
                                              (default=Date)

Reporting control arguments (optional):
  --gen-hfrom                                 Generate report showing the
                                              header From: data for messages
                                              being sent.
  --gen-rpath                                 Generate report showing return
                                              path for messages being sent.
  --gen-alignment                             Generate report showing envelope
                                              sender and header From:
                                              alignment
  --gen-msgid                                 Generate report showing parsed
                                              Message ID. Helps determine the
                                              sending system

Parsing behavior arguments (optional):
  --expand-recipients                         Expand recipients counts
                                              messages by destination. E.g. 1
                                              message going to 3 people, is 3
                                              messages sent.
  --no-display-name                           Remove display and use address
                                              only. Converts 'Display Name
                                              <user@domain.com>' to
                                              'user@domain.com'
  --remove-prvs                               Remove return path verification
                                              strings e.g.
                                              prvs=tag=sender@domain.com
  --decode-srs                                Convert sender rewrite scheme, f
                                              orwardmailbox+srs=hash=tt=domain
                                              .com=user to user@domain.com
  --normalize-bounces                         Convert bounce scheme, bounces<u
                                              nique_tracking>@domain.com to
                                              bounces@domain.com
  --normalize-entropy                         Convert bounce scheme,
                                              <random_tracking_id>@domain.com
                                              to #entropy#@domain.com
  --no-empty-hfrom                            If the header From: is empty the
                                              envelope sender address is used
  --sample-subject                            Enable sampling of subject lines found
                                              during processing with counts.
  --with-probability                          Compute app probability score
                                              (requires --sample-subject)
  --exclude-ips <ip> [<ip> ...]               Exclude ips from processing.
  --exclude-domains <domain> [<domain> ...]   Exclude domains from processing.
                                              (Subdomains are coalesced)
  --restrict-domains <domain> [<domain> ...]  Constrain domains for
                                              processing. (Subdomains are
                                              coalesced)
  --exclude-senders <sender> [<sender> ...]   Exclude senders from processing.
  --exclude-dup-msgids                        Exclude messages where message
                                              id is a duplicate.
  --date-format DateFmt                       Date format used to parse the
                                              timestamps.
                                              (default=%Y-%m-%dT%H:%M:%S.%f%z)

Extended processing controls (optional):
  --no-default-exclude-domains                Will not include the default
                                              Proofpoint excluded domains.
  --no-default-exclude-ips                    Will not include the default
                                              localhost ip exclusion.

Usage:
  -h, --help                                  Show this help message and exit
  --version                                   Show the program's version and
                                              exit

Using the Tool with Proofpoint Smart Search

Export all outbound message traffic as a smart search CSV. You may need to export multiple CSVs if the data per time window exceeds 1M records. The tool can ingest multiple CSVs files at once.

smart_search_outbound

Once the files are downloaded to a target folder, you can run the following command with the path to the files you downloaded and specify a wildard.

The following example is the most basic usage:

# Windows
senderstats -i C:\path\to\downloaded\files\smart_search_results_cluster_hosted_2024_03_04_*.csv -o C:\path\to\output\file\my_cluster_hosted.xlsx
# Linux
senderstats -i /path/to/downloaded/files/smart_search_results_cluster_hosted_2024_03_04_*.csv -o /path/to/output/file/my_cluster_hosted.xlsx

For a more comprehensive report use the following command:

# Windows
senderstats -i C:\path\to\downloaded\files\smart_search_results_cluster_hosted_2024_03_04_*.csv -o C:\path\to\output\file\my_cluster_hosted.xlsx --remove-prvs --decode-srs --gen-hfrom --gen-alignment --gen-msgid --sample-subject --exclude-ips 127.0.0.1
# Linux
senderstats -i /path/to/downloaded/files/smart_search_results_cluster_hosted_2024_03_04_*.csv -o /path/to/output/file/my_cluster_hosted.xlsx --remove-prvs --decode-srs --gen-hfrom --gen-alignment --gen-msgid --sample-subject --exclude-ips 127.0.0.1

Expanding recipients counts messages by destination via --expand-recipients:

This is useful if you need to determine how many messages were sent to a destination, as a single message can be addressed to multiple recipients.

# Windows
senderstats -i C:\path\to\downloaded\files\smart_search_results_cluster_hosted_2024_03_04_*.csv -o C:\path\to\output\file\my_cluster_hosted.xlsx --remove-prvs --decode-srs --gen-hfrom --gen-alignment --gen-msgid --sample-subject --expand-recipients --exclude-ips 127.0.0.1
# Linux
senderstats -i /path/to/downloaded/files/smart_search_results_cluster_hosted_2024_03_04_*.csv -o /path/to/output/file/my_cluster_hosted.xlsx --remove-prvs --decode-srs --gen-hfrom --gen-alignment --gen-msgid --sample-subject --expand-recipients --exclude-ips 127.0.0.1

Sample Output

The execution results should look similar to the following depending the options you select.

Files to be processed:
C:\Users\ljerabek\Downloads\smart_search_results_cluster_hosted_2024_03_04_173552.csv
C:\Users\ljerabek\Downloads\smart_search_results_cluster_hosted_2024_03_04_173855.csv
C:\Users\ljerabek\Downloads\smart_search_results_cluster_hosted_2024_03_04_173656.csv
C:\Users\ljerabek\Downloads\smart_search_results_cluster_hosted_2024_03_04_173754.csv
C:\Users\ljerabek\Downloads\smart_search_results_cluster_hosted_2024_03_04_173834.csv

Domains excluded from processing:
knowledgefront.com
pphosted.com
ppops.net

Processing:  C:\Users\ljerabek\Downloads\smart_search_results_cluster_hosted_2024_03_04_173552.csv
Processing:  C:\Users\ljerabek\Downloads\smart_search_results_cluster_hosted_2024_03_04_173855.csv
Processing:  C:\Users\ljerabek\Downloads\smart_search_results_cluster_hosted_2024_03_04_173656.csv
Processing:  C:\Users\ljerabek\Downloads\smart_search_results_cluster_hosted_2024_03_04_173754.csv
Processing:  C:\Users\ljerabek\Downloads\smart_search_results_cluster_hosted_2024_03_04_173834.csv

Messages excluded by empty senders: 1523573
Messages excluded by domain: 484664
Messages excluded by sender: 0
Messages excluded by constraint: 0

Generating report, please wait.

Please see report: C:\Users\ljerabek\Downloads\my_cluster_hosted.xlsx

Sample Summary Statistics

image

Sample Details (Sender by Volume / Probability):

image

Sample Details (Sender + From by Volume):

image

Sample Details (Message ID) Inferencing:

image

Sample Details (Hourly Metrics):

image

GUI Example

image

Default Recommended Options for Proofpoint Smart Search CSV files

When running the application for the first time, these options will be automatically set.

Default Report Options

Notice that all additional reporting is disabled on by default.

image

Default Parsing Options

Please validate your options are correctly set as shown below if you are trying to determine application volumes.

image

About

Sender Analyzer

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Contributors 2

  •  
  •  

Languages