RABBIT is a machine-learning-based tool designed to identify bot accounts among GitHub contributors. Unlike tools that rely on profile metadata, RABBIT analyzes behavioral activity sequences, from which it computes 38 distinct features.
RABBIT is developed by the Software Engineering Lab (SGL) at the University of Mons (UMONS), Belgium.
Why RABBIT?
- Behavioral Analysis: Classifies users based on interaction timing, repository switching patterns, and activity diversity, rather than just static account details.
- High Efficiency & Scalability: RABBIT is designed for large-scale mining. Thanks to its incremental early-stopping mechanism, it can classify thousands of accounts per hour without hitting GitHub's API rate limit (5,000 queries/hour for authenticated users).
- Overview
- CLI usage
- Python Library usage
- How it Works
- Citation
- Contributions
- Authors & Credits
- License
RABBIT requires at least Python 3.11.
Option A: Using uv
This installs RABBIT in an isolated environment, keeping your system clean.
You can find more details on how to install uv in its official documentation.
$ uv tool install rabbit
Option B: Using pip
It's recommended to use a virtual environment to avoid conflicts with other packages on your system.
# Create and activate a virtual environment
$ python3 -m venv rabbit-env
$ source rabbit-env/bin/activate # On Windows use `rabbit-env\Scripts\activate`
# Install RABBIT
$ pip install rabbit
RABBIT is also available via Nix:
$ nix-shell -p rabbit
To run RABBIT on many contributors, you need to provide a GitHub personal access token (API key).
See GitHub's documentation on personal access tokens to obtain one.
Without an API key, RABBIT is limited to 60 API queries per hour, and queries stop once the limit is reached instead of waiting for it to reset.
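To get a feel for what these limits mean in practice, the following back-of-the-envelope sketch estimates how many contributors fit into one hour of API budget. It assumes (this is not stated by RABBIT itself) that each contributor costs one account lookup plus between one and three event queries, three being the default of `--max-queries`. The helper `contributors_per_hour` is hypothetical and not part of RABBIT.

```python
def contributors_per_hour(rate_limit: int, event_queries: int, lookup_queries: int = 1) -> int:
    """Estimate how many contributors fit in one hour of API budget.

    Assumption: each contributor costs `lookup_queries` account lookups
    plus `event_queries` calls to the events API.
    """
    return rate_limit // (lookup_queries + event_queries)


# Rate limits quoted in this README: 60/hour without a token, 5,000/hour with one.
for label, limit in [("no token", 60), ("with token", 5000)]:
    best = contributors_per_hour(limit, event_queries=1)   # early stop after one query
    worst = contributors_per_hour(limit, event_queries=3)  # default --max-queries
    print(f"{label}: between {worst} and {best} contributors per hour")
```

Under these assumptions a token raises the hourly budget from a few dozen contributors to more than a thousand, which is why a token is needed for any large-scale run.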
Set the GITHUB_API_KEY environment variable to your GitHub personal access token.
$ export GITHUB_API_KEY=your_token_here # On Linux/Mac
$ setx GITHUB_API_KEY "your_token_here" # On Windows
You can also create a .env file in your working directory with the following content:
GITHUB_API_KEY=your_token_here
You can also provide the API key directly when running RABBIT using the --key argument.
$ rabbit --key your_token_here <other_arguments>
By default, RABBIT takes a list of GitHub contributor login names. You can then pass options to customize the analysis. The available options are:
$ rabbit --help
Usage: rabbit [OPTIONS] [CONTRIBUTORS]...
RABBIT is an Activity Based Bot Identification Tool that identifies bots based on their recent activities in GitHub.
The simplest way to use RABBIT is to provide a list of GitHub usernames (e.g. rabbit user1 user2 ...)
╭─ Arguments ──────────────────────────────────────────────────────────────────────────────────────────────────╮
│ contributors  [CONTRIBUTORS]...  Login names of contributors to analyze (Ex: 'user1 user2 ...').             │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Options ────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --help  Show this message and exit.                                                                          │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Inputs ─────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --input-file  -i  FILE  Path to a file containing login names (one per line).                                │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Configuration ──────────────────────────────────────────────────────────────────────────────────────────────╮
│ --key             -k  TEXT                       GitHub API key (either in command line or in                │
│                                                  GITHUB_API_KEY env variable).                               │
│                                                  [env var: GITHUB_API_KEY]                                   │
│ --min-events          INTEGER RANGE [1<=x<=300]  Min number of events required. [default: 5]                 │
│ --min-confidence      FLOAT RANGE [0.0<=x<=1.0]  Confidence threshold to stop querying. [default: 1.0]       │
│ --max-queries         INTEGER RANGE [1<=x<=3]    Max API queries per contributor. [default: 3]               │
│ --no-wait                                        Do not wait when rate limit is reached; exit immediately.   │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Output ─────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --features                  Display computed features for each contributor.                                  │
│ --format    -f  [text|csv]  Format of the output. [default: text]                                            │
│ --verbose   -v  INTEGER     Increase verbosity level (can be used multiple times. -v or -vv).                │
│                             [default: 0]                                                                     │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
1 - Simple example
You can provide the contributor login names as positional arguments, in an input file, or both.
$ rabbit natarajan-chidambaram "github-actions[bot]" tensorflow-jenkins astral-sh inactiveUser notFoundUser
CONTRIBUTOR            TYPE          CONFIDENCE
natarajan-chidambaram  Human         0.96
github-actions[bot]    Bot           1.0
tensorflow-jenkins     Bot           0.838
astral-sh              Organization  1.0
inactiveUser           Unknown       -
notFoundUser           Invalid       -
2 - Export results to CSV file
$ rabbit tensorflow-jenkins --input-file logins.txt --format csv > results.csv
3 - Export with feature values
$ rabbit --input-file logins.txt --features > results_with_features.txt
4 - Increase verbosity level
By default, only Error messages are shown.
$ rabbit --input-file logins.txt -v # Show info and warning messages
$ rabbit --input-file logins.txt -vv # Show debug messages
RABBIT requires at least Python 3.11.
If your project does not already use uv, you can initialize it first by running uv init in your project directory.
More information can be found in the official documentation.
Then, you can add RABBIT as a dependency:
$ uv add rabbit
If your project does not already have a virtual environment, it's recommended to create one to avoid conflicts with other packages on your system.
# Create and activate a virtual environment
$ python3 -m venv rabbit-env
$ source rabbit-env/bin/activate # On Windows use `rabbit-env\Scripts\activate`
# Install RABBIT
$ pip install rabbit
The main function is run_rabbit, an iterator that yields the result for each contributor one by one.
from rabbit import run_rabbit
from dotenv import load_dotenv
import os

load_dotenv()  # Load GITHUB_API_KEY from a .env file if present
API_KEY = os.getenv('GITHUB_API_KEY')

for result in run_rabbit(
    contributors=['MrRose765', 'github-actions[bot]'],
    api_key=API_KEY,
    min_events=5,
    min_confidence=1.0,
    max_queries=3,
    no_wait=False,
):
    # Each result is a ContributorResult object with 'contributor', 'user_type', 'confidence', and 'features' attributes
    print(f"{result.contributor}: {result.user_type} (Confidence: {result.confidence})")

# Output:
# MrRose765: Human (Confidence: 0.987)
# github-actions[bot]: Bot (Confidence: 1.0)
You can also use RABBIT on events data you have already collected, without making any API calls.
In that case, you need to provide a list of events for each contributor as input and write a custom function to use RABBIT:
from rabbit.predictor import ONNXPredictor, predict_user_type

events = {
    'MrRose765': [
        # List of event dictionaries for MrRose765 ONLY
    ],
    'testuser': [
        # List of event dictionaries for testuser ONLY
    ],
}

# Load the pre-trained model
predictor = ONNXPredictor()  # Uses the default model path; you can provide a custom path if needed.

for contributor, user_events in events.items():
    result = predict_user_type(
        username=contributor,
        events=user_events,
        predictor=predictor,
    )
    print(f"{contributor}: {result.user_type} (Confidence: {result.confidence})")

# Output:
# MrRose765: Human (Confidence: 0.987)
# testuser: Bot (Confidence: 0.912)
RABBIT follows a strict decision pipeline to classify a GitHub contributor, designed to minimize the number of API queries used.
- Validation & Existence Check
  RABBIT first verifies that the login exists on GitHub.
  - If the user does not exist: Returns Invalid.
- Metadata Filtering (Fast Check)
  Before running complex analysis, RABBIT checks the account type provided by the GitHub Users API.
  - If the type is Organization or Bot (e.g., GitHub Apps): It returns this type immediately without further analysis.
  - If the type is User: It proceeds to the behavioral analysis.
- Event Extraction
  RABBIT fetches the latest public events using the GitHub Events API.
  - If the number of events is below the threshold (default: 5): Returns Unknown (insufficient data).
- Feature Extraction
  The retrieved events are converted into activity sequences (using the ghmap tool).
  RABBIT computes 38 behavioral features covering volume, timing (inter-arrival time), and switching patterns (between repositories and activity types).
- Prediction (BIMBAS Model)
  The computed features are fed into the machine learning model (Gradient Boosting).
  - Returns: Human or Bot.
  - Confidence Score: A value between 0.0 and 1.0 indicating the certainty of the prediction.
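The decision steps above can be sketched in a few lines of Python. This is only an illustration of the control flow under stated assumptions, not RABBIT's actual implementation: the `classify` function, its callbacks `fetch_page` and `predict`, and the way the `--min-confidence` and `--max-queries` options drive early stopping are all hypothetical stand-ins, and feature extraction is folded into `predict`.

```python
from typing import Callable, Optional, Sequence

Event = dict  # one GitHub event, as returned by the events API


def classify(
    account_type: Optional[str],
    fetch_page: Callable[[int], Sequence[Event]],
    predict: Callable[[Sequence[Event]], tuple[str, float]],
    min_events: int = 5,
    min_confidence: float = 1.0,
    max_queries: int = 3,
) -> tuple[str, Optional[float]]:
    """Walk through the decision steps for one contributor (illustrative only)."""
    # Validation: the login does not exist.
    if account_type is None:
        return "Invalid", None
    # Metadata filtering: organizations and GitHub Apps need no further analysis.
    if account_type in ("Organization", "Bot"):
        return account_type, 1.0
    # Event extraction and prediction: query page by page, stop once confident.
    events: list[Event] = []
    result: tuple[str, Optional[float]] = ("Unknown", None)
    for page in range(1, max_queries + 1):
        new_events = fetch_page(page)
        if not new_events:
            break  # nothing more to fetch
        events.extend(new_events)
        if len(events) < min_events:
            continue  # not enough data yet, try another query
        label, confidence = predict(events)
        result = (label, confidence)
        if confidence >= min_confidence:
            break  # early stop: confident enough, save the remaining queries
    return result


# Toy stand-ins for the API and the model (not RABBIT's real components).
pages = {1: [{"type": "PushEvent"}] * 3, 2: [{"type": "PushEvent"}] * 4}
calls = []


def fake_fetch(page: int) -> Sequence[Event]:
    calls.append(page)
    return pages.get(page, [])


def fake_predict(events: Sequence[Event]) -> tuple[str, float]:
    return "Bot", min(1.0, len(events) / 10)


print(classify("User", fake_fetch, fake_predict, min_confidence=0.7))  # ('Bot', 0.7)
print(calls)  # [1, 2]
```

In this toy run the third query is never issued, because the confidence threshold is already met after the second page. That is the kind of saving the early-stopping mechanism aims for.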
RABBIT is based on a probabilistic machine learning model trained on a ground-truth dataset. While it achieves high accuracy, it is not infallible.
- Misclassifications: It is possible for a Human to be classified as a Bot (or vice versa), especially if their activity pattern is highly repetitive or unusual.
- Data Scarcity: Accounts with very few public events are harder to classify. The tool defaults to Unknown to avoid guessing when data is scarce.
If you encounter a clear misclassification, please open an issue on GitHub so we can investigate and improve the model.
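Because predictions are probabilistic, one way to act on them is to accept high-confidence results and queue the rest for manual review. The sketch below reuses the sample results from the CLI example in this README. The 0.9 cut-off and the `needs_review` helper are illustrative assumptions, not recommendations from the RABBIT authors.

```python
from typing import Optional

# (contributor, type, confidence) rows taken from the CLI example output;
# None stands for the "-" shown when no prediction is made.
results: list[tuple[str, str, Optional[float]]] = [
    ("natarajan-chidambaram", "Human", 0.96),
    ("github-actions[bot]", "Bot", 1.0),
    ("tensorflow-jenkins", "Bot", 0.838),
    ("astral-sh", "Organization", 1.0),
    ("inactiveUser", "Unknown", None),
]


def needs_review(confidence: Optional[float], threshold: float = 0.9) -> bool:
    """Flag rows with no confidence or a confidence below the (hypothetical) threshold."""
    return confidence is None or confidence < threshold


to_review = [name for name, _type, conf in results if needs_review(conf)]
print(to_review)  # ['tensorflow-jenkins', 'inactiveUser']
```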
This tool was developed as part of the research work by Natarajan Chidambaram, Tom Mens and Alexandre Decan. It is part of a research article titled "A Bot Identification Model and Tool based on GitHub Activity Sequences" (doi).
If you use RABBIT in your research, please cite it using the following BibTeX entry:
@article{Chidambaram_RABBIT_A_tool,
  author = {Chidambaram, Natarajan and Mens, Tom and Decan, Alexandre},
  doi = {10.1145/3643991.3644877},
  title = {{RABBIT: A tool for identifying bot accounts based on their recent GitHub event history}}
}
Contributions to RABBIT are welcome! If you encounter any issues or have suggestions for improvements, please open an issue or submit a pull request directly on GitHub.
When contributing, please ensure that your code adheres to the existing coding style and includes appropriate tests.
Also, make sure to clearly document why the changes are necessary in the commit messages and pull request descriptions.
We use uv for managing the development environment.
# Clone the repository (must be your fork if you plan to contribute)
$ git clone https://github.com/sgl-umons/RABBIT.git
$ cd RABBIT
# Install development dependencies
$ uv sync --dev
# Run tests
$ uv run pytest
# Lint the code
$ uv run ruff check
# Format the code
$ uv run ruff format
This tool is maintained by the Software Engineering Lab (SGL) of the University of Mons (UMONS), Belgium.
- Natarajan Chidambaram (Original author)
- Alexandre Decan
- Tom Mens
- Cyril Moreau
This tool is distributed under the Apache-2.0 license. See the LICENSE file for more details.
