An automated bug report triage system that leverages Large Language Models (LLMs) to classify Mozilla Bugzilla bug reports as valid or invalid, with multi-modal support for text descriptions and images.
This project implements an intelligent bug triage system designed to automatically evaluate and classify bug reports from Mozilla's Bugzilla platform. The system utilizes state-of-the-art LLMs (GPT-4.1, o4-mini, Grok-3, DeepSeek-R1-0528) to assess whether bug reports contain sufficient information to be actionable and reproducible. The classifier supports three evaluation scenarios:
- Description only: Text-based classification
- Description and image: Multi-modal analysis combining text and screenshots
- Image only: Visual-only classification
The system includes comprehensive evaluation metrics using semantic similarity, BERTScore, and cross-encoder models to validate classification accuracy against ground truth data.
```
.
├── add_image_descriptions.py
├── bug_evaluator.py
├── bug_evaluator_main.ipynb
├── bug_evaluator_notebook.ipynb
├── bug_evaluator_test.py
├── csvtojson.py
├── find_ground_truth.py
├── llm_bug_classifier.py
├── preprocess.ipynb
├── retrieve.ipynb
├── validvsinvalidbug.py
├── retry_eval.sh
├── run_all_eval.sh
├── sample_1000.csv
├── sample_1000_preprocessed.csv
├── tree.txt
├── comments/
│   ├── 1004432.csv
│   ├── 1005664.csv
│   └── ... (3000+ comment files)
├── images/
├── jsons/
├── results/
└── venv/
```
| Component | Type | Description |
|---|---|---|
| **Core Classification** | | |
| `llm_bug_classifier.py` | Script | Main classification engine that processes bug reports through various LLM models (GPT-4.1, o4-mini, Grok-3, DeepSeek-R1) across different scenarios (description_only, description_and_image, image_only) |
| `validvsinvalidbug.py` | Script | Original bug classification prototype using Azure OpenAI to evaluate bug validity based on completeness and reproducibility criteria |
| **Data Processing** | | |
| `csvtojson.py` | Script | Converts bug report CSV data into structured JSON format, extracting key fields (Bug_ID, Type, Summary, Product, Component, Status, Resolution, etc.) and merging with comment data |
| `preprocess.ipynb` | Notebook | Data preprocessing pipeline for cleaning and preparing raw Bugzilla data for analysis |
| `retrieve.ipynb` | Notebook | Data retrieval and exploration notebook for querying bug reports, particularly those containing images |
| **Evaluation Framework** | | |
| `bug_evaluator.py` | Module | Evaluation framework implementing multiple similarity metrics: cosine similarity with SentenceTransformers, cross-encoder scoring, BERTScore, and standard classification metrics (accuracy, precision, recall, F1) |
| `bug_evaluator_main.ipynb` | Notebook | Primary evaluation interface for running experiments and analyzing results |
| `bug_evaluator_notebook.ipynb` | Notebook | Alternative evaluation notebook with additional analysis capabilities |
| `bug_evaluator_test.py` | Script | Unit tests for the bug evaluator module |
| **Ground Truth & Augmentation** | | |
| `find_ground_truth.py` | Script | Analyzes bug discussion comments to identify the most authoritative explanation for why a bug was marked as "Invalid" using GPT-5-mini |
| `add_image_descriptions.py` | Script | Enhances bug reports by generating natural-language descriptions of attached images using multi-modal LLM vision capabilities |
| **Execution Scripts** | | |
| `run_all_eval.sh` | Shell | Batch execution script to run all model evaluations across all scenarios |
| `retry_eval.sh` | Shell | Retry mechanism for failed evaluations |
| **Data Directories** | | |
| `comments/` | Folder | Individual CSV files containing discussion threads for each bug report (organized by Bug_ID) |
| `images/` | Folder | Repository of screenshot attachments referenced in bug reports |
| `jsons/` | Folder | Processed bug report data in JSON format, ready for LLM consumption |
| `results/` | Folder | Output directory for classification results and evaluation metrics |
| **Sample Data** | | |
| `sample_1000.csv` | Data | Raw sample dataset containing 1000 bug reports from Bugzilla |
| `sample_1000_preprocessed.csv` | Data | Cleaned and preprocessed version of the sample dataset |
- Language: Python 3.8+
- LLM Integration: Azure OpenAI API (GPT-4.1, o4-mini, Grok-3, DeepSeek-R1-0528)
- NLP & Embeddings:
  - Sentence-Transformers (`all-MiniLM-L6-v2`)
  - BERTScore
  - Cross-Encoder (`stsb-roberta-base`)
- Data Processing: Pandas, NumPy
- Evaluation: scikit-learn
- Notebooks: Jupyter
- Environment: Python venv
Clone the repository:

```bash
git clone https://github.com/yourusername/cs588group4.git
cd cs588group4
```

Create and activate a virtual environment:

```bash
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

Install the dependencies:

```bash
pip install sentence-transformers scikit-learn numpy pandas tqdm openai bert-score jupyter
```

Set the following environment variables:

```bash
export ENDPOINT_URL="your-azure-endpoint"
export AZURE_OPENAI_API_KEY="your-api-key"
export DEPLOYMENT_NAME="gpt-4.1"  # or your preferred model
```

Alternatively, update the credentials directly in the Python files (not recommended for production).
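The scripts read these variables at runtime. A minimal sketch of how a client could be built from them is shown below; the API version and exact setup are assumptions for illustration, not necessarily what the repository's scripts do:

```python
import os
from openai import AzureOpenAI

# Assumed setup: build the client from the environment variables above.
# The api_version below is an example value, not taken from this repository.
client = AzureOpenAI(
    azure_endpoint=os.environ["ENDPOINT_URL"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-15-preview",
)

# Quick smoke test against the configured deployment.
response = client.chat.completions.create(
    model=os.environ.get("DEPLOYMENT_NAME", "gpt-4.1"),
    messages=[{"role": "user", "content": "ping"}],
)
print(response.choices[0].message.content)
```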
Ensure your data files are in place:
- `sample_1000_preprocessed.csv` (bug report data)
- `comments/` folder with individual bug comment CSV files
- `images/` folder with screenshot attachments (if using image scenarios)
Basic classification with description only:
```bash
python llm_bug_classifier.py --model gpt-4.1 --scenario description_only
```

Multi-modal classification with images:

```bash
python llm_bug_classifier.py --model o4-mini --scenario description_and_image
```

Image-only classification:

```bash
python llm_bug_classifier.py --model grok-3 --scenario image_only
```

Available models: `gpt-4.1`, `o4-mini`, `grok-3`, `DeepSeek-R1-0528`

Available scenarios: `description_only`, `description_and_image`, `image_only`
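For the image scenarios, the attached screenshot has to be base64-encoded and sent alongside the text prompt (see Notes below). The sketch below shows roughly what such a request could look like; the prompt wording, response schema, and helper name are illustrative assumptions rather than the actual code in `llm_bug_classifier.py`:

```python
import base64
import json
import os
from typing import Optional

from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.environ["ENDPOINT_URL"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-15-preview",  # assumed value
)


def classify_bug(summary: str, description: str, image_path: Optional[str] = None) -> dict:
    """Hypothetical helper: ask the model whether a bug report is actionable and reproducible."""
    content = [{
        "type": "text",
        "text": (
            "Decide whether this bug report is valid (actionable and reproducible) "
            "or invalid. Answer as JSON with keys 'decision' and 'fix'.\n\n"
            f"Summary: {summary}\nDescription: {description}"
        ),
    }]
    if image_path:  # description_and_image / image_only scenarios
        with open(image_path, "rb") as fh:
            b64 = base64.b64encode(fh.read()).decode()
        content.append({
            "type": "image_url",
            "image_url": {"url": f"data:image/png;base64,{b64}"},
        })
    resp = client.chat.completions.create(
        model=os.environ.get("DEPLOYMENT_NAME", "gpt-4.1"),
        messages=[{"role": "user", "content": content}],
    )
    return json.loads(resp.choices[0].message.content)
```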
```bash
python csvtojson.py
```

This processes `sample_1000_preprocessed.csv` and creates individual JSON files in the `jsons/` directory.
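Conceptually, the conversion pairs each CSV row with its comment thread, roughly as in the sketch below. The column names follow the field list in the components table, but the exact CSV headers and output layout are assumptions; the real script may differ:

```python
import json
from pathlib import Path

import pandas as pd

bugs = pd.read_csv("sample_1000_preprocessed.csv")
Path("jsons").mkdir(exist_ok=True)

for _, row in bugs.iterrows():
    bug_id = row["Bug_ID"]
    record = {
        "Bug_ID": int(bug_id),
        "Type": row.get("Type"),
        "Summary": row.get("Summary"),
        "Product": row.get("Product"),
        "Component": row.get("Component"),
        "Status": row.get("Status"),
        "Resolution": row.get("Resolution"),
    }
    # Merge the bug's discussion thread if a comments/<Bug_ID>.csv file exists.
    comment_file = Path("comments") / f"{bug_id}.csv"
    if comment_file.exists():
        record["comments"] = pd.read_csv(comment_file).to_dict(orient="records")
    with open(Path("jsons") / f"{bug_id}.json", "w") as fh:
        json.dump(record, fh, indent=2)
```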
```bash
python add_image_descriptions.py
```

Enhances JSON files with AI-generated descriptions of attached screenshots.
```bash
python find_ground_truth.py
```

Identifies the most authoritative comment explaining why each bug was marked invalid.
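A simplified view of this step: for each invalid bug, the comment thread is handed to the model, which is asked to point at the single most authoritative explanation. The helper name, prompt wording, and output handling below are illustrative assumptions:

```python
def find_authoritative_comment(client, deployment, comments):
    """Hypothetical helper: pick the comment that best explains the 'Invalid' resolution."""
    numbered = "\n".join(f"{i}: {text}" for i, text in enumerate(comments))
    resp = client.chat.completions.create(
        model=deployment,
        messages=[{
            "role": "user",
            "content": (
                "These comments discuss a bug that was resolved as Invalid. "
                "Reply with only the number of the comment that most authoritatively "
                "explains why it was marked Invalid.\n\n" + numbered
            ),
        }],
    )
    return comments[int(resp.choices[0].message.content.strip())]
```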
```bash
bash run_all_eval.sh
```

Executes all model/scenario combinations for comprehensive evaluation.
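In Python terms, the batch run amounts to iterating over every model/scenario pair and invoking the classifier for each, e.g. as below (assuming the CLI shown above; the shell script itself may handle logging and output paths differently):

```python
import itertools
import subprocess

models = ["gpt-4.1", "o4-mini", "grok-3", "DeepSeek-R1-0528"]
scenarios = ["description_only", "description_and_image", "image_only"]

for model, scenario in itertools.product(models, scenarios):
    # check=False so one failed combination does not abort the batch;
    # failed runs can be rerun via retry_eval.sh.
    subprocess.run(
        ["python", "llm_bug_classifier.py", "--model", model, "--scenario", scenario],
        check=False,
    )
```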
Open `bug_evaluator_main.ipynb` in Jupyter:

```bash
jupyter notebook bug_evaluator_main.ipynb
```

Run the evaluation cells to compute:
- Triage accuracy, precision, recall, F1
- Semantic similarity scores
- BERTScore metrics
- Cross-encoder similarity
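For orientation, the text-similarity metrics above can be computed along the following lines with sentence-transformers and bert-score. The model names match the tech stack listed earlier, while the example sentences and aggregation are purely illustrative:

```python
from sentence_transformers import SentenceTransformer, CrossEncoder, util
from bert_score import score as bert_score

prediction = "The report lacks steps to reproduce the crash."
reference = "Marked invalid because no reproduction steps were provided."

# Cosine similarity between sentence embeddings (all-MiniLM-L6-v2).
embedder = SentenceTransformer("all-MiniLM-L6-v2")
emb = embedder.encode([prediction, reference], convert_to_tensor=True)
cosine = util.cos_sim(emb[0], emb[1]).item()

# Cross-encoder relevance score (stsb-roberta-base).
cross_encoder = CrossEncoder("cross-encoder/stsb-roberta-base")
cross = cross_encoder.predict([(prediction, reference)])[0]

# BERTScore F1.
_, _, f1 = bert_score([prediction], [reference], lang="en")

print(f"cosine={cosine:.3f}  cross-encoder={cross:.3f}  BERTScore-F1={f1.item():.3f}")
```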
Classification Result (from `llm_bug_classifier.py`):

```json
{
  "decision": "invalid",
  "fix": "The bug report lacks specific steps to reproduce the issue. While it mentions a crash, it doesn't provide environment details, browser version, or exact user actions leading to the crash."
}
```

Evaluation Metrics (from `bug_evaluator.py`):

```json
{
  "accuracy": 0.87,
  "precision": 0.85,
  "recall": 0.89,
  "f1_score": 0.87,
  "mean_similarity": 0.78
}
```

Run unit tests for the evaluator module:

```bash
python bug_evaluator_test.py
```

- The system requires active Azure OpenAI API credentials to function
- Image processing scenarios require images to be base64-encoded
- Results are automatically saved to the `results/` directory with timestamped filenames
- The `venv/` directory is excluded from version control