Bug Report Classifier for Mozilla Bugzilla

An automated bug report triage system that leverages Large Language Models (LLMs) to classify Mozilla Bugzilla bug reports as valid or invalid, with multi-modal support for text descriptions and images.


📋 Project Overview

This project implements an intelligent bug triage system designed to automatically evaluate and classify bug reports from Mozilla's Bugzilla platform. The system utilizes state-of-the-art LLMs (GPT-4.1, o4-mini, Grok-3, DeepSeek-R1-0528) to assess whether bug reports contain sufficient information to be actionable and reproducible. The classifier supports three evaluation scenarios:

  • Description only: Text-based classification
  • Description and image: Multi-modal analysis combining text and screenshots
  • Image only: Visual-only classification

The system includes comprehensive evaluation metrics using semantic similarity, BERTScore, and cross-encoder models to validate classification accuracy against ground truth data.


🏗️ Architecture & Directory Map

.
├── add_image_descriptions.py
├── bug_evaluator.py
├── bug_evaluator_main.ipynb
├── bug_evaluator_notebook.ipynb
├── bug_evaluator_test.py
├── csvtojson.py
├── find_ground_truth.py
├── llm_bug_classifier.py
├── preprocess.ipynb
├── retrieve.ipynb
├── validvsinvalidbug.py
├── retry_eval.sh
├── run_all_eval.sh
├── sample_1000.csv
├── sample_1000_preprocessed.csv
├── tree.txt
├── comments/
│   ├── 1004432.csv
│   ├── 1005664.csv
│   └── ... (3000+ comment files)
├── images/
├── jsons/
├── results/
└── venv/

📁 Directory & File Descriptions

Core Classification

  • llm_bug_classifier.py (script): Main classification engine that processes bug reports through the supported LLMs (GPT-4.1, o4-mini, Grok-3, DeepSeek-R1) across the three scenarios (description_only, description_and_image, image_only)
  • validvsinvalidbug.py (script): Original bug-classification prototype that uses Azure OpenAI to evaluate bug validity based on completeness and reproducibility criteria

Data Processing

  • csvtojson.py (script): Converts bug report CSV data into structured JSON, extracting key fields (Bug_ID, Type, Summary, Product, Component, Status, Resolution, etc.) and merging in each bug's comment data
  • preprocess.ipynb (notebook): Data preprocessing pipeline for cleaning and preparing raw Bugzilla data for analysis
  • retrieve.ipynb (notebook): Data retrieval and exploration notebook for querying bug reports, particularly those containing images

Evaluation Framework

  • bug_evaluator.py (module): Evaluation framework implementing multiple similarity metrics: cosine similarity with SentenceTransformers, cross-encoder scoring, BERTScore, and standard classification metrics (accuracy, precision, recall, F1)
  • bug_evaluator_main.ipynb (notebook): Primary evaluation interface for running experiments and analyzing results
  • bug_evaluator_notebook.ipynb (notebook): Alternative evaluation notebook with additional analysis capabilities
  • bug_evaluator_test.py (script): Unit tests for the bug evaluator module

Ground Truth & Augmentation

  • find_ground_truth.py (script): Analyzes bug discussion comments to identify the most authoritative explanation for why a bug was marked "Invalid", using GPT-5-mini
  • add_image_descriptions.py (script): Enhances bug reports by generating natural-language descriptions of attached images using multi-modal LLM vision capabilities

Execution Scripts

  • run_all_eval.sh (shell): Batch execution script that runs all model evaluations across all scenarios
  • retry_eval.sh (shell): Retry mechanism for failed evaluations

Data Directories

  • comments/ (folder): Individual CSV files containing the discussion thread for each bug report (organized by Bug_ID)
  • images/ (folder): Repository of screenshot attachments referenced in bug reports
  • jsons/ (folder): Processed bug report data in JSON format, ready for LLM consumption
  • results/ (folder): Output directory for classification results and evaluation metrics

Sample Data

  • sample_1000.csv (data): Raw sample dataset of 1000 bug reports from Bugzilla
  • sample_1000_preprocessed.csv (data): Cleaned and preprocessed version of the sample dataset

🛠️ Tech Stack

  • Language: Python 3.8+
  • LLM Integration: Azure OpenAI API (GPT-4.1, o4-mini, Grok-3, DeepSeek-R1-0528)
  • NLP & Embeddings:
    • Sentence-Transformers (all-MiniLM-L6-v2)
    • BERTScore
    • Cross-Encoder (stsb-roberta-base)
  • Data Processing: Pandas, NumPy
  • Evaluation: scikit-learn
  • Notebooks: Jupyter
  • Environment: Python venv

🚀 Setup Instructions

1. Clone the Repository

git clone https://github.com/yourusername/cs588group4.git
cd cs588group4

2. Create Virtual Environment

python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

3. Install Dependencies

pip install sentence-transformers scikit-learn numpy pandas tqdm openai bert-score jupyter

4. Configure Azure OpenAI Credentials

Set the following environment variables:

export ENDPOINT_URL="your-azure-endpoint"
export AZURE_OPENAI_API_KEY="your-api-key"
export DEPLOYMENT_NAME="gpt-4.1"  # or your preferred model

Alternatively, update the credentials directly in the Python files (not recommended for production).
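
A minimal sketch of how these variables can be consumed, assuming the openai package's AzureOpenAI client (the api_version value below is an assumption and should match your Azure deployment):

# Sketch: build an Azure OpenAI client from the environment variables above.
import os

from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.environ["ENDPOINT_URL"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-01",  # assumption: use the API version your deployment expects
)

deployment = os.environ.get("DEPLOYMENT_NAME", "gpt-4.1")

# Quick connectivity check
response = client.chat.completions.create(
    model=deployment,
    messages=[{"role": "user", "content": "ping"}],
)
print(response.choices[0].message.content)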

5. Prepare Data

Ensure your data files are in place:

  • sample_1000_preprocessed.csv (bug report data)
  • comments/ folder with individual bug comment CSV files
  • images/ folder with screenshot attachments (if using image scenarios)

💡 Usage

Run Bug Classification

Basic classification with description only:

python llm_bug_classifier.py --model gpt-4.1 --scenario description_only

Multi-modal classification with images:

python llm_bug_classifier.py --model o4-mini --scenario description_and_image

Image-only classification:

python llm_bug_classifier.py --model grok-3 --scenario image_only

Available models: gpt-4.1, o4-mini, grok-3, DeepSeek-R1-0528
Available scenarios: description_only, description_and_image, image_only
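
The exact command-line interface is defined in llm_bug_classifier.py; the sketch below only illustrates how the --model and --scenario flags could drive message assembly for the three scenarios (the helper name, JSON field names, and example file path are hypothetical):

# Illustrative only: scenario-dependent message assembly for one bug report.
import argparse
import json

def build_messages(bug, scenario):
    # Hypothetical helper: pick text and/or image content based on the scenario.
    text = {"type": "text", "text": f"{bug['Summary']}\n{bug.get('Description', '')}"}
    image = {"type": "image_url", "image_url": {"url": bug.get("image_data_uri", "")}}
    if scenario == "description_only":
        content = [text]
    elif scenario == "description_and_image":
        content = [text, image]
    else:  # image_only
        content = [image]
    return [{"role": "user", "content": content}]

parser = argparse.ArgumentParser()
parser.add_argument("--model", choices=["gpt-4.1", "o4-mini", "grok-3", "DeepSeek-R1-0528"])
parser.add_argument("--scenario", choices=["description_only", "description_and_image", "image_only"])
args = parser.parse_args()

with open("jsons/1004432.json") as fh:  # example Bug_ID taken from the comments/ listing
    bug = json.load(fh)
print(build_messages(bug, args.scenario))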

Convert CSV to JSON

python csvtojson.py

This processes sample_1000_preprocessed.csv and creates individual JSON files in the jsons/ directory.
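
A rough sketch of the kind of conversion csvtojson.py performs, based on the fields listed in the file descriptions; the real script's field handling and comment merging may differ:

# Sketch: turn each preprocessed CSV row into jsons/<Bug_ID>.json, attaching its comments.
import json
import os

import pandas as pd

bugs = pd.read_csv("sample_1000_preprocessed.csv")
os.makedirs("jsons", exist_ok=True)

for _, row in bugs.iterrows():
    bug_id = row["Bug_ID"]
    record = {field: row[field]
              for field in ["Bug_ID", "Type", "Summary", "Product", "Component", "Status", "Resolution"]
              if field in row}
    comments_path = f"comments/{bug_id}.csv"
    if os.path.exists(comments_path):
        record["comments"] = pd.read_csv(comments_path).to_dict(orient="records")
    with open(f"jsons/{bug_id}.json", "w") as fh:
        json.dump(record, fh, indent=2, default=str)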

Generate Image Descriptions

python add_image_descriptions.py

Enhances JSON files with AI-generated descriptions of attached screenshots.
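
A minimal sketch of such a vision call against the Azure OpenAI chat completions API, with the screenshot sent as a base64 data URI (the file name, prompt text, and api_version are illustrative):

# Sketch: describe one screenshot with a vision-capable deployment.
import base64
import os

from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.environ["ENDPOINT_URL"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-01",  # assumption
)

with open("images/example_screenshot.png", "rb") as fh:  # illustrative file name
    encoded = base64.b64encode(fh.read()).decode("utf-8")

response = client.chat.completions.create(
    model=os.environ.get("DEPLOYMENT_NAME", "gpt-4.1"),
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this bug report screenshot in plain language."},
            {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{encoded}"}},
        ],
    }],
)
print(response.choices[0].message.content)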

Find Ground Truth Comments

python find_ground_truth.py

Identifies the most authoritative comment explaining why each bug was marked invalid.
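
A hypothetical sketch of this step: load one bug's comment thread and ask the model to quote the explanatory comment. The comment column name, prompt wording, and deployment name are assumptions, not the script's actual code:

# Sketch: ask the model which comment best explains the "Invalid" resolution.
import os

import pandas as pd
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.environ["ENDPOINT_URL"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-01",  # assumption
)

comments = pd.read_csv("comments/1004432.csv")              # one thread per Bug_ID
thread = "\n\n".join(comments["comment_text"].astype(str))  # assumed column name

prompt = ("The comments below discuss a Bugzilla report that was resolved as Invalid. "
          "Quote the single comment that most authoritatively explains why.\n\n" + thread)

response = client.chat.completions.create(
    model=os.environ.get("DEPLOYMENT_NAME", "gpt-5-mini"),  # the project uses GPT-5-mini here
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)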

Run Batch Evaluations

bash run_all_eval.sh

Executes all model/scenario combinations for comprehensive evaluation.
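
The shell script's contents are not reproduced here; conceptually the batch run iterates every model/scenario pair, roughly equivalent to this Python loop:

# Sketch: Python equivalent of the batch run performed by run_all_eval.sh.
import subprocess

models = ["gpt-4.1", "o4-mini", "grok-3", "DeepSeek-R1-0528"]
scenarios = ["description_only", "description_and_image", "image_only"]

for model in models:
    for scenario in scenarios:
        subprocess.run(
            ["python", "llm_bug_classifier.py", "--model", model, "--scenario", scenario],
            check=True,
        )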

Evaluate Results

Open bug_evaluator_main.ipynb in Jupyter:

jupyter notebook bug_evaluator_main.ipynb

Run the evaluation cells to compute the metrics below (a minimal sketch of these computations follows the list):

  • Triage accuracy, precision, recall, F1
  • Semantic similarity scores
  • BERTScore metrics
  • Cross-encoder similarity
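
A self-contained sketch of these computations using the libraries named in the tech stack; the example labels and explanation strings are made up for illustration:

# Sketch: classification metrics plus the three text-similarity measures.
from bert_score import score as bert_score
from sentence_transformers import CrossEncoder, SentenceTransformer, util
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

y_true = ["invalid", "valid", "invalid"]            # ground-truth labels (illustrative)
y_pred = ["invalid", "valid", "valid"]              # model decisions (illustrative)
references = ["Missing steps to reproduce."]        # ground-truth explanation
candidates = ["No reproduction steps were given."]  # model-generated explanation

# Triage accuracy, precision, recall, F1
accuracy = accuracy_score(y_true, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="binary", pos_label="invalid")

# Semantic similarity with SentenceTransformers (all-MiniLM-L6-v2)
encoder = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = encoder.encode(references + candidates, convert_to_tensor=True)
cosine_similarity = util.cos_sim(embeddings[0], embeddings[1]).item()

# Cross-encoder similarity (stsb-roberta-base)
cross_encoder = CrossEncoder("cross-encoder/stsb-roberta-base")
cross_score = float(cross_encoder.predict([(references[0], candidates[0])])[0])

# BERTScore F1
_, _, bert_f1 = bert_score(candidates, references, lang="en")

print(accuracy, precision, recall, f1, cosine_similarity, cross_score, float(bert_f1.mean()))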

📊 Example Output

Classification Result (from llm_bug_classifier.py):

{
  "decision": "invalid",
  "fix": "The bug report lacks specific steps to reproduce the issue. While it mentions a crash, it doesn't provide environment details, browser version, or exact user actions leading to the crash."
}

Evaluation Metrics (from bug_evaluator.py):

{
  "accuracy": 0.87,
  "precision": 0.85,
  "recall": 0.89,
  "f1_score": 0.87,
  "mean_similarity": 0.78
}

🧪 Testing

Run unit tests for the evaluator module:

python bug_evaluator_test.py

📝 Notes

  • The system requires active Azure OpenAI API credentials to function
  • Image processing scenarios require images to be base64-encoded
  • Results are automatically saved to the results/ directory with timestamped filenames
  • The venv/ directory is excluded from version control

About

Term project of Group 4 for CS 588 Data Science for Software Engineering course.