complaintTag

ComplaintTag Classification System

Consumers often encounter challenges with financial products and services, leading to unresolved complaints with financial institutions. The Consumer Financial Protection Bureau (CFPB) serves as a mediator in these cases, but accurately categorizing complaints can be difficult for consumers, causing delays in the resolution process.

Our project streamlines complaint submission by automatically categorizing complaints based on their narrative descriptions. This improves the efficiency of complaint handling and ensures each complaint is quickly routed to the appropriate team for resolution.

This project provides a system to classify consumer complaints related to various financial products and issues handled by the Consumer Financial Protection Bureau (CFPB). It includes a Streamlit application for complaint classification and a pipeline to train custom models using transformer-based models such as BERT.

Table of Contents

  • Features
  • Folder Structure
  • Loading Pre-Trained Models and Tokenizers
  • Running the Streamlit Application
  • Training a Custom Model
  • Environment Setup
  • How to Contribute

Features

  • Classify consumer complaints into CFPB categories like Product, Sub-product, Issue, and Sub-issue.
  • Use pre-trained transformer models for classification.
  • Train custom models on your own datasets.
  • Display classification results with probabilities via an interactive Streamlit application.

Folder Structure

complaintTag/
│
├── app.py                        # Streamlit application script
├── models/                       # Folder for model loading, training, and inference logic
│   ├── __init__.py               # Makes 'models' a package
│   ├── dataset_handler.py        # Dataset preparation (DatasetHandler)
│   ├── model_handler.py          # Model setup, training, evaluation (TransformerModelHandler)
│   └── classifier.py             # Classification functions used in Streamlit app
├── config/                       # Configuration files for hyperparameters
│   └── hyperparams.py            # Stores training hyperparameters
├── training_pipeline/            # Folder for training pipeline
│   └── train.py                  # Main script to train custom models
├── logs/                         # Stores training logs
├── output/                       # Stores trained models and tokenizer output
├── notebooks/                    # Jupyter notebooks for experimentation
├── data/                         # Folder for datasets
├── .env                          # Environment variable configuration (model paths, etc.)
├── requirements.txt              # Dependencies for the project
└── README.md                     # Project documentation

Loading Pre-Trained Models and Tokenizers

The trained models for Product, Sub-product, Issue, and Sub-issue classification are available on Hugging Face for direct use (for example, harshan1823/cfpb_product_complaint_classifier for Product classification).

Example: How to Load the Model and Tokenizer

You can load any of these models and their tokenizers using the transformers library from Hugging Face. Here's an example of how to do this:

from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Specify the model you want to use
model_name = "harshan1823/cfpb_product_complaint_classifier"

# Load the pre-trained model
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Example usage: tokenize a sample complaint and get predictions
# (truncation keeps long narratives within the model's maximum input length)
text = "Sample consumer complaint text"
inputs = tokenizer(text, return_tensors="pt", truncation=True)
outputs = model(**inputs)

# Get the logits (raw, unnormalized model predictions)
logits = outputs.logits
print(logits)
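
If you want class probabilities rather than raw logits, apply a softmax. Continuing the example above, the snippet below is a minimal sketch; human-readable label names depend on an id2label mapping having been saved with the fine-tuned model's config, so that lookup is an assumption:

import torch

# Convert logits to probabilities over the classes
probs = torch.softmax(logits, dim=-1)

# Index of the highest-probability class
pred_id = int(probs.argmax(dim=-1))

# id2label is only meaningful if it was saved with the fine-tuned model;
# otherwise it falls back to generic names like "LABEL_0" (assumption)
label = model.config.id2label.get(pred_id, str(pred_id))
print(f"Predicted class: {label} (probability {probs[0, pred_id]:.3f})")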

Running the Streamlit Application

Follow these steps to run the Streamlit application for complaint classification.

Prerequisites

  • Python 3.7+
  • Required dependencies from requirements.txt
  • Pre-trained models for Product, Sub-product, Issue, and Sub-issue classification

Steps

  1. Clone the repository:

    git clone https://github.com/Harshan1823/complaintTag.git
    cd complaintTag
  2. Install dependencies:

    It's recommended to use a virtual environment. You can install the dependencies with:

    pip install -r requirements.txt
  3. Set up environment variables:

    Create a .env file in the root directory with paths to your model checkpoints (a sketch of how the app can load these paths follows these steps):

    PRODUCT_MODEL_PATH=./product_model_checkpoint
    SUB_PRODUCT_MODEL_PATH=./sub_product_model_checkpoint
    ISSUE_MODEL_PATH=./issue_model_checkpoint
    SUB_ISSUE_MODEL_PATH=./sub_issue_model_checkpoint
  4. Run the Streamlit app:

    Start the Streamlit app by running:

    streamlit run app.py
  5. Use the application:

    • Open the app in your browser at http://localhost:8501.
    • Enter a customer complaint narrative, and the app will classify it into the corresponding product, sub-product, issue, and sub-issue categories.
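
For reference, here is a minimal sketch of how the model paths from the .env file (step 3) can be picked up at startup. It assumes the python-dotenv package is installed; the load_classifier helper is hypothetical, and the app's own loading code in models/classifier.py may be structured differently:

import os

from dotenv import load_dotenv  # assumes python-dotenv is installed
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Read PRODUCT_MODEL_PATH etc. from the .env file in the project root
load_dotenv()

def load_classifier(env_var: str):
    # Hypothetical helper: load a model/tokenizer pair from the
    # checkpoint path stored in the given environment variable
    path = os.environ[env_var]
    model = AutoModelForSequenceClassification.from_pretrained(path)
    tokenizer = AutoTokenizer.from_pretrained(path)
    return model, tokenizer

product_model, product_tokenizer = load_classifier("PRODUCT_MODEL_PATH")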

Training a Custom Model

Follow these steps to train a custom model for classifying CFPB customer complaints.

Prerequisites

  • Datasets for training and testing (in Pandas DataFrame format).
  • Python 3.7+
  • Pre-trained transformer model (e.g., BERT) for fine-tuning.

Steps

  1. Prepare the Dataset:

    Ensure your dataset is a Pandas DataFrame with the following columns (a sketch of building the labels column follows these steps):

    • Consumer complaint narrative: The complaint text.
    • Product, Sub-product, Issue, Sub-issue: The categories for classification.
    • labels: The label column representing the class for training.
  2. Modify Hyperparameters:

    Open config/hyperparams.py to modify the hyperparameters, such as:

    • MODEL_NAME: Pre-trained model name (e.g., 'distilbert-base-uncased').
    • BATCH_SIZE: Training batch size.
    • NUM_EPOCHS: Number of training epochs.
    • LEARNING_RATE: Learning rate for the model.
  3. Run the Training Pipeline:

    To train the model, execute the train.py script:

    python training_pipeline/train.py
  4. Check Logs and Outputs:

    • Training logs will be stored in the logs/ folder.
    • Trained models and tokenizer will be saved in the output/ folder.
  5. Evaluate the Model:

    After training, the model will be evaluated on the test dataset, and the evaluation metrics (accuracy, precision, recall, F1-score) will be printed.

  6. Use the Trained Model:

    The trained model can be used for inference via the Streamlit application by updating the model paths in the .env file with the new checkpoints.
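
As a companion to step 1, the sketch below shows one way to derive the labels column from a category column using pandas.factorize. This is an illustration only; the project's DatasetHandler may encode labels differently:

import pandas as pd

# Toy rows for illustration; real data would come from a CFPB complaints export
df = pd.DataFrame({
    "Consumer complaint narrative": [
        "I was charged a fee I never agreed to.",
        "My credit report shows an account that is not mine.",
    ],
    "Product": ["Checking or savings account", "Credit reporting"],
})

# Encode the target category as integer class ids for training
df["labels"], label_names = pd.factorize(df["Product"])

print(df[["Product", "labels"]])
print("Classes:", list(label_names))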

Environment Setup

To replicate this project on your machine, follow these steps:

  1. Install dependencies:

    Install all required Python packages with:

    pip install -r requirements.txt
  2. Set up your environment variables:

    Use a .env file to specify paths for your pre-trained models (as described above).

How to Contribute

If you'd like to contribute to this project, please follow these steps:

  1. Fork the repository.
  2. Create a feature branch.
  3. Commit your changes.
  4. Push your branch.
  5. Create a Pull Request (PR).

We welcome all contributions that enhance this project!
