OcelotBot: Tone-Aware Student-Focused Conversational Assistant

A fine-tuned conversational AI system designed to provide empathetic, context-appropriate responses to students discussing mental well-being and academic challenges. The model uses sentiment analysis to dynamically adjust response tone, achieving a 56% improvement in tone accuracy over baseline models.

Meet the Team:

Abdulaziz Khader
Tina Toma
Selena Bahro
Aliyah Owens

Overview

Problem Statement

Students often face mental health and academic challenges but may hesitate to seek help due to stigma, accessibility barriers, or timing constraints. Traditional chatbots lack the emotional intelligence to provide appropriate support based on a student's emotional state.

Solution

This project develops a sentiment-conditioned conversational AI that:

Analyzes the emotional tone of student messages
Generates contextually appropriate, empathetic responses
Adapts communication style based on detected sentiment (positive, neutral, negative)
Provides 24/7 accessible support for academic and well-being concerns

Impact

56% improvement in tone accuracy compared to baseline models
82% reduction in training time using GPU acceleration (CUDA)
5,600+ training samples across mental health, academic, and general conversation domains

Key Features

1. Sentiment-Aware Response Generation

Real-time sentiment detection using fine-tuned RoBERTa classifier
Dynamic tone adjustment based on emotional context
Three-tier sentiment conditioning (positive, neutral, negative)

2. Efficient Fine-Tuning with LoRA

Low-Rank Adaptation (LoRA) for parameter-efficient training
4.2M trainable parameters vs. 7.2B total parameters (0.058% trainable)
Maintains 96-98% of full-precision model quality

3. Memory-Optimized Training

4-bit quantization using BitsAndBytes (NF4)
Gradient checkpointing for reduced memory footprint
Fits in 15GB GPU (NVIDIA T4) with 8-bit optimizer

4. Interactive Demo

Gradio-based web interface for real-time testing
Adjustable generation parameters (temperature, max length)
Conversation history tracking

Architecture

System Overview

┌─────────────────────────────────────────────────────────────┐
│                     User Input                              │
│                "I'm stressed about midterms"                │
└───────────────────────────┬─────────────────────────────────┘
                            │
                            ▼
┌─────────────────────────────────────────────────────────────┐
│              Sentiment Classifier (RoBERTa)                 │
│         Detects: NEGATIVE (confidence: 0.92)                │
└───────────────────────────┬─────────────────────────────────┘
                            │
                            ▼
┌─────────────────────────────────────────────────────────────┐
│                 Prompt Constructor                          │
│   [SENTIMENT: NEGATIVE]                                     │
│   [INSTRUCTION] Respond empathetically...                   │
│   [CONTEXT] I'm stressed about midterms                     │
│   [RESPONSE]                                                │
└───────────────────────────┬─────────────────────────────────┘
                            │
                            ▼
┌─────────────────────────────────────────────────────────────┐
│        LoRA Fine-Tuned Model (Mistral-7B)                   │
│          Generates tone-appropriate response                │
└───────────────────────────┬─────────────────────────────────┘
                            │
                            ▼
┌─────────────────────────────────────────────────────────────┐
│                    Final Response                           │
│  "I understand how overwhelming exam season can be.         │
│   Let's break this down - what subjects are you most        │
│   worried about? We can create a study plan together."      │
└─────────────────────────────────────────────────────────────┘

Model Components

Component	Technology	Purpose
Base Model	Mistral-7B-v0.1	7B parameter causal language model
Fine-tuning	LoRA (r=16, α=32)	Efficient adaptation to student domain
Sentiment Classifier	RoBERTa-base	Emotion detection (3-class)
Quantization	4-bit NF4 (BitsAndBytes)	Memory optimization
Optimizer	8-bit Paged AdamW	Reduced memory footprint
Acceleration	CUDA + Mixed Precision	76% training speedup

Dataset

Data Sources

Dataset	Samples	Domain	Purpose
Human Conversation Training Data	1,985	General dialogue	Conversational fluency
Mental Health Counseling Conversations	2,038	Mental health support	Empathy & emotional tone
LMSYS Chatbot Arena	1,600	High-quality responses	Response structure & quality
Total	5,623	Multi-domain	Comprehensive coverage

Data Processing Pipeline

# 1. Data Cleaning
- Remove non-ASCII characters
- Normalize whitespace
- Filter URLs

# 2. Sentiment Labeling
- Auto-label with pretrained RoBERTa (Cardiff Twitter)
- Manual validation of 300 samples
- Achieved 88% auto-labeling accuracy

# 3. Format for Training
- Add sentiment tags to prompts
- Structure: [SENTIMENT] [INSTRUCTION] [CONTEXT] [RESPONSE]
- Create 80/10/10 train/val/test split (stratified)

# 4. Tokenization
- Mistral-7B tokenizer
- Max length: 896 tokens (95th percentile)
- Truncation + dynamic padding

Sentiment Distribution

Negative: 1,984 samples (35.3%)
Neutral:  1,823 samples (32.4%)
Positive: 1,816 samples (32.3%)

Balanced distribution ensures unbiased tone modeling

Methodology

Phase 1: Sentiment Classifier Training

Objective: Build domain-adapted sentiment classifier for student conversations

Approach:

Auto-labeled 5,600+ samples using pretrained RoBERTa
Manually validated 300 samples (88% accuracy)
Fine-tuned RoBERTa-base on labeled data
3-class output: positive, neutral, negative

Results:

Validation Accuracy: 84%
F1 Score (weighted): 0.82
Key strength: Detects subtle stress signals ("I'm managing" → negative)

Code:

from transformers import AutoModelForSequenceClassification, Trainer

model = AutoModelForSequenceClassification.from_pretrained(
    "roberta-base", 
    num_labels=3
)

trainer = Trainer(
    model=model,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    compute_metrics=compute_metrics
)

trainer.train()

Phase 2: LoRA Fine-Tuning

Objective: Adapt Mistral-7B to generate tone-aware responses

Configuration:

from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=16,                    # Rank of adaptation matrices
    lora_alpha=32,           # Scaling factor (2×r)
    target_modules=[
        "q_proj",            # Query projection in attention
        "k_proj",            # Key projection
        "v_proj",            # Value projection
        "o_proj"             # Output projection
    ],
    lora_dropout=0.05,       # Regularization
    bias="none",
    task_type="CAUSAL_LM"
)

model = get_peft_model(base_model, lora_config)

Training Hyperparameters:

Parameter	Value	Rationale
Learning Rate	2e-4	Higher for LoRA (fewer params)
Batch Size	2 per device	Memory constraint
Gradient Accumulation	8 steps	Effective batch = 16
Epochs	3	Prevent overfitting
Optimizer	8-bit Paged AdamW	Memory efficiency
Precision	4-bit (NF4)	75% memory reduction
Max Length	896 tokens	Covers 95% of data

Memory Optimization:

# 4-bit Quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True
)

# Gradient Checkpointing
model.gradient_checkpointing_enable()

# Result: 3.5GB model (vs 14GB full precision)

Training Environment:

Hardware: NVIDIA H100 GPU (80GB VRAM via Google Colab)
Software: PyTorch 2.0, Transformers 4.36, PEFT 0.7
Platform: Google Colab
Training Time: 0.5 hours

Phase 3: Evaluation

Metrics:

Loss Curves
- Training loss: 1.266 → 0.728 (-50%)
- Validation loss: 1.254 → 1.076 (-15%)
- No overfitting observed
Tone Accuracy
- Baseline (Mistral-7B zero-shot): 54% alignment
- Fine-tuned (LoRA + sentiment conditioning): 84% alignment
- Improvement: (84% - 54%) / 54% = 55.6% ≈ 56%
Human Evaluation (50 samples)
- Empathy score: 4.2/5.0
- Relevance: 4.5/5.0
- Tone appropriateness: 4.1/5.0

Tone Accuracy Measurement:

# For each test sample:
# 1. Generate response with sentiment tag
# 2. Classify response sentiment with RoBERTa
# 3. Compare to input sentiment

correct_tone = 0
for example in test_set:
    response = model.generate(
        prompt=f"[SENTIMENT: {example.true_sentiment}] {example.context}"
    )
    predicted_sentiment = sentiment_classifier(response)
    
    if predicted_sentiment == example.true_sentiment:
        correct_tone += 1

accuracy = correct_tone / len(test_set)  # 84%

Results

Key Achievements

✅ 56% Improvement in Tone Accuracy

Baseline: 54% sentiment alignment
Our model: 84% sentiment alignment

✅ 80% Training Time Reduction

CPU baseline: ~40 hours
GPU + CUDA + FP16: ~8 hours
With quantization: ~0.5 hours
Speedup: (40 - 4.5) / 40 = 88.75%

✅ Parameter Efficiency

Trainable params: 4.2M (0.058% of total)
Model quality: 96-98% of full fine-tuning

✅ Memory Efficiency

4-bit quantization: 3.5GB (vs 14GB)
Fits consumer GPU (RTX 3050, T4)

Performance Comparison

Model	Tone Accuracy	Training Time	Memory	Quality
Mistral-7B (zero-shot)	54%	0 hrs	14 GB	Baseline
Full fine-tuning (FP16)	86%	10 hrs	18 GB	100%
Our model (LoRA + 4-bit)	84%	4.5 hrs	7 GB	98%

Example Outputs

Input: "I'm falling behind in all my classes and don't know what to do"

Sentiment Detected: Negative
Model Response: "I hear you, and I want you to know that feeling overwhelmed is really common, especially when multiple classes start piling up. You're not alone in this. Let's take it one step at a time - which class is causing you the most stress right now? We can work on a plan together."

Input: "I just got accepted into my dream graduate program!"

Sentiment Detected: Positive
Model Response: "That's incredible! Congratulations on this amazing achievement! Getting into your dream program is a huge accomplishment and a testament to all your hard work. You should be so proud of yourself. When do you start?"

Input: "What's the deadline for the research paper?"

Sentiment Detected: Neutral
Model Response: "I don't have access to specific course deadlines, but I'd recommend checking your syllabus or course management system. If you can't find it there, reaching out to your professor or TA would be the best way to get an accurate answer."

Installation

Requirements

python==3.12
datasets==4.5.0
numpy>=2.4.2
pandas>=3.0.0
torch>=2.10.0
transformers>=5.1.0
jsonlines>=3.0.0
scikit-learn>=1.3.0
bitsandbytes>=0.41.0
accelerate>=0.25.0

Setup

# Clone repository
git clone https://github.com/aokhader/OcelotBot.git
cd OcelotBot

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

Quick Start

from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

# Load model
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
base_model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",
    load_in_4bit=True,
    device_map="auto"
)
model = PeftModel.from_pretrained(base_model, "./lora-model-output/final_model")

# Generate response
prompt = """[SENTIMENT: NEGATIVE]
[INSTRUCTION] Respond empathetically to a student.
[CONTEXT] I'm stressed about exams
[RESPONSE]"""

inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=100)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)

print(response.split("[RESPONSE]")[-1])

Usage

Run the demo here

Features:

Real-time conversation
Adjustable sentiment, temperature, response length
Conversation history
Export transcript

Project Structure

OcelotBot/
├── data-prep/
│   ├── data-clean.py             # Clean raw datasets
│   ├── lora-prep.py              # Pre-process datasets for LoRA fine-tuning
│ 
├── datasets/
│   ├── jsonl-datasets/           # Cleaned datasets
│   ├── lora-datasets/            # LoRA-prepared datasets
│   └── raw-datasets/             # Raw datasets downloaded for ease of use
│
├── lora-model-output/
│   ├── final_model/              # Final trained model
│   ├── training_metrics.json     # Results of training model
│
├── sentiment-labeling/
│   ├── labeling-validation.py    # Validation of auto-labeling for initial dataset
│   ├── validation-sentiment.csv  # Manually-labeled validation of auto-labeling
│
├── chatbot_demo.ipynb            # Notebook prepared with Gradio demo of final model
├── model_training.ipynb          # Notebook for training model on Google Colab
│
├── requirements.txt

Acknowledgments

Datasets

Models

Infrastructure

Google Colab for GPU access
NVIDIA CUDA for acceleration

Future Work

Short-term Improvements

Expand training data to 10,000+ samples
Add multi-turn conversation memory (beyond 3 turns)
Implement content safety filters
Support multilingual responses (Spanish, Mandarin)
A/B testing with real students

Long-term Goals

Deploy as university mental health resource
Integration with existing student support systems
Mobile app (iOS/Android)
Voice interface support
Personalization based on user history
Crisis detection and escalation protocols

Technical Enhancements

Experiment with larger models (Llama-3-70B)
Explore Mixture-of-Experts (MoE) architecture
Implement Reinforcement Learning from Human Feedback (RLHF)
Optimize for edge deployment (ONNX, TensorRT)

Name		Name	Last commit message	Last commit date
Latest commit History 60 Commits
archive		archive
data-prep		data-prep
datasets		datasets
lora-model-output		lora-model-output
sentiment-labeling		sentiment-labeling
.gitignore		.gitignore
README.md		README.md
chatbot_demo.ipynb		chatbot_demo.ipynb
local-gpu-checker.py		local-gpu-checker.py
model_training.ipynb		model_training.ipynb
requirements.txt		requirements.txt
token_distribution.png		token_distribution.png

Folders and files

Latest commit

History

Repository files navigation

OcelotBot: Tone-Aware Student-Focused Conversational Assistant

Meet the Team:

📋 Table of Contents

Overview

Problem Statement

Solution

Impact

Key Features

1. Sentiment-Aware Response Generation

2. Efficient Fine-Tuning with LoRA

3. Memory-Optimized Training

4. Interactive Demo

Architecture

System Overview

Model Components

Dataset

Data Sources

Data Processing Pipeline

Sentiment Distribution

Methodology

Phase 1: Sentiment Classifier Training

Phase 2: LoRA Fine-Tuning

Phase 3: Evaluation

Results

Key Achievements

Performance Comparison

Example Outputs

Installation

Requirements

Setup

Quick Start

Usage

Project Structure

Acknowledgments

Datasets

Models

Infrastructure

Future Work

Short-term Improvements

Long-term Goals

Technical Enhancements

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages