A fine-tuned conversational AI system designed to provide empathetic, context-appropriate responses to students discussing mental well-being and academic challenges. The model uses sentiment analysis to dynamically adjust response tone, achieving a 56% improvement in tone accuracy over baseline models.
- Abdulaziz Khader
- Tina Toma
- Selena Bahro
- Aliyah Owens
- Overview
- Key Features
- Architecture
- Dataset
- Methodology
- Results
- Installation
- Usage
- Project Structure
- Acknowledgments
- Future Work
Students often face mental health and academic challenges but may hesitate to seek help due to stigma, accessibility barriers, or timing constraints. Traditional chatbots lack the emotional intelligence to provide appropriate support based on a student's emotional state.
This project develops a sentiment-conditioned conversational AI that:
- Analyzes the emotional tone of student messages
- Generates contextually appropriate, empathetic responses
- Adapts communication style based on detected sentiment (positive, neutral, negative)
- Provides 24/7 accessible support for academic and well-being concerns
- 56% improvement in tone accuracy compared to baseline models
- ~80% reduction in training time using GPU acceleration (CUDA)
- 5,600+ training samples across mental health, academic, and general conversation domains
- Real-time sentiment detection using fine-tuned RoBERTa classifier
- Dynamic tone adjustment based on emotional context
- Three-tier sentiment conditioning (positive, neutral, negative)
- Low-Rank Adaptation (LoRA) for parameter-efficient training
- 4.2M trainable parameters vs. 7.2B total parameters (0.058% trainable)
- Maintains 96-98% of full-precision model quality
- 4-bit quantization using BitsAndBytes (NF4)
- Gradient checkpointing for reduced memory footprint
- Fits on a 15GB GPU (NVIDIA T4) with the 8-bit optimizer
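A quick sanity check on those memory figures (assuming ~7.2B weights and ignoring activations, the KV cache, and optimizer state):

```python
def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Approximate weight storage in GB: params x bits, converted to bytes then GB."""
    return n_params * bits_per_param / 8 / 1e9

fp16_gb = weight_memory_gb(7.2e9, 16)  # ~14.4 GB at full (FP16) precision
nf4_gb = weight_memory_gb(7.2e9, 4)    # ~3.6 GB with 4-bit NF4 quantization
```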
- Gradio-based web interface for real-time testing
- Adjustable generation parameters (temperature, max length)
- Conversation history tracking
┌─────────────────────────────────────────────────────────────┐
│ User Input │
│ "I'm stressed about midterms" │
└───────────────────────────┬─────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Sentiment Classifier (RoBERTa) │
│ Detects: NEGATIVE (confidence: 0.92) │
└───────────────────────────┬─────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Prompt Constructor │
│ [SENTIMENT: NEGATIVE] │
│ [INSTRUCTION] Respond empathetically... │
│ [CONTEXT] I'm stressed about midterms │
│ [RESPONSE] │
└───────────────────────────┬─────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ LoRA Fine-Tuned Model (Mistral-7B) │
│ Generates tone-appropriate response │
└───────────────────────────┬─────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Final Response │
│ "I understand how overwhelming exam season can be. │
│ Let's break this down - what subjects are you most │
│ worried about? We can create a study plan together." │
└─────────────────────────────────────────────────────────────┘
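The pipeline above can be sketched as a small orchestration function; `classify` and `generate` stand in for the RoBERTa classifier and the LoRA-tuned Mistral model (the function names are illustrative, not the repo's API):

```python
def build_prompt(sentiment: str, context: str) -> str:
    """Assemble the [SENTIMENT] / [INSTRUCTION] / [CONTEXT] / [RESPONSE] template."""
    return (
        f"[SENTIMENT: {sentiment.upper()}]\n"
        "[INSTRUCTION] Respond empathetically to a student.\n"
        f"[CONTEXT] {context}\n"
        "[RESPONSE]"
    )

def respond(user_msg: str, classify, generate) -> str:
    sentiment, confidence = classify(user_msg)  # e.g. ("negative", 0.92)
    prompt = build_prompt(sentiment, user_msg)
    full_text = generate(prompt)                # model continues after [RESPONSE]
    return full_text.split("[RESPONSE]")[-1].strip()
```

With the real components, `classify` wraps the fine-tuned RoBERTa classifier and `generate` wraps `model.generate` plus decoding.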
| Component | Technology | Purpose |
|---|---|---|
| Base Model | Mistral-7B-v0.1 | 7B parameter causal language model |
| Fine-tuning | LoRA (r=16, α=32) | Efficient adaptation to student domain |
| Sentiment Classifier | RoBERTa-base | Emotion detection (3-class) |
| Quantization | 4-bit NF4 (BitsAndBytes) | Memory optimization |
| Optimizer | 8-bit Paged AdamW | Reduced memory footprint |
| Acceleration | CUDA + Mixed Precision | 76% training speedup |
| Dataset | Samples | Domain | Purpose |
|---|---|---|---|
| Human Conversation Training Data | 1,985 | General dialogue | Conversational fluency |
| Mental Health Counseling Conversations | 2,038 | Mental health support | Empathy & emotional tone |
| LMSYS Chatbot Arena | 1,600 | High-quality responses | Response structure & quality |
| Total | 5,623 | Multi-domain | Comprehensive coverage |
# 1. Data Cleaning
- Remove non-ASCII characters
- Normalize whitespace
- Filter URLs
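A minimal sketch of these cleaning rules (the regex patterns are illustrative, not necessarily the exact ones in `data-clean.py`):

```python
import re

def clean_text(text: str) -> str:
    text = re.sub(r"https?://\S+|www\.\S+", "", text)      # filter URLs
    text = text.encode("ascii", errors="ignore").decode()  # remove non-ASCII characters
    text = re.sub(r"\s+", " ", text).strip()               # normalize whitespace
    return text
```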
# 2. Sentiment Labeling
- Auto-label with pretrained RoBERTa (Cardiff Twitter)
- Manual validation of 300 samples
- Achieved 88% auto-labeling accuracy
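The manual-validation step reduces to an agreement rate between auto-labels and hand labels (a sketch; the actual check lives in `labeling-validation.py`):

```python
def agreement_rate(auto_labels: list, manual_labels: list) -> float:
    """Fraction of validated samples where the auto-label matches the hand label."""
    if len(auto_labels) != len(manual_labels):
        raise ValueError("label lists must be aligned")
    hits = sum(a == m for a, m in zip(auto_labels, manual_labels))
    return hits / len(manual_labels)

# 264 agreements out of 300 validated samples would give 0.88
```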
# 3. Format for Training
- Add sentiment tags to prompts
- Structure: [SENTIMENT] [INSTRUCTION] [CONTEXT] [RESPONSE]
- Create 80/10/10 train/val/test split (stratified)
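The stratified 80/10/10 split can be sketched with scikit-learn by splitting twice (assuming in-memory lists; the repo's `lora-prep.py` may do this differently):

```python
from sklearn.model_selection import train_test_split

def stratified_split(examples, labels, seed=42):
    """80/10/10 train/val/test split, stratified on sentiment label."""
    train_x, rest_x, train_y, rest_y = train_test_split(
        examples, labels, test_size=0.2, stratify=labels, random_state=seed
    )
    # Split the remaining 20% evenly into validation and test
    val_x, test_x, val_y, test_y = train_test_split(
        rest_x, rest_y, test_size=0.5, stratify=rest_y, random_state=seed
    )
    return (train_x, train_y), (val_x, val_y), (test_x, test_y)
```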
# 4. Tokenization
- Mistral-7B tokenizer
- Max length: 896 tokens (95th percentile)
- Truncation + dynamic padding

Negative: 1,984 samples (35.3%)
Neutral: 1,823 samples (32.4%)
Positive: 1,816 samples (32.3%)
Balanced distribution ensures unbiased tone modeling
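The distribution above can be reproduced directly from the label counts:

```python
from collections import Counter

counts = Counter(negative=1984, neutral=1823, positive=1816)
total = sum(counts.values())  # 5,623 samples
shares = {label: round(100 * n / total, 1) for label, n in counts.items()}
# shares -> {"negative": 35.3, "neutral": 32.4, "positive": 32.3}
```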
Objective: Build domain-adapted sentiment classifier for student conversations
Approach:
- Auto-labeled 5,600+ samples using pretrained RoBERTa
- Manually validated 300 samples (88% accuracy)
- Fine-tuned RoBERTa-base on labeled data
- 3-class output: positive, neutral, negative
Results:
- Validation Accuracy: 84%
- F1 Score (weighted): 0.82
- Key strength: Detects subtle stress signals ("I'm managing" → negative)
Code:
from transformers import AutoModelForSequenceClassification, Trainer
model = AutoModelForSequenceClassification.from_pretrained(
    "roberta-base",
    num_labels=3,
)
trainer = Trainer(
    model=model,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    compute_metrics=compute_metrics,
)
trainer.train()

Objective: Adapt Mistral-7B to generate tone-aware responses
Configuration:
from peft import LoraConfig, get_peft_model
lora_config = LoraConfig(
    r=16,                  # Rank of adaptation matrices
    lora_alpha=32,         # Scaling factor (2×r)
    target_modules=[
        "q_proj",          # Query projection in attention
        "k_proj",          # Key projection
        "v_proj",          # Value projection
        "o_proj",          # Output projection
    ],
    lora_dropout=0.05,     # Regularization
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(base_model, lora_config)

Training Hyperparameters:
| Parameter | Value | Rationale |
|---|---|---|
| Learning Rate | 2e-4 | Higher for LoRA (fewer params) |
| Batch Size | 2 per device | Memory constraint |
| Gradient Accumulation | 8 steps | Effective batch = 16 |
| Epochs | 3 | Prevent overfitting |
| Optimizer | 8-bit Paged AdamW | Memory efficiency |
| Precision | 4-bit (NF4) | 75% memory reduction |
| Max Length | 896 tokens | Covers 95% of data |
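These hyperparameters map onto `transformers.TrainingArguments` roughly as follows (a sketch; `output_dir` and the logging settings are placeholders, not the repo's exact values):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="lora-model-output",   # placeholder path
    learning_rate=2e-4,               # higher LR is typical for LoRA
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,    # effective batch = 2 x 8 = 16
    num_train_epochs=3,
    optim="paged_adamw_8bit",         # 8-bit Paged AdamW
    fp16=True,                        # mixed-precision compute
    logging_steps=50,
)
```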
Memory Optimization:
# 4-bit Quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

# Gradient Checkpointing
model.gradient_checkpointing_enable()

# Result: 3.5GB model (vs 14GB full precision)

Training Environment:
- Hardware: NVIDIA H100 GPU (80GB VRAM via Google Colab)
- Software: PyTorch 2.0, Transformers 4.36, PEFT 0.7
- Platform: Google Colab
- Training Time: 0.5 hours
Metrics:
- Loss Curves
  - Training loss: 1.266 → 0.728 (-50%)
  - Validation loss: 1.254 → 1.076 (-15%)
  - No overfitting observed
- Tone Accuracy
  - Baseline (Mistral-7B zero-shot): 54% alignment
  - Fine-tuned (LoRA + sentiment conditioning): 84% alignment
  - Improvement: (84% - 54%) / 54% = 55.6% ≈ 56%
- Human Evaluation (50 samples)
  - Empathy score: 4.2/5.0
  - Relevance: 4.5/5.0
  - Tone appropriateness: 4.1/5.0
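Note that the headline figure is a relative, not absolute, improvement:

```python
baseline, fine_tuned = 0.54, 0.84
relative_gain = (fine_tuned - baseline) / baseline  # ~0.556, reported as 56%
absolute_gain = fine_tuned - baseline               # 30 percentage points
```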
Tone Accuracy Measurement:
# For each test sample:
# 1. Generate response with sentiment tag
# 2. Classify response sentiment with RoBERTa
# 3. Compare to input sentiment
correct_tone = 0
for example in test_set:
    response = model.generate(
        prompt=f"[SENTIMENT: {example.true_sentiment}] {example.context}"
    )
    predicted_sentiment = sentiment_classifier(response)
    if predicted_sentiment == example.true_sentiment:
        correct_tone += 1
accuracy = correct_tone / len(test_set)  # 84%

✅ 56% Improvement in Tone Accuracy
- Baseline: 54% sentiment alignment
- Our model: 84% sentiment alignment
✅ ~80% Training Time Reduction
- CPU baseline: ~40 hours
- GPU + CUDA + FP16: ~8 hours ((40 - 8) / 40 = 80% reduction)
- With 4-bit quantization + LoRA: ~4.5 hours (~0.5 hours on the H100)
- Overall speedup vs. CPU: (40 - 4.5) / 40 ≈ 89%
✅ Parameter Efficiency
- Trainable params: 4.2M (0.058% of total)
- Model quality: 96-98% of full fine-tuning
✅ Memory Efficiency
- 4-bit quantization: 3.5GB (vs 14GB)
- Fits consumer GPU (RTX 3050, T4)
| Model | Tone Accuracy | Training Time | Memory | Quality |
|---|---|---|---|---|
| Mistral-7B (zero-shot) | 54% | 0 hrs | 14 GB | Baseline |
| Full fine-tuning (FP16) | 86% | 10 hrs | 18 GB | 100% |
| Our model (LoRA + 4-bit) | 84% | 4.5 hrs | 7 GB | 98% |
Input: "I'm falling behind in all my classes and don't know what to do"
- Sentiment Detected: Negative
- Model Response: "I hear you, and I want you to know that feeling overwhelmed is really common, especially when multiple classes start piling up. You're not alone in this. Let's take it one step at a time - which class is causing you the most stress right now? We can work on a plan together."
Input: "I just got accepted into my dream graduate program!"
- Sentiment Detected: Positive
- Model Response: "That's incredible! Congratulations on this amazing achievement! Getting into your dream program is a huge accomplishment and a testament to all your hard work. You should be so proud of yourself. When do you start?"
Input: "What's the deadline for the research paper?"
- Sentiment Detected: Neutral
- Model Response: "I don't have access to specific course deadlines, but I'd recommend checking your syllabus or course management system. If you can't find it there, reaching out to your professor or TA would be the best way to get an accurate answer."
# Python 3.12
datasets>=2.14.0
numpy>=1.24.0
pandas>=2.0.0
torch>=2.0.0
transformers>=4.36.0
peft>=0.7.0
jsonlines>=3.0.0
scikit-learn>=1.3.0
bitsandbytes>=0.41.0
accelerate>=0.25.0
gradio>=4.0.0

# Clone repository
git clone https://github.com/aokhader/OcelotBot.git
cd OcelotBot
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt

from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
# Load model
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
base_model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",
    load_in_4bit=True,
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, "./lora-model-output/final_model")
# Generate response
prompt = """[SENTIMENT: NEGATIVE]
[INSTRUCTION] Respond empathetically to a student.
[CONTEXT] I'm stressed about exams
[RESPONSE]"""
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=100)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response.split("[RESPONSE]")[-1])

Run the demo here
Features:
- Real-time conversation
- Adjustable sentiment, temperature, response length
- Conversation history
- Export transcript
OcelotBot/
├── data-prep/
│ ├── data-clean.py # Clean raw datasets
│ ├── lora-prep.py # Pre-process datasets for LoRA fine-tuning
│
├── datasets/
│ ├── jsonl-datasets/ # Cleaned datasets
│ ├── lora-datasets/ # LoRA-prepared datasets
│ └── raw-datasets/ # Raw datasets downloaded for ease of use
│
├── lora-model-output/
│ ├── final_model/ # Final trained model
│ ├── training_metrics.json # Results of training model
│
├── sentiment-labeling/
│ ├── labeling-validation.py # Validation of auto-labeling for initial dataset
│ ├── validation-sentiment.csv # Manually-labeled validation of auto-labeling
│
├── chatbot_demo.ipynb # Notebook prepared with Gradio demo of final model
├── model_training.ipynb # Notebook for training model on Google Colab
│
├── requirements.txt
- Kaggle: Human Conversation Training Data
- Kaggle: Mental Health Counseling Conversations
- Hugging Face: LMSYS Chatbot Arena
- Google Colab for GPU access
- NVIDIA CUDA for acceleration
- Expand training data to 10,000+ samples
- Add multi-turn conversation memory (beyond 3 turns)
- Implement content safety filters
- Support multilingual responses (Spanish, Mandarin)
- A/B testing with real students
- Deploy as university mental health resource
- Integration with existing student support systems
- Mobile app (iOS/Android)
- Voice interface support
- Personalization based on user history
- Crisis detection and escalation protocols
- Experiment with larger models (Llama-3-70B)
- Explore Mixture-of-Experts (MoE) architecture
- Implement Reinforcement Learning from Human Feedback (RLHF)
- Optimize for edge deployment (ONNX, TensorRT)