Author: Yusuf Adamu
Project: Prompt Injection Detection in TinyLLaMA Chatbots using LLM Guard
Environment: Google Colab | Python | Transformers | Hugging Face | Scikit-learn
This project demonstrates the integration of LLM Guard with the TinyLLaMA-1.1B-Chat model to detect and defend against prompt injection attacks. The system combines input and output scanning to ensure safe and reliable chatbot responses, evaluated using adversarial and safe prompt datasets.
- β
Input sanitization with
PromptInjectionandBanTopicsscanners - β Unsafe prompt blocking with risk score explanation
- β End-to-end pipeline: scan β sanitize β respond or reject
- β Evaluation on Safe-Gaurd Prompt Injection Dataset
- β Metrics: Accuracy, F1, ROC-AUC, Confusion Matrix
| Metric | Value |
|---|---|
| Accuracy | 94.6% |
| Precision | 99.8% |
| Recall | 82.8% |
| F1 Score | 90.5% |
| ROC AUC | ~0.95 |
LLMGAURD_Project.ipynbβ Main implementation in ColabLLMGAURD_Project.pyβ Full implementation exported from Google ColabREADME.mdβ Project overview (this file)
For questions or collaboration: yusufadamu.research@gmail.com