cedrugs/Chatguard-API

Chatguard API

Chat smarter, speak kinder.

Production-ready FastAPI service for toxic language detection using a fine‑tuned XLM‑RoBERTa model hosted on Hugging Face Hub. Modular, type‑safe, and deployable on CPU or GPU.


Features

  • Loads a Hugging Face model at startup and serves low-latency inference
  • Clean modular layout (config, model, schemas, routes)
  • Single and batch prediction with optional probability thresholding
  • JSON responses with stable probability keys (clean, toxic)
  • OpenAPI docs via /docs and /redoc
  • Health endpoint at /health
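
The thresholding feature can be pictured with a short sketch. The comparison rule here is an assumption (the label is "toxic" when the toxic probability is at or above the threshold); the README does not state the exact rule, and `label_from_probs` is a hypothetical helper name, not part of the service.

```python
def label_from_probs(probs: dict[str, float], threshold: float = 0.5) -> str:
    """Map the stable probability keys (clean, toxic) to a label.

    Assumption: the toxic probability is compared against the threshold
    with >=. The real service may use a different rule.
    """
    return "toxic" if probs["toxic"] >= threshold else "clean"


# A stricter threshold turns the same probabilities into a "clean" label.
default_label = label_from_probs({"clean": 0.12, "toxic": 0.88})
strict_label = label_from_probs({"clean": 0.12, "toxic": 0.88}, threshold=0.9)
```

Raising the threshold trades recall for precision: fewer messages are flagged, but those that are flagged carry a higher toxic probability.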

Project Structure

toxicity-api/
├─ app/
│  ├─ main.py               # FastAPI entry point
│  ├─ core/
│  │  ├─ config.py          # Env + settings
│  │  └─ model.py           # Model loader + inference logic
│  ├─ api/
│  │  └─ routes.py          # HTTP endpoints
│  └─ schemas/
│     └─ predict.py         # Pydantic I/O models
├─ requirements.txt
└─ README.md

Quickstart

1) Install

pip install -r requirements.txt

2) Configure

export MODEL_ID=your-username/toxic-xlmr
# optional
export DEVICE=cuda      # or cpu
export MAX_LENGTH=256

Alternatively, use the default model, cedrugs/toxic-xlmr, hosted at https://huggingface.co/cedrugs/toxic-xlmr.

3) Run

uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload

Open docs at http://localhost:8000/docs.


Environment Variables

Variable     Description                                  Default
MODEL_ID     Hugging Face model repo (e.g. user/model)    your-username/toxic-xlmr
DEVICE       cpu or cuda (auto-detect if unset)           auto
MAX_LENGTH   Max token length per input                   256
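
As an illustration of how these variables might be read, here is a minimal standard-library sketch. It is not the contents of app/core/config.py, which the README does not show; `Settings` and `load_settings` are hypothetical names, and the validation of DEVICE is an added assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    model_id: str
    device: str      # "cpu", "cuda", or "auto" (auto is resolved when the model loads)
    max_length: int


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from environment variables, using the documented defaults."""
    env = os.environ if env is None else env
    device = env.get("DEVICE", "auto").lower()
    if device not in {"cpu", "cuda", "auto"}:
        raise ValueError(f"DEVICE must be cpu, cuda, or auto; got {device!r}")
    return Settings(
        model_id=env.get("MODEL_ID", "your-username/toxic-xlmr"),
        device=device,
        max_length=int(env.get("MAX_LENGTH", "256")),
    )


# Passing a plain dict instead of os.environ keeps the function easy to test.
settings = load_settings({"MODEL_ID": "cedrugs/toxic-xlmr", "DEVICE": "cpu"})
```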

API

Health

GET /health

Response:

{ "status": "ok", "model_loaded": true, "device": "cuda" }

Metadata

GET /

Predict (single)

POST /v1/predict

Request:

{ "text": "you are so dumb", "threshold": 0.5 }

Response:

{
  "label": "toxic",
  "probs": { "clean": 0.12, "toxic": 0.88 },
  "latency_ms": 4.2
}
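
A client call to this endpoint could look like the sketch below, which uses only the Python standard library. The function names, the 10-second timeout, and the base URL are illustrative assumptions; the request and response shapes follow the example above.

```python
import json
import urllib.request
from typing import Optional


def build_predict_request(base_url: str, text: str,
                          threshold: Optional[float] = None) -> urllib.request.Request:
    """Prepare a POST /v1/predict request; threshold is omitted when None."""
    payload: dict = {"text": text}
    if threshold is not None:
        payload["threshold"] = threshold
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/predict",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(base_url: str, text: str, threshold: Optional[float] = None) -> dict:
    """Send the request and decode the JSON response body."""
    request = build_predict_request(base_url, text, threshold)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)
```

With the server from the Quickstart running, `predict("http://localhost:8000", "you are so dumb", 0.5)` should return a dictionary with the `label`, `probs`, and `latency_ms` keys shown above.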

Predict (batch)

POST /v1/batch_predict

Request:

{ "texts": ["you suck", "have a nice day"], "threshold": 0.5 }

Response:

{
  "results": [
    { "label": "toxic", "probs": { "clean": 0.09, "toxic": 0.91 }, "latency_ms": 2.1 },
    { "label": "clean", "probs": { "clean": 0.97, "toxic": 0.03 }, "latency_ms": 2.1 }
  ],
  "latency_ms": 4.2
}
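
When consuming a batch response, each result has to be matched back to its input. The sketch below assumes that `results` is returned in the same order as `texts`, which the example suggests but the README does not state; `toxic_texts` is a hypothetical helper.

```python
def toxic_texts(texts: list[str], response: dict) -> list[tuple[str, float]]:
    """Pair each input with its result and keep the ones labelled toxic.

    Assumption: response["results"][i] corresponds to texts[i].
    """
    results = response["results"]
    if len(results) != len(texts):
        raise ValueError("response does not contain one result per input text")
    return [
        (text, result["probs"]["toxic"])
        for text, result in zip(texts, results)
        if result["label"] == "toxic"
    ]


# The example request and response from this section.
example_texts = ["you suck", "have a nice day"]
example_response = {
    "results": [
        {"label": "toxic", "probs": {"clean": 0.09, "toxic": 0.91}, "latency_ms": 2.1},
        {"label": "clean", "probs": {"clean": 0.97, "toxic": 0.03}, "latency_ms": 2.1},
    ],
    "latency_ms": 4.2,
}
flagged = toxic_texts(example_texts, example_response)
```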

Docker

Build & run:

docker build -t chatguard-api .
docker run --rm -p 8000:8000 \
  -e MODEL_ID=your-username/toxic-xlmr \
  -e DEVICE=cpu \
  chatguard-api

Production Notes

  • For GPU: set DEVICE=cuda and ensure CUDA drivers are available.
  • Prefer one worker per GPU. For CPU-bound scaling:
    gunicorn -k uvicorn.workers.UvicornWorker -w 4 app.main:app --bind 0.0.0.0:8000
  • Pin model revisions in MODEL_ID for reproducible deployments (e.g., user/model@sha).
  • Consider enabling request timeouts and reverse proxying behind Traefik/Caddy.
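
The `user/model@sha` form in the pinning note has to be split into a repository ID and a revision before loading. How the service does this is not shown in the README, so the following is only a sketch under that assumption, with `split_model_id` as a hypothetical helper.

```python
from typing import Optional


def split_model_id(model_id: str) -> tuple[str, Optional[str]]:
    """Split "user/model@revision" into (repo_id, revision).

    The revision is None when MODEL_ID carries no "@" suffix.
    """
    repo_id, sep, revision = model_id.partition("@")
    if not repo_id or (sep and not revision):
        raise ValueError(f"invalid MODEL_ID: {model_id!r}")
    return repo_id, (revision or None)


pinned = split_model_id("user/model@abc123")
unpinned = split_model_id("user/model")
```

The revision can then be passed through the `revision` keyword argument that the Transformers `from_pretrained` methods accept, which takes a branch name, tag, or commit hash.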

Model Requirements

This API expects a Hugging Face repo containing a binary classifier with standard files:

pytorch_model.bin
config.json
tokenizer.json
tokenizer_config.json
special_tokens_map.json

Pushing to Hub example:

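from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Load the fine-tuned checkpoint first; "./toxic-xlmr" is a placeholder local path.
model = AutoModelForSequenceClassification.from_pretrained("./toxic-xlmr")
tokenizer = AutoTokenizer.from_pretrained("./toxic-xlmr")
model.push_to_hub("toxic-xlmr")
tokenizer.push_to_hub("toxic-xlmr")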
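
Before pushing, it can help to confirm that a local checkpoint directory contains the files listed above. This is a minimal sketch; `missing_model_files` is a hypothetical helper, and the file list is copied from this section as-is.

```python
from pathlib import Path

# File names taken from the list in this section.
REQUIRED_FILES = (
    "pytorch_model.bin",
    "config.json",
    "tokenizer.json",
    "tokenizer_config.json",
    "special_tokens_map.json",
)


def missing_model_files(model_dir: str) -> list[str]:
    """Return the required file names that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]
```

An empty list means the directory has every listed file; anything else names what still needs to be saved or exported.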

License

MIT


Acknowledgments

Built with FastAPI, Transformers, and PyTorch. Deployable anywhere, from laptops to cloud GPUs.
