An MLOps project and Chrome extension that detects fake news using a variety of classification models and machine-learning techniques.
- Machine learning models, including a baseline Random Forest implemented with Scikit-learn and a more advanced RoBERTa model built using HuggingFace Transformers and PyTorch Lightning.
- Natural language processing (NLP) techniques, including TF-IDF vectorization for traditional models and transformer-based representations for deep learning.
- A custom feature engineering pipeline, abstracted through dedicated featurizer classes that support training, caching, and serialization for consistent preprocessing.
- Model evaluation using standard classification metrics such as F1 score, accuracy, AUC, and confusion matrix, supported by the scikit-learn metrics suite.
- Model interpretation and error analysis enabled by SHAP (Shapley Additive Explanations) to provide insight into feature importance and decision boundaries.
- Data version control and pipeline reproducibility managed using DVC (Data Version Control).
- Experiment tracking and configuration handled through structured logging and MLflow integration.
- Test infrastructure including unit and integration tests using PyTest and data validation with Great Expectations.
- A deployment-ready backend implemented using FastAPI, exposing RESTful endpoints, with Docker for the deployment workflow and GitHub Actions for continuous integration (CI).
- Web browser integration through a custom Chrome extension that interfaces with the deployed model for direct content evaluation.
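The featurizer abstraction described above could look roughly like the following stdlib-only sketch. The class and method names here are assumptions, not the repository's actual API (the real featurizers wrap scikit-learn's TF-IDF rather than raw token counts); it only illustrates the fit-once / transform-consistently / serialize pattern:

```python
import json
from pathlib import Path


class BagOfWordsFeaturizer:
    """Hypothetical featurizer sketch: fit a vocabulary once, then
    reuse it so train/val/test data are preprocessed identically."""

    def __init__(self) -> None:
        self.vocab: dict[str, int] = {}

    def fit(self, texts: list[str]) -> "BagOfWordsFeaturizer":
        # Build the vocabulary from training data only.
        for text in texts:
            for token in text.lower().split():
                self.vocab.setdefault(token, len(self.vocab))
        return self

    def transform(self, texts: list[str]) -> list[list[int]]:
        # One count vector per document, using the fitted vocabulary;
        # unseen tokens are ignored, keeping feature dimensions stable.
        vectors = []
        for text in texts:
            counts = [0] * len(self.vocab)
            for token in text.lower().split():
                if token in self.vocab:
                    counts[self.vocab[token]] += 1
            vectors.append(counts)
        return vectors

    def save(self, path: str) -> None:
        # Serialization keeps preprocessing identical between training
        # time and the deployed server.
        Path(path).write_text(json.dumps(self.vocab))

    @classmethod
    def load(cls, path: str) -> "BagOfWordsFeaturizer":
        featurizer = cls()
        featurizer.vocab = json.loads(Path(path).read_text())
        return featurizer
```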
This project uses uv as its project manager. To install it, follow the instructions here for your operating system.
Once uv is installed, clone the repository:
```bash
git clone [REPO LINK HERE]
```
Now, create the virtual environment and install all the dependencies:

```bash
uv venv
uv sync
```
Get the data (training, testing, and validation data) from this repository and put it in data/raw.
To train the random forest baseline, run the following from the root directory:
```bash
dvc repro train-random-forest
```
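`dvc repro` executes the stage as defined in the repository's `dvc.yaml`. For illustration only (the actual stage definition lives in the repo; the `deps` and `outs` paths below are assumptions), such a stage might look like:

```yaml
stages:
  train-random-forest:
    cmd: python fake_news/train.py --config-file config/random_forest.json
    deps:
      - data/processed
      - config/random_forest.json
    outs:
      - model_checkpoints/random_forest   # hypothetical output path
```

Because DVC tracks these dependencies and outputs, rerunning `dvc repro` skips the stage when nothing it depends on has changed.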
If you want to run the pipeline manually (without using DVC commands), you can run the following:

```bash
python scripts/normalize_and_clean_data.py --train-data-path data/raw/train2.tsv --val-data-path data/raw/val2.tsv --test-data-path data/raw/test2.tsv --output-dir data/processed
python scripts/compute_credit_bins.py --train-data-path data/processed/cleaned_train_data.json --output-path data/processed/optimal_credit_bins.json
python fake_news/train.py --config-file config/random_forest.json
python fake_news/train.py --config-file config/roberta.json
```
Having trained the model, you can now deploy it. Build your deployment Docker image (this may take a few minutes):

```bash
docker build . -f deploy/Dockerfile.serve -t fake-news-deploy
```
Now, run the model locally via a REST API with:
```bash
docker run -p 8000:80 -e MODEL_DIR="/home/fake-news/random_forest" -e MODULE_NAME="fake_news.server.main" fake-news-deploy
```
It should now be possible to interact with the API. For instance, you can do:
```bash
curl -X POST http://127.0.0.1:8000/api/predict-fakeness \
  -H 'Content-Type: application/json' \
  -d '{"text": "you can write some string here :)"}'
```
Tools like Postman can also be used.
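The same request can also be issued from Python using only the standard library. The endpoint and payload below are taken from the curl example above; the shape of the response body is not documented here, so the parsed result is treated as opaque:

```python
import json
import urllib.request

# Endpoint from the curl example above (local Docker deployment).
API_URL = "http://127.0.0.1:8000/api/predict-fakeness"


def build_payload(text: str) -> bytes:
    # JSON body matching the curl example.
    return json.dumps({"text": text}).encode("utf-8")


def predict_fakeness(text: str) -> dict:
    request = urllib.request.Request(
        API_URL,
        data=build_payload(text),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # Requires the container from the previous step to be running.
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```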