This project replicates and discusses the findings from the paper "Exploring the Generalisability of Fake News Detection Models" by Nathaniel Hoy and Theodora Koulouri (2022). It evaluates the generalization of six traditional machine learning models across different preprocessing techniques and datasets.
The project is assessed on the following criteria:
- Adherence to guidelines and report structure, quality of writing: 5 points
- Relevance of data analysis: 3 points
- Relevance of state-of-the-art analysis: 3 points
- Relevance of the proposed model: 3 points
- Implementation of the model: 3 points
- Analysis of results: 3 points
The goal is to evaluate how well six machine learning models (listed below) generalize to unseen data. We compare five preprocessing methods: Bag-of-Words (BoW), TF-IDF, Word2Vec, BERT, and Linguistic Cues (LC).
- ISOT Fake News Dataset: A benchmark dataset with 44,898 articles, including 23,481 fake and 21,417 real news articles. It is used for training the models.
- Fake or Real News (FoR) Dataset: An external dataset of 6,296 articles, used to test how well the models generalize (a loading sketch follows below).
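Something like the following could load the two corpora. The file names (`True.csv`/`Fake.csv` for ISOT, a single CSV with a textual `label` column for FoR) are assumptions based on the common distributions of these datasets; adjust paths and column names to the files actually shipped with this repo.

```python
import pandas as pd

# Assumed layout: ISOT ships as two CSVs (True.csv / Fake.csv); the FoR
# dataset is assumed to be one CSV with a textual 'label' column.
def load_isot(true_path="data/True.csv", fake_path="data/Fake.csv"):
    real = pd.read_csv(true_path)
    fake = pd.read_csv(fake_path)
    real["label"] = 0  # 0 = real
    fake["label"] = 1  # 1 = fake
    return pd.concat([real, fake], ignore_index=True)

def load_for(path="data/fake_or_real_news.csv"):
    df = pd.read_csv(path)
    df["label"] = (df["label"].str.upper() == "FAKE").astype(int)
    return df

isot = load_isot()
for_news = load_for()
print(len(isot), len(for_news))  # expected per the sizes above: 44898 and 6296
```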
- Logistic Regression
- Support Vector Machines (SVM)
- Random Forest
- Gradient Boosting
- AdaBoost
- Neural Network (NN)
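As a rough illustration, the six models could be instantiated with scikit-learn as below. The hyperparameters shown are placeholder defaults, not the settings used in the original paper or in this replication.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC
from sklearn.ensemble import (
    RandomForestClassifier,
    GradientBoostingClassifier,
    AdaBoostClassifier,
)
from sklearn.neural_network import MLPClassifier

# Illustrative defaults only; tune to match the replication's settings.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "SVM": LinearSVC(),
    "Random Forest": RandomForestClassifier(n_estimators=100),
    "Gradient Boosting": GradientBoostingClassifier(),
    "AdaBoost": AdaBoostClassifier(),
    "Neural Network": MLPClassifier(hidden_layer_sizes=(100,), max_iter=300),
}
```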
- BoW and TF-IDF: full text normalization.
- Word2Vec, BERT & Linguistic Cues (LC): lighter preprocessing suited to embedding-based models (see the sketch below).
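A minimal sketch of the two regimes, assuming scikit-learn vectorizers for BoW/TF-IDF; the exact normalization steps used in the replication may differ.

```python
import re
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Full normalization for BoW/TF-IDF: lowercase and strip non-letters.
# (The actual pipeline may also include stemming; this is an approximation.)
def normalize(text: str) -> str:
    return re.sub(r"[^a-z\s]", " ", text.lower())

bow = CountVectorizer(preprocessor=normalize, stop_words="english", max_features=10000)
tfidf = TfidfVectorizer(preprocessor=normalize, stop_words="english", max_features=10000)

# Lighter preprocessing for embedding-based features: BERT tokenizers
# expect near-raw text, so only whitespace cleanup is applied here.
def light_clean(text: str) -> str:
    return " ".join(text.split())
```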
- Train and test the models on the ISOT and FoR datasets.
- Evaluate generalization through cross-dataset evaluation: train on one dataset, then test on the other (sketched below).
- Compare model performance using accuracy, precision, recall, F1-score, and AUC.
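A hedged sketch of the cross-dataset loop, reusing the placeholder `models`, `tfidf`, `isot`, and `for_news` names from the sketches above; the actual evaluation code in this repo may differ.

```python
from sklearn.metrics import (
    accuracy_score, precision_score, recall_score, f1_score, roc_auc_score,
)

# Fit on one corpus, score on the other. `text_col` is assumed to be the
# column holding the article body in both datasets.
def cross_evaluate(model, vectorizer, train_df, test_df, text_col="text"):
    X_train = vectorizer.fit_transform(train_df[text_col])
    X_test = vectorizer.transform(test_df[text_col])
    model.fit(X_train, train_df["label"])
    pred = model.predict(X_test)
    scores = {
        "accuracy": accuracy_score(test_df["label"], pred),
        "precision": precision_score(test_df["label"], pred),
        "recall": recall_score(test_df["label"], pred),
        "f1": f1_score(test_df["label"], pred),
    }
    # AUC needs a continuous score; fall back to decision_function for
    # models without predict_proba (e.g. LinearSVC).
    if hasattr(model, "predict_proba"):
        score = model.predict_proba(X_test)[:, 1]
    else:
        score = model.decision_function(X_test)
    scores["auc"] = roc_auc_score(test_df["label"], score)
    return scores

# Train on ISOT, test on FoR (and vice versa) to measure generalization:
# print(cross_evaluate(models["Logistic Regression"], tfidf, isot, for_news))
```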
Models trained on ISOT achieved near-perfect in-dataset accuracy but showed clear performance drops on the external FoR dataset (and vice versa), highlighting the challenge of generalization in fake news detection.
- Clone this repository:

  ```bash
  git clone https://github.com/marcderoo/fake-news-detection.git
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```
Chappuis Maxime & Deroo Marc