A Comprehensive Notebook on Fake News Prediction

Introduction

This project addresses the critical issue of fake news detection through thorough data analysis and machine learning. We perform a detailed examination of data distribution and analyze fake news labels using visualizations such as plots and word clouds. Statements are cleaned and analyzed to enhance label classification. The analysis extends to subjects, speakers, job titles, state information, party affiliations, and venues, with strategic groupings to improve label differentiation. Numeric data features are also assessed through visualizations. Multiple models are trained and evaluated, with a comprehensive comparison to identify the most effective approach. This notebook provides a robust framework for predicting fake news, combining advanced data analysis and predictive modeling.

Fake News Label Distribution

Words which are in the barely-true news

Words which are in the half-true news

Words which are in the mostly-true news

Words which are in the true news

Words which are in the false news

Words which are in the pants-fire news

Top 25 most frequent words in news statement text
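
A word-frequency ranking like the one under this heading can be sketched with the standard library alone. This is a minimal illustration, not the notebook's actual code: the helper names `clean_statement` and `top_words`, the tiny stop-word list, and the sample statements are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately small stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"the", "a", "an", "of", "in", "to", "and", "is", "are",
              "for", "on", "that", "has", "have"}

def clean_statement(text):
    """Lowercase a statement and return its alphabetic tokens, minus stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [token for token in tokens if token not in STOP_WORDS]

def top_words(statements, n=25):
    """Return the n most frequent tokens across all statements as (word, count) pairs."""
    counts = Counter()
    for statement in statements:
        counts.update(clean_statement(statement))
    return counts.most_common(n)

# Made-up sample statements, for illustration only.
sample = [
    "Taxes are going up for the middle class.",
    "The state has cut taxes every year.",
    "Health care costs have doubled in the state.",
]
print(top_words(sample, n=3))
```

Running the same ranking on the statements of a single label would give the input for a per-label word cloud.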

Distribution of the Subjects

Top 10 Speakers

Bottom 10 Speakers

Distribution of Speaker's Job Title

Top 10 States in the Data

Bottom 10 States in the Data

Distribution of Party Affiliation

Distribution of Venue

Random Forest

Average Confusion Matrix:

Total Average Accuracy of Random Forest Classifier is : 0.9123319970046504
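
The averaged confusion matrix and accuracy reported for each model could be computed roughly as follows. This is a sketch under assumptions: it takes per-fold true and predicted labels as plain lists, averages the per-fold matrices cell by cell, and averages the per-fold accuracies. The function names and the two-label toy folds are hypothetical; the notebook itself may compute these values differently.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Count predictions per (true label, predicted label); rows are true labels."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for true_label, predicted_label in zip(y_true, y_pred):
        matrix[index[true_label]][index[predicted_label]] += 1
    return matrix

def average_matrices(matrices):
    """Average a list of equally sized square matrices cell by cell."""
    k = len(matrices)
    size = len(matrices[0])
    return [[sum(m[r][c] for m in matrices) / k for c in range(size)]
            for r in range(size)]

def accuracy(matrix):
    """Fraction of samples on the diagonal (correct predictions)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0

# Made-up results for two folds and two labels, for illustration only.
labels = ["true", "false"]
folds = [
    confusion_matrix(["true", "true", "false", "false"],
                     ["true", "false", "false", "false"], labels),
    confusion_matrix(["true", "false", "false", "true"],
                     ["true", "false", "true", "true"], labels),
]
avg_matrix = average_matrices(folds)
mean_accuracy = sum(accuracy(m) for m in folds) / len(folds)
print(avg_matrix, mean_accuracy)
```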

Naive Bayes

Average Confusion Matrix:

Total Average Accuracy of Naive Bayes is : 0.9990616141191161

Neural Networks

Total Average Accuracy of Neural Network is : 0.9464310973295952

Decision Trees

Average Confusion Matrix:

Total Average Accuracy of Decision Trees is : 0.8853491756214755

Comparison of all Algorithms Results

The best model is Naive Bayes: it has the highest accuracy and also converges quickly
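
The accuracy side of this comparison can be reproduced by ranking the averages reported above. The snippet only sorts those four numbers; the variable names are illustrative, and convergence time is not included because no timing figures appear in this README.

```python
# Average accuracies as reported in the sections above.
reported_accuracy = {
    "Random Forest": 0.9123319970046504,
    "Naive Bayes": 0.9990616141191161,
    "Neural Network": 0.9464310973295952,
    "Decision Trees": 0.8853491756214755,
}

# Sort models from highest to lowest average accuracy.
ranking = sorted(reported_accuracy.items(), key=lambda item: item[1], reverse=True)
for name, value in ranking:
    print(f"{name:<15} {value:.4f}")

best_model = ranking[0][0]
```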

Environment Setup

Prerequisites: Ensure Python 3.6 or newer is installed on your system.

Create a Virtual Environment:
- Install virtualenv if you prefer it over the built-in venv (optional):
```
pip install virtualenv
```
- Create the environment:
  - With venv (Python 3.3+):
```
python -m venv env
```
  - Or, with virtualenv:
```
virtualenv env
```
- Activate the environment:
  - Windows: `env\Scripts\activate`
  - Unix/macOS: `source env/bin/activate`
- To deactivate: `deactivate`

Dependencies: Ensure all dependencies are listed in requirements.txt. Install them using:
```
pip install -r requirements.txt
```

Installation Instructions

To use this project, clone the repository and set up the environment as follows:

Clone the Repository:

https://github.com/Imran-ml/A-Comprehensive-Notebook-on-Fake-News-Prediction.git

Set Up the Environment:
- Navigate to the project directory and activate the virtual environment.
- Install the dependencies from requirements.txt.

Resources

Kaggle Notebook: View Notebook
Dataset: View Dataset

License

This project is made available under the MIT License.

Conclusion

In this project, we reviewed research on machine learning techniques and intelligent methods for detecting fake news and applied them to the LIAR dataset. We examined several datasets to find the one best suited to our experiments. Previous research has relied most heavily on the LIAR dataset, and our own experience agreed that it is the best, most accurate, and most reliable data source. After finalizing the dataset, and with our goal in view, we implemented the machine learning techniques and reported the results in figures and tables. We obtained different results when using a general dataset. To achieve maximum accuracy, we used homogeneous classifiers: Random Forest, Neural Network, Decision Tree, and Naïve Bayes. We applied 5-fold cross-validation, splitting the data into folds 1 through 5. Every algorithm was run on each fold, and its running time was recorded for each. We conclude that Naïve Bayes outperforms the other algorithms because of its high evaluation scores and short convergence time.
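
The evaluation procedure described above (5 folds, every algorithm run on each fold, running time recorded) can be sketched as follows. This is an illustration under stated assumptions, not the notebook's code: `k_fold_indices` and `cross_validate` are hypothetical helpers, the folds are contiguous and unshuffled, and `MajorityClassifier` is a stand-in baseline with a `fit`/`predict` interface that takes the place of the four real models.

```python
import time
from collections import Counter

def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, test_idx) pairs for k contiguous, unshuffled folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size

class MajorityClassifier:
    """Hypothetical stand-in model: always predicts the most common training label."""

    def fit(self, X, y):
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label_] * len(X)

def cross_validate(model_factory, X, y, k=5):
    """Return (mean accuracy over k folds, total fit and predict time in seconds)."""
    accuracies = []
    started = time.perf_counter()
    for train_idx, test_idx in k_fold_indices(len(X), k):
        model = model_factory().fit([X[i] for i in train_idx],
                                    [y[i] for i in train_idx])
        predictions = model.predict([X[i] for i in test_idx])
        hits = sum(pred == y[i] for pred, i in zip(predictions, test_idx))
        accuracies.append(hits / len(test_idx))
    elapsed = time.perf_counter() - started
    return sum(accuracies) / k, elapsed

# Made-up toy data, for illustration only.
X = [[i] for i in range(10)]
y = ["false"] * 7 + ["true"] * 3
mean_accuracy, seconds = cross_validate(MajorityClassifier, X, y, k=5)
print(f"mean accuracy={mean_accuracy:.2f}, time={seconds:.4f}s")
```

Calling `cross_validate` once per model with the same folds keeps the accuracy and timing figures comparable across algorithms.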

About Author

Name: Muhammad Imran Zaman
Email: imranzaman.ml@gmail.com
Professional Links:
- Kaggle: Profile
- LinkedIn: Profile
- Google Scholar: Profile
- YouTube: Channel
- HuggingFace: Profile
Project Repository: GitHub Repo
