
Imran-ml/A-Comprehensive-Notebook-on-Fake-News-Prediction


A Comprehensive Notebook on Fake News Prediction


Introduction

This project addresses the critical issue of fake news detection through thorough data analysis and machine learning. We perform a detailed examination of data distribution and analyze fake news labels using visualizations such as plots and word clouds. Statements are cleaned and analyzed to enhance label classification. The analysis extends to subjects, speakers, job titles, state information, party affiliations, and venues, with strategic groupings to improve label differentiation. Numeric data features are also assessed through visualizations. Multiple models are trained and evaluated, with a comprehensive comparison to identify the most effective approach. This notebook provides a robust framework for predicting fake news, combining advanced data analysis and predictive modeling.

Fake News Label Distribution

image

image
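The label distribution plots above count how many statements carry each truthfulness label. A minimal, dependency-free sketch of that tally is shown below. It assumes the labels arrive as plain strings and that the six label names match the word-cloud headings in this README. In the notebook itself, pandas `Series.value_counts()` would give the same counts, but that is an assumption about its implementation. The sample labels are toy values, not project data.

```python
from collections import Counter

# The six truthfulness labels used in the headings of this README (lowercased).
LABELS = ["pants-fire", "false", "barely-true", "half-true", "mostly-true", "true"]


def label_distribution(labels):
    """Return {label: (count, percentage)} in LABELS order.

    Labels are lowercased first, so "TRUE" and "true" are counted together.
    """
    counts = Counter(label.strip().lower() for label in labels)
    total = sum(counts.values())
    return {
        label: (counts[label], 100.0 * counts[label] / total if total else 0.0)
        for label in LABELS
    }


if __name__ == "__main__":
    # Toy labels for illustration only.
    sample = ["false", "half-true", "TRUE", "false", "pants-fire", "mostly-true"]
    for label, (n, pct) in label_distribution(sample).items():
        print(f"{label:>12}: {n} ({pct:.1f}%)")
```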

Words which are in the barely-true news

image

Words which are in the half-true news

image

Words which are in the mostly-true news

image

Words which are in the true news

image

Words which are in the false news

image

Words which are in the pants-fire news

image

Top 25 most frequent words in the news statement text

image
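The per-label word clouds and the top-25 word chart both depend on cleaned statement text. This README does not show the notebook's cleaning rules, so the sketch below assumes a simple scheme:

- text is lowercased;
- only alphabetic tokens are kept;
- words on a small hypothetical stop-word list are removed.

`word_counts_by_label` produces one `Counter` per label, which is the input a word cloud is drawn from. `top_words` produces the overall top-k list.

```python
import re
from collections import Counter, defaultdict

# Hypothetical stop-word list for illustration; the notebook's own list is not shown.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "that", "for", "on", "it"}


def clean_statement(statement):
    """Lowercase, keep alphabetic tokens only, and drop stop words."""
    return [w for w in re.findall(r"[a-z]+", statement.lower()) if w not in STOP_WORDS]


def word_counts_by_label(rows):
    """rows: iterable of (statement, label) pairs -> {label: Counter of cleaned words}."""
    by_label = defaultdict(Counter)
    for statement, label in rows:
        by_label[label.lower()].update(clean_statement(statement))
    return dict(by_label)


def top_words(rows, k=25):
    """Return the k most frequent cleaned words across all statements."""
    total = Counter()
    for statement, _ in rows:
        total.update(clean_statement(statement))
    return total.most_common(k)


if __name__ == "__main__":
    # Toy statements for illustration only.
    rows = [("Says the tax cut is a lie", "false"), ("Tax rates fell in 2010", "TRUE")]
    print(word_counts_by_label(rows))
    print(top_words(rows, k=3))
```

If the notebook uses the `wordcloud` package (an assumption), each per-label `Counter` could be passed to `WordCloud().generate_from_frequencies(...)` to draw the clouds shown above.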

Distribution of the Subjects

image

image

Top 10 Speakers

image

Bottom 10 Speakers (least frequent)

image

Distribution of Speaker's Job Title

image

image

Top 10 States in the Data

image

Bottom 10 States in the Data (least frequent)

image
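The top-10 and bottom-10 charts for speakers and states rank the values of a categorical column by frequency. A minimal sketch of that ranking is given below. It assumes the column is available as a list of strings and that missing or blank values (common in the speaker and state fields) should be skipped. The helper name `top_and_bottom` is hypothetical.

```python
from collections import Counter


def top_and_bottom(values, n=10):
    """Return (most_frequent, least_frequent) lists of (value, count) pairs.

    Missing (None) and blank values are ignored. The least-frequent list
    starts with the rarest value.
    """
    counts = Counter(v.strip() for v in values if v and v.strip())
    ranked = counts.most_common()  # sorted by count, descending
    return ranked[:n], ranked[::-1][:n]


if __name__ == "__main__":
    # Toy state column for illustration only.
    states = ["Texas", "Florida", "Texas", "Ohio", "Texas", "Florida", None, ""]
    top, bottom = top_and_bottom(states, n=2)
    print("top:", top)
    print("bottom:", bottom)
```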

Distribution of Party Affiliation

image

image

Distribution of Venue

image

image
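The introduction mentions "strategic groupings" of subjects, speakers, job titles, states, parties, and venues, but the README does not say how they are formed. One common approach, assumed here, is to collapse categories that appear fewer than a threshold number of times (plus missing values) into a single `other` bucket before modelling. The function name and the `min_count` threshold below are hypothetical.

```python
from collections import Counter


def group_rare(values, min_count=10, other="other"):
    """Normalize categories and map rare or missing ones to `other`.

    Values are stripped and lowercased first, so "Democrat" and "democrat"
    count as the same category.
    """
    cleaned = [(v or "").strip().lower() for v in values]
    counts = Counter(cleaned)
    return [v if v and counts[v] >= min_count else other for v in cleaned]


if __name__ == "__main__":
    # Toy party-affiliation column for illustration only.
    parties = ["Democrat", "democrat", "Republican", "republican", "Green", None]
    print(group_rare(parties, min_count=2))
```

A reasonable threshold depends on the column's size. A higher value merges more categories and leaves the classifiers fewer but better-populated groups.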

Random Forest

Average Confusion Matrix:

image

Total average accuracy of the Random Forest classifier: 0.9123319970046504
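The conclusion states that every model was evaluated with 5-fold cross-validation. An "average accuracy" such as the one above is therefore the mean of the per-fold accuracies. Below is a dependency-free sketch of that procedure. The notebook may instead use scikit-learn's `KFold`, which is not confirmed here. `fit_predict` is a hypothetical callback standing in for any of the four classifiers.

```python
import random


def k_fold_indices(n_samples, k=5, shuffle=True, seed=42):
    """Yield (train_idx, test_idx) lists for k folds whose sizes differ by at most one."""
    indices = list(range(n_samples))
    if shuffle:
        random.Random(seed).shuffle(indices)
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


def cross_val_accuracy(fit_predict, X, y, k=5):
    """Mean accuracy over k folds.

    fit_predict(X_train, y_train, X_test) must return one prediction per test row.
    """
    scores = []
    for train_idx, test_idx in k_fold_indices(len(y), k):
        preds = fit_predict([X[i] for i in train_idx], [y[i] for i in train_idx],
                            [X[i] for i in test_idx])
        correct = sum(p == y[i] for p, i in zip(preds, test_idx))
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)
```

Each sample appears in exactly one test fold. Every model is therefore scored on the whole dataset, but never on rows it was trained on.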

Naive Bayes

Average Confusion Matrix:

image

Total average accuracy of Naive Bayes: 0.9990616141191161
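The README does not state which Naive Bayes variant is used. For text features, a common choice is multinomial Naive Bayes over word counts with Laplace (add-alpha) smoothing, and that is the variant assumed in the sketch below. It is a from-scratch illustration, not the notebook's code. The notebook may well use scikit-learn's `MultinomialNB`, or a different variant altogether. The model picks the label $c$ that maximizes $\log P(c) + \sum_w \log P(w \mid c)$.

```python
import math
from collections import Counter, defaultdict


class MultinomialNB:
    """Multinomial Naive Bayes over token lists with add-alpha smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def _log_posterior(self, tokens, label):
        # log prior + sum of smoothed log likelihoods (up to a shared constant)
        n_docs = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / n_docs)
        denom = self.total_words[label] + self.alpha * len(self.vocab)
        for w in tokens:
            if w in self.vocab:  # words never seen in training are ignored
                score += math.log((self.word_counts[label][w] + self.alpha) / denom)
        return score

    def predict(self, docs):
        return [max(self.class_counts, key=lambda c: self._log_posterior(t, c)) for t in docs]


if __name__ == "__main__":
    # Toy tokenized statements for illustration only.
    docs = [["tax", "cut"], ["tax", "lie"], ["jobs", "grew"], ["jobs", "up"]]
    labels = ["false", "false", "true", "true"]
    model = MultinomialNB().fit(docs, labels)
    print(model.predict([["tax"], ["jobs"]]))
```

Training only counts words per class, and prediction is one pass over each statement's tokens. This is why Naive Bayes typically trains much faster than the other three models.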

Neural Networks

image

Total average accuracy of the Neural Network: 0.9464310973295952

Decision Trees

Average Confusion Matrix:

image

Total average accuracy of Decision Trees: 0.8853491756214755
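Each model section reports an "Average Confusion Matrix". A plausible reading, assumed here, is the element-wise mean of the five per-fold confusion matrices. The sketch below builds one matrix per fold (rows are true labels, columns are predicted labels) and averages them. All function names are hypothetical.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Counts with rows = true label and columns = predicted label."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def average_confusion_matrix(fold_results, labels):
    """Element-wise mean of per-fold confusion matrices.

    fold_results: list of (y_true, y_pred) pairs, one per fold.
    """
    matrices = [confusion_matrix(t, p, labels) for t, p in fold_results]
    k = len(matrices)
    size = len(labels)
    return [[sum(m[r][c] for m in matrices) / k for c in range(size)] for r in range(size)]


def accuracy_from_matrix(matrix):
    """Share of predictions on the diagonal (correct predictions)."""
    total = sum(map(sum, matrix))
    return sum(matrix[i][i] for i in range(len(matrix))) / total
```

When all folds have the same size, the accuracy of the averaged matrix equals the mean of the per-fold accuracies. With unequal folds, the two can differ slightly.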

Comparison of all Algorithms Results

image

image

image

The best model is Naive Bayes, because it achieves the highest accuracy and also has a short convergence time.

image
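The model choice above weighs accuracy first and running time second. A minimal sketch of that rule is shown below. `timed` measures a call with a monotonic clock, and `pick_best` chooses the highest accuracy, breaking ties with the shorter time. The accuracies are the averages reported in this README. No timings are filled in, because the README does not report them. Both helper names are hypothetical.

```python
import time


def timed(fn, *args, **kwargs):
    """Run fn(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def pick_best(accuracy, seconds=None):
    """Return the model with the highest accuracy.

    accuracy: {model: mean accuracy}; seconds: optional {model: mean run time}.
    Equal accuracies are broken in favour of the shorter run time.
    """
    seconds = seconds or {}
    return max(accuracy, key=lambda m: (accuracy[m], -seconds.get(m, 0.0)))


# Average accuracies as reported above.
REPORTED_ACCURACY = {
    "Random Forest": 0.9123319970046504,
    "Naive Bayes": 0.9990616141191161,
    "Neural Network": 0.9464310973295952,
    "Decision Trees": 0.8853491756214755,
}

if __name__ == "__main__":
    print(pick_best(REPORTED_ACCURACY))  # Naive Bayes
```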

Environment Setup

Prerequisites: Ensure Python 3.6 or newer is installed on your system.

  1. Create a Virtual Environment:

    • Install virtualenv if you prefer it over the built-in venv (optional):
      pip install virtualenv
    • Create the environment:
      • With venv (Python 3.3+):
        python -m venv env
      • Or, with virtualenv:
        virtualenv env
    • Activate the environment:
      • Windows: env\Scripts\activate
      • Unix/MacOS: source env/bin/activate
    • To deactivate: deactivate
  2. Dependencies: All dependencies are listed in requirements.txt. Install them with:

    pip install -r requirements.txt

Installation Instructions

To use this project, clone the repository and set up the environment as follows:

  1. Clone the Repository:
    git clone https://github.com/Imran-ml/A-Comprehensive-Notebook-on-Fake-News-Prediction.git
  2. Setup the Environment:
    • Navigate to the project directory and activate the virtual environment.
    • Install the dependencies from requirements.txt.

Resources

License

This project is made available under the MIT License.

Conclusion

In this project, we reviewed machine learning techniques and other intelligent methods for detecting fake news and applied them to the LIAR dataset. We examined several candidate datasets for our experiments. Previous research has relied most heavily on LIAR, and our own experience agreed that it is the most accurate and reliable data source for this task. We also observed different results when using a general dataset. After finalizing the dataset with our goal in view, we implemented four classifiers (Random Forest, Neural Network, Decision Tree, and Naïve Bayes) and reported their results in figures and tables. Each algorithm was evaluated with 5-fold cross-validation (k-fold 1 through k-fold 5), and its running time was recorded for every fold. We conclude that Naïve Bayes outperforms the other algorithms, owing to its high evaluation scores and short convergence time.

About Author
