This project tackles fake news detection through exploratory data analysis and machine learning. We examine the data distribution and the fake news labels with plots and word clouds, then clean the statements and analyze them to improve label classification. The analysis also covers subjects, speakers, job titles, state information, party affiliations, and venues, grouping values where that helps separate the labels. Numeric features are assessed through visualizations as well. Finally, several models are trained, evaluated, and compared to identify the most effective approach. The notebook combines this analysis with predictive modeling into a single workflow for predicting fake news.
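The statement-cleaning step described above could look roughly like the sketch below. It is not the notebook's actual code: the stop-word list, the function names `clean_statement` and `word_counts_by_label`, and the sample rows are all assumptions made for illustration. The per-label word counts it produces are the kind of input a word cloud is typically built from.

```python
import re
from collections import Counter

# Illustrative stop-word list (an assumption; the notebook may use a fuller one).
STOP_WORDS = {"a", "an", "the", "of", "to", "in", "is", "and", "that", "for", "on"}


def clean_statement(text):
    """Lowercase a statement, drop non-letters, and remove stop words."""
    text = text.lower()
    # Replace every character that is not a-z or whitespace with a space.
    text = re.sub(r"[^a-z\s]", " ", text)
    return [token for token in text.split() if token not in STOP_WORDS]


def word_counts_by_label(rows):
    """Count cleaned tokens separately for each label.

    `rows` is an iterable of (label, statement) pairs.
    """
    counts = {}
    for label, statement in rows:
        counts.setdefault(label, Counter()).update(clean_statement(statement))
    return counts


# Hypothetical sample rows, not taken from the dataset.
rows = [
    ("false", "The economy shrank by 50% in 2009!"),
    ("true", "Taxes on the economy rose, and rose again."),
]
counts = word_counts_by_label(rows)
print(counts["true"].most_common(3))
```

Comparing the most frequent tokens per label is one simple way to see which words help distinguish the classes before any model is trained.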
[Figures: average confusion matrices]
Total average accuracy by model:
- Random Forest Classifier: 0.9123319970046504
- Naive Bayes: 0.9990616141191161
- Neural Network: 0.9464310973295952
- Decision Trees: 0.8853491756214755
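For readers unfamiliar with the two reported quantities, the sketch below shows one plausible way to obtain an average confusion matrix and a total average accuracy from per-fold results. It is an assumption that the notebook averages per-fold accuracies in this way; the fold matrices are hypothetical numbers, not the project's results.

```python
def average_confusion_matrix(fold_matrices):
    """Element-wise mean of equally shaped confusion matrices."""
    n_folds = len(fold_matrices)
    n_rows = len(fold_matrices[0])
    n_cols = len(fold_matrices[0][0])
    return [
        [sum(m[r][c] for m in fold_matrices) / n_folds for c in range(n_cols)]
        for r in range(n_rows)
    ]


def accuracy(matrix):
    """Fraction of samples on the diagonal (correct predictions)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Hypothetical confusion matrices from two folds (rows: true, columns: predicted).
fold_matrices = [
    [[40, 10], [5, 45]],
    [[42, 8], [7, 43]],
]

avg_matrix = average_confusion_matrix(fold_matrices)
avg_accuracy = sum(accuracy(m) for m in fold_matrices) / len(fold_matrices)
print(avg_matrix, avg_accuracy)
```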
Prerequisites: Ensure Python 3.6 or newer is installed on your system.
- Create a Virtual Environment:
  - Install `virtualenv` if you prefer it over the built-in `venv` (optional): `pip install virtualenv`
  - Create the environment:
    - With `venv` (Python 3.3+): `python -m venv env`
    - Or, with `virtualenv`: `virtualenv env`
  - Activate the environment:
    - Windows: `env\Scripts\activate`
    - Unix/MacOS: `source env/bin/activate`
  - To deactivate: `deactivate`
- Dependencies: Ensure all dependencies are listed in `requirements.txt`. Install them using: `pip install -r requirements.txt`
To use this project, clone the repository and set up the environment as follows:
- Clone the Repository: https://github.com/Imran-ml/A-Comprehensive-Notebook-on-Fake-News-Prediction.git
- Set Up the Environment:
  - Navigate to the project directory and activate the virtual environment.
  - Install the dependencies from `requirements.txt`.
- Kaggle Notebook: View Notebook
- Dataset: View Dataset
This project is made available under the MIT License.
In this project, we surveyed research on machine learning techniques and intelligent methods for fake news detection and applied them to the LIAR data. We looked into several datasets to find the one best suited to our experiments. Most previous research relies on the LIAR dataset, and our own experience agreed that it was the most accurate and reliable data source among those we considered. After settling on the dataset, we implemented the machine learning techniques and reported the results in figures and tables. Results differed when a general dataset was used. In line with our goal of maximizing accuracy, we used the homogeneous classifiers Random Forest, Neural Network, Decision Tree, and Naïve Bayes. We used 5-fold cross-validation, splitting the data into folds 1 through 5. Each algorithm was run on every fold, and its running time was recorded. We conclude that the Naïve Bayes algorithm outperforms the other algorithms, owing to its high evaluation scores and its convergence time.
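The evaluation procedure described above (five folds, every algorithm run on each fold, running time recorded) can be sketched as follows. This is a minimal standard-library illustration, not the notebook's implementation: `k_fold_indices`, `cross_validate`, and the `MajorityClassifier` placeholder are hypothetical names, and the toy labels are invented. In practice, the Random Forest, Neural Network, Decision Tree, and Naïve Bayes models would take the placeholder's place.

```python
import random
import time


def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices and deal them into k disjoint folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


class MajorityClassifier:
    """Placeholder model that always predicts the most frequent training label."""

    def fit(self, X, y):
        self.label_ = max(set(y), key=y.count)
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


def cross_validate(model_factory, X, y, k=5):
    """Return (mean accuracy over k folds, total running time in seconds)."""
    folds = k_fold_indices(len(X), k)
    accuracies = []
    start = time.perf_counter()
    for i, test_idx in enumerate(folds):
        # Train on every fold except the i-th, then test on the i-th.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        model = model_factory().fit(
            [X[j] for j in train_idx], [y[j] for j in train_idx]
        )
        predictions = model.predict([X[j] for j in test_idx])
        correct = sum(p == y[j] for p, j in zip(predictions, test_idx))
        accuracies.append(correct / len(test_idx))
    elapsed = time.perf_counter() - start
    return sum(accuracies) / k, elapsed


# Toy data: 70 "fake" and 30 "real" labels; features are unused by the placeholder.
X = list(range(100))
y = ["fake"] * 70 + ["real"] * 30
mean_accuracy, seconds = cross_validate(MajorityClassifier, X, y, k=5)
print(f"mean accuracy={mean_accuracy:.3f}, time={seconds:.4f}s")
```

Timing the whole loop per model, as here, gives one running-time figure per algorithm; timing inside the loop instead would give one figure per fold.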
- Name: Muhammad Imran Zaman
- Email: imranzaman.ml@gmail.com
- Professional Links:
- Project Repository: GitHub Repo


























