This project aims to detect fake news using Python and machine learning. The model analyzes the textual content of online articles to classify them as FAKE or REAL based on linguistic and statistical patterns.
1. Data Loading & Cleaning
   - Load the CSV file, check for missing/null values, and clean the text (remove punctuation and stopwords, convert to lowercase).
2. Exploratory Data Analysis (EDA)
   - Visualize data distribution between FAKE and REAL labels.
   - Analyze word frequency using WordCloud or CountVectorizer.
3. Text Preprocessing
   - Tokenization and lemmatization
   - Stopword removal
   - Vectorization using TF-IDF
4. Model Training
   - Trained several classifiers, including:
     - Logistic Regression
     - Naive Bayes
     - Support Vector Machine (SVM)
     - Random Forest
5. Model Evaluation
   - Measured model performance using:
     - Accuracy
     - Precision
     - Recall
     - F1-Score
     - Confusion Matrix
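The workflow above can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the tiny in-memory dataset, the column names `text` and `label`, and the `clean_text` helper are made up for the example. scikit-learn's built-in English stop-word list stands in for NLTK stop-word removal and lemmatization so that no corpus download is needed, and `LinearSVC` is used as one possible SVM variant.

```python
import re
import string

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_recall_fscore_support)
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC

# Hypothetical toy data; the real project loads a CSV file instead.
df = pd.DataFrame({
    "text": [
        "SHOCKING!!! Miracle pill cures every disease overnight, doctors furious",
        "You won't believe this secret trick that banks are hiding from you",
        "Aliens secretly control the government, anonymous insider reveals all",
        "Celebrity clone spotted: shocking proof the real star was replaced",
        "The city council approved the annual budget after a public hearing on Tuesday",
        "Researchers published a peer-reviewed study on regional rainfall patterns",
        "The central bank held interest rates steady, citing stable inflation data",
        "Local officials opened a new public library branch on Monday morning",
    ],
    "label": ["FAKE"] * 4 + ["REAL"] * 4,
})


def clean_text(text: str) -> str:
    """Lowercase, strip punctuation, and collapse runs of whitespace."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return re.sub(r"\s+", " ", text).strip()


# Step 1: drop rows with missing values, then clean the text.
df = df.dropna(subset=["text", "label"]).copy()
df["clean"] = df["text"].apply(clean_text)

# Hold out a stratified test set so both labels appear in it.
X_train, X_test, y_train, y_test = train_test_split(
    df["clean"], df["label"], test_size=0.25,
    stratify=df["label"], random_state=42,
)

# Step 3: TF-IDF vectorization (fit on training data only).
vectorizer = TfidfVectorizer(stop_words="english")
X_train_vec = vectorizer.fit_transform(X_train)
X_test_vec = vectorizer.transform(X_test)

# Step 4: the four classifier families listed above.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Naive Bayes": MultinomialNB(),
    "SVM (linear)": LinearSVC(),
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=42),
}

# Step 5: accuracy, precision, recall, F1, and confusion matrix per model.
results = {}
for name, model in models.items():
    model.fit(X_train_vec, y_train)
    pred = model.predict(X_test_vec)
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_test, pred, average="binary", pos_label="FAKE", zero_division=0,
    )
    results[name] = {
        "accuracy": accuracy_score(y_test, pred),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "confusion_matrix": confusion_matrix(
            y_test, pred, labels=["FAKE", "REAL"]),
    }
    print(f"{name}: accuracy={results[name]['accuracy']:.2f}, f1={f1:.2f}")
```

With only eight toy sentences the printed scores are not meaningful; the point is the shape of the pipeline. Fitting the vectorizer on the training split alone keeps test-set vocabulary from leaking into the features.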
- pandas
- numpy
- matplotlib
- seaborn
- scikit-learn
- nltk
- wordcloud
https://github.com/lutzhamel/fake-news.git
- Achieved ~95% accuracy with Logistic Regression and TfidfVectorizer.
- The model effectively distinguishes between real and fake news based on text content.
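As a rough illustration of how a Logistic Regression and TfidfVectorizer combination can be used to label a new article, here is a small sketch. The training sentences and the `classify` helper are hypothetical, and a model trained on this toy data will not reproduce the accuracy reported above.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical training examples, only to make the sketch runnable.
train_texts = [
    "Miracle cure doctors don't want you to know about",
    "Secret insider reveals shocking government cover-up",
    "You won't believe what this celebrity clone did",
    "City council approves budget after public hearing",
    "Study published on regional rainfall patterns",
    "Central bank holds interest rates steady this quarter",
]
train_labels = ["FAKE", "FAKE", "FAKE", "REAL", "REAL", "REAL"]

# Chain vectorizer and classifier so raw text goes in and labels come out.
model = make_pipeline(
    TfidfVectorizer(lowercase=True, stop_words="english"),
    LogisticRegression(max_iter=1000),
)
model.fit(train_texts, train_labels)


def classify(article: str) -> tuple[str, float]:
    """Return the predicted label and the model's probability for it."""
    probs = model.predict_proba([article])[0]
    best = probs.argmax()
    return str(model.classes_[best]), float(probs[best])


label, confidence = classify("Shocking secret cure revealed by insider")
print(label, round(confidence, 2))
```

Wrapping both steps in one pipeline means the same TF-IDF vocabulary learned during training is applied automatically at prediction time.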