MR7star/Fake-News-Detector

 
 

📰 Fake News Detection System using Hybrid Data Structures

A Data Structures and Algorithms case study demonstrating efficient news classification using Trie and HashMap implementations.


📘 Overview

This project implements a Fake News Detection System that uses core data structures to classify news articles as real or fake. It combines a Trie and a HashMap with keyword analysis, classifying text based on keyword frequency, pattern matches, and context scores.

Rather than relying on a trained machine learning model, the project shows how fundamental data structures, used carefully, can handle a text classification task.


🧠 Motivation

Misinformation spreads quickly across digital platforms, which makes detecting fake news an important problem. This project aims to demonstrate that:

  • Core data structures can perform meaningful text analysis
  • Algorithmic approaches complement machine learning solutions
  • Efficient design can achieve classification without heavy computational overhead

⚙️ Key Features

  • Trie Implementation - Prefix-based storage, with keyword lookups in time proportional to the keyword's length
  • HashMap Integration - Average-case O(1) word frequency mapping and retrieval
  • Text Preprocessing Pipeline - Cleaning, normalization, and tokenization
  • Dual Dataset Analysis - Comparative analysis between real and fake articles
  • Modular Architecture - Clean, extensible, beginner-friendly codebase
  • Suspicious Keyword Detection - Pattern matching against known misinformation indicators
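
The preprocessing feature above names three stages: cleaning, normalization, and tokenization. A minimal sketch of such a pipeline follows. The function name `preprocess` and the specific cleaning rules (dropping URLs and non-letter characters) are assumptions for illustration, not the repository's actual code.

```python
import re
from typing import List


def preprocess(text: str) -> List[str]:
    """Clean, normalize, and tokenize one article (hypothetical helper)."""
    text = text.lower()                          # normalization: lowercase everything
    text = re.sub(r"https?://\S+", " ", text)    # cleaning: drop URLs
    text = re.sub(r"[^a-z\s]", " ", text)        # cleaning: keep letters and whitespace only
    return text.split()                          # tokenization: split on whitespace


tokens = preprocess("Breaking: Govt launches new scheme!")
print(tokens)
```

Lowercasing before matching means a keyword list only needs one spelling of each word.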

🧩 Data Structures Used

| Data Structure | Purpose |
| --- | --- |
| Trie | Stores and searches suspicious/frequent words using prefix-based lookups for efficient pattern detection |
| HashMap | Maps keywords to frequency and context scores for average-case O(1) retrieval |
| Queue (optional) | Sequential processing of text blocks for stepwise analysis |
| Graph (optional) | Represents relationships between co-occurring words and misinformation spread patterns |
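
To make the Trie's role concrete, here is a minimal sketch of a keyword trie with exact and prefix lookups. The class and method names (`Trie`, `insert`, `contains`, `starts_with`) and the sample keywords are hypothetical; the project's trie.py may be organized differently.

```python
class TrieNode:
    def __init__(self):
        self.children = {}     # maps a character to the next TrieNode
        self.is_word = False   # True if a stored keyword ends at this node


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        """Add a keyword, creating nodes for characters not yet present."""
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def _walk(self, prefix):
        """Follow prefix from the root; return the final node or None."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def contains(self, word):
        """True only if word was inserted as a complete keyword."""
        node = self._walk(word)
        return node is not None and node.is_word

    def starts_with(self, prefix):
        """True if any stored keyword begins with prefix."""
        return self._walk(prefix) is not None


trie = Trie()
for keyword in ["shocking", "shock", "miracle"]:   # sample keywords, not from the repo
    trie.insert(keyword)

print(trie.contains("shock"), trie.contains("sho"), trie.starts_with("sho"))
```

Both lookups visit one node per character, so their cost depends on the length of the query rather than on how many keywords are stored.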

📂 Project Structure

fake-news-detector/
├── dataset/
│   ├── fake.csv              # Curated fake news articles
│   └── real.csv              # Curated real news articles
├── trie.py                   # Trie data structure implementation
├── hashmap.py                # HashMap implementation
├── graph.py                  # Graph structure (optional)
├── article_fetcher.py        # Dataset loading utilities
├── main.py                   # Main execution script
├── demo.py                   # Demo/example usage
├── suspicious_keywords.txt   # List of known fake news indicators
├── config.py                 # Configuration settings
├── comprehensive_test.py     # Full test suite
├── requirements.txt          # Python dependencies
└── README.md                 # Project documentation

📊 Dataset

Source: Fake News Detection Datasets by Emine YETM (Kaggle)

Details:

  • Custom-curated real.csv and fake.csv extracted from the Kaggle dataset
  • Cleaned and preprocessed to remove nulls, special characters, and redundant fields
  • More than 20,000 entries, balanced between real and fake categories

Format Example:

| text | source | label |
| --- | --- | --- |
| "Breaking: Govt launches new scheme" | reuters.com | real |
| "Alien ship spotted over city!" | clickbaitnews.net | fake |
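
Assuming each CSV file has a header row with the column names `text`, `source`, and `label` shown above, rows could be loaded with the standard `csv` module. The helper `load_articles` is hypothetical, and an in-memory sample stands in for the real files so the sketch runs on its own.

```python
import csv
import io


def load_articles(file_obj):
    """Read rows with 'text', 'source', and 'label' columns into dicts."""
    reader = csv.DictReader(file_obj)
    return [
        {"text": row["text"], "source": row["source"], "label": row["label"]}
        for row in reader
    ]


# In-memory stand-in for dataset/real.csv and dataset/fake.csv (assumed layout).
sample = io.StringIO(
    "text,source,label\n"
    '"Breaking: Govt launches new scheme",reuters.com,real\n'
    '"Alien ship spotted over city!",clickbaitnews.net,fake\n'
)
articles = load_articles(sample)
print(len(articles), articles[0]["label"], articles[1]["label"])
```

For files on disk, the same function would accept a handle opened with `open(path, newline="", encoding="utf-8")`.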

🚀 Installation

Prerequisites

  • Python 3.7 or higher
  • pip package manager

Setup

Clone the repository:

git clone https://github.com/Nikhil-Jones/Fake-News-Detector.git
cd Fake-News-Detector

Install the dependencies listed in requirements.txt:

pip install -r requirements.txt

💻 Usage

Running the Main Program

python main.py

Running Tests

# Comprehensive test suite
python comprehensive_test.py

🧪 Workflow

  1. Load Dataset - Import and parse real.csv and fake.csv
  2. Preprocess Text - Clean, normalize, and tokenize articles
  3. Build Trie - Store processed keywords for pattern detection
  4. Populate HashMap - Map keyword frequencies and context scores
  5. Pattern Matching - Compare against suspicious keyword lists
  6. Classification - Classify articles as Fake or Real based on combined metrics
  7. Output Results - Display classification with confidence scores
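
Steps 4 to 7 can be sketched as follows. The README does not give the scoring formula, so this example assumes a simple stand-in: the score is the share of tokens that appear in the suspicious keyword list, and an article is labeled Fake when the score reaches a caller-supplied threshold. The function name `classify`, the sample keywords, and the threshold of 0.5 are all hypothetical, and a plain `set` stands in for the Trie lookup.

```python
from collections import Counter


def classify(tokens, suspicious, threshold):
    """Return (label, score); score is the share of tokens that are suspicious."""
    if not tokens:
        return "Real", 0.0
    freq = Counter(tokens)                                   # hash map: word -> count
    hits = sum(freq[word] for word in freq if word in suspicious)
    score = hits / len(tokens)
    label = "Fake" if score >= threshold else "Real"
    return label, score


suspicious = {"shocking", "miracle", "hate"}                 # illustrative keyword list
fake_like = ["shocking", "miracle", "cure", "doctors", "hate"]
real_like = ["govt", "launches", "new", "scheme"]

print(classify(fake_like, suspicious, threshold=0.5))
print(classify(real_like, suspicious, threshold=0.5))
```

Counting each distinct word once and multiplying by its frequency gives the same total as checking every token, but needs only one keyword lookup per distinct word.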

🔧 Configuration

Edit config.py to customize:

# Example configuration
SUSPICIOUS_KEYWORDS_FILE = 'suspicious_keywords.txt'
MIN_WORD_LENGTH = 3
MAX_FREQUENCY_THRESHOLD = 100
CLASSIFICATION_THRESHOLD = 0.75
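
The README does not say how these settings are consumed. One plausible reading, shown purely as an assumption, is that `MIN_WORD_LENGTH` drops short tokens and `MAX_FREQUENCY_THRESHOLD` caps the count recorded for any one word. The helper `build_frequency_table` is hypothetical.

```python
from collections import Counter

# Values copied from the example configuration above.
MIN_WORD_LENGTH = 3
MAX_FREQUENCY_THRESHOLD = 100


def build_frequency_table(tokens):
    """Count tokens, skipping short ones and capping each count (assumed semantics)."""
    counts = Counter(t for t in tokens if len(t) >= MIN_WORD_LENGTH)
    return {word: min(count, MAX_FREQUENCY_THRESHOLD) for word, count in counts.items()}


table = build_frequency_table(["a", "to", "fake"] * 150)
print(table)
```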

👨‍💻 Team Members

| Member | Role |
| --- | --- |
| Nikhil Jones A (CB.SC.U4CSE24031) | Data Preprocessing, Trie Implementation & Core Logic |
| Gubba Rohan (CB.SC.U4CSE24016) | Graph, Integration & Optimization |
| Muthu Rupesh MJ (CB.SC.U4CSE24030) | HashMap Development & Dataset Creation |
| Manohar Ravva (CB.SC.U4CSE24040) | HashMap Development |

🧭 Future Enhancements

  • Hybrid ML Model - Integrate Naive Bayes/Logistic Regression for improved accuracy
  • Graph Visualization - Visualize word relationships and misinformation spread patterns
  • Web Interface - Real-time fake news detection through a web application
  • Multilingual Support - Expand to Hindi, Tamil, and other regional languages
  • API Development - RESTful API for integration with external applications
  • Performance Optimization - Parallel processing for large-scale dataset analysis

💬 Acknowledgements

This project was developed as part of a Data Structures and Algorithms case study, demonstrating how efficient hybrid data structures can perform meaningful text classification without heavy machine learning dependencies.

Dataset Source:
Emine YETM, Fake News Detection Datasets, Kaggle

