A Data Structures and Algorithms case study demonstrating efficient news classification using Trie and HashMap implementations.
This project implements a Fake News Detection System that leverages core data structures to analyze and classify news articles as real or fake. By combining Trie and HashMap data structures with intelligent keyword analysis, the system performs efficient textual classification based on keyword frequency, patterns, and contextual analysis.
Unlike traditional machine learning approaches, this project showcases how fundamental data structures can effectively tackle text classification challenges when designed and utilized strategically.
With the exponential spread of misinformation across digital platforms, detecting fake news has become one of the most critical challenges of our time. This project demonstrates that:
- Core data structures can perform meaningful text analysis
- Algorithmic approaches complement machine learning solutions
- Efficient design can achieve classification without heavy computational overhead
- ✅ Trie Implementation - Prefix-based storage and fast keyword lookups
- ✅ HashMap Integration - O(1) word frequency mapping and retrieval
- ✅ Text Preprocessing Pipeline - Cleaning, normalization, and tokenization
- ✅ Dual Dataset Analysis - Comparative analysis between real and fake articles
- ✅ Modular Architecture - Clean, extensible, beginner-friendly codebase
- ✅ Suspicious Keyword Detection - Pattern matching against known misinformation indicators
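The Text Preprocessing Pipeline bullet above can be sketched as follows. The function name and the specific rules (lowercasing, stripping non-letters, dropping short tokens) are illustrative assumptions, not the project's exact implementation:

```python
import re

def preprocess(text, min_word_length=3):
    """Clean, normalize, and tokenize an article (illustrative sketch)."""
    text = text.lower()                    # normalize case
    text = re.sub(r"[^a-z\s]", " ", text)  # strip punctuation and digits
    tokens = text.split()                  # simple whitespace tokenization
    return [t for t in tokens if len(t) >= min_word_length]
```

The `min_word_length` default mirrors the `MIN_WORD_LENGTH` setting shown later in the configuration example.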
| Data Structure | Purpose |
|---|---|
| Trie | Stores and searches suspicious/frequent words using prefix-based lookups for efficient pattern detection |
| HashMap | Maps keywords to frequency and context scores for average-case O(1) retrieval |
| Queue (optional) | Sequential processing of text blocks for stepwise analysis |
| Graph (optional) | Represents relationships between co-occurring words and misinformation spread patterns |
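The Trie's role in the table above can be sketched as a minimal prefix tree; this is an illustrative implementation, not the repository's actual `trie.py`:

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # char -> TrieNode
        self.is_word = False

class Trie:
    """Prefix tree for suspicious-keyword storage and lookup (sketch)."""
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def search(self, word):
        """Exact-word lookup in O(len(word))."""
        node = self.root
        for ch in word:
            node = node.children.get(ch)
            if node is None:
                return False
        return node.is_word

    def starts_with(self, prefix):
        """Prefix lookup, useful for pattern detection on word stems."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return False
        return True
```

Because lookups walk one node per character, both exact and prefix queries cost O(L) for a word of length L, independent of how many keywords are stored.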
```
fake-news-detector/
├── dataset/
│   ├── fake.csv            # Curated fake news articles
│   ├── real.csv            # Curated real news articles
├── trie.py                 # Trie data structure implementation
├── hashmap.py              # HashMap implementation
├── graph.py                # Graph structure (optional)
├── article_fetcher.py      # Dataset loading utilities
├── main.py                 # Main execution script
├── demo.py                 # Demo/example usage
├── suspicious_keywords.txt # List of known fake news indicators
├── config.py               # Configuration settings
├── comprehensive_test.py   # Full test suite
├── requirements.txt        # Python dependencies
└── README.md               # Project documentation
```
Source:
- Fake News Detection Datasets by Emine YETİM (Kaggle)
- Modified dataset (real.csv & fake.csv) [Google Drive]
Details:
- Custom-curated `real.csv` and `fake.csv` extracted from the Kaggle dataset
- Cleaned and preprocessed to remove nulls, special characters, and redundant fields
- Approximately 20,000 entries, balanced between the real and fake categories
Format Example:
| text | source | label |
|---|---|---|
| "Breaking: Govt launches new scheme" | reuters.com | real |
| "Alien ship spotted over city!" | clickbaitnews.net | fake |
- Python 3.7 or higher
- pip package manager
Clone the repository:
```bash
git clone https://github.com/Nikhil-Jones/Fake-News-Detector.git
cd Fake-News-Detector
```

Run the detector:

```bash
python main.py
```

Run the comprehensive test suite:

```bash
python comprehensive_test.py
```

- Load Dataset - Import and parse `real.csv` and `fake.csv`
- Preprocess Text - Clean, normalize, and tokenize articles
- Build Trie - Store processed keywords for pattern detection
- Populate HashMap - Map keyword frequencies and context scores
- Pattern Matching - Compare against suspicious keyword lists
- Classification - Classify articles as Fake or Real based on combined metrics
- Output Results - Display classification with confidence scores
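The workflow above can be condensed into a toy scoring function. The suspicious-token ratio used here, and the example threshold, are illustrative stand-ins for the project's combined metrics:

```python
from collections import Counter

def classify(tokens, suspicious_words, threshold=0.3):
    """Label an article fake/real from the share of suspicious tokens (toy metric)."""
    freq = Counter(tokens)  # HashMap role: word -> frequency
    hits = sum(count for word, count in freq.items() if word in suspicious_words)
    score = hits / max(len(tokens), 1)  # fraction of suspicious tokens
    label = "fake" if score >= threshold else "real"
    return label, score
```

In the real pipeline the membership test against `suspicious_words` would be a Trie lookup, and the threshold would come from `config.py` rather than a hard-coded default.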
Edit config.py to customize:
```python
# Example configuration
SUSPICIOUS_KEYWORDS_FILE = 'suspicious_keywords.txt'
MIN_WORD_LENGTH = 3
MAX_FREQUENCY_THRESHOLD = 100
CLASSIFICATION_THRESHOLD = 0.75
```

| Member | Role |
|---|---|
| Nikhil Jones A (CB.SC.U4CSE24031) | Data Preprocessing, Trie Implementation & Core Logic |
| Gubba Rohan (CB.SC.U4CSE24016) | Graph, Integration & Optimization |
| Muthu Rupesh MJ (CB.SC.U4CSE24030) | HashMap Development & Dataset Creation |
| Manohar Ravva (CB.SC.U4CSE24040) | HashMap Development |
- Hybrid ML Model - Integrate Naive Bayes/Logistic Regression for improved accuracy
- Graph Visualization - Visualize word relationships and misinformation spread patterns
- Web Interface - Real-time fake news detection through a web application
- Multilingual Support - Expand to Hindi, Tamil, and other regional languages
- API Development - RESTful API for integration with external applications
- Performance Optimization - Parallel processing for large-scale dataset analysis
This project was developed as part of a Data Structures and Algorithms case study, demonstrating how efficient hybrid data structures can perform meaningful text classification without heavy machine learning dependencies.
Dataset Source:
Emine YETİM, Fake News Detection Datasets, Kaggle