A Data Structures and Algorithms case study demonstrating efficient news classification using Trie and HashMap implementations.
This project implements a Fake News Detection System that leverages core data structures to analyze and classify news articles as real or fake. By combining Trie and HashMap data structures with intelligent keyword analysis, the system performs efficient textual classification based on keyword frequency, patterns, and contextual analysis.
Unlike traditional machine learning approaches, this project showcases how fundamental data structures can effectively tackle text classification challenges when designed and utilized strategically.
With the exponential spread of misinformation across digital platforms, detecting fake news has become one of the most critical challenges of our time. This project demonstrates that:
- Core data structures can perform meaningful text analysis
- Algorithmic approaches complement machine learning solutions
- Efficient design can achieve classification without heavy computational overhead
- ✅ Trie Implementation - Prefix-based storage and fast keyword lookups
- ✅ HashMap Integration - O(1) word frequency mapping and retrieval
- ✅ Text Preprocessing Pipeline - Cleaning, normalization, and tokenization
- ✅ Dual Dataset Analysis - Comparative analysis between real and fake articles
- ✅ Modular Architecture - Clean, extensible, beginner-friendly codebase
- ✅ Suspicious Keyword Detection - Pattern matching against known misinformation indicators
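The Text Preprocessing Pipeline bullet above can be sketched as follows. The function name and the specific rules (lowercasing, stripping non-letters, dropping short tokens) are illustrative assumptions, not the project's exact implementation:

```python
import re

def preprocess(text, min_word_length=3):
    """Clean, normalize, and tokenize an article (illustrative sketch)."""
    text = text.lower()                    # normalize case
    text = re.sub(r"[^a-z\s]", " ", text)  # strip punctuation and digits
    tokens = text.split()                  # simple whitespace tokenization
    return [t for t in tokens if len(t) >= min_word_length]
```

The `min_word_length` default mirrors the `MIN_WORD_LENGTH` setting shown later in the configuration example.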
| Data Structure | Purpose |
|---|---|
| Trie | Stores and searches suspicious/frequent words using prefix-based lookups for efficient pattern detection |
| HashMap | Maps keywords to frequency and context scores for average-case O(1) retrieval |
| Queue (optional) | Sequential processing of text blocks for stepwise analysis |
| Graph (optional) | Represents relationships between co-occurring words and misinformation spread patterns |
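The Trie's role in the table above can be sketched as a minimal prefix tree; this is an illustrative implementation, not the repository's actual `trie.py`:

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # char -> TrieNode
        self.is_word = False

class Trie:
    """Prefix tree for suspicious-keyword storage and lookup (sketch)."""
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def search(self, word):
        """Exact-word lookup in O(len(word))."""
        node = self.root
        for ch in word:
            node = node.children.get(ch)
            if node is None:
                return False
        return node.is_word

    def starts_with(self, prefix):
        """Prefix lookup, useful for pattern detection on word stems."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return False
        return True
```

Because lookups walk one node per character, both exact and prefix queries cost O(L) for a word of length L, independent of how many keywords are stored.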
```
fake-news-detector/
├── dataset/
│   ├── fake.csv            # Curated fake news articles
│   ├── real.csv            # Curated real news articles
├── trie.py                 # Trie data structure implementation
├── hashmap.py              # HashMap implementation
├── graph.py                # Graph structure (optional)
├── article_fetcher.py      # Dataset loading utilities
├── main.py                 # Main execution script
├── demo.py                 # Demo/example usage
├── suspicious_keywords.txt # List of known fake news indicators
├── config.py               # Configuration settings
├── comprehensive_test.py   # Full test suite
├── requirements.txt        # Python dependencies
└── README.md               # Project documentation
```
Source:
- Fake News Detection Datasets by Emine YETİM (Kaggle)
- Modified dataset (real.csv & fake.csv) [Google Drive]
Details:
- Custom-curated `real.csv` and `fake.csv` extracted from the Kaggle dataset
- Cleaned and preprocessed to remove nulls, special characters, and redundant fields
- Approximately 20,000 entries, balanced between the real and fake categories
Format Example:
| text | source | label |
|---|---|---|
| "Breaking: Govt launches new scheme" | reuters.com | real |
| "Alien ship spotted over city!" | clickbaitnews.net | fake |
- Python 3.7 or higher
- pip package manager
Clone the repository:
```bash
git clone https://github.com/Nikhil-Jones/Fake-News-Detector.git
cd Fake-News-Detector
```

Run the detector:

```bash
python main.py
```

Run the comprehensive test suite:

```bash
python comprehensive_test.py
```

- Load Dataset - Import and parse `real.csv` and `fake.csv`
- Preprocess Text - Clean, normalize, and tokenize articles
- Build Trie - Store processed keywords for pattern detection
- Populate HashMap - Map keyword frequencies and context scores
- Pattern Matching - Compare against suspicious keyword lists
- Classification - Classify articles as Fake or Real based on combined metrics
- Output Results - Display classification with confidence scores
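The workflow above can be condensed into a toy scoring function. The suspicious-token ratio used here, and the example threshold, are illustrative stand-ins for the project's combined metrics:

```python
from collections import Counter

def classify(tokens, suspicious_words, threshold=0.3):
    """Label an article fake/real from the share of suspicious tokens (toy metric)."""
    freq = Counter(tokens)  # HashMap role: word -> frequency
    hits = sum(count for word, count in freq.items() if word in suspicious_words)
    score = hits / max(len(tokens), 1)  # fraction of suspicious tokens
    label = "fake" if score >= threshold else "real"
    return label, score
```

In the real pipeline the membership test against `suspicious_words` would be a Trie lookup, and the threshold would come from `config.py` rather than a hard-coded default.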
Edit config.py to customize:
```python
# Example configuration
SUSPICIOUS_KEYWORDS_FILE = 'suspicious_keywords.txt'
MIN_WORD_LENGTH = 3
MAX_FREQUENCY_THRESHOLD = 100
CLASSIFICATION_THRESHOLD = 0.75
```

| Member | Role |
|---|---|
| Nikhil Jones A (CB.SC.U4CSE24031) | Data Preprocessing, Trie Implementation & Core Logic |
| Gubba Rohan (CB.SC.U4CSE24016) | Graph, Integration & Optimization |
| Muthu Rupesh MJ (CB.SC.U4CSE24030) | HashMap Development & Dataset Creation |
| Manohar Ravva (CB.SC.U4CSE24040) | HashMap Development |
- Hybrid ML Model - Integrate Naive Bayes/Logistic Regression for improved accuracy
- Graph Visualization - Visualize word relationships and misinformation spread patterns
- Web Interface - Real-time fake news detection through a web application
- Multilingual Support - Expand to Hindi, Tamil, and other regional languages
- API Development - RESTful API for integration with external applications
- Performance Optimization - Parallel processing for large-scale dataset analysis
This project was developed as part of a Data Structures and Algorithms case study, demonstrating how efficient hybrid data structures can perform meaningful text classification without heavy machine learning dependencies.
Dataset Source:
Emine YETİM, Fake News Detection Datasets, Kaggle