apurva1334/AI-text-summarizer

🧠 AI-Powered Text Summarizer

Python Streamlit spaCy Transformers License

Transform lengthy documents, articles, and web pages into concise summaries using advanced NLP.

Demo • Features • Installation • Usage • Architecture


📖 Overview

AI-Powered Text Summarizer is a comprehensive NLP application that analyzes and condenses large volumes of text into concise, meaningful summaries.
Built with Python, Streamlit, and Transformers, it supports multiple input sources (pasted text, PDF/DOCX/TXT files, and website URLs), making it a versatile summarization tool for researchers, developers, and professionals.

Demo


✨ Features

🔄 Multi-Source Input

  • 📝 Direct Text Input: Paste or type text directly
  • 📄 File Upload: Supports TXT, PDF, and DOCX formats
  • 🌐 Website URLs: Extracts and summarizes content from web pages

🤖 Dual Summarization Methods

  • 📊 Extractive Summarization: Identifies key sentences using TextRank and TF-IDF
  • 🎨 Abstractive Summarization: Generates human-like summaries using Transformer models (BART)
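
To make the extractive idea concrete, here is a minimal sketch of TF-IDF sentence scoring using only the standard library. It is an illustration under stated assumptions, not the project's actual code: the function names (`split_sentences`, `tfidf_summary`) and the regex-based sentence splitter are hypothetical, and the real implementation in utils/summarizer.py may rely on spaCy or NLTK instead.

```python
import math
import re
from collections import Counter

# Hypothetical helpers for illustration; the project's own summarizer may differ.
_SENTENCE_BREAK = re.compile(r"(?<=[.!?])\s+")
_WORD = re.compile(r"[a-z0-9']+")


def split_sentences(text):
    """Naive splitter: break on whitespace that follows ., ! or ?."""
    return [s.strip() for s in _SENTENCE_BREAK.split(text.strip()) if s.strip()]


def tfidf_summary(text, num_sentences=2):
    """Return the top-scoring sentences, kept in their original order."""
    sentences = split_sentences(text)
    if len(sentences) <= num_sentences:
        return sentences

    tokenized = [_WORD.findall(s.lower()) for s in sentences]
    total = len(sentences)
    # Document frequency: number of sentences each word appears in.
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))

    scores = []
    for tokens in tokenized:
        if not tokens:
            scores.append(0.0)
            continue
        term_freq = Counter(tokens)
        # Sum of (term frequency * inverse document frequency) over the sentence.
        score = sum(
            (count / len(tokens)) * math.log(total / doc_freq[word])
            for word, count in term_freq.items()
        )
        scores.append(score)

    best = sorted(range(total), key=lambda i: scores[i], reverse=True)[:num_sentences]
    return [sentences[i] for i in sorted(best)]
```

Each sentence is treated as a "document" here, so words that appear in every sentence contribute nothing to the score, while distinctive words raise it.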

🎯 Advanced Capabilities

  • 📈 Text statistics: word count, reduction rate, processing time
  • 🔍 Smart handling of complex document structures
  • 📱 Clean and modern Streamlit UI
  • 💾 Export summaries as downloadable text files
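
As a rough illustration of the text statistics listed above, the sketch below computes word counts and the reduction rate for a summary. The function name `summary_stats` and the dictionary keys are assumptions made for this example; the app's own metrics may be calculated differently.

```python
def summary_stats(original, summary, elapsed_seconds):
    """Word counts and percentage reduction (hypothetical helper, not the app's API)."""
    original_words = len(original.split())
    summary_words = len(summary.split())
    # Reduction rate: share of the original word count removed by summarizing.
    if original_words == 0:
        reduction = 0.0
    else:
        reduction = (1 - summary_words / original_words) * 100
    return {
        "original_words": original_words,
        "summary_words": summary_words,
        "reduction_pct": round(reduction, 1),
        "seconds": round(elapsed_seconds, 3),
    }
```

For example, a 10-word input condensed to 3 words gives a reduction rate of 70%. The elapsed time would typically be measured with `time.perf_counter()` around the summarization call.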

🛠 Technical Highlights

  • ⚡ Real-time progress indicators
  • 🔧 Adjustable summary length and options
  • 📊 Built-in analytics and performance metrics

🚀 Quick Start

Prerequisites

  • Python 3.8+
  • pip package manager

Installation

# Clone the repository
git clone <repository-url>
cd AI-text-summarizer

# Create a virtual environment
python -m venv venv

# Activate it
# Windows
venv\Scripts\activate
# macOS/Linux
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Download NLP models
python -m spacy download en_core_web_sm
python -c "import nltk; nltk.download('punkt'); nltk.download('stopwords')"

Run the Application

streamlit run app.py

Access the app at http://localhost:8501


📁 Project Structure

ai-text-summarizer/
├── app.py                 # Streamlit main app
├── requirements.txt       # Dependencies
├── utils/                 # Processing modules
│   ├── file_processor.py  # File parsing (PDF, DOCX, TXT)
│   ├── summarizer.py      # Summarization algorithms
│   └── web_scraper.py     # Website content extraction
├── static/                # Styling and scripts
│   ├── css/style.css
│   └── js/script.js
├── templates/             # HTML templates
│   └── index.html
└── test_files/            # Sample test files
    ├── sample.txt
    └── sample.pdf

🎮 Usage Guide

  1. Select Input Method
    • Paste text, upload files, or enter website URLs.
  2. Configure Settings
    • Choose Extractive or Abstractive summarization.
    • Adjust summary length (10–500 words).
  3. Generate Summary
    • Click 🚀 Generate Summary to view real-time progress.
  4. Export
    • Download the summary as text or copy it directly.

๐Ÿ— Architecture

Core Components

1. Input Processing Layer

  • File Processor (PDF, DOCX, TXT)
  • Web Scraper (BeautifulSoup)
  • Text Normalizer
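
The following sketch shows one way the web extraction and normalization steps could fit together. Note the substitution: the project uses BeautifulSoup, while this example uses the standard library's `html.parser` so that it runs without extra dependencies. The names `html_to_text` and `normalize_text` are hypothetical, and fetching the page over the network is left out.

```python
import re
import unicodedata
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text and skips script, style, and noscript content."""

    SKIP_TAGS = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def normalize_text(text):
    """Apply Unicode NFKC normalization and collapse runs of whitespace."""
    text = unicodedata.normalize("NFKC", text)
    return re.sub(r"\s+", " ", text).strip()


def html_to_text(html):
    """Extract visible text from an HTML string (stdlib stand-in for BeautifulSoup)."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return normalize_text(" ".join(parser.parts))
```

Keeping normalization separate from extraction means the same `normalize_text` step could also be applied to text pulled from PDF, DOCX, or TXT files.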

2. NLP Engine

  • Extractive: spaCy + NLTK + TextRank/TF-IDF
  • Abstractive: Transformer models (Facebook BART)
  • Context-aware, fluent, and coherent summaries
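
For readers unfamiliar with TextRank, here is a compact, dependency-free sketch of the ranking step. It assumes sentences are already split, uses a word-overlap similarity in the style of the original TextRank formulation, and runs a PageRank-style iteration with scores normalized to sum to roughly 1. The function names and the default parameters (`damping=0.85`, `iterations=50`) are illustrative choices, not values taken from this project.

```python
import math
import re


def _tokens(sentence):
    """Lowercased set of word tokens in a sentence."""
    return set(re.findall(r"[a-z0-9']+", sentence.lower()))


def _similarity(a, b):
    """Shared-word count normalized by the log of each sentence's size."""
    if len(a) < 2 or len(b) < 2:
        return 0.0  # avoids log(1) = 0 in the denominator
    return len(a & b) / (math.log(len(a)) + math.log(len(b)))


def textrank_scores(sentences, damping=0.85, iterations=50):
    """Score sentences by iterating PageRank over the similarity graph."""
    n = len(sentences)
    if n == 0:
        return []
    token_sets = [_tokens(s) for s in sentences]
    weights = [
        [_similarity(token_sets[i], token_sets[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sums = [sum(row) for row in weights]

    scores = [1.0 / n] * n
    for _ in range(iterations):
        updated = []
        for i in range(n):
            # Each neighbor j passes on its score in proportion to edge weight.
            incoming = sum(
                weights[j][i] / out_sums[j] * scores[j]
                for j in range(n)
                if out_sums[j] > 0
            )
            updated.append((1 - damping) / n + damping * incoming)
        scores = updated
    return scores
```

A summary would then be built by taking the highest-scoring sentences and presenting them in their original order. Sentences that share no words with any other sentence receive only the baseline score `(1 - damping) / n`.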

3. Presentation Layer

  • Streamlit front-end
  • Live updates and summary statistics
  • Export options

🧩 Tech Stack

| Component      | Technology          | Purpose                      |
|----------------|---------------------|------------------------------|
| Frontend       | Streamlit           | Web UI                       |
| NLP Processing | spaCy, NLTK         | Tokenization, parsing        |
| AI Models      | Transformers (BART) | Abstractive summarization    |
| File Handling  | PyPDF2, python-docx | Input parsing                |
| Web Scraping   | BeautifulSoup4      | Extracting content from URLs |

🔧 Configuration & Customization

  • Choose summarization type and length.
  • Enable/disable statistics and key phrase highlighting.
  • No external configuration files required; all settings are available in the UI.

📊 Performance

| Feature          | Metric                    |
|------------------|---------------------------|
| Max Input Length | 10,000+ words             |
| Processing Time  | 2–10 seconds              |
| Text Reduction   | 60–80%                    |
| Accuracy         | High contextual retention |

Supported Formats:

  • ✅ TXT
  • ✅ PDF (non-scanned)
  • ✅ DOCX
  • ✅ Web URLs (static pages)
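
One possible way to route an input to the right handler, given the supported formats above, is to classify it before processing. This is a sketch only: `classify_source` and `SUPPORTED_EXTENSIONS` are hypothetical names, and the app itself may decide the input type from the Streamlit widget that was used rather than from the string.

```python
from pathlib import PurePath
from urllib.parse import urlparse

# Assumed to mirror the formats listed in this README.
SUPPORTED_EXTENSIONS = {".txt", ".pdf", ".docx"}


def classify_source(source):
    """Return 'url', 'txt', 'pdf', or 'docx'; raise ValueError otherwise."""
    parsed = urlparse(source)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "url"
    suffix = PurePath(source).suffix.lower()
    if suffix in SUPPORTED_EXTENSIONS:
        return suffix.lstrip(".")
    raise ValueError(f"Unsupported input: {source!r}")
```

Rejecting unknown inputs early gives the user a clear error instead of a parsing failure later on. Scanned PDFs and dynamically rendered pages would still pass this check, so they need separate handling.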

๐Ÿ› Troubleshooting

Common Issues:

# Missing modules
pip install -r requirements.txt

# spaCy model missing
python -m spacy download en_core_web_sm

# NLTK data missing
python -c "import nltk; nltk.download('punkt'); nltk.download('stopwords')"

Performance Tips:

  • Use Extractive for faster results.
  • Start with medium summary length for long docs.
  • Ensure stable internet for model downloads.

🤝 Contributing

Contributions are welcome!
To contribute:

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Test thoroughly
  5. Submit a pull request

Possible Improvements

  • More file format support
  • Multilingual summarization
  • Enhanced scraping
  • Custom model training

📄 License

This project is licensed under the MIT License.
See the LICENSE file for details.


🙏 Acknowledgments


Built with ❤️ using Python and modern NLP technologies.
Transform the way you process information with AI-powered summarization.
