Transform lengthy documents, articles, and web pages into concise summaries using advanced NLP.
Demo • Features • Installation • Usage • Architecture
AI-Powered Text Summarizer is a comprehensive NLP application that analyzes and condenses large volumes of text into concise, meaningful summaries.
Built with Python, Streamlit, and Transformers, it supports multiple input sources (text, files in PDF, DOCX, or TXT format, and website URLs), making it a versatile summarization tool for researchers, developers, and professionals.
- Direct Text Input: paste or type text directly
- File Upload: supports TXT, PDF, and DOCX formats
- Website URLs: extracts and summarizes content from web pages
- Extractive Summarization: identifies key sentences using TextRank and TF-IDF
- Abstractive Summarization: generates human-like summaries using Transformer models (BART)
- Text statistics: word count, reduction rate, processing time
- Smart handling of complex document structures
- Clean and modern Streamlit UI
- Export summaries as downloadable text files
- Real-time progress indicators
- Adjustable summary length and options
- Built-in analytics and performance metrics
- Python 3.8+
- pip package manager
# Clone the repository
git clone <repository-url>
cd ai-text-summarizer
# Create a virtual environment
python -m venv venv
# Activate it
# Windows
venv\Scripts\activate
# macOS/Linux
source venv/bin/activate
# Install dependencies
pip install -r requirements.txt
# Download NLP models
python -m spacy download en_core_web_sm
python -c "import nltk; nltk.download('punkt')"

streamlit run app.py

Access the app at http://localhost:8501
ai-text-summarizer/
├── app.py                 # Streamlit main app
├── requirements.txt       # Dependencies
├── utils/                 # Processing modules
│   ├── file_processor.py  # File parsing (PDF, DOCX, TXT)
│   ├── summarizer.py      # Summarization algorithms
│   └── web_scraper.py     # Website content extraction
├── static/                # Styling and scripts
│   ├── css/style.css
│   └── js/script.js
├── templates/             # HTML templates
│   └── index.html
└── test_files/            # Sample test files
    ├── sample.txt
    └── sample.pdf
- Select Input Method
  - Paste text, upload files, or enter website URLs.
- Configure Settings
  - Choose Extractive or Abstractive summarization.
  - Adjust summary length (10 to 500 words).
- Generate Summary
  - Click Generate Summary; progress is shown in real time.
- Export
  - Download the summary as text or copy it directly.
- File Processor (PDF, DOCX, TXT)
- Web Scraper (BeautifulSoup)
- Text Normalizer
- Extractive: spaCy + NLTK + TextRank/TF-IDF
- Abstractive: Transformer models (Facebook BART)
- Context-aware, fluent, and coherent summaries
- Streamlit front-end
- Live updates and summary statistics
- Export options
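
To make the extractive path more concrete, here is a minimal sketch of TF-IDF sentence scoring in plain Python. It is not the code in `utils/summarizer.py`: the function names (`split_sentences`, `tokenize`, `summarize_tfidf`), the regex-based sentence splitter, the smoothed IDF formula, and the length normalization are all assumptions made for illustration. The project itself relies on spaCy and NLTK for tokenization, and TextRank is not shown.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    """Naive splitter (assumption): break after ., ! or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(sentence):
    """Lowercase word tokens; stop-word removal is omitted in this sketch."""
    return re.findall(r"[a-z0-9']+", sentence.lower())


def summarize_tfidf(text, max_sentences=2):
    """Return the highest-scoring sentences, kept in their original order."""
    sentences = split_sentences(text)
    if len(sentences) <= max_sentences:
        return sentences

    tokenized = [tokenize(s) for s in sentences]
    n = len(sentences)

    # Document frequency: the number of sentences each term appears in.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    def score(tokens):
        if not tokens:
            return 0.0
        tf = Counter(tokens)
        # Smoothed IDF keeps every term weight positive.
        total = sum(
            count * (math.log((1 + n) / (1 + df[term])) + 1.0)
            for term, count in tf.items()
        )
        # Normalize by length so long sentences do not win by default.
        return total / len(tokens)

    ranked = sorted(range(n), key=lambda i: score(tokenized[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])
    return [sentences[i] for i in keep]
```

Calling `summarize_tfidf(text, max_sentences=3)` returns a list of up to three sentences from `text`. Each sentence is treated as a "document" for the IDF term, which favors sentences containing words that are rare elsewhere in the input.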
| Component | Technology | Purpose |
|---|---|---|
| Frontend | Streamlit | Web UI |
| NLP Processing | spaCy, NLTK | Tokenization, parsing |
| AI Models | Transformers (BART) | Abstractive summarization |
| File Handling | PyPDF2, python-docx | Input parsing |
| Web Scraping | BeautifulSoup4 | Extracting content from URLs |
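
As a rough illustration of how file handling with these libraries could be wired together, the sketch below dispatches on the file extension. The names `extract_text`, `READERS`, `read_txt`, `read_pdf`, and `read_docx` are hypothetical and not taken from `utils/file_processor.py`. It assumes a PyPDF2 release that provides `PdfReader`; older releases expose a different reader class.

```python
from pathlib import Path


def read_txt(path):
    # Replace undecodable bytes rather than failing on mixed encodings.
    return Path(path).read_text(encoding="utf-8", errors="replace")


def read_pdf(path):
    # Imported lazily so TXT-only use does not require PyPDF2.
    from PyPDF2 import PdfReader

    reader = PdfReader(str(path))
    # extract_text() can yield nothing for scanned (image-only) pages.
    return "\n".join((page.extract_text() or "") for page in reader.pages)


def read_docx(path):
    from docx import Document  # provided by the python-docx package

    return "\n".join(p.text for p in Document(str(path)).paragraphs)


# Map each supported extension to its reader function.
READERS = {".txt": read_txt, ".pdf": read_pdf, ".docx": read_docx}


def extract_text(path):
    """Pick a reader from the file extension and return the plain text."""
    suffix = Path(path).suffix.lower()
    try:
        reader = READERS[suffix]
    except KeyError:
        raise ValueError(f"Unsupported file type: {suffix or '(none)'}") from None
    return reader(path)
```

A table of readers keeps the dispatch in one place, so supporting another format would mean adding one function and one dictionary entry.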
- Choose summarization type and length.
- Enable/disable statistics and key phrase highlighting.
- No external configuration files are required; all settings are available in the UI.
| Feature | Metric |
|---|---|
| Max Input Length | 10,000+ words |
| Processing Time | 2–10 seconds |
| Text Reduction | 60–80% |
| Accuracy | High contextual retention |
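
The README does not define how text reduction is measured. One plausible reading, sketched below, is the percentage of words removed relative to the original. The function name `summary_stats`, the whitespace-based word count, and the dictionary keys are assumptions for illustration only.

```python
import time


def summary_stats(original, summary, elapsed_seconds):
    """Word counts, percentage of words removed, and elapsed time."""
    original_words = len(original.split())
    summary_words = len(summary.split())
    reduction = 0.0
    if original_words:  # avoid dividing by zero on empty input
        reduction = 100.0 * (1 - summary_words / original_words)
    return {
        "original_words": original_words,
        "summary_words": summary_words,
        "reduction_percent": round(reduction, 1),
        "processing_seconds": round(elapsed_seconds, 3),
    }


start = time.perf_counter()
original = "word " * 200  # stand-in for a 200-word document
summary = "word " * 50    # stand-in for a 50-word summary
stats = summary_stats(original, summary, time.perf_counter() - start)
print(stats["reduction_percent"])  # 75.0
```

Under this definition, a 200-word input condensed to 50 words gives a 75% reduction, which falls inside the 60–80% range quoted in the table.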
Supported Formats:
- TXT
- PDF (non-scanned)
- DOCX
- Web URLs (static pages)
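
The "static pages" restriction on URLs follows from how text is pulled out of HTML: only markup present in the downloaded page can be read, so content rendered later by JavaScript is missed. The project uses BeautifulSoup for this step. The sketch below swaps in the standard library's `html.parser` to stay dependency-free, and the names `VisibleTextParser` and `html_to_text` are hypothetical. Fetching the page is left out.

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collects text that sits outside tags with no readable content."""

    SKIP = {"script", "style", "noscript", "template"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text only when not inside a skipped tag.
        if not self._skip_depth and data.strip():
            self.chunks.append(" ".join(data.split()))


def html_to_text(html):
    """Return the visible text of an HTML string as one line."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

The returned string can then be passed to either summarization mode in the same way as pasted text.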
Common Issues:
# Missing modules
pip install -r requirements.txt
# spaCy model missing
python -m spacy download en_core_web_sm
# NLTK data missing
python -c "import nltk; nltk.download('punkt'); nltk.download('stopwords')"

Performance Tips:
- Use Extractive for faster results.
- Start with medium summary length for long docs.
- Ensure stable internet for model downloads.
Contributions are welcome!
To contribute:
- Fork the repository
- Create a feature branch
- Make your changes
- Test thoroughly
- Submit a pull request
- More file format support
- Multilingual summarization
- Enhanced scraping
- Custom model training
This project is licensed under the MIT License.
See the LICENSE file for details.
- spaCy: NLP toolkit
- Hugging Face: Transformer models
- Streamlit: interactive frontend
- NLTK: text processing
Built with ❤️ using Python and Modern NLP Technologies
Transform the way you process information with AI-powered summarization.