From 4b796acc05b253caa4eddab19e4af0593c9ac383 Mon Sep 17 00:00:00 2001
From: Santhosh Sharan B <143705995+sandynaukar@users.noreply.github.com>
Date: Fri, 6 Feb 2026 08:55:53 +0530
Subject: [PATCH] Revised README for clarity and structure

Updated README to reflect repository structure and usage instructions
for the Abstract Summarizer project.
---
 README.md | 143 ++++++++++++------------------------------
 1 file changed, 32 insertions(+), 111 deletions(-)

diff --git a/README.md b/README.md
index 5effb38..1296e18 100644
--- a/README.md
+++ b/README.md
@@ -1,129 +1,50 @@
-📄 Abstract Summarizer
-
-Abstract Summarizer is a powerful NLP-based tool that condenses long pieces of text into shorter, meaningful summaries while preserving key information and overall context.
-It leverages modern deep learning models to generate high-quality extractive and abstractive summaries.
-
-📌 Table of Contents
-
-Features
-
-Installation
-
-Usage
-
-Model Architecture
-
-Dataset
-
-Results
-
-Contributing
-
-🚀 Features
-
-✅ Extractive Summarization
-Selects the most important sentences directly from the original text.
-
-✅ Abstractive Summarization
-Generates new, human-like summary sentences capturing the main ideas.
-
-✅ Multiple Input Formats
-
-Plain text files
-
-PDF documents
-
-Web pages
-
-✅ User-Friendly Interface
-
-Command-line interface (CLI)
-
-Web-based UI
-
-⚙️ Installation
-1️⃣ Clone the repository
-git clone https://github.com/sandynaukar/Abstract-Summarizer.git
-
-2️⃣ Navigate to project directory
-cd Abstract-Summarizer
-
-3️⃣ Create virtual environment
-python -m venv venv
-
-4️⃣ Activate environment
-
-Windows
-
-venv\Scripts\activate
-
-
-macOS/Linux
-
-source venv/bin/activate
-
-5️⃣ Install dependencies
-pip install -r requirements.txt
-
-🧠 Usage
-▶ Command Line Interface
-
-Summarize a text file:
-
-python summarize.py --input path/to/your/textfile.txt --output summary.txt
-
-🌐 Web Interface
-
-Run:
-
-python app.py
-
-
-Open browser:
-
-http://localhost:5000
-
-🏗 Model Architecture
-
-This project uses BART (Bidirectional and Auto-Regressive Transformers) from the Hugging Face Transformers library.
-
-Why BART?
-
-Designed for sequence-to-sequence tasks
-
-Excellent for text generation & summarization
-
-Combines encoder-decoder transformer architecture
-
-📚 Dataset
-
-The model was fine-tuned on the CNN/Daily Mail Dataset, containing:
-
-📰 300,000+ news articles
-
-✍ Human-written summaries
-
-Widely used benchmark for summarization research.
-
-📊 Results
-
-Performance evaluated using ROUGE metrics:
-
-Metric  Score
-ROUGE-1 44.16
-ROUGE-2 21.28
-ROUGE-L 40.90
-
-👉 Indicates strong content retention and fluent summaries.
-
-🤝 Contributing
-
-Contributions are welcome!
-
-Fork the repository
-
-Create a feature branch
-
-Submit a pull request
-
-Feel free to improve models, UI, datasets, or performance.
+# Abstract Summarizer
+
+This repository contains various components for abstractive text summarization using the T5 (Text-to-Text Transfer Transformer) model. The project explores both using a pre-trained base T5 model and fine-tuning T5 for improved summarization, along with a demonstration application.
+
+## Repository Structure
+
+The repository is organized into two main directories, `no-tuning` and `tuning`, each serving a distinct purpose in the summarization workflow, plus a main application in the root.
+
+### `app.py` (Root Directory)
+
+The `app.py` in the root directory is the primary demonstration of the project's capabilities. It is a Streamlit application that showcases summarization using a fine-tuned T5 model hosted on Hugging Face (`admin-sauce/t5-summarizer`). This is intended to be the user-facing application for quick and efficient summarization.
+
+### `no-tuning/`
+
+This directory demonstrates text summarization using a **generic, pre-trained `t5-base` model** without any custom fine-tuning. It provides a foundational understanding of how T5 can be used for summarization out of the box.
+
+- **`no-tuning/app.py`**: A Flask web application that provides a simple interface for abstractive summarization. Users can input raw text, and the application will generate a summary using the `t5-base` model. It also includes logic to chunk larger input texts for processing.
+- **`no-tuning/main.ipynb`**: A Jupyter Notebook illustrating how to extract text from various document formats (PDF, DOCX) and then summarize the extracted content using the `t5-base` model. This notebook provides the experimental code for document processing, which can be integrated into summarization workflows.
+- **`no-tuning/static/` & `no-tuning/templates/`**: These subdirectories contain the static assets (CSS) and HTML templates for the Flask web application, defining its user interface and styling.
+
+### `tuning/`
+
+This directory is dedicated to **fine-tuning and evaluating T5 models** for specific summarization tasks. It contains scripts and resources for training and benchmarking custom summarization models.
+
+- **`tuning/evaluate.py`**: A Python script for evaluating fine-tuned summarization models. It uses benchmark datasets (e.g., `big_patent`) and metrics like ROUGE to assess the performance of a model such as `KipperDev/t5_summarizer_model`. This script helps in understanding the effectiveness of fine-tuning efforts.
+- **`tuning/t5/`**: (Hypothesized) This subdirectory is expected to contain the core scripts or Jupyter Notebooks used for the actual fine-tuning of the T5 model. This is where the base T5 model would be adapted using specific datasets to create specialized summarization models.
+
+## Models Used
+
+Throughout this project, the following T5 models are used:
+
+- **`t5-base`**: The generic, pre-trained T5 model used for baseline summarization in the `no-tuning` components.
+- **`admin-sauce/t5-summarizer`**: A fine-tuned T5 model, likely developed within this project or sourced externally, used in the main Streamlit `app.py` for high-performance summarization.
+- **`KipperDev/t5_summarizer_model`**: Another fine-tuned T5 model, evaluated by the `tuning/evaluate.py` script, showcasing different fine-tuning outcomes or serving as an external comparison.
+
+## Getting Started
+
+To run the different applications:
+
+- **Streamlit Demo:**
+  ```bash
+  streamlit run app.py
+  ```
+- **Flask Web App (no-tuning):**
+  ```bash
+  python no-tuning/app.py
+  ```
+  (Then navigate to `http://127.0.0.1:5000` in your browser)
+
+Further instructions for setting up environments and dependencies would be provided in the individual component readmes or a `requirements.txt` file.
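The revised README says `no-tuning/app.py` "includes logic to chunk larger input texts for processing" — necessary because the standard `t5-base` checkpoint accepts a limited number of input tokens (512 for the usual configuration). A minimal sketch of what such a chunker can look like, using a simple word-count budget rather than the repository's actual tokenizer-based logic; the name `chunk_text` and the 400-word limit are hypothetical, not taken from the repo:

```python
import re


def chunk_text(text: str, max_words: int = 400) -> list[str]:
    """Split text into chunks of at most max_words words,
    breaking on sentence boundaries where possible."""
    # Naive sentence split on ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current, count = [], [], 0
    for sentence in sentences:
        words = len(sentence.split())
        # Start a new chunk if adding this sentence would exceed the budget.
        if current and count + words > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += words
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Each chunk would then be summarized separately (T5 expects the input to be prefixed with `summarize: `) and the partial summaries concatenated; the Flask app's real implementation may differ in both splitting strategy and limits.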
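The patch also describes `tuning/evaluate.py` as scoring models with ROUGE on benchmark data such as `big_patent`. In practice that script would likely rely on a package such as Hugging Face's `evaluate` or `rouge_score`; as a self-contained illustration of what ROUGE-1 measures (not the script's actual code), here is a bare-bones F1 over clipped unigram counts:

```python
from collections import Counter


def rouge1_f1(candidate: str, reference: str) -> float:
    """ROUGE-1 F1: harmonic mean of unigram precision and recall,
    with overlap counts clipped per word (min of the two counts)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

Production implementations add stemming and ROUGE-2/ROUGE-L variants, which is why the evaluation script would use an established library rather than a hand-rolled metric.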