Story Scraper and Summarizer

A Python-based project that scrapes stories from the web and generates concise summaries of each chapter using NLP techniques. This tool is designed for readers who want to quickly grasp the essence of a story.

Overview

This project aims to:

Scrape story chapters from websites that host them.
Summarize the chapters using pre-trained NLP models.
Save the summaries for easy reading or further analysis.

The project is designed to be lightweight and efficient, making it suitable for users with limited hardware resources.

Features

Web Scraping: Extracts chapters from websites using BeautifulSoup or Scrapy.
Text Summarization: Generates concise summaries of each chapter using pre-trained models from the Hugging Face Transformers library.
Structured Storage: Saves scraped chapters and summaries in a structured format.
Optional Web App: Can be deployed as a web app for browsing and reading summaries.
Batch Processing: Supports batch processing of chapters for efficient summarization.
Customizable Grouping: Allows users to specify how many chapters should be grouped together for summarization.
GPU Optimization: Utilizes GPU acceleration for faster processing when available.
Novel Overview: Generates an overall summary of the entire novel based on individual chapter summaries.

Installation

Clone the repository:

git clone https://github.com/omarsrstt/story_summarizer.git
cd story_summarizer

Install the required dependencies:
```
pip install -r requirements.txt
```
Configure the config.json file with the appropriate website details and selectors.
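
The layout of config.json is not documented here, so the following is only a sketch of how a per-website entry with selectors might be loaded. The key names (`base_url`, `chapter_selector`, `title_selector`), the site name `example_site`, and the helper `load_site_config` are all hypothetical and are not taken from the project.

```python
import json

# Hypothetical config layout: these key names are illustrative assumptions,
# not the project's actual schema.
EXAMPLE_CONFIG = """
{
  "example_site": {
    "base_url": "https://example.com",
    "chapter_selector": "div.chapter-content",
    "title_selector": "h1.chapter-title"
  }
}
"""


def load_site_config(raw_json: str, website: str) -> dict:
    """Return the settings for one website, failing loudly if it is missing."""
    config = json.loads(raw_json)
    if website not in config:
        raise KeyError(f"No entry for website {website!r} in config")
    return config[website]


site = load_site_config(EXAMPLE_CONFIG, "example_site")
print(site["chapter_selector"])
```

Keying the file by website name would match the `--website` argument described under Usage, but check the actual config.json in the repository before relying on this shape.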

Usage

Scraping Chapters

To scrape chapters from a website, run the scraper.py script:

python scraper.py --website <website_name> --novel <novel_name> --chapter_num <chapter_number>

website: The name of the website to scrape from (as defined in config.json).
novel: The name of the novel to scrape.
chapter_num: (Optional) The specific chapter number to scrape. If omitted, all chapters will be scraped.
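
To illustrate how these arguments fit together, here is a minimal argparse sketch. The flag names come from the command above; the function name `build_parser`, the `required=True` settings, and the integer type for `--chapter_num` are assumptions, and the real scraper.py may parse its arguments differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented scraper flags (illustrative only)."""
    parser = argparse.ArgumentParser(description="Scrape story chapters.")
    # Assumed to be required, since the README lists them without "(Optional)".
    parser.add_argument("--website", required=True,
                        help="Website name as defined in config.json")
    parser.add_argument("--novel", required=True,
                        help="Name of the novel to scrape")
    # Optional: None means "scrape all chapters".
    parser.add_argument("--chapter_num", type=int, default=None,
                        help="Specific chapter number to scrape")
    return parser


# Parse an explicit argument list so the sketch runs without a real command line.
args = build_parser().parse_args(["--website", "example_site", "--novel", "my_novel"])
scrape_all = args.chapter_num is None
print(scrape_all)
```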

Summarizing Chapters

To summarize the scraped chapters, run the summarizer.py script:

python summarizer.py --novel <novel_name> --model <model_name> --group-size <group_size> --chapters <chapter_selection>

novel: The name of the novel to summarize.
model: (Optional) The NLP model to use for summarization (default: meta-llama/Meta-Llama-3.1-8B-Instruct).
group-size: (Optional) The number of chapters to group together for summarization (default: 5).
chapters: (Optional) Specific chapters or ranges to summarize (e.g., 1,3-5,7).
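
The chapter selection syntax and grouping can be sketched as follows. This is one possible interpretation, assuming that ranges such as `3-5` are inclusive and that groups are formed from consecutive selected chapters; the helper names `parse_chapter_selection` and `group_chapters` are hypothetical and do not come from summarizer.py.

```python
def parse_chapter_selection(selection: str) -> list[int]:
    """Expand a string like "1,3-5,7" into a sorted list of chapter numbers."""
    chapters = set()
    for part in selection.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"Invalid range: {part!r}")
            chapters.update(range(start, end + 1))  # assumed inclusive
        else:
            chapters.add(int(part))
    return sorted(chapters)


def group_chapters(chapters: list[int], group_size: int) -> list[list[int]]:
    """Split chapters into consecutive groups of at most group_size."""
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    return [chapters[i:i + group_size] for i in range(0, len(chapters), group_size)]


selected = parse_chapter_selection("1,3-5,7")
groups = group_chapters(selected, 2)
print(selected)  # [1, 3, 4, 5, 7]
print(groups)    # [[1, 3], [4, 5], [7]]
```

Under these assumptions, each group would be summarized together, and the last group may be smaller than the requested size.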
