A Python-based web scraping tool that extracts, summarizes, and stores articles from websites. Scrap Articles offers both a command-line interface (CLI) and a RESTful API, so it can be run interactively from a terminal or accessed programmatically over HTTP.
Features

| Feature | Description |
|---|---|
| Web Scraping | Extracts titles, authors, and content from websites using BeautifulSoup. |
| Summarization | Summarizes content using the Google Gemini API. |
| Database Integration | Stores articles in SQLite via the SQLAlchemy ORM. |
| CLI Interface | Command-line access to all major functionality, built with Click. |
| API Interface | FastAPI-powered REST endpoints for programmatic access. |
| Docker Support | Containerized deployment using Docker and Docker Compose. |
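The README does not say which HTML elements the scraper reads. As a rough illustration of what "extracting titles, authors, and content" can involve, here is a stdlib-only sketch that uses Python's built-in `html.parser` in place of BeautifulSoup so it runs without extra packages. It assumes the title sits in `<title>`, the author in `<meta name="author">`, and the body text in `<p>` tags; `ArticleParser` and `extract_article` are hypothetical names, not the project's code, and real sites usually need site-specific selectors.

```python
from html.parser import HTMLParser


class ArticleParser(HTMLParser):
    """Collect the <title>, the <meta name="author"> value, and <p> text of one page.

    Hypothetical helper: the element choices are assumptions, not the project's selectors.
    """

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True, so entities like &amp; arrive decoded
        self.title = ""
        self.author = ""
        self.paragraphs: list[str] = []
        self._in_title = False
        self._in_p = False
        self._buf: list[str] = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)  # tag and attribute names are already lowercased
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attr_map.get("name") == "author":
            self.author = attr_map.get("content") or ""
        elif tag == "p":
            self._in_p = True
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "p" and self._in_p:
            self._in_p = False
            # Collapse runs of whitespace; skip paragraphs that are empty afterwards.
            text = " ".join("".join(self._buf).split())
            if text:
                self.paragraphs.append(text)

    def handle_data(self, data):
        # Text inside nested inline tags (<b>, <a>, ...) still lands in the open <p>.
        if self._in_title:
            self.title += data
        elif self._in_p:
            self._buf.append(data)


def extract_article(html: str) -> dict:
    """Return 'title', 'author', and 'content' (paragraphs joined by blank lines)."""
    parser = ArticleParser()
    parser.feed(html)
    parser.close()
    return {
        "title": parser.title.strip(),
        "author": parser.author,
        "content": "\n\n".join(parser.paragraphs),
    }


page = """<html><head><title>Example Post</title>
<meta name="author" content="Jane Doe"></head>
<body><p>First <b>paragraph</b>.</p><p>Second one.</p></body></html>"""
print(extract_article(page))
# {'title': 'Example Post', 'author': 'Jane Doe', 'content': 'First paragraph.\n\nSecond one.'}
```

In the project itself the HTML would presumably be fetched with Requests (listed in the tech stack) before parsing; that step is left out here so the sketch runs offline.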
Tech Stack

| Component | Technology |
|---|---|
| Backend Framework | FastAPI |
| Scraping Library | BeautifulSoup, Requests |
| Database | SQLite + SQLAlchemy |
| CLI Tool | Click |
| Summarization API | Google Gemini API |
| Containerization | Docker, Docker Compose |
| Env Management | python-dotenv |
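The README does not show how articles are modeled in the database. To illustrate the storage step, here is a minimal sketch that uses Python's built-in `sqlite3` module instead of SQLAlchemy, with a hypothetical `articles` table (`url`, `title`, `author`, `content`, `summary`) and a hypothetical `save_article` helper. The project's real SQLAlchemy model may use different columns; an ORM class would map onto a table of roughly this shape.

```python
import sqlite3

# Hypothetical schema: the project's actual SQLAlchemy model is not shown in the README.
SCHEMA = """
CREATE TABLE IF NOT EXISTS articles (
    id      INTEGER PRIMARY KEY,
    url     TEXT NOT NULL UNIQUE,
    title   TEXT,
    author  TEXT,
    content TEXT,
    summary TEXT
)
"""

# Upsert keyed on url (ON CONFLICT ... DO UPDATE needs SQLite 3.24 or newer).
UPSERT = """
INSERT INTO articles (url, title, author, content, summary)
VALUES (:url, :title, :author, :content, :summary)
ON CONFLICT(url) DO UPDATE SET
    title   = excluded.title,
    author  = excluded.author,
    content = excluded.content,
    summary = excluded.summary
"""


def save_article(conn: sqlite3.Connection, article: dict) -> int:
    """Insert an article, or update the row that already has the same URL; return its id."""
    with conn:  # commits on success, rolls back if the statement raises
        conn.execute(UPSERT, article)
    # lastrowid is not reliable when the upsert takes the UPDATE path, so look the id up.
    row = conn.execute("SELECT id FROM articles WHERE url = ?", (article["url"],)).fetchone()
    return row[0]


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
article = {"url": "https://example.com/post", "title": "Example Post",
           "author": "Jane Doe", "content": "First paragraph.", "summary": None}
first_id = save_article(conn, article)
# Re-saving the same URL (e.g. once a summary exists) updates the row instead of duplicating it.
second_id = save_article(conn, {**article, "summary": "A one-line summary."})
print(first_id == second_id)  # True
```

Keying the upsert on `url` is a choice made for this sketch: re-scraping or re-summarizing the same page overwrites the stored row rather than creating a duplicate.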
🔧 Installation
1. Clone the Repository
```bash
git clone https://github.com/happyrao78/scrap-articles.git
cd scrap-articles
```
For major features or bug fixes, please open an issue first to discuss the proposed change.