Introduction

A small tutorial project that includes a basic Scrapy spider.
The spider crawls the miamammausalinux.org news section and extracts article titles and URLs, stopping when it reaches a user-defined maximum number of pages.
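The stopping rule can be sketched in plain Python, independent of Scrapy. This is a hypothetical illustration, not the project's code: the `pages` mapping stands in for the site's paginated news listing, and `crawl_titles` is an invented helper. In a Scrapy spider, the same check would typically sit just before following the next-page link (for example with `response.follow`).

```python
from __future__ import annotations

from typing import Optional


def crawl_titles(
    pages: dict[int, list[dict[str, str]]], max_pages: Optional[int] = None
) -> list[dict[str, str]]:
    """Collect items page by page, stopping at max_pages or when no page is left.

    `pages` is a hypothetical stand-in for the paginated news listing:
    page number -> list of {"title": ..., "link": ...} items.
    """
    results: list[dict[str, str]] = []
    page = 1
    while page in pages:  # stop when there is no "next page"
        # Stop once the user-defined limit has been reached.
        if max_pages is not None and page > max_pages:
            break
        results.extend(pages[page])
        page += 1  # equivalent to following the next-page link
    return results


sample = {
    1: [{"title": "Title1", "link": "link1"}],
    2: [{"title": "Title2", "link": "link2"}],
}
print(crawl_titles(sample, max_pages=1))  # [{'title': 'Title1', 'link': 'link1'}]
print(len(crawl_titles(sample)))  # 2
```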

Features

Configurable maximum number of pages to scrape
Save results in JSON format

Installation

TO DO

Usage

Command

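scrapy crawl miamammausalinux -a max_pages=1 -O output.json

The -a option passes max_pages to the spider. The -O option writes the scraped items to output.json and overwrites the file if it already exists (-o appends instead). Scrapy picks the JSON format from the .json extension.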

Parameters

max_pages (optional): Maximum number of pages to crawl. If omitted, the spider will crawl until no more pages are available.
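Scrapy passes every -a spider argument as a string, so max_pages=1 arrives as "1" and needs converting before it can be compared with a page counter. A minimal sketch of that conversion follows; `parse_max_pages` is a hypothetical helper, and rejecting values below 1 is an assumption rather than documented behavior of this spider.

```python
from typing import Optional


def parse_max_pages(value: Optional[str]) -> Optional[int]:
    """Convert the raw -a max_pages value into an int, or None when omitted."""
    if value is None or value.strip() == "":
        return None  # no limit: crawl until no more pages are available
    limit = int(value)  # raises ValueError for non-numeric input
    if limit < 1:
        # Assumption: a limit of zero or less is treated as an error.
        raise ValueError(f"max_pages must be a positive integer, got {value!r}")
    return limit
```

In a spider, this could be called from a constructor such as `__init__(self, max_pages=None, *args, **kwargs)`, storing the result on `self.max_pages`.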

Output file example

[
{"title": "Title1", "link": "link1"},
{"title": "Title2", "link": "link2"},
...
]
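Because the exported file is a single JSON array, it can be read back with the standard json module. A minimal sketch, assuming each item carries only the title and link fields shown in the example; `load_articles` is a hypothetical helper, not part of the project.

```python
from __future__ import annotations

import json
from pathlib import Path


def load_articles(path: str | Path) -> list[dict[str, str]]:
    """Load the exported items and check that each has a title and a link."""
    with open(path, encoding="utf-8") as f:
        items = json.load(f)
    if not isinstance(items, list):
        raise ValueError("expected a JSON array of items")
    for item in items:
        # Every item should carry the two fields shown in the example.
        if "title" not in item or "link" not in item:
            raise ValueError(f"unexpected item: {item!r}")
    return items
```

For example, load_articles("output.json") returns the scraped items as a list of dictionaries.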
