A small tutorial project that includes a basic Scrapy spider.
The spider crawls the miamammausalinux.org news section and extracts article titles and URLs, stopping when it reaches a user-defined maximum number of pages.
- Configurable maximum number of pages to scrape
- Results saved in JSON format
TO DO
scrapy crawl miamammausalinux -a max_pages=1 -O output.json

- max_pages (optional): Maximum number of pages to crawl. If omitted, the spider crawls until no more pages are available.
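Scrapy passes `-a` arguments to the spider as strings, so `max_pages=1` arrives as `"1"` and has to be converted before it can be compared with a page counter. The sketch below shows one way the stop condition could be expressed. The helper names `parse_max_pages` and `should_follow_next` are hypothetical and are not taken from this project's spider.

```python
from typing import Optional


def parse_max_pages(raw: Optional[str]) -> Optional[int]:
    """Convert the raw `-a max_pages=...` value into an int.

    Returns None when the argument was omitted, meaning "no limit".
    (Hypothetical helper, not part of the project's code.)
    """
    if raw is None or raw == "":
        return None
    value = int(raw)  # raises ValueError on non-numeric input
    if value < 1:
        raise ValueError("max_pages must be a positive integer")
    return value


def should_follow_next(pages_crawled: int, max_pages: Optional[int]) -> bool:
    """Return True if the spider may still request another page."""
    return max_pages is None or pages_crawled < max_pages
```

With these helpers, a spider would call `should_follow_next` after parsing each page and only yield a request for the next page when it returns `True`.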
[
{"title": "Title1", "link": "link1"},
{"title": "Title2", "link": "link2"}
...
]
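Because `-O output.json` writes a single JSON array, the results can be read back with the standard library alone. This is a minimal sketch, assuming the file has the shape shown above with `title` and `link` keys. The function name `load_articles` is hypothetical.

```python
import json
from pathlib import Path
from typing import Dict, List


def load_articles(path: str) -> List[Dict[str, str]]:
    """Read the spider's JSON output and return the scraped items.

    Items missing either the "title" or "link" key are skipped.
    (Hypothetical helper, not part of the project's code.)
    """
    items = json.loads(Path(path).read_text(encoding="utf-8"))
    return [item for item in items if "title" in item and "link" in item]


if __name__ == "__main__":
    # Assumes output.json was produced by the crawl command above.
    for article in load_articles("output.json"):
        print(f'{article["title"]}: {article["link"]}')
```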