Skip to content

thomasgpadilla/webscraping

Repository files navigation

webscraping

Sample scripts for web scraping. Typically tied to use cases, but generally extensible to other projects. Scripts help build URL lists to feed to WGET for bulk downloading in a responsible manner (e.g. rate limiting).

fdr_item_urls: scrape item URLs matching a type from the Franklin D Roosevelt Master Speech File, write URLs to TXT or CSV file

frus_section_pdf_urls: scrape all volume URLs from the the University of Wisconsin Madison's Foreign Relations of the United States, proceed to follow each URL to each volume, navigate down to each section of each volume and scrape PDF URLs, write PDF URLs to TXT file

frus_section_parent_volume: scrape all volume URLs from the the University of Wisconsin Madison's Foreign Relations of the United States, proceed to follow each URL to each volume, navigate down to each section and scrape title field (= parent volume of section), write to TXT file

frus_section_title: scrape all volume URLs from the the University of Wisconsin Madison's Foreign Relations of the United States, proceed to follow each URL to each volume, navigate down to each section of each volume and scrape itemmd field (= title of section)

contact: tgpadillajr@gmail.com

About

sample scripts for webscraping

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages