This repository was archived by the owner on Dec 15, 2021. It is now read-only.
ian-nai/PDF-Scraper

PDF-Scraper

Python scripts to extract text from PDFs, save it as a text file, export a list of words and their frequencies to a CSV file for further analysis, extract dates from the text, and graph the text's parts of speech.
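The word-frequency export described above could look roughly like the following. This is a minimal sketch using only the standard library, not the repository's actual code: it assumes the text has already been extracted from the PDF, and the function names (`word_frequencies`, `write_frequencies_csv`) and the `word,count` column layout are hypothetical choices made for illustration.

```python
import csv
import io
import re
from collections import Counter


def word_frequencies(text):
    """Count case-insensitive word occurrences in already-extracted text."""
    # Assumption: a "word" is a run of ASCII letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


def write_frequencies_csv(counts, handle):
    """Write word/count rows, most frequent first, to an open text handle."""
    writer = csv.writer(handle)
    writer.writerow(["word", "count"])  # hypothetical header row
    for word, count in counts.most_common():
        writer.writerow([word, count])


# Stand-in for text extracted from a PDF.
sample_text = "The cat sat. The cat ran."
counts = word_frequencies(sample_text)

# An in-memory buffer keeps the example self-contained; a real script
# would more likely pass a file opened with open(path, "w", newline="").
buffer = io.StringIO()
write_frequencies_csv(counts, buffer)
csv_output = buffer.getvalue()
```

Writing to a handle rather than a hard-coded filename makes the same function usable for files and for in-memory testing.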

Standalone versions of the part-of-speech grapher and the date scraper can be found here and here, respectively.
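To give a sense of what date extraction from text can involve, here is a small regex-based sketch. It is an assumption about one possible approach, not a description of how the date scraper in this repository works; the pattern only covers three common formats, and `find_dates` is a hypothetical name.

```python
import re

MONTHS = (
    "January|February|March|April|May|June|July|"
    "August|September|October|November|December"
)

# Matches ISO dates (2020-01-15), slash dates (4/5/2021), and
# "Month D, YYYY" dates (March 3, 2019). Other formats are ignored.
DATE_PATTERN = re.compile(
    r"\b(?:\d{4}-\d{2}-\d{2}"
    r"|\d{1,2}/\d{1,2}/\d{4}"
    rf"|(?:{MONTHS}) \d{{1,2}}, \d{{4}})\b"
)


def find_dates(text):
    """Return date-like substrings in the order they appear in the text."""
    return DATE_PATTERN.findall(text)


sample = "Filed on March 3, 2019 and amended 2020-01-15; due 4/5/2021."
dates = find_dates(sample)
```

The pattern checks shape only: it would also accept impossible values such as `2020-13-45`, so a real tool would likely validate matches with `datetime` before using them.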

To Use:

  • Download the scripts in the "scripts" folder
  • Place the PDF files you'd like to scrape in the same folder as the scripts
  • Run pdf_scraper.py
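Because the PDFs are expected to sit in the same folder as the scripts, a script can discover them by scanning its own directory. The sketch below illustrates that idea under stated assumptions: `find_pdfs` and `text_output_path` are hypothetical helpers, and naming the output after the PDF with a `.txt` suffix is a guess at a reasonable convention, not confirmed behavior of `pdf_scraper.py`.

```python
import tempfile
from pathlib import Path


def find_pdfs(folder):
    """Return the PDF files directly inside folder, sorted by path."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


def text_output_path(pdf_path):
    """Hypothetical naming rule: report.pdf -> report.txt in the same folder."""
    return pdf_path.with_suffix(".txt")


# Demonstration in a temporary folder. In a real script the folder
# would typically be Path(__file__).parent.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("b.pdf", "a.PDF", "notes.txt"):
        (Path(tmp) / name).touch()
    pdf_names = [p.name for p in find_pdfs(tmp)]
    output_names = [text_output_path(p).name for p in find_pdfs(tmp)]
```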

Dependencies

Citations
