DataExtraction

Project structure

implementation: python source code
resources: input webpages
outputs: 12 JSON documents (6 from RegEx extraction, 6 from XPath extraction) and 3 RoadRunner wrappers (expressed as RegEx)
porocilo.pdf: written report
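To illustrate how the RegEx and XPath extractions can yield the same JSON document, here is a minimal sketch. The page snippet, the `title`/`price` fields and the function names `extract_xpath`/`extract_regex` are hypothetical and are not taken from the project's sources. XPath is evaluated with lxml, and the regular expressions run over the raw HTML:

```python
import json
import re

from lxml import html

# Hypothetical page snippet; the real input pages live in resources/.
PAGE = """
<html><body>
  <div class="item">
    <h1 class="title">Example Phone</h1>
    <span class="price">199.99 EUR</span>
  </div>
</body></html>
"""


def extract_xpath(page: str) -> dict:
    """Extract fields with XPath expressions evaluated by lxml."""
    tree = html.fromstring(page)
    title = tree.xpath('//h1[@class="title"]/text()')[0].strip()
    price = tree.xpath('//span[@class="price"]/text()')[0].strip()
    return {"title": title, "price": price}


def extract_regex(page: str) -> dict:
    """Extract the same fields with regular expressions over the raw HTML."""
    title = re.search(r'<h1 class="title">(.*?)</h1>', page, re.DOTALL).group(1).strip()
    price = re.search(r'<span class="price">(.*?)</span>', page, re.DOTALL).group(1).strip()
    return {"title": title, "price": price}


if __name__ == "__main__":
    # For this page both approaches produce identical JSON.
    print(json.dumps(extract_xpath(PAGE), indent=2))
    print(json.dumps(extract_regex(PAGE), indent=2))
```

The regex version depends on the exact markup (attribute order, quoting). The XPath version works on the parsed tree, so it tolerates formatting changes better.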

Required packages: BeautifulSoup (bs4) and the lxml parser.
Run main.py to run all extractions (XPath, RegEx, RoadRunner). The results are printed to the console.
Each type of extraction is implemented in its own file.
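The following is a heavily simplified, RoadRunner-style sketch that shows the core idea of inducing a wrapper from two pages of the same template: text tokens that differ become `#PCDATA` data fields. It is an assumption-labeled illustration and not the project's implementation. The names `tokenize` and `build_wrapper` are hypothetical. Full RoadRunner also resolves tag mismatches by inferring optional and iterated regions. This sketch rejects such cases instead of handling them:

```python
import re

# Split HTML into tag tokens and text tokens.
TOKEN_RE = re.compile(r"<[^>]+>|[^<]+")


def tokenize(page: str) -> list[str]:
    """Return the page as a list of tags and non-empty, stripped text chunks."""
    tokens = []
    for tok in TOKEN_RE.findall(page):
        tok = tok.strip()
        if tok:
            tokens.append(tok)
    return tokens


def build_wrapper(page_a: str, page_b: str) -> str:
    """Generalize two pages into a wrapper, handling string mismatches only."""
    a, b = tokenize(page_a), tokenize(page_b)
    if len(a) != len(b):
        # Real RoadRunner would try to infer an optional or iterator here.
        raise ValueError("tag mismatch: needs optional/iterator handling")
    wrapper = []
    for ta, tb in zip(a, b):
        if ta == tb:
            wrapper.append(ta)  # identical token: part of the template
        elif not ta.startswith("<") and not tb.startswith("<"):
            wrapper.append("#PCDATA")  # differing text: a data field
        else:
            raise ValueError(f"tag mismatch: {ta!r} vs {tb!r}")
    return "".join(wrapper)


if __name__ == "__main__":
    page_a = "<html><b>Title:</b><i>Phone</i></html>"
    page_b = "<html><b>Title:</b><i>Laptop</i></html>"
    print(build_wrapper(page_a, page_b))
```

For the two example pages this prints `<html><b>Title:</b><i>#PCDATA</i></html>`. The constant label stays in the template, and the varying product name becomes a field.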

Consult the written report (porocilo.pdf) for additional information.