- implementation: python source code
- resources: input webpages
- outputs: 12 JSON documents (6 from RegEx extraction, 6 from XPath extraction) and 3 RoadRunner wrappers (RegEx)
- porocilo.pdf: written report
- Required packages: BeautifulSoup (bs4) and the lxml parser
- Run main.py to run all extractions (XPath, RegEx, RoadRunner). Results are printed to the console.
- Each type of extraction is implemented in its own file
Consult the written report for additional information.
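
As a rough illustration of what a RegEx extraction producing a JSON document can look like, here is a minimal sketch. It is not the code from this project: the HTML fragment, the field names, the patterns, and the function `extract_with_regex` are all hypothetical, and it uses only the Python standard library.

```python
import json
import re

# Hypothetical HTML fragment; the real input pages live in the resources folder.
SAMPLE_HTML = """
<html><body>
  <h1 class="title">Example article</h1>
  <span class="author">Jane Doe</span>
  <p class="price">12.50 EUR</p>
</body></html>
"""

# Hypothetical field names and patterns, one capture group per field.
PATTERNS = {
    "title": r'<h1 class="title">(.*?)</h1>',
    "author": r'<span class="author">(.*?)</span>',
    "price": r'<p class="price">([\d.]+)\s*EUR</p>',
}


def extract_with_regex(html, patterns):
    """Return a dict mapping each field to its first match, or None if absent."""
    record = {}
    for field, pattern in patterns.items():
        # re.DOTALL lets '.' span line breaks inside an element.
        match = re.search(pattern, html, re.DOTALL)
        record[field] = match.group(1).strip() if match else None
    return record


if __name__ == "__main__":
    result = extract_with_regex(SAMPLE_HTML, PATTERNS)
    # Serialize the extracted record as a JSON document and print it.
    print(json.dumps(result, indent=2, ensure_ascii=False))
```

Keeping the patterns in a dictionary keyed by field name means that one loop handles every field, and a field whose pattern does not match appears as `null` in the JSON output instead of raising an error.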