
Home Web Spider Application

janague edited this page Oct 4, 2013 · 1 revision

Welcome to the MSWL_Development_Tools wiki!

Requirements

Spider to track the updates of a web page

You will have to write a Python application that gets the current version of a web page, compares it against a local cache of the page and, if the page has changed, retrieves the new version and writes a summary of the changes to standard output. The spider must visit all the links below the current page. The log of changes written to standard output will contain a list of all the links that have changed, together with the number of lines that differ between the two versions. The application must be easily installable using standard Python deployment methods, must be properly documented, and must include a battery of tests to check that it works as expected. All development will be done under Git version control, and all the code will be publicly available in a Git repository, with frequent commits.
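
As a starting point, the core of these requirements could be sketched roughly as follows, using only the Python standard library. This is one possible reading of the requirements, not a reference solution. The names `count_changed_lines`, `links_below` and `check_page` are hypothetical. The sketch assumes that "below the current page" means links whose URL starts with the directory of the current page, that the cache is a plain dictionary (a real spider would persist it to disk), and that a page seen for the first time is reported as having no differences. The `fetch` argument is any function that returns the text of a URL, so the logic can be tested without network access.

```python
import difflib
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


def count_changed_lines(old_text, new_text):
    """Count the lines added or removed between two versions of a page."""
    diff = difflib.ndiff(old_text.splitlines(), new_text.splitlines())
    # ndiff marks removed lines with "- " and added lines with "+ ".
    return sum(1 for line in diff if line[:2] in ("- ", "+ "))


class LinkExtractor(HTMLParser):
    """Collect the absolute targets of all <a href="..."> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links and drop any "#fragment" part.
                absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                self.links.append(absolute)


def links_below(base_url, html):
    """Return the links that live under the directory of base_url.

    Assumption: "below the current page" is read as a URL prefix match.
    """
    parser = LinkExtractor(base_url)
    parser.feed(html)
    prefix = urljoin(base_url, ".")  # directory of the current page
    found = []
    for link in parser.links:
        if link.startswith(prefix) and link != base_url and link not in found:
            found.append(link)
    return found


def check_page(url, cache, fetch):
    """Fetch url, compare it with the cached copy, and update the cache.

    Returns the number of differing lines (0 on the first visit,
    which is an assumption of this sketch).
    """
    new_text = fetch(url)
    old_text = cache.get(url)
    cache[url] = new_text
    if old_text is None or old_text == new_text:
        return 0
    return count_changed_lines(old_text, new_text)
```

A spider built on this sketch would call `check_page` for the start URL, print the URL and the line count whenever the count is non-zero, and then repeat the process for every URL returned by `links_below`, keeping a set of visited URLs to avoid loops.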
