Create an instance of the class S4Scrawler
Parameters
none
Get the roboto package. This gives you access to all the features of jculvey's roboto package.
Returns Object The roboto object
Attach the crawler to be used
Parameters
crawler Object The (roboto) crawler
Create a connection to the Elasticsearch cluster
Returns (Promise | string) A promise returning a success or an error message
Set the Elasticsearch index to be used. This index must exist.
Parameters
index string The name of the index
Returns (Promise | string) A promise returning the name of the index or an error message
Create a new crawling index in Elasticsearch
Parameters
index index Name of the index
Returns (Promise | string) A promise containing a success or an error message
Crawl pages from a website, starting with a given URL
Parameters
siteId string Identification of the site being crawled
search Object Used to select the links to crawl next
maxLinks (integer | "") Maximum number of selected links, "" = all links
Returns (Promise | string) A promise containing a success or an error message
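The maxLinks convention (an integer caps the number of selected links, while "" selects all of them) can be illustrated with a minimal sketch. The helper name limitLinks and its argument validation are assumptions made for illustration only; they are not part of the package and do not show how the crawler applies the limit internally.

```javascript
// Hypothetical helper, not part of the package: applies a maxLinks-style limit
// to an array of already selected links.
function limitLinks(links, maxLinks) {
  // "" means "no limit", so return a copy of every link.
  if (maxLinks === "") {
    return links.slice();
  }
  // Assumption: anything other than "" must be a non-negative integer.
  if (!Number.isInteger(maxLinks) || maxLinks < 0) {
    throw new TypeError('maxLinks must be a non-negative integer or ""');
  }
  // Keep only the first maxLinks entries.
  return links.slice(0, maxLinks);
}

console.log(limitLinks(["/a", "/b", "/c"], 2));
console.log(limitLinks(["/a", "/b", "/c"], ""));
```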
Get all outgoing links of a webpage
Parameters
response Object The response after requesting a webpage
$ Object A cheerio object containing the response
Returns Array Array of links. Every link is an object containing the URL (link.url) and the text of the link (link.linkText)
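The shape of the returned link objects can be sketched as follows. This is an illustration only: the helper toLinkObjects, its input format (an array of raw { href, text } pairs), and the resolution of relative links against a base URL are assumptions, not the package's actual implementation. It relies only on the URL class built into Node.js.

```javascript
// Hypothetical helper, not part of the package: builds link objects of the
// documented shape { url, linkText } from raw anchor data.
function toLinkObjects(anchors, baseUrl) {
  const links = [];
  for (const anchor of anchors) {
    // Skip anchors that have no target.
    if (!anchor.href) {
      continue;
    }
    let url;
    try {
      // Resolve relative hrefs against the page URL (assumption for this sketch).
      url = new URL(anchor.href, baseUrl).href;
    } catch (err) {
      // Skip hrefs that cannot be parsed as a URL.
      continue;
    }
    links.push({ url: url, linkText: (anchor.text || "").trim() });
  }
  return links;
}

const links = toLinkObjects(
  [
    { href: "/about", text: " About us " },
    { href: "", text: "no target" },
  ],
  "https://example.com/index.html"
);

for (const link of links) {
  console.log(link.url, link.linkText);
}
```

With cheerio, the raw pairs would typically be collected from the `$("a")` selection, reading `$(el).attr("href")` and `$(el).text()` for each element.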
Get the id of the site being crawled
Returns string The id of the site