S4Scrawler

constructor

Create an instance of the S4Scrawler class.

Parameters

  • none

getRoboto

Get the roboto package. This gives you access to all the features of jculvey's roboto package.

Returns Object The roboto object

useCrawler

Attach the crawler to be used

Parameters

  • crawler Object The (roboto) crawler
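Before a crawl, the instance needs a crawler, an Elasticsearch connection and an index. The sketch below is a minimal, hypothetical helper `runCrawl` (it is not part of S4Scrawler) that takes an already constructed S4Scrawler instance and a roboto crawler and calls the documented methods in one plausible order. It assumes that `useCrawler` is synchronous, that `connectToES`, `useESIndex` and `crawl` return promises as documented, and that this call order is valid; the reference above does not state the required order.

```javascript
// Hypothetical helper (not part of S4Scrawler): runs the documented steps in order.
// `scrawler` is an S4Scrawler instance and `crawler` a roboto crawler; both are
// passed in, so this sketch does not assume any module name.
async function runCrawl(scrawler, crawler, { index, siteId, search, maxLinks }) {
  scrawler.useCrawler(crawler);      // attach the crawler (assumed synchronous)
  await scrawler.connectToES();      // connect to the Elasticsearch cluster
  await scrawler.useESIndex(index);  // select the index; it must already exist
  return scrawler.crawl(siteId, search, maxLinks); // start crawling the site
}
```

With the real package, `scrawler` would be `new S4Scrawler()` and `crawler` a crawler built from the object returned by `getRoboto()`. Passing both in keeps the helper free of assumptions about how the package is imported.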

connectToES

Create a connection to the Elasticsearch cluster

Returns (Promise | string) A promise that settles with a success or an error message

useESIndex

Set the Elasticsearch index to be used. This index must exist.

Parameters

  • index string The name of the index

Returns (Promise | string) A promise that settles with the name of the index or an error message

createESIndex

Create a new crawling index in Elasticsearch

Parameters

  • index string The name of the index

Returns (Promise | string) A promise that settles with a success or an error message
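Because `useESIndex` requires an existing index, a caller may want to create the index on first use. The following is a minimal, hypothetical `ensureIndex` helper (not part of S4Scrawler). It assumes that `useESIndex` rejects its promise when the index is missing, rather than resolving with an error string, and that `createESIndex` does not select the new index by itself; neither behavior is confirmed by the reference above.

```javascript
// Hypothetical helper (not part of S4Scrawler): select `index`, creating it if needed.
// Assumes useESIndex rejects for a missing index and createESIndex does not select it.
async function ensureIndex(scrawler, index) {
  try {
    return await scrawler.useESIndex(index);  // index already exists
  } catch (err) {
    await scrawler.createESIndex(index);      // create it (rejects if creation fails)
    return scrawler.useESIndex(index);        // then select it
  }
}
```

Any rejection from `useESIndex`, including a connection failure, triggers a creation attempt; if Elasticsearch is unreachable, `createESIndex` is expected to fail as well, and that error propagates to the caller.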

crawl

Crawl pages of a website, starting from the given URL

Parameters

  • siteId string Identifier of the site being crawled
  • search Object Used to select the links to crawl next
  • maxLinks (integer | "") Maximum number of links to select; "" selects all links

Returns (Promise | string) A promise that settles with a success or an error message
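The `maxLinks` rule can be illustrated in isolation. The sketch below is a hypothetical `selectLinks` helper, not the library's code: because the shape of the `search` object is not documented, it uses an assumed predicate function `isWanted` in its place, and it assumes that links are kept in page order when the limit applies.

```javascript
// Hypothetical helper illustrating the documented maxLinks rule:
// an integer caps the number of selected links, "" keeps all of them.
// `isWanted` is an assumed stand-in for the undocumented `search` object.
function selectLinks(links, isWanted, maxLinks) {
  const selected = links.filter(isWanted);
  if (maxLinks === '') return selected;            // "" = all links
  if (!Number.isInteger(maxLinks) || maxLinks < 0) {
    throw new TypeError('maxLinks must be a non-negative integer or ""');
  }
  return selected.slice(0, maxLinks);              // first maxLinks links, in page order
}
```

For example, `selectLinks(links, (l) => new URL(l.url).hostname === 'example.com', 10)` would keep at most ten same-site links.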

getLinks

Get all outgoing links of a webpage

Parameters

  • response Object The response after requesting a webpage
  • $ Object A cheerio object containing the response

Returns Array Array of links. Every link is an object containing the URL (link.url) and the text of the link (link.linkText)
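The documented return shape can be produced with the WHATWG `URL` class that is built into Node.js. The sketch below is a hypothetical `toLinkObjects` helper, not the library's implementation: it assumes the anchors have already been extracted from the page (for example with cheerio) as `{ href, text }` pairs and that the page URL is known. Dropping non-HTTP links and `#fragment` parts is an assumed policy; the reference does not say which links `getLinks` keeps.

```javascript
// Hypothetical helper: turn raw anchors into the documented { url, linkText } shape.
// Assumes anchors were already extracted from the page as { href, text } pairs.
function toLinkObjects(anchors, pageUrl) {
  const links = [];
  for (const { href, text } of anchors) {
    if (!href) continue;                       // skip anchors without an href
    let url;
    try {
      url = new URL(href, pageUrl);            // resolve relative hrefs against the page URL
    } catch {
      continue;                                // skip hrefs that are not valid URLs
    }
    if (url.protocol !== 'http:' && url.protocol !== 'https:') continue; // drop mailto:, javascript:, ...
    url.hash = '';                             // ignore in-page fragments
    links.push({ url: url.href, linkText: (text || '').trim() });
  }
  return links;
}
```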

getCurrentId

Get the id of the site being crawled

Returns string The id of the site