S4Sroboto API (SNStatComp/S4Sroboto)

S4Scrawler
- constructor
- getRoboto
- useCrawler
- connectToES
- useESIndex
- createESIndex
- crawl
- getLinks
- getCurrentId

S4Scrawler

constructor

Create an instance of the S4Scrawler class

Parameters

none

getRoboto

Get the roboto package. This gives access to all the features of jculvey's roboto package.

Returns Object The roboto object

useCrawler

Attach the crawler to be used

Parameters

crawler Object The (roboto) crawler

connectToES

Create a connection to the Elasticsearch cluster

Returns (Promise | string) A promise returning a success message or an error message

useESIndex

Set the Elasticsearch index to be used. This index must exist.

Parameters

index string The name of the index

Returns (Promise | string) A promise returning the name of the index or an error message

createESIndex

Create a new crawling index in Elasticsearch

Parameters

index string The name of the index

Returns (Promise | string) A promise returning a success message or an error message
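Elasticsearch (7.x and later) rejects index names that are not lowercase, that contain any of \ / * ? " < > | , # : or a space, that start with -, _ or +, that are "." or "..", or that are longer than 255 bytes. This page does not say whether createESIndex checks names itself. The following is a minimal sketch of a client-side pre-check you could run before calling it. The helper indexNameProblems is hypothetical and is not part of S4Scrawler.

```javascript
'use strict';

// Characters Elasticsearch (7.x+) does not allow in index names:
// \ / * ? " < > | space , # :
const FORBIDDEN_CHARS = /[\\\/*?"<>| ,#:]/;

/**
 * Hypothetical helper (not part of S4Scrawler): list the problems with a
 * proposed index name. An empty array means the name passes these checks.
 */
function indexNameProblems(name) {
  if (typeof name !== 'string' || name.length === 0) {
    return ['name must be a non-empty string'];
  }
  const problems = [];
  if (name !== name.toLowerCase()) problems.push('must be lowercase');
  if (FORBIDDEN_CHARS.test(name)) problems.push('contains a forbidden character');
  if (/^[-_+]/.test(name)) problems.push('must not start with -, _ or +');
  if (name === '.' || name === '..') problems.push('must not be "." or ".."');
  // The limit is in bytes, so multi-byte characters count more than once.
  if (Buffer.byteLength(name, 'utf8') > 255) problems.push('longer than 255 bytes');
  return problems;
}
```

A name that returns an empty array can then be passed to createESIndex, and later to useESIndex.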

crawl

Crawl pages from a website, starting with the given URL

Parameters

siteId string Identifier of the site being crawled
search Object Used to select the links to crawl next
maxLinks (integer | "") Maximum number of selected links, "" = all links

Returns (Promise | string) A promise returning a success message or an error message
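The selection step described by the search and maxLinks parameters can be sketched as follows. This is an illustration only. The format of the real search object is not documented here, so the sketch substitutes a hypothetical predicate function, matches. It also assumes that when the limit applies, the first links in page order are kept. selectNextLinks is a hypothetical name, not an S4Scrawler method.

```javascript
'use strict';

/**
 * Hypothetical sketch of crawl()'s link selection: keep the links accepted
 * by `matches` (a stand-in for the undocumented `search` object), then cap
 * the result at maxLinks. An integer is a maximum, and "" means all links.
 * Links have the shape returned by getLinks(): { url, linkText }.
 */
function selectNextLinks(links, matches, maxLinks) {
  const selected = links.filter(matches);
  if (maxLinks === '') return selected; // "" = all selected links
  if (!Number.isInteger(maxLinks) || maxLinks < 0) {
    throw new TypeError('maxLinks must be a non-negative integer or ""');
  }
  return selected.slice(0, maxLinks); // assumption: keep the first ones, in page order
}
```

For example, with a predicate that accepts links whose text mentions "contact", maxLinks = 1 keeps only the first such link, while "" keeps all of them.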

getLinks

Get all outgoing links of a webpage

Parameters

response Object The response after requesting a webpage
$ Object A cheerio object loaded with the response body

Returns Array Array of links. Every link is an object containing the URL (link.url) and the text of the link (link.linkText)
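The raw href values returned by getLinks may be relative, may point to non-web schemes such as mailto:, or may repeat. This page does not say whether S4Scrawler normalizes them. The following is a minimal post-processing sketch that uses only the documented { url, linkText } shape and Node's built-in URL class. The helper normalizeLinks and its rules (resolve against the page URL, keep http(s) only, drop fragments, de-duplicate) are assumptions for illustration.

```javascript
'use strict';

/**
 * Hypothetical post-processing of the array returned by getLinks():
 * resolve each link.url against the page URL, keep only http(s) links,
 * drop #fragments and remove duplicate URLs (first occurrence wins).
 */
function normalizeLinks(links, pageUrl) {
  const seen = new Set();
  const result = [];
  for (const link of links) {
    let resolved;
    try {
      resolved = new URL(link.url, pageUrl); // resolves relative hrefs
    } catch (err) {
      continue; // skip hrefs that cannot be parsed as URLs
    }
    if (resolved.protocol !== 'http:' && resolved.protocol !== 'https:') continue;
    resolved.hash = ''; // "#section" links point to the same page
    const key = resolved.href;
    if (seen.has(key)) continue;
    seen.add(key);
    result.push({ url: key, linkText: (link.linkText || '').trim() });
  }
  return result;
}
```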

getCurrentId

Get the id of the site being crawled

Returns string The id of the site
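This page does not prescribe an order for these calls. The sketch below shows one plausible sequence, under stated assumptions. It assumes the crawler is attached before crawling and that the connection is made before any index is used. It also assumes failures surface as rejected promises, and that calling useESIndex after createESIndex is harmless. Both objects are passed in so that the sketch can run against stubs. runCrawl and its options object are hypothetical. Because await also accepts a plain value, the sketch copes with the documented (Promise | string) return types.

```javascript
'use strict';

/**
 * Hypothetical driver (not part of S4Scrawler) showing one plausible order
 * of the documented calls. `crawler` is an S4Scrawler instance and
 * `robotoCrawler` a crawler built with the package returned by getRoboto().
 * Assumes errors are reported as rejected promises.
 */
async function runCrawl(crawler, robotoCrawler, { index, createIndex, siteId, search, maxLinks }) {
  crawler.useCrawler(robotoCrawler);
  await crawler.connectToES();
  if (createIndex) {
    await crawler.createESIndex(index); // new index for this crawl
  }
  await crawler.useESIndex(index); // the index must exist at this point
  // `await` also works if crawl() returns a plain string instead of a promise.
  const result = await crawler.crawl(siteId, search, maxLinks);
  return { siteId: crawler.getCurrentId(), result };
}
```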
