JeifaiBack

Here is everything that happens behind the scenes of Jeifai.

There is currently one program available:

  • scrape: extract data from the web

Scrape

This is Jeifai's core: the extraction of data from career pages happens here.

How the program flows (a sketch in code follows the list):

  • Start main function
  • Connect to the database
  • Get the scraper
  • Start scraping session
  • Scrape
  • Save results to the database
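
A minimal sketch of this flow, assuming a Postgres database reachable via a DATABASE_URL environment variable. Everything here is hedged: the helper functions, the Scraper and Results types, and the driver are hypothetical placeholders standing in for the repo's actual code, not its real API.

package main

import (
    "database/sql"
    "log"
    "os"

    _ "github.com/lib/pq" // assumed Postgres driver; the repo may use a different one
)

// Hypothetical stand-ins for the types defined in the repo.
type Scraper struct{ Name string }
type Results struct{}

func getScraper(db *sql.DB, name string) Scraper       { return Scraper{Name: name} } // load the target row
func startSession(db *sql.DB, s Scraper) int64         { return 0 }                   // record a new session
func runScraper(s Scraper) Results                     { return Results{} }           // dispatch to scrapers.go
func saveResults(db *sql.DB, session int64, r Results) {}                             // persist extracted jobs

func main() {
    db, err := sql.Open("postgres", os.Getenv("DATABASE_URL")) // connect to the database
    if err != nil {
        log.Fatal(err)
    }
    defer db.Close()

    scraper := getScraper(db, "Morressier") // get the scraper
    session := startSession(db, scraper)    // start scraping session
    results := runScraper(scraper)          // scrape
    saveResults(db, session, results)       // save results to the database
}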

scrapers.go

All the scrapers are included in a single file called scrapers.go. The most important aspect is to keep the scrapers as similar to each other as possible (a common skeleton is sketched after the list below). The two main tools used to scrape are:

  • Colly: the best scraping library for Go
  • Chromedp: runs a headless Google Chrome instance
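
As a hedged illustration of that common shape, every scraper is a method on Runtime returning Results, as in the full Morressier example further below; the function name, URL and selector here are invented:

func (runtime Runtime) SomeCompany() (results Results) {
    start_url := "https://example.com/careers" // illustrative URL
    c := colly.NewCollector()
    c.OnHTML("a.job", func(e *colly.HTMLElement) { // illustrative selector
        // extract the fields for one job, then register them:
        // results.Add(runtime.Name, title, url, location, payload)
    })
    c.Visit(start_url)
    return
}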

Not all career pages are identical:

  • HTML: data is stored in the HTML response; Colly is used.
  • API: data is returned by an API call; Colly is used.
  • Javascript: data is stored in the HTML, but only after Javascript renders the page; Colly and Chromedp are used (see the sketch after this list).
  • API_POST: data is returned after an initial API call to get the cookies; Colly and Chromedp are used.
  • Pagination: if any of the categories above presents pagination, it is necessary to implement logic for it (an example follows the Morressier scraper below).
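
For the Javascript category, a minimal Chromedp sketch might look like the following; the URL is illustrative, and the rendered HTML would then be handed to Colly (or any HTML parser) for extraction:

package main

import (
    "context"
    "fmt"
    "log"
    "time"

    "github.com/chromedp/chromedp"
)

func main() {
    // Start a headless Chrome instance with a 30-second budget.
    ctx, cancel := chromedp.NewContext(context.Background())
    defer cancel()
    ctx, cancel = context.WithTimeout(ctx, 30*time.Second)
    defer cancel()

    var html string
    err := chromedp.Run(ctx,
        chromedp.Navigate("https://example.com/careers"), // illustrative URL
        chromedp.OuterHTML("html", &html),                // DOM after Javascript has run
    )
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println(len(html), "bytes of rendered HTML")
}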

How to create a new scraper?

The creation of a new scraper is divided into two parts:

  • Add the new target and the new scraper to the database (Scripts/CreateScraper.go)
    • The name, the careers URL and the host URL are required
func main() {
    scraper_name := "Google"                     // name of the new scraper
    jobs_url := "https://www.google.com/careers" // careers URL to scrape
    host_url := "https://www.google.com"         // host URL of the target
    scraper := Scraper{scraper_name, jobs_url, host_url}
    scraper.CreateScraper()                      // persist the new target in the database
}
  • Create the scraping algorithm in scrapers.go
    • It is often good practice to build and test the scraper in a separate folder first.
    • Here is an example of a scraper which extracts the information directly from the HTML.
func (runtime Runtime) Morressier() (results Results) {
    start_url := "https://morressier-jobs.personio.de/"
    type Job struct {
        Url      string
        Title    string
        Location string
        Type     string
    }
    c := colly.NewCollector()
    c.OnHTML("a", func(e *colly.HTMLElement) {
        // Each job posting is an <a> element whose class contains "job-box-link".
        if strings.Contains(e.Attr("class"), "job-box-link") {
            result_title := e.ChildText(".jb-title")
            result_url := e.Attr("href")
            result_type := e.ChildTexts("span")[0]     // first <span>: employment type
            result_location := e.ChildTexts("span")[2] // third <span>: location
            results.Add(
                runtime.Name,
                result_title,
                result_url,
                result_location,
                Job{
                    Url:      result_url,
                    Title:    result_title,
                    Location: result_location,
                    Type:     result_type,
                },
            )
        }
    })
    c.OnRequest(func(r *colly.Request) {
        fmt.Println(Gray(8-1, "Visiting"), Gray(8-1, r.URL.String()))
    })
    c.OnError(func(r *colly.Response, err error) {
        fmt.Println(Red("Request URL:"), Red(r.Request.URL))
    })
    c.Visit(start_url)
    return
}
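
For the Pagination category mentioned earlier, one hedged sketch is to follow a "next page" link on the same collector inside such a scraper; the selector is invented for illustration:

c.OnHTML("a.next-page", func(e *colly.HTMLElement) { // illustrative selector
    next := e.Request.AbsoluteURL(e.Attr("href"))
    if next != "" {
        e.Request.Visit(next) // queue the following page on the same collector
    }
})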

How to run a scraper?

go build
./JeifaiBack scrape -s=[scraper_name] -r=[true/false]
  • -s selects the scraper by name
  • -r set to true if results should be saved to the database, false otherwise (useful for testing purposes)
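
For example, to dry-run the Morressier scraper shown above without saving its results:

go build
./JeifaiBack scrape -s=Morressier -r=false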

TODO

  • How to add IP to Google Cloud Platform
  • Cobra
  • Dependencies
  • Write architecture_decision_record
  • Describe Scripts folder
