
Go webcrawler

To rank well in Google Search, websites need to link their pages to one another internally. For example, a blog post about the benefits of haircuts should probably link to a post about the best places to get a haircut.

A Go CLI tool that generates an internal links report for any website on the internet by crawling each page of the site.

Setup

git clone https://github.com/thetsajeet/webcrawler.git
cd webcrawler
go install

Run

# build
go build -o ./webcrawler main.go && ./webcrawler <website> <concurrency> <max_pages>
# development
go run main.go <website> <concurrency> <max_pages>
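
The three positional arguments are the starting URL, the number of concurrent workers, and the maximum number of pages to crawl. As a rough sketch of how main.go presumably validates them (the exact names and messages here are assumptions, not code from the repo):

package main

import (
	"fmt"
	"os"
	"strconv"
)

func main() {
	// Expect exactly three positional arguments:
	// <website> <concurrency> <max_pages>
	if len(os.Args) != 4 {
		fmt.Println("usage: webcrawler <website> <concurrency> <max_pages>")
		os.Exit(1)
	}
	baseURL := os.Args[1]

	concurrency, err := strconv.Atoi(os.Args[2])
	if err != nil || concurrency < 1 {
		fmt.Println("concurrency must be a positive integer")
		os.Exit(1)
	}

	maxPages, err := strconv.Atoi(os.Args[3])
	if err != nil || maxPages < 1 {
		fmt.Println("max_pages must be a positive integer")
		os.Exit(1)
	}

	fmt.Printf("starting crawl of %s with %d workers, up to %d pages\n", baseURL, concurrency, maxPages)
	// The real crawler would be kicked off here.
}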

Run with Docker

docker build -t name:version .
docker run -e F=<website> -e S=<concurrency> -e T=<max_pages> name:version
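
The run command maps the three CLI arguments to the environment variables F, S, and T. A minimal multi-stage Dockerfile consistent with that interface could look like the sketch below (an illustration, not the repo's actual Dockerfile):

FROM golang:1.22 AS build
WORKDIR /app
COPY . .
RUN CGO_ENABLED=0 go build -o /webcrawler main.go

FROM alpine:3.19
COPY --from=build /webcrawler /usr/local/bin/webcrawler
# F, S, and T match the -e flags in the docker run example above.
ENTRYPOINT ["sh", "-c", "webcrawler \"$F\" \"$S\" \"$T\""]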

Testing

go test ./...
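
go test ./... runs the tests in every package. Crawlers like this one usually unit-test URL normalization so that equivalent URLs dedupe to the same key; a table-driven test for a hypothetical normalizeURL helper (the function below is a sketch, not necessarily what this repo exposes) might look like:

package main

import (
	"net/url"
	"strings"
	"testing"
)

// normalizeURL strips the scheme and any trailing slash so that
// equivalent URLs compare equal. Sketch implementation for the test.
func normalizeURL(raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	return u.Host + strings.TrimSuffix(u.Path, "/"), nil
}

func TestNormalizeURL(t *testing.T) {
	tests := []struct {
		name  string
		input string
		want  string
	}{
		{"strips https scheme", "https://blog.example.com/path", "blog.example.com/path"},
		{"strips trailing slash", "https://blog.example.com/path/", "blog.example.com/path"},
		{"strips http scheme", "http://blog.example.com/path", "blog.example.com/path"},
	}

	for _, tc := range tests {
		t.Run(tc.name, func(t *testing.T) {
			got, err := normalizeURL(tc.input)
			if err != nil {
				t.Fatalf("unexpected error: %v", err)
			}
			if got != tc.want {
				t.Errorf("got %q, want %q", got, tc.want)
			}
		})
	}
}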

Ideas for extension

  • Make the script run on a timer and deploy it to a server. Have it email you a report every so often.
  • Add more robust error handling so that larger sites can be crawled without issues.
  • Count external links, as well as internal links, and add them to the report.
  • Save the report as a CSV spreadsheet rather than printing it to the console (see the sketch after this list).
  • Use a graphics library to create an image that shows the links between pages as a graph visualization.
  • Make requests concurrently to speed up the crawling process.
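
For the CSV idea above, Go's standard encoding/csv package is enough. Here is a minimal sketch, assuming the report lives in a map from page URL to internal-link count (writeCSVReport and the map shape are illustrative, not the repo's actual types):

package main

import (
	"encoding/csv"
	"os"
	"strconv"
)

// writeCSVReport writes a header row followed by one row per
// crawled page. The map[string]int shape is assumed for illustration.
func writeCSVReport(path string, pages map[string]int) error {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	defer f.Close()

	w := csv.NewWriter(f)
	defer w.Flush()

	if err := w.Write([]string{"page", "internal_links"}); err != nil {
		return err
	}
	for page, count := range pages {
		if err := w.Write([]string{page, strconv.Itoa(count)}); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	report := map[string]int{
		"blog.example.com/":      12,
		"blog.example.com/posts": 7,
	}
	if err := writeCSVReport("report.csv", report); err != nil {
		panic(err)
	}
}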
