From A Tour of Go, concurrency exercise #10.
```sh
$> go run *.go
```
- In `main.go` we create a `stateManager` struct, which holds 3 things:
  a. a `fetched` map, so we can check whether we've already fetched a URL in O(1) time,
  b. a `waitGroup` that will `Add(1)` when we start fetching a URL and `Done` when we finish fetching it, and
  c. a `rwMutex` so we can safely read from and write to the `fetched` map from many goroutines concurrently.
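The struct described above can be sketched roughly as follows (field and constructor names are assumptions, not necessarily the exact ones in `main.go`):

```go
package main

import "sync"

// stateManager bundles the shared crawl state so one value can be
// passed to every goroutine.
type stateManager struct {
	fetched   map[string]bool // URLs we've already fetched, for O(1) lookups
	waitGroup sync.WaitGroup  // tracks in-flight fetches
	rwMutex   sync.RWMutex    // guards concurrent access to fetched
}

// newStateManager initializes the map so callers can't write to a nil map.
func newStateManager() *stateManager {
	return &stateManager{fetched: make(map[string]bool)}
}
```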
- In `crawl.go` we implement an `isAlreadyFetched` helper function that uses our `stateManager.rwMutex` to check whether a URL has already been fetched and return a boolean.
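A minimal sketch of that helper, assuming the `stateManager` fields named earlier; it takes a read lock so many goroutines can check the map concurrently without blocking one another:

```go
package main

import "sync"

type stateManager struct {
	fetched map[string]bool
	rwMutex sync.RWMutex
}

// isAlreadyFetched reports whether url has been fetched, under a read lock.
func (s *stateManager) isAlreadyFetched(url string) bool {
	s.rwMutex.RLock()
	defer s.rwMutex.RUnlock()
	return s.fetched[url]
}
```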
- Also in `crawl.go`, we:
  a. `Add(1)` to our `stateManager.waitGroup` to tell the `WaitGroup` that we are waiting to fetch an item, and
  b. implement a goroutine using an IIFE (Immediately Invoked Function Expression), `go func() { ... }()`, that does the following:
    - tell the wait group that we're finished whenever we return from the IIFE's scope,
    - return if we've reached our maximum fetching depth,
    - skip processing the node if we've already fetched the URL,
    - handle fetching failures,
    - safely mark the URL as fetched in `stateManager.fetched` using our `stateManager.rwMutex` (lock for writing this time), and
    - recurse and yield to other goroutines.
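The steps above can be sketched as one `Crawl` function. This is a hedged reconstruction, not the repository's exact code: the `Fetcher` interface mirrors the Tour of Go exercise, and `fakeFetcher` is a made-up in-memory fetcher for trying the sketch out.

```go
package main

import (
	"fmt"
	"sync"
)

// Fetcher mirrors the interface from the Tour of Go exercise.
type Fetcher interface {
	Fetch(url string) (body string, urls []string, err error)
}

type stateManager struct {
	fetched   map[string]bool
	waitGroup sync.WaitGroup
	rwMutex   sync.RWMutex
}

func (s *stateManager) isAlreadyFetched(url string) bool {
	s.rwMutex.RLock()
	defer s.rwMutex.RUnlock()
	return s.fetched[url]
}

// Crawl follows the steps listed above: register with the wait group,
// then do the real work inside an immediately invoked goroutine.
func Crawl(url string, depth int, fetcher Fetcher, s *stateManager) {
	s.waitGroup.Add(1)
	go func() {
		defer s.waitGroup.Done() // tell the wait group we're finished on any return

		if depth <= 0 { // maximum fetching depth reached
			return
		}
		if s.isAlreadyFetched(url) { // skip nodes we've already processed
			return
		}

		body, urls, err := fetcher.Fetch(url)
		if err != nil { // handle fetching failures
			fmt.Println(err)
			return
		}

		s.rwMutex.Lock() // lock for writing this time
		s.fetched[url] = true
		s.rwMutex.Unlock()

		fmt.Printf("found: %s %q\n", url, body)
		for _, u := range urls {
			Crawl(u, depth-1, fetcher, s) // recurse; goroutines yield at blocking calls
		}
	}()
}

// fakeFetcher is a tiny in-memory Fetcher, made up for demonstration.
type fakeFetcher map[string][]string

func (f fakeFetcher) Fetch(url string) (string, []string, error) {
	if urls, ok := f[url]; ok {
		return "fake body", urls, nil
	}
	return "", nil, fmt.Errorf("not found: %s", url)
}
```

Because each recursive `Crawl` calls `Add(1)` before its parent's deferred `Done` runs, `s.waitGroup.Wait()` in `main` reliably blocks until every goroutine has finished.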
TODO:

- Allow `main.go` to receive the depth as a command-line argument.
- Refactor the `Crawl` function's IIFE. It has too many responsibilities and mixes levels of abstraction: concurrency orchestration, logging, error handling, and recursion.
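For the first item, one way to read the depth from the command line could look like this (`parseDepth` and its fallback behavior are assumptions for this sketch, not existing code):

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// parseDepth reads the crawl depth from the first argument in args,
// falling back to a default when the argument is missing or invalid.
func parseDepth(args []string, fallback int) int {
	if len(args) < 1 {
		return fallback
	}
	depth, err := strconv.Atoi(args[0])
	if err != nil || depth < 0 {
		fmt.Fprintf(os.Stderr, "invalid depth %q, using %d\n", args[0], fallback)
		return fallback
	}
	return depth
}
```

`main` would then call `parseDepth(os.Args[1:], 4)`, so the program could be run as `go run *.go 4`.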