pinkstack/pinkstack-realestate

Pinkstack Realestate

Exploring ways of quick distributed scraping with the help of Akka.

Standalone mode

The easiest way to run the scraper in standalone mode is through its command-line interface.

# Build "fat jar" with SBT
sbt assembly

# Run it with a category and a page limit
java -jar target/*/scraper.jar --categories prodaja --pages 10

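By default, the scraper writes JSON to standard output. To make the output easier on the eyes, pipe it through jq. The -R flag reads each line as a raw string, and fromjson? parses it, silently skipping any line that is not valid JSON.

java -jar target/*/scraper.jar --categories prodaja --pages 2 | jq -R 'fromjson?'

jq can also restructure the output further, for example into CSV.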

java -jar target/*/scraper.jar --categories prodaja --pages 10 \
    | jq -R 'fromjson?' \
    | jq -r "([.refNumber, .title, .price, .location.latitude, .location.longitude]) | @csv" \
    > prodaja.csv
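
For readers who would rather not depend on jq, the same filtering and CSV conversion can be sketched in Python. This is a minimal sketch, not part of the project: the function names `parse_records`, `to_row` and `records_to_csv` are hypothetical, the field names are taken from the jq filter above, and it assumes the scraper emits one JSON object per line, possibly mixed with non-JSON lines.

```python
import csv
import json


def parse_records(lines):
    """Yield JSON objects from lines, skipping anything that does not parse.

    This mirrors `jq -R 'fromjson?'`, except that non-object JSON values
    (numbers, arrays) are also skipped here.
    """
    for line in lines:
        try:
            value = json.loads(line)
        except json.JSONDecodeError:
            continue  # not JSON (for example a log line): drop it
        if isinstance(value, dict):
            yield value


def to_row(record):
    """Pick the same fields as the jq filter; missing fields become None."""
    location = record.get("location") or {}
    return [
        record.get("refNumber"),
        record.get("title"),
        record.get("price"),
        location.get("latitude"),
        location.get("longitude"),
    ]


def records_to_csv(lines, out):
    """Write one CSV row per parsed record to the file-like object `out`."""
    # Quote strings and leave numbers bare, similar to jq's @csv.
    writer = csv.writer(out, quoting=csv.QUOTE_NONNUMERIC)
    for record in parse_records(lines):
        writer.writerow(to_row(record))
```

Calling `records_to_csv(sys.stdin, sys.stdout)` from a small script would let it stand in for the two jq stages of the pipeline above.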

Parallelism and other fine-grained application.conf settings can be adjusted by loading a different configuration file.

java -Dconfig.resource=quick.conf -jar target/*/scraper.jar --categories najem

Some configuration options can also be set through environment variables, for example:

INITIAL_CATEGORIES=prodaja,najem
CATEGORY_PAGES_LIMIT=3
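
The variables above can also be set for a single run from a wrapper script rather than exported in the shell. The following is a minimal sketch under stated assumptions: `scraper_env` and `run_scraper` are hypothetical helpers, the jar path follows the `target/*/scraper.jar` pattern used earlier, and whether the scraper still needs `--categories` on the command line when the variables are set is not confirmed here, so any flags can be passed through `extra_args`.

```python
import glob
import os
import subprocess


def scraper_env(categories, pages_limit, base=None):
    """Return a copy of the environment with the scraper overrides applied."""
    env = dict(os.environ if base is None else base)
    env["INITIAL_CATEGORIES"] = ",".join(categories)
    env["CATEGORY_PAGES_LIMIT"] = str(pages_limit)
    return env


def run_scraper(categories, pages_limit, extra_args=()):
    """Run the fat jar with the overrides; assumes `sbt assembly` was run."""
    jars = glob.glob("target/*/scraper.jar")
    if not jars:
        raise FileNotFoundError("no scraper.jar under target/; run `sbt assembly` first")
    command = ["java", "-jar", jars[0], *extra_args]
    # check=True raises CalledProcessError if the scraper exits non-zero.
    return subprocess.run(command, env=scraper_env(categories, pages_limit), check=True)
```

For example, `run_scraper(["prodaja", "najem"], 3)` would start the jar with both variables set to the values shown above.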

Authors

About

Pinkstack Realestate - The distributed real-estate scraper
