Exploring approaches to quick distributed scraping with the help of Akka.
The easiest way to run the scraper in stand-alone mode is to use its neat CLI interface.
# Build "fat jar" with SBT
sbt assembly
# Run it with
java -jar target/*/scraper.jar --categories prodaja --pages 10

By default, the scraper spits out JSON.
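For illustration, a single output record might look something like this (the field names are taken from the jq/CSV filter further down; the values are invented):

```json
{
  "refNumber": "12345",
  "title": "Example listing",
  "price": 199000,
  "location": { "latitude": 46.05, "longitude": 14.51 }
}
```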
java -jar target/*/scraper.jar --categories prodaja --pages 2 | jq -R 'fromjson?'

To make things a bit easier on the eyes, you can use jq to format the output, or to restructure it further, for example into CSV.
java -jar target/*/scraper.jar --categories prodaja --pages 10 \
| jq -R 'fromjson?' \
| jq -r "([.refNumber, .title, .price, .location.latitude, .location.longitude]) | @csv" \
> prodaja.csv

Adjusting parallelism and other fine-grained application.conf switches is easily done by loading a different configuration file.
java -Dconfig.resource=quick.conf -jar target/*/scraper.jar --categories najem

Some configuration options can also be adjusted via environment variables, e.g.
INITIAL_CATEGORIES=prodaja,najem
CATEGORY_PAGES_LIMIT=3
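Assuming the usual Typesafe Config setup for an Akka application, such environment overrides are typically wired up with optional substitutions in the configuration file itself. The key names below are hypothetical, sketched only to illustrate the pattern:

```hocon
# quick.conf — a hypothetical override file. It includes the base
# configuration and then overrides selected keys; all key names here
# are assumptions, not taken from the project.
include "application"

scraper {
  # Defaults, each overridable from the environment:
  initial-categories = "prodaja,najem"
  initial-categories = ${?INITIAL_CATEGORIES}

  category-pages-limit = 10
  category-pages-limit = ${?CATEGORY_PAGES_LIMIT}
}
```

The `${?VAR}` form is standard HOCON: the second assignment only takes effect when the environment variable is actually set, otherwise the default above it is kept.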