breakhearts/wallstreet

Wallstreet - Lightweight Nasdaq Crawler

Wallstreet is a minimal system for collecting and analyzing stock trading data. Many projects crawl and analyze Nasdaq trading data by wrapping the Yahoo or Nasdaq APIs, but few provide a complete crawler solution. The unglamorous work, such as coping with poor network conditions, storing data in a database, and monitoring tasks, can take a lot of time. Wallstreet provides a lightweight, distributed, incremental crawler system to help you build a local Nasdaq stock database.

Requirements

  • python (2.7+ or 3.5+)
  • pip

Installation

First, install the required python packages:

sudo pip install -r requirments.txt

Wallstreet uses Celery, with Redis (http://redis.io/) as the default broker. Install Redis:

sudo apt-get install redis

On Ubuntu, you may need to install pycurl manually if the pip install fails:

sudo apt-get install python-pycurl

Next, install MariaDB or another MySQL-compatible database. If you are not sure how to install it on Ubuntu, read Setting up MariaDB Repositories.

Finally, create a database named wallstreet to store the collected data.

How it works

Your first step is to start all the Celery workers. Supervisor, a process manager and monitoring tool, can save you time here; read supervisor.conf to learn how to start the workers manually.

Some settings need to be adapted to your machine. Either edit config.json or create a new config_local.json that contains only the fields you change:

{
    "celery": {
      "broker_url": "redis://localhost:6379/0"            // Redis URL used as the Celery broker
    },
    "storage": {
      "db": "sql",
      "url": "mysql+pymysql://root@localhost/wallstreet"  // MySQL URL used for data storage
    },
    "log_server": {                                       // log server, for running workers on more than one machine;
      "host": "localhost",                                // logs are sent to this central log server
      "port": 9020
    },
    "edgar": {
      "core_key": "XXXXX"                                 // Edgar API key, if you use the Edgar API
    }
}
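
The override behaviour described above ("only covers modified fields") can be pictured as a recursive merge of config_local.json over config.json. The sketch below is an illustration only and not Wallstreet's actual loader: the function names `deep_merge` and `load_config` are hypothetical, and it assumes both files are plain JSON with the `//` comments removed, since Python's `json` module does not accept comments.

```python
import json


def deep_merge(base, override):
    """Return a new dict: values from override win, nested dicts are merged."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are sections (e.g. "log_server"): merge field by field.
            merged[key] = deep_merge(merged[key], value)
        else:
            # Scalar or new key: the local value replaces the default.
            merged[key] = value
    return merged


def load_config(base_path="config.json", local_path="config_local.json"):
    """Load the defaults, then apply local overrides if the file exists.

    Hypothetical helper; the file names follow the README, the rest is assumed.
    """
    with open(base_path) as f:
        config = json.load(f)
    try:
        with open(local_path) as f:
            config = deep_merge(config, json.load(f))
    except IOError:
        # No local file: keep the defaults unchanged.
        pass
    return config
```

With a merge like this, a config_local.json containing only `{"log_server": {"host": "..."}}` would change the log server host while leaving the port and every other section at its default.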

In the root directory, run

python -m wallstreet.bin.__main__ -h

This lists most of the available commands.

You can use celery-flower to monitor tasks at http://localhost:5555:

celery -A wallstreet.tasks flower

You can also use celery purge to discard all pending tasks:

celery -A wallstreet.tasks purge

SEC filings identify stocks by CIK rather than by symbol. Currently, Wallstreet uses the Edgar-Online API to look up CIKs from symbols. Applying for an API key can take several days. As an alternative, you can import stock_info_detail.sql to load 4000+ CIKs into your database offline.
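
To make the symbol-to-CIK step concrete, here is a minimal sketch of an offline-first lookup. It is not Wallstreet's code: `format_cik`, `lookup_cik`, and the `fetch_remote` callback are hypothetical names, and a plain dict stands in for the table loaded from stock_info_detail.sql, whose actual schema is not described here. The only outside fact used is that EDGAR writes CIKs as 10-digit, zero-padded numbers.

```python
def format_cik(cik):
    """Zero-pad a CIK to the 10-digit form used by EDGAR."""
    return str(int(cik)).zfill(10)


def lookup_cik(symbol, offline_table, fetch_remote=None):
    """Resolve a ticker symbol to a CIK, preferring the offline table.

    offline_table: dict mapping upper-case symbols to CIKs (assumed layout).
    fetch_remote:  optional callable(symbol) -> CIK or None, standing in for
                   an API lookup (hypothetical; no real API is called here).
    """
    key = symbol.strip().upper()
    if key in offline_table:
        return format_cik(offline_table[key])
    if fetch_remote is not None:
        cik = fetch_remote(key)
        if cik is not None:
            return format_cik(cik)
    return None


# Example: Apple Inc. has CIK 320193.
offline = {"AAPL": 320193}
print(lookup_cik("aapl", offline))
```

Checking the offline table first means the crawler can still resolve the imported symbols while an API key application is pending.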
