Wallstreet is a small system for collecting and analyzing stock trading data. Many projects crawl and analyze NASDAQ stock trading data by wrapping the Yahoo or NASDAQ APIs, but few of them provide a complete crawler solution. The tedious parts, e.g. coping with poor network conditions, storing the data in a database, and monitoring the tasks, can take a lot of time. Wallstreet provides a lightweight, distributed, incremental crawler system that helps you build your own local NASDAQ stock database.
- python (2.7+ or 3.5+)
- pip
First, install the required Python packages:
sudo pip install -r requirements.txt
Wallstreet uses Celery, with [Redis](http://redis.io/) as the default broker:
sudo apt-get install redis
On Ubuntu, if pip fails to install pycurl, you may need to install it from the system packages instead:
sudo apt-get install python-pycurl
Next, install MariaDB or another MySQL-compatible database. If you are not sure how to install it on Ubuntu, read Setting up MariaDB Repositories.
Finally, create a database named wallstreet to store the collected data.
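Creating the database can also be scripted. The following is a minimal Python 3 sketch, assuming PyMySQL is installed (the default storage URL uses the mysql+pymysql driver). The helper names parse_storage_url and create_database are hypothetical and not part of Wallstreet; they read the connection details from a storage URL such as mysql+pymysql://root@localhost/wallstreet and issue CREATE DATABASE IF NOT EXISTS.

```python
import re
from urllib.parse import urlsplit


def parse_storage_url(url):
    """Split a SQLAlchemy-style MySQL URL into connection parameters.

    Hypothetical helper: the fallbacks (localhost, port 3306, user root,
    empty password) are assumptions, not Wallstreet defaults.
    """
    parts = urlsplit(url)
    db_name = parts.path.lstrip("/")
    # Only accept plain identifiers so the name is safe to put in DDL.
    if not re.fullmatch(r"\w+", db_name):
        raise ValueError("unexpected database name: %r" % db_name)
    return {
        "host": parts.hostname or "localhost",
        "port": parts.port or 3306,
        "user": parts.username or "root",
        "password": parts.password or "",
        "database": db_name,
    }


def create_database(url):
    """Create the database named in `url` if it does not exist yet."""
    import pymysql  # imported lazily so parse_storage_url works without PyMySQL

    params = parse_storage_url(url)
    conn = pymysql.connect(host=params["host"], port=params["port"],
                           user=params["user"], password=params["password"])
    try:
        with conn.cursor() as cur:
            # The name was validated above, so backtick quoting is safe here.
            cur.execute("CREATE DATABASE IF NOT EXISTS `%s`" % params["database"])
    finally:
        conn.close()
```

For example, create_database("mysql+pymysql://root@localhost/wallstreet") creates the wallstreet database on a local server; running mysql -u root -e "CREATE DATABASE wallstreet" from a shell has the same effect.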
Before anything else, start all Celery workers. Supervisor, a process control and monitoring tool, can manage them for you and save you time; read supervisor.conf to learn how to start them manually.
Some settings need to be adjusted for your machine: either edit config.json, or create a config_local.json that contains only the fields you want to change:
{
  "celery": {
    "broker_url": "redis://localhost:6379/0"
  },
  "storage": {
    "db": "sql",
    "url": "mysql+pymysql://root@localhost/wallstreet"
  },
  "log_server": {
    "host": "localhost",
    "port": 9020
  },
  "edgar": {
    "core_key": "XXXXX"
  }
}
- celery.broker_url: Redis URL used as the Celery broker.
- storage.url: MySQL URL of the database where collected data is stored.
- log_server: central log server; if you run workers on more than one machine, their logs are sent here.
- edgar.core_key: your Edgar API key, needed only if you use the Edgar API.
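How config_local.json is combined with config.json is not spelled out here. A common approach, and an assumption rather than Wallstreet's confirmed behavior, is a recursive merge: nested objects are merged key by key, and any other value in the local file replaces the default. A minimal Python 3 sketch, with the hypothetical function names deep_merge and load_config:

```python
import json


def deep_merge(base, override):
    """Return a copy of `base` with `override` merged in recursively.

    Nested dicts are merged key by key; any other value in `override`
    replaces the one in `base`. Neither input is modified.
    """
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def load_config(default_path="config.json", local_path="config_local.json"):
    """Load the defaults, then apply local overrides if the file exists."""
    with open(default_path) as f:
        config = json.load(f)
    try:
        with open(local_path) as f:
            config = deep_merge(config, json.load(f))
    except FileNotFoundError:
        pass  # no local overrides: use the defaults as-is
    return config
```

Under this scheme, a config_local.json containing only {"log_server": {"host": "10.0.0.5"}} changes the log server host while keeping the default port 9020.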
In the root directory, run
python -m wallstreet.bin.__main__ -h
to see most of the available commands.
You can use Flower to monitor tasks at http://localhost:5555:
celery -A wallstreet.tasks flower
You can also use celery purge to discard all pending tasks:
celery -A wallstreet.tasks purge
SEC filings identify stocks by CIK (Central Index Key) rather than by ticker symbol. Currently, we use the Edgar-Online API to get CIKs from symbols, but obtaining an API key can take several days. As an alternative, you can use stock_info_detail.sql to import 4000+ CIKs into your database offline.
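Once the CIKs are in the database, resolving a symbol is a plain table lookup. This minimal Python 3 sketch uses the built-in sqlite3 module as a stand-in for MariaDB so it runs anywhere; the table name stock_cik and its columns are hypothetical, and the schema created by stock_info_detail.sql may differ. It returns the CIK zero-padded to 10 digits, the form used by some EDGAR endpoints.

```python
import sqlite3


def lookup_cik(conn, symbol):
    """Return the 10-digit, zero-padded CIK for `symbol`, or None if unknown.

    Assumes a hypothetical table stock_cik(symbol TEXT, cik INTEGER)
    with symbols stored in upper case.
    """
    row = conn.execute(
        "SELECT cik FROM stock_cik WHERE symbol = ?", (symbol.upper(),)
    ).fetchone()
    if row is None:
        return None  # a crawler could fall back to the Edgar-Online API here
    return str(row[0]).zfill(10)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE stock_cik (symbol TEXT PRIMARY KEY, cik INTEGER)")
    conn.execute("INSERT INTO stock_cik VALUES (?, ?)", ("AAPL", 320193))
    print(lookup_cik(conn, "aapl"))  # 0000320193
    print(lookup_cik(conn, "ZZZZ"))  # None
```

With PyMySQL against the real database, the query would use %s placeholders instead of sqlite3's ?; the lookup logic stays the same.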