diff --git a/README.md b/README.md
index a87d760..4b36bab 100644
--- a/README.md
+++ b/README.md
@@ -1,8 +1,10 @@
 ## Campaign Lab Data Pipeline
 
+For context, see [Campaign Lab Guide](https://github.com/CampaignLab/Campaign-Lab-Guide/blob/master/Campaign%20Lab%20Guide.md0).
+
 #### What?
 
-* We want to be able to structure our dataset (see "Campaign Lab Data Inventory").
+* We want to be able to structure our dataset from the Data Inventory.
 * In order to do this, we first should define what the structure (schema) of the different data sources are.
 * This will help us down the line to create modules that transform our raw data into our target data, for later export into a database, R package, or any other tools for utilising the data in a highly structured and annotated format.
 
@@ -55,3 +57,22 @@
 * *source* is a link (if available) to the actual dataset.
 * The *description* is a one liner that describes the dataset
 * *properties* is a list of the *datapoints* that we want to *end up with after transforming the raw dataset*.
+
+
+### Toolset
+(The author is still learning his way around data science and Python; better approaches are welcome.)
+Datasets are expected to be largely static; transformers are intended to be run manually and eyeballed as needed, rather than automated.
+They can be run in a local environment.
+For reproducibility and dev tooling, you can also use a containerised environment via Docker.
+
+Run a specific command:
+`docker-compose run datascience python -c 'from london_election_results import get_data; print(get_data())'`
+
+Running the environment:
+
+* `docker-compose up`
+* `http://localhost:9200` #elasticsearch
+* `http://localhost:5601` #kibana
+* Import a CSV into Elasticsearch (in the running `datascience` container) with e.g.
+* `docker-compose exec datascience elasticsearch_loader --es-host http://elasticsearch:9200 --index campaignlab --type campaignlab csv ../schemas/local_election_results_2018-05-03.csv`
+* Follow https://www.elastic.co/guide/en/kibana/current/tutorial-build-dashboard.html to visualise.
\ No newline at end of file
diff --git a/docker-compose.yml b/docker-compose.yml
new file mode 100644
index 0000000..29b932c
--- /dev/null
+++ b/docker-compose.yml
@@ -0,0 +1,33 @@
+version: '3.1'
+services:
+  datascience:
+    image: civisanalytics/datascience-python:4.2.0
+    container_name: datascience-python
+    ports:
+      - "8888:8888"
+    volumes:
+      - ./:/pipeline
+    working_dir: "/pipeline/transformers"
+    tty: true
+    # Install the CSV loader, then keep the container running idle.
+    command: [ "/bin/sh", "-c", "pip install elasticsearch-loader; tail -f /dev/null"]
+  elasticsearch:
+    image: docker.elastic.co/elasticsearch/elasticsearch-oss:6.6.0
+    container_name: elasticsearch
+    ports:
+      - "9200:9200"
+    environment:
+      cluster.name: "campaignlab"
+      http.port: "9200"
+      discovery.type: "single-node"
+      ES_JAVA_OPTS: "-Xmx256m -Xms256m"
+
+  kibana:
+    image: docker.elastic.co/kibana/kibana-oss:6.6.0
+    container_name: kibana
+    ports:
+      - "5601:5601"
+      - "8080:8080"
+    environment:
+      SERVER_NAME: "kibana"
+      ELASTICSEARCH_HOSTS: "http://elasticsearch:9200"
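
As an illustration of the transformer convention the `docker-compose run … get_data()` example above assumes, here is a minimal sketch of what a module such as `transformers/london_election_results.py` might look like. The file name, the raw CSV path, and the use of pandas are assumptions made for the sake of the example; the real transformer is not shown in this diff.

```python
# transformers/london_election_results.py (hypothetical sketch).
# Assumes pandas is available in the datascience image and that the
# raw CSV path below exists; both are assumptions, not confirmed here.
import pandas as pd

RAW_CSV = "../raw/london_election_results.csv"  # assumed location of the raw dataset


def get_data() -> pd.DataFrame:
    """Load the raw dataset and return it in a schema-conforming shape."""
    df = pd.read_csv(RAW_CSV)
    # Normalise column names so they line up with the schema's property names.
    df.columns = [c.strip().lower().replace(" ", "_") for c in df.columns]
    return df


if __name__ == "__main__":
    # Manual run for eyeballing, matching the docker-compose example above.
    print(get_data())
```

The returned frame could then be written out as a CSV (e.g. under `schemas/`) and loaded into Elasticsearch with `elasticsearch_loader`, as in the import example above.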