This docker image provides an environment for using Jupyter for data science. It includes a number of python libraries that are commonly used as well as several jupyter extensions that are useful, including:
- numpy
- scikit-learn
- pandas
- matplotlib
- bokeh
- seaborn
- holoviews
If you are looking at this repo, you have likely already installed and are running Docker. I'd recommend that you consider installing docker-utils as well. Within docker-utils there is a script that allows you to easily run a notebook environment based on the current directory. Simply navigate to a directory that contains notebooks or subdirectories that contain notebooks and run run-notebook. This will start a docker container with the docker-ds image, mount the directory into the container, and serve Jupyter on port 8888. If there is a setup.py file in the directory from which run-notebook is invoked, the python package will be installed into the container and be available for use in the notebook.
While the .ipynb files are json, comparing the raw file formats aren't ideal for diffing changes, etc. In order to
better facilitate working with VCS, especially for human diff comprehension, this image supports a jupyter save hook
that converts a notebook into .py and .html files in a .diffs sub-directory. To enable this, pass the envrionment
variable DOCKER_DS_DIFFS=1 when starting the container.
There are a number of credentials utlized by this container, including: notebook password, github plugin OAuth credentials, google drive OAuth credentials, etc.
The pattern for exposing these credentials to the docker container is to store the credentials in credstash, and then bind mount the current user's .aws directory into the container upon container start.
The code assumes an aws profile of 'ds-notebook'. If unable to access credstash, appropriate defaults will be used if available. In most cases extensions will not function.
In your ~/.aws directory create a credentials and config file simlar to the following:
credentials
[ds-notebook]
aws_secret_access_key = ...
aws_access_key_id = ....config
[profile ds-notebook]
region = us-west-2If you use docker-utils, the following configuration will make a notebook available in your project and allow you to configure any project specific settings you'd like present for the container.
[notebook]
volumes=--mount type=bind,source={project_root},target=/home/jovyan/project -v /data:/data --mount type=bind,source=/Users/{user}/.aws,target=/home/jovyan/.aws
ports=-p 8888:8888Requires credstash credentials named: github.client_id, github.client_secret