
Separate the mechanisms for (1) scripts, (2) dev environment, and (3) notebooks #15

@mfhepp

The project can be used for several purposes, such as

  • script deployment
  • script development (without rebuilding the image for each code change)
  • a proper Python development environment, including testing, code linting, type checking, etc.
  • running Jupyter notebooks in a more isolated way.

These uses differ in several respects, namely

  • which environment files they require and where those files come from
  • whether the Docker image should be shared across multiple uses

Currently, the images created are partially overlapping, which may cause problems in the long run.

Hence, it seems better to separate the three usage scenarios:

  1. Script deployment takes a script's code and environment file for building and running an isolated Docker container with minimal privileges on the host. This can also be used for building isolated CLI versions of popular Python applications or packages, like copier or Nikola.
  2. Development environment. This would bundle everything needed for typical Python development workflows, including code formatting, linters, etc. The editor would run on the host, and the working directory would be mapped into the container. As one is likely to develop multiple projects, each project should have its own, pinnable environment file (basically a version with the dev dependencies and the runtime dependencies). The dev dependencies could be the same across all of a user's projects; the runtime ones will of course differ. Each project will typically have its own Docker image and its own run script or alias, and be run from its own directory. For some very simple projects, it may be handy to use a standard image with popular dependencies so that experiments and quick tests do not require a 1 GB image.
  3. Notebooks. There are actually two use cases:
    • One or multiple standard notebook environments (and respective kernels) to be run from anywhere on the machine for quick experiments and demos (like nbh <envname>). The multiple environments can either be built inside the same image or, likely better, be independent ones.
    • A project-specific notebook environment with its own environment specification, e.g. for specific tasks in research projects. Here, the environment file and the startup script will be in that project folder.

One critical issue is that the image to run is identified solely by its tag on the local machine, so we must take care not to start the wrong image accidentally.
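One way to guard against starting the wrong image is to record the project directory as an image label at build time and compare it before running. The following is only a sketch under assumptions: the label key `py4docker.project-dir` and the function name `image_matches_project` are hypothetical and not part of the project; `docker image inspect --format` and `docker build --label` are standard Docker CLI features.

```shell
# Hypothetical label key; the project does not define one.
LABEL_KEY="py4docker.project-dir"

# Succeeds only if the image's recorded project directory equals the
# absolute path of the given directory (default: current directory).
image_matches_project() (
    image=$1
    expected=$(CDPATH= cd -- "${2:-.}" && pwd) || exit 1
    # DOCKER can be overridden to substitute another client or a test stub.
    actual=$("${DOCKER:-docker}" image inspect \
        --format "{{ index .Config.Labels \"$LABEL_KEY\" }}" \
        "$image" 2>/dev/null) || exit 1
    [ "$actual" = "$expected" ]
)
```

For this to work, the build step would have to set the label, for example with `docker build --label "py4docker.project-dir=$PWD" -t <tag> .`, and run.sh would call `image_matches_project <tag> || exit 1` before starting the container.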

For script development, we can either use the fully-fledged dev environment or keep the current feature of mounting the src directory into the container.
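The mounting approach could look roughly like the sketch below. The container path `/usr/app/src`, the read-only flag, and the function name `run_with_src` are assumptions for illustration; the actual run.sh may differ.

```shell
# Assumed mount target inside the container (hypothetical).
CONTAINER_SRC="/usr/app/src"

# Run IMAGE with ./src bind-mounted, so code changes need no rebuild.
# Usage: run_with_src IMAGE [COMMAND...]
run_with_src() (
    image=$1
    shift
    src="$PWD/src"
    [ -d "$src" ] || { echo "no src directory in $PWD" >&2; exit 1; }
    # Mounted read-only (an assumption) to keep host privileges minimal.
    # DOCKER can be overridden, e.g. DOCKER=echo, to inspect the command.
    "${DOCKER:-docker}" run --rm \
        --volume "$src:$CONTAINER_SRC:ro" \
        "$image" "$@"
)
```

Calling `run_with_src myscript:latest python main.py` from the project root would then run the current state of src without rebuilding the image.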

So basically we would have the following commands:

# Build the script / project in the current folder using the environment file found therein with no development dependencies
# TODO: Pin versions or build from a pinned version
build

# Build a dev environment from the standard dev packages and the dependencies found in the current directory
# If there are no additional dependencies, build just the standard image (or reuse an existing one?)
build dev

# Build the standard notebook image from the standard notebook packages
# plus the dependencies found in the current directory
# If there are no additional dependencies, build just the standard image (or reuse an existing one?)
build notebook
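
How the three commands above pick their environment files could be sketched as follows. All file names and locations (`env.yaml`, `dev.yaml`, `notebook.yaml`, `~/.py4docker`, the `PY4DOCKER_HOME` variable) are hypothetical placeholders, since the issue does not fix them.

```shell
# Hypothetical names; the issue does not specify them.
PROJECT_ENV="env.yaml"
STANDARD_DIR="${PY4DOCKER_HOME:-$HOME/.py4docker}"

# Print, one per line, the environment files a build mode would combine.
# Usage: select_env_files [script|dev|notebook]
select_env_files() (
    mode=${1:-script}
    case "$mode" in
        script)
            # Plain build: the project's own file is mandatory, no dev deps.
            [ -f "$PROJECT_ENV" ] || { echo "no $PROJECT_ENV in $PWD" >&2; exit 1; }
            echo "$PROJECT_ENV"
            ;;
        dev|notebook)
            # Standard packages first; project dependencies are optional.
            echo "$STANDARD_DIR/$mode.yaml"
            if [ -f "$PROJECT_ENV" ]; then
                echo "$PROJECT_ENV"
            fi
            ;;
        *)
            echo "unknown mode: $mode" >&2
            exit 2
            ;;
    esac
)
```

If only the standard file is printed for dev or notebook, the build script could decide to reuse an existing standard image instead of building a new one, which is the open question noted in the comments above.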

Now, one key issue is how to determine the tag of the image at build time and at run time.

Several ideas:

  1. Take the basename of $PWD. But how do we spot collisions (like src in multiple projects)?
  2. Get the name from a file or script inside $PWD (text, YAML, or simply a filename like IMAGENAME.py4docker).
  3. Use a local script or alias per project if the global defaults are to be overridden.
    So both build.sh and run.sh have to check this; otherwise, they might start completely different images depending on where they are invoked from.
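
Ideas 1 and 2 could be combined in one helper shared by build.sh and run.sh. This is a minimal sketch, not the project's implementation: the function name `resolve_image_tag` is hypothetical, and appending a path checksum to avoid collisions is an assumption added here, not something the ideas above prescribe.

```shell
# Derive an image name for the project in DIR (default: current directory).
# A marker file named IMAGENAME.py4docker wins if present (idea 2);
# otherwise the directory basename is used (idea 1), with a checksum of
# the absolute path appended so two "src" folders get different names.
resolve_image_tag() (
    dir=$(CDPATH= cd -- "${1:-.}" && pwd) || exit 1
    for marker in "$dir"/*.py4docker; do
        if [ -e "$marker" ]; then
            # With several marker files, the first in glob order is used.
            basename "$marker" .py4docker
            exit 0
        fi
    done
    # Simplified sanitization: lowercase, unusual characters become "-".
    name=$(basename "$dir" | tr '[:upper:]' '[:lower:]' | sed 's/[^a-z0-9_.-]/-/g')
    suffix=$(printf '%s' "$dir" | cksum | cut -d ' ' -f 1)
    printf '%s-%s\n' "$name" "$suffix"
)
```

Both scripts would call the same function, e.g. `tag=$(resolve_image_tag) || exit 1`, so that they agree on the image as long as they are invoked from the same directory.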

Labels: enhancement