From 6d59309c6920a133433386bc33782b2fe12a1647 Mon Sep 17 00:00:00 2001
From: "K.E. Koziar"
Date: Thu, 16 Jul 2020 14:10:08 -0700
Subject: [PATCH 1/9] Update Jupyter Notebook version number

Update Jupyter Notebook version number in the format overview table
---
 .../Jupyter Notebooks Data Curation Primer.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Jupyter Notebook Data Curation Primer/Jupyter Notebooks Data Curation Primer.md b/Jupyter Notebook Data Curation Primer/Jupyter Notebooks Data Curation Primer.md
index 72d0853..4e30b44 100644
--- a/Jupyter Notebook Data Curation Primer/Jupyter Notebooks Data Curation Primer.md
+++ b/Jupyter Notebook Data Curation Primer/Jupyter Notebooks Data Curation Primer.md
@@ -48,7 +48,7 @@ See also: Primers authored by the workshop attendees at DLF: http://datacuration
 | File Extension | [.ipynb](https://fileinfo.com/extension/ipynb) |
 | MIME type | https://jupyter.readthedocs.io/en/latest/reference/mimetype.html |
 | Structure | Browser-rendered composite digital asset: Notebook file (.ipynb); Notebook app; kernel |
-| Versions | [4.0.0 - 5.7.0](https://jupyter-notebook.readthedocs.io/en/stable/changelog.html) (previously [IPython Notebook](https://ipython.org/notebook.html)) |
+| Versions | [4.0.0 - 6.0.3](https://jupyter-notebook.readthedocs.io/en/stable/changelog.html) (previously [IPython Notebook](https://ipython.org/notebook.html)) |
 | Primary fields or areas of use | Not discipline-specific; can be used by anyone who writes code in a language with a [supported kernel](https://github.com/jupyter/jupyter/wiki/Jupyter-kernels) |
 | Source and affiliation | [Project Jupyter](https://jupyter.org/about)|
 | Metadata standards | [Codemeta](https://codemeta.github.io/); [CFF](https://citation-file-format.github.io/); [Jisc/SSI Guidance](https://zenodo.org/record/1327321#.W8lNLhNKiRs); discipline-specific keywords |

From 81738decea1c7d18a515ea0a9dec05c06eb688f4 Mon Sep 17 00:00:00 2001
From: "K.E. Koziar"
Date: Thu, 16 Jul 2020 14:37:44 -0700
Subject: [PATCH 2/9] Minor correction to background section

The guidance is provided by the Software Sustainability Institute (1), and funded by Jisc (2).
---
 .../Jupyter Notebooks Data Curation Primer.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/Jupyter Notebook Data Curation Primer/Jupyter Notebooks Data Curation Primer.md b/Jupyter Notebook Data Curation Primer/Jupyter Notebooks Data Curation Primer.md
index 4e30b44..81e484c 100644
--- a/Jupyter Notebook Data Curation Primer/Jupyter Notebooks Data Curation Primer.md
+++ b/Jupyter Notebook Data Curation Primer/Jupyter Notebooks Data Curation Primer.md
@@ -62,7 +62,7 @@ See also: Primers authored by the workshop attendees at DLF: http://datacuration
 ## Background
-Jupyter Notebooks are composite digital objects used to develop, share, view, and execute interspersed, interlinked, and interactive documentation, equations, visualizations, and code. Researchers seeking to deposit software, in this case Jupyter Notebooks, in repositories do so with the expectation that repositories will provide documentation explaining "what you can deposit, the supported file formats for deposits, what metadata you may need to provide, how to provide this metadata and what happens after you make your deposit" (Jackson, 2018a). This expectation is not necessarily met by repositories that currently accept software deposits and complex objects like Jupyter Notebooks. This guide is meant to both inform curatorial practices around Jupyter Notebooks, and support the development of resources that meet researchers' expectations to ensure long-term availability of software in curated archival repositories. Guidance provided by Jisc (1) and the Software Sustainability Institute (2) outlines three different kinds of software deposits: a minimal deposit, a runnable deposit, and a comprehensive deposit (Jackson, 2018b). This primer follows this same conceptual framework in dealing with Jupyter Notebooks, which even in their static, non-executable form, can be used to document how scientific research was carried out or be used as teaching models among many other use cases.
+Jupyter Notebooks are composite digital objects used to develop, share, view, and execute interspersed, interlinked, and interactive documentation, equations, visualizations, and code. Researchers seeking to deposit software, in this case Jupyter Notebooks, in repositories do so with the expectation that repositories will provide documentation explaining "what you can deposit, the supported file formats for deposits, what metadata you may need to provide, how to provide this metadata and what happens after you make your deposit" (Jackson, 2018a). This expectation is not necessarily met by repositories that currently accept software deposits and complex objects like Jupyter Notebooks. This guide is meant to both inform curatorial practices around Jupyter Notebooks, and support the development of resources that meet researchers' expectations to ensure long-term availability of software in curated archival repositories. Guidance provided by the Software Sustainability Institute (1), funded by Jisc (2), outlines three different kinds of software deposits: a minimal deposit, a runnable deposit, and a comprehensive deposit (Jackson, 2018b). This primer follows this same conceptual framework in dealing with Jupyter Notebooks, which even in their static, non-executable form, can be used to document how scientific research was carried out or be used as teaching models among many other use cases.
 ## Jupyter Notebook Format Description
@@ -220,9 +220,9 @@ Jackson, M. (2018b). Software Deposit: What to deposit (Version 1.0). _Zenodo_.
 # End Notes

-1 https://www.jisc.ac.uk/
+1 https://www.software.ac.uk/, [Software Deposit Guidance for Researchers](https://softwaresaved.github.io/software-deposit-guidance/)

-2 https://www.software.ac.uk/
+2 https://www.jisc.ac.uk/

 3 https://jupyter.org/install

From 3ada1c31db03944e7e757f20351ef1f67ee25251 Mon Sep 17 00:00:00 2001
From: "K.E. Koziar"
Date: Thu, 16 Jul 2020 14:47:18 -0700
Subject: [PATCH 3/9] Expand/clarify Format Description section

Clarified for curators unfamiliar with computer science terminology the relation between a kernel and programming language.
Elaborated on the cell order and expectations of users (those who download a notebook)
---
 .../Jupyter Notebooks Data Curation Primer.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Jupyter Notebook Data Curation Primer/Jupyter Notebooks Data Curation Primer.md b/Jupyter Notebook Data Curation Primer/Jupyter Notebooks Data Curation Primer.md
index 81e484c..20c0cdf 100644
--- a/Jupyter Notebook Data Curation Primer/Jupyter Notebooks Data Curation Primer.md
+++ b/Jupyter Notebook Data Curation Primer/Jupyter Notebooks Data Curation Primer.md
@@ -68,7 +68,7 @@ Jupyter Notebooks are composite digital objects used to develop, share, view, an
 A Jupyter Notebook is a file used in conjunction with a suite of tools that allow users to create and share documents that contain runnable code, equations, data visualizations, and other interactive material. While Python is the most common language associated with Jupyter Notebooks, they can be used with code written in over 40 different programming languages. Jupyter Notebooks' versatility enables them to be used in any number of disciplines and for various purposes, and while they are very popular in the sciences, they are also used in the social sciences and the humanities. Because Jupyter Notebooks are meant to be interactive and constructed using a multitude of programming and spoken languages, they are especially challenging for curators to work with. Any curation and archiving activity needs to be done in such a way as to not inhibit a future user's need to adapt the code contained within the Notebook file. Similarly, when a future user extracts deposited Notebook files, metadata, and supplemental material from the archive, curation and archiving activities should have had no degrading influence on the level of functionality that a depositor enabled with their initial deposit. For example, rather than zipping files on the depositor's behalf, it is preferable for curators to request that depositors pack and unpack their content prior to making their deposit to allow the them to check that files function as intended when unpacked.
-To open a Jupyter Notebook file, a curator would need to have installed Python and Jupyter (using either pip or Anaconda(3)) and be familiar with using the Terminal (Mac/Linux), Command Prompt, or Bash (Windows).(4) Once opened, Jupyter Notebooks have a browser-rendered user interface composed of "cells" and clickable buttons to execute tasks. A cell is a multiline text input field where a user can enter and execute code or a markup language called Markdown. Markdown handles text formatting, linking, and the display of images. Behind the Notebook cells is a kernel that runs the processes needed for each cell to function. Code cells often require dependencies and specific input parameters, and may be run in any order, which is both a strength and a weakness.(5)
+To open a Jupyter Notebook file, a curator would need to have installed Python and Jupyter (using either pip or Anaconda(3)) and be familiar with using the Terminal (Mac/Linux), Command Prompt, or Bash (Windows).(4) Once opened, Jupyter Notebooks have a browser-rendered user interface composed of "cells" and clickable buttons to execute tasks. A cell is a multiline text input field where a user can enter and execute code or a markup language called Markdown. Markdown handles text formatting, linking, and the display of images. Behind the Notebook cells is a kernel, which provides programming language support that runs the processes needed for each cell to function. Notebooks often require dependencies and specific input parameters. While code cells may be run in any order, running the code from top to bottom is invariably the intention, and would certainly be the expectation by any future users of the notebook.(5)
 Once rendered in the user's browser, a Notebook can be exported in the following formats:

From 7d8c207a377f7eeab571a629983fd9fb4212eaa0 Mon Sep 17 00:00:00 2001
From: "K.E. Koziar"
Date: Thu, 16 Jul 2020 14:53:36 -0700
Subject: [PATCH 4/9] Expand/annotate Minimally Required Files section

expand dependencies section to include other types of dependencies file.
Annotate citation.cff
Clarify that a container metafile is appropriate to request if used.
---
 .../Jupyter Notebooks Data Curation Primer.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/Jupyter Notebook Data Curation Primer/Jupyter Notebooks Data Curation Primer.md b/Jupyter Notebook Data Curation Primer/Jupyter Notebooks Data Curation Primer.md
index 20c0cdf..ba79801 100644
--- a/Jupyter Notebook Data Curation Primer/Jupyter Notebooks Data Curation Primer.md
+++ b/Jupyter Notebook Data Curation Primer/Jupyter Notebooks Data Curation Primer.md
@@ -100,10 +100,10 @@ The following elements outline recommendations for repositories accepting Jupyte
 - Additional files to request:
 - PDF of the Jupyter Notebook (export from Jupyter web application or [nbviewer](https://nbviewer.jupyter.org/))
 - reST export of the Jupyter Notebook (export from Jupyter web application)
- - CodeMeta.json
- - CITATION.cff
+ - CodeMeta.json, requirements.txt, or environment.yml (dependencies)
+ - CITATION.cff, a software citation file appropriate if not depositing in a repository
 - Sample datasets and documentation (see below)
- - Container metafile (e.g. docker, singularity, reprozip)
+ - If used, the Container metafile (e.g. docker, singularity, reprozip)
 - Can be created using [jupyter-](https://repo2docker.readthedocs.io/en/latest/)[repo2docker](https://repo2docker.readthedocs.io/en/latest/)
 - Can be published separately with execution instructions; link this to the Jupyter Notebook record
 - Release of the full repository of files associated with .ipynb when applicable

From 476fcf82bb332ec274a6abd5cdb4edca21ed6c5c Mon Sep 17 00:00:00 2001
From: "K.E. Koziar"
Date: Thu, 16 Jul 2020 14:54:59 -0700
Subject: [PATCH 5/9] Fix style in File Requirements section

---
 .../Jupyter Notebooks Data Curation Primer.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Jupyter Notebook Data Curation Primer/Jupyter Notebooks Data Curation Primer.md b/Jupyter Notebook Data Curation Primer/Jupyter Notebooks Data Curation Primer.md
index ba79801..87e35a1 100644
--- a/Jupyter Notebook Data Curation Primer/Jupyter Notebooks Data Curation Primer.md
+++ b/Jupyter Notebook Data Curation Primer/Jupyter Notebooks Data Curation Primer.md
@@ -101,7 +101,7 @@ The following elements outline recommendations for repositories accepting Jupyte
 - PDF of the Jupyter Notebook (export from Jupyter web application or [nbviewer](https://nbviewer.jupyter.org/))
 - reST export of the Jupyter Notebook (export from Jupyter web application)
 - CodeMeta.json, requirements.txt, or environment.yml (dependencies)
- - CITATION.cff, a software citation file appropriate if not depositing in a repository
+ - CITATION.cff (a software citation file appropriate if not depositing in a repository)
 - Sample datasets and documentation (see below)
 - If used, the Container metafile (e.g. docker, singularity, reprozip)
 - Can be created using [jupyter-](https://repo2docker.readthedocs.io/en/latest/)[repo2docker](https://repo2docker.readthedocs.io/en/latest/)

From 305fcc8cc16ff99432a0b13f66d367de93294e49 Mon Sep 17 00:00:00 2001
From: "K.E. Koziar"
Date: Thu, 16 Jul 2020 15:04:20 -0700
Subject: [PATCH 6/9] Update metadata requirements section

Added annotations and clarifications.
---
 .../Jupyter Notebooks Data Curation Primer.md | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/Jupyter Notebook Data Curation Primer/Jupyter Notebooks Data Curation Primer.md b/Jupyter Notebook Data Curation Primer/Jupyter Notebooks Data Curation Primer.md
index 87e35a1..6d448d9 100644
--- a/Jupyter Notebook Data Curation Primer/Jupyter Notebooks Data Curation Primer.md
+++ b/Jupyter Notebook Data Curation Primer/Jupyter Notebooks Data Curation Primer.md
@@ -118,25 +118,26 @@ The following elements outline recommendations for repositories accepting Jupyte
 - Jupyter implementation details
 - Jupyter version
 - Distribution (e.g. Anaconda)
- - Kernel version
+ - Kernel version (programming language plus version)
 - README
- - Documents what the Jupyter Notebook is for
- - Request that this file include citation(s) to third-party algorithms and analyses
- - Recommend code comments within the Notebook file itself in addition to the README file
+ - Documents what the Jupyter Notebook is for (but recommendation is that the Notebook utilize code comments)
+ - Lists dependencies on external software packages and datasets
+ - Requests that this file include citation(s) to third-party algorithms and analyses
 - Alternate identifiers and supplemental links associated with the Notebook
 - License information
-- **Runnable submission:** allows another researcher to execute the Notebook locally using sample data and files provided by the depositor (12); minimal submission metadata plus:
+- **Runnable submission:** allows another researcher or curator to execute the Notebook locally using sample data and files provided by the depositor (12); minimal submission metadata plus:
 - User documentation
 - Instructions to support configuration needed to execute the Notebook and code cells
 - Sample input and output files
- - CodeMeta.json
- - Document required software dependencies
- - Recommend additional machine actionable dependency documentation (e.g. requirements.txt)
+ - Software Dependency Documentation
+ - CodeMeta.json
+ - Recommend additional machine actionable dependency documentation (e.g. requirements.txt or environment.yml)
 - CITATION.cff for the Notebook
 - Preferred citation; should enable native software citation
+ - Relevant if the Notebook is not being submitted to a repository
 - **Comprehensive metadata:** minimal and "runnable" requirements plus:
 - Developer documentation
 - Include test code and description of expected results

From a358973a040489ee168dc65e7b476cc4c64a9370 Mon Sep 17 00:00:00 2001
From: "K.E. Koziar"
Date: Thu, 16 Jul 2020 15:12:43 -0700
Subject: [PATCH 7/9] Key Curatorial Questions

Add clarifying question to help curator unfamiliar with code.
Add examples of ipynb archived in data repositories.
add/renumber associated end-notes.
---
 .../Jupyter Notebooks Data Curation Primer.md | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/Jupyter Notebook Data Curation Primer/Jupyter Notebooks Data Curation Primer.md b/Jupyter Notebook Data Curation Primer/Jupyter Notebooks Data Curation Primer.md
index 6d448d9..3aa93cd 100644
--- a/Jupyter Notebook Data Curation Primer/Jupyter Notebooks Data Curation Primer.md
+++ b/Jupyter Notebook Data Curation Primer/Jupyter Notebooks Data Curation Primer.md
@@ -148,13 +148,16 @@ The following elements outline recommendations for repositories accepting Jupyte
 # Key Curatorial Questions
 Once a decision has been made to accept and curate Jupyter Notebook submissions in an archival repository, the following questions should be considered with each submission:
-1. What are the depositor's expectations for the Notebook's future functionality once the deposited files are exported from the archival repository?
+1. What are the depositor's expectations for the Notebook's future functionality once the deposited files are exported from the archival repository? In other words, should the code in the deposited files be able to run as-is (runnable), run with external data files (minimal), or is the deposit considered to be a static document that does not execute code (minimal)? Understanding this expectation will help determine if the deposit is minimal, runnable, or comprehensive.
 2. Does the submission include minimally required files and metadata to enable the expected functionality?
 3. Is the Notebook self-contained?
 4. Is the Notebook a standalone object or one of many products resulting from a project?
 - Examples:
 - Notebook that is a stand alone object: [USGS Python for Data Management](https://my.usgs.gov/confluence/display/cdi/Python+for+Data+Management#PythonforDataManagement-June11,2018:Part1-WorkingwithLocalFiles)(13)
- - Notebooks that supplement other digital objects: [Starry](https://arxiv.org/abs/1810.06559)(14)
+ - Notebooks that supplement other digital objects:
+ 1. [Starry Archived Code](https://zenodo.org/record/3565772)(14), [Starry Article](https://arxiv.org/abs/1810.06559)(15)
+ 2. Swiger, B. M., Liemohn, M. W., & Ganushkina, N. Y. (2020). Data for Improvement of Plasma Sheet Neural Network Accuracy with Inclusion of Physical Information. Deep Blue Data: https://doi.org/10.7302/559r-t639
+(See PlottingCode.zip) (16)
 - Were supplemental files deposited along with the Notebook?
 - Is information about supplemental files included within the Notebook or in separate files?
 - If separate files, can those files be opened and read?
@@ -174,7 +177,7 @@ Once a decision has been made to accept and curate Jupyter Notebook submissions
 # Decision Trees ([view online](https://www.lucidchart.com/documents/view/4848c483-1267-499c-9172-3a2782abfaaf/0))
-The following decision trees (15) illustrate questions and actions that should be considered when determining whether or not to accept a Jupyter Notebook submission into a particular repository, as well key questions curators should consider when evaluating Jupyter Notebook submissions.
+The following decision trees (17) illustrate questions and actions that should be considered when determining whether or not to accept a Jupyter Notebook submission into a particular repository, as well key questions curators should consider when evaluating Jupyter Notebook submissions.
 ## Repository Suitability
 ![](DT-Repo.png)
@@ -247,6 +250,10 @@ Jackson, M. (2018b). Software Deposit: What to deposit (Version 1.0). _Zenodo_.
 13 https://bit.ly/2sBF3jH

-14 https://arxiv.org/abs/1810.06559
+14 https://zenodo.org/record/3565772

-15 https://www.lucidchart.com/documents/view/4848c483-1267-499c-9172-3a2782abfaaf/0
+15 https://arxiv.org/abs/1810.06559
+
+16 https://doi.org/10.7302/559r-t639
+
+17 https://www.lucidchart.com/documents/view/4848c483-1267-499c-9172-3a2782abfaaf/0

From 555466db78477d7856e09ce3afcaa9c6f326b7d0 Mon Sep 17 00:00:00 2001
From: "K.E. Koziar"
Date: Thu, 16 Jul 2020 15:16:21 -0700
Subject: [PATCH 8/9] Fix broken link in recommended reading

---
 .../Jupyter Notebooks Data Curation Primer.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Jupyter Notebook Data Curation Primer/Jupyter Notebooks Data Curation Primer.md b/Jupyter Notebook Data Curation Primer/Jupyter Notebooks Data Curation Primer.md
index 3aa93cd..2d7d249 100644
--- a/Jupyter Notebook Data Curation Primer/Jupyter Notebooks Data Curation Primer.md
+++ b/Jupyter Notebook Data Curation Primer/Jupyter Notebooks Data Curation Primer.md
@@ -195,7 +195,7 @@ The following decision trees (17) illustrate questions and actions that should b
 - Ten Simple Rules for Reproducible Research in Jupyter Notebooks
 - [https://arxiv.org/abs/1810.08055](https://arxiv.org/abs/1810.08055)
 - How IPython and Jupyter Notebook work
- - [https://jupyter.readthedocs.io/en/latest/architecture/how\_jupyter\_ipython\_work.html](https://jupyter.readthedocs.io/en/latest/architecture/how_jupyter_ipython_work.html)
+ - [https://test-jupyter.readthedocs.io/en/latest/architecture/how_jupyter_ipython_work.html](https://test-jupyter.readthedocs.io/en/latest/architecture/how_jupyter_ipython_work.html)
 - Developing maintainable software
 - [https://www.software.ac.uk/resources/guides/developing-maintainable-software](https://www.software.ac.uk/resources/guides/developing-maintainable-software)
 - Does it make sense to apply the FAIR Data Principles to Software?

From 2ffe26c962fe2a6e6c6a51713164e0f884641fa1 Mon Sep 17 00:00:00 2001
From: "K.E. Koziar"
Date: Thu, 16 Jul 2020 15:23:43 -0700
Subject: [PATCH 9/9] Add alt text for images

Add title and alt text for decision tree images.
---
 .../Jupyter Notebooks Data Curation Primer.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Jupyter Notebook Data Curation Primer/Jupyter Notebooks Data Curation Primer.md b/Jupyter Notebook Data Curation Primer/Jupyter Notebooks Data Curation Primer.md
index 2d7d249..56be142 100644
--- a/Jupyter Notebook Data Curation Primer/Jupyter Notebooks Data Curation Primer.md
+++ b/Jupyter Notebook Data Curation Primer/Jupyter Notebooks Data Curation Primer.md
@@ -180,13 +180,13 @@ Once a decision has been made to accept and curate Jupyter Notebook submissions
 The following decision trees (17) illustrate questions and actions that should be considered when determining whether or not to accept a Jupyter Notebook submission into a particular repository, as well key questions curators should consider when evaluating Jupyter Notebook submissions.
 ## Repository Suitability
-![](DT-Repo.png)
+![Decision tree figure with questions to consider when reviewing a Jupyter notebook for acceptance in a repository. Decisions include local, disciplinary or general repositories.](DT-Repo.png "Repository Suitability Decision Tree")
 *https://datacurationnetwork.org/home/resources/
 **http://hdl.handle.net/11299/202815
 ## Curatorial Activities
-![](DT-Curat.png)
+![Decision tree figure with questions related to curatorial activities for Jupyter notebooks. Actions are suggested depending on if the repository accepts zipped files, if minimal metadata are included with the Notebook, and at what level identifiers (DOI’s) should be created.](DT-Curat.png "Curatorial Activities Decision Tree")
 # Additional Recommended Reading