From be8427785ee6b32cb012ff16fa0d4b19f4d548a0 Mon Sep 17 00:00:00 2001 From: Ray Osborn Date: Wed, 10 Sep 2014 13:37:46 -0500 Subject: [PATCH 1/8] Added .gz to .gitignore Generated by TeXShop --- 2014/csipaper/.gitignore | 1 + 1 file changed, 1 insertion(+) diff --git a/2014/csipaper/.gitignore b/2014/csipaper/.gitignore index f20349e..05cdf2d 100644 --- a/2014/csipaper/.gitignore +++ b/2014/csipaper/.gitignore @@ -2,5 +2,6 @@ *.bbl *.log *.blg +*.gz nexus14aip.pdf nexus14aipNotes.bib From d1a79eb87b95e4f430f7083cb2eb01f0c8ede9e9 Mon Sep 17 00:00:00 2001 From: Ray Osborn Date: Wed, 10 Sep 2014 13:39:56 -0500 Subject: [PATCH 2/8] Some minor edits MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit I think “This is a very important use case” is implied by the fact that it is the first purpose in the abstract. I’ve also added a reference to the original NeXus paper in 1997, and to NeXpy in the .bib file (not in the text yet). --- 2014/csipaper/nexus14aip.bib | 12 ++++++++++++ 2014/csipaper/nexus14aip.tex | 18 +++++++++--------- 2 files changed, 21 insertions(+), 9 deletions(-) diff --git a/2014/csipaper/nexus14aip.bib b/2014/csipaper/nexus14aip.bib index dd7292a..a88b57e 100644 --- a/2014/csipaper/nexus14aip.bib +++ b/2014/csipaper/nexus14aip.bib @@ -119,3 +119,15 @@ @MANUAL{hdfview year = "2014 (accessed September 2014)", note = "\url{http://www.hdfgroup.org/products/java/hdfview/index.html}", } + +@webpage{nexpy, + title = "NeXpy: A Python GUI to analyze NeXus data", + Url = {http://nexpy.github.io/nexpy/}} + +@article{Klosowski:1997fk, + Author = {Klosowski, Przemek and Koennecke, Mark and Tischler, Jon and Osborn, Raymond}, + Journal = {Physica B: Physics of Condensed Matter}, + Pages = {151--153}, + Title = {{NeXus: A common format for the exchange of neutron and synchrotron data}}, + Volume = {241}, + Year = {1998}} diff --git a/2014/csipaper/nexus14aip.tex b/2014/csipaper/nexus14aip.tex index 9d23c7c..1bf1ebd 100644 --- a/2014/csipaper/nexus14aip.tex +++ b/2014/csipaper/nexus14aip.tex @@ -76,7 +76,7 @@ \affiliation{ANSTO, Australia} \author{Raymond Osborn} -\affiliation{Argonne National Laboratory, USA} +\affiliation{Materials Science Division, Argonne National Laboratory, USA} \author{Peter F. Peterson} \affiliation{Spallation Neutron Source, USA} @@ -110,7 +110,7 @@ rules for organizing data within HDF5 files in addition to a dictionary of well-defined domain-specific field names. The NeXus data format has two purposes. First, NeXus defines a format that can serve as a container for all relevant data associated -with a beamline. This is a very important use case. Second, NeXus +with an experiment. Second, NeXus defines standards in the form of \emph{application definitions} for the exchange of data between applications. NeXus provides structures for raw experimental data as well as for processed data. \end{abstract} @@ -122,17 +122,17 @@ \section{Introduction} -Increasingly, major neutron and X-ray facilities have chosen to store data using the NeXus data format. -Since 2006, NeXus\cite{nxold} has undergone substantial refocusing, +Increasingly, major neutron and X-ray facilities have chosen to store data using the NeXus data format\cite{nxold,Klosowski:1997fk}. +Since 2006, NeXus has undergone substantial refocusing, refinement and enhancement as described in this paper. Historically, neutron and X-ray facilities have chosen to store data in a plethora of home-grown data formats. This scheme has a number of drawbacks addressed by NeXus: \begin{itemize} \item It makes the life of traveling scientists unnecessarily difficult as they must deal with multiple files - in different formats, file converters and such in order to extract scientific information from the data. + in different formats or use file converters, in order to extract scientific information from the data. \item An unnecessary burden is imposed on data analysis software producers to accommodate many different formats. -\item The whole idea of open access to data is sabotaged if the data is in a format which cannot be easily understood. +\item The whole idea of open access to data is sabotaged if the data is in a format that cannot be easily understood. \item Scientific integrity is jeopardized if the data cannot be understood or important elements are missing. \item Modern high speed detectors produce data at such a high rate that many older single image storage schemes have become impractical and @@ -196,7 +196,7 @@ \subsection{Raw Data File Hierarchy} } \end{figure} -The major focus of NeXus has been the recording of \emph{raw} experimental data, i.e. information taken directly from the experimental +A major focus of NeXus has been the recording of \emph{raw} experimental data, i.e. information taken directly from the experimental equipment or processed only as required to provide physically meaningful values. The NeXus raw data file hierarchy is the consequence of some practical considerations. An overview of the NeXus data file structure for raw experimental data is shown in FIG.~\ref{rawfile}. @@ -220,12 +220,12 @@ \subsection{Raw Data File Hierarchy} In the course of NeXus history, the decision was taken to move \texttt{NXmonitor} out of \texttt{NXinstrument} to the higher hierarchy level of \texttt{NXentry}, -in order to facilitate quick inspection by humans. +in order to facilitate quick inspection. To enable a simple default visualization, a \texttt{NXdata} group must be provided at \texttt{NXentry} level. It contains information about plot axes and links to the data -(which typically reside in the \texttt{NXdetector} group). +(which often reside in the \texttt{NXdetector} group). Links are supported by HDF5 and work like symbolic links in the Unix file system. A special base class, \texttt{NXcollection}, exempts its contents from validation From 63a28849d37f4334c97fc9deb95abbeee5944f07 Mon Sep 17 00:00:00 2001 From: Ray Osborn Date: Wed, 10 Sep 2014 14:24:16 -0500 Subject: [PATCH 3/8] Changes to Section IIIA (mostly) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The most substantive changes are (i) shift NXdata description above NXmonitor, since it’s more important. (ii) make it clear that not all data in NXdata groups are linked. (iii) describe the NXmonitor group, without apologizing for its location in the NXentry groups (since I don’t feel an apology is necessary, and it’s probably unwise when you are promoting the format). This is the right place for, e.g., all pulsed neutron experiments. --- 2014/csipaper/nexus14aip.tex | 34 ++++++++++++++++------------------ 1 file changed, 16 insertions(+), 18 deletions(-) diff --git a/2014/csipaper/nexus14aip.tex b/2014/csipaper/nexus14aip.tex index 1bf1ebd..0b91e53 100644 --- a/2014/csipaper/nexus14aip.tex +++ b/2014/csipaper/nexus14aip.tex @@ -130,7 +130,7 @@ \section{Introduction} home-grown data formats. This scheme has a number of drawbacks addressed by NeXus: \begin{itemize} \item It makes the life of traveling scientists unnecessarily difficult as they must deal with multiple files - in different formats or use file converters, in order to extract scientific information from the data. + in different formats, file converters, etc., in order to extract scientific information from the data. \item An unnecessary burden is imposed on data analysis software producers to accommodate many different formats. \item The whole idea of open access to data is sabotaged if the data is in a format that cannot be easily understood. \item Scientific integrity is jeopardized if the data cannot be understood or important elements are missing. @@ -184,9 +184,8 @@ \section{Design Principles} \section{File Hierarchies} NeXus data files are organized into a hierarchy of groups which, in turn, can contain further groups or fields, -very much like an internal file system. The content of each NeXus group is defined by a base class, or -an application definition, or a contributed definition. - +very much like an internal file system. The possible contents of each NeXus group are defined by a base class, while an application definition, +or a contributed definition, is used to specify which of these fields and groups are required for a particular type of analysis. \subsection{Raw Data File Hierarchy} @@ -202,13 +201,13 @@ \subsection{Raw Data File Hierarchy} An overview of the NeXus data file structure for raw experimental data is shown in FIG.~\ref{rawfile}. -When looking at a beamline it is easy to -discern different components: beam optic components, sample position, detectors and such. It is quite natural to replicate this physical -separation with a logical arrangement of storing the data from each component into a separate group. This approach explains the +When looking at a beamline, it is easy to +discern different components: beam optic components, sample position, detectors, etc. It is quite natural to replicate this physical +separation with a logical arrangement, in which metadata from each component are stored a separate group. This approach explains the list of beamline components in the \texttt{NXinstrument} group presented in FIG.~\ref{rawfile}. -As there can be multiple instances of the same kind of equipment, like slits or detectors, in a given beamline it becomes necessary +As there can be multiple instances of the same kind of equipment, like slits or detectors, in a given beamline, it becomes necessary to add type information to the group name. This type information, the NeXus class name, is provided by a HDF5 attribute. -By convention NeXus class names start +By convention, NeXus class names start with the prefix \texttt{NX}. Each NeXus group describing a beamline component contains further groups and fields describing the component. A field can contain a single number, a text string or an array, as appropriate to the data to be described. @@ -217,16 +216,15 @@ \subsection{Raw Data File Hierarchy} group in the hierarchy. The \texttt{NXentry} group thus represents one scan or run (or a processed data entry, as will be discussed later). The \texttt{NXentry} group also holds the experiment metadata, such as the date and time at which it was performed. -In the course of NeXus history, -the decision was taken to move \texttt{NXmonitor} -out of \texttt{NXinstrument} to the higher hierarchy level of \texttt{NXentry}, -in order to facilitate quick inspection. +To enable default visualization of the experimental data, +at least one \texttt{NXdata} group should be provided at the \texttt{NXentry} level. +It contains the plottable data, or links to the data, which often reside in the \texttt{NXdetector} group (links are supported by HDF5 and work like +symbolic links in the Unix file system). It also contains information about plot axes, using attributes to define what +the data should be plotted against. -To enable a simple default visualization, -a \texttt{NXdata} group must be provided at \texttt{NXentry} level. -It contains information about plot axes and links to the data -(which often reside in the \texttt{NXdetector} group). -Links are supported by HDF5 and work like symbolic links in the Unix file system. +The \texttt{NXentry} group may also contain one or more \texttt{NXmonitor} groups, containing data from beamline monitors. Since they may +also contain plottable data, it uses the same attribute scheme to associate the monitor data with its plotting axes. Its location in the +\texttt{NXentry} group facilitates quick inspection for beamline diagnostics. A special base class, \texttt{NXcollection}, exempts its contents from validation and thereby allows inclusion of whatever data in arbitrary non-NeXus formats. From a9cebe5fd4ad268c35b3f47366867c6878469328 Mon Sep 17 00:00:00 2001 From: Ray Osborn Date: Wed, 10 Sep 2014 17:31:09 -0500 Subject: [PATCH 4/8] Removed redundant authors MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit In one case, “The HDF Group” was rendered as “T. H. Group”. Authors like “cansas” are too poorly defined to be of use in an archived journal. --- 2014/csipaper/nexus14aip.bib | 28 ++++++++++++---------------- 1 file changed, 12 insertions(+), 16 deletions(-) diff --git a/2014/csipaper/nexus14aip.bib b/2014/csipaper/nexus14aip.bib index a88b57e..9262ebd 100644 --- a/2014/csipaper/nexus14aip.bib +++ b/2014/csipaper/nexus14aip.bib @@ -14,7 +14,6 @@ @ARTICLE{nxold @MANUAL{hdf5, - author = "Hdfgroup", title = "HDF-5", year = "2014 (accessed July 2014)", note = "\url{http://www.hdfgroup.org/HDF5/}", @@ -37,7 +36,6 @@ @BOOK{ITCVG } @MANUAL{nxman, - author = "NIAC", title = "NeXus Manual", year = "2014 (accessed July 2014)", note = "\url{http://download.nexusformat.org/kits/definitions/nexus-manual-3.1.0.tar.gz}", @@ -64,14 +62,12 @@ @INCOLLECTION{cbflib @MANUAL{nxwww, - author = "NIAC", title = "NeXus A common data format for neutron, x-ray and muon science", year = "2014 (accessed July 2014)", note = "\url{http://www.nexusformat.org/}", } @MANUAL{cansas, - author = "canSAS", title = "Description of the canSAS2012 data format", year = "2014 (accessed July 2014)", note = "\url{http://www.cansas.org/formats/canSAS2012/1.0/doc/}", @@ -93,14 +89,12 @@ @ARTICLE{muon2 } @MANUAL{niac, - author = "NIAC", title = "NeXus International Advisory Committee (NIAC)", year = "2014 (accessed July 2014)", note = "\url{http://wiki.nexusformat.org/NIAC}", } @MANUAL{nxvalidate, - author = "NIAC", title = "NXvalidate: a java GUI tool to validate NeXus data files", year = "2014 (accessed July 2014)", note = "\url{https://github.com/nexusformat/code/tree/master/applications/NXvalidate}", @@ -114,20 +108,22 @@ @MANUAL{epicsad } @MANUAL{hdfview, - author = "The HDF Group", title = "HDFView", year = "2014 (accessed September 2014)", note = "\url{http://www.hdfgroup.org/products/java/hdfview/index.html}", } -@webpage{nexpy, - title = "NeXpy: A Python GUI to analyze NeXus data", - Url = {http://nexpy.github.io/nexpy/}} +@MANUAL{nexpy, + title = "NeXpy: A Python GUI to Analyze NeXus Data", + year = "2014 (accessed September 2014)", + note = "\url{http://nexpy.github.io/nexpy/}", +} @article{Klosowski:1997fk, - Author = {Klosowski, Przemek and Koennecke, Mark and Tischler, Jon and Osborn, Raymond}, - Journal = {Physica B: Physics of Condensed Matter}, - Pages = {151--153}, - Title = {{NeXus: A common format for the exchange of neutron and synchrotron data}}, - Volume = {241}, - Year = {1998}} + Author = {Klosowski, Przemek and K\"onnecke, Mark and Tischler, Jon and Osborn, Raymond}, + Journal = {Physica B: Physics of Condensed Matter}, + Pages = {151--153}, + Title = {{NeXus: A common format for the exchange of neutron and synchrotron data}}, + Volume = {241}, + Year = {1998} +} From f8a0ea727fa66bc33898ec13f135ab83811c3ae9 Mon Sep 17 00:00:00 2001 From: Ray Osborn Date: Wed, 10 Sep 2014 17:32:38 -0500 Subject: [PATCH 5/8] Rewritten "Design Principles" section The existing version did not, IMHO, give enough of an overview for a new reader. --- 2014/csipaper/nexus14aip.tex | 131 ++++++++++++++++++----------------- 1 file changed, 69 insertions(+), 62 deletions(-) diff --git a/2014/csipaper/nexus14aip.tex b/2014/csipaper/nexus14aip.tex index 0b91e53..2d67203 100644 --- a/2014/csipaper/nexus14aip.tex +++ b/2014/csipaper/nexus14aip.tex @@ -130,7 +130,7 @@ \section{Introduction} home-grown data formats. This scheme has a number of drawbacks addressed by NeXus: \begin{itemize} \item It makes the life of traveling scientists unnecessarily difficult as they must deal with multiple files - in different formats, file converters, etc., in order to extract scientific information from the data. + in different formats, file converters, \textit{etc}., in order to extract scientific information from the data. \item An unnecessary burden is imposed on data analysis software producers to accommodate many different formats. \item The whole idea of open access to data is sabotaged if the data is in a format that cannot be easily understood. \item Scientific integrity is jeopardized if the data cannot be understood or important elements are missing. @@ -147,7 +147,7 @@ \section{Introduction} NeXus adds to HDF5: \begin{itemize} \item Rules for organizing domain-specific data within a HDF5 file -\item A link structure to enable quick default visualization +\item Features to enable rapid data visualization \item A dictionary of documented domain-specific field names \item Definitions of standards that can be validated \end{itemize} @@ -156,36 +156,38 @@ \section{Introduction} \section{Design Principles} -The authors of data-acquisition and instrument-control software are encouraged to generate exactly \emph{one} NeXus container file per measurement -(a measurement is either a data accumulation under fixed conditions, -or a scan). -This file includes not only the detector and monitor data, -but also metadata, information on the state of the beamline, parameter logs, and more. -Authors of data-reduction and data-analysis software can use NeXus to -store processed data along with metadata and a processing log. - -NeXus data files are built using basic HDF5 storage elements: -data groups (like file system folders), -data fields (such as strings, floats, integers, and arrays), -attributes (additional descriptors of groups and fields), -and links (like file system links). These basic storage elements are used to -build the \emph{base classes}, \emph{application definitions}, -and \emph{contributed definitions} that elaborate the NeXus standard. -As a container format, NeXus allows files to be extended at any moment by -additional content, including NeXus base classes, HDF5 groups, and HDF5 datasets. - -NeXus can be used for many different experimental techniques, -and at different levels of data processing. -For each of these different applications, -a specific subset of the standardized NeXus entities -(data groups and fields) is needed. -These subsets, and their hierarchical structure, are standardized -in the NeXus application definitions (Sect.~\ref{sect_appdef}). +NeXus utilizes certain design principles to make it easy to navigate even the most complex of HDF5 files. Data and associated +metadata are stored as fields within groups that have a logical (and often physical) association with the experiment (see FIG.~\ref{rawfile}). +HDF5 attributes are used to define the types, or classes, of these groups. For example, sample information is stored in a group of class \texttt{NXsample}, +instrumental information in a group of class \texttt{NXinstrument}, \textit{etc}. The beamline components that form the instrument, +such as monochromators, collimators, and detectors, are stored as sub-groups within the \texttt{NXinstrument} group. This +hierarchical structure makes NeXus extremely flexible, capable of accommodating new types of instrument as they are developed, +and extremely scalable, capable of storing data from single point-detectors to complex multi detector configurations. It can also, +just as easily, contain processed data or even theoretical simulations to be stored alongside the experimental results. + +These groups are contained within a root-level group with class \texttt{NXentry}. The \texttt{NXentry} group contains all the data from a single measurement, +which could represent data collected in a certain configuration or in a scan, so multiple measurements can be stored in separate \texttt{NXentry} +groups within a single file if needed. Each NeXus file is required to contain at least one \texttt{NXentry} group. + +Each \texttt{NXentry} group should +contain at least one \texttt{NXdata} group, which contains the measured (or processed or simulated) data along with the other information required to plot it, +\textit{e.g.}, the plotting axis or axes. The NeXus design allows default plots of \texttt{NXdata} groups to be generated without any prior knowledge of the +type of measurement. This feature was implemented in NeXus before HDF5 introduced dimension scales, which provide similar functionality. + +As well as defining a logical group structure, NeXus provides a dictionary of names that can be used to define specific fields within each class of +groups. For example, if the sample temperature is stored, the NeXus standard specifies that it should be called \texttt{temperature} and stored in +the \texttt{NXsample} group. These names are documented in the NeXus base class definitions (Sect.~\ref{sect_baseclasses}). It should be stressed that +it is not necessary for a particular NeXus file to contain every item defined for each base class; the base classes just define the names that should be +used when they are present. However, certain applications may require particular +items to be present for specific types of data analysis. For each of these different applications, a specific subset of the standardized NeXus entities +(data groups and fields) are standardized in the NeXus application definitions (Sect.~\ref{sect_appdef}). + +The combination of a well-defined hierarchy of groups with a comprehensive and well-documented dictionary of data and metadata names ensures +that NeXus files are self-describing. It should be possible for another scientist to understand the contents of a NeXus file without consulting +documentation specific to any one facility or beamline. By enabling the storage of comprehensive metadata, the NeXus format facilitates the +sharing of data between collaborators and long-term data curation. \section{File Hierarchies} -NeXus data files are organized into a hierarchy of groups which, in turn, can contain further groups or fields, -very much like an internal file system. The possible contents of each NeXus group are defined by a base class, while an application definition, -or a contributed definition, is used to specify which of these fields and groups are required for a particular type of analysis. \subsection{Raw Data File Hierarchy} @@ -195,14 +197,14 @@ \subsection{Raw Data File Hierarchy} } \end{figure} -A major focus of NeXus has been the recording of \emph{raw} experimental data, i.e. information taken directly from the experimental +A major focus of NeXus has been the recording of \emph{raw} experimental data, \textit{i.e.}, information taken directly from the experimental equipment or processed only as required to provide physically meaningful values. The NeXus raw data file hierarchy is the consequence of some practical considerations. An overview of the NeXus data file structure for raw experimental data is shown in FIG.~\ref{rawfile}. When looking at a beamline, it is easy to -discern different components: beam optic components, sample position, detectors, etc. It is quite natural to replicate this physical +discern different components: beam optic components, sample position, detectors, \textit{etc}. It is quite natural to replicate this physical separation with a logical arrangement, in which metadata from each component are stored a separate group. This approach explains the list of beamline components in the \texttt{NXinstrument} group presented in FIG.~\ref{rawfile}. As there can be multiple instances of the same kind of equipment, like slits or detectors, in a given beamline, it becomes necessary @@ -226,22 +228,26 @@ \subsection{Raw Data File Hierarchy} also contain plottable data, it uses the same attribute scheme to associate the monitor data with its plotting axes. Its location in the \texttt{NXentry} group facilitates quick inspection for beamline diagnostics. +Most NeXus files will also contain a \texttt{NXsample} group containing information about the sample being measured in the experiment, \textit{e.g.}, +its chemical composition, mass, unit cell parameters, \textit{etc}. It may also contain information about the sample environment, such as +temperature or pressure. If one or more of these parameters is varied in an experiment, these could be used as scanned variables (see +Section III.A). + A special base class, \texttt{NXcollection}, exempts its contents from validation and thereby allows inclusion of whatever data in arbitrary non-NeXus formats. \subsubsection{Multiple Method Instruments} -Particularly at X-ray sources, -some instruments offer multiple techniques that can be used in parallel. +Some instruments, particularly at X-ray sources, offer multiple techniques that can be used in parallel. For example small-angle scattering and powder diffraction can be measured simultaneously at a SAXS/WAXS beamline. We recommend storing the data from all methods in \emph{one} file, in a \emph{single} \texttt{NXentry} hierarchy -(FIG.~\ref{multimethod}). All information from all detectors, logs and -such are collected in this one \texttt{NXentry} group to keep the data together. -Information that is particular for one experimental technique -is linked into a \texttt{NXsubentry}. The \texttt{NXsubentry} follows the hierarchy of -\texttt{NXentry}. But it will typically only link to the data required by the +(FIG.~\ref{multimethod}). All information from detectors, logs, \textit{etc}., + are collected in this one \texttt{NXentry} group to keep the data together. +Information that is peculiar to one experimental technique +is linked into a \texttt{NXsubentry}. The \texttt{NXsubentry} follows the hierarchy of +\texttt{NXentry}, but it will typically only link to the data required by the application definition for the specific experimental technique. The point of this scheme is that both humans and computerized users can easily locate method-specific data while maintaining the full view of the experiment. @@ -282,7 +288,8 @@ \subsubsection{Scans} \end{itemize} NeXus allows multi-dimensional scans too. This makes it very simple to produce meaningful slices through data -volumes even with NeXus-agnostic software ({\it e.g.} HDFView\cite{hdfview}). +volumes, whether the software is designed for NeXus (\textit{e.g.}, NeXpy\cite{nexpy}) or NeXus-agnostic + (\textit{e.g.}, HDFView\cite{hdfview}). % FIXME: this pathology is not necessary to describe, not unique to NeXus, too much detail for this manuscript %Interrupting a multi-dimensional scan may, depending %on the software used, leave some of the data in an uninitialised state (usually the HDF5 fill value). @@ -306,7 +313,7 @@ \subsection{Processed Data} The hierarchy is much reduced as it is not important to carry all experimental information in the data reduction. In contrast to the raw data file structure, \texttt{NXdata} in the processed file structure is the place -to store the results of the processing, together with its associated axes if the result is a multi-dimensional array. +to store the results of the processing, together with its associated axis or axes. In addition to the \texttt{NXdata} and \texttt{NXsample} groups, the \texttt{NXprocess} group provides structure to store details @@ -319,10 +326,10 @@ \section{Coordinate Systems, Positioning of Components and Further Rules} For data reduction, it is often necessary to know the exact position and orientation of beamline components. The first thing needed is a reference coordinate system. NeXus chose to use the same coordinate system as the -neutron beamline simulation software McStas\cite{mcstas}. +neutron beamline simulation software, McStas\cite{mcstas}. -For describing the placement and orientation of components, NeXus stores the same information as is used for the -same purpose in the Crystallographic Interchange Format (CIF)\cite{ITCVG}. CIF (and NeXus) stores the details +For describing the placement and orientation of components, NeXus stores the same information as the +Crystallographic Interchange Format (CIF)\cite{ITCVG}. CIF (and NeXus) stores the details of the translations and rotations necessary to move a given component from the zero point of the coordinate system to its actual position. As coordinate transformations are not commutative, the order of transformations must also be stored. @@ -342,6 +349,7 @@ \section{Coordinate Systems, Positioning of Components and Further Rules} \section{Base Classes} + \label{sect_baseclasses} As can be seen from the discussion of the NeXus file hierarchy, NeXus arranges data in groups which have a @@ -350,7 +358,7 @@ \section{Base Classes} The term \emph{base class} is not used in the same sense as in object-oriented programming languages; in particular, there is no inheritance. The NeXus base classes provide a comprehensive dictionary of terms -that can be used for each class. +that can be used in each class. The terms in the dictionary comprise concepts and names common to the topic of the base class. The expected spelling and definition of each term is specified in the base classes. It is neither expected nor required to provide all the terms specified in a base class. @@ -371,11 +379,10 @@ \section{Base Classes} These decisions can be standardized in the form of application definitions (see below, Sect.~\ref{sect_appdef}). -The NeXus base classes are encoded in NeXus Description Language (NXDL)\cite{nxman}. NXDL is -just another form of an XML file that specifies the content of a NeXus base class. -NXDL files may be parsed either by humans or by software and -may be validated for syntax and content. The NXDL files are used to validate the structure of -NeXus data files. Java source code of a GUI tool has been prepared\cite{nxvalidate} to perform such validation.% +The NeXus base classes are defined in XML files using the NeXus Description Language (NXDL)\cite{nxman}. +NXDL files may be parsed either by people or by software and +may be validated for syntax and content. The NXDL files may be used to validate the structure of +NeXus data files. GUI tools have been prepared\cite{nxvalidate} to perform such validation.% % The JAR file available, but it needs maintenance and vastly improved documentation how to use it % before it is ready for general release. % TODO: *** good HIGH-PRIORITY item for 2014 Code Camp *** @@ -390,15 +397,15 @@ \section{Application Definitions} For each group, a \emph{minimum} content is specified. Application definitions are therefore different than base class definitions, which specify a comprehensive -dictionary of terms that can be used. +dictionary of terms that can be used but does not specify which are required. Historically, an application definition addressed one type of instrument, -like X-ray reflectometer, or direct-geometry neutron time-of-flight spectrometer. +like an X-ray reflectometer or direct-geometry neutron time-of-flight spectrometer. Thus, application definitions were originally named \emph{instrument definitions}. -However, as NeXus can also be used for processed data -like a tomography reconstruction or a dynamic scattering law $S(Q,\omega)$, -the more generic term \emph{application definition} has been adopted. - +However, the same instrument can be used for different types of analysis that require different +experimental variables; \textit{e.g.}, a powder diffractometer could be used for Rietveld +refinements or pair-distribution-function analysis. The more generic term \emph{application definition} has +been adopted to signify what data are required for each type of data analysis. \section{Contributed Definitions} \label{sect_contribdef} @@ -417,7 +424,7 @@ \section{Contributed Definitions} All such proposals from the scientific community to extend NeXus with new application definitions and base classes are added to NeXus, initially, as contributed definitions either in incubation -or a special case not for general use. The NIAC is charged to +or as a special case not for general use. The NIAC is charged to review any new contributed definitions and provide feedback to the authors before ratification and acceptance. @@ -440,17 +447,17 @@ \section{Governance} \section{Uptake of NeXus} NeXus is already in use as the main data format at many facilities including Soleil, Diamond, SINQ, SNS, Lujan/LANL -and KEK. Other facilities including ISIS, DESY and the $\mu$SR community are in the process of moving towards -NeXus as their data format. At LBNL, NeXus is currently being adapted for XFEL serial crystallographic data. -APS is storing some of its data collection using NeXus. +and KEK. Other facilities including ISIS, DESY, and the $\mu$SR community are in the process of moving towards +NeXus as their data format. At LBNL, NeXus is currently being adapted for XFEL +serial crystallographic data. The APS is using it for some techniques. The EPICS\cite{epicsad} area detector software has a plug-in to write acquired images into NeXus data files. Also, some commercial manufacturers of area detectors now write acquired images into NeXus data files. % NOTE: do NOT name the companies or else we must add disclaimers to the bottom of the manuscript -The adoption of NeXus has taken some time. The reason is that NeXus is often chosen whenever +The adoption of NeXus has taken some time. The reason is that partly NeXus is often chosen whenever a facility starts operation or undergoes major refurbishments. For those facilities where there is an existing and working pipeline from data acquisition to data analysis, the resources are usually lacking to move -towards NeXus as the only data file format. +towards NeXus as the only data file format. This is reflected in the experience of the muon community. For the ISIS source, the move to a Windows PC-based data acquisition system in 2002 required a new data format, providing an ideal opportunity to exploit the emerging NeXus standard\cite{muon1}. In From 85cf15b33704187537d535e43b8f7aab49449234 Mon Sep 17 00:00:00 2001 From: Ray Osborn Date: Thu, 11 Sep 2014 10:27:52 -0500 Subject: [PATCH 6/8] Restored HDF5 storage elements MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit These are a few changes resulting from the comments by Pete Jemian. I have restored a sentence about using basic HDF5 storage elements, changed ‘peculiar’ to ‘particular’, modified the sentence on using NXDL for validation, and added ‘according to context’ in the application definition section. --- 2014/csipaper/nexus14aip.tex | 16 +++++++++------- 1 file changed, 9 insertions(+), 7 deletions(-) diff --git a/2014/csipaper/nexus14aip.tex b/2014/csipaper/nexus14aip.tex index 2d67203..af37ba5 100644 --- a/2014/csipaper/nexus14aip.tex +++ b/2014/csipaper/nexus14aip.tex @@ -21,7 +21,7 @@ % Use the file aiptemplate.tex as a template for your document. \documentclass[% aip, -%jmp,% +%jmp,%!TEX encoding = UTF-8 Unicode %bmf,% rsi, amsmath,amssymb, @@ -156,7 +156,9 @@ \section{Introduction} \section{Design Principles} -NeXus utilizes certain design principles to make it easy to navigate even the most complex of HDF5 files. Data and associated +NeXus data files are built using basic HDF5 storage elements, \textit{i.e.}, groups (like file system folders), fields (such as strings, floats, integers, and arrays), +attributes (additional descriptors of groups and fields), and links (like file system links), but follows certain design principles to make it easy to navigate even the +most complex of HDF5 files. Data and associated metadata are stored as fields within groups that have a logical (and often physical) association with the experiment (see FIG.~\ref{rawfile}). HDF5 attributes are used to define the types, or classes, of these groups. For example, sample information is stored in a group of class \texttt{NXsample}, instrumental information in a group of class \texttt{NXinstrument}, \textit{etc}. The beamline components that form the instrument, @@ -165,7 +167,7 @@ \section{Design Principles} and extremely scalable, capable of storing data from single point-detectors to complex multi detector configurations. It can also, just as easily, contain processed data or even theoretical simulations to be stored alongside the experimental results. -These groups are contained within a root-level group with class \texttt{NXentry}. The \texttt{NXentry} group contains all the data from a single measurement, +NeXus groups are contained within a root-level group with class \texttt{NXentry}. The \texttt{NXentry} group contains all the data from a single measurement, which could represent data collected in a certain configuration or in a scan, so multiple measurements can be stored in separate \texttt{NXentry} groups within a single file if needed. Each NeXus file is required to contain at least one \texttt{NXentry} group. @@ -245,7 +247,7 @@ \subsubsection{Multiple Method Instruments} in a \emph{single} \texttt{NXentry} hierarchy (FIG.~\ref{multimethod}). All information from detectors, logs, \textit{etc}., are collected in this one \texttt{NXentry} group to keep the data together. -Information that is peculiar to one experimental technique +Information that is particular to one experimental technique is linked into a \texttt{NXsubentry}. The \texttt{NXsubentry} follows the hierarchy of \texttt{NXentry}, but it will typically only link to the data required by the application definition for the specific experimental technique. The point of this scheme @@ -381,8 +383,8 @@ \section{Base Classes} The NeXus base classes are defined in XML files using the NeXus Description Language (NXDL)\cite{nxman}. NXDL files may be parsed either by people or by software and -may be validated for syntax and content. The NXDL files may be used to validate the structure of -NeXus data files. GUI tools have been prepared\cite{nxvalidate} to perform such validation.% +used to validate NeXus files for syntax and content. GUI tools have been prepared\cite{nxvalidate} to +perform such validation.% % The JAR file available, but it needs maintenance and vastly improved documentation how to use it % before it is ready for general release. % TODO: *** good HIGH-PRIORITY item for 2014 Code Camp *** @@ -397,7 +399,7 @@ \section{Application Definitions} For each group, a \emph{minimum} content is specified. Application definitions are therefore different than base class definitions, which specify a comprehensive -dictionary of terms that can be used but does not specify which are required. +dictionary of terms that can be used according to the context. Historically, an application definition addressed one type of instrument, like an X-ray reflectometer or direct-geometry neutron time-of-flight spectrometer. From 3e257cf5aaf75fd0d7ab4a17df0cd5df48ea6e5a Mon Sep 17 00:00:00 2001 From: Ray Osborn Date: Thu, 11 Sep 2014 10:32:32 -0500 Subject: [PATCH 7/8] Changed \textit to \emph MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Also, corrected ‘follows’ to ‘follow’ in the first sentence of the Design Principles. --- 2014/csipaper/nexus14aip.tex | 26 +++++++++++++------------- 1 file changed, 13 insertions(+), 13 deletions(-) diff --git a/2014/csipaper/nexus14aip.tex b/2014/csipaper/nexus14aip.tex index af37ba5..4b9be3e 100644 --- a/2014/csipaper/nexus14aip.tex +++ b/2014/csipaper/nexus14aip.tex @@ -130,7 +130,7 @@ \section{Introduction} home-grown data formats. This scheme has a number of drawbacks addressed by NeXus: \begin{itemize} \item It makes the life of traveling scientists unnecessarily difficult as they must deal with multiple files - in different formats, file converters, \textit{etc}., in order to extract scientific information from the data. + in different formats, file converters, \emph{etc}., in order to extract scientific information from the data. \item An unnecessary burden is imposed on data analysis software producers to accommodate many different formats. \item The whole idea of open access to data is sabotaged if the data is in a format that cannot be easily understood. \item Scientific integrity is jeopardized if the data cannot be understood or important elements are missing. @@ -156,12 +156,12 @@ \section{Introduction} \section{Design Principles} -NeXus data files are built using basic HDF5 storage elements, \textit{i.e.}, groups (like file system folders), fields (such as strings, floats, integers, and arrays), -attributes (additional descriptors of groups and fields), and links (like file system links), but follows certain design principles to make it easy to navigate even the +NeXus data files are built using basic HDF5 storage elements, \emph{i.e.}, groups (like file system folders), fields (such as strings, floats, integers, and arrays), +attributes (additional descriptors of groups and fields), and links (like file system links), but follow certain design principles to make it easy to navigate even the most complex of HDF5 files. Data and associated metadata are stored as fields within groups that have a logical (and often physical) association with the experiment (see FIG.~\ref{rawfile}). HDF5 attributes are used to define the types, or classes, of these groups. For example, sample information is stored in a group of class \texttt{NXsample}, -instrumental information in a group of class \texttt{NXinstrument}, \textit{etc}. The beamline components that form the instrument, +instrumental information in a group of class \texttt{NXinstrument}, \emph{etc}. The beamline components that form the instrument, such as monochromators, collimators, and detectors, are stored as sub-groups within the \texttt{NXinstrument} group. This hierarchical structure makes NeXus extremely flexible, capable of accommodating new types of instrument as they are developed, and extremely scalable, capable of storing data from single point-detectors to complex multi detector configurations. It can also, @@ -173,7 +173,7 @@ \section{Design Principles} Each \texttt{NXentry} group should contain at least one \texttt{NXdata} group, which contains the measured (or processed or simulated) data along with the other information required to plot it, -\textit{e.g.}, the plotting axis or axes. The NeXus design allows default plots of \texttt{NXdata} groups to be generated without any prior knowledge of the +\emph{e.g.}, the plotting axis or axes. The NeXus design allows default plots of \texttt{NXdata} groups to be generated without any prior knowledge of the type of measurement. This feature was implemented in NeXus before HDF5 introduced dimension scales, which provide similar functionality. As well as defining a logical group structure, NeXus provides a dictionary of names that can be used to define specific fields within each class of @@ -199,14 +199,14 @@ \subsection{Raw Data File Hierarchy} } \end{figure} -A major focus of NeXus has been the recording of \emph{raw} experimental data, \textit{i.e.}, information taken directly from the experimental +A major focus of NeXus has been the recording of \emph{raw} experimental data, \emph{i.e.}, information taken directly from the experimental equipment or processed only as required to provide physically meaningful values. The NeXus raw data file hierarchy is the consequence of some practical considerations. An overview of the NeXus data file structure for raw experimental data is shown in FIG.~\ref{rawfile}. When looking at a beamline, it is easy to -discern different components: beam optic components, sample position, detectors, \textit{etc}. It is quite natural to replicate this physical +discern different components: beam optic components, sample position, detectors, \emph{etc}. It is quite natural to replicate this physical separation with a logical arrangement, in which metadata from each component are stored a separate group. This approach explains the list of beamline components in the \texttt{NXinstrument} group presented in FIG.~\ref{rawfile}. As there can be multiple instances of the same kind of equipment, like slits or detectors, in a given beamline, it becomes necessary @@ -230,8 +230,8 @@ \subsection{Raw Data File Hierarchy} also contain plottable data, it uses the same attribute scheme to associate the monitor data with its plotting axes. Its location in the \texttt{NXentry} group facilitates quick inspection for beamline diagnostics. -Most NeXus files will also contain a \texttt{NXsample} group containing information about the sample being measured in the experiment, \textit{e.g.}, -its chemical composition, mass, unit cell parameters, \textit{etc}. It may also contain information about the sample environment, such as +Most NeXus files will also contain a \texttt{NXsample} group containing information about the sample being measured in the experiment, \emph{e.g.}, +its chemical composition, mass, unit cell parameters, \emph{etc}. It may also contain information about the sample environment, such as temperature or pressure. If one or more of these parameters is varied in an experiment, these could be used as scanned variables (see Section III.A). @@ -245,7 +245,7 @@ \subsubsection{Multiple Method Instruments} can be measured simultaneously at a SAXS/WAXS beamline. We recommend storing the data from all methods in \emph{one} file, in a \emph{single} \texttt{NXentry} hierarchy -(FIG.~\ref{multimethod}). All information from detectors, logs, \textit{etc}., +(FIG.~\ref{multimethod}). All information from detectors, logs, \emph{etc}., are collected in this one \texttt{NXentry} group to keep the data together. Information that is particular to one experimental technique is linked into a \texttt{NXsubentry}. The \texttt{NXsubentry} follows the hierarchy of @@ -290,8 +290,8 @@ \subsubsection{Scans} \end{itemize} NeXus allows multi-dimensional scans too. This makes it very simple to produce meaningful slices through data -volumes, whether the software is designed for NeXus (\textit{e.g.}, NeXpy\cite{nexpy}) or NeXus-agnostic - (\textit{e.g.}, HDFView\cite{hdfview}). +volumes, whether the software is designed for NeXus (\emph{e.g.}, NeXpy\cite{nexpy}) or NeXus-agnostic + (\emph{e.g.}, HDFView\cite{hdfview}). % FIXME: this pathology is not necessary to describe, not unique to NeXus, too much detail for this manuscript %Interrupting a multi-dimensional scan may, depending %on the software used, leave some of the data in an uninitialised state (usually the HDF5 fill value). @@ -405,7 +405,7 @@ \section{Application Definitions} like an X-ray reflectometer or direct-geometry neutron time-of-flight spectrometer. Thus, application definitions were originally named \emph{instrument definitions}. However, the same instrument can be used for different types of analysis that require different -experimental variables; \textit{e.g.}, a powder diffractometer could be used for Rietveld +experimental variables; \emph{e.g.}, a powder diffractometer could be used for Rietveld refinements or pair-distribution-function analysis. The more generic term \emph{application definition} has been adopted to signify what data are required for each type of data analysis. From 75e61c733054b8adffc4b5a5c78268f6c3f73b0f Mon Sep 17 00:00:00 2001 From: Ray Osborn Date: Thu, 18 Sep 2014 09:53:05 -0500 Subject: [PATCH 8/8] Updated DOE acknowledgement --- 2014/csipaper/nexus14aip.tex | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/2014/csipaper/nexus14aip.tex b/2014/csipaper/nexus14aip.tex index 4b9be3e..6438b0e 100644 --- a/2014/csipaper/nexus14aip.tex +++ b/2014/csipaper/nexus14aip.tex @@ -491,8 +491,10 @@ \section{Summary} \begin{acknowledgments} The NIAC acknowledges the support of all the institutions contributing to NeXus and their respective -funding agencies, most notably DOE, NIH and NSF in the US. The development of the muSR NeXus data format was partly -funded by the European Commission within the sixth Framework Programme. +funding agencies, most notably DOE, NIH and NSF in the US. Work at Argonne was supported by the +Scientific User Facilities Division and the Materials Science and Engineering Division, +Basic Energy Sciences, Office of Science, U.S. Department of Energy. The development of the muSR +NeXus data format was partly funded by the European Commission within the sixth Framework Programme. \end{acknowledgments} \nocite{*}