This repository was archived by the owner on Jul 26, 2022. It is now read-only.

Integration Options

massfords edited this page Apr 26, 2015 · 18 revisions

Overview

This page is an attempt at defining how teams will integrate. Based on a brief meeting with team World, we think there are three levels of integration that we could offer.

  • Level 1: CSV files only
  • Level 2: JSON files
  • Level 3: Service calls

The minimum requirement is Level 1. We'll have some mechanism where component A from one team exports the prescribed format and component B from another team parses that format and uses the data.

CSV Format

The following rules must be supported by all CSV parsers and formatters:

  • lines beginning with a # are comments and are not part of the data
  • all rows must have the same number of columns
  • a null value is represented by a cell that is empty or contains only whitespace
  • a value read from a cell is trimmed before becoming data
  • timestamps are represented as ISO 8601 Date-Time values

Example

# This is a comment
#
# The next two rows have the same data since whitespace is trimmed
0,  0,  1,  2, 3 , Compound-1 ,  2015-01-02T10:30:20Z
0,0,1,2,3,Compound-1,2015-01-02T10:30:20Z
#
# The next row doesn't have a compound, just empty space for that col
0,  0,  1,  2, 3 ,  ,  2015-01-02T10:30:20Z
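The rules above can be sketched in a few lines of Python. This is only an illustration, not the WE99 reader (which is Java with Bean IO). The function names `parse_cells`, `parse_rows` and `parse_timestamp` are hypothetical. The sketch also assumes that cells never contain quoted commas, that blank lines may be skipped, and that a comment marker may be preceded by whitespace; the rules do not address any of these.

```python
from datetime import datetime
from typing import List, Optional


def parse_cells(line: str) -> Optional[List[Optional[str]]]:
    """Return the trimmed cells of one line, or None for a comment or blank line."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None
    # Trim every cell; a cell that is empty after trimming is a null value.
    return [cell.strip() or None for cell in line.split(",")]


def parse_rows(text: str) -> List[List[Optional[str]]]:
    """Parse all data rows and enforce the same-number-of-columns rule."""
    rows = []
    for line in text.splitlines():
        cells = parse_cells(line)
        if cells is not None:
            rows.append(cells)
    widths = {len(row) for row in rows}
    if len(widths) > 1:
        raise ValueError(f"rows have differing column counts: {sorted(widths)}")
    return rows


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 date-time such as 2015-01-02T10:30:20Z."""
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11 on,
    # so it is rewritten as an explicit UTC offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))
```

Nulls are represented as `None` so that callers can tell a missing value from real data without re-checking for whitespace.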

Plate Map

Models the user's layout of a plate. This format is the intended output of the Plate Map Wizard feature from another team, before it is merged with the Compound Mappings File. The plate mapping process allows the user to create a logical mapping of wells on a plate with symbolic names for their contents.

There will be a special label name, "compound", whose value tells us which compound to use.

| Column | Data Type | Required | Description |
|---|---|---|---|
| row | int | yes | row field for the well coordinate |
| col | int | yes | col field for the well coordinate |
| well-type | COMP, POS, NEG, EMPTY | yes | indicates the type of the substance in the well |
| label-name | string | no | user supplied label name |
| label-value | string | no | user supplied label value |

You can specify multiple labels for a well either by repeating the well's coordinates in two or more rows or by repeating the label name/value pair within a single row.

Example

    #row, col, well-type, (label-name, label-value)+
    #
    # This is a sparse CSV. We don't need to define entries for every single cell.
    #
    0, 0, NEG, compound, A1
    0, 0, NEG, compound, B1
    0, 1, COMP, compound, C1, other, foo
    0, 2, COMP, compound, C1
    0, 3, COMP, compound, E1
    0, 49, EMPTY, , 
    2, 0, COMP, compound, G1
    2, 1, COMP, compound, G1
    2, 2, COMP, compound, H1
    2, 3, COMP, compound, H1
    2, 49, EMPTY,,
    50,50,EMPTY,,
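Both ways of attaching several labels to a well can be handled by one loop, sketched below in Python. This is an illustration only, not the `PlateMapCSVReader` implementation. The names `WellMapping` and `parse_plate_map` are hypothetical. Rejecting a repeated well whose well-type changes is an assumption; this section does not say what should happen in that case.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

WELL_TYPES = {"COMP", "POS", "NEG", "EMPTY"}


@dataclass
class WellMapping:
    row: int
    col: int
    well_type: str
    # A list of pairs rather than a dict: the same label name may repeat.
    labels: List[Tuple[str, str]] = field(default_factory=list)


def parse_plate_map(text: str) -> Dict[Tuple[int, int], WellMapping]:
    """Parse a sparse Plate Map CSV into wells keyed by (row, col)."""
    wells: Dict[Tuple[int, int], WellMapping] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # comment or blank line
        cells = [cell.strip() for cell in line.split(",")]
        if len(cells) < 3:
            raise ValueError(f"row, col and well-type are required: {line!r}")
        row, col, well_type = int(cells[0]), int(cells[1]), cells[2]
        if well_type not in WELL_TYPES:
            raise ValueError(f"unknown well-type {well_type!r}")
        # Repeated coordinates add to the well that already exists.
        well = wells.setdefault((row, col), WellMapping(row, col, well_type))
        if well.well_type != well_type:
            # Assumption: a repeated well must keep its well-type.
            raise ValueError(f"well-type changed for well {(row, col)}")
        pairs = cells[3:]
        if len(pairs) % 2:
            raise ValueError(f"label name without a value: {line!r}")
        # Repeated (label-name, label-value) pairs within one row.
        for name, value in zip(pairs[0::2], pairs[1::2]):
            if name:  # whitespace-only cells are nulls and add no label
                well.labels.append((name, value))
    return wells
```

Wells that never appear in the file are simply absent from the returned dictionary, which matches the sparse layout of the example.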

Implementation Details

| Resource | Description |
|---|---|
| edu.harvard.we99.services.io.PlateMapCSVReader | Reads the Plate Map format |
| edu.harvard.we99.services.io.PlateMapCSVReaderTest | Unit test converting the CSV format into WE99 Domain Objects |
| /PlateMapCSVReaderTest/plate-mappings.csv | Test input file |
| platemapping.xml | Bean IO mapping config file |

Plate Map with Doses

TBD - update the wiki to define this format

Plate with Doses and Compounds

Models the user's layout of a plate and also includes details on the contents of the wells. This format is the intended output of the Plate Map Wizard feature from another team. The plate mapping process allows the user to create a logical mapping of wells on a plate with symbolic names for their contents. A secondary data file provides the mapping of the symbolic names to actual compounds and doses. This interchange format represents the combination of those two elements in order to provide an input file that is suitable for creating a new Plate instance for an experiment.

| Column | Data Type | Required | Description |
|---|---|---|---|
| row | int | yes | row field for the well coordinate |
| col | int | yes | col field for the well coordinate |
| well-type | COMP, POS, NEG, EMPTY | yes | indicates the type of the substance in the well |
| label-name | string | no | user supplied label name |
| label-value | string | no | user supplied label value |
| compound | string | yes | name of the compound in the well |
| quantity | int | yes | works with units to specify the amount of the substance |
| units | MILLIMOLAR, MICROMOLAR, NANOMOLAR, PICOMOLAR | yes | works with quantity to specify the amount of the substance |

TBD:

  • the only standard label name is 'compound'
  • rows are additive
  • coordinate and well type are required and well type cannot change
  • labels and compounds are optional and additive to a map of labels/compounds each keyed by their names

Example

    #row, col, well-type, label-name, label-value, Compound, quantity, [units, defaults to MICRO]
    #
    # This is a sparse CSV. We don't need to define entries for every single cell.
    #
    0, 0, NEG, temp, 20, H2O, 5.0, MICROMOLAR
    0, 0, NEG, temp, 20, NaCl, 1.0, MICROMOLAR
    0, 1, COMP, A1, 123, Cx, 1.0, MICROMOLAR
    0, 2, COMP, A1, 123, Cx, 1.0, MICROMOLAR
    0, 3, COMP, A1, 123, Cx, 1.0, MICROMOLAR
    0, 49, EMPTY,,,,
    2, 0, COMP, B, 123, Cx, 1.0, MICROMOLAR
    2, 1, COMP, B, 123, Cx, 1.0, MICROMOLAR
    2, 2, COMP, B, 123, Cx, 1.0, MICROMOLAR
    2, 3, COMP, B, 123, Cx, 1.0, MICROMOLAR
    2, 49, EMPTY,,,,
    50,50,EMPTY,,,,

Example with repeated values

There are two options for specifying multiple Compounds or multiple Labels within a well mapping. The first is shown above where we repeat the entry for the 0,0 well with different compounds.

The second option is shown below. In this option the CSV repeats the entries for the Compound fields to specify both water and salt for the well at 0,0.

    #row, col, well-type, label-name, label-value, Compound, quantity, units
    0, 0, NEG, A, 123, H2O, 5, PPM, NaCl, 1, PPM
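A single row with repeated Compound fields could be split as in the following Python sketch. It is an illustration rather than the `PlateCSVReader` implementation, and the names `Dose`, `WellRow` and `parse_well_row` are hypothetical. It rests on three assumptions. The quantity is read as a float, because the examples use values such as 5.0 although the table lists `int`. The units value is kept as written and is not checked against the enumeration. Trailing empty cells (as in the EMPTY rows) are ignored.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Dose:
    compound: str
    quantity: float
    units: str


@dataclass
class WellRow:
    row: int
    col: int
    well_type: str
    label: Optional[Tuple[str, str]]
    doses: List[Dose]


def parse_well_row(line: str) -> WellRow:
    """Parse one data row: five fixed cells, then (compound, quantity, units) groups."""
    cells = [cell.strip() for cell in line.split(",")]
    if len(cells) < 5:
        raise ValueError(f"expected at least 5 cells: {line!r}")
    row, col, well_type = int(cells[0]), int(cells[1]), cells[2]
    # A whitespace-only label-name means the row carries no label.
    label = (cells[3], cells[4]) if cells[3] else None
    rest = cells[5:]
    # Assumption: trailing null cells (e.g. on EMPTY wells) carry no dose data.
    while rest and not rest[-1]:
        rest.pop()
    if len(rest) % 3:
        raise ValueError(f"incomplete compound/quantity/units group: {line!r}")
    doses = [
        Dose(rest[i], float(rest[i + 1]), rest[i + 2])
        for i in range(0, len(rest), 3)
    ]
    return WellRow(row, col, well_type, label, doses)
```

Rows that repeat a well's coordinates (the first option) would each produce a `WellRow`; merging them by coordinate is left to the caller.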

Implementation Details

| Resource | Description |
|---|---|
| edu.harvard.we99.services.io.PlateCSVReader | Reads the Plate format |
| edu.harvard.we99.services.io.PlateCSVReaderTest | Unit test converting the CSV format into WE99 Domain Objects |
| /PlateCSVReaderTest/input.csv | Test input file for a single plate |
| /PlateCSVReaderTest/input-multi.csv | Test input file for multiple plates |
| plate.xml | Bean IO mapping config file |

Plate Result: Assay Result Interchange Format (ARIF)

TODO:

  • split label into label-name, label-value
  • remove measuredAt

Models the output from a device. The device would have accepted the PlateMap CSV above and output something like the records defined here.

| Column | Data Type | Required | Description |
|---|---|---|---|
| row | int | yes | row field for the well coordinate |
| col | int | yes | col field for the well coordinate |
| value | double | yes | value computed by the device |
| label | string | no | label for the value. Some devices may compute multiple values for a well so the label is useful for disambiguating. |
| measuredAt | iso8601 | no | timestamp for when the sample was taken |

Example

    # row, col, value, label, measuredAt
    0, 0, 0.0, 0.1, 0.2, A, 2015-01-02T10:20:30.100Z
    0, 0, 0.0, 0.1, 0.2, , 2015-01-02T10:20:30.100Z
    0, 0, 0.0, 0.1, 0.2, , 2015-01-02T10:20:30.100Z
    #
    0, 1, 1.0, ,2015-01-02T10:20:30.100Z
    0, 2, 2.0, ,2015-01-02T10:20:30.100Z
    0, 3, 3.0, ,2015-01-02T10:20:30.100Z
    0, 4, 4.0, ,2015-01-02T10:20:30.100Z
    #
    1, 0, 10.0, ,2015-01-02T10:20:30.100Z
    1, 1, 11.0, ,2015-01-02T10:20:30.100Z
    1, 2, 12.0, ,2015-01-02T10:20:30.100Z
    1, 3, 13.0, ,2015-01-02T10:20:30.100Z
    1, 4, 14.0, ,2015-01-02T10:20:30.100Z
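A record reader that follows the column table (exactly five cells per row) might look like the Python sketch below. It is not the `PlateResultCSVReader` implementation, and `ResultRecord` and `parse_result_line` are hypothetical names. Rows that carry more than one value cell are rejected rather than interpreted, since the table defines a single `value` column.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class ResultRecord:
    row: int
    col: int
    value: float
    label: Optional[str]
    measured_at: Optional[datetime]


def parse_result_line(line: str) -> ResultRecord:
    """Parse one ARIF data row: row, col, value, label, measuredAt."""
    cells = [cell.strip() for cell in line.split(",")]
    if len(cells) != 5:
        raise ValueError(f"expected 5 cells, got {len(cells)}: {line!r}")
    row, col, value, label, measured = cells
    measured_at = None
    if measured:
        # fromisoformat() only accepts a trailing "Z" from Python 3.11 on,
        # so it is rewritten as an explicit UTC offset first.
        measured_at = datetime.fromisoformat(measured.replace("Z", "+00:00"))
    return ResultRecord(
        row=int(row),
        col=int(col),
        value=float(value),
        label=label or None,  # whitespace-only cell is a null label
        measured_at=measured_at,
    )
```

Both optional columns map to `None` when their cell is blank, in line with the null-value rule of the CSV format.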

Implementation Details

| Resource | Description |
|---|---|
| edu.harvard.we99.services.io.PlateResultCSVReader | Reads the ARIF format |
| edu.harvard.we99.services.io.PlateResultCSVReaderTest | Unit test converting ARIF format into WE99 Domain Objects |
| /PlateResultServiceCSVTest/results-single.csv | Test input file for a single plate |
| /PlateResultServiceCSVTest/results-multi.csv | Test input file for multiple plates |
| resultsmapping.xml | Bean IO mapping config file |

Plate Result: Matrix Format

This format is based on the sample files from the course web site. Note that these files are not necessarily CSV formatted. See the table below for how they are laid out.

| Format | Plates | Rows | Cols | Delim | Description |
|---|---|---|---|---|---|
| Envision | single | 16 | 24 | whitespace | Lots of metadata at the top of the file. Each of the wells is identified by a letter row (A-P) and a column header (01-24) |
| HTS | single | 16 | 24 | whitespace | Data only with row identifier (a-p) and column header (1-24) |
| Kinase | single | 16 | 24 | comma | Data only with row identifier (A-P) and column header (1-24) |
| Multiplate | multi | 16 | 24 | whitespace | Data only with row identifier (A-P) and column header (1-24) |
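To make the layout concrete, here is a hedged Python sketch of reading a single-plate matrix. It is not the `MatrixParser` implementation, and the course sample files may differ in details. `row_index` and `parse_matrix` are hypothetical names. The sketch assumes that every data line starts with a one-letter row identifier followed by the well values, that any other line (column header, metadata) can be skipped, and that coordinates are zero-based.

```python
import re
from typing import Dict, Tuple


def row_index(letter: str) -> int:
    """Map a row identifier (A-P or a-p) to a zero-based row index."""
    if len(letter) != 1:
        raise ValueError(f"row identifier must be one letter: {letter!r}")
    index = ord(letter.upper()) - ord("A")
    if not 0 <= index < 16:
        raise ValueError(f"row identifier out of range A-P: {letter!r}")
    return index


def parse_matrix(text: str) -> Dict[Tuple[int, int], float]:
    """Read a whitespace- or comma-delimited matrix into {(row, col): value}."""
    values: Dict[Tuple[int, int], float] = {}
    for line in text.splitlines():
        # Accept either delimiter so one routine covers both layouts.
        tokens = [tok for tok in re.split(r"[,\s]+", line.strip()) if tok]
        # Assumption: only data lines begin with a single letter A-P.
        if len(tokens) < 2 or not re.fullmatch(r"[A-Pa-p]", tokens[0]):
            continue
        row = row_index(tokens[0])
        for col, token in enumerate(tokens[1:]):
            values[(row, col)] = float(token)
    return values
```

A multi-plate file would additionally need a rule for where one plate ends and the next begins, which this sketch does not attempt.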

Implementation Details

| Resource | Description |
|---|---|
| edu.harvard.we99.services.io.MatrixParser | Reads the source file into a PlateResult |
| edu.harvard.we99.services.io.MatrixParserTest | Unit test converting the raw file into WE99 Domain Objects |
| /MatrixParserTest | See this folder for samples of each of the files above and the resulting JSON |
| edu.harvard.we99.services.io.PlateResultCollector | Interface that works in conjunction with the MatrixParser to collect the results into a single plate or multiple plates, depending on the implementation that is passed in. This allows us to support loading multiple sample results into a single plate or loading multiple plate results at once. |
