Development

Asis Hallab edited this page Oct 21, 2021 · 3 revisions

Integration of MapMan and Mercator

The MapMan-Bin Ontology describes the molecular functions of plant proteins in a standardized, machine-readable form. Mercator is a tool that assigns MapMan Bin annotations to query proteins. MapMan is a visualization tool that gives the user an overview of which gene functions are assigned to the query proteome and can also highlight continuous values of the query proteins, e.g. expression counts, log2-fold-change values, etc.

Resources

Implementation Specification

We want to integrate MapMan (a gene function visualization tool) into Gene-Expression-Plots (GXP). MapMan Bins form an ontology similar to the Gene Ontology. MapMan Bins (terms from this gene function ontology) are assigned using Hidden Markov Models, i.e. HMMER3. This is what our Mercator pipeline does. Results for a proteome can be visualized with MapMan. So distinguish: MapMan, the app; MapMan (Bins), the ontology; and Mercator, the tool that assigns MapMan Bins. MapMan is a Java app that does nothing more than the following:

  1. Input:
  • MapMan-Bin annotation table for a query proteome - this table comes from Mercator,
  • Metabolic Pathway Sketch Image (PNG)
  • Image Coordinates (XML)

The latter two can be downloaded here

  2. Use the coordinates (XML) to visualize the MapMan annotations of the query proteome (Mercator table) in the image at a given location. At each given location, i.e. the annotated MapMan Bin, one box is drawn per query protein annotated with this bin. The box can be colored, e.g. with the log2-fold-change from a differential expression analysis. Such visualizations give a very good overview of the functions represented in the whole query proteome. Furthermore, a large variety of such mappings already exists and can be used.

A D3-based implementation of MapMan has been made by Björn. Caution: alpha status! But we could certainly make good use of it to offer MapMan in GXP. It can be found here and uses this code (the link already points to the relevant file in the repo): github.com/usadellab/MapManJS/blob/master/ultramicro.html

Coordinates

The coordinate information specifying where to place MapMan-Bin annotations as boxes, one per annotated gene, inside the SVG is contained in the respective XML files, one for each image. These files follow a well-defined structure. The root tag is Image; its children are DataArea tags, whose properties x and y specify the coordinates. The property blockFormat specifies the maximum extent along either the horizontal axis, e.g. x10, or the vertical axis, e.g. y8. The number after the x or y indicates the maximum number of boxes to be drawn before starting the next row or column, respectively.
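The blockFormat rule can be sketched as a small layout helper. This is only an illustration: the function names parseBlockFormat and boxPosition, the box size and gap values, and the assumption that boxes grow to the right and downwards from the DataArea coordinates are all hypothetical and not taken from MapMan or MapManJS.

```javascript
// Hypothetical helper: splits a blockFormat such as "x10" or "y8"
// into its axis and the maximum number of boxes along that axis.
function parseBlockFormat(blockFormat) {
  const match = /^([xy])(\d+)$/.exec(blockFormat);
  if (!match || Number(match[2]) < 1) {
    throw new Error(`Unsupported blockFormat: ${blockFormat}`);
  }
  return { axis: match[1], max: Number(match[2]) };
}

// Hypothetical helper: computes the top-left corner of the box with the
// given zero-based index. boxSize and gap are assumed values in pixels.
function boxPosition(origin, blockFormat, index, boxSize = 10, gap = 1) {
  const { axis, max } = parseBlockFormat(blockFormat);
  const along = index % max; // position within the current row or column
  const across = Math.floor(index / max); // number of rows or columns already full
  const step = boxSize + gap;
  return axis === "x"
    ? { x: origin.x + along * step, y: origin.y + across * step }
    : { x: origin.x + across * step, y: origin.y + along * step };
}
```

With blockFormat x10, the eleventh box (index 10) starts a new row below the first one; with y8, the ninth box (index 8) starts a new column to the right.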

Cards / Single units of programming work

Setup

With the help of Constantin, set up your work environment: clone the project, install dependencies, get it running on your computer, set up your editor, and create a new React Component / plot to visualize MapMan annotations.

MapMan Plots

Input

Create a new button in the "plots" sub-menu that says "MapMan function sketch" (or something similar). When this button is clicked, a menu appears in an overlay. The menu offers the user a few selections; some of them are required inputs, which come with default values.

Note that at this point we expect the info_table.txt to contain a column with the MapMan-Bin annotations of the transcripts. By default this column can be expected to be called MapMan_BINCODE. The column will contain one or more MapMan Bin annotations for each transcript. Most query proteins receive just a single MapMan Bin annotation from Mercator, but some get more. Thus, the second input parameter in the form must be a separator character, which by default should be ",".
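Splitting one cell of the annotation column could look roughly like the following sketch. The function name splitMapManBins is hypothetical, and trimming whitespace and dropping empty entries are assumptions about how the column is formatted.

```javascript
// Hypothetical helper: turns one cell of the MapMan_BINCODE column
// into an array of bin codes, using the user-selected separator.
function splitMapManBins(cellValue, separator = ",") {
  if (cellValue === undefined || cellValue === null) return [];
  return String(cellValue)
    .split(separator)
    .map((bin) => bin.trim()) // assumption: whitespace around bins is not meaningful
    .filter((bin) => bin.length > 0); // assumption: empty entries are ignored
}
```

A cell holding a single bin yields a one-element array, so downstream code does not need to distinguish between single and multiple annotations.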

Form summary

Input parameters the user needs to give and their defaults:

  • Info-Table column, default: MapMan_BINCODE
  • Info-Table column separator character, default: ,
  • Sketch-Template, this is an HTML select that offers all images present in the MapMan store (see above). Note that for now we only offer the images of version "to be announced".

The form shall have a section in which the user selects the numeric values used to color the boxes in a MapMan plot. These continuous numeric values can come either from the expression data or from the gene information data. The user can select between "gene-expression-counts" and "gene-info-column" (or something similar). If the user chooses the former, i.e. expression counts, they are then shown another select menu that lets them choose which "sample" to use - see this section in the manual for details on what exactly a "sample" is in GXP. If the user selects the latter, i.e. a column in the info table, another select menu appears, which shows all column names in the info_table.txt table. The selected column is expected to hold continuous numeric data that can be translated into a continuous numeric scale ranging from minimum ("blue") to maximum ("red").
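One possible way to translate the selected values into box colors is a linear interpolation between blue and red, sketched below in plain JavaScript. The function name makeBlueRedScale is hypothetical, and returning null for non-numeric values as well as mapping a constant column to the midpoint color are assumptions, not part of the specification.

```javascript
// Hypothetical helper: builds a function that maps a value to a color
// between blue (minimum of the data) and red (maximum of the data).
function makeBlueRedScale(values) {
  let min = Infinity;
  let max = -Infinity;
  // A loop avoids spreading very large arrays into Math.min / Math.max.
  for (const v of values) {
    if (!Number.isFinite(v)) continue;
    if (v < min) min = v;
    if (v > max) max = v;
  }
  return (value) => {
    // Assumption: missing or non-numeric values get no color.
    if (!Number.isFinite(value) || min > max) return null;
    // Assumption: if all values are equal, use the midpoint of the scale.
    const t = max === min ? 0.5 : (value - min) / (max - min);
    const clamped = Math.min(1, Math.max(0, t));
    const red = Math.round(255 * clamped);
    const blue = Math.round(255 * (1 - clamped));
    return `rgb(${red}, 0, ${blue})`;
  };
}
```

Since the existing alpha version is based on D3, a D3 linear scale with a color range would likely be the more natural choice inside the component; the sketch above only makes the intended mapping explicit.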

Now, if the user has selected all these values and clicks on "load", the respective MapMan plot is generated.

Plot / Output

Based on the alpha version already done using D3.js, create a new React.js component, just like the other plots, that visualizes the MapMan sketch as the user selected.

Coordinates Parser

Write a function or module that parses the XML coordinates file matching the selected image or template. You can use the browser's DOMParser and just return the parsed content, which will later force you to work with the special HTMLCollection type. Alternatively, if more convenient, you can convert the HTMLCollection into a plain JavaScript object and return that. Working with the HTMLCollection is probably better. In that case the Coordinates-Parser module should have functions for all read access required by the MapMan-JS integration into GXP. E.g. a function getDataAreas( htmlCollection ) returns the XML representation of all contained DataArea tags. Another function getCoordinates( dataAreaTag ) returns a simple JavaScript object with the x and y coordinates. Implement as you feel would be best.
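A minimal sketch of such a module, under stated assumptions: x and y are taken to be XML attributes of the DataArea tags, the function parseCoordinatesXml is a hypothetical addition, and getDataAreas here receives the parsed document and returns a plain array instead of the HTMLCollection itself.

```javascript
// Hypothetical entry point: parses the XML text with the browser's DOMParser.
// DOMParser is a browser API and is not available in plain Node.js.
function parseCoordinatesXml(xmlText) {
  return new DOMParser().parseFromString(xmlText, "application/xml");
}

// Returns all DataArea elements of the parsed document as a plain array,
// so callers can use map, filter, etc. without handling HTMLCollection.
function getDataAreas(xmlDocument) {
  return Array.from(xmlDocument.getElementsByTagName("DataArea"));
}

// Returns the coordinates of one DataArea element as a plain object.
// Assumption: x and y are stored as attributes of the tag.
function getCoordinates(dataAreaTag) {
  return {
    x: Number(dataAreaTag.getAttribute("x")),
    y: Number(dataAreaTag.getAttribute("y")),
  };
}
```

In the browser, getDataAreas(parseCoordinatesXml(xmlText)).map(getCoordinates) would then yield one coordinate object per DataArea.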

Info-Table access

Extend the module handling the GXP info-table to find and return genes (rows) matching MapMan-Bin identifiers: getGenesForMapManBin( columnName, mapManBinId, recursive = /* true or false */). The function iterates over the info-table column given as argument, applies regular expression matching as indicated by the argument recursive, and returns all matching genes. The returned genes can either be an array of Gene-Objects or an array of row-identifiers (indices). Choose what works best in this context.
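The lookup could be sketched as follows. Several points are assumptions made for illustration: the extra rows parameter (an array of plain objects keyed by column name) stands in for the info-table held by the GXP module, the separator parameter is added to handle multiple annotations per cell, "recursive" is interpreted as "also match all descendant bins", and the function returns row indices.

```javascript
// Escapes characters with a special meaning in regular expressions,
// most importantly the dots in bin codes such as "1.1.2".
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Sketch: returns the indices of all rows annotated with the given bin.
// rows and separator are hypothetical additions to the specified signature.
function getGenesForMapManBin(rows, columnName, mapManBinId, recursive = true, separator = ",") {
  const escaped = escapeRegExp(mapManBinId);
  // Assumption: recursive means "1.1" also matches "1.1.2", but never "1.10".
  const pattern = new RegExp(recursive ? `^${escaped}(\\.|$)` : `^${escaped}$`);
  const matches = [];
  rows.forEach((row, index) => {
    const bins = String(row[columnName] ?? "")
      .split(separator)
      .map((bin) => bin.trim());
    if (bins.some((bin) => pattern.test(bin))) matches.push(index);
  });
  return matches;
}
```

Anchoring the pattern at both ends of each bin code matters here: an unanchored match on "1.1" would wrongly include bins such as "1.10" or "21.1".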