Gene function phenotype chapter WBook

Gene function curation in WormBase

Chris Grove, Gary Schindelman, Kimberly Van Auken, Karen Yook

Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA 91125, USA.

Abstract

C. elegans has proven to be a superb organism in which to understand how a gene functions. Gene function can be gleaned from descriptive analyses that compare animals carrying natural variants or mutations at a given locus with ‘wild-type’ animals. Manipulating gene copy number through transgenics, or expression level through RNA interference experiments, also informs on where and when a gene might function. When these analyses are combined with studies of physical interactions between gene products or genetic interactions between different loci in a single animal, a fuller picture of how a gene functions can be achieved: in what complex, in what pathway, and in some cases, at what position in the pathway. Because the resulting descriptions, or phenotypes, are the information currency for these gene function studies, capturing phenotypic descriptions for any and all types of analyses is a WormBase (WB) priority for our community. In addition, to make these data easy to mine and analyze, WB developed, and currently maintains, a controlled vocabulary of community-driven phenotype terms to allow consistency in phenotype data collection and dissemination.

Introduction

Phenotypes are the observable physical or biochemical traits manifested by an organism and result from the interaction of the organism’s genotype with its environment. By comparing the measurable effects of nucleotide difference(s) in a particular gene in one animal with those of a control, 'wild-type', sequence in another animal, one can get an idea of whether that gene plays a role in a particular process. By looking for physical or genetic interactors of that gene, one can further define how that gene might function in the process, that is, whether the genes act upstream, downstream, or in parallel pathways to one another. All these analyses rely on being able to assign phenotypes to animals with known genomic alterations and to compare them with those of animals containing a wild-type sequence.

WormBase (WB) curators collect, annotate, and design displays of phenotypes published or reported from gene function experiments by the worm community. As part of the curation process, phenotypes reported by the community are translated into computationally amenable terms from the hierarchical controlled vocabulary that makes up the Phenotype Ontology. In this chapter we discuss the gene function data classes that are curated for phenotypes and describe the use of the Phenotype Ontology during this curation process. Finally, we detail how you can find and mine these data for your own research purposes.

Data types annotated with phenotypes

Sequence variations that affect gene function can arise through natural processes, environmental or applied mutagens, or engineered manipulations, such as genetic engineering technology. These sequence differences can have specific effects on the activity of the gene compared to a control, or starting genotype, of the animal. Loss-of-function mutations result in the absence of gene function (amorph/null/lof) or a reduction in gene function (hypomorph). Mutations in genes can also lead to gain-of-function activity, which can be an enhanced activity (hypermorph/gof), an activity that antagonizes normal gene action (antimorph), or a completely new activity of the gene product (neomorph) (Muller, 1932). Understanding how a gene works based on a mutation requires an understanding of the effect of the mutation on the gene activity. Because these different types of alleles can all result in different phenotypes, analysis of the resulting gene activity is indispensable in the study of gene function. WB curators look specifically for natural variations, or polymorphisms, and alleles that are reported to produce a measurable phenotype.

For some processes, gene function is affected by gene dosage. Gene dosage can be increased by introducing extra copies of the gene as transgenes, which can remain as extrachromosomal arrays or be integrated into the genome. These studies yield gene overexpression phenotypes. Alternatively, the level of a gene product can be reduced by knocking down expression of the gene through RNA interference (RNAi). Phenotypes from RNAi experiments are assumed to represent reduction or loss of function of the gene.

WB curators aim to collect all phenotype characterizations resulting from any of these methods and present those phenotypes in association with the responsible gene(s). Once a gene has been characterized through phenotypic analysis, researchers can further explore how it functions within a pathway by looking for genic interactors. This is done by making double, triple, etc. mutants, or by treating a mutant of one gene with RNAi of another, and observing whether the starting phenotype is modified, for example suppressed, enhanced, or replaced by a synthetic phenotype. Interaction data comprise a large and important data class in WB that covers genetic, regulatory, physical and predicted interactions. While only genetic and regulatory interactions are assigned phenotypes, the whole class is included here due to its importance for gene function studies.

Alleles and other genetic variations

Comparing gene activity resulting from polymorphisms among strain isolates or related species has informed nematode researchers on various aspects of biology. For example, a single natural polymorphism in a key gene, npr-1, can switch feeding behavior from solitary feeding, in which worms disperse across a bacterial lawn, to social feeding, in which worms seek out the most concentrated part of the bacterial lawn and form clumps of feeding animals (de Bono and Bargmann, 1998). When the natural polymorphism is known, a variation object ID can be assigned and a phenotype annotated; however, if a single locus cannot be identified as the source of the phenotypic difference, the phenotype is assigned to the whole strain.

Genetic variations can also be created in the lab through the use of a number of different types of mutagens (e.g., chemical, radiation, transposon) (Benzer, 1977; Brenner, 1974; Lewis, 1964). As mentioned above, variations can affect gene function in many different ways and result in different phenotypes of the same gene. Thus, when phenotypes are assessed for a gene based on a sequence variation, that phenotype is attributed directly to the allele itself rather than the gene.

In addition, conditional mutations, such as heat- and cold-sensitive mutations, allow the study of a gene’s function at different stages of development. Again, these sensitivities are often the result of the particular sequence variation and thus are curated as an attribute of the allele itself rather than the gene.

Caveat: With the ease of whole-genome sequencing, researchers need to pay more attention to possible extragenic polymorphisms, or mutations, that might have a modifying, or stronger, contribution to the reported phenotype.

Overexpression of genes

In some cases, gene activity can be affected by the presence of additional copies of the gene. In these types of analyses, researchers use transgenic technology to increase the gene copy number by introducing constructs that contain the wild-type gene sequence into the worm. Transgenic technology in C. elegans research is advanced enough to allow researchers to control the number of extra copies introduced into the worm, from a single copy integrated into the genome, such as through Mos1-mediated Single Copy Insertion (MosSCI), to multiple extra copies maintained in the animal as extrachromosomal arrays (Ex transgenes) or as random integrations of those arrays (Is or In transgenes) (REFs).

RNAi and genetic engineering

Gene activity can also be altered by experimental means. While enhanced gene activity can be induced by introducing extra copies of the gene through the expression of transgenes, loss or reduction of function can be induced by RNA interference. Such “reverse genetics” approaches, which use RNA interference to knock down expression of targeted genes, were first adapted as a genetic tool in C. elegans in 1998 and have revolutionized research across many fields (Fire et al. 1998; Ahringer, 2006; Kamath et al. 2000??; Boutros et al. 2004??; Cullen and Arndt, 2005; Mello and Conte, 2004). More recently, any of the above types of gene function alteration can be achieved through CRISPR/Cas9 gene targeting and related engineering methods (Cong et al. 2013). These technologies allow researchers to target specific genes for knockdown or knockout and to alter any one of a specific gene’s activities (Fire???(1998); Friedland et al. 2013). Any phenotype reported through any of the experimental methods above is captured in WB and can be found on individual gene pages or on annotated phenotype pages as discussed below.

How phenotypes are curated in WB

WB curators extract phenotype information from the published literature as well as from unpublished data submitted directly by members of the worm community. Curators are alerted to papers that contain phenotype data through automated text mining methods and through the help of authors. One automated method used to flag papers containing phenotype data uses Support Vector Machine (SVM) algorithms (Fang et al. 2012). In addition, in 2009 WB curators began asking the community to help flag their papers for phenotype data, along with many other data types. This effort involves sending authors a form, called the Author First Pass Form, in which the author can check boxes and enter relevant data for a select number of data types. Further, authors and researchers can submit allele-phenotype data directly to WB through the allele-phenotype submission form.

Once phenotype data have been “flagged”, curators convert community-reported phenotypes to standardized, computable phenotype terms from the Worm Phenotype Ontology (WPO) (Schindelman et al. 2011). By converting published descriptions into standardized phenotype terms, curators make phenotype data discoverable by bioinformatics tools, which is critical for allowing the community to perform gene enrichment analyses. The WPO currently comprises 2226 phenotype terms, 85% of which have been used in the annotation of phenotypes associated with more than 18,000 C. elegans genes. In WB you can use the WB Ontology Browser to easily view all variations associated with a particular phenotype term (see below).

Development of the Worm Phenotype Ontology (WPO)

The WPO constantly evolves to capture newly reported characterizations and descriptions of genotype effects; its development is thus driven by the needs of the research community. Although the WPO is “species agnostic” and the phenotype terms can be applied to variations in other related nematodes, other ontologies are needed to integrate phenotype data from more diverse taxa. As the WPO is a pre-composed ontology (terms are already defined and placed in a hierarchical structure), these pre-composed terms can be expressed using terms from the Phenotype and Trait Ontology (PATO) and the Gene Ontology (GO) (Gkoutos et al. 2005; Ashburner et al. 2000), as these ontologies traverse taxon constraints. This type of approach is termed “post-compositional”, and we have begun to create such logical equivalence relationships (or cross-products) using PATO and GO (Mungall et al. 2010; Köhler et al. 2013; Schindelman et al. 2011). For example, the WPO term “drug hypersensitive” (WBPhenotype:0000010) can be represented by the intersection of the PATO term “increased sensitivity of a process” (PATO:0001551) and the GO term “response to drug” (GO:0042493). This will allow interoperability across the different Model Organism Databases (MODs) and other biological databases that use these ontologies. Expanding our ontologies to be compliant with other MODs and non-nematode databases will allow our community members to more easily translate their results into a broader scientific context, which includes other organisms. However, as WB primarily serves the worm community, classic worm-centric phenotype vocabulary, such as “kinker” and Muv, must always remain available. Our ultimate vision is therefore to have both pre- and post-composed terms represented in WB, as each is useful to different communities.
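The cross-product idea can be illustrated with a minimal Python sketch. It is not WormBase code: the class names, the `EQUIVALENCES` table, and the helper function are hypothetical, and only the three term identifiers come from the example in the text.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class OntologyTerm:
    term_id: str
    label: str


@dataclass(frozen=True)
class PostComposedPhenotype:
    """A phenotype written as a PATO quality applied to a GO term."""
    quality: OntologyTerm  # from PATO
    entity: OntologyTerm   # from GO


# Hypothetical lookup table: pre-composed WPO term ID -> post-composed form.
# The single entry is the "drug hypersensitive" example from the text.
EQUIVALENCES: Dict[str, PostComposedPhenotype] = {
    "WBPhenotype:0000010": PostComposedPhenotype(
        quality=OntologyTerm("PATO:0001551", "increased sensitivity of a process"),
        entity=OntologyTerm("GO:0042493", "response to drug"),
    ),
}


def wpo_terms_for_go(go_id: str) -> List[str]:
    """Return the WPO term IDs whose post-composed form uses the given GO term."""
    return sorted(
        wpo_id
        for wpo_id, phenotype in EQUIVALENCES.items()
        if phenotype.entity.term_id == go_id
    )
```

Because the GO and PATO identifiers are shared across databases, a lookup such as `wpo_terms_for_go("GO:0042493")` is the kind of query that lets a worm-specific term be matched to annotations from other organisms.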

How you can help

While flagging methods (automated and community-driven) help WB curators identify papers that contain phenotype data, curators must manually extract the relevant data, which can be time-consuming. A better and more efficient way for data to get into WB is for researchers to enter their data directly. WB is now asking the community to take part by annotating their own data. See below (section 2.9) for a brief description of our Allele-Phenotype data submission form, whereby the research community may directly annotate alleles with phenotypes based on published papers.

How you can find and query phenotype data

Since phenotypes are reported and curated for many different data classes, there are numerous options for exploring these data in WB. First, each data type that contains phenotype information (gene, variation, RNAi, transgene, rearrangement, strain and genetic interaction) has a phenotype widget on its respective page. Second, phenotypes can be explored through the individual phenotype term page. Third, phenotypes and associated genes can be explored through the Worm Phenotype Ontology Browser. Finally, phenotypes are now used to predict shared gene functions in a common biological process, so phenotypes can be explored on the Process&Pathway pages. These page views are described in more detail below.

Phenotypes on gene pages

The first, and perhaps most obvious, place to look for phenotype data is on the WB gene page, inside the Phenotypes widget (Figure 1). The Phenotypes widget has two main sections. The top section is expanded by default and contains information on observed phenotypes reported for the gene. The bottom section is collapsed by default and contains negative phenotype data, in which a phenotype was assayed but was reported as not observed for some genetic perturbation of the gene (Figure 2). Each of these two sections contains a similarly formatted table with phenotype names presented in the first column and supporting evidence listed in the second column. Each piece of supporting evidence displays the genetic perturbation with which the listed phenotype is associated (usually an RNAi experiment or an allele) and usually has detailed information that may be viewed by expanding the “details” below each genetic perturbation reference. The detailed information minimally includes a paper reference or personal communication and often includes a remark describing the experimental results (Figure 3). Additional information may include a relevant genotype background, strain, and in the case of alleles, the mutation type (Figure 4). A recent addition to the Phenotypes widget on gene pages is the “Interaction-based phenotypes” table at the bottom of the widget, where applicable (Figure 5). This table displays all phenotypes that are reported to be affected by the gene in the context of genetic interactions (see interactions discussion below). For example, if a genetic perturbation of the gene (allele, RNAi experiment, etc.) has been shown to suppress the phenotype of another gene’s genetic perturbation, that phenotype is reported in this table. These phenotypes may or may not be observed directly by perturbation of this gene alone.

Phenotypes on variation pages

Phenotype information for individual alleles (or other genetic variations) can be seen on the corresponding WB variation page in the Phenotypes widget (Figure 7). The phenotype information is displayed essentially as on the gene page, but is restricted to the phenotypes associated with the particular variation in question and includes the official phenotype description. As on the gene page, the Phenotypes widget on the variation page displays observed phenotypes in a separate table above any reported not-observed phenotypes. The variation page may be reached by clicking on the variation name in the Phenotypes widget of the gene page or by searching “for a variation” by name in the search box at the top right corner of any WB page.

Phenotypes on RNAi pages

The phenotype widget on a gene page contains superficial information regarding the association of a gene to a phenotype. Additional experimental details for RNAi experiments may be viewed on the web page for a particular RNAi experiment (Figure 6), which may be reached by clicking on the RNAi object ID (e.g. “97385”) in the supporting evidence column of the gene page Phenotypes widget (Figure 1). The RNAi experiment page includes information about which genes (and more specifically coding sequences or ‘CDSs’) are targeted as primary or secondary targets of the double-stranded RNA used for the experiment (see below for explanation), as well as relevant genotype, RNAi reagents (including dsRNA sequence used), treatment conditions and dsRNA delivery method. The phenotype information from the Phenotypes widget on the gene page (including experiment remark) is repeated in the Phenotypes widget on the RNAi page with the addition of the official description of the phenotype. The Overview widget of the RNAi page also indicates the Laboratory associated with the experiment as well as any WB curator remarks.

Figure 1: The “Phenotypes” widget on the WormBase gene page.

Figure 2: Phenotypes widget: Phenotypes NOT observed when assayed

Figure 3: Phenotypes widget: Observed phenotypes with expanded details

Figure 4: Phenotypes widget: Observed allele phenotypes with expanded details and allele type

Figure 5: Phenotypes widget: Interaction-based phenotypes

Figure 6: Experimental details on the WormBase RNAi page

Phenotypes on transgene pages

WB captures any observable functional output resulting from the overexpression of a gene through a transgenic construct. Transgene-affiliated phenotypes are visible in the Phenotypes widget on a transgene page (Figure 8) and are displayed in a manner similar to how phenotypes are displayed on the variation page, with the addition of a “Caused by” field indicating which gene in the transgene is responsible for the phenotype indicated.

Phenotypes on rearrangement pages

As with RNAi experiments, variations, and transgenes, rearrangements (e.g. chromosomal duplications and deficiencies) affiliated with phenotypes have phenotypic information available in a Phenotypes widget (Figure 9) with supporting evidence details provided under the “details” header of the “Evidence” column.

Phenotypes on strain pages

Phenotypes are only assigned to wild natural isolate strains.

Phenotypes on interaction pages

Phenotype term pages

Last but not least, phenotype information can be accessed on an individual WB phenotype page. Each phenotype page has an RNAi widget, a Variation widget, and a Transgene widget. The RNAi widget on a phenotype page (Figure 10) lists the RNAi experiment ID in the first column, followed by the species in which the experiment was carried out, a reference to a WB sequence object used to generate dsRNA for the experiment, the strain used, the genotypic background, and any treatment conditions for the experiment. The RNAi widget also lists, in a separate table below the observed phenotypes, any RNAi experiments in which the phenotype was assayed but not observed. The Variation widget on the phenotype page displays the list of variations for which the phenotype was observed (or not observed in the table below) in column 1 followed by the associated gene, the type of variation object (e.g. allele), and the species in which the variation exists (Figure 11). The Transgene widget on the phenotype page displays the list of transgenes for which the phenotype was observed in column 1, followed by the genes for which overexpression resulted in the phenotype, and the experimental remark (Figure 12). The details in column 2 provide the paper in which the experiment was reported along with the name of the WB curator who annotated the experiment. Phenotypes reported as not-observed for transgenes are listed in a separate table below the observed transgene phenotypes.

Enrichment and mining tools to discover gene function

Worm Phenotype Ontology Browser

In addition to the RNAi, Variation, and Transgene widgets, the phenotype page has an Ontology Browser widget, which takes advantage of our new WB Ontology Browser. With the ontology browser, a user may browse the phenotype ontology (or any WB ontology) and view direct and indirect phenotype annotations (Figure 13). For example, when viewing the Ontology Browser on the phenotype page for the phenotype “programmed cell death variant” we can see that there are 29 direct gene associations to the phenotype as well as a total of 701 direct and indirect gene associations. Indirect gene associations are any gene associations to ontological descendants of the term in question, in this case genes associated with phenotypes that are descendant terms of “programmed cell death variant”. Clicking on the gene association number directs the user to a page listing direct and indirect gene associations via RNAi and variation experiments (Figure 14).
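The distinction between direct and indirect associations can be sketched in a few lines of Python. This is an illustration only, not the Ontology Browser's implementation: the term names (`T0` to `T3`), the gene names, and both functions are made up, and the ontology is assumed to be given as a simple parent-to-children mapping.

```python
from typing import Dict, Iterable, Set, Tuple


def descendants(term: str, children: Dict[str, Iterable[str]]) -> Set[str]:
    """Collect every term reachable below `term` in the ontology graph."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for child in children.get(current, ()):
            if child not in seen:  # a term may have more than one parent
                seen.add(child)
                stack.append(child)
    return seen


def association_counts(
    term: str,
    children: Dict[str, Iterable[str]],
    direct: Dict[str, Set[str]],
) -> Tuple[int, int]:
    """Return (direct gene count, direct plus indirect gene count) for a term."""
    direct_genes = set(direct.get(term, ()))
    all_genes = set(direct_genes)
    for descendant in descendants(term, children):
        all_genes |= set(direct.get(descendant, ()))
    return len(direct_genes), len(all_genes)


# Toy ontology: T3 has two parents, as can happen in a real ontology.
TOY_CHILDREN = {"T0": ["T1", "T2"], "T1": ["T3"], "T2": ["T3"]}
# Toy annotations: genes annotated directly to each term.
TOY_DIRECT = {
    "T0": {"gene-a"},
    "T1": {"gene-b"},
    "T2": {"gene-a", "gene-c"},
    "T3": {"gene-d"},
}
```

In this toy data, `association_counts("T0", TOY_CHILDREN, TOY_DIRECT)` counts `gene-a` only once even though it is annotated both to `T0` and to a descendant, which is why the combined total is a count of distinct genes rather than a sum of per-term counts.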

Figure 7: The “Phenotypes” widget on the WormBase variation page

Figure 8: The “Phenotypes” widget on the WormBase transgene page

Figure 9: The “Phenotypes” widget on the WormBase rearrangement page

Figure 10: The “RNAi” widget on the WormBase phenotype page

Figure 11: The “Variation” widget on the WormBase phenotype page

Figure 12: The “Transgene” widget on the WormBase phenotype page

Figure 13: The “Ontology Browser” widget on the phenotype page

Figure 14: Ontology browser phenotype-gene associations page

Caveats to data and interpretation

Phenotype data in WB are subject to certain caveats that ought to be considered when interpreting reported phenotype results. There are often specific caveats for each experimental system, so we will discuss these accordingly.

RNAi data caveats

The first set of caveats to consider with regard to RNAi experimental data involves the experimental treatment conditions and the nature of RNA interference in the nematode strain used. It is widely known that, in general, the nervous system of Caenorhabditis elegans is refractory to RNAi treatment (Timmons et al. 2001); therefore, it is questionable whether genes intended to be knocked down specifically in the nervous system of the worm will actually be knocked down in neurons unless certain experimental strategies are implemented. For example, some researchers have begun expressing the double-stranded RNA (dsRNA) transporter SID-1 in the nervous system using a pan-neuronal promoter to allow dsRNA to migrate to the nervous system and engage target gene mRNAs for effective silencing (Calixto et al. 2010). Others have employed general RNAi-sensitizing genetic backgrounds, including the mutations rrf-3(pk1426) and/or eri-1(mg366), in order to enhance the overall effectiveness of RNAi in worms (Zhuang and Hunter, 2011). There is also the dsRNA delivery method to consider, as it has generally been observed that injection of dsRNA into the gonads of parental (P0) worms can result in more robust phenotypes in progeny (F1) than introducing dsRNA into worms by feeding them dsRNA-expressing bacteria (Timmons et al. 2001). Likewise, the timing of introduction of dsRNA by methods such as feeding or soaking can have profound impacts on the phenotypes resulting from the technique (reviewed in WormBook chapter “Reverse Genetics” by Julie Ahringer, doi:10.1895/wormbook.1.47.1).

The second set of caveats to consider with regard to RNAi experimental data involves the reporting of dsRNA sequences by publication authors. There is great heterogeneity across publications as to whether or not authors explicitly state the sequence of the dsRNA trigger used to knock down expression (mRNA levels) of particular genes. Some authors, in the best-case scenario, provide the exact dsRNA sequences used and/or the primer sequences used to amplify a gene for subsequent introduction into a dsRNA-expression vector. In a subset of these cases, even though primer sequences are provided, the authors state that the template used for the PCR reaction was a cDNA library and the explicit intervening sequence is not provided; this proves to be problematic particularly for genes with multiple splice variant isoforms. Other authors provide indications of the dsRNA sequence used by generally stating that clones were selected from the Ahringer or Vidal clone libraries (Kamath et al. 2003; Rual et al. 2004) without specifying the exact clones used. It should be made clear that this is quite often an ambiguous reference, as there is no up-to-date and official one-to-one mapping of C. elegans genes to RNAi clones from these libraries. Often there are multiple clones that share sequence with a given gene, many clones share sequence with multiple genes (neighboring or nested genes, for example), and many gene models from which the clones were originally designed (over a decade ago) have dramatically changed, making it difficult or impossible to know what an author means by the “Ahringer clone” or “Vidal clone” for a gene. In the worst-case scenario, authors completely omit any reference to the dsRNA used in their published RNAi experiments. In the ambiguous cases (or in cases where a reference to dsRNA sequence is simply omitted altogether), WB curators enter a remark for an RNAi experiment indicating this ambiguity and usually provide the canonical cDNA coding sequence (open reading frame) or the best applicable Ahringer RNAi clone, as determined by the curator.

The third set of caveats to consider with regard to RNAi experimental data involves the in silico mapping of dsRNA sequences used in the experiments to target genes predicted to be knocked down (i.e. gene-specific mRNA abundance reduced) by the technique. For the last decade, WB has employed, for lack of any other established method, a crude RNAi-target prediction algorithm whereby a gene is identified as a primary target if the dsRNA sequence aligns (using the BLAT algorithm) with 90% or greater sequence identity to the genome across 100 base pairs or more, and as a secondary target if it aligns with 80% to 90% sequence identity to the genome across 200 base pairs or more. These criteria seem to be generally acceptable, but could suffer from significant false positives and/or false negatives in cases where there are, for example, paralogous genes that share significant homology with one another. Current mapping procedures only map to the genome (not the transcriptome, which would be ideal), and even a single base-pair overlap of the region of threshold sequence identity with a gene will identify this gene as a potential target of the RNAi experiment. Another pitfall of the traditional WB RNAi-target mapping procedure is that cDNA-derived clones, like those from the Vidal library (Rual et al. 2004), most often lack a complete confirmed sequence and have been dealt with by assuming the clone contains all genomic sequence between primer mappings. To avoid false positives, the mapping pipeline for Vidal clones only identifies the single best match (using the BLAT algorithm) and ignores any other gene matches, which may result in false negatives. Recently, Thakur et al. (2014) developed a more sophisticated RNAi target gene identification strategy in an online tool called Clone Mapper, in which target genes are identified by the presence of a substantial number of overlapping 21-mer base pair sequences (by default), as siRNAs known to be produced from longer dsRNA trigger sequences are generally about that size (or larger). In the near future, WB intends to implement the Clone Mapper tool and present the range of likely RNAi targets predicted by the tool for any given RNAi experiment in WB. When interpreting RNAi-based phenotype annotations in WB, it is therefore important to consider whether the phenotype has been attributed to a gene as a bona fide primary target of an explicitly provided dsRNA sequence or, at the opposite end of the confidence spectrum, as a secondary target of a dsRNA sequence assumed (by a WB curator) to have been used in the experiment. Such considerations are notably difficult, yet important, to assess when data mining WB for all genes attributed to a particular phenotype, for example.
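The primary and secondary target thresholds described above can be written out as a short Python sketch. It only illustrates the stated cutoffs and is not the WB mapping pipeline: the function name and constants are hypothetical, the inputs are assumed to be the percent identity and aligned length of a single alignment, and the handling of values exactly at 80% and 90% is an assumption, since the text does not specify it.

```python
from typing import Optional

# Cutoffs as stated in the text.
PRIMARY_MIN_IDENTITY = 90.0    # percent
PRIMARY_MIN_LENGTH = 100       # base pairs
SECONDARY_MIN_IDENTITY = 80.0  # percent
SECONDARY_MIN_LENGTH = 200     # base pairs


def classify_rnai_target(percent_identity: float, aligned_length: int) -> Optional[str]:
    """Classify one dsRNA-to-genome alignment as 'primary', 'secondary', or None.

    Assumption: exactly 90% identity counts as primary and exactly 80%
    counts as secondary.
    """
    if percent_identity >= PRIMARY_MIN_IDENTITY and aligned_length >= PRIMARY_MIN_LENGTH:
        return "primary"
    if (
        SECONDARY_MIN_IDENTITY <= percent_identity < PRIMARY_MIN_IDENTITY
        and aligned_length >= SECONDARY_MIN_LENGTH
    ):
        return "secondary"
    return None
```

Note that under these rules an alignment with 85% identity over 150 base pairs is not reported at all, because the lower-identity class requires a longer aligned region.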

Allele-phenotype data caveats

Allele-phenotype data in WB have unique caveats of their own. Phenotypes are attributed to variations, which are subsequently mapped to genes to determine which genes ought to be attributed with the phenotype. Many small genes, notably the 21U-RNAs on chromosome IV, may lie in the region of a genetic deletion. Thus, although a deletion may remove most or all of a larger protein-coding gene, phenotypes may also be mapped to any small genes that lie in the same interval of the deletion. It would not be entirely prudent to assume that the small RNA genes do not contribute to the phenotype, so both the larger protein-coding gene and the small RNA genes are attributed with the observed phenotype. This fact is not particularly obvious when looking at, for example, the Phenotypes widget on gene pages for small RNAs. (Also, Gene Ontology (GO) annotations are often made using the “Inferred from mutant phenotype” or “IMP” evidence code, in which some biological processes or molecular functions are inferred from the phenotype that has been attributed to a gene. Just as we cannot completely rule out the role of the small RNA genes in an observed phenotype, we do not currently filter out small RNAs from automated GO annotations using the IMP evidence code. Thus, small RNA genes may have acquired GO annotations that ought to be interpreted with caution.)
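A small Python sketch shows why a deletion allele can end up annotating several genes at once. It is a simplified illustration under stated assumptions: coordinates are inclusive positions on one chromosome, any overlap counts as a hit, and the feature names and coordinates are invented rather than taken from WB.

```python
from typing import List, NamedTuple


class Feature(NamedTuple):
    name: str
    start: int  # inclusive
    end: int    # inclusive


def overlaps(a: Feature, b: Feature) -> bool:
    """True if the two intervals share at least one position."""
    return a.start <= b.end and b.start <= a.end


def genes_in_deletion(deletion: Feature, genes: List[Feature]) -> List[str]:
    """Names of every gene that the deletion interval touches."""
    return [gene.name for gene in genes if overlaps(deletion, gene)]


# Invented example: one large gene with a small RNA gene nested inside it,
# plus an unrelated gene further along the chromosome.
TOY_GENES = [
    Feature("large-protein-coding-gene", 1_000, 9_000),
    Feature("small-rna-gene", 4_000, 4_020),
    Feature("distant-gene", 20_000, 25_000),
]
TOY_DELETION = Feature("toy-deletion", 3_500, 6_000)
```

Here `genes_in_deletion(TOY_DELETION, TOY_GENES)` returns both the large gene and the nested small RNA gene, so a phenotype attached to the deletion would be attributed to both.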

How you can help with these data

Contributing allele-phenotype connections

WormBase has developed a number of data submission forms, including one designed to enable community submission of allele-phenotype connections (Figure 15). This allele-phenotype data submission form is reachable from the WormBase homepage, under the “Submit Data” menu header. Submission of allele-phenotype data requires a submitter’s name and e-mail address, a publication’s paper ID (PubMed ID or DOI), an allele name and a phenotype (observed or not observed). Additionally, optional information about an allele may be submitted through the “Optional” series of fields, including information about the allele’s inheritance pattern (recessive or dominant), mutation effect (loss of function, gain of function, null, etc.), degree of penetrance (% of population exhibiting the phenotype(s)), and temperature sensitivity with respect to the phenotype(s) entered into the phenotype fields. There is also a general comment field where submitters can write in any pertinent details about the allele-phenotype connection. A user guide is available on the WormBase Wiki and a short tutorial video is available at the WormBase HD YouTube channel.

Figure 15: Allele-phenotype data submission form

Gene Interactions

Why gene interactions are important

Interactions between genes and gene products are arguably one of the most important facets of gene activity in any organism. These interactions range from direct physical contacts between gene products to the more indirect and abstract associations between genes revealed by phenotypic analysis and gene expression studies. WB curators capture four different types of interactions between genes, gene products, DNA sequence elements, or small molecules: physical, regulatory, genetic, and predicted. Physical interactions represent direct physical contact and specific binding between two or more gene products. Currently we capture protein-protein and protein-DNA physical interactions, as well as a small number of protein-RNA physical interactions. Regulatory interactions represent direct or indirect regulation of expression level (RNA or protein) or localization of one gene product by another gene or small molecule. The evidence for regulatory interactions most often comes in the form of a mutation or overexpression that results in a change of gene expression or localization of a gene product. Genetic interactions refer to unexpected phenotypes of double mutant animals and suggest a functional dependence between two genes with respect to a phenotype or biological process. Several groups have predicted genetic interactions by integrating data that are suggestive of genes interacting. For example, gene pairs that interact in Drosophila melanogaster are likely to interact in C. elegans. Likewise, gene pairs that are expressed in the same cells in C. elegans, have the same phenotype, and physically interact are likely to interact genetically. These predicted interactions have proven useful for prioritizing experimental tests of genetic interaction.

How interactions get into WB

As with many other data types captured by WB curators, papers with physical, genetic or regulatory interactions are identified using a Support Vector Machine (SVM) algorithm (Fang et al. 2012). Papers are catalogued according to what type of interactions they likely contain and the interactions are subsequently annotated from these papers. Each interaction is minimally annotated with a reference (usually a publication), an interaction type, and two interacting components (at least one gene, usually two genes). Additionally, interactions are often described in a text summary. As each type of interaction is unique, we will discuss each separately.
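
The minimal annotation described above (a reference, an interaction type, and two interacting components, at least one of which is a gene) can be sketched as a small record type. This is a hedged illustration, not the WormBase data model: the class names, field names, and the `kind` labels are assumptions, and the example values are placeholders.

```python
from dataclasses import dataclass
from typing import Tuple

# The four interaction types named in the text.
INTERACTION_TYPES = frozenset({"physical", "regulatory", "genetic", "predicted"})


@dataclass(frozen=True)
class Interactor:
    name: str
    kind: str  # hypothetical labels, e.g. "gene", "sequence_feature", "molecule"


@dataclass(frozen=True)
class Interaction:
    reference: str                             # usually a publication
    interaction_type: str
    interactors: Tuple[Interactor, Interactor]
    summary: str = ""                          # optional text summary

    def __post_init__(self):
        # Enforce the minimal requirements stated in the text.
        if not self.reference:
            raise ValueError("an interaction needs a reference")
        if self.interaction_type not in INTERACTION_TYPES:
            raise ValueError(f"unknown interaction type: {self.interaction_type!r}")
        if len(self.interactors) != 2:
            raise ValueError("exactly two interacting components are expected")
        if not any(i.kind == "gene" for i in self.interactors):
            raise ValueError("at least one interactor must be a gene")


# Placeholder record; the names do not refer to real genes or papers.
example = Interaction(
    reference="placeholder-paper-id",
    interaction_type="physical",
    interactors=(Interactor("gene-a", "gene"), Interactor("feature-1", "sequence_feature")),
    summary="Placeholder summary of the reported interaction.",
)
print(example.interaction_type, [i.name for i in example.interactors])
```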

Types of interaction data curated by WormBase

Physical Interactions

Recognized interactors for physical interactions include genes (as a proxy for the gene product) and DNA sequence features (i.e. genomic sequence elements including enhancers and transcription factor binding sites). In addition to capturing the basic interaction information mentioned above, WB curators also record the physical interaction detection method, chosen from a controlled vocabulary list of interaction detection methods, many of which are shared by the Biological General Repository for Interaction Datasets (BioGRID; WEBSITE URL) database (Chatr-Aryamontri et al. 2015). For protein-protein interactions the methods included are: affinity capture luminescence, affinity capture mass spectrometry (MS), affinity capture western blot, cofractionation, colocalization, copurification, fluorescence resonance energy transfer (FRET), protein fragment complementation assay, yeast two hybrid, biochemical activity assay, co-crystal structure, far western blot, and reconstituted complex. For protein-DNA interactions the methods included are: chromatin immunoprecipitation (ChIP), DNase I footprinting, electrophoretic mobility shift assay (EMSA a.k.a. “gel shift” assay), and yeast one hybrid. For protein-RNA interactions we record if it was detected using affinity capture RNA (protein affinity capture followed by sequence detection of RNA). We also try to capture experimental details including libraries screened and times found with a given library and reagents used including clones, antibodies, constructs, and transgenes. We have established a data sharing pipeline with the biological interaction database BioGRID so that nematode protein-protein physical interactions are synchronized across both databases.
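
As a rough sketch, the detection methods listed above can be held in a lookup keyed by the class of physical interaction, which makes it easy to check that a recorded method belongs to the expected list. The dictionary layout, the lowercase spellings, and the `is_recognized_method` helper are assumptions made for this example; the authoritative vocabulary is the one maintained by WB curators.

```python
# Method names follow the lists in the text, normalized to lowercase here.
DETECTION_METHODS = {
    "protein-protein": frozenset({
        "affinity capture luminescence",
        "affinity capture ms",
        "affinity capture western blot",
        "cofractionation",
        "colocalization",
        "copurification",
        "fret",
        "protein fragment complementation assay",
        "yeast two hybrid",
        "biochemical activity assay",
        "co-crystal structure",
        "far western blot",
        "reconstituted complex",
    }),
    "protein-DNA": frozenset({
        "chromatin immunoprecipitation",
        "dnase i footprinting",
        "electrophoretic mobility shift assay",
        "yeast one hybrid",
    }),
    "protein-RNA": frozenset({"affinity capture rna"}),
}


def is_recognized_method(interaction_class, method):
    """Return True if the method is listed for the given class of physical interaction."""
    allowed = DETECTION_METHODS.get(interaction_class, frozenset())
    return method.strip().lower() in allowed


print(is_recognized_method("protein-DNA", "Yeast one hybrid"))
print(is_recognized_method("protein-protein", "Yeast one hybrid"))
```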

Regulatory Interactions

Regulatory interactions capture the regulation of one biological entity by another biological entity, usually genes regulating other genes. The evidence may come in many different forms, the most common being the perturbation of one gene resulting in a change in another gene’s expression level or its gene product’s localization. For example, a deletion of one gene may result in increased levels of mRNA for a different gene, as determined by quantitative reverse transcription polymerase chain reaction (qRT-PCR), suggesting negative regulation of transcription of the second gene by the first. Minimally, regulatory interactions capture a regulator entity, a regulated entity, and a citation. Additionally, regulatory interactions are curated with a free-text summary, a regulation type (“change of expression level” or “change of localization”), a regulation level (transcriptional, post-transcriptional, or post-translational), and a regulation result (“Positive_regulate”, “Negative_regulate”, or “Does_not_regulate”), along with any relevant anatomy terms indicating where the regulation takes place and text descriptions of how subcellular localization of the regulated target may be affected. Experimental details are also curated, including relevant constructs, transgenes, antibodies, alleles, and detection techniques like in situ hybridization, western blot, northern blot, or qRT-PCR. Associations are also made to other WormBase annotations like relevant RNAi experiments and/or expression patterns. Although most regulators are nematode genes, some are chemicals, such as ethanol, tunicamycin, acrylamide, or serotonin, and some are conditions, such as heat shock, anoxia/hypoxia, high osmolarity, pathogen infection, ionizing radiation, or high temperature. An additional class of regulators is cis-regulatory elements present in or around the regulated gene that affect gene expression in some manner. These come in the form of WormBase sequence feature objects (e.g. WBsf019090), unique contiguous spans of genomic sequence at particular locations in the genome.
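
The reasoning in the deletion example above (loss of a regulator leading to more target mRNA implies negative regulation) can be written out as a small decision function. This is a sketch for illustration only: the function name and the input labels are hypothetical, and the overexpression branch is an assumption that simply applies the same logic in reverse. Only the three result strings come from the text.

```python
def infer_regulation_result(perturbation, observed_change):
    """Map a perturbation of the regulator and the observed change in the
    regulated target to one of the regulation results named in the text."""
    if perturbation not in ("loss_of_function", "overexpression"):
        raise ValueError(f"unsupported perturbation: {perturbation!r}")
    if observed_change not in ("increased", "decreased", "unchanged"):
        raise ValueError(f"unsupported observation: {observed_change!r}")

    if observed_change == "unchanged":
        return "Does_not_regulate"

    regulator_removed = perturbation == "loss_of_function"
    target_increased = observed_change == "increased"
    # Removing a repressor raises the target; adding extra repressor lowers it.
    if regulator_removed == target_increased:
        return "Negative_regulate"
    return "Positive_regulate"


# The deletion example from the text: the target mRNA goes up when the regulator is deleted.
print(infer_regulation_result("loss_of_function", "increased"))
```

In practice a curator weighs far more context than this (detection technique, anatomy, regulation level), so the function is best read as a summary of the inference, not a curation rule.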

Genetic Interactions

Genetic interactions capture scenarios in which interesting or unexpected phenotypes arise from the combination of mutant alleles with other mutant alleles or with experimental conditions or treatments. Along with gene interactors (and entities, like alleles, that implicate a particular gene in the interaction), genetic interactions are annotated with a genetic interaction type, a phenotype, and (usually) a text description of the interaction. WB recognizes a number of different genetic interaction types. These include traditional genetic interaction types like suppression, enhancement, epistasis, and synthetic interactions. Recently, WB has adopted the use of new genetic interaction types including oversuppression, asynthetic interactions, phenotype bias, minimal epistasis, and maximal epistasis (paper in progress). It is often necessary to capture the directionality of the interaction, for example when one gene’s allele suppresses the phenotype of another gene’s allele or, similarly, when one allele is epistatic to another. Recognized interactors that may implicate a gene in a genetic interaction are alleles, rearrangements (chromosomal duplications and deficiencies), and transgenes (when overexpression may genetically interact). Interactors may also be small molecules, for example, when exposure to a chemical suppresses or enhances a mutant allele’s phenotype.

Predicted Interactions

As of the writing of this WormBook chapter, two publications provide all of the predicted interactions in WormBase. Weiwei Zhong and Paul Sternberg generated a list of approximately 20,000 predicted C. elegans genetic interactions based on integration of interactome data, gene expression data, phenotype data, and existing functional annotations from the budding yeast Saccharomyces cerevisiae, the fruit fly Drosophila melanogaster, and Caenorhabditis elegans (Zhong and Sternberg, 2006). Each predicted interaction from this dataset is annotated with a log-likelihood score and can also be viewed at the Gene Orienteer website, www.geneorienteer.org. The Marcotte and Fraser groups generated a list of 384,700 predicted C. elegans gene interactions by integrating gene expression, physical interaction, genetic interaction, and gene structure data from yeast, bacteria, humans, flies, and C. elegans (Lee et al. 2008). Each of these predicted interactions is similarly annotated with a log-likelihood score and can also be viewed at the WormNet website, www.functionalnet.org/wormnet/.

How you can find and query them

The first place to look for interactions in WB is on the gene page. Each gene page has an “Interactions” widget, which summarizes all of the curated interaction data for that particular gene. The Interactions widget has two parts: a Cytoscape-enabled local interaction network viewer on top (Figure 16), and an interactions table below (Figure 17). Cytoscape is a network visualization tool with numerous available extensions and plugins, and has been implemented in WormBase to embed network visualizations into various WormBase pages (Smoot et al. 2011). In the gene page Interactions widget, the Cytoscape view of the local interaction network includes a comprehensive legend indicating the color code for the network edges (interactions) between nodes (genes and other interactors). Users can toggle the various interaction types on and off using the checkboxes within the legend. Interaction types in the legend are separated first on the basic interaction type (physical, predicted, regulatory, and genetic) and then on interaction subtypes. Note that predicted interactions are invisible by default and require the user to toggle on the “Predicted” checkbox in the legend. Genetic interactions can also be selected (or deselected) on the basis of the relevant phenotype. The local network displayed includes interactions directly involving the gene of interest as well as secondary interactions involving direct interactors of the gene of interest. The secondary interactions may be toggled on or off from view with the “Nearby interactions” checkbox. In the Cytoscape network individual nodes may be rearranged by clicking and dragging. Single clicks on nodes will open up the WB page for that object. The scroll wheel on a mouse may be used to zoom in and out of the network. Panning the network view may be achieved by clicking and holding the left mouse button in any whitespace for one second followed by dragging the mouse. 
Some network edges have arrows to indicate directionality of the interaction, where applicable. Directional (or non-directional) interactions may be selectively viewed using the checkboxes under the “Directions” section of the network view legend.
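
The way the local network is assembled can be sketched in a few lines. This is a hedged illustration, not the widget's code: the `local_network` function and the tuple layout are hypothetical, the gene names are placeholders, and "secondary interactions" is interpreted here as interactions between two direct interactors of the gene of interest, which is one reading of the description above.

```python
# Predicted interactions are hidden unless explicitly switched on, as in the legend.
DEFAULT_VISIBLE = ("physical", "regulatory", "genetic")


def local_network(edges, focus, show_nearby=True, visible_types=DEFAULT_VISIBLE):
    """Return the edges shown for `focus`.

    `edges` is a list of (interactor_a, interactor_b, interaction_type) tuples.
    Direct edges involve the focus gene; nearby edges connect two of its
    direct interactors and are included only when `show_nearby` is True.
    """
    visible = [e for e in edges if e[2] in visible_types]
    direct = [e for e in visible if focus in (e[0], e[1])]
    if not show_nearby:
        return direct
    neighbors = {node for e in direct for node in (e[0], e[1])} - {focus}
    nearby = [
        e for e in visible
        if focus not in (e[0], e[1]) and e[0] in neighbors and e[1] in neighbors
    ]
    return direct + nearby


# Placeholder network; none of these are real genes or interactions.
toy_edges = [
    ("gene-a", "gene-b", "physical"),
    ("gene-a", "gene-c", "genetic"),
    ("gene-b", "gene-c", "regulatory"),
    ("gene-c", "gene-d", "physical"),
    ("gene-a", "gene-e", "predicted"),
]

for edge in local_network(toy_edges, "gene-a"):
    print(edge)
```

Passing `show_nearby=False` corresponds to unchecking “Nearby interactions”, and adding `"predicted"` to `visible_types` corresponds to ticking the “Predicted” checkbox.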

The interactions table provides more detail for each of the interactions for the gene of interest. The “Interactions” column in the table indicates the interacting entities and provides a link to the WB Interaction object page (a web page dedicated to that interaction object; see below). This column is followed by the “Interaction Type” column, which indicates the interaction type (or subtype). The “Effector” and “Affected” columns that follow indicate the identities of the interactors. In directional interactions, “Effector” refers to the upstream (or causative) interactor and “Affected” refers to the downstream (or recipient) interactor. The “Direction” column indicates whether or not the interaction is directional. The “Phenotype” column indicates the relevant phenotype for genetic interactions. The “Citations” column indicates the primary reference that provides evidence for the interaction. The search box may be used to search for attributes of an interaction, including gene names, interaction types, phenotypes, or reference information. For genes with a large number of interactions, the table will include several pages of results, which can be retrieved using the buttons at the bottom right corner of the table.
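
For readers who copy the table contents into their own scripts, the search box behavior can be approximated with a case-insensitive match across all columns. This is an illustrative sketch only: the `search_rows` helper is hypothetical, the rows are placeholders, and the values shown in the “Direction” column are assumptions, since the text does not specify how direction is displayed.

```python
def search_rows(rows, query):
    """Return rows in which any column contains the query, ignoring case."""
    needle = query.strip().lower()
    return [
        row for row in rows
        if any(needle in str(value).lower() for value in row.values())
    ]


# Placeholder rows using the column names from the interactions table.
rows = [
    {
        "Interactions": "gene-a : gene-b",
        "Interaction Type": "Suppression",
        "Effector": "gene-a",
        "Affected": "gene-b",
        "Direction": "directional",
        "Phenotype": "example phenotype",
        "Citations": "Placeholder et al. 2000",
    },
    {
        "Interactions": "gene-a : gene-c",
        "Interaction Type": "Physical",
        "Effector": "gene-a",
        "Affected": "gene-c",
        "Direction": "non-directional",
        "Phenotype": "",
        "Citations": "Example et al. 2001",
    },
]

for hit in search_rows(rows, "suppression"):
    print(hit["Interactions"], "|", hit["Interaction Type"], "|", hit["Phenotype"])
```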

In addition to the gene page Interactions widget, interaction details may be viewed on the Interaction object page. This page may be reached through the link in the first column of the interactions table in the gene page Interactions widget. Alternatively, you may click on the magnifying glass icon of an empty search box (upper right corner of any WB page) to pull up the detailed search options page (Figure 18); select “Interaction” under the “Classes” list and either type in a gene of interest or, if you know it, the 9-digit WBInteraction ID for the interaction of interest.

Figure 16: The Cytoscape interaction network viewer in the gene page “Interactions” widget

Figure 17: The interactions table in the gene page “Interactions” widget

Figure 18: The WormBase detailed search options page

Caveats to data and interpretation

References

Ahringer, J., ed. Reverse genetics (April 6, 2006), WormBook, ed. The C. elegans Research Community, WormBook, doi/10.1895/wormbook.1.47.1, http://www.wormbook.org.

Ashburner M., Ball C.A., Blake J.A., Botstein D., Butler H., Cherry J.M., Davis A.P., Dolinski K., Dwight S.S., Eppig J.T., et al. (2000) Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet. 25, 25-9.

Benzer S. (1971) From the gene to behavior. JAMA 218, 1015-22.

Brenner S. (1974) The genetics of Caenorhabditis elegans. Genetics 77, 71-94.

Calixto A., Chelur D., Topalidou I., Chen X., and Chalfie M. (2010) Enhanced neuronal RNAi in C. elegans using SID-1. Nat Methods 7, 554-559.

Chatr-Aryamontri A., Breitkreutz B.J., Oughtred R., Boucher L., Heinicke S., Chen D., Stark C., Breitkreutz A., Kolas N., O'Donnell L., et al. (2015) The BioGRID interaction database: 2015 update. Nucleic Acids Res. 43(D1), D470-478.

Cong L., Ran F.A., Cox D., Lin S., Barretto R., Habib N., Hsu P.D., Wu X., Jiang W., Marraffini L.A., et al. (2013) Multiplex genome engineering using CRISPR/Cas systems. Science 339, 819-23.

Cullen L.M. and Arndt G.M. (2005) Genome-wide screening for gene function using RNAi in mammalian cells. Immunol Cell Biol. 83, 217-23.

de Bono M., and Bargmann C.I. (1998) Natural variation in a neuropeptide Y receptor homolog modifies social behavior and food response in C. elegans. Cell 94, 679-689.

Fang R., Schindelman G., Van Auken K., Fernandes J., Chen W., Wang X., Davis P., Tuli M.A., Marygold S.J., Millburn G., et al. (2012) Automatic categorization of diverse experimental information in the bioscience literature. BMC Bioinformatics 13:16.

Fire A., Xu S., Montgomery M.K., Kostas S.A., Driver S.E., and Mello C.C. (1998) Potent and specific genetic interference by double-stranded RNA in Caenorhabditis elegans. Nature 391, 806-811.

Friedland A.E., Tzur Y.B., Esvelt K.M., Colaiácovo M.P., Church G.M., and Calarco J.A. (2013) Heritable genome editing in C. elegans via a CRISPR-Cas9 system. Nat Methods 10, 741-3.

Gkoutos G.V., Green E.C., Mallon A.M., Hancock J.M., and Davidson D. (2005) Using ontologies to describe mouse phenotypes. Genome Biol. 6:R8.

Kamath R.S., Fraser A.G., Dong Y., Poulin G., Durbin R., Gotta M., Kanapin A., Le Bot N., Moreno S., Sohrmann M., et al. (2003) Systematic functional analysis of the Caenorhabditis elegans genome using RNAi. Nature 421, 231-237.

Lee I., Lehner B., Crombie C., Wong W., Fraser A.G., and Marcotte E.M. (2008) A single gene network accurately predicts phenotypic effects of gene perturbation in Caenorhabditis elegans. Nat Genet. 40, 181-188.

Lewis E.B. (1964) Genetic control and regulation of developmental pathways. In: Role of Chromosomes in Development, M. Locke ed., New York, Academic Press, pp. 231-252.

Mello C.C. and Conte D. Jr. (2004) Revealing the world of RNA interference. Nature 431, 338-42.

Muller H.J. (1932) Further studies on the nature and causes of gene mutations. Proceedings of the 6th International Congress of Genetics, pp. 213-255.

Mungall C.J., Gkoutos G.V., Smith C.L., Haendel M.A., Lewis S.E., and Ashburner M. (2010) Integrating phenotype ontologies across multiple species. Genome Biol. 11:R2.

Rual J.F., Ceron J., Koreth J., Hao T., Nicot A.S., Hirozane-Kishikawa T., Vandenhaute J., Orkin S.H., Hill D.E., van den Heuvel S., et al. (2004) Toward improving Caenorhabditis elegans phenome mapping with an ORFeome-based RNAi library. Genome Res. 14, 2162-2168.

Schindelman G., Fernandes J.S., Bastiani C.A., Yook K., and Sternberg P.W. (2011) Worm Phenotype Ontology: integrating phenotype data within and beyond the C. elegans community. BMC Bioinformatics 12:32.

Smoot M.E., Ono K., Ruscheinski J., Wang P.L., and Ideker T. (2011) Cytoscape 2.8: new features for data integration and network visualization. Bioinformatics 27, 431-432.

Thakur N., Pujol N., Tichit L., and Ewbank J.J. (2014) Clone mapper: an online suite of tools for RNAi experiments in Caenorhabditis elegans. G3 (Bethesda) 4, 2137-2145.

Timmons L., Court D.L., and Fire A. (2001) Ingestion of bacterially expressed dsRNAs can produce specific and potent genetic interference in Caenorhabditis elegans. Gene 263, 103-112.

Zhong W. and Sternberg P.W. (2006) Genome-wide prediction of C. elegans genetic interactions. Science 311, 1481-1484.

Zhuang J.J., and Hunter C.P. (2011) Tissue specificity of Caenorhabditis elegans enhanced RNA interference mutants. Genetics 188, 235-237.

