OBAMA (Optimization-Based Analysis of Multiple Arrays) is an R package that includes an interactive Shiny application designed to assist researchers in the analysis of gene expression data.
This tool integrates three complementary optimization-based methods to support different stages of transcriptomic analysis:
- MCO (Multi-Criteria Optimization): Selects genes that show the most significant changes in expression across conditions, optimizing multiple criteria.
- MST (Minimum Spanning Tree): Constructs a network that highlights relationships between gene products based on maximum correlation, enabling the identification of interaction structures among deregulated genes.
- OGF (Ontology-Guided Filtering): Groups genes and biological terms (such as Gene Ontology: Biological Processes) into highly probable functional groups, helping interpret the biological relevance of deregulated genes.
Together, these methods allow for robust gene selection, structural analysis of gene-gene associations, and biological interpretation based on functional annotation.
OBAMA is particularly suitable for analyzing microarray or RNA-seq datasets from individual studies or performing meta-analyses across multiple datasets.
Before installing OBAMA, please ensure that the archived version of the optrees package is installed manually, as it is required to properly run the MST (Minimum Spanning Tree) analysis:
# Step 1: Download and install 'optrees'
install.packages("https://cran.r-project.org/src/contrib/Archive/optrees/optrees_1.0.tar.gz",
repos = NULL, type = "source")
# Step 2: Install OBAMA from GitHub
devtools::install_github("DeiverSuarez/OBAMA")

To launch the OBAMA Shiny application:
library(OBAMA)
run_app()

Once the app is running, upload your input files in .csv format containing gene expression matrices. You will be able to compare conditions, visualize deregulated genes, and construct optimized correlation networks.
Input files must be in .csv or .tsv format, with the following structure:
- Each row represents a sample.
- Each column represents a gene, with gene expression values.
- The first two columns must be:
  - geo_accession: unique sample identifier (e.g., GSM650656).
  - disease.state: associated biological condition, such as control or disease.
The disease.state column is required to perform differential expression comparisons between groups.
| geo_accession | disease.state | A1BG | A1CF | A2M | ... |
|---|---|---|---|---|---|
| GSM650656 | disease | 192.72 | 94.82 | 123.33 | ... |
| GSM650657 | control | 241.33 | 120.10 | 142.54 | ... |
| GSM650658 | disease | 213.99 | 130.45 | 159.91 | ... |
Example datasets: see the OBAMA/data-raw/ folder.
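The layout rules above can be checked before uploading a file. The following is a minimal sketch in base R; `validate_obama_input()` is a hypothetical helper written for this README and is not part of the OBAMA package. It assumes the two required columns are named exactly `geo_accession` and `disease.state`.

```r
# Hypothetical helper (not part of OBAMA): check that a data frame follows
# the input layout described above.
validate_obama_input <- function(df) {
  required <- c("geo_accession", "disease.state")
  # The first two columns must be the sample ID and the condition label.
  if (ncol(df) < 3 || !identical(names(df)[1:2], required)) {
    stop("First two columns must be 'geo_accession' and 'disease.state', ",
         "followed by at least one gene column.")
  }
  # Sample identifiers are expected to be unique.
  if (anyDuplicated(df$geo_accession) > 0) {
    stop("Values in 'geo_accession' must be unique.")
  }
  # Every remaining column is a gene and should hold numeric expression values.
  gene_cols <- df[, -(1:2), drop = FALSE]
  non_numeric <- names(gene_cols)[!vapply(gene_cols, is.numeric, logical(1))]
  if (length(non_numeric) > 0) {
    stop("Non-numeric gene columns: ", paste(non_numeric, collapse = ", "))
  }
  invisible(TRUE)
}

# Toy data mirroring the example table.
expr <- data.frame(
  geo_accession = c("GSM650656", "GSM650657", "GSM650658"),
  disease.state = c("disease", "control", "disease"),
  A1BG = c(192.72, 241.33, 213.99),
  A1CF = c(94.82, 120.10, 130.45),
  A2M  = c(123.33, 142.54, 159.91)
)
validate_obama_input(expr)

# Group sizes available for the differential expression comparison.
table(expr$disease.state)
```

For a real file, the same check could be applied to the result of `read.csv()` on your own dataset.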
The MCO module helps identify genes with significant expression changes across conditions by optimizing multiple performance criteria. You can access this feature directly from the OBAMA Shiny interface.
- Launch the Application:
library(OBAMA)
run_app()

Use the top navigation menu to access the MCO functionality.
- Number of Performance Metrics:
  - Select Two or Three metrics depending on your analysis.
  - Choose the desired performance metrics for each (e.g., Median, Mean).
- Number of Datasets:
  - Indicate whether you are analyzing one or multiple datasets.
- Number of Frontiers:
  - Define the number of Pareto frontiers to explore (default: 10).
- Click Browse under Gene Expression Dataset (example: GSE35974_SCZ.csv).
- Upload a .csv file following the required input format.
- Click the Run button to start the MCO analysis.
Navigate through the following tabs:
- SummaryData: Review summary statistics.
- Frontiers: Examine gene sets identified at different optimization frontiers.
- Frontiers-plot: Visualize the Pareto frontiers interactively.
- Visualization: Explore graphical representations of selected genes.
- Click Save My Results to download the gene set with the highest expression changes for further analysis.
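To give a feel for what "Pareto frontiers" means in the MCO module, here is a small base R sketch that ranks genes into successive frontiers using two metrics. It is illustrative only and is not OBAMA's implementation: the function `pareto_frontiers()`, the choice of metrics (absolute difference in group means and in group medians), and the "larger is better" direction are all assumptions made for this example.

```r
# Illustrative only: peel successive Pareto frontiers from per-gene metrics.
# Every metric is treated as "larger is better" (an assumption of this sketch).
pareto_frontiers <- function(metrics, n_frontiers = 10) {
  metrics <- as.matrix(metrics)
  frontier <- rep(NA_integer_, nrow(metrics))
  remaining <- seq_len(nrow(metrics))
  k <- 1L
  while (length(remaining) > 0 && k <= n_frontiers) {
    m <- metrics[remaining, , drop = FALSE]
    # A gene is dominated if some other gene is >= on every metric
    # and strictly > on at least one.
    dominated <- vapply(seq_len(nrow(m)), function(i) {
      ge_all <- apply(m, 1, function(other) all(other >= m[i, ]))
      gt_any <- apply(m, 1, function(other) any(other > m[i, ]))
      any(ge_all & gt_any)
    }, logical(1))
    frontier[remaining[!dominated]] <- k
    remaining <- remaining[dominated]
    k <- k + 1L
  }
  frontier
}

# Toy dataset in the required input layout (values are made up).
expr <- data.frame(
  geo_accession = paste0("S", 1:6),
  disease.state = rep(c("disease", "control"), each = 3),
  G1 = c(10, 11, 12, 1, 2, 3),
  G2 = c(5, 6, 7, 4, 5, 6),
  G3 = c(3, 3, 3, 3, 3, 3),
  G4 = c(35, 2, 2, 1, 1, 1)
)
genes <- expr[, -(1:2)]
is_disease <- expr$disease.state == "disease"

# Two assumed performance metrics per gene.
metrics <- cbind(
  mean_diff   = abs(colMeans(genes[is_disease, ]) - colMeans(genes[!is_disease, ])),
  median_diff = abs(apply(genes[is_disease, ], 2, median) -
                    apply(genes[!is_disease, ], 2, median))
)

front <- pareto_frontiers(metrics, n_frontiers = 10)
split(rownames(metrics), front)  # genes grouped by frontier index
```

Genes on the first frontier are those that no other gene beats on both metrics at once; later frontiers are obtained by removing earlier ones and repeating.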
The MST module identifies key correlation structures among a set of deregulated genes, constructing a minimum spanning tree that reveals highly connected nodes (genes) within the expression network.
This is especially useful to detect possible regulatory hubs or functionally linked genes based on similarity in expression profiles.
- Launch the OBAMA App:
library(OBAMA)
run_app()
- Navigate to the MST Section:
  - Use the top menu bar and click on MST.
- Upload Required Data:
  - Gene Expression Data: Upload a .csv file containing gene expression values (example: GSE35974_SCZ.csv).
  - Genes of Interest: Upload a list of genes to be included in the network. This file must contain a single column with gene names (e.g., Gene_of_interest_GSE35974_SCZ.csv).
- Run the Analysis:
  - Click the Run button to compute the correlation matrix and generate the MST.
- Explore Results: Navigate through the available tabs:
  - SummaryData: Overview of input data and correlation statistics.
  - MST table: Displays the edges (connections) in the MST with corresponding correlation values.
  - MST diagram: An interactive network visualization showing the structure of gene-gene relationships.
- Export Results:
  - Click Save My MST Results to download the MST output for further exploration or visualization in external tools.
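The idea behind the MST module (correlation matrix first, then a spanning tree over the genes of interest) can be sketched in base R with Prim's algorithm. This is an illustration, not the code OBAMA runs: OBAMA relies on the optrees package, whereas the function `mst_from_expression()` below is hypothetical, and the edge weight `1 - |Pearson correlation|` is an assumption chosen so that strongly correlated genes end up close together.

```r
# Illustrative Prim's algorithm on a correlation-derived distance matrix.
# Assumption of this sketch: distance = 1 - |Pearson correlation|.
mst_from_expression <- function(genes) {
  cors <- cor(genes)
  dist_mat <- 1 - abs(cors)
  n <- ncol(dist_mat)
  in_tree <- c(TRUE, rep(FALSE, n - 1))  # start the tree at the first gene
  from <- character(0)
  to <- character(0)
  while (!all(in_tree)) {
    # Consider only edges that connect the current tree to a new gene.
    candidate <- dist_mat[in_tree, !in_tree, drop = FALSE]
    best <- which(candidate == min(candidate), arr.ind = TRUE)[1, ]
    new_from <- rownames(candidate)[best[1]]
    new_to <- colnames(candidate)[best[2]]
    from <- c(from, new_from)
    to <- c(to, new_to)
    in_tree[colnames(dist_mat) == new_to] <- TRUE
  }
  # One row per MST edge, with the correlation of the two genes it joins.
  data.frame(from = from, to = to,
             correlation = cors[cbind(from, to)],
             stringsAsFactors = FALSE)
}

# Toy expression values for four genes of interest (rows are samples).
genes <- data.frame(
  A = c(1, 2, 3, 4, 5),
  B = c(1.1, 2.0, 3.2, 3.9, 5.1),  # tracks A closely
  C = c(5, 3, 4, 1, 2),
  D = c(2, 1, 4, 3, 6)
)
edges <- mst_from_expression(genes)
edges
```

The resulting table has one row per edge, similar in spirit to the MST table tab: a tree over n genes always has n - 1 edges, and genes that appear in many rows are candidate hubs.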
This package is licensed under the MIT License.
See the LICENSE.md file for more details.
If you use OBAMA in your research, please cite:
Suárez-Gómez, D. et al. OBAMA: Optimization-Based Analysis of Micro Arrays. (Manuscript in preparation).
[Add final reference once published]
Version: 1.0
Contact: deiver.suarez@upr.edu
For bug reports or feature requests, please use the GitHub Issues page.
This repository follows the FAIR principles by including:
- Clear metadata and documentation
- Example datasets and usage
- Interoperability with standard data formats (CSV)
- Reusability via open-source licensing and citation information