I develop interpretable machine learning frameworks that integrate multi-omic and spatial data to uncover the molecular logic of complex biological systems, with a focus on reproducibility, biological grounding, and transparent interpretation of high-dimensional genomics data.
I build interpretable, reproducible computational frameworks for genomics that combine biological prior knowledge with multi-omic and spatial data. The central goal is to move beyond black-box prediction toward mechanistic understanding: models that not only perform well but also explain how genetic variation, regulatory programs, and perturbations reshape cellular states. Each framework emphasizes pathway- and network-level interpretability, cross-dataset generalization, and transparent benchmarking, with the aim of establishing reusable analytical standards for single-cell and multimodal genomics. This approach bridges machine learning, systems biology, and data-driven biological interpretation to support robust, generalizable discovery across biological contexts.
The following key projects are part of the MM-KPNN framework family, a unified effort to develop concept-bottleneck and biologically constrained models that embed prior knowledge directly into network architectures, ensuring interpretability, reproducibility, and mechanistic insight across multi-omic and spatial data.
A modular and interpretable graph framework for spatial transcriptomics in tissue microenvironments.
- Combines Graph Attention Networks (GAT) with knowledge-primed decoding
- Models cell–cell communication, immune exclusion, and stromal remodeling
- Outputs attention maps, pathway overlays, and ligand–receptor driver rankings
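
To illustrate where attention maps of this kind come from, here is a minimal sketch of single-head graph attention over spatial neighbors. It is not the project's implementation: the learned projection matrix is omitted (features are assumed to be already projected), self-loops are left out, and the spot names, feature values, and the `a_src` / `a_dst` attention vectors are hypothetical.

```python
import math

def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x

def attention_weights(features, neighbors, a_src, a_dst):
    """Return {node: {neighbor: alpha}}; each node's weights sum to 1."""
    dot = lambda u, v: sum(p * q for p, q in zip(u, v))
    weights = {}
    for i, nbrs in neighbors.items():
        # Unnormalized attention score for each edge i -> j
        scores = {
            j: leaky_relu(dot(a_src, features[i]) + dot(a_dst, features[j]))
            for j in nbrs
        }
        # Softmax over the neighborhood, shifted by the max for stability
        top = max(scores.values())
        exp = {j: math.exp(s - top) for j, s in scores.items()}
        total = sum(exp.values())
        weights[i] = {j: e / total for j, e in exp.items()}
    return weights

# Toy tissue graph: three spots with 2-d features (hypothetical values)
features = {"spot_a": [1.0, 0.0], "spot_b": [0.0, 1.0], "spot_c": [1.0, 1.0]}
neighbors = {
    "spot_a": ["spot_b", "spot_c"],
    "spot_b": ["spot_a"],
    "spot_c": ["spot_a", "spot_b"],
}
alpha = attention_weights(features, neighbors, a_src=[0.5, -0.5], a_dst=[1.0, 2.0])
```

The per-edge values in `alpha` are what an attention map would visualize: for each spot, how strongly each neighbor contributes to its updated representation.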
2. MM-KPNN
Interpretable multimodal neural network integrating scRNA-seq and scATAC-seq using biological priors.
- Decoder constrained by pathway and transcription factor nodes
- Enables mechanistic attribution of regulatory programs and cell states
- Designed for reproducible benchmarking across single-cell modalities
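
A decoder constrained by pathway nodes can be sketched as a masked linear layer: each pathway node may only contribute to genes annotated to that pathway. The sketch below is an assumption about the general technique, not the MM-KPNN code itself; the pathway names, gene names, and weight values are hypothetical placeholders.

```python
def build_mask(pathway_genes, pathways, genes):
    """Binary mask (rows: pathways, cols: genes); 1 where the gene is a member."""
    return [[1 if g in pathway_genes[p] else 0 for g in genes] for p in pathways]

def masked_decode(z, weights, mask):
    """Reconstruct gene values from pathway activations z.

    weights[p][g] only contributes when mask[p][g] == 1, so every
    reconstructed gene is explained by its annotated pathways alone.
    """
    n_genes = len(mask[0])
    return [
        sum(z[p] * weights[p][g] * mask[p][g] for p in range(len(z)))
        for g in range(n_genes)
    ]

# Hypothetical prior knowledge: two pathways over three genes
pathways = ["pathway_1", "pathway_2"]
genes = ["GENE_A", "GENE_B", "GENE_C"]
pathway_genes = {
    "pathway_1": {"GENE_A", "GENE_B"},
    "pathway_2": {"GENE_B", "GENE_C"},
}
mask = build_mask(pathway_genes, pathways, genes)

# The 9.0 entries sit on masked connections and have no effect on the output
weights = [[0.5, 1.0, 9.0], [9.0, 2.0, -1.0]]
decoded = masked_decode([2.0, 3.0], weights, mask)
```

Because masked connections never reach the output, the activation of each pathway node can be read as the contribution of that pathway, which is the basis for mechanistic attribution in this kind of design.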
Pathway-bottleneck graph neural network for perturbation and drug-response prediction.
- Integrates multi-omic features with prior knowledge graphs
- Focuses on cross-dataset generalization across pharmacogenomic panels
- Provides pathway-level interpretability and reproducible evaluation
Concept-bottleneck framework for modeling drug and CRISPR perturbation responses at single-cell resolution.
- Implements pathway and TF bottlenecks for interpretability
- Measures attribution stability across perturbation conditions
- Supports counterfactual pathway analysis in single-cell datasets
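
The project description does not state which metric is used for attribution stability. One common choice, shown here as a sketch under that assumption, is the Spearman rank correlation between pathway attribution scores obtained under two conditions. The attribution values are hypothetical.

```python
def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks

def pearson(x, y):
    # Undefined (division by zero) if either input is constant
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical attribution scores for the same four pathways in two conditions
attr_condition_1 = [0.9, 0.1, 0.5, 0.3]
attr_condition_2 = [0.8, 0.2, 0.6, 0.1]
stability = spearman(attr_condition_1, attr_condition_2)
```

A value near 1 means the pathways keep roughly the same importance ordering across conditions; values near 0 or below suggest the explanation is not stable.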
A modular computational framework for the analysis of organoid systems.
- Addresses reproducibility, heterogeneity, and data integration
- Integrates RNA and protein modalities with interpretable ML
- Demonstrates end-to-end reproducibility through documented notebooks
Spatial mapping of tissue architecture using 10x Visium transcriptomics.
- Defines epithelial, immune, stromal, and proliferative regions
- Reveals spatial organization and regional heterogeneity
- Provides a fully documented, end-to-end analytical workflow
End-to-end pipeline for structural variant discovery and annotation using PacBio long-read sequencing.
- Implements clinical annotation (ACMG/AMP) and variant filtering
- Includes functional scoring and visualization modules
- Designed for scalable deployment in HPC environments
Modular framework for rare-variant burden analysis in genomic cohorts.
- Supports SKAT, SKAT-O, and extended statistical methods
- Implements functional weighting and population correction
- Provides reproducible filtering and QC workflows
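
As a small illustration of functional weighting, the sketch below computes a weighted burden score per sample: the sum of rare-allele dosages in a gene, each scaled by a per-variant weight. This covers only the simple burden collapse, not the SKAT or SKAT-O variance-component tests, and the genotypes and weights are hypothetical.

```python
def burden_scores(genotypes, weights):
    """Weighted burden per sample.

    genotypes: one row per sample, one column per variant, entries are
               minor-allele dosages (0, 1, or 2).
    weights:   one weight per variant, e.g. derived from a functional score.
    """
    return [sum(w * g for w, g in zip(weights, row)) for row in genotypes]

# Three samples, three rare variants in one gene (hypothetical data)
genotypes = [
    [0, 1, 2],
    [1, 0, 0],
    [0, 0, 0],
]
weights = [1.0, 0.5, 2.0]  # hypothetical functional weights
scores = burden_scores(genotypes, weights)
```

The resulting per-sample scores would then typically enter a regression against phenotype, with covariates such as ancestry components handling population correction.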
Systems biology workflow for reconstructing gene regulatory networks.
- Integrates TF–target priors with expression-based inference
- Performs network topology and modularity analysis
- Identifies functionally enriched regulatory modules
Gene co-expression analysis pipeline using WGCNA.
- Identifies expression modules and hub genes
- Evaluates biological function and module preservation
- Applies to bulk and single-cell RNA-seq datasets
Workflow for secure and efficient genomic data transfer using Globus.
- Supports HPC environments and structured data sharing
- Enables checksum validation and metadata tracking
- Designed for collaborative, reproducible research
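
Checksum validation can be sketched independently of the transfer service: record a digest before transfer and compare it after. The example below uses only the Python standard library and does not call any Globus API; the file name and payload are hypothetical.

```python
import hashlib
import os
import tempfile

def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large files never load fully into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify(path, expected_hex):
    """True if the file's current digest matches the recorded one."""
    return sha256_of(path) == expected_hex.lower()

# Demonstration on a temporary file
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.vcf")  # hypothetical file name
    with open(path, "wb") as handle:
        handle.write(b"example payload\n")
    recorded = sha256_of(path)       # digest stored alongside the metadata
    intact = verify(path, recorded)  # file unchanged, so this is True
    with open(path, "ab") as handle:
        handle.write(b"corruption")  # simulate a damaged copy
    corruption_detected = not verify(path, recorded)
```

Storing the recorded digest next to the file's metadata makes it possible to re-verify the data at any later point, not only immediately after transfer.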
Sally Yepes
📧 sallyepes233@gmail.com
🔗 GitHub: Sally332
🔗 Portfolio: sally332.github.io