LangeLab/ProteoForge_Analysis

ProteoForge Manuscript Analysis

This repository contains the code, analyses, and rendered figures supporting the ProteoForge manuscript. It includes real-data benchmarks, simulation studies, and an application to a hypoxia study. This repo is the analysis snapshot.

The scripts used here (ProteoForge/) are not packaged; they are simply a collection of functions developed alongside the manuscript. Some features — especially plotting and printing helpers — were added ad hoc and are specific to this analysis. A standalone Python package implementing the ProteoForge methodology is currently under development and will be made publicly available upon stable release.

Repository Layout

Top-level folders and their purpose:

  • ProteoForge/ — core Python scripts used in analyses (parsers, processing, modelling, clustering, classifiers).
  • Benchmark/ — scripts and notebooks for benchmark analyses (R and Python).
  • NSCLC/ — notebooks, data and figures for the hypoxia/NSCLC application.
  • Simulation/ — simulation scripts, notebooks and utilities used to evaluate methods.
  • src/ — auxiliary Python library used by some scripts (utilities, plotting helpers, tests).
  • requirements.txt, setup_project.sh, setup_project.ps1, setup_env.R — environment and setup helpers.

The setup utilities create the .venv and renv folders and install the required dependencies. They set up the environment for both the R and Python analyses so that results can be reproduced across operating systems.

Notes on data and outputs:

  • Raw and derived data/figures are not committed. Place raw inputs under the appropriate */data/input/ folders; scripts/notebooks will write to */data/ and */figures/ (see folder READMEs).
  • A snapshot of the repository, including the input data and HTML renders of all notebooks, is available on Zenodo: 10.5281/zenodo.17795845.
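
The input layout described above can be checked before running any analysis. The following is a minimal sketch and is not part of the repository: the helper name missing_input_dirs is hypothetical, and it assumes that each of the three analysis folders (Benchmark, NSCLC, Simulation) uses a data/input/ subfolder, as the */data/input/ pattern suggests.

```python
from pathlib import Path

# Analysis folders named in the repository layout. The data/input/ subpath
# follows the */data/input/ pattern and is an assumption for each folder.
ANALYSIS_DIRS = ("Benchmark", "NSCLC", "Simulation")


def missing_input_dirs(repo_root):
    """Return the expected input directories that do not exist yet."""
    root = Path(repo_root)
    expected = [root / name / "data" / "input" for name in ANALYSIS_DIRS]
    return [path for path in expected if not path.is_dir()]


if __name__ == "__main__":
    # Run from the repository root to list folders that still need raw inputs.
    for path in missing_input_dirs("."):
        print(f"missing input directory: {path}")
```

Running this from the repository root prints one line per missing folder and nothing when all three exist. The folder READMEs remain the authority on which files each folder expects.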

Environment Setup (Cross-Platform)

Use the provided setup scripts to configure both Python (venv) and R (renv + pak). R 4.5.0 or newer is required for the R environment.

Linux / macOS (bash):

git clone https://github.com/LangeLab/ProteoForge_Analysis.git
cd ProteoForge_Analysis
bash setup_project.sh

Windows (PowerShell):

git clone https://github.com/LangeLab/ProteoForge_Analysis.git
cd ProteoForge_Analysis
./setup_project.ps1

If R is not on PATH, install it from CRAN and rerun the setup command, or run Rscript setup_env.R after installation. To activate the Python environment later, use source .venv/bin/activate (Linux/macOS) or ./.venv/Scripts/Activate.ps1 (PowerShell).
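
After setup, a quick sanity check can confirm that the Python virtual environment is active and that the installed R meets the 4.5.0 minimum. This is a sketch under stated assumptions, not a script shipped with the repository: the function names are hypothetical, and the parser assumes that the output of Rscript --version contains the word "version" followed by a three-part version number.

```python
import re
import sys

MIN_R_VERSION = (4, 5, 0)  # minimum R version stated in this README


def in_virtualenv():
    """True when the interpreter runs inside a venv (prefix differs from base)."""
    return sys.prefix != sys.base_prefix


def parse_r_version(text):
    """Extract (major, minor, patch) from text such as 'R version 4.5.0'."""
    match = re.search(r"version\s+(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def r_version_ok(text, minimum=MIN_R_VERSION):
    """True when the parsed version is at least the required minimum."""
    version = parse_r_version(text)
    return version is not None and version >= minimum


if __name__ == "__main__":
    import shutil
    import subprocess

    print("inside venv:", in_virtualenv())
    rscript = shutil.which("Rscript")
    if rscript is None:
        print("Rscript not found on PATH")
    else:
        # The version banner may go to stdout or stderr, so read both streams.
        result = subprocess.run(
            [rscript, "--version"], capture_output=True, text=True
        )
        print("R version ok:", r_version_ok(result.stdout + result.stderr))
```

If the check reports that Rscript is missing or too old, install a newer R from CRAN and rerun the setup command as described above.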

Run Steps

Entry points for reproducing analyses and figures:

  • Notebooks: Benchmark/*.ipynb, Simulation/*.ipynb, NSCLC/*.ipynb.
  • Scripts (Python): Benchmark/04-runProteoForge.py, Simulation/04-runProteoForge.py.
  • Scripts (R): Benchmark/01-DataProcessing.R, Benchmark/02-runCOPF.R, Benchmark/03-runPeCorA.R, plus analogous scripts in Simulation/.

Each notebook/script documents its required inputs and outputs. Place raw inputs under the corresponding */data/input/ directory before running. Outputs will be written under */data/ and */figures/.
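
The script entry points listed above can be turned into launch commands with a small helper. This is only an illustrative sketch: build_command is a hypothetical name, the ordering by numeric prefix is an assumption that this README does not state, and the scripts may require a particular working directory or arguments that are documented in the scripts themselves.

```python
import sys
from pathlib import Path

# Benchmark script paths listed under "Run Steps". Ordering them by numeric
# prefix is an assumption, not something the README states.
BENCHMARK_SCRIPTS = [
    "Benchmark/01-DataProcessing.R",
    "Benchmark/02-runCOPF.R",
    "Benchmark/03-runPeCorA.R",
    "Benchmark/04-runProteoForge.py",
]


def build_command(script):
    """Return the argument list that would launch one script."""
    suffix = Path(script).suffix.lower()
    if suffix == ".r":
        return ["Rscript", script]
    if suffix == ".py":
        # Use the current interpreter so the active .venv is respected.
        return [sys.executable, script]
    raise ValueError(f"unsupported script type: {script}")


if __name__ == "__main__":
    # Print the commands instead of running them, so nothing executes
    # before the inputs documented in each script are in place.
    for script in BENCHMARK_SCRIPTS:
        print(" ".join(build_command(script)))
```

Printing rather than executing keeps the sketch safe to run at any time; each printed command can be passed to subprocess.run once the required inputs are available.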

Reproducibility Notes

  • R environment: managed with renv; set it up by running setup_project.sh or setup_project.ps1, or directly with Rscript setup_env.R. Required R version: >= 4.5.0.
  • Python environment: requirements.txt lists dependencies; the setup scripts create .venv and install the requirements.
  • Data locations: inputs are expected under */data/input/; outputs are written to */data/ and */figures/. Large files are not tracked in git.
  • Software vs analysis: this repository is the analysis snapshot. A standalone Python package is under development and will be made publicly available upon stable release.

Citations

Please cite the manuscript and the analysis snapshot when using this work.

License

This repository is licensed under CC BY-NC 4.0: see license.

About

All the scripts and notebooks used to produce the results reported in the manuscript titled "ProteoForge: An Imputation-Aware Framework for Differential Proteoform Discovery in Bottom-Up Proteomics".
