This repository contains the code, analyses, and rendered figures supporting the ProteoForge manuscript. It includes real-data benchmarks, simulation studies, and an application to a hypoxia study. This repo is the analysis snapshot.
The scripts used here (ProteoForge/) are not packaged; they are simply a collection of functions developed alongside the manuscript. Some features, especially plotting and printing helpers, were added ad hoc and are specific to this analysis. A standalone Python package implementing the ProteoForge methodology is currently under development and will be made publicly available upon stable release.
Top-level folders and their purpose:
- ProteoForge/: core Python scripts used in analyses (parsers, processing, modelling, clustering, classifiers).
- Benchmark/: scripts and notebooks for benchmark analyses (R and Python).
- NSCLC/: notebooks, data and figures for the hypoxia/NSCLC application.
- Simulation/: simulation scripts, notebooks and utilities used to evaluate methods.
- src/: auxiliary Python library used by some scripts (utilities, plotting helpers, tests).
- requirements.txt, setup_project.sh, setup_project.ps1, setup_env.R: environment and setup helpers.
The setup utilities create the .venv and renv folders and install the required dependencies. They set up the environments for both the R and Python analyses so that results can be reproduced across operating systems.
Notes on data and outputs:
- Raw and derived data/figures are not committed. Place raw inputs under the appropriate */data/input/ folders; scripts/notebooks will write to */data/ and */figures/ (see folder READMEs).
- A snapshot of the repository with input data, and the HTML renders of all notebooks, is available at Zenodo: 10.5281/zenodo.17795845.
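Before running anything, it can help to confirm that the raw inputs are in place. The sketch below is not part of the repository; it assumes that each of the three analysis folders (Benchmark/, NSCLC/, Simulation/) follows the */data/input/ layout described above, and the helper name `missing_input_dirs` is hypothetical.

```python
from pathlib import Path

# Analysis folders named in this README; assumed to share the */data/input/ layout.
ANALYSIS_DIRS = ("Benchmark", "NSCLC", "Simulation")


def missing_input_dirs(repo_root):
    """Return the analysis folders whose data/input/ directory is absent or empty."""
    root = Path(repo_root)
    missing = []
    for name in ANALYSIS_DIRS:
        input_dir = root / name / "data" / "input"
        # A folder counts as missing if it does not exist or contains no entries.
        if not input_dir.is_dir() or not any(input_dir.iterdir()):
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Run from the repository root.
    for name in missing_input_dirs("."):
        print(f"{name}/data/input/ is missing or empty")
```

The check only looks for the presence of files, not for specific file names; the folder READMEs remain the reference for which inputs each analysis expects.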
Use the provided setup scripts to configure both Python (venv) and R (renv + pak). R 4.5.0 or newer is required for the R environment.
Linux / macOS (bash):
git clone https://github.com/LangeLab/ProteoForge_Analysis.git
cd ProteoForge_Analysis
bash setup_project.sh
Windows (PowerShell):
git clone https://github.com/LangeLab/ProteoForge_Analysis.git
cd ProteoForge_Analysis
./setup_project.ps1
If R is not on PATH, install it from CRAN and rerun the setup command, or run Rscript setup_env.R after installation. To activate the Python environment later, use source .venv/bin/activate (Linux/macOS) or ./.venv/Scripts/Activate.ps1 (PowerShell).
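To confirm that the Python environment is active before launching a notebook or script, one option is the standard-library check below. It is a minimal sketch, not a repository utility; the function name `in_virtualenv` is hypothetical. It relies on the fact that inside a venv `sys.prefix` points at the environment while `sys.base_prefix` points at the base interpreter.

```python
import sys


def in_virtualenv():
    """Return True when the running interpreter belongs to a virtual environment."""
    # In a venv, sys.prefix is the environment directory and differs from sys.base_prefix.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    if in_virtualenv():
        print(f"Active environment: {sys.prefix}")
    else:
        print("No virtual environment is active; activate .venv first.")
```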
Entry points for reproducing analyses and figures:
- Notebooks: Benchmark/*.ipynb, Simulation/*.ipynb, NSCLC/*.ipynb.
- Scripts (Python): Benchmark/04-runProteoForge.py, Simulation/04-runProteoForge.py.
- Scripts (R): Benchmark/01-DataProcessing.R, Benchmark/02-runCOPF.R, Benchmark/03-runPeCorA.R, plus analogous scripts in Simulation/.
Each notebook/script documents its required inputs and outputs. Place raw inputs under the corresponding */data/input/ directory before running. Outputs will be written under */data/ and */figures/.
- R environment: managed with renv; run via setup_project.sh / setup_project.ps1 or Rscript setup_env.R. Required R version: >= 4.5.0.
- Python environment: requirements.txt lists dependencies; the setup scripts create .venv and install the requirements.
- Data locations: inputs are expected under */data/input/; outputs are written to */data/ and */figures/. Large files are not tracked in git.
- Software vs analysis: this repository is the analysis snapshot. A standalone Python package is under development and will be made publicly available upon stable release.
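The R version requirement can be checked up front, for instance from Python. The sketch below is an illustration rather than part of the setup scripts; the helper names are hypothetical. It parses the banner printed by `Rscript --version` and compares it against the 4.5.0 minimum stated above. Depending on the R release, the banner may appear on stdout or stderr, so both streams are read.

```python
import re
import shutil
import subprocess

MIN_R_VERSION = (4, 5, 0)  # minimum R version stated in this README


def parse_r_version(text):
    """Extract the first X.Y.Z version in text as a tuple of ints, or None if absent."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    return tuple(int(part) for part in match.groups()) if match else None


def r_version_ok(text, minimum=MIN_R_VERSION):
    """Return True when the version found in text is at least the required minimum."""
    version = parse_r_version(text)
    return version is not None and version >= minimum


def installed_r_version_text():
    """Return the output of `Rscript --version`, or None if Rscript is not on PATH."""
    if shutil.which("Rscript") is None:
        return None
    result = subprocess.run(["Rscript", "--version"], capture_output=True, text=True)
    # Combine both streams because the banner location depends on the R release.
    return result.stdout + result.stderr


if __name__ == "__main__":
    banner = installed_r_version_text()
    if banner is None:
        print("Rscript was not found on PATH; install R from CRAN.")
    elif r_version_ok(banner):
        print("R version satisfies the >= 4.5.0 requirement.")
    else:
        print("R is installed but older than 4.5.0.")
```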
Please cite the manuscript and the analysis snapshot when using this work.
- Manuscript (preprint):
- ProteoForge: An Imputation-Aware Framework for Differential Proteoform Discovery in Bottom-Up Proteomics. bioRxiv. Posted December 16, 2025. (under review)
- Analysis snapshot (this repository): use the Zenodo record and select the version matching the git tag you used.
- "Snapshot of Benchmarking and Showcasing ProteoForge for Proteoform Deconvolution from Peptide Level Data. Version 1. Zenodo. 10.5281/zenodo.17795845."
This repository is licensed under CC BY-NC 4.0: see license.
