PhosPy is a small Python library for selected phosphoproteomics workflows inspired by the R PhosR package.
Use it when you want to:
- preprocess total and phospho tables
- analyse kinase activity from a predMat
- run the native Python kinase workflow
It is intentionally narrow. It does not aim to reproduce all of PhosR.
PhosPy supports Python 3.10 and newer.
pip install phospy

The file paths below use examples/data/..., so they assume a repository checkout. If you installed from PyPI, use your own input-file paths instead.
Use PhosphoDataset when you want validated inputs and the standard preprocessing flow.
from phospy import CoreOutputWriter, PhosphoDataset
dataset = PhosphoDataset.from_files(
"examples/data/total.tsv",
"examples/data/phospho.tsv",
phospho_encoding="utf-16le",
)
core = dataset.preprocessing.run(max_unmatched_fraction=0.1)
writer = CoreOutputWriter()
writer.write(core, outdir="examples/output", format="csv")
site_matrix = core.site_matrix.matrix
corrected = core.phospho_corrected

dataset.preprocessing.run(...) returns a CoreProcessingResult with:
- total_unique
- total_filtered
- phospho_filtered
- phospho_corrected
- site_matrix
Use CoreOutputWriter when you want to write the core outputs to disk.
Use KinaseActivityAnalyzer when you already have a phosphosite matrix and a predMat.
from phospy import KinaseActivityAnalyzer, PhosphoDataset
dataset = PhosphoDataset.from_files(
"examples/data/total.tsv",
"examples/data/phospho.tsv",
phospho_encoding="utf-16le",
)
core = dataset.preprocessing.run(max_unmatched_fraction=0.1)
analyzer = KinaseActivityAnalyzer()
result = analyzer.load_and_analyze(
pred_mat_path="examples/data/predMat.csv",
phospho_matrix=core.site_matrix.matrix,
threshold=0.6,
min_substrates=1,
top_n_substrates=1,
)
analyzer.write_outputs(result, outdir="examples/output")
ksea_scores = result.ksea_scores
target_counts = result.target_counts

The bundled example uses min_substrates=1 and top_n_substrates=1 because the example data is very small.
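To make the roles of threshold and min_substrates more concrete, here is one plausible reading sketched in plain Python. It assumes that a site counts as a substrate of a kinase when its predMat score is at least threshold, and that kinases with fewer than min_substrates such sites are dropped. That interpretation, the function name select_substrates, and the toy scores are assumptions for illustration only, not PhosPy's implementation; docs/api.md documents the actual parameters.

```python
def select_substrates(pred_mat, threshold, min_substrates):
    """Illustrative only: map each kinase to the sites scoring >= threshold.

    pred_mat is {site: {kinase: score}}. Kinases with fewer than
    min_substrates selected sites are left out of the result.
    """
    substrates = {}
    for site, scores in pred_mat.items():
        for kinase, score in scores.items():
            if score >= threshold:
                substrates.setdefault(kinase, []).append(site)
    return {
        kinase: sites
        for kinase, sites in substrates.items()
        if len(sites) >= min_substrates
    }


# Made-up scores; the site and kinase names are hypothetical.
toy_pred_mat = {
    "P1;S10": {"KinaseA": 0.9, "KinaseB": 0.2},
    "P2;T45": {"KinaseA": 0.7, "KinaseB": 0.65},
}

lenient = select_substrates(toy_pred_mat, threshold=0.6, min_substrates=1)
strict = select_substrates(toy_pred_mat, threshold=0.6, min_substrates=2)

# Number of selected sites per kinase under the lenient setting.
toy_target_counts = {kinase: len(sites) for kinase, sites in lenient.items()}
```

Under this reading, a very small table leaves most kinases with only one or two qualifying sites, so any min_substrates above 1 would filter nearly everything out.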
Use PhosRPipeline when you want file loading, preprocessing, optional kinase analysis, and output writing in one call.
from phospy import PhosRPipeline
pipeline = PhosRPipeline.from_files(
total_path="examples/data/total.tsv",
phospho_path="examples/data/phospho.tsv",
pred_mat_path="examples/data/predMat.csv",
phospho_encoding="utf-16le",
max_unmatched_fraction=0.1,
)
outputs = pipeline.run(outdir="examples/output")

This writes the core CSV outputs and, when pred_mat_path is provided, the downstream kinase-analysis tables as well.
A pipeline run also writes run_manifest.json.
Use KinaseWorkflow for the native end-to-end prediction workflow.
A complete runnable example lives in examples/native_workflow_demo.py.
From a repository checkout, run:
make native-workflow-demo

For file-based workflows:
- total input is read as TSV
- phospho input is read as TSV
- predMat is read as CSV, using the first column as the phosphosite index
For the default schema, the expected columns are documented in docs/api.md.
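As a rough illustration of these reading conventions, the sketch below parses a TSV table and a predMat-style CSV using only the standard library. It is not PhosPy's loader: the helper names read_tsv and read_pred_mat, the column names, and the tiny generated files are all hypothetical, and the utf-16le encoding simply mirrors the bundled example.

```python
import csv
import tempfile
from pathlib import Path


def read_tsv(path, encoding="utf-8"):
    """Read a tab-separated table into a list of row dicts."""
    with open(path, newline="", encoding=encoding) as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


def read_pred_mat(path, encoding="utf-8"):
    """Read a CSV whose first column is the phosphosite index.

    Returns {site: {kinase: score}} with scores parsed as floats.
    """
    with open(path, newline="", encoding=encoding) as handle:
        reader = csv.reader(handle)
        header = next(reader)
        kinases = header[1:]  # first header cell labels the index column
        return {
            row[0]: {k: float(v) for k, v in zip(kinases, row[1:])}
            for row in reader
            if row  # skip blank lines
        }


# Tiny made-up files, only to exercise the helpers.
workdir = Path(tempfile.mkdtemp())
phospho_path = workdir / "phospho.tsv"
phospho_path.write_text("site\tsample_1\nP1;S10\t1.5\n", encoding="utf-16le")
pred_mat_path = workdir / "predMat.csv"
pred_mat_path.write_text(",KinaseA,KinaseB\nP1;S10,0.9,0.2\n", encoding="utf-8")

phospho_rows = read_tsv(phospho_path, encoding="utf-16le")
pred_mat = read_pred_mat(pred_mat_path)
```

Passing the wrong encoding for a UTF-16 file typically yields garbled headers or a decode error, which is why the examples in this README pass phospho_encoding explicitly.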
After installation, you can run the CLI on your own files.
phospy \
--total examples/data/total.tsv \
--phospho examples/data/phospho.tsv \
--pred-mat examples/data/predMat.csv \
--phospho-encoding utf-16le \
--max-unmatched-fraction 0.1 \
--outdir examples/output

The CLI covers the file-based preprocessing path and optional predMat analysis.
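If you would rather drive the CLI from a Python script, for example to loop over several datasets, a thin wrapper along these lines may help. It only uses the flags shown in the command above; the helper names build_phospy_command and run_phospy are hypothetical, and treating every flag other than --total, --phospho, and --outdir as optional is an assumption.

```python
import shlex
import subprocess


def build_phospy_command(total, phospho, outdir, pred_mat=None,
                         phospho_encoding=None, max_unmatched_fraction=None):
    """Assemble the argv list for the phospy CLI from the flags shown above."""
    command = ["phospy", "--total", str(total), "--phospho", str(phospho)]
    if pred_mat is not None:
        command += ["--pred-mat", str(pred_mat)]
    if phospho_encoding is not None:
        command += ["--phospho-encoding", phospho_encoding]
    if max_unmatched_fraction is not None:
        command += ["--max-unmatched-fraction", str(max_unmatched_fraction)]
    command += ["--outdir", str(outdir)]
    return command


def run_phospy(**kwargs):
    """Run the CLI; needs phospy installed and the input files present."""
    return subprocess.run(build_phospy_command(**kwargs), check=True)


command = build_phospy_command(
    total="examples/data/total.tsv",
    phospho="examples/data/phospho.tsv",
    pred_mat="examples/data/predMat.csv",
    phospho_encoding="utf-16le",
    max_unmatched_fraction=0.1,
    outdir="examples/output",
)
print(shlex.join(command))  # shell-quoted form, handy for logging
```

Building the command as a list rather than a single string avoids shell-quoting problems with paths that contain spaces.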
- docs/api.md for method signatures, parameters, validation, and examples
- docs/validation.md for the validation quick guide
- docs/parity.md for parity to the R PhosR package
- docs/fixtures.md for fixture and trace layout
- CONTRIBUTING.md for local development and test commands