Immune cell deconvolution (CIBERSORT)

Run LM22-based immune cell deconvolution (CIBERSORT) from Python via rpy2, with an automated QA pass that validates and labels samples (“Excellent / Moderate / Poor”).

This runner executes an R pipeline under the hood, handles minimal preprocessing (counts→CPM via edgeR when needed), writes standard plots/CSVs, and performs a numeric-safety QA step.

Features

Input auto-detection: Treats inputs as TPM/RPKM when the median column sum is ~1e6; otherwise applies TMM normalization and converts to CPM (edgeR).
Metadata-aware: Optional metadata alignment by sample_id.
Artifacts:
- CIBERSORT_results.csv (fractions + metrics)
- Stacked barplots per sample (paged)
- Heatmap (samples × cell types)
- P-value histogram; Corr vs RMSE scatter
- LM22 overlap reports
- CIBERSORT_Quality_Assessment.csv with categorical labels
- RUN_SUMMARY.txt with session info
Fail-fast: Stops early on missing inputs and non-numeric columns.

Requirements

Python 3.9–3.12
R ≥ 4.0 installed and on PATH
Python packages: rpy2
R packages (installed automatically if missing):
- CRAN: devtools, readr, readxl, dplyr, tibble, stringr, ggplot2, pheatmap, reshape2, tidyr, ggrepel, scales, data.table (tools is also used; it ships with base R and needs no install)
- Bioc: edgeR
- GitHub (if needed): Moonerss/CIBERSORT (installed via devtools::install_github)

The runner also looks for a local folder named CIBERSORT-main and, if it is present, installs the package from it with install_local.

Install

Setup Instructions

Create a Python virtual environment and install dependencies using pip

python -m venv cibersort-env
source cibersort-env/bin/activate      # On Linux/macOS
cibersort-env\Scripts\activate.bat     # On Windows

pip install --upgrade pip
pip install rpy2

Ensure that a compatible version of R (≥ 4.0) is installed and accessible via your system PATH. Check by running:

R --version

If you use a system R installed outside Conda, make sure it is discoverable: R --version must work in the same shell you use to run the Python script.
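The same check can be done from Python before launching the runner. The sketch below is only an illustration: the helper names find_r and r_version_line are hypothetical and not part of Cibersort.py. It relies solely on the standard library to look R up on PATH.

```python
import shutil
import subprocess
from typing import Optional


def find_r() -> Optional[str]:
    """Return the full path of the R executable found on PATH, or None."""
    return shutil.which("R")


def r_version_line(r_path: str) -> str:
    """Return the first line printed by `R --version` (empty string if none)."""
    proc = subprocess.run([r_path, "--version"], capture_output=True, text=True)
    output = proc.stdout or proc.stderr
    lines = output.splitlines()
    return lines[0] if lines else ""


r_path = find_r()
if r_path is None:
    print("R not found on PATH; install R >= 4.0 or fix PATH")
else:
    print(f"{r_path}: {r_version_line(r_path)}")
```

If this prints "R not found", rpy2 is unlikely to locate R either, so fix PATH first.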

CLI

python Cibersort.py --counts <file> --lm22 <file> --out <dir> [options]

Required

--counts : TSV/CSV/XLS(X). First column = gene symbol (header names are auto-detected).
--lm22 : Path to LM22 signature matrix (txt/tsv).
--out : Output directory.

Optional

--meta : Metadata file with columns sample_id, condition.
--perm : CIBERSORT permutations (default: 100).
--qn : Enable quantile normalization (true/false; default false; keep false for RNA-seq/TPM).
--chunk-size : Samples per page for stacked bar plot (default: 60).
--install : Pre-install base R packages before running.
--res-path : Explicit path to CIBERSORT_results.csv for the QA step (rarely needed).
--log-level : DEBUG|INFO|WARNING|ERROR (default: INFO).
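For readers who want to wrap or re-implement the interface, the options above could be declared with argparse roughly as follows. This is a sketch reconstructed from the option list in this README, not the actual parser in Cibersort.py; the helper names str2bool and build_parser are assumptions.

```python
import argparse


def str2bool(value: str) -> bool:
    """Parse the true/false strings accepted by --qn."""
    lowered = value.strip().lower()
    if lowered in {"true", "t", "1", "yes"}:
        return True
    if lowered in {"false", "f", "0", "no"}:
        return False
    raise argparse.ArgumentTypeError(f"expected true/false, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    """Declare the documented flags with their documented defaults."""
    parser = argparse.ArgumentParser(prog="Cibersort.py")
    # Required inputs
    parser.add_argument("--counts", required=True, help="TSV/CSV/XLS(X) expression file")
    parser.add_argument("--lm22", required=True, help="LM22 signature matrix")
    parser.add_argument("--out", required=True, help="output directory")
    # Optional settings
    parser.add_argument("--meta", default=None, help="metadata with sample_id, condition")
    parser.add_argument("--perm", type=int, default=100, help="CIBERSORT permutations")
    parser.add_argument("--qn", type=str2bool, default=False, help="quantile normalization")
    parser.add_argument("--chunk-size", type=int, default=60, help="samples per bar-plot page")
    parser.add_argument("--install", action="store_true", help="pre-install R packages")
    parser.add_argument("--res-path", default=None, help="explicit CIBERSORT_results.csv path")
    parser.add_argument(
        "--log-level",
        default="INFO",
        choices=["DEBUG", "INFO", "WARNING", "ERROR"],
    )
    return parser


args = build_parser().parse_args(
    ["--counts", "counts.csv", "--lm22", "LM22.txt", "--out", "out", "--qn", "false"]
)
print(args.qn, args.perm, args.chunk_size)
```

Parsing --qn through a converter rather than store_true keeps the documented "--qn false" form valid.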

Exit codes

0 success
1 pipeline exception
2 missing input files
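One way such exit codes can be produced is sketched below. It is an assumption about structure, not the runner's real code: missing_inputs and run_with_exit_code are hypothetical names, and the pipeline argument stands in for whatever function drives the R steps.

```python
import sys
from pathlib import Path
from typing import Callable, Iterable, List

EXIT_OK = 0
EXIT_PIPELINE_ERROR = 1
EXIT_MISSING_INPUT = 2


def missing_inputs(paths: Iterable[str]) -> List[str]:
    """Return the paths that do not point to an existing file."""
    return [p for p in paths if not Path(p).is_file()]


def run_with_exit_code(inputs: Iterable[str], pipeline: Callable[[], object]) -> int:
    """Check inputs first, then run the pipeline and map the outcome to a code."""
    missing = missing_inputs(inputs)
    if missing:
        print(f"Missing input files: {missing}", file=sys.stderr)
        return EXIT_MISSING_INPUT
    try:
        pipeline()
    except Exception as exc:  # any pipeline failure maps to exit code 1
        print(f"Pipeline failed: {exc}", file=sys.stderr)
        return EXIT_PIPELINE_ERROR
    return EXIT_OK


print(run_with_exit_code(["/no/such/file.csv"], lambda: None))
```

A script would typically end with sys.exit(run_with_exit_code(...)) so that shells and CI jobs can branch on the code.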

Usage Examples

Bash / sh (Linux/macOS)

python Cibersort.py \
  --counts 'Bulk-data/counts.csv' \
  --meta   'Bulk-data/meta.csv' \
  --lm22   'inst/extdata/LM22.txt' \
  --out    'CIBERSORT_outputs-v' \
  --perm 100 --qn false --chunk-size 40 --install

Paths with spaces/parentheses must be quoted (or escaped) on all shells.
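When the runner is launched from another Python script, quoting problems can be sidestepped by passing the arguments as a list, so no shell parses them. The sketch below assumes a hypothetical helper build_command and an example path that is not part of this repository; shlex.join is used only to print a copy-pasteable POSIX command line.

```python
import shlex
import sys
from typing import List


def build_command(counts: str, lm22: str, out_dir: str) -> List[str]:
    """Build the argument list for the runner; each path stays one argument."""
    return [
        sys.executable, "Cibersort.py",
        "--counts", counts,
        "--lm22", lm22,
        "--out", out_dir,
    ]


cmd = build_command("Bulk data (v2)/counts.csv", "inst/extdata/LM22.txt", "out")
# shlex.join adds POSIX quoting, so the printed line is safe to paste into sh/bash.
print(shlex.join(cmd))
# To execute without any shell involved:
# import subprocess; subprocess.run(cmd, check=True)
```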

Windows PowerShell

python Cibersort.py `
  --counts "Bulk-data/counts.csv" `
  --meta   "Bulk-data/meta.csv" `
  --lm22   "inst/extdata/LM22.txt" `
  --out    "CIBERSORT_outputs-v" `
  --perm 100 --qn false --chunk-size 40 --install

Windows CMD

python Cibersort.py ^
  --counts "Bulk-data/counts.csv" ^
  --meta   "Bulk-data/meta.csv" ^
  --lm22   "inst/extdata/LM22.txt" ^
  --out    "CIBERSORT_outputs-v" ^
  --perm 100 --qn false --chunk-size 40 --install

Inputs & Format

Counts / Expression

Row = gene, Column = sample.
First column = gene symbol. Accepted headers include: Gene, gene, GeneSymbol, Symbol, ID, etc.
The runner:
- Keeps genes with expression > 1 in ≥10% of samples (≥1 sample minimum).
- Auto-detects scale:
  - Median column sum ~ 1e6 → treat as TPM/RPKM (QN=FALSE recommended).
  - Otherwise → edgeR TMM → CPM.
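The filtering and scale-detection rules above can be illustrated in plain Python. The runner performs these steps in R, so this is only a sketch of the logic: the function names are hypothetical, rounding the 10% cutoff up with ceil is an assumption, and so is the 5% tolerance used to decide that a median column sum is "about 1e6".

```python
import math
from statistics import median
from typing import Dict, List


def filter_genes(
    expr: Dict[str, List[float]], threshold: float = 1.0, min_fraction: float = 0.10
) -> Dict[str, List[float]]:
    """Keep genes with expression > threshold in at least 10% of samples (min 1)."""
    if not expr:
        return {}
    n_samples = len(next(iter(expr.values())))
    min_samples = max(1, math.ceil(min_fraction * n_samples))
    return {
        gene: values
        for gene, values in expr.items()
        if sum(v > threshold for v in values) >= min_samples
    }


def looks_like_tpm(expr: Dict[str, List[float]], rel_tol: float = 0.05) -> bool:
    """Return True when the median per-sample column sum is close to 1e6."""
    if not expr:
        return False
    n_samples = len(next(iter(expr.values())))
    column_sums = [sum(values[j] for values in expr.values()) for j in range(n_samples)]
    return math.isclose(median(column_sums), 1e6, rel_tol=rel_tol)


tpm_like = {"A": [500000.0, 400000.0], "B": [500000.0, 600000.0], "C": [0.0, 0.0]}
print(sorted(filter_genes(tpm_like)), looks_like_tpm(tpm_like))
```

Data failing the looks_like_tpm test would take the edgeR TMM to CPM branch described above.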

Metadata (optional)

Columns:
- sample_id (must match counts’ column names)
- condition (free text; currently used only when joining to the QC table, since no group plots are produced yet)
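Alignment by sample_id amounts to looking up each counts column in the metadata and noting what fails to match. A minimal sketch of that idea follows; align_metadata is a hypothetical helper, and the runner's own handling of unmatched samples may differ.

```python
from typing import Dict, List, Optional, Tuple


def align_metadata(
    sample_columns: List[str], metadata: Dict[str, str]
) -> Tuple[List[Tuple[str, Optional[str]]], List[str]]:
    """Pair each counts column with its condition, preserving column order.

    Returns (aligned, unmatched): aligned holds (sample_id, condition or None),
    unmatched lists metadata sample_ids that are absent from the counts columns.
    """
    aligned = [(sample, metadata.get(sample)) for sample in sample_columns]
    unmatched = sorted(set(metadata) - set(sample_columns))
    return aligned, unmatched


aligned, unmatched = align_metadata(
    ["S1", "S2", "S3"], {"S2": "tumor", "S1": "normal", "S9": "tumor"}
)
print(aligned)
print(unmatched)
```

Checking the unmatched list before a run is a quick way to catch sample_id typos.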

LM22

Text/TSV with first column as gene and 22 cell types as columns.
We upper-case gene symbols internally for overlap accounting.
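The overlap accounting can be pictured as a case-insensitive set intersection. The sketch below uses a hypothetical function name and a handful of example symbols; it shows the idea rather than the runner's R implementation.

```python
from typing import Iterable, List, Tuple


def signature_overlap(
    expression_genes: Iterable[str], signature_genes: Iterable[str]
) -> Tuple[List[str], float]:
    """Return shared upper-cased symbols and the fraction of the signature covered."""
    expr = {gene.strip().upper() for gene in expression_genes}
    signature = {gene.strip().upper() for gene in signature_genes}
    shared = sorted(expr & signature)
    coverage = len(shared) / len(signature) if signature else 0.0
    return shared, coverage


shared, coverage = signature_overlap(
    ["cd8a", "Ms4a1", "ACTB"], ["CD8A", "MS4A1", "CD19", "GZMB"]
)
print(shared, coverage)
```

Low coverage usually points to a gene identifier mismatch (for example Ensembl IDs instead of symbols) rather than a biological effect.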

Outputs (key files)

CIBERSORT_results.csv — Deconvolution results (rows=samples). Common trailing columns:
- P-value, Correlation, RMSE, Absolute score (if provided by CIBERSORT impl)
01_stacked_bar_fractions*.png — Horizontal stacked bars of cell fractions (paged by --chunk-size)
02_heatmap_all_cell_types.png — Heatmap (samples × LM22 types)
03_pvalue_histogram.png — Per-sample P-value distribution (if present)
04_scatter_correlation_vs_RMSE.png — Fit scatter (if both fields present)
LM22_overlap_gene_values_by_sample.csv — Per-sample expression for LM22-overlap genes
LM22_overlap_gene_summary.csv — Mean expression + cell types per gene
LM22_overlap_report.txt — Coverage summary
CIBERSORT_Quality_Assessment.csv — QA summary per sample:
- Fraction sum check (0.85–1.15), discretized P/Corr/RMSE, Quality_Category
RUN_SUMMARY.txt — Inputs, counts, session info, medians
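To make the QA columns more concrete, here is a sketch of a per-sample classifier using the default thresholds this README states (P≤0.05, Corr≥0.5, RMSE≤0.7, fraction sum within 0.85 to 1.15). How the individual checks combine into Excellent / Moderate / Poor is not documented here, so the rule below (bad fraction sum is Poor, three passes Excellent, two Moderate, otherwise Poor) is purely an assumption, as is the function name.

```python
def quality_category(p_value: float, correlation: float, rmse: float, fraction_sum: float) -> str:
    """Label one sample from its fit metrics (combination rule is an assumption)."""
    sum_ok = 0.85 <= fraction_sum <= 1.15
    if not sum_ok:
        return "Poor"
    passes = sum(
        [
            p_value <= 0.05,     # deconvolution significance
            correlation >= 0.5,  # fit correlation
            rmse <= 0.7,         # fit error
        ]
    )
    if passes == 3:
        return "Excellent"
    if passes == 2:
        return "Moderate"
    return "Poor"


print(quality_category(0.01, 0.80, 0.50, 1.00))
print(quality_category(0.20, 0.80, 0.50, 1.00))
print(quality_category(0.01, 0.80, 0.50, 1.30))
```

The authoritative logic lives in the R QA block; compare against CIBERSORT_Quality_Assessment.csv before relying on this rule.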

Logging

Python logs to stdout (--log-level DEBUG for verbose).
R messages are surfaced via rpy2.
Failures during R package install or CIBERSORT run are bubbled up and return exit code 1.
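A stdout logger driven by a --log-level style string can be set up as below. This is a generic sketch using the standard logging module; the logger name cibersort_runner and the format string are assumptions, not taken from Cibersort.py.

```python
import logging
import sys


def configure_logging(level_name: str = "INFO") -> logging.Logger:
    """Send log records to stdout at the requested level."""
    level = getattr(logging, level_name.upper())
    logging.basicConfig(
        stream=sys.stdout,
        level=level,
        format="%(asctime)s %(levelname)s %(message)s",
        force=True,  # replace any handlers configured earlier (Python >= 3.8)
    )
    return logging.getLogger("cibersort_runner")  # hypothetical logger name


log = configure_logging("DEBUG")
log.debug("verbose output enabled")
```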

Dockerfile excerpt:

# Python deps
RUN pip install --no-cache-dir rpy2

WORKDIR /app
COPY Cibersort.py .

# Default command (print help)
CMD ["python", "Cibersort.py", "--help"]

Build & run:

docker build -t cibersort-runner .
docker run --rm -v "$PWD":/work -w /work cibersort-runner \
  python /app/Cibersort.py --help


Maintenance Checklist

Validate path quoting on all OSes.
Confirm R package installs on clean hosts (--install first run).
Keep LM22 reference versioned in inst/extdata/.
CI smoke-test: run with a tiny synthetic dataset (see below) to ensure plots/CSVs render.

Tiny Synthetic Smoke Test (optional)

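# create_fake.py
# Writes a small synthetic count matrix (200 genes x 12 samples) for smoke testing.
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)  # fixed seed so the file is reproducible
genes = [f"GENE{i}" for i in range(200)]
samples = [f"S{i}" for i in range(12)]
# Poisson(5) counts; rows = genes, columns = samples
counts = rng.poisson(5, size=(len(genes), len(samples)))
df = pd.DataFrame(counts, columns=samples)
df.insert(0, "GeneSymbol", genes)  # first column = gene symbol, as the runner expects
df.to_csv("fake_counts.csv", index=False)
print("fake_counts.csv written")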

Then:

python create_fake.py
python Cibersort.py --counts fake_counts.csv --lm22 inst/extdata/LM22.txt --out out_fake --perm 10 --qn false --install
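Note that placeholder symbols such as GENE0 cannot match any real gene symbol in LM22, so the overlap with the signature will be empty and the deconvolution step may fail or return meaningless fractions. A variant that borrows gene symbols from the signature file itself is sketched below. It assumes the LM22 file is tab-separated with a header row and genes in the first column, as described under Inputs & Format; the function names are hypothetical, and uniform random integers stand in for the Poisson counts to keep the example free of third-party dependencies.

```python
import csv
import random
from typing import List


def read_signature_genes(lm22_path: str) -> List[str]:
    """Read gene symbols from the first column of a tab-separated signature file."""
    with open(lm22_path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        next(reader, None)  # skip the header row
        return [row[0] for row in reader if row and row[0]]


def write_fake_counts(lm22_path: str, out_path: str, n_samples: int = 12, seed: int = 1) -> int:
    """Write random integer counts for every signature gene; return the gene count."""
    rng = random.Random(seed)
    genes = read_signature_genes(lm22_path)
    samples = [f"S{i}" for i in range(n_samples)]
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["GeneSymbol", *samples])
        for gene in genes:
            # Uniform integers in [0, 20]; a stand-in for Poisson counts
            writer.writerow([gene, *(rng.randint(0, 20) for _ in samples)])
    return len(genes)


# Example (requires the signature file to exist):
# write_fake_counts("inst/extdata/LM22.txt", "fake_counts.csv")
```

The resulting fake_counts.csv can be passed to --counts exactly as in the command above.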

FAQ

Why is QN default false?
For RNA-seq/TPM, quantile normalization is generally not recommended; we leave it opt-in.
Where do QA thresholds come from?
They’re pragmatic defaults: P≤0.05, Corr≥0.5, RMSE≤0.7 and fraction sum in [0.85, 1.15]. Tune in the R QA block if needed.
Can I plug other signatures?
Yes. Use any CIBERSORT-compatible signature matrix via --lm22 <path>; plots/QA still work.

One-liner to remember (Linux/macOS)

python Cibersort.py --counts 'Bulk-data/counts.csv' \
  --meta 'Bulk-data/meta.csv' \
  --lm22 'inst/extdata/LM22.txt' \
  --out 'CIBERSORT_outputs-v' --perm 100 --qn false --chunk-size 40 --install

shari01/Immune-Deconvolution
