
Python-based CIBERSORT runner for immune cell deconvolution using LM22, with automated QC, plotting, and seamless R integration.


shari01/Immune-Deconvolution


Immune cell deconvolution (CIBERSORT)

Run LM22-based immune cell deconvolution (CIBERSORT) from Python via rpy2, with an automated QA pass that validates and labels samples (“Excellent / Moderate / Poor”).

This runner executes an R pipeline under the hood, handles minimal preprocessing (counts→CPM via edgeR when needed), writes standard plots/CSVs, and performs a numeric-safety QA step.


Features

  • Input auto-detection: Treats inputs as TPM/RPKM when the median column sum is about 1e6; otherwise converts counts to TMM-normalized CPM (edgeR).
  • Metadata-aware: Optional metadata alignment by sample_id.
  • Artifacts:
    • CIBERSORT_results.csv (fractions + metrics)
    • Stacked barplots per sample (paged)
    • Heatmap (samples × cell types)
    • P-value histogram; Corr vs RMSE scatter
    • LM22 overlap reports
    • CIBERSORT_Quality_Assessment.csv with categorical labels
    • RUN_SUMMARY.txt with session info
  • Fail-fast for missing inputs and non-numeric columns.

Requirements

  • Python 3.9–3.12
  • R ≥ 4.0 installed and on PATH
  • Python packages: rpy2
  • R packages (installed automatically if missing):
    • CRAN: devtools, readr, readxl, dplyr, tibble, stringr, tools, ggplot2, pheatmap, reshape2, tidyr, ggrepel, scales, data.table
    • Bioc: edgeR
    • GitHub (if needed): Moonerss/CIBERSORT (installed via devtools::install_github)

The runner also looks for a local folder named CIBERSORT-main and, if it is present, installs the package from it (install_local).


Install


Setup Instructions

Create a Python virtual environment and install dependencies using pip:

python -m venv cibersort-env
source cibersort-env/bin/activate   # On Linux/macOS
cibersort-env\Scripts\activate.bat  # On Windows

pip install --upgrade pip
pip install rpy2

Ensure that a compatible version of R (≥ 4.0) is installed and accessible via your system PATH. Check by running:

R --version

If you use a system R installed outside Conda, make sure it is discoverable: R --version must work in the same shell you run the runner from.
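The same check can be done from Python before invoking the runner. This is a minimal sketch using only the standard library; `find_r` and `r_version_line` are hypothetical helper names, not part of Cibersort.py, and rpy2 may also locate R through other means (for example the R_HOME environment variable).

```python
import shutil
import subprocess


def find_r(executable="R"):
    """Return the full path to the R executable, or None if it is not on PATH."""
    return shutil.which(executable)


def r_version_line(r_path):
    """Return the first line printed by `R --version` for the given executable."""
    result = subprocess.run([r_path, "--version"], capture_output=True, text=True)
    lines = (result.stdout or result.stderr).splitlines()
    return lines[0] if lines else ""


if __name__ == "__main__":
    r_path = find_r()
    if r_path is None:
        print("R was not found on PATH; rpy2 may fail to locate it.")
    else:
        print(f"Found R at {r_path}: {r_version_line(r_path)}")
```

Run it in the same shell (and the same virtual environment) that will run Cibersort.py, since PATH can differ between shells.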


CLI

python Cibersort.py --counts <file> --lm22 <file> --out <dir> [options]

Required

  • --counts : TSV/CSV/XLS(X). First column = gene symbol (header names are auto-detected).
  • --lm22   : Path to LM22 signature matrix (txt/tsv).
  • --out    : Output directory.

Optional

  • --meta         : Metadata file with columns sample_id, condition.
  • --perm         : CIBERSORT permutations (default: 100).
  • --qn           : Enable quantile normalization (true/false; default false; keep false for RNA-seq/TPM).
  • --chunk-size   : Samples per page for stacked bar plot (default: 60).
  • --install      : Pre-install base R packages before running.
  • --res-path     : Explicit path to CIBERSORT_results.csv for the QA step (rarely needed).
  • --log-level    : DEBUG|INFO|WARNING|ERROR (default: INFO).

Exit codes

  • 0 success
  • 1 pipeline exception
  • 2 missing input files
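When the runner is called from another script, the exit codes above can be translated into messages. The sketch below assumes Cibersort.py is in the current working directory; `describe_exit` and `run_runner` are hypothetical names introduced for illustration.

```python
import subprocess
import sys

# Meanings taken from the exit-code list above.
EXIT_MEANINGS = {
    0: "success",
    1: "pipeline exception",
    2: "missing input files",
}


def describe_exit(code):
    """Translate a runner exit code into a short human-readable message."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def run_runner(args):
    """Run Cibersort.py with the given argument list; return (code, meaning)."""
    completed = subprocess.run([sys.executable, "Cibersort.py", *args])
    return completed.returncode, describe_exit(completed.returncode)
```

For example, `run_runner(["--counts", "counts.csv", "--lm22", "LM22.txt", "--out", "out"])` would return `(2, "missing input files")` if the runner reports a missing input.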

Usage Examples

Bash / sh (Linux/macOS)

python Cibersort.py \
  --counts 'Bulk-data/counts.csv' \
  --meta   'Bulk-data/meta.csv' \
  --lm22   'inst/extdata/LM22.txt' \
  --out    'CIBERSORT_outputs-v' \
  --perm 100 --qn false --chunk-size 40 --install

Paths with spaces/parentheses must be quoted (or escaped) on all shells.

Windows PowerShell

python Cibersort.py `
  --counts "Bulk-data/counts.csv" `
  --meta   "Bulk-data/meta.csv" `
  --lm22   "inst/extdata/LM22.txt" `
  --out    "CIBERSORT_outputs-v" `
  --perm 100 --qn false --chunk-size 40 --install

Windows CMD

python Cibersort.py ^
  --counts "Bulk-data/counts.csv" ^
  --meta   "Bulk-data/meta.csv" ^
  --lm22   "inst/extdata/LM22.txt" ^
  --out    "CIBERSORT_outputs-v" ^
  --perm 100 --qn false --chunk-size 40 --install

Inputs & Format

Counts / Expression

  • Row = gene, Column = sample.
  • First column = gene symbol. Accepted headers include: Gene, gene, GeneSymbol, Symbol, ID, etc.
  • The runner:
    • Keeps genes with expression > 1 in ≥10% of samples (≥1 sample minimum).
    • Auto-detects scale:
      • Median column sum ~ 1e6 → treat as TPM/RPKM (QN=FALSE recommended).
      • Otherwise → edgeR TMM → CPM.
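The gene filter and scale detection described above can be sketched in plain Python. The real pipeline does this in R; the function names, the 5% tolerance around 1e6, and the dict-of-lists data layout are all assumptions made for illustration, and the order in which the runner applies the two steps is not stated here.

```python
import math
import statistics


def filter_genes(expr, threshold=1.0, min_fraction=0.10):
    """Keep genes whose value exceeds `threshold` in at least `min_fraction`
    of samples, with a floor of one sample.

    `expr` maps gene symbol -> list of per-sample values.
    """
    kept = {}
    for gene, values in expr.items():
        hits = sum(v > threshold for v in values)
        if hits >= 1 and hits / len(values) >= min_fraction:
            kept[gene] = values
    return kept


def looks_like_tpm(expr, target=1e6, rel_tol=0.05):
    """Return True when the median per-sample column sum is close to `target`.

    The tolerance is a hypothetical choice; the runner's own cutoff may differ.
    """
    column_sums = [sum(column) for column in zip(*expr.values())]
    return math.isclose(statistics.median(column_sums), target, rel_tol=rel_tol)
```

If `looks_like_tpm` returns False, the data would go down the counts branch (edgeR TMM, then CPM), which this sketch does not reproduce.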

Metadata (optional)

  • Columns:
    • sample_id (must match counts’ column names)
    • condition (free text; used only in QC table join, currently no group plots)
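Before a run it can help to confirm that every sample in the counts header has a metadata row. The sketch below is one way to do that check; `align_metadata` is a hypothetical helper and does not mirror the runner's R code.

```python
def align_metadata(sample_columns, metadata_rows):
    """Match metadata rows to count columns by sample_id.

    `sample_columns` is the list of sample names from the counts header.
    `metadata_rows` is a list of dicts with keys 'sample_id' and 'condition'.
    Returns (aligned_rows, missing_in_meta, extra_in_meta), where aligned_rows
    follows the column order of the counts file.
    """
    by_id = {row["sample_id"]: row for row in metadata_rows}
    known_samples = set(sample_columns)
    aligned = [by_id[s] for s in sample_columns if s in by_id]
    missing = [s for s in sample_columns if s not in by_id]
    extra = [sid for sid in by_id if sid not in known_samples]
    return aligned, missing, extra
```

A non-empty `missing` list means some count columns have no metadata row; a non-empty `extra` list means the metadata mentions samples that are absent from the counts.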

LM22

  • Text/TSV with first column as gene and 22 cell types as columns.
  • We upper-case gene symbols internally for overlap accounting.
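The overlap accounting can be pictured as a set intersection on upper-cased symbols. This is a sketch under the assumption that both the expression genes and the signature genes are upper-cased before comparison; `lm22_overlap` is a hypothetical name.

```python
def lm22_overlap(expression_genes, signature_genes):
    """Return (shared, missing, coverage) after upper-casing both gene lists.

    `coverage` is the fraction of signature genes found in the expression data.
    """
    expr = {g.strip().upper() for g in expression_genes}
    sig = {g.strip().upper() for g in signature_genes}
    shared = sorted(expr & sig)
    missing = sorted(sig - expr)
    coverage = len(shared) / len(sig) if sig else 0.0
    return shared, missing, coverage
```

Low coverage usually points to a gene identifier mismatch (for example Ensembl IDs instead of gene symbols) rather than a problem with the signature file.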

Outputs (key files)

  • CIBERSORT_results.csv — Deconvolution results (rows=samples). Common trailing columns:
    • P-value, Correlation, RMSE, Absolute score (if provided by CIBERSORT impl)
  • 01_stacked_bar_fractions*.png — Horizontal stacked bars of cell fractions (paged by --chunk-size)
  • 02_heatmap_all_cell_types.png — Heatmap (samples × LM22 types)
  • 03_pvalue_histogram.png — Per-sample P-value distribution (if present)
  • 04_scatter_correlation_vs_RMSE.png — Fit scatter (if both fields present)
  • LM22_overlap_gene_values_by_sample.csv — Per-sample expression for LM22-overlap genes
  • LM22_overlap_gene_summary.csv — Mean expression + cell types per gene
  • LM22_overlap_report.txt — Coverage summary
  • CIBERSORT_Quality_Assessment.csv — QA summary per sample:
    • Fraction sum check (0.85–1.15), discretized P/Corr/RMSE, Quality_Category
  • RUN_SUMMARY.txt — Inputs, counts, session info, medians
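To estimate how many stacked bar plot pages a run produces, the paging by --chunk-size can be sketched as simple list slicing. `page_samples` is a hypothetical helper; the runner does its paging in R, and details such as file naming may differ.

```python
def page_samples(sample_ids, chunk_size=60):
    """Split sample IDs into consecutive pages of at most `chunk_size` samples."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [
        sample_ids[start:start + chunk_size]
        for start in range(0, len(sample_ids), chunk_size)
    ]
```

With the default of 60, a dataset of 130 samples would yield three pages holding 60, 60, and 10 samples.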

Logging

  • Python logs to stdout (--log-level DEBUG for verbose).
  • R messages are surfaced via rpy2.
  • Failures during R package install or CIBERSORT run are bubbled up and return exit code 1.

# Python deps
RUN pip install --no-cache-dir rpy2

WORKDIR /app
COPY Cibersort.py .

# Default command (print help)
CMD ["python", "Cibersort.py", "--help"]

Build & run:

docker build -t cibersort-runner .
docker run --rm -v "$PWD":/work -w /work cibersort-runner \
  python /app/Cibersort.py --help


Maintenance Checklist

  • Validate path quoting on all OSes.
  • Confirm R packages install on clean hosts (use --install on the first run).
  • Keep LM22 reference versioned in inst/extdata/.
  • CI smoke-test: run with a tiny synthetic dataset (see below) to ensure plots/CSVs render.

Tiny Synthetic Smoke Test (optional)

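# create_fake.py
# Writes a small random count matrix (200 genes × 12 samples) for a smoke test.
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)  # fixed seed so the file is reproducible
genes = [f"GENE{i}" for i in range(200)]
samples = [f"S{i}" for i in range(12)]
# Poisson(5) counts, genes as rows and samples as columns
df = pd.DataFrame(rng.poisson(5, size=(len(genes), len(samples))), index=genes, columns=samples)
df.insert(0, "GeneSymbol", genes)  # first column = gene symbol, as the runner expects
df.to_csv("fake_counts.csv", index=False)
print("fake_counts.csv written")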

Then:

python create_fake.py
python Cibersort.py --counts fake_counts.csv --lm22 inst/extdata/LM22.txt --out out_fake --perm 10 --qn false --install
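Placeholder names such as GENE0 do not match any symbol in LM22, so the smoke test may stop at the overlap step. A variant that borrows its gene symbols from the signature file avoids this. The sketch below assumes LM22.txt is tab-separated with a header row and gene symbols in the first column; the function names and the output file name are hypothetical.

```python
import csv
import random


def read_signature_genes(lm22_path):
    """Read gene symbols from the first column of a tab-separated signature file."""
    with open(lm22_path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        next(reader)  # skip the header row
        return [row[0] for row in reader if row]


def write_fake_counts(genes, out_path, n_samples=12, seed=1):
    """Write random integer counts (0 to 50) for the given genes as a CSV file."""
    rng = random.Random(seed)
    samples = [f"S{i}" for i in range(n_samples)]
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["GeneSymbol", *samples])
        for gene in genes:
            writer.writerow([gene, *(rng.randint(0, 50) for _ in samples)])
    return samples
```

Calling `write_fake_counts(read_signature_genes("inst/extdata/LM22.txt"), "fake_counts_lm22.csv")` writes a counts file whose genes all overlap the signature; pass that file to --counts in the command above. The resulting fractions are meaningless and only exercise the pipeline.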

FAQ

  • Why is QN default false?
      For RNA-seq/TPM, quantile normalization is generally not recommended; we leave it opt-in.
  • Where do QA thresholds come from?
      They’re pragmatic defaults: P≤0.05, Corr≥0.5, RMSE≤0.7 and fraction sum in [0.85, 1.15]. Tune in the R QA block if needed.
  • Can I plug other signatures?
      Yes. Use any CIBERSORT-compatible signature matrix via --lm22 <path>; plots/QA still work.
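The QA thresholds listed above (P≤0.05, Corr≥0.5, RMSE≤0.7, fraction sum in [0.85, 1.15]) can be expressed as four boolean checks per sample. How the runner combines them into Excellent / Moderate / Poor is not documented here, so the mapping in `quality_category` below is purely hypothetical; the authoritative rule is in the R QA block.

```python
def qa_checks(p_value, correlation, rmse, fraction_sum):
    """Evaluate the four documented thresholds for one sample."""
    return {
        "p_ok": p_value <= 0.05,
        "corr_ok": correlation >= 0.5,
        "rmse_ok": rmse <= 0.7,
        "sum_ok": 0.85 <= fraction_sum <= 1.15,
    }


def quality_category(checks):
    """HYPOTHETICAL mapping from passed checks to a label.

    All checks pass -> Excellent; exactly one fails -> Moderate; else Poor.
    The runner's actual rule may differ.
    """
    passed = sum(checks.values())
    if passed == len(checks):
        return "Excellent"
    if passed == len(checks) - 1:
        return "Moderate"
    return "Poor"
```

Keeping the thresholds in one function makes them easy to adjust if the defaults turn out to be too strict or too lenient for a given dataset.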

One-liner to remember (Linux/macOS)

python Cibersort.py --counts 'Bulk-data/counts.csv' \
  --meta 'Bulk-data/meta.csv' \
  --lm22 'inst/extdata/LM22.txt' \
  --out 'CIBERSORT_outputs-v' --perm 100 --qn false --chunk-size 40 --install
