1- # pyXenium
1+ pyXenium
2+ ========
23
3- **pyXenium** is a Python library for loading and analyzing 10x Genomics Xenium *in situ* exports.
4- It supports robust partial loading of incomplete exports and provides utilities for multi‑modal
5- Xenium runs that include both RNA and protein measurements.
4+ pyXenium is a Python library for loading and analyzing **10x Genomics Xenium** in‑situ outputs.
5+ It supports **robust partial loading** of incomplete exports and provides utilities for **multi‑modal (RNA + Protein)** runs.
66
7- > If you are already familiar with Xenium outputs, jump to:
8- > - [Partial loading (incomplete exports)](#partial-loading-incomplete-exports)
9- > - [RNA + Protein loader](#rna--protein-loader)
10- > - [Gene–protein correlation](#gene–protein-correlation)
11- > - [RNA/protein joint analysis](#rnaprotein-joint-analysis)
7+ Version: 0.1.1
128
13- ## Features
9+ ---
1410
15- - **Partial loading of incomplete exports**: Load what is available even when the
16- `cell_feature_matrix` MEX is missing/partial; optional attachment of clusters and spatial centroids.
17- - **RNA + Protein support**: Read combined cell-feature matrices, split features by type
18- (Gene Expression vs Protein Expression), and return matched cell × gene/protein matrices.
19- - **Protein–gene correlation**: Compute Pearson/Spearman correlations between gene expression
20- and protein intensities across cells.
11+ Features
12+ --------
13+ - **Partial loading of incomplete exports**: assemble an `AnnData` even when some Xenium artifacts
14+ are missing; opportunistically attaches clusters (`analysis.zarr[.zip]`) and spatial centroids (`cells.zarr[.zip]`).
15+ - **RNA + Protein support**: read combined cell‑feature matrices from Zarr/HDF5/MEX, split features by type,
16+ and return matched cell × gene/protein data.
17+ - **Protein–gene spatial correlation**: compute correlations between protein intensity and gene transcript
18+ density across spatial bins; export plots and CSV summaries.
19+ - **Toy dataset included**: a minimal Xenium‑like dataset (`toy_slide`) to get started quickly.
2120
22- ## Installation
21+ Installation
22+ ------------
23+ The package is organized as a standard `src/` layout. Until a PyPI release is available, install from source or Git:
2324
2425``` bash
25- # From PyPI (if available)
26- pip install pyXenium
27-
28- # Or install the latest from GitHub
29- pip install git+https://github.com/hutaobo/pyXenium.git
26+ # From GitHub (source)
27+ pip install "git+https://github.com/hutaobo/pyXenium.git"
3028```
3129
32- Python ≥3.9 is recommended.
30+ Requirements (typical): Python 3.9+; `anndata`, `numpy`, `pandas`, `scipy`, `zarr`, `fsspec`, `matplotlib`, `scikit-learn`, `click`.
31+ (Exact dependencies follow the project configuration and imports.)
3332
34- ## Quick start
33+ Quick Start
34+ -----------
3535
36- ### Partial loading (incomplete exports)
36+ ### 1) Partial loading (incomplete exports)
3737
38- `pyXenium.io.partial_xenium_loader.load_anndata_from_partial` tries to assemble an `AnnData`
39- object from a Xenium export directory or HTTP(S) base. It attaches optional results when present
40- (e.g. `analysis.zarr[.zip]` and `cells.zarr[.zip]`).
38+ Use `pyXenium.io.partial_xenium_loader.load_anndata_from_partial(...)` to assemble an `AnnData` from any available pieces.
4139
40+ **Local files example:**
4241``` python
4342from pyXenium.io.partial_xenium_loader import load_anndata_from_partial
4443
45- # Local export directory
4644adata = load_anndata_from_partial(
47-     base_dir="/path/to/xenium_export",
48-     analysis_name="analysis.zarr",  # optional
49-     cells_name="cells.zarr",  # optional
45+     mex_dir="/path/to/xenium_export/cell_feature_matrix",  # MEX triplet folder
46+     analysis_name="/path/to/xenium_export/analysis.zarr.zip",  # optional
47+     cells_name="/path/to/xenium_export/cells.zarr.zip",  # optional
48+     # transcripts_name="/path/to/xenium_export/transcripts.zarr.zip",  # optional
5049)
51-
52- # Or remote base (files hosted under <BASE>/)
53- # adata = load_anndata_from_partial(
54- # base_url="https://example.org/xenium_run",
55- # analysis_name="analysis.zarr.zip",
56- # cells_name="cells.zarr.zip",
57- # )
58- print(adata)  # cells × genes AnnData
50+ print(adata)
5951```
6052
61- **What gets loaded** (when available):
62- - Counts from `cell_feature_matrix/{matrix.mtx.gz, features.tsv.gz, barcodes.tsv.gz}`
63- - Clusters from `analysis.zarr[.zip]`
64- - Spatial centroids from `cells.zarr[.zip]`
53+ **Remote base example:**
54+ ``` python
55+ adata = load_anndata_from_partial(
56+     base_url="https://example.org/xenium_run",  # artifacts live under <base_url>/
57+     analysis_name="analysis.zarr.zip",
58+     cells_name="cells.zarr.zip",
59+ )
60+ ```
6561
66- If the MEX triplet is missing, the function still returns a valid `AnnData` (empty genes)
67- and attaches clusters/spatial if found; this is useful for inspecting partial/early exports.
62+ Behavior:
63+ - If the MEX triplet is unavailable, the function still returns a valid `AnnData` (empty genes) and attaches
64+ clusters/spatial information when possible.
65+ - Zarr roots are auto‑detected inside `*.zarr.zip` even when the root metadata sits in a subfolder.
66+
67+ **Signature (summary):**
68+ ``` text
69+ load_anndata_from_partial(
70+ base_url: str | None = None,
71+ analysis_name: str | None = None,
72+ cells_name: str | None = None,
73+ transcripts_name: str | None = None,
74+ mex_dir: str | None = None,
75+ mex_matrix_name: str = "matrix.mtx.gz",
76+ mex_features_name: str = "features.tsv.gz",
77+ mex_barcodes_name: str = "barcodes.tsv.gz",
78+ build_counts_if_missing: bool = True,
79+ ) -> anndata.AnnData
80+ ```
6881
69- ### RNA + Protein loader
82+ ### 2) RNA + Protein loader
7083
71- Use the dedicated loader in `pyXenium.io.xenium_gene_protein_loader` to read a Xenium export
72- that includes protein measurements. It separates features by type and aligns cells across modalities.
84+ Use `pyXenium.io.xenium_gene_protein_loader.load_xenium_gene_protein(...)` to load Xenium exports with protein measurements.
7385
7486``` python
7587from pyXenium.io.xenium_gene_protein_loader import load_xenium_gene_protein
7688
7789adata = load_xenium_gene_protein(
78-     base_path="/mnt/taobo.hu/long/10X_datasets/Xenium/Xenium_Kidney/Xenium_V1_Human_Kidney_FFPE_Protein"
90+     base_path="/path/to/xenium_export",
91+     prefer="auto",  # auto | zarr | h5 | mex
7992)
93+ # adata.X: RNA counts (CSR); adata.layers["rna"] may hold RNA counts explicitly
94+ # adata.obsm["protein"]: DataFrame of protein intensities
95+ # adata.obsm["spatial"]: cell centroids when available
8096```
8197
8298Notes:
83- - The loader expects a combined MEX under `cell_feature_matrix/` where the 3rd column in
84- `features.tsv.gz` indicates the feature type (e.g., `"Gene Expression"`, `"Protein Expression"`).
85- - Invalid/control entries (e.g., blank/unassigned codewords) are filtered by default.
86- - Both matrices share **identical cell order**, enabling 1:1 comparisons across modalities.
99+ - Supported matrix formats: Zarr (`cell_feature_matrix.zarr/` or `cell_feature_matrix/`), HDF5 (`cell_feature_matrix.h5`), or MEX (`matrix.mtx.gz` triplet).
100+ - Feature types are split using the 3rd column of `features.tsv.gz` (e.g., "Gene Expression", "Protein Expression").
101+ - Optionally attaches centroids/boundaries into `adata.obsm["spatial"]` and `adata.uns`.
102+ - If present, clustering results at `analysis/clustering/gene_expression_graphclust/clusters.csv` are merged into `adata.obs["cluster"]` by default.
103+
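The feature-type split mentioned in the notes can be illustrated with a minimal sketch. It assumes only that the third column of `features.tsv.gz` carries the feature type; the helper `split_features_by_type`, the in-memory table, and the feature IDs are hypothetical and are not part of pyXenium.

```python
import io

import pandas as pd

# Hypothetical three-row stand-in for features.tsv.gz (id, name, feature_type).
# The IDs are made up for illustration.
FEATURES_TSV = (
    "gene_0001\tCD3E\tGene Expression\n"
    "gene_0002\tCDH1\tGene Expression\n"
    "prot_0001\tCD3E\tProtein Expression\n"
)


def split_features_by_type(tsv_text):
    """Group feature names by the value in the third (feature_type) column."""
    features = pd.read_csv(
        io.StringIO(tsv_text),
        sep="\t",
        header=None,
        names=["id", "name", "feature_type"],
    )
    return {
        feature_type: group["name"].tolist()
        for feature_type, group in features.groupby("feature_type")
    }


groups = split_features_by_type(FEATURES_TSV)
print(groups)
```

The same name can appear under both types (here `CD3E`), which is why the split keys on the type column and not on the feature name.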
104+ **Signature (summary):**
105+ ``` text
106+ load_xenium_gene_protein(
107+ base_path: str,
108+ *,
109+ prefer: str = "auto", # "auto" | "zarr" | "h5" | "mex"
110+ mex_dirname: str = "cell_feature_matrix",
111+ mex_matrix_name: str = "matrix.mtx.gz",
112+ mex_features_name: str = "features.tsv.gz",
113+ mex_barcodes_name: str = "barcodes.tsv.gz",
114+ cells_csv: str = "cells.csv.gz",
115+ cells_parquet: str | None = None,
116+ read_morphology: bool = False,
117+ attach_boundaries: bool = True,
118+ clusters_relpath: str | None = "analysis/clustering/gene_expression_graphclust/clusters.csv",
119+ cluster_column_name: str = "cluster",
120+ ) -> anndata.AnnData
121+ ```
87122
88- ### Gene–protein correlation
123+ ### 3) Protein–gene spatial correlation
89124
90- Compute correlations between gene and protein across cells.
125+ `pyXenium.analysis.protein_gene_correlation.protein_gene_correlation(...)` computes Pearson correlations between
126+ **protein average intensity** and **gene transcript density** across spatial bins; it saves per‑pair figures and CSVs,
127+ plus a summary CSV.
91128
92129``` python
93- BASE = "/mnt/taobo.hu/long/10X_datasets/Xenium/Xenium_Kidney/Xenium_V1_Human_Kidney_FFPE_Protein"
94- pairs = [("CD3E", "CD3E"), ("E-Cadherin", "CDH1")]  # (protein, gene)
95-
96130from pyXenium.analysis.protein_gene_correlation import protein_gene_correlation
131+
132+ pairs = [("CD3E", "CD3E"), ("E-Cadherin", "CDH1")]  # (protein, gene)
97133summary = protein_gene_correlation(
98134    adata=adata,
99-     transcripts_zarr_path=BASE + "/transcripts.zarr.zip",
135+     transcripts_zarr_path="/path/to/transcripts.zarr.zip",
100136    pairs=pairs,
101137    output_dir="./protein_gene_corr",
102-     grid_size=(50, 50),  # customizable grid
103-     pixel_size_um=0.2125,  # common Xenium pixel size
138+     grid_size=(50, 50),  # μm per bin (used if grid_counts is None)
139+     pixel_size_um=0.2125,
104140    qv_threshold=20,
105-     overwrite=False
141+     overwrite=False,
142+     auto_detect_cell_units=True,
106143)
107- print(summary)
144+ print(summary.head())
108145```
109146
110- ### RNA/protein joint analysis
147+ **Signature (summary):**
148+ ``` text
149+ protein_gene_correlation(
150+ adata,
151+ transcripts_zarr_path,
152+ pairs,
153+ output_dir,
154+ grid_size=(50, 50),
155+ grid_counts=(50, 50),
156+ pixel_size_um=0.2125,
157+ qv_threshold=20,
158+ overwrite=False,
159+ auto_detect_cell_units=True,
160+ ) -> pandas.DataFrame
161+ ```
162+
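The idea of correlating per-bin protein intensity with per-bin transcript counts can be sketched with NumPy alone. This illustrates the general technique under stated assumptions and does not reproduce pyXenium's implementation: the function `binned_pearson`, its arguments, and the toy coordinates are all hypothetical, and quality filtering (`qv_threshold`) and unit conversion (`pixel_size_um`) are left out.

```python
import numpy as np


def binned_pearson(cell_xy, protein, transcript_xy, bins, extent):
    """Pearson r between mean protein intensity and transcript count per bin."""
    cell_xy = np.asarray(cell_xy, dtype=float)
    transcript_xy = np.asarray(transcript_xy, dtype=float)
    protein = np.asarray(protein, dtype=float)

    # Transcript counts per bin; with equal-area bins this is proportional
    # to transcript density, and Pearson r is unaffected by the scale factor.
    transcripts, x_edges, y_edges = np.histogram2d(
        transcript_xy[:, 0], transcript_xy[:, 1], bins=bins, range=extent
    )
    edges = [x_edges, y_edges]

    # Number of cells and summed protein intensity per bin, on the same grid.
    n_cells, _, _ = np.histogram2d(cell_xy[:, 0], cell_xy[:, 1], bins=edges)
    protein_sum, _, _ = np.histogram2d(
        cell_xy[:, 0], cell_xy[:, 1], bins=edges, weights=protein
    )

    # Only bins that contain at least one cell have a defined mean intensity.
    occupied = n_cells > 0
    if occupied.sum() < 2:
        return float("nan")
    mean_protein = protein_sum[occupied] / n_cells[occupied]
    return float(np.corrcoef(mean_protein, transcripts[occupied])[0, 1])


# Toy data: one cell per bin on a 2 x 2 grid; each bin holds as many
# transcripts as the protein intensity of its cell.
cells = [(0.5, 0.5), (1.5, 0.5), (0.5, 1.5), (1.5, 1.5)]
intensity = [1.0, 2.0, 3.0, 4.0]
transcripts = [c for c, n in zip(cells, [1, 2, 3, 4]) for _ in range(n)]
r = binned_pearson(cells, intensity, transcripts, bins=(2, 2), extent=[(0, 2), (0, 2)])
print(round(r, 6))
```

Bins without any cell are excluded here instead of being treated as zero intensity; that is a design choice of this sketch, and the library may handle empty bins differently.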
163+ ### 4) RNA/protein joint analysis
111164
112- Cluster cells using RNA expression, then explain within-cluster protein
113- heterogeneity by training neural network classifiers on the RNA latent space.
165+ Train small classifiers on the RNA latent space to explain within‑cluster protein heterogeneity:
114166
115167``` python
116168from pyXenium.analysis import rna_protein_cluster_analysis
@@ -123,33 +175,73 @@ summary, models = rna_protein_cluster_analysis(
123175    min_cells_per_group=30,
124176    hidden_layer_sizes=(128, 64),
125177)
126-
127- # Inspect metrics for the first few cluster × protein combinations
128178print(summary.head())
179+ ```
129180
130- # Retrieve the fitted model for a specific cluster and protein
131- podocin_model = models["cluster_3"]["Podocin"]
132- print(podocin_model.test_accuracy)
181+ **Signature (summary):**
182+ ``` text
183+ rna_protein_cluster_analysis(
184+ adata: anndata.AnnData,
185+ *,
186+ n_clusters: int = 12,
187+ n_pcs: int = 30,
188+ cluster_key: str = "rna_cluster",
189+ random_state: int | None = 0,
190+ target_sum: float = 1e4,
191+ min_cells_per_cluster: int = 50,
192+ min_cells_per_group: int = 20,
193+ protein_split_method: str = "median",
194+ protein_quantile: float = 0.75,
195+ test_size: float = 0.2,
196+ hidden_layer_sizes: tuple[int, ...] = (64, 32),
197+ max_iter: int = 200,
198+ early_stopping: bool = True,
199+ ) -> tuple[pandas.DataFrame, dict]
133200```
134201
135- ## Data format expectations
202+ Command‑line
203+ ------------
136204
137- - **Cell-feature matrix (MEX)** under `cell_feature_matrix/`:
138-   - `matrix.mtx.gz`: sparse counts/intensities
139-   - `features.tsv.gz`: 3 columns: `id`, `name`, `feature_type`
140-   - `barcodes.tsv.gz`: cell barcodes (one per row)
141- - **Optional**: `analysis.zarr[.zip]` (clusters), `cells.zarr[.zip]` (spatial centroids)
205+ A small CLI is provided via `python -m pyXenium` (requires `click`).
142206
143- ## API reference (summary)
207+ ``` bash
208+ # Print a quick sanity check on the toy dataset
209+ python -m pyXenium demo
144210
145- - `pyXenium.io.partial_xenium_loader.load_anndata_from_partial(base_dir=None, base_url=None, mex_dir=None, analysis_name=None, cells_name=None)`
146- - `pyXenium.io.xenium_gene_protein_loader.load_gene_protein(base_dir, mex_dir=None, drop_controls=True)`
147- - `pyXenium.analysis.protein_gene_correlation.compute(gene_expr, protein_expr, method='pearson')`
211+ # Fetch a toy dataset to a cache directory
212+ python -m pyXenium datasets --name toy_slide --dest ~/.cache/pyXenium
213+ ```
214+
215+ Data layout expectations
216+ ------------------------
217+ - **cell_feature_matrix/**
218+ `matrix.mtx.gz`, `features.tsv.gz` (≥3 columns: id, name, feature_type), `barcodes.tsv.gz`
219+ - Optional: `analysis.zarr[.zip]` (clusters), `cells.zarr[.zip]` (spatial centroids)
220+ - `transcripts.zarr[.zip]` for spatial transcript coordinates used in correlation analyses.
221+
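Before calling a loader, it can help to check which of the artifacts listed above are actually present in an export. The following is a minimal sketch using only the standard library; `inventory_export` is a hypothetical helper, not a pyXenium function, and it assumes the file names given in the layout above.

```python
import tempfile
from pathlib import Path

MEX_FILES = ("matrix.mtx.gz", "features.tsv.gz", "barcodes.tsv.gz")
ZARR_STEMS = ("analysis", "cells", "transcripts")


def inventory_export(base_dir):
    """Map each expected artifact to its path, or None when it is absent."""
    base = Path(base_dir)
    mex_dir = base / "cell_feature_matrix"
    # The MEX triplet only counts as present when all three files exist.
    has_mex = all((mex_dir / name).is_file() for name in MEX_FILES)
    found = {"mex": mex_dir if has_mex else None}
    for stem in ZARR_STEMS:
        # Accept either an unpacked store (<stem>.zarr) or a zipped one.
        candidates = [base / f"{stem}.zarr", base / f"{stem}.zarr.zip"]
        found[stem] = next((p for p in candidates if p.exists()), None)
    return found


# Demo on a throwaway directory that contains only a (fake) cells.zarr.zip.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "cells.zarr.zip").write_bytes(b"")
    demo = inventory_export(tmp)

print({key: (path.name if path else None) for key, path in demo.items()})
```

An incomplete export then shows up as `None` entries, which is the situation the partial loader is meant to tolerate.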
222+ Minimal API reference (index)
223+ -----------------------------
224+ - `pyXenium.io.partial_xenium_loader.load_anndata_from_partial(...)`
225+ - `pyXenium.io.xenium_gene_protein_loader.load_xenium_gene_protein(...)`
226+ - `pyXenium.analysis.protein_gene_correlation.protein_gene_correlation(...)`
227+ - `pyXenium.analysis.rna_protein_cluster_analysis.rna_protein_cluster_analysis(...)`
148228
149- ## Contributing
229+ Example data
230+ ------------
231+ The package ships with a tiny Xenium‑like toy dataset. Programmatic access:
150232
151- Issues and pull requests are welcome. Please include minimal examples and tests where possible.
233+ ``` python
234+ from pyXenium.io.io import load_toy
235+ z = load_toy()
236+ cells = z["cells"]  # zarr group
237+ transcripts = z["transcripts"]
238+ analysis = z["analysis"]
239+ ```
152240
153- ## License
241+ Citations
242+ ---------
243+ If this toolkit helps your work, please cite the project and the 10x Genomics Xenium platform as appropriate.
154244
155- MIT. See `LICENSE`.
245+ License
246+ -------
247+ All rights reserved by the author.