
Commit 588a124

Improve packaging and document validated 10x dataset
1 parent 7b1c807 commit 588a124

17 files changed

Lines changed: 263 additions & 28 deletions

.github/workflows/test.yml

Lines changed: 4 additions & 2 deletions
Original file line number | Diff line number | Diff line change
@@ -8,5 +8,7 @@ jobs:
88
- uses: actions/setup-python@v5
99
with:
1010
python-version: "3.11"
11-
- run: pip install -U "pyXenium>=0.1.0" pytest
12-
- run: pytest
11+
- run: python -m pip install -U pip
12+
- run: pip install -e ".[dev]"
13+
- run: pytest -q
14+
- run: python -m build

LICENSE

Lines changed: 42 additions & 2 deletions
@@ -1,3 +1,43 @@
1-
All rights reserved by the authors.
1+
pyXenium Non-Commercial License
22

3-
Copyright (c) 2025
3+
Copyright (c) 2025 Taobo Hu. All rights reserved.
4+
5+
This software and associated documentation files (the "Software") are
6+
proprietary and are licensed, not sold.
7+
8+
Permission is granted to use, reproduce, modify, and redistribute the Software
9+
solely for non-commercial purposes, subject to the following conditions:
10+
11+
1. You must retain this license text, copyright notice, and all existing
12+
attribution notices in any copy of the Software or substantial portion of
13+
the Software.
14+
2. Any modified version that you share must be clearly marked as modified.
15+
3. You may not use the Software, or any derivative work of the Software, for
16+
any commercial purpose without prior written permission from Taobo Hu.
17+
4. You may not sublicense or impose terms that expand the permissions granted
18+
by this license.
19+
5. No trademark, patent, or other intellectual property rights are granted
20+
except for the limited copyright license expressly stated here.
21+
22+
For purposes of this license, "commercial purpose" includes any use that is
23+
primarily intended for or directed toward commercial advantage or monetary
24+
compensation. Commercial purpose includes, without limitation:
25+
26+
- selling, licensing, sublicensing, or distributing the Software for a fee;
27+
- providing the Software, or a service substantially based on the Software, to
28+
third parties for a fee or other commercial benefit;
29+
- using the Software to operate, support, or develop a product or service that
30+
is sold, licensed, hosted, or otherwise commercialized;
31+
- internal use by or for a for-profit entity in connection with revenue-
32+
generating activity, client work, consulting, or managed services.
33+
34+
If you need commercial rights, you must obtain prior written permission from
35+
the copyright holder.
36+
37+
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
38+
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
39+
FITNESS FOR A PARTICULAR PURPOSE, TITLE, AND NON-INFRINGEMENT. IN NO EVENT
40+
SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY CLAIM, DAMAGES,
41+
OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT, OR OTHERWISE,
42+
ARISING FROM, OUT OF, OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
43+
DEALINGS IN THE SOFTWARE.

README.md

Lines changed: 34 additions & 5 deletions
@@ -4,8 +4,6 @@ pyXenium
44
pyXenium is a Python library for loading and analyzing **10x Genomics Xenium** in‑situ outputs.
55
It supports **robust partial loading** of incomplete exports and provides utilities for **multi‑modal (RNA + Protein)** runs.
66

7-
Version: 0.1.1
8-
97
---
108

119
Features
@@ -20,16 +18,39 @@ Features
2018

2119
Installation
2220
------------
23-
The package is organized as a standard `src/` layout. Until a PyPI release is available, install from source or Git:
21+
Install from PyPI or directly from GitHub:
2422

2523
```bash
24+
# From PyPI
25+
pip install pyXenium
26+
2627
# From GitHub (source)
2728
pip install "git+https://github.com/hutaobo/pyXenium.git"
2829
```
2930

3031
Requirements (typical): Python 3.9+; `anndata`, `numpy`, `pandas`, `scipy`, `zarr`, `fsspec`, `matplotlib`, `scikit-learn`, `click`.
3132
(Exact dependencies follow the project configuration and imports.)
3233

34+
Validated Public Dataset
35+
------------------------
36+
pyXenium has been smoke-tested against the official 10x Genomics dataset
37+
`Xenium In Situ Gene and Protein Expression data for FFPE Human Renal Cell Carcinoma`:
38+
39+
- Source page: https://www.10xgenomics.com/datasets/xenium-protein-ffpe-human-renal-carcinoma
40+
- Provider: 10x Genomics
41+
- Modality: Xenium RNA + Protein
42+
- Software: Xenium Onboard Analysis 4.0.0
43+
- Upstream data license: CC BY 4.0
44+
45+
Validation summary from a local download of the public bundle:
46+
47+
- `load_xenium_gene_protein(..., prefer="auto")` loaded the Zarr-backed dataset successfully.
48+
- `load_xenium_gene_protein(..., prefer="h5")` loaded the HDF5-backed dataset successfully.
49+
- The validated bundle produced an `AnnData` with `465545` cells, `405` RNA features,
50+
`27` protein markers, spatial centroids in `adata.obsm["spatial"]`, and merged cluster labels in `adata.obs["cluster"]`.
51+
- In the downloaded bundle used for validation, `metrics_summary.csv` reports `num_cells_detected=465545`,
52+
and pyXenium reproduced that value from both supported matrix backends.
53+
3354
Quick Start
3455
-----------
3556

@@ -202,14 +223,17 @@ rna_protein_cluster_analysis(
202223
Command‑line
203224
------------
204225

205-
A small CLI is provided via `python -m pyXenium` (requires `click`).
226+
A small CLI is provided via `python -m pyXenium` or the installed `pyxenium` command.
206227

207228
```bash
208229
# Print a quick sanity check on the toy dataset
209230
python -m pyXenium demo
210231

211232
# Fetch a toy dataset to a cache directory
212233
python -m pyXenium datasets --name toy_slide --dest ~/.cache/pyXenium
234+
235+
# Equivalent console script
236+
pyxenium demo
213237
```
214238

215239
Data layout expectations
@@ -244,4 +268,9 @@ If this toolkit helps your work, please cite the project and the 10x Genomics Xe
244268

245269
License
246270
-------
247-
All rights reserved by the author.
271+
Copyright (c) 2025 Taobo Hu. All rights reserved.
272+
273+
This project is source-available, not open source. You may use, modify, and
274+
redistribute it only for non-commercial purposes under the terms of the
275+
[LICENSE](LICENSE) file. Commercial use requires prior written permission from
276+
the copyright holder.
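
Note on the README validation summary: the cell-count comparison it describes (the `num_cells_detected` value in `metrics_summary.csv` against the number of cells in the loaded `AnnData`) could be scripted roughly as below. This is a minimal sketch, not code from this commit. It assumes `metrics_summary.csv` holds one header row followed by one data row, and the helper names `read_num_cells_detected` and `cell_count_matches` are hypothetical. The caller would pass `adata.n_obs` as `n_obs`.

```python
import csv
from pathlib import Path


def read_num_cells_detected(metrics_csv: Path) -> int:
    """Read `num_cells_detected` from the first data row of the CSV.

    Assumption: the file has a single header row and a single data row.
    """
    with metrics_csv.open(newline="") as handle:
        row = next(csv.DictReader(handle))
    return int(row["num_cells_detected"])


def cell_count_matches(metrics_csv: Path, n_obs: int) -> bool:
    """Return True when the loaded cell count equals the reported one."""
    return read_num_cells_detected(metrics_csv) == n_obs
```

Keeping the check independent of the loader means the same function can be reused for both matrix backends.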

pyproject.toml

Lines changed: 13 additions & 1 deletion
@@ -8,7 +8,8 @@ version = "0.1.0"
88
description = "A toy Python package for analyzing 10x Xenium data."
99
readme = "README.md"
1010
requires-python = ">=3.8"
11-
license = { text = "MIT" }
11+
license = "LicenseRef-Proprietary-NonCommercial"
12+
license-files = ["LICENSE"]
1213
authors = [{ name = "Taobo Hu" }]
1314

1415
# Runtime dependencies (fill in according to your project's actual needs; example)
@@ -22,8 +23,12 @@ dependencies = [
2223
"fsspec>=2024.6.0",
2324
"requests>=2.31",
2425
"aiohttp",
26+
"click>=8.1",
2527
]
2628

29+
[project.scripts]
30+
pyxenium = "pyXenium.__main__:main"
31+
2732
[project.optional-dependencies]
2833
# ★ New: the .[dev] extra used in CI (common development dependencies such as pytest, build/release, and docs tooling)
2934
dev = [
@@ -38,6 +43,13 @@ dev = [
3843
[tool.setuptools]
3944
# ★ Explicitly declare the src layout
4045
package-dir = {"" = "src"}
46+
include-package-data = true
47+
48+
[tool.setuptools.package-data]
49+
pyXenium = [
50+
"config/*.yaml",
51+
"datasets/toy_slide/*.zip",
52+
]
4153

4254
[tool.setuptools.packages.find]
4355
where = ["src"]
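
Note on `[project.scripts]`: the value `"pyXenium.__main__:main"` is a `module:attribute` reference that the installed `pyxenium` command resolves at launch. The sketch below illustrates that resolution in simplified form. It is not the installer's actual wrapper, `resolve_entry_point` is a hypothetical helper, and a standard-library target stands in for `pyXenium.__main__:main` so the example runs without the package installed.

```python
import importlib
from functools import reduce


def resolve_entry_point(spec: str):
    """Resolve a "module:attr" entry-point string to the object it names."""
    module_name, _, attr_path = spec.partition(":")
    module = importlib.import_module(module_name)
    # An empty attribute path means the reference is to the module itself.
    attrs = [part for part in attr_path.split(".") if part]
    return reduce(getattr, attrs, module)


# Stand-in target from the standard library (assumption for illustration).
dumps = resolve_entry_point("json:dumps")
print(dumps({"ok": True}))
```

This is also why the commit adds a `main()` function to `__main__.py`: the reference needs an importable callable, which the old `if __name__ == "__main__": app()` block alone did not expose under that name.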

src/pyXenium/__init__.py

Lines changed: 4 additions & 3 deletions
@@ -1,12 +1,13 @@
11
from ._version import __version__
2+
from .analysis import protein_gene_correlation
3+
from .datasets import PUBLIC_DATASET_SOURCES, get_public_dataset_sources
24
from .io.partial_xenium_loader import load_anndata_from_partial
35
from .io.xenium_gene_protein_loader import load_xenium_gene_protein
4-
from .analysis import protein_gene_correlation
56

6-
# src/pyXenium/__init__.py
77
__all__ = [
8-
*globals().get("__all__", []),
98
"__version__",
9+
"PUBLIC_DATASET_SOURCES",
10+
"get_public_dataset_sources",
1011
"load_xenium_gene_protein",
1112
"load_anndata_from_partial",
1213
"protein_gene_correlation",

src/pyXenium/__main__.py

Lines changed: 21 additions & 11 deletions
@@ -1,36 +1,46 @@
1-
import shutil
21
from pathlib import Path
2+
33
import click
4-
from src.pyXenium.io.io import load_toy
4+
5+
from .io.io import copy_bundled_dataset, load_toy
6+
57

68
@click.group()
79
def app():
810
"""pyXenium: Xenium toolkit (toy data included)"""
911

12+
1013
@app.command()
1114
def demo():
1215
ds = load_toy()
1316
click.echo(f"Loaded groups: {list(ds)}")
1417

18+
1519
@app.command()
1620
@click.option("--name", default="toy_slide", show_default=True)
1721
@click.option("--url", default=None, help="Optional URL to download a dataset archive")
18-
@click.option("--dest", default=str(Path.home()/".cache"/"pyXenium"), show_default=True)
22+
@click.option("--dest", default=str(Path.home() / ".cache" / "pyXenium"), show_default=True)
1923
def datasets(name, url, dest):
2024
"""Fetch example datasets to a local cache."""
21-
cache = Path(dest); cache.mkdir(parents=True, exist_ok=True)
22-
target = cache / name
25+
cache = Path(dest)
26+
cache.mkdir(parents=True, exist_ok=True)
2327
if url:
2428
import urllib.request
29+
30+
target = cache / name
2531
urllib.request.urlretrieve(url, str(target))
2632
click.echo(f"Downloaded to {target}")
2733
else:
28-
from importlib import resources
29-
base = resources.files("pyXenium.datasets.toy_slide")
30-
target.mkdir(parents=True, exist_ok=True)
31-
for fn in ["cells.zarr.zip", "transcripts.zarr.zip", "analysis.zarr.zip"]:
32-
shutil.copyfile(base/fn, target/fn)
34+
try:
35+
target = copy_bundled_dataset(name=name, dest=cache)
36+
except FileNotFoundError as exc:
37+
raise click.ClickException(str(exc)) from exc
3338
click.echo(f"Copied bundled toy dataset to {target}")
3439

35-
if __name__ == "__main__":
40+
41+
def main():
3642
app()
43+
44+
45+
if __name__ == "__main__":
46+
main()
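
Note on `copy_bundled_dataset`: its implementation is not shown in this diff. From the call site, it takes `name` and `dest`, returns the target path, and raises `FileNotFoundError` when the dataset is missing. A minimal sketch of a function with that contract follows. The name `copy_bundled_dataset_sketch` and the `bundled_root` parameter are hypothetical; the real helper presumably locates the packaged archives itself.

```python
import shutil
from pathlib import Path


def copy_bundled_dataset_sketch(name: str, dest: Path, bundled_root: Path) -> Path:
    """Copy every *.zip archive of dataset `name` into dest/name.

    `bundled_root` is an assumption for illustration: the directory that
    contains one sub-directory per bundled dataset.
    """
    source = bundled_root / name
    archives = sorted(source.glob("*.zip")) if source.is_dir() else []
    if not archives:
        raise FileNotFoundError(f"No bundled dataset named {name!r} under {bundled_root}")
    target = Path(dest) / name
    target.mkdir(parents=True, exist_ok=True)
    for archive in archives:
        shutil.copyfile(archive, target / archive.name)
    return target
```

Raising `FileNotFoundError` from the helper lets the CLI convert it into a `click.ClickException`, as the `datasets` command above does, so users see a short error message instead of a traceback.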

src/pyXenium/analysis/protein_microenvironment.py

Lines changed: 1 addition & 1 deletion
@@ -24,7 +24,7 @@
2424
hundreds of thousands of cells. The intra-cluster adjacency (CSR) is only used for Moran's I on a subset.
2525
2626
Author: (c) 2025
27-
License: All rights reserved.
27+
License: Proprietary; non-commercial use only.
2828
"""
2929

3030
from __future__ import annotations

src/pyXenium/datasets/__init__.py

Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
1+
"""Bundled example datasets and curated public source metadata shipped with pyXenium."""
2+
3+
from .catalog import (
4+
PUBLIC_DATASET_SOURCES,
5+
RENAL_FFPE_PROTEIN_10X_DATASET,
6+
PublicDatasetSource,
7+
get_public_dataset_sources,
8+
)
9+
10+
__all__ = [
11+
"PublicDatasetSource",
12+
"RENAL_FFPE_PROTEIN_10X_DATASET",
13+
"PUBLIC_DATASET_SOURCES",
14+
"get_public_dataset_sources",
15+
]

src/pyXenium/datasets/catalog.py

Lines changed: 56 additions & 0 deletions
@@ -0,0 +1,56 @@
1+
from __future__ import annotations
2+
3+
from dataclasses import dataclass
4+
5+
6+
@dataclass(frozen=True)
7+
class PublicDatasetSource:
8+
slug: str
9+
title: str
10+
provider: str
11+
url: str
12+
modality: str
13+
software: str
14+
species: str
15+
tissue: str
16+
preservation_method: str
17+
disease_state: str
18+
upstream_data_license: str
19+
first_published: str
20+
current_release_date: str
21+
release_notes: str
22+
local_validation_summary: str
23+
24+
25+
RENAL_FFPE_PROTEIN_10X_DATASET = PublicDatasetSource(
26+
slug="xenium-protein-ffpe-human-renal-carcinoma",
27+
title="Xenium In Situ Gene and Protein Expression data for FFPE Human Renal Cell Carcinoma",
28+
provider="10x Genomics",
29+
url="https://www.10xgenomics.com/datasets/xenium-protein-ffpe-human-renal-carcinoma",
30+
modality="RNA + Protein",
31+
software="Xenium Onboard Analysis 4.0.0",
32+
species="Human",
33+
tissue="Kidney",
34+
preservation_method="FFPE",
35+
disease_state="Renal cell carcinoma",
36+
upstream_data_license="CC BY 4.0",
37+
first_published="2025-07-17",
38+
current_release_date="2025-09-25",
39+
release_notes=(
40+
"10x Genomics states that the dataset was first published on July 17, 2025, "
41+
"reanalyzed with the final Xenium Onboard Analysis v4.0 pipeline on August 27, 2025, "
42+
"and replaced again on September 25, 2025 to fix a bug with no changes to the biological results."
43+
),
44+
local_validation_summary=(
45+
"pyXenium successfully loaded a local copy of the public bundle through both the Zarr-backed "
46+
"and HDF5-backed cell_feature_matrix inputs, producing an AnnData object with 465545 cells, "
47+
"405 RNA features, 27 protein markers, spatial centroids, and merged cluster labels."
48+
),
49+
)
50+
51+
PUBLIC_DATASET_SOURCES = (RENAL_FFPE_PROTEIN_10X_DATASET,)
52+
53+
54+
def get_public_dataset_sources() -> tuple[PublicDatasetSource, ...]:
55+
"""Return curated public dataset sources used to validate pyXenium."""
56+
return PUBLIC_DATASET_SOURCES
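
Note on the catalog design: because entries are frozen dataclasses stored in a tuple, callers can read the metadata but cannot mutate it in place. The sketch below shows one way a consumer might look an entry up by slug. It uses a reduced stand-in class (`Source`, hypothetical) with only three of the fields so that it runs without pyXenium installed, and `find_source` is a hypothetical helper that is not part of this commit.

```python
from dataclasses import FrozenInstanceError, dataclass
from typing import Tuple


@dataclass(frozen=True)
class Source:
    """Reduced, hypothetical stand-in for PublicDatasetSource."""

    slug: str
    provider: str
    upstream_data_license: str


SOURCES = (
    Source(
        slug="xenium-protein-ffpe-human-renal-carcinoma",
        provider="10x Genomics",
        upstream_data_license="CC BY 4.0",
    ),
)


def find_source(slug: str, sources: Tuple[Source, ...] = SOURCES) -> Source:
    """Return the catalog entry with the given slug, or raise KeyError."""
    for source in sources:
        if source.slug == slug:
            return source
    raise KeyError(slug)


entry = find_source("xenium-protein-ffpe-human-renal-carcinoma")
try:
    entry.provider = "someone else"  # frozen dataclasses reject assignment
except FrozenInstanceError:
    print("catalog entries are read-only")
```

With the real package, the same loop would run over the tuple returned by `get_public_dataset_sources()`.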
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
1+
"""Toy Xenium-like dataset used for smoke tests and demos."""
