
Commit 08cb0c0

a
1 parent 2f084b0 commit 08cb0c0


4 files changed: +135 / -59 lines changed


README.md

Lines changed: 40 additions & 41 deletions
@@ -1,65 +1,64 @@
 # pyXenium
 
-A toy Python package for analyzing 10x Xenium data.
+Utilities for loading and analyzing 10x Genomics Xenium exports in Python.
 
-## Installation
+## Quickstart
+
+Install:
 
 ```bash
-pip install -U "pyXenium>=0.2.0"
-pip install "git+https://github.com/hutaobo/pyXenium.git@main"
+pip install pyXenium
 ```
 
-## Quickstart
-
-### Load a partial Xenium dataset from Hugging Face
-
-The snippet below uses the **public demo dataset** and the **v2 loader** that supports `base_url`.
+Load a dataset hosted online (e.g. Hugging Face). **By default, `load_anndata_from_partial` looks for the 10x MEX triplet under `<base>/cell_feature_matrix/`**:
 
-<!-- START load_partial_example -->
 ```python
 from pyXenium.io.partial_xenium_loader import load_anndata_from_partial
 
 BASE = "https://huggingface.co/datasets/hutaobo/pyxenium-gsm9116572/resolve/main"
 
 adata = load_anndata_from_partial(
     base_url=BASE,
-    analysis_name="analysis.zarr.zip",
-    cells_name="cells.zarr.zip",
-    transcripts_name="transcripts.zarr.zip",
-    # Optional: if you uploaded a 10x MEX triplet under BASE/mex/
-    # mex_dir=BASE + "/mex",
-    # mex_matrix_name="matrix.mtx.gz",
-    # mex_features_name="features.tsv.gz",
-    # mex_barcodes_name="barcodes.tsv.gz",
-    build_counts_if_missing=True,
+    analysis_name="analysis.zarr.zip", # optional, attaches clusters if present
+    cells_name="cells.zarr.zip", # optional, attaches spatial centroids if present
+    # By default it will read MEX from: BASE + "/cell_feature_matrix/"
 )
 print(adata)
 ```
-<!-- END load_partial_example -->
-
-> **Note:** Requires `pyXenium>=0.2.0`.
-> The demo dataset is hosted at:
-> - Hugging Face Datasets: [hutaobo/pyxenium-gsm9116572](https://huggingface.co/datasets/hutaobo/pyxenium-gsm9116572)
-
----
 
-## Development
+**What gets loaded:**
+- **Counts**: from MEX (`cell_feature_matrix/{matrix.mtx.gz, features.tsv.gz, barcodes.tsv.gz}`).
+- **Clusters** (optional): from `analysis.zarr[.zip]` if provided.
+- **Spatial centroids** (optional): from `cells.zarr[.zip]` if provided.
 
-To install with development dependencies (testing, docs, etc.):
+If MEX is missing:
+- The function returns an **empty-gene AnnData** (rows=cells if we can infer cell IDs; otherwise empty).
+- Clusters/spatial are still attached when possible.
+- To get real counts, upload MEX to `<base>/cell_feature_matrix/` or pass `mex_dir=...` explicitly.
 
-```bash
-pip install -e ".[dev]"
-pytest
+### Override the MEX location (optional)
+```python
+adata = load_anndata_from_partial(
+    base_url=BASE,
+    mex_dir=BASE + "/cell_feature_matrix", # explicit
+    analysis_name="analysis.zarr.zip",
+    cells_name="cells.zarr.zip",
+)
 ```
 
----
-
-## Links
-
-- 📦 PyPI: [pyXenium](https://pypi.org/project/pyXenium/)
-- 📖 Documentation: [Read the Docs](https://pyxenium.readthedocs.io/en/latest/)
-- 💻 Source code: [GitHub](https://github.com/hutaobo/pyXenium)
-
-## License
+### Local folder example
+```python
+adata = load_anndata_from_partial(
+    base_dir="/path/to/xenium_export",
+    analysis_name="analysis.zarr",
+    cells_name="cells.zarr",
+    # will look for /path/to/xenium_export/cell_feature_matrix/
+)
+```
 
-MIT
+### Troubleshooting
+- **FileNotFoundError: MEX missing files** → Ensure the three files exist in `cell_feature_matrix/`:
+  `matrix.mtx.gz`, `features.tsv.gz`, `barcodes.tsv.gz`.
+- **Different obs names** → We honor 10x barcodes (from MEX). If your Zarr stores numeric
+  cell IDs, we normalize them to strings internally but prefer the barcodes from MEX.
+- **Large downloads** → Remote MEX is downloaded once into a temp dir per session run.
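
The README describes where counts come from: an explicit `mex_dir`, otherwise `<base>/cell_feature_matrix/` under a local `base_dir` or a remote `base_url`. Below is a minimal sketch of that resolution step. It assumes the precedence visible in the loader diff in `src/pyXenium/io/partial_xenium_loader.py` (explicit `mex_dir` first, then a local candidate folder, then `base_url`). It also assumes the local folder is used only when it exists, because the diff shows `mex_dir_p = cand` but not the condition around it. `resolve_mex_sources` and this `_is_url` are hypothetical stand-ins, not pyXenium functions; the URL joining mirrors the `rstrip("/") + "/" + name` pattern in the diff.

```python
from pathlib import Path
from typing import List, Optional

# The three MEX file names listed in the README.
MEX_FILES = ("matrix.mtx.gz", "features.tsv.gz", "barcodes.tsv.gz")


def _is_url(p: Optional[str]) -> bool:
    # Hypothetical stand-in for the loader's private URL check.
    return p is not None and str(p).startswith(("http://", "https://"))


def resolve_mex_sources(
    base_url: Optional[str] = None,
    base_dir: Optional[str] = None,
    mex_dir: Optional[str] = None,
    subdir: str = "cell_feature_matrix",
) -> Optional[List[str]]:
    """Hypothetical helper: return the three MEX locations (URLs or paths), or None."""
    # 1) An explicit mex_dir wins, whether it is a URL or a local folder.
    if mex_dir is not None:
        if _is_url(mex_dir):
            root = str(mex_dir).rstrip("/")
            return [root + "/" + name for name in MEX_FILES]
        return [str(Path(mex_dir).expanduser() / name) for name in MEX_FILES]
    # 2) Local export: <base_dir>/cell_feature_matrix/ (assumed: only if the folder exists).
    if base_dir is not None:
        cand = Path(base_dir).expanduser() / subdir
        if cand.is_dir():
            return [str(cand / name) for name in MEX_FILES]
    # 3) Remote export: <base_url>/cell_feature_matrix/; existence is only known at download time.
    if base_url is not None:
        root = base_url.rstrip("/") + "/" + subdir
        return [root + "/" + name for name in MEX_FILES]
    # No MEX source: per the README, the loader then returns an empty-gene AnnData.
    return None


BASE = "https://huggingface.co/datasets/hutaobo/pyxenium-gsm9116572/resolve/main"
print(resolve_mex_sources(base_url=BASE)[0])
# https://huggingface.co/datasets/hutaobo/pyxenium-gsm9116572/resolve/main/cell_feature_matrix/matrix.mtx.gz
```

Returning `None` instead of raising keeps the "counts optional" behavior the README describes: clusters and spatial centroids can still be attached when no MEX source resolves.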

docs/usage/load_partial.md

Lines changed: 58 additions & 7 deletions
@@ -1,11 +1,62 @@
-# Load a partial Xenium dataset from Hugging Face
+# Partial Loading: counts + clusters + spatial
 
-This example uses the **public demo dataset** and the **v2 loader** that supports `base_url`.
+`load_anndata_from_partial` reconstructs an `AnnData` using any subset of Xenium outputs:
 
-```{include} ../_includes/README.md
-:start-after: <!-- START load_partial_example -->
-:end-before: <!-- END load_partial_example -->
+- **Counts** (required for expression): **MEX triplet** from `cell_feature_matrix/`
+- **Clusters** (optional): `analysis.zarr` / `analysis.zarr.zip`
+- **Spatial centroids** (optional): `cells.zarr` / `cells.zarr.zip`
+
+## Default behavior
+
+- If `mex_dir` is **not** provided, the loader **automatically** looks for
+  `<base>/cell_feature_matrix/{matrix.mtx.gz, features.tsv.gz, barcodes.tsv.gz}`.
+- If found ⇒ counts are loaded from MEX (fast & robust).
+- If not found ⇒ returns an **empty-gene AnnData** but still attaches clusters/spatial if available.
+
+## Examples
+
+### Remote (Hugging Face)
+
+```python
+from pyXenium.io.partial_xenium_loader import load_anndata_from_partial
+
+BASE = "https://huggingface.co/datasets/hutaobo/pyxenium-gsm9116572/resolve/main"
+
+adata = load_anndata_from_partial(
+    base_url=BASE,
+    analysis_name="analysis.zarr.zip",
+    cells_name="cells.zarr.zip",
+)
+print(adata)
+```
+
+### Local folder
+
+```python
+adata = load_anndata_from_partial(
+    base_dir="/data/xenium_export",
+    analysis_name="analysis.zarr",
+    cells_name="cells.zarr",
+)
+```
+
+### Explicit MEX path (optional)
+
+```python
+adata = load_anndata_from_partial(
+    base_url=BASE,
+    mex_dir=BASE + "/cell_feature_matrix",
+)
 ```
 
-> **Requires** `pyXenium >= 0.2.0`.
-> Demo dataset: `hutaobo/pyxenium-gsm9116572` on Hugging Face.
+## What gets attached
+
+- **Counts**: MEX → `.X` (CSR) and `.layers["counts"]`
+- **Clusters** (if `analysis.zarr*` present): `adata.obs["Cluster"]`
+- **Spatial** (if `cells.zarr*` present): `adata.obsm["spatial"]` (or `spatial3d`)
+
+## Notes & FAQ
+
+- We prioritize **10x barcodes** (MEX) as `obs.index`.
+- If Zarr stores numeric `cell_id` as `(N,2)` integers, we normalize internally; alignment prefers barcodes.
+- Want counts but don’t have MEX? Upload `cell_feature_matrix/` or export MEX from 10x Xenium output.
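
The "What gets attached" list maps MEX counts into `.X` as a CSR matrix with barcodes as `obs.index`. Below is a minimal, stdlib-only sketch of that step, assuming the standard 10x MEX layout: `matrix.mtx.gz` is a Matrix Market coordinate file of genes × barcodes with 1-based indices, and counts are integers. `read_mex_as_csr` is a hypothetical helper, not the loader's actual code. It transposes to cells × genes and returns raw CSR arrays that could be wrapped with `scipy.sparse.csr_matrix((data, indices, indptr), shape=shape)`.

```python
import gzip
import tempfile
from pathlib import Path
from typing import Dict, List, Tuple


def read_mex_as_csr(mex_dir: str) -> Dict[str, object]:
    """Hypothetical helper: read a 10x MEX triplet into cells x genes CSR arrays."""
    d = Path(mex_dir)
    with gzip.open(d / "barcodes.tsv.gz", "rt") as fh:
        barcodes = [line.strip() for line in fh if line.strip()]
    with gzip.open(d / "features.tsv.gz", "rt") as fh:
        # Columns: feature ID, feature name, feature type.
        features = [line.rstrip("\n").split("\t") for line in fh if line.strip()]

    n_genes = n_cells = 0
    header_seen = False
    entries: List[Tuple[int, int, int]] = []  # (cell, gene, count), 0-based
    with gzip.open(d / "matrix.mtx.gz", "rt") as fh:
        for line in fh:
            parts = line.split()
            if not parts or line.startswith("%"):
                continue  # blank line, Matrix Market banner or comment
            if not header_seen:
                # Size line: rows (genes), columns (barcodes), non-zeros.
                n_genes, n_cells, _nnz = (int(x) for x in parts)
                header_seen = True
                continue
            gene, cell = int(parts[0]) - 1, int(parts[1]) - 1  # 1-based -> 0-based
            # Transpose genes x barcodes into cells x genes (AnnData convention).
            entries.append((cell, gene, int(parts[2])))  # assumes integer counts

    entries.sort()  # row-major order: by cell, then gene
    indptr = [0] * (n_cells + 1)
    for cell, _, _ in entries:
        indptr[cell + 1] += 1
    for i in range(n_cells):
        indptr[i + 1] += indptr[i]  # running total -> row start offsets
    return {
        "obs_names": barcodes,
        "var_ids": [f[0] for f in features],
        "var_names": [f[1] if len(f) > 1 else f[0] for f in features],
        "shape": (n_cells, n_genes),
        "indptr": indptr,
        "indices": [gene for _, gene, _ in entries],
        "data": [count for _, _, count in entries],
    }


# Usage on a tiny synthetic triplet: 3 genes x 2 barcodes, 3 non-zero counts.
demo = Path(tempfile.mkdtemp(prefix="mex_demo_"))
with gzip.open(demo / "barcodes.tsv.gz", "wt") as fh:
    fh.write("cellA-1\ncellB-1\n")
with gzip.open(demo / "features.tsv.gz", "wt") as fh:
    for i, name in enumerate(["GeneOne", "GeneTwo", "GeneThree"], start=1):
        fh.write(f"G{i}\t{name}\tGene Expression\n")
with gzip.open(demo / "matrix.mtx.gz", "wt") as fh:
    fh.write("%%MatrixMarket matrix coordinate integer general\n3 2 3\n1 1 5\n3 1 2\n2 2 7\n")

csr = read_mex_as_csr(str(demo))
print(csr["shape"], csr["indptr"], csr["indices"], csr["data"])
# (2, 3) [0, 2, 3] [0, 2, 1] [5, 2, 7]
```

Using the barcodes file for `obs_names` matches the note that 10x barcodes are preferred as `obs.index`; any cluster or spatial table would then be aligned to that order.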

mkdocs.yml

Lines changed: 2 additions & 0 deletions
@@ -2,3 +2,5 @@ site_name: pyXenium
 nav:
   - Home: index.md
   - Usage: usage.md
+  - Guide:
+      - Partial Loading: guide/partial-loading.md

src/pyXenium/io/partial_xenium_loader.py

Lines changed: 35 additions & 11 deletions
@@ -15,6 +15,8 @@
 - Do NOT build counts from transcripts.zarr anymore. If MEX missing, returns an AnnData with
   empty gene dimension (but still attaches clusters/spatial when available).
 - Robust `cell_id` detection: include root-level "cell_id" and support `(N,2)` numeric ids -> "cell_{num}".
+- Fix: when downloading MEX from URLs, download all three files into the SAME temporary directory
+  to avoid "MEX missing files" errors.
 - Keep all previous public APIs and behaviors (backward compatible).
 
 Author: Taobo Hu (pyXenium project)
@@ -66,7 +68,7 @@ def _is_url(p: Optional[str | os.PathLike]) -> bool:
 
 
 def _fetch_to_temp(src: str, suffix: Optional[str] = None) -> Path:
-    """Download a URL to a temporary file and return its Path."""
+    """Download a single URL to a temporary file and return its Path."""
     logger.info(f"Downloading: {src}")
     r = requests.get(src, stream=True, timeout=120)
     r.raise_for_status()
@@ -80,6 +82,24 @@ def _fetch_to_temp(src: str, suffix: Optional[str] = None) -> Path:
     return dst
 
 
+def _download_many_to_same_temp(urls: Sequence[str]) -> Path:
+    """Download multiple URLs into the SAME temporary directory.
+    Returns the temp directory Path containing all files (base names preserved).
+    """
+    tmpdir = Path(tempfile.mkdtemp(prefix="pyxenium_"))
+    for u in urls:
+        logger.info(f"Downloading: {u}")
+        r = requests.get(u, stream=True, timeout=120)
+        r.raise_for_status()
+        name = Path(urllib.parse.urlparse(u).path).name or "tmp"
+        dst = tmpdir / name
+        with open(dst, "wb") as f:
+            for chunk in r.iter_content(chunk_size=1024 * 1024):
+                if chunk:
+                    f.write(chunk)
+    return tmpdir
+
+
 def _p(x: Optional[os.PathLike | str]) -> Optional[Path]:
     return None if x is None else Path(x).expanduser().resolve()
 
@@ -384,12 +404,14 @@ def load_anndata_from_partial(
     # 1) If explicit mex_dir is provided
     if mex_dir is not None:
         if _is_url(mex_dir):
-            # download three files into a temp dir
+            # download three files into the SAME temp dir
             base_url_mex = str(mex_dir).rstrip("/")
-            m_p = _fetch_to_temp(base_url_mex + "/" + mex_matrix_name)
-            _ = _fetch_to_temp(base_url_mex + "/" + mex_features_name)
-            _ = _fetch_to_temp(base_url_mex + "/" + mex_barcodes_name)
-            mex_dir_p = m_p.parent
+            tmpdir = _download_many_to_same_temp([
+                base_url_mex + "/" + mex_matrix_name,
+                base_url_mex + "/" + mex_features_name,
+                base_url_mex + "/" + mex_barcodes_name,
+            ])
+            mex_dir_p = tmpdir
         else:
             mex_dir_p = _p(mex_dir)
 
@@ -402,13 +424,15 @@ def load_anndata_from_partial(
             mex_dir_p = cand
 
     if mex_dir_p is None and base_url is not None:
-        # download three files from base_url/cell_feature_matrix/
+        # download three files from base_url/cell_feature_matrix/ into SAME temp dir
         root_url = base_url.rstrip("/") + "/" + mex_default_subdir
         try:
-            m_p = _fetch_to_temp(root_url + "/" + mex_matrix_name)
-            _ = _fetch_to_temp(root_url + "/" + mex_features_name)
-            _ = _fetch_to_temp(root_url + "/" + mex_barcodes_name)
-            mex_dir_p = m_p.parent
+            tmpdir = _download_many_to_same_temp([
+                root_url + "/" + mex_matrix_name,
+                root_url + "/" + mex_features_name,
+                root_url + "/" + mex_barcodes_name,
+            ])
+            mex_dir_p = tmpdir
         except Exception as e:
             logger.warning(f"Failed to fetch MEX from {root_url}: {e}")
 
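
The loader change replaces three `_fetch_to_temp` calls, whose matrix download's parent was then treated as the MEX directory, with one helper that writes all three files into a single temporary directory under their original base names. Below is a stdlib-only sketch of the same idea. It swaps `requests` for `urllib.request` so it runs without extra dependencies, and `download_many_to_same_temp` is a hypothetical name, not the pyXenium helper. Because `urlopen` also accepts `file://` URLs, the demo exercises it on local files standing in for a remote `cell_feature_matrix/` folder.

```python
import shutil
import tempfile
import urllib.parse
import urllib.request
from pathlib import Path
from typing import Sequence


def download_many_to_same_temp(urls: Sequence[str]) -> Path:
    """Hypothetical sketch: download every URL into ONE new temp dir, keeping base names."""
    tmpdir = Path(tempfile.mkdtemp(prefix="pyxenium_"))
    for u in urls:
        # Same naming rule as the diff: last URL path component, or "tmp" if empty.
        name = Path(urllib.parse.urlparse(u).path).name or "tmp"
        dst = tmpdir / name
        # Stream to disk in 1 MiB chunks; urlopen raises HTTPError on 4xx/5xx responses.
        with urllib.request.urlopen(u, timeout=120) as resp, open(dst, "wb") as f:
            shutil.copyfileobj(resp, f, length=1024 * 1024)
    return tmpdir


# Demo with local file:// URLs standing in for a remote cell_feature_matrix/ folder.
src = Path(tempfile.mkdtemp(prefix="src_"))
names = ["matrix.mtx.gz", "features.tsv.gz", "barcodes.tsv.gz"]
for n in names:
    (src / n).write_bytes(b"demo")
out = download_many_to_same_temp([(src / n).as_uri() for n in names])
print(sorted(p.name for p in out.iterdir()))
# ['barcodes.tsv.gz', 'features.tsv.gz', 'matrix.mtx.gz']
```

Preserving base names matters because the MEX reader looks the three files up by name inside one directory, which is exactly the condition behind the "MEX missing files" error listed under Troubleshooting.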
