Unified access to neuroscience and scientific datasets
Full Documentation · pip install scitex-dataset
Interfaces: Python ⭐⭐⭐ (primary) · CLI ⭐ · MCP ⭐⭐ · Skills ⭐⭐ · Hook — · HTTP —
| # | Problem | Solution |
|---|---|---|
| 1 | Public dataset repositories are balkanized -- OpenNeuro (BIDS) + DANDI (NWB) + PhysioNet (WFDB) + Zenodo (generic) + GEO / ChEMBL / ClinicalTrials, each with different APIs, auth, and download tools | Unified fetcher -- stx.dataset.neuroscience.openneuro.fetch_all_datasets() uses the same call shape across all sources; local FTS5 search across metadata |
| 2 | "Download this BIDS dataset" means reading DataLad docs first -- the barrier is tooling, not knowledge | One-line fetch -- no DataLad setup; the module handles auth, resumption, checksums transparently |
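The "local FTS5 search across metadata" in row 1 refers to SQLite's FTS5 full-text extension. The sketch below is not the package's implementation. It only illustrates the idea with Python's standard `sqlite3` module, assuming the bundled SQLite was built with FTS5. The table layout, column names, and sample records are hypothetical.

```python
import sqlite3

# Hypothetical metadata records: (source, dataset_id, title).
# These are illustrative only, not real dataset entries.
RECORDS = [
    ("openneuro", "ds-a", "Resting-state EEG in Alzheimer disease"),
    ("dandi", "ds-b", "Mouse visual cortex electrophysiology"),
    ("physionet", "ds-c", "Scalp EEG recordings of epilepsy patients"),
]


def build_index(records):
    """Create an in-memory FTS5 table and load the metadata rows into it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE VIRTUAL TABLE datasets USING fts5(source, dataset_id, title)"
    )
    conn.executemany("INSERT INTO datasets VALUES (?, ?, ?)", records)
    return conn


def search(conn, query):
    """Return (source, dataset_id) pairs whose indexed text matches every query term."""
    rows = conn.execute(
        "SELECT source, dataset_id FROM datasets "
        "WHERE datasets MATCH ? ORDER BY rank",
        (query,),
    )
    return rows.fetchall()


conn = build_index(RECORDS)
hits = search(conn, "alzheimer EEG")
print(hits)
```

Because the index covers metadata from every source in one table, a single query can return hits from several repositories at once, which is the point of indexing locally instead of querying each API separately.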
Neuroscience datasets are scattered across multiple repositories -- OpenNeuro, DANDI Archive, PhysioNet, Zenodo -- each with its own API, data format, and query interface. Researchers waste time navigating incompatible APIs to discover relevant data. AI agents lack a unified way to search and evaluate datasets programmatically.
SciTeX Dataset provides a single Python API, CLI, and MCP (Model Context Protocol) server to discover and query metadata from major scientific data repositories. It focuses on fast metadata retrieval without downloading full datasets.
| Repository | Description | Data Types |
|---|---|---|
| OpenNeuro | Open platform for sharing neuroimaging data | MRI, EEG, MEG, iEEG, PET |
| DANDI | BRAIN Initiative data archive | Electrophysiology, Ophys |
| PhysioNet | Physiological signal databases | ECG, EEG, clinical data |
| Zenodo | General scientific data repository (CERN) | Any research data |
Table 1. Supported data repositories. Each source is queried via its public API; no authentication required for metadata access.
Requires Python >= 3.10.
pip install scitex-dataset
MCP support:
pip install scitex-dataset[mcp]
from scitex_dataset import fetch_all_datasets, format_dataset
# Fetch datasets from OpenNeuro
datasets = fetch_all_datasets(max_datasets=10)
# Format for analysis
for ds in datasets:
    formatted = format_dataset(ds)
    print(f"{formatted['id']}: {formatted['name']} ({formatted['n_subjects']} subjects)")
Python API
from scitex_dataset import fetch_all_datasets, format_dataset, search_datasets, sort_datasets
from scitex_dataset import neuroscience, database
# Fetch from specific sources
datasets = fetch_all_datasets(max_datasets=100) # OpenNeuro
dandi_ds = neuroscience.dandi.fetch_all_datasets(max_datasets=50) # DANDI
phys_ds = neuroscience.physionet.fetch_all_datasets() # PhysioNet
# Search and filter
eeg_datasets = search_datasets(datasets, modality="eeg", min_subjects=20)
popular = sort_datasets(datasets, by="downloads", descending=True)
# Local database for fast full-text search
database.build() # index all sources
results = database.search("alzheimer EEG", min_subjects=20)
CLI Commands
scitex-dataset --help-recursive # Show all commands
# Fetch from repositories
scitex-dataset openneuro -n 100 -o datasets.json -v
scitex-dataset dandi -n 50 -o dandi.json -v
scitex-dataset physionet -n 50 -v
scitex-dataset zenodo -q "neuroscience" -n 20
# Local database
scitex-dataset db build # index all sources
scitex-dataset db search "epilepsy EEG" # full-text search
scitex-dataset db stats # show statistics
# Introspection
scitex-dataset list-python-apis -v # list Python API tree
scitex-dataset mcp list-tools -v # list MCP tools
MCP Server -- for AI Agents
AI agents can discover and query neuroscience datasets autonomously.
| Tool | Description |
|---|---|
| dataset_openneuro_fetch | Fetch datasets from OpenNeuro |
| dataset_dandi_fetch | Fetch datasets from DANDI Archive |
| dataset_physionet_fetch | Fetch datasets from PhysioNet |
| dataset_zenodo_fetch | Fetch datasets from Zenodo |
| dataset_search | Filter datasets by modality, subjects, etc. |
| dataset_list_sources | List available data repositories |
| dataset_db_build | Build local search database |
| dataset_db_search | Full-text search across all sources |
| dataset_db_stats | Database statistics |
Table 2. Nine MCP tools available for AI-assisted dataset discovery. All tools accept JSON parameters and return JSON results.
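As a rough illustration of "JSON parameters in, JSON results out", the sketch below builds the JSON-RPC 2.0 `tools/call` request that an MCP client sends when it invokes a tool. An MCP client library normally constructs this for you. The tool name comes from Table 2, but the argument names (`query`, `min_subjects`) are assumptions that mirror the Python call `database.search("alzheimer EEG", min_subjects=20)`. The actual schemas should be checked with `scitex-dataset mcp list-tools -v`.

```python
import json


def make_tool_call(tool_name, arguments, request_id=1):
    """Build a JSON-RPC 2.0 request for the MCP `tools/call` method."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Assumed argument names; the real tool schema may differ.
request = make_tool_call(
    "dataset_db_search",
    {"query": "alzheimer EEG", "min_subjects": 20},
)
payload = json.dumps(request, indent=2)
print(payload)
```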
scitex-dataset mcp start
Skills -- for AI Agent Discovery
Skills provide workflow-oriented guides that AI agents query to discover capabilities and usage patterns.
scitex-dataset skills list # List available skill pages
scitex-dataset skills get SKILL # Show main skill page
scitex-dev skills export --package scitex-dataset # Export to Claude Code
| Skill | Content |
|---|---|
| quick-start | Basic usage |
| data-sources | OpenNeuro, DANDI, PhysioNet |
| cli-reference | CLI commands |
| mcp-tools | MCP tools for AI agents |
SciTeX Dataset is part of SciTeX. When used inside the SciTeX framework, dataset discovery integrates with reproducible research sessions:
import scitex
from scitex_dataset import fetch_all_datasets, format_dataset
@scitex.session
def main(logger=scitex.INJECTED):
    datasets = fetch_all_datasets(max_datasets=100, logger=logger)
    formatted = [format_dataset(ds) for ds in datasets]
    scitex.io.save(formatted, "openneuro_datasets.json")
    return 0
The SciTeX ecosystem follows the Four Freedoms for Research, inspired by the Free Software Definition:
Four Freedoms for Research
- The freedom to run your research anywhere -- your machine, your terms.
- The freedom to study how every step works -- from raw data to final manuscript.
- The freedom to redistribute your workflows, not just your papers.
- The freedom to modify any module and share improvements with the community.
AGPL-3.0 -- because we believe research infrastructure deserves the same freedoms as the software it runs on.