diff --git a/docs/tools/odm-sdk/terminal/study/uploading-study.md b/docs/tools/odm-sdk/terminal/study/uploading-study.md index 66a9f2ac..2066c20d 100644 --- a/docs/tools/odm-sdk/terminal/study/uploading-study.md +++ b/docs/tools/odm-sdk/terminal/study/uploading-study.md @@ -34,6 +34,7 @@ odm-import-data -h - `-sm,--samples`: URL of the samples file or accession of existing samples file to be linked - `-lb, --libraries`: URL of the libraries file or accession of existing libraries file to be linked - `-pr, --preparations`: URL of hosted preparations file or accession of existing preparations file to be linked + - `-c, --cell`: URL of hosted cell metadata file or accession of existing cell file to be linked - `-e,--expression`: URL of any tabular data file (not only expression data) except Gene Variant or Flow Cytometry - `-em,--expression-metadata`: URL of any tabular metadata file (not only expression data) except Gene Variant or Flow Cytometry - `-v, --variant`: URL of the variants data file @@ -73,7 +74,7 @@ Additional optional parameters: ## Data model -The script supports 2 data models: +The script supports several data models: ![Data Model](uploading-study/data-model.png) - Study - Samples - Omics data: @@ -82,6 +83,9 @@ The script supports 2 data models: - the script uses this data model if parameters for libraries or preparations loading are specified; - omics data can be linked only to libraries or preparations; - only expression data (the parameters --expression and --expression-metadata) is supported. +- Study - Samples - (optional: Libraries/Preparations) - Cell metadata - Omics data: + - the script uses this data model if parameter for cell metadata loading is specified; + - expression data can be linked to cell metadata; The script works sequentially, linking the object with the previous one according to the data model. 
Below you can find examples to demonstrate different combinations: @@ -141,6 +145,25 @@ odm-import-data --token [token] -H [HOST] \ - `preparations_1` will be linked to `samples_2` - `expression_1` will be linked to `preparations_1` +### _Example 4_ + +```shell +odm-import-data --token [token] -H [HOST] \ + --study http://data_source/study.csv \ + --samples http://data_source/samples_1.csv \ + --samples http://data_source/samples_2.csv \ + --libraries http://data_source/libraries_1.csv \ + --cell http://data_source/cell_1.csv \ + --expression http://data_source/expression_1.gct \ + --expression-metadata http://data_source/expression_metadata_1.gct.tsv +``` + +- `samples_1` will be linked to `study` +- `samples_2` will be linked to `study` +- `libraries_1` will be linked to `samples_2` +- `cell_1` will be linked to `libraries_1` +- `expression_1` will be linked to `cell_1` + ## Link all to all The `-lata` parameter allows to bypass the restriction of sequential linking of objects. The behaviour of the script @@ -355,3 +378,31 @@ odm-import-data --token [token] -H [HOST] \ --samples http://data_source/arabidopsis_sample_metadata_uncurated.tsv \ --expression http://data_source/arabidopsis.gct ``` + +### Study with single cell data + +For working with Cell metadata and Cell expression use the following example files: + +- [Study_metadata](https://bio-test-data.s3.us-east-1.amazonaws.com/User_guide_test_data/Single_cell_data/study_metadata.tsv), a tab-delimited file of the study attributes +- [Samples_metadata](https://bio-test-data.s3.us-east-1.amazonaws.com/User_guide_test_data/Single_cell_data/samples.tsv), a tab-delimited file of sample attributes +- [Cell_metadata](https://bio-test-data.s3.us-east-1.amazonaws.com/User_guide_test_data/Single_cell_data/cells_2_samples_full_match.tsv), a tab-delimited file of cell attributes +- 
[Cell_expression](https://bio-test-data.s3.us-east-1.amazonaws.com/User_guide_test_data/Single_cell_data/expression_2_cells_linked_to_samples.tsv), a tab-delimited file of cell expression data + +Run the script with the above by typing the following (inserting your token +instead of [token], note you may need to escape or quote strings depending on +your specific command line interface): + +Script example (Study → Samples → Cells → Expression) + +```default +odm-import-data \ +--server \ +--token \ +--study 's3://bio-test-data/User_guide_test_data/Single_cell_data/study_metadata.tsv' \ +--samples 's3://bio-test-data/User_guide_test_data/Single_cell_data/samples.tsv' \ +--cells 's3://bio-test-data/User_guide_test_data/Single_cell_data/cells_2_samples_full_match.tsv' \ +--expression 's3://bio-test-data/User_guide_test_data/Single_cell_data/expression_2_cells_linked_to_samples.tsv' \ +--data-class 'Single-cell transcriptomics' \ +--number-of-feature-attributes 1 \ +--allow-duplicates +``` diff --git a/docs/tools/odm-sdk/terminal/study/uploading-study/data-model.png b/docs/tools/odm-sdk/terminal/study/uploading-study/data-model.png index 5b5156e0..c6f66249 100644 Binary files a/docs/tools/odm-sdk/terminal/study/uploading-study/data-model.png and b/docs/tools/odm-sdk/terminal/study/uploading-study/data-model.png differ diff --git a/docs/user-guide/doc-odm-user-guide/doc-odm-user-guide/images/data-model+metainfo-editor.png b/docs/user-guide/doc-odm-user-guide/doc-odm-user-guide/images/data-model+metainfo-editor.png deleted file mode 100644 index 962ae701..00000000 Binary files a/docs/user-guide/doc-odm-user-guide/doc-odm-user-guide/images/data-model+metainfo-editor.png and /dev/null differ diff --git a/docs/user-guide/doc-odm-user-guide/doc-odm-user-guide/images/data-model.png b/docs/user-guide/doc-odm-user-guide/doc-odm-user-guide/images/data-model.png new file mode 100644 index 00000000..c6f66249 Binary files /dev/null and 
b/docs/user-guide/doc-odm-user-guide/doc-odm-user-guide/images/data-model.png differ diff --git a/docs/user-guide/doc-odm-user-guide/doc-odm-user-guide/images/data_model.png b/docs/user-guide/doc-odm-user-guide/doc-odm-user-guide/images/data_model.png deleted file mode 100644 index 882a77e6..00000000 Binary files a/docs/user-guide/doc-odm-user-guide/doc-odm-user-guide/images/data_model.png and /dev/null differ diff --git a/docs/user-guide/doc-odm-user-guide/import-data-using-api.md b/docs/user-guide/doc-odm-user-guide/import-data-using-api.md index 500d4dff..a8ca1678 100644 --- a/docs/user-guide/doc-odm-user-guide/import-data-using-api.md +++ b/docs/user-guide/doc-odm-user-guide/import-data-using-api.md @@ -19,11 +19,12 @@ You can import studies, samples, and any data in the tabular format: - **Study**: the context of an experiment, such as the aim and statistical design. - **Sample**: the biological attributes of a sample, such as tissue, disease, and treatment. -- **Data**: Includes transcriptomics, proteomics, gene variant, flow cytometry data, and more. You can import the metadata (e.g. genome version, normalization - method, and the locations of raw/processed data in your storage) together with the processed data (e.g. expression counts, genotypes). -- **Cross-reference mapping**: a list of transcript and gene ids and how they map to each other. - **Libraries metadata**: TSV file describing sequencing libraries or other indexable data types. It includes information on library preparation, type (e.g., single-end or paired-end), protocol, barcodes, and platform. - **Preparations metadata**: metadata describing how samples were prepared prior to data generation, applicable to proteomics, transcriptomics, and other data types. 
+- **Cell metadata**: all the information stored per cell (per barcode) that describes that cell and its context, separate from the actual molecular measurements (like the gene expression counts matrix which should be uploaded as expression within the ODM) +- **Data**: Includes transcriptomics, proteomics, gene variant, flow cytometry data, cell expression, and more. You can import the metadata (e.g. genome version, normalization + method, and the locations of raw/processed data in your storage) together with the processed data (e.g. expression counts, genotypes). +- **Cross-reference mapping**: a list of transcript and gene ids and how they map to each other. - **Attached Files**: Supplement your study by attaching related research materials like PDF, XLSX, DOCX, PPTX files, images, and more. Please note, contents of these attached files won't be indexed or made searchable. Once imported, studies, samples, and data metadata will be queryable and editable from both the User Interface and APIs, whilst the signal data will only be queryable via APIs. @@ -35,17 +36,18 @@ Importing data has two stages. First, you import studies, samples, and data sepa The **Sample Source ID** is used as the default linking key. You can choose another attribute from the template for linking data to samples. The data model and how it looks in the User Interface is shown below. -In addition to core data types, **Libraries** and **Preparations** require special handling. These files must include the **Sample Source ID**, which is used to link them to the appropriate samples. +In addition to core data types, **Libraries**, **Preparations**, **Cell metadata** require special handling. These files must include the **Sample Source ID**, which is used to link them to the appropriate samples. The correct order of linking follows the system logic and available endpoints: - **Samples** are linked to a **Study** - **Libraries** and **Preparations** are linked to **Samples** -- **Omics data** (e.g. 
transcriptomics, proteomics) are linked to **Samples**, or to **Libraries/Preparations** depending on the data type +- **Cell metadata** is linked to **Samples** or **Libraries** or **Preparations** +- **Omics data** (e.g. transcriptomics, proteomics, cell expression) are linked to **Samples**, or to **Libraries/Preparations**, or to **Cell metadata** depending on the data type - **Attached files** are linked directly to a **Study** -![image](doc-odm-user-guide/images/data-model+metainfo-editor.png) +![image](doc-odm-user-guide/images/data-model.png) ## Data Loading via APIs To load the data via APIs each entity is created via a separate endpoint specific for this data type. Then they are sequentially linked in the Integration layer. @@ -271,6 +273,51 @@ As soon as the import process will be completed, you will be able to get the pre } } ``` + +### Import Cell metadata + +For working with Cell metadata and Cell expression use the following example files: + +- [Study_metadata](https://bio-test-data.s3.us-east-1.amazonaws.com/User_guide_test_data/Single_cell_data/study_metadata.tsv), a tab-delimited file of the study attributes +- [Samples_metadata](https://bio-test-data.s3.us-east-1.amazonaws.com/User_guide_test_data/Single_cell_data/samples.tsv), a tab-delimited file of sample attributes +- [Cell_metadata](https://bio-test-data.s3.us-east-1.amazonaws.com/User_guide_test_data/Single_cell_data/cells_2_samples_full_match.tsv), a tab-delimited file of cell attributes +- [Cell_expression](https://bio-test-data.s3.us-east-1.amazonaws.com/User_guide_test_data/Single_cell_data/expression_2_cells_linked_to_samples.tsv), a tab-delimited file of cell expression data + +To import Cell metadata, you will need to use `POST /api/v1/jobs/import/cells` endpoint: + +```default +curl -X 'POST' \ + 'https:///api/v1/jobs/import/cells?allow_dups=false' \ + -H 'accept: application/json' \ + -H 'Genestack-API-Token: ' \ + -H 'Content-Type: application/json' \ + -d '{ + "dataLink": 
"https://bio-test-data.s3.us-east-1.amazonaws.com/User_guide_test_data/Single_cell_data/cells_2_samples_full_match.tsv" +``` + +Similar to the previous step, you should see the **jobExecId** in the response: + +```json +{ + "jobExecId": 24, + "startedBy": "job@genestack.com", + "jobName": "IMPORT_CELLS", + "status": "COMPLETED", + "createTime": "2026-02-05 11:35:36", + "endTime": "2026-02-05 11:35:38" +} +``` +As soon as the import process will be completed, you will be able to get the Cell metadata **groupAccession** by querying the **jobExecId** in `GET /api/v1/jobs/{jobExecId}/output` endpoint: + +```json +{ + "status": "COMPLETED", + "result": { + "groupAccession": "GSF016786" + } +} +``` + ### Linking entities #### Samples to Study @@ -328,6 +375,31 @@ If successful you will see a preparation tab appear in the Metadata Editor: ![image](doc-odm-user-guide/images/preparation-added.png) +#### Cell metadata to Samples/Libraries/Preparations + +You can link the **Cell metadata group** to the **samples/libraries/preparation groups** using the endpoints: + +* Link to Samples + + **Path:** POST `/api/v1/as-curator/integration/link/cell/group/{sourceId}/to/sample/group/{targetId}` + +* Link to Libraries + + **Path:** POST `/api/v1/as-curator/integration/link/cells/group/{sourceId}/to/library/group/{targetId}` + +* Link to Preparations + + **Path:** POST `/api/v1/as-curator/integration/link/cells/group/{sourceId}/to/preparation/group/{targetId}` + +For `sourceId` field provide accession of your Cell metadata group. + +For `targetId` field provide accession of selected Sample, Library, or Preparation group where Cell metadata should be linked. + +Cell metadata will be linked if there are matches between `batch` values in Cell metadata and `Sample Source ID` for Samples, +`Library ID` for Libraries, and `Preparation ID` for Preparations. 
+ +If successful you will find the Cells via `GET /api/v1/as-curator/omics/cells` API endpoint when Study accession is provided for `studyQuery` parameter. + ### Working with the jobExecId The following endpoints allow you to manage and inspect jobs using the jobExecId, which is returned after initiating an asynchronous import task. @@ -937,4 +1009,3 @@ Example response: ] } ``` - diff --git a/docs/user-guide/doc-odm-user-guide/import-data-using-python-script.md b/docs/user-guide/doc-odm-user-guide/import-data-using-python-script.md index 913cc5b8..4e598c08 100644 --- a/docs/user-guide/doc-odm-user-guide/import-data-using-python-script.md +++ b/docs/user-guide/doc-odm-user-guide/import-data-using-python-script.md @@ -8,7 +8,7 @@ to be able to import and edit data in ODM. Read the full list of requirements [here](../../../tools/odm-sdk/terminal/study/uploading-study/#requirements) -## Optional experimental (signal) data files +## Optional files You can optionally also provide: @@ -17,7 +17,8 @@ You can optionally also provide: - The server address if you want to apply the script to a different ODM server. Use `--host ` to specify. 
- Any data in the Tabular format (Data Frame) as a TSV, hosted at an HTTPS web address -- Gene expression data in [GCT](https://software.broadinstitute.org/cancer/software/gsea/wiki/index.php/Data_formats#GCT:_Gene_Cluster_Text_file_format_.28.2A.gct.29) format, hosted at an HTTPS web address +- Gene expression data in [GCT](https://docs.gsea-msigdb.org/#GSEA/Data_Formats/#gct-gene-cluster-text-file-format-gct) format, hosted at an HTTPS web address +- Gene expression or Cell expression data in TSV format, hosted at an HTTPS web address - Gene expression metadata in TSV format, hosted at an HTTPS web address - Gene variant data in [VCF](https://samtools.github.io/hts-specs/VCFv4.2.pdf) format, hosted at an HTTPS web address - Gene variant metadata in TSV format, hosted at an HTTPS web address @@ -29,9 +30,10 @@ You can optionally also provide: - A libraries file in TSV format, hosted at an HTTPS web address, or the accession of an existing library file - A preparations file in TSV format, hosted at an HTTPS web address, or the - accession of an existing preparations file. + accession of an existing preparations file +- A Cell metadata file in TSV format, hosted at an HTTPS web address -Once imported, studies, samples, and signal metadata will be queryable and +Once imported, studies, samples, libraries, preparations, cells metadata, and signal metadata will be queryable and editable from both the User Interface and APIs, whilst the signal data will only queryable via APIs. @@ -87,6 +89,11 @@ Optionally include data files by appending any or all of the following to the ab ```default --preparations [URL] ``` + +```default +--cell [URL] +``` + ## Importing Multiple Tabular Files - [Test_basic_generic_expression.tsv](https://bio-test-data.s3.us-east-1.amazonaws.com/odm/user-guide/Test_basic_generic_expression.tsv), a tab-separated file containing tabular expression data with two text features and two numeric features, followed by expression values for four samples. 
@@ -156,12 +163,19 @@ accessions must be supplied. See the example below: The following are some example files to illustrate file formats: - [Test_1000g.study.tsv](https://s3.amazonaws.com/bio-test-data/odm/Test_1000g/Test_1000g.study.tsv), a tab-delimited file of the study attributes -- [Test_1000g.samples.tsv](https://s3.amazonaws.com/bio-test-data/odm/Test_1000g/Test_1000g.samples.tsv), a tab-delimited file of sample attributes. -- [Test_1000g.gct](https://s3.amazonaws.com/bio-test-data/odm/Test_1000g/Test_1000g.gct), a [GCT](https://software.broadinstitute.org/cancer/software/gsea/wiki/index.php/Data_formats#GCT:_Gene_Cluster_Text_file_format_.28.2A.gct.29) file of expression data from multiple sequencing runs +- [Test_1000g.samples.tsv](https://s3.amazonaws.com/bio-test-data/odm/Test_1000g/Test_1000g.samples.tsv), a tab-delimited file of sample attributes +- [Test_1000g.gct](https://s3.amazonaws.com/bio-test-data/odm/Test_1000g/Test_1000g.gct), a [GCT](https://docs.gsea-msigdb.org/#GSEA/Data_Formats/#gct-gene-cluster-text-file-format-gct) file of expression data from multiple sequencing runs - [Test_1000g.gct.tsv](https://s3.amazonaws.com/bio-test-data/odm/Test_1000g/Test_1000g.gct.tsv), a tab-separated file that describes the expression data - [Test_1000g.vcf](https://s3.amazonaws.com/bio-test-data/odm/Test_1000g/Test_1000g.vcf), a [VCF](https://samtools.github.io/hts-specs/VCFv4.2.pdf) file of variant data from multiple sequencing runs - [Test_1000g.vcf.tsv](https://s3.amazonaws.com/bio-test-data/odm/Test_1000g/Test_1000g.vcf.tsv), a tab-separated file that describes the variant data +For working with Cell metadata and Cell expression use the following example files: + +- [Study_metadata](https://bio-test-data.s3.us-east-1.amazonaws.com/User_guide_test_data/Single_cell_data/study_metadata.tsv), a tab-delimited file of the study attributes +- 
[Samples_metadata](https://bio-test-data.s3.us-east-1.amazonaws.com/User_guide_test_data/Single_cell_data/samples.tsv), a tab-delimited file of sample attributes +- [Cell_metadata](https://bio-test-data.s3.us-east-1.amazonaws.com/User_guide_test_data/Single_cell_data/cells_2_samples_full_match.tsv), a tab-delimited file of cell attributes +- [Cell_expression](https://bio-test-data.s3.us-east-1.amazonaws.com/User_guide_test_data/Single_cell_data/expression_2_cells_linked_to_samples.tsv), a tab-delimited file of cell expression data + Run the script with the above by typing the following (inserting your token instead of [token], note you may need to escape or quote strings depending on your specific command line interface): @@ -169,3 +183,18 @@ your specific command line interface): ```default odm-import-data --token [token] --host [HOST] --study https://s3.amazonaws.com/bio-test-data/odm/Test_1000g/Test_1000g.study.tsv --samples https://s3.amazonaws.com/bio-test-data/odm/Test_1000g/Test_1000g.samples.tsv --expression https://s3.amazonaws.com/bio-test-data/odm/Test_1000g/Test_1000g.gct --expression_metadata https://s3.amazonaws.com/bio-test-data/odm/Test_1000g/Test_1000g.gct.tsv --variant https://s3.amazonaws.com/bio-test-data/odm/Test_1000g/Test_1000g.vcf --variant_metadata https://s3.amazonaws.com/bio-test-data/odm/Test_1000g/Test_1000g.vcf.tsv ``` + +Script example (Study → Samples → Cells → Expression) + +```default +odm-import-data \ +--server \ +--token \ +--study 's3://bio-test-data/User_guide_test_data/Single_cell_data/study_metadata.tsv' \ +--samples 's3://bio-test-data/User_guide_test_data/Single_cell_data/samples.tsv' \ +--cells 's3://bio-test-data/User_guide_test_data/Single_cell_data/cells_2_samples_full_match.tsv' \ +--expression 's3://bio-test-data/User_guide_test_data/Single_cell_data/expression_2_cells_linked_to_samples.tsv' \ +--data-class 'Single-cell transcriptomics' \ +--number-of-feature-attributes 1 \ +--allow-duplicates +``` diff --git 
a/docs/user-guide/doc-odm-user-guide/single-cell.md b/docs/user-guide/doc-odm-user-guide/single-cell.md new file mode 100644 index 00000000..722121ae --- /dev/null +++ b/docs/user-guide/doc-odm-user-guide/single-cell.md @@ -0,0 +1,444 @@ +Single Cell data refers to molecular measurements obtained from individual cells, rather than bulk samples where +signals are averaged across many cells. This approach allows researchers to study the heterogeneity within a +cell population, uncovering differences in gene expression, epigenetic states, or protein abundance between cells. + +ODM now supports the Cell entity to store and manage metadata and expression for individual cells in Single Cell datasets. +Each cell record belongs to a Cell Group, which represents a single cell table (group). + +## Cell metadata and Cell expression in ODM +Cell metadata can be imported into ODM using the `job` endpoints and [odm_import_data script](../../tools/odm-sdk/terminal/study/uploading-study.md). +Only TSV file format is supported to upload cell metadata. + +### Uploading via API endpoints + +Let's upload a new Study with Samples, Cell metadata, and Cell expression. For data import, you should go to the `job` +section and choose the endpoint relevant for the specific data type. + +In this example we will upload the following files: + +[Study_metadata](https://bio-test-data.s3.us-east-1.amazonaws.com/User_guide_test_data/Single_cell_data/study_metadata.tsv), +a tab-delimited file of the study attributes: + +| Study Source | Study Source ID | Study Title | +|--------------|-----------------|-------------------------------------| +| S3 | EXP_S_9988 | Single Cell Expression Data Search | + +Import study as [described here](../doc-odm-user-guide/import-data-using-api.md/#import-study). 
+ +[Samples_metadata](https://bio-test-data.s3.us-east-1.amazonaws.com/User_guide_test_data/Single_cell_data/samples.tsv), +a tab-delimited file of sample attributes: + +| Sample Name | Sample Source ID | Sample Source | Sex | Age | Cell Type | Disease | +|-------------|------------------|---------------|--------|-----|-------------|----------| +| EXP_SN_8801 | EXP_SSID_8801 | S3 | female | 28 | EXP_CT_8801 | diabetes | +| EXP_SN_8802 | EXP_SSID_8802 | S3 | male | 29 | EXP_CT_8802 | melanoma | +| ... | ... | ... | ... | ... | ... | ... | + +Import samples as [described here](../doc-odm-user-guide/import-data-using-api.md/#import-samples). + +[Cell_metadata](https://bio-test-data.s3.us-east-1.amazonaws.com/User_guide_test_data/Single_cell_data/cells_2_samples_full_match.tsv), +a tab-delimited file of cell attributes: + +| barcode | sample_id | cell_type | treatment | protocol | cluster | n_counts | percent_mito | umap | pca | n_genes | doublet_scores | donor | organ | sort | method | file | assay | disease | organism | sex | development_stage | +|----------------|---------------|------------|------------|-------------|--------------------|----------|---------------|------------|----------|---------|----------------|---------|---------|---------|--------|-----------------|------------|----------|---------------|--------|-------------------| +| SMPL_CID_A101 | EXP_SSID_8801 | CD4_T_cell | stimulated | Smart-seq2 | Activated T cells | 12500 | 0.8 | -1.2,2.5 | 1.8,-0.7 | 2800 | 0.05 | DONOR_A | spleen | FACS_A | scRNA | SampleFile_A101 | Smart-seq2 | healthy | Homo sapiens | female | adult | +| SMPL_CID_A102 | EXP_SSID_8802 | NK_cell | resting | Smart-seq2 | Resting NK_cells | 8900 | 1.1 | 2.3,-1.8 | -0.9,2.1 | 2100 | 0.08 | DONOR_A | blood | FACS_A | scRNA | SampleFile_A102 | Smart-seq2 | healthy | Homo sapiens | male | adult | +| SMPL_CID_A103 | EXP_SSID_8803 | CD4_T_cell | stimulated | Smart-seq2 | Memory T cells | 15200 | 0.9 | -2.1,1.7 | 0.6,-1.9 | 3200 | 0.04 | 
DONOR_A | spleen | FACS_A | scRNA | SampleFile_A103 | Smart-seq2 | healthy | Homo sapiens | female | adult | +| SMPL_CID_A104 | EXP_SSID_8804 | CD8_T_cell | cytotoxic | Smart-seq2 | Cytotoxic T cells | 11800 | 1.2 | 1.9,-2.4 | -1.5,0.8 | 2900 | 0.07 | DONOR_A | blood | FACS_A | scRNA | SampleFile_A104 | Smart-seq2 | healthy | Homo sapiens | male | adult | +| SMPL_CID_A105 | EXP_SSID_8805 | CD8_T_cell | resting | Smart-seq2 | Naive CD8_T_cells | 9300 | 1.0 | -0.8,1.3 | 2.2,-1.1 | 2500 | 0.06 | DONOR_A | spleen | FACS_A | scRNA | SampleFile_A105 | Smart-seq2 | healthy | Homo sapiens | female | adult | + +For Cell metadata use the following endpoints: + +* Supply the file URL via dataLink + + **Path:** POST `/api/v1/jobs/import/cells` + +* Upload directly from TSV file + + **Path:** POST `/api/v1/jobs/import/cells/multipart` + +Import Cell metadata as [described here](../doc-odm-user-guide/import-data-using-api.md/#import-cell-metadata). + +[Cell_expression](https://bio-test-data.s3.us-east-1.amazonaws.com/User_guide_test_data/Single_cell_data/expression_2_cells_linked_to_samples.tsv), + a tab-delimited file of cell expression data: + +| gene_id | SMPL_CID_A101 | SMPL_CID_A102 | SMPL_CID_A103 | SMPL_CID_A104 | SMPL_CID_A105 | +|------------------|---------------|---------------|---------------|---------------|---------------| +| ENSG00000230368 | 1.01 | 1.02 | 1.03 | 1.04 | 1.05 | +| ENSG00000188976 | 2.01 | 2.02 | 2.03 | 2.04 | 2.05 | +| ACTB | 3.01 | 3.02 | 3.03 | 3.04 | 3.05 | + +For Cell expression use the following endpoints: + +* Supply the file URL via dataLink + + **Path:** POST `/api/v1/jobs/import/expression` + +* Upload directly from TSV file + + **Path:** POST `/api/v1/jobs/import/expression/multipart` + + **It is recommended to use TSV files archived in `.br` or `.lz4` extensions for Cell expression.** + +When the import job finishes successfully, the resulting Group accession can be retrieved with the following endpoint: +GET 
`/api/v1/jobs/{jobExecId}/output`. + +Example response: +```json +{ +"groupAccession": "GSF1234567" +} +``` +Learn more about [uploading data to ODM via API here](../doc-odm-user-guide/import-data-using-api.md). + +### Uploading via script + +Curators can upload and link Cell metadata groups to ODM using the [import_ODM_data script](../../tools/odm-sdk/terminal/study/uploading-study.md). +This extension allows you to include Cell groups in the same import workflow as other metadata entities (Studies, +Samples, Libraries, and Preparations), ensuring a consistent and automated data-loading process. + +#### Parameters + +The script supports an optional parameter for Cell metadata: `-c` `--cell` + +| Feature | Description | +| -------------------- | ------------------------------------------------ | +| **Parameter** | `--cell` / `-c` | +| **Input format** | TSV (same format as `/api/v1/jobs/import/cells`) | +| **Linking targets** | Samples, Libraries, or Preparations | +| **Multiple imports** | Supported in one run | +| **Error handling** | Aligned with Cell import endpoint | + +To upload Cell expression, use the regular `-e` `--expression` parameter. + +#### Supported Import Scenarios + +Cells can be imported and linked in several hierarchical contexts, depending on your dataset structure. Here are a few examples: + +1. **Study → Samples → Cells → Expression** + + Used when cells are directly associated with samples. + +2. **Study → Samples → Library → Cells → Expression** / **Study → Samples → Preparation → Library → Cells → Expression** + + Used when cells originate from library-level data. + +3. **Study → Samples → Preparations → Cells → Expression** / **Study → Samples → Library → Preparation → Cells → Expression** + + Used when cells originate from preparation-level data. + +Note that Cell metadata is linked to the nearest metadata group (Samples, Libraries, or Preparations) that precedes it in the script arguments. 
+ +#### Script example (Study → Samples → Cells → Expression) + +``` +odm-import-data \ +--server \ +--token \ +--study 's3://bio-test-data/User_guide_test_data/Single_cell_data/study_metadata.tsv' \ +--samples 's3://bio-test-data/User_guide_test_data/Single_cell_data/samples.tsv' \ +--cells 's3://bio-test-data/User_guide_test_data/Single_cell_data/cells_2_samples_full_match.tsv' \ +--expression 's3://bio-test-data/User_guide_test_data/Single_cell_data/expression_2_cells_linked_to_samples.tsv' \ +--data-class 'Single-cell transcriptomics' \ +--number-of-feature-attributes 1 \ +--allow-duplicates +``` + +### Common rules for TSV files with Cell metadata + +#### Stored attributes and limitations +The attributes listed below are parsed and stored by the system. + +All other values present in the Cell metadata file are stored as custom attributes with the string data type. + +| Attribute Name | Stored as type | Description | Required | +|----------------|----------------|--------------------------------------------------------------------------------------------------------------|----------| +| cellID | string | Unique cell identifier generated by ODM (composite key of `groupAccession` + `barcode`) | Yes | +| barcode | string | Raw cell barcode. **Must be unique**. | Yes | +| batch | string | Sample/batch origin | Yes | +| cellType | string | Annotated cell type | | +| cluster | string | Clustering labels | | +| nCounts | integer | Total UMI count (Unique Molecular Identifier) | | +| percentMito | float | % mitochondrial gene expression | | +| umap | float | Dimensionality reduction results (Uniform Manifold Approximation and Projection). Up to 3 values are stored. | | +| pca | float | Dimensionality reduction results (Principal Component Analysis results). Up to 100 values are stored. | | +| tsne | float | Dimensionality reduction results (t-distributed Stochastic Neighbor Embedding). Up to 3 values are stored. 
| | + +#### Validation + +Fail conditions: + +* Missing required attributes (`barcode`, `batch`) +* Duplicate barcodes within a group +* Blank values in required attributes + +Warnings (ignored values): + +* Invalid data type for attribute + +### Linking Cell metadata to Samples, Libraries, Preparations + +#### Common rules + +To link Cell metadata to other metadata groups use the following endpoints: + +**Swagger definition:** `integrationCurator` → `Cell integration as Curator` + +* Link to Samples + + **Path:** POST `/api/v1/as-curator/integration/link/cell/group/{sourceId}/to/sample/group/{targetId}` + +* Link to Libraries + + **Path:** POST `/api/v1/as-curator/integration/link/cells/group/{sourceId}/to/library/group/{targetId}` + +* Link to Preparations + + **Path:** POST `/api/v1/as-curator/integration/link/cells/group/{sourceId}/to/preparation/group/{targetId}` + +For `sourceId` field provide accession of your Cell metadata group. +For `targetId` field provide accession of selected Sample, Library, or Preparation group where Cell metadata should be linked. + +Cell metadata will be linked if there are matches between `batch` values in Cell metadata and `Sample Source ID` for Samples, +`Library ID` for Libraries, and `Preparation ID` for Preparations. + +#### Validation + +Fail conditions: + +* There is no Sample Source/Library/Preparation ID in Sample/Library/Preparation metadata group. +* There are no matches between `batch` in Cell metadata and Sample Source/Library/Preparation IDs. + +The amount of successfully created links between Cells and Samples/Libraries/Preparations will be shown in response +message if linkage is successful. 
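The matching rule and fail conditions described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not ODM's implementation: the helper name `count_batch_links`, the in-memory row format, and the `batch` values in the sample rows are hypothetical; only the `batch` attribute and the two fail conditions come from the text.

```python
def count_batch_links(cell_rows, target_ids):
    """Count cells whose `batch` value matches an ID in the target group.

    cell_rows: list of dicts, one per cell, each expected to carry a "batch" key.
    target_ids: Sample Source IDs, Library IDs, or Preparation IDs of the target group.
    Hypothetical helper for illustration only.
    """
    # Fail condition 1: the target group has no usable IDs at all.
    ids = {i for i in target_ids if i}
    if not ids:
        raise ValueError("target group has no IDs to link against")

    # A cell is linked when its `batch` value equals one of the target IDs.
    linked = sum(1 for row in cell_rows if row.get("batch") in ids)

    # Fail condition 2: no `batch` value matches any target ID.
    if linked == 0:
        raise ValueError("no `batch` value matches any target ID")
    return linked


# Illustrative rows; the `batch` values here are made up for the example.
cells = [
    {"barcode": "SMPL_CID_A101", "batch": "EXP_SSID_8801"},
    {"barcode": "SMPL_CID_A102", "batch": "EXP_SSID_8802"},
    {"barcode": "SMPL_CID_A103", "batch": "EXP_SSID_9999"},
]
print(count_batch_links(cells, ["EXP_SSID_8801", "EXP_SSID_8802"]))
```

Running a check like this on the TSV before calling the link endpoint may help catch a mismatched `batch` column early; the actual count of created links is reported by the endpoint itself.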
+ +### Linking Cell expression to Cell metadata + +To link Cell expression to a Cell metadata group, use the following endpoint: + +**Swagger definition:** `integrationCurator` → `Expression integration as Curator` + +**Path:** POST `/api/v1/as-curator/integration/link/expression/group/{sourceId}/to/cell/group/{targetId}` + +For the `sourceId` field, provide the accession of your Cell expression group. + +For the `targetId` field, provide the accession of the Cell metadata group to which the Cell expression should be linked. + +A Cell expression group can be linked to one Cell metadata group only. + +## [BETA] Analytics + +### Cell ratio +Compute cell ratio statistics across groups or metadata attributes in single-cell data. +The endpoint quantifies the proportion of cells that meet specific criteria (`countSelected`, e.g., expression +threshold, cell type, or cluster) relative to a defined reference group or the total cell population +(`countAvailable`) defined by study, samples, library, or preparation metadata. + +**Swagger definition:** `integrationCurator` → `[BETA] Analytics omics queries as Curator` + +**Path:** POST `/api/v1/as-curator/omics/cells/analytics/cell-ratio` + +The Cell Ratio endpoint computes a simple proportion: + +* `countSelected` = number of cells that match all provided criteria (study/sample/library/preparation + cell metadata + optional expression constraints) +* `countAvailable` = number of cells in the reference population defined **only** by study/sample/library/preparation queries & filters +* `ratio` = `countSelected` / `countAvailable` + +This endpoint returns **counters only** (no cell records). 
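The proportion described above is plain division of the two counters. The Python sketch below reproduces it client-side, for example to recompute a ratio from stored counters. The function name `cell_ratio` and the handling of an empty reference population are assumptions; the text does not say what the endpoint returns in that case.

```python
def cell_ratio(count_selected, count_available):
    """Return count_selected / count_available.

    Hypothetical helper: the guards below are assumptions for the sketch,
    not documented endpoint behaviour.
    """
    if count_available <= 0:
        raise ValueError("reference population is empty")
    # Selected cells are a subset of the reference population by definition.
    if not 0 <= count_selected <= count_available:
        raise ValueError("count_selected must be between 0 and count_available")
    return count_selected / count_available


print(cell_ratio(25, 200))  # 0.125
```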
+ +Use it when you want to answer questions like: + +* “What fraction of cells in `Study X` are `Monocytes`?” +* “Within samples matching `Clozapine`, what proportion of cells have expression in a given range?” +* “Among cells from a specific library/preparation, what fraction match a cell metadata definition?” + +Request example: +```json +{ + "cellGroup": { + "studyFilter": "\"Study Source\"=ArrayExpress", + "studyQuery": "RNA-Seq of human dendritic cells", + "sampleFilter": "\"Species or strain\"=\"Homo sapiens\"", + "sampleQuery": "Clozapine", + "libraryFilter": "\"Library Type\"=RNA-Seq-1", + "libraryQuery": "illumina HiSeq500", + "preparationFilter": "Digestion=Trypsin", + "preparationQuery": "reversed-phase liquid chromatography", + "cellQuery": "cellType=Macrophage,Monocyte", + "searchSpecificTerms": false + }, + "exQuery": "-3 < value < 3" +} +``` +Response example: +```json +{ + "countSelected": 1243393, + "countAvailable": 9234945, + "ratio": 0.13464 +} +``` +### Gene summary +The Gene Summary endpoint returns **descriptive statistics and distribution summaries** for expression values of up to +**100 genes** across a filtered set of single cells. + +Use it when you want quick “what does this gene look like in these cells?” metrics: +mean/median, spread, quantiles, min/max, and a histogram-style density summary. 
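To make these metrics concrete, they can be reproduced locally for a small list of expression values. This is only a sketch under assumptions, not the endpoint's implementation: `summarize` is a hypothetical helper, the input values are invented, the standard deviation is taken as the population standard deviation, and the quantiles are assumed to be the minimum, the nine deciles, and the maximum. The service may define any of these differently.

```python
import statistics


def summarize(values):
    """Compute descriptive statistics for one gene's expression values."""
    ordered = sorted(values)
    # n=10 yields the 9 decile cut points; min and max are added to give 11 values.
    deciles = statistics.quantiles(ordered, n=10, method="inclusive")
    return {
        "cellCount": len(ordered),
        "mean": statistics.fmean(ordered),
        "median": statistics.median(ordered),
        "stdDev": statistics.pstdev(ordered),  # assumption: population standard deviation
        "min": ordered[0],
        "max": ordered[-1],
        "quantiles": [ordered[0], *deciles, ordered[-1]],
    }


# Hypothetical expression values for a single gene.
summary = summarize([1, 2, 3, 4, 5])
print(summary["mean"], summary["median"], summary["min"], summary["max"])
```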
+ +**Swagger definition:** `integrationCurator` → `[BETA] Analytics omics queries as Curator` + +**Path:** POST `/api/v1/as-curator/omics/cells/analytics/gene-summary` + +For each requested gene, the response includes: + +* `geneId`: gene identifier (e.g., Ensembl ID) +* `cellCount`: number of cells with measurable expression for the gene under the applied filters +* `mean`: average expression value +* `median`: median expression value +* `stdDev`: standard deviation (dispersion) +* `min` / `max`: observed range of expression values +* `quantiles`: expression percentiles (configurable set of percentiles; returned as an ordered list of values) +* `histogram` (density): binned distribution summary suitable for plotting expression density + +Request example: +```json +{ + "cellGroup": { + "studyFilter": "\"Study Source\"=ArrayExpress", + "studyQuery": "RNA-Seq of human dendritic cells", + "sampleFilter": "\"Species or strain\"=\"Homo sapiens\"", + "sampleQuery": "Clozapine", + "libraryFilter": "\"Library Type\"=RNA-Seq-1", + "libraryQuery": "illumina HiSeq500", + "preparationFilter": "Digestion=Trypsin", + "preparationQuery": "reversed-phase liquid chromatography", + "cellQuery": "cellType=Macrophage,Monocyte", + "searchSpecificTerms": false + }, + "geneNames": [ + "ENSG00000230368", + "ENSG00000188976", + "ENSG00000188982" + ], + "exQuery": "-3 < value < 3" +} +``` +Response example: +```json +{ + "resultsPerGene": [ + { + "geneId": "ENSG00000111640", + "cellCount": 8968167, + "mean": 7.747614311820911, + "median": 7, + "stdDev": 6.499314669429827, + "min": 1, + "max": 496, + "quantiles": [ + 1, + 1, + 2, + 3, + 5, + 7, + 10, + 12, + 15, + 27, + 192 + ], + "histogram": "[(1, 15.50289002318, 7686678.375), (15.50289002318, 35.49570418233824, 1229164),\n(35.49570418233824, 56.93121325335453, 36531.25), (56.93121325335453, 77.21467372919479, 6910.625)]\n" + } + ] +} +``` + +### Differential expression +The Differential Expression endpoint compares gene expression between 
two cell populations: +a `Case` group and a `Control` group. It returns per-gene metrics that quantify how strongly expression +differs between the two groups, including **fold change** and **Mann–Whitney U test** results. + +**Swagger definition:** `integrationCurator` → `[BETA] Analytics omics queries as Curator` + +**Path:** POST `/api/v1/as-curator/omics/cells/analytics/differential-expression` + +Use it to answer questions like: + +* “Which genes are upregulated in `Monocytes` vs all other cells?” +* “Which genes differ between case samples and control samples within the same study?” +* “What changes under a treatment condition vs untreated controls?” + +Calculations for each returned `geneId`: + +* `caseCellCount`: number of case cells contributing measurable expression for that gene +* `controlCellCount`: number of control cells contributing measurable expression for that gene +* `caseAvgEx`: mean expression across contributing case cells +* `controlAvgEx`: mean expression across contributing control cells +* `expressionDifference`: `caseAvgEx` - `controlAvgEx` +* `foldChange`: `caseAvgEx` / `controlAvgEx` +* `mannWhitneyU` / `pValue`: Mann–Whitney U test outputs (as implemented by the ClickHouse `mannWhitneyUTest` function) +* `log2FC`: the fold change expressed on a base-2 logarithmic scale + +If you apply `exQuery` expression thresholds, only cells/expression values that satisfy those rules contribute to the counts and averages. 
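The three derived metrics follow directly from the two group averages. The sketch below illustrates the arithmetic only: `differential_metrics` is a hypothetical helper, the averages are made-up values, and returning `None` when a ratio is undefined is an assumption, because the endpoint's behaviour for a zero control average is not described here. The Mann–Whitney U test is not reproduced.

```python
import math


def differential_metrics(case_avg, control_avg):
    """Derive expressionDifference, foldChange and log2FC from two group averages."""
    difference = case_avg - control_avg
    # Assumption: undefined ratios are reported as None in this sketch.
    fold_change = case_avg / control_avg if control_avg != 0 else None
    log2_fc = math.log2(fold_change) if fold_change is not None and fold_change > 0 else None
    return {
        "expressionDifference": difference,
        "foldChange": fold_change,
        "log2FC": log2_fc,
    }


# Hypothetical averages: case cells average 1.24, control cells average 0.62.
print(differential_metrics(1.24, 0.62))
```

A fold change of 2 corresponds to a `log2FC` of 1, and a fold change below 1 gives a negative `log2FC`, which makes up- and downregulation symmetric around zero.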
+ +Request example: +```json +{ + "caseGroup": { + "studyFilter": "\"Study Source\"=ArrayExpress", + "studyQuery": "RNA-Seq of human dendritic cells", + "sampleFilter": "\"Species or strain\"=\"Homo sapiens\"", + "sampleQuery": "Clozapine", + "libraryFilter": "\"Library Type\"=RNA-Seq-1", + "libraryQuery": "illumina HiSeq500", + "preparationFilter": "Digestion=Trypsin", + "preparationQuery": "reversed-phase liquid chromatography", + "cellQuery": "cellType=Macrophage,Monocyte", + "searchSpecificTerms": false + }, + "controlGroup": { + "studyFilter": "\"Study Source\"=ArrayExpress", + "studyQuery": "RNA-Seq of human dendritic cells", + "sampleFilter": "\"Species or strain\"=\"Homo sapiens\"", + "sampleQuery": "Clozapine", + "libraryFilter": "\"Library Type\"=RNA-Seq-1", + "libraryQuery": "illumina HiSeq500", + "preparationFilter": "Digestion=Trypsin", + "preparationQuery": "reversed-phase liquid chromatography", + "cellQuery": "cellType=Macrophage,Monocyte", + "searchSpecificTerms": false + }, + "exQuery": "feature=ENSG00000230368,ENSG00000188976", + "limit": 2000, + "offset": 0 +} +``` +Response example: +```json +{ + "resultsPerGene": [ + { + "geneId": "ENSG00000230368", + "caseCellCount": 8450, + "controlCellCount": 8123, + "caseAvgExpression": 1.24, + "controlAvgExpression": 0.62, + "expressionDifference": 0.62, + "foldChange": 2, + "mannWhitneyU": 1.5, + "pValue": 0.95 + } + ], + "pagination": { + "currentResultsCount": 1, + "limit": 2000, + "offset": 0 + } +} +``` + +## Delete Cell metadata and Cell expression + +Please use [manage-data/data endpoint](../../user-guide/quick-start/admin-api.md/#use-case-example-delete-data-in-odm) to delete Cell metadata or Cell expression group. diff --git a/docs/user-guide/index.md b/docs/user-guide/index.md index dfef7a21..618d7618 100644 --- a/docs/user-guide/index.md +++ b/docs/user-guide/index.md @@ -145,3 +145,4 @@ Want to know more? Learn more by watching our videos below. 
* [Cross-reference mapping file](doc-odm-user-guide/supported-formats.md#cross-reference-mapping-file) * [Libraries file](doc-odm-user-guide/supported-formats.md#libraries-file) * [Preparations file](doc-odm-user-guide/supported-formats.md#preparations-file) +* [Working with Single Cell Data](doc-odm-user-guide/single-cell.md) diff --git a/mkdocs.yml b/mkdocs.yml index 90bf0053..413d5265 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -48,6 +48,7 @@ nav: - Getting a Genestack API token: user-guide/doc-odm-user-guide/getting-a-genestack-api-token.md - Getting Access Token (Azure): user-guide/doc-odm-user-guide/getting-access-token-azure.md - Supported File Formats: user-guide/doc-odm-user-guide/supported-formats.md + - Working with Single Cell Data: user-guide/doc-odm-user-guide/single-cell.md - Attachments transformation: user-guide/doc-odm-user-guide/attachment-transformation.md - Access Control: - Users: access-control/users.md