User guide for working with Cell metadata and expression #177

Merged

genestack-okunitsyn merged 23 commits into develop from feature/single-cell-guide

Feb 5, 2026

docs/tools/odm-sdk/terminal/study/uploading-study.md

@@ -34,6 +34,7 @@ odm-import-data -h
         - `-sm,--samples`: URL of the samples file or accession of existing samples file to be linked
         - `-lb, --libraries`: URL of the libraries file or accession of existing libraries file to be linked
         - `-pr, --preparations`: URL of hosted preparations file or accession of existing preparations file to be linked
+        - `-c, --cell`: URL of hosted cell metadata file or accession of existing cell file to be linked
         - `-e,--expression`: URL of any tabular data file (not only expression data) except Gene Variant or Flow Cytometry
         - `-em,--expression-metadata`: URL of any tabular metadata file (not only expression data) except Gene Variant or Flow Cytometry
         - `-v, --variant`: URL of the variants data file
@@ -73,7 +74,7 @@ Additional optional parameters:
     ## Data model
-    The script supports 2 data models:
+    The script supports several data models:
     ![Data Model](uploading-study/data-model.png)
     - Study - Samples - Omics data:
@@ -82,6 +83,9 @@ The script supports 2 data models:
         - the script uses this data model if parameters for libraries or preparations loading are specified;
         - omics data can be linked only to libraries or preparations;
         - only expression data (the parameters --expression and --expression-metadata) is supported.
+    - Study - Samples - (optional: Libraries/Preparations) - Cell metadata - Omics data:
+        - the script uses this data model if the parameter for cell metadata loading is specified;
+        - expression data can be linked to cell metadata.
     The script works sequentially, linking the object with the previous one according to the data model. Below you can find
     examples to demonstrate different combinations:
@@ -141,6 +145,25 @@ odm-import-data --token [token] -H [HOST] \
     - `preparations_1` will be linked to `samples_2`
     - `expression_1` will be linked to `preparations_1`
+    ### _Example 4_
+    ```shell
+    odm-import-data --token [token] -H [HOST] \
+      --study http://data_source/study.csv \
+      --samples http://data_source/samples_1.csv \
+      --samples http://data_source/samples_2.csv \
+      --libraries http://data_source/libraries_1.csv \
+      --cell http://data_source/cell_1.csv \
+      --expression http://data_source/expression_1.gct \
+      --expression-metadata http://data_source/expression_metadata_1.gct.tsv
+    ```
+    - `samples_1` will be linked to `study`
+    - `samples_2` will be linked to `study`
+    - `libraries_1` will be linked to `samples_2`
+    - `cell_1` will be linked to `libraries_1`
+    - `expression_1` will be linked to `cell_1`
     ## Link all to all
     The `-lata` parameter allows bypassing the restriction on sequential linking of objects. The behaviour of the script
@@ -355,3 +378,31 @@ odm-import-data --token [token] -H [HOST] \
       --samples http://data_source/arabidopsis_sample_metadata_uncurated.tsv \
       --expression http://data_source/arabidopsis.gct
     ```
+    ### Study with single cell data
+    To work with Cell metadata and Cell expression, use the following example files:
+    - [Study_metadata](https://bio-test-data.s3.us-east-1.amazonaws.com/User_guide_test_data/Single_cell_data/study_metadata.tsv), a tab-delimited file of the study attributes
+    - [Samples_metadata](https://bio-test-data.s3.us-east-1.amazonaws.com/User_guide_test_data/Single_cell_data/samples.tsv), a tab-delimited file of sample attributes
+    - [Cell_metadata](https://bio-test-data.s3.us-east-1.amazonaws.com/User_guide_test_data/Single_cell_data/cells_2_samples_full_match.tsv), a tab-delimited file of cell attributes
+    - [Cell_expression](https://bio-test-data.s3.us-east-1.amazonaws.com/User_guide_test_data/Single_cell_data/expression_2_cells_linked_to_samples.tsv), a tab-delimited file of cell expression data
+    Run the script with the files above, replacing `<HOST>` and `<TOKEN>` with your
+    own values (you may need to escape or quote strings depending on your
+    command line interface).
+    Script example (Study → Samples → Cells → Expression):
+    ```default
+    odm-import-data \
+    --server <HOST> \
+    --token <TOKEN> \
+    --study 's3://bio-test-data/User_guide_test_data/Single_cell_data/study_metadata.tsv' \
+    --samples 's3://bio-test-data/User_guide_test_data/Single_cell_data/samples.tsv' \
+    --cells 's3://bio-test-data/User_guide_test_data/Single_cell_data/cells_2_samples_full_match.tsv' \
+    --expression 's3://bio-test-data/User_guide_test_data/Single_cell_data/expression_2_cells_linked_to_samples.tsv' \
+    --data-class 'Single-cell transcriptomics' \
+    --number-of-feature-attributes 1 \
+    --allow-duplicates
+    ```

docs/tools/odm-sdk/terminal/study/uploading-study/data-model.png


...ide/doc-odm-user-guide/doc-odm-user-guide/images/data-model+metainfo-editor.png

Binary file not shown.

docs/user-guide/doc-odm-user-guide/doc-odm-user-guide/images/data-model.png


docs/user-guide/doc-odm-user-guide/doc-odm-user-guide/images/data_model.png

Binary file not shown.

docs/user-guide/doc-odm-user-guide/import-data-using-api.md

            
    @@ -19,11 +19,12 @@ You can import studies, samples, and any data in the tabular format:
  
    - **Study**: the context of an experiment, such as the aim and statistical design.

    - **Sample**: the biological attributes of a sample, such as tissue, disease, and treatment.

    - **Data**: Includes transcriptomics, proteomics, gene variant, flow cytometry data, and more. You can import the metadata (e.g. genome version, normalization

      method, and the locations of raw/processed data in your storage) together with the processed data (e.g. expression counts, genotypes).

    - **Cross-reference mapping**: a list of transcript and gene ids and how they map to each other.

    - **Libraries metadata**: TSV file describing sequencing libraries or other indexable data types. It includes information on library preparation, type (e.g., single-end or paired-end), protocol, barcodes, and platform.

    - **Preparations metadata**: metadata describing how samples were prepared prior to data generation, applicable to proteomics, transcriptomics, and other data types.

    - **Cell metadata**: per-cell (per-barcode) information that describes a cell and its context, separate from the molecular measurements themselves (such as the gene expression count matrix, which should be uploaded to ODM as expression data).

    - **Data**: Includes transcriptomics, proteomics, gene variant, flow cytometry data, cell expression, and more. You can import the metadata (e.g. genome version, normalization

      method, and the locations of raw/processed data in your storage) together with the processed data (e.g. expression counts, genotypes).

    - **Cross-reference mapping**: a list of transcript and gene ids and how they map to each other.

    - **Attached Files**: Supplement your study by attaching related research materials like PDF, XLSX, DOCX, PPTX files, images, and more. Please note, contents of these attached files won't be indexed or made searchable.

    Once imported, studies, samples, and data metadata will be queryable and editable from both the User Interface and APIs, whilst the signal data will only be queryable via APIs.

    @@ -35,17 +36,18 @@ Importing data has two stages. First, you import studies, samples, and data sepa
  
    The **Sample Source ID** is used as the default linking key. You can choose another attribute from the template for linking data to samples. The data model and how it looks in the User Interface is shown below.

    In addition to core data types, **Libraries** and **Preparations** require special handling. These files must include the **Sample Source ID**, which is used to link them to the appropriate samples. 

    In addition to core data types, **Libraries**, **Preparations**, and **Cell metadata** require special handling. These files must include the **Sample Source ID**, which is used to link them to the appropriate samples.

    The correct order of linking follows the system logic and available endpoints:

    - **Samples** are linked to a **Study**

    - **Libraries** and **Preparations** are linked to **Samples**

    - **Omics data** (e.g. transcriptomics, proteomics) are linked to **Samples**, or to **Libraries/Preparations** depending on the data type

    - **Cell metadata** is linked to **Samples**, **Libraries**, or **Preparations**

    - **Omics data** (e.g. transcriptomics, proteomics, cell expression) are linked to **Samples**, or to **Libraries/Preparations**, or to **Cell metadata** depending on the data type

    - **Attached files** are linked directly to a **Study**

    ![image](doc-odm-user-guide/images/data-model+metainfo-editor.png)

    ![image](doc-odm-user-guide/images/data-model.png)

    ## Data Loading via APIs

    To load the data via APIs each entity is created via a separate endpoint specific for

    this data type. Then they are sequentially linked in the Integration layer.

    @@ -271,6 +273,51 @@ As soon as the import process will be completed, you will be able to get the pre
  
      }

    }

    ```

    ### Import Cell metadata

    To work with Cell metadata and Cell expression, use the following example files:

    - [Study_metadata](https://bio-test-data.s3.us-east-1.amazonaws.com/User_guide_test_data/Single_cell_data/study_metadata.tsv), a tab-delimited file of the study attributes

    - [Samples_metadata](https://bio-test-data.s3.us-east-1.amazonaws.com/User_guide_test_data/Single_cell_data/samples.tsv), a tab-delimited file of sample attributes

    - [Cell_metadata](https://bio-test-data.s3.us-east-1.amazonaws.com/User_guide_test_data/Single_cell_data/cells_2_samples_full_match.tsv), a tab-delimited file of cell attributes

    - [Cell_expression](https://bio-test-data.s3.us-east-1.amazonaws.com/User_guide_test_data/Single_cell_data/expression_2_cells_linked_to_samples.tsv), a tab-delimited file of cell expression data

    To import Cell metadata, use the `POST /api/v1/jobs/import/cells` endpoint:

    ```default
    curl -X 'POST' \
      'https://<HOST>/api/v1/jobs/import/cells?allow_dups=false' \
      -H 'accept: application/json' \
      -H 'Genestack-API-Token: <TOKEN>' \
      -H 'Content-Type: application/json' \
      -d '{
      "dataLink": "https://bio-test-data.s3.us-east-1.amazonaws.com/User_guide_test_data/Single_cell_data/cells_2_samples_full_match.tsv"
    }'
    ```

    Similar to the previous step, you should see the **jobExecId** in the response:

    ```json
    {
      "jobExecId": 24,
      "startedBy": "job@genestack.com",
      "jobName": "IMPORT_CELLS",
      "status": "COMPLETED",
      "createTime": "2026-02-05 11:35:36",
      "endTime": "2026-02-05 11:35:38"
    }
    ```
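
When the import is scripted, the **jobExecId** can be captured from the response and reused in later requests. The following is a minimal sketch, assuming the response has the shape shown above; the function name `extract_job_exec_id` is hypothetical and not part of ODM, and the sample response is an abbreviated copy of the example.

```shell
# Hypothetical helper (not part of ODM): read an import response on stdin
# and print the numeric jobExecId.
extract_job_exec_id() {
  tr -d '\n' | sed -n 's/.*"jobExecId"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Abbreviated copy of the example response shown above.
response='{
  "jobExecId": 24,
  "startedBy": "job@genestack.com",
  "jobName": "IMPORT_CELLS",
  "status": "COMPLETED"
}'

job_exec_id=$(printf '%s' "$response" | extract_job_exec_id)
echo "$job_exec_id"
```

In practice the output of the `curl` call would be piped into the helper instead of the hard-coded sample.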

    Once the import process has completed, you can get the Cell metadata **groupAccession** by passing the **jobExecId** to the `GET /api/v1/jobs/{jobExecId}/output` endpoint:

    ```json
    {
      "status": "COMPLETED",
      "result": {
        "groupAccession": "GSF016786"
      }
    }
    ```
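
Because the import runs asynchronously, a script may need to poll the output endpoint before the **groupAccession** is available. The sketch below is one possible approach, not ODM's own tooling: the helper names, the retry count, the 5-second delay, and the assumption that an unfinished job returns a `status` other than `COMPLETED` are all illustrative. `HOST` and `TOKEN` are expected to be set by the caller.

```shell
# Hypothetical helpers (not part of ODM).

# Print the value of a string field, given by name, from JSON on stdin.
json_string_field() {
  tr -d '\n' | sed -n 's/.*"'"$1"'"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Poll the job output endpoint until the job reports COMPLETED, then print
# the groupAccession. Gives up after a number of attempts (default 60).
wait_for_group_accession() {
  job_exec_id=$1
  attempts=${2:-60}
  while [ "$attempts" -gt 0 ]; do
    output=$(curl -s "https://${HOST}/api/v1/jobs/${job_exec_id}/output" \
      -H 'accept: application/json' \
      -H "Genestack-API-Token: ${TOKEN}")
    if [ "$(printf '%s' "$output" | json_string_field status)" = "COMPLETED" ]; then
      printf '%s' "$output" | json_string_field groupAccession
      return 0
    fi
    attempts=$((attempts - 1))
    sleep 5
  done
  return 1
}

# Parsing check against the example output shown above (no network needed).
sample_output='{ "status": "COMPLETED", "result": { "groupAccession": "GSF016786" } }'
group_accession=$(printf '%s' "$sample_output" | json_string_field groupAccession)
echo "$group_accession"
```

A call such as `wait_for_group_accession 24` would then print the accession once the job with **jobExecId** 24 finishes.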

    ### Linking entities

    #### Samples to Study

    @@ -328,6 +375,31 @@ If successful you will see a preparation tab appear in the Metadata Editor:
  
    ![image](doc-odm-user-guide/images/preparation-added.png)

    #### Cell metadata to Samples/Libraries/Preparations

    You can link the **Cell metadata group** to the **samples/libraries/preparation groups** using the following endpoints:

    * Link to Samples

        **Path:** POST `/api/v1/as-curator/integration/link/cell/group/{sourceId}/to/sample/group/{targetId}`

    * Link to Libraries

        **Path:** POST `/api/v1/as-curator/integration/link/cells/group/{sourceId}/to/library/group/{targetId}`

    * Link to Preparations

        **Path:** POST `/api/v1/as-curator/integration/link/cells/group/{sourceId}/to/preparation/group/{targetId}`

    For the `sourceId` field, provide the accession of your Cell metadata group.

    For the `targetId` field, provide the accession of the Sample, Library, or Preparation group to which the Cell metadata should be linked.
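
As an illustration, the request URL for linking to a Library group can be assembled from the two accessions. This is a sketch only: the helper name `cells_to_library_link_url`, the host `odm.example.com`, and the Library accession `GSF000001` are placeholders, while the path is copied from the Libraries endpoint listed above.

```shell
# Hypothetical helper (not part of ODM): build the request URL for linking a
# Cell metadata group (sourceId) to a Library group (targetId).
cells_to_library_link_url() {
  host=$1
  source_id=$2
  target_id=$3
  printf 'https://%s/api/v1/as-curator/integration/link/cells/group/%s/to/library/group/%s\n' \
    "$host" "$source_id" "$target_id"
}

# GSF016786 is the Cell metadata accession from the example job output;
# odm.example.com and GSF000001 are placeholders.
url=$(cells_to_library_link_url 'odm.example.com' 'GSF016786' 'GSF000001')
echo "$url"

# To send the request, replace <TOKEN> and run:
# curl -X POST "$url" -H 'accept: application/json' -H 'Genestack-API-Token: <TOKEN>'
```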

    Cell metadata is linked when its `batch` values match the `Sample Source ID` for Samples,

    the `Library ID` for Libraries, or the `Preparation ID` for Preparations.

    If linking succeeds, the Cells can be retrieved via the `GET /api/v1/as-curator/omics/cells` endpoint by providing the Study accession in the `studyQuery` parameter.
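
Since linking depends on `batch` values matching the target IDs, it can help to check the files locally before importing. The sketch below assumes both files are tab-delimited with a header row, that the cell file has a column named exactly `batch`, and that the samples file has a column named exactly `Sample Source ID`; the helper name and the demo rows (including the `barcode` and `Tissue` columns) are made up for illustration.

```shell
# Hypothetical helper (not part of ODM): print each distinct value of the
# batch column in a cell metadata TSV that has no matching
# Sample Source ID value in a samples TSV.
unmatched_batches() {
  cells_tsv=$1
  samples_tsv=$2
  awk -F '\t' '
    FNR == 1 {
      # Header row of each file: locate the key column by name.
      col = 0
      want = (NR == FNR) ? "Sample Source ID" : "batch"
      for (i = 1; i <= NF; i++) if ($i == want) col = i
      next
    }
    # First file (samples): remember every Sample Source ID.
    NR == FNR { if (col) known[$col] = 1; next }
    # Second file (cells): report each unknown batch value once.
    col && !($col in known) && !seen[$col]++ { print $col }
  ' "$samples_tsv" "$cells_tsv"
}

# Demo with made-up data: batch S3 has no matching sample.
tmpdir=$(mktemp -d)
printf 'Sample Source ID\tTissue\nS1\tliver\nS2\tlung\n' > "$tmpdir/samples.tsv"
printf 'barcode\tbatch\nAAAC\tS1\nAAAG\tS3\nAAAT\tS3\n' > "$tmpdir/cells.tsv"
missing=$(unmatched_batches "$tmpdir/cells.tsv" "$tmpdir/samples.tsv")
echo "$missing"
rm -r "$tmpdir"
```

An empty result means every `batch` value has a matching sample; the same idea applies to `Library ID` or `Preparation ID` by changing the column name.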

    ### Working with the jobExecId

    The following endpoints allow you to manage and inspect jobs using the jobExecId, which is returned after initiating an asynchronous import task.

    @@ -937,4 +1009,3 @@ Example response:
  
      ]

    }

    ```

docs/user-guide/doc-odm-user-guide/import-data-using-python-script.md

            
    @@ -8,7 +8,7 @@ to be able to import and edit data in ODM.
  
    Read the full list of requirements [here](../../../tools/odm-sdk/terminal/study/uploading-study/#requirements)

    ## Optional experimental (signal) data files

    ## Optional files

    You can optionally also provide:

    @@ -17,7 +17,8 @@ You can optionally also provide:
  
    - The server address if you want to apply the script to a different ODM server.

      Use `--host <HOST>` to specify.

    - Any data in the Tabular format (Data Frame) as a TSV, hosted at an HTTPS web address

    - Gene expression data in [GCT](https://software.broadinstitute.org/cancer/software/gsea/wiki/index.php/Data_formats#GCT:_Gene_Cluster_Text_file_format_.28.2A.gct.29) format, hosted at an HTTPS web address

    - Gene expression data in [GCT](https://docs.gsea-msigdb.org/#GSEA/Data_Formats/#gct-gene-cluster-text-file-format-gct) format, hosted at an HTTPS web address

    - Gene expression or Cell expression data in TSV format, hosted at an HTTPS web address

    - Gene expression metadata in TSV format, hosted at an HTTPS web address

    - Gene variant data in [VCF](https://samtools.github.io/hts-specs/VCFv4.2.pdf) format, hosted at an HTTPS web address

    - Gene variant metadata in TSV format, hosted at an HTTPS web address

    @@ -29,9 +30,10 @@ You can optionally also provide:
  
    - A libraries file in TSV format, hosted at an HTTPS web address, or the

      accession of an existing library file

    - A preparations file in TSV format, hosted at an HTTPS web address, or the

      accession of an existing preparations file.

      accession of an existing preparations file

    - A Cell metadata file in TSV format, hosted at an HTTPS web address

    Once imported, studies, samples, and signal metadata will be queryable and

    Once imported, studies, samples, libraries, preparations, cell metadata, and signal metadata will be queryable and

    editable from both the User Interface and APIs, whilst the signal data will

    only be queryable via APIs.

    @@ -87,6 +89,11 @@ Optionally include data files by appending any or all of the following to the ab
  
    ```default

    --preparations [URL]

    ```

    ```default

    --cell [URL]

    ```

    ## Importing Multiple Tabular Files

    - [Test_basic_generic_expression.tsv](https://bio-test-data.s3.us-east-1.amazonaws.com/odm/user-guide/Test_basic_generic_expression.tsv), a tab-separated file containing tabular expression data with two text features and two numeric features, followed by expression values for four samples.

    @@ -156,16 +163,38 @@ accessions must be supplied. See the example below:
  
    The following are some example files to illustrate file formats:

    - [Test_1000g.study.tsv](https://s3.amazonaws.com/bio-test-data/odm/Test_1000g/Test_1000g.study.tsv), a tab-delimited file of the study attributes

    - [Test_1000g.samples.tsv](https://s3.amazonaws.com/bio-test-data/odm/Test_1000g/Test_1000g.samples.tsv), a tab-delimited file of sample attributes.

    - [Test_1000g.gct](https://s3.amazonaws.com/bio-test-data/odm/Test_1000g/Test_1000g.gct), a [GCT](https://software.broadinstitute.org/cancer/software/gsea/wiki/index.php/Data_formats#GCT:_Gene_Cluster_Text_file_format_.28.2A.gct.29) file of expression data from multiple sequencing runs

    - [Test_1000g.samples.tsv](https://s3.amazonaws.com/bio-test-data/odm/Test_1000g/Test_1000g.samples.tsv), a tab-delimited file of sample attributes

    - [Test_1000g.gct](https://s3.amazonaws.com/bio-test-data/odm/Test_1000g/Test_1000g.gct), a [GCT](https://docs.gsea-msigdb.org/#GSEA/Data_Formats/#gct-gene-cluster-text-file-format-gct) file of expression data from multiple sequencing runs

    - [Test_1000g.gct.tsv](https://s3.amazonaws.com/bio-test-data/odm/Test_1000g/Test_1000g.gct.tsv), a tab-separated file that describes the expression data

    - [Test_1000g.vcf](https://s3.amazonaws.com/bio-test-data/odm/Test_1000g/Test_1000g.vcf), a [VCF](https://samtools.github.io/hts-specs/VCFv4.2.pdf) file of variant data from multiple sequencing runs

    - [Test_1000g.vcf.tsv](https://s3.amazonaws.com/bio-test-data/odm/Test_1000g/Test_1000g.vcf.tsv), a tab-separated file that describes the variant data

    To work with Cell metadata and Cell expression, use the following example files:

    - [Study_metadata](https://bio-test-data.s3.us-east-1.amazonaws.com/User_guide_test_data/Single_cell_data/study_metadata.tsv), a tab-delimited file of the study attributes

    - [Samples_metadata](https://bio-test-data.s3.us-east-1.amazonaws.com/User_guide_test_data/Single_cell_data/samples.tsv), a tab-delimited file of sample attributes

    - [Cell_metadata](https://bio-test-data.s3.us-east-1.amazonaws.com/User_guide_test_data/Single_cell_data/cells_2_samples_full_match.tsv), a tab-delimited file of cell attributes

    - [Cell_expression](https://bio-test-data.s3.us-east-1.amazonaws.com/User_guide_test_data/Single_cell_data/expression_2_cells_linked_to_samples.tsv), a tab-delimited file of cell expression data

    Run the script with the above by typing the following (inserting your token

    instead of [token], note you may need to escape or quote strings depending on

    your specific command line interface):

    ```default

    odm-import-data --token [token] --host [HOST] --study https://s3.amazonaws.com/bio-test-data/odm/Test_1000g/Test_1000g.study.tsv --samples https://s3.amazonaws.com/bio-test-data/odm/Test_1000g/Test_1000g.samples.tsv --expression https://s3.amazonaws.com/bio-test-data/odm/Test_1000g/Test_1000g.gct --expression_metadata https://s3.amazonaws.com/bio-test-data/odm/Test_1000g/Test_1000g.gct.tsv --variant https://s3.amazonaws.com/bio-test-data/odm/Test_1000g/Test_1000g.vcf --variant_metadata https://s3.amazonaws.com/bio-test-data/odm/Test_1000g/Test_1000g.vcf.tsv

    ```

    Script example (Study → Samples → Cells → Expression):

    ```default
    odm-import-data \
    --server <HOST> \
    --token <TOKEN> \
    --study 's3://bio-test-data/User_guide_test_data/Single_cell_data/study_metadata.tsv' \
    --samples 's3://bio-test-data/User_guide_test_data/Single_cell_data/samples.tsv' \
    --cells 's3://bio-test-data/User_guide_test_data/Single_cell_data/cells_2_samples_full_match.tsv' \
    --expression 's3://bio-test-data/User_guide_test_data/Single_cell_data/expression_2_cells_linked_to_samples.tsv' \
    --data-class 'Single-cell transcriptomics' \
    --number-of-feature-attributes 1 \
    --allow-duplicates
    ```
