Reference Genome instruction by MikhailAf · Pull Request #112 · genestack/user-docs

MikhailAf · 2025-04-03T13:57:12Z

docs/user-guide/doc-odm-user-guide/variants.md

eeliane · 2025-07-24T12:44:43Z

docs/user-guide/doc-odm-user-guide/variants.md

+
+Before importing a new reference genome, users are encouraged to **check which reference genomes are already available** in the system. This helps avoid duplication and ensures consistency across datasets. Users can:
+
+* Browse existing reference genomes in the **File Browser** (under the Reference Genomes category), or


File Browser -> File Manager

eeliane · 2025-07-24T12:45:59Z

docs/user-guide/doc-odm-user-guide/variants.md

+
+### **Required File Format**
+
+If the reference genome needed is not listed in the **File Browser** or returned by the `GET /api/v1/reference-genomes` endpoint, users can import a custom reference genome into ODM to support their dataset.


File Browser -> File Manager

eeliane · 2025-07-24T12:50:44Z

docs/user-guide/doc-odm-user-guide/variants.md

+
+This response confirms successful import and provides a unique **accession ID**.
+
+The newly imported reference genome is now available in ODM and visible in the File Manager.


I believe we also have to mention that the imported reference genome becomes available only after successful initialisation; otherwise it cannot be used.

eeliane · 2025-07-24T12:53:37Z

docs/user-guide/doc-odm-user-guide/variants.md

+
+### **Preparing Metadata**
+
+To upload VCF files, you must also provide a metadata file in TSV (tab-separated values) format. This file should include at least the following fields:


you must also provide a metadata file in TSV

Let's mention that the metadata file is needed in this particular case in order to specify the reference genome information. I'd like to add more detail here, because in general the metadata file can be skipped.

This file should include at least the following fields:

Why? To specify which reference genome should be used, the user has to have either the Genome Version attribute or the genestack.bio:organism attribute in the metadata file.
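
To make the requirement in this comment concrete, here is a minimal Python sketch that checks whether a metadata TSV header carries either of the two attributes. Only the column names `Genome Version` and `genestack.bio:organism` come from the discussion; the function name and the sample rows are hypothetical, and this is not how ODM itself validates metadata.

```python
import csv
import io

# Column names taken from the review comment; everything else is illustrative.
GENOME_COLUMNS = ("Genome Version", "genestack.bio:organism")


def genome_columns_present(tsv_text):
    """Return the genome-related columns found in the TSV header row."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader, [])  # an empty file yields an empty header
    return [name for name in GENOME_COLUMNS if name in header]


# Hypothetical metadata file with a single data row.
sample = "Name\tGenome Version\nvariants_1\tGRCh38\n"
found = genome_columns_present(sample)
```

An empty result would mean the file says nothing about the reference genome, which is the case the documentation needs to describe.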

eeliane · 2025-07-24T12:59:30Z

docs/user-guide/doc-odm-user-guide/variants.md

+
+To upload VCF files, you must also provide a metadata file in TSV (tab-separated values) format. This file should include at least the following fields:
+
+* **Genome Version**: The exact name of the reference genome as it appears in ODM


Genome Version may contain one of two values:

* assembly: if there are multiple releases (for example, 100 and 109), a link to the latest release (109) will be created; OR
* name: a link to the exact release will be created
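
The two options can be sketched in Python as follows. This is an illustration under stated assumptions, not ODM's implementation: the record shape (`name`, `assembly`, `release` keys), the rule that an exact name match takes precedence, and the sample names are all hypothetical.

```python
def resolve_genome(genome_version, genomes):
    """Pick the reference genome record that a Genome Version value points to.

    An exact `name` match wins; otherwise the value is treated as an
    `assembly` and the record with the highest release is returned.
    Returns None when nothing matches.
    """
    for genome in genomes:
        if genome["name"] == genome_version:
            return genome
    same_assembly = [g for g in genomes if g["assembly"] == genome_version]
    if not same_assembly:
        return None
    # Compare releases numerically so that 109 sorts after 100.
    return max(same_assembly, key=lambda g: int(g["release"]))


# Hypothetical records; the name format is invented for the example.
genomes = [
    {"name": "GRCh38.100", "assembly": "GRCh38", "release": "100"},
    {"name": "GRCh38.109", "assembly": "GRCh38", "release": "109"},
]
by_assembly = resolve_genome("GRCh38", genomes)   # assembly given: latest release
by_name = resolve_genome("GRCh38.100", genomes)   # name given: exact release
```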

eeliane · 2025-07-24T13:01:26Z

docs/user-guide/doc-odm-user-guide/variants.md

+
+Additional optional fields, such as **Version**, **Accession**, or **User**, may also be included and will not interfere with the upload. The system is flexible and accepts metadata files with varying numbers of columns.
+
+!!! note "Metadata file examples"


I'm confused by these examples and suggest using the information from this old article https://genestack.atlassian.net/wiki/spaces/~940367389/pages/3417047043/Working+with+Reference+genomes+version+ODM+1.53#Examples to show what options the user has and how they can be used. The number of other columns is unnecessary information when we are talking about reference genomes.

eeliane · 2025-07-24T13:04:52Z

docs/user-guide/doc-odm-user-guide/variants.md

+
+As with other data types, the request should include:
+
+* A **metadata file** with information about the reference genome and organism


and organism

can be removed, it's necessary to provide information about organism

eeliane · 2025-07-24T13:06:17Z

docs/user-guide/doc-odm-user-guide/variants.md

+
+* A **metadata file** with information about the reference genome and organism
+* A **VCF file** compressed **.vcf.gz** or plain **.vcf** (See example of a [VCF file](https://s3.amazonaws.com/bio-test-data/gVCF_Mm_Demo.vcf))
+* A **link structure** connecting the data to samples, libraries, or preparations


Why? We don't provide this information in the body of the job endpoint for importing VCF data.

eeliane · 2025-07-24T13:07:09Z

docs/user-guide/doc-odm-user-guide/variants.md

+* A **link structure** connecting the data to samples, libraries, or preparations
+
+!!! note "Important" 
+    Unlike transcriptomics or flow cytometry data, **a reference genome must be specified** when importing VCF files. If no metadata is provided, the system defaults to using the **human reference genome (GRCh38)**. To use a different genome, you must include a metadata file where the **Genome Version** matches the name of a **previously imported custom reference genome** in your ODM instance.


"must be specified" -> "can be specified"? Maybe it's okay for our client to use the default reference genome.

eeliane · 2025-07-24T13:10:00Z

docs/user-guide/doc-odm-user-guide/variants.md

+* **organism**, **assembly**, **release**: Core genome attributes
+* **annotationUrl**: Link to the annotation file used (e.g., GTF from Ensembl)
+* **genestack:accession**: ODM accession for the reference genome
+* **initializationStatus**: Should be COMPLETE if the genome is ready for use


Should be COMPLETE if the genome is ready for use

If I remember correctly, the user won't be able to import a VCF file with a metadata file that mentions an unsuccessfully initialised reference genome.
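
As a rough illustration of the check implied here, the Python sketch below filters reference genome records down to those that are ready for use. The field names `genestack:accession`, `assembly`, and `initializationStatus` and the value `COMPLETE` come from the diff above; the accessions, the `FAILED` status value, and the function name are hypothetical.

```python
def ready_genomes(records):
    """Keep only records whose initializationStatus is COMPLETE."""
    return [r for r in records if r.get("initializationStatus") == "COMPLETE"]


# Hypothetical records, shaped after the attributes listed in the diff.
records = [
    {"genestack:accession": "GSF000001", "assembly": "GRCh38",
     "initializationStatus": "COMPLETE"},
    {"genestack:accession": "GSF000002", "assembly": "GRCm39",
     "initializationStatus": "FAILED"},  # status value invented for the example
]
usable = [r["genestack:accession"] for r in ready_genomes(records)]
```

Only genomes that pass such a filter would be worth referencing in a metadata file, if the behaviour described in this comment holds.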

Reference Genome instruction

c934f68

MikhailAf requested a review from eeliane April 3, 2025 13:57

MikhailAf requested review from a team as code owners April 3, 2025 13:57

MikhailAf requested a review from a team April 3, 2025 13:57

eeliane reviewed Apr 4, 2025


eeliane and others added 2 commits April 17, 2025 12:31

Merge branch 'develop' into feature/guide-reference-genome

08a701d

Merge branch 'develop' into feature/guide-reference-genome

cd0d53b

MikhailAf marked this pull request as draft May 30, 2025 12:30

Updated details

0c24517

srz11d requested review from MariaBorodaenko and eeliane June 2, 2025 15:55

MikhailAf added 4 commits June 3, 2025 12:54

Supported file formats link fixed

07551dd

Merge branch 'develop' into feature/guide-reference-genome

6117315

All linkes are fixed

ab1f263

Merge branch 'develop' into feature/guide-reference-genome

a76f8d1

MikhailAf marked this pull request as ready for review July 20, 2025 16:30

MikhailAf requested review from a team as code owners July 20, 2025 16:30

eeliane requested changes Jul 24, 2025

MikhailAf wants to merge 8 commits into develop from feature/guide-reference-genome


3 participants


