BWA-gene-mapping

Mapping genome sequence data to a reference genome on Eddie using the bwa software package

Why do we need to map the sequence data

Mapping the sequenced genome to a reference genome gives insight into structural variants, which is critical for understanding evolutionary processes. The reference genome provides the chromosomes and coordinates against which reads are placed, and mapping is the process of matching each read to the chromosome and position it most likely originated from. This shows which region and gene a read belongs to and reveals where repetitive regions occur. Alignment to the reference is also a prerequisite for variant calling with tools such as samtools and GATK, which is vital for estimating the demographic model in this study.

Several tools are available for mapping, and many studies have used bwa to index reference genomes and map sequence data to them. This study uses the Burrows-Wheeler Aligner (BWA) to index the Atlantic salmon reference genome and map the sequence data to it, with the aim of estimating the effective population size of Atlantic salmon. The workflow used here has two steps: bwa index and bwa mem.

Data source

The sequence data used for this study are paired-end reads from an Atlantic salmon from North America: https://www.ebi.ac.uk/ena/browser/view/SRR28213514. The reads were mapped to a reference genome from this source: https://www.ebi.ac.uk/ena/browser/view/GCA_905237065.2

bwa index

Indexing the reference makes it faster and more practical to align sequence data to a large genome, because the aligner does not have to scan the whole genome for every read. Running bwa index on the reference FASTA file produces files with the extensions .amb, .ann, .bwt, .pac and .sa, which are required for efficient alignment. The process converts the FASTA file into Burrows-Wheeler Transform (BWT) related files, a compressed representation of the genome sequence that makes searching efficient.
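To make the idea of the Burrows-Wheeler Transform more concrete, here is a minimal Python sketch that computes the BWT of a short string by sorting its rotations, and then inverts it. This is a toy illustration only: bwa builds its index with far more efficient algorithms, and the function names `bwt` and `inverse_bwt` and the `$` end marker are assumptions made for this example, not part of bwa.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Return the Burrows-Wheeler Transform of text using naive rotation sorting."""
    if sentinel in text:
        raise ValueError("sentinel must not occur in the text")
    s = text + sentinel  # the end marker makes the transform reversible
    # Build every rotation of s, sort them, and read off the last column.
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(last_column: str, sentinel: str = "$") -> str:
    """Rebuild the original text from its BWT (naive approach, small inputs only)."""
    n = len(last_column)
    table = [""] * n
    # Repeatedly prepend the last column and re-sort; after n rounds the
    # table holds all sorted rotations of the original string.
    for _ in range(n):
        table = sorted(last_column[i] + table[i] for i in range(n))
    original = next(row for row in table if row.endswith(sentinel))
    return original[:-1]  # drop the end marker


if __name__ == "__main__":
    transformed = bwt("GATTACA")
    print(transformed)
    print(inverse_bwt(transformed))
```

The transform tends to group identical characters together, which is what makes the result compressible and searchable. The sorting approach above needs memory proportional to the square of the sequence length, so it only works for short strings.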

bwa mem

bwa mem is given the path to the original reference genome FASTA file and uses that path to locate the associated index files in the same directory, which it loads to align the reads. Mapping is an essential step in population genetic studies and shows where the sequenced reads fall on the reference genome. The output is a Sequence Alignment/Map (SAM) file, which is very large.
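The two steps can be sketched as a small Python helper that builds the bwa index and bwa mem command lines. This is a sketch under stated assumptions, not the exact commands used in this study: the file names `reference.fa`, `SRR28213514_1.fastq.gz` and `SRR28213514_2.fastq.gz`, the thread count, and the helper function names are all hypothetical. The only bwa options used are `bwa index <reference>` and `bwa mem -t <threads> <reference> <reads_1> <reads_2>`, and bwa mem writes SAM to standard output, so the output is redirected to a file.

```python
import shlex
import subprocess
from pathlib import Path

# Extensions of the index files that bwa index writes next to the reference.
INDEX_EXTENSIONS = (".amb", ".ann", ".bwt", ".pac", ".sa")


def bwa_index_cmd(reference):
    """Command that indexes the reference FASTA file."""
    return ["bwa", "index", str(reference)]


def bwa_mem_cmd(reference, reads_1, reads_2, threads=4):
    """Command that aligns paired-end reads; SAM goes to standard output."""
    return ["bwa", "mem", "-t", str(threads),
            str(reference), str(reads_1), str(reads_2)]


def expected_index_files(reference):
    """Paths of the index files expected beside the reference."""
    return [Path(str(reference) + ext) for ext in INDEX_EXTENSIONS]


def run_mapping(reference, reads_1, reads_2, sam_out, threads=4):
    """Index the reference if needed, then map the reads into sam_out."""
    if not all(p.exists() for p in expected_index_files(reference)):
        subprocess.run(bwa_index_cmd(reference), check=True)
    with open(sam_out, "w") as sam_handle:
        subprocess.run(bwa_mem_cmd(reference, reads_1, reads_2, threads),
                       stdout=sam_handle, check=True)


if __name__ == "__main__":
    # Hypothetical file names; this only prints the commands, it does not run bwa.
    print(shlex.join(bwa_index_cmd("reference.fa")))
    print(shlex.join(bwa_mem_cmd("reference.fa",
                                 "SRR28213514_1.fastq.gz",
                                 "SRR28213514_2.fastq.gz")))
```

Calling `run_mapping` requires bwa to be installed and on the PATH. Because the SAM output is very large, it is common to convert it to a compressed format afterwards, but that step is outside the scope of this sketch.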
