EmmaImole/BWA-gene-mapping

BWA-gene-mapping

Mapping genome sequence data to a reference genome on Eddie using the bwa software package

Why do we need to map the sequence data

Mapping is the process of matching each sequenced read to the position on the reference genome where it most likely originated. The reference genome provides the chromosomes and positions against which the reads are placed, so mapping shows which chromosome, region, and gene a read belongs to and reveals where repetitive regions occur. Mapping the sequenced genome of the study to the reference genome also provides insight into structural variants, which is critical for understanding the evolutionary process. In addition, the sequence data must be aligned to the reference genome before variant calling with tools such as samtools and GATK, which is vital for estimating the demographic model in this study.

Several tools are available for mapping, and many studies have used bwa to index reference genomes and map sequence data to them. This study uses the Burrows-Wheeler Aligner (BWA) to index the Atlantic salmon reference genome and map the sequence data to it, with the aim of estimating the effective population size of Atlantic salmon. The BWA workflow used here has two steps: bwa index and bwa mem.

Data source

The sequence data used for this study are paired-end reads from an Atlantic salmon from North America: https://www.ebi.ac.uk/ena/browser/view/SRR28213514. The reads were mapped to the reference genome from this source: https://www.ebi.ac.uk/ena/browser/view/GCA_905237065.2

bwa index

Indexing makes it easier and faster to align sequence data to a large reference genome, because the aligner does not have to search through the whole genome every time it places a read. Running bwa index generates files with the extensions .amb, .ann, .bwt, .pac, and .sa, which are required for efficient alignment. The process converts the FASTA file into Burrows-Wheeler Transform (BWT) related files, so the genome sequence is stored in a compressed format that optimizes searching.
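
The indexing step described above can be sketched in Python as follows. This is a minimal sketch, not the exact commands used in this study: it assumes bwa is installed and available on the PATH, and the helper names and the file name salmon_reference.fa are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import List

# Suffixes of the index files that `bwa index` writes next to the reference FASTA.
BWA_INDEX_SUFFIXES = (".amb", ".ann", ".bwt", ".pac", ".sa")


def bwa_index_command(reference: str) -> List[str]:
    """Build the argument list for `bwa index <reference>`."""
    return ["bwa", "index", reference]


def expected_index_files(reference: str) -> List[Path]:
    """Paths of the index files expected once indexing has finished."""
    return [Path(reference + suffix) for suffix in BWA_INDEX_SUFFIXES]


def missing_index_files(reference: str) -> List[Path]:
    """Index files that are not present on disk yet."""
    return [path for path in expected_index_files(reference) if not path.exists()]


def index_reference(reference: str) -> None:
    """Run `bwa index` only when at least one index file is missing."""
    if missing_index_files(reference):
        subprocess.run(bwa_index_command(reference), check=True)


# Hypothetical usage (requires bwa and the FASTA file to exist):
# index_reference("salmon_reference.fa")
```

Checking for the five index files before running avoids repeating the indexing step, which can take a long time on a large genome.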

bwa mem

bwa mem is given the path to the original reference genome FASTA file and uses that path to locate the associated index files in the same directory, which it loads to align the reads. The mapping step is essential to population genetic studies because it assigns each read in the study's sequence data to a position on the reference genome. Its output is a file in Sequence Alignment/Map (SAM) format, which is very large.
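
A paired-end bwa mem run of the kind described above might look like the following Python sketch. It assumes bwa is on the PATH and that the reference has already been indexed; the function names, the read and output file names, and the thread count are hypothetical and are not taken from this study.

```python
import subprocess
from typing import List


def bwa_mem_command(reference: str, reads_1: str, reads_2: str, threads: int = 4) -> List[str]:
    """Build the argument list for a paired-end `bwa mem` run.

    `reference` is the path to the indexed FASTA file; `-t` sets the
    number of threads.
    """
    return ["bwa", "mem", "-t", str(threads), reference, reads_1, reads_2]


def map_reads(reference: str, reads_1: str, reads_2: str, sam_path: str, threads: int = 4) -> None:
    """Run bwa mem and write its standard output (SAM text) to `sam_path`."""
    command = bwa_mem_command(reference, reads_1, reads_2, threads)
    with open(sam_path, "w") as sam_file:
        subprocess.run(command, stdout=sam_file, check=True)


# Hypothetical usage (file names are placeholders):
# map_reads("salmon_reference.fa", "reads_1.fastq.gz", "reads_2.fastq.gz", "aligned.sam", threads=8)
```

bwa mem writes the alignment to standard output, so the sketch redirects it into a file. Because the SAM file is very large, make sure the output directory has enough free space before starting the run.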
