This repository contains scripts, functions, and exercises I developed while working through the Genomic Data Science Specialization and the Bioinformatics I Honours Track certification. It also includes my solutions to problems from the Rosalind bioinformatics platform. The goal of this work is to build practical proficiency in implementing and debugging the genomic algorithms used in computational biology.
To efficiently apply fundamental mathematical concepts, including combinatorics, set theory, and probability, to problems in bioinformatics.
To gain hands-on experience with core algorithmic techniques in bioinformatics, including sequence analysis, pattern matching, and motif discovery, as part of a structured genomics specialization program.
GC content calculation and FASTA parsing
Transcription and reverse complement
Hamming Distance computation for DNA sequence comparison
k-mer Clump Finding to locate high-frequency regions (e.g., origins of replication)
Boyer-Moore Pattern Matching Algorithm for efficient string search
Motif Detection using the MEME Suite
Use of Biopython for parsing and analyzing biological data
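Several of the topics listed above can be illustrated with a short, dependency-free Python sketch. This is not the code from the repository: the function names (`gc_content`, `transcribe`, `reverse_complement`, `hamming_distance`, `find_clumps`) and their signatures are assumptions chosen for illustration, and the sketch assumes plain DNA strings over the alphabet A, C, G, T.

```python
from collections import Counter

# Translation table for DNA complements (illustrative helper, not from the repo).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gc_content(seq: str) -> float:
    """Return the percentage of G and C bases in seq (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def transcribe(seq: str) -> str:
    """Transcribe a DNA coding strand into RNA by replacing T with U."""
    return seq.upper().replace("T", "U")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the strand."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def hamming_distance(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def find_clumps(genome: str, k: int, window: int, t: int) -> set:
    """Return k-mers occurring at least t times in some window of the given length."""
    clumps = set()
    if k <= 0 or window < k or len(genome) < window:
        return clumps
    # Count every k-mer in the first window.
    counts = Counter(genome[i:i + k] for i in range(window - k + 1))
    clumps.update(kmer for kmer, c in counts.items() if c >= t)
    # Slide the window one base at a time, updating only the two k-mers that change.
    for start in range(1, len(genome) - window + 1):
        counts[genome[start - 1:start - 1 + k]] -= 1   # k-mer leaving the window
        incoming = genome[start + window - k:start + window]
        counts[incoming] += 1                          # k-mer entering the window
        if counts[incoming] >= t:
            clumps.add(incoming)
    return clumps


if __name__ == "__main__":
    print(gc_content("AGCTATAG"))              # 37.5
    print(reverse_complement("AAAACCCGGT"))    # ACCGGGTTTT
    print(hamming_distance("GAGCCTACTAACGGGAT", "CATCGTAATGACGGCCT"))  # 7
    print(find_clumps("ATATATGCGC", k=2, window=6, t=3))  # {'AT'}
```

The clump finder updates counts incrementally as the window slides rather than recounting each window from scratch, so the work per step depends on k rather than on the window length.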
Strengthening Python skills through string manipulation, pattern matching, and data parsing
Understanding biological concepts like transcription, translation, and motifs
Practicing algorithm design with real biological datasets
Maintaining version control discipline by tracking every step on GitHub
Attained the Spidey achievement on the Rosalind bioinformatics platform by solving 64 (2^6) problems
Completed 10 coding-intensive modules in the Genomic Data Science track
Earned the Bioinformatics I and II Honors Track Certificates, demonstrating mastery of both theory and applied coding tasks
Python (3.10+)
Biopython
MEME Suite
SPAdes
QUAST
Jupyter Notebook (for stepwise development)
Extend repository with dynamic programming algorithms (e.g., global/local alignment)
Implement basic genome assembly techniques (e.g., De Bruijn graphs)
Apply algorithms to real sequencing datasets (FASTQ/FASTA) using Biopython pipelines
Solve advanced Rosalind problems (e.g., Genome Assembly, Dynamic Programming, Phylogeny)
Explore algorithm implementations from Bioinformatics Algorithms by Compeau & Pevzner
Connect solutions to real genomic datasets