Codon_Usage_Clustermap

Overview

This project analyzes codon usage across genes and species and clusters them by their codon usage frequencies. It works from publicly available nucleotide sequences and visualizes how similar their codon usage is, which makes shared codon preferences, and the evolutionary relationships they may reflect, easier to spot.

Purpose

The primary goals of this project are to:

Compare codon usage across different species by analyzing their coding sequences (CDS).
Identify optimal codons for each amino acid, which could provide insights into translational efficiency.
Cluster species based on their codon usage to examine evolutionary and functional relationships.
Visualize codon usage trends using heatmaps and hierarchical clustering.
Provide a foundation for further research in gene expression, protein synthesis efficiency, and evolutionary biology.

Tools & Libraries Used

Python
Biopython (Bio.Entrez, Bio.SeqIO, Bio.Data.CodonTable) – For fetching sequence data from NCBI, extracting CDS features, and looking up genetic code tables.
Pandas – For data manipulation and analysis.
Seaborn – For heatmap and clustering visualizations.
Matplotlib – For rendering and displaying the figures that Seaborn builds on.

Implementation

Fetch genetic sequences: The script retrieves coding sequences (CDS) from GenBank using user-provided accession numbers.
Extract codons: The CDS is split into in-frame codons and each codon is counted; the selected genetic code table maps every codon to its amino acid.
Compute codon usage bias: For each amino acid, the relative frequency of each of its synonymous codons is calculated.
Identify optimal codons: The most frequently used codon for each amino acid is reported as its optimal codon.
Generate heatmaps and clustermaps: Codon usage patterns are visualized across multiple species.
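
The counting, bias, and optimal-codon steps above can be sketched in plain Python. This is a minimal illustration, not the notebook's actual code: it hard-codes NCBI translation table 1 (the standard code) instead of looking a table up through Biopython, and the function names `count_codons`, `relative_usage`, and `optimal_codons` are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import product

# NCBI translation table 1 (standard code), amino acids listed in TCAG codon order.
# Assumption: the notebook would obtain this mapping from Biopython instead.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def count_codons(cds):
    """Count in-frame codons, skipping incomplete or ambiguous triplets."""
    cds = cds.upper().replace("U", "T")
    codons = (cds[i:i + 3] for i in range(0, len(cds) - 2, 3))
    return Counter(c for c in codons if c in CODON_TO_AA)


def relative_usage(counts):
    """Share of each amino acid's codons contributed by each synonymous codon.

    Stop codons ('*') are treated as one more synonymous group here.
    """
    totals = defaultdict(int)
    for codon, n in counts.items():
        totals[CODON_TO_AA[codon]] += n
    return {codon: n / totals[CODON_TO_AA[codon]] for codon, n in counts.items()}


def optimal_codons(counts):
    """Most frequent codon per amino acid (on a tie, the first one seen wins)."""
    best = {}
    for codon, n in counts.items():
        aa = CODON_TO_AA[codon]
        if aa not in best or n > counts[best[aa]]:
            best[aa] = codon
    return best


# Toy CDS: Met, Leu (CTG), Leu (CTG), Leu (CTA), stop.
counts = count_codons("ATGCTGCTGCTATAA")
usage = relative_usage(counts)
best = optimal_codons(counts)
```

In the toy sequence, leucine is encoded twice by CTG and once by CTA, so `usage["CTG"]` is 2/3 and `best["L"]` is `"CTG"`. A different genetic code table would only change the `CODON_TO_AA` mapping; the counting logic stays the same.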

Example Usage

Run the script and enter GenBank accession numbers when prompted.
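Choose the NCBI genetic code table ID that matches the species (for example, 1 for the standard code or 11 for bacteria, archaea, and plant plastids). IDs run from 1 to 33, but not every number in that range is assigned.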
The script will fetch sequences, compute codon usage, and generate visualizations.
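
To make the clustering step more concrete, the sketch below shows the kind of input a clustermap works from: one relative-usage profile per accession, compared pairwise. It is a plain-Python illustration under stated assumptions, not the notebook's code. The labels `seq_A`, `seq_B`, and `seq_C` and their usage values are made up, and Euclidean distance is chosen only because it is the default metric of `seaborn.clustermap`.

```python
import math


def usage_vector(usage, codons):
    """Lay a {codon: relative usage} dict out as a fixed-order vector."""
    return [usage.get(codon, 0.0) for codon in codons]


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def distance_matrix(profiles):
    """Pairwise distances between codon usage profiles.

    profiles maps a label (e.g. an accession ID) to {codon: relative usage}.
    Returns the labels and a square matrix in the same order.
    """
    codons = sorted({codon for usage in profiles.values() for codon in usage})
    names = list(profiles)
    vectors = {name: usage_vector(profiles[name], codons) for name in names}
    matrix = [
        [euclidean(vectors[a], vectors[b]) for b in names]
        for a in names
    ]
    return names, matrix


# Hypothetical profiles restricted to the two lysine codons for brevity.
profiles = {
    "seq_A": {"AAA": 0.8, "AAG": 0.2},
    "seq_B": {"AAA": 0.7, "AAG": 0.3},
    "seq_C": {"AAA": 0.1, "AAG": 0.9},
}
names, matrix = distance_matrix(profiles)
```

Here `seq_A` and `seq_B` end up much closer to each other than either is to `seq_C`, which is the pattern hierarchical clustering would group together in the final figure.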

Future Improvements

Include Species Names: Instead of displaying only accession IDs, integrating species names will make the results more interpretable.
Incorporate scRNA-seq Data & Gene Expression Analysis:
- Relate codon usage bias to gene expression levels.
- Investigate how codon preference affects mRNA translation efficiency.
- Compare tissue-specific expression levels and codon bias.
Codon Adaptation Index (CAI): Compute CAI to assess how closely a gene follows the optimal codon usage of a species.
Comparative Evolutionary Analysis: Use phylogenetic trees to study the relationship between species based on codon usage similarity.
Machine Learning for Codon Optimization: Train models to predict codon optimization for synthetic biology applications.
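
As a starting point for the CAI item above, here is a minimal sketch of the usual definition: each codon gets a relative adaptiveness weight (its count in a reference set divided by the count of the most used synonymous codon), and the CAI of a gene is the geometric mean of the weights of its codons. The function names and the toy reference counts are hypothetical, and the handling of edge cases (skipping Met, Trp, stop codons, and codons absent from the reference) is one simple choice among several.

```python
import math
from itertools import product

# NCBI translation table 1 (standard code), amino acids listed in TCAG codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def relative_adaptiveness(ref_counts):
    """w(codon) = count / highest count among that amino acid's codons."""
    max_per_aa = {}
    for codon, n in ref_counts.items():
        aa = CODON_TO_AA[codon]
        max_per_aa[aa] = max(max_per_aa.get(aa, 0), n)
    return {
        codon: n / max_per_aa[CODON_TO_AA[codon]]
        for codon, n in ref_counts.items()
        if n > 0
    }


def cai(cds, weights):
    """Geometric mean of the weights of the codons in cds."""
    cds = cds.upper()
    logs = []
    for i in range(0, len(cds) - 2, 3):
        codon = cds[i:i + 3]
        aa = CODON_TO_AA.get(codon)
        # Skip ambiguous codons, stops, and single-codon amino acids (Met, Trp).
        if aa is None or aa in "*MW":
            continue
        w = weights.get(codon)
        if w is None:
            # Simplification: codons unseen in the reference are ignored.
            continue
        logs.append(math.log(w))
    if not logs:
        raise ValueError("no scorable codons in sequence")
    return math.exp(sum(logs) / len(logs))


# Hypothetical reference counts from a set of highly expressed genes.
weights = relative_adaptiveness({"CTG": 8, "CTA": 2, "GAA": 6, "GAG": 6})
score = cai("CTGCTAGAA", weights)
```

A gene that uses only the most frequent codon for every amino acid scores 1.0, and the score falls as rarer codons appear. The reference set would need to be chosen per species, typically from highly expressed genes.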

Conclusion

This project provides a foundation for codon usage analysis and its implications in evolutionary biology, translational efficiency, and gene expression regulation. Future expansions will allow deeper insights into how codon bias impacts protein synthesis across species and conditions.
