Hi there!
The purpose of this repository is to highlight practical, data-driven approaches to solving real problems in bioinformatics. It includes notebooks focused on dimensionality reduction, semantic modeling, and knowledge management in scientific support environments.
This project was developed through hands-on experience in biotech technical support and bioinformatics coursework. In these fields, real-world data often lacks clear structure or annotations. The project demonstrates how classical and modern methods (including vector embeddings, unsupervised learning, and text mining) can be rigorously applied to biological and operational datasets.
See how informatics can accelerate decision-making in biotech!
📂 Repository Contents
KBA_UMAP.ipynb: Uses UMAP (Uniform Manifold Approximation and Projection) to turn a large set of text-based support articles into a visual map in which similar or related topics sit close together. The map helps you spot natural clusters and themes within your knowledge base, as well as thematic overlaps and knowledge gaps.
KiA_algo2.ipynb: Prototype of a semantic NLP model designed to recommend support resources by understanding natural-language problem statements.
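The recommendation idea behind KiA_algo2.ipynb can be sketched as similarity-based retrieval: represent the problem statement and each support resource as vectors, then rank resources by cosine similarity. This sketch swaps in TF-IDF, a lexical rather than semantic representation, so it runs without downloading a model; the function `recommend` and the knowledge-base entries are hypothetical and not taken from the notebook.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def recommend(problem, resources, top_k=3):
    """Rank support resources by similarity to a free-text problem statement.

    `resources` maps a resource title to its text. TF-IDF is a lexical
    stand-in for a semantic embedding model (an assumption of this sketch).
    """
    titles = list(resources)
    vectorizer = TfidfVectorizer(stop_words="english")
    # Fit the vocabulary on the resource texts, then embed the query in the same space.
    resource_vecs = vectorizer.fit_transform(list(resources.values()))
    query_vec = vectorizer.transform([problem])

    # Cosine similarity between the query and every resource, flattened to 1-D.
    scores = cosine_similarity(query_vec, resource_vecs).ravel()

    # Highest-scoring resources first.
    ranked = sorted(zip(titles, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_k]


# Hypothetical knowledge-base entries.
knowledge_base = {
    "KB-101 PCR troubleshooting": "No amplification or faint bands after PCR; check primers, "
    "annealing temperature, and template quality",
    "KB-202 RNA extraction": "Low RNA yield or degraded RNA after extraction; use fresh tissue "
    "and RNase-free reagents",
    "KB-303 Flow cytometry setup": "Set compensation and voltages for multicolor flow cytometry panels",
}

top = recommend("My PCR gel shows no bands", knowledge_base, top_k=2)
for title, score in top:
    print(f"{score:.2f}  {title}")
```

A semantic version would replace the vectorizer with sentence embeddings (for example, mean-pooled hidden states from a transformers model) so that paraphrases without shared words still match; the cosine ranking step stays the same.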
⚙️ Environment & Dependencies
Requirements: python >= 3.10, jupyter, numpy, pandas, scikit-learn, umap-learn, transformers, matplotlib
To get started:
1. Clone the repository:
   git clone https://github.com/kesterlyn-wilson/applied-bioinformatics.git
   cd applied-bioinformatics
2. Install the dependencies:
   pip install -r requirements.txt
3. Launch Jupyter Notebook:
   jupyter notebook
4. Explore the notebooks: open any .ipynb file to walk through the analysis interactively.
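After installing, a quick sanity check can confirm the Python version and that each listed package is present. This is a hypothetical helper (`check_environment` is not part of the repository); it uses the distribution names from the dependency list and only the standard library.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Distribution names as listed in the dependency line (assumed to match requirements.txt).
REQUIRED = ["jupyter", "numpy", "pandas", "scikit-learn",
            "umap-learn", "transformers", "matplotlib"]


def check_environment(packages=REQUIRED, min_python=(3, 10)):
    """Return (python_ok, {package: installed version or None})."""
    python_ok = sys.version_info >= min_python
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None  # not installed in this environment
    return python_ok, found


if __name__ == "__main__":
    ok, found = check_environment()
    print(f"Python {sys.version.split()[0]}: {'OK' if ok else 'needs >= 3.10'}")
    for name, ver in found.items():
        print(f"{name:15} {ver or 'MISSING'}")
```

Run it as a script or paste it into a notebook cell; any package reported as MISSING can be installed with pip.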
👩🏽‍🔬 Kesterlyn Wilson, M.S.
Bioinformatics Scientist & Technical Applications Specialist
📍 Boston, MA